# Kepler 

To install kepler.gl locally, follow the guide at https://docs.kepler.gl/docs/keplergl-jupyter.

On AWS, it is already installed and ready to use in Jupyter notebooks on EMR.

# Set-up

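In [1]:
import findspark
findspark.init()  # locate the Spark installation and make pyspark importable

import pyspark
from pyspark.sql import SQLContext

# Create the SparkContext, or reuse the one already running in this notebook
sc = pyspark.SparkContext.getOrCreate(pyspark.SparkConf().setAppName("kepler"))

# SQLContext still works but is deprecated since Spark 3.0 in favor of SparkSession
sqlContext = SQLContext(sc)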

# Data to plot

kepler.gl accepts data as a CSV string, GeoJSON, a pandas DataFrame, or a GeoPandas GeoDataFrame.
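To make these formats concrete, the sketch below expresses two pickup points (taken from the sample rows of the taxi data) as a CSV string and as a GeoJSON FeatureCollection, using only the standard library. The helper names `to_csv_string` and `to_geojson` are illustrative and not part of kepler.gl.

```python
import csv
import io
import json

# Two pickup points in the same shape as the taxi data.
rows = [
    {"pickup_longitude": -73.993896, "pickup_latitude": 40.750111, "fare_amount": 12.0},
    {"pickup_longitude": -73.976425, "pickup_latitude": 40.739811, "fare_amount": 16.5},
]

def to_csv_string(records):
    """Serialize a list of dicts to a CSV string with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()

def to_geojson(records, lon_key="pickup_longitude", lat_key="pickup_latitude"):
    """Build a GeoJSON FeatureCollection of Point features.

    GeoJSON coordinates are ordered [longitude, latitude].
    """
    features = []
    for record in records:
        properties = {k: v for k, v in record.items() if k not in (lon_key, lat_key)}
        features.append({
            "type": "Feature",
            "geometry": {
                "type": "Point",
                "coordinates": [record[lon_key], record[lat_key]],
            },
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}

csv_data = to_csv_string(rows)
geojson_data = to_geojson(rows)
print(csv_data)
print(json.dumps(geojson_data, indent=2))
```

Either value could then be handed to the map in place of a DataFrame. If the installed keplergl version does not accept a GeoJSON dict directly, passing `json.dumps(geojson_data)` as a string is the safer assumption.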

In [1]:
import pandas as pd
df = pd.read_csv('/Users/prietom/Downloads/nyctrips.csv')
print('Shape=>', df.shape)
df.head()

Shape=> (97986, 12)


| | VendorID | tpep_pickup_datetime | tpep_dropoff_datetime | passenger_count | trip_distance | pickup_longitude | pickup_latitude | dropoff_longitude | dropoff_latitude | fare_amount | tip_amount | total_amount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2 | 2015-01-15 19:05:39 +00:00 | 2015-01-15 19:23:42 +00:00 | 1 | 1.59 | -73.993896 | 40.750111 | -73.974785 | 40.750618 | 12.0 | 3.25 | 17.05 |
| 1 | 2 | 2015-01-15 19:05:39 +00:00 | 2015-01-15 19:32:00 +00:00 | 1 | 2.38 | -73.976425 | 40.739811 | -73.983978 | 40.757889 | 16.5 | 4.38 | 22.68 |
| 2 | 2 | 2015-01-15 19:05:40 +00:00 | 2015-01-15 19:21:00 +00:00 | 5 | 2.83 | -73.968704 | 40.754246 | -73.955124 | 40.786858 | 12.5 | 0.0 | 14.3 |
| 3 | 2 | 2015-01-15 19:05:40 +00:00 | 2015-01-15 19:28:18 +00:00 | 5 | 8.33 | -73.86306 | 40.769581 | -73.952713 | 40.785782 | 26.0 | 8.08 | 41.21 |
| 4 | 2 | 2015-01-15 19:05:41 +00:00 | 2015-01-15 19:20:36 +00:00 | 1 | 2.37 | -73.945541 | 40.779423 | -73.98085 | 40.786083 | 11.5 | 0.0 | 13.3 |


In [2]:
df.columns

Index(['VendorID', 'tpep_pickup_datetime', 'tpep_dropoff_datetime',
       'passenger_count', 'trip_distance', 'pickup_longitude',
       'pickup_latitude', 'dropoff_longitude', 'dropoff_latitude',
       'fare_amount', 'tip_amount', 'total_amount'],
      dtype='object')

Next, we check whether any column in the dataset contains null values:

In [3]:
df.isnull().sum()

VendorID                 0
tpep_pickup_datetime     0
tpep_dropoff_datetime    0
passenger_count          0
trip_distance            0
pickup_longitude         0
pickup_latitude          0
dropoff_longitude        0
dropoff_latitude         0
fare_amount              0
tip_amount               0
total_amount             0
dtype: int64

# Plotting base map and adding the dataset

To create a map with kepler.gl, we first instantiate the KeplerGl class. Its constructor takes three optional arguments: height, the height of the kepler.gl widget in pixels; data, the datasets to load into the map; and config, a dictionary holding the configuration of the kepler.gl map:

In [2]:
from keplergl import KeplerGl
map1 = KeplerGl(height=700)
map1

User Guide: https://docs.kepler.gl/docs/keplergl-jupyter


KeplerGl(height=700)
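The data and config arguments can also be supplied when the map is created. The following is a minimal sketch, not the library's prescribed workflow: `center_config` and `build_map` are hypothetical helpers, and the sketch assumes that kepler.gl accepts a partial config containing only a `mapState` section and fills in the rest with defaults.

```python
from statistics import fmean

def center_config(latitudes, longitudes, zoom=11):
    """Return a partial config whose map view is centred on the mean coordinate.

    Assumption: kepler.gl applies defaults for sections that are left out.
    """
    return {
        "version": "v1",
        "config": {
            "mapState": {
                "latitude": fmean(latitudes),
                "longitude": fmean(longitudes),
                "zoom": zoom,
            }
        },
    }

def build_map(datasets, config, height=700):
    """Create the widget with data and config supplied up front.

    `datasets` maps a dataset name to its data. Requires keplergl.
    """
    from keplergl import KeplerGl  # imported lazily; only needed when a map is built
    return KeplerGl(height=height, data=datasets, config=config)

# Coordinates of the first two pickups in the sample rows of the taxi data.
config = center_config([40.750111, 40.739811], [-73.993896, -73.976425])
print(config["config"]["mapState"])
```

In the notebook this could be used as `build_map({'New York City Taxi Trips': df}, center_config(df['pickup_latitude'], df['pickup_longitude']))`. A config read back from an existing map through its `config` attribute is likely a more reliable starting point than one written by hand.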

Adding the dataset: data is added to a kepler.gl map with the add_data() method of the map object. This method takes two arguments: data, which can be a CSV string, GeoJSON, a pandas DataFrame, or a GeoPandas GeoDataFrame, and name, the label under which the dataset appears in the configuration of the map:

In [10]:
map1.add_data(data=df, name='New York City Taxi Trips')

In [11]:
map1

KeplerGl(config={'version': 'v1', 'config': {'visState': {'filters': [], 'layers': [], 'interactionConfig': {'…
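Because every dataset is registered under its own name, several datasets can be added to the same map. The sketch below illustrates that calling pattern with a hypothetical `add_datasets` helper. `RecordingMap` is a stand-in that only records the calls so the example runs without keplergl; in the notebook, `map1` would take its place.

```python
def add_datasets(map_obj, datasets):
    """Call add_data once per (name, data) pair and return the names added."""
    added = []
    for name, data in datasets.items():
        map_obj.add_data(data=data, name=name)
        added.append(name)
    return added

class RecordingMap:
    """Hypothetical stand-in for KeplerGl that only records add_data calls."""

    def __init__(self):
        self.data = {}

    def add_data(self, data, name="unnamed"):
        self.data[name] = data

demo_map = RecordingMap()
names = add_datasets(demo_map, {
    # CSV strings built from the first sample row of the taxi data.
    "pickups": "pickup_longitude,pickup_latitude\n-73.993896,40.750111\n",
    "dropoffs": "dropoff_longitude,dropoff_latitude\n-73.974785,40.750618\n",
})
print(names)
```

Keeping pickups and dropoffs as separate named datasets is a design choice rather than a requirement. A single DataFrame holding both coordinate pairs, as in the cell above, works equally well.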