Disable output autoscrolling for all cells in this notebook using the `%%javascript` cell magic. <br>
REF: https://stackoverflow.com/questions/36757301/disable-ipython-notebook-autoscrolling  <br>
REF: https://github.com/ipython/ipython/issues/2172  <br>
NOTICE:  The cell has to be written without comments.

In [1]:
%%javascript 
IPython.OutputArea.auto_scroll_threshold = 9999;

<IPython.core.display.Javascript object>

In [13]:
# GMAPS API 
# https://jupyter-gmaps.readthedocs.io/en/latest/api.html
import gmaps
import gmaps.datasets
gmaps.configure(api_key="")

## Loading the Incident Locations (Latitude and Longitude)

Each incident in our database also has its location conveniently encoded as a point.

## Removing any Existing Spark Session

This stops any existing Spark session so that we start from a clean slate.

In [2]:
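try:
    # Stop the session left over from a previous run, if there is one.
    session.stop()
except NameError:
    # `session` is not defined yet, so there is nothing to stop.
    pass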

## Creating a Spark Session

First we import Spark and the objects needed to use it; in particular, we need to create a SparkSession. Before Spark 2.0, applications worked with three separate entry points (SparkContext, SQLContext and HiveContext). Since Spark 2.0, SparkSession unifies them, and most of the complexity is hidden behind the Spark DataFrame and Spark SQL APIs.

Next, create the session and configure the amount of memory available to the Spark driver and the number of cores used to process Spark tasks in parallel.

In [3]:
from pyspark.sql import SparkSession

number_cores = 4
memory_gb = 4

# Run Spark locally with `number_cores` worker threads and give the
# driver `memory_gb` gigabytes of memory.
session = (
    SparkSession.builder
        .master('local[{}]'.format(number_cores))
        .appName('mapping-crime-in-san-francisco')
        .config('spark.driver.memory', '{}g'.format(memory_gb))
        .getOrCreate()
)
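
The two settings above are plain strings: the master URL `local[N]` runs Spark locally with N worker threads, and `spark.driver.memory` takes a size such as `4g`. A minimal sketch that builds both in one place; `local_session_options` is a hypothetical helper, not part of PySpark:

```python
def local_session_options(number_cores, memory_gb):
    """Return the Spark options for a local session as a plain dict."""
    if number_cores < 1 or memory_gb < 1:
        raise ValueError("number_cores and memory_gb must be positive")
    return {
        "spark.master": "local[{}]".format(number_cores),
        "spark.driver.memory": "{}g".format(memory_gb),
    }


options = local_session_options(number_cores=4, memory_gb=4)
for key, value in options.items():
    print(key, value)
```

Each pair could then be applied with `builder.config(key, value)` before calling `getOrCreate()`.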

The next cell displays the session parameters. If the session creation step above did not succeed, an error is raised at this point.

In [6]:
session.sparkContext.getConf().getAll()

[('spark.master', 'local[4]'),
 ('spark.driver.memory', '4g'),
 ('spark.driver.port', '39035'),
 ('spark.rdd.compress', 'True'),
 ('spark.app.id', 'local-1573192865265'),
 ('spark.serializer.objectStreamReset', '100'),
 ('spark.executor.id', 'driver'),
 ('spark.submit.deployMode', 'client'),
 ('spark.driver.host', '192.168.1.181'),
 ('spark.app.name', 'mapping-crime-in-san-francisco'),
 ('spark.ui.showConsoleProgress', 'true')]
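
Since `getAll()` returns a list of `(key, value)` pairs, converting it to a dictionary makes individual settings easy to check. A minimal sketch using a few pairs copied from the output above (in the notebook the list would come from `session.sparkContext.getConf().getAll()`); the `expected` dictionary is an assumption about which settings matter here:

```python
conf_pairs = [
    ("spark.master", "local[4]"),
    ("spark.driver.memory", "4g"),
    ("spark.app.name", "mapping-crime-in-san-francisco"),
]
conf = dict(conf_pairs)

# Settings we asked for when building the session (assumed to be the ones of interest).
expected = {"spark.master": "local[4]", "spark.driver.memory": "4g"}

# Map each setting that differs to (expected value, actual value).
mismatches = {k: (v, conf.get(k)) for k, v in expected.items() if conf.get(k) != v}
print(mismatches)  # empty when every expected setting is in effect
```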

## Familiarizing with the Data

As a first step in getting familiar with the data, we load the CSV file, select the two location columns, print their schema, and count the rows.

In [8]:
data_path = "data/Police_Department_Incident_Reports_2018_to_Present.csv"
crime_reports = session.read.csv(data_path, header=True)

incidents = crime_reports.select(["Latitude","Longitude"])

incidents.printSchema()
incidents.count()

root
 |-- Latitude: string (nullable = true)
 |-- Longitude: string (nullable = true)



278806

In [26]:
incidents_pd = incidents.toPandas()
incidents_pd = incidents_pd.mask(incidents_pd.eq('None')).dropna()
incidents_pd[0:5]

| | Latitude | Longitude |
|---|---|---|
| 2 | 37.77507596005672 | -122.51129492624534 |
| 3 | 37.76877049785351 | -122.427462058806 |
| 4 | 37.781176766186576 | -122.5030864538133 |
| 5 | 37.80696290988273 | -122.410497554147 |
| 6 | 37.78240595374784 | -122.47630655314094 |
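
Because the CSV was read without a schema, both columns hold strings rather than numbers. A minimal sketch of converting them to floats with `pandas.to_numeric`, using a small hand-made frame in place of `incidents_pd` (the sample rows, including the missing one, are illustrative):

```python
import pandas as pd

sample = pd.DataFrame({
    "Latitude": ["37.77507596005672", None, "37.76877049785351"],
    "Longitude": ["-122.51129492624534", None, "-122.427462058806"],
})

# errors="coerce" turns anything unparseable into NaN, which dropna() then removes.
numeric = sample.apply(pd.to_numeric, errors="coerce").dropna()

print(numeric.dtypes)
print(len(numeric))
```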


In [28]:
incidents_pd.values.tolist()

[['37.77507596005672', '-122.51129492624534'],
 ['37.76877049785351', '-122.42746205880601'],
 ['37.781176766186576', '-122.5030864538133'],
 ['37.80696290988273', '-122.410497554147'],
 ['37.78240595374784', '-122.47630655314094'],
 ['37.7310799713596', '-122.48354489158451'],
 ['37.709210351433505', '-122.41196819816332'],
 ['37.77527205930737', '-122.4159081801724'],
 ['37.76257883049033', '-122.42166247826907'],
 ['37.73971000844432', '-122.4119916391387'],
 ['37.72573637177566', '-122.38137737558108'],
 ['37.78901633734544', '-122.41024243940024'],
 ['37.79792540700026', '-122.42891397690136'],
 ['37.714694511853885', '-122.4852289359513'],
 ['37.78872115135928', '-122.4020657306611'],
 ['37.806780111468534', '-122.4195772441978'],
 ['37.775558160382', '-122.4186585835447'],
 ['37.7829399329451', '-122.46442502609422'],
 ['37.78325923532804', '-122.40270815508224'],
 ['37.79309708139333', '-122.39650448774395'],
 ['37.817823897791946', '-122.3712458991155'],
 ['37.7684733901082', 
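
The nested list above still holds strings, while the `gmaps` sample dataset used later in this notebook consists of `(latitude, longitude)` tuples of floats. A minimal sketch of the conversion in plain Python; `to_float_pairs` is a hypothetical helper, and the input rows are copied from the output above plus one made-up invalid row:

```python
def to_float_pairs(rows):
    """Convert [lat_str, lon_str] rows to (lat, lon) float tuples, skipping bad rows."""
    pairs = []
    for lat, lon in rows:
        try:
            pairs.append((float(lat), float(lon)))
        except (TypeError, ValueError):
            # None or non-numeric text such as 'None' is dropped.
            continue
    return pairs


rows = [
    ["37.77507596005672", "-122.51129492624534"],
    ["37.76877049785351", "-122.42746205880601"],
    [None, "None"],  # made-up row standing in for a missing location
]
incident_locations = to_float_pairs(rows)
print(incident_locations)
```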

In [20]:
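# The column names are capitalized ('Latitude', 'Longitude'); the lowercase
# keys used previously raised KeyError: 'latitude'.
# The values are strings, so convert them to floats before building the arrays.
latitude_np = incidents_pd['Latitude'].astype(float).to_numpy()
longitude_np = incidents_pd['Longitude'].astype(float).to_numpy()
#locations_pd = incidents_pd[['Latitude', 'Longitude']]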

In [3]:
df = gmaps.datasets.load_dataset_as_df('starbucks_kfc_uk')

starbucks_df = df[df['chain_name'] == 'starbucks']
starbucks_df = starbucks_df[['latitude', 'longitude']]

starbucks_layer = gmaps.symbol_layer(
    starbucks_df, fill_color="green", stroke_color="green", scale=2
)

In [4]:
# To control how the widget is laid out, a dictionary is passed to
# gmaps.figure as the optional `layout` argument (an attempt to remove the scrollbar).

fig = gmaps.figure(layout={
        'height': '800px',
        'display': 'flex',
        'flex_flow': 'column',
        'align_items': 'stretch',
        'border': '1px solid black'
})

fig.add_layer(starbucks_layer)
fig

Figure(layout=FigureLayout(align_items='stretch', border='1px solid black', display='flex', flex_flow='column'…

In [15]:
# load a list of (latitude, longitude) tuples
locations = gmaps.datasets.load_dataset("taxi_rides")

locations[0:5]

#fig = gmaps.figure()
#fig.add_layer(gmaps.heatmap_layer(locations))
#fig

[(37.782551, -122.445368),
 (37.782745, -122.444586),
 (37.782842, -122.443688),
 (37.782919, -122.442815),
 (37.782992, -122.442112)]