Disabling output autoscrolling for all cells in this notebook using the `%%javascript` cell magic <br>
REF: https://stackoverflow.com/questions/36757301/disable-ipython-notebook-autoscrolling  <br>
REF: https://github.com/ipython/ipython/issues/2172  <br>
NOTICE: The cell below must be written without comments.

In [1]:
%%javascript 
IPython.OutputArea.auto_scroll_threshold = 9999;

<IPython.core.display.Javascript object>

In [2]:
# GMAPS API 
# https://jupyter-gmaps.readthedocs.io/en/latest/api.html
import gmaps
import gmaps.datasets
gmaps.configure(api_key="<-- REPLACE GOOGLE MAPS API KEY HERE -->")

### Loading the Incident Locations (Latitude and Longitude)

In our incident dataset, each location is conveniently encoded as a point (a latitude/longitude pair).

### Removing any Existing Spark Session

This cell stops any existing Spark session so that we start with a clean slate.

In [None]:
try:
    session.stop()
except NameError:
    # No session has been created yet in this kernel, so there is nothing to stop.
    pass

### Creating a Spark Session

Create the Spark session and configure the amount of memory used by the Spark driver and the number of cores used for parallel processing of Spark tasks.

In [3]:
from pyspark import SparkConf
from pyspark.sql import SparkSession

memory_gb = 4
number_cores = 4

# Run Spark locally with `number_cores` worker threads and give the driver `memory_gb` GB of memory.
conf = (
    SparkConf()
        .setMaster('local[{}]'.format(number_cores))
        .setAppName('mapping-crime-in-san-francisco')
        .set('spark.driver.memory', '{}g'.format(memory_gb))
)

# Build the session from the configuration above instead of repeating each setting on the builder.
session = SparkSession.builder.config(conf=conf).getOrCreate()
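
The master URL `local[N]` tells Spark to run locally with N worker threads (`local[*]` uses one per logical core). As a minimal sketch, assuming you would rather cap the core count at what the machine actually has instead of hard-coding it, the two settings can be derived like this; the helpers `pick_cores` and `spark_settings` are hypothetical:

```python
import os

def pick_cores(max_cores=4):
    """Use at most `max_cores` logical cores, but never more than the machine reports."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(max_cores, available))

def spark_settings(number_cores, memory_gb):
    """Build the master URL and driver-memory setting as plain key/value strings."""
    return {
        "spark.master": "local[{}]".format(number_cores),
        "spark.driver.memory": "{}g".format(memory_gb),
    }

settings = spark_settings(pick_cores(), 4)
print(settings)
```

The resulting pairs could then be applied with `SparkConf().setAll(settings.items())`.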

Display the session parameters. If the session-creation step above failed, an error will be raised at this point.

In [4]:
session.sparkContext.getConf().getAll()

[('spark.app.id', 'local-1573217702660'),
 ('spark.master', 'local[4]'),
 ('spark.driver.memory', '4g'),
 ('spark.rdd.compress', 'True'),
 ('spark.serializer.objectStreamReset', '100'),
 ('spark.executor.id', 'driver'),
 ('spark.submit.deployMode', 'client'),
 ('spark.app.name', 'mapping-crime-in-san-francisco'),
 ('spark.ui.showConsoleProgress', 'true'),
 ('spark.driver.port', '54754'),
 ('spark.driver.host', '172.16.180.180')]
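
`getAll()` returns a list of `(key, value)` tuples, so checking an individual setting is easier after converting the list to a dictionary. The sketch below uses a few pairs copied from the output above in place of a live session:

```python
# A few (key, value) pairs copied from the getAll() output above.
conf_pairs = [
    ('spark.master', 'local[4]'),
    ('spark.driver.memory', '4g'),
    ('spark.app.name', 'mapping-crime-in-san-francisco'),
]

conf_dict = dict(conf_pairs)

# Confirm that the requested resources were actually applied.
assert conf_dict['spark.master'] == 'local[4]'
assert conf_dict['spark.driver.memory'] == '4g'

# Keys that were never set explicitly are simply absent, so use .get() with a fallback.
print(conf_dict.get('spark.executor.memory', 'not set'))
```

With a live session, `dict(session.sparkContext.getConf().getAll())` gives the same kind of dictionary.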

### Familiarizing with the Data

As a first step to get familiar with the data, we load the incident reports, keep only the latitude and longitude columns, print their schema, and count the rows.

In [5]:
data_path = "data/Police_Department_Incident_Reports_2018_to_Present.csv"
crime_reports = session.read.csv(data_path, header=True)

incidents = crime_reports.select(["Latitude","Longitude"])

incidents.printSchema()
incidents.count()

root
 |-- Latitude: string (nullable = true)
 |-- Longitude: string (nullable = true)



278806

In [6]:
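incidents_pd = incidents.toPandas()
# Treat the literal string 'None' as a missing value, then drop every row with a missing coordinate.
incidents_pd = incidents_pd.mask(incidents_pd.eq('None')).dropna()
incidents_pd[0:5]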

(index),Latitude,Longitude
2,37.77507596005672,-122.51129492624534
3,37.76877049785351,-122.427462058806
4,37.781176766186576,-122.5030864538133
5,37.80696290988273,-122.410497554147
6,37.78240595374784,-122.47630655314094
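
The `mask(...eq('None')).dropna()` step turns the string `'None'`, which this CSV apparently uses for a missing coordinate, into a real missing value and then drops every row that has one (rows with actual nulls are dropped as well). The same filtering in plain Python, as a sketch that assumes rows arrive as `(latitude, longitude)` string pairs; the function `drop_missing` is hypothetical and the rows are made up for illustration:

```python
# None covers the nulls that dropna() removes; 'None' covers the masked strings.
MISSING = {None, 'None'}

def drop_missing(rows):
    """Keep only rows where neither latitude nor longitude is missing."""
    return [
        (lat, lon)
        for lat, lon in rows
        if lat not in MISSING and lon not in MISSING
    ]

rows = [
    ('None', 'None'),
    ('37.77507596005672', '-122.51129492624534'),
    (None, '-122.4'),
]
print(drop_missing(rows))
```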


### Convert Data to Floats

The coordinates were read as strings, so they have to be converted to floats before they can be plotted.

In [7]:
incidents_pd['Latitude'] = incidents_pd['Latitude'].astype(float)
incidents_pd['Longitude'] = incidents_pd['Longitude'].astype(float)

incidents_pd.dtypes

Latitude     float64
Longitude    float64
dtype: object
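
`astype(float)` raises a `ValueError` on any string it cannot parse (`pd.to_numeric(..., errors='coerce')` turns such strings into `NaN` instead), and it does not check that the numbers are plausible coordinates. A minimal sketch of such a check in plain Python, assuming only the standard valid ranges (latitude in [-90, 90], longitude in [-180, 180]); the helper `parse_point` is hypothetical:

```python
import math

def parse_point(lat_str, lon_str):
    """Convert a (latitude, longitude) string pair to floats, or return None if invalid."""
    try:
        lat, lon = float(lat_str), float(lon_str)
    except (TypeError, ValueError):
        return None  # not a number, e.g. 'None', '' or None
    if math.isnan(lat) or math.isnan(lon):
        return None  # float('nan') parses without error, so reject it explicitly
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        return None  # outside the valid coordinate ranges
    return (lat, lon)

print(parse_point('37.77507596005672', '-122.51129492624534'))
print(parse_point('None', '-122.4'))
print(parse_point('-122.4', '37.7'))  # latitude and longitude swapped
```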

### Converting the Locations in Pandas to a List of Points

In [8]:
incidents_locations = incidents_pd.values.tolist()
incidents_locations[0:5]

[[37.77507596005672, -122.51129492624534],
 [37.76877049785351, -122.42746205880601],
 [37.781176766186576, -122.5030864538133],
 [37.80696290988273, -122.410497554147],
 [37.78240595374784, -122.47630655314094]]
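
A list of `[latitude, longitude]` pairs is the shape passed to `gmaps.heatmap_layer` below. The same list can also be used to center the map. This is a sketch, assuming the mean of the points is a reasonable center (fine for a single city, though averaging longitudes breaks down near the ±180° meridian); the result could be passed as the `center` argument of `gmaps.figure`, and the helper `map_center` is hypothetical:

```python
from statistics import fmean

def map_center(points):
    """Return the mean (latitude, longitude) of a list of [lat, lon] pairs."""
    if not points:
        raise ValueError("need at least one point")
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (fmean(lats), fmean(lons))

# The first five incident locations from the output above.
sample = [
    [37.77507596005672, -122.51129492624534],
    [37.76877049785351, -122.42746205880601],
    [37.781176766186576, -122.5030864538133],
    [37.80696290988273, -122.410497554147],
    [37.78240595374784, -122.47630655314094],
]
print(map_center(sample))
```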

In [9]:
# To control the widget layout, a dictionary is passed to gmaps.figure
# as the optional `layout` argument (an attempt to remove the scrollbar).
fig = gmaps.figure(layout={
        'height': '600px',
        'display': 'flex',
        'flex_flow': 'column',
        'align_items': 'stretch',
        'border': '1px solid black'
})

# Adding a heatmap_layer with the locations of incidents
fig.add_layer(gmaps.heatmap_layer(incidents_locations))
fig

Figure(layout=FigureLayout(align_items='stretch', border='1px solid black', display='flex', flex_flow='column'…
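
With hundreds of thousands of incidents, every point is sent to the browser individually. `gmaps.heatmap_layer` also accepts a `weights` argument, so one option is to snap nearby points onto a coarse grid and send each grid cell once, weighted by its count. A sketch, assuming that rounding to 3 decimal places (roughly 100 m) is fine enough for a city-wide heatmap; the helper `bin_points` is hypothetical, and the resulting heatmap will look slightly different from the unweighted one:

```python
from collections import Counter

def bin_points(points, decimals=3):
    """Snap points to a grid by rounding and count how many fall in each cell.

    Returns (locations, weights) suitable for a weighted heatmap.
    """
    counts = Counter((round(lat, decimals), round(lon, decimals)) for lat, lon in points)
    locations = list(counts.keys())
    weights = [counts[cell] for cell in locations]
    return locations, weights

points = [
    [37.77507596005672, -122.51129492624534],
    [37.7751, -122.5113],  # hypothetical point in the same grid cell as the first
    [37.80696290988273, -122.410497554147],
]
locations, weights = bin_points(points)
print(locations, weights)
```

The binned data could then replace the unweighted layer via `gmaps.heatmap_layer(locations, weights=weights)`.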