# Mapping SDPD data with Python

### Preparation: installing `folium`

Plotting geographic data in this notebook requires the Python library `folium`. To install it, run the following command in a terminal:
```
pip install --upgrade --user folium
```
Once the installation finishes, you can import the library in your notebooks.

In [1]:
%matplotlib inline
from datascience import *
import folium
import numpy as np
import json
from IPython.core.display import display

### Import the traffic stops data and the collision data

In [2]:
stops_path = 'SDPD/vehicle_stops_2016_datasd.csv'
collisions_path = 'SDPD/pd_collisions_datasd.csv'

In [3]:
stops = Table.read_table(stops_path)
collisions = Table.read_table(collisions_path)

### Counting the number of traffic stops by police service area

In [4]:
stops.show(1)

stop_id,stop_cause,service_area,subject_race,subject_sex,subject_age,timestamp,stop_date,stop_time,sd_resident,arrested,searched,obtained_consent,contraband_found,property_seized
1308198,Equipment Violation,530,W,M,28,2016-01-01 00:06:00,2016-01-01,0:06,Y,N,N,N,N,N


We need to clean the `service_area` field before we can join it with our map:
1. Some entries in the field contain non-digit characters.
2. Because of those entries, the whole column is read as strings, so even the numeric values are of type `str`.
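As a rough illustration of the cleaning step, the sketch below applies the same idea to a handful of values in plain Python. The non-digit entries in the actual file are not shown here, so `'Unknown'` and the empty string are assumptions made up for the example. `str.isdigit` keeps only strings made entirely of digits, after which converting to `int` is safe.

```python
# Hypothetical raw values; only '530' is taken from the sample row above.
raw_service_areas = ['530', '110', 'Unknown', '', '240']

# str.isdigit() is True only for non-empty strings made entirely of digits.
digit_only = [value for value in raw_service_areas if value.isdigit()]

# Once the non-digit entries are gone, every remaining value converts cleanly.
cleaned = [int(value) for value in digit_only]

print(cleaned)
```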

In [5]:
type(stops.column('service_area').item(0))

str

In [6]:
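def isdigit(x):
    # True only when the value consists entirely of digits
    return x.isdigit()

# Keep the rows whose service_area is numeric, then convert the column to int
stops_cleaned = stops.where('service_area', isdigit)
stops_cleaned = stops_cleaned.with_column(
    'service_area',
    stops_cleaned.column('service_area').astype(int)
)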

In [7]:
stop_counts = stops_cleaned.group('service_area')
stop_counts

service_area,count
110,7273
120,8177
130,23
230,5602
240,6433
310,8074
320,4722
430,4018
440,4181
510,4937


### Load and clean the map

In [8]:
geo_path = 'SDPD/pd_beats_datasd.geojson'

Now we need to load the geographical data and filter out the service areas that aren't present in our data.
* The join key to the geojson for the stops data is `serv`
* The join key to the geojson for the collisions data is `beat`
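
To make the join keys concrete, here is a minimal sketch of how a dotted path such as `feature.properties.serv` can be read as a lookup into each GeoJSON feature, and how features without matching data can be dropped. The tiny feature collection, its property values, and the `lookup` helper are all hypothetical; this illustrates the idea and is not how folium implements it internally.

```python
# A tiny, made-up feature collection carrying the two join keys named above.
toy_geojson = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'properties': {'serv': 110, 'beat': 111}, 'geometry': None},
        {'type': 'Feature', 'properties': {'serv': 120, 'beat': 121}, 'geometry': None},
        {'type': 'Feature', 'properties': {'serv': 999, 'beat': 998}, 'geometry': None},
    ],
}

def lookup(feature, dotted_path):
    """Hypothetical helper: follow a path like 'feature.properties.serv'."""
    value = feature
    for key in dotted_path.split('.')[1:]:  # skip the leading 'feature'
        value = value[key]
    return value

# Service areas that appear in our (made-up) counts table.
areas_with_data = {110, 120}

# Keep only the regions we have data for.
kept = [f for f in toy_geojson['features']
        if lookup(f, 'feature.properties.serv') in areas_with_data]

print([lookup(f, 'feature.properties.beat') for f in kept])
```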

In [9]:
# Use a context manager so the file is closed after loading
with open(geo_path) as f:
    gj = json.load(f)

An example region encoded in GeoJSON format (each coordinate pair is in longitude/latitude order):

In [10]:
gj['features'][0]

{'type': 'Feature',
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-117.087138, 32.583822],
    [-117.08695, 32.583007],
    [-117.08693249, 32.58293112],
    [-117.08686, 32.582617],
    [-117.08648, 32.581573],
    [-117.086201, 32.58091],
    [-117.086047, 32.580678],
    [-117.08595258, 32.58053575],
    [-117.08567562, 32.58011851],
    [-117.085585, 32.579982],
    [-117.085417, 32.579739],
    [-117.084812, 32.579065],
    [-117.08475964, 32.57900662],
    [-117.084725, 32.578968],
    [-117.083637, 32.577854],
    [-117.08227701, 32.57648368],
    [-117.08224938, 32.57645654],
    [-117.082146, 32.576355],
    [-117.081926, 32.576137],
    [-117.081774, 32.575953],
    [-117.0811, 32.575136],
    [-117.08091818, 32.57491522],
    [-117.08091801, 32.574915],
    [-117.080274, 32.574046],
    [-117.08014418, 32.57387261],
    [-117.079544, 32.573071],
    [-117.07952, 32.573039],
    [-117.07899506, 32.57233912],
    [-117.07899498, 32.57233901],
    [-117.07899282, 32.5723
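
GeoJSON stores each position as `[longitude, latitude]`, while `folium.Map` takes its `location` as `(latitude, longitude)`. The sketch below uses the first three points from the output above and swaps the order. The helper name `to_lat_lon` is made up for illustration.

```python
# First three positions of the polygon ring shown above, as [longitude, latitude].
ring = [
    [-117.087138, 32.583822],
    [-117.08695, 32.583007],
    [-117.08693249, 32.58293112],
]

def to_lat_lon(position):
    """Hypothetical helper: reorder a GeoJSON position to (latitude, longitude)."""
    longitude, latitude = position
    return (latitude, longitude)

lat_lon_points = [to_lat_lon(p) for p in ring]
print(lat_lon_points[0])
```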

In [11]:
# Keep only the regions whose service area appears in the stop counts
gj['features'] = [
    f for f in gj['features']
    if f['properties']['serv'] in stop_counts.column('service_area')
]

### Create a map object, overlay the counts, and plot it!

In [12]:
stops_map = folium.Map(location=(32.7157, -117.1611), zoom_start=10)

In [13]:
# Recent folium releases no longer provide Map.choropleth; use the Choropleth class
folium.Choropleth(
    geo_data=gj,
    data=stop_counts.to_df(),   # needs to be a pandas dataframe
    columns=['service_area', 'count'],
    fill_color='YlOrRd',
    fill_opacity=0.5,
    line_opacity=0.2,
    key_on='feature.properties.serv',
).add_to(stops_map)



You can view HTML right inside a Jupyter notebook, so to view the map we can call the `display` function that we imported from `IPython.core.display` earlier.

In [None]:
display(stops_map)

You can also save the map to an `html` file. To view the file, visit the Jupyter server page, select the file, and click `view` in the menu at the top.

In [14]:
stops_map.save('stops.html')

## Mapping the collisions data

The collisions data is joined to the map using `police_beat`, so we first need to check whether that column needs cleaning. Is it of `int` type?

In [15]:
collisions.column('police_beat')

array([113, 524, 437, ..., 246, 613, 821])

In [16]:
collision_counts = collisions.group('police_beat')

In [17]:
collision_counts.sort('count', descending=True)

police_beat,count
122,1347
242,839
813,811
313,809
124,732
627,636
115,623
521,617
315,610
611,573


In [18]:
collision_map = folium.Map(location=(32.7157, -117.1611), zoom_start=10)

# Recent folium releases no longer provide Map.choropleth; use the Choropleth class
folium.Choropleth(
    geo_data=gj,
    data=collision_counts.to_df(),   # needs to be a pandas dataframe
    columns=['police_beat', 'count'],
    fill_color='YlGn',
    fill_opacity=0.5,
    line_opacity=0.2,
    bins=[0, 300, 600, 900, 1200, 1500],   # called threshold_scale in older folium
    key_on='feature.properties.beat',
).add_to(collision_map)

In [23]:
display(collision_map)

# Copy this notebook and plot your own statistics by geography
* Percentage of stops that result in a search.
* Average age of drivers.
* Percentage of traffic stops that occur at night.
* Number of Hispanic/Black/White/Asian drivers pulled over.
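
As a possible starting point for the first statistic, the sketch below computes the percentage of stops that led to a search in each service area. It uses plain Python on a few invented rows and assumes that `searched` holds `'Y'` or `'N'`, as in the sample row shown earlier. The variable names are hypothetical. The same grouping idea should carry over to a `Table` before the result is passed to the map.

```python
from collections import defaultdict

# Invented (service_area, searched) pairs standing in for the cleaned stops table.
rows = [(110, 'N'), (110, 'Y'), (110, 'N'), (110, 'N'), (120, 'Y'), (120, 'N')]

totals = defaultdict(int)    # number of stops per service area
searches = defaultdict(int)  # number of stops with a search per service area
for area, searched in rows:
    totals[area] += 1
    if searched == 'Y':
        searches[area] += 1

# Percentage of stops in each area that resulted in a search.
percent_searched = {area: 100 * searches[area] / totals[area] for area in totals}
print(percent_searched)
```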