# Mapping SDPD data with Python

### Preparation: installing `folium`

Drawing maps requires the Python library `folium`. To install it, run the following command in a terminal:
```
pip install --upgrade --user folium
```
`pip install` runs without asking for confirmation. Once it finishes, you can import the library in your notebooks.

In [None]:
%matplotlib inline
from datascience import *
import folium
import numpy as np
import json

### Import the traffic stops data and the collision data

In [None]:
stops_path = '../week-2/vehicle_stops_2016_datasd.csv'
collisions_path = '../week-3/pd_collisions_datasd.csv'

In [None]:
stops = Table.read_table(stops_path)
collisions = Table.read_table(collisions_path)

### Counting the number of traffic stops by police service area

In [None]:
stops.show(1)

Before we can join the counts to the map, we need to clean the `service_area` column:
1. Some entries contain non-digit characters.
2. Because of those entries, even the numeric values are stored as strings.

In [None]:
type(stops.column('service_area').item(0))

In [None]:
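def isdigit(x):
    # True only for strings made entirely of digits; missing values (NaN)
    # are floats, not strings, and would otherwise raise AttributeError
    return isinstance(x, str) and x.isdigit()

# keep rows whose service area is a digit string, then convert it to int
stops_cleaned = stops.where('service_area', isdigit)
stops_cleaned = stops_cleaned.with_column(
    'service_area',
    stops_cleaned.column('service_area').astype(int)
)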
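The cleaning rule can be sketched in plain Python to show which raw values survive. This is a minimal sketch on hypothetical sample values (not taken from the SDPD file); it assumes blank cells arrive as `float('nan')`, which is how pandas-backed readers typically represent them.

```python
# Hypothetical raw service_area values: digit strings, a non-digit label,
# and a missing entry represented as NaN.
raw_values = ['520', '110', 'Unknown', '520', float('nan'), '830']

def is_digit_string(x):
    """True only for strings made entirely of digits; NaN and other
    non-string values are rejected instead of raising AttributeError."""
    return isinstance(x, str) and x.isdigit()

# Keep the digit strings, then convert them to integers for the join.
cleaned = [int(x) for x in raw_values if is_digit_string(x)]
print(cleaned)  # [520, 110, 520, 830]
```

Rows such as `'Unknown'` are dropped rather than guessed at, so the resulting counts cover only stops with a usable service area.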

In [None]:
stop_counts = stops_cleaned.group('service_area')
stop_counts
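`group` with a single column produces one row per distinct value plus a `count` column. A plain-Python sketch of the same tally, using `collections.Counter` on hypothetical cleaned service areas (the name `area_counts` is chosen so it does not overwrite the notebook's `stop_counts` table):

```python
from collections import Counter

# Hypothetical cleaned service areas, one entry per traffic stop.
service_areas = [520, 110, 520, 830, 520, 110]

# Count stops per service area, mirroring the 'count' column of Table.group.
area_counts = Counter(service_areas)

# Print in service-area order, like the grouped table.
for area, count in sorted(area_counts.items()):
    print(area, count)
# 110 2
# 520 3
# 830 1
```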

### Load and clean the map

In [None]:
geo_path = 'pd_beats_datasd.geojson'

Now we load the geographic data and filter out the service areas that are not present in our data. The GeoJSON property that serves as the join key is:
* `serv` for the stops data (service areas)
* `beat` for the collisions data (police beats)
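Conceptually, the join looks up one feature property in a table of counts. A minimal plain-Python sketch of that lookup, using two hypothetical features (geometry omitted) with made-up `serv` and `beat` values and made-up counts:

```python
# Hypothetical, trimmed-down features: only the properties used for joining.
features = [
    {"type": "Feature", "properties": {"serv": 110, "beat": 111}, "geometry": None},
    {"type": "Feature", "properties": {"serv": 520, "beat": 521}, "geometry": None},
]

# Hypothetical counts keyed by service area and by beat.
stops_by_area = {110: 2, 520: 3}
collisions_by_beat = {111: 40}

def value_for(feature, key, counts):
    """Look up a feature's count via one of its properties;
    returns None when the table has no row for that key."""
    return counts.get(feature["properties"][key])

for f in features:
    print(f["properties"]["serv"],
          value_for(f, "serv", stops_by_area),
          value_for(f, "beat", collisions_by_beat))
# 110 2 40
# 520 3 None
```

Features with no matching row still get drawn; recent folium versions shade them with the choropleth's `nan_fill_color`, which is one reason to drop unused regions before plotting.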

In [None]:
with open(geo_path) as f:
    gj = json.load(f)

An example region encoded in GeoJSON format (each coordinate pair is listed as longitude, latitude):

In [None]:
gj['features'][0]
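Because GeoJSON stores positions as `[longitude, latitude]` while `folium.Map(location=...)` takes `(latitude, longitude)`, it is easy to swap the two by accident. A minimal sketch that computes a region's bounding-box center in folium's order, using a hypothetical square polygon; the helper `bbox_center` is illustrative and handles only `Polygon` and `MultiPolygon` geometries.

```python
# A hypothetical polygon feature; each position is [longitude, latitude].
feature = {
    "type": "Feature",
    "properties": {"serv": 110, "beat": 111},
    "geometry": {
        "type": "Polygon",
        "coordinates": [[
            [-117.20, 32.70], [-117.10, 32.70],
            [-117.10, 32.80], [-117.20, 32.80], [-117.20, 32.70],
        ]],
    },
}

def bbox_center(geometry):
    """Return the bounding-box center as (lat, lon), the order folium.Map
    expects for `location`. Supports Polygon and MultiPolygon only."""
    if geometry["type"] == "Polygon":
        polygons = [geometry["coordinates"]]
    elif geometry["type"] == "MultiPolygon":
        polygons = geometry["coordinates"]
    else:
        raise ValueError("unsupported geometry type: " + geometry["type"])
    lons = [pt[0] for poly in polygons for ring in poly for pt in ring]
    lats = [pt[1] for poly in polygons for ring in poly for pt in ring]
    return ((min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2)

print(bbox_center(feature["geometry"]))  # roughly (32.75, -117.15)
```

Extended over every feature in `gj`, the same idea could supply the map's `location` instead of a hard-coded center.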

In [None]:
served_areas = set(stop_counts.column('service_area'))
gj['features'] = [
    f for f in gj['features']
    if f['properties']['serv'] in served_areas
]
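One pitfall with this filter: `in` compares values exactly, so if the GeoJSON happened to store `serv` as a string (for example `"110"`) while the cleaned counts are integers, every feature would be dropped silently. Whether that applies depends on the file; inspecting `gj['features'][0]` shows the actual type. A minimal sketch, with a hypothetical helper `filter_features` and made-up features, that normalizes both sides to `int` before comparing:

```python
def filter_features(features, key, allowed):
    """Keep features whose `key` property, coerced to int, is in `allowed`.
    Features whose property is missing or not integer-like are dropped."""
    allowed = {int(a) for a in allowed}
    kept = []
    for f in features:
        value = f["properties"].get(key)
        try:
            value = int(value)
        except (TypeError, ValueError):
            continue  # missing or non-numeric property: skip the feature
        if value in allowed:
            kept.append(f)
    return kept

# Hypothetical features: keys stored as a string, an int, and not at all.
features = [
    {"properties": {"serv": "110"}},
    {"properties": {"serv": 520}},
    {"properties": {"serv": 930}},
    {"properties": {}},
]
kept = filter_features(features, "serv", [110, 520])
print([f["properties"]["serv"] for f in kept])  # ['110', 520]
```

A type mismatch would also break the `key_on` join itself, so in that case the property values would need converting as well, not just the filter.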

### Create a map object, overlay the counts, and plot it!

In [None]:
stops_map = folium.Map(location=(32.7157, -117.1611), zoom_start=10)

In [None]:
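# build a choropleth layer from the counts and add it to the map
folium.Choropleth(
    geo_data=gj,
    data=stop_counts.to_df(),   # needs to be a pandas DataFrame
    columns=['service_area', 'count'],
    key_on='feature.properties.serv',
    fill_color='YlOrRd',
    fill_opacity=0.5,
    line_opacity=0.2,
).add_to(stops_map)

stops_map   # display the map inline in the notebook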

Save the map to an HTML file. To view it, open the Jupyter file browser, select the file, and click `view` in the menu at the top.

In [None]:
stops_map.save('stops.html')

## Mapping the collisions data

The collisions data joins to the map on `police_beat`, which corresponds to the `beat` property in the GeoJSON. As with the stops data, check whether the column needs cleaning first: is it of `int` type?

In [None]:
collisions.column('police_beat')
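To answer that question for a whole column rather than by eye, you can tally the type of every value. A minimal sketch on a hypothetical list; applied to `collisions.column('police_beat')`, the elements may be NumPy scalars, so the names reported there could read e.g. `int64` rather than `int`.

```python
from collections import Counter

def type_summary(values):
    """Count how many values have each Python type name."""
    return Counter(type(v).__name__ for v in values)

# Hypothetical police_beat values with one stray string and one blank (NaN).
sample = [111, 521, '521', 111, float('nan')]
print(type_summary(sample))
```

A single type in the summary means the column can be grouped and joined as-is; a mix means it needs the same kind of cleaning as `service_area`.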

In [None]:
collision_counts = collisions.group('police_beat')

In [None]:
collision_counts.sort('count', descending=True)

In [None]:
collision_map = folium.Map(location=(32.7157, -117.1611), zoom_start=10)

folium.Choropleth(
    geo_data=gj,
    data=collision_counts.to_df(),   # needs to be a pandas DataFrame
    columns=['police_beat', 'count'],
    key_on='feature.properties.beat',
    fill_color='YlGn',
    fill_opacity=0.5,
    line_opacity=0.2,
    # color class edges (older folium versions called this threshold_scale)
    bins=[0, 300, 600, 900, 1200, 1500],
).add_to(collision_map)

collision_map   # display the map inline in the notebook
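The bin edges split the counts into color classes: six edges `[0, 300, 600, 900, 1200, 1500]` give five classes. A minimal sketch of that classification with `bisect`, on hypothetical per-beat counts. The half-open `[low, high)` convention with a closed last class is this sketch's choice; folium's exact edge handling may differ. It also flags out-of-range values, since recent folium versions can raise a `ValueError` when data falls outside the provided bins.

```python
from bisect import bisect_right

edges = [0, 300, 600, 900, 1200, 1500]

def bin_index(value, edges):
    """Return the 0-based class for `value`, treating each class as
    [edges[i], edges[i+1]) and the last one as closed on the right.
    Returns None when the value lies outside [edges[0], edges[-1]]."""
    if value < edges[0] or value > edges[-1]:
        return None
    if value == edges[-1]:
        return len(edges) - 2
    return bisect_right(edges, value) - 1

counts = [45, 300, 899, 1500, 1720]   # hypothetical per-beat counts
print([bin_index(c, edges) for c in counts])  # [0, 1, 2, 4, None]
```

Comparing the largest value in `collision_counts.column('count')` against the last edge before plotting shows whether the edges need extending.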

In [None]:
collision_map.save('collisions.html')

## Copy this notebook and plot your own statistics by geography
* Percentage of stops that result in a search.
* Average age of drivers.
* Percentage of traffic stops that occur at night.
* Number of Hispanic/Black/White/Asian drivers pulled over.
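The percentage-style statistics each reduce to a per-area rate of some yes/no condition. A minimal plain-Python sketch with a hypothetical helper `rate_by_area`, assuming row dictionaries with a `searched` field holding `'Y'`/`'N'` and an integer `hour` field; these field names are assumptions, so check the real ones with `stops.labels` before adapting it. The night window (before 6:00 or from 18:00) is likewise an assumption.

```python
from collections import defaultdict

def rate_by_area(rows, predicate):
    """Percentage of rows in each service area for which predicate(row) is True."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for row in rows:
        area = row["service_area"]
        totals[area] += 1
        if predicate(row):
            hits[area] += 1
    return {area: 100 * hits[area] / totals[area] for area in totals}

# Hypothetical cleaned stops; field names are assumptions, not the SDPD schema.
rows = [
    {"service_area": 110, "searched": "Y", "hour": 23},
    {"service_area": 110, "searched": "N", "hour": 14},
    {"service_area": 520, "searched": "N", "hour": 2},
    {"service_area": 520, "searched": "N", "hour": 9},
    {"service_area": 520, "searched": "Y", "hour": 19},
    {"service_area": 520, "searched": "N", "hour": 12},
]

search_pct = rate_by_area(rows, lambda r: r["searched"] == "Y")
night_pct = rate_by_area(rows, lambda r: r["hour"] < 6 or r["hour"] >= 18)
print(search_pct)  # {110: 50.0, 520: 25.0}
print(night_pct)   # {110: 50.0, 520: 50.0}
```

The resulting dictionary could be turned into a two-column table (service area, percentage) and passed to the choropleth in place of the counts.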