# Mapping SDPD data with Python

### Preparation: installing `folium`

Plotting map data requires a Python library called `folium`. To install it, run the following command in a terminal:
```
pip install --upgrade --user folium
```
`pip install` does not ask for confirmation. Once it finishes, you can import the library in your notebooks.

In [None]:
%matplotlib inline
import pandas as pd
import folium
import numpy as np
import json
from IPython.display import display

### Import the traffic stops data and the collision data

In [None]:
stops_path = 'vehicle_stops_2016_datasd.csv'
collisions_path = 'pd_collisions_datasd.csv'

In [None]:
stops = pd.read_csv(stops_path)
collisions = pd.read_csv(collisions_path)

In [None]:
stops.head()

### Counting the number of traffic stops by police service area

In [None]:
stops.iloc[0]

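We need to clean the `service_area` field:
1. Some entries contain non-digit characters.
2. Because of those entries, pandas reads the whole column as object (string) type, so even the numeric entries are strings.

To join the stops with our map, we drop the non-numeric entries and convert the rest to integers.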

In [None]:
type(stops['service_area'].iloc[0])

In [None]:
stops_cleaned = stops.loc[pd.to_numeric(stops['service_area'], errors='coerce').notna()].copy()
stops_cleaned['service_area'] = stops_cleaned['service_area'].astype(int)
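
To see what this cleaning step does, here is a minimal sketch on made-up data (the `toy` frame and its values are illustrative, not taken from the SDPD file). `pd.to_numeric(..., errors='coerce')` turns unparseable entries into `NaN`, which lets us filter them out before casting to `int`.

```python
import pandas as pd

# Hypothetical sample: one entry is not a number.
toy = pd.DataFrame({
    'stop_id': [1, 2, 3, 4],
    'service_area': ['110', 'Unknown', '520', '110'],
})

# Non-numeric strings become NaN instead of raising an error.
numeric = pd.to_numeric(toy['service_area'], errors='coerce')

# Keep only rows that parsed, then convert the strings to integers.
toy_cleaned = toy.loc[numeric.notna()].copy()
toy_cleaned['service_area'] = toy_cleaned['service_area'].astype(int)

print(toy_cleaned)
```

The `.copy()` avoids pandas' `SettingWithCopyWarning` when the column is reassigned on the filtered frame.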

In [None]:
stop_counts = stops_cleaned.groupby('service_area').count()['stop_id'].reset_index()
stop_counts.rename(columns={'stop_id': 'count'}, inplace=True)
stop_counts
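
Note that `count()` tallies the non-null values in each column, so selecting `stop_id` counts only the stops that have an ID. The sketch below, on made-up data, compares it with `size()`, which counts rows regardless of missing values. Whether the two differ on the real file depends on whether `stop_id` has gaps, which is not checked here.

```python
import pandas as pd

# Hypothetical sample with one missing stop_id.
toy = pd.DataFrame({
    'stop_id': [1, None, 3, 4],
    'service_area': [110, 110, 520, 110],
})

# count() skips the missing stop_id in service area 110.
by_count = toy.groupby('service_area').count()['stop_id']

# size() counts every row in each group.
by_size = toy.groupby('service_area').size()

print(by_count)
print(by_size)
```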

### Load and clean the map

In [None]:
geo_path = 'pd_beats_datasd.geojson'

Now we need to load the geographical data and filter out the service areas that aren't present in our data.
* The join key to the geojson for the stops data is `serv`
* The join key to the geojson for the collisions data is `beat`

In [None]:
# Use a context manager so the file is closed after loading.
with open(geo_path) as f:
    gj = json.load(f)

Each region is encoded as a GeoJSON feature (each coordinate is a longitude/latitude pair, in that order). We keep only the features whose `serv` value appears in our stop counts:

In [None]:
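# Build the lookup set once instead of rebuilding a list for every feature.
service_areas = set(stop_counts['service_area'])
gj['features'] = [f for f in gj['features'] if f['properties']['serv'] in service_areas]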

In [None]:
len(gj['features'])
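
For reference, the filtering above can be sketched on a tiny hand-written feature collection. The property values and coordinates below are invented for illustration; only the `serv` and `beat` property names come from this notebook. GeoJSON stores each position as `[longitude, latitude]`.

```python
# A hypothetical two-feature collection in GeoJSON layout.
toy_gj = {
    'type': 'FeatureCollection',
    'features': [
        {
            'type': 'Feature',
            'properties': {'serv': 110, 'beat': 111},
            'geometry': {
                'type': 'Polygon',
                # One closed ring; each position is [longitude, latitude].
                'coordinates': [[
                    [-117.25, 32.80], [-117.20, 32.80],
                    [-117.20, 32.85], [-117.25, 32.85],
                    [-117.25, 32.80],
                ]],
            },
        },
        {
            'type': 'Feature',
            'properties': {'serv': 999, 'beat': 998},
            'geometry': {
                'type': 'Polygon',
                'coordinates': [[
                    [-117.10, 32.70], [-117.05, 32.70],
                    [-117.05, 32.75], [-117.10, 32.75],
                    [-117.10, 32.70],
                ]],
            },
        },
    ],
}

# Service areas that appear in our (made-up) counts.
wanted = {110, 520}

# Keep only features whose 'serv' property is one of the wanted areas.
kept = [f for f in toy_gj['features'] if f['properties']['serv'] in wanted]
print(len(kept), kept[0]['properties'])
```

If the real file stores `serv` as a string rather than an integer, the membership test would never match, so it is worth checking `type(gj['features'][0]['properties']['serv'])` before filtering.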

### Create a map object, overlay the counts, and plot it!

In [None]:
stops_map = folium.Map(location=(32.7157, -117.1611), zoom_start=10)

In [None]:
folium.Choropleth(
    geo_data=gj,
    data=stop_counts,
    columns=['service_area', 'count'],
    fill_color='YlOrRd',
    fill_opacity=0.5,
    line_opacity=0.2,
    key_on='feature.properties.serv',
).add_to(stops_map)
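
The `key_on` argument is a dotted path into each GeoJSON feature, and the value found there is matched against the first entry of `columns`. The sketch below illustrates that matching idea in plain Python. It is not folium's actual implementation, and `resolve_key`, `toy_feature`, and `toy_counts` are hypothetical names used only here.

```python
def resolve_key(feature, key_on):
    """Walk a dotted path such as 'feature.properties.serv' into a feature dict."""
    parts = key_on.split('.')
    value = feature
    # The leading 'feature' refers to the feature itself, so skip it.
    for part in parts[1:]:
        value = value[part]
    return value


# Hypothetical feature and counts, for illustration only.
toy_feature = {
    'type': 'Feature',
    'properties': {'serv': 110, 'beat': 111},
    'geometry': None,
}
toy_counts = {110: 42}  # service_area -> count

serv = resolve_key(toy_feature, 'feature.properties.serv')
count_for_feature = toy_counts.get(serv)
print(serv, count_for_feature)
```

In this sketch a type mismatch (say, `'110'` in the GeoJSON and `110` in the data) makes the lookup return `None`, which is why the join column was converted to integers earlier.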

You can view HTML right inside a Jupyter notebook! To view the map, we can use the `display` function that we imported from IPython earlier.

In [None]:
display(stops_map)

You can also save the map to an HTML file. To view the file, visit the Jupyter server page, select the file, and click `view` in the menu at the top.

In [None]:
stops_map.save('stops.html')

## Mapping the collisions data

The collisions data is joined to the map using `police_beat`, so we need to inspect that column and clean it if necessary. Is it of `int` type?

In [None]:
collisions['police_beat']
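
One way to answer that question programmatically is `pd.api.types.is_integer_dtype`. The sketch below uses made-up beats. It assumes nothing about the real file beyond the column name, so run the same check on `collisions['police_beat']` to see which case applies.

```python
import pandas as pd

# Hypothetical columns: one complete, one with a missing beat.
complete = pd.Series([111, 112, 111], name='police_beat')
with_gap = pd.Series([111, None, 112], name='police_beat')

# A column with no missing values stays integer-typed.
complete_is_int = pd.api.types.is_integer_dtype(complete)

# A missing value forces pandas to store the column as float.
gap_is_int = pd.api.types.is_integer_dtype(with_gap)

# Dropping the missing rows lets us cast back to int for the join.
repaired = with_gap.dropna().astype(int)

print(complete_is_int, gap_is_int, list(repaired))
```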

In [None]:
collision_counts = collisions.groupby('police_beat').count()['report_id'].reset_index()
collision_counts.rename(columns={'report_id': 'count'}, inplace=True)

In [None]:
collision_counts.sort_values('count')

In [None]:
collision_map = folium.Map(location=(32.7157, -117.1611), zoom_start=10)

folium.Choropleth(
    geo_data=gj,
    data=collision_counts,
    columns=['police_beat', 'count'],
    fill_color='YlGn',
    fill_opacity=0.5,
    line_opacity=0.2,
    # `bins` replaces the deprecated `threshold_scale` argument; every count must fall within these edges.
    bins=[0, 300, 600, 900, 1200, 1500],
    key_on='feature.properties.beat',
).add_to(collision_map)

In [None]:
display(collision_map)

# Copy this notebook and plot your own statistics by geography
* Percentage of stops that result in a search.
* Average age of drivers.
* Percentage of traffic stops that occur at night.
* Number of Hispanic/Black/White/Asian drivers pulled over.
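
As a starting point for the first exercise, here is a minimal sketch of computing a percentage per service area. It assumes a `searched` column holding `'Y'`/`'N'` values. That column name and encoding are assumptions, so check `stops.columns` and the actual values in your copy of the data first.

```python
import pandas as pd

# Hypothetical sample; the 'searched' column and its 'Y'/'N' coding are assumed.
toy = pd.DataFrame({
    'service_area': [110, 110, 520, 520, 520, 520],
    'searched': ['Y', 'N', 'N', 'N', 'Y', 'N'],
})

# Convert to booleans so the group mean is the fraction of stops searched.
pct_searched = (
    toy.assign(searched=toy['searched'].eq('Y'))
       .groupby('service_area')['searched']
       .mean()
       .mul(100)
       .reset_index(name='pct_searched')
)

print(pct_searched)
```

The resulting frame has the same two-column shape as `stop_counts`, so it could be passed to `folium.Choropleth` with `columns=['service_area', 'pct_searched']`.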