In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from datetime import date
import calendar
import folium

plt.style.use('seaborn')

# Part 1: Temporal Patterns



In [2]:
police_data_all = pd.read_csv('../../police_data.csv')

In [3]:
focuscrimes = set(['WEAPON LAWS', 'PROSTITUTION', 'DRIVING UNDER THE INFLUENCE', 'ROBBERY', 'BURGLARY', 'ASSAULT', 'DRUNKENNESS', 'DRUG/NARCOTIC', 'TRESPASS', 'LARCENY/THEFT', 'VANDALISM', 'VEHICLE THEFT', 'STOLEN PROPERTY', 'DISORDERLY CONDUCT'])

In [4]:
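# Keep only rows in the focus categories; boolean indexing drops the other rows
# (DataFrame.where would keep them as all-NaN rows instead)
police_data = police_data_all[police_data_all.Category.isin(focuscrimes)].copy()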
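
When filtering by category, it matters whether non-matching rows are dropped or merely blanked out. The sketch below contrasts `DataFrame.where` with boolean indexing on a toy frame. The frame `toy` and its values are made up for illustration and are not taken from the police data.

```python
import pandas as pd

# Toy frame standing in for the full dataset (values are illustrative only)
toy = pd.DataFrame({
    "Category": ["ROBBERY", "FRAUD", "ASSAULT"],
    "X": [-122.41, -122.42, -122.40],
})
focus = {"ROBBERY", "ASSAULT"}
mask = toy["Category"].isin(focus)

kept_shape = toy.where(mask)   # same number of rows; non-matching rows become NaN
filtered = toy[mask].copy()    # only the matching rows survive

print(len(kept_shape), int(kept_shape["Category"].isna().sum()))
print(len(filtered))
```

With `where`, the row count stays the same and the non-focus rows are filled with NaN, so later steps have to cope with missing values. With boolean indexing, those rows are gone.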

In [5]:
police_data['Date'] = pd.to_datetime(police_data['Date'], format="%m/%d/%Y")
police_data['Time'] = pd.to_datetime(police_data['Time'], format="%H:%M")
police_data['Year'] = police_data['Date'].dt.year
police_data['Month'] = police_data['Date'].dt.month
police_data['Hour'] = police_data['Time'].dt.hour
police_data['Hour_of_week'] = police_data['Date'].dt.dayofweek * 24 + (police_data['Hour'] + 1)
police_data['Day'] = police_data['Date'].dt.day
police_data['Minute'] = police_data['Time'].dt.minute
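
The `Hour_of_week` column is a 1-based index over the 168 hours of a week. A minimal sketch of the same arithmetic on plain `datetime` objects follows. The helper `hour_of_week` is a hypothetical name introduced here for illustration.

```python
from datetime import datetime

def hour_of_week(ts: datetime) -> int:
    """1-based hour of the week: Monday 00:xx -> 1, Sunday 23:xx -> 168."""
    # datetime.weekday() uses Monday=0 ... Sunday=6, like pandas' dt.dayofweek
    return ts.weekday() * 24 + (ts.hour + 1)

print(hour_of_week(datetime(2015, 1, 5, 0, 30)))    # a Monday just after midnight
print(hour_of_week(datetime(2015, 1, 11, 23, 59)))  # a Sunday just before midnight
```

The `+ 1` shifts the range from 0..167 to 1..168, which is worth remembering when using the column as an array index.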

# Part 4: Heatmaps of geo-data

*Exercise: A new take on geospatial data using Folium (see the Week 4 exercises for full info and tutorials).*

*Now we look at studying geospatial data by plotting raw data points as well as heatmaps on top of actual maps.*

* *First start by plotting a map of San Francisco with a nice tight zoom.*
* *Next, use the coordinates for SF City Hall 37.77919, -122.41914 to indicate its location on the map with a nice, pop-up enabled marker.*

We create a map centered on San Francisco and place a marker at City Hall.

In [6]:
map_1 = folium.Map([37.77919, -122.41914],tiles = "Stamen Toner", zoom_start=13)
folium.Marker([37.77919, -122.41914], popup='SF City Hall').add_to(map_1)

map_1 # Calls the map to display

* *Now, let's plot some more data (no need for popups this time). Select a couple of months of data for 'DRUG/NARCOTIC' and draw a little dot for each arrest for those two months.*

We start by extracting the data we want to plot.

In [7]:
# Extract drug data for the period Jan 1st to Jan 10th, 2015
drugData = police_data.loc[police_data['Category'] == 'DRUG/NARCOTIC']
# Sort the DatetimeIndex so that label-based date slicing is well defined
drugData = drugData.set_index(['Date']).sort_index()
drugData = drugData.loc['2015-01-01':'2015-01-10']

Next, let's plot the data.

In [8]:
map_2 = folium.Map([37.77919, -122.41914],tiles = "Stamen Toner", zoom_start=13)

for index, row in drugData.iterrows():
    folium.Marker([row['Y'],row['X']]).add_to(map_2)

map_2

Next, we are going to have a look at heatmaps.

* *To create your first heatmap, grab all arrests for the category 'SEX OFFENSES, NON FORCIBLE' across all time. Play with parameters to get plots you like.*

In [9]:
# extract 'SEX OFFENSES, NON FORCIBLE'
sexData = police_data_all.loc[police_data_all['Category'] == 'SEX OFFENSES, NON FORCIBLE']

In [10]:
from folium.plugins import HeatMap
map_3 = folium.Map([37.77919, -122.41914],tiles = "Stamen Toner", zoom_start=13)

heat_data = [[row['Y'],row['X']] for index, row in sexData.iterrows()]
# Plot it on the map
HeatMap(heat_data, radius = 20,blur=7, max_zoom =16, gradient ={0.8: 'blue', 0.85: 'lime', 1: 'red'}).add_to(map_3)

map_3

* *Now, comment on the differences between scatter plots and heatmaps.*

**Differences between Scatter Plots and Heat Maps**

Scatter plots are good at displaying individual cases, but when the density is high it is impossible to see how many cases there are in an area.
Heat maps are good at displaying this density, but they are bad at showing the individual cases.
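
The overplotting problem can be sketched without any map at all: markers at identical coordinates collapse into a single visible dot, while a density view keeps the count. The point list below is illustrative. The repeated pair is a coordinate shared by several incidents in this dataset, and the other is SF City Hall.

```python
from collections import Counter

# Illustrative (lat, lon) points; three of them share the same coordinate
points = [
    (37.775421, -122.403405),
    (37.775421, -122.403405),
    (37.775421, -122.403405),
    (37.77919, -122.41914),
]

counts = Counter(points)
distinct_markers = len(counts)         # what a scatter plot visibly shows
busiest, n = counts.most_common(1)[0]  # the density a heatmap would reveal
print(distinct_markers, busiest, n)
```

Four incidents appear as only two dots on a scatter plot, whereas a heatmap would render the repeated location with higher intensity.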


* *Comment on the effect of the various parameters for the heatmaps. How do they change the picture? (at least talk about the radius and max_zoom).*

There are several parameters you can change to get a different heatmap. Here are the ones we used:
* radius changes the size of the individual blobs; the bigger the radius, the more the different cases overlap.
* blur changes how blurred the individual blobs are and makes the plot less sharp; you don't want it to be too sharp, as the heatmap will lose its usefulness.
* max_zoom is the zoom level at which the individual points reach maximum intensity; you'll usually want this to be high to prevent your entire heatmap from being at full intensity.
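
To see why the radius interacts with the zoom level, here is a rough sketch that projects two points to screen pixels with the standard Web Mercator formula and checks whether two blobs would touch. It assumes the radius is measured in screen pixels, as in the Leaflet.heat layer that folium's `HeatMap` wraps. The helper `pixel_distance` and the second point (roughly 100 m east of City Hall) are hypothetical and only for illustration.

```python
import math

def pixel_distance(p1, p2, zoom, tile_size=256):
    """Distance in screen pixels between two (lat, lon) points in Web Mercator."""
    def project(lat, lon):
        scale = tile_size * 2 ** zoom
        x = (lon + 180.0) / 360.0 * scale
        s = math.sin(math.radians(lat))
        y = (0.5 - math.log((1 + s) / (1 - s)) / (4 * math.pi)) * scale
        return x, y
    (x1, y1), (x2, y2) = project(*p1), project(*p2)
    return math.hypot(x2 - x1, y2 - y1)

city_hall = (37.77919, -122.41914)
nearby = (37.77919, -122.41800)  # hypothetical point about 100 m to the east

radius = 20  # assumed to be in pixels
for zoom in (13, 16):
    d = pixel_distance(city_hall, nearby, zoom)
    # Two blobs of the given radius touch when their centers are closer than 2 * radius
    print(zoom, round(d, 1), d < 2 * radius)
```

Each extra zoom level doubles the pixel distance, so blobs that merge into one hot spot at the overview zoom separate again when zooming in.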

There's a suspiciously high number of cases at the 800 block of Bryant Street, so let's investigate!

* *Use your detective skills to find out what's going on in that building on the 800 block of Bryant street ... and explain in your own words.*

First let's look at all the cases at that address.

In [11]:
sexDataBryant = sexData.set_index(['Address'])
sexDataBryant = sexDataBryant.loc['800 Block of BRYANT ST'] 
sexDataBryant.head()

| Address | IncidntNum | Category | Descript | DayOfWeek | Date | Time | PdDistrict | Resolution | X | Y | Location | PdId |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 800 Block of BRYANT ST | 100613523 | SEX OFFENSES, NON FORCIBLE | UNLAWFUL SEXUAL INTERCOURSE | Saturday | 07/03/2010 | 11:00 | SOUTHERN | NONE | -122.403405 | 37.775421 | POINT (-122.403404791479 37.775420706711) | 10061352302010 |
| 800 Block of BRYANT ST | 120137876 | SEX OFFENSES, NON FORCIBLE | UNLAWFUL SEXUAL INTERCOURSE | Saturday | 02/18/2012 | 14:15 | SOUTHERN | ARREST, BOOKED | -122.403405 | 37.775421 | POINT (-122.403404791479 37.775420706711) | 12013787602010 |
| 800 Block of BRYANT ST | 30230368 | SEX OFFENSES, NON FORCIBLE | UNLAWFUL SEXUAL INTERCOURSE | Saturday | 02/22/2003 | 13:30 | SOUTHERN | DISTRICT ATTORNEY REFUSES TO PROSECUTE | -122.403405 | 37.775421 | POINT (-122.403404791479 37.775420706711) | 3023036802010 |
| 800 Block of BRYANT ST | 40177346 | SEX OFFENSES, NON FORCIBLE | UNLAWFUL SEXUAL INTERCOURSE | Thursday | 02/12/2004 | 09:00 | SOUTHERN | NONE | -122.403405 | 37.775421 | POINT (-122.403404791479 37.775420706711) | 4017734602010 |
| 800 Block of BRYANT ST | 70723947 | SEX OFFENSES, NON FORCIBLE | UNLAWFUL SEXUAL INTERCOURSE | Wednesday | 07/18/2007 | 11:45 | SOUTHERN | NONE | -122.403405 | 37.775421 | POINT (-122.403404791479 37.775420706711) | 7072394702010 |


Just from looking at the data, there doesn't seem to be a strong connection between the cases: they are spread out over several years.

Next, let's look at the location in Google Maps.
<img src="800BryantStreet.png">

This is interesting: the location is the street in front of the Criminal Courts Division and the Hall of Justice. Since these buildings are closely tied to the handling of crime, they probably explain the many cases. Perhaps these cases are reported to the police at the reception, so the report is recorded at 800 Bryant Street.

Let's look at heatmaps over time.
The vehicle theft data shows an interesting dip in 2006, which we attribute to better car security.
Let's see if this dip is visible on a heatmap.

In [12]:
# extract 'VEHICLE THEFT' (the variable keeps its earlier name, robData)
robData = police_data.loc[police_data['Category'] == 'VEHICLE THEFT']
#robData = robData[(robData['Date'] >= '2007-01-01') & (robData['Date'] < '2008-05-01')]
#drugData = drugData.loc[drugData['2015-1-1':'2015-28-2']]
#drugData = drugData['Date'].between('2015-1-1','2015-28-2', inclusive=False)
#drugData = drugData.set_index(['Date'])
#drugData = drugData.loc['2015-1-1':'2015-2-28']
#robData.head()

In [18]:
from folium import plugins
map_4 = folium.Map([37.77919, -122.41914],tiles = "Stamen Toner", zoom_start=13)

# Build one list of [lat, lon] points per year (2003-2016); each list is one animation frame
heat_data = [[[row['Y'],row['X']] for index, row in robData[robData['Year'] == i].iterrows()] for i in range(2003,2017)]

hm = plugins.HeatMapWithTime(heat_data,radius = 10, auto_play=True,min_opacity=0.3, max_opacity=0.9,gradient ={0.8: 'blue', 0.85: 'lime', 1: 'red'},speed_step=0.05, use_local_extrema=True)
hm.add_to(map_4)

map_4.save('website/folium/map.html')
map_4
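
The nested list that the time-animated heatmap consumes has one entry per time step, and each entry is itself a list of `[lat, lon]` pairs. The sketch below builds that structure with the standard library only. The `records` list is hypothetical sample input, and passing `labels` as the `index` argument of `HeatMapWithTime` is a suggestion rather than something this notebook does.

```python
from collections import defaultdict

# Hypothetical (year, lat, lon) records; real rows would come from the dataset
records = [
    (2005, 37.77919, -122.41914),
    (2005, 37.775421, -122.403405),
    (2006, 37.77919, -122.41914),
]

by_year = defaultdict(list)
for year, lat, lon in records:
    by_year[year].append([lat, lon])

years = list(range(2005, 2008))               # include years with no records
frames = [by_year.get(y, []) for y in years]  # one list of [lat, lon] per frame
labels = [str(y) for y in years]              # candidate labels for each frame
print([len(f) for f in frames], labels)
```

Grouping in a single pass avoids re-filtering the whole table once per year, and keeping an explicit `years` list guarantees that frames and labels stay aligned even when a year has no incidents.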