# Plotting Crimes on a Map

The heatmap showed a difference between crimes over $500 and those of 500 and under. We wanted to see whether these crimes, specifically those that led to an arrest, also differed by region. We therefore used the longitude and latitude data to plot all such crimes from 2001-2016 on a map. We used the following resources to build the map:

* https://bokeh.pydata.org/en/latest/docs/user_guide/styling.html
* https://bokeh.pydata.org/en/latest/docs/user_guide/geo.html
* https://www.youtube.com/watch?v=P60qokxPPZc

Note that the map requires a Google Maps API key. One is included in this file, but if you need your own, you can obtain it here:

* https://developers.google.com/maps/documentation/javascript/get-api-key
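
A key committed to a notebook is visible to anyone who can read the file, so one option is to read it from the environment instead. The sketch below is a minimal illustration and not part of the original notebook; the variable name `GOOGLE_MAPS_API_KEY` and the helper `load_api_key` are hypothetical.

```python
import os


def load_api_key(env=None, name="GOOGLE_MAPS_API_KEY"):
    """Return the API key stored under `name`, or raise if it is missing."""
    # Default to the process environment; a plain dict can be passed for testing.
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable first")
    return key
```

The returned string could then be passed as the `api_key` argument when the map plot is created.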

## Importing Packages 

In [1]:
import pandas as pd 
import numpy as np 
import math

# Importing Data

This dataset contains all the recorded crimes from 2001-2016. It should be located in the crimesInChicagoData folder under the name 'dataset.csv', which is produced by the finalDataCleansing.ipynb notebook.

In [2]:
# on_bad_lines="skip" (pandas 1.3+) replaces error_bad_lines=False, which was removed in pandas 2.0
data = pd.read_csv("../../../crimesInChicagoData/dataset.csv", on_bad_lines="skip")



In [3]:
data.head()

| Unnamed: 0.1 | Unnamed: 0 | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | District | Year | Latitude | Longitude | Month | Day | Hour | Weekday |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 840 | THEFT | FINANCIAL ID THEFT: OVER $300 | RESIDENCE | False | False | 4 | 2004.0 | 0.0 | 0.0 | 1 | 1 | 0 | 3 |
| 1 | 1 | 2825 | OTHER OFFENSE | HARASSMENT BY TELEPHONE | RESIDENCE | False | True | 9 | 2003.0 | 41.8172 | -87.637328 | 3 | 1 | 0 | 5 |
| 2 | 2 | 1752 | OFFENSE INVOLVING CHILDREN | AGG CRIM SEX ABUSE FAM MEMBER | RESIDENCE | False | False | 14 | 2004.0 | 0.0 | 0.0 | 6 | 20 | 11 | 6 |
| 3 | 3 | 840 | THEFT | FINANCIAL ID THEFT: OVER $300 | OTHER | False | False | 25 | 2004.0 | 0.0 | 0.0 | 12 | 30 | 20 | 3 |
| 4 | 4 | 841 | THEFT | FINANCIAL ID THEFT:$300 &UNDER | RESIDENCE | False | False | 22 | 2003.0 | 41.6918 | -87.635116 | 5 | 1 | 1 | 3 |


In [4]:
data = data.drop(['Unnamed: 0'], axis = 1)

In [5]:
data.head()

| Unnamed: 0 | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | District | Year | Latitude | Longitude | Month | Day | Hour | Weekday |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 840 | THEFT | FINANCIAL ID THEFT: OVER $300 | RESIDENCE | False | False | 4 | 2004.0 | 0.0 | 0.0 | 1 | 1 | 0 | 3 |
| 1 | 2825 | OTHER OFFENSE | HARASSMENT BY TELEPHONE | RESIDENCE | False | True | 9 | 2003.0 | 41.8172 | -87.637328 | 3 | 1 | 0 | 5 |
| 2 | 1752 | OFFENSE INVOLVING CHILDREN | AGG CRIM SEX ABUSE FAM MEMBER | RESIDENCE | False | False | 14 | 2004.0 | 0.0 | 0.0 | 6 | 20 | 11 | 6 |
| 3 | 840 | THEFT | FINANCIAL ID THEFT: OVER $300 | OTHER | False | False | 25 | 2004.0 | 0.0 | 0.0 | 12 | 30 | 20 | 3 |
| 4 | 841 | THEFT | FINANCIAL ID THEFT:$300 &UNDER | RESIDENCE | False | False | 22 | 2003.0 | 41.6918 | -87.635116 | 5 | 1 | 1 | 3 |


# Taking the Subset of Data 

We selected the data with the features indicated by the centroids.
* We first chose all the observations from the districts that the clustering deemed important.
* Of those observations, we kept only the top 4 'Location Description' values, because we judged those to be the most prominent.
* Then we kept all the crimes that the centroids deemed important.
* Finally, we kept only the observations where an arrest was made.
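
The four filters above can also be expressed as a single boolean mask. The following is a minimal sketch on a made-up five-row frame, assuming the column names shown in `data.head()`; the toy rows and the shortened lists are illustrative only. Unlike the per-value selections in the cells below, `isin` keeps rows in their original order rather than grouping them by district.

```python
import pandas as pd

# Toy rows standing in for the real dataset (illustrative values only).
toy = pd.DataFrame({
    "District": [2, 3, 1, 4, 2],
    "Location Description": ["STREET", "RESIDENCE", "STREET", "BAR", "SIDEWALK"],
    "Description": ["SIMPLE", "OVER $500", "SIMPLE", "SIMPLE", "$500 AND UNDER"],
    "Arrest": [True, False, True, True, True],
})

# Shortened stand-ins for the district, location, and crime lists.
districts = [2, 3, 4]
locations = ["STREET", "RESIDENCE", "APARTMENT", "SIDEWALK"]
crimes = ["SIMPLE", "OVER $500", "$500 AND UNDER"]

# A row is kept only if it passes all four conditions.
mask = (
    toy["District"].isin(districts)
    & toy["Location Description"].isin(locations)
    & toy["Description"].isin(crimes)
    & toy["Arrest"]
)
subset = toy.loc[mask]
print(len(subset))
```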

In [248]:
districtsList = [3, 4, 6, 7, 8, 9, 10, 11, 12, 19, 25]

In [249]:
# DataFrame.append was removed in pandas 2.0; pd.concat builds the same frame.
# District 2 comes first, followed by each district in districtsList.
subsetDistrict = pd.concat(
    [data.loc[data['District'] == district] for district in [2] + districtsList]
)

In [250]:
subsetDistrict['District'].unique()

array([2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 19.0, 25.0], dtype=object)

In [251]:
len(subsetDistrict)

4519164

In [252]:
locationsList = ['RESIDENCE', 'APARTMENT', 'SIDEWALK']

In [253]:
# DataFrame.append was removed in pandas 2.0; pd.concat builds the same frame.
# 'STREET' comes first, followed by each location in locationsList.
subsetLocation = pd.concat(
    [subsetDistrict.loc[subsetDistrict['Location Description'] == location]
     for location in ['STREET'] + locationsList]
)

In [254]:
subsetLocation['Location Description'].unique()

array(['STREET', 'RESIDENCE', 'APARTMENT', 'SIDEWALK'], dtype=object)

In [255]:
len(subsetLocation)

3016577

In [256]:
crimesList = ['$500 AND UNDER', 'OVER $500', 'TO VEHICLE', 'TO PROPERTY', 'AUTOMOBILE', 'FORCIBLE ENTRY', 'DOMESTIC BATTERY SIMPLE', 'FROM BUILDING', 'POSS: CANNABIS 30GMS OR LESS']

In [257]:
# DataFrame.append was removed in pandas 2.0; pd.concat builds the same frame.
# 'SIMPLE' comes first, followed by each crime in crimesList.
subsetCrime = pd.concat(
    [subsetLocation.loc[subsetLocation['Description'] == crime]
     for crime in ['SIMPLE'] + crimesList]
)

In [258]:
subsetCrime['Description'].unique()

array(['SIMPLE', '$500 AND UNDER', 'OVER $500', 'TO VEHICLE',
       'TO PROPERTY', 'AUTOMOBILE', 'FORCIBLE ENTRY',
       'DOMESTIC BATTERY SIMPLE', 'FROM BUILDING',
       'POSS: CANNABIS 30GMS OR LESS'], dtype=object)

In [259]:
len(subsetCrime)

1827266

In [260]:
subsetArrests = subsetCrime.loc[subsetCrime['Arrest'] == True]

In [261]:
subsetArrests.head()

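| Unnamed: 0 | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | District | Year | Latitude | Longitude | Month | Day | Hour | Weekday |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4738 | 460 | BATTERY | SIMPLE | STREET | True | False | 2 | 2002.0 | 0.0 | 0.0 | 2 | 8 | 15 | 4 |
| 7547 | 460 | BATTERY | SIMPLE | STREET | True | True | 2 | 2001.0 | 41.828 | -87.608192 | 1 | 1 | 2 | 0 |
| 8712 | 460 | BATTERY | SIMPLE | STREET | True | False | 2 | 2001.0 | 41.8244 | -87.607308 | 1 | 2 | 18 | 1 |
| 9999 | 460 | BATTERY | SIMPLE | STREET | True | False | 2 | 2001.0 | 41.802 | -87.619551 | 1 | 4 | 20 | 3 |
| 11754 | 460 | BATTERY | SIMPLE | STREET | True | True | 2 | 2001.0 | 41.8013 | -87.606099 | 1 | 6 | 18 | 5 |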


In [262]:
len(subsetArrests)

354355

# Map Visualization

We decided to map only the arrests for crimes over 500 dollars and for crimes of 500 dollars and under. Arrests for crimes over 500 dollars are shown as red points; the others are shown as blue points.
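
The sample rows shown earlier include coordinates of `0.0`, which would fall far outside Chicago if plotted. Below is a minimal sketch of splitting the arrests into the two categories while skipping such rows. It assumes that `0.0` marks a missing location, which the notebook does not state, and it uses made-up rows; `split_by_value` is a hypothetical helper.

```python
import pandas as pd


def split_by_value(arrests):
    """Return (over, under) frames, dropping rows whose coordinates are 0.0."""
    # Assumption: a latitude or longitude of 0.0 means the location is unknown.
    located = arrests[(arrests["Latitude"] != 0) & (arrests["Longitude"] != 0)]
    over = located[located["Description"] == "OVER $500"]
    under = located[located["Description"] == "$500 AND UNDER"]
    return over, under


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "Description": ["OVER $500", "$500 AND UNDER", "OVER $500", "SIMPLE"],
    "Latitude": [41.83, 41.80, 0.0, 41.82],
    "Longitude": [-87.61, -87.62, 0.0, -87.60],
})
over, under = split_by_value(toy)
```

Each returned frame could then supply the longitude and latitude lists for its own color of points.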

For features of the map, we added the following:
* Pan
* Wheel Zoom
* Box Select
* Zoom In
* Zoom Out
* Hover (to see more information on latitude and longitude)
* Save (to save an image of the points)
* Reset (to reset the map and points to their original settings)
* Legend

## Import Visualization Packages 

In [263]:
from bokeh.io import output_file, show
from bokeh.models import (
  GMapPlot, GMapOptions, ColumnDataSource, Circle, Range1d, PanTool, WheelZoomTool, BoxSelectTool, ZoomInTool, ZoomOutTool, HoverTool, SaveTool, ResetTool, Legend
)

In [264]:
map_options = GMapOptions(lat = 41.8, lng=-87.6,
                         map_type = 'roadmap',
                         zoom = 10)

In [290]:
plot = GMapPlot(x_range=Range1d(), 
                y_range=Range1d(), 
                map_options=map_options,
                api_key ='AIzaSyD0jgCd0kmm_5IiLuw-dsIkH4oW4POnf5Y' )
plot.title.text = "Chicago Arrests: Crimes Over $500 & 500 and Under"
plot.add_tools(PanTool(), WheelZoomTool(), BoxSelectTool(), ZoomInTool(), ZoomOutTool(), HoverTool(), SaveTool(), ResetTool())

In [291]:
source = ColumnDataSource(
    data=dict(
        lonOver500 = subsetArrests.loc[subsetArrests['Description'] == 'OVER $500']['Longitude'].tolist(),
        latOver500 = subsetArrests.loc[subsetArrests['Description'] == 'OVER $500']['Latitude'].tolist(),
        lonUnder500 = subsetArrests.loc[subsetArrests['Description'] == '$500 AND UNDER']['Longitude'].tolist(),
        latUnder500 = subsetArrests.loc[subsetArrests['Description'] == '$500 AND UNDER']['Latitude'].tolist()
    )
)



In [292]:
circleOver500 = Circle(x="lonOver500", y="latOver500", size=10, fill_color="red", fill_alpha=0.6, line_color=None)

In [293]:
circleUnder500 = Circle(x="lonUnder500", y="latUnder500", size=10, fill_color="blue", fill_alpha=0.6, line_color=None)

In [294]:
cOver = plot.add_glyph(source, circleOver500)
cUnder = plot.add_glyph(source, circleUnder500)

In [295]:
legend = Legend(items=[ ("Over $500", [cOver]),
                            ("$500 and Under" , [cUnder])])
legend.name = 'Legend'

In [296]:
plot.add_layout(legend, 'right')

In [297]:
output_file("arrestsChicagoByRegion.html")
show(plot)

W-1005 (SNAPPED_TOOLBAR_ANNOTATIONS): Snapped toolbars and annotations on the same side MAY overlap visually: GMapPlot(id='955c63ba-4847-4540-9b3b-2fa86ecc8a6d', ...)

To view the plot, visit:
* https://rawgit.com/g1isgone/Unsupervised-MachineLearning/master/finalProject/detailedAnalysis/arrestsChicagoByRegion.html

# Comparative Analysis

We compare the regions on our map with the regions shown on the Trulia website, which divides Chicago by house listing price.

* https://www.trulia.com/home_prices/Illinois/Chicago-heat_map/ 
