# Crime over Time in Chicago

In this notebook we analyze how crime in Chicago has changed over time.

Overall there has been a significant decrease in crime in Chicago.

## Load the data

First we need to load the data.

In [1]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

import folium
import folium.plugins as plugins

In [None]:
%%time

# `spark` is assumed to be the SparkSession supplied by the notebook kernel.
hdfs_port = "hdfs://orion11:13030"
hdfs_path = "/crime-since-2001-chicago.csv"

# Read the CSV with a header row; every column is loaded as a string.
df = spark.read.format('csv').option("header", "true").load(hdfs_port + hdfs_path)

In [None]:
df.columns

## Binning

Here we use an SQL query to bin the data and select the features we want. We bin by latitude and longitude, rounded to four decimal places (roughly 11 m in latitude), and count the number of crimes in each bin per year.

In [None]:
dg = df

dg.createOrReplaceTempView("crime_data")

query_str = '''
SELECT ROUND(Latitude, 4) AS lat,
    ROUND(Longitude, 4) AS lon,
    COUNT(ID) AS count,
    CAST(Year AS INT) AS year
FROM crime_data
WHERE Latitude IS NOT NULL AND Longitude IS NOT NULL
GROUP BY lat, lon, year
ORDER BY count DESC
'''

print(query_str)
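To make the binning logic concrete, here is a minimal pandas sketch of what the query does, run on a tiny made-up table. The `toy` frame and its values are purely illustrative and are not taken from the Chicago dataset; only the column names mirror the query above.

```python
import pandas as pd

# Made-up rows standing in for the crime table (illustrative values only).
toy = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5],
    "Latitude": [41.87811, 41.87814, 41.90001, None, 41.87812],
    "Longitude": [-87.62981, -87.62984, -87.70002, -87.65, -87.62979],
    "Year": ["2001", "2001", "2001", "2001", "2006"],
})

binned = (
    toy.dropna(subset=["Latitude", "Longitude"])       # WHERE ... IS NOT NULL
       .assign(
           lat=lambda d: d["Latitude"].round(4),        # ROUND(Latitude, 4)
           lon=lambda d: d["Longitude"].round(4),       # ROUND(Longitude, 4)
           year=lambda d: d["Year"].astype(int),        # CAST(Year AS INT)
       )
       .groupby(["lat", "lon", "year"], as_index=False)  # GROUP BY lat, lon, year
       .agg(count=("ID", "count"))                       # COUNT(ID)
       .sort_values("count", ascending=False)            # ORDER BY count DESC
       .reset_index(drop=True)
)

print(binned)
```

The two 2001 rows that round to the same coordinates collapse into one bin with a count of 2, while the row with a missing latitude is dropped before grouping.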

## Load into Pandas

Here we run the SQL query and load the results into a pandas DataFrame.

In [None]:
%%time

dh = spark.sql(query_str)
p_df = dh.toPandas()

In [None]:
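# Min-max normalize the counts to [0, 1] for use as heatmap weights.
p_df['normcount'] = (p_df['count'] - p_df['count'].min()) / (p_df['count'].max() - p_df['count'].min())
maximum_count = p_df['count'].max()

# Natural log of the counts, an alternative scale that compresses large values.
p_df['logcount'] = np.log(p_df['count'])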
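As a quick illustration of the two scalings computed in the cell above, the sketch below applies them to a handful of made-up counts. The `counts` series is hypothetical and only meant to show how each transform behaves.

```python
import numpy as np
import pandas as pd

# Made-up per-bin counts spanning several orders of magnitude.
counts = pd.Series([1, 10, 100, 1000], name="count")

# Min-max normalization: the smallest count maps to 0.0, the largest to 1.0.
span = counts.max() - counts.min()
normcount = (counts - counts.min()) / span

# Natural log: compresses the long right tail of large counts.
logcount = np.log(counts)

print(pd.DataFrame({"count": counts, "normcount": normcount, "logcount": logcount}))
```

One thing this makes visible: under min-max scaling, bins with the minimum count receive a weight of exactly 0, and mid-range counts are squeezed close to 0 when a few bins are very large. The log scale spreads those mid-range values out more evenly.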

## Setting some constants

Chicago Lat and Lon: 41.8781° N, 87.6298° W

In [None]:
chicago_location = (41.8781, -87.6298)

## 2001 Data

Crime is extremely prevalent, concentrated along the coast and in Central and South Chicago.

A screenshot of the leaflet is provided below.

![2001_chicago](img/2001_chicago.png)

In [None]:
m = folium.Map(location=chicago_location, zoom_start=10)

data_list = p_df.loc[p_df['year'] == 2001][['lat', 'lon', 'normcount']].values.tolist()

hm = plugins.HeatMap(data_list, min_opacity=0.2, radius=7, max_zoom=1)

m.add_child(hm)

m

## 2006 Data

Crime seems to have coalesced inwards a little. It also seems to have increased.

![2006_chicago](img/2006_chicago.png)

In [None]:
m = folium.Map(location=chicago_location, zoom_start=10)

data_list = p_df.loc[p_df['year'] == 2006][['lat', 'lon', 'normcount']].values.tolist()

hm = plugins.HeatMap(data_list, min_opacity=0.2, radius=7, max_zoom=1)

m.add_child(hm)

m

## 2011 Data

Crime along the coast has nearly disappeared.

It seems to be focused in four major areas.

![2011_chicago](img/2011_chicago.png)

In [None]:
m = folium.Map(location=chicago_location, zoom_start=10)

data_list = p_df.loc[p_df['year'] == 2011][['lat', 'lon', 'normcount']].values.tolist()

hm = plugins.HeatMap(data_list, min_opacity=0.2, radius=7, max_zoom=1)

m.add_child(hm)

m

## 2015 Data

There has been a significant decrease in crime in South Chicago, but Central Chicago still has some major problems.

![2015_chicago](img/2015_chicago.png)

In [None]:
m = folium.Map(location=chicago_location, zoom_start=10)

data_list = p_df.loc[p_df['year'] == 2015][['lat', 'lon', 'normcount']].values.tolist()

hm = plugins.HeatMap(data_list, min_opacity=0.2, radius=7, max_zoom=1)

m.add_child(hm)

m

## 2018 Data

This data shows that crime has dramatically decreased in South Chicago, so much so that it is almost indistinguishable from the background.

While considerably better, Central Chicago still holds some very dense pockets of crime.

![2018_chicago](img/2018_chicago.png)

In [None]:
m = folium.Map(location=chicago_location, zoom_start=10)

data_list = p_df.loc[p_df['year'] == 2018][['lat', 'lon', 'normcount']].values.tolist()

hm = plugins.HeatMap(data_list, min_opacity=0.2, radius=7, max_zoom=1)

m.add_child(hm)

m
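The map cells for each year repeat the same filter-and-select step. A small helper could factor that out; the sketch below is one possible shape, where `heatmap_rows` is a hypothetical name not used elsewhere in this notebook and the `toy` frame holds made-up values with the same columns as `p_df`.

```python
import pandas as pd

def heatmap_rows(frame, year, weight="normcount"):
    """Return [lat, lon, weight] rows for one year as plain Python lists."""
    subset = frame.loc[frame["year"] == year, ["lat", "lon", weight]]
    return subset.values.tolist()

# Made-up stand-in for p_df (illustrative values only).
toy = pd.DataFrame({
    "lat": [41.8781, 41.9000, 41.8781],
    "lon": [-87.6298, -87.7000, -87.6298],
    "year": [2001, 2001, 2018],
    "normcount": [1.0, 0.25, 0.5],
})

rows_2001 = heatmap_rows(toy, 2001)
print(rows_2001)
```

In the notebook, the returned list could be passed straight to the heatmap, for example `plugins.HeatMap(heatmap_rows(p_df, 2018), min_opacity=0.2, radius=7, max_zoom=1)`, with the same parameters used in the cells above.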

## Takeaway

After studying this data, it appears that the police in Chicago are doing a great job of cleaning up crime. The ongoing gentrification of South Chicago likely contributes to the decrease as well.
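The heatmaps show where crime is concentrated, but the overall trend can also be checked numerically by summing the binned counts per year. The sketch below shows one way to do that, assuming a frame with the same `year` and `count` columns as `p_df`; the `toy` values are made up and do not reflect the real dataset.

```python
import pandas as pd

# Made-up stand-in for p_df: one row per (lat, lon, year) bin.
toy = pd.DataFrame({
    "year": [2001, 2001, 2018, 2018],
    "count": [5, 3, 2, 1],
})

# Total crimes per year across all bins.
yearly_totals = toy.groupby("year")["count"].sum().sort_index()

# Percentage change from the first year to the last year in the table.
first, last = yearly_totals.iloc[0], yearly_totals.iloc[-1]
pct_change = (last - first) / first * 100

print(yearly_totals)
print(f"Change from {yearly_totals.index[0]} to {yearly_totals.index[-1]}: {pct_change:.1f}%")
```

Applied to `p_df`, these totals would cover only rows with non-null coordinates, since the query filters out the rest, and any partially recorded year would understate its total.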