# Cleaning various Chicago Open Data Portal datasets
#### Abhilash Biswas
#### 11/23/2021

This file attaches the census tract ID from the ACS dataset to the geo-coordinates of 6 datasets obtained from the Chicago Open Data Portal. It also attaches a census tract ID to each of the 84 civilian killings that have happened in Chicago to date.

The 6 datasets extracted from the Chicago Open Data Portal are:
1. Police station locations
2. Fire station locations
3. List of public schools (as of 2019) (2016-17 data also available)
4. List of Parks maintained by Chicago Park district
5. List of all licensed commercial establishments
6. List of all crimes from 2015 to 2019

In addition to these 6, the police killings dataset is obtained from the MPV (Mapping Police Violence) csv file.

### Process
1. Get the clean ACS dataset (containing boundary polygons and census ids (geo id)) and convert it into a geopandas dataframe
2. Obtain each of the above 6 predictor datasets and convert it into geopandas dataframe
3. For police stations, fire stations, public schools and parks, attach the number of such units close to each census tract (by calculating the distance of each of these units from the centroid of each census tract polygon)
4. For commercial establishments, police killings and crimes, attach the number of such incidents/units that exist/occurred within each census tract
5. Aggregate the information for each predictor at a census tract level
6. Combine all the datasets into 1

### Important
The Chicago crime datasets are very large and hence cannot be uploaded to GitHub. Make sure you have those individual files in your raw folder (data/raw) before running this. To get those files, click the export as csv button on each of the following pages and rename the download to "Crimes_" followed by the year, for example "Crimes_2015.csv", which is the name the code below reads.

1. 2015 (change name to Crimes_2015) - https://data.cityofchicago.org/Public-Safety/Crimes-2015/vwwp-7yr9
2. 2016 (change name to Crimes_2016) - https://data.cityofchicago.org/Public-Safety/Crimes-2016/kf95-mnd6
3. 2017 (change name to Crimes_2017) - https://data.cityofchicago.org/Public-Safety/Crimes-2017/d62x-nvdr
4. 2018 (change name to Crimes_2018) - https://data.cityofchicago.org/Public-Safety/Crimes-2018/3i3m-jwuy
5. 2019 (change name to Crimes_2019) - https://data.cityofchicago.org/Public-Safety/Crimes-2019/w98m-zvie
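
A quick check that the five files are in place before running the notebook could look like the sketch below. It assumes the files sit in data/raw and are named Crimes_2015.csv through Crimes_2019.csv, matching the read_csv call for 2015 later in this notebook. The helper name `missing_crime_files` is hypothetical.

```python
from pathlib import Path

# Years covered by the crime data, per the list above.
CRIME_YEARS = range(2015, 2020)

def missing_crime_files(raw_dir):
    """Return the expected crime CSV names that are not present in raw_dir."""
    raw_dir = Path(raw_dir)
    # Assumed naming convention: Crimes_<year>.csv
    expected = [f"Crimes_{year}.csv" for year in CRIME_YEARS]
    return [name for name in expected if not (raw_dir / name).is_file()]

# "data/raw" mirrors the folder used by the read_csv calls in this notebook.
print(missing_crime_files("data/raw"))
```

An empty list means every file was found; otherwise the list names the files that still need to be downloaded.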


In [1]:
#Import all packages
import pandas as pd
import censusdata
from tabulate import tabulate
import matplotlib.pyplot as plt
from sodapy import Socrata
import geopandas as gpd
from shapely import wkt
import json
import requests
from pyprojroot import here
from shapely.geometry import Polygon, Point
import haversine as hs
import numpy as np



# Census tract information

In [2]:
#Get the cleaned csv
acs = pd.read_csv(here('./data/CleanACSFile.csv'))

#Convert it into a geopandas dataframe
acs['geometry'] = acs['geometry'].apply(wkt.loads)
gdf_acs = gpd.GeoDataFrame(acs, crs = 'epsg:4326')

gdf_acs

Unnamed: 0,geo_id,B01001_001E,DP02_0002PE,DP02_0004PE,DP02_0006PE,DP02_0010PE,DP02_0014PE,DP02_0015PE,DP02_0016E,DP02_0017E,...,DP05_0018E,DP05_0019PE,DP05_0024PE,DP05_0037PE,DP05_0038PE,DP05_0044PE,DP05_0058PE,DP05_0071PE,geometry,GEOID10
0,1400000US17031010100,4599.0,23.8,2.5,39.5,34.2,21.5,8.7,1.89,3.05,...,35.6,19.9,6.0,46.7,45.2,1.0,3.4,11.4,"MULTIPOLYGON (((-87.67720 42.02294, -87.67007 ...",17031010100
1,1400000US17031010201,7455.0,33.7,7.2,28.3,30.8,28.2,14.9,2.65,3.50,...,34.8,25.6,6.8,46.4,33.8,4.0,8.0,22.4,"MULTIPOLYGON (((-87.68465 42.01949, -87.68045 ...",17031010201
2,1400000US17031010202,2896.0,23.1,13.6,23.0,40.4,26.9,17.9,2.27,3.31,...,35.0,20.3,13.1,46.7,33.9,5.4,1.4,26.0,"MULTIPOLYGON (((-87.67685 42.01941, -87.67339 ...",17031010202
3,1400000US17031010300,6485.0,25.3,7.4,25.2,42.1,17.0,18.7,1.80,2.79,...,42.2,14.5,18.5,59.6,30.9,1.0,4.2,16.9,"MULTIPOLYGON (((-87.67133 42.01937, -87.66950 ...",17031010300
4,1400000US17031010400,5213.0,17.4,5.7,36.4,40.5,12.5,10.7,1.82,2.93,...,25.2,10.7,5.0,70.8,21.3,4.6,1.8,7.5,"MULTIPOLYGON (((-87.66345 42.01283, -87.66133 ...",17031010400
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
793,1400000US17031843500,10169.0,33.0,22.8,25.6,18.6,50.9,10.2,3.58,4.03,...,29.9,3.7,2.2,31.0,63.4,0.3,0.0,26.3,"MULTIPOLYGON (((-87.70504 41.84452, -87.70258 ...",17031843500
794,1400000US17031843600,2898.0,12.5,8.8,18.9,59.7,28.4,20.9,2.06,2.90,...,33.6,26.9,11.7,7.6,87.6,0.9,2.4,7.9,"MULTIPOLYGON (((-87.61150 41.81128, -87.60661 ...",17031843600
795,1400000US17031843700,2527.0,51.3,5.6,17.9,25.1,39.6,22.4,2.55,3.50,...,35.8,27.9,9.8,80.0,4.0,6.7,6.4,25.7,"MULTIPOLYGON (((-87.69676 41.95046, -87.69445 ...",17031843700
796,1400000US17031843800,1520.0,19.8,9.5,31.9,38.8,32.4,32.5,2.23,3.04,...,39.9,22.4,17.4,25.8,66.4,7.3,0.5,7.0,"MULTIPOLYGON (((-87.64554 41.80886, -87.64068 ...",17031843800


## Creating a distance function from a polygon centroid to a geocoordinate

In [3]:
#Distance (in km) from the centroid of a census tract polygon to a point given as lat/long
def distance(polygon,lat,long):
    #shapely stores coordinates as (x, y), i.e. (long, lat)
    centroid_long, centroid_lat = polygon.centroid.coords[0]

    #haversine expects (lat, long) pairs, so the order is exchanged here
    centroid = (centroid_lat,centroid_long)

    #Calculate distance (haversine returns kilometres by default)
    point = (float(lat),float(long))
    dist = hs.haversine(centroid,point)

    return dist
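
For reference, the great-circle distance that `hs.haversine` returns can be written out by hand. The sketch below is a plain-Python version of the haversine formula, not the package's own code. It assumes a mean Earth radius of about 6371 km, and the function name `haversine_km` is hypothetical. The two sample points are the Headquarters and Near North coordinates from the police stations table further down.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumed mean Earth radius in kilometres

def haversine_km(point_a, point_b):
    """Great-circle distance in km between two (lat, long) pairs given in degrees."""
    lat1, lon1 = map(math.radians, point_a)
    lat2, lon2 = map(math.radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    # Haversine formula
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

headquarters = (41.8307016873, -87.6233953459)
near_north = (41.9032416531, -87.6433521393)
print(round(haversine_km(headquarters, near_north), 2))
```

Both arguments are ordered (lat, long), which is why the `distance` function swaps the centroid coordinates before calling the library.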


# Chicago police district station locations

In [13]:
import warnings
warnings.filterwarnings('ignore')

# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.cityofchicago.org", None)

# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.cityofchicago.org",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")

# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("z8bn-74gv", limit=2000)

# Convert to pandas DataFrame
police_stations = pd.DataFrame.from_records(results)



In [14]:
#Extract latitude and longitude from the nested 'location' dictionary
#(column-wise apply avoids chained assignment through .iloc)
police_stations['lat'] = police_stations['location'].apply(lambda loc: loc['latitude'])
police_stations['long'] = police_stations['location'].apply(lambda loc: loc['longitude'])
    

    
police_stations = police_stations[['district','district_name','zip','location','lat','long']]
police_stations

Unnamed: 0,district,district_name,zip,location,lat,long
0,Headquarters,Headquarters,60653,"{'latitude': '41.8307016873', 'longitude': '-8...",41.8307016873,-87.6233953459
1,18,Near North,60610,"{'latitude': '41.9032416531', 'longitude': '-8...",41.9032416531,-87.6433521393
2,19,Town Hall,60613,"{'latitude': '41.9474004564', 'longitude': '-8...",41.9474004564,-87.651512018
3,20,Lincoln,60625,"{'latitude': '41.9795495131', 'longitude': '-8...",41.9795495131,-87.6928445094
4,22,Morgan Park,60643,"{'latitude': '41.6914347795', 'longitude': '-8...",41.6914347795,-87.6685203937
5,24,Rogers Park,60626,"{'latitude': '41.9997634842', 'longitude': '-8...",41.9997634842,-87.6713242922
6,25,Grand Central,60639,"{'latitude': '41.9186088912', 'longitude': '-8...",41.9186088912,-87.765574479
7,1,Central,60616,"{'latitude': '41.8583725929', 'longitude': '-8...",41.8583725929,-87.627356171
8,2,Wentworth,60609,"{'latitude': '41.8018110912', 'longitude': '-8...",41.8018110912,-87.6305601801
9,3,Grand Crossing,60637,"{'latitude': '41.7664308925', 'longitude': '-8...",41.7664308925,-87.6057478606


In [15]:
#Creating a cross join to pair every census tract with every police station in Chicago

#Work on copies so the helper 'merge' column is not added to the original dataframes
gdf_acs_temp = gdf_acs.copy()
gdf_acs_temp['merge'] = 1

police_stations_temp = police_stations.copy()
police_stations_temp['merge'] = 1

df = pd.merge(gdf_acs_temp,police_stations_temp, on='merge')


#Applying this function to the df
df['dist'] = df.apply(lambda row: distance(row['geometry'],row['lat'],row['long']), axis = 1)


In [16]:
#Aggregate close by police stations (within 2 miles, i.e. about 3.2 km, since dist is in km) at a census tract level
df['within_distance'] = np.where(df['dist']<=3.2,1,0)

ps_acs = df.groupby(['geo_id'],as_index = False)['within_distance'].sum()
ps_acs.rename(columns = {"within_distance":"police_stations"}, inplace = True)
ps_acs

Unnamed: 0,geo_id,police_stations
0,1400000US17031010100,1
1,1400000US17031010201,1
2,1400000US17031010202,1
3,1400000US17031010300,1
4,1400000US17031010400,1
...,...,...
793,1400000US17031843500,1
794,1400000US17031843600,2
795,1400000US17031843700,1
796,1400000US17031843800,2
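
The cross join and threshold count used above can be tried on toy data. The sketch below uses `how="cross"` in `pd.merge` (available from pandas 1.2), which pairs every row with every row just like merging on a constant column. All identifiers and distances in it are made up for illustration.

```python
import pandas as pd

# Toy inputs; identifiers are hypothetical.
tracts = pd.DataFrame({"geo_id": ["tract_A", "tract_B"]})
stations = pd.DataFrame({"station": ["S1", "S2", "S3"]})

# Every tract paired with every station: 2 x 3 = 6 rows.
pairs = pd.merge(tracts, stations, how="cross")

# Made-up distances in km, one per (tract, station) pair.
pairs["dist"] = [1.0, 3.2, 5.0, 4.1, 6.0, 2.9]

MILES_TO_KM = 1.609344
threshold_km = 2 * MILES_TO_KM  # about 3.22 km; the notebook rounds this to 3.2

# Flag pairs within the threshold, then count them per tract.
pairs["within_distance"] = (pairs["dist"] <= threshold_km).astype(int)
counts = pairs.groupby("geo_id", as_index=False)["within_distance"].sum()
print(counts)
```

Because the count is based on distance from the tract centroid, a station can be counted for several neighbouring tracts at once.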


# Fire station locations

In [17]:
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.cityofchicago.org", None)

# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.cityofchicago.org",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")

# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("28km-gtjn", limit=2000)

# Convert to pandas DataFrame
fire_stations_locations = pd.DataFrame.from_records(results)




In [18]:
#Extract latitude and longitude from the nested 'location' dictionary
fire_stations_locations['lat'] = fire_stations_locations['location'].apply(lambda loc: loc['latitude'])
fire_stations_locations['long'] = fire_stations_locations['location'].apply(lambda loc: loc['longitude'])
    

    
fire_stations_locations = fire_stations_locations[['name','address','zip','location','lat','long']]
fire_stations_locations

Unnamed: 0,name,address,zip,location,lat,long
0,E5,324 S DESPLAINES ST,60661,"{'latitude': '41.877028304420755', 'longitude'...",41.877028304420755,-87.64430865193455
1,E11,5343 N CUMBERLAND AVE,60656,"{'latitude': '41.97685625348317', 'longitude':...",41.97685625348317,-87.836495886321
2,E81,10458 S HOXIE AVE,60617,"{'latitude': '41.705334319654064', 'longitude'...",41.705334319654064,-87.56088524816063
3,E22,605 W ARMITAGE AVE,60614,"{'latitude': '41.91792047709303', 'longitude':...",41.91792047709303,-87.64396690956342
4,E50,5000 S UNION AVE,60609,"{'latitude': '41.80344788181221', 'longitude':...",41.80344788181221,-87.64299386409898
...,...,...,...,...,...,...
87,E19,3421 S CALUMET AVE,60616,"{'latitude': '41.83227804024279', 'longitude':...",41.83227804024279,-87.61779663851078
88,E26,10 N LEAVITT ST,60612,"{'latitude': '41.88151592134697', 'longitude':...",41.88151592134697,-87.68185534665783
89,E82,817 E 91ST ST,60619,"{'latitude': '41.72933327959225', 'longitude':...",41.72933327959225,-87.60425730151255
90,E34,4034 W 47TH ST,60632,"{'latitude': '41.80790024096418', 'longitude':...",41.80790024096418,-87.72485128276466


In [19]:
#Creating a cross join to pair every census tract with every fire station in Chicago

#Work on copies so the helper 'merge' column is not added to the original dataframes
gdf_acs_temp = gdf_acs.copy()
gdf_acs_temp['merge'] = 1

fire_stations_temp = fire_stations_locations.copy()
fire_stations_temp['merge'] = 1

df = pd.merge(gdf_acs_temp,fire_stations_temp, on='merge')


#Applying this function to the df
df['dist'] = df.apply(lambda row: distance(row['geometry'],row['lat'],row['long']), axis = 1)

In [20]:
#Aggregate close by fire stations (within 2 miles, i.e. about 3.2 km, since dist is in km) at a census tract level
df['within_distance'] = np.where(df['dist']<=3.2,1,0)

fs_acs = df.groupby(['geo_id'],as_index = False)['within_distance'].sum()
fs_acs.rename(columns = {"within_distance":"fire_stations"}, inplace = True)
fs_acs

Unnamed: 0,geo_id,fire_stations
0,1400000US17031010100,1
1,1400000US17031010201,3
2,1400000US17031010202,3
3,1400000US17031010300,2
4,1400000US17031010400,2
...,...,...
793,1400000US17031843500,5
794,1400000US17031843600,6
795,1400000US17031843700,6
796,1400000US17031843800,6


# Public Schools (as of 2019)

In [21]:
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.cityofchicago.org", None)

# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.cityofchicago.org",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")

# First 2000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("tz49-n8ze", limit=2000)

# Convert to pandas DataFrame
schools = pd.DataFrame.from_records(results)




In [22]:
schools = schools.rename(columns = {'x':'long','y':'lat'})
schools = schools[['school_id','school_nm','sch_type','lat','long']]
schools

Unnamed: 0,school_id,school_nm,sch_type,lat,long
0,610587,DYETT ARTS HS,Traditional,41.80120417,-87.61223911
1,400111,LEARN - PERKINS,Traditional,41.74312177,-87.66572106
2,610568,PATHWAYS - AVONDALE HS,Options,41.93943321,-87.70520632
3,610027,KIPLING,Traditional,41.72362691,-87.63952072
4,609712,HIRSCH HS,Traditional,41.75374796,-87.60172727
...,...,...,...,...,...
649,610200,THORP J,Traditional,41.73332424,-87.54427998
650,610139,PULLMAN,Traditional,41.68881935,-87.60943097
651,610026,KINZIE,Traditional,41.78996463,-87.7794826
652,609844,CARTER,Traditional,41.78982791,-87.62245275


In [23]:
#Creating a cross join to pair every census tract with every public school in Chicago

#Work on copies so the helper 'merge' column is not added to the original dataframes
gdf_acs_temp = gdf_acs.copy()
gdf_acs_temp['merge'] = 1

schools_temp = schools.copy()
schools_temp['merge'] = 1

df = pd.merge(gdf_acs_temp,schools_temp, on='merge')


#Applying this function to the df
df['dist'] = df.apply(lambda row: distance(row['geometry'],row['lat'],row['long']), axis = 1)

In [24]:
#Aggregate close by public schools (within 2 miles, i.e. about 3.2 km, since dist is in km) at a census tract level
df['within_distance'] = np.where(df['dist']<=3.2,1,0)

schools_acs = df.groupby(['geo_id'],as_index = False)['within_distance'].sum()
schools_acs.rename(columns = {"within_distance":"public_schools"}, inplace = True)
schools_acs

Unnamed: 0,geo_id,public_schools
0,1400000US17031010100,14
1,1400000US17031010201,17
2,1400000US17031010202,16
3,1400000US17031010300,15
4,1400000US17031010400,17
...,...,...
793,1400000US17031843500,57
794,1400000US17031843600,44
795,1400000US17031843700,41
796,1400000US17031843800,46


# Parks (maintained by Chicago Park district)

In [25]:
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.cityofchicago.org", None)

# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.cityofchicago.org",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")

# Up to 5000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("y7qa-tvqx", limit=5000)

# Convert to pandas DataFrame
parks = pd.DataFrame.from_records(results)



In [26]:
#Extract latitude and longitude from the nested 'location' dictionary
parks['lat'] = parks['location'].apply(lambda loc: loc['latitude'])
parks['long'] = parks['location'].apply(lambda loc: loc['longitude'])
    

    
parks = parks[['park','park_number','location','lat','long']]
parks = parks.drop_duplicates(subset = ['park_number'])

parks

Unnamed: 0,park,park_number,location,lat,long
0,ABBOTT (ROBERT),259,"{'latitude': '41.72096', 'longitude': '-87.621...",41.72096,-87.621351
10,ADA (SAWYER GARRETT),45,"{'latitude': '41.687785', 'longitude': '-87.65...",41.687785,-87.655389
26,ADAMS (GEORGE & ADELE),1019,"{'latitude': '41.91689', 'longitude': '-87.655...",41.91689,-87.655092
28,AIELLO (JOHN),1280,"{'latitude': '41.919151', 'longitude': '-87.77...",41.919151,-87.776356
29,ALGONQUIN,1161,"{'latitude': '41.935202', 'longitude': '-87.69...",41.935202,-87.694918
...,...,...,...,...,...
3997,ROWAN (WILLIAM),248,"{'latitude': '41.686061', 'longitude': '-87.53...",41.686061,-87.538167
4022,RUTHERFORD SAYRE,127,"{'latitude': '41.920557', 'longitude': '-87.79...",41.920557,-87.795929
4051,SCHAEFER (EDWARD),1148,"{'latitude': '41.925746', 'longitude': '-87.66...",41.925746,-87.669035
4060,SENECA,1242,"{'latitude': '41.897006', 'longitude': '-87.62...",41.897006,-87.622414


In [27]:
#Creating a cross join to pair every census tract with every park in Chicago

#Work on copies so the helper 'merge' column is not added to the original dataframes
gdf_acs_temp = gdf_acs.copy()
gdf_acs_temp['merge'] = 1

parks_temp = parks.copy()
parks_temp['merge'] = 1

df = pd.merge(gdf_acs_temp,parks_temp, on='merge')


#Applying this function to the df
df['dist'] = df.apply(lambda row: distance(row['geometry'],row['lat'],row['long']), axis = 1)

In [28]:
#Aggregate close by parks (within 1 mile, i.e. about 1.6 km, since dist is in km) at a census tract level
df['within_distance'] = np.where(df['dist']<=1.6,1,0)

parks_acs = df.groupby(['geo_id'],as_index = False)['within_distance'].sum()
parks_acs.rename(columns = {"within_distance":"parks"}, inplace = True)
parks_acs

Unnamed: 0,geo_id,parks
0,1400000US17031010100,14
1,1400000US17031010201,18
2,1400000US17031010202,18
3,1400000US17031010300,17
4,1400000US17031010400,12
...,...,...
793,1400000US17031843500,4
794,1400000US17031843600,20
795,1400000US17031843700,13
796,1400000US17031843800,11


# Commercial establishments

In [29]:
# Unauthenticated client only works with public data sets. Note 'None'
# in place of application token, and no username or password:
client = Socrata("data.cityofchicago.org", None)

# Example authenticated client (needed for non-public datasets):
# client = Socrata("data.cityofchicago.org",
#                  MyAppToken,
#                  username="user@example.com",
#                  password="AFakePassword")

# Up to 60000 results, returned as JSON from API / converted to Python list of
# dictionaries by sodapy.
results = client.get("uupf-x98q", limit=60000)

# Convert to pandas DataFrame
comm_est = pd.DataFrame.from_records(results)



In [30]:
comm_est.dropna(subset = ['location'], inplace = True)

#Extract latitude and longitude from the nested 'location' dictionary
#(column-wise apply is much faster than looping over rows with .iloc)
comm_est['lat'] = comm_est['location'].apply(lambda loc: loc['latitude'])
comm_est['long'] = comm_est['location'].apply(lambda loc: loc['longitude'])
    

    
comm_est = comm_est[['zip_code','license_description','license_id','police_district','location','lat','long']]
comm_est

Unnamed: 0,zip_code,license_description,license_id,police_district,location,lat,long
0,60617,Limited Business License,2785414,4,"{'latitude': '41.720142597897045', 'human_addr...",41.720142597897045,-87.53514092731464
1,60647,Tobacco,2810282,14,"{'latitude': '41.91728227369935', 'human_addre...",41.91728227369935,-87.70777592011419
2,60622,Limited Business License,2750416,14,"{'latitude': '41.90308236038782', 'human_addre...",41.90308236038782,-87.69155216014227
3,60657,Tobacco,2812802,,"{'latitude': '41.93937480738', 'human_address'...",41.93937480738,-87.64452486899746
4,60652,Tobacco,2810267,8,"{'latitude': '41.74247488279688', 'human_addre...",41.74247488279688,-87.70205128738314
...,...,...,...,...,...,...,...
55242,60603,Valet Parking Operator,2786789,1,"{'latitude': '41.87929007329572', 'human_addre...",41.87929007329572,-87.63236272529976
55243,60601,Valet Parking Operator,2791661,1,"{'latitude': '41.88637481521078', 'human_addre...",41.88637481521078,-87.6246754356595
55244,60608,Commercial Garage,2797818,12,"{'latitude': '41.85598353104698', 'human_addre...",41.85598353104698,-87.67314851164002
55245,60603,Valet Parking Operator,2802760,1,"{'latitude': '41.88066457839605', 'human_addre...",41.88066457839605,-87.6270893644417


In [31]:
#Attach geo ids
gdf_comm_est = gpd.GeoDataFrame(
    comm_est, geometry=gpd.points_from_xy(comm_est.long, comm_est.lat), 
    crs = 'epsg:4326')

comm_est_acs = gpd.sjoin(gdf_comm_est, gdf_acs[['geo_id','geometry']], how='left' )

comm_est_acs

Unnamed: 0,zip_code,license_description,license_id,police_district,location,lat,long,geometry,index_right,geo_id
0,60617,Limited Business License,2785414,4,"{'latitude': '41.720142597897045', 'human_addr...",41.720142597897045,-87.53514092731464,POINT (-87.53514 41.72014),499.0,1400000US17031520100
1,60647,Tobacco,2810282,14,"{'latitude': '41.91728227369935', 'human_addre...",41.91728227369935,-87.70777592011419,POINT (-87.70778 41.91728),266.0,1400000US17031222700
2,60622,Limited Business License,2750416,14,"{'latitude': '41.90308236038782', 'human_addre...",41.90308236038782,-87.69155216014227,POINT (-87.69155 41.90308),289.0,1400000US17031241100
3,60657,Tobacco,2812802,,"{'latitude': '41.93937480738', 'human_address'...",41.93937480738,-87.64452486899746,POINT (-87.64452 41.93937),101.0,1400000US17031063100
4,60652,Tobacco,2810267,8,"{'latitude': '41.74247488279688', 'human_addre...",41.74247488279688,-87.70205128738314,POINT (-87.70205 41.74247),638.0,1400000US17031700501
...,...,...,...,...,...,...,...,...,...,...
55242,60603,Valet Parking Operator,2786789,1,"{'latitude': '41.87929007329572', 'human_addre...",41.87929007329572,-87.63236272529976,POINT (-87.63236 41.87929),755.0,1400000US17031839100
55243,60601,Valet Parking Operator,2791661,1,"{'latitude': '41.88637481521078', 'human_addre...",41.88637481521078,-87.6246754356595,POINT (-87.62468 41.88637),386.0,1400000US17031320100
55244,60608,Commercial Garage,2797818,12,"{'latitude': '41.85598353104698', 'human_addre...",41.85598353104698,-87.67314851164002,POINT (-87.67315 41.85598),385.0,1400000US17031310900
55245,60603,Valet Parking Operator,2802760,1,"{'latitude': '41.88066457839605', 'human_addre...",41.88066457839605,-87.6270893644417,POINT (-87.62709 41.88066),387.0,1400000US17031320400
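
The spatial join above assigns each point the geo_id of the tract polygon that contains it. To illustrate the underlying idea, the sketch below runs a plain-Python ray-casting point-in-polygon test on two made-up rectangular "tracts". This is a stand-in for illustration only: it is not how geopandas implements `sjoin`, and the tract names and coordinates are hypothetical.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test: is (x, y) inside the polygon whose vertices are in ring?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does a horizontal ray from (x, y) towards +x cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

# Hypothetical rectangular tracts, vertices given as (long, lat).
toy_tracts = {
    "tract_A": [(-87.70, 41.90), (-87.68, 41.90), (-87.68, 41.92), (-87.70, 41.92)],
    "tract_B": [(-87.68, 41.90), (-87.66, 41.90), (-87.66, 41.92), (-87.68, 41.92)],
}

def assign_tract(long, lat):
    """Return the id of the first tract containing the point, or None."""
    for geo_id, ring in toy_tracts.items():
        if point_in_polygon(long, lat, ring):
            return geo_id
    return None  # plays the role of a missing geo_id after a left join

print(assign_tract(-87.69, 41.91))
print(assign_tract(-85.68, 30.18))
```

A point that falls in no tract gets no geo_id, which is why a left join can leave some rows unmatched.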


In [32]:
#Aggregate information at a census tract level
comm_est_acs['count'] = 1
comm_est_acs = comm_est_acs.groupby(['geo_id'],as_index = False)['count'].sum()
comm_est_acs.rename(columns = {"count":"commercial_establishments"}, inplace = True)
comm_est_acs

Unnamed: 0,geo_id,commercial_establishments
0,1400000US17031010100,24
1,1400000US17031010201,30
2,1400000US17031010202,58
3,1400000US17031010300,71
4,1400000US17031010400,41
...,...,...
788,1400000US17031843500,121
789,1400000US17031843600,42
790,1400000US17031843700,117
791,1400000US17031843800,32


# Police Killings

In [33]:
police_deaths = pd.read_csv(here("./data/raw/police_killings_MPV.csv"))

df_mask = police_deaths['City'] == "Chicago"
police_deaths = police_deaths[df_mask]

police_deaths = police_deaths[['Street Address of Incident','City','Zipcode','Agency responsible for death','Cause of death','MPV ID']]

police_deaths

Unnamed: 0,Street Address of Incident,City,Zipcode,Agency responsible for death,Cause of death,MPV ID
1,4900 South Lavergne Avenue,Chicago,60638.0,Chicago Police Department,Gunshot,8446
14,1300 block West 19th Street,Chicago,60608.0,Chicago Police Department,Gunshot,8438
663,3600 North Ashland Avenue,Chicago,60613.0,Chicago Police Department,Gunshot,7780
769,2660 East 79th Street,Chicago,60649.0,Chicago Police Department,"Gunshot, Taser",7676
772,2100 North McVicker Ave,Chicago,60639.0,Chicago Police Department,Gunshot,7670
...,...,...,...,...,...,...
7907,W 18th St & S Springfield Ave,Chicago,60623.0,Chicago Police Department,Gunshot,518
8061,1300 South Independence Boulevard,Chicago,60623.0,Chicago Police Department,Gunshot,370
8212,3300 West Wilson Avenue,Chicago,60625.0,Chicago Police Department,Gunshot,217
8221,200 North Homan Avenue,Chicago,60624.0,Chicago Police Department,Gunshot,208


In [34]:
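#Getting geo-coordinates for the street addresses
#The API key is read from the environment instead of being hardcoded in the notebook
#(the variable name GOOGLE_API_KEY is an assumed convention; set it before running)
import os
google_apikey = os.environ.get('GOOGLE_API_KEY')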

#Defining the function that gets the lat long associated with a street address using google API

def getAddressCoords(input_address, api_key = google_apikey):
    params = {'key' : api_key,
              'address' : input_address}
    url = 'https://maps.googleapis.com/maps/api/geocode/json'
    response = requests.get(url, params=params)
    result = response.json()
    
    # The API returns status 'OK' only when at least one result is available;
    # any other status (ZERO_RESULTS, INVALID_REQUEST, REQUEST_DENIED, ...) is treated as a failure
    if result['status'] == 'OK':
                
        location = result['results'][0]['geometry']['location']

        return {"lat":location['lat'], "long":location['lng']}
    
    # Flagging if there was an error
    else:
        return "Invalid address"
    
#Applying the function to each street address in the police_deaths dataframe
police_deaths['place_coords'] = police_deaths['Street Address of Incident'].apply(getAddressCoords)

police_deaths

Unnamed: 0,Street Address of Incident,City,Zipcode,Agency responsible for death,Cause of death,MPV ID,place_coords
1,4900 South Lavergne Avenue,Chicago,60638.0,Chicago Police Department,Gunshot,8446,Invalid address
14,1300 block West 19th Street,Chicago,60608.0,Chicago Police Department,Gunshot,8438,"{'lat': 30.1829279, 'long': -85.68045610000001}"
663,3600 North Ashland Avenue,Chicago,60613.0,Chicago Police Department,Gunshot,7780,"{'lat': 41.9473431, 'long': -87.6693382}"
769,2660 East 79th Street,Chicago,60649.0,Chicago Police Department,"Gunshot, Taser",7676,"{'lat': 41.7521292, 'long': -87.5591115}"
772,2100 North McVicker Ave,Chicago,60639.0,Chicago Police Department,Gunshot,7670,"{'lat': 41.9180154, 'long': -87.77708539999999}"
...,...,...,...,...,...,...,...
7907,W 18th St & S Springfield Ave,Chicago,60623.0,Chicago Police Department,Gunshot,518,"{'lat': 41.8570763, 'long': -87.7223995}"
8061,1300 South Independence Boulevard,Chicago,60623.0,Chicago Police Department,Gunshot,370,"{'lat': 41.8642567, 'long': -87.7204673}"
8212,3300 West Wilson Avenue,Chicago,60625.0,Chicago Police Department,Gunshot,217,"{'lat': 41.96487219999999, 'long': -87.7110163..."
8221,200 North Homan Avenue,Chicago,60624.0,Chicago Police Department,Gunshot,208,"{'lat': 41.8847163, 'long': -87.71110279999999}"
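
In the output above, "1300 block West 19th Street" resolves to latitude 30.18, far from Chicago. Only the street address is sent to the geocoder, so a street name that exists in several cities can match elsewhere. The sketch below shows one possible way to guard against this: append the city to the query and flag coordinates outside a rough bounding box. The state code "IL", the box limits, and both helper names are assumptions for illustration and not part of the original notebook.

```python
def build_geocode_query(street, city="Chicago", state="IL"):
    """Combine the street address with city and state so the query is unambiguous."""
    return f"{street}, {city}, {state}"

# Rough bounding box around Chicago (assumed limits, deliberately generous).
CHICAGO_BBOX = {"lat_min": 41.6, "lat_max": 42.1, "long_min": -88.0, "long_max": -87.5}

def in_chicago_bbox(lat, long, bbox=CHICAGO_BBOX):
    """Return True when the coordinates fall inside the bounding box."""
    return (bbox["lat_min"] <= lat <= bbox["lat_max"]
            and bbox["long_min"] <= long <= bbox["long_max"])

print(build_geocode_query("1300 block West 19th Street"))
print(in_chicago_bbox(30.1829279, -85.68045610000001))  # coordinates from row 14 above
print(in_chicago_bbox(41.9473431, -87.6693382))         # coordinates from row 663 above
```

Rows that fail the bounding-box check could be re-queried with the fuller address or reviewed by hand.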


In [35]:
#Creating a lat long column

police_deaths = police_deaths[police_deaths['place_coords'] != 'Invalid address']

police_deaths['lat'] = police_deaths['place_coords'].apply(lambda x: x.get('lat'))
police_deaths['long'] = police_deaths['place_coords'].apply(lambda x: x.get('long'))

police_deaths

Unnamed: 0,Street Address of Incident,City,Zipcode,Agency responsible for death,Cause of death,MPV ID,place_coords,lat,long
14,1300 block West 19th Street,Chicago,60608.0,Chicago Police Department,Gunshot,8438,"{'lat': 30.1829279, 'long': -85.68045610000001}",30.182928,-85.680456
663,3600 North Ashland Avenue,Chicago,60613.0,Chicago Police Department,Gunshot,7780,"{'lat': 41.9473431, 'long': -87.6693382}",41.947343,-87.669338
769,2660 East 79th Street,Chicago,60649.0,Chicago Police Department,"Gunshot, Taser",7676,"{'lat': 41.7521292, 'long': -87.5591115}",41.752129,-87.559112
772,2100 North McVicker Ave,Chicago,60639.0,Chicago Police Department,Gunshot,7670,"{'lat': 41.9180154, 'long': -87.77708539999999}",41.918015,-87.777085
916,4318 W Irving Park Rd,Chicago,60641.0,Des Plaines Police Department,Gunshot,7526,"{'lat': 41.9537193, 'long': -87.7362994}",41.953719,-87.736299
...,...,...,...,...,...,...,...,...,...
7907,W 18th St & S Springfield Ave,Chicago,60623.0,Chicago Police Department,Gunshot,518,"{'lat': 41.8570763, 'long': -87.7223995}",41.857076,-87.722399
8061,1300 South Independence Boulevard,Chicago,60623.0,Chicago Police Department,Gunshot,370,"{'lat': 41.8642567, 'long': -87.7204673}",41.864257,-87.720467
8212,3300 West Wilson Avenue,Chicago,60625.0,Chicago Police Department,Gunshot,217,"{'lat': 41.96487219999999, 'long': -87.7110163...",41.964872,-87.711016
8221,200 North Homan Avenue,Chicago,60624.0,Chicago Police Department,Gunshot,208,"{'lat': 41.8847163, 'long': -87.71110279999999}",41.884716,-87.711103


In [36]:
#Attach geo ids
gdf_police_deaths = gpd.GeoDataFrame(
    police_deaths, geometry=gpd.points_from_xy(police_deaths.long, police_deaths.lat), 
    crs = 'epsg:4326')

police_deaths_acs = gpd.sjoin(gdf_police_deaths, gdf_acs[['geo_id','geometry']], how='left' )

police_deaths_acs

Unnamed: 0,Street Address of Incident,City,Zipcode,Agency responsible for death,Cause of death,MPV ID,place_coords,lat,long,geometry,index_right,geo_id
14,1300 block West 19th Street,Chicago,60608.0,Chicago Police Department,Gunshot,8438,"{'lat': 30.1829279, 'long': -85.68045610000001}",30.182928,-85.680456,POINT (-85.68046 30.18293),,
663,3600 North Ashland Avenue,Chicago,60613.0,Chicago Police Department,Gunshot,7780,"{'lat': 41.9473431, 'long': -87.6693382}",41.947343,-87.669338,POINT (-87.66934 41.94734),78.0,1400000US17031060300
769,2660 East 79th Street,Chicago,60649.0,Chicago Police Department,"Gunshot, Taser",7676,"{'lat': 41.7521292, 'long': -87.5591115}",41.752129,-87.559112,POINT (-87.55911 41.75213),452.0,1400000US17031431301
772,2100 North McVicker Ave,Chicago,60639.0,Chicago Police Department,Gunshot,7670,"{'lat': 41.9180154, 'long': -87.77708539999999}",41.918015,-87.777085,POINT (-87.77709 41.91802),231.0,1400000US17031191301
916,4318 W Irving Park Rd,Chicago,60641.0,Des Plaines Police Department,Gunshot,7526,"{'lat': 41.9537193, 'long': -87.7362994}",41.953719,-87.736299,POINT (-87.73630 41.95372),191.0,1400000US17031160200
...,...,...,...,...,...,...,...,...,...,...,...,...
7907,W 18th St & S Springfield Ave,Chicago,60623.0,Chicago Police Department,Gunshot,518,"{'lat': 41.8570763, 'long': -87.7223995}",41.857076,-87.722399,POINT (-87.72240 41.85708),363.0,1400000US17031292400
8061,1300 South Independence Boulevard,Chicago,60623.0,Chicago Police Department,Gunshot,370,"{'lat': 41.8642567, 'long': -87.7204673}",41.864257,-87.720467,POINT (-87.72047 41.86426),752.0,1400000US17031838700
8212,3300 West Wilson Avenue,Chicago,60625.0,Chicago Police Department,Gunshot,217,"{'lat': 41.96487219999999, 'long': -87.7110163...",41.964872,-87.711016,POINT (-87.71102 41.96487),175.0,1400000US17031140702
8221,200 North Homan Avenue,Chicago,60624.0,Chicago Police Department,Gunshot,208,"{'lat': 41.8847163, 'long': -87.71110279999999}",41.884716,-87.711103,POINT (-87.71110 41.88472),740.0,1400000US17031836800


In [37]:
#Aggregate information at a census tract level
police_deaths_acs['count'] = 1
police_deaths_acs = police_deaths_acs.groupby(['geo_id'],as_index = False)['count'].sum()
police_deaths_acs.rename(columns = {"count":"number_of_police_killings"}, inplace = True)
police_deaths_acs

Unnamed: 0,geo_id,number_of_police_killings
0,1400000US17031020801,1
1,1400000US17031060300,1
2,1400000US17031071500,1
3,1400000US17031110400,1
4,1400000US17031140702,1
...,...,...
63,1400000US17031838700,2
64,1400000US17031839800,1
65,1400000US17031843000,2
66,1400000US17031843400,1
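
The aggregated table above has 67 rows, while the ACS dataframe has 797 tracts, so most tracts have no entry at all. When the predictors are combined, a left merge onto the full tract list followed by a zero fill keeps those tracts with a count of 0. The sketch below shows this on made-up data. It is one possible approach and not necessarily how the final combination is done in this project.

```python
import pandas as pd

# Toy inputs; the tract identifiers are hypothetical.
all_tracts = pd.DataFrame({"geo_id": ["tract_A", "tract_B", "tract_C"]})
killings = pd.DataFrame({"geo_id": ["tract_B"], "number_of_police_killings": [2]})

# Left merge keeps every tract; tracts without a match get NaN.
combined = all_tracts.merge(killings, on="geo_id", how="left")

# Replace the NaN values with 0 and restore the integer type.
combined["number_of_police_killings"] = (
    combined["number_of_police_killings"].fillna(0).astype(int)
)
print(combined)
```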


# Reported crimes (2015-2019)

### 2015

In [49]:
crimes_2015 = pd.read_csv(here("./data/raw/Crimes_2015.csv"))
crimes_2015


Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,10224738,HY411648,09/05/2015 01:30:00 PM,043XX S WOOD ST,0486,BATTERY,DOMESTIC BATTERY SIMPLE,RESIDENCE,False,True,...,12.0,61,08B,1165074.0,1875917.0,2015,02/10/2018 03:50:01 PM,41.815117,-87.670000,"(41.815117282, -87.669999562)"
1,10224739,HY411615,09/04/2015 11:30:00 AM,008XX N CENTRAL AVE,0870,THEFT,POCKET-PICKING,CTA BUS,False,False,...,29.0,25,06,1138875.0,1904869.0,2015,02/10/2018 03:50:01 PM,41.895080,-87.765400,"(41.895080471, -87.765400451)"
2,10224740,HY411595,09/05/2015 12:45:00 PM,035XX W BARRY AVE,2023,NARCOTICS,POSS: HEROIN(BRN/TAN),SIDEWALK,True,False,...,35.0,21,18,1152037.0,1920384.0,2015,02/10/2018 03:50:01 PM,41.937406,-87.716650,"(41.937405765, -87.716649687)"
3,10224741,HY411610,09/05/2015 01:00:00 PM,0000X N LARAMIE AVE,0560,ASSAULT,SIMPLE,APARTMENT,False,True,...,28.0,25,08A,1141706.0,1900086.0,2015,02/10/2018 03:50:01 PM,41.881903,-87.755121,"(41.881903443, -87.755121152)"
4,10224742,HY411435,09/05/2015 10:55:00 AM,082XX S LOOMIS BLVD,0610,BURGLARY,FORCIBLE ENTRY,RESIDENCE,False,False,...,21.0,71,05,1168430.0,1850165.0,2015,02/10/2018 03:50:01 PM,41.744379,-87.658431,"(41.744378879, -87.658430635)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
264604,10317357,HY506310,11/18/2015 02:56:00 PM,057XX W AUGUSTA BLVD,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,29.0,25,04B,1137754.0,1906086.0,2015,11/26/2021 03:45:45 PM,41.898440,-87.769488,"(41.898440359, -87.769488281)"
264605,10349425,HY540230,12/09/2015 09:15:00 AM,014XX W PERSHING RD,1537,OFFENSE INVOLVING CHILDREN,POSSESSION OF PORNOGRAPHIC PRINT,GOVERNMENT BUILDING / PROPERTY,False,False,...,11.0,59,26,1167297.0,1878878.0,2015,12/01/2021 03:49:12 PM,41.823195,-87.661760,"(41.823195202, -87.661760367)"
264606,12477732,JE367243,10/10/2015 03:49:00 PM,104XX S AVENUE E,1752,OFFENSE INVOLVING CHILDREN,AGGRAVATED CRIMINAL SEXUAL ABUSE BY FAMILY MEMBER,RESIDENCE,False,True,...,10.0,52,17,,,2015,11/27/2021 03:49:46 PM,,,
264607,21879,HY271945,05/23/2015 01:15:00 PM,000XX N LARAMIE AVE,0110,HOMICIDE,FIRST DEGREE MURDER,PARKING LOT,True,False,...,28.0,25,01A,1141718.0,1899725.0,2015,12/01/2021 03:49:12 PM,41.880913,-87.755086,"(41.880912591, -87.75508602)"


In [50]:
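# Shorten the column names used in the later steps
crimes_2015 = crimes_2015.rename(columns = {"Latitude":"lat","Longitude":"long","Primary Type":"primary_type"})

# Drop incomplete records. Note: dropna() removes a row if ANY column is missing, not only lat/long
crimes_2015 = crimes_2015.dropna()
crimes_2015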

Unnamed: 0,ID,Case Number,Date,Block,IUCR,primary_type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,lat,long,Location
0,10224738,HY411648,09/05/2015 01:30:00 PM,043XX S WOOD ST,0486,BATTERY,DOMESTIC BATTERY SIMPLE,RESIDENCE,False,True,...,12.0,61,08B,1165074.0,1875917.0,2015,02/10/2018 03:50:01 PM,41.815117,-87.670000,"(41.815117282, -87.669999562)"
1,10224739,HY411615,09/04/2015 11:30:00 AM,008XX N CENTRAL AVE,0870,THEFT,POCKET-PICKING,CTA BUS,False,False,...,29.0,25,06,1138875.0,1904869.0,2015,02/10/2018 03:50:01 PM,41.895080,-87.765400,"(41.895080471, -87.765400451)"
2,10224740,HY411595,09/05/2015 12:45:00 PM,035XX W BARRY AVE,2023,NARCOTICS,POSS: HEROIN(BRN/TAN),SIDEWALK,True,False,...,35.0,21,18,1152037.0,1920384.0,2015,02/10/2018 03:50:01 PM,41.937406,-87.716650,"(41.937405765, -87.716649687)"
3,10224741,HY411610,09/05/2015 01:00:00 PM,0000X N LARAMIE AVE,0560,ASSAULT,SIMPLE,APARTMENT,False,True,...,28.0,25,08A,1141706.0,1900086.0,2015,02/10/2018 03:50:01 PM,41.881903,-87.755121,"(41.881903443, -87.755121152)"
4,10224742,HY411435,09/05/2015 10:55:00 AM,082XX S LOOMIS BLVD,0610,BURGLARY,FORCIBLE ENTRY,RESIDENCE,False,False,...,21.0,71,05,1168430.0,1850165.0,2015,02/10/2018 03:50:01 PM,41.744379,-87.658431,"(41.744378879, -87.658430635)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
264603,10206921,HY394105,08/23/2015 03:20:00 AM,081XX S WHIPPLE ST,0281,CRIMINAL SEXUAL ASSAULT,NON-AGGRAVATED,RESIDENCE,False,False,...,18.0,70,02,1157458.0,1850583.0,2015,11/30/2021 03:43:46 PM,41.745755,-87.698622,"(41.745755051, -87.698622295)"
264604,10317357,HY506310,11/18/2015 02:56:00 PM,057XX W AUGUSTA BLVD,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,29.0,25,04B,1137754.0,1906086.0,2015,11/26/2021 03:45:45 PM,41.898440,-87.769488,"(41.898440359, -87.769488281)"
264605,10349425,HY540230,12/09/2015 09:15:00 AM,014XX W PERSHING RD,1537,OFFENSE INVOLVING CHILDREN,POSSESSION OF PORNOGRAPHIC PRINT,GOVERNMENT BUILDING / PROPERTY,False,False,...,11.0,59,26,1167297.0,1878878.0,2015,12/01/2021 03:49:12 PM,41.823195,-87.661760,"(41.823195202, -87.661760367)"
264607,21879,HY271945,05/23/2015 01:15:00 PM,000XX N LARAMIE AVE,0110,HOMICIDE,FIRST DEGREE MURDER,PARKING LOT,True,False,...,28.0,25,01A,1141718.0,1899725.0,2015,12/01/2021 03:49:12 PM,41.880913,-87.755086,"(41.880912591, -87.75508602)"
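
Called without arguments, `dropna()` removes a row when any column is missing, not only the coordinates. If the intent is to keep every record that can be placed on the map, restricting the check to the coordinate columns is one option. The sketch below uses made-up rows to show the difference. The outputs in this notebook were produced with the plain `dropna()`, so switching would change the counts.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: row 1 lacks a location description, row 2 lacks coordinates.
toy = pd.DataFrame({
    "primary_type": ["THEFT", "BATTERY", "ARSON"],
    "Location Description": ["STREET", np.nan, "ALLEY"],
    "lat": [41.88, 41.90, np.nan],
    "long": [-87.63, -87.70, np.nan],
})

dropped_any = toy.dropna()                            # removes rows 1 and 2
dropped_coords = toy.dropna(subset=["lat", "long"])   # removes only row 2

print(len(dropped_any), len(dropped_coords))
```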


In [51]:
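# Attach geo ids: build point geometries from long/lat (EPSG:4326), then join each crime to a census tract
gdf_crimes_2015 = gpd.GeoDataFrame(
    crimes_2015, geometry=gpd.points_from_xy(crimes_2015.long, crimes_2015.lat),
    crs = 'epsg:4326')

# Left join keeps every crime; points that match no tract polygon get a missing geo_id
crimes_2015_acs = gpd.sjoin(gdf_crimes_2015, gdf_acs[['geo_id','geometry']], how='left' )

crimes_2015_acs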

Unnamed: 0,ID,Case Number,Date,Block,IUCR,primary_type,Description,Location Description,Arrest,Domestic,...,X Coordinate,Y Coordinate,Year,Updated On,lat,long,Location,geometry,index_right,geo_id
0,10224738,HY411648,09/05/2015 01:30:00 PM,043XX S WOOD ST,0486,BATTERY,DOMESTIC BATTERY SIMPLE,RESIDENCE,False,True,...,1165074.0,1875917.0,2015,02/10/2018 03:50:01 PM,41.815117,-87.670000,"(41.815117282, -87.669999562)",POINT (-87.67000 41.81512),547.0,1400000US17031610300
1,10224739,HY411615,09/04/2015 11:30:00 AM,008XX N CENTRAL AVE,0870,THEFT,POCKET-PICKING,CTA BUS,False,False,...,1138875.0,1904869.0,2015,02/10/2018 03:50:01 PM,41.895080,-87.765400,"(41.895080471, -87.765400451)",POINT (-87.76540 41.89508),320.0,1400000US17031251200
2,10224740,HY411595,09/05/2015 12:45:00 PM,035XX W BARRY AVE,2023,NARCOTICS,POSS: HEROIN(BRN/TAN),SIDEWALK,True,False,...,1152037.0,1920384.0,2015,02/10/2018 03:50:01 PM,41.937406,-87.716650,"(41.937405765, -87.716649687)",POINT (-87.71665 41.93741),242.0,1400000US17031210601
3,10224741,HY411610,09/05/2015 01:00:00 PM,0000X N LARAMIE AVE,0560,ASSAULT,SIMPLE,APARTMENT,False,True,...,1141706.0,1900086.0,2015,02/10/2018 03:50:01 PM,41.881903,-87.755121,"(41.881903443, -87.755121152)",POINT (-87.75512 41.88190),326.0,1400000US17031251800
4,10224742,HY411435,09/05/2015 10:55:00 AM,082XX S LOOMIS BLVD,0610,BURGLARY,FORCIBLE ENTRY,RESIDENCE,False,False,...,1168430.0,1850165.0,2015,02/10/2018 03:50:01 PM,41.744379,-87.658431,"(41.744378879, -87.658430635)",POINT (-87.65843 41.74438),646.0,1400000US17031710700
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
264603,10206921,HY394105,08/23/2015 03:20:00 AM,081XX S WHIPPLE ST,0281,CRIMINAL SEXUAL ASSAULT,NON-AGGRAVATED,RESIDENCE,False,False,...,1157458.0,1850583.0,2015,11/30/2021 03:43:46 PM,41.745755,-87.698622,"(41.745755051, -87.698622295)",POINT (-87.69862 41.74576),638.0,1400000US17031700501
264604,10317357,HY506310,11/18/2015 02:56:00 PM,057XX W AUGUSTA BLVD,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,1137754.0,1906086.0,2015,11/26/2021 03:45:45 PM,41.898440,-87.769488,"(41.898440359, -87.769488281)",POINT (-87.76949 41.89844),321.0,1400000US17031251300
264605,10349425,HY540230,12/09/2015 09:15:00 AM,014XX W PERSHING RD,1537,OFFENSE INVOLVING CHILDREN,POSSESSION OF PORNOGRAPHIC PRINT,GOVERNMENT BUILDING / PROPERTY,False,False,...,1167297.0,1878878.0,2015,12/01/2021 03:49:12 PM,41.823195,-87.661760,"(41.823195202, -87.661760367)",POINT (-87.66176 41.82320),785.0,1400000US17031842600
264607,21879,HY271945,05/23/2015 01:15:00 PM,000XX N LARAMIE AVE,0110,HOMICIDE,FIRST DEGREE MURDER,PARKING LOT,True,False,...,1141718.0,1899725.0,2015,12/01/2021 03:49:12 PM,41.880913,-87.755086,"(41.880912591, -87.75508602)",POINT (-87.75509 41.88091),326.0,1400000US17031251800
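
To make the behaviour of the left spatial join concrete, here is a small sketch with two made-up square "tracts" and three points, one of which lies outside both squares. All names and coordinates are hypothetical; only `gpd.sjoin` with `how="left"` mirrors the cell above.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Two hypothetical, non-touching square tracts.
tracts = gpd.GeoDataFrame(
    {"geo_id": ["T1", "T2"]},
    geometry=[box(0, 0, 1, 1), box(2, 0, 3, 1)],
    crs="EPSG:4326",
)

# Three hypothetical incidents; the last one falls outside both tracts.
points = gpd.GeoDataFrame(
    {"primary_type": ["THEFT", "BATTERY", "ARSON"]},
    geometry=[Point(0.5, 0.5), Point(2.5, 0.5), Point(5, 5)],
    crs="EPSG:4326",
)

# how="left" keeps every point; unmatched points get a missing geo_id.
joined = gpd.sjoin(points, tracts[["geo_id", "geometry"]], how="left")
print(joined[["primary_type", "geo_id"]])

# groupby skips missing keys by default, so unmatched points drop out of the counts.
counts = joined.groupby("geo_id").size()
print(counts)
```

In other words, crimes located outside every tract polygon survive the join but are silently excluded once the data is grouped by `geo_id`.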


In [52]:
# Aggregate to the census tract level: count crimes per tract and crime type
crimes_2015_acs['count'] = 1
crimes_2015_acs = crimes_2015_acs.groupby(['geo_id','primary_type'],as_index = False)['count'].sum()

# Reshape to one row per tract and one column per crime type; combinations with no crimes become 0
table_2015 = pd.pivot_table(crimes_2015_acs, values = 'count', index = ['geo_id'], columns = ['primary_type'], aggfunc = 'sum')
table_2015 = table_2015.fillna(0)
table_2015

geo_id,ARSON,ASSAULT,BATTERY,BURGLARY,CONCEALED CARRY LICENSE VIOLATION,CRIM SEXUAL ASSAULT,CRIMINAL DAMAGE,CRIMINAL SEXUAL ASSAULT,CRIMINAL TRESPASS,DECEPTIVE PRACTICE,...,OTHER NARCOTIC VIOLATION,OTHER OFFENSE,PROSTITUTION,PUBLIC INDECENCY,PUBLIC PEACE VIOLATION,ROBBERY,SEX OFFENSE,STALKING,THEFT,WEAPONS VIOLATION
1400000US17031010100,0.0,37.0,138.0,28.0,0.0,2.0,87.0,0.0,25.0,25.0,...,0.0,45.0,0.0,0.0,6.0,11.0,1.0,2.0,75.0,7.0
1400000US17031010201,1.0,43.0,98.0,13.0,0.0,1.0,57.0,0.0,6.0,16.0,...,0.0,27.0,0.0,0.0,2.0,8.0,2.0,0.0,52.0,3.0
1400000US17031010202,0.0,26.0,70.0,15.0,0.0,4.0,34.0,0.0,27.0,16.0,...,0.0,18.0,0.0,0.0,1.0,11.0,2.0,0.0,124.0,2.0
1400000US17031010300,2.0,22.0,69.0,25.0,0.0,5.0,38.0,0.0,11.0,16.0,...,0.0,27.0,0.0,0.0,0.0,12.0,4.0,0.0,86.0,1.0
1400000US17031010400,0.0,11.0,47.0,15.0,0.0,3.0,26.0,0.0,10.0,13.0,...,0.0,21.0,0.0,0.0,2.0,6.0,3.0,0.0,60.0,2.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1400000US17031843500,1.0,22.0,58.0,8.0,0.0,0.0,34.0,0.0,1.0,7.0,...,0.0,32.0,0.0,0.0,3.0,6.0,1.0,0.0,108.0,4.0
1400000US17031843600,0.0,29.0,68.0,15.0,0.0,1.0,26.0,0.0,8.0,21.0,...,0.0,22.0,0.0,0.0,0.0,16.0,1.0,0.0,63.0,1.0
1400000US17031843700,0.0,17.0,30.0,7.0,0.0,0.0,17.0,0.0,9.0,20.0,...,0.0,11.0,0.0,0.0,0.0,7.0,2.0,0.0,88.0,1.0
1400000US17031843800,1.0,30.0,86.0,13.0,0.0,0.0,52.0,0.0,6.0,16.0,...,0.0,30.0,1.0,0.0,6.0,17.0,0.0,1.0,50.0,12.0
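
The pivot table has separate columns for `CRIM SEXUAL ASSAULT` and `CRIMINAL SEXUAL ASSAULT`. If these are two spellings of one category (an assumption that should be checked against the portal's documentation), merging the labels before aggregating avoids splitting the counts across two columns. A sketch on made-up rows; `label_map` is a hypothetical name:

```python
import pandas as pd

# Hypothetical incidents using both spellings of the label.
toy = pd.DataFrame({
    "geo_id": ["A", "A", "B"],
    "primary_type": ["CRIM SEXUAL ASSAULT", "CRIMINAL SEXUAL ASSAULT", "THEFT"],
})

# Assumed mapping: treat the abbreviated label as the same category.
label_map = {"CRIM SEXUAL ASSAULT": "CRIMINAL SEXUAL ASSAULT"}
toy["primary_type"] = toy["primary_type"].replace(label_map)

# crosstab counts incidents per tract and crime type, with 0 where there are none.
table = pd.crosstab(toy["geo_id"], toy["primary_type"])
print(table)
```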


### 2016

In [53]:
crimes_2016 = pd.read_csv(here("./data/raw/Crimes_2016.csv"))
crimes_2016

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,11645836,JC212333,05/01/2016 12:25:00 AM,055XX S ROCKWELL ST,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,,False,False,...,15,63,11,,,2016,04/06/2019 04:04:43 PM,,,
1,11043021,JA367631,10/19/2016 07:00:00 PM,075XX S YATES BLVD,0610,BURGLARY,FORCIBLE ENTRY,RESTAURANT,False,False,...,7,43,05,,,2016,08/05/2017 03:50:08 PM,,,
2,11243066,JB168427,03/29/2016 07:00:00 AM,067XX S RIDGELAND AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,OTHER,False,False,...,5,43,11,,,2016,03/01/2018 03:54:55 PM,,,
3,11243020,HZ184094,03/11/2016 11:00:00 PM,052XX N ST LOUIS AVE,0281,CRIM SEXUAL ASSAULT,NON-AGGRAVATED,RESIDENCE PORCH/HALLWAY,False,False,...,39,13,02,,,2016,03/01/2018 03:54:55 PM,,,
4,11227940,JB148122,01/01/2016 11:00:00 AM,108XX S CALUMET AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,9,49,11,,,2016,02/12/2018 03:49:14 PM,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
269638,12544819,JE449346,01/08/2016 12:00:00 AM,120XX S YALE AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,RESIDENCE,False,False,...,34,53,11,,,2016,11/19/2021 03:50:26 PM,,,
269639,12546334,JE451668,09/01/2016 12:00:00 PM,042XX N LONG AVE,0266,CRIMINAL SEXUAL ASSAULT,PREDATORY,PARK PROPERTY,False,False,...,38,15,02,,,2016,11/21/2021 03:48:03 PM,,,
269640,23034,HZ563253,12/23/2016 11:19:00 PM,005XX N LARAMIE AVE,0110,HOMICIDE,FIRST DEGREE MURDER,ALLEY,True,False,...,37,25,01A,1141599.0,1903227.0,2016,11/29/2021 03:47:47 PM,41.890525,-87.755436,"(41.890524711, -87.755436387)"
269641,23035,HZ563253,12/23/2016 11:44:00 PM,005XX N LARAMIE AVE,0110,HOMICIDE,FIRST DEGREE MURDER,ALLEY,True,False,...,37,25,01A,1141599.0,1903227.0,2016,11/29/2021 03:47:47 PM,41.890525,-87.755436,"(41.890524711, -87.755436387)"


In [54]:
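# Shorten the column names used in the later steps
crimes_2016 = crimes_2016.rename(columns = {"Latitude":"lat","Longitude":"long","Primary Type":"primary_type"})

# Drop incomplete records. Note: dropna() removes a row if ANY column is missing, not only lat/long
crimes_2016 = crimes_2016.dropna()
crimes_2016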

Unnamed: 0,ID,Case Number,Date,Block,IUCR,primary_type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,lat,long,Location
99,10508127,HZ249588,05/03/2016 09:45:00 AM,055XX W VAN BUREN ST,051A,ASSAULT,AGGRAVATED - HANDGUN,STREET,False,False,...,29,25,04A,1139747.0,1897469.0,2016,03/25/2020 03:45:43 PM,41.874758,-87.762379,"(41.874758047, -87.762378601)"
100,10527576,HZ270298,05/19/2016 12:00:00 AM,026XX W NORTH AVE,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,1,24,04B,1158611.0,1910540.0,2016,03/25/2020 03:45:43 PM,41.910261,-87.692759,"(41.910260802, -87.692759327)"
101,10531884,HZ275449,05/22/2016 09:25:00 PM,057XX W BLOOMINGDALE AVE,041A,BATTERY,AGGRAVATED - HANDGUN,PARK PROPERTY,False,False,...,29,25,04B,1137890.0,1911257.0,2016,03/25/2020 03:45:43 PM,41.912628,-87.768864,"(41.912627751, -87.768863775)"
102,10542566,HZ288250,06/01/2016 01:22:00 AM,013XX W LELAND AVE,041A,BATTERY,AGGRAVATED - HANDGUN,SIDEWALK,False,False,...,46,3,04B,1166470.0,1931330.0,2016,03/25/2020 03:45:43 PM,41.967145,-87.663292,"(41.967144912, -87.663291549)"
103,10555317,HZ301659,06/10/2016 03:00:00 PM,007XX S HOMAN AVE,051A,ASSAULT,AGGRAVATED - HANDGUN,STREET,False,False,...,24,27,04A,1153819.0,1896654.0,2016,03/25/2020 03:45:43 PM,41.872253,-87.710733,"(41.872253112, -87.710733482)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
269627,10640340,HZ392191,08/14/2016 08:46:00 PM,004XX E 111TH ST,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,9,49,04B,1181108.0,1831452.0,2016,10/21/2021 04:48:08 PM,41.692746,-87.612551,"(41.692745804, -87.61255129)"
269629,10550067,HZ296742,06/07/2016 04:28:00 AM,018XX N MARSHFIELD AVE,1020,ARSON,BY FIRE,PARKING LOT / GARAGE (NON RESIDENTIAL),False,False,...,32,22,09,1165055.0,1912362.0,2016,10/24/2021 04:44:33 PM,41.915126,-87.669035,"(41.915125989, -87.669034784)"
269635,10790343,HZ560307,12/21/2016 03:00:00 PM,069XX S KOMENSKY AVE,1120,DECEPTIVE PRACTICE,FORGERY,OTHER (SPECIFY),False,False,...,13,65,10,1150565.0,1858207.0,2016,11/10/2021 03:49:29 PM,41.766813,-87.723682,"(41.766813349, -87.723681935)"
269640,23034,HZ563253,12/23/2016 11:19:00 PM,005XX N LARAMIE AVE,0110,HOMICIDE,FIRST DEGREE MURDER,ALLEY,True,False,...,37,25,01A,1141599.0,1903227.0,2016,11/29/2021 03:47:47 PM,41.890525,-87.755436,"(41.890524711, -87.755436387)"


In [56]:
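# Attach geo ids: build point geometries from long/lat (EPSG:4326), then join each crime to a census tract
gdf_crimes_2016 = gpd.GeoDataFrame(
    crimes_2016, geometry=gpd.points_from_xy(crimes_2016.long, crimes_2016.lat),
    crs = 'epsg:4326')

# Left join keeps every crime; points that match no tract polygon get a missing geo_id
crimes_2016_acs = gpd.sjoin(gdf_crimes_2016, gdf_acs[['geo_id','geometry']], how='left' )

crimes_2016_acs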

Unnamed: 0,ID,Case Number,Date,Block,IUCR,primary_type,Description,Location Description,Arrest,Domestic,...,X Coordinate,Y Coordinate,Year,Updated On,lat,long,Location,geometry,index_right,geo_id
99,10508127,HZ249588,05/03/2016 09:45:00 AM,055XX W VAN BUREN ST,051A,ASSAULT,AGGRAVATED - HANDGUN,STREET,False,False,...,1139747.0,1897469.0,2016,03/25/2020 03:45:43 PM,41.874758,-87.762379,"(41.874758047, -87.762378601)",POINT (-87.76238 41.87476),330.0,1400000US17031252102
100,10527576,HZ270298,05/19/2016 12:00:00 AM,026XX W NORTH AVE,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,1158611.0,1910540.0,2016,03/25/2020 03:45:43 PM,41.910261,-87.692759,"(41.910260802, -87.692759327)",POINT (-87.69276 41.91026),288.0,1400000US17031241000
101,10531884,HZ275449,05/22/2016 09:25:00 PM,057XX W BLOOMINGDALE AVE,041A,BATTERY,AGGRAVATED - HANDGUN,PARK PROPERTY,False,False,...,1137890.0,1911257.0,2016,03/25/2020 03:45:43 PM,41.912628,-87.768864,"(41.912627751, -87.768863775)",POINT (-87.76886 41.91263),313.0,1400000US17031250400
102,10542566,HZ288250,06/01/2016 01:22:00 AM,013XX W LELAND AVE,041A,BATTERY,AGGRAVATED - HANDGUN,SIDEWALK,False,False,...,1166470.0,1931330.0,2016,03/25/2020 03:45:43 PM,41.967145,-87.663292,"(41.967144912, -87.663291549)",POINT (-87.66329 41.96714),49.0,1400000US17031031700
103,10555317,HZ301659,06/10/2016 03:00:00 PM,007XX S HOMAN AVE,051A,ASSAULT,AGGRAVATED - HANDGUN,STREET,False,False,...,1153819.0,1896654.0,2016,03/25/2020 03:45:43 PM,41.872253,-87.710733,"(41.872253112, -87.710733482)",POINT (-87.71073 41.87225),744.0,1400000US17031837300
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
269627,10640340,HZ392191,08/14/2016 08:46:00 PM,004XX E 111TH ST,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,1181108.0,1831452.0,2016,10/21/2021 04:48:08 PM,41.692746,-87.612551,"(41.692745804, -87.61255129)",POINT (-87.61255 41.69275),486.0,1400000US17031490901
269629,10550067,HZ296742,06/07/2016 04:28:00 AM,018XX N MARSHFIELD AVE,1020,ARSON,BY FIRE,PARKING LOT / GARAGE (NON RESIDENTIAL),False,False,...,1165055.0,1912362.0,2016,10/24/2021 04:44:33 PM,41.915126,-87.669035,"(41.915125989, -87.669034784)",POINT (-87.66903 41.91513),706.0,1400000US17031832300
269635,10790343,HZ560307,12/21/2016 03:00:00 PM,069XX S KOMENSKY AVE,1120,DECEPTIVE PRACTICE,FORGERY,OTHER (SPECIFY),False,False,...,1150565.0,1858207.0,2016,11/10/2021 03:49:29 PM,41.766813,-87.723682,"(41.766813349, -87.723681935)",POINT (-87.72368 41.76681),585.0,1400000US17031650500
269640,23034,HZ563253,12/23/2016 11:19:00 PM,005XX N LARAMIE AVE,0110,HOMICIDE,FIRST DEGREE MURDER,ALLEY,True,False,...,1141599.0,1903227.0,2016,11/29/2021 03:47:47 PM,41.890525,-87.755436,"(41.890524711, -87.755436387)",POINT (-87.75544 41.89052),324.0,1400000US17031251600


In [57]:
# Aggregate to the census tract level: count crimes per tract and crime type
crimes_2016_acs['count'] = 1
crimes_2016_acs = crimes_2016_acs.groupby(['geo_id','primary_type'],as_index = False)['count'].sum()

# Reshape to one row per tract and one column per crime type; combinations with no crimes become 0
table_2016 = pd.pivot_table(crimes_2016_acs, values = 'count', index = ['geo_id'], columns = ['primary_type'], aggfunc = 'sum')
table_2016 = table_2016.fillna(0)
table_2016

geo_id,ARSON,ASSAULT,BATTERY,BURGLARY,CONCEALED CARRY LICENSE VIOLATION,CRIM SEXUAL ASSAULT,CRIMINAL DAMAGE,CRIMINAL SEXUAL ASSAULT,CRIMINAL TRESPASS,DECEPTIVE PRACTICE,...,OTHER NARCOTIC VIOLATION,OTHER OFFENSE,PROSTITUTION,PUBLIC INDECENCY,PUBLIC PEACE VIOLATION,ROBBERY,SEX OFFENSE,STALKING,THEFT,WEAPONS VIOLATION
1400000US17031010100,0.0,37.0,124.0,20.0,0.0,3.0,67.0,2.0,15.0,27.0,...,0.0,39.0,0.0,0.0,5.0,11.0,2.0,1.0,89.0,8.0
1400000US17031010201,1.0,35.0,95.0,24.0,0.0,3.0,60.0,0.0,14.0,18.0,...,0.0,28.0,0.0,0.0,0.0,21.0,5.0,0.0,66.0,3.0
1400000US17031010202,0.0,27.0,69.0,9.0,0.0,2.0,35.0,1.0,27.0,26.0,...,0.0,12.0,0.0,0.0,1.0,25.0,2.0,0.0,119.0,6.0
1400000US17031010300,1.0,19.0,90.0,15.0,0.0,5.0,30.0,0.0,7.0,26.0,...,0.0,31.0,0.0,1.0,1.0,15.0,1.0,0.0,95.0,0.0
1400000US17031010400,0.0,17.0,67.0,18.0,0.0,3.0,29.0,0.0,11.0,31.0,...,0.0,22.0,0.0,0.0,1.0,6.0,4.0,1.0,93.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1400000US17031843500,2.0,22.0,57.0,9.0,0.0,3.0,29.0,0.0,3.0,15.0,...,0.0,37.0,0.0,0.0,3.0,6.0,2.0,0.0,118.0,0.0
1400000US17031843600,0.0,21.0,66.0,25.0,0.0,2.0,28.0,0.0,9.0,26.0,...,0.0,21.0,0.0,0.0,1.0,24.0,1.0,0.0,63.0,2.0
1400000US17031843700,0.0,12.0,33.0,11.0,0.0,0.0,30.0,0.0,8.0,11.0,...,0.0,24.0,0.0,0.0,1.0,11.0,1.0,0.0,94.0,0.0
1400000US17031843800,2.0,33.0,78.0,21.0,0.0,2.0,39.0,0.0,6.0,15.0,...,0.0,27.0,0.0,0.0,2.0,17.0,1.0,1.0,50.0,6.0
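
The 2015 and 2016 cells run identical aggregation steps, so they could be wrapped in a single function and reused for each year. A minimal sketch, assuming a frame that already carries `geo_id` and `primary_type`; the function name `crime_counts_by_tract` is hypothetical and the rows are made up:

```python
import pandas as pd

def crime_counts_by_tract(joined: pd.DataFrame) -> pd.DataFrame:
    """Return a tract-by-crime-type table of incident counts (0 where none)."""
    return (
        joined.groupby(["geo_id", "primary_type"])
              .size()                  # number of incidents per (tract, type)
              .unstack(fill_value=0)   # crime types become columns
    )

# Hypothetical joined frame: one row per incident.
toy = pd.DataFrame({
    "geo_id": ["A", "A", "B", "B", "B"],
    "primary_type": ["THEFT", "THEFT", "THEFT", "ARSON", "ARSON"],
})
table = crime_counts_by_tract(toy)
print(table)
```

Unlike the `pivot_table` plus `fillna(0)` route, this variant keeps the counts as integers rather than floats.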


### 2017

In [58]:
crimes_2017 = pd.read_csv(here("./data/raw/Crimes_2017.csv"))
crimes_2017

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,11227287,JB147188,10/08/2017 03:00:00 AM,092XX S RACINE AVE,0281,CRIM SEXUAL ASSAULT,NON-AGGRAVATED,RESIDENCE,False,False,...,21.0,73,02,,,2017,02/11/2018 03:57:41 PM,,,
1,11227583,JB147595,03/28/2017 02:00:00 PM,026XX W 79TH ST,0620,BURGLARY,UNLAWFUL ENTRY,OTHER,False,False,...,18.0,70,05,,,2017,02/11/2018 03:57:41 PM,,,
2,11227293,JB147230,09/09/2017 08:17:00 PM,060XX S EBERHART AVE,0810,THEFT,OVER $500,RESIDENCE,False,False,...,20.0,42,06,,,2017,02/11/2018 03:57:41 PM,,,
3,11227634,JB147599,08/26/2017 10:00:00 AM,001XX W RANDOLPH ST,0281,CRIM SEXUAL ASSAULT,NON-AGGRAVATED,HOTEL/MOTEL,False,False,...,42.0,32,02,,,2017,02/11/2018 03:57:41 PM,,,
4,11227508,JB146365,01/01/2017 12:01:00 AM,027XX S WHIPPLE ST,1754,OFFENSE INVOLVING CHILDREN,AGG SEX ASSLT OF CHILD FAM MBR,RESIDENCE,False,False,...,12.0,30,02,,,2017,02/11/2018 03:57:41 PM,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
268871,12543953,JE448649,01/28/2017 12:00:00 AM,050XX W CARMEN AVE,1754,OFFENSE INVOLVING CHILDREN,AGGRAVATED SEXUAL ASSAULT OF CHILD BY FAMILY M...,RESIDENCE,False,True,...,45.0,12,02,,,2017,11/18/2021 03:49:04 PM,,,
268872,12544581,JE449630,01/28/2017 12:00:00 AM,050XX W CARMEN AVE,1261,DECEPTIVE PRACTICE,UNAUTHORIZED VIDEOTAPING,RESIDENCE,False,True,...,45.0,12,11,,,2017,11/19/2021 03:50:26 PM,,,
268873,10894456,JA205072,03/29/2017 04:10:00 PM,004XX E 89TH PL,0610,BURGLARY,FORCIBLE ENTRY,RESIDENCE,True,False,...,9.0,44,05,1180823.0,1845738.0,2017,12/01/2021 03:49:12 PM,41.731955,-87.613157,"(41.731954973, -87.613157252)"
268874,10901233,JA212952,04/05/2017 01:44:00 AM,008XX N PARKSIDE AVE,041A,BATTERY,AGGRAVATED - HANDGUN,SIDEWALK,False,False,...,29.0,25,04B,1138536.0,1905085.0,2017,11/26/2021 03:45:45 PM,41.895679,-87.766640,"(41.895679352, -87.766640289)"


In [59]:
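# Shorten the column names used in the later steps
crimes_2017 = crimes_2017.rename(columns = {"Latitude":"lat","Longitude":"long","Primary Type":"primary_type"})

# Drop incomplete records. Note: dropna() removes a row if ANY column is missing, not only lat/long
crimes_2017 = crimes_2017.dropna()
crimes_2017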

Unnamed: 0,ID,Case Number,Date,Block,IUCR,primary_type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,lat,long,Location
302,11079431,JA421425,09/04/2017 12:00:00 AM,048XX S SEELEY AVE,0266,CRIM SEXUAL ASSAULT,PREDATORY,RESIDENCE,False,True,...,15.0,61,02,1163510.0,1872501.0,2017,06/04/2019 04:01:14 PM,41.805776,-87.675832,"(41.805776361, -87.675832432)"
309,11152146,JA515521,11/16/2017 02:00:00 PM,0000X S WACKER DR,1150,DECEPTIVE PRACTICE,CREDIT CARD FRAUD,OTHER,False,False,...,42.0,32,11,1173939.0,1900212.0,2017,06/04/2019 04:01:14 PM,41.881592,-87.636759,"(41.881592273, -87.636758638)"
310,11172884,JA542699,12/04/2017 09:00:00 AM,092XX S FOREST AVE,4387,OTHER OFFENSE,VIOLATE ORDER OF PROTECTION,RESIDENCE,False,False,...,9.0,49,26,1179874.0,1843858.0,2017,06/04/2019 04:01:14 PM,41.726818,-87.616691,"(41.726817759, -87.616691141)"
313,11181299,JA554158,12/18/2017 03:34:00 PM,071XX S MORGAN ST,0545,ASSAULT,PRO EMP HANDS NO/MIN INJURY,"SCHOOL, PUBLIC, GROUNDS",False,False,...,6.0,68,08A,1170879.0,1857445.0,2017,06/04/2019 04:01:14 PM,41.764303,-87.649245,"(41.764303087, -87.649245116)"
441,10882195,JA191744,03/18/2017 12:30:00 AM,044XX N BERNARD ST,0281,CRIMINAL SEXUAL ASSAULT,NON-AGGRAVATED,APARTMENT,False,False,...,35.0,14,02,1152592.0,1929194.0,2017,12/18/2020 03:48:35 PM,41.961570,-87.714376,"(41.961570067, -87.714376129)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
268866,11132074,JA488994,10/28/2017 02:30:00 AM,027XX W HARRISON ST,051A,ASSAULT,AGGRAVATED - HANDGUN,STREET,False,False,...,28.0,27,04A,1158295.0,1897254.0,2017,11/09/2021 03:46:25 PM,41.873809,-87.694284,"(41.873809306, -87.694283701)"
268869,10997546,JA324778,06/13/2017 04:41:00 PM,003XX E 75TH ST,1122,DECEPTIVE PRACTICE,COUNTERFEIT CHECK,CURRENCY EXCHANGE,True,False,...,6.0,69,10,1179904.0,1855363.0,2017,11/18/2021 03:46:44 PM,41.758388,-87.616230,"(41.758388133, -87.616230052)"
268873,10894456,JA205072,03/29/2017 04:10:00 PM,004XX E 89TH PL,0610,BURGLARY,FORCIBLE ENTRY,RESIDENCE,True,False,...,9.0,44,05,1180823.0,1845738.0,2017,12/01/2021 03:49:12 PM,41.731955,-87.613157,"(41.731954973, -87.613157252)"
268874,10901233,JA212952,04/05/2017 01:44:00 AM,008XX N PARKSIDE AVE,041A,BATTERY,AGGRAVATED - HANDGUN,SIDEWALK,False,False,...,29.0,25,04B,1138536.0,1905085.0,2017,11/26/2021 03:45:45 PM,41.895679,-87.766640,"(41.895679352, -87.766640289)"


In [62]:
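# Attach geo ids: build point geometries from long/lat (EPSG:4326), then join each crime to a census tract
gdf_crimes_2017 = gpd.GeoDataFrame(
    crimes_2017, geometry=gpd.points_from_xy(crimes_2017.long, crimes_2017.lat),
    crs = 'epsg:4326')

# Left join keeps every crime; points that match no tract polygon get a missing geo_id
crimes_2017_acs = gpd.sjoin(gdf_crimes_2017, gdf_acs[['geo_id','geometry']], how='left' )

crimes_2017_acs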

Unnamed: 0,ID,Case Number,Date,Block,IUCR,primary_type,Description,Location Description,Arrest,Domestic,...,X Coordinate,Y Coordinate,Year,Updated On,lat,long,Location,geometry,index_right,geo_id
302,11079431,JA421425,09/04/2017 12:00:00 AM,048XX S SEELEY AVE,0266,CRIM SEXUAL ASSAULT,PREDATORY,RESIDENCE,False,True,...,1163510.0,1872501.0,2017,06/04/2019 04:01:14 PM,41.805776,-87.675832,"(41.805776361, -87.675832432)",POINT (-87.67583 41.80578),555.0,1400000US17031611500
309,11152146,JA515521,11/16/2017 02:00:00 PM,0000X S WACKER DR,1150,DECEPTIVE PRACTICE,CREDIT CARD FRAUD,OTHER,False,False,...,1173939.0,1900212.0,2017,06/04/2019 04:01:14 PM,41.881592,-87.636759,"(41.881592273, -87.636758638)",POINT (-87.63676 41.88159),755.0,1400000US17031839100
310,11172884,JA542699,12/04/2017 09:00:00 AM,092XX S FOREST AVE,4387,OTHER OFFENSE,VIOLATE ORDER OF PROTECTION,RESIDENCE,False,False,...,1179874.0,1843858.0,2017,06/04/2019 04:01:14 PM,41.726818,-87.616691,"(41.726817759, -87.616691141)",POINT (-87.61669 41.72682),480.0,1400000US17031490300
313,11181299,JA554158,12/18/2017 03:34:00 PM,071XX S MORGAN ST,0545,ASSAULT,PRO EMP HANDS NO/MIN INJURY,"SCHOOL, PUBLIC, GROUNDS",False,False,...,1170879.0,1857445.0,2017,06/04/2019 04:01:14 PM,41.764303,-87.649245,"(41.764303087, -87.649245116)",POINT (-87.64925 41.76430),621.0,1400000US17031681400
441,10882195,JA191744,03/18/2017 12:30:00 AM,044XX N BERNARD ST,0281,CRIMINAL SEXUAL ASSAULT,NON-AGGRAVATED,APARTMENT,False,False,...,1152592.0,1929194.0,2017,12/18/2020 03:48:35 PM,41.961570,-87.714376,"(41.961570067, -87.714376129)",POINT (-87.71438 41.96157),175.0,1400000US17031140702
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
268866,11132074,JA488994,10/28/2017 02:30:00 AM,027XX W HARRISON ST,051A,ASSAULT,AGGRAVATED - HANDGUN,STREET,False,False,...,1158295.0,1897254.0,2017,11/09/2021 03:46:25 PM,41.873809,-87.694284,"(41.873809306, -87.694283701)",POINT (-87.69428 41.87381),745.0,1400000US17031837400
268869,10997546,JA324778,06/13/2017 04:41:00 PM,003XX E 75TH ST,1122,DECEPTIVE PRACTICE,COUNTERFEIT CHECK,CURRENCY EXCHANGE,True,False,...,1179904.0,1855363.0,2017,11/18/2021 03:46:44 PM,41.758388,-87.616230,"(41.758388133, -87.616230052)",POINT (-87.61623 41.75839),626.0,1400000US17031691000
268873,10894456,JA205072,03/29/2017 04:10:00 PM,004XX E 89TH PL,0610,BURGLARY,FORCIBLE ENTRY,RESIDENCE,True,False,...,1180823.0,1845738.0,2017,12/01/2021 03:49:12 PM,41.731955,-87.613157,"(41.731954973, -87.613157252)",POINT (-87.61316 41.73195),463.0,1400000US17031440900
268874,10901233,JA212952,04/05/2017 01:44:00 AM,008XX N PARKSIDE AVE,041A,BATTERY,AGGRAVATED - HANDGUN,SIDEWALK,False,False,...,1138536.0,1905085.0,2017,11/26/2021 03:45:45 PM,41.895679,-87.766640,"(41.895679352, -87.766640289)",POINT (-87.76664 41.89568),321.0,1400000US17031251300


In [63]:
# Aggregate to the census tract level: count crimes per tract and crime type
crimes_2017_acs['count'] = 1
crimes_2017_acs = crimes_2017_acs.groupby(['geo_id','primary_type'],as_index = False)['count'].sum()

# Reshape to one row per tract and one column per crime type; combinations with no crimes become 0
table_2017 = pd.pivot_table(crimes_2017_acs, values = 'count', index = ['geo_id'], columns = ['primary_type'], aggfunc = 'sum')
table_2017 = table_2017.fillna(0)
table_2017

geo_id,ARSON,ASSAULT,BATTERY,BURGLARY,CONCEALED CARRY LICENSE VIOLATION,CRIM SEXUAL ASSAULT,CRIMINAL DAMAGE,CRIMINAL SEXUAL ASSAULT,CRIMINAL TRESPASS,DECEPTIVE PRACTICE,...,OTHER NARCOTIC VIOLATION,OTHER OFFENSE,PROSTITUTION,PUBLIC INDECENCY,PUBLIC PEACE VIOLATION,ROBBERY,SEX OFFENSE,STALKING,THEFT,WEAPONS VIOLATION
1400000US17031010100,0.0,46.0,113.0,23.0,0.0,3.0,137.0,0.0,22.0,26.0,...,0.0,39.0,0.0,0.0,3.0,31.0,1.0,0.0,106.0,8.0
1400000US17031010201,0.0,35.0,98.0,27.0,0.0,4.0,88.0,0.0,19.0,19.0,...,0.0,32.0,0.0,0.0,3.0,9.0,3.0,0.0,129.0,5.0
1400000US17031010202,0.0,44.0,70.0,7.0,0.0,3.0,51.0,0.0,25.0,25.0,...,0.0,15.0,0.0,0.0,1.0,19.0,5.0,0.0,188.0,4.0
1400000US17031010300,1.0,23.0,84.0,29.0,0.0,7.0,49.0,0.0,16.0,18.0,...,0.0,16.0,0.0,0.0,1.0,16.0,5.0,0.0,131.0,2.0
1400000US17031010400,0.0,24.0,57.0,16.0,1.0,4.0,18.0,0.0,12.0,31.0,...,0.0,19.0,0.0,0.0,1.0,9.0,2.0,0.0,81.0,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1400000US17031843500,2.0,24.0,58.0,6.0,0.0,4.0,32.0,0.0,5.0,10.0,...,0.0,28.0,0.0,1.0,4.0,9.0,2.0,0.0,175.0,5.0
1400000US17031843600,0.0,43.0,85.0,17.0,0.0,1.0,50.0,0.0,7.0,28.0,...,0.0,24.0,0.0,0.0,2.0,20.0,0.0,0.0,92.0,2.0
1400000US17031843700,0.0,17.0,26.0,14.0,0.0,1.0,19.0,1.0,8.0,24.0,...,0.0,17.0,0.0,0.0,3.0,6.0,2.0,0.0,113.0,0.0
1400000US17031843800,1.0,40.0,83.0,27.0,0.0,1.0,45.0,2.0,1.0,12.0,...,0.0,27.0,0.0,0.0,3.0,16.0,0.0,1.0,77.0,5.0
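
When the yearly tables are eventually combined, a tract or crime type that is absent in one year would produce missing values under a plain `+`. This section does not show how the years are combined, so the following is only an illustration of one way to sum two such tables, using made-up data and hypothetical names `t_a` and `t_b`:

```python
import pandas as pd

# Two hypothetical yearly tables with different tracts and crime types.
t_a = pd.DataFrame(
    {"ARSON": [1.0], "THEFT": [5.0]},
    index=pd.Index(["A"], name="geo_id"),
)
t_b = pd.DataFrame(
    {"THEFT": [2.0, 3.0], "STALKING": [0.0, 1.0]},
    index=pd.Index(["A", "B"], name="geo_id"),
)

# add(..., fill_value=0) aligns on tracts and crime types; a cell missing from
# one table counts as 0. Cells missing from both stay NaN, hence the fillna(0).
total = t_a.add(t_b, fill_value=0).fillna(0)
print(total)
```

If per-year columns are wanted instead of totals, adding a year suffix to the column names before merging on `geo_id` would be the alternative.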


### 2018

In [64]:
crimes_2018 = pd.read_csv(here("./data/raw/Crimes_2018.csv"))
crimes_2018

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,11646166,JC213529,09/01/2018 12:01:00 AM,082XX S INGLESIDE AVE,0810,THEFT,OVER $500,RESIDENCE,False,True,...,8.0,44,06,,,2018,04/06/2019 04:04:43 PM,,,
1,11645648,JC212959,01/01/2018 08:00:00 AM,024XX N MONITOR AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,RESIDENCE,False,False,...,30.0,19,11,,,2018,04/06/2019 04:04:43 PM,,,
2,11645959,JC211511,12/20/2018 04:00:00 PM,045XX N ALBANY AVE,2820,OTHER OFFENSE,TELEPHONE THREAT,RESIDENCE,False,False,...,33.0,14,08A,,,2018,04/06/2019 04:04:43 PM,,,
3,11645557,JC212685,04/01/2018 12:01:00 AM,080XX S VERNON AVE,1153,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT OVER $ 300,RESIDENCE,False,False,...,6.0,44,11,,,2018,04/06/2019 04:04:43 PM,,,
4,11646293,JC213749,12/20/2018 03:00:00 PM,023XX N LOCKWOOD AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,APARTMENT,False,False,...,36.0,19,11,,,2018,04/06/2019 04:04:43 PM,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
268490,12546142,JE451406,01/01/2018 06:00:00 PM,006XX W IRVING PARK RD,0281,CRIMINAL SEXUAL ASSAULT,NON-AGGRAVATED,APARTMENT,False,False,...,46.0,6,02,,,2018,11/21/2021 03:48:03 PM,,,
268491,11535869,JB553909,12/15/2018 12:10:00 AM,005XX N STATE ST,0484,BATTERY,"PROTECTED EMPLOYEE - HANDS, FISTS, FEET, NO / ...",CTA PLATFORM,False,False,...,42.0,8,08B,1176278.0,1903807.0,2018,11/24/2021 03:47:03 PM,41.891405,-87.628062,"(41.891404732, -87.628061509)"
268492,11392459,JB365521,07/25/2018 11:35:00 PM,007XX N HOMAN AVE,041A,BATTERY,AGGRAVATED - HANDGUN,SIDEWALK,False,False,...,27.0,23,04B,1153580.0,1904667.0,2018,11/29/2021 03:47:47 PM,41.894246,-87.711398,"(41.894246358, -87.711397753)"
268493,11350163,JB310208,06/17/2018 01:40:00 AM,022XX W MAYPOLE AVE,051A,ASSAULT,AGGRAVATED - HANDGUN,STREET,False,False,...,27.0,28,04A,1161219.0,1900966.0,2018,12/01/2021 03:49:12 PM,41.883935,-87.683445,"(41.883935126, -87.683444967)"


In [65]:
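# Shorten the column names used in the later steps
crimes_2018 = crimes_2018.rename(columns = {"Latitude":"lat","Longitude":"long","Primary Type":"primary_type"})

# Drop incomplete records. Note: dropna() removes a row if ANY column is missing, not only lat/long
crimes_2018 = crimes_2018.dropna()
crimes_2018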

Unnamed: 0,ID,Case Number,Date,Block,IUCR,primary_type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,lat,long,Location
152,11431014,JB415266,08/30/2018 11:50:00 AM,016XX W CHICAGO AVE,2820,OTHER OFFENSE,TELEPHONE THREAT,GOVERNMENT BUILDING/PROPERTY,True,False,...,1.0,24,08A,1165417.0,1905418.0,2018,06/04/2019 04:01:14 PM,41.896063,-87.667903,"(41.896063478, -87.667902742)"
154,11458569,JB451615,09/25/2018 03:45:00 PM,068XX S WASHTENAW AVE,0560,ASSAULT,SIMPLE,"SCHOOL, PUBLIC, BUILDING",False,False,...,17.0,66,08A,1159554.0,1859215.0,2018,06/04/2019 04:01:14 PM,41.769400,-87.690706,"(41.769399886, -87.690705813)"
155,11492590,JB496541,10/30/2018 04:59:00 PM,040XX W 30TH ST,041A,BATTERY,AGGRAVATED: HANDGUN,SIDEWALK,True,False,...,22.0,30,04B,1149760.0,1884400.0,2018,06/04/2019 04:01:14 PM,41.838706,-87.725954,"(41.838706452, -87.72595418)"
156,11514713,JB525811,11/23/2018 01:30:00 AM,059XX W IOWA ST,1753,OFFENSE INVOLVING CHILDREN,SEX ASSLT OF CHILD BY FAM MBR,RESIDENCE,True,False,...,29.0,25,02,1136588.0,1905390.0,2018,06/04/2019 04:01:14 PM,41.896551,-87.773788,"(41.896551396, -87.773787663)"
322,11420858,JB402182,08/21/2018 01:40:00 AM,029XX N LAKE SHORE DR,0281,CRIMINAL SEXUAL ASSAULT,NON-AGGRAVATED,HOSPITAL BUILDING / GROUNDS,False,False,...,44.0,6,02,1173796.0,1919557.0,2018,12/18/2020 03:48:35 PM,41.934679,-87.636707,"(41.93467909, -87.636706827)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
268485,11451754,JB442687,09/19/2018 09:08:00 PM,0000X E CERMAK RD,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,3.0,33,04B,1176766.0,1889757.0,2018,11/19/2021 03:48:25 PM,41.852840,-87.626694,"(41.852839662, -87.626694117)"
268486,11302121,JB246222,05/01/2018 11:58:00 PM,022XX S MILLARD AVE,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,22.0,30,04B,1152393.0,1888659.0,2018,11/23/2021 03:46:43 PM,41.850342,-87.716180,"(41.850342174, -87.716179977)"
268491,11535869,JB553909,12/15/2018 12:10:00 AM,005XX N STATE ST,0484,BATTERY,"PROTECTED EMPLOYEE - HANDS, FISTS, FEET, NO / ...",CTA PLATFORM,False,False,...,42.0,8,08B,1176278.0,1903807.0,2018,11/24/2021 03:47:03 PM,41.891405,-87.628062,"(41.891404732, -87.628061509)"
268492,11392459,JB365521,07/25/2018 11:35:00 PM,007XX N HOMAN AVE,041A,BATTERY,AGGRAVATED - HANDGUN,SIDEWALK,False,False,...,27.0,23,04B,1153580.0,1904667.0,2018,11/29/2021 03:47:47 PM,41.894246,-87.711398,"(41.894246358, -87.711397753)"
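
Called without arguments, `dropna()` removes a row when any column is missing, not only the coordinates. The sketch below uses a made-up three-row table (the values and the choice of `Ward` as the incomplete column are assumptions for illustration) to show how `subset` would restrict the drop to `lat` and `long`. Whether incidents lacking other fields should be kept is a judgment call for this analysis.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a crimes table (all values are made up)
toy = pd.DataFrame({
    "primary_type": ["THEFT", "BATTERY", "ASSAULT"],
    "Ward": [3.0, np.nan, 27.0],        # second row lacks a ward only
    "lat": [41.85, 41.89, np.nan],      # third row lacks coordinates
    "long": [-87.62, -87.68, np.nan],
})

# Drop rows with a missing value in any column
drop_any = toy.dropna()

# Drop only rows that cannot be geocoded
drop_coords = toy.dropna(subset=["lat", "long"])

print(len(drop_any), len(drop_coords))
```

In the toy table the second variant keeps the BATTERY row, which has coordinates but no ward.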


In [66]:
#Attach geo ids: turn each incident into a point and look up the census tract polygon it falls in
gdf_crimes_2018 = gpd.GeoDataFrame(
    crimes_2018, geometry=gpd.points_from_xy(crimes_2018.long, crimes_2018.lat), 
    crs = 'epsg:4326')

crimes_2018_acs = gpd.sjoin(gdf_crimes_2018, gdf_acs[['geo_id','geometry']], how='left' )

crimes_2018_acs

Unnamed: 0,ID,Case Number,Date,Block,IUCR,primary_type,Description,Location Description,Arrest,Domestic,...,X Coordinate,Y Coordinate,Year,Updated On,lat,long,Location,geometry,index_right,geo_id
152,11431014,JB415266,08/30/2018 11:50:00 AM,016XX W CHICAGO AVE,2820,OTHER OFFENSE,TELEPHONE THREAT,GOVERNMENT BUILDING/PROPERTY,True,False,...,1165417.0,1905418.0,2018,06/04/2019 04:01:14 PM,41.896063,-87.667903,"(41.896063478, -87.667902742)",POINT (-87.66790 41.89606),307.0,1400000US17031243200
154,11458569,JB451615,09/25/2018 03:45:00 PM,068XX S WASHTENAW AVE,0560,ASSAULT,SIMPLE,"SCHOOL, PUBLIC, BUILDING",False,False,...,1159554.0,1859215.0,2018,06/04/2019 04:01:14 PM,41.769400,-87.690706,"(41.769399886, -87.690705813)",POINT (-87.69071 41.76940),593.0,1400000US17031660900
155,11492590,JB496541,10/30/2018 04:59:00 PM,040XX W 30TH ST,041A,BATTERY,AGGRAVATED: HANDGUN,SIDEWALK,True,False,...,1149760.0,1884400.0,2018,06/04/2019 04:01:14 PM,41.838706,-87.725954,"(41.838706452, -87.72595418)",POINT (-87.72595 41.83871),377.0,1400000US17031301803
156,11514713,JB525811,11/23/2018 01:30:00 AM,059XX W IOWA ST,1753,OFFENSE INVOLVING CHILDREN,SEX ASSLT OF CHILD BY FAM MBR,RESIDENCE,True,False,...,1136588.0,1905390.0,2018,06/04/2019 04:01:14 PM,41.896551,-87.773788,"(41.896551396, -87.773787663)",POINT (-87.77379 41.89655),321.0,1400000US17031251300
322,11420858,JB402182,08/21/2018 01:40:00 AM,029XX N LAKE SHORE DR,0281,CRIMINAL SEXUAL ASSAULT,NON-AGGRAVATED,HOSPITAL BUILDING / GROUNDS,False,False,...,1173796.0,1919557.0,2018,12/18/2020 03:48:35 PM,41.934679,-87.636707,"(41.93467909, -87.636706827)",POINT (-87.63671 41.93468),105.0,1400000US17031063303
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
268485,11451754,JB442687,09/19/2018 09:08:00 PM,0000X E CERMAK RD,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,1176766.0,1889757.0,2018,11/19/2021 03:48:25 PM,41.852840,-87.626694,"(41.852839662, -87.626694117)",POINT (-87.62669 41.85284),769.0,1400000US17031841000
268486,11302121,JB246222,05/01/2018 11:58:00 PM,022XX S MILLARD AVE,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,1152393.0,1888659.0,2018,11/23/2021 03:46:43 PM,41.850342,-87.716180,"(41.850342174, -87.716179977)",POINT (-87.71618 41.85034),366.0,1400000US17031300600
268491,11535869,JB553909,12/15/2018 12:10:00 AM,005XX N STATE ST,0484,BATTERY,"PROTECTED EMPLOYEE - HANDS, FISTS, FEET, NO / ...",CTA PLATFORM,False,False,...,1176278.0,1903807.0,2018,11/24/2021 03:47:03 PM,41.891405,-87.628062,"(41.891404732, -87.628061509)",POINT (-87.62806 41.89140),138.0,1400000US17031081500
268492,11392459,JB365521,07/25/2018 11:35:00 PM,007XX N HOMAN AVE,041A,BATTERY,AGGRAVATED - HANDGUN,SIDEWALK,False,False,...,1153580.0,1904667.0,2018,11/29/2021 03:47:47 PM,41.894246,-87.711398,"(41.894246358, -87.711397753)",POINT (-87.71140 41.89425),739.0,1400000US17031836700
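
The spatial join above can be checked on a tiny example. This is a minimal sketch with two hypothetical square "tracts" and three made-up points, not the real ACS polygons. It assumes geopandas 0.10 or later, where `sjoin` accepts the `predicate` argument. With `how="left"`, a point outside every polygon is kept and gets a missing `geo_id`.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Two hypothetical square tracts side by side (toy stand-ins for gdf_acs)
tracts = gpd.GeoDataFrame(
    {"geo_id": ["tract_A", "tract_B"]},
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1)],
    crs="EPSG:4326",
)

# Three toy incidents; the last one lies outside both tracts
incidents = pd.DataFrame({"long": [0.5, 1.5, 5.0], "lat": [0.5, 0.5, 5.0]})
points = gpd.GeoDataFrame(
    incidents,
    geometry=gpd.points_from_xy(incidents["long"], incidents["lat"]),
    crs="EPSG:4326",
)

# Left join keeps every incident; unmatched incidents get NaN in geo_id
joined = gpd.sjoin(points, tracts[["geo_id", "geometry"]], how="left", predicate="within")
unmatched = int(joined["geo_id"].isna().sum())
```

Counting `unmatched` after the real join is a quick way to see how many incidents fall outside the tract boundaries before they drop out of the aggregation.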


In [67]:
#Aggregate information at a census tract level: count incidents per tract and crime type
crimes_2018_acs['count'] = 1
crimes_2018_acs = crimes_2018_acs.groupby(['geo_id','primary_type'],as_index = False)['count'].sum()

#Reshape to one row per tract and one column per crime type; a missing combination means zero incidents
table_2018 = pd.pivot_table(crimes_2018_acs, values = 'count', index = ['geo_id'], columns = ['primary_type'], aggfunc = 'sum')
table_2018 = table_2018.fillna(0)
table_2018

geo_id,ARSON,ASSAULT,BATTERY,BURGLARY,CONCEALED CARRY LICENSE VIOLATION,CRIM SEXUAL ASSAULT,CRIMINAL DAMAGE,CRIMINAL SEXUAL ASSAULT,CRIMINAL TRESPASS,DECEPTIVE PRACTICE,...,OTHER NARCOTIC VIOLATION,OTHER OFFENSE,PROSTITUTION,PUBLIC INDECENCY,PUBLIC PEACE VIOLATION,ROBBERY,SEX OFFENSE,STALKING,THEFT,WEAPONS VIOLATION
1400000US17031010100,2.0,47.0,130.0,20.0,0.0,4.0,182.0,0.0,12.0,13.0,...,0.0,40.0,0.0,0.0,1.0,24.0,1.0,0.0,80.0,14.0
1400000US17031010201,0.0,33.0,89.0,26.0,0.0,2.0,55.0,0.0,17.0,22.0,...,0.0,32.0,0.0,0.0,3.0,13.0,4.0,0.0,94.0,1.0
1400000US17031010202,1.0,49.0,78.0,7.0,0.0,1.0,35.0,0.0,42.0,18.0,...,0.0,14.0,0.0,0.0,2.0,21.0,4.0,1.0,190.0,1.0
1400000US17031010300,0.0,24.0,72.0,20.0,0.0,5.0,26.0,0.0,10.0,26.0,...,0.0,18.0,0.0,0.0,1.0,17.0,6.0,2.0,91.0,1.0
1400000US17031010400,0.0,21.0,56.0,14.0,0.0,2.0,24.0,0.0,15.0,24.0,...,0.0,19.0,0.0,0.0,0.0,12.0,1.0,0.0,76.0,3.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1400000US17031843500,0.0,28.0,70.0,7.0,0.0,0.0,14.0,0.0,7.0,11.0,...,0.0,20.0,0.0,0.0,1.0,7.0,2.0,0.0,64.0,6.0
1400000US17031843600,0.0,24.0,77.0,24.0,0.0,2.0,43.0,0.0,14.0,34.0,...,0.0,30.0,0.0,1.0,2.0,17.0,3.0,0.0,91.0,5.0
1400000US17031843700,0.0,17.0,32.0,10.0,0.0,1.0,15.0,0.0,6.0,17.0,...,0.0,7.0,0.0,0.0,2.0,3.0,3.0,1.0,65.0,2.0
1400000US17031843800,2.0,42.0,71.0,20.0,0.0,1.0,40.0,0.0,4.0,14.0,...,0.0,21.0,0.0,0.0,2.0,13.0,1.0,0.0,56.0,4.0
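
The tract-by-crime-type table can also be built in one step. The sketch below is an alternative, not the method used in this notebook: `pd.crosstab` counts rows for every combination of tract and crime type and writes 0 where a combination is absent. The toy incidents are made up. Like the `groupby` above, `crosstab` ignores rows whose `geo_id` is missing.

```python
import pandas as pd

# Toy joined incidents: one row per crime with its tract id (made-up values)
toy = pd.DataFrame({
    "geo_id": ["t1", "t1", "t1", "t2"],
    "primary_type": ["THEFT", "THEFT", "BATTERY", "THEFT"],
})

# Rows = tracts, columns = crime types, cells = incident counts (0 when absent)
counts = pd.crosstab(toy["geo_id"], toy["primary_type"])
print(counts)
```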


### 2019

In [68]:
crimes_2019 = pd.read_csv(here("./data/raw/Crimes_2019.csv"))
crimes_2019

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,11864018,JC476123,09/24/2019 08:00:00 AM,022XX S MICHIGAN AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,COMMERCIAL / BUSINESS OFFICE,False,False,...,3.0,33,11,1177560.0,1889548.0,2019,10/20/2019 03:56:02 PM,41.852248,-87.623786,"(41.852248185, -87.623786256)"
1,11859805,JC471592,10/13/2019 08:30:00 PM,024XX W CHICAGO AVE,0860,THEFT,RETAIL THEFT,GROCERY FOOD STORE,False,False,...,26.0,24,06,1160005.0,1905256.0,2019,10/20/2019 04:03:03 PM,41.895732,-87.687784,"(41.895732399, -87.687784384)"
2,11863808,JC476236,10/05/2019 06:30:00 PM,0000X N LOOMIS ST,0810,THEFT,OVER $500,RESIDENCE,False,False,...,27.0,28,06,1166986.0,1900306.0,2019,10/20/2019 03:56:02 PM,41.882002,-87.662287,"(41.88200224, -87.662286977)"
3,11859727,JC471542,10/13/2019 07:00:00 PM,016XX W ADDISON ST,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,...,47.0,6,14,1164930.0,1923972.0,2019,10/20/2019 04:03:03 PM,41.946987,-87.669164,"(41.946987144, -87.669163602)"
4,11859656,JC471240,10/13/2019 02:10:00 PM,051XX N BROADWAY,0560,ASSAULT,SIMPLE,GAS STATION,False,False,...,47.0,3,08A,1167380.0,1934505.0,2019,10/20/2019 04:03:03 PM,41.975838,-87.659854,"(41.975837637, -87.659853835)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260860,12553419,JE460133,09/01/2019 12:00:00 AM,046XX S WOLCOTT AVE,5001,OTHER OFFENSE,OTHER CRIME INVOLVING PROPERTY,SCHOOL - PUBLIC BUILDING,False,False,...,15.0,61,26,,,2019,11/30/2021 03:46:04 PM,,,
260861,12553625,JE460327,08/01/2019 08:00:00 AM,090XX S EXCHANGE AVE,0810,THEFT,OVER $500,SCHOOL - PUBLIC BUILDING,False,False,...,10.0,46,06,,,2019,11/30/2021 03:46:04 PM,,,
260862,12554652,JE461521,05/30/2019 12:00:00 AM,018XX N MOBILE AVE,0810,THEFT,OVER $500,APARTMENT,False,False,...,29.0,25,06,,,2019,12/01/2021 03:51:55 PM,,,
260863,12554335,JE461367,07/01/2019 12:00:00 AM,077XX S CHAPPEL AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,RESIDENCE,False,False,...,8.0,43,11,,,2019,12/01/2021 03:51:55 PM,,,


In [69]:
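#Shorten the column names used in the later steps
crimes_2019 = crimes_2019.rename(columns = {"Latitude":"lat","Longitude":"long","Primary Type":"primary_type"})

#Drop rows with any missing value; this removes incidents without coordinates, which cannot be placed in a census tract
crimes_2019 = crimes_2019.dropna()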
crimes_2019

Unnamed: 0,ID,Case Number,Date,Block,IUCR,primary_type,Description,Location Description,Arrest,Domestic,...,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,lat,long,Location
0,11864018,JC476123,09/24/2019 08:00:00 AM,022XX S MICHIGAN AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,COMMERCIAL / BUSINESS OFFICE,False,False,...,3.0,33,11,1177560.0,1889548.0,2019,10/20/2019 03:56:02 PM,41.852248,-87.623786,"(41.852248185, -87.623786256)"
1,11859805,JC471592,10/13/2019 08:30:00 PM,024XX W CHICAGO AVE,0860,THEFT,RETAIL THEFT,GROCERY FOOD STORE,False,False,...,26.0,24,06,1160005.0,1905256.0,2019,10/20/2019 04:03:03 PM,41.895732,-87.687784,"(41.895732399, -87.687784384)"
2,11863808,JC476236,10/05/2019 06:30:00 PM,0000X N LOOMIS ST,0810,THEFT,OVER $500,RESIDENCE,False,False,...,27.0,28,06,1166986.0,1900306.0,2019,10/20/2019 03:56:02 PM,41.882002,-87.662287,"(41.88200224, -87.662286977)"
3,11859727,JC471542,10/13/2019 07:00:00 PM,016XX W ADDISON ST,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,...,47.0,6,14,1164930.0,1923972.0,2019,10/20/2019 04:03:03 PM,41.946987,-87.669164,"(41.946987144, -87.669163602)"
4,11859656,JC471240,10/13/2019 02:10:00 PM,051XX N BROADWAY,0560,ASSAULT,SIMPLE,GAS STATION,False,False,...,47.0,3,08A,1167380.0,1934505.0,2019,10/20/2019 04:03:03 PM,41.975838,-87.659854,"(41.975837637, -87.659853835)"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260847,11914503,JC538094,12/07/2019 12:49:00 PM,016XX W 49TH ST,041A,BATTERY,AGGRAVATED - HANDGUN,SIDEWALK,True,False,...,20.0,61,04B,1166056.0,1872220.0,2019,11/16/2021 03:51:23 PM,41.804951,-87.666503,"(41.804951438, -87.666502657)"
260851,11896585,JC516773,11/19/2019 07:20:00 AM,025XX S PULASKI RD,1792,KIDNAPPING,CHILD ABDUCTION / STRANGER,CTA BUS STOP,False,False,...,22.0,30,26,1150120.0,1886605.0,2019,11/19/2021 03:48:25 PM,41.844750,-87.724576,"(41.844750263, -87.724575794)"
260853,11564233,JC113610,01/11/2019 11:19:00 PM,002XX W 43RD ST,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,3.0,37,04B,1175483.0,1876470.0,2019,11/21/2021 03:45:41 PM,41.816408,-87.631801,"(41.816407929, -87.631801368)"
260854,11604172,JC162487,02/22/2019 09:35:00 PM,033XX S ASHLAND AVE,0486,BATTERY,DOMESTIC BATTERY SIMPLE,"VEHICLE - OTHER RIDE SHARE SERVICE (LYFT, UBER...",True,True,...,12.0,59,08B,1166218.0,1882395.0,2019,11/23/2021 03:46:43 PM,41.832869,-87.665619,"(41.832869291, -87.66561852)"


In [70]:
#Attach geo ids: turn each incident into a point and look up the census tract polygon it falls in
gdf_crimes_2019 = gpd.GeoDataFrame(
    crimes_2019, geometry=gpd.points_from_xy(crimes_2019.long, crimes_2019.lat), 
    crs = 'epsg:4326')

crimes_2019_acs = gpd.sjoin(gdf_crimes_2019, gdf_acs[['geo_id','geometry']], how='left' )

crimes_2019_acs

Unnamed: 0,ID,Case Number,Date,Block,IUCR,primary_type,Description,Location Description,Arrest,Domestic,...,X Coordinate,Y Coordinate,Year,Updated On,lat,long,Location,geometry,index_right,geo_id
0,11864018,JC476123,09/24/2019 08:00:00 AM,022XX S MICHIGAN AVE,1154,DECEPTIVE PRACTICE,FINANCIAL IDENTITY THEFT $300 AND UNDER,COMMERCIAL / BUSINESS OFFICE,False,False,...,1177560.0,1889548.0,2019,10/20/2019 03:56:02 PM,41.852248,-87.623786,"(41.852248185, -87.623786256)",POINT (-87.62379 41.85225),769.0,1400000US17031841000
1,11859805,JC471592,10/13/2019 08:30:00 PM,024XX W CHICAGO AVE,0860,THEFT,RETAIL THEFT,GROCERY FOOD STORE,False,False,...,1160005.0,1905256.0,2019,10/20/2019 04:03:03 PM,41.895732,-87.687784,"(41.895732399, -87.687784384)",POINT (-87.68778 41.89573),303.0,1400000US17031242800
2,11863808,JC476236,10/05/2019 06:30:00 PM,0000X N LOOMIS ST,0810,THEFT,OVER $500,RESIDENCE,False,False,...,1166986.0,1900306.0,2019,10/20/2019 03:56:02 PM,41.882002,-87.662287,"(41.88200224, -87.662286977)",POINT (-87.66229 41.88200),711.0,1400000US17031833000
3,11859727,JC471542,10/13/2019 07:00:00 PM,016XX W ADDISON ST,1320,CRIMINAL DAMAGE,TO VEHICLE,STREET,False,False,...,1164930.0,1923972.0,2019,10/20/2019 04:03:03 PM,41.946987,-87.669164,"(41.946987144, -87.669163602)",POINT (-87.66916 41.94699),78.0,1400000US17031060300
4,11859656,JC471240,10/13/2019 02:10:00 PM,051XX N BROADWAY,0560,ASSAULT,SIMPLE,GAS STATION,False,False,...,1167380.0,1934505.0,2019,10/20/2019 04:03:03 PM,41.975838,-87.659854,"(41.975837637, -87.659853835)",POINT (-87.65985 41.97584),43.0,1400000US17031031100
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
260847,11914503,JC538094,12/07/2019 12:49:00 PM,016XX W 49TH ST,041A,BATTERY,AGGRAVATED - HANDGUN,SIDEWALK,True,False,...,1166056.0,1872220.0,2019,11/16/2021 03:51:23 PM,41.804951,-87.666503,"(41.804951438, -87.666502657)",POINT (-87.66650 41.80495),553.0,1400000US17031611300
260851,11896585,JC516773,11/19/2019 07:20:00 AM,025XX S PULASKI RD,1792,KIDNAPPING,CHILD ABDUCTION / STRANGER,CTA BUS STOP,False,False,...,1150120.0,1886605.0,2019,11/19/2021 03:48:25 PM,41.844750,-87.724576,"(41.844750263, -87.724575794)",POINT (-87.72458 41.84475),376.0,1400000US17031301802
260853,11564233,JC113610,01/11/2019 11:19:00 PM,002XX W 43RD ST,041A,BATTERY,AGGRAVATED - HANDGUN,STREET,False,False,...,1175483.0,1876470.0,2019,11/21/2021 03:45:41 PM,41.816408,-87.631801,"(41.816407929, -87.631801368)",POINT (-87.63180 41.81641),727.0,1400000US17031835500
260854,11604172,JC162487,02/22/2019 09:35:00 PM,033XX S ASHLAND AVE,0486,BATTERY,DOMESTIC BATTERY SIMPLE,"VEHICLE - OTHER RIDE SHARE SERVICE (LYFT, UBER...",True,True,...,1166218.0,1882395.0,2019,11/23/2021 03:46:43 PM,41.832869,-87.665619,"(41.832869291, -87.66561852)",POINT (-87.66562 41.83287),766.0,1400000US17031840400


In [71]:
#Aggregate information at a census tract level: count incidents per tract and crime type
crimes_2019_acs['count'] = 1
crimes_2019_acs = crimes_2019_acs.groupby(['geo_id','primary_type'],as_index = False)['count'].sum()

#Reshape to one row per tract and one column per crime type; a missing combination means zero incidents
table_2019 = pd.pivot_table(crimes_2019_acs, values = 'count', index = ['geo_id'], columns = ['primary_type'], aggfunc = 'sum')
table_2019 = table_2019.fillna(0)
table_2019

geo_id,ARSON,ASSAULT,BATTERY,BURGLARY,CONCEALED CARRY LICENSE VIOLATION,CRIM SEXUAL ASSAULT,CRIMINAL DAMAGE,CRIMINAL SEXUAL ASSAULT,CRIMINAL TRESPASS,DECEPTIVE PRACTICE,...,OTHER NARCOTIC VIOLATION,OTHER OFFENSE,PROSTITUTION,PUBLIC INDECENCY,PUBLIC PEACE VIOLATION,ROBBERY,SEX OFFENSE,STALKING,THEFT,WEAPONS VIOLATION
1400000US17031010100,1.0,37.0,138.0,29.0,0.0,5.0,131.0,1.0,8.0,21.0,...,0.0,35.0,0.0,0.0,3.0,7.0,3.0,0.0,117.0,6.0
1400000US17031010201,0.0,43.0,79.0,25.0,0.0,1.0,52.0,0.0,14.0,21.0,...,0.0,19.0,0.0,0.0,0.0,19.0,1.0,0.0,90.0,6.0
1400000US17031010202,0.0,32.0,99.0,12.0,0.0,2.0,34.0,1.0,24.0,19.0,...,0.0,17.0,0.0,0.0,1.0,18.0,2.0,1.0,165.0,1.0
1400000US17031010300,0.0,30.0,73.0,23.0,0.0,3.0,48.0,4.0,8.0,30.0,...,0.0,37.0,0.0,0.0,0.0,13.0,4.0,2.0,131.0,2.0
1400000US17031010400,0.0,13.0,34.0,3.0,0.0,0.0,14.0,2.0,11.0,40.0,...,0.0,16.0,0.0,0.0,1.0,7.0,0.0,1.0,80.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1400000US17031843500,2.0,23.0,91.0,6.0,0.0,0.0,26.0,0.0,7.0,12.0,...,0.0,16.0,0.0,0.0,1.0,5.0,1.0,0.0,72.0,7.0
1400000US17031843600,0.0,34.0,73.0,23.0,0.0,2.0,30.0,1.0,8.0,23.0,...,0.0,35.0,0.0,0.0,0.0,14.0,3.0,0.0,103.0,4.0
1400000US17031843700,0.0,16.0,25.0,4.0,0.0,1.0,32.0,1.0,4.0,19.0,...,0.0,17.0,1.0,0.0,6.0,2.0,1.0,1.0,56.0,1.0
1400000US17031843800,0.0,21.0,61.0,12.0,0.0,0.0,22.0,2.0,5.0,9.0,...,0.0,26.0,0.0,0.0,1.0,4.0,2.0,0.0,60.0,8.0


### Creating one total crimes dataset

In [74]:
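#Sum the yearly tables; fill_value = 0 treats a tract or crime type that is missing from one year as zero instead of NaN
crimes_total = table_2015.add(table_2016, fill_value = 0)
crimes_total = crimes_total.add(table_2017, fill_value = 0)
crimes_total = crimes_total.add(table_2018, fill_value = 0)
crimes_total = crimes_total.add(table_2019, fill_value = 0)
crimes_total = crimes_total.fillna(0)
crimes_total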

geo_id,ARSON,ASSAULT,BATTERY,BURGLARY,CONCEALED CARRY LICENSE VIOLATION,CRIM SEXUAL ASSAULT,CRIMINAL DAMAGE,CRIMINAL SEXUAL ASSAULT,CRIMINAL TRESPASS,DECEPTIVE PRACTICE,...,OTHER NARCOTIC VIOLATION,OTHER OFFENSE,PROSTITUTION,PUBLIC INDECENCY,PUBLIC PEACE VIOLATION,ROBBERY,SEX OFFENSE,STALKING,THEFT,WEAPONS VIOLATION
1400000US17031010100,3.0,204.0,643.0,120.0,0.0,17.0,604.0,3.0,82.0,112.0,...,0.0,198.0,0.0,0.0,18.0,84.0,8.0,3.0,467.0,43.0
1400000US17031010201,2.0,189.0,459.0,115.0,0.0,11.0,312.0,0.0,70.0,96.0,...,0.0,138.0,0.0,0.0,8.0,70.0,15.0,0.0,431.0,18.0
1400000US17031010202,1.0,178.0,386.0,50.0,0.0,12.0,189.0,2.0,145.0,104.0,...,0.0,76.0,0.0,0.0,6.0,94.0,15.0,2.0,786.0,14.0
1400000US17031010300,4.0,118.0,388.0,112.0,0.0,25.0,191.0,4.0,52.0,116.0,...,0.0,129.0,0.0,1.0,3.0,73.0,20.0,4.0,534.0,6.0
1400000US17031010400,0.0,86.0,261.0,66.0,1.0,12.0,111.0,2.0,59.0,139.0,...,0.0,97.0,0.0,0.0,5.0,40.0,10.0,2.0,390.0,10.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
1400000US17031843500,7.0,119.0,334.0,36.0,0.0,7.0,135.0,0.0,23.0,55.0,...,0.0,133.0,0.0,1.0,12.0,33.0,8.0,0.0,537.0,22.0
1400000US17031843600,0.0,151.0,369.0,104.0,0.0,8.0,177.0,1.0,46.0,132.0,...,0.0,132.0,0.0,1.0,5.0,91.0,8.0,0.0,412.0,14.0
1400000US17031843700,0.0,79.0,146.0,46.0,0.0,3.0,113.0,2.0,35.0,91.0,...,0.0,76.0,1.0,0.0,12.0,29.0,9.0,2.0,416.0,4.0
1400000US17031843800,6.0,166.0,379.0,93.0,0.0,4.0,198.0,4.0,22.0,66.0,...,0.0,131.0,1.0,0.0,14.0,67.0,4.0,3.0,293.0,35.0
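
`DataFrame.add` aligns the two tables on both tracts and crime types before summing, so a tract or crime type present in only one year matters. The sketch below uses two made-up yearly tables to show the difference between the default behaviour and `fill_value=0`. The trailing `fillna(0)` covers cells that are absent from both tables, which `fill_value` alone leaves as NaN.

```python
import pandas as pd

# Toy per-year tables (made-up counts); t2 and ARSON each appear in only one year
year_a = pd.DataFrame({"THEFT": [5.0, 2.0]}, index=["t1", "t2"])
year_b = pd.DataFrame({"THEFT": [1.0], "ARSON": [3.0]}, index=["t1"])

# Default: any cell missing from either table becomes NaN in the sum
plain = year_a.add(year_b)

# fill_value=0: a cell missing from one table counts as zero
filled = year_a.add(year_b, fill_value=0).fillna(0)

print(plain)
print(filled)
```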


## Merging all datasets

In [75]:
#Use the acs dataset with just geo_ids as the starting point
additional_predictors_final = pd.DataFrame(acs['geo_id'])

#Police stations
additional_predictors_final = additional_predictors_final.merge(ps_acs, how = 'left', on = 'geo_id')
additional_predictors_final['police_stations'] = additional_predictors_final['police_stations'].fillna(0)

#Fire stations
additional_predictors_final = additional_predictors_final.merge(fs_acs, how = 'left', on = 'geo_id')
additional_predictors_final['fire_stations'] = additional_predictors_final['fire_stations'].fillna(0)

#Schools
additional_predictors_final = additional_predictors_final.merge(schools_acs, how = 'left', on = 'geo_id')
additional_predictors_final['public_schools'] = additional_predictors_final['public_schools'].fillna(0)

#Parks
additional_predictors_final = additional_predictors_final.merge(parks_acs, how = 'left', on = 'geo_id')
additional_predictors_final['parks'] = additional_predictors_final['parks'].fillna(0)

#Commercial establishments
additional_predictors_final = additional_predictors_final.merge(comm_est_acs, how = 'left', on = 'geo_id')
additional_predictors_final['commercial_establishments'] = additional_predictors_final['commercial_establishments'].fillna(0)

#Police killings
additional_predictors_final = additional_predictors_final.merge(police_deaths_acs, how = 'left', on = 'geo_id')
additional_predictors_final['number_of_police_killings'] = additional_predictors_final['number_of_police_killings'].fillna(0)

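#Total crimes (tracts with no recorded crimes get 0 for every crime type)
additional_predictors_final = additional_predictors_final.merge(crimes_total, how = 'left', on = 'geo_id')
additional_predictors_final[crimes_total.columns] = additional_predictors_final[crimes_total.columns].fillna(0)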

#Display and export final dataset
additional_predictors_final.to_csv(here('./data/CleanOpenData.csv'), index = False)
additional_predictors_final

Unnamed: 0,geo_id,police_stations,fire_stations,public_schools,parks,commercial_establishments,number_of_police_killings,ARSON,ASSAULT,BATTERY,...,OTHER NARCOTIC VIOLATION,OTHER OFFENSE,PROSTITUTION,PUBLIC INDECENCY,PUBLIC PEACE VIOLATION,ROBBERY,SEX OFFENSE,STALKING,THEFT,WEAPONS VIOLATION
0,1400000US17031010100,1,1,14,14,24.0,0.0,3.0,204.0,643.0,...,0.0,198.0,0.0,0.0,18.0,84.0,8.0,3.0,467.0,43.0
1,1400000US17031010201,1,3,17,18,30.0,0.0,2.0,189.0,459.0,...,0.0,138.0,0.0,0.0,8.0,70.0,15.0,0.0,431.0,18.0
2,1400000US17031010202,1,3,16,18,58.0,0.0,1.0,178.0,386.0,...,0.0,76.0,0.0,0.0,6.0,94.0,15.0,2.0,786.0,14.0
3,1400000US17031010300,1,2,15,17,71.0,0.0,4.0,118.0,388.0,...,0.0,129.0,0.0,1.0,3.0,73.0,20.0,4.0,534.0,6.0
4,1400000US17031010400,1,2,17,12,41.0,0.0,0.0,86.0,261.0,...,0.0,97.0,0.0,0.0,5.0,40.0,10.0,2.0,390.0,10.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
793,1400000US17031843500,1,5,57,4,121.0,0.0,7.0,119.0,334.0,...,0.0,133.0,0.0,1.0,12.0,33.0,8.0,0.0,537.0,22.0
794,1400000US17031843600,2,6,44,20,42.0,0.0,0.0,151.0,369.0,...,0.0,132.0,0.0,1.0,5.0,91.0,8.0,0.0,412.0,14.0
795,1400000US17031843700,1,6,41,13,117.0,1.0,0.0,79.0,146.0,...,0.0,76.0,1.0,0.0,12.0,29.0,9.0,2.0,416.0,4.0
796,1400000US17031843800,2,6,46,11,32.0,0.0,6.0,166.0,379.0,...,0.0,131.0,1.0,0.0,14.0,67.0,4.0,3.0,293.0,35.0
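
The repeated merge-then-fill pattern can be written more compactly. This is a minimal sketch with made-up toy tables, not a replacement for the cell above: `functools.reduce` applies the same left merge on `geo_id` for each predictor table in turn, and one `fillna(0)` then covers every predictor column. The names `base`, `parks`, `killings` and `left_merge` are hypothetical.

```python
from functools import reduce
import pandas as pd

# Toy stand-ins (made-up values) for the base table and two predictor tables
base = pd.DataFrame({"geo_id": ["t1", "t2", "t3"]})
parks = pd.DataFrame({"geo_id": ["t1", "t2"], "parks": [4, 1]})
killings = pd.DataFrame({"geo_id": ["t3"], "number_of_police_killings": [1]})

def left_merge(left, right):
    # Keep every tract in `left`; tracts absent from `right` get NaN
    return left.merge(right, how="left", on="geo_id")

# Merge each predictor table onto the base table, one after another
merged = reduce(left_merge, [parks, killings], base)

# Fill every predictor column at once instead of one column at a time
value_cols = merged.columns.drop("geo_id")
merged[value_cols] = merged[value_cols].fillna(0)
print(merged)
```

A left merge keeps all tracts from the base table, so the row count of the result should equal the number of tracts as long as each predictor table has at most one row per `geo_id`.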
