<h2>
	2021 Crimes in Chicago
</h2>

<p>
	This data reflects reported incidents of crime that occurred in the City of Chicago in 2021. The data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.
</p>
<p>
	The original data contained the following columns.
	<ul>
		<li>DATE OF OCCURRENCE</li> 
		<li>CASE#</li> 	
		<li>BLOCK</li> 	
		<li>IUCR </li>	
		<li>PRIMARY DESCRIPTION</li> 	
		<li>SECONDARY DESCRIPTION</li> 	
		<li>LOCATION DESCRIPTION</li> 	
		<li>ARREST</li> 	
		<li>DOMESTIC</li> 	
		<li>BEAT</li> 	
		<li>WARD</li> 	
		<li>COMMUNITY AREA</li> 	
		<li>FBI CD</li> 	
		<li>X COORDINATE</li> 	
		<li>Y COORDINATE</li> 	
		<li>LATITUDE</li> 	
		<li>LONGITUDE</li>	
		<li>LOCATION</li>
	</ul>
</p>
<p>
	This data can be downloaded at <a href="https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Dashboard/5cd6-ry5g" target="_blank">https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Dashboard/5cd6-ry5g</a>
</p>
<p>
	The city of Chicago is divided into 77 community areas for statistical and planning purposes. Census data and other statistics are tied to the areas, which serve as the basis for a variety of urban planning initiatives on both the local and regional levels. The areas' boundaries do not generally change, allowing comparisons of statistics across time.
</p>
<p>
	I took the community area data and built a new dataframe to merge into the crimes dataframe, so that each crime can be identified by neighborhood name. This information
	can be found at <a href="https://en.wikipedia.org/wiki/Community_areas_in_Chicago" target="_blank">https://en.wikipedia.org/wiki/Community_areas_in_Chicago</a>
</p>
<p>
	I also scraped the 2021 daily weather history for O'Hare International Airport to see how much of an effect weather has on crime. The weather history is available at <a href="http://www.wx-now.com/Weather/WxHistory" target="_blank">http://www.wx-now.com/Weather/WxHistory</a>
</p>

In [166]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

After collecting the datasets, I removed, renamed and reorganized some of the columns of the crimes dataset in Excel.
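The same cleanup could also be scripted in pandas. Below is a minimal sketch on a hypothetical two-row extract. The raw-to-new column mapping is my assumption, inferred from the column names listed above and the columns that appear in the sample further down; it is not necessarily what was done in Excel.

```python
import pandas as pd

# Hypothetical two-row extract using some of the raw CLEAR headers listed above
raw = pd.DataFrame({
    'DATE OF OCCURRENCE': ['09/13/2021 09:05:00 PM', '01/18/2021 03:21:00 AM'],
    'PRIMARY DESCRIPTION': ['CRIMINAL DAMAGE', 'BATTERY'],
    'SECONDARY DESCRIPTION': ['TO CITY OF CHICAGO PROPERTY', 'DOMESTIC BATTERY SIMPLE'],
    'LOCATION DESCRIPTION': ['STREET', 'APARTMENT'],
    'BEAT': [111, 1011],
    'COMMUNITY AREA': [32, 23],
})

# Assumed mapping from raw headers to the snake_case names used in this notebook
column_map = {
    'DATE OF OCCURRENCE': 'crime_date',
    'PRIMARY DESCRIPTION': 'crime_primary_type',
    'SECONDARY DESCRIPTION': 'crime_primary_description',
    'LOCATION DESCRIPTION': 'crime_location_description',
    'COMMUNITY AREA': 'community_area_id',
}

# Keep only the mapped columns (BEAT is dropped), then rename them
tidy = raw[list(column_map)].rename(columns=column_map)

# Lower-case the text columns to match the lower-case values in the sample below
for col in ['crime_primary_type', 'crime_primary_description', 'crime_location_description']:
    tidy[col] = tidy[col].str.lower()

print(tidy)
```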

In [167]:
# Import Chicago crimes dataset
crimes = pd.read_csv('./csv/chicago_crimes_2021.csv')
# Import Chicago weather dataset
weather = pd.read_csv('./csv/chicago_temps_2021.csv')
# Import community area dataset
community = pd.read_csv('./csv/chicago_areas.csv')

Here is a sample of what the crimes dataset currently contains.

In [168]:
crimes.sample(3)

Unnamed: 0,crime_date,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude
139123,2021-9-13 21:05:00,state st,criminal damage,to city of chicago property,street,False,False,32,41.872894,-87.62756
8293,2021-1-18 3:21:00,lawndale ave,battery,domestic battery simple,apartment,False,True,23,41.898181,-87.718842
161739,2021-10-18 14:45:00,shore dr,robbery,vehicular hijacking,street,False,False,41,41.797697,-87.581799


Let's split the timestamp into separate '<b>date</b>' and '<b>time</b>' columns, then drop the original '<b>crime_date</b>' timestamp column.

In [169]:
# Parse the timestamp once, then extract the date and time components
timestamps = pd.to_datetime(crimes['crime_date'])
crimes['date'] = timestamps.dt.date
crimes['time'] = timestamps.dt.time
# Drop the original timestamp column
crimes = crimes.drop(['crime_date'], axis=1)
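To make the split concrete, here is a small sketch on two hypothetical timestamps written in the same non-zero-padded style as the sample above. The `.dt.date` and `.dt.time` accessors return plain Python `datetime.date` and `datetime.time` objects, which matters later when merging on `date`.

```python
import pandas as pd

# Hypothetical timestamps in the same style as the crimes sample above
toy = pd.DataFrame({'crime_date': ['2021-9-13 21:05:00', '2021-1-18 3:21:00']})

# Parse once and reuse the parsed values for both derived columns
stamps = pd.to_datetime(toy['crime_date'])
toy['date'] = stamps.dt.date   # datetime.date objects
toy['time'] = stamps.dt.time   # datetime.time objects
toy = toy.drop(columns=['crime_date'])

print(toy)
```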

In [170]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time
92190,california blvd,battery,domestic battery simple,street,False,True,29,41.862295,-87.695786,2021-06-27,01:00:00
129083,wabash ave,criminal sexual assault,non-aggravated,hotel / motel,False,False,32,41.885908,-87.626289,2021-08-27,00:00:00
33356,austin ave,criminal damage,to property,residence,False,False,19,41.924415,-87.775734,2021-03-12,16:50:00


The <b>Arrest</b> and <b>Domestic</b> columns are stored as boolean values (True/False). We will replace these values with Yes/No for readability.

In [171]:
# Convert each boolean flag to a readable Yes/No label, reading from its own column
crimes['arrest'] = crimes['arrest'].replace([True, False], ['Yes', 'No'])
crimes['domestic'] = crimes['domestic'].replace([True, False], ['Yes', 'No'])
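An alternative sketch uses `Series.map` with a single dictionary and a loop, so each flag is always converted from its own values. The toy rows are hypothetical. Note that `map` turns any value not in the dictionary (such as a missing value) into `NaN`, which makes unexpected inputs easy to spot.

```python
import pandas as pd

# Hypothetical flags: the two columns deliberately disagree
toy = pd.DataFrame({'arrest': [True, False, False],
                    'domestic': [False, True, False]})

yes_no = {True: 'Yes', False: 'No'}

# Convert each column from its own values so one flag never overwrites the other
for col in ['arrest', 'domestic']:
    toy[col] = toy[col].map(yes_no)

print(toy)
```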

In [172]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time
168924,state st,robbery,armed - handgun,street,No,No,69,41.764707,-87.624891,2021-11-01,20:45:00
697,emerald ave,motor vehicle theft,automobile,street,No,No,71,41.73377,-87.642337,2021-01-01,02:50:00
91984,central park ave,battery,simple,apartment,No,No,27,41.87506,-87.715704,2021-06-27,12:35:00


Each Chicago neighborhood has a name and its own <b>community area id</b>.  <br>
Below is a dataframe with the community IDs, community names, population, area (sq. miles) and density (population / area).

In [173]:
community.sample(3)

Unnamed: 0,community_area_id,name,population,area_sq_mi,density
17,18,montclare,14401,0.99,14546.46
54,55,hegewisch,10027,5.24,1913.55
34,35,douglas,20291,1.65,12297.58


I will merge the two dataframes on the '<b>community_area_id</b>' column, similar to a SQL inner join.

In [174]:
# Inner join (pandas' default) on the shared community area id
crimes = pd.merge(crimes, community, on='community_area_id', how='inner')
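Because an inner join silently drops rows whose key has no match, it can be worth checking what was lost. A sketch on hypothetical toy frames: `validate='many_to_one'` makes pandas raise a `MergeError` if an id appears more than once in the community table, and a left join with `indicator=True` lists the crime rows an inner join would discard (the id `99` is made up for illustration).

```python
import pandas as pd

# Hypothetical toy data; id 99 has no matching community area
crimes_toy = pd.DataFrame({'community_area_id': [32, 23, 99],
                           'crime': ['criminal damage', 'battery', 'theft']})
areas_toy = pd.DataFrame({'community_area_id': [23, 32],
                          'name': ['humboldt park', 'loop']})

# Inner join; validate raises MergeError if an id repeats in areas_toy
joined = pd.merge(crimes_toy, areas_toy, on='community_area_id',
                  how='inner', validate='many_to_one')

# A left join with indicator=True flags the rows the inner join dropped
check = pd.merge(crimes_toy, areas_toy, on='community_area_id',
                 how='left', indicator=True)
dropped = check[check['_merge'] == 'left_only']

print(joined)
print(dropped)
```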

In [175]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time,name,population,area_sq_mi,density
149389,oakdale ave,battery,aggravated - handgun,street,Yes,Yes,19,41.934097,-87.759707,2021-01-16,19:50:00,belmont cragin,78116,3.91,19978.52
56036,douglas blvd,theft,$500 and under,street,No,No,29,41.862953,-87.70618,2021-01-28,12:00:00,north lawndale,34794,3.21,10839.25
84869,polk st,theft,$500 and under,apartment,No,No,26,41.870766,-87.721458,2021-02-03,16:35:00,west garfield park,17433,1.28,13619.53


With the two dataframes successfully merged, I will drop the <b>community_area_id</b> column, as it is no longer needed. The <b>name</b> column is kept as-is for now and is renamed to <b>Community</b> along with the other columns further down.

In [176]:
crimes = crimes.drop(['community_area_id'], axis=1)
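A note on `DataFrame.rename`: it returns a new dataframe by default, so a call whose result is not assigned (and that lacks `inplace=True`) leaves the original unchanged. A small demonstration on a hypothetical one-row frame:

```python
import pandas as pd

# Hypothetical one-row frame with the same column names as the merged data
toy = pd.DataFrame({'community_area_id': [32], 'name': ['loop']})

toy.rename(columns={'name': 'community_name'})        # result discarded: toy is unchanged
unchanged = 'name' in toy.columns

toy = toy.rename(columns={'name': 'community_name'})  # assigning the result keeps the rename
toy = toy.drop(columns=['community_area_id'])

print(toy.columns.tolist())
```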

In [177]:
crimes.sample()

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,latitude,longitude,date,time,name,population,area_sq_mi,density
93687,prairie ave,assault,aggravated - handgun,apartment,No,No,41.821461,-87.619926,2021-08-19,02:32:00,grand boulevard,24589,1.74,14131.61


Below is a dataframe with the day of the week, the high and low temperatures (°F) and the amount of precipitation (inches) for each day.

In [178]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
104,sat,2021-9-18 0:00:00,77,66,
258,sat,2021-4-17 0:00:00,56,35,
186,mon,2021-6-28 0:00:00,82,66,0.17


Extract the date from the timestamp, leaving only <b>yyyy</b>-<b>mm</b>-<b>dd</b>.

In [179]:
weather['date'] = pd.to_datetime(weather['date']).dt.date

In [180]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
159,sun,2021-07-25,90,72,
358,thu,2021-01-07,38,33,
109,mon,2021-09-13,88,68,


Replace the abbreviated days of the week with the full name of the day.

In [181]:
# Map each abbreviation to the full day name; Series.replace with a dict matches whole values only
day_names = {'mon': 'Monday', 'tue': 'Tuesday', 'wed': 'Wednesday', 'thu': 'Thursday',
             'fri': 'Friday', 'sat': 'Saturday', 'sun': 'Sunday'}
weather['day_of_week'] = weather['day_of_week'].replace(day_names)
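Since every weather row also has a date, the weekday can be derived directly with `.dt.day_name()` and used to cross-check the scraped abbreviations. A sketch on two hypothetical rows taken from the weather sample above; comparing the first three letters is my own choice of check.

```python
import pandas as pd

# Two rows from the weather sample: scraped abbreviation plus date
toy = pd.DataFrame({'day_of_week': ['sat', 'mon'],
                    'date': ['2021-09-18', '2021-06-28']})

# day_name() returns the full English weekday name for each timestamp
derived = pd.to_datetime(toy['date']).dt.day_name()

# Consistency check: first three letters should match the scraped abbreviation
matches = derived.str[:3].str.lower() == toy['day_of_week']

print(derived.tolist(), matches.all())
```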

In [182]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
149,Wednesday,2021-08-04,85,64,
54,Sunday,2021-11-07,65,41,
122,Tuesday,2021-08-31,82,66,


Replace <b>NaN</b> with <b>0.0</b> on days with no recorded precipitation.

In [183]:
weather['precipitation_in']=weather['precipitation_in'].fillna(0.0)
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
78,Thursday,2021-10-14,71,57,0.1
112,Friday,2021-09-10,80,59,0.0
217,Friday,2021-05-28,49,44,0.29
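Filling with `0.0` assumes that a blank precipitation cell means no precipitation rather than a missing reading. A minimal sketch, on hypothetical values, that records how many days were filled so the assumption stays visible:

```python
import numpy as np
import pandas as pd

# Hypothetical precipitation column with blanks read in as NaN
toy = pd.DataFrame({'precipitation_in': [np.nan, 0.17, np.nan]})

# Count the blanks before filling them, so the number of assumed dry days is known
missing_before = int(toy['precipitation_in'].isna().sum())
toy['precipitation_in'] = toy['precipitation_in'].fillna(0.0)

print(missing_before, toy['precipitation_in'].tolist())
```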


Now we can merge the <b>weather</b> dataframe with the <b>crimes</b> dataframe on the <b>date</b> column.

In [184]:
merged = pd.merge(crimes, weather, on='date')
crimes = merged
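This merge only works because both `date` columns hold the same kind of value (Python `datetime.date` objects from `.dt.date`). If the key types differed, the inner join would simply match fewer rows, so comparing row counts is a cheap sanity check. A sketch on toy data built from the two dates and high temperatures shown in the samples above:

```python
import pandas as pd

# Toy frames; both 'date' columns hold datetime.date objects
crimes_toy = pd.DataFrame({
    'date': pd.to_datetime(['2021-03-17', '2021-03-17', '2021-03-22']).date,
    'type': ['other offense', 'theft', 'deceptive practice'],
})
weather_toy = pd.DataFrame({
    'date': pd.to_datetime(['2021-03-17', '2021-03-22']).date,
    'high_temp_f': [39, 68],
})

# Each crime should match exactly one weather row
merged = pd.merge(crimes_toy, weather_toy, on='date', how='inner', validate='many_to_one')

# A shrinking row count would signal mismatched date keys
assert len(merged) == len(crimes_toy)
print(merged)
```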

In [185]:
crimes.sample()

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,latitude,longitude,date,time,name,population,area_sq_mi,density,day_of_week,high_temp_f,low_temp_f,precipitation_in
38252,parkside ave,other offense,other vehicle offense,street,No,No,41.899215,-87.766763,2021-03-17,14:30:00,austin,96557,7.15,13504.48,Wednesday,39,34,0.4


Reorder the columns so that the date and weather information come first.

In [186]:
crimes = crimes[['date', 'time',  'day_of_week', 'high_temp_f', 'low_temp_f', 'precipitation_in', 'crime_primary_type', 'crime_primary_description', 'crime_location_description', 'arrest', 'domestic', 'city_block', 'name', 'population', 'area_sq_mi', 'density', 'latitude', 'longitude']]

In [187]:
crimes.sample()

Unnamed: 0,date,time,day_of_week,high_temp_f,low_temp_f,precipitation_in,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,city_block,name,population,area_sq_mi,density,latitude,longitude
41111,2021-03-22,08:00:00,Monday,68,46,0.0,deceptive practice,financial identity theft over $ 300,residence,No,No,hoyne ave,logan square,71665,3.59,19962.4,41.920915,-87.680207


Rename the columns for display.

In [188]:
crimes.rename(columns={
    'date': 'Date',
    'time': 'Time',
    'day_of_week': 'Weekday',
    'high_temp_f': 'Hi (f)',
    'low_temp_f': 'Lo (f)',
    'precipitation_in': 'Precipitation',
    'crime_primary_type': 'Type',
    'crime_primary_description': 'Description',
    'crime_location_description': 'Location',
    'arrest': 'Arrest',
    'domestic': 'Domestic',
    'city_block': 'Street',
    'name': 'Community',
    'population': 'Population',
    'area_sq_mi': 'Area',
    'density': 'Density',
    'latitude': 'Latitude',
    'longitude': 'Longitude',
}, inplace=True)

In [189]:
crimes.sample()

Unnamed: 0,Date,Time,Weekday,Hi (f),Lo (f),Precipitation,Type,Description,Location,Arrest,Domestic,Street,Community,Population,Area,Density,Latitude,Longitude
7871,2021-01-14,23:00:00,Thursday,40,32,0.14,theft,$500 and under,street,No,No,richmond st,west ridge,77122,3.53,21847.59,42.000165,-87.703381


For better readability, I will capitalize the first letter of every word in the text columns using the <b>str.title()</b> method.

In [190]:
# Capitalize the first letter of every word
crimes['Street'] = crimes['Street'].str.title()
crimes['Type'] = crimes['Type'].str.title()
crimes['Description'] = crimes['Description'].str.title()
crimes['Community'] = crimes['Community'].str.title()
crimes['Weekday'] = crimes['Weekday'].str.title()
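The same step can be written once over a list of columns. A sketch on a hypothetical one-row frame; including `Location` is an optional extra (it is left lower-case in the samples below), not part of the cell above.

```python
import pandas as pd

# Hypothetical one-row frame with lower-case text values
toy = pd.DataFrame({'Street': ['state st'], 'Type': ['criminal damage'],
                    'Description': ['$500 and under'], 'Location': ['street']})

text_cols = ['Street', 'Type', 'Description', 'Location']

# Apply str.title() to every listed column in one pass
toy[text_cols] = toy[text_cols].apply(lambda s: s.str.title())

print(toy.iloc[0].tolist())
```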

In [191]:
crimes.sample(3)

Unnamed: 0,Date,Time,Weekday,Hi (f),Lo (f),Precipitation,Type,Description,Location,Arrest,Domestic,Street,Community,Population,Area,Density,Latitude,Longitude
146889,2021-09-21,22:00:00,Tuesday,75,60,0.42,Other Offense,Other Crime Involving Property,apartment,No,No,Wentworth Ave,Greater Grand Crossing,31471,3.55,8865.07,41.766428,-87.629795
130721,2021-08-21,18:50:00,Saturday,88,73,0.6,Battery,Domestic Battery Simple,apartment,Yes,Yes,Clarendon Ave,Uptown,57182,2.32,24647.41,41.964918,-87.650001
91697,2021-06-19,01:00:00,Saturday,90,66,0.0,Theft,$500 And Under,street,No,No,Crystal St,West Town,87781,4.58,19166.16,41.904144,-87.67586


Top 10 Neighborhoods with the Most Reported Crimes in 2021

In [192]:
crimes[['Type', 'Community']].groupby(['Community'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)

Unnamed: 0,Community,Reported Crimes
5,Austin,11341
47,Near North Side,8126
65,South Shore,7272
49,Near West Side,6743
52,North Lawndale,6161
4,Auburn Gresham,5873
32,Humboldt Park,5767
29,Greater Grand Crossing,5545
75,West Town,5486
41,Loop,5446


In [193]:
df = crimes[['Type', 'Community']].groupby(['Community'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)
fig = px.bar(df, x = 'Community', y = 'Reported Crimes')
fig.update_traces(marker_color='slategrey')
fig.show()
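The groupby/count/sort chain above can also be expressed with `value_counts()`, which counts and sorts in descending order in one call. A sketch on hypothetical rows; `rename_axis` pins the index name so `reset_index` produces the same two columns across pandas versions.

```python
import pandas as pd

# Hypothetical crime rows
toy = pd.DataFrame({'Community': ['Austin', 'Loop', 'Austin', 'Uptown', 'Austin', 'Loop'],
                    'Type': ['Theft', 'Battery', 'Theft', 'Assault', 'Homicide', 'Theft']})

# Count rows per community, sorted descending, and keep the top 2
top = (toy['Community']
       .value_counts()
       .head(2)
       .rename_axis('Community')
       .reset_index(name='Reported Crimes'))

print(top)
```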

Top 10 Most Reported Crime Types in 2021

In [194]:
crimes[['Type', 'Community']].groupby(['Type'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)

Unnamed: 0,Type,Reported Crimes
2,Battery,39988
29,Theft,39758
5,Criminal Damage,24716
1,Assault,20086
8,Deceptive Practice,15710
22,Other Offense,13588
16,Motor Vehicle Theft,10410
30,Weapons Violation,8865
26,Robbery,7813
3,Burglary,6546


In [195]:
df = crimes[['Type', 'Community']].groupby(['Type'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)
fig = px.bar(df, x = 'Type', y = 'Reported Crimes')
fig.update_traces(marker_color='teal')
fig.show()

Top 10 Communities with the Most Homicides in 2021

In [196]:
murder_df = crimes[crimes['Type'] == 'Homicide']
murder_df[['Community']].groupby(['Community'])['Community'].count().reset_index(name='Homicides').sort_values('Homicides', ascending=False).head(10)

Unnamed: 0,Community,Homicides
5,Austin,70
43,North Lawndale,45
54,South Shore,39
24,Greater Grand Crossing,39
4,Auburn Gresham,39
60,West Garfield Park,38
62,West Pullman,37
27,Humboldt Park,34
19,Englewood,33
16,East Garfield Park,31


In [197]:
df = murder_df[['Community']].groupby(['Community'])['Community'].count().reset_index(name='Homicides').sort_values('Homicides', ascending=False).head(10)
fig = px.bar(df, x = 'Community', y = 'Homicides')
fig.update_traces(marker_color='maroon')
fig.show()
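The tables above rank communities by raw counts, which favors more populous areas. Since the merged data carries each community's population, a per-capita rate (homicides per 100,000 residents) is one way to compare communities of different sizes. A sketch on hypothetical rows; the population figures are taken from the samples above.

```python
import pandas as pd

# Hypothetical rows; population is repeated on every crime row of a community
toy = pd.DataFrame({
    'Community': ['Austin', 'Austin', 'North Lawndale', 'Uptown'],
    'Type': ['Homicide', 'Homicide', 'Homicide', 'Theft'],
    'Population': [96557, 96557, 34794, 57182],
})

homicides = toy[toy['Type'] == 'Homicide']

# Count homicides per community and carry the community's population along
rates = homicides.groupby('Community').agg(Homicides=('Type', 'count'),
                                           Population=('Population', 'first'))

# Normalize by population: homicides per 100,000 residents
rates['Per 100k'] = (rates['Homicides'] / rates['Population'] * 100_000).round(1)
rates = rates.sort_values('Per 100k', ascending=False)

print(rates)
```

On these toy rows North Lawndale ranks above Austin by rate even though Austin has more homicides, which is the distinction between a count and a rate.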