<h2>
	2021 Crimes in Chicago
</h2>

<p>
	This data reflects reported incidents of crime that have occurred in the City of Chicago during a specific time period. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.
</p>
<p>
	The original data contained the following columns.
	<ul>
		<li>DATE OF OCCURRENCE</li> 
		<li>CASE#</li> 	
		<li>BLOCK</li> 	
		<li>IUCR </li>	
		<li>PRIMARY DESCRIPTION</li> 	
		<li>SECONDARY DESCRIPTION</li> 	
		<li>LOCATION DESCRIPTION</li> 	
		<li>ARREST</li> 	
		<li>DOMESTIC</li> 	
		<li>BEAT</li> 	
		<li>WARD</li> 	
		<li>COMMUNITY AREA</li> 	
		<li>FBI CD</li> 	
		<li>X COORDINATE</li> 	
		<li>Y COORDINATE</li> 	
		<li>LATITUDE</li> 	
		<li>LONGITUDE</li>	
		<li>LOCATION</li>
	</ul>
</p>
<p>
	This data can be downloaded at <a href="https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Dashboard/5cd6-ry5g" target="_blank">https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Dashboard/5cd6-ry5g</a>
</p>
<p>
	The city of Chicago is divided into 77 community areas for statistical and planning purposes. Census data and other statistics are tied to the areas, which serve as the basis for a variety of urban planning initiatives on both the local and regional levels. The areas' boundaries do not generally change, allowing comparisons of statistics across time.
</p>
<p>
	I built a separate dataframe from the community area data so that it can be merged into the crimes dataframe.  This allows me to identify each neighborhood by name.  The community area data
	can be found at <a href="https://en.wikipedia.org/wiki/Community_areas_in_Chicago" target="_blank">https://en.wikipedia.org/wiki/Community_areas_in_Chicago</a>
</p>
<p>
	I also scraped the 2021 weather history for O'Hare International Airport to see how much of an effect weather has on the crime rate.  The weather data comes from <a href="http://www.wx-now.com/Weather/WxHistory" target="_blank">http://www.wx-now.com/Weather/WxHistory</a>
</p>

In [127]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

After collecting the datasets, I removed, renamed, and reorganized some of the columns in the crimes dataset using Excel.

In [128]:
# Import Chicago crimes dataset
crimes = pd.read_csv('./csv/chicago_crimes_2021.csv')
# Import Chicago weather dataset
weather = pd.read_csv('./csv/chicago_temps_2021.csv')
# Import community area dataset
community = pd.read_csv('./csv/chicago_areas.csv')

We can see a sample of what the current Crimes dataset contains.

In [129]:
crimes.sample(3)

Unnamed: 0,crime_date,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude
80938,2021-5-26 19:00:00,maple st,theft,$500 and under,apartment,False,False,8,41.901822,-87.633071
8004,2021-1-17 2:53:00,polk st,weapons violation,unlawful possession - handgun,sidewalk,True,False,32,41.872202,-87.630935
195595,2021-12-14 21:00:00,erie st,motor vehicle theft,automobile,street,False,False,8,41.894021,-87.631816


Let's split the 'crime_date' timestamp into separate 'date' and 'time' columns, then drop the original 'crime_date' column.

In [130]:
# Create new date and time column by extracting from timestamp
crimes['date'] = pd.to_datetime(crimes['crime_date']).dt.date
crimes['time'] = pd.to_datetime(crimes['crime_date']).dt.time
# Drop the original timestamp
crimes = crimes.drop(['crime_date'], axis=1)
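The split above can also be done by parsing the timestamp strings only once and reusing the result. The following is a minimal sketch on a hypothetical two-row stand-in for the crimes dataframe (the `sample` and `parsed` names are illustrative, not part of the notebook). Note that `.dt.date` yields plain Python `date` objects, which matters later when the dataframe is merged with the weather data on the date column.

```python
import pandas as pd

# Hypothetical two-row stand-in for the crimes dataframe
sample = pd.DataFrame({
    'crime_date': ['2021-05-26 19:00:00', '2021-01-17 02:53:00'],
})

# Parse the strings once, then reuse the parsed Series for both new columns
parsed = pd.to_datetime(sample['crime_date'])
sample['date'] = parsed.dt.date   # Python date objects
sample['time'] = parsed.dt.time   # Python time objects

# Drop the original timestamp column
sample = sample.drop(columns=['crime_date'])
print(sample)
```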

In [131]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time
196321,fairbanks ct,deceptive practice,financial identity theft $300 and under,other (specify),False,False,8,41.8928,-87.62026,2021-12-16,14:00:00
76163,st lawrence ave,theft,$500 and under,street,False,False,50,41.688551,-87.608326,2021-06-01,04:00:00
4950,wells st,deceptive practice,financial identity theft $300 and under,residence,False,False,34,41.837389,-87.632864,2021-01-11,18:00:00


The <b>Arrest</b> and <b>Domestic</b> columns hold boolean values (True/False).  We will replace these values with Yes/No for readability.

In [132]:
# Convert each boolean column from its own values
crimes['arrest'] = crimes['arrest'].replace([True, False], ['Yes', 'No'])
crimes['domestic'] = crimes['domestic'].replace([True, False], ['Yes', 'No'])
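When several boolean columns get the same treatment, looping over the column names helps keep each column tied to its own source values. This is a small sketch on made-up data; the `flags` dataframe and the `yes_no` mapping are hypothetical names used only for illustration.

```python
import pandas as pd

# Made-up boolean flags standing in for the arrest and domestic columns
flags = pd.DataFrame({
    'arrest': [True, False, False],
    'domestic': [False, True, False],
})

# One mapping applied to every boolean column, each from its own values
yes_no = {True: 'Yes', False: 'No'}
for col in ['arrest', 'domestic']:
    flags[col] = flags[col].map(yes_no)

print(flags)
```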

In [133]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time
33662,foster ave,theft,$500 and under,street,No,No,3,41.976309,-87.660731,2021-03-13,13:30:00
73804,104th pl,criminal damage,to vehicle,vehicle non-commercial,No,No,49,41.704276,-87.619816,2021-05-28,19:00:00
6429,oakley blvd,theft,$500 and under,apartment,No,No,28,41.872194,-87.683744,2021-01-14,17:35:00


Each Chicago community area has its own name and <b>community area id</b>.  <br>
Below is a dataframe with the community area IDs, community names, population, area (sq. miles) and density (population per sq. mile).

In [134]:
community.sample(3)

Unnamed: 0,community_area_id,name,population,area_sq_mi,density
0,1,rogers park,55628,1.84,30232.61
58,59,mckinley park,15923,1.41,11292.91
38,39,kenwood,19116,1.04,18380.77


I will merge the two dataframes on the '<b>community_area_id</b>' column, similar to an SQL inner join.

In [135]:
merged = pd.merge(crimes, community, on='community_area_id')
crimes = merged
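Because an inner join silently drops rows whose key has no match, it can be worth checking how many rows would be lost before overwriting the crimes dataframe. The sketch below uses two tiny hypothetical dataframes (`crimes_demo` and `community_demo`, with an invented id 99 that has no match) and the `how`, `validate`, and `indicator` parameters of `pd.merge`.

```python
import pandas as pd

# Tiny hypothetical stand-ins; id 99 is invented so that one row has no match
crimes_demo = pd.DataFrame({
    'city_block': ['maple st', 'polk st', 'unknown rd'],
    'community_area_id': [8, 32, 99],
})
community_demo = pd.DataFrame({
    'community_area_id': [8, 32],
    'name': ['near north side', 'loop'],
})

# A left join with an indicator column shows which crime rows found no community.
# validate='many_to_one' raises if an id appears more than once in community_demo.
checked = pd.merge(crimes_demo, community_demo, on='community_area_id',
                   how='left', validate='many_to_one', indicator=True)
unmatched = checked[checked['_merge'] == 'left_only']
print(len(unmatched), 'row(s) without a matching community area')

# The inner join keeps only the matched rows
inner = pd.merge(crimes_demo, community_demo, on='community_area_id', how='inner')
print(inner)
```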

In [136]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time,name,population,area_sq_mi,density
106496,karlov ave,battery,domestic battery simple,apartment,No,No,23,41.904905,-87.728828,2021-06-26,16:00:00,humboldt park,54165,3.6,15045.83
137213,halsted st,assault,simple,sidewalk,No,No,7,41.925978,-87.648781,2021-09-08,12:30:00,lincoln park,70492,3.16,22307.59
141300,whipple st,theft,over $500,street,No,No,2,42.000178,-87.705824,2021-11-11,07:00:00,west ridge,77122,3.53,21847.59


With the two dataframes successfully merged, I will drop the <b>community_area_id</b> column, as it is no longer needed.  The <b>name</b> column keeps its current label for now and is renamed to <b>Community</b> in the final renaming step.

In [137]:
crimes = crimes.drop(['community_area_id'], axis=1)

In [138]:
crimes.sample()

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,latitude,longitude,date,time,name,population,area_sq_mi,density
5008,vernon ave,theft,$500 and under,apartment,No,No,41.746615,-87.613591,2021-01-30,19:30:00,chatham,31710,2.95,10749.15


Below is the weather dataframe with the day of the week, the daily high and low temperatures, and the precipitation recorded for each day.

In [139]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
282,wed,2021-3-24 0:00:00,58,46,0.01
321,sat,2021-2-13 0:00:00,12,-1,0.06
161,fri,2021-7-23 0:00:00,89,74,


Keep only the date portion of the timestamp, in <b>yyyy</b>-<b>mm</b>-<b>dd</b> form.

In [140]:
weather['date'] = pd.to_datetime(weather['date']).dt.date

In [141]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
231,fri,2021-05-14,71,48,
32,mon,2021-11-29,41,27,
310,wed,2021-02-24,45,32,


Replace the abbreviated days of the week with the full name of the day.

In [142]:
# Map each abbreviation to its full day name (whole-value replacement)
day_names = {'mon': 'Monday', 'tue': 'Tuesday', 'wed': 'Wednesday', 'thu': 'Thursday',
             'fri': 'Friday', 'sat': 'Saturday', 'sun': 'Sunday'}
weather['day_of_week'] = weather['day_of_week'].replace(day_names)
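As an alternative to translating the abbreviations, the full day name can be derived from the date itself. This is a sketch on a hypothetical three-row frame (`weather_demo`), using dates that appear in the weather sample above; it assumes the dates parse cleanly with `pd.to_datetime`.

```python
import pandas as pd

# Hypothetical three-row stand-in for the weather dataframe
weather_demo = pd.DataFrame({
    'date': ['2021-03-24', '2021-02-13', '2021-07-23'],
})

# Derive the weekday name directly from the parsed date
weather_demo['day_of_week'] = pd.to_datetime(weather_demo['date']).dt.day_name()
print(weather_demo)
```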

In [143]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
4,Monday,2021-12-27,46,32,
27,Saturday,2021-12-04,44,31,
7,Friday,2021-12-24,57,34,0.03


Replace '<b>NaN</b>' with '<b>No</b>' if there wasn't any precipitation that day.

In [144]:
weather['precipitation_in']=weather['precipitation_in'].fillna('No')
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
4,Monday,2021-12-27,46,32,No
267,Thursday,2021-04-08,69,52,0.33
201,Sunday,2021-06-13,87,66,No
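Filling the gaps with the string 'No' leaves the precipitation column holding a mix of numbers and text, which makes later arithmetic on it awkward. One possible alternative, sketched here on a hypothetical `weather_demo` frame, keeps the amounts numeric and records the yes/no information in a separate column. It assumes that a missing value means no precipitation was recorded, and the `had_precipitation` column name is invented for this example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the weather dataframe
weather_demo = pd.DataFrame({'precipitation_in': [0.01, np.nan, 0.33]})

# Assumption: a missing value means no precipitation was recorded that day
weather_demo['had_precipitation'] = weather_demo['precipitation_in'].notna()

# Keep the amounts numeric so sums and averages still work
weather_demo['precipitation_in'] = weather_demo['precipitation_in'].fillna(0.0)
print(weather_demo)
```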


Now we can merge the <b>weather</b> dataframe with the <b>crimes</b> dataframe on the <b>date</b> column.

In [145]:
merged = pd.merge(crimes, weather, on='date')
crimes = merged

In [146]:
crimes.sample()

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,latitude,longitude,date,time,name,population,area_sq_mi,density,day_of_week,high_temp_f,low_temp_f,precipitation_in
119320,kostner ave,criminal damage,to property,apartment,No,No,41.918259,-87.736363,2021-08-01,00:00:00,hermosa,24062,1.17,20565.81,Sunday,76,62,No


Reorder the columns

In [147]:
crimes = crimes[['date', 'time',  'day_of_week', 'high_temp_f', 'low_temp_f', 'precipitation_in', 'crime_primary_type', 'crime_primary_description', 'crime_location_description', 'arrest', 'domestic', 'city_block', 'name', 'population', 'area_sq_mi', 'density', 'latitude', 'longitude']]

In [148]:
crimes.sample()

Unnamed: 0,date,time,day_of_week,high_temp_f,low_temp_f,precipitation_in,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,city_block,name,population,area_sq_mi,density,latitude,longitude
138486,2021-09-06,00:00:00,Monday,82,61,No,battery,domestic battery simple,apartment,No,No,mason ave,austin,96557,7.15,13504.48,41.868564,-87.773011


Rename the columns

In [149]:
crimes.rename(columns = 
 	{'date' : 'Date',
 	'time' : 'Time',
 	'day_of_week' : 'Weekday',
 	'high_temp_f' : 'Hi (f)',
 	'low_temp_f' : 'Lo (f)',
 	'precipitation_in' : 'Precipitation',
 	'crime_primary_type' : 'Type',
 	'crime_primary_description' : 'Description',
 	'crime_location_description' : 'Location',
 	'arrest' : 'Arrest',
 	'domestic' : 'Domestic',
 	'city_block' : 'Street',
 	'name' : 'Community',
 	'population' : 'Population',
 	'area_sq_mi' : 'Area',
 	'density' : 'Density',
 	'latitude' : 'Latitude',
 	'longitude' : 'Longitude'},
	inplace=True)

In [150]:
crimes.sample()

Unnamed: 0,Date,Time,Weekday,Hi (f),Lo (f),Precipitation,Type,Description,Location,Arrest,Domestic,Street,Community,Population,Area,Density,Latitude,Longitude
7198,2021-01-13,23:30:00,Wednesday,41,31,No,criminal damage,to property,apartment,No,No,calumet ave,grand boulevard,24589,1.74,14131.61,41.812035,-87.618229


For better readability, I will capitalize the first letter of every word in the text columns using the str.title() method.

In [151]:
# Capitalize the first letter of every word
crimes['Street'] = crimes['Street'].str.title()
crimes['Type'] = crimes['Type'].str.title()
crimes['Description'] = crimes['Description'].str.title()
crimes['Location'] = crimes['Location'].str.title()
crimes['Community'] = crimes['Community'].str.title()
crimes['Weekday'] = crimes['Weekday'].str.title()

In [152]:
crimes.sample(3)

Unnamed: 0,Date,Time,Weekday,Hi (f),Lo (f),Precipitation,Type,Description,Location,Arrest,Domestic,Street,Community,Population,Area,Density,Latitude,Longitude
50648,2021-04-12,16:30:00,Monday,62,48,No,Deceptive Practice,Fraud Or Confidence Game,Apartment,No,No,Laflin St,New City,43628,4.83,9032.71,,
1340,2021-01-05,20:09:00,Tuesday,37,29,No,Theft,Retail Theft,Small Retail Store,No,No,Division St,West Town,87781,4.58,19166.16,41.903304,-87.670065
119333,2021-08-01,12:00:00,Sunday,76,62,No,Theft,Pocket-Picking,Park Property,No,No,Michigan Ave,Loop,42298,1.65,25635.15,41.873598,-87.624211


Top 10 Neighborhoods with the Most Reported Crimes in 2021

In [153]:
crimes[['Type', 'Community']].groupby(['Community'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)

Unnamed: 0,Community,Reported Crimes
5,Austin,11341
47,Near North Side,8126
65,South Shore,7272
49,Near West Side,6743
52,North Lawndale,6161
4,Auburn Gresham,5873
32,Humboldt Park,5767
29,Greater Grand Crossing,5545
75,West Town,5486
41,Loop,5446


In [154]:
df = crimes[['Type', 'Community']].groupby(['Community'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)
fig = px.bar(df, x = 'Community', y = 'Reported Crimes')
fig.update_traces(marker_color='slategrey')
fig.show()
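The table and chart above rank communities by the number of reported crimes. Because the merged dataframe also carries each community's population, a per-capita figure can be sketched as well. Using numbers already shown above, Austin's 11,341 reports for 96,557 residents is about 117 per 1,000, while the Loop's 5,446 reports for 42,298 residents is about 129 per 1,000, so the ordering can change once population is taken into account. The `crimes_per_1000` helper below is hypothetical and runs on toy rows, not on the real dataset.

```python
import pandas as pd

def crimes_per_1000(df):
    """Count rows per community and scale by that community's population."""
    out = (df.groupby('Community')
             .agg(reported=('Population', 'size'),
                  population=('Population', 'first'))
             .reset_index())
    out['per_1000'] = out['reported'] / out['population'] * 1000
    return out.sort_values('per_1000', ascending=False).reset_index(drop=True)

# Toy rows only: the populations come from the tables above, the row counts are made up
demo = pd.DataFrame({
    'Community': ['Austin', 'Austin', 'Austin', 'Loop', 'Loop'],
    'Population': [96557, 96557, 96557, 42298, 42298],
})

rates = crimes_per_1000(demo)
print(rates)
```

Applied to the full dataframe, the same helper would take `crimes` as its argument, assuming the 'Community' and 'Population' columns are named as in the renamed dataframe.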

Top 10 Reported Crimes in 2021

In [155]:
crimes[['Type', 'Community']].groupby(['Type'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)

Unnamed: 0,Type,Reported Crimes
2,Battery,39988
29,Theft,39758
5,Criminal Damage,24716
1,Assault,20086
8,Deceptive Practice,15710
22,Other Offense,13588
16,Motor Vehicle Theft,10410
30,Weapons Violation,8865
26,Robbery,7813
3,Burglary,6546
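The same kind of ranking can be written more compactly with `value_counts`, which counts and sorts in one step. This is a sketch on made-up rows; the `demo` and `top` names are illustrative only.

```python
import pandas as pd

# Made-up crime types standing in for the 'Type' column
demo = pd.DataFrame({
    'Type': ['Theft', 'Battery', 'Theft', 'Assault', 'Theft', 'Battery'],
})

# value_counts sorts by count in descending order by default
top = (demo['Type']
       .value_counts()
       .head(2)
       .rename_axis('Type')
       .reset_index(name='Reported Crimes'))
print(top)
```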


In [156]:
df = crimes[['Type', 'Community']].groupby(['Type'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)
fig = px.bar(df, x = 'Type', y = 'Reported Crimes')
fig.update_traces(marker_color='teal')
fig.show()

Top 10 Communities with the Most Homicides in 2021

In [157]:
murder_df = crimes[crimes['Type'] == 'Homicide']
murder_df[['Community']].groupby(['Community'])['Community'].count().reset_index(name='Homicides').sort_values('Homicides', ascending=False).head(10)

Unnamed: 0,Community,Homicides
5,Austin,70
43,North Lawndale,45
54,South Shore,39
24,Greater Grand Crossing,39
4,Auburn Gresham,39
60,West Garfield Park,38
62,West Pullman,37
27,Humboldt Park,34
19,Englewood,33
16,East Garfield Park,31


In [164]:
df = murder_df[['Community']].groupby(['Community'])['Community'].count().reset_index(name='Homicides').sort_values('Homicides', ascending=False).head(10)
fig = px.bar(df, x = 'Community', y = 'Homicides')
fig.update_traces(marker_color='maroon')
fig.show()
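With the weather columns merged in, the question raised in the introduction (how much weather affects reported crime) could be approached by counting reports per day and comparing that count with the daily high temperature. The following is only a sketch on toy rows: the dates and highs match rows shown earlier, but the number of rows per day is made up, so the resulting correlation says nothing about the real data. The `daily` and `corr` names are hypothetical.

```python
import pandas as pd

# Toy rows: one row per reported crime, with that day's high temperature
demo = pd.DataFrame({
    'Date': ['2021-01-13', '2021-01-13',
             '2021-08-01', '2021-08-01', '2021-08-01',
             '2021-04-12'],
    'Hi (f)': [41, 41, 76, 76, 76, 62],
})

# One row per day: number of reports and that day's high temperature
daily = (demo.groupby('Date')
             .agg(reported=('Hi (f)', 'size'), high=('Hi (f)', 'first'))
             .reset_index())

# Pearson correlation between daily report count and daily high
corr = daily['reported'].corr(daily['high'])
print(daily)
print(round(corr, 3))
```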