<h2>
	2021 Crimes in Chicago
</h2>

<p>
	This data reflects reported incidents of crime that have occurred in the City of Chicago during a specific time period. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.
</p>
<p>
	The original data contained the following columns.
	<ul>
		<li>DATE OF OCCURRENCE</li> 
		<li>CASE#</li> 	
		<li>BLOCK</li> 	
		<li>IUCR </li>	
		<li>PRIMARY DESCRIPTION</li> 	
		<li>SECONDARY DESCRIPTION</li> 	
		<li>LOCATION DESCRIPTION</li> 	
		<li>ARREST</li> 	
		<li>DOMESTIC</li> 	
		<li>BEAT</li> 	
		<li>WARD</li> 	
		<li>COMMUNITY AREA</li> 	
		<li>FBI CD</li> 	
		<li>X COORDINATE</li> 	
		<li>Y COORDINATE</li> 	
		<li>LATITUDE</li> 	
		<li>LONGITUDE</li>	
		<li>LOCATION</li>
	</ul>
</p>
<p>
	This data can be downloaded at <a href="https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Dashboard/5cd6-ry5g" target="_blank">https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Dashboard/5cd6-ry5g</a>
</p>
<p>
	The city of Chicago is divided into 77 community areas for statistical and planning purposes. Census data and other statistics are tied to the areas, which serve as the basis for a variety of urban planning initiatives on both the local and regional levels. The areas' boundaries do not generally change, allowing comparisons of statistics across time.
</p>
<p>
	I built a separate dataframe from the community area data to merge into the crimes dataframe.  This lets me identify each neighborhood by name.  The community area information
	can be found at <a href="https://en.wikipedia.org/wiki/Community_areas_in_Chicago" target="_blank">https://en.wikipedia.org/wiki/Community_areas_in_Chicago</a>
</p>
<p>
	I also scraped the weather for O'Hare International Airport for 2021 to see how much of an effect weather has on the crime rate.  <a href="http://www.wx-now.com/Weather/WxHistory" target="_blank">http://www.wx-now.com/Weather/WxHistory</a>
</p>

In [1]:
import numpy as np
import pandas as pd
import plotly.graph_objects as go
import plotly.express as px

After collecting the datasets, I removed, renamed and reorganized some of the columns in the crimes dataset using Excel.

In [2]:
# Import Chicago crimes dataset
crimes = pd.read_csv('./csv/chicago_crimes_2021.csv')
# Import Chicago weather dataset
weather = pd.read_csv('./csv/chicago_temps_2021.csv')
# Import community area dataset
community = pd.read_csv('./csv/chicago_areas.csv')

We can see a sample of what the current Crimes dataset contains.

In [3]:
crimes.sample(3)

Unnamed: 0,crime_date,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude
199597,2021-12-23 19:33:00,clyde ave,battery,aggravated - handgun,street,False,False,48,41.73453,-87.57323
110576,2021-7-28 22:40:00,karlov ave,battery,"agg. domestic battery - hands, fists, feet, se...",apartment,True,True,14,41.96323,-87.730318
159808,2021-9-20 10:00:00,noble st,theft,$500 and under,school - public grounds,False,False,24,41.90011,-87.662516


Let's split the 'crime_date' timestamp into separate 'date' and 'time' columns, then drop the original 'crime_date' column.

In [4]:
# Create new date and time column by extracting from timestamp
crimes['date'] = pd.to_datetime(crimes['crime_date']).dt.date
crimes['time'] = pd.to_datetime(crimes['crime_date']).dt.time
# Drop the original timestamp
crimes = crimes.drop(['crime_date'], axis=1)
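
The same split can be done while parsing the timestamp only once. This is a minimal sketch on a toy dataframe, not the real crimes file; the name `toy` and its two zero-padded timestamps are assumptions made for illustration.

```python
import pandas as pd

# Toy stand-in for the crimes dataframe (illustrative values only)
toy = pd.DataFrame({'crime_date': ['2021-12-23 19:33:00', '2021-09-20 10:00:00']})

# Parse the timestamp strings once, then derive both columns from the result
parsed = pd.to_datetime(toy['crime_date'])
toy['date'] = parsed.dt.date   # datetime.date objects
toy['time'] = parsed.dt.time   # datetime.time objects

# Drop the original timestamp column
toy = toy.drop(columns=['crime_date'])
print(toy)
```

Note that `.dt.date` yields plain Python date objects in an object column, which matters later when merging on the date.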

In [5]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time
150089,michigan ave,assault,simple,gas station,False,False,38,41.808833,-87.622824,2021-09-29,05:30:00
177221,ellis ave,deceptive practice,fraud or confidence game,apartment,False,False,69,41.752297,-87.60036,2021-11-12,13:09:00
188746,narragansett ave,criminal damage,to property,apartment,False,False,25,41.910006,-87.78503,2021-12-03,18:01:00


The <b>arrest</b> and <b>domestic</b> columns hold boolean values (True/False).  We will replace these values with Yes/No for readability.

In [6]:
# Convert each boolean column from its own values
crimes['arrest'] = crimes['arrest'].replace([True, False], ['Yes', 'No'])
crimes['domestic'] = crimes['domestic'].replace([True, False], ['Yes', 'No'])
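
A loop over the column names is one way to make sure each boolean column is converted from its own values. This is a sketch on a toy dataframe; the names `toy` and `yes_no` are assumptions, not part of the notebook.

```python
import pandas as pd

# Toy stand-in with the two boolean columns (illustrative values only)
toy = pd.DataFrame({
    'arrest':   [True, False, True],
    'domestic': [False, False, True],
})

# Hypothetical lookup table for the readable labels
yes_no = {True: 'Yes', False: 'No'}

for col in ['arrest', 'domestic']:
    # Each column is mapped from itself, so the two cannot be mixed up
    toy[col] = toy[col].map(yes_no)

print(toy)
```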

In [7]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time
153392,avers ave,theft,over $500,street,Yes,Yes,23,41.896155,-87.722439,2021-10-02,14:00:00
150037,prairie ave,criminal damage,criminal defacement,residence,No,No,44,41.739578,-87.617843,2021-09-29,18:45:00
1322,111th st,weapons violation,unlawful possession - handgun,sidewalk,Yes,Yes,49,41.692589,-87.622669,2021-01-01,20:07:00


Each Chicago neighborhood has a name and its own <b>community area id</b>.  <br>
Below is a dataframe with the community area IDs, community names, population, area (sq. miles) and density (population / area).

In [8]:
community.sample(3)

Unnamed: 0,community_area_id,name,population,area_sq_mi,density
31,32,loop,42298,1.65,25635.15
9,10,norwood park,38303,4.37,8764.99
4,5,north center,35114,2.05,17128.78


I will merge the two dataframes, similar to an SQL inner join, on the '<b>community_area_id</b>' column.

In [9]:
merged = pd.merge(crimes, community, on='community_area_id')
crimes = merged
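
Because an inner join silently drops rows whose key has no match, it can be worth checking how many rows would be lost. The sketch below uses toy dataframes; the id 999 is made up to force a mismatch, and the `toy_*` names are assumptions.

```python
import pandas as pd

toy_crimes = pd.DataFrame({
    'crime_primary_type': ['battery', 'theft', 'assault'],
    'community_area_id': [32, 10, 999],   # 999 is a made-up id with no match
})
toy_community = pd.DataFrame({
    'community_area_id': [32, 10, 5],
    'name': ['loop', 'norwood park', 'north center'],
})

# Inner join; validate raises if an id appears more than once on the right side
merged = pd.merge(toy_crimes, toy_community, on='community_area_id',
                  how='inner', validate='many_to_one')

# Left join with an indicator column to count crimes that found no community
check = pd.merge(toy_crimes, toy_community, on='community_area_id',
                 how='left', indicator=True)
unmatched = int((check['_merge'] == 'left_only').sum())

print(len(toy_crimes), len(merged), unmatched)
```

If `unmatched` is zero on the real data, the inner join loses nothing.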

In [10]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time,name,population,area_sq_mi,density
2590,shields ave,weapons violation,unlawful possession - handgun,street,Yes,Yes,68,41.787768,-87.634031,2021-07-21,21:40:00,englewood,24369,3.07,7937.79
158869,cortland st,robbery,armed - handgun,street,No,No,20,41.915207,-87.737909,2021-10-27,07:50:00,hermosa,24062,1.17,20565.81
106387,lawndale ave,weapons violation,unlawful possession - handgun,sidewalk,Yes,Yes,23,41.901886,-87.718968,2021-06-19,10:55:00,humboldt park,54165,3.6,15045.83


With the two dataframes successfully merged, I will drop the <b>community_area_id</b> column as it is no longer needed.  The <b>name</b> column keeps its name for now and is renamed together with the other columns further down.

In [11]:
crimes = crimes.drop(['community_area_id'], axis=1)

In [12]:
crimes.sample()

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,latitude,longitude,date,time,name,population,area_sq_mi,density
93679,50th pl,criminal damage,to property,residence - yard (front / back),Yes,Yes,41.803058,-87.61337,2021-08-20,18:37:00,grand boulevard,24589,1.74,14131.61


Below is a dataframe with the day of the week, the daily high and low temperatures (°F), and the precipitation (inches), if any, for that day.

In [13]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
5,sun,2021-12-26 0:00:00,43,26,0.69
128,wed,2021-8-25 0:00:00,94,71,0.39
25,mon,2021-12-6 0:00:00,38,15,


Strip the time from the timestamp, leaving only the date as <b>yyyy</b>-<b>mm</b>-<b>dd</b>.

In [14]:
weather['date'] = pd.to_datetime(weather['date']).dt.date

In [15]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
131,sun,2021-08-22,83,70,
159,sun,2021-07-25,90,72,
211,thu,2021-06-03,87,56,


Replace the abbreviated days of the week with the full name of the day.

In [16]:
weather['day_of_week'] = weather['day_of_week'].str.replace('mon', 'Monday')
weather['day_of_week'] = weather['day_of_week'].str.replace('tue', 'Tuesday')
weather['day_of_week'] = weather['day_of_week'].str.replace('wed', 'Wednesday')
weather['day_of_week'] = weather['day_of_week'].str.replace('thu', 'Thursday')
weather['day_of_week'] = weather['day_of_week'].str.replace('fri', 'Friday')
weather['day_of_week'] = weather['day_of_week'].str.replace('sat', 'Saturday')
weather['day_of_week'] = weather['day_of_week'].str.replace('sun', 'Sunday')
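
As an alternative to replacing the abbreviations one by one, the full weekday name can be derived from the date itself. This is a sketch on a toy dataframe, assuming the dates are valid calendar dates; `toy_weather` is a hypothetical name.

```python
import pandas as pd

# Three dates taken from the weather sample shown earlier
toy_weather = pd.DataFrame({'date': ['2021-12-26', '2021-08-25', '2021-12-06']})

# day_name() returns the full English weekday name for each date
toy_weather['day_of_week'] = pd.to_datetime(toy_weather['date']).dt.day_name()

print(toy_weather)
```

This also guards against a mislabeled weekday in the scraped data, since the name is computed rather than copied.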

In [17]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
228,Monday,2021-05-17,70,56,
198,Wednesday,2021-06-16,77,56,
43,Thursday,2021-11-18,40,32,


Replace '<b>NaN</b>' with '<b>0.0</b>' on days without any recorded precipitation.

In [18]:
weather['precipitation_in']=weather['precipitation_in'].fillna(0.0)
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
195,Saturday,2021-06-19,90,66,0.0
210,Friday,2021-06-04,91,71,0.0
284,Monday,2021-03-22,68,46,0.0


Now we can merge the <b>weather</b> dataframe with the <b>crimes</b> dataframe on the <b>date</b> column.

In [19]:
merged = pd.merge(crimes, weather, on='date')
crimes = merged
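
Merging on the date only works if both columns hold the same kind of value, here plain Python date objects produced by `.dt.date`. The sketch below, on toy data with assumed `toy_*` names, also checks that no crime rows are lost or duplicated by the join.

```python
import pandas as pd

toy_crimes = pd.DataFrame({
    'crime_primary_type': ['theft', 'assault'],
    'date': pd.to_datetime(['2021-05-03', '2021-04-16']).date,
})
toy_weather = pd.DataFrame({
    'date': pd.to_datetime(['2021-04-16', '2021-05-03']).date,
    'high_temp_f': [57, 73],
})

# validate raises if the weather table has more than one row per date
merged = pd.merge(toy_crimes, toy_weather, on='date',
                  how='inner', validate='many_to_one')

# Every crime should pick up exactly one weather row
rows_preserved = len(merged) == len(toy_crimes)
print(merged, rows_preserved)
```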

In [20]:
crimes.sample()

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,latitude,longitude,date,time,name,population,area_sq_mi,density,day_of_week,high_temp_f,low_temp_f,precipitation_in
62187,crystal st,theft,$500 and under,apartment,No,No,41.9039,-87.693245,2021-05-03,17:30:00,west town,87781,4.58,19166.16,Monday,73,58,0.06


Reorder the columns

In [21]:
crimes = crimes[['date', 'time',  'day_of_week', 'high_temp_f', 'low_temp_f', 'precipitation_in', 'crime_primary_type', 'crime_primary_description', 'crime_location_description', 'arrest', 'domestic', 'city_block', 'name', 'population', 'area_sq_mi', 'density', 'latitude', 'longitude']]

In [22]:
crimes.sample()

Unnamed: 0,date,time,day_of_week,high_temp_f,low_temp_f,precipitation_in,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,city_block,name,population,area_sq_mi,density,latitude,longitude
52960,2021-04-16,00:00:00,Friday,57,38,0.0,assault,simple,apartment,No,No,fulton st,austin,96557,7.15,13504.48,41.885615,-87.757023


Rename the columns

In [23]:
crimes.rename(columns = 
 	{'date' : 'Date',
 	'time' : 'Time',
 	'day_of_week' : 'Weekday',
 	'high_temp_f' : 'Hi (f)',
 	'low_temp_f' : 'Lo (f)',
 	'precipitation_in' : 'Precipitation',
 	'crime_primary_type' : 'Type',
 	'crime_primary_description' : 'Description',
 	'crime_location_description' : 'Location',
 	'arrest' : 'Arrest',
 	'domestic' : 'Domestic',
 	'city_block' : 'Street',
 	'name' : 'Community',
 	'population' : 'Population',
 	'area_sq_mi' : 'Area',
 	'density' : 'Density',
 	'latitude' : 'Latitude',
 	'longitude' : 'Longitude'},
	inplace=True)

In [24]:
crimes.sample()

Unnamed: 0,Date,Time,Weekday,Hi (f),Lo (f),Precipitation,Type,Description,Location,Arrest,Domestic,Street,Community,Population,Area,Density,Latitude,Longitude
69203,2021-05-16,06:00:00,Sunday,72,55,0.0,other offense,telephone threat,street,No,No,sacramento blvd,east garfield park,19992,1.93,10358.55,41.873896,-87.701017


For better readability, I will capitalize the first letter of every word in the text columns using the str.title() method.

In [25]:
# Capitalize the first letter of every word in each text column
crimes['Street'] = crimes['Street'].str.title()
crimes['Type'] = crimes['Type'].str.title()
crimes['Description'] = crimes['Description'].str.title()
crimes['Community'] = crimes['Community'].str.title()
crimes['Weekday'] = crimes['Weekday'].str.title()
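
One caveat: Python's `str.title()` starts a new word after any non-letter, so numbered streets such as '111th st' come out as '111Th St'. As far as I know, pandas' `.str.title()` follows the same rule for ordinary object-dtype columns. The sketch below shows the effect and one possible workaround; `capitalize_words` is a hypothetical helper, not a pandas function.

```python
import pandas as pd

streets = pd.Series(['111th st', '50th pl', 'lincoln ave'])

# Built-in title-casing: the letter after a digit is treated as a word start
titled = [s.title() for s in streets]

def capitalize_words(text):
    """Hypothetical helper: capitalize each space-separated word as a whole."""
    return ' '.join(word.capitalize() for word in text.split(' '))

# Words that begin with a digit keep their lowercase suffix
fixed = streets.map(capitalize_words)

print(titled)
print(fixed.tolist())
```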

In [26]:
crimes.sample(3)

Unnamed: 0,Date,Time,Weekday,Hi (f),Lo (f),Precipitation,Type,Description,Location,Arrest,Domestic,Street,Community,Population,Area,Density,Latitude,Longitude
182108,2021-11-15,13:00:00,Monday,36,30,0.01,Theft,Over $500,cleaning store,No,No,Lincoln Ave,Lincoln Square,40494,2.56,15817.97,41.978986,-87.692652
142777,2021-09-12,07:00:00,Sunday,87,68,0.0,Battery,Domestic Battery Simple,residence,Yes,Yes,Houston Ave,Hegewisch,10027,5.24,1913.55,41.658243,-87.548365
171988,2021-10-30,19:00:00,Saturday,59,45,0.0,Other Offense,Harassment By Electronic Means,apartment,No,No,Hill St,Near North Side,105481,2.74,38496.72,41.902095,-87.634993


Top 10 Neighborhoods by Number of Reported Crimes in 2021

In [27]:
crimes[['Type', 'Community']].groupby(['Community'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)

Unnamed: 0,Community,Reported Crimes
5,Austin,11341
47,Near North Side,8126
65,South Shore,7272
49,Near West Side,6743
52,North Lawndale,6161
4,Auburn Gresham,5873
32,Humboldt Park,5767
29,Greater Grand Crossing,5545
75,West Town,5486
41,Loop,5446


In [42]:
df = crimes[['Type', 'Community']].groupby(['Community'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)
fig = px.bar(df, y = 'Community', x = 'Reported Crimes', orientation='h')
fig.update_traces(marker_color='slategrey')
fig.update_layout(yaxis={'categoryorder':'total ascending'})
fig.show()
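
The figures above are raw counts, so larger communities naturally rank higher. A per-capita rate is one way to adjust for that. This is a minimal sketch, assuming "rate" means reported crimes per 1,000 residents; the counts and populations are copied from tables in this notebook, and the column name 'Crimes per 1,000' is made up for illustration.

```python
import pandas as pd

# Counts and populations for three communities, copied from the tables shown here
toy = pd.DataFrame({
    'Community': ['Austin', 'Loop', 'West Town'],
    'Reported Crimes': [11341, 5446, 5486],
    'Population': [96557, 42298, 87781],
})

# Reported crimes per 1,000 residents (hypothetical column name)
toy['Crimes per 1,000'] = (toy['Reported Crimes'] / toy['Population'] * 1000).round(1)

ranked = toy.sort_values('Crimes per 1,000', ascending=False)
print(ranked)
```

On the full dataframe, the same idea would need one population value per community, for example by grouping on Community and taking the first Population next to the count.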

Top 10 Reported Crimes in 2021

In [29]:
crimes[['Type', 'Community']].groupby(['Type'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)

Unnamed: 0,Type,Reported Crimes
2,Battery,39988
29,Theft,39758
5,Criminal Damage,24716
1,Assault,20086
8,Deceptive Practice,15710
22,Other Offense,13588
16,Motor Vehicle Theft,10410
30,Weapons Violation,8865
26,Robbery,7813
3,Burglary,6546


In [30]:
df = crimes[['Type', 'Community']].groupby(['Type'])['Type'].count().reset_index(name='Reported Crimes').sort_values('Reported Crimes', ascending=False).head(10)
fig = px.bar(df, x = 'Type', y = 'Reported Crimes')
fig.update_traces(marker_color='teal')
fig.show()

Top 10 Communities by Number of Homicides in 2021

In [31]:
murder_df = crimes[crimes['Type'] == 'Homicide']
murder_df[['Community']].groupby(['Community'])['Community'].count().reset_index(name='Homicides').sort_values('Homicides', ascending=False).head(10)

Unnamed: 0,Community,Homicides
5,Austin,70
43,North Lawndale,45
54,South Shore,39
24,Greater Grand Crossing,39
4,Auburn Gresham,39
60,West Garfield Park,38
62,West Pullman,37
27,Humboldt Park,34
19,Englewood,33
16,East Garfield Park,31


In [32]:
df = murder_df[['Community']].groupby(['Community'])['Community'].count().reset_index(name='Homicides').sort_values('Homicides', ascending=False).head(10)
fig = px.bar(df, x = 'Community', y = 'Homicides')
fig.update_traces(marker_color='maroon')
fig.show()
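
The filter-then-count pattern used for the homicide table can also be written with `value_counts()`, which sorts in descending order by default. This is a sketch on toy rows; the data and the name `toy` are assumptions.

```python
import pandas as pd

# Toy stand-in for the cleaned crimes dataframe (illustrative rows only)
toy = pd.DataFrame({
    'Type': ['Homicide', 'Theft', 'Homicide', 'Homicide', 'Battery'],
    'Community': ['Austin', 'Austin', 'North Lawndale', 'Austin', 'Loop'],
})

homicides = (
    toy.loc[toy['Type'] == 'Homicide', 'Community']  # keep homicide rows only
       .value_counts()                               # count per community, largest first
       .rename_axis('Community')
       .reset_index(name='Homicides')
       .head(10)
)
print(homicides)
```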