<h2>
	2021 Crimes in Chicago
</h2>

<p>
	This data reflects reported incidents of crime that have occurred in the City of Chicago during a specific time period. Data is extracted from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.
</p>
<p>
	The original data contained the following columns.
	<ul>
		<li>DATE OF OCCURRENCE</li> 
		<li>CASE#</li> 	
		<li>BLOCK</li> 	
		<li>IUCR </li>	
		<li>PRIMARY DESCRIPTION</li> 	
		<li>SECONDARY DESCRIPTION</li> 	
		<li>LOCATION DESCRIPTION</li> 	
		<li>ARREST</li> 	
		<li>DOMESTIC</li> 	
		<li>BEAT</li> 	
		<li>WARD</li> 	
		<li>COMMUNITY AREA</li> 	
		<li>FBI CD</li> 	
		<li>X COORDINATE</li> 	
		<li>Y COORDINATE</li> 	
		<li>LATITUDE</li> 	
		<li>LONGITUDE</li>	
		<li>LOCATION</li>
	</ul>
</p>
<p>
	This data can be downloaded at <a href="https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Dashboard/5cd6-ry5g" target="_blank">https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present-Dashboard/5cd6-ry5gcrime.html</a>
</p>
<p>
	The city of Chicago is divided into 77 community areas for statistical and planning purposes. Census data and other statistics are tied to the areas, which serve as the basis for a variety of urban planning initiatives on both the local and regional levels. The areas' boundaries do not generally change, allowing comparisons of statistics across time.
</p>
<p>
	I have taken the community area data and have made a new dataframe to merge into the crimes dataframe.  This will allow me to identify the neighborhood by name.  This information
	can be found at <a href="https://en.wikipedia.org/wiki/Community_areas_in_Chicago" target="_blank">https://en.wikipedia.org/wiki/Community_areas_in_Chicago</a>
</p>
<p>
	I also scraped the weather for O'Hare International Airport for 2021 to see how much of an effect weather has on the crime rate.  <a href="http://www.wx-now.com/Weather/WxHistory" target="_blank">http://www.wx-now.com/Weather/WxHistory</a>
</p>

In [263]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

After collecting the datasets, I removed, renamed and reorganized some of the columns from the crimes dataset using excel.

In [264]:
# Import Chicago crimes dataset
crimes = pd.read_csv('./csv/chicago_crimes_2021.csv')
# Import Chicago weather dataset
weather = pd.read_csv('./csv/chicago_temps_2021.csv')
# Import community area dataset
community = pd.read_csv('./csv/chicago_areas.csv')

We can see a sample of what the current Crimes dataset contains.

In [265]:
crimes.sample(3)

Unnamed: 0,crime_date,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude
189589,2021-12-4 21:20:00,washington st,public peace violation,reckless conduct,convenience store,True,False,32,41.883225,-87.625037
195542,2021-1-3 18:52:00,83rd pl,battery,domestic battery simple,residence,False,True,70,41.741365,-87.715425
200293,2021-12-23 22:55:00,emerald ave,homicide,first degree murder,street,False,False,68,41.793289,-87.643917


Let's split the timestamp into seperate columns of 'date' and 'time'.  Then drop the original 'timestamp' and reorder the columns.

In [266]:
# Create new date and time column by extracting from timestamp
crimes['date'] = pd.to_datetime(crimes['crime_date']).dt.date
crimes['time'] = pd.to_datetime(crimes['crime_date']).dt.time
# Drop the original timestamp
crimes = crimes.drop(['crime_date'], axis=1)

In [267]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time
74453,pulaski rd,narcotics,possess - cocaine,alley,True,False,30,41.840076,-87.724437,2021-05-29,21:10:00
128394,sacramento ave,assault,simple,street,False,False,63,41.78769,-87.69851,2021-08-26,22:30:00
198461,burley ave,criminal damage,to vehicle,street,False,False,46,41.735089,-87.545781,2021-12-19,22:00:00


The <b>Arrest</b> and the <b>Domestic</b> column are represented in boolean values (True/False).  We will replace these values with Yes/No for readability.

In [268]:
crimes['arrest']=crimes['arrest'].replace([True, False], ['Yes', 'No'])
crimes['domestic']=crimes['arrest'].replace([True, False], ['Yes', 'No'])

In [269]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time
144144,arthington st,deceptive practice,financial identity theft over $ 300,apartment,No,No,26,41.86983,-87.723615,2021-09-20,10:00:00
169693,sacramento ave,assault,simple,sidewalk,No,No,30,41.850975,-87.700329,2021-10-31,18:00:00
91801,kingston ave,battery,domestic battery simple,apartment,No,No,46,41.740108,-87.56241,2021-06-27,06:00:00


Chicago neighborhoods are specifically named and have their own <b>community area id</b>.  <br>
Below is a dataframe with the community id's,  community names, population, area (sq. miles) and density (population / area)

In [270]:
community.sample(3)

Unnamed: 0,community_area_id,name,population,area_sq_mi,density
52,53,west pullman,26104,3.56,7332.58
71,72,beverly,20027,3.18,6297.8
76,77,edgewater,56296,1.74,32354.02


I will merge the two dataframes similiar to an SQL inner join using the '<b>community_area_id</b>' column.

In [271]:
merged = pd.merge(crimes, community, on='community_area_id')
crimes = merged

In [272]:
crimes.sample(3)

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,community_area_id,latitude,longitude,date,time,name,population,area_sq_mi,density
198448,rockwell st,deceptive practice,financial identity theft $300 and under,,No,No,58,41.81802,-87.6896,2021-11-18,19:00:00,brighton park,45053,2.72,16563.6
68117,yale ave,burglary,forcible entry,restaurant,No,No,69,41.755849,-87.63119,2021-12-27,03:41:00,greater grand crossing,31471,3.55,8865.07
27939,kingston ave,other offense,telephone threat,residence,No,No,43,41.756691,-87.562688,2021-04-12,10:34:00,south shore,53971,2.93,18420.14


With the two dataframes successfully merged, I will rename the <b>name</b> to <b>community_name</b>.  I will also drop the <b>community_area_id</b> as it is no longer needed.

In [273]:
crimes.rename(columns={'name':'community_name'})
crimes = crimes.drop(['community_area_id'], axis=1)

In [274]:
crimes.sample()

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,latitude,longitude,date,time,name,population,area_sq_mi,density
85804,independence blvd,criminal damage,to property,apartment,No,No,41.872277,-87.72032,2021-04-28,03:00:00,west garfield park,17433,1.28,13619.53


Below is a dataframe with the day of the week, temperature high's, low's and if there was any precipitation that day.

In [275]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
42,fri,2021-11-19 0:00:00,40,26,
83,sat,2021-10-9 0:00:00,80,64,
179,mon,2021-7-5 0:00:00,91,73,


Split the date from the timestamp and leave <b>yyyy</b>-<b>mm</b>-<b>dd</b>

In [276]:
weather['date'] = pd.to_datetime(weather['date']).dt.date

In [277]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
269,tue,2021-04-06,81,59,
324,wed,2021-02-10,16,6,0.03
299,sun,2021-03-07,52,26,


Replace the abbreviated days of the week with the full name of the day.

In [278]:
weather['day_of_week'] = weather['day_of_week'].str.replace('mon', 'Monday')
weather['day_of_week'] = weather['day_of_week'].str.replace('tue', 'Tuesday')
weather['day_of_week'] = weather['day_of_week'].str.replace('wed', 'Wednesday')
weather['day_of_week'] = weather['day_of_week'].str.replace('thu', 'Thursday')
weather['day_of_week'] = weather['day_of_week'].str.replace('fri', 'Friday')
weather['day_of_week'] = weather['day_of_week'].str.replace('sat', 'Saturday')
weather['day_of_week'] = weather['day_of_week'].str.replace('sun', 'Sunday')

In [279]:
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
33,Sunday,2021-11-28,41,27,
116,Monday,2021-09-06,82,61,
137,Monday,2021-08-16,80,63,


Replace '<b>NaN</b>' with '<b>No</b>' if there wasnt any precipitation that day.

In [280]:
weather['precipitation_in']=weather['precipitation_in'].fillna('No')
weather.sample(3)

Unnamed: 0,day_of_week,date,high_temp_f,low_temp_f,precipitation_in
86,Wednesday,2021-10-06,68,64,0.03
324,Wednesday,2021-02-10,16,6,0.03
24,Tuesday,2021-12-07,26,13,No


Now we can merge the <b>weather</b> dataframe with the <b>crime</b> dataframe by the <b>date</b> column

In [281]:
merged = pd.merge(crimes, weather, on='date')
crimes = merged

In [282]:
crimes.sample()

Unnamed: 0,city_block,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,latitude,longitude,date,time,name,population,area_sq_mi,density,day_of_week,high_temp_f,low_temp_f,precipitation_in
174004,roscoe st,theft,$500 and under,sidewalk,No,No,41.943337,-87.643519,2021-11-02,17:55:00,lake view,103050,3.12,33028.85,Tuesday,47,32,No


Reorder the columns

In [283]:
crimes = crimes[['date', 'time',  'day_of_week', 'high_temp_f', 'low_temp_f', 'precipitation_in', 'crime_primary_type', 'crime_primary_description', 'crime_location_description', 'arrest', 'domestic', 'city_block', 'name', 'population', 'area_sq_mi', 'density', 'latitude', 'longitude']]

In [284]:
crimes.sample()

Unnamed: 0,date,time,day_of_week,high_temp_f,low_temp_f,precipitation_in,crime_primary_type,crime_primary_description,crime_location_description,arrest,domestic,city_block,name,population,area_sq_mi,density,latitude,longitude
199815,2021-12-20,21:15:00,Monday,42,30,No,motor vehicle theft,automobile,street,No,No,wabash ave,loop,42298,1.65,25635.15,41.868416,-87.625847


Rename the columns

In [285]:
crimes.rename(columns = 
 	{'date' : 'Date',
 	'time' : 'Time',
 	'day_of_week' : 'Weekday',
 	'high_temp_f' : 'Hi (f)',
 	'low_temp_f' : 'Lo (f)',
 	'precipitation_in' : 'Precipitation',
 	'crime_primary_type' : 'Type',
 	'crime_primary_description' : 'Description',
 	'crime_location_description' : 'Location',
 	'arrest' : 'Arrest',
 	'domestic' : 'Domestic',
 	'city_block' : 'Street',
 	'name' : 'Community',
 	'population' : 'Population',
 	'area_sq_mi' : 'Area',
 	'density' : 'Density',
 	'latitude' : 'Latitude',
 	'longitude' : 'Longitude'},
	inplace=True)

In [286]:
crimes.sample()

Unnamed: 0,Date,Time,Weekday,Hi (f),Lo (f),Precipitation,Type,Description,Location,Arrest,Domestic,Street,Community,Population,Area,Density,Latitude,Longitude
29734,2021-02-27,12:14:00,Saturday,52,33,No,weapons violation,unlawful possession - handgun,street,Yes,Yes,kilbourn ave,west garfield park,17433,1.28,13619.53,41.876639,-87.73779


For better readability, I will capitilize the first letter of every word in the columns using the title() function.

In [287]:
# Capitilize the first letter of every word
crimes['Street'] = crimes['Street'].str.title()
crimes['Type'] = crimes['Type'].str.title()
crimes['Description'] = crimes['Description'].str.title()
crimes['Description'] = crimes['Description'].str.title()
crimes['Community'] = crimes['Community'].str.title()
crimes['Weekday'] = crimes['Weekday'].str.title()

In [288]:
crimes.sample(3)

Unnamed: 0,Date,Time,Weekday,Hi (f),Lo (f),Precipitation,Type,Description,Location,Arrest,Domestic,Street,Community,Population,Area,Density,Latitude,Longitude
93284,2021-06-24,02:11:00,Thursday,76,68,1.49,Robbery,Aggravated Vehicular Hijacking,street,No,No,Van Buren St,Near West Side,67881,5.69,11929.88,41.876523,-87.656847
193197,2021-12-07,14:34:00,Tuesday,26,13,No,Robbery,Aggravated Vehicular Hijacking,street,No,No,37Th St,Bridgeport,33702,2.09,16125.36,41.827196,-87.639014
83267,2021-06-07,09:53:00,Monday,80,71,No,Assault,Simple,residence,No,No,Manistee Ave,South Chicago,27300,3.34,8173.65,41.736372,-87.556944


Top 20 reported crimes in the Chicago during 2021

In [289]:
crimes['Type'].value_counts().iloc[: 20]

Battery                             39988
Theft                               39758
Criminal Damage                     24716
Assault                             20086
Deceptive Practice                  15710
Other Offense                       13588
Motor Vehicle Theft                 10410
Weapons Violation                    8865
Robbery                              7813
Burglary                             6546
Narcotics                            4072
Criminal Trespass                    3367
Offense Involving Children           1839
Criminal Sexual Assault              1428
Sex Offense                          1036
Homicide                              803
Public Peace Violation                596
Arson                                 515
Stalking                              356
Interference With Public Officer      307
Name: Type, dtype: int64

Top 10 days of the year with the most reported crime

In [290]:
crimes['Date'].value_counts().iloc[: 10]

2021-07-31    779
2021-10-01    752
2021-06-06    715
2021-08-01    713
2021-06-19    710
2021-09-19    707
2021-07-29    704
2021-01-01    701
2021-10-02    695
2021-06-20    692
Name: Date, dtype: int64

Neighborhoods with the most reported crimes.

In [291]:
crimes['Community'].value_counts().iloc[: 10]

Austin                    11341
Near North Side            8126
South Shore                7272
Near West Side             6743
North Lawndale             6161
Auburn Gresham             5873
Humboldt Park              5767
Greater Grand Crossing     5545
West Town                  5486
Loop                       5446
Name: Community, dtype: int64