# Chicago Crime Trends Analysis

## by Justin Sierchio

This Jupyter Notebook examines crime statistics for Ward 11 of the city of Chicago, Illinois, from 2001 to the present.

This data is in .csv file format and is sourced from the Chicago Police Department. It can be found at: https://data.cityofchicago.org/api/views/srg9-gsb8/rows.csv?accessType=DOWNLOAD. Additional related information can be found at: https://data.cityofchicago.org/Public-Safety/bridgeport-crime-by-longitude-latitude-location/srg9-gsb8.

## Notebook Initialization

In [1]:
# Import Relevant Libraries
import pandas as pd
import numpy as np
import seaborn as sns 
import matplotlib.pyplot as plt

print('Initial libraries loaded into workspace!')

Initial libraries loaded into workspace!


Since we will be making geospatial data visualizations, we also need to install the 'folium' package.

In [2]:
# Install folium into local Jupyter Notebook
!pip install folium

print('Successfully installed folium package!')

Successfully installed folium package!
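Re-running the install cell is harmless but slow. As a minimal sketch (the helper name `is_installed` is hypothetical and not part of the original notebook), one could first check whether a package is already importable using only the standard library:

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


# 'json' ships with Python, so it serves as a safe demonstration here;
# in the notebook one would pass 'folium' instead.
print(is_installed("json"))
```

If the check returns `False` for `'folium'`, the install cell above still needs to be run.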


In [3]:
# Load the dataset for study
df_CHI = pd.read_csv("bridgeport_crime_by_longitude_latitude_location.csv")

print('Datasets uploaded!')

Datasets uploaded!


In [4]:
# Open Chicago Crime Statistics dataset and display 1st 5 rows
df_CHI.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,Ward,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
0,7227048,HR642042,11/13/2008 10:22:00 PM,007XX W 31ST ST,860,THEFT,RETAIL THEFT,DRUG STORE,True,False,924,11,6,,,2008,08/17/2015 03:03:40 PM,,,
1,6689411,HR104533,01/03/2009 08:12:13 PM,007XX W 31ST ST,2091,NARCOTICS,FORFEIT PROPERTY,STREET,False,False,924,11,26,,,2009,08/17/2015 03:03:40 PM,,,
2,7732636,HS538339,07/01/2008 12:01:00 AM,007XX W 31ST ST,840,THEFT,FINANCIAL ID THEFT: OVER $300,RESIDENCE,False,False,924,11,6,,,2008,08/17/2015 03:03:40 PM,,,
3,11225422,JB144398,12/03/2017 05:42:00 PM,007XX W 31ST ST,810,THEFT,OVER $500,SMALL RETAIL STORE,True,False,915,11,6,,,2017,02/08/2018 03:51:54 PM,,,
4,7100352,HR508891,08/29/2009 12:01:00 AM,007XX W 31ST ST,860,THEFT,RETAIL THEFT,CONVENIENCE STORE,False,False,924,11,6,,,2009,08/17/2015 03:03:40 PM,,,


## Data Cleaning

First, let us get an understanding of the characteristics of this dataset.

In [5]:
# Characteristics of the Chicago Crime Statistics dataset
df_CHI.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 774 entries, 0 to 773
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   ID                    774 non-null    int64  
 1   Case Number           774 non-null    object 
 2   Date                  774 non-null    object 
 3   Block                 774 non-null    object 
 4   IUCR                  774 non-null    object 
 5   Primary Type          774 non-null    object 
 6   Description           774 non-null    object 
 7   Location Description  774 non-null    object 
 8   Arrest                774 non-null    bool   
 9   Domestic              774 non-null    bool   
 10  Beat                  774 non-null    int64  
 11  Ward                  774 non-null    int64  
 12  FBI Code              774 non-null    object 
 13  X Coordinate          768 non-null    float64
 14  Y Coordinate          768 non-null    float64
 15  Year                  7

Let's find out how many 'NULL' values we have in the dataset.

In [6]:
# Find 'NULL' values in Chicago Crime Statistics dataset
df_CHI.isnull().sum()

ID                      0
Case Number             0
Date                    0
Block                   0
IUCR                    0
Primary Type            0
Description             0
Location Description    0
Arrest                  0
Domestic                0
Beat                    0
Ward                    0
FBI Code                0
X Coordinate            6
Y Coordinate            6
Year                    0
Updated On              0
Latitude                6
Longitude               6
Location                6
dtype: int64

Let's remove all the 'NULL' and 'NA' values in the rows of this dataset.

In [7]:
# Remove 'NULL' rows from Chicago Crime Statistics dataset
df_CHI1 = df_CHI.dropna()

# Confirm all 'NULL' rows and columns removed
df_CHI1.isnull().sum()

ID                      0
Case Number             0
Date                    0
Block                   0
IUCR                    0
Primary Type            0
Description             0
Location Description    0
Arrest                  0
Domestic                0
Beat                    0
Ward                    0
FBI Code                0
X Coordinate            0
Y Coordinate            0
Year                    0
Updated On              0
Latitude                0
Longitude               0
Location                0
dtype: int64
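Note that `dropna()` with no arguments drops a row if any column is missing. Here that is fine, since only the coordinate columns contain nulls. The sketch below uses a small made-up frame (not the real dataset; the `Note` column is purely illustrative) to show how the `subset` argument would restrict the check to the coordinate columns if other columns had gaps:

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the crime data (values are illustrative only)
toy = pd.DataFrame({
    "ID": [1, 2, 3],
    "Latitude": [41.838055, np.nan, 41.838053],
    "Longitude": [-87.645458, np.nan, -87.645565],
    "Note": [None, "missing coordinates", "complete"],
})

# Default behaviour: drop a row if ANY column is missing
dropped_any = toy.dropna()

# Restrict the check to the coordinate columns only
dropped_coords = toy.dropna(subset=["Latitude", "Longitude"])

print(len(dropped_any), len(dropped_coords))
```

With `subset`, the first row survives even though its `Note` is empty, because only the coordinates are checked.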

In [8]:
# Confirm Data Types
df_CHI1.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 768 entries, 5 to 773
Data columns (total 20 columns):
 #   Column                Non-Null Count  Dtype  
---  ------                --------------  -----  
 0   ID                    768 non-null    int64  
 1   Case Number           768 non-null    object 
 2   Date                  768 non-null    object 
 3   Block                 768 non-null    object 
 4   IUCR                  768 non-null    object 
 5   Primary Type          768 non-null    object 
 6   Description           768 non-null    object 
 7   Location Description  768 non-null    object 
 8   Arrest                768 non-null    bool   
 9   Domestic              768 non-null    bool   
 10  Beat                  768 non-null    int64  
 11  Ward                  768 non-null    int64  
 12  FBI Code              768 non-null    object 
 13  X Coordinate          768 non-null    float64
 14  Y Coordinate          768 non-null    float64
 15  Year                  7

Let's now take an updated view of Chicago Crime Statistics dataset.

In [9]:
# Display updated 1st 5 rows of Chicago Crime Statistics dataset
df_CHI1.head()

Unnamed: 0,ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,Ward,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
5,11694922,JC272450,05/21/2019 03:30:00 PM,007XX W 31ST ST,560,ASSAULT,SIMPLE,DRUG STORE,True,False,915,11,08A,1171697.0,1884328.0,2019,06/30/2019 03:56:27 PM,41.838055,-87.645458,"(41.83805499, -87.645458414)"
6,11787889,JC384809,08/08/2019 11:38:00 PM,007XX W 31ST ST,560,ASSAULT,SIMPLE,CONVENIENCE STORE,True,False,915,11,08A,1171697.0,1884328.0,2019,08/15/2019 04:12:43 PM,41.838055,-87.645458,"(41.83805499, -87.645458414)"
7,11791240,JC388855,08/12/2019 01:30:00 AM,007XX W 31ST ST,530,ASSAULT,AGGRAVATED: OTHER DANG WEAPON,PARKING LOT/GARAGE(NON.RESID.),False,False,915,11,04A,1171697.0,1884328.0,2019,08/19/2019 04:24:12 PM,41.838055,-87.645458,"(41.83805499, -87.645458414)"
8,11715695,JC297743,06/08/2019 03:15:00 PM,007XX W 31ST ST,560,ASSAULT,SIMPLE,RESTAURANT,False,False,915,11,08A,1171668.0,1884327.0,2019,06/30/2019 03:56:27 PM,41.838053,-87.645565,"(41.838052883, -87.645564858)"
9,11723441,JC307153,06/15/2019 04:30:00 AM,007XX W 31ST ST,860,THEFT,RETAIL THEFT,SMALL RETAIL STORE,True,False,915,11,06,1171697.0,1884328.0,2019,06/30/2019 03:56:27 PM,41.838055,-87.645458,"(41.83805499, -87.645458414)"


It appears at this juncture the data is sufficiently clean that we can begin analyzing it.
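One detail visible in the `info()` output is that the `Date` column is stored as `object`, i.e. plain strings. If time-based analysis were needed later, a minimal sketch of parsing it could look like this. The format string is an assumption inferred from the sample rows shown earlier, and the two timestamps are copied from those rows:

```python
import pandas as pd

# Two timestamps copied from the first rows of the dataset
dates = pd.Series(["11/13/2008 10:22:00 PM", "01/03/2009 08:12:13 PM"])

# Assumed format: month/day/year with a 12-hour clock and AM/PM marker
parsed = pd.to_datetime(dates, format="%m/%d/%Y %I:%M:%S %p")

# Once parsed, components such as year and hour are available via .dt
print(parsed.dt.year.tolist())
print(parsed.dt.hour.tolist())
```

Passing an explicit `format` avoids ambiguity between month-first and day-first dates.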

## Exploratory Data Analysis

To begin our analysis, let's take a look at the types of crimes being committed.

In [10]:
# Determine the types and frequency of crimes being committed in Chicago
df_CHICrimeTypeFrequency = df_CHI1.groupby('Primary Type').size()
df_CHICrimeTypeFrequency

Primary Type
ASSAULT                              81
BATTERY                              77
BURGLARY                             28
CRIMINAL DAMAGE                      50
CRIMINAL TRESPASS                    58
DECEPTIVE PRACTICE                   46
GAMBLING                              1
INTERFERENCE WITH PUBLIC OFFICER      2
INTIMIDATION                          1
KIDNAPPING                            1
LIQUOR LAW VIOLATION                  7
MOTOR VEHICLE THEFT                   7
NARCOTICS                            33
OBSCENITY                             1
OTHER OFFENSE                        25
PUBLIC PEACE VIOLATION                8
ROBBERY                              14
SEX OFFENSE                           2
THEFT                               324
WEAPONS VIOLATION                     2
dtype: int64
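The table is sorted alphabetically, so the largest category is easy to misread. As a minimal sketch, the most frequent type and its share can be computed directly; the counts below are copied from the output above, and the variable names are illustrative:

```python
import pandas as pd

# Counts copied from the groupby output above
crime_counts = pd.Series({
    "ASSAULT": 81, "BATTERY": 77, "BURGLARY": 28, "CRIMINAL DAMAGE": 50,
    "CRIMINAL TRESPASS": 58, "DECEPTIVE PRACTICE": 46, "GAMBLING": 1,
    "INTERFERENCE WITH PUBLIC OFFICER": 2, "INTIMIDATION": 1, "KIDNAPPING": 1,
    "LIQUOR LAW VIOLATION": 7, "MOTOR VEHICLE THEFT": 7, "NARCOTICS": 33,
    "OBSCENITY": 1, "OTHER OFFENSE": 25, "PUBLIC PEACE VIOLATION": 8,
    "ROBBERY": 14, "SEX OFFENSE": 2, "THEFT": 324, "WEAPONS VIOLATION": 2,
})

# Label of the largest count, and its fraction of all incidents
top_type = crime_counts.idxmax()
top_share = crime_counts.max() / crime_counts.sum()

print(top_type, round(top_share, 3))
```

On the real frame, `df_CHI1['Primary Type'].value_counts()` would give the same counts already sorted in descending order.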

So we can see that thefts are by far the most frequent crime in this dataset (324 of 768 incidents), while weapons violations account for only 2. Let's do similar frequency analyses for the number of arrests and the location description.

In [11]:
# Count how many incidents did and did not result in an arrest
df_CHI1ArrestFrequency = df_CHI1['Arrest'].value_counts(sort=True)
print('Here are the numbers of arrests or lack of arrests for all the incidents reported in Chicago Ward 11:')
print('Arrest   # Crimes')
print(df_CHI1ArrestFrequency.head(10))

Here are the numbers of arrests or lack of arrests for all the incidents reported in Chicago Ward 11:
Arrest   # Crimes
False    466
True     302
Name: Arrest, dtype: int64
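These counts can be turned into an arrest rate. A minimal sketch, rebuilding a boolean series from the two counts above rather than loading the real data:

```python
import pandas as pd

# Boolean series reconstructed from the counts above: 302 True, 466 False
arrests = pd.Series([True] * 302 + [False] * 466)

# The mean of a boolean series is the fraction of True values
arrest_rate = arrests.mean()

# value_counts(normalize=True) gives both fractions at once
arrest_shares = arrests.value_counts(normalize=True)

print(round(arrest_rate, 3))
print(arrest_shares)
```

On the real frame the equivalent call would be `df_CHI1['Arrest'].mean()`, which works out to roughly 39% for these counts.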


In [12]:
# Determine the types of premises where crimes are being committed in Chicago
df_CHI1CrimeLocationTypeFreq = df_CHI1['Location Description'].value_counts(sort=True)
print('Here are the Top 10 Locations types and the number of crimes in each:\n')
print('Location Type                   # Crimes')
print(df_CHI1CrimeLocationTypeFreq.head(10))

Here are the Top 10 Locations types and the number of crimes in each:

Location Type                   # Crimes
DRUG STORE                        206
SMALL RETAIL STORE                 97
PARKING LOT/GARAGE(NON.RESID.)     83
RESTAURANT                         66
STREET                             64
APARTMENT                          31
GROCERY FOOD STORE                 30
SIDEWALK                           30
CONVENIENCE STORE                  29
OTHER                              29
Name: Location Description, dtype: int64


From this information we can see that more crimes are committed in drug stores than in any other type of premises. However, we don't know which crimes are being committed where. Let's see what crimes are committed in drug stores as an example.

In [13]:
# Types and Frequency of crimes committed in Chicago drug stores
df_CHI1_DSCrime = df_CHI1[df_CHI1['Location Description'] == 'DRUG STORE']
df_CHI1_DSCrimeCategory = df_CHI1_DSCrime['Primary Type'].value_counts(sort=True)
print('Types of Crimes committed in drug stores in Chicago Ward 11')
print('Type of Crime                        # Crimes')
print(df_CHI1_DSCrimeCategory)

Types of Crimes committed in drug stores in Chicago Ward 11
Type of Crime                        # Crimes
THEFT                   123
CRIMINAL TRESPASS        24
ASSAULT                  18
BATTERY                  10
DECEPTIVE PRACTICE        9
NARCOTICS                 7
CRIMINAL DAMAGE           4
ROBBERY                   4
LIQUOR LAW VIOLATION      4
OTHER OFFENSE             2
OBSCENITY                 1
Name: Primary Type, dtype: int64


So we can see that thefts are by far the most common type of crime in Chicago Ward 11 drug stores. Let's look at restaurants.

In [14]:
# Types and Frequency of crimes committed in Chicago Ward 11 restaurants
df_CHI1_RESTCrime = df_CHI1[df_CHI1['Location Description'] == 'RESTAURANT']
df_CHI1_RESTCrimeCategory = df_CHI1_RESTCrime['Primary Type'].value_counts(sort=True)
print('Types of crimes committed in restaurants in Chicago Ward 11\n')
print('Type of Crime                             # Crimes')
print(df_CHI1_RESTCrimeCategory)

Types of crimes committed in restaurants in Chicago Ward 11

Type of Crime                             # Crimes
THEFT                     17
ASSAULT                   13
DECEPTIVE PRACTICE         9
BATTERY                    8
CRIMINAL TRESPASS          6
OTHER OFFENSE              4
BURGLARY                   4
CRIMINAL DAMAGE            2
NARCOTICS                  1
PUBLIC PEACE VIOLATION     1
LIQUOR LAW VIOLATION       1
Name: Primary Type, dtype: int64


So theft is reported at restaurants more often than any other crime, which makes sense given the nature of the premises.
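Filtering one location type at a time works, but a cross-tabulation shows every location and crime type in a single table. The sketch below uses a tiny made-up frame with the same two column names; it is an illustration, not output from the real dataset:

```python
import pandas as pd

# Toy rows using the dataset's column names (values are illustrative only)
toy = pd.DataFrame({
    "Location Description": [
        "DRUG STORE", "DRUG STORE", "RESTAURANT", "RESTAURANT", "DRUG STORE",
    ],
    "Primary Type": ["THEFT", "ASSAULT", "THEFT", "BATTERY", "THEFT"],
})

# Rows are location types, columns are crime types, cells are incident counts
location_by_type = pd.crosstab(toy["Location Description"], toy["Primary Type"])

print(location_by_type)
```

Combinations that never occur are filled with 0, so the table can be scanned row by row.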

At this juncture, let's try and plot the level of crime for different locations on a map of Chicago. Let's start by generating a map of Chicago with folium.

In [20]:
# Import Folium package
import folium

# Create a map of Chicago
CHI_map = folium.Map([41.827100, -87.648431], zoom_start=12)
CHI_map

Now let's mark the incidents (specifically ASSAULT, THEFT, and NARCOTICS) on the map of Chicago Ward 11. We will accomplish this task by creating a variable of the incidents and then placing them on the map.

In [34]:
# Create a variable for all the criminal incidents
CHICrime = df_CHI1[df_CHI1['Primary Type'].isin(['ASSAULT', 'THEFT', 'NARCOTICS'])]

# Place the criminal incidents on the map of Chicago
for i in range(len(CHICrime)):
    
    folium.CircleMarker(
        location = [CHICrime.Latitude.iloc[i], CHICrime.Longitude.iloc[i]],
        radius = 1,
        popup = CHICrime.Block.iloc[i],
        color = '#3186cc' if CHICrime['Primary Type'].iloc[i] == 'ASSAULT' else '#6ccc31' 
        if CHICrime['Primary Type'].iloc[i] =='THEFT' else '#ac31cc',
    ).add_to(CHI_map) 
    
CHI_map
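The nested conditional that picks the marker colour can be hard to read. One possible alternative is a lookup table; in this sketch the hex colours are taken from the cell above, while the names `CRIME_COLORS`, `DEFAULT_COLOR`, and `marker_color` are hypothetical:

```python
# Colours copied from the cell above; the names here are illustrative
CRIME_COLORS = {
    "ASSAULT": "#3186cc",  # blue
    "THEFT": "#6ccc31",    # green
}
DEFAULT_COLOR = "#ac31cc"  # purple, used for NARCOTICS in the cell above


def marker_color(primary_type: str) -> str:
    """Return the marker colour for a crime type, falling back to the default."""
    return CRIME_COLORS.get(primary_type, DEFAULT_COLOR)


for crime in ["ASSAULT", "THEFT", "NARCOTICS"]:
    print(crime, marker_color(crime))
```

Inside the loop, the `color` argument could then be written as `marker_color(CHICrime['Primary Type'].iloc[i])`, and adding another crime type only requires a new dictionary entry.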

Finally, we can see that the vast majority of these crimes occur in a roughly three-block area along W 31st Street.

## Discussion

In this open-ended project, we were able to show the following:

<ul>
    <li>(1) West 31st Street in Chicago has a large number of property crimes going back several years.</li>
    <li>(2) Thefts at drug stores along this section of the street are a problem.</li>
    <li>(3) Arrests were made in only about 39% of incidents (302 of 768).</li>
</ul>

## Conclusion

The goal of this project was to explore any trends in a small sample of crime statistics for the City of Chicago, as well as to show their locations geospatially. In this project, we were able to upload a real-world dataset, clean the data systematically, perform several different exploratory data analyses, make useful data visualizations, and draw logical inferences from those analyses. It is the author's hope that others find this exercise useful. Thanks for reading!