# CS435 Term Project

## Analyze Crime Rates to Help Mitigate Crimes in Chicago

Team Members: Jimit Bhalavat, Logan Cuccia, Kyle Cummings, Mike Peyton

### Description

Chicago was once a pillar of prosperity and trade, where companies housed their headquarters, business partners formed coalitions, and impactful business decisions were made. Today it is still a major hub for trade, and some areas are very prosperous. Other areas suffer from high rates of crime, including murder, theft, break-ins, and kidnappings. Crime in some Chicago districts is so severe that officers often don’t know whether they will return home to their families at the end of the day. Countless factors contribute to the growing crime rate and shape law enforcement procedures; crime is overwhelmingly complex and is tied to politics and to state and local government actions. This raises several questions: How has crime changed over the years? Is it possible to predict where or when a crime will be committed? Which areas of the city have changed over this time span?

The goal of this project is to explore how data about these crimes can be used to help reduce their frequency. The project will provide data summarizations, such as the location, frequency, and nature of the crimes, which can help law enforcement agencies pinpoint where crimes are more likely to occur and which districts may need more police officers. This information could help protect not only citizens but also the police officers who risk their lives every day to better their communities. It could also raise the likelihood that offenders are caught, since many crimes in these districts go unsolved. The dataset contains attributes such as the date, the coordinates of the incident, the district, and the nature of the crime. We will use these attributes to generate summarizations that assist in the fight against crime and support the city’s economic future.

### Goal of the Project

To help law enforcement agencies in Chicago, we will analyze this dataset with a variety of processing techniques and models, then interpret the results to answer the questions above. The dataset is large and has many attributes, so it is difficult to analyze with conventional methods. To process it effectively and efficiently, we will use MapReduce to compute numerical summarizations, such as the prediction accuracy of Neural Networks, Random Forests, and K-Means clustering. Other numerical summarizations will include the proportion of arrests for each type of crime, whether crime has increased over the years, and, if so, which crimes are more likely to be committed.
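
As a rough illustration of the MapReduce-style summarization described above, the sketch below computes the proportion of arrests per crime type in plain Python. The names `mapper`, `reducer`, and `run_job`, as well as the sample records, are hypothetical and exist only for illustration; the actual job would run on Hadoop or Spark over the full CSV files.

```python
from collections import defaultdict

# Hypothetical sample records: (primary_type, arrest_made).
SAMPLE_RECORDS = [
    ("THEFT", False),
    ("THEFT", True),
    ("BATTERY", False),
    ("NARCOTICS", True),
    ("THEFT", False),
    ("NARCOTICS", True),
]


def mapper(record):
    """Emit (crime_type, (arrest_count, total_count)) for one record."""
    primary_type, arrest = record
    yield primary_type, (1 if arrest else 0, 1)


def reducer(key, values):
    """Sum the partial counts for one key and return the arrest proportion."""
    arrests = sum(v[0] for v in values)
    total = sum(v[1] for v in values)
    return key, arrests / total


def run_job(records):
    """Simulate the map, shuffle (group by key), and reduce phases in memory."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in grouped.items())


arrest_rates = run_job(SAMPLE_RECORDS)
print(arrest_rates)
```

Emitting `(arrests, total)` pairs instead of a finished ratio keeps the intermediate values summable, so the same logic could also serve as a combiner in a distributed run.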


In this project, we aim to analyze the relationship between different types of crime and the locations where they occurred. We also aim to analyze statistics for a few specific crimes (theft, homicide, and sexual harassment) and determine whether these crimes have declined or increased over the years. We plan to use several Python 3 packages: pandas (DataFrames), NumPy (numerical computation), Seaborn and Matplotlib (data visualization), and scikit-learn (machine learning algorithms). The framework we plan to use is Hadoop or Spark.
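
One possible way to check whether specific crimes have increased or declined over the years is to count incidents per year and crime type with pandas. This is a minimal sketch on hypothetical toy data: the `toy` frame and the `crimes_of_interest` list are made up for illustration, and only the column names `Primary Type` and `Year` are taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data; the real frame would be loaded from the crime CSV files.
toy = pd.DataFrame({
    "Primary Type": ["THEFT", "THEFT", "HOMICIDE", "THEFT", "HOMICIDE", "BATTERY"],
    "Year": [2015, 2015, 2015, 2016, 2016, 2016],
})

# Crime types to track (an illustrative choice).
crimes_of_interest = ["THEFT", "HOMICIDE"]

# Count incidents per (year, crime type) and pivot crime types into columns.
yearly_counts = (
    toy[toy["Primary Type"].isin(crimes_of_interest)]
    .groupby(["Year", "Primary Type"])
    .size()
    .unstack(fill_value=0)
)

# Year-over-year change: positive values mean an increase, negative a decline.
yearly_change = yearly_counts.diff()

print(yearly_counts)
print(yearly_change)
```

The first row of `yearly_change` is `NaN` because there is no earlier year to compare against.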


### Data Description

The dataset comes from Kaggle and reflects the reported incidents of crime in the City of Chicago from 2001 to 2017, with the exception of murders where data exists for each victim. The data is extracted from the Chicago Police Department’s Citizen Law Enforcement Analysis and Reporting (CLEAR) system. The entire dataset is around 2 GB and, to cover the 16 years of data, is split into four CSV files. Each entry has attributes such as a unique identifier, the block where the incident occurred, a description of the location, whether the incident was domestic-related, the police district, and the latitude and longitude of the incident, to name a few. To protect the privacy of crime victims, exact addresses are not shared, but the included location data allows us to pinpoint the general location of each crime. We can use the description attributes to categorize the type of crime committed, and we can filter on other attributes, such as whether an arrest was made or whether the case was domestic.
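
The filtering described above can be sketched with boolean masks in pandas. The four rows below are hand-copied from the dataset preview shown later in this notebook, restricted to a few columns; the variable names are illustrative only.

```python
import pandas as pd

# Four rows copied from the dataset preview, with a subset of the columns.
sample = pd.DataFrame({
    "Primary Type": ["BATTERY", "THEFT", "NARCOTICS", "ASSAULT"],
    "Description": [
        "DOMESTIC BATTERY SIMPLE",
        "POCKET-PICKING",
        "POSS: HEROIN(BRN/TAN)",
        "SIMPLE",
    ],
    "Arrest": [False, False, True, False],
    "Domestic": [True, False, False, True],
})

# Incidents where an arrest was made.
arrests = sample[sample["Arrest"]]

# Domestic incidents that did not lead to an arrest.
domestic_no_arrest = sample[sample["Domestic"] & ~sample["Arrest"]]

# Number of incidents per crime category.
counts_by_type = sample["Primary Type"].value_counts()

print(arrests)
print(domestic_no_arrest)
print(counts_by_type)
```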

In [28]:
import numpy as np
import pandas as pd
import matplotlib
from datetime import datetime

### Data Processing

In [20]:
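dt_string = "09/05/2015 01:30:00 PM"

# Dates in this dataset are in mm/dd/yyyy format with a 12-hour clock
# (later rows include values such as 10/20/2021). Use %I with %p;
# with %H, the AM/PM marker would not affect the parsed hour.
dt_object1 = datetime.strptime(dt_string, "%m/%d/%Y %I:%M:%S %p")
print("dt_object1 =", dt_object1)

dt_object1 = 2015-09-05 13:30:00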
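
To parse a whole column of such strings rather than one value, `pd.to_datetime` accepts an explicit format. This is a minimal sketch, assuming the dates are month/day/year with a 12-hour clock, which the sample rows suggest (for example `10/20/2021`); the `dates` series is a small stand-in for the `Date` column.

```python
import pandas as pd

# Stand-in for the Date column, using values that appear in the dataset preview.
dates = pd.Series([
    "09/05/2015 01:30:00 PM",
    "09/04/2015 11:30:00 AM",
    "10/20/2021 10:30:00 PM",
])

# %I is the 12-hour clock hour and %p the AM/PM marker.
parsed = pd.to_datetime(dates, format="%m/%d/%Y %I:%M:%S %p")

# Derived features that could feed time-based summaries.
hours = parsed.dt.hour
months = parsed.dt.month

print(parsed)
```

Passing the format explicitly avoids ambiguous day/month guessing and is typically faster than letting pandas infer the format on a large column.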


In [29]:
crimes_temp = pd.read_csv("../Crimes_-_2001_to_Present.csv")
crimes_temp.head()

| Unnamed: 0 | ID | Case Number | Date | Block | IUCR | Primary Type | Description | Location Description | Arrest | Domestic | ... | Ward | Community Area | FBI Code | X Coordinate | Y Coordinate | Year | Updated On | Latitude | Longitude | Location |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10224738 | HY411648 | 09/05/2015 01:30:00 PM | 043XX S WOOD ST | 486 | BATTERY | DOMESTIC BATTERY SIMPLE | RESIDENCE | False | True | ... | 12.0 | 61.0 | 08B | 1165074.0 | 1875917.0 | 2015 | 02/10/2018 03:50:01 PM | 41.815117 | -87.67 | (41.815117282, -87.669999562) |
| 1 | 10224739 | HY411615 | 09/04/2015 11:30:00 AM | 008XX N CENTRAL AVE | 870 | THEFT | POCKET-PICKING | CTA BUS | False | False | ... | 29.0 | 25.0 | 06 | 1138875.0 | 1904869.0 | 2015 | 02/10/2018 03:50:01 PM | 41.89508 | -87.7654 | (41.895080471, -87.765400451) |
| 2 | 11646166 | JC213529 | 09/01/2018 12:01:00 AM | 082XX S INGLESIDE AVE | 810 | THEFT | OVER $500 | RESIDENCE | False | True | ... | 8.0 | 44.0 | 06 |  |  | 2018 | 04/06/2019 04:04:43 PM |  |  |  |
| 3 | 10224740 | HY411595 | 09/05/2015 12:45:00 PM | 035XX W BARRY AVE | 2023 | NARCOTICS | POSS: HEROIN(BRN/TAN) | SIDEWALK | True | False | ... | 35.0 | 21.0 | 18 | 1152037.0 | 1920384.0 | 2015 | 02/10/2018 03:50:01 PM | 41.937406 | -87.71665 | (41.937405765, -87.716649687) |
| 4 | 10224741 | HY411610 | 09/05/2015 01:00:00 PM | 0000X N LARAMIE AVE | 560 | ASSAULT | SIMPLE | APARTMENT | False | True | ... | 28.0 | 25.0 | 08A | 1141706.0 | 1900086.0 | 2015 | 02/10/2018 03:50:01 PM | 41.881903 | -87.755121 | (41.881903443, -87.755121152) |


In [30]:
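# Drop columns that are not needed for this analysis.
crimes = crimes_temp.drop(['IUCR', 'FBI Code', 'X Coordinate', 'Y Coordinate', 'Updated On', 'Location', 'Location Description'], axis=1)
# Remove rows with any missing value (for example, incidents without coordinates).
crimes = crimes.dropna()
# Sanity check: no missing values should remain after dropna().
assert crimes.isnull().sum().sum() == 0
crimes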

| Unnamed: 0 | ID | Case Number | Date | Block | Primary Type | Description | Arrest | Domestic | Beat | District | Ward | Community Area | Year | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 10224738 | HY411648 | 09/05/2015 01:30:00 PM | 043XX S WOOD ST | BATTERY | DOMESTIC BATTERY SIMPLE | False | True | 924 | 9.0 | 12.0 | 61.0 | 2015 | 41.815117 | -87.670000 |
| 1 | 10224739 | HY411615 | 09/04/2015 11:30:00 AM | 008XX N CENTRAL AVE | THEFT | POCKET-PICKING | False | False | 1511 | 15.0 | 29.0 | 25.0 | 2015 | 41.895080 | -87.765400 |
| 3 | 10224740 | HY411595 | 09/05/2015 12:45:00 PM | 035XX W BARRY AVE | NARCOTICS | POSS: HEROIN(BRN/TAN) | True | False | 1412 | 14.0 | 35.0 | 21.0 | 2015 | 41.937406 | -87.716650 |
| 4 | 10224741 | HY411610 | 09/05/2015 01:00:00 PM | 0000X N LARAMIE AVE | ASSAULT | SIMPLE | False | True | 1522 | 15.0 | 28.0 | 25.0 | 2015 | 41.881903 | -87.755121 |
| 5 | 10224742 | HY411435 | 09/05/2015 10:55:00 AM | 082XX S LOOMIS BLVD | BURGLARY | FORCIBLE ENTRY | False | False | 614 | 6.0 | 21.0 | 71.0 | 2015 | 41.744379 | -87.658431 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 7425864 | 12518746 | JE417839 | 10/20/2021 10:30:00 PM | 016XX S CHRISTIANA AVE | CRIMINAL DAMAGE | TO PROPERTY | False | False | 1021 | 10.0 | 24.0 | 29.0 | 2021 | 41.857708 | -87.709076 |
| 7425865 | 12518119 | JE416970 | 10/20/2021 09:00:00 AM | 061XX S WHIPPLE ST | OTHER OFFENSE | HARASSMENT BY TELEPHONE | False | False | 823 | 8.0 | 16.0 | 66.0 | 2021 | 41.782172 | -87.699569 |
| 7425866 | 12521771 | JE421470 | 10/19/2021 06:30:00 AM | 064XX W 60TH ST | OTHER OFFENSE | TELEPHONE THREAT | False | True | 812 | 8.0 | 13.0 | 64.0 | 2021 | 41.783149 | -87.783244 |
| 7425867 | 12517772 | JE416803 | 10/20/2021 09:30:00 AM | 044XX W BELDEN AVE | THEFT | $500 AND UNDER | False | True | 2522 | 25.0 | 31.0 | 20.0 | 2021 | 41.922532 | -87.736944 |


In [44]:
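# Replace commas in Description with slashes so the field contains no commas.
# regex=False treats ',' as a literal string rather than a pattern.
crimes['Description'] = crimes['Description'].str.replace(',', '/', regex=False)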

### Map Reduce

### Neural Networks

### K-Means Clustering

### Random Forests