# Predicting Crime Rate in Chicago using Facebook Prophet

**Author** : Rahul Bordoloi






# Problem Statement

- The Chicago Crime dataset summarizes the reported crimes that occurred in the City of Chicago from 2001 to 2017.
- The dataset was obtained from the Chicago Police Department's CLEAR (Citizen Law Enforcement Analysis and Reporting) system.
- The dataset contains the following columns:
    - ID: Unique identifier for the record.
    - Case Number: The Chicago Police Department RD Number (Records Division Number), which is unique to the incident.
    - Date: Date when the incident occurred.
    - Block: Address where the incident occurred.
    - IUCR: The Illinois Uniform Crime Reporting code.
    - Primary Type: The primary description of the IUCR code.
    - Description: The secondary description of the IUCR code, a subcategory of the primary description.
    - Location Description: Description of the location where the incident occurred.
    - Arrest: Indicates whether an arrest was made.
    - Domestic: Indicates whether the incident was domestic-related as defined by the Illinois Domestic Violence Act.
    - Beat: Indicates the beat where the incident occurred. A beat is the smallest police geographic area – each beat has a dedicated police beat car. 
    - District: Indicates the police district where the incident occurred. 
    - Ward: The ward (City Council district) where the incident occurred. 
    - Community Area: Indicates the community area where the incident occurred. Chicago has 77 community areas. 
    - FBI Code: Indicates the crime classification as outlined in the FBI's National Incident-Based Reporting System (NIBRS). 
    - X Coordinate: The x coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. 
    - Y Coordinate: The y coordinate of the location where the incident occurred in State Plane Illinois East NAD 1983 projection. 
    - Year: Year the incident occurred.
    - Updated On: Date and time the record was last updated.
    - Latitude: The latitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
    - Longitude: The longitude of the location where the incident occurred. This location is shifted from the actual location for partial redaction but falls on the same block.
    - Location: The location where the incident occurred in a format that allows for creation of maps and other geographic operations on this data portal. This location is shifted from the actual location for partial redaction but falls on the same block.
- Datasource: https://www.kaggle.com/currie32/crimes-in-chicago

# About Facebook Prophet

- Prophet is open source software released by Facebook’s Core Data Science team.

- Prophet is a procedure for forecasting time series data based on an additive model where non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. 

- Prophet works best with time series that have strong seasonal effects and several seasons of historical data. 

- For more information, check out: https://research.fb.com/prophet-forecasting-at-scale/
https://facebook.github.io/prophet/docs/quick_start.html#python-api
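
The additive model described above is commonly written as shown below. The notation comes from the Prophet paper rather than from this notebook, so treat the symbols as assumptions: $g(t)$ is the trend, $s(t)$ the seasonal component, $h(t)$ the holiday effects, and $\epsilon_t$ the error term.

$$
y(t) = g(t) + s(t) + h(t) + \epsilon_t
$$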


# Importing the Libraries and Dataset

In [None]:
import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt
import random
import seaborn as sns
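try:
    from prophet import Prophet       # package name since Prophet 1.0
except ImportError:
    from fbprophet import Prophet     # older releases were published as fbprophet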

In [None]:
# listing the input files available in the Kaggle environment
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

In [None]:
chicago_df_1 = pd.read_csv('/kaggle/input/crimes-in-chicago/Chicago_Crimes_2005_to_2007.csv', on_bad_lines='skip')
chicago_df_2 = pd.read_csv('/kaggle/input/crimes-in-chicago/Chicago_Crimes_2008_to_2011.csv', on_bad_lines='skip')
chicago_df_3 = pd.read_csv('/kaggle/input/crimes-in-chicago/Chicago_Crimes_2012_to_2017.csv', on_bad_lines='skip')
# on_bad_lines='skip' ignores malformed rows (pandas older than 1.3 used error_bad_lines=False instead)
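
To see what skipping malformed rows does in practice, here is a minimal sketch on a made-up in-memory CSV. The data and the name `demo_df` are hypothetical, and the `on_bad_lines` parameter assumes pandas 1.3 or later.

```python
import io
import pandas as pd

# Hypothetical CSV text: the third line has one field too many.
raw = "a,b\n1,2\n3,4,5\n6,7\n"

# on_bad_lines='skip' drops rows with too many fields
# instead of raising a parser error.
demo_df = pd.read_csv(io.StringIO(raw), on_bad_lines='skip')
print(demo_df)
```

Rows with too few fields are not treated as bad lines; pandas pads them with NaN, so they would still show up in the missing-value checks later on.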

In [None]:
#concatenating all the datasets together
chicago_df = pd.concat([chicago_df_1, chicago_df_2, chicago_df_3], ignore_index=False, axis=0)

In [None]:
chicago_df.shape

# Exploring the Dataset

In [None]:
chicago_df.head()

In [None]:
chicago_df.tail(20)

In [None]:
# visualizing the null elements in the dataset
plt.figure(figsize=(10,10))
sns.heatmap(chicago_df.isnull(), cbar=False, cmap='YlGnBu')   # plotting missing data; cbar = colour bar, cmap = colour map
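
A heatmap gives a visual impression of the gaps; per-column counts can complement it. The sketch below uses a small hypothetical frame (`demo_df` is not the Chicago data) to show the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with two missing values in one column.
demo_df = pd.DataFrame({
    'Block': ['A', 'B', 'C'],
    'Ward': [1.0, np.nan, np.nan],
})

null_counts = demo_df.isnull().sum()    # number of missing values per column
null_share = demo_df.isnull().mean()    # fraction of missing values per column
print(null_counts)
print(null_share)
```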

In [None]:
# Dropping the columns that are not needed for this analysis.
# Kept: ID, Date, Block, Primary Type, Description, Location Description, Arrest, Domestic
chicago_df.drop(['Unnamed: 0', 'Case Number', 'IUCR', 'X Coordinate', 'Y Coordinate', 'Updated On', 'Year', 'FBI Code', 'Beat', 'Ward', 'Community Area', 'Location', 'District', 'Latitude', 'Longitude'], inplace=True, axis=1)

In [None]:
chicago_df

In [None]:
# converting the "Date" column from strings to datetime format
chicago_df.Date = pd.to_datetime(chicago_df.Date, format='%m/%d/%Y %I:%M:%S %p')  # %I = hour on a 12-hour clock, %p = AM/PM
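
As a quick check of the format string, the sketch below parses two made-up timestamps written in the same `month/day/year hour:minute:second AM/PM` layout. The sample values are hypothetical.

```python
import pandas as pd

# Hypothetical date strings in the same layout as the "Date" column.
raw_dates = pd.Series(['04/02/2006 01:00:00 PM', '12/31/2016 11:59:00 PM'])

# %I reads the hour on a 12-hour clock and %p the AM/PM marker,
# so "01:00:00 PM" becomes 13:00:00.
parsed = pd.to_datetime(raw_dates, format='%m/%d/%Y %I:%M:%S %p')
print(parsed)
```

Passing an explicit `format` also spares pandas from inferring the layout, which generally speeds up parsing on large columns.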

In [None]:
chicago_df.Date 

In [None]:
# setting the index to be the date-time column 
chicago_df.index = pd.DatetimeIndex(chicago_df.Date)

In [None]:
# counting the number of records for each value of 'Primary Type'
chicago_df['Primary Type'].value_counts()

In [None]:
#top 15 cases
chicago_df['Primary Type'].value_counts().iloc[:15]

In [None]:
#indices of the top 15 cases
order_data = chicago_df['Primary Type'].value_counts().iloc[:15].index
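
The pattern used here (count, slice the top entries, take the index) can be illustrated on a tiny made-up series. The values and the names `demo`, `top_two` and `order` are hypothetical.

```python
import pandas as pd

# Hypothetical crime types; THEFT appears 3 times, BATTERY 2, ASSAULT 1.
demo = pd.Series(
    ['THEFT', 'THEFT', 'THEFT', 'BATTERY', 'BATTERY', 'ASSAULT'],
    name='Primary Type',
)

# value_counts() sorts by frequency in descending order,
# so slicing the first rows keeps the most common categories.
top_two = demo.value_counts().iloc[:2]
order = top_two.index    # category labels, most frequent first
print(top_two)
```

The resulting index is what a plotting call can take as its `order` argument so that bars appear from most to least frequent.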

In [None]:
#plotting a bar plot for the top 15 cases
plt.figure(figsize=(15,10))
sns.countplot(y='Primary Type', data=chicago_df, order = order_data)

In [None]:
#Locations where the crimes happened
plt.figure(figsize = (15, 10))
sns.countplot(y= 'Location Description', data = chicago_df, order = chicago_df['Location Description'].value_counts().iloc[:15].index)

In [None]:
# counting the number of crimes occurring in each year
chicago_df.resample('Y').size()
# resample is a convenience method for frequency conversion and resampling of time series
# (pandas 2.2 and later prefer the aliases 'YE', 'QE' and 'ME' over 'Y', 'Q' and 'M')
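
Counting events per calendar period can also be done by grouping on periods, which sidesteps the differences in frequency aliases between pandas versions. This is a sketch on hypothetical timestamps, not a replacement for the notebook's `resample` calls.

```python
import pandas as pd

# Hypothetical incidents: two in January 2015, one in March 2015.
events = pd.DataFrame(
    {'Primary Type': ['THEFT', 'BATTERY', 'THEFT']},
    index=pd.DatetimeIndex(['2015-01-05', '2015-01-20', '2015-03-01']),
)

# Map every timestamp to its calendar month and count rows per month.
monthly = events.groupby(events.index.to_period('M')).size()
print(monthly)
```

One difference to keep in mind: `resample` also returns empty periods with a count of 0 (February 2015 here), whereas this grouping only lists months that contain at least one row.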

In [None]:
# plotting the number of crimes recorded in each year
plt.plot(chicago_df.resample('Y').size())
plt.title('Crimes Count Per Year')
plt.xlabel('Years')
plt.ylabel('Number of Crimes')

In [None]:
chicago_df.resample('M').size()         # number of crimes per month ('M')

In [None]:
plt.plot(chicago_df.resample('M').size())
plt.title('Crimes Count Per Month')
plt.xlabel('Months')
plt.ylabel('Number of Crimes')

In [None]:
chicago_df.resample('Q').size()           # number of crimes per quarter ('Q')

In [None]:
plt.plot(chicago_df.resample('Q').size())
plt.title('Crimes Count Per Quarter')
plt.xlabel('Quarters')
plt.ylabel('Number of Crimes')

# Preparing the Data for Prophet

In [None]:
# counting crimes per month and moving the date index back into a regular column
chicago_prophet = chicago_df.resample('M').size().reset_index()

In [None]:
chicago_prophet

In [None]:
chicago_prophet.columns = ['Date', 'Crime Count']

In [None]:
chicago_prophet

In [None]:
chicago_prophet_df = pd.DataFrame(chicago_prophet)

In [None]:
chicago_prophet_df

# Making Future Predictions using Prophet

In [None]:
chicago_prophet_df.columns

In [None]:
# renaming the columns to 'ds' (date) and 'y' (value), the names Prophet expects;
# the data is already aggregated to monthly counts
chicago_prophet_df_final = chicago_prophet_df.rename(columns={'Date':'ds', 'Crime Count':'y'})
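
The preparation steps in this section (reset the index, name the columns, rename to `ds` and `y`) can be sketched end to end on a tiny made-up series. The counts and variable names below are hypothetical.

```python
import pandas as pd

# Hypothetical monthly counts indexed by date.
counts = pd.Series(
    [120, 95],
    index=pd.DatetimeIndex(['2017-01-31', '2017-02-28'], name='Date'),
)

frame = counts.reset_index()              # the date index becomes a column
frame.columns = ['Date', 'Crime Count']   # give both columns readable names

# Prophet expects a date column called 'ds' and a numeric column called 'y'.
prophet_ready = frame.rename(columns={'Date': 'ds', 'Crime Count': 'y'})
print(prophet_ready)
```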

In [None]:
chicago_prophet_df_final

In [None]:
#instantiating prophet object
m = Prophet()
m.fit(chicago_prophet_df_final)

In [None]:
# forecasting into the future
future = m.make_future_dataframe(periods=720)  # periods = number of future steps to predict (days, since freq defaults to 'D')
forecast = m.predict(future)
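
Because the model is fit on monthly counts, it may help to see how a daily horizon differs from a monthly one. The helper `future_dates` below is hypothetical and only approximates how a forecasting horizon is laid out with pandas; it is not Prophet's own code, and the last observed date is made up.

```python
import pandas as pd

last_date = pd.Timestamp('2017-01-31')   # hypothetical last observation

def future_dates(last, periods, freq):
    """Return `periods` dates after `last`, spaced by `freq`."""
    dates = pd.date_range(start=last, periods=periods + 1, freq=freq)
    return dates[dates > last][:periods]

daily = future_dates(last_date, 720, 'D')      # 720 daily steps
monthly = future_dates(last_date, 24, 'MS')    # 24 month-start steps
print(len(daily), daily[0], daily[-1])
print(len(monthly), monthly[0], monthly[-1])
```

`make_future_dataframe` accepts a `freq` argument, so a monthly horizon could be requested with something like `m.make_future_dataframe(periods=24, freq='MS')`. Whether that suits this notebook better than the daily default is a judgment call; the Prophet documentation discusses it under non-daily data.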

In [None]:
forecast

In [None]:
# visualizing the forecast; y is the monthly number of crimes
figure = m.plot(forecast, xlabel='Date', ylabel='Crime Count')

In [None]:
# plotting the forecast components (trend and seasonality)
figure3 = m.plot_components(forecast)

# End