# Crime Prediction and Inference
The main goal of this model is to make inferences about crime in different regions of Chicago. The secondary goal is to minimize the Root Mean Squared Error (RMSE) of the final model.

  
Because crime counts change nonlinearly over time, as shown by the Crimes per Region from 2010 to 2020 line graph in the Data_Exploration notebook, I skipped Ordinary Least Squares regression and went straight to algorithms better suited to nonlinear data.
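
To make the RMSE target concrete, here is a minimal sketch of how the metric could be computed with NumPy. The helper name `rmse` and the example counts are hypothetical and are not taken from the dataset.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared residual."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Hypothetical crime counts, for illustration only
example_true = [10, 12, 9, 15]
example_pred = [11, 12, 7, 15]
print("RMSE: {:.3}".format(rmse(example_true, example_pred)))
```

RMSE is in the same units as the target (number of crimes), which makes it easier to interpret than the raw mean squared error.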

In [41]:
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Lasso

In [42]:
crimes = pd.read_csv('Data/crimes_cleaned.csv')

In [43]:
crimes.head()

Unnamed: 0,Date,Primary Type,Arrest,Domestic,Year,Location,Location Description,Community Area,Region,2010 Population,Month,Day of Week,Time of Day,Season
0,2019-04-10 16:37:00,Sex Offense,False,False,2019,"(41.708589, -87.612583094)",School,Roseland,Far Southeast Side,44619.0,4,Wednesday,Afternoon,Spring
1,2019-04-19 13:57:00,Offense Involving Children,False,True,2019,"(41.884865037, -87.755230327)",Residence,Austin,West Side,98514.0,4,Friday,Afternoon,Spring
2,2019-04-12 16:08:00,Offense Involving Children,False,True,2019,"(41.940297617, -87.732066473)",Residence,Irving Park,Northwest Side,53359.0,4,Friday,Afternoon,Spring
3,2019-04-25 17:20:00,Battery,False,True,2019,"(41.697609261, -87.613507612)",Residence,Roseland,Far Southeast Side,44619.0,4,Thursday,Evening,Spring
4,2019-05-13 17:26:00,Assault,False,False,2019,"(41.729973132, -87.653166753)",Street,Washington Heights,Far Southwest Side,26493.0,5,Monday,Evening,Spring


## Splitting into Train, Test, and Validation Sets

In [32]:
#splitting into X and y 
y = crimes['Number of Crimes'].values
X = crimes.iloc[:,1:].values

#creating train, validation, and test sets in row order (70/15/15), without shuffling
train_end_idx = int(X.shape[0]*0.70)
val_end_idx = int(X.shape[0]*0.85)

#splitting into train, validation, and test sets
X_train = X[:train_end_idx, :]
X_val = X[train_end_idx:val_end_idx, :]
X_test = X[val_end_idx:, :]

y_train = y[:train_end_idx]
y_val = y[train_end_idx:val_end_idx]
y_test = y[val_end_idx:]
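
The slicing above can also be wrapped in a small function so the split sizes are easy to check. This is only a sketch: the name `ordered_split` and its parameters are hypothetical, and it runs on a synthetic array rather than on the crimes data.

```python
import numpy as np

def ordered_split(X, y, train_end_frac=0.70, val_end_frac=0.85):
    """Split rows in their existing order into train, validation, and test sets."""
    n = X.shape[0]
    train_end = int(n * train_end_frac)
    val_end = int(n * val_end_frac)
    train = (X[:train_end], y[:train_end])
    val = (X[train_end:val_end], y[train_end:val_end])
    test = (X[val_end:], y[val_end:])
    return train, val, test

# Synthetic stand-in data: 100 rows, 3 features (assumption, not the real dataset)
X_demo = np.arange(300).reshape(100, 3)
y_demo = np.arange(100)

(X_tr, y_tr), (X_va, y_va), (X_te, y_te) = ordered_split(X_demo, y_demo)
print(X_tr.shape[0], X_va.shape[0], X_te.shape[0])
```

Because nothing is shuffled, every validation row comes after every training row, and every test row comes after every validation row.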

## Training Model

In [39]:
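#initializing the model
#criterion='squared_error' is the current name of the MSE criterion; scikit-learn < 1.0 calls it 'mse'
rf = RandomForestRegressor(n_estimators=1000, criterion='squared_error', random_state=0, oob_score=True)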
rf.fit(X_train, y_train)

print("Model R^2: {:.3}".format(rf.oob_score_))

Model R^2: 0.133
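
The score printed above is the out-of-bag R^2, which is estimated from the training rows only. Since the stated target is RMSE, a held-out check on the validation set might look like the sketch below. It uses synthetic data and hypothetical variable names (`X_demo`, `rf_demo`, and so on), so the numbers it prints say nothing about the crimes model.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic nonlinear data standing in for the crime features (assumption)
rng = np.random.default_rng(0)
X_demo = rng.uniform(-3, 3, size=(300, 2))
y_demo = np.sin(X_demo[:, 0]) + 0.5 * X_demo[:, 1] ** 2 + rng.normal(scale=0.1, size=300)

# Same ordered slicing as in the notebook; only train and validation are used here
train_end = int(X_demo.shape[0] * 0.70)
val_end = int(X_demo.shape[0] * 0.85)
X_tr, y_tr = X_demo[:train_end], y_demo[:train_end]
X_va, y_va = X_demo[train_end:val_end], y_demo[train_end:val_end]

# Fewer trees than the notebook's 1000, only to keep the sketch fast
rf_demo = RandomForestRegressor(n_estimators=100, random_state=0, oob_score=True)
rf_demo.fit(X_tr, y_tr)

oob_r2 = rf_demo.oob_score_
val_rmse = float(np.sqrt(mean_squared_error(y_va, rf_demo.predict(X_va))))
print("OOB R^2: {:.3}".format(oob_r2))
print("Validation RMSE: {:.3}".format(val_rmse))
```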


In [None]:
#initializing the model
ls = Lasso(alpha=0.3, random_state=0)
ls.fit(X_train, y_train)

#Lasso has no out-of-bag score, so report R^2 on the validation set instead
print("Model R^2: {:.3}".format(ls.score(X_val, y_val)))
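
Lasso penalizes coefficient magnitudes, so features on very different scales are penalized unevenly. `StandardScaler` is imported at the top of this notebook but not yet applied; one possible way to combine the two is a pipeline, sketched here on synthetic data. The pipeline name `scaled_lasso` and the demo arrays are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with two features on very different scales (assumption)
rng = np.random.default_rng(0)
n = 400
X_demo = np.column_stack([
    rng.normal(scale=1000.0, size=n),
    rng.normal(scale=1.0, size=n),
])
y_demo = 0.003 * X_demo[:, 0] + 2.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=n)

# Ordered split, as in the notebook
train_end = int(n * 0.70)
val_end = int(n * 0.85)
X_tr, y_tr = X_demo[:train_end], y_demo[:train_end]
X_va, y_va = X_demo[train_end:val_end], y_demo[train_end:val_end]

# The scaler is fit on the training rows only, then reused for validation rows
scaled_lasso = make_pipeline(StandardScaler(), Lasso(alpha=0.3, random_state=0))
scaled_lasso.fit(X_tr, y_tr)

val_r2 = scaled_lasso.score(X_va, y_va)
print("Validation R^2: {:.3}".format(val_r2))
```

Keeping the scaler inside the pipeline means its mean and variance come from the training rows alone, so no information from the validation or test rows leaks into the fit.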