# **Seattle Terry Stop Data Analysis**

##### Author: Spencer Hadel
***
### Overview

Recent tensions in the United States have led to a mistrust of police forces across the country, particularly due to the increasing strength of movements such as Black Lives Matter, and increased cultural attention to the racial and ethnic disparity in many facets of life. There is increasing focus on the scope of what police officer's are legally able to do, and whether they use this right fairly.

One such disparity has been observed in Terry Stops (also known as 'stop-and-frisks'), when a police officer uses theur right to legally temporarily detain a person based on 'reasonable suspsicion' that the person may be involved in criminal activity. The officer has the right to physically 'frisk' the subject, and take whatever action they feel is necessary properly handle the situation.

The newly elected mayor of Seattle campaigned on a platform of police reform, and has hired our agency to analyze, test, and interpret the current Seattle police department's Terry Stop data, so that their selected Chief of Police can make meaningful changes to the system as it stands.

### Data
This analysis utilizes about 52,000 data entries of Seattly Terry Stops ([from data.seattle.gov](https://data.seattle.gov/Public-Safety/Terry-Stops/28ny-9ts8)), in the file [Terry_Stops.csv](./data/Terry_Stops.csv). This data has been collected from 2015 until the present. A deeper explanation of this dataset and how it was cleaned can be found in the [First Notebook](./nb_1-terry_data_cleaning_analysis.ipynb).


***

# Part 3: Target and Feature Selection

So far, we have only run classifaction models on one set of features seeking to find one target variable. Going forward, we will test a few different arrangements of features and targets.

In [1]:
#get data
from imports import *
df = pd.read_csv('data/cleaned_df.csv')

#create cummy variables from categorical data
dummies_df = pd.get_dummies(df, drop_first=True)
dummies_df.columns = dummies_df.columns.str.replace(' ','_')
#dummies_df.info()

There are a large number of potential features in this dataset, as well as 3 potential targets: Arrested, Legal_Action_Taken, and Physical_Arrest. In the [previous notebook](./terry_models.ipynb) we only attempted to find Physical Arrests using all the data that existed.

The first thing we do is see if there are better results with each of our other target variables, using Machine Learning Pipelines to streamline the workflow.

We will once again test our data with and without SMOTE since the change in target variables has changed how imbalanced the data is (in this case, it is LESS imbalanced).

In [2]:
#initialize classifiers
clf_forest = RandomForestClassifier(random_state=42)
clf_logreg = LogisticRegression(random_state=42, max_iter=1000)
clf_xgb = XGBClassifier(random_state=42)

#parameters for each classifier
logreg_param = {}
logreg_param['classifier__C'] = np.logspace(-2, 2, 10)
logreg_param['classifier__penalty'] = ['l1', 'l2']
logreg_param['classifier'] = [clf_logreg]


forest_param = {}
forest_param['classifier__n_estimators'] = [10,100,1000]
forest_param['classifier__max_depth'] = [None, 3, 4, 10]
forest_param['classifier__max_features'] = ['sqrt', 'log2', 2, 5, 10],
forest_param['classifier'] = [clf_forest]

xgb_param = {}
xgb_param['classifier__n_estimators'] = [10,100,1000]
xgb_param['classifier__learning_rate'] = [0.001, 0.01, 0.1]
xgb_param['classifier__subsample'] =  [0.7]
xgb_param['classifier__max_depth'] = [9]
xgb_param['classifier'] = [clf_xgb]

In [3]:
#create pipelines (SMOTE and non SMOTE)
pipeline = imbpipeline([('classifier', LogisticRegression())])

smote_pipeline = imbpipeline([('sm', SMOTE(random_state = 42)),('classifier', LogisticRegression())])

### Target: Arrested (Without Synthetic Data)

In [None]:
# remove other targets
dummies_df.drop(['Physical_Arrest', 'Legal_Action_Taken'], axis=1, inplace=True)

#prepare data
X= dummies_df.drop('Arrested', axis=1)
y = dummies_df['Arrested']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

# new negative and positive description for visualizations and scoring
neg = 'Not Arrested'
pos = 'Arrested'

#use params established earlier
params = [logreg_param, forest_param, xgb_param]

#create and fit data using GridSearch and pipeline
arrested_cv = GridSearchCV(pipeline, params, cv=3, n_jobs=-1, verbose=True, scoring='f1')
arrested_cv.fit(X_train, y_train)

Fitting 3 folds for each of 41 candidates, totalling 123 fits


In [None]:
#best parameters as decided by gridsearch
arrested_cv.best_params_

# {'classifier': XGBClassifier()
#  'classifier__learning_rate': 0.1,
#  'classifier__max_depth': 9,
#  'classifier__n_estimators': 1000,
#  'classifier__subsample': 0.7}

In [None]:
#view scores
clf_scores(arrested_cv, X_train, X_test, y_train, y_test, neg, pos)

# Train Data:                                 Test Data:
# Accuracy:  0.8092303758629507               Accuracy:  0.7264708138375393
# Recall:    0.30616488200436637              Recall:    0.1322107888992828
# Precision: 0.7891211146838156               Precision: 0.3512841756420878
# F1:        0.4411654557711033               F1:        0.19211599456275485

#### Performance: 
# WHATTTTTTTTTTTTTTTTTTT#

### Target: Arrested (With Synthetic Data)

In [None]:
#repeat with SMOTE pipeline

arrested_smote_cv = GridSearchCV(smote_pipeline, params, cv=3, n_jobs=-1, verbose=True, scoring='f1')
arrested_smote_cv.fit(X_train, y_train)

In [None]:
arrested_smote_cv.best_params_

# {'classifier': XGBClassifier()
#  'classifier__learning_rate': 0.1,
#  'classifier__max_depth': 9,
#  'classifier__n_estimators': 10,
#  'classifier__subsample': 0.7}

In [None]:
clf_scores(arrested_smote_cv, X_train, X_test, y_train, y_test, neg, pos)

# Train Data:                                 Test Data:
# Accuracy:  0.5920225006392227               Accuracy:  0.5724476489990028
# Recall:    0.6382160307724296               Recall:    0.5806049267227938
# Precision: 0.32978780553317216              Precision: 0.3056969298965687
# F1:        0.43486576468088123              F1:        0.40051624005162395

#### Performance: 
# WHATTTTTTTTTTTTTTTTTTT#

### Target: Legal Action Taken (Without Synthetic Data)

In [None]:
#remove other targets
dummies_df = pd.get_dummies(df, drop_first=True)
dummies_df.drop(['Physical_Arrest', 'Arrested'], axis=1, inplace=True)

#new variables for vizualizations and scoring
neg = 'No Legal Action'
pos = 'Legal Action Taken'

#prepare dataset fir bew target
X= dummies_df.drop('Legal_Action_Taken', axis=1)
y = dummies_df['Legal_Action_Taken']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, stratify=y)

#create and fit gridsearch
legal_cv = GridSearchCV(pipeline, params, cv=3, n_jobs=-1, verbose=True, scoring='f1')
legal_cv.fit(X_train, y_train)

In [None]:
legal_cv.best_params_

# {'classifier': XGBClassifier()
#  'classifier__learning_rate': 0.001,
#  'classifier__max_depth': 9,
#  'classifier__n_estimators': 100,
#  'classifier__subsample': 0.7}

In [None]:
clf_scores(legal_cv, X_train, X_test, y_train, y_test, neg, pos)

#### Performance: 
# WHATTTTTTTTTTTTTTTTTTT#

### Target: Legal Action Taken (With Synthetic Data)

In [None]:
#fit new gridsearch using pipeline with SMOTE

legal_smote_cv = GridSearchCV(smote_pipeline, params, cv=3, n_jobs=-1, verbose=True, scoring='f1')
legal_smote_cv.fit(X_train, y_train)

In [None]:
legal_smote_cv.best_params_

# {'classifier': XGBClassifier()
#  'classifier__learning_rate': 0.01,
#  'classifier__max_depth': 9,
#  'classifier__n_estimators': 1000,
#  'classifier__subsample': 0.7}

In [None]:
clf_scores(legal_smote_cv, X_train, X_test, y_train, y_test, neg, pos)

# Train Data:                                 Test Data:
# Accuracy:  0.6465354129378675               Accuracy:  0.5879420111988954
# Recall:    0.6153539293471034               Recall:    0.5597659885653503
# Precision: 0.729570655315571                Precision: 0.6713442832084197
# F1:        0.6676124068285645               F1:        0.6104988399071926

#### Performance: 
# WHATTTTTTTTTTTTTTTTTTT#

Our most successful model so far: XG Boost testing for Legal Action Taken, without synthetic data (legal_cv), has an F1 score of 0.72 and an Accuracy score of 0.60 on the test data. These are far from ideal outcomes, but at least show that the classifier is better trained than previous models.

In [None]:
# get importance
importance = xgb_cv.feature_importances_
# summarize feature importance
for i,v in enumerate(importance):
	print('Feature: %0d, Score: %.5f' % (i,v))
# plot feature importance
pyplot.bar([x for x in range(len(importance))], importance)
pyplot.show()

In [None]:
xgb_cv.feature_importances_

## Feature Analysis

The classification models trained with different target variables had significantly different scores, all much better than when the target variable was simply the "Physical Arrest" column.

So far, however, these models have been trained on every single piece of data in the dataset, with the exception of those that we removed in cleaning. There are two issues with this. 

First, some of this data could be meaningless to our classifiers, or even worse, generating noise that can cause our model to underperform or overfit.

Second, there is a limitation on how much real world information we can take away from training models on the entire dataset. If we want to test the predictions of a model based on Subject demographcis alone, for example, then we would want to remove the demographics for the Officer.

### Checking least important features

Next we want see what features could be generating noise in the data, using Lasso Regression to indicate which features have the lowest coefficients.

In [None]:
#using most recent featureset (no changes to df, X, or y needed)

#create and fit Lasso Regression model
lasso = Lasso(alpha = 0.0001, normalize = False)
lasso = lasso.fit(X_train, y_train)

#lass model prediction on train data
lasso_pred = lasso.predict(X_train)

#take transpose of lasso coefficients
lasso_coef = pd.DataFrame(data=lasso.coef_).T

#rename columns to match features, sort
lasso_coef.columns = X_train.columns
lasso_coef = lasso_coef.T.sort_values(by=0).T

#bar plot of features by importance
lasso_coef.plot(kind='bar', legend=True, figsize=(20,15))

In [None]:
#list of features by importance
lasso_coef.T

We can see by this visualization that specific features are far more impactful on the target, such as "Subject_Age_Group_1 - 17	", and "Subject_Perceived_Race_Unknown".

All the features that have been reduced to 0 have been selected by Lasso as the least significant to the dataset. We will attempt running the model again to see if there is a change in scores.

In [None]:
#show features that became 0 using Lasso
zero_features = lasso_coef.T.index[lasso_coef.T[0] == 0]
zero_features

In [None]:
#remove zeroed features from features
X_train.drop(zero_features, axis=1, inplace=True)
X_test.drop(zero_features, axis=1, inplace=True)

In [None]:
#reduced features classifier using best_params from earlier
best_xgb = XGBClassifier(learning_rate = 0.01, max_depth = 9, n_estimators = 1000, subsample = 0.7, random_state=42)

best_xgb.fit(X_train, y_train)

In [None]:
clf_scores(best_xgb, X_train, X_test, y_train, y_test, neg, pos)

### Explanation of Results

The fact that we did not get any improvement in our model, and in fact caused scores to go down by a small margin, shows that the extraneous data was not causing any issue for our classifier.

Unfortunately, with the current data, it seems unlikely to help our models perform any better with a binary target. This is most likely due to the variability and lack of clarity in very much of the original data. There is clearly potential for more accurate classifiers, but only with more specific data, or perhaps different configurations of the original dataset.

Another potential change that could be made is an adjustment to the type of Classifier. A multiclass or multioutput classifier, or even an unsupervised learning algorithm could find different results, potentially identifying key groups of features that yield different targets.

# Conclusion

# Baseline Etc

# Best

# Problems
## stadnardized, detailed data
## single target

# Solutions
## more standardization
## multi target
## separate analysis of performance per officer based on past records

As is clear from these tests (and as stated above), there are a vast assortment of possibilities that can be investigated in future analyses. The limitations of a binary target classification algorithm make it challenging to adequately find the results we need to make any sort of substantive reports about the data we have, besides the fact that we need more detailed and standardized data.