# Introduction

This analysis examines data on police encounters known as Terry stops. According to [Merriam-Webster](https://www.merriam-webster.com/legal/Terry%20stop), a Terry stop is "a stop and limited search of a person for weapons justified by a police officer's reasonable conclusion that a crime is being or about to be committed by a person who may be armed and whose responses to questioning do not dispel the officer's fear of danger to the officer or to others."

## Data Sources

The city of Seattle [provides](https://data.seattle.gov/Public-Safety/Terry-Stops/28ny-9ts8) substantial publicly available data about these encounters: over 47,000 records spanning 2015 to 2021. The dataset has 23 features covering topics such as race, gender, age, location, call type, and the final resolution of the stop (for example, whether it ended in an arrest or a citation).
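A quick way to confirm the record count, feature count, and year span after downloading the export is a small summary helper. The sketch below is illustrative only: `summarize_records` and the `stop_date` column are hypothetical names, and the toy rows are made up rather than taken from the Seattle data.

```python
import pandas as pd


def summarize_records(df: pd.DataFrame, date_col: str) -> dict:
    """Return row count, column count, and the year span of `date_col`.

    `date_col` is a hypothetical parameter: pass whichever column
    holds the stop date in the downloaded CSV.
    """
    # Parse dates; unparseable values become NaT and are ignored by min/max
    dates = pd.to_datetime(df[date_col], errors="coerce")
    return {
        "n_records": len(df),
        "n_features": df.shape[1],
        "first_year": int(dates.dt.year.min()),
        "last_year": int(dates.dt.year.max()),
    }


# Toy stand-in for the real export (values are made up for illustration)
toy = pd.DataFrame({
    "stop_date": ["2015-03-01", "2018-07-15", "2021-11-30", "not a date"],
    "resolution": ["Arrest", "Citation", "Arrest", "Citation"],
})
print(summarize_records(toy, "stop_date"))
```

On the real file, the same call should report a record count above 47,000 and 23 features if the export matches the description above.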

## The Process
This analysis will follow the general structure listed here:
1. Setup and Data Import
2. Data Cleaning
3. Feature Engineering
4. Graphical Exploratory Data Analysis (EDA)
5. Feature Selection
6. Modeling
7. Results Interpretation
8. Conclusion

### Additional Notes
This notebook presents a condensed version of the analysis; it omits the full sequence needed to explain every detail of model choice and feature selection. Please refer to the EDA notebook in this folder for that analysis.

## Part I: Setup and Data Import
**Import relevant packages**

In [4]:
# Basic packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import pickle
import os
import sys
from datetime import time

# Data manipulation packages
from custom_functions import *  # Local helper module; the wildcard import exposes all of its functions
from imblearn.over_sampling import SMOTE
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split

# Modeling packages
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_selector, make_column_transformer
from sklearn.model_selection import KFold, GridSearchCV
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier

# Model evaluation packages
import shap
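# Note: plot_confusion_matrix was removed in scikit-learn 1.2, so this import
# requires scikit-learn < 1.2 (newer versions provide ConfusionMatrixDisplay instead)
from sklearn.metrics import f1_score, plot_confusion_matrix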

# Basic settings for an easier-to-use notebook
%load_ext autoreload
%autoreload 2
%matplotlib inline
pd.set_option('display.max_columns', 100)



**Declare global constants used throughout the notebook**

In [5]:
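# Global constants used throughout the notebook
RANDOM_STATE = 0  # Seed for reproducible splits, resampling, and models
raw_terry_path = os.path.join('..', 'data', 'raw', 'terry-stops.csv')
processed_data_path = os.path.join('..', 'data', 'processed')
UNKNOWN = 'Not provided'  # Label for missing categorical values
N_SPLITS = 3  # Number of cross-validation folds
JOBS = 2  # Number of parallel jobs (n_jobs)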
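A short sketch of how constants like these are typically consumed, assuming `UNKNOWN` serves as the fill value for missing categories and `N_SPLITS` and `RANDOM_STATE` configure a shuffled `KFold`. The `race` column and its values are hypothetical toy data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

RANDOM_STATE = 0
UNKNOWN = "Not provided"
N_SPLITS = 3

# Hypothetical toy column with missing entries (None and NaN)
toy = pd.DataFrame({"race": ["White", None, "Black", np.nan, "Asian", "White"]})

# Replace missing categorical values with one explicit label
toy["race"] = toy["race"].fillna(UNKNOWN)

# A shuffled KFold seeded with an integer RANDOM_STATE yields the same folds on every call
kf = KFold(n_splits=N_SPLITS, shuffle=True, random_state=RANDOM_STATE)
folds_a = [test.tolist() for _, test in kf.split(toy)]
folds_b = [test.tolist() for _, test in kf.split(toy)]
print(toy["race"].tolist())
print(folds_a == folds_b, len(folds_a))
```

Filling gaps with an explicit label keeps "missing" as its own category instead of silently dropping rows, which matters when missingness itself may correlate with the outcome.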

**Import the full dataset from the *data* folder**

In [6]:
df = pd.read_csv(raw_terry_path, dtype='str')

# Strip leading and trailing whitespace from every column (all values were read as strings)
for col in df.columns:
    df[col] = df[col].str.strip()
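To illustrate what the cell above does, here is a self-contained sketch on a small in-memory CSV. The column names (`officer_id`, `resolution`) and values are hypothetical. It shows why `dtype='str'` is useful (codes with leading zeros survive) and gives a vectorized equivalent of the per-column stripping loop.

```python
import io

import pandas as pd

# Small in-memory CSV with padded fields, a leading-zero code, and a gap
raw = io.StringIO(
    "id,officer_id,resolution\n"
    "1,  0042 , Arrest \n"
    "2,7 ,Citation\n"
    "3,,  Arrest\n"
)

# dtype="str" keeps every value as text, so "0042" is not converted to 42
df = pd.read_csv(raw, dtype="str")

# Vectorized equivalent of the per-column loop: strip each column's strings.
# Missing values stay missing (NaN), since .str methods pass them through.
df = df.apply(lambda col: col.str.strip())
print(df)
```

Both forms give the same result; the loop is arguably easier to read, while `apply` keeps the cleaning to a single expression.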

## Part II: Data Cleaning