# Predicting NYPD Complaint Response Times

Courtney Cheung, John Driscoll

## Summary of Findings


### Introduction
We are building a regression model to predict complaint response time: the time, in days, between when a complaint was received and when it was closed. We chose this target because we believe it is likely influenced by the complainant's ethnicity and gender, and we would like to test that theory. We evaluate the model with RMSE rather than R-squared because RMSE is easier to interpret: it is expressed in the same units as the dependent variable.

All of the columns in the dataset represent features that we would know at the time of prediction, except for month_closed, year_closed, rank_abbrev_now, rank_now, outcome_description, and board_disposition. We excluded these columns because their values would not have been known until the complaints had been closed. We cleaned the dataframe by dropping the excluded columns and creating new columns for complaint response time and the number of allegations per complaint id. We also imputed the null values in the dataset.
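For reference, the standard definition of RMSE is written out here. The symbols are introduced only for illustration: $n$ is the number of test complaints, $y_i$ the observed response time in days, and $\hat{y}_i$ the model's prediction.

```latex
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}
```

Because the errors are squared before averaging and the square root is taken afterwards, the result is in days, the same unit as the target.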

### Baseline Model
We used two features, complainant_ethnicity and complainant_gender, both nominal, and one-hot encoded them. The model had an RMSE of about 146.03 days and an R-squared of about 0.076 on the testing data, so we do not believe it is a good model.

### Final Model
We added the rank_incident, fado_type, and complainant_age_incident features. They improve predictions of complaint response time: the RMSE on the testing data decreased to about 139.67 days. We chose 2 as the degree hyperparameter for PolynomialFeatures by looping through degrees 0-4. Degrees 0 and 1 gave the same RMSE (about 143.83), degree 2 gave the lowest (about 139.67), and degrees 3 and 4 gave higher RMSEs (about 146.30 and 150.09), which we attribute to overfitting.

### Fairness Analysis
We performed a permutation test to test the following:

Null Hypothesis: Our model is fair. Its RMSE for young and old complainants is roughly the same, and any difference is due to random chance.

Alternative Hypothesis: Our model is unfair. The difference in RMSE (old minus young) is less than 0, meaning that our model performs worse on young complainants, and this difference is not due to random chance.

Based on our p-value of 0.02, there is significant evidence in favor of our alternative hypothesis. We reject the null and conclude that the difference in RMSE is not due to random chance. Therefore, our model is NOT fair. **Our model performs worse for people who are young.**

## Code

In [1]:
#imports
import matplotlib.pyplot as plt
import numpy as np
import os
import pandas as pd
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import FunctionTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.metrics import mean_squared_error

%config InlineBackend.figure_format = 'retina'  # Higher resolution figures

In [2]:
#raw data
unclean_complaints = pd.read_csv('allegations_202007271729.csv')
unclean_complaints.head(3)

Unnamed: 0,unique_mos_id,first_name,last_name,command_now,shield_no,complaint_id,month_received,year_received,month_closed,year_closed,...,mos_age_incident,complainant_ethnicity,complainant_gender,complainant_age_incident,fado_type,allegation,precinct,contact_reason,outcome_description,board_disposition
0,10004,Jonathan,Ruiz,078 PCT,8409,42835,7,2019,5,2020,...,32,Black,Female,38.0,Abuse of Authority,Failure to provide RTKA card,78.0,Report-domestic dispute,No arrest made or summons issued,Substantiated (Command Lvl Instructions)
1,10007,John,Sears,078 PCT,5952,24601,11,2011,8,2012,...,24,Black,Male,26.0,Discourtesy,Action,67.0,Moving violation,Moving violation summons issued,Substantiated (Charges)
2,10007,John,Sears,078 PCT,5952,24601,11,2011,8,2012,...,24,Black,Male,26.0,Offensive Language,Race,67.0,Moving violation,Moving violation summons issued,Substantiated (Charges)


We cleaned the dataframe by marking placeholder values (such as 'Unknown', 'Refused', and 'Not described') as null, adding the response time column, and dropping columns that represent features that would not have been known at the time of prediction.

In [3]:
complaints = unclean_complaints.copy()
#cleaning missingness of ethnicity
complaints['complainant_ethnicity'] = complaints['complainant_ethnicity'].replace(['Unknown', 'Refused'], np.nan)
#cleaning missingness of gender
complaints['complainant_gender'] = complaints['complainant_gender'].replace('Not described', np.nan)
#convert year/month received/closed into timestamp columns (day fixed to the 1st)
complaints['time_received'] = complaints.apply(axis=1, func=lambda x: pd.Timestamp(year=x['year_received'], month=x['month_received'], day=1))
complaints.drop(columns=['month_received', 'year_received'], inplace=True)
complaints['time_closed'] = complaints.apply(axis=1, func=lambda x: pd.Timestamp(year=x['year_closed'], month=x['month_closed'], day=1))
complaints.drop(columns=['month_closed', 'year_closed'], inplace=True)
#subtract received from closed to find the length of the complaint
complaints['complaint response time'] = (complaints['time_closed'] - complaints['time_received'])
#dropping unnecessary columns
complaints = complaints.drop(columns=['rank_abbrev_now', 'rank_now', 'outcome_description', 'board_disposition', 'time_received', 'time_closed'])
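The month/year-to-duration step can also be sketched in vectorized form. This is a minimal illustration on made-up toy rows, not the notebook's own code; the helper name `month_start` is hypothetical, and only the column names are taken from the dataset.

```python
import pandas as pd

# Toy rows using the same column names as the dataset (values are made up).
toy = pd.DataFrame({
    'year_received': [2019, 2011],
    'month_received': [7, 11],
    'year_closed': [2020, 2012],
    'month_closed': [5, 8],
})

def month_start(df, year_col, month_col):
    """Build first-of-month timestamps from a year column and a month column."""
    parts = pd.DataFrame({'year': df[year_col], 'month': df[month_col], 'day': 1})
    return pd.to_datetime(parts)

received = month_start(toy, 'year_received', 'month_received')
closed = month_start(toy, 'year_closed', 'month_closed')

# .dt.days turns each timedelta into an integer number of days.
response_days = (closed - received).dt.days
print(response_days.tolist())
```

Because only the month and year are recorded, every timestamp is pinned to the first of the month, so the computed durations are accurate to roughly a month.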

### Baseline Model

We handled the remaining null values in the categorical columns by filling them with an 'NA' category.

In the following steps, we used complainant_ethnicity and complainant_gender to predict complaint response time.

In [4]:
#defining X and y, and splitting the data into a training and test set
X = complaints[['complainant_ethnicity', 
                'complainant_gender']].fillna('NA')
#converting the timedelta target to an integer number of days
y = complaints['complaint response time'].dt.days
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [5]:
#transforming the categorical columns using one hot encoding
preproc = ColumnTransformer([
    ('one-hot', OneHotEncoder(drop='first'), [
                'complainant_ethnicity', 
                'complainant_gender']),
], remainder='passthrough')
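To make the encoding step concrete, here is a small sketch of what `OneHotEncoder(drop='first')` produces. The toy values are made up; only the column names and the 'NA' fill value follow the notebook.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy rows (made up) with the two baseline features.
toy = pd.DataFrame({
    'complainant_ethnicity': ['Black', 'White', 'NA', 'Black'],
    'complainant_gender': ['Female', 'Male', 'Male', 'NA'],
})

encoder = OneHotEncoder(drop='first')
# The encoder returns a sparse matrix by default; densify it for inspection.
encoded = encoder.fit_transform(toy).toarray()

# Categories are sorted per column, and the first one is dropped, so
# 'Black' and 'Female' become the all-zeros reference levels.
print(encoder.categories_)
print(encoded)
```

Dropping the first level of each feature avoids perfectly collinear columns in the linear regression; the dropped level is absorbed into the intercept.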

In [6]:
#constructing pipeline using the preprocessing and linear regression
pl = Pipeline([
    ('preprocessor', preproc),
    ('lin-reg', LinearRegression())
])

In [7]:
#fitting the model
pl.fit(X_train,y_train)

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(remainder='passthrough',
                                   transformers=[('one-hot',
                                                  OneHotEncoder(drop='first'),
                                                  ['complainant_ethnicity',
                                                   'complainant_gender'])])),
                ('lin-reg', LinearRegression())])

In [8]:
#predicting using the model
pl.predict(X_test)

array([322.03443834, 322.03443834, 322.03443834, ..., 289.42114425,
       191.63016639, 191.63016639])

In [9]:
#defining a function to calculate root mean squared error
def rmse(actual, pred):
    return np.sqrt(np.mean((actual - pred) ** 2))

In [10]:
#R-squared of the model on the test set
pl.score(X_test, y_test)

0.07626191232510116

In [11]:
#calculating rmse to evaluate the model
rmse(y_test, pl.predict(X_test))

146.02661001292563

### Final Model

We improved upon the baseline model by adding the 'rank_incident', 'fado_type', and 'complainant_age_incident' features to the model and increasing the degree of our regression.

To use complainant_age_incident, we first imputed its null values using probabilistic imputation: each missing age is replaced by a value drawn at random, with replacement, from the observed ages.

In [12]:
#probabilistic imputation of age column
num_null = complaints['complainant_age_incident'].isna().sum()
fill_values = complaints['complainant_age_incident'].dropna().sample(num_null, replace=True)
fill_values.index = complaints.loc[complaints['complainant_age_incident'].isna()].index
ages_filled = complaints.fillna({'complainant_age_incident': fill_values.to_dict()})
ages_filled.head()

Unnamed: 0,unique_mos_id,first_name,last_name,command_now,shield_no,complaint_id,command_at_incident,rank_abbrev_incident,rank_incident,mos_ethnicity,mos_gender,mos_age_incident,complainant_ethnicity,complainant_gender,complainant_age_incident,fado_type,allegation,precinct,contact_reason,complaint response time
0,10004,Jonathan,Ruiz,078 PCT,8409,42835,078 PCT,POM,Police Officer,Hispanic,M,32,Black,Female,38.0,Abuse of Authority,Failure to provide RTKA card,78.0,Report-domestic dispute,305 days
1,10007,John,Sears,078 PCT,5952,24601,PBBS,POM,Police Officer,White,M,24,Black,Male,26.0,Discourtesy,Action,67.0,Moving violation,274 days
2,10007,John,Sears,078 PCT,5952,24601,PBBS,POM,Police Officer,White,M,24,Black,Male,26.0,Offensive Language,Race,67.0,Moving violation,274 days
3,10007,John,Sears,078 PCT,5952,26146,PBBS,POM,Police Officer,White,M,25,Black,Male,45.0,Abuse of Authority,Question,67.0,PD suspected C/V of violation/crime - street,427 days
4,10009,Noemi,Sierra,078 PCT,24058,40253,078 PCT,POF,Police Officer,Hispanic,F,39,,,16.0,Force,Physical force,67.0,Report-dispute,184 days
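The imputation step can be illustrated on a toy series. This is a sketch under assumptions: the function name `impute_from_observed` and the `seed` parameter are hypothetical, and the ages are made up.

```python
import numpy as np
import pandas as pd

def impute_from_observed(series, seed=0):
    """Fill NaNs by sampling, with replacement, from the non-null values."""
    missing = series.isna()
    draws = series.dropna().sample(int(missing.sum()), replace=True, random_state=seed)
    filled = series.copy()
    # Assign by position so the sampled values' original index labels are ignored.
    filled[missing] = draws.to_numpy()
    return filled

ages = pd.Series([38.0, np.nan, 26.0, 45.0, np.nan, 16.0])
filled = impute_from_observed(ages)
print(filled.tolist())
```

Sampling from the observed values keeps the shape of the age distribution intact, whereas filling every gap with the mean would concentrate the imputed values at a single point.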


In [13]:
#splitting the data into training and test set
X = ages_filled[['rank_incident', 'fado_type','complainant_ethnicity', 'complainant_gender', 'complainant_age_incident']].fillna('NA')
#converting the timedelta target to an integer number of days
y = complaints['complaint response time'].dt.days
X_train, X_test, y_train, y_test = train_test_split(X, y)

In [14]:
from sklearn.preprocessing import PolynomialFeatures

In [15]:
#one hot encoding the categorical columns
preproc = ColumnTransformer([
    ('one-hot', OneHotEncoder(drop='first'), ['rank_incident', 'fado_type', 'complainant_ethnicity', 'complainant_gender']),
], remainder='passthrough')

We found the best degree hyperparameter for PolynomialFeatures by testing different degrees.

In [16]:
#looping through polynomial degrees
for i in range(5):
    pl = Pipeline([
        ('preprocessor', preproc),
        ('poly', PolynomialFeatures(i)),
        ('lin-reg', LinearRegression())
    ])
    pl.fit(X_train,y_train)
    pred = pl.predict(X_test)
    #printing RMSE
    print(rmse(y_test, pred))

143.8302355861691
143.8302355861691
139.6725824543701
146.30289648652195
150.0897422488692


The degree that produced the lowest RMSE is 2, so we fit the improved model using a degree 2 polynomial regression.

In [17]:
#saving the improved model
pl = Pipeline([
        ('preprocessor', preproc),
        ('poly', PolynomialFeatures(2)),
        ('lin-reg', LinearRegression())
    ])
pl.fit(X_train,y_train)
improved_pl = pl
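The degree loop above compares candidates by their RMSE on the test set. One alternative, sketched here on synthetic data rather than the complaints dataset, is to choose the degree by cross-validation on the training set and reserve the test set for a single final evaluation. The data-generating process and variable names below are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in data: a quadratic signal plus Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = 2.0 * X[:, 0] ** 2 + rng.normal(0, 1.0, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

pipe = Pipeline([
    ('poly', PolynomialFeatures()),
    ('lin-reg', LinearRegression()),
])

# 5-fold cross-validation on the training set only; scores are negated RMSE.
search = GridSearchCV(
    pipe,
    param_grid={'poly__degree': [1, 2, 3, 4]},
    scoring='neg_root_mean_squared_error',
    cv=5,
)
search.fit(X_train, y_train)

best_degree = search.best_params_['poly__degree']
# The test set is touched once, after the degree has been chosen.
test_rmse = np.sqrt(np.mean((y_test - search.predict(X_test)) ** 2))
print(best_degree, test_rmse)
```

With this setup the reported test RMSE is not influenced by the hyperparameter choice, which makes it a less optimistic estimate of performance on new complaints.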

### Fairness Analysis

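In the following steps, we binarized complainant ages at a threshold of 32, just below the mean age of about 32.5: complainants older than 32 are labeled 1 (old) and all others 0 (young).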

In [18]:
#imports
from sklearn.preprocessing import Binarizer

In [19]:
#finding threshold
ages_filled['complainant_age_incident'].mean()

32.52134420528809

In [20]:
#adding binarized column (1 if age > 32, else 0)
transformer = Binarizer(threshold=32)
ages_filled['binarized_complainant_ages'] = transformer.transform(ages_filled[['complainant_age_incident']])
ages_filled

Unnamed: 0,unique_mos_id,first_name,last_name,command_now,shield_no,complaint_id,command_at_incident,rank_abbrev_incident,rank_incident,mos_ethnicity,...,mos_age_incident,complainant_ethnicity,complainant_gender,complainant_age_incident,fado_type,allegation,precinct,contact_reason,complaint response time,binarized_complainant_ages
0,10004,Jonathan,Ruiz,078 PCT,8409,42835,078 PCT,POM,Police Officer,Hispanic,...,32,Black,Female,38.0,Abuse of Authority,Failure to provide RTKA card,78.0,Report-domestic dispute,305 days,1.0
1,10007,John,Sears,078 PCT,5952,24601,PBBS,POM,Police Officer,White,...,24,Black,Male,26.0,Discourtesy,Action,67.0,Moving violation,274 days,0.0
2,10007,John,Sears,078 PCT,5952,24601,PBBS,POM,Police Officer,White,...,24,Black,Male,26.0,Offensive Language,Race,67.0,Moving violation,274 days,0.0
3,10007,John,Sears,078 PCT,5952,26146,PBBS,POM,Police Officer,White,...,25,Black,Male,45.0,Abuse of Authority,Question,67.0,PD suspected C/V of violation/crime - street,427 days,1.0
4,10009,Noemi,Sierra,078 PCT,24058,40253,078 PCT,POF,Police Officer,Hispanic,...,39,,,16.0,Force,Physical force,67.0,Report-dispute,184 days,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33353,9992,Tomasz,Pulawski,078 PCT,2642,35671,066 PCT,POM,Police Officer,White,...,36,Asian,Male,21.0,Discourtesy,Word,66.0,Moving violation,184 days,0.0
33354,9992,Tomasz,Pulawski,078 PCT,2642,35671,066 PCT,POM,Police Officer,White,...,36,Asian,Male,21.0,Abuse of Authority,Interference with recording,66.0,Moving violation,184 days,0.0
33355,9992,Tomasz,Pulawski,078 PCT,2642,35671,066 PCT,POM,Police Officer,White,...,36,Asian,Male,21.0,Abuse of Authority,Search (of person),66.0,Moving violation,184 days,0.0
33356,9992,Tomasz,Pulawski,078 PCT,2642,35671,066 PCT,POM,Police Officer,White,...,36,Asian,Male,21.0,Abuse of Authority,Vehicle search,66.0,Moving violation,184 days,0.0
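As a quick check of how the threshold behaves at the boundary, the sketch below applies `Binarizer` to a few made-up ages. Values strictly greater than the threshold map to 1, so an age of exactly 32 falls in the young group.

```python
import numpy as np
from sklearn.preprocessing import Binarizer

# Made-up ages, including one exactly at the threshold.
ages = np.array([[16.0], [32.0], [33.0], [45.0]])

# Binarizer maps values > threshold to 1 and values <= threshold to 0.
labels = Binarizer(threshold=32).fit_transform(ages).ravel()
print(labels.tolist())  # [0.0, 0.0, 1.0, 1.0]
```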


Null Hypothesis: Our model is fair. Its RMSE for young and old complainants is roughly the same, and any difference is due to random chance.

Alternative Hypothesis: Our model is unfair. The difference in RMSE (old minus young) is less than 0, meaning that our model performs worse on young complainants, and this difference is not due to random chance.

In [21]:
#defining function to calculate the test statistic
def rmse_diff(df):
    features = ['rank_incident', 'fado_type', 'complainant_ethnicity', 'complainant_gender', 'complainant_age_incident']
    young = df[df['binarized_complainant_ages'] == 0]
    old = df[df['binarized_complainant_ages'] == 1]
    young_X = young[features].fillna('NA')
    young_y = young['complaint response time'].dt.days
    old_X = old[features].fillna('NA')
    old_y = old['complaint response time'].dt.days

    #the pipeline is refit on each group, so each RMSE is an in-sample error
    pl.fit(young_X, young_y)
    young_rmse = rmse(young_y, pl.predict(young_X))
    pl.fit(old_X, old_y)
    old_rmse = rmse(old_y, pl.predict(old_X))
    #a negative difference means a larger RMSE for the young group
    return old_rmse - young_rmse

In [22]:
#calculating the observed test statistic
observed = rmse_diff(ages_filled)
observed

-5.657549038314954

In [23]:
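#running a permutation test with 100 shuffles of the age labels
sample = []
for _ in range(100):
    #shuffling the group labels in place while leaving all other columns fixed
    ages_filled['binarized_complainant_ages'] = ages_filled['binarized_complainant_ages'].sample(frac=1).reset_index(drop=True)
    sample.append(rmse_diff(ages_filled))
#proportion of shuffled statistics at least as extreme (as negative) as the observed one
p_value = np.mean(np.array(sample) <= observed)
p_value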

0.02

Based on our p-value of 0.02, there is significant evidence in favor of our alternative hypothesis. We reject the null and conclude that the difference in RMSE is not due to random chance. Therefore, our model is NOT fair. **Our model performs worse for people who are young.**
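A variant of this test holds the fitted model fixed and shuffles only the group labels, so that the statistic compares one model's errors across groups rather than two separately refit models. The sketch below is a minimal illustration on synthetic data, not the complaints dataset; the function names `rmse_gap` and `permutation_p_value`, the number of shuffles, and the simulated noise levels are all assumptions.

```python
import numpy as np

def rmse(actual, pred):
    return np.sqrt(np.mean((actual - pred) ** 2))

def rmse_gap(actual, pred, is_old):
    """RMSE on the old group minus RMSE on the young group."""
    return rmse(actual[is_old], pred[is_old]) - rmse(actual[~is_old], pred[~is_old])

def permutation_p_value(actual, pred, is_old, n_shuffles=1000, seed=0):
    """One-sided p-value: share of shuffled gaps at or below the observed gap."""
    rng = np.random.default_rng(seed)
    observed = rmse_gap(actual, pred, is_old)
    # Predictions stay fixed; only the group labels are permuted.
    shuffled = np.array([
        rmse_gap(actual, pred, rng.permutation(is_old))
        for _ in range(n_shuffles)
    ])
    return observed, np.mean(shuffled <= observed)

# Synthetic stand-in data: predictions are deliberately noisier for the
# young group, so the observed gap should be negative.
rng = np.random.default_rng(1)
n = 2000
is_old = rng.random(n) < 0.5
actual = rng.normal(300, 100, size=n)
noise_sd = np.where(is_old, 50.0, 80.0)
pred = actual + rng.normal(0, 1, size=n) * noise_sd

observed, p_value = permutation_p_value(actual, pred, is_old)
print(observed, p_value)
```

In practice `actual` and `pred` would come from a held-out set scored by the already fitted pipeline, and a larger number of shuffles gives a finer-grained p-value than 100.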