# Analysis of Disparate Treatment of Demographic Groups by the New Orleans Police Force in 2022
-----------
Hayden Outlaw & Mikey Sison - Spring 2023 - Tulane University


## Introduction
------
### TODO

## Loading Libraries

In [17]:
import pandas as pd
import numpy as np
import os
import sklearn as sk
import time
import seaborn as sns

# force pandas to display all columns globally
pd.set_option('display.max_columns', None)

## Loading Data
----
The Field Interview Card data is sourced from [data.nola.gov](https://data.nola.gov/Public-Safety-and-Preparedness/Stop-and-Search-Field-Interviews-/kitu-f4uy), an online open data portal for data and metrics about the city of New Orleans. The data was originally published by the New Orleans Police Department, and was most recently downloaded for this project on **February 24th, 2023**. Each record contains a unique primary key, an NOPD reference item, the district and zone of the interview, the officer's district assignment, a description of the stop, the actions taken, information about any vehicle involved, information about the subject, and the location. Note that the locations have been anonymized: out of privacy concerns, they are resolved only to the level of a zip code and a block. Interview data is also available dating back to at least 1993, albeit sparsely, but this project focuses only on interviews conducted on or after January 1st, 2022.




In [16]:
interview_data = pd.read_csv('../data/Interview_Cards_Filtered.csv')
interview_data.head(3)

Unnamed: 0,FieldInterviewID,NOPD_Item,EventDate,District,Zone,OfficerAssignment,StopDescription,ActionsTaken,VehicleYear,VehicleMake,VehicleModel,VehicleStyle,VehicleColor,SubjectID,SubjectRace,SubjectGender,SubjectAge,SubjectHasPhotoID,SubjectHeight,SubjectWeight,SubjectEyeColor,SubjectHairColor,SubjectDriverLicState,CreatedDateTime,LastModifiedDateTime,Longitude,Latitude,Zip,BlockAddress
0,607431,B2364223,2/23/23 14:26,8,E,8th District,CALL FOR SERVICE,Stop Results: No action taken;Subject Type: Pe...,,,,,,695359.0,WHITE,MALE,,Yes,74.0,180.0,Brown,Black,LA,2/23/23 15:14,,-90.059982,29.963337,70116.0,Royal St & Esplanade Av
1,607433,B2383823,2/23/23 14:13,5,Q,5th District,CALL FOR SERVICE,Stop Results: No action taken;Subject Type: Pe...,,,,,,695361.0,BLACK,FEMALE,,No,68.0,140.0,Brown,Brown,,2/23/23 17:35,,-90.016421,29.966743,70117.0,005XX N Claiborne Av
2,607430,B2342223,2/23/23 13:02,7,F,7th District,TRAFFIC VIOLATION,Stop Results: Verbal Warning;Subject Type: Dri...,2003.0,TOYOTA,TACOMA,PICK UP,BEIGE,695358.0,BLACK,FEMALE,,Yes,65.0,110.0,Brown,Black,LA,2/23/23 13:13,,-89.992275,30.023754,,Dwyer Rd & Sandhurst Dr
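
The date cutoff described above (interviews on or after January 1st, 2022) could be applied to a raw export along the following lines. This is a minimal sketch on a toy table, not the filtering code actually used to produce `Interview_Cards_Filtered.csv`. The rows in `raw` are made up for illustration, and the timestamp format is an assumption based on the sample rows shown above.

```python
import pandas as pd

# Hypothetical stand-in for the unfiltered export; only EventDate matters here.
raw = pd.DataFrame({
    "FieldInterviewID": [1, 2, 3],
    "EventDate": ["12/31/21 23:50", "1/1/22 00:05", "2/23/23 14:26"],
})

# Assumed timestamp layout (month/day/two-digit year, 24-hour time).
event_dates = pd.to_datetime(raw["EventDate"], format="%m/%d/%y %H:%M")

# Keep only interviews on or after the cutoff, then renumber the rows.
cutoff = pd.Timestamp("2022-01-01")
filtered = raw.loc[event_dates >= cutoff].reset_index(drop=True)
print(filtered)
```

Comparing parsed timestamps rather than the raw strings avoids ordering mistakes, since strings such as `12/31/21` and `1/1/22` do not sort chronologically.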


## Cleaning Data

The dataset contains 16,500 observations of 29 attributes each. This includes all observations from January 1st, 2022 through the most recent download of the data.

In [21]:
interview_data.shape

(16500, 29)

These are the counts of missing values ([NumPy NaN objects](https://numpy.org/doc/stable/reference/constants.html)) present in the field interview cards data set. Some columns are inherently non-sparse, such as the primary key `FieldInterviewID` or `NOPD_Item`, as well as the boolean attribute `SubjectHasPhotoID`, which is marked as false unless shown otherwise. Notably, the information about vehicles tends to be sparse, but this is most likely explained by incidents that did not involve a vehicle at all, and so presents no logical issue. There are various reasons why subject information could be missing, such as officer error, indeterminate attributes (low lighting, an incident too short to collect definitive attributes, etc.), lack of collection (an ID or driver's license never needed to be checked), or anonymization out of safety concerns. We do not assert that the sparsity of these attributes is independent of the attributes of the subjects themselves. `SubjectRace` and `SubjectGender`, which can be determined visually, are among the least sparse attributes in the data set, whereas attributes like `SubjectAge` require some examination of official records to collect. Therefore, these sparse observations will not be dropped, but will be accounted for separately.

In [20]:
print(interview_data.isna().sum())

FieldInterviewID             0
NOPD_Item                    0
EventDate                    0
District                     0
Zone                         0
OfficerAssignment            0
StopDescription              0
ActionsTaken               126
VehicleYear               8004
VehicleMake               7793
VehicleModel              8176
VehicleStyle              7929
VehicleColor              7838
SubjectID                  126
SubjectRace                126
SubjectGender              183
SubjectAge               16500
SubjectHasPhotoID            0
SubjectHeight             1215
SubjectWeight             1188
SubjectEyeColor           1116
SubjectHairColor          1801
SubjectDriverLicState     7714
CreatedDateTime              0
LastModifiedDateTime     10722
Longitude                    0
Latitude                     0
Zip                       1600
BlockAddress              1270
dtype: int64
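
Since the sparsity of subject attributes is not assumed to be independent of the subjects themselves, one way to inspect that relationship is to compare the share of missing values across groups. The following is a minimal sketch on a toy table. The rows in `toy` are invented for illustration, and grouping by `SubjectRace` with a `"MISSING"` label for unrecorded race is an assumption, not the analysis used later in this project.

```python
import numpy as np
import pandas as pd

# Hypothetical rows that mimic the structure of the interview data.
toy = pd.DataFrame({
    "SubjectRace": ["BLACK", "BLACK", "WHITE", "WHITE", np.nan],
    "SubjectHeight": [68.0, np.nan, 74.0, 70.0, np.nan],
    "SubjectDriverLicState": ["LA", np.nan, "LA", np.nan, np.nan],
})

cols = ["SubjectHeight", "SubjectDriverLicState"]

# Give rows without a recorded race their own group instead of dropping them.
groups = toy["SubjectRace"].fillna("MISSING")

# isna() yields booleans, so the group mean is the fraction of missing values.
missing_share = toy[cols].isna().groupby(groups).mean()
print(missing_share)
```

Keeping the `"MISSING"` group visible is consistent with the choice above to account for sparse observations separately rather than drop them.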


Since the data was imported from a .csv file, every column is read as either a generic object (string), an integer, or a floating-point number. For use in analysis, all date attributes will be converted to [pandas datetime64 values](https://pandas.pydata.org/pandas-docs/version/0.23.0/timeseries.html), and the index of the table will be labeled.

In [22]:
interview_data.dtypes

FieldInterviewID           int64
NOPD_Item                 object
EventDate                 object
District                   int64
Zone                      object
OfficerAssignment         object
StopDescription           object
ActionsTaken              object
VehicleYear              float64
VehicleMake               object
VehicleModel              object
VehicleStyle              object
VehicleColor              object
SubjectID                float64
SubjectRace               object
SubjectGender             object
SubjectAge               float64
SubjectHasPhotoID         object
SubjectHeight            float64
SubjectWeight            float64
SubjectEyeColor           object
SubjectHairColor          object
SubjectDriverLicState     object
CreatedDateTime           object
LastModifiedDateTime      object
Longitude                float64
Latitude                 float64
Zip                      float64
BlockAddress              object
dtype: object

In [26]:
interview_data["EventDate"] = pd.to_datetime(interview_data["EventDate"])
interview_data["CreatedDateTime"] = pd.to_datetime(interview_data["CreatedDateTime"])
interview_data["LastModifiedDateTime"] = pd.to_datetime(interview_data["LastModifiedDateTime"])
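
As a variation on the conversion above, the timestamp format can be stated explicitly and unparseable entries coerced to `NaT`, which makes it possible to count how many values failed to parse. This is a minimal sketch on made-up strings. The format string is an assumption inferred from the sample rows, and the `"not a date"` entry is invented to show the failure path.

```python
import pandas as pd

# Hypothetical sample: two valid timestamps, one missing value, one bad entry.
sample = pd.Series(["2/23/23 14:26", "2/23/23 15:14", None, "not a date"])

# errors="coerce" turns anything that does not match the format into NaT.
parsed = pd.to_datetime(sample, format="%m/%d/%y %H:%M", errors="coerce")

# Values that were present in the input but failed to parse.
n_failed = int(parsed.isna().sum() - sample.isna().sum())
print(parsed)
print("failed to parse:", n_failed)
```

An explicit format avoids ambiguous inference (for example, day-first versus month-first), and the failure count is a quick check that the assumed format actually fits the column.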

In [27]:
interview_data.dtypes

FieldInterviewID                  int64
NOPD_Item                        object
EventDate                datetime64[ns]
District                          int64
Zone                             object
OfficerAssignment                object
StopDescription                  object
ActionsTaken                     object
VehicleYear                     float64
VehicleMake                      object
VehicleModel                     object
VehicleStyle                     object
VehicleColor                     object
SubjectID                       float64
SubjectRace                      object
SubjectGender                    object
SubjectAge                      float64
SubjectHasPhotoID                object
SubjectHeight                   float64
SubjectWeight                   float64
SubjectEyeColor                  object
SubjectHairColor                 object
SubjectDriverLicState            object
CreatedDateTime          datetime64[ns]
LastModifiedDateTime     datetime64[ns]


## Exploratory Data Analysis