## Decoding Arrest Decisions in Terry Traffic Stops

## Project Overview

The Terry Traffic Stops project aims to unravel the complex dynamics behind police decisions during traffic stops, particularly focusing on whether an arrest is made. Building on the legal precedent set by Terry v. Ohio, which introduced the concept of "reasonable suspicion," the project develops a machine learning model to predict the likelihood of arrest following a stop.

Using a rich dataset that includes details such as the stop’s context, demographics, and circumstances, the model seeks to identify patterns and key factors influencing these critical decisions. Beyond understanding police behavior, the project aspires to inform policy, promote fairness, and enhance transparency in law enforcement practices. Through this analysis, the Terry Traffic Stops project contributes to the broader conversation on policing and civil rights.

## Business Problem

Terry Stops, rooted in the principle of “reasonable suspicion,” allow police officers to temporarily detain individuals for investigation. However, the decision to escalate a stop to an arrest can be influenced by a variety of factors, some of which may not be immediately clear or consistent. This project seeks to address the complexities involved in these critical decisions through a multifaceted approach:

1. **Identifying Key Factors**: The project aims to determine the primary factors that influence whether an arrest is made following a Terry Stop. These factors could range from situational elements, such as time and location, to individual characteristics, including behavior and demeanor during the stop. Understanding these variables is crucial for identifying patterns and ensuring that decisions are made based on objective criteria.

2. **Predictive Modeling**: To systematically analyze these factors, the project will develop a binary classification model capable of predicting the likelihood of an arrest. By leveraging historical data, this model will provide a data-driven approach to anticipate outcomes, allowing law enforcement agencies to understand the potential impact of various factors on arrest decisions.

3. **Policy Evaluation**: A key focus of the project is to provide insights into the role of demographic factors, such as race and gender, in arrest decisions. By examining these elements, the project will help ensure that arrest decisions are fair and unbiased, fostering greater transparency and addressing ethical concerns in policing practices.

4. **Resource Allocation**: Understanding the patterns and factors that lead to arrests can also assist law enforcement agencies in better allocating their resources. By identifying areas where arrests are more likely or understanding the circumstances that typically lead to such outcomes, agencies can deploy their personnel more effectively and make informed decisions about training and policy development.
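
The predictive-modeling goal in item 2 can be sketched as a small scikit-learn pipeline. This is a minimal illustration on made-up toy rows, not the project's final model: the two feature columns chosen, the `build_arrest_classifier` helper name, and the toy data are all assumptions made for the example.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder


def build_arrest_classifier(categorical_features):
    """Hypothetical helper: one-hot encode categoricals, then logistic regression."""
    encoder = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_features)]
    )
    return Pipeline([
        ("encode", encoder),
        ("model", LogisticRegression(max_iter=1000)),
    ])


# Toy rows invented for illustration; they are not taken from the dataset.
toy = pd.DataFrame({
    "Call Type": ["911", "ONVIEW", "911", "ONVIEW", "911", "ONVIEW"],
    "Frisk Flag": ["Y", "N", "Y", "N", "N", "Y"],
    "Arrest Flag": ["Y", "N", "Y", "N", "N", "Y"],
})

features = ["Call Type", "Frisk Flag"]
X = toy[features]
y = (toy["Arrest Flag"] == "Y").astype(int)  # 1 = arrest, 0 = no arrest

clf = build_arrest_classifier(features)
clf.fit(X, y)

# Predicted probability of arrest for each toy row
arrest_proba = clf.predict_proba(X)[:, 1]
print(arrest_proba.round(2))
```

On the real data, the model would be fit on a training split and evaluated on held-out rows rather than on the rows it was trained on.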


## Research Questions

1. **Prediction Accuracy**: How well can we predict whether a Terry stop will result in an arrest using factors like location, time of day, and demographic information?

2. **Key Influences**: Which factors are most important in predicting arrests? Are there any surprising trends or patterns?

3. **Bias and Fairness**: Does the model show any bias in its predictions? For example, does it predict arrests at systematically higher rates, or with more errors, for certain groups of subjects?
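
One way to approach the bias question is to compare simple per-group rates of the model's predictions. The sketch below is only an illustration of that idea: the `group_rates` helper, the column names, and the labels and predictions are invented, and a real audit would likely look at more metrics than these two.

```python
import pandas as pd


def group_rates(frame, group_col, y_true_col, y_pred_col):
    """Hypothetical helper: per-group predicted-arrest rate and false positive rate."""
    rows = {}
    for group, part in frame.groupby(group_col):
        negatives = part[part[y_true_col] == 0]
        rows[group] = {
            "n": len(part),
            "predicted_arrest_rate": part[y_pred_col].mean(),
            # Share of true non-arrests that the model flagged as arrests
            "false_positive_rate": (
                negatives[y_pred_col].mean() if len(negatives) else float("nan")
            ),
        }
    return pd.DataFrame.from_dict(rows, orient="index")


# Invented labels and predictions for two placeholder groups
toy = pd.DataFrame({
    "group": ["A", "A", "A", "A", "B", "B", "B", "B"],
    "y_true": [1, 0, 0, 0, 1, 0, 0, 0],
    "y_pred": [1, 1, 0, 0, 1, 0, 0, 0],
})

rates = group_rates(toy, "group", "y_true", "y_pred")
print(rates)
```

Large gaps between groups in either column would be a signal to investigate further, not proof of bias on their own.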


## Data Understanding Overview
In this phase, we explore the Terry Traffic Stops dataset to gain insights into the variables that describe police interactions during traffic stops. The dataset includes detailed information on both the individuals stopped and the officers involved, covering demographics, stop outcomes, and geographical data. By examining these features, we aim to identify key factors that influence stop outcomes, assess potential biases, and prepare the data for further analysis and modeling. The dataset contains the following 23 columns:

1. **Subject Age Group**: Categorizes the age of the individual involved in the stop.
2. **Subject ID**: Unique identifier for each individual stopped.
3. **GO / SC Num**: Case or report number associated with the stop.
4. **Terry Stop ID**: Unique identifier for each specific Terry Stop event.
5. **Stop Resolution**: Outcome of the stop (e.g., arrest, warning).
6. **Weapon Type**: Type of weapon found, if any, during the stop.
7. **Officer ID**: Unique identifier for the officer conducting the stop.
8. **Officer YOB**: Year of birth of the officer involved in the stop.
9. **Officer Gender**: Gender of the officer involved in the stop.
10. **Officer Race**: Race of the officer conducting the stop.
11. **Subject Perceived Race**: Race of the individual as perceived by the officer.
12. **Subject Perceived Gender**: Gender of the individual as perceived by the officer.
13. **Reported Date**: Date when the stop was reported.
14. **Reported Time**: Time of day when the stop was reported.
15. **Initial Call Type**: Reason for the initial call that led to the stop.
16. **Final Call Type**: Nature of the call after the stop was resolved.
17. **Call Type**: General category of the call or incident.
18. **Officer Squad**: The squad or unit to which the officer belongs.
19. **Arrest Flag**: Indicator of whether the stop resulted in an arrest.
20. **Frisk Flag**: Indicator of whether the individual was frisked during the stop.
21. **Precinct**: The police precinct where the stop took place.
22. **Sector**: Sub-division within a precinct for geographical analysis.
23. **Beat**: Smallest geographical area of police patrol.


## Step 1: Import all the Necessary Libraries

In [3]:
# Data handling and numerical computing
import pandas as pd
import numpy as np

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns

# Preprocessing and model selection
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Classifiers
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Evaluation metrics
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Oversampling to handle class imbalance
from imblearn.over_sampling import SMOTE

In [4]:
# Load the dataset and print out the first few rows of our data
df = pd.read_csv("Terry_Traffic_Stops.csv")

# Display the first few rows to verify the data
df.head()

|  | Subject Age Group | Subject ID | GO / SC Num | Terry Stop ID | Stop Resolution | Weapon Type | Officer ID | Officer YOB | Officer Gender | Officer Race | ... | Reported Time | Initial Call Type | Final Call Type | Call Type | Officer Squad | Arrest Flag | Frisk Flag | Precinct | Sector | Beat |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 36 - 45 | 7732696346 | 20190000315233 | 9803669705 | Field Contact | - | 4161 | 1957 | M | American Indian/Alaska Native | ... | 10:47:39.0000000 | SUSPICIOUS STOP - OFFICER INITIATED ONVIEW | --SUSPICIOUS CIRCUM. - SUSPICIOUS PERSON | ONVIEW | WEST PCT 1ST W - KING - PLATOON 1 | N | N | West | K | K2 |
| 1 | 46 - 55 | 8295859194 | 20190000207006 | 8299762394 | Arrest | - | 6404 | 1971 | M | White | ... | 17:08:12.0000000 | BURG - OCCUPIED RESD | --PROPERTY DEST (DAMG) | 911 | NORTH PCT 2ND W - NORA (JOHN) - PLATOON 1 | Y | N | North | N | N3 |
| 2 | 26 - 35 | -1 | 20170000002886 | 467843 | Field Contact |  | 7430 | 1984 | F | White | ... | 17:48:00.0000000 | - | - | - | NORTH PCT 2ND WATCH - NORTH BEATS | N | Y | - | - | - |
| 3 | 1 - 17 | -1 | 20180000275743 | 472723 | Offense Report |  | 5151 | 1962 | M | White | ... | 11:49:00.0000000 | - | - | - | SOUTHWEST PCT 1ST W - WILLIAM - PLATOON 2 | N | N | Southwest | F | F1 |
| 4 | 36 - 45 | 16227498273 | 20220000263279 | 37099192062 | Arrest | - | 7655 | 1982 | M | Nat Hawaiian/Oth Pac Islander | ... | 22:18:36.0000000 | SHOPLIFT - THEFT | --BURGLARY - NON RESIDENTIAL/COMMERCIAL | ONVIEW | SOUTHWEST PCT 2ND W - FRANK - PLATOON 2 | Y | N | Southwest | F | F2 |


Let's take a general look at the dataset.

In [13]:

class DataFrameInspector:
    def __init__(self, file_path):
        # Initialize the class with the file path and load the DataFrame
        self.file_path = file_path
        self.df = pd.read_csv(file_path)

    def display_info(self):
        # Display basic information about the DataFrame
        print("DataFrame Info:")
        self.df.info()

    def show_head(self, n=5):
        # Show the first n rows of the DataFrame
        print(f"\nFirst {n} rows:")
        print(self.df.head(n))

    def show_basic_stats(self):
        # Display summary statistics for numerical columns
        print("\nBasic statistics:")
        print(self.df.describe())

    def count_null_values(self):
        # Count the number of null values in each column
        print("\nNull values per column:")
        print(self.df.isnull().sum())

    def count_unique_values(self):
        # Count the number of unique values in each column
        print("\nUnique values per column:")
        print(self.df.nunique())

    def inspect_all(self):
        # Run all inspection methods
        self.display_info()
        self.show_head()
        self.show_basic_stats()
        self.count_null_values()
        self.count_unique_values()

inspector = DataFrameInspector('Terry_Traffic_Stops.csv')
inspector.inspect_all()

# Alternatively, you can call individual methods:
# inspector.display_info()
# inspector.show_head(10)  # Show first 10 rows
# inspector.show_basic_stats()
# inspector.count_null_values()
# inspector.count_unique_values()

DataFrame Info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 61021 entries, 0 to 61020
Data columns (total 23 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Subject Age Group         61021 non-null  object
 1   Subject ID                61021 non-null  int64 
 2   GO / SC Num               61021 non-null  int64 
 3   Terry Stop ID             61021 non-null  int64 
 4   Stop Resolution           61021 non-null  object
 5   Weapon Type               28456 non-null  object
 6   Officer ID                61021 non-null  object
 7   Officer YOB               61021 non-null  int64 
 8   Officer Gender            61021 non-null  object
 9   Officer Race              61021 non-null  object
 10  Subject Perceived Race    61021 non-null  object
 11  Subject Perceived Gender  61021 non-null  object
 12  Reported Date             61021 non-null  object
 13  Reported Time             61021 non-null  object
 14  Initia

## Data Understanding: General overview

1. Dataset Overview

The Terry Traffic Stops dataset contains a comprehensive record of 61,021 police stops. This data is organized across 23 columns, providing a detailed view of each incident.

2. Data Structure

The dataset's structure comprises 4 integer columns and 19 object columns. The object columns likely contain strings or mixed data types, allowing for a diverse range of information to be captured.

3. Missing Data

While most columns are complete, there are two notable exceptions. The "Weapon Type" column has the highest number of null values at 32,565, suggesting that weapon information is not always available or applicable. Additionally, the "Officer Squad" column has 561 missing entries.
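
Null counts alone may understate missingness here, because the sample rows also use "-" where a value is unavailable. The sketch below counts both per column. It is a rough illustration: the `missing_summary` helper is hypothetical, treating "-" as a missing-value placeholder is an assumption based on the sample rows, and the toy values are invented.

```python
import numpy as np
import pandas as pd


def missing_summary(frame, placeholder="-"):
    """Count NaN and placeholder entries per column (placeholder is an assumption)."""
    return pd.DataFrame({
        "null_count": frame.isnull().sum(),
        "placeholder_count": (frame == placeholder).sum(),
    })


# Toy frame mimicking the sample rows; values are illustrative only
toy = pd.DataFrame({
    "Weapon Type": ["-", "-", np.nan, np.nan, "-"],
    "Precinct": ["West", "North", "-", "Southwest", "Southwest"],
})

summary = missing_summary(toy)
print(summary)
```

Running the same helper on the full DataFrame would show whether columns that look complete in `info()` are in fact heavily populated with placeholders.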

4. Key Column Categories

4.1 Subject Information
The dataset includes details about the subjects of the stops, such as their age group, ID, perceived race, and perceived gender.

4.2 Officer Details
Information about the officers conducting the stops is also recorded, including their ID, year of birth (YOB), gender, and race.

4.3 Stop Characteristics
Each stop is documented with a unique Terry Stop ID, along with the stop resolution and the reported date and time.

4.4 Location Data
The geographic context of each stop is captured through precinct, sector, and beat information.

5. Numerical Insights

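The "Officer YOB" column describes the age profile of the officers. Birth years range from 1900 to 2002, with a mean of 1984. The minimum of 1900 is implausible for an active officer and most likely marks a placeholder or data-entry value, so it should be checked before the column is used to derive officer age. Subject ID and Terry Stop ID both span a wide range of values, as expected for identifiers. Note, however, that Subject ID is -1 in some of the sample rows above, so it does not uniquely identify every individual.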
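
A quick plausibility check on birth years might look like the following. The cutoff `MIN_PLAUSIBLE_YOB = 1940` is an arbitrary assumption chosen for illustration, and the toy values are invented; the appropriate threshold for the real data is a judgment call.

```python
import pandas as pd

# Assumed cutoff for illustration: birth years before this are treated as suspect
MIN_PLAUSIBLE_YOB = 1940

# Toy values invented for illustration
toy = pd.DataFrame({"Officer YOB": [1900, 1957, 1971, 1984, 1982, 1900]})

suspect = toy["Officer YOB"] < MIN_PLAUSIBLE_YOB
print("Suspect rows:", int(suspect.sum()))

# Compare the mean with and without the suspect values
mean_all = toy["Officer YOB"].mean()
mean_clean = toy.loc[~suspect, "Officer YOB"].mean()
print(f"Mean YOB (all rows):  {mean_all:.1f}")
print(f"Mean YOB (plausible): {mean_clean:.1f}")
```

Comparing the two means shows how much a handful of placeholder years can pull summary statistics.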

6. Categorical Data Highlights

Several columns offer insights into the nature of the stops and the diversity of those involved:
- Stop Resolution has 5 unique categories, indicating various outcomes of the stops.
- There are 9 distinct precincts represented in the data.
- Officer Race is categorized into 9 groups.
- Subject Perceived Race has 11 unique categories, suggesting a detailed approach to recording racial data.

7. Flag Columns

The dataset includes two flag columns that provide quick reference points:
- Arrest Flag is binary, indicating whether an arrest was made.
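- Frisk Flag has 3 unique values. Since a frisk either happened or did not, the third value is most likely a placeholder for missing data (such as the "-" that appears in other columns) and should be inspected before modeling.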
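
The flag columns can be inspected and turned into a modeling target along these lines. This is a sketch on invented values: the `arrested` column name is hypothetical, and the "-" entry in `Frisk Flag` stands in for whatever the third value turns out to be in the real data.

```python
import pandas as pd

# Toy flag values invented for illustration; "-" is an assumed placeholder
toy = pd.DataFrame({
    "Arrest Flag": ["N", "Y", "N", "N", "Y"],
    "Frisk Flag": ["N", "N", "Y", "-", "N"],
})

# Binary target: 1 = arrest, 0 = no arrest
toy["arrested"] = toy["Arrest Flag"].map({"Y": 1, "N": 0})

# Class balance of the target
print(toy["arrested"].value_counts(normalize=True))

# List every distinct Frisk Flag value, including any unexpected third one
print(toy["Frisk Flag"].value_counts(dropna=False))
```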

## Column classification

In [3]:
def classify_columns(file_path):
    # Read the CSV file
    df = pd.read_csv(file_path)

    # Initialize lists to store column names
    numerical_columns = []
    categorical_columns = []

    # Iterate through each column
    for column in df.columns:
        # Numeric dtypes (integers, floats) go in the numerical bucket
        if pd.api.types.is_numeric_dtype(df[column]):
            numerical_columns.append(column)
        # Strings (object dtype) and pandas categoricals go in the categorical bucket
        elif df[column].dtype == 'object' or df[column].dtype.name == 'category':
            categorical_columns.append(column)
        # Other dtypes (e.g., datetime) are skipped; add more branches here if needed

    # Print the results
    print("Numerical columns:")
    for col in numerical_columns:
        print(f"- {col}")

    print("\nCategorical columns:")
    for col in categorical_columns:
        print(f"- {col}")

    # Return the lists if you need to use them later
    return numerical_columns, categorical_columns

# Usage
file_path = 'Terry_Traffic_Stops.csv'
num_cols, cat_cols = classify_columns(file_path)

Numerical columns:
- Subject ID
- GO / SC Num
- Terry Stop ID
- Officer YOB

Categorical columns:
- Subject Age Group
- Stop Resolution
- Weapon Type
- Officer ID
- Officer Gender
- Officer Race
- Subject Perceived Race
- Subject Perceived Gender
- Reported Date
- Reported Time
- Initial Call Type
- Final Call Type
- Call Type
- Officer Squad
- Arrest Flag
- Frisk Flag
- Precinct
- Sector
- Beat
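
Reported Time lands in the categorical list because it is read as a string. If time of day is to be used as a feature, one simple option is to pull out the hour. The sketch below assumes every value follows the `HH:MM:SS.fffffff` pattern seen in the sample rows; the `reported_hour` column name is hypothetical.

```python
import pandas as pd

# Time strings copied from the sample rows shown earlier
toy = pd.DataFrame({
    "Reported Time": [
        "10:47:39.0000000",
        "17:08:12.0000000",
        "17:48:00.0000000",
        "11:49:00.0000000",
        "22:18:36.0000000",
    ]
})

# Take the two leading characters (the hour) and convert to integer
toy["reported_hour"] = toy["Reported Time"].str.slice(0, 2).astype(int)

print(toy["reported_hour"].tolist())
```

Similarly, the ID columns classified as numerical above are identifiers and would normally be excluded from the feature set.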
