# Racial Differences in Sentencing 

In the United States issues around race have been, and continue to be, an important element of American culture and society. With the adoption of the Civil Rights Act of 1964, legal race-based discrimination is now illegal. However, many economists, sociologists, and political scientists have argued that the legacy of these past policies and practices still negatively affect people in the United States today. 

Policy makers who wish to address these programs need to understand the extent of these issues so they can properly implement new or revise current laws. This notebook uses data from Cook County, IL and basic econometric tools to determine the extent of racial bias in prison sentence length for Black and White Americans. 

In [1]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
import_df = pd.read_csv('sentencing.csv', dtype={'DISPOSITION_CHARGED_CHAPTER':'str',
                                                'DISPOSITION_CHARGED_ACT':'str',
                                                'DISPOSITION_CHARGED_AOIC':'str',
                                                'COMMITMENT_TERM':'object'})

In [3]:
sentencing_df = import_df.copy(deep=True)
sentencing_df.head(3)

Unnamed: 0.1,Unnamed: 0,CASE_ID,CASE_PARTICIPANT_ID,RECEIVED_DATE,OFFENSE_CATEGORY,PRIMARY_CHARGE_FLAG,CHARGE_ID,CHARGE_VERSION_ID,DISPOSITION_CHARGED_OFFENSE_TITLE,CHARGE_COUNT,...,INCIDENT_CITY,INCIDENT_BEGIN_DATE,INCIDENT_END_DATE,LAW_ENFORCEMENT_AGENCY,LAW_ENFORCEMENT_UNIT,ARREST_DATE,FELONY_REVIEW_DATE,FELONY_REVIEW_RESULT,ARRAIGNMENT_DATE,UPDATED_OFFENSE_CATEGORY
0,0,226328000000.0,83148650000.0,2/5/2002 0:00,PROMIS Conversion,True,6238910000000.0,1097000000000.0,CONSPIRACY TO COMMIT FIRST DEGREE MURDER,1,...,,7/2/2000 0:00,,CHICAGO POLICE DEPT,,2016-10-08 20:00:00,10/9/2016 0:00,Approved,11/7/2016 0:00,PROMIS Conversion
1,166,258677000000.0,89434940000.0,11/1/2004 0:00,Reckless Homicide,False,9453060000000.0,976815000000.0,RECKLESS HOMICIDE,6,...,,10/10/2004 0:00,,CHICAGO POLICE DEPT,,2014-08-21 20:20:00,8/22/2014 0:00,Approved,11/5/2014 0:00,Reckless Homicide
2,842,280789000000.0,104164000000.0,3/13/2008 0:00,PROMIS Conversion,True,5969080000000.0,527094000000.0,RETAIL THEFT,1,...,,3/13/2008 0:00,,HODGKINS POLICE DEPARTMENT,,2012-03-19 21:56:00,3/13/2008 0:00,Charge(S) Approved,5/9/2012 0:00,PROMIS Conversion


## Filtering Down to Desired Defendants
For this notebook, we will limit our focus to only Black and White Americans between the ages of 18 and 39, who were arrested between 2010 and 2019 for a felony narcotics charge, and were sentenced to some time in prison. This is to allow us to compare a mostly similar group of individuals to each other.

In [4]:
sentencing_df.columns

Index(['Unnamed: 0', 'CASE_ID', 'CASE_PARTICIPANT_ID', 'RECEIVED_DATE',
       'OFFENSE_CATEGORY', 'PRIMARY_CHARGE_FLAG', 'CHARGE_ID',
       'CHARGE_VERSION_ID', 'DISPOSITION_CHARGED_OFFENSE_TITLE',
       'CHARGE_COUNT', 'DISPOSITION_DATE', 'DISPOSITION_CHARGED_CHAPTER',
       'DISPOSITION_CHARGED_ACT', 'DISPOSITION_CHARGED_SECTION',
       'DISPOSITION_CHARGED_CLASS', 'DISPOSITION_CHARGED_AOIC',
       'CHARGE_DISPOSITION', 'CHARGE_DISPOSITION_REASON', 'SENTENCE_JUDGE',
       'SENTENCE_COURT_NAME', 'SENTENCE_COURT_FACILITY', 'SENTENCE_PHASE',
       'SENTENCE_DATE', 'SENTENCE_TYPE', 'CURRENT_SENTENCE_FLAG',
       'COMMITMENT_TYPE', 'COMMITMENT_TERM', 'COMMITMENT_UNIT',
       'LENGTH_OF_CASE_in_Days', 'AGE_AT_INCIDENT', 'RACE', 'GENDER',
       'INCIDENT_CITY', 'INCIDENT_BEGIN_DATE', 'INCIDENT_END_DATE',
       'LAW_ENFORCEMENT_AGENCY', 'LAW_ENFORCEMENT_UNIT', 'ARREST_DATE',
       'FELONY_REVIEW_DATE', 'FELONY_REVIEW_RESULT', 'ARRAIGNMENT_DATE',
       'UPDATED_OFFENSE_CATEGORY'

In [5]:
# removing duplicate cases
sentencing_df = sentencing_df.drop_duplicates(subset='CASE_ID', keep=False)
sentencing_df.shape

(11776, 42)

In [6]:
# filtering dataframe to only keep cases related to narcotics 
sentencing_df = sentencing_df[sentencing_df['UPDATED_OFFENSE_CATEGORY'] == 'Narcotics']
sentencing_df.shape

(5518, 42)

In [7]:
# changing the ARREST_DATE variable to datetime from string to allow filtering by date
sentencing_df['ARREST_DATE'] = pd.to_datetime(sentencing_df['ARREST_DATE'])

# filtering the dataframe to only have 
sentencing_df = sentencing_df[sentencing_df['ARREST_DATE'].between('2014-01-01', '2019-12-31')]
sentencing_df.shape

(4006, 42)

In [8]:
# filtering dataframe to only have defendants between ages getting ages between 18 and 39
sentencing_df = sentencing_df[sentencing_df['AGE_AT_INCIDENT'].between(18, 39)]
sentencing_df.shape

(2675, 42)

In [9]:
#sentencing_df['ARREST_DATE'].dt.year

In [10]:
# main variables to be used, dropping na values 
keep_list = ['CHARGE_COUNT','AGE_AT_INCIDENT','RACE','GENDER', 'COMMITMENT_TERM',
             'COMMITMENT_UNIT','DISPOSITION_CHARGED_CLASS','SENTENCE_TYPE']

sentencing_df = sentencing_df.dropna(subset=keep_list)
sentencing_df.shape

(2662, 42)

In [11]:
# filtering dataset for only felony crimes
felony_list = ['X', '1', '2', '3', '4']
sentencing_df = sentencing_df[sentencing_df['DISPOSITION_CHARGED_CLASS'].isin(felony_list)]
sentencing_df.shape

(2334, 42)

In [12]:
# filtering for only instances where the the defendent was sentanced to prison 
sentencing_df = sentencing_df[sentencing_df['SENTENCE_TYPE'] == 'Prison']
sentencing_df.shape

(1006, 42)

In [13]:
# updating commitment term to numeric 
sentencing_df['COMMITMENT_TERM'] = sentencing_df['COMMITMENT_TERM'].astype('float')
sentencing_df.shape

(1006, 42)

In [14]:
# filtering dataframe to only include commitment units in years, months, or days 
sentencing_df = sentencing_df[
    (sentencing_df['COMMITMENT_UNIT']=='Year(s)') | 
    (sentencing_df['COMMITMENT_UNIT']=='Months') | 
    (sentencing_df['COMMITMENT_UNIT']=='Days')]

# removing 0 from commitmnet term 
sentencing_df = sentencing_df[sentencing_df['COMMITMENT_TERM'] != 0]

sentencing_df.shape

(1005, 42)

In [15]:
# filtering dataframe to remove unknown gender
sentencing_df = sentencing_df[
    (sentencing_df['GENDER']=='Female') | 
    (sentencing_df['GENDER']=='Male')]
sentencing_df.shape

(1005, 42)

## Creating New Variables
In order to aid the the contruction of the regression analysis, I will create a few new variables that capture important information about the defenants. 
1. DISPOSITION_TYPE: A binary variable indicating whether the defenant plead guilty or if a judge/jury found them guilty
2. TERM_IN_MONTHS: An update of the commitment variable where the duration of each sentence is converted into months rather than different units of years, months, and days 
3. RACE_UPDATED: An update of the RACE variable that properly classifies Black and White individuals and classifies other races as "Other"

In addition to the three mentioned above, a log transformation will be taken for all continuous variables to allow percentage comparison to be estimated. 


In [16]:
# creating variable for whether the defendant plead guily or was found guilty

DISPOSITION_TYPE = []

for index, row in sentencing_df.iterrows():
    if row['CHARGE_DISPOSITION'] == 'Plea Of Guilty':
        DISPOSITION_TYPE.append('Plead Guilty')
    elif row['CHARGE_DISPOSITION'] == 'Finding Guilty' or 'Verdict Guilty':
        DISPOSITION_TYPE.append('Found Guilty')
    else:
        DISPOSITION_TYPE.append('Other')


sentencing_df['DISPOSITION_TYPE'] = DISPOSITION_TYPE  

# filtering for only guilty verdicts
sentencing_df = sentencing_df[sentencing_df['DISPOSITION_TYPE'] != 'Other']
sentencing_df.shape

(1005, 43)

In [17]:
# creating a new column to standardize sentence duration to around months
TERM_IN_MONTHS = []

for index, row in sentencing_df.iterrows():
    if row['COMMITMENT_UNIT'] == 'Year(s)':
        TERM_IN_MONTHS.append(row['COMMITMENT_TERM']/12)
        
    elif row['COMMITMENT_UNIT'] == 'Months':
        TERM_IN_MONTHS.append(row['COMMITMENT_TERM']/1)
        
    elif row['COMMITMENT_UNIT'] == 'Days':
        TERM_IN_MONTHS.append(row['COMMITMENT_TERM']/30)
        
sentencing_df['TERM_IN_MONTHS'] = np.round(TERM_IN_MONTHS, 3) 

In [18]:
# updating race to remove unknown and classifying race - black, white

RACE_UPDATED = []

for index, row in sentencing_df.iterrows():
    if 'Black' in row['RACE']:
        RACE_UPDATED.append('Black')
    elif row['RACE'] == 'White':
        RACE_UPDATED.append('White')
    else:
        RACE_UPDATED.append('Other')

sentencing_df['RACE_UPDATED'] = RACE_UPDATED


# filtering dataframe to only include Black and White individuals 
sentencing_df = sentencing_df[sentencing_df['RACE_UPDATED'] != 'Other']
sentencing_df.shape

(925, 45)

In [19]:
# filtering to only usable features 
feature_list = ['CHARGE_COUNT','AGE_AT_INCIDENT','RACE_UPDATED','GENDER',
                'DISPOSITION_CHARGED_CLASS','TERM_IN_MONTHS','DISPOSITION_TYPE']

regression_df = sentencing_df[feature_list].copy(deep=True)

In [20]:
# verifiying that all null values have been eliminated 
regression_df.isnull().sum()

CHARGE_COUNT                 0
AGE_AT_INCIDENT              0
RACE_UPDATED                 0
GENDER                       0
DISPOSITION_CHARGED_CLASS    0
TERM_IN_MONTHS               0
DISPOSITION_TYPE             0
dtype: int64

In [21]:
# leaving the out of 
regression_df.describe()

Unnamed: 0,CHARGE_COUNT,AGE_AT_INCIDENT,TERM_IN_MONTHS
count,925.0,925.0,925.0
mean,1.330811,27.968649,3.736366
std,0.781809,5.557851,9.418103
min,1.0,18.0,0.067
25%,1.0,23.0,0.083
50%,1.0,27.0,0.167
75%,1.0,32.0,0.333
max,10.0,39.0,90.0


In [22]:
# log transformation for all continuous variables 

regression_df['CHARGE_COUNT_LOG'] = np.log(regression_df['CHARGE_COUNT'])
regression_df['AGE_AT_INCIDENT_LOG'] = np.log(regression_df['AGE_AT_INCIDENT'])
regression_df['TERM_IN_MONTHS_LOG'] = np.log(regression_df['TERM_IN_MONTHS'])

In [23]:
regression_df.describe()

Unnamed: 0,CHARGE_COUNT,AGE_AT_INCIDENT,TERM_IN_MONTHS,CHARGE_COUNT_LOG,AGE_AT_INCIDENT_LOG,TERM_IN_MONTHS_LOG
count,925.0,925.0,925.0,925.0,925.0,925.0
mean,1.330811,27.968649,3.736366,0.191361,3.311273,-1.158382
std,0.781809,5.557851,9.418103,0.38198,0.199732,1.891531
min,1.0,18.0,0.067,0.0,2.890372,-2.703063
25%,1.0,23.0,0.083,0.0,3.135494,-2.488915
50%,1.0,27.0,0.167,0.0,3.295837,-1.789761
75%,1.0,32.0,0.333,0.0,3.465736,-1.099613
max,10.0,39.0,90.0,2.302585,3.663562,4.49981


In [24]:
# getting binary numeric form for all categorical variables 
regression_df = pd.get_dummies(regression_df)
regression_df.head(3)

Unnamed: 0,CHARGE_COUNT,AGE_AT_INCIDENT,TERM_IN_MONTHS,CHARGE_COUNT_LOG,AGE_AT_INCIDENT_LOG,TERM_IN_MONTHS_LOG,RACE_UPDATED_Black,RACE_UPDATED_White,GENDER_Female,GENDER_Male,DISPOSITION_CHARGED_CLASS_1,DISPOSITION_CHARGED_CLASS_2,DISPOSITION_CHARGED_CLASS_3,DISPOSITION_CHARGED_CLASS_4,DISPOSITION_CHARGED_CLASS_X,DISPOSITION_TYPE_Found Guilty,DISPOSITION_TYPE_Plead Guilty
62675,3,25.0,0.417,1.098612,3.218876,-0.874669,1,0,0,1,0,1,0,0,0,0,1
62694,2,39.0,0.083,0.693147,3.663562,-2.488915,1,0,0,1,0,0,0,1,0,0,1
62925,2,30.0,0.333,0.693147,3.401197,-1.099613,1,0,0,1,1,0,0,0,0,0,1


In [25]:
# definning the two dependent variables to be used 
y = regression_df['TERM_IN_MONTHS']
y_log = regression_df['TERM_IN_MONTHS_LOG']

In [26]:
#defining x for a simple regression on race and sentence
x_1 = regression_df[['RACE_UPDATED_Black','CHARGE_COUNT', 'AGE_AT_INCIDENT','GENDER_Female','DISPOSITION_TYPE_Plead Guilty']]
x_1 = sm.add_constant(x_1)

In [27]:
model_1 = sm.OLS(y, x_1).fit()
print(model_1.summary())

                            OLS Regression Results                            
Dep. Variable:         TERM_IN_MONTHS   R-squared:                       0.009
Model:                            OLS   Adj. R-squared:                  0.003
Method:                 Least Squares   F-statistic:                     1.624
Date:                Sat, 08 Oct 2022   Prob (F-statistic):              0.151
Time:                        14:57:23   Log-Likelihood:                -3382.4
No. Observations:                 925   AIC:                             6777.
Df Residuals:                     919   BIC:                             6806.
Df Model:                           5                                         
Covariance Type:            nonrobust                                         
                                    coef    std err          t      P>|t|      [0.025      0.975]
-------------------------------------------------------------------------------------------------
const         

In [28]:
regression_df.to_csv('test.csv')