# New Orleans Police Department (NOPD) misconduct complaints 

 * **Topic:** Investigating bias in NOPD misconduct complaints using a multivariable **logistic regression**
 * **Data:** https://catalog.data.gov/dataset/nopd-misconduct-complaints
 * **Purpose:** Determine whether the department’s disciplinary body, the Public Integrity Bureau (PIB), is biased against officers from racial and ethnic minority groups in its investigation process. In other words, is the NOPD’s PIB racist?
 

We import pandas for data handling, statsmodels for the regression, and numpy for computing odds ratios.

In [1]:
import pandas as pd
import statsmodels.formula.api as smf
import numpy as np

pd.set_option("display.max_columns", 100)
pd.set_option("display.max_colwidth", 100)



In [2]:
df = pd.read_csv('NOPD_Misconduct_Complaints.csv')
df.head()

Unnamed: 0,Incident Type,Complaint Tracking Number,Date Complaint Occurred,Date Complaint Received by NOPD (PIB),Date Complaint Investigation Complete,Complaint classification,Investigation status,Disposition,Bureau of Complainant,Division of Complainant,Unit of Complainant,Unit Additional Details of Complainant,Working Status of Complainant,Shift of Complainant,Rule Violation,Paragraph Violation,Unique Officer Allegation ID,Officer Race Ethnicity,Officer Gender,Officer Age,Officer years of service,Complainant Gender,Complainant Ethnicity,Complainant Age
0,Public Initiated,2016-0001-P,2016-01-01,2016-01-01,2016-07-21,DI-1,Completed,Unfounded,,8th District,,,,,RULE 3: PROF CONDUCT,PARAGRAPH 01 - Professionalism,30664.0,,,,,Male,Black,
1,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30667.0,Black,Male,60.0,,Female,White,
2,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30669.0,Black,Male,44.0,,Female,White,
3,Public Initiated,2016-0009-P,2016-01-04,2016-01-04,2017-03-20,DI-1,Completed,Unfounded,FOB - Field Operations Bureau,8th District,8th District,Patrol,Regular Working,Between 3pm-11pm,RULE 2: MORAL CONDUCT,PARAGRAPH 01 - ADHERENCE TO LAW,30671.0,White,Male,,,,,
4,Public Initiated,2016-0006-P,2016-12-30,2016-01-04,2016-07-25,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,Command Staff,Admin,,,,RULE 4: PERF OF DUTY,PARAGRAPH 02 - INSTRUCTIONS FROM AUTHORITATIVE SOURCE,30674.0,Black,Male,54.0,,Female,Black,50.0


# Cleaning the features

**Converting inconsistent data into standard categories or null values, and changing the data type where necessary:**
* Date Complaint Investigation Complete
* Officer Race Ethnicity
* Officer Age
* Officer Gender
* Incident Type

In [3]:
# Dates in the file are ISO-formatted (YYYY-MM-DD), so the format string uses dashes
df['Year_Complete'] = pd.to_datetime(df['Date Complaint Investigation Complete'], format='%Y-%m-%d')
df.head()

Unnamed: 0,Incident Type,Complaint Tracking Number,Date Complaint Occurred,Date Complaint Received by NOPD (PIB),Date Complaint Investigation Complete,Complaint classification,Investigation status,Disposition,Bureau of Complainant,Division of Complainant,Unit of Complainant,Unit Additional Details of Complainant,Working Status of Complainant,Shift of Complainant,Rule Violation,Paragraph Violation,Unique Officer Allegation ID,Officer Race Ethnicity,Officer Gender,Officer Age,Officer years of service,Complainant Gender,Complainant Ethnicity,Complainant Age,Year_Complete
0,Public Initiated,2016-0001-P,2016-01-01,2016-01-01,2016-07-21,DI-1,Completed,Unfounded,,8th District,,,,,RULE 3: PROF CONDUCT,PARAGRAPH 01 - Professionalism,30664.0,,,,,Male,Black,,2016-07-21
1,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30667.0,Black,Male,60.0,,Female,White,,2016-08-03
2,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30669.0,Black,Male,44.0,,Female,White,,2016-08-03
3,Public Initiated,2016-0009-P,2016-01-04,2016-01-04,2017-03-20,DI-1,Completed,Unfounded,FOB - Field Operations Bureau,8th District,8th District,Patrol,Regular Working,Between 3pm-11pm,RULE 2: MORAL CONDUCT,PARAGRAPH 01 - ADHERENCE TO LAW,30671.0,White,Male,,,,,,2017-03-20
4,Public Initiated,2016-0006-P,2016-12-30,2016-01-04,2016-07-25,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,Command Staff,Admin,,,,RULE 4: PERF OF DUTY,PARAGRAPH 02 - INSTRUCTIONS FROM AUTHORITATIVE SOURCE,30674.0,Black,Male,54.0,,Female,Black,50.0,2016-07-25


In [4]:
df['Year_Complete'] = df['Year_Complete'].dt.year
df.head()

Unnamed: 0,Incident Type,Complaint Tracking Number,Date Complaint Occurred,Date Complaint Received by NOPD (PIB),Date Complaint Investigation Complete,Complaint classification,Investigation status,Disposition,Bureau of Complainant,Division of Complainant,Unit of Complainant,Unit Additional Details of Complainant,Working Status of Complainant,Shift of Complainant,Rule Violation,Paragraph Violation,Unique Officer Allegation ID,Officer Race Ethnicity,Officer Gender,Officer Age,Officer years of service,Complainant Gender,Complainant Ethnicity,Complainant Age,Year_Complete
0,Public Initiated,2016-0001-P,2016-01-01,2016-01-01,2016-07-21,DI-1,Completed,Unfounded,,8th District,,,,,RULE 3: PROF CONDUCT,PARAGRAPH 01 - Professionalism,30664.0,,,,,Male,Black,,2016.0
1,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30667.0,Black,Male,60.0,,Female,White,,2016.0
2,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30669.0,Black,Male,44.0,,Female,White,,2016.0
3,Public Initiated,2016-0009-P,2016-01-04,2016-01-04,2017-03-20,DI-1,Completed,Unfounded,FOB - Field Operations Bureau,8th District,8th District,Patrol,Regular Working,Between 3pm-11pm,RULE 2: MORAL CONDUCT,PARAGRAPH 01 - ADHERENCE TO LAW,30671.0,White,Male,,,,,,2017.0
4,Public Initiated,2016-0006-P,2016-12-30,2016-01-04,2016-07-25,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,Command Staff,Admin,,,,RULE 4: PERF OF DUTY,PARAGRAPH 02 - INSTRUCTIONS FROM AUTHORITATIVE SOURCE,30674.0,Black,Male,54.0,,Female,Black,50.0,2016.0
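The years above display as floats (`2016.0`), which is how pandas represents integer-valued columns that also contain missing entries. A minimal sketch, using a made-up three-row series rather than the real file, of parsing ISO dates and keeping the year as a nullable integer instead:

```python
import pandas as pd

# Hypothetical sample: two ISO-formatted dates and one missing value.
dates = pd.Series(["2016-07-21", "2017-03-20", None])

# errors="coerce" turns unparseable or missing entries into NaT.
parsed = pd.to_datetime(dates, format="%Y-%m-%d", errors="coerce")

# .dt.year yields floats when NaT is present; "Int64" keeps integers plus <NA>.
year_float = parsed.dt.year
year_int = parsed.dt.year.astype("Int64")

print(year_float.tolist())
print(year_int.tolist())
```

Either representation works for this analysis, since the year is not used as a model feature.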


In [5]:
df['Officer Race Ethnicity'].value_counts()

Black                             2595
White                             1817
Hispanic                           238
Asian/Pacifi                        69
Not Specifie                        21
Race-Unknown                        16
American Ind                         9
Asian/Pacif                          8
 Giving Anything of Value            6
PARAGRAPH 01 - Professionalism       2
Name: Officer Race Ethnicity, dtype: int64

In [6]:
df['Officer_Race_Ethnicity'] = df['Officer Race Ethnicity'].replace({
    'Asian/Pacifi':'Asian',
    'Not Specifie' : np.nan,
    'Race-Unknown' : np.nan,
    'American Ind':'Indigenous',
    'Asian/Pacif':'Asian',
    ' Giving Anything of Value':np.nan,
    'PARAGRAPH 01 - Professionalism': np.nan,
})
df.Officer_Race_Ethnicity.value_counts()

Black         2595
White         1817
Hispanic       238
Asian           77
Indigenous       9
Name: Officer_Race_Ethnicity, dtype: int64
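An alternative to listing every bad value by hand is to rename the truncated labels and then keep only an allow-list of known categories. The sketch below uses a small hypothetical series that mimics the raw values; the `valid` list is an assumption based on the categories kept above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the raw values shown above.
raw = pd.Series(["Black", "White", "Asian/Pacifi", "Asian/Pacif",
                 "American Ind", "Race-Unknown", " Giving Anything of Value"])

# Map truncated labels to standard names; other values pass through unchanged.
renamed = raw.replace({
    "Asian/Pacifi": "Asian",
    "Asian/Pacif": "Asian",
    "American Ind": "Indigenous",
})

# Anything outside the allow-list becomes NaN, including stray values
# that leaked in from neighbouring columns.
valid = ["Black", "White", "Hispanic", "Asian", "Indigenous"]
cleaned = renamed.where(renamed.isin(valid), np.nan)

print(cleaned.value_counts(dropna=False))
```

The allow-list approach also catches unexpected values that appear in a later refresh of the dataset.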

In [7]:
df['Officer Age'].value_counts()

32    205
34    169
33    169
40    160
41    158
     ... 
65      1
84      1
72      1
68      1
83      1
Name: Officer Age, Length: 64, dtype: int64

In [8]:
df['Officer_Age'] = df['Officer Age'].replace({
    'Male': np.nan,
    'Female' : np.nan,
    '-38' : np.nan,
    '-8': np.nan
})
df.Officer_Age.value_counts()

32.0     205
34.0     169
33.0     169
40.0     160
41.0     158
28.0     156
29.0     154
36.0     150
39.0     148
38.0     145
35.0     139
42.0     138
31.0     138
30.0     138
44.0     130
37.0     126
27.0     124
47.0     121
43.0     121
54.0     120
45.0     111
46.0     108
51.0     107
26.0     106
53.0     106
49.0      99
48.0      95
52.0      89
25.0      87
50.0      82
55.0      70
24.0      56
23.0      53
57.0      51
56.0      51
58.0      46
60.0      28
59.0      26
61.0      25
62.0      20
22.0      16
63.0      10
64.0       5
69.0       4
67.0       3
21.0       3
66.0       2
65.0       1
84.0       1
106.0      1
73.0       1
75.0       1
86.0       1
72.0       1
71.0       1
105.0      1
83.0       1
68.0       1
80.0       1
109.0      1
Name: Officer_Age, dtype: int64

In [9]:
df['Officer_Age'] = df.Officer_Age.astype(float)
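The same cleaning can be sketched with `pd.to_numeric`, which coerces any non-numeric entry to NaN without naming each one. The plausibility window below (18 to 90) is an illustrative assumption and is not applied anywhere in this analysis; note that the cleaned column above still contains ages such as 105, 106 and 109.

```python
import pandas as pd

# Hypothetical sample: valid ages mixed with stray text and negative numbers.
raw_age = pd.Series(["32", "Male", "-38", "54", "109", None])

# Non-numeric entries become NaN instead of raising an error.
age = pd.to_numeric(raw_age, errors="coerce")

# Assumed plausibility window (18 to 90); values outside it become NaN.
age_checked = age.where(age.between(18, 90))

print(age.tolist())
print(age_checked.tolist())
```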

In [10]:
df['Officer Gender'].value_counts()

Male      3779
Female     996
N            8
Black        4
White        3
Name: Officer Gender, dtype: int64

In [11]:
df['Officer_Gender'] = df['Officer Gender'].replace({
    'N': np.nan,
    'Black' : np.nan,
    'White' : np.nan,
})
df.Officer_Gender.value_counts()

Male      3779
Female     996
Name: Officer_Gender, dtype: int64

In [12]:
df['Incident_Type'] = df['Incident Type']
df.Incident_Type.value_counts()

Public Initiated    3460
Rank Initiated      1830
Name: Incident_Type, dtype: int64

## Creating the "Minority" column
To generate a simpler model, we created a new column called `minority`, which categorizes each officer as either white (“W”) or a member of a racial or ethnic minority group (“M”).

In [13]:
df['minority'] = df['Officer_Race_Ethnicity'].replace({
    'Black':'M',
    'White':'W',
    'Hispanic':'M',
    'Asian':'M',
    'Indigenous':'M'
})
df.minority.value_counts()

M    2919
W    1817
Name: minority, dtype: int64
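One detail worth knowing when recoding categories: `replace` leaves any label that is not in the dictionary untouched, while `map` turns it into NaN. A short sketch with a hypothetical series (the "Unknown" label is invented for illustration):

```python
import pandas as pd

# Hypothetical sample including a missing value and an unexpected label.
race = pd.Series(["Black", "White", "Hispanic", None, "Unknown"])

groups = {"Black": "M", "Hispanic": "M", "Asian": "M",
          "Indigenous": "M", "White": "W"}

# replace() leaves unmapped labels untouched ...
via_replace = race.replace(groups)
# ... while map() converts anything not in the dict to NaN.
via_map = race.map(groups)

print(via_replace.tolist())
print(via_map.tolist())
```

Because the race column was already cleaned to five categories, both approaches give the same result here.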

## Categorizing each disposition as either “Sustained” or “Other”
The “Disposition” column in the dataset refers to the outcome of the complaint investigation, and we reviewed the NOPD’s Operations Manual to understand the meaning of each possible result. Allegations receive a “Sustained” disposition when “the investigation determines by a preponderance of the evidence that the alleged misconduct did occur.”

**Source:** https://www.nola.gov/getattachment/NOPD/Policies/Chapter-52-1-1-Misconduct-Intake-and-Complaint-Investigation-EFFECTIVE-3-18-18.pdf/ 
* Eliminated the rows that were classified as pending
* Created a new column categorizing each disposition as either “Sustained” or “Other.” 

In [14]:
df.Disposition.value_counts()

Unfounded                       1121
Pending                         1076
Sustained                        801
Not Sustained                    582
Other                            540
Exonerated                       526
NFIM                             359
Withdrawn - Mediation            177
Negotiated Settlement            106
Resigned under investigation       2
Name: Disposition, dtype: int64

In [15]:
# .copy() makes df2 an independent frame so later column assignments are safe
df2 = df[df['Disposition'] != 'Pending'].copy()
df2.head()

Unnamed: 0,Incident Type,Complaint Tracking Number,Date Complaint Occurred,Date Complaint Received by NOPD (PIB),Date Complaint Investigation Complete,Complaint classification,Investigation status,Disposition,Bureau of Complainant,Division of Complainant,Unit of Complainant,Unit Additional Details of Complainant,Working Status of Complainant,Shift of Complainant,Rule Violation,Paragraph Violation,Unique Officer Allegation ID,Officer Race Ethnicity,Officer Gender,Officer Age,Officer years of service,Complainant Gender,Complainant Ethnicity,Complainant Age,Year_Complete,Officer_Race_Ethnicity,Officer_Age,Officer_Gender,Incident_Type,minority
0,Public Initiated,2016-0001-P,2016-01-01,2016-01-01,2016-07-21,DI-1,Completed,Unfounded,,8th District,,,,,RULE 3: PROF CONDUCT,PARAGRAPH 01 - Professionalism,30664.0,,,,,Male,Black,,2016.0,,,,Public Initiated,
1,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30667.0,Black,Male,60.0,,Female,White,,2016.0,Black,60.0,Male,Public Initiated,M
2,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30669.0,Black,Male,44.0,,Female,White,,2016.0,Black,44.0,Male,Public Initiated,M
3,Public Initiated,2016-0009-P,2016-01-04,2016-01-04,2017-03-20,DI-1,Completed,Unfounded,FOB - Field Operations Bureau,8th District,8th District,Patrol,Regular Working,Between 3pm-11pm,RULE 2: MORAL CONDUCT,PARAGRAPH 01 - ADHERENCE TO LAW,30671.0,White,Male,,,,,,2017.0,White,,Male,Public Initiated,W
4,Public Initiated,2016-0006-P,2016-12-30,2016-01-04,2016-07-25,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,Command Staff,Admin,,,,RULE 4: PERF OF DUTY,PARAGRAPH 02 - INSTRUCTIONS FROM AUTHORITATIVE SOURCE,30674.0,Black,Male,54.0,,Female,Black,50.0,2016.0,Black,54.0,Male,Public Initiated,M


In [16]:
# Collapse every non-sustained outcome into a single "Other" category
df2['Disposition_new'] = df2.Disposition.replace({
    'Unfounded': 'Other',
    'Other': 'Other',
    'Exonerated': 'Other',
    'NFIM': 'Other',
    'Not Sustained': 'Other',
    'Withdrawn - Mediation': 'Other',
    'Resigned under investigation': 'Other',
    'Negotiated Settlement': 'Other'
})
df2.Disposition_new.value_counts()


Other        3413
Sustained     801
Name: Disposition_new, dtype: int64
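The filter-then-recode step can also be sketched in a compact form. The example below uses a hypothetical five-row frame; taking an explicit `.copy()` after filtering avoids pandas' `SettingWithCopyWarning`, and a boolean comparison produces the 0/1 indicator directly.

```python
import pandas as pd

# Hypothetical sample of dispositions.
df = pd.DataFrame({"Disposition": ["Sustained", "Pending", "Unfounded",
                                   "Exonerated", "Sustained"]})

# .copy() gives an independent frame, so later column assignments do not
# trigger SettingWithCopyWarning.
closed = df[df["Disposition"] != "Pending"].copy()

# Boolean comparison collapses every non-sustained outcome into 0.
closed["Sustained"] = (closed["Disposition"] == "Sustained").astype(int)

print(closed["Sustained"].value_counts())
```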

## Creating age groups
* Created a new column with officer ages divided into bins to compare the effects of different age groups on the investigation’s outcome

In [17]:
labels = [
    'under 25',
    '25-38',
    '39-54',
    '55-69',
    'over 70'
]
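breaks = [0, 25, 39, 55, 70, 999]
# Note: pd.cut closes bins on the right by default, so the intervals are
# (0, 25], (25, 39], (39, 55], (55, 70], (70, 999]. An officer aged exactly
# 25, 39, 55 or 70 therefore falls in the lower bin despite its label.
df2['Officer_Age_bin'] = pd.cut(df2.Officer_Age, bins=breaks, labels=labels)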
df2.head()


Unnamed: 0,Incident Type,Complaint Tracking Number,Date Complaint Occurred,Date Complaint Received by NOPD (PIB),Date Complaint Investigation Complete,Complaint classification,Investigation status,Disposition,Bureau of Complainant,Division of Complainant,Unit of Complainant,Unit Additional Details of Complainant,Working Status of Complainant,Shift of Complainant,Rule Violation,Paragraph Violation,Unique Officer Allegation ID,Officer Race Ethnicity,Officer Gender,Officer Age,Officer years of service,Complainant Gender,Complainant Ethnicity,Complainant Age,Year_Complete,Officer_Race_Ethnicity,Officer_Age,Officer_Gender,Incident_Type,minority,Disposition_new,Officer_Age_bin
0,Public Initiated,2016-0001-P,2016-01-01,2016-01-01,2016-07-21,DI-1,Completed,Unfounded,,8th District,,,,,RULE 3: PROF CONDUCT,PARAGRAPH 01 - Professionalism,30664.0,,,,,Male,Black,,2016.0,,,,Public Initiated,,Other,
1,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30667.0,Black,Male,60.0,,Female,White,,2016.0,Black,60.0,Male,Public Initiated,M,Other,55-69
2,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30669.0,Black,Male,44.0,,Female,White,,2016.0,Black,44.0,Male,Public Initiated,M,Other,39-54
3,Public Initiated,2016-0009-P,2016-01-04,2016-01-04,2017-03-20,DI-1,Completed,Unfounded,FOB - Field Operations Bureau,8th District,8th District,Patrol,Regular Working,Between 3pm-11pm,RULE 2: MORAL CONDUCT,PARAGRAPH 01 - ADHERENCE TO LAW,30671.0,White,Male,,,,,,2017.0,White,,Male,Public Initiated,W,Other,
4,Public Initiated,2016-0006-P,2016-12-30,2016-01-04,2016-07-25,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,Command Staff,Admin,,,,RULE 4: PERF OF DUTY,PARAGRAPH 02 - INSTRUCTIONS FROM AUTHORITATIVE SOURCE,30674.0,Black,Male,54.0,,Female,Black,50.0,2016.0,Black,54.0,Male,Public Initiated,M,Other,39-54


In [18]:
df2.Officer_Age_bin.value_counts()

25-38       1659
39-54       1450
55-69        227
under 25     179
over 70        8
Name: Officer_Age_bin, dtype: int64
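How `pd.cut` treats values that sit exactly on a bin edge depends on its `right` argument. The sketch below, using hypothetical ages on the edges, compares the default (`right=True`) with `right=False`; with the latter, the last label effectively means "70 and over".

```python
import pandas as pd

# Hypothetical ages sitting exactly on or next to the bin edges.
ages = pd.Series([24, 25, 38, 39, 54, 55, 69, 70])
labels = ["under 25", "25-38", "39-54", "55-69", "over 70"]
breaks = [0, 25, 39, 55, 70, 999]

# Default right=True: intervals are (0, 25], (25, 39], ...
right_closed = pd.cut(ages, bins=breaks, labels=labels)
# right=False: intervals are [0, 25), [25, 39), ...
left_closed = pd.cut(ages, bins=breaks, labels=labels, right=False)

print(pd.DataFrame({"age": ages, "right=True": right_closed,
                    "right=False": left_closed}))
```

Switching to `right=False` in the analysis would shift some officers between bins, so the counts and regression output shown in this notebook correspond to the default setting.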

# Logistic Regression

We use the statsmodels module to build our logistic regression, which predicts whether a complaint is “Sustained” based on:
* Minority
* Age
* Gender 
* Incident Type 

For each feature, we set a reference category: officers from minority groups are compared to white officers, female officers to male officers, each age group to the 25-38 group, and rank-initiated complaints to “Public Initiated” complaints.
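Written out, the model described here takes roughly the following form. The symbols $\beta_j$ and $\gamma_k$ and the indicator notation are introduced for illustration and do not appear in the statsmodels output; $k$ ranges over the age bins other than the 25-38 reference.

```latex
\operatorname{logit}\bigl(P(\text{Sustained}=1)\bigr)
  = \beta_0
  + \beta_1\,\mathbb{1}[\text{minority}=\text{M}]
  + \beta_2\,\mathbb{1}[\text{gender}=\text{Female}]
  + \beta_3\,\mathbb{1}[\text{incident}=\text{Rank Initiated}]
  + \sum_{k} \gamma_k\,\mathbb{1}[\text{age bin}=k]
```

Each coefficient is a difference in log-odds relative to the reference category, so $e^{\beta_j}$ is the corresponding odds ratio.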

In [19]:
df2['Sustained'] = df2.Disposition_new.replace({'Sustained': 1, 'Other': 0})
df2.head()


Unnamed: 0,Incident Type,Complaint Tracking Number,Date Complaint Occurred,Date Complaint Received by NOPD (PIB),Date Complaint Investigation Complete,Complaint classification,Investigation status,Disposition,Bureau of Complainant,Division of Complainant,Unit of Complainant,Unit Additional Details of Complainant,Working Status of Complainant,Shift of Complainant,Rule Violation,Paragraph Violation,Unique Officer Allegation ID,Officer Race Ethnicity,Officer Gender,Officer Age,Officer years of service,Complainant Gender,Complainant Ethnicity,Complainant Age,Year_Complete,Officer_Race_Ethnicity,Officer_Age,Officer_Gender,Incident_Type,minority,Disposition_new,Officer_Age_bin,Sustained
0,Public Initiated,2016-0001-P,2016-01-01,2016-01-01,2016-07-21,DI-1,Completed,Unfounded,,8th District,,,,,RULE 3: PROF CONDUCT,PARAGRAPH 01 - Professionalism,30664.0,,,,,Male,Black,,2016.0,,,,Public Initiated,,Other,,0
1,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30667.0,Black,Male,60.0,,Female,White,,2016.0,Black,60.0,Male,Public Initiated,M,Other,55-69,0
2,Public Initiated,2016-0002-P,2016-01-02,2016-01-01,2016-08-03,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,7th District,Night Watch,Patrol,Regular Working,Between 11pm-7am,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,30669.0,Black,Male,44.0,,Female,White,,2016.0,Black,44.0,Male,Public Initiated,M,Other,39-54,0
3,Public Initiated,2016-0009-P,2016-01-04,2016-01-04,2017-03-20,DI-1,Completed,Unfounded,FOB - Field Operations Bureau,8th District,8th District,Patrol,Regular Working,Between 3pm-11pm,RULE 2: MORAL CONDUCT,PARAGRAPH 01 - ADHERENCE TO LAW,30671.0,White,Male,,,,,,2017.0,White,,Male,Public Initiated,W,Other,,0
4,Public Initiated,2016-0006-P,2016-12-30,2016-01-04,2016-07-25,DI-1,Completed,Exonerated,FOB - Field Operations Bureau,Command Staff,Admin,,,,RULE 4: PERF OF DUTY,PARAGRAPH 02 - INSTRUCTIONS FROM AUTHORITATIVE SOURCE,30674.0,Black,Male,54.0,,Female,Black,50.0,2016.0,Black,54.0,Male,Public Initiated,M,Other,39-54,0


In [20]:
df2.Sustained.value_counts()

0    3413
1     801
Name: Sustained, dtype: int64

In [21]:
model = smf.logit("""
    Sustained ~
        C(minority, Treatment('W'))
        + C(Officer_Gender, Treatment('Male'))
        + C(Incident_Type, Treatment('Public Initiated'))
        + C(Officer_Age_bin, Treatment('25-38'))
""", data=df2)

results = model.fit()
results.summary()



         Current function value: 0.472642
         Iterations: 35


Dep. Variable:,Sustained,No. Observations:,3498.0
Model:,Logit,Df Residuals:,3490.0
Method:,MLE,Df Model:,7.0
Date:,"Wed, 07 Apr 2021",Pseudo R-squ.:,0.07999
Time:,19:27:28,Log-Likelihood:,-1653.3
converged:,False,LL-Null:,-1797.1
Covariance Type:,nonrobust,LLR p-value:,2.824e-58

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-1.8492,0.092,-20.118,0.000,-2.029,-1.669
"C(minority, Treatment('W'))[T.M]",0.0128,0.092,0.139,0.890,-0.168,0.194
"C(Officer_Gender, Treatment('Male'))[T.Female]",-0.2332,0.112,-2.085,0.037,-0.452,-0.014
"C(Incident_Type, Treatment('Public Initiated'))[T.Rank Initiated]",1.4482,0.087,16.622,0.000,1.277,1.619
"C(Officer_Age_bin, Treatment('25-38'))[T.under 25]",0.0146,0.201,0.073,0.942,-0.379,0.408
"C(Officer_Age_bin, Treatment('25-38'))[T.39-54]",-0.0650,0.094,-0.693,0.488,-0.249,0.119
"C(Officer_Age_bin, Treatment('25-38'))[T.55-69]",0.0484,0.180,0.269,0.788,-0.304,0.400
"C(Officer_Age_bin, Treatment('25-38'))[T.over 70]",-17.6936,8762.720,-0.002,0.998,-1.72e+04,1.72e+04

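The summary above reports `converged: False`, and the “over 70” coefficient (-17.69 with a standard error of 8762.72) shows a pattern that often indicates a category in which the outcome never varies. A sketch with synthetic data (not the NOPD file) of how a crosstab can expose such a cell before fitting:

```python
import pandas as pd

# Synthetic data: the small "over 70" group has no sustained complaints.
toy = pd.DataFrame({
    "age_bin":   ["25-38"] * 6 + ["39-54"] * 6 + ["over 70"] * 3,
    "sustained": [1, 0, 0, 1, 0, 0,  0, 1, 0, 0, 1, 0,  0, 0, 0],
})

# Count outcomes per category; a zero cell means the fitted log-odds for
# that category is pushed toward +/- infinity.
table = pd.crosstab(toy["age_bin"], toy["sustained"])
print(table)

# Flag categories where either outcome never occurs.
separated = table.index[(table == 0).any(axis=1)].tolist()
print(separated)
```

Running the equivalent `pd.crosstab(df2.Officer_Age_bin, df2.Sustained)` on the real data would show whether this is what happened to the eight officers in the “over 70” bin.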

In [22]:
coefs = pd.DataFrame({
    'coef': results.params.values,
    'odds ratio': np.exp(results.params.values),
    'pvalue': results.pvalues,
    'name': results.params.index
})
coefs

Unnamed: 0,coef,odds ratio,pvalue,name
Intercept,-1.849183,0.1573657,5.177685999999999e-90,Intercept
"C(minority, Treatment('W'))[T.M]",0.012795,1.012877,0.8897253,"C(minority, Treatment('W'))[T.M]"
"C(Officer_Gender, Treatment('Male'))[T.Female]",-0.233205,0.7919913,0.03703136,"C(Officer_Gender, Treatment('Male'))[T.Female]"
"C(Incident_Type, Treatment('Public Initiated'))[T.Rank Initiated]",1.448157,4.255267,4.856171000000001e-62,"C(Incident_Type, Treatment('Public Initiated'))[T.Rank Initiated]"
"C(Officer_Age_bin, Treatment('25-38'))[T.under 25]",0.014646,1.014754,0.9418631,"C(Officer_Age_bin, Treatment('25-38'))[T.under 25]"
"C(Officer_Age_bin, Treatment('25-38'))[T.39-54]",-0.064998,0.9370691,0.4884349,"C(Officer_Age_bin, Treatment('25-38'))[T.39-54]"
"C(Officer_Age_bin, Treatment('25-38'))[T.55-69]",0.048351,1.049539,0.7877151,"C(Officer_Age_bin, Treatment('25-38'))[T.55-69]"
"C(Officer_Age_bin, Treatment('25-38'))[T.over 70]",-17.693635,2.068959e-08,0.9983889,"C(Officer_Age_bin, Treatment('25-38'))[T.over 70]"

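As a worked check of the odds-ratio column, the sketch below exponentiates the “Rank Initiated” coefficient and its 95% confidence bounds, with the numbers copied from the summary table above.

```python
import numpy as np

# Coefficient and 95% confidence bounds for "Rank Initiated",
# copied from the summary table above.
coef, lower, upper = 1.4482, 1.277, 1.619

# Exponentiating moves from the log-odds scale to the odds-ratio scale.
odds_ratio = np.exp(coef)
ci = (np.exp(lower), np.exp(upper))

print(round(odds_ratio, 2), tuple(round(b, 2) for b in ci))
```

On the fitted results object, `np.exp(results.conf_int())` gives the same interval for every term at once.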

# Testing the logistic regression without the NaNs

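Since missing values may affect the result of a regression, we dropped the columns we did not need and then eliminated all rows with missing values to create a new dataframe, `new_df`. We then re-ran the regression without the age bins. Note that the model cell below passes `data=df2` rather than `new_df`; the statsmodels formula interface drops rows with missing values in the model variables by default, which is why the number of observations rises from 3,498 to 3,817 once age is removed. Both the odds ratios and p-values remained mostly unchanged.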

In [23]:
new_df = df2.drop(columns=[
    'Incident Type', 'Date Complaint Received by NOPD (PIB)', 'Complaint classification',
    'Bureau of Complainant', 'Division of Complainant', 'Unit of Complainant', 'Date Complaint Occurred',
    'Unit Additional Details of Complainant', 'Working Status of Complainant', 'Shift of Complainant',
    'Unique Officer Allegation ID', 'Officer Race Ethnicity', 'Officer Age', 'Officer years of service',
    'Officer Gender', 'Complainant Gender', 'Complainant Ethnicity', 'Complainant Age',
])

In [24]:
new_df = new_df.dropna()
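An alternative to dropping unused columns first is to tell `dropna` which columns matter. The sketch below uses a hypothetical four-row frame; `Complainant_Age` stands in for any column the model does not use.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "Complainant_Age" is not used by the model.
toy = pd.DataFrame({
    "Sustained":       [0, 1, 0, 1],
    "minority":        ["M", "W", np.nan, "M"],
    "Officer_Gender":  ["Male", "Female", "Male", "Male"],
    "Complainant_Age": [np.nan, 40.0, 35.0, np.nan],
})

# Dropping on every column discards rows for gaps the model never sees.
all_cols = toy.dropna()

# Restricting to the model variables keeps more usable rows.
model_cols = ["Sustained", "minority", "Officer_Gender"]
subset = toy.dropna(subset=model_cols)

print(len(all_cols), len(subset))
```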

In [25]:
new_df.head()

Unnamed: 0,Complaint Tracking Number,Date Complaint Investigation Complete,Investigation status,Disposition,Rule Violation,Paragraph Violation,Year_Complete,Officer_Race_Ethnicity,Officer_Age,Officer_Gender,Incident_Type,minority,Disposition_new,Officer_Age_bin,Sustained
1,2016-0002-P,2016-08-03,Completed,Exonerated,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,2016.0,Black,60.0,Male,Public Initiated,M,Other,55-69,0
2,2016-0002-P,2016-08-03,Completed,Exonerated,RULE 4: PERF OF DUTY,PARAGRAPH 04 - NEGLECT OF DUTY,2016.0,Black,44.0,Male,Public Initiated,M,Other,39-54,0
4,2016-0006-P,2016-07-25,Completed,Exonerated,RULE 4: PERF OF DUTY,PARAGRAPH 02 - INSTRUCTIONS FROM AUTHORITATIVE SOURCE,2016.0,Black,54.0,Male,Public Initiated,M,Other,39-54,0
5,2016-0007-P,2016-07-25,Completed,Unfounded,RULE 4: PERF OF DUTY,PARAGRAPH 02 - INSTRUCTIONS FROM AUTHORITATIVE SOURCE,2016.0,Black,53.0,Male,Public Initiated,M,Other,39-54,0
7,2016-0004-R,2017-03-27,Completed,Other,RULE 7: DEPT PROPERTY,PARAGRAPH 03 - CLEANLINESS OF DEPARTMENT EQUIPMENT,2017.0,Black,47.0,Male,Rank Initiated,M,Other,39-54,0


In [26]:
model = smf.logit("""
    Sustained ~
        C(minority, Treatment('W'))
        + C(Officer_Gender, Treatment('Male'))
        + C(Incident_Type, Treatment('Public Initiated'))
""", data=df2)

results = model.fit()
results.summary()

Optimization terminated successfully.
         Current function value: 0.473037
         Iterations 6


Dep. Variable:,Sustained,No. Observations:,3817.0
Model:,Logit,Df Residuals:,3813.0
Method:,MLE,Df Model:,3.0
Date:,"Wed, 07 Apr 2021",Pseudo R-squ.:,0.07296
Time:,19:27:28,Log-Likelihood:,-1805.6
converged:,True,LL-Null:,-1947.7
Covariance Type:,nonrobust,LLR p-value:,2.619e-61

,coef,std err,z,P>|z|,[0.025,0.975]
Intercept,-1.8555,0.078,-23.650,0.000,-2.009,-1.702
"C(minority, Treatment('W'))[T.M]",0.0177,0.087,0.203,0.839,-0.153,0.188
"C(Officer_Gender, Treatment('Male'))[T.Female]",-0.2653,0.106,-2.512,0.012,-0.472,-0.058
"C(Incident_Type, Treatment('Public Initiated'))[T.Rank Initiated]",1.3862,0.083,16.612,0.000,1.223,1.550


In [27]:
coefs = pd.DataFrame({
    'coef': results.params.values,
    'odds ratio': np.exp(results.params.values),
    'pvalue': results.pvalues,
    'name': results.params.index
})
coefs

Unnamed: 0,coef,odds ratio,pvalue,name
Intercept,-1.855512,0.156373,1.185409e-123,Intercept
"C(minority, Treatment('W'))[T.M]",0.017657,1.017814,0.8389938,"C(minority, Treatment('W'))[T.M]"
"C(Officer_Gender, Treatment('Male'))[T.Female]",-0.265258,0.767008,0.01202003,"C(Officer_Gender, Treatment('Male'))[T.Female]"
"C(Incident_Type, Treatment('Public Initiated'))[T.Rank Initiated]",1.386168,3.999495,5.730806e-62,"C(Incident_Type, Treatment('Public Initiated'))[T.Rank Initiated]"

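To make the coefficients easier to read, they can be converted into predicted probabilities. The sketch below uses the intercept and the “Rank Initiated” coefficient from the second model's output; `inv_logit` is a helper defined here for illustration.

```python
import math

# Coefficients copied from the second model's output above.
intercept = -1.855512
rank_initiated = 1.386168

def inv_logit(x: float) -> float:
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Reference officer: white, male, public-initiated complaint.
p_public = inv_logit(intercept)
# Same officer, rank-initiated complaint.
p_rank = inv_logit(intercept + rank_initiated)

print(round(p_public, 3), round(p_rank, 3))
```

The gap between the two probabilities reflects the odds ratio of about 4 for rank-initiated complaints, while the minority coefficient (0.0177, p = 0.839) would shift either probability only marginally.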