# Overview

With the recent publicity surrounding police shootings in the United States, two questions have arisen: 1) Are police in the United States more prone to excessive violence than police in other developed countries? and 2) Are African-Americans disproportionately more likely to be targeted and shot than people of any other race, relative to their population size?

Barring all external factors, we should expect officers in uniform to treat people of all demographics with the same respect and response, and people of all ethnicities to behave more or less the same way with respect to the law. The reported incidents of blue-on-black violence project the image that police unfairly target African-Americans and are quick to turn to excessive force.

# Hypothesis

One component of addressing this issue is to examine the demographics behind all reported fatal incidents from February 2015 to July 2017. Our hypothesis is that police unfairly bestow judgment and execution on certain demographics, mostly African-Americans, so we will not see a distribution of killings that matches the proportion of each demographic in the US. The null hypothesis is that race has no impact on police killing rates, so each ethnicity's share of fatalities will be roughly in line with its share of the population.

In [1]:
%matplotlib inline
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt
from scipy import stats

In [2]:
PATH = 'C:\\Users\\mhuh22\\Desktop\\Thinkful\\Unit_1\\Capstone\\PoliceKillingsUS.csv'
data = pd.read_csv(PATH, encoding='latin1')

# Data

To investigate this issue, I will be examining data provided by the Guardian on police shooting fatalities from February 2015 to July 2017. The dataset requires minimal cleanup beyond removing incomplete entries. Even after dropping these entries (281 of 2535, roughly 11% of the dataset), which we treat as missing at random, and allowing for incidents that may not have been properly reported, we have a sufficiently large sample of 2254 incidents to assess this question.
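Whether the incomplete entries can reasonably be treated as missing at random depends on which fields are empty. The following is a minimal sketch of how that could be checked before calling `dropna()`. The helper name `missing_summary` and the toy frame are hypothetical and only stand in for the real dataset.

```python
import pandas as pd

def missing_summary(df):
    """Return per-column missing counts and percentages, largest first."""
    missing = df.isna().sum()
    summary = pd.DataFrame({
        'missing': missing,
        'percent': (missing / len(df) * 100).round(1),
    })
    # Keep only the columns that actually have gaps
    return summary[summary['missing'] > 0].sort_values('missing', ascending=False)

# Toy frame standing in for the real dataset (values are made up for illustration)
toy = pd.DataFrame({
    'race': ['W', None, 'B', None],
    'age': [53.0, 47.0, None, 23.0],
    'armed': ['gun', 'gun', 'unarmed', 'knife'],
})
summary = missing_summary(toy)
print(summary)

# Rows that dropna() would discard because at least one field is missing
rows_dropped = len(toy) - len(toy.dropna())
print('Rows dropped:', rows_dropped)
```

If most of the gaps turn out to sit in a single column such as `race`, dropping those rows could shift the per-group shares, which would be worth noting alongside the results.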

Since the population size of each ethnic group is clearly not the same, I’ll be comparing each group's share of fatalities to its share of the population in US Census statistics. If ethnicity is not a significant factor in these incidents, we should expect these two percentages to be fairly close to one another.

In [3]:
print('Total entries: ' + str(len(data.index)))
data = data.dropna()
print('Valid entries: ' + str(len(data.index)))
pd.concat([data.head(3),data.tail(3)])

Total entries: 2535
Valid entries: 2254


Unnamed: 0,id,name,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera
0,3,Tim Elliot,02/01/15,shot,gun,53.0,M,A,Shelton,WA,True,attack,Not fleeing,False
1,4,Lewis Lee Lembke,02/01/15,shot,gun,47.0,M,W,Aloha,OR,False,attack,Not fleeing,False
2,5,John Paul Quintero,03/01/15,shot and Tasered,unarmed,23.0,M,H,Wichita,KS,False,other,Not fleeing,False
2525,2820,Deltra Henderson,27/07/17,shot,gun,39.0,M,B,Homer,LA,False,attack,Car,False
2533,2817,Isaiah Tucker,31/07/17,shot,vehicle,28.0,M,B,Oshkosh,WI,False,attack,Car,True
2534,2815,Dwayne Jeune,31/07/17,shot,knife,32.0,M,B,Brooklyn,NY,True,attack,Not fleeing,False


# Analysis

For the purposes of this analysis, the only variable we will examine from this dataset is ethnicity. To determine whether there is a correlation between ethnicity and fatality rates, we will count the fatalities for each ethnicity and divide by the total number of fatalities to get each group's share. We will then divide that share by the group's proportion of the United States population. The result is a ratio, rather than a true probability, that indicates how over- or under-represented each group is among the fatalities. A value of
1. less than 1 means that this group is targeted less than average
2. exactly 1 means that this group is targeted in proportion to its population
3. greater than 1 means that this group is targeted more than average
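Written as a formula, the ratio described above can be sketched as follows. The symbols are introduced here for illustration only: $n_g$ is the number of fatalities in group $g$, $N$ is the total number of fatalities, and $p_g$ is the group's share of the US population.

$$
R_g = \frac{n_g / N}{p_g}
$$

As a hypothetical example, a group that accounts for 20% of fatalities and 10% of the population would have $R_g = 0.20 / 0.10 = 2$, meaning it appears among the fatalities at twice the rate its population share would suggest.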

In [5]:
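# Placeholder column so that count() has something to tally per race code
data['fatalities'] = 0
count = data[['race','fatalities']].groupby(['race']).count()

# Each group's share of all fatalities, in percent
count['bodycount'] = round(count['fatalities'].astype(float)/(len(data.index))*100,1)
# Each group's share of the US population, in percent (order must match the sorted index: A, B, H, N, O, W)
count['population'] = [4.8,12.6,16.3,0.9,6.2,72.4]
# Ratio of fatality share to population share
count['likelihood'] = count['bodycount']/count['population']
count['race'] = ['Asian','Black','Hispanic','Native American','Other','White']
print(count)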

      fatalities  bodycount  population  likelihood             race
race                                                                
A             36        1.6         4.8    0.333333            Asian
B            592       26.3        12.6    2.087302            Black
H            401       17.8        16.3    1.092025         Hispanic
N             29        1.3         0.9    1.444444  Native American
O             28        1.2         6.2    0.193548            Other
W           1168       51.8        72.4    0.715470            White
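The cell above assigns the population shares as a plain list, so it only works while the list order matches the sorted race codes, and it divides shares that have already been rounded. A possible variant, sketched here under the assumption that the counts and population figures are exactly those shown in the output above, keys the population shares by race code and rounds only for display. The column names `fatality_pct`, `population_pct`, and `ratio` are illustrative and do not appear in the original notebook.

```python
import pandas as pd

# Fatality counts per race code, copied from the output above
fatalities = pd.Series({'A': 36, 'B': 592, 'H': 401, 'N': 29, 'O': 28, 'W': 1168})

# Population shares in percent, keyed by race code (same figures as the list in the cell above)
population_pct = pd.Series({'A': 4.8, 'B': 12.6, 'H': 16.3, 'N': 0.9, 'O': 6.2, 'W': 72.4})

table = pd.DataFrame({'fatalities': fatalities})
table['fatality_pct'] = table['fatalities'] / table['fatalities'].sum() * 100
# Assigning a Series aligns on the index labels, so row order no longer matters
table['population_pct'] = population_pct
# Divide the unrounded shares; round only when printing
table['ratio'] = table['fatality_pct'] / table['population_pct']
print(table.round(2))
```

With unrounded shares the figures shift slightly (for example, the Black ratio comes out near 2.08 instead of 2.09), but the ordering of the groups stays the same.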


# Conclusion

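Our results show that African-Americans are indeed more likely to be victims of fatal police incidents than any other demographic when taking population size into account: they make up 26.3% of fatalities but 12.6% of the population, a ratio of roughly 2.1. Native Americans (1.4) and Hispanics (1.1) are also over-represented, while Whites (0.7), Asians (0.3), and the Other category (0.2) are under-represented. The obvious way to continue this project would be to examine other factors for these victims, such as their location, what they were carrying that made them appear suspicious, and how they reacted to the officer.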
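One way the null hypothesis stated earlier could be checked formally is a chi-square goodness-of-fit test, comparing the observed fatality counts with the counts expected from population shares alone. This is only a sketch and not part of the original analysis. It assumes the counts and population percentages from the Analysis table. Because those percentages sum to more than 100 (presumably because Hispanic origin overlaps with the other categories), it also assumes they can be rescaled to sum to 1.

```python
import numpy as np
from scipy import stats

# Observed fatality counts per race code, in the order A, B, H, N, O, W
observed = np.array([36, 592, 401, 29, 28, 1168])
population_pct = np.array([4.8, 12.6, 16.3, 0.9, 6.2, 72.4])

# ASSUMPTION: rescale the shares so they sum to 1 before computing expected counts
expected = population_pct / population_pct.sum() * observed.sum()

# Goodness-of-fit test: do the observed counts match the expected ones?
chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f'chi-square = {chi2:.1f}, p-value = {p_value:.3g}')
```

A small p-value would indicate that the observed distribution is unlikely under the null hypothesis. It would not by itself say anything about the cause of the difference.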

# Sources

Kaggle Dataset

https://data.world/nicholsn/2016-police-killings-us-db

Guardian Database of Police Killings

https://www.theguardian.com/us-news/ng-interactive/2015/jun/01/about-the-counted
https://www.theguardian.com/us-news/ng-interactive/2015/jun/01/the-counted-police-killings-us-database