In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Police Shooting Insights

With the nation divided on the conduct of law enforcement, I decided to look into the Washington Post dataset to investigate trends hidden in the numbers.

Given the current scenarios with protests breaking out across many cities and states, I decided to focus the analysis and visualizations around the following measures:

* Geography - Death toll by state and city
* Race - Death toll by race
* Armed Category - Whether individuals were armed when they were killed
* Flee Category - Whether individuals were fleeing from the police when they were killed
* Threat Level - Whether individuals were attacking the police when they were killed
* Time - Deaths by race, analyzed by month and year
* Body Camera - Whether police disproportionately kill members of a specific race with or without a body camera


Basic assumptions. 

As the rules of society and law enforcement tell us, if you encounter a law enforcement agent and comply with all their commands, there will be no reason for law enforcement to use lethal force. On the other hand, if you do not comply with commands and try to attack agents in any way, law enforcement may use lethal force if they feel their lives are in danger.


Key Findings



1. Individuals that were 'Unarmed', 'Not Attacking', and 'Not Fleeing' from the police accounted for 1.9% of all deaths in the dataset, a total of 103 individuals over the last 5 years. Broken down by race, the share of these deaths was:

White: 44.11%, Black: 30.39%, Hispanic: 18.62%, Other: 2.94%, Asian: 2.94%, Native American: 0.98%. Therefore, 'White' was the race most affected by 'unjustified' shooting situations, followed by 'Black' and 'Hispanic'.


2. Individuals that were 'Armed', 'Attacking', but 'Not Fleeing' from the police showed the highest mortality levels. 

White: 55.44%, Black: 24.52%, Hispanic: 15.53%, Asian: 2.06%, Native American: 1.42%, Other: 1.00%. Therefore, we can see that 'White' leads aggressive encounters with the police by a large margin, followed by 'Black' and 'Hispanic' respectively.

3. Individuals that were 'Armed', 'Attacking', and 'Fleeing' from the police showed the second-highest mortality levels.

White: 46.43%, Black: 31.83%, Hispanic: 18.33%, Native American: 1.53%, Other: 0.98%, Asian: 0.87%. 'White' leads in this category too, but there is a substantial increase in the share for 'Black'. This is likely because 'Black' has the highest rate of 'Fleeing' from law enforcement.

4. There was no substantial variation in death rates throughout the months of the year. The death rates also remained relatively constant from 2015 through 2019, with less than 1% variation.  


5. Regarding body cameras, 'White' deaths were the least likely to be recorded. Only 9.45% of 'White' deaths were caught on a body camera, followed by 11.75% for 'Hispanic' and 15.71% for 'Black'. 'Asian' deaths were the most likely to be recorded, at 17.20%. In relative terms, 'Black' deaths were 66.24% more likely than 'White' deaths to be recorded on a body camera, and 'Hispanic' deaths 24.33% more likely.
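The relative figures in finding 5 can be reproduced from the recorded shares alone. The sketch below is only an illustration of that arithmetic: the helper `relative_increase` and the variable names are my own, and the inputs are the rounded percentages quoted above, so the last digit may differ slightly from the quoted figures.

```python
# Share of deaths recorded on a body camera, in percent, as quoted in finding 5.
recorded_pct = {"White": 9.45, "Hispanic": 11.75, "Black": 15.71, "Asian": 17.20}


def relative_increase(a: float, b: float) -> float:
    """Return the percentage by which `a` exceeds `b` (hypothetical helper)."""
    return (a / b - 1) * 100


# Recorded share of 'Black' and 'Hispanic' deaths relative to 'White' deaths.
black_vs_white = relative_increase(recorded_pct["Black"], recorded_pct["White"])
hispanic_vs_white = relative_increase(recorded_pct["Hispanic"], recorded_pct["White"])

# Complementary view: share of deaths NOT recorded, 'White' relative to 'Black'.
unrecorded_white_vs_black = relative_increase(
    100 - recorded_pct["White"], 100 - recorded_pct["Black"]
)

print(f"Black vs White, recorded share: {black_vs_white:.2f}% higher")
print(f"Hispanic vs White, recorded share: {hispanic_vs_white:.2f}% higher")
print(f"White vs Black, unrecorded share: {unrecorded_white_vs_black:.2f}% higher")
```

Note that the two views give very different magnitudes: the ratio of recorded shares is large because both shares are small, while the unrecorded shares differ by a much smaller margin.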



**Were people primarily armed?**


Considering the entire dataset, 'Other' had the lowest armed share at 86%, and surprisingly 'Asian' had the highest at 91.1%, despite the low number of total 'Asian' fatalities.

**Which race fled the most from the police?**


'Black' led the flee category at 41.18%, while 'Asian' was the least likely race to flee from the police at 19.31%.

**Which race was most likely to attack the police?**


In order to address this issue, I analyzed the likelihood of 'Armed' and 'Unarmed' individuals to 'Attack' officers.

When 'Armed', 'Other' was most likely to attack the police at 74.35%, followed by 'Black' at 72.68% and 'White' at 70.10%.
When 'Unarmed', 'White' led at 43.15%, with 'Black' following at 41.46%.



**When looking at Geography these were the deadliest States and Cities for White, Black and Hispanic.**

* City: 'Chicago' was the deadliest city for 'Black', 'Los Angeles' the deadliest city for 'Hispanic' and 'Phoenix' the deadliest city for 'White'.

* State: 'California' was the deadliest state for 'Black', 'Hispanic' and 'White'



Surprisingly, the deadliest states for individuals who were 'Unarmed', 'Not Attacking' and 'Not Fleeing' diverged from the states with the highest numbers of total fatalities. These states were:

1. NE: 8.33%
2. DC: 7.69%
3. MN: 4.91%
4. CT: 4.76%
5. OK: 4.26%

* These numbers are the percentage of individuals killed while 'Unarmed', 'Not Attacking' and 'Not Fleeing', relative to the total deaths in each state.



Roughly 44% of all individuals killed under the circumstances above were 'White'.

1. White: 44.11%
2. Black: 30.39%
3. Hispanic: 18.62%



Therefore, considering only this dataset and disregarding external factors, the analysis does not suggest that the police are disproportionately targeting minorities. In most categories, 'White' led the death toll, followed by 'Black' and 'Hispanic'. A small percentage (1.9%) of deaths involved individuals who were 'Unarmed', 'Not Attacking' and 'Not Fleeing', suggesting that 'Police Brutality' is not a predominant source of mortality. Within cases of 'Police Brutality', 'White' was the most impacted, accounting for roughly 1.5 times as many of these deaths as 'Black' and roughly 2.4 times as many as 'Hispanic' (44.11% vs. 30.39% and 18.62%).

# IMPORTING LIBRARIES AND READING DATA

In [None]:
import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
import os
import datetime
%matplotlib inline

In [None]:
df1 = pd.read_csv('../input/data-police-shootings/fatal-police-shootings-data.csv')


# INITIAL LOOK AT THE DATAFRAME

In [None]:
df1.head()

In [None]:
df1.shape

In [None]:
df1.info()

In [None]:
df1.isnull().sum()

In [None]:
df1.describe()

In [None]:
df1.id = df1.id.astype('category')
df1.armed = df1.armed.astype('category')
df1.gender = df1.gender.astype('category')
df1.city = df1.city.astype('category')
df1.state = df1.state.astype('category')
df1.race = df1.race.astype('category')
df1.threat_level = df1.threat_level.astype('category')
df1.flee = df1.flee.astype('category')
df1.manner_of_death = df1.manner_of_death.astype('category')

# Properly assigning the category dtype to categorical columns
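The same conversion can be written in one call by passing a column-to-dtype mapping to `astype`. This is a minimal sketch on a made-up toy frame that borrows a few of the dataset's column names; the values and the `categorical_cols` list are assumptions for illustration only.

```python
import pandas as pd

# Toy frame with a few of the dataset's column names; the values are made up.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "armed": ["gun", "unarmed", "knife"],
    "race": ["W", "B", "H"],
    "age": [34, 27, 45],
})

# Columns to treat as categorical (illustrative subset).
categorical_cols = ["armed", "race"]

# astype accepts a {column: dtype} mapping, so all conversions happen in one call.
df = df.astype({col: "category" for col in categorical_cols})

print(df.dtypes)
```

Columns not named in the mapping, such as `age` here, keep their original dtype.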

In [None]:
df1.info()

In [None]:
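# Restrict to numeric and boolean columns; text and category columns cannot be correlated
df1.select_dtypes(include=['number', 'bool']).corr()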


In [None]:
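# Restrict to numeric and boolean columns before computing the correlation matrix
sns.heatmap(df1.select_dtypes(include=['number', 'bool']).corr())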

In [None]:
# Map the single-letter race codes to readable names, to make the analysis and visualizations easier to read.
# Only the race column is touched, so identical letters in other columns are left alone.
df1.race = df1.race.cat.rename_categories({
    'A': 'Asian',
    'B': 'Black',
    'H': 'Hispanic',
    'N': 'Native American',
    'O': 'Other',
    'W': 'White',
})

In [None]:
df1['month'] = pd.to_datetime(df1['date']).dt.month
df1['year'] = pd.to_datetime(df1['date']).dt.year
df1.head()

In [None]:
MissingPercentage = (((df1.isna().sum())/df1.shape[0])*100)
MissingPercentage
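An equivalent and slightly shorter way to get the missing-value percentage is to take the mean of the boolean mask, since the mean of `True`/`False` values is the fraction of `True`. The toy frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with deliberately missing entries; the values are made up.
toy = pd.DataFrame({
    "race": ["W", None, "B", "H"],
    "age": [30, 41, np.nan, np.nan],
})

# isna() gives a boolean frame; its column-wise mean is the missing fraction.
missing_pct = toy.isna().mean().mul(100).sort_values(ascending=False)

print(missing_pct)
```

Sorting in descending order puts the columns with the most missing data first, which helps when deciding which columns need cleaning.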

In [None]:
# Exploratory look at the data. Focus on Manner of Death, Armed, Gender, Race, Threat Level and Flee

In [None]:
df1.manner_of_death.value_counts()
# The majority of individuals were shot only, rather than tasered and shot.

In [None]:
df1.armed.unique()

# Large variety of armed categories. Will have to be categorized in order to improve comprehension 

In [None]:
df1.armed.value_counts(normalize=True)

#we can see the majority of the armed categories were gun, knife, toy weapon and undetermined

In [None]:
df1.race.value_counts(normalize=True)
# White, Black and Hispanic accounted for 95.5% of all deaths. Might be worth focusing on them, and contrasting these three races with other races

In [None]:
df1.threat_level.value_counts(normalize=True)

# The majority of individuals killed attacked the police. One observation is that the 'Other' and 'Undetermined' categories are very subjective.

In [None]:
df1.flee.value_counts(normalize=True)
# We can see that a large part of the individuals don't run from the police. 

# CREATING CATEGORIES - BUCKETING

In [None]:
# In order to facilitate our analysis, and understand if there is racial bias in shootings, we will create categories for the following
# Armed = Will be categorized into Armed and Unarmed
# Fleeing = Will be categorized into Fleeing and Not Fleeing
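The bucketing idea can be sketched with two small functions applied through `Series.map`. This is an alternative illustration, not the lookup-table approach used in the cells that follow. The function names are hypothetical, the toy values are made up, and the rule "anything that is not unarmed, undetermined or missing counts as Armed" is an assumption of this sketch.

```python
import pandas as pd


def armed_bucket(value):
    """Collapse the detailed `armed` value into three buckets (hypothetical helper)."""
    if pd.isna(value) or value == "undetermined":
        return "Unavailable_Undetermined"
    if value == "unarmed":
        return "Unarmed"
    # Assumption: every other value names some kind of weapon.
    return "Armed"


def flee_bucket(value):
    """Collapse the `flee` value into Fleeing / Not_Fleeing (hypothetical helper)."""
    if pd.isna(value):
        return None  # keep missing values missing
    return "Not_Fleeing" if value == "Not fleeing" else "Fleeing"


# Toy data using values that appear in the dataset's `armed` and `flee` columns.
toy = pd.DataFrame({
    "armed": ["gun", "unarmed", None, "undetermined", "toy weapon"],
    "flee": ["Not fleeing", "Car", "Foot", None, "Other"],
})

toy["armed_category"] = toy["armed"].map(armed_bucket)
toy["flee_category"] = toy["flee"].map(flee_bucket)

print(toy)
```

A function-based rule also covers weapon descriptions that were never listed explicitly, at the cost of silently treating any unexpected value as 'Armed'.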


**ARMED CATEGORY - BUCKET**

In [None]:
list(df1.armed.unique())

In [None]:
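# Note: the string 'NaN' does not match truly missing values in `armed`; those rows end up without an armed_category
UnavailableUndetermined = ['NaN', 'undetermined']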
Unarmed = ['unarmed']
Armed = ['gun',
 'toy weapon',
 'nail gun',
 'knife',
 'shovel',
 'hammer',
 'hatchet',
 'sword',
 'machete',
 'box cutter',
 'metal object',
 'screwdriver',
 'lawn mower blade',
 'flagpole',
 'guns and explosives',
 'cordless drill',
 'crossbow',
 'metal pole',
 'Taser',
 'metal pipe',
 'metal hand tool',
 'blunt object',
 'metal stick',
 'sharp object',
 'meat cleaver',
 'carjack',
 'chain',
 "contractor's level",
 'unknown weapon',
 'stapler',
 'beer bottle',
 'bean-bag gun',
 'baseball bat and fireplace poker',
 'straight edge razor',
 'gun and knife',
 'ax',
 'brick',
 'baseball bat',
 'hand torch',
 'chain saw',
 'garden tool',
 'scissors',
 'pole',
 'pick-axe',
 'flashlight',
 'vehicle',
 'baton',
 'spear',
 'chair',
 'pitchfork',
 'hatchet and gun',
 'rock',
 'piece of wood',
 'bayonet',
 'pipe',
 'glass shard',
 'motorcycle',
 'pepper spray',
 'metal rake',
 'crowbar',
 'oar',
 'machete and gun',
 'tire iron',
 'air conditioner',
 'pole and knife',
 'baseball bat and bottle',
 'fireworks',
 'pen',
 'chainsaw',
 'gun and sword',
 'gun and car',
 'pellet gun',
 'claimed to be armed',
 'BB gun',
 'incendiary device',
 'samurai sword',
 'bow and arrow',
 'gun and vehicle',
 'vehicle and gun',
 'wrench',
 'walking stick',
 'barstool',
 'grenade',
 'BB gun and vehicle',
 'wasp spray',
 'air pistol',
 'Airsoft pistol',
 'baseball bat and knife',
 'vehicle and machete',
 'ice pick',
 'car, knife and mace']

In [None]:
df_UnavailableUndetermined = pd.DataFrame({'armed': UnavailableUndetermined})
df_UnavailableUndetermined ['category'] = 'Unavailable_Undetermined'
df_UnavailableUndetermined

In [None]:
df_Unarmed = pd.DataFrame({'armed': Unarmed})
df_Unarmed ['category'] = 'Unarmed'
df_Unarmed

In [None]:
df_Armed = pd.DataFrame({'armed': Armed})
df_Armed ['category'] = 'Armed'
df_Armed

In [None]:
df_lookup2 = df_Armed
df_lookup2

In [None]:
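# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
df_lookup1 = pd.concat([df_lookup2, df_Unarmed], ignore_index=True)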

In [None]:
df_lookup1.shape

In [None]:
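# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
df_lookup = pd.concat([df_lookup1, df_UnavailableUndetermined], ignore_index=True)
df_lookup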

In [None]:
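# A left merge keeps exactly one row per record in df1; an outer merge would also add
# empty rows for lookup values (such as the string 'NaN') that never occur in the data
df2 = pd.merge(df1, df_lookup, on = 'armed', how = 'left' )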


In [None]:
df2 = df2.rename({'category':'armed_category'}, axis = 1)
df2.head()

In [None]:
df2.armed_category.value_counts(normalize = True)
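When attaching a lookup table with `pd.merge`, the `how` argument decides what happens to lookup values that never occur in the data. The sketch below uses made-up toy frames (`records` and `lookup` are illustrative names) and the `indicator=True` option to show where each row came from.

```python
import pandas as pd

# Toy records and a lookup table; 'sword' is in the lookup but not in the records.
records = pd.DataFrame({"id": [1, 2, 3], "armed": ["gun", "unarmed", "gun"]})
lookup = pd.DataFrame({
    "armed": ["gun", "unarmed", "sword"],
    "category": ["Armed", "Unarmed", "Armed"],
})

# Outer merge: keeps unmatched lookup rows, adding a row with no record behind it.
outer = pd.merge(records, lookup, on="armed", how="outer", indicator=True)

# Left merge: keeps exactly the original records.
left = pd.merge(records, lookup, on="armed", how="left")

print(len(records), len(outer), len(left))
print(outer["_merge"].value_counts())
```

Any `right_only` rows in the outer result would inflate later denominators such as `shape[0]`, so it is worth checking the row count after a merge like this.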

**FLEE CATEGORY - BUCKET**

In [None]:
df2.flee.unique()

In [None]:
Fleeing = ['Car', 'Foot', 'Other']
NotFleeing = ['Not fleeing']


In [None]:
FleeLookUp2 = pd.DataFrame({'flee': Fleeing})
FleeLookUp2['flee_category'] = "Fleeing"
FleeLookUp1 = pd.DataFrame({'flee': NotFleeing})
FleeLookUp1['flee_category'] = "Not_Fleeing"


In [None]:
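# DataFrame.append was removed in pandas 2.0; pd.concat is the supported equivalent
FleeLookUp = pd.concat([FleeLookUp1, FleeLookUp2], ignore_index=True)
FleeLookUp.head()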

In [None]:
# Left merge keeps one row per record, even if a lookup value is absent from the data
df3 = pd.merge(df2, FleeLookUp, how='left', on='flee')
df3.head()

In [None]:
df3.flee_category.value_counts(normalize=True)

# INITIAL LOOK AT THE DATA

In [None]:
df3.race.value_counts(normalize=True)
# As we've seen previously, the majority of deaths fall within 3 racial groups: White, Black and Hispanic
df3.race.value_counts(normalize=True).plot(kind='pie', figsize = (8,8))
plt.title('Deaths by Race\nNormalized Data')

In [None]:
df3.state.value_counts(normalize=True)[:10]

In [None]:
df3.state.value_counts(normalize=True)[:10].sum()
# We can see that the top 10 states account for 53.32% of all deaths in the US. Might be worth focusing on these states to look for trends

In [None]:
df3.city.value_counts(normalize=True)[:10]
# Interesting: for some top 10 states, no major city appears in the top 10 cities, and for some top 10 cities, the state is not in the top 10. This is the case for:
# Colorado/Denver, Kansas/Kansas City, Oklahoma/Oklahoma City, Georgia/Atlanta, North Carolina/Raleigh, Washington/Seattle


In [None]:
# I will make a few filtered data sets to evaluate only specific sections of the dataset related to race, state and city

In [None]:
RaceList = ['White', 'Black', 'Hispanic']
df3_race = df3[df3.race.isin(RaceList)]
df3_race.race.unique()

In [None]:
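# States used in the state-level tables and plots below (df3_race_state must be defined for them to run)
StateList = ['CA','TX','FL','AZ','CO','GA','OK','NC','OH','WA']
df3_race_state = df3_race[df3_race.state.isin(StateList)]
df3_race_state.state.unique()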

In [None]:
CityList = ['Los Angeles','Phoenix','Houston','Las Vegas','San Antonio','Columbus','Chicago','Albuquerque','Kansas City','Jacksonville']
df3_race_city = df3_race[df3_race.city.isin(CityList)]
df3_race_city.city.unique()

# VISUALIZATIONS

# VISUALIZATIONS - GENERAL GEOGRAPHY

Focusing on the top 10 states and cities, and top 3 races

In [None]:
df3_race_city.groupby('race').city.value_counts(normalize=True).unstack().plot(kind='bar', figsize=(18,8))
plt.title('Deaths Per Race and City')
plt.ylabel('% of Total Deaths per Race')

In [None]:
df3_race_city.groupby('race').city.value_counts(normalize = True).unstack()

In [None]:
df3_race_state.groupby('race').state.value_counts(normalize=True).unstack().plot(kind='bar', figsize=(18,8))
plt.title('Deaths Per Race and State')
plt.ylabel('% of Total Deaths per Race')

In [None]:
df3_race_state.groupby('race').state.value_counts(normalize=True).unstack()

# VISUALIZATION - THREAT LEVEL, FLEE & ARMED BY RACE

In [None]:
df3.groupby('race').armed_category.value_counts().unstack().plot(kind = 'bar', stacked=True,figsize = (15,6))
plt.title('Total Number of Armed Individuals by Race')


df3.groupby('race').armed_category.value_counts(normalize=True).unstack().plot(kind = 'bar', stacked=True,figsize = (15,6))
plt.title('Percentage of Armed Individuals by Race')



In [None]:
vis1b_df = df3.groupby('race').flee_category.value_counts(normalize=True).unstack()
vis1b_df

In [None]:
vis1b_df.plot(kind = 'bar', stacked = True, figsize=(15,6))
plt.title('Percentage of Individuals by Flee Category')

In [None]:
VIS1D = df3[df3.armed_category == 'Armed'].groupby('race').threat_level.value_counts(normalize=True).unstack().plot(kind='bar', stacked= True, figsize=(18,6))
plt.title('Likelihood of Individual to Attack When Armed')


VIS1E = df3[df3.armed_category == 'Unarmed'].groupby('race').threat_level.value_counts(normalize=True).unstack().plot(kind='bar', stacked= True, figsize=(18,6))
plt.title('Likelihood of Individual to Attack When Unarmed')


# We can see all races are less likely to attack police when unarmed. 
# Asians are least likely to attack police overall. 
# Black, Other and White are the most likely to attack police both Armed and Unarmed

In [None]:
df3.groupby('race').armed_category.value_counts(normalize = True).unstack()

In [None]:
df3[df3.flee_category == 'Fleeing'].groupby('race').armed_category.value_counts(normalize=True).unstack()
# Surprisingly, among individuals who fled, Asians had the highest unarmed share, followed by Black

In [None]:
df3[df3.flee_category == 'Fleeing'].groupby('race').armed_category.value_counts(normalize=True).unstack().plot(kind = 'bar', stacked=False,figsize = (12,6))
# Armed vs. unarmed share among individuals who fled, by race

# VISUALIZATION - DEATHS BY RACE AND STATE

In [None]:
df3.state.value_counts(normalize=False)[:10].plot(kind='pie', figsize=(10,10))
plt.title('Percentage of Deaths in Top 10 States')

In [None]:
VIS2A = df3_race_state[df3_race_state.armed_category == 'Armed'].groupby(['state','armed_category']).race.value_counts().unstack().plot(kind = 'bar', stacked=False, figsize = (18,6))
plt.title('Total Number of Individuals Killed When Armed, by State and Race')

VIS2B = df3_race_state[df3_race_state.armed_category == 'Unarmed'].groupby(['state','armed_category']).race.value_counts().unstack().plot(kind = 'bar', stacked=False, figsize=(18,6))
plt.title('Total Number of Individuals Killed When Unarmed, by State and Race')

# VISUALS 3 - ARE THE POLICE KILLING UNARMED MINORITIES?

In [None]:
df3.groupby(['armed_category','race']).threat_level.value_counts(normalize=True).unstack()


In [None]:
df3[df3.armed_category == 'Unarmed'].groupby('race').threat_level.value_counts(normalize=False).unstack().plot(kind='bar', figsize=(15,6))
plt.title('Number of Deaths of Unarmed Individuals categorized by Threat Level and Race')

#The owner of the dataset probably needs to be more specific on what 'Other' in Threat Level means, given that it was the largest category for all races

In [None]:
VIS2B = df3_race_state.groupby('race').state.value_counts(normalize=True).unstack().plot(kind = 'bar', figsize = (18,6))
#Where do most races die based on % of total deaths in top 10 states

# VISUALIZATION - DEATHS BY RACE AND CITY


In [None]:
df3.city.value_counts(normalize=False)[:10].plot(kind='pie', figsize=(10,10))
plt.title('Deadliest Cities in the US')

In [None]:
VIS3A = df3_race_city.groupby('race').city.value_counts(normalize=False).unstack().plot(kind='bar', figsize=(18,6))
plt.title('Deadliest Cities in the US by Race')
VIS3A

In [None]:
df3_race_city[df3_race_city.armed_category == 'Unarmed'].groupby(['city','armed_category','threat_level']).race.value_counts(normalize=False).unstack()

In [None]:
df3_race_city.groupby(['armed_category','race']).city.value_counts(normalize=False).unstack().plot(kind='bar', stacked=True, figsize=(18,8))
plt.title('Armed Category and Race of Individuals Killed in Deadliest Cities')
# The trend remains the same in the deadliest cities, with the majority of individuals killed being armed

# CURIOSITIES

Finding cases of Police Brutality by Race and State

In [None]:
(df3.groupby('armed_category').flee_category.value_counts().unstack())

In [None]:
((df3.groupby('armed_category').flee_category.value_counts().unstack())/(df3.shape[0]))*100

# "Only" 3.5% of all deaths were related to unarmed civilians that were not fleeing. 



In [None]:
((df3.groupby(['armed_category','threat_level']).flee_category.value_counts().unstack())/(df3.shape[0]))*100

# "Only" 1.9% of all deaths were related to unarmed civilians that were not fleeing and were not attacking the police. 


Unarmed, Not attacking, Not Fleeing

In [None]:
ThreatLevelList = ['other', 'undetermined']

df_unarmed_nothreat_notfleeing = df3[(df3.threat_level.isin(ThreatLevelList)) & (df3.armed_category == 'Unarmed') & (df3.flee_category == 'Not_Fleeing')]
df_unarmed_nothreat_notfleeing.shape

In [None]:
df_unarmed_nothreat_notfleeing.race.value_counts(normalize=True)
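The filter-then-count pattern used for this subset can be condensed into a single boolean mask. A minimal sketch on made-up toy rows, reusing the notebook's column names and category labels; the mean of the mask gives the subset's share of all rows directly.

```python
import pandas as pd

# Toy rows using the notebook's column names and labels; the values are made up.
toy = pd.DataFrame({
    "armed_category": ["Unarmed", "Armed", "Unarmed", "Armed", "Unarmed"],
    "threat_level": ["other", "attack", "attack", "attack", "undetermined"],
    "flee_category": ["Not_Fleeing", "Not_Fleeing", "Not_Fleeing", "Fleeing", "Fleeing"],
    "race": ["White", "Black", "Black", "White", "Hispanic"],
})

# One boolean mask for 'Unarmed', not attacking, and 'Not_Fleeing'.
mask = (
    toy["armed_category"].eq("Unarmed")
    & toy["threat_level"].isin(["other", "undetermined"])
    & toy["flee_category"].eq("Not_Fleeing")
)

# The mean of a boolean mask is the fraction of rows that satisfy it.
share_pct = mask.mean() * 100

# Racial breakdown within the subset.
by_race = toy.loc[mask, "race"].value_counts(normalize=True)

print(share_pct)
print(by_race)
```

Keeping the mask in a variable makes it easy to reuse the same definition for both the overall share and the per-race breakdown.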

Armed, Attacking, Not Fleeing

In [None]:
ThreatLevelList = ['attack']

df_armed_threat_notfleeing = df3[(df3.threat_level.isin(ThreatLevelList)) & (df3.armed_category == 'Armed') & (df3.flee_category == 'Not_Fleeing')]
df_armed_threat_notfleeing.shape

In [None]:
df_armed_threat_notfleeing.race.value_counts(normalize=True)

Armed, Attacking, Fleeing


In [None]:
ThreatLevelList = ['attack']

df_armed_threat_fleeing = df3[(df3.threat_level.isin(ThreatLevelList)) & (df3.armed_category == 'Armed') & (df3.flee_category == 'Fleeing')]
df_armed_threat_fleeing.shape

In [None]:
df_armed_threat_fleeing.race.value_counts(normalize=True)

In [None]:
# Percentage of killings per state of citizens who were unarmed, posed no threat and were not fleeing

((df_unarmed_nothreat_notfleeing.state.value_counts(normalize=False)/df3.state.value_counts(normalize=False))*100).sort_values(ascending=False)

In [None]:
((df_unarmed_nothreat_notfleeing.state.value_counts()/df3.state.value_counts())*100).sort_values(ascending=False)[:10].plot(kind='pie', figsize=(10,10))

# Despite low total deaths in these states, the share of deaths involving individuals who were unarmed, not posing a threat and not fleeing is higher than in the states with the most total killings
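Per-state percentages like these are sensitive to small denominators, so it can help to keep the raw counts next to the share. The sketch below uses made-up state lists (`all_deaths` and `subset` are illustrative names) and plain string columns, where a state missing from the subset would otherwise become `NaN` after division.

```python
import pandas as pd

# Toy data: one state code per death; the counts are made up.
all_deaths = pd.Series(["CA", "CA", "CA", "CA", "NE", "NE", "TX", "TX", "TX"])
subset = pd.Series(["CA", "NE"])  # e.g. the unarmed / no threat / not fleeing cases

total = all_deaths.value_counts()

# Plain division aligns on the index; states absent from the subset become NaN.
naive = subset.value_counts() / total

# Reindexing with fill_value=0 turns those missing states into explicit zeros.
cases = subset.value_counts().reindex(total.index, fill_value=0)

summary = pd.DataFrame({
    "cases": cases,
    "total": total,
    "share_pct": cases / total * 100,
}).sort_values("share_pct", ascending=False)

print(summary)
```

In this toy example the state with the fewest total deaths ranks first on share from a single case, which is the kind of effect to keep in mind when reading the ranking above.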

# Deaths per Year, Month and Race

In [None]:
df3.year.value_counts(normalize=True)

In [None]:
df3.groupby('month').race.value_counts(normalize=True).unstack().plot(kind='bar', figsize=(18,6))

In [None]:
df3.groupby('year').race.value_counts(normalize=True).unstack().plot(kind='bar', figsize=(18,6))
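The table behind a per-year chart like this can also be built with `pd.crosstab`, where `normalize="index"` makes each row sum to 1. A minimal sketch on made-up dates and races:

```python
import pandas as pd

# Toy records; the dates and races are made up.
toy = pd.DataFrame({
    "date": ["2015-01-02", "2015-01-15", "2015-02-03", "2016-01-20"],
    "race": ["White", "Black", "White", "Hispanic"],
})

# Parse the date once, then derive year and month from it.
dates = pd.to_datetime(toy["date"])
toy["year"] = dates.dt.year
toy["month"] = dates.dt.month

# Share of each race within each year; every row sums to 1.
yearly_share = pd.crosstab(toy["year"], toy["race"], normalize="index")

print(yearly_share)
```

Swapping `toy["year"]` for `toy["month"]` gives the per-month version of the same table.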

# DEATH BY RACE WITH BODY CAMERA

In [None]:
df3.groupby('race').body_camera.value_counts(normalize=False).unstack().plot(kind='bar', figsize=(18,8))
plt.title('Total Number of Fatalities Captured on Body Camera by Race')
df3.groupby('race').body_camera.value_counts(normalize=True).unstack().plot(kind='bar', figsize=(18,8))
plt.title('Percentage of Fatalities Captured on Body Camera by Race')



In [None]:
df3.groupby('race').body_camera.value_counts().unstack()

In [None]:
df3.groupby('race').body_camera.value_counts(normalize=True).unstack()
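If `body_camera` is stored as a boolean column, which is an assumption of this sketch, the recorded share per race is simply the group mean. The toy rows below are made up for illustration.

```python
import pandas as pd

# Toy rows; body_camera is assumed to be True when the death was recorded.
toy = pd.DataFrame({
    "race": ["White", "White", "White", "White", "Black", "Black"],
    "body_camera": [True, False, False, False, True, False],
})

# The mean of a boolean column is the fraction of True values in each group.
recorded_pct = toy.groupby("race")["body_camera"].mean().mul(100)

print(recorded_pct)
```

This yields one number per race, which is easier to compare than the two-column True/False table.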