In [None]:
import numpy as np 
import pandas as pd 
import os
from matplotlib import pyplot as plt
plt.style.use('fivethirtyeight')

df = pd.read_csv('/kaggle/input/data-police-shootings/fatal-police-shootings-data.csv')

First of all, I'd like to acknowledge the harrowing magnitude of this data. As data scientists, we may be used to dealing with numbers and Unicode characters, but every name below is a PERSON who lost their life, many at the hands of police brutality. Furthermore, it is incumbent upon us to be aware of and take active steps to rectify the explicit and implicit racism that pervades our country and world. Black Lives Matter.

I encourage everyone reading (yes, that includes you) to donate in support of Black lives and communities of color: https://nymag.com/strategist/article/where-to-donate-for-black-lives-matter.html

I personally have donated to the National Police Accountability Project, considering the topic of this notebook.
https://www.nlg-npap.org/donate-to-npap/



# Tackling Data Bias
Could the fact that the officer was wearing a body camera affect the reported data, since a recording makes it harder to "fudge" the account? The fields most likely to be misreported are those that would justify the use of deadly force; the `armed` column seems the most likely culprit. I'm not looking at `threat_level`, since my understanding is that the police themselves did not report that field and that it was constructed a posteriori.


## Armed Bias

Starting with the `armed` column, the first question we need to contend with is how to categorize the weapons in a useful way. Here are the categories I've come up with:
1. Gun 
2. Long-Range High-Threat Weapon (Crossbow, Nailgun, etc.)
3. Short-Range High-Threat Weapon (Knife, Sword, Axe, etc.)
4. Medium-Threat Weapon (Most Blunt Objects)
5. Low-Threat Weapon (Weak Blunt Objects)
6. Unknown Weapon (Not reported)
7. Faux Weapon (Appeared to have a more dangerous weapon)
8. Unarmed

Keep in mind that these categories aren't definitive by any means (after all, I'm not a weapons expert). It is also worth mentioning that the wielder of a weapon influences the threat level. But for now, I believe these distinctions capture the fact that not all weapons are created equal, especially when it comes to justifying the use of deadly force. I created my own Excel sheet and categorized each weapon by hand.
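
As a rough illustration of how a hand-built lookup like this can be applied, here is a minimal sketch on made-up rows. The `lookup` and `sample` frames are hypothetical stand-ins for `weapon_types.csv` and the shootings data, and the specific assignments (for example, treating `'toy weapon'` as Faux) are assumptions for illustration only. Category ids are taken to be zero-based positions in the list above, with 5 meaning Unknown.

```python
import pandas as pd

# Hypothetical lookup rows; the real assignments live in weapon_types.csv.
lookup = pd.DataFrame({
    'weapon': ['gun', 'knife', 'toy weapon', 'unarmed'],
    'weapon_type': [0, 2, 6, 7],
})
UNKNOWN = 5  # position of 'Unknown' in the category list

# Made-up incidents, not taken from the dataset.
sample = pd.DataFrame({
    'armed': ['gun', 'knife', None, 'unarmed', 'toy weapon', 'gun'],
    'body_camera': [False, True, False, True, True, False],
})

# Map each reported weapon to its category id; missing values become Unknown.
weapon_to_type = lookup.set_index('weapon')['weapon_type']
sample['weapon_type'] = (
    sample['armed'].map(weapon_to_type).fillna(UNKNOWN).astype(int)
)

# Count incidents per category, split by body camera status.
counts = pd.crosstab(sample['weapon_type'], sample['body_camera'])
print(counts)
```

Mapping through a Series and tabulating with `pd.crosstab` avoids a row-by-row loop, though a loop gives the same counts.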

In [None]:
from IPython.display import display, HTML

df_weapons = pd.read_csv('/kaggle/input/police-shootings-weapon-type/weapon_types.csv')
display(HTML(df_weapons.to_html()))

slices_bc_on = [0, 0, 0, 0, 0, 0, 0, 0]
slices_bc_off = [0, 0, 0, 0, 0, 0, 0, 0]

for i in range(df.shape[0]):
    weapon = df.loc[i, 'armed']
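    # A missing 'armed' value is read as float NaN; count it as Unknown (index 5)
    if isinstance(weapon, float):
        weapon_id = 5
    else:
        # Look up the hand-assigned category; .iloc[0] extracts the scalar
        weapon_id = int(df_weapons.loc[df_weapons['weapon'] == weapon, 'weapon_type'].iloc[0])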
    bc = df.loc[i, 'body_camera']
    if bc == True:
        slices_bc_on[weapon_id] = slices_bc_on[weapon_id] + 1
    elif bc == False:
        slices_bc_off[weapon_id] = slices_bc_off[weapon_id] + 1

bc_on_count = df[df['body_camera'] == True].shape[0]
bc_off_count = df[df['body_camera'] == False].shape[0]    

labeling = ['Gun', 'LR High-Threat', 'SR High-Threat', 'Medium-Threat', 'Low-Threat', 'Unknown', 'Faux', 'Unarmed']




In [None]:

def compare_bars(slices_bc_on, slices_bc_off, labeling, ylabel, title, plot_width, plot_height):
    x = np.arange(len(labeling))  # the label locations
    width = 0.35  # the width of the bars
    fig, ax = plt.subplots()
    rects1 = ax.bar(x - width/2, np.around(slices_bc_on,decimals=3), width, label='Body Cam On')
    rects2 = ax.bar(x + width/2, np.around(slices_bc_off,decimals=3), width, label='Body Cam Off')
    # Add some text for labels, title and custom x-axis tick labels, etc.
    ax.set_ylabel(ylabel)
    ax.set_title(title)
    ax.set_xticks(x)
    ax.set_xticklabels(labeling)
    ax.legend()
    # Annotate each bar with its (rounded) height
    for rect in list(rects1) + list(rects2):
        height = rect.get_height()
        ax.annotate('{}'.format(height),
                    xy=(rect.get_x() + rect.get_width() / 2, height),
                    xytext=(0, 3),  # 3 points vertical offset
                    textcoords="offset points",
                    ha='center', va='bottom')
    fig = plt.gcf()
    fig.set_size_inches(plot_width,plot_height)
    plt.show()
    

In [None]:
print("There are %i instances where the body camera was reported to be on" %(bc_on_count))
print("There are %i instances where the body camera was reported to be off" %(bc_off_count))

slices_bc_on = np.array(slices_bc_on) * (1/sum(slices_bc_on))
slices_bc_off = np.array(slices_bc_off) * (1/sum(slices_bc_off))

ylabel = 'Proportion of Reported Weapon Type'
title = 'Reported Weapon Type Grouped by Body Camera Status'

compare_bars(slices_bc_on, slices_bc_off, labeling, ylabel, title, 15, 10)

The following reported weapon types proportionally *decrease* in the presence of body cameras:
* Gun
* Long Range High Threat
* Unknown

And the following reported weapon types proportionally *increase* in the presence of body cameras:
* Short Range High Threat
* Medium Threat
* Low Threat
* Faux
* Unarmed

The easiest change to understand and explain would be the decrease of Unknown weapon types in the presence of body cameras, since if there is a recording of the entire interaction it should be much easier to surmise what weapon was being used.

Observe that Guns and Long Range High Threat weapons were reported less often in the presence of body cameras, while Unarmed, Faux and Low-Threat weapons were reported more often. This would be consistent with the hypothesis that police may be more likely to report a weapon type that justifies their use of deadly force when there is no objective body camera recording.

However, we must acknowledge that the sample size for body camera use is somewhat small and that some of the variations in weapon type proportions are also small. Additionally, the increase in Short Range High Threat weapons reported in the presence of body cameras breaks this trend, which suggests either that the sample is too small to draw conclusions from or that the hypothesis is incorrect.
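
One way to judge whether a gap between two proportions is larger than chance alone would produce at a given sample size is a two-proportion z-test. The sketch below is not part of the analysis above: the function name `two_proportion_z_test` is hypothetical, and the counts passed to it are placeholders rather than values from the dataset.

```python
import math

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups share one proportion."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    # Pooled proportion under the null hypothesis of no difference
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Placeholder counts (NOT from the dataset): 60 of 100 vs. 500 of 1000
z, p = two_proportion_z_test(hits_a=60, n_a=100, hits_b=500, n_b=1000)
print("z = %.3f, p = %.3f" % (z, p))
```

Because eight weapon categories are compared at once, some small p-values are expected by chance, so any single result should be read with caution.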

Let's try slicing this data again, this time by race, to see if that has an influence on the reporting of the data.


In [None]:
slices_bc_on_black = [0, 0, 0, 0, 0, 0, 0, 0]
slices_bc_off_black = [0, 0, 0, 0, 0, 0, 0, 0]

df_black = df[df.race == 'B']
df_black = df_black.reset_index()

for i in range(df_black.shape[0]):
    weapon = df_black.loc[i, 'armed']
    if isinstance(weapon, float):
        weapon_id = 5
    else:
        weapon_id = int(df_weapons.loc[df_weapons['weapon'] == weapon, 'weapon_type'].iloc[0])
    bc = df_black.loc[i, 'body_camera']
    if bc == True:
        slices_bc_on_black[weapon_id] = slices_bc_on_black[weapon_id] + 1
    elif bc == False:
        slices_bc_off_black[weapon_id] = slices_bc_off_black[weapon_id] + 1

bc_on_count_black = df_black[df_black['body_camera'] == True].shape[0]
bc_off_count_black = df_black[df_black['body_camera'] == False].shape[0]  


slices_bc_on_white = [0, 0, 0, 0, 0, 0, 0, 0]
slices_bc_off_white = [0, 0, 0, 0, 0, 0, 0, 0]

df_white = df[df.race == 'W']
df_white = df_white.reset_index()

for i in range(df_white.shape[0]):
    weapon = df_white.loc[i, 'armed']
    if isinstance(weapon, float):
        weapon_id = 5
    else:
        weapon_id = int(df_weapons.loc[df_weapons['weapon'] == weapon, 'weapon_type'].iloc[0])
    bc = df_white.loc[i, 'body_camera']
    if bc == True:
        slices_bc_on_white[weapon_id] = slices_bc_on_white[weapon_id] + 1
    elif bc == False:
        slices_bc_off_white[weapon_id] = slices_bc_off_white[weapon_id] + 1

bc_on_count_white = df_white[df_white['body_camera'] == True].shape[0]
bc_off_count_white = df_white[df_white['body_camera'] == False].shape[0]  

print("There are %i instances where the body camera was reported to be on for black individuals" %(bc_on_count_black))
print("There are %i instances where the body camera was reported to be off for black individuals" %(bc_off_count_black))
print("Probability body camera was on for black individuals: %.0f%%" %(100 * bc_on_count_black / (bc_on_count_black + bc_off_count_black) ))

slices_bc_on_black = np.array(slices_bc_on_black) * (1/sum(slices_bc_on_black))
slices_bc_off_black = np.array(slices_bc_off_black) * (1/sum(slices_bc_off_black))

ylabel = 'Proportion of Reported Weapon Type'
title = 'Reported Weapon Type Grouped by Body Camera Status for Black Individuals'

compare_bars(slices_bc_on_black, slices_bc_off_black, labeling, ylabel, title, 15, 10)

print("There are %i instances where the body camera was reported to be on for white individuals" %(bc_on_count_white))
print("There are %i instances where the body camera was reported to be off for white individuals" %(bc_off_count_white))
print("Probability body camera was on for white individuals: %.0f%%" %(100 * bc_on_count_white / (bc_on_count_white + bc_off_count_white) ))

slices_bc_on_white = np.array(slices_bc_on_white) * (1/sum(slices_bc_on_white))
slices_bc_off_white = np.array(slices_bc_off_white) * (1/sum(slices_bc_off_white))

ylabel = 'Proportion of Reported Weapon Type'
title = 'Reported Weapon Type Grouped by Body Camera Status for White Individuals'

compare_bars(slices_bc_on_white, slices_bc_off_white, labeling, ylabel, title, 15, 10)

First interesting observation: Police were a bit less than twice as likely to have their body camera on in cases involving Black individuals as in cases involving White individuals.

Additionally, we see that the troubling disparities encountered in the general dataset are greater amongst White individuals than Black individuals. 
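
The body camera rate comparison above can also be computed in one step with a `groupby`. This is a minimal sketch on made-up rows: the `sample` frame is a hypothetical stand-in for `df`, and its values are illustrative only.

```python
import pandas as pd

# Made-up rows, not taken from the dataset.
sample = pd.DataFrame({
    'race': ['B', 'B', 'B', 'B', 'W', 'W', 'W', 'W', 'W'],
    'body_camera': [True, False, True, False, False, False, True, False, False],
})

# The mean of a boolean column is the share of True values,
# i.e. the fraction of incidents with the body camera on.
rate_by_race = sample.groupby('race')['body_camera'].mean()

# How many times more often the camera was on for one group than the other
ratio = rate_by_race['B'] / rate_by_race['W']
print(rate_by_race)
print("Ratio B/W: %.2f" % ratio)
```

The same pattern extends to every value in the `race` column, not only `'B'` and `'W'`.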
