# Police Shootings: Are the Police Racist?

2020 has been quite the year thus far, plagued with a variety of bad things (no pun intended). From Kobe Bryant's tragic passing to the COVID-19 pandemic, this year was already a dumpster fire in the making. But wait, there's more! Just when no one thought 2020 could get worse, George Floyd was murdered in cold blood by officers in the Minneapolis Police Department. Sparking national outrage, this event created the most volatile flashpoint in US politics today, centered around the ideas of police brutality, systemic racism, and the role of government in eliciting change. As pundits from across the political spectrum offer their takes on what is going on, the data lies in wait, holding the numbers that can inform the truth. In this notebook, I seek to cut through some of the buzz and use data to see what's really going on. I do want to stress that this is a very delicate and nuanced issue, and that preconceived notions have to take a backseat when evaluating data and asking questions.  
  
This notebook is structured as follows: 
* Setup and Data
* Data Read-In and Characteristics
* Exploratory Data Analysis
  1. Univariate Analysis
  2. Geospatial EDA
  3. Multi-Variate Analysis
* Conclusion


## Setup and Data

### Libraries

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns
from pandas.plotting import autocorrelation_plot
from statsmodels.tsa.stattools import adfuller
from datetime import datetime
import plotly.graph_objects as go
from wordcloud import WordCloud

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

### Functions

In [None]:
def add_value_labels(ax, spacing=5):
    """Add labels to the end of each bar in a bar chart.

    Arguments:
        ax (matplotlib.axes.Axes): The matplotlib object containing the axes
            of the plot to annotate.
        spacing (int): The distance between the labels and the bars.
    """

    # For each bar: Place a label
    for rect in ax.patches:
        # Get X and Y placement of label from rect.
        y_value = rect.get_height()
        x_value = rect.get_x() + rect.get_width() / 2

        # Number of points between bar and label. Change to your liking.
        space = spacing
        # Vertical alignment for positive values
        va = 'bottom' 

        # If value of bar is negative: Place label below bar
        if y_value < 0:
            # Invert space to place label below
            space *= -1
            # Vertically align label at top
            va = 'top'

        # Use Y value as label and format number with two decimal places
        label = "{:.2f}".format(y_value)

        # Create annotation
        ax.annotate(
            label,                      # Use `label` as label
            (x_value, y_value),         # Place label at end of the bar
            xytext=(0, space),          # Vertically shift label by `space`
            textcoords="offset points", # Interpret `xytext` as offset in points
            ha='center',                # Horizontally center label
            va=va)                      # Vertically align label differently for
                                        # positive and negative values.


In [None]:
def easy_bar_plot(data, variable, title = "", xlab = "", ylab = "Proportion", xtick_rotation = 0): 
    sns.set()
    # Proportion of each level, keeping missing values as their own bar
    props = data[variable].value_counts(dropna = False, normalize = True)
    plt.figure(figsize=(10, 6))
    ax = props.plot(kind='bar', rot = xtick_rotation, color = 'mediumseagreen', edgecolor = 'black')
    ax.set_title(title)
    ax.set_xlabel(xlab)
    ax.set_ylabel(ylab)
    
    add_value_labels(ax)

In [None]:
#Creating a convenient function to plot empirical cumulative distribution plots for continuous
#variables
def ecdf_plot(data, variable, x_lab, display_lines = True):
    '''Plot empirical cumulative distribution function of a numerical variable
    and plot mean (red) and median (green) lines
    
    Keyword Arguments: 
    data -- pandas Dataframe 
    variable -- column name (string) **must be a numerical input**
    x_lab -- X axis label (string)
    display_lines -- True or False 
    
    '''
    from statsmodels.distributions.empirical_distribution import ECDF
    
    sns.set()
    #Dropping missing values so they do not distort the ECDF
    values = data[variable].dropna()
    ecdf = ECDF(values)
    _= plt.plot(ecdf.x, ecdf.y, marker = ".", linestyle = "none", alpha = 0.2)
    _= plt.xlabel(x_lab)
    _= plt.ylabel("Cumulative Proportion")
    if display_lines:
        _= plt.vlines(values.mean(), ymax=1,ymin=0,colors='r')
        _= plt.vlines(values.median(), ymax = 1, ymin = 0, colors = 'g')

### Data Read-in and Characteristics

In [None]:
full_police_data = pd.read_csv("/kaggle/input/data-police-shootings/fatal-police-shootings-data.csv")

In [None]:
full_police_data.head()

In [None]:
#Getting all metadata of the dataset
full_police_data.info()
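
Since `info()` reports non-null counts rather than shares, a quick way to see how much of each column is missing might look like the sketch below. The toy frame and its values are invented for illustration; only the column names mirror the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for full_police_data (values are invented for illustration)
toy = pd.DataFrame({
    "armed": ["gun", None, "knife", "unarmed"],
    "age": [34.0, np.nan, np.nan, 51.0],
    "race": ["W", "B", None, "H"],
})

# Fraction of missing values in each column, largest first
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)
```

Running the same two lines on `full_police_data` would show which variables (for example **race** or **age**) need the most caution later on.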

## Exploratory Data Analysis

### Univariate Analysis

In this section, I'll be examining single variables to extract some preliminary insights from this data. Most of this will be focused on gauging variable distribution. 

This barplot shows the frequency breakdown of the **manner_of_death** variable. Given that this is a dataset of fatal police shootings, it makes sense that everyone died of a gunshot. Roughly 5% were also tasered, but it is unclear whether the tasering is considered an additional cause of death alongside the shooting, or whether the taser was used before the fatal shots were fired. 

In [None]:
#Plotting bar plot of manner_of_death
easy_bar_plot(data= full_police_data, variable= 'manner_of_death', ylab= 'Proportion', title= 'Manner of Death')

The output below shows the proportions of each level of the **armed** variable. What is interesting to note is that roughly 9.5% of fatal police shootings in this data involved someone who was unarmed (roughly 6.5%) or someone whose armed status was undetermined (roughly 3%). The bar plot further below illustrates the relative frequencies of a modified version of **armed** (called **armed_binned** in the dataset). The **armed_binned** variable was created because the original **armed** variable has 93 distinct values, making it very hard to visualize.

In [None]:
#Getting each armed level's share of all fatal shootings in the data
armed_props = (full_police_data["armed"].value_counts()/full_police_data.shape[0]).to_frame(name="armed")
print(armed_props)

This wordcloud gives a good idea of what **armed** contains: heavy emphasis on guns, knives, unarmed, undetermined, weapon, and toy.

In [None]:
#Dropping only missing armed values so rows with other missing fields are still included
wordcloud_armed = WordCloud(background_color='white', collocations= False).generate(' '.join(full_police_data['armed'].dropna()))

plt.figure(figsize = (10,10), facecolor = None) 
plt.imshow(wordcloud_armed, interpolation='bilinear') 
plt.axis("off") 
plt.tight_layout(pad = 0) 

plt.show() 

In [None]:
#Creating a list of armed levels that make up at least 1% of the entire dataset for visualization purposes
armed_reason_highenough = armed_props.index[armed_props.armed > 0.01]

#Creating the armed_binned variable that bins all other armed values below 1% of the total into "other"
full_police_data["armed_binned"] = full_police_data['armed'].apply(lambda x : x if x in armed_reason_highenough \
                                                                   else x if pd.notnull(x) == False else 'other')
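
The binning step can also be written without a row-by-row lambda. The sketch below is one possible version, run on invented toy data; `bin_rare_levels` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import pandas as pd

def bin_rare_levels(series, min_share=0.01, other_label="other"):
    """Hypothetical helper: replace levels at or below min_share of all rows with other_label.

    Missing values are left as missing, matching the cell above.
    """
    shares = series.value_counts() / len(series)   # missing rows count in the denominator
    keep = shares.index[shares > min_share]
    return series.where(series.isin(keep) | series.isna(), other_label)

# Toy data, invented for illustration
toy_armed = pd.Series(["gun"] * 6 + ["knife"] * 3 + ["toy weapon"] + [None] * 2)
binned = bin_rare_levels(toy_armed, min_share=0.10)
print(binned.value_counts(dropna=False))
```

On the real data the call would presumably be `bin_rare_levels(full_police_data['armed'])`, which keeps the 1% threshold used above.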

Based on this plot of **armed_binned**, the majority of fatal shootings involve a suspect armed with a gun. Additionally, roughly 7% of fatal police shootings in this data involve someone who is unarmed. It is important to note that this variable on its own does not indicate how the person behaved towards police with regard to a weapon.

In [None]:
#Plotting barplot of armed_binned
easy_bar_plot(data=full_police_data, variable= 'armed_binned', title= "Armed Status", xtick_rotation=45)

The empirical cumulative distribution function (ECDF) plot below shows how **age** is distributed in this dataset. Roughly 60% of fatal shootings involve someone under the age of 40. The green line represents the median age, while the red line represents the mean age. The median age is a bit lower than the mean age, which indicates that **age** is slightly right skewed.

In [None]:
#Plotting empirical cumulative distribution function of age
ecdf_plot(data= full_police_data, variable= 'age', x_lab= 'Age (years)')
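
To make the ECDF reading concrete, here is a small sketch of the same quantities computed by hand with NumPy. The ages are invented toy values, not taken from the dataset.

```python
import numpy as np

# Toy ages, invented for illustration (np.nan marks a missing age)
ages = np.array([19, 22, 25, 27, 30, 31, 34, 38, 45, 52, 60, 77, np.nan])

# Drop missing values, then sort
clean = np.sort(ages[~np.isnan(ages)])

# ECDF height at each sorted age: share of observations at or below it
ecdf_y = np.arange(1, clean.size + 1) / clean.size

share_under_40 = np.mean(clean < 40)
mean_age, median_age = clean.mean(), np.median(clean)
print(f"share under 40: {share_under_40:.2f}, mean: {mean_age:.1f}, median: {median_age:.1f}")
```

A mean above the median, as in this toy sample, is the usual sign of a right-skewed distribution.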

The following barplot of **gender** suggests that men are overwhelmingly the victims in fatal police shootings. 

In [None]:
#Plotting barplot of gender
easy_bar_plot(data= full_police_data, variable= 'gender', title= 'Fatal Shootings by Gender', xlab="Gender")

Below you will see how **race** breaks down across fatal shootings. Relative to their corresponding proportions of the US population, White and Asian people are underrepresented while Black and Hispanic people are overrepresented. Additionally, there is a decent portion (10%) of observations that do not have race reported, which could affect representation.

In [None]:
#Plotting barplot of race
easy_bar_plot(data=full_police_data, variable= 'race', xlab="Race", title='Fatal Police Shootings by Race')

Below is a plot of shootings broken down by **signs_of_mental_illness**. The majority of victims in this data did not appear to have signs of mental illness. These numbers may or may not reflect the true mental health status of these victims; they only reflect what the police officer(s) perceived. 

In [None]:
#Plotting barplot of signs_of_mental_illness
easy_bar_plot(data=full_police_data, variable= 'signs_of_mental_illness', xlab="Mental Illness Suspected",\
              title = 'Fatal Police Shootings by Appearance of Mental Illness')

The plot below shows how **threat_level** is broken down across the victims in this data. Most suspects were considered an "attack" on the threat level scale. To me, this attribute is particularly subjective, as different officers may have different thresholds at which they deem someone an "attack" on this scale. However, I am not a law enforcement officer, so if I am wrong, please correct me!

In [None]:
#Plotting barplot of threat_level
easy_bar_plot(data=full_police_data, variable= 'threat_level', xlab="Perceived Threat", \
              title= "Fatal Police Shootings by Perceived Threat Level")

Below is a plot of **flee**. Most victims in this dataset were not fleeing when fatally shot by the police.

In [None]:
#Plotting barplot of flee
easy_bar_plot(data=full_police_data, variable= 'flee', xlab="Flee Status", \
              title= "Fatal Police Shootings by Flee Status")

The plot below shows **body_camera**. Disturbingly, the vast majority of fatal police shootings occur without any body camera footage. 

In [None]:
#Plotting barplot of body_camera
easy_bar_plot(data=full_police_data, variable= 'body_camera', xlab="Body Camera Present", \
             title= "Fatal Police Shootings by Body Camera Presence")

### Geospatial EDA
Looking at the geographical distribution of fatal police shootings.

In [None]:
#Getting longitude and latitude data at the city level for visualizations
lat_long_source = pd.read_csv('https://raw.githubusercontent.com/kelvins/US-Cities-Database/master/csv/us_cities.csv')
lat_long_source = lat_long_source.rename(columns={'STATE_CODE':'state', 'CITY':'city'})
lat_long_source.head()

In [None]:
#Left joining full_police_data to lat_long_source on their common state and city columns
merged = pd.merge(left=full_police_data, right=lat_long_source, on=['state', 'city'], how='left')

In [None]:
#Viewing merge results
merged.head()

In [None]:
#Dropping duplicates introduced by multiple geographical coordinates from lat_long_source dataset
merged = merged.drop_duplicates(subset=['city','name','date','id'])

#Converting the date column to datetime, then creating a year and month column from that
merged['date'] = merged.date.apply(lambda x: datetime.strptime(x, '%Y-%m-%d'))
merged['year'], merged['month'] = merged['date'].dt.year, merged['date'].dt.month
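
An alternative to dropping duplicates after the join is to deduplicate the coordinate table before it, so the left join cannot add rows in the first place. The sketch below uses invented toy tables with approximate coordinates; only the column names follow the cells above.

```python
import pandas as pd

# Toy tables, invented for illustration
shootings = pd.DataFrame({"id": [1, 2, 3],
                          "city": ["Springfield", "Springfield", "Austin"],
                          "state": ["IL", "IL", "TX"]})
coords = pd.DataFrame({"city": ["Springfield", "Springfield", "Austin"],
                       "state": ["IL", "IL", "TX"],
                       "LATITUDE": [39.80, 39.78, 30.27],
                       "LONGITUDE": [-89.64, -89.65, -97.74]})

# Keep one coordinate row per (state, city) so the left join cannot add rows
coords_unique = coords.drop_duplicates(subset=["state", "city"])

# validate="many_to_one" makes pandas raise if the right-hand keys are not unique
joined = pd.merge(shootings, coords_unique, on=["state", "city"],
                  how="left", validate="many_to_one")
print(len(shootings), len(joined))
```

Either approach keeps an arbitrary coordinate pair when a city has several, which is fine for a map at this scale.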

Shown below is the dataset that will be going into the map plot. It has **city**, **count** (number of fatal police shootings in the city), **LATITUDE**, and **LONGITUDE**. The latitude and longitude are necessary for geospatial mapping.

In [None]:
#Getting a dataset where each city (and its shooting count) is the observation, not a fatal shooting incident
city_as_obs = merged.city.value_counts().to_frame()
city_as_obs.head()
city_as_obs = city_as_obs.reset_index()
city_as_obs.columns = ['city', 'count']

#Left joining city_as_obs with merged, isolating the necessary columns, and displaying the result
city_as_obs_latlon = pd.merge(left=city_as_obs, right=merged, on=['city'], how='left')
city_as_obs_latlon = city_as_obs_latlon[['city', 'count', 'LATITUDE', 'LONGITUDE']]
city_as_obs_latlon = city_as_obs_latlon.drop_duplicates(subset=['city'])
city_as_obs_latlon.info()
city_as_obs_latlon.head()
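
One caveat with counting on the city name alone is that same-named cities in different states get pooled together. A possible refinement, sketched on invented toy data, is to count on the state and city pair instead.

```python
import pandas as pd

# Toy incidents, invented for illustration
incidents = pd.DataFrame({
    "city": ["Springfield", "Springfield", "Springfield", "Austin"],
    "state": ["IL", "MO", "IL", "TX"],
})

# Counting on the city name alone merges the two Springfields
by_city = incidents["city"].value_counts()

# Counting on (state, city) keeps them apart
by_city_state = (incidents.groupby(["state", "city"]).size()
                 .reset_index(name="count")
                 .sort_values("count", ascending=False))
print(by_city_state)
```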

The map below shows the geographical distribution of fatal police shootings by city. As expected, heavily populated urban areas have the highest numbers of fatal police shootings. 

In [None]:
#Plotting the count of fatal shooting onto a US map according to city
#Size of bubble represents the number, the coloration represents rank order binning (ie royal blue is for the cities
#ranked highest by fatal shooting count)

city_as_obs_latlon['text'] = city_as_obs_latlon['city'] + '<br>Fatal Shootings ' + city_as_obs_latlon['count'].astype(str)
#Rank limits are inclusive at both ends, so the bins below cover every row without gaps
limits = [(0,15),(16,30),(31,50),(51,100),(101,2470)]
colors = ["royalblue","crimson","lightseagreen","orange","lightgrey"]
scale = 0.5

fig = go.Figure()

for i in range(len(limits)):
    lim = limits[i]
    #iloc slices exclude the end position, hence the + 1
    df_sub = city_as_obs_latlon.iloc[lim[0]:lim[1] + 1]
    fig.add_trace(go.Scattergeo(
        locationmode = 'USA-states',
        lon = df_sub['LONGITUDE'],
        lat = df_sub['LATITUDE'],
        text = df_sub['text'],
        marker = dict(
            size = df_sub['count']/scale,
            color = colors[i],
            line_color='rgb(40,40,40)',
            line_width=0.5,
            sizemode = 'area'
        ),
        name = '{0} - {1}'.format(lim[0],lim[1])))

fig.update_layout(
        title_text = 'Total Fatal Police Shootings in the US from Jan 2015 to June 2020 <br>(Legend represents rank order, not number of police killings)',
        showlegend = True,
        geo = dict(
            scope = 'usa',
            landcolor = 'rgb(217, 217, 217)',
        )
    )

fig.show()

### Multi-Variate Analysis

In this section, I'll be examining the relationship between variables in this dataset, most notably **race**, as that is the point of most interest for this dataset. From these relationships, I will try to get to the bottom of what this data is truly telling us about the relationships between fatal police shootings and racism.

In [None]:
#Creating unarmed dataset
unarmed = merged.loc[merged['armed_binned'] == 'unarmed', :]

Below is a proportion plot of the "unarmed" category of fatal police shootings (roughly 7% of victims in this data) colorized by **race**. From this visual, we can conclude that unarmed black victims make up a disproportionately high percentage of fatal police shootings involving an unarmed person, while unarmed white victims make up a disproportionately low percentage. Hispanics are slightly overrepresented while Asians are slightly underrepresented. Also, this graph is basically the same as that of unarmed men by race, since the count of unarmed women fatally shot by police is extremely low. 

In [None]:
#Unarmed stacked proportion plot
x = merged.loc[merged['armed_binned'] == "unarmed", :]['race'].value_counts(normalize = True)
x_ = pd.DataFrame([x])
x_.index = ['unarmed']

_= x_.plot(kind = 'bar', stacked= True, rot = 0, \
                               title = 'Stacked Proportion Chart of Unarmed Victims by Race', figsize=(10,8))
x_

The following plot shows how fatal police shooting victims armed with guns (roughly 56% of all fatal police shootings in the data) break down across **race**. Black victims make up a disproportionately high share of those carrying a gun, while white victims make up a disproportionately low share, but the discrepancy in representation between black and white for "armed with a gun" is not as large as it is for unarmed victims shown in the previous graph. Keep in mind that this metric does not indicate the usage, presentation, or licensing of guns by victims, which is a factor that should be considered when determining if a police officer felt endangered.

In [None]:
#Armed with gun stacked proportion plot
x = merged.loc[merged['armed_binned'] == "gun", :]['race'].value_counts(normalize = True)
x_ = pd.DataFrame([x])
x_.index = ['Armed: Gun']

_= x_.plot(kind = 'bar', stacked= True, rot = 0, \
                               title = 'Stacked Proportion Chart of Victims Armed With Guns by Race', figsize=(10,8))
x_

The following plot shows how **threat_level** breaks down across **race**. Black and white victims had nearly identical **threat_level** distributions. Asians and Hispanics had lower relative proportions of victims classed as the "attack" **threat_level**, while "undetermined" remained fairly constant across most racial groups (low in Asians). What "other" means is anyone's best guess for such a subjective metric. Additionally, with **threat_level** being such a subjective metric, it is not possible to infer from this data if the victim was genuinely threatening an officer's life (despite having no weapon), or if the police officer was not justified in using lethal force. 

In [None]:
#threat_level by race stacked proportion plot
black = merged.loc[merged['race'] == 'B', :]['threat_level'].value_counts(normalize = True)
white = merged.loc[merged['race'] == 'W', :]['threat_level'].value_counts(normalize = True)
hispanic = merged.loc[merged['race'] == 'H', :]['threat_level'].value_counts(normalize = True)
asian = merged.loc[merged['race'] == 'A', :]['threat_level'].value_counts(normalize = True)
other = merged.loc[merged['race'] == 'O', :]['threat_level'].value_counts(normalize = True)

x_y = pd.DataFrame([black, white, hispanic, asian, other])
x_y.index = ['Black', 'White', 'Hispanic', 'Asian', 'Other']

_= x_y.plot(kind = 'bar', stacked= True, rot = 0, \
                               title = 'Stacked Proportion Chart of Race by Threat Level', figsize=(10,8))
x_y
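
The per-race filtering above can likely be condensed with `pd.crosstab`, which builds the whole table of row-wise proportions in one call. This is a sketch on invented toy data, not the cell that produced the chart above.

```python
import pandas as pd

# Toy data, invented for illustration
toy = pd.DataFrame({
    "race": ["B", "B", "W", "W", "W", "H", "H", "A"],
    "threat_level": ["attack", "other", "attack", "attack", "undetermined",
                     "other", "attack", "other"],
})

# Rows are race, columns are threat_level, and each row sums to 1
props = pd.crosstab(toy["race"], toy["threat_level"], normalize="index")
print(props.round(2))

# props.plot(kind="bar", stacked=True) would draw the same kind of stacked chart
```

The same pattern would apply to the other race breakdowns in this section by swapping the second column name.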

The plot below shows the breakdown of **signs_of_mental_illness** by **race**. An interesting point of note: white and Asian victims had higher percentages of exhibiting signs of mental illness (true), while black and Hispanic victims had lower percentages of suspected mental illness. 

In [None]:
#Signs of mental illness by race stacked proportion plot
black = merged.loc[merged['race'] == 'B', :]['signs_of_mental_illness'].value_counts(normalize = True)
white = merged.loc[merged['race'] == 'W', :]['signs_of_mental_illness'].value_counts(normalize = True)
hispanic = merged.loc[merged['race'] == 'H', :]['signs_of_mental_illness'].value_counts(normalize = True)
asian = merged.loc[merged['race'] == 'A', :]['signs_of_mental_illness'].value_counts(normalize = True)
other = merged.loc[merged['race'] == 'O', :]['signs_of_mental_illness'].value_counts(normalize = True)

x_y = pd.DataFrame([black, white, hispanic, asian, other])
x_y.index = ['Black', 'White', 'Hispanic', 'Asian', 'Other']

_= x_y.plot(kind = 'bar', stacked= True, rot = 0, \
                               title = 'Stacked Proportion Chart of Race by Signs of Mental Illness', figsize=(10,8))
x_y

The plot below shows how **gender** breaks down across **race** with regard to fatal police shootings. Black fatal shooting victims make up a disproportionate share of both genders. Based on the data, white women make up roughly the expected proportion of fatal shootings involving women (white women are ~60% of the US female population), while Hispanic women are underrepresented (17% pop. vs 12% shooting victims) in this group to a similar degree that black women (13% pop. vs 21% shooting victims) are overrepresented.  
  
As for males, the story is very different. Unlike their female counterparts, white males are significantly underrepresented compared to their relative proportion of the male population (60% pop vs 50% shooting victims). Black males and Hispanic males are therefore overrepresented relative to all males, but the discrepancy in representation is similar between Hispanic males and females (18% vs 12%) and black males and females (27% vs 21%). 

In [None]:
#Gender by race stacked proportion plot
x = merged.loc[merged['gender'] == 'M', :]['race'].value_counts(normalize = True)
y = merged.loc[merged['gender'] == 'F', :]['race'].value_counts(normalize = True)
x_y = pd.DataFrame([x, y])
x_y.index = ['Male', 'Female']

_= x_y.plot(kind = 'bar', stacked= True, rot = 0, \
            title = 'Stacked Proportion Chart of Gender by Race', figsize=(10,8))
x_y
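
The over- and underrepresentation comparisons above boil down to a ratio of two shares. The sketch below uses the rounded figures quoted in the text for women; the 60% victim share for white women is my reading of "the expected proportion", so treat it as an assumption.

```python
# (population share, share of victims) for women, as quoted in the text above.
# These are the notebook's rounded figures, not independently verified numbers.
female_shares = {
    "White": (0.60, 0.60),     # victim share assumed equal to the population share
    "Hispanic": (0.17, 0.12),
    "Black": (0.13, 0.21),
}

# Ratio > 1 means overrepresented among victims, < 1 means underrepresented
representation = {group: victims / population
                  for group, (population, victims) in female_shares.items()}

for group, ratio in representation.items():
    print(f"{group}: {ratio:.2f}")
```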

The following plot shows how **body_camera** presence breaks down by race. Each x-value is a racial group in the data, and each bar shows the split between true and false for **body_camera**. Based on this visual, white victims have the lowest percentage of body camera presence. However, this plot also tells us something else: no matter the demographic, the percentage of fatal police shootings that took place with police body camera footage is very low.

In [None]:
#Body Camera presence by race stacked proportion plot
black = merged.loc[merged['race'] == 'B', :]['body_camera'].value_counts(normalize = True)
white = merged.loc[merged['race'] == 'W', :]['body_camera'].value_counts(normalize = True)
hispanic = merged.loc[merged['race'] == 'H', :]['body_camera'].value_counts(normalize = True)
asian = merged.loc[merged['race'] == 'A', :]['body_camera'].value_counts(normalize = True)
other = merged.loc[merged['race'] == 'O', :]['body_camera'].value_counts(normalize = True)

x_y = pd.DataFrame([black, white, hispanic, asian, other])
x_y.index = ['Black', 'White', 'Hispanic', 'Asian', 'Other']

_= x_y.plot(kind = 'bar', stacked= True, rot = 0, \
                               title = 'Stacked Proportion Chart of Race by Body Camera Presence', figsize=(10,8))
x_y

Going further into the **body_camera** breakdowns, the following plot shows **body_camera** presence by **race** for unarmed victims. Here, Asians and Other have the highest proportions of fatal shootings with body camera footage, while Hispanics are at the lowest proportion by a decent margin.

In [None]:
#Body Camera Presence for unarmed victims by race stacked proportion plot
black = unarmed.loc[unarmed['race'] == 'B', :]['body_camera'].value_counts(normalize = True)
white = unarmed.loc[unarmed['race'] == 'W', :]['body_camera'].value_counts(normalize = True)
hispanic = unarmed.loc[unarmed['race'] == 'H', :]['body_camera'].value_counts(normalize = True)
asian = unarmed.loc[unarmed['race'] == 'A', :]['body_camera'].value_counts(normalize = True)
other = unarmed.loc[unarmed['race'] == 'O', :]['body_camera'].value_counts(normalize = True)

x_y = pd.DataFrame([black, white, hispanic, asian, other])
x_y.index = ['Black', 'White', 'Hispanic', 'Asian', 'Other']

_= x_y.plot(kind = 'bar', stacked= True, rot = 0, \
                               title = 'Stacked Proportion Chart of Race by Body Camera Presence for Unarmed Victims', figsize=(10,8))
x_y
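
Because the unarmed subset is small, a proportion for a group with only a handful of victims can swing a lot on one or two cases. A sketch of how group sizes could be reported next to the proportions, using invented toy data:

```python
import pandas as pd

# Toy unarmed subset, invented for illustration
toy_unarmed = pd.DataFrame({
    "race": ["W"] * 40 + ["B"] * 30 + ["H"] * 20 + ["A"] * 2,
    "body_camera": ([True] * 6 + [False] * 34      # W
                    + [True] * 6 + [False] * 24    # B
                    + [True] * 2 + [False] * 18    # H
                    + [True] * 1 + [False] * 1),   # A
})

# n is the group size; the mean of a boolean column is the share of True values
summary = (toy_unarmed.groupby("race")["body_camera"]
           .agg(n="size", share_with_camera="mean")
           .sort_values("n", ascending=False))
print(summary)
```

In this toy table the smallest group shows the highest share, which is the kind of result worth double-checking against `n` before reading much into it.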

The following shows how **flee** breaks down within **race** for unarmed victims. Unarmed black victims are more likely to have been fleeing by foot, while Asians and Hispanics had similarly high degrees of fleeing by car. While every racial demographic in this data had "not fleeing" as the highest proportion of **flee**, victims classed as "other" had the highest proportion of not fleeing, with white victims following behind.

In [None]:
#Flee status by race stacked proportion plot for unarmed victims
black = unarmed.loc[unarmed['race'] == 'B', :]['flee'].value_counts(normalize = True)
white = unarmed.loc[unarmed['race'] == 'W', :]['flee'].value_counts(normalize = True)
hispanic = unarmed.loc[unarmed['race'] == 'H', :]['flee'].value_counts(normalize = True)
asian = unarmed.loc[unarmed['race'] == 'A', :]['flee'].value_counts(normalize = True)
other = unarmed.loc[unarmed['race'] == 'O', :]['flee'].value_counts(normalize = True)

x_y = pd.DataFrame([black, white, hispanic, asian, other])
x_y.index = ['Black', 'White', 'Hispanic', 'Asian', 'Other']

_= x_y.plot(kind = 'bar', stacked= True, rot = 0, \
                               title = 'Stacked Proportion Chart of Flee Status by Race for Unarmed Victims', figsize=(10,8))
x_y

The plot below shows the breakdown of **armed_binned** by **race**. The distributions of different categories within **armed_binned** look fairly similar for white and black victims, with percentage carrying guns being nearly identical. Asian victims have a lower percentage of guns but a higher percentage of knives and "other".

In [None]:
#Armed Status by race stacked proportion plot
black = merged.loc[merged['race'] == 'B', :]['armed_binned'].value_counts(normalize = True)
white = merged.loc[merged['race'] == 'W', :]['armed_binned'].value_counts(normalize = True)
hispanic = merged.loc[merged['race'] == 'H', :]['armed_binned'].value_counts(normalize = True)
asian = merged.loc[merged['race'] == 'A', :]['armed_binned'].value_counts(normalize = True)
other = merged.loc[merged['race'] == 'O', :]['armed_binned'].value_counts(normalize = True)

x_y = pd.DataFrame([black, white, hispanic, asian, other])
x_y.index = ['Black', 'White', 'Hispanic', 'Asian', 'Other']

_= x_y.plot(kind = 'bar', stacked= True, rot = 0, \
                               title = 'Stacked Proportion Chart of Race by Armed Status', figsize=(10,8))
x_y

In [None]:
#Creating additional count column, then isolating necessary columns into new dataframe
merged['count'] = 1 
date_count = merged[["date" ,"count"]]

In [None]:
#Looking at date_count 
date_count.head()

The output below shows each month of the year and its corresponding aggregate sum of fatal police shootings over the span of the data, in ascending order. Interestingly, fall months seem to have the lowest aggregate shooting numbers (September, November, October), followed by summer months (August, July, June), then spring months (April, May), then winter months (February, January, March). The exception to the general pattern is December. Keep in mind that the data runs from January 2015 to June 2020, so January through June are counted in six calendar years while July through December are counted in only five, which likely accounts for part of this pattern.

In [None]:
#Get aggregate killings for each of the 12 months in the year
date_count.groupby(date_count['date'].dt.strftime('%B'))['count'].sum().sort_values()
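
Because the data spans January 2015 to June 2020 (per the map title earlier), raw monthly totals favor the first half of the year. One way to put months on equal footing is to average each calendar month across the years in which it appears. The sketch below uses an invented toy series with exactly one incident per day, so any difference in the raw totals comes purely from coverage.

```python
import pandas as pd

# Toy daily incidents: one per day from Jan 2015 through June 2020 (invented)
toy_dates = pd.DataFrame({"date": pd.date_range("2015-01-01", "2020-06-30", freq="D")})
toy_dates["count"] = 1

# Total per calendar month of each year
per_year_month = (toy_dates.groupby([toy_dates["date"].dt.year.rename("year"),
                                     toy_dates["date"].dt.month.rename("month")])["count"]
                  .sum())

# Raw totals across years versus the average per year for each calendar month
raw_total = per_year_month.groupby(level="month").sum()
yearly_mean = per_year_month.groupby(level="month").mean()
print(pd.DataFrame({"raw_total": raw_total, "yearly_mean": yearly_mean}))
```

If the final month in the real data is only partly covered, it would still pull its own average down, so that month deserves separate care.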

In [None]:
#Get time series of police shootings over month/year
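count_monthly = date_count.groupby(date_count['date'].dt.strftime('%B %Y'))['count'].sum().to_frame()
count_monthly.reset_index(inplace=True)
count_monthly['date'] =  pd.to_datetime(count_monthly['date'], format='%B %Y')
#Month-year strings sort alphabetically, so re-sorting chronologically before plotting
count_monthly = count_monthly.sort_values('date').reset_index(drop=True)
count_monthly.head()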

Shown below is a plot of the time series of police shootings on a monthly level of aggregation. Visually, there does not look to be any sort of trend here. Interestingly enough, May 2020 was the month with the highest number of fatal police shootings, and June 2020 follows with a precipitous drop in police killings. One possible explanation is the increased backlash (and resulting reluctance to act) faced by police officers after the murder of George Floyd, though June 2020 is the last month in the data and may simply be incomplete. 

In [None]:
#Plot time series of aggregate monthly shootings over time
_= count_monthly[['date','count']].plot('date', figsize=(15,8))
_.set_xlabel("Year");
_.set_ylabel("Count");
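
A rough numerical companion to the visual check is to fit a straight line to the monthly counts and look at the slope. This is only a sketch on invented toy data, and a least-squares line is a much cruder check than a formal stationarity test.

```python
import numpy as np
import pandas as pd

# Toy monthly counts, invented for illustration: flat around 80 with some noise
rng = np.random.default_rng(0)
months = pd.date_range("2015-01-01", periods=66, freq="MS")
toy_monthly = pd.Series(80 + rng.integers(-10, 11, size=months.size), index=months)

# Fit count = slope * t + intercept, with t measured in months since the start
t = np.arange(toy_monthly.size)
slope, intercept = np.polyfit(t, toy_monthly.to_numpy(), deg=1)
print(f"estimated change per month: {slope:.3f}")
```

A slope close to zero relative to the month-to-month noise would be consistent with the "no trend" reading; on the real data, `count_monthly['count']` would take the place of `toy_monthly`.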

## Conclusion
Based on the data provided, there is evidence that black people, unarmed or armed, in America are killed by police at disproportionately high rates relative to other races. With that being said, it is important to note that the data here might not be fully capturing what is going on with this issue due to incomplete or low quality reporting. Data collection improvements would give a clearer picture. I have tried to maintain as objective and neutral a lens as possible when evaluating my takeaways from this analysis. Some conclusions I have drawn from this data:  
* Black people are being killed by police at disproportionately high rates.    
* There does not seem to be a trend up or down over time for fatal police shootings.  
* Men make up the vast majority of fatal police shooting victims.  
* The number of fatal shootings in a city seems to largely depend on its population.  
* Body cameras rarely capture footage in these encounters.  
* Most victims were not fleeing from police when they were fatally shot.  


Some of my thoughts related to my conclusions from the data:
* The absolute number of unarmed people killed in general is extremely small relative to the hundreds of millions of interactions between police and civilians on a yearly basis.  
* There are no objective indicators or reporting of crime or threat posed to officer in this data. One suggestion is to make body camera footage mandatory, then have a neutral party review fatal shooting footage and make a determination on if the victim was truly endangering the officer's life.  
* On that point, officer body camera footage is not as frequent as I thought it was prior to this analysis. Ideally, a body camera should be recording every interaction to ensure accountability of both the officer and the other party or parties involved.  
* Socioeconomic class may be a confounding factor. It could be informative to take a look at fatal police shootings broken down by both race and socioeconomic status. How do white people of low socioeconomic status fare against black people of high socioeconomic status?  
* While this data does indicate that black people are killed at higher than expected rates, it does not provide enough additional evidence to confirm that police are consistently disproportionately targeting black people overall. To confirm that, additional data on traffic stops by race, warranted arrests vs unwarranted arrests, non-fatal police brutality, fatal police brutality not involving a firearm, a case by case report on the circumstances and potential crime of the person in question, among others, would be required to make such a broad claim.  
* Regardless of your views around this issue, I think we can all agree that police should strive to minimize the use of lethal force in any way possible.