# Analyzing Fatal Police Shootings in the US from 2015 to 2020


## Table of Contents

- [Introduction](#intro)
- [Data Wrangling](#data)

<a id='intro'></a>
### Introduction
The year 2020 will live long in the memory of man. It was filled with major happenings all over the world, from the Australian bushfires and Prince Harry and Meghan Markle quitting the British royal family to the death toll from COVID-19. The United States of America (USA) was the worst hit by COVID-19, according to [numbers from Johns Hopkins University](https://coronavirus.jhu.edu/map.html).

Although [social distancing](https://www.who.int/emergencies/diseases/novel-coronavirus-2019/advice-for-public) is one of the many recommended ways to reduce and stop the spread of COVID-19, people all over the world defied the advice to protest over one issue or another. One of those issues was the [killing of George Floyd, an African American, by police officers](https://www.nytimes.com/2020/05/31/us/george-floyd-investigation.html) in the United States. The protests over the killing reverberated across the world, coming only months after an African American woman named [Breonna Taylor was fatally shot by police in her apartment](https://www.nytimes.com/article/breonna-taylor-police.html).

Her killing, which might have been preventable, is what inspired this analysis.

In [1]:
# Importing packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline


<a id='data'></a>
### Data Wrangling
The dataset is downloaded from [The Washington Post's page on GitHub](https://github.com/washingtonpost/data-police-shootings). It logs fatal police **shootings** that happened in the United States from 2015 to the present. Since the database covers only shootings, cases like that of George Floyd, who was not shot, are not in it.

In [2]:
# Downloading and reading the dataset into a dataframe
url = 'https://github.com/washingtonpost/data-police-shootings/releases/download/v0.1/fatal-police-shootings-data.csv'
# Use the first column (id) as the index and parse the date column as datetimes
df = pd.read_csv(url, index_col=0, parse_dates=['date'])
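
As a small offline illustration of what the `index_col` and `parse_dates` arguments do, the sketch below reads a three-row sample through `io.StringIO` instead of the real URL. The names `sample_csv` and `sample` are made up for the example, and the sample keeps only a subset of the dataset's columns. It assumes the first column of the file is `id` and that a `date` column exists, as the previews in this notebook suggest.

```python
import io
import pandas as pd

# Hypothetical three-row sample with a subset of the dataset's columns.
sample_csv = io.StringIO(
    "id,name,date,armed,age\n"
    "3,Tim Elliot,2015-01-02,gun,53\n"
    "4,Lewis Lee Lembke,2015-01-02,gun,47\n"
    "5,John Paul Quintero,2015-01-03,unarmed,23\n"
)

# index_col=0 uses the first column (id) as the row index;
# parse_dates=["date"] converts the date column to datetime64.
sample = pd.read_csv(sample_csv, index_col=0, parse_dates=["date"])

print(sample.dtypes)
print(sample.index.tolist())
```

Passing the column name to `parse_dates` targets the date column explicitly, which avoids depending on column positions.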

In [3]:
# View top 5 rows in dataframe.
df.head()

| id | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera | longitude | latitude | is_geocoding_exact |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | M | A | Shelton | WA | True | attack | Not fleeing | False | -123.122 | 47.247 | True |
| 4 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | M | W | Aloha | OR | False | attack | Not fleeing | False | -122.892 | 45.487 | True |
| 5 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | M | H | Wichita | KS | False | other | Not fleeing | False | -97.281 | 37.695 | True |
| 8 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | M | W | San Francisco | CA | True | attack | Not fleeing | False | -122.422 | 37.763 | True |
| 9 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39.0 | M | H | Evans | CO | False | attack | Not fleeing | False | -104.692 | 40.384 | True |


Looking at the top 5 rows, we note that the id is not sequential. Let's check the tail of the dataset to see whether the same is true there.

In [4]:
# View last 5 rows
df.tail()

| id | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera | longitude | latitude | is_geocoding_exact |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6538 | David Tovar | 2021-01-21 | shot | undetermined | 27.0 | M | H | San Jose | CA | False | undetermined | NaN | False | -121.83 | 37.379 | True |
| 6543 | Brian Abbott | 2021-01-21 | shot | knife | 34.0 | M | NaN | Caneyville | KY | False | other | Not fleeing | False | -86.486 | 37.429 | True |
| 6544 | Steven Verdone | 2021-01-22 | shot | knife | NaN | M | NaN | Homosassa | FL | False | other | Not fleeing | False | -82.537 | 28.764 | True |
| 6540 | Caleb McCree | 2021-01-24 | shot | undetermined | 43.0 | M | NaN | Slidell | LA | False | attack | NaN | False | -89.825 | 30.318 | True |
| 6541 | NaN | 2021-01-24 | shot | vehicle | NaN | M | NaN | Wichita Falls | TX | False | attack | Car | False | -98.431 | 33.883 | True |


It's the same: the ids at the tail are not sequential either. One assumption is that some records were removed because the type of killing had been recorded in error.
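
The eyeball check on the ids can also be done programmatically. The sketch below is only an illustration: `ids` is a hand-typed series holding the ten ids visible in the head and tail previews, not the full index, and the variable names are made up for the example.

```python
import pandas as pd

# Hypothetical sample: the ids visible in the head and tail previews.
ids = pd.Series([3, 4, 5, 8, 9, 6538, 6543, 6544, 6540, 6541])

# Sort first, then measure the step between consecutive ids.
steps = ids.sort_values().diff().dropna()

# A step larger than 1 means at least one id is missing in between.
gaps = steps[steps > 1]
is_sequential = gaps.empty

print(f"sequential: {is_sequential}, gaps found: {len(gaps)}")
```

On the notebook's own dataframe, `df.index.to_series()` could stand in for `ids`, since the id column was read in as the index.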

Before going further, let's quickly fix that by saving the dataframe to a CSV file (without the index) and re-reading the saved file into a dataframe, which gives the rows a fresh sequential index. That's killing the proverbial two birds with one stone. The birds are:
- Saving the file
- Fixing the index

How about that 😎.

In [5]:
# Save df to csv
df.to_csv('fatal_police_shootings.csv',index=False)

In [6]:
# Re-read the saved file; pandas assigns a default index starting at 0
df_real = pd.read_csv('fatal_police_shootings.csv')
df_real.head()

|  | name | date | manner_of_death | armed | age | gender | race | city | state | signs_of_mental_illness | threat_level | flee | body_camera | longitude | latitude | is_geocoding_exact |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Tim Elliot | 2015-01-02 | shot | gun | 53.0 | M | A | Shelton | WA | True | attack | Not fleeing | False | -123.122 | 47.247 | True |
| 1 | Lewis Lee Lembke | 2015-01-02 | shot | gun | 47.0 | M | W | Aloha | OR | False | attack | Not fleeing | False | -122.892 | 45.487 | True |
| 2 | John Paul Quintero | 2015-01-03 | shot and Tasered | unarmed | 23.0 | M | H | Wichita | KS | False | other | Not fleeing | False | -97.281 | 37.695 | True |
| 3 | Matthew Hoffman | 2015-01-04 | shot | toy weapon | 32.0 | M | W | San Francisco | CA | True | attack | Not fleeing | False | -122.422 | 37.763 | True |
| 4 | Michael Rodriguez | 2015-01-04 | shot | nail gun | 39.0 | M | H | Evans | CO | False | attack | Not fleeing | False | -104.692 | 40.384 | True |
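
An in-memory alternative to the save-and-reload round trip is `DataFrame.reset_index`. The sketch below uses a small made-up frame (`demo`) rather than the real data. With `drop=True` the old ids are discarded, which mirrors saving with `index=False`, while `drop=False` keeps them as a regular column.

```python
import pandas as pd

# Hypothetical frame indexed by non-sequential ids, like the notebook's df.
demo = pd.DataFrame(
    {"name": ["Tim Elliot", "Lewis Lee Lembke", "Matthew Hoffman"]},
    index=pd.Index([3, 4, 8], name="id"),
)

# drop=True throws the old id index away and numbers rows from 0.
renumbered = demo.reset_index(drop=True)

# drop=False keeps the old ids as an ordinary "id" column instead.
with_ids = demo.reset_index(drop=False)

print(renumbered.index.tolist())
print(with_ids.columns.tolist())
```

Keeping the ids as a column may be worth considering if rows ever need to be matched back to the original Washington Post records.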


Now that we've done that, we can continue.