# <a id='Top'>Does race play a part in police shootings?</a>
<img src='https://assets.bwbx.io/images/users/iqjWHBFdfxIU/izcmgHQWKgHs/v0/1200x803.jpg' height=500 width=500>
<br>
<div class="list-group" id="list-tab" role="tablist">
  <h3 class="list-group-item list-group-item-action active" data-toggle="list"  role="tab" aria-controls="home">Table of Contents</h3>
<a class="list-group-item list-group-item-action" data-toggle="list" href="#one" role="tab" aria-controls="profile">Importing Libraries <span class="badge badge-primary badge-pill">1</span></a>
<a class="list-group-item list-group-item-action" data-toggle="list" href="#two" role="tab" aria-controls="messages">Loading data & basic cleaning <span class="badge badge-primary badge-pill">2</span></a>
<a class="list-group-item list-group-item-action"  data-toggle="list" href="#three" role="tab" aria-controls="settings">Police shootings per million of each race <span class="badge badge-primary badge-pill">3</span></a>
<a class="list-group-item list-group-item-action" data-toggle="list" href="#four" role="tab" aria-controls="settings">Further EDA <span class="badge badge-primary badge-pill">4</span></a>
</div>


# <a id='one'>1. Importing Libraries</a>

In [None]:
import numpy as np 
import pandas as pd 
import os
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import missingno as msno
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# <a id='two'>2. Loading data & basic cleaning</a>
 

1. Convert the date into month and year columns.
2. Remove rows with missing data.
3. Make a separate dataset that excludes 2020 data (an incomplete year's statistics could skew certain insights).

In [None]:
df = pd.read_csv('/kaggle/input/data-police-shootings/fatal-police-shootings-data.csv')
# parse the date column once, then derive the month and year columns from it
dates = pd.to_datetime(df['date'])
df['month'] = dates.dt.month
df['year'] = dates.dt.year
df.head()

In [None]:
# count null values per column; there are few, so the affected rows will be dropped
df.isnull().sum()

In [None]:
df = df.dropna().reset_index(drop=True)
df.shape

In [None]:
# extra dataframe that excludes 2020 data
no2020 = df[df['year'] != 2020]
no2020.shape
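
Before dropping rows, it can help to check how much data `dropna()` actually discards. The following is a minimal sketch on a toy frame; the column values are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy stand-in for the shootings data; values are illustrative only.
toy = pd.DataFrame({
    'date': ['2019-01-05', '2019-03-17', '2020-02-01', '2018-07-30'],
    'race': ['W', None, 'B', 'H'],
    'age': [34, 27, None, 45],
})

# Share of missing values per column, in percent.
missing_pct = toy.isnull().mean() * 100

# Rows that survive dropna(), and the share of rows lost.
cleaned = toy.dropna().reset_index(drop=True)
rows_lost_pct = 100 * (1 - len(cleaned) / len(toy))

print(missing_pct)
print(f'{rows_lost_pct:.1f}% of rows dropped')
```

If the share of dropped rows were large, or concentrated in one column such as `race`, imputing or keeping those rows might be preferable to dropping them.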

# <a id='three'>3. Police shootings per million of each race</a>

## Pie chart of the US population by race, based on 2019 census estimates

In [None]:
# via https://en.wikipedia.org/wiki/Race_and_ethnicity_in_the_United_States
# USA population (2019) = 328.2 million
dict_ = {'White': 60.4, 'Hispanic': 18.3, 'Black': 13.4, 'Asian': 5.9, 'Other': 2.0}
census = pd.DataFrame(dict_.items(), columns=['Race', 'Population %'])
# 1% of 328.2 million is 3.282 million, so this converts percentages into millions of people
census['Population (millions)'] = census['Population %'] * 3.282

fig = px.pie(census, values='Population %', names='Race', color_discrete_sequence=px.colors.sequential.RdBu)
fig.show()

In [None]:
df['race'].unique()

In [None]:
# fold the 'Native' (N) classification into 'Other' (O) to focus the analysis on larger groups
df['race'] = df['race'].replace('N', 'O')
print(df['race'].value_counts())

## Police shootings per million people in each racial group
### The per-capita rates support the notion of a racial bias in police shootings.

In [None]:
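# hard-coded counts, listed in the same order as census['Race'] (White, Hispanic, Black, Asian, Other)
census['count'] = [2253, 786, 1164, 83, 113]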
census['Shootings per mil'] = census['count']/census['Population (millions)']
fig = px.bar(census,x='Race',y='Shootings per mil',color='Race')
fig.show()
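
The hard-coded counts above depend on the list order matching the rows of `census`. A possible alternative, sketched here on toy data, looks each count up by race code instead. The `label_to_code` mapping is an assumption about how the census labels correspond to the dataset's single-letter codes, and the toy counts are illustrative only.

```python
import pandas as pd

# Toy race codes standing in for df['race']; counts are illustrative only.
race = pd.Series(['W', 'W', 'W', 'B', 'B', 'H', 'A', 'O'])

# Assumed mapping from census labels to the dataset's single-letter codes.
label_to_code = {'White': 'W', 'Hispanic': 'H', 'Black': 'B', 'Asian': 'A', 'Other': 'O'}

census = pd.DataFrame({
    'Race': ['White', 'Hispanic', 'Black', 'Asian', 'Other'],
    'Population %': [60.4, 18.3, 13.4, 5.9, 2.0],
})
# 328.2 million total, so 1% of the population is 3.282 million people.
census['Population (millions)'] = census['Population %'] * 3.282

# Look up each group's count by code instead of relying on list order.
counts = race.value_counts()
census['count'] = census['Race'].map(label_to_code).map(counts).fillna(0).astype(int)

# Rate = number of shootings divided by group population in millions.
census['Shootings per mil'] = census['count'] / census['Population (millions)']
print(census)
```

With this approach the counts stay in sync with the dataframe if the cleaning steps change.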

# <a id='four'>4. Further EDA</a>

## Does the time of year affect the number of shootings?
### Perhaps in the darker winter months, increased confusion leads to more shootings.

In [None]:
# shootings per calendar month, excluding the incomplete 2020 data
fig = px.histogram(no2020,x='month',color='month')
fig.show()

There are slight peaks at the beginning and middle of the year, but the numbers are fairly consistent across months.
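
Because months differ in length, raw monthly counts slightly favour 31-day months. One way to check whether the peaks survive that adjustment is to compare shootings per day, sketched below on toy dates. The dates are invented for illustration, and the sketch assumes a single non-leap year; the real data spans several years, so the day counts would need to be summed across them.

```python
import calendar
import pandas as pd

# Toy dates standing in for the 'date' column; illustrative only.
dates = pd.to_datetime(pd.Series([
    '2019-01-03', '2019-01-20', '2019-02-11', '2019-02-25',
    '2019-07-04', '2019-07-15', '2019-07-30',
]))

# Number of events in each calendar month, ordered by month.
monthly = dates.dt.month.value_counts().sort_index()

# Days in each of those months for 2019 (a single non-leap year, for simplicity).
days = pd.Series({m: calendar.monthrange(2019, m)[1] for m in monthly.index})

# Events per day, aligned on the month number.
per_day = monthly / days
print(per_day)
```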

## Are there states with higher police shooting numbers?
### This analysis could be normalised by the population of each state to make it more robust.

In [None]:
# ten states with the most shootings; explicit column names keep this working across pandas versions
shootout_by_states = df['state'].value_counts().head(10).rename_axis('state').reset_index(name='shootings')
fig = px.pie(shootout_by_states, values='shootings', names='state', color_discrete_sequence=px.colors.sequential.RdBu)
fig.show()
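
The normalisation by state population mentioned above could look roughly like the sketch below. The state codes and population figures are hypothetical placeholders, not real values; real census figures would need to be supplied in their place.

```python
import pandas as pd

# Toy state codes standing in for df['state']; illustrative only.
state = pd.Series(['AA', 'AA', 'AA', 'AA', 'BB', 'BB', 'CC'])

# Hypothetical populations in millions; replace with real census figures.
population_millions = pd.Series({'AA': 8.0, 'BB': 1.0, 'CC': 2.0})

# Raw counts per state, then counts per million residents (aligned on state code).
counts = state.value_counts()
per_million = (counts / population_millions).sort_values(ascending=False)
print(per_million)
```

In this toy example the state with the most shootings (`AA`) no longer ranks first once population is taken into account, which is the kind of shift the normalisation is meant to reveal.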

## How has the number of shootings varied over the years?

In [None]:
# yearly totals in chronological order; explicit column names keep this working across pandas versions
yearly_shootouts = no2020['year'].value_counts().sort_index().rename_axis('year').reset_index(name='Shootouts')
fig = px.bar(yearly_shootouts, y='Shootouts', x='year', barmode='group')
fig.show()

# <a href='#Top'><button>Go to Top</button></a> 