# Dataset Description

The dataset contains basic information about each person, such as name, age, gender and race. Alongside it is information about the shooting itself: the date of the event, where it happened, how the person was shot, whether they attacked, whether they were holding a weapon, whether they showed signs of mental illness, whether the officer was wearing a body camera (that is, whether the incident was recorded), and whether the suspect fled. In addition, a category column holds the type of weapon used by the suspect. In this notebook I do an in-depth analysis of what influenced these events and how the different factors relate to each other.

### If you like the findings, please do <font color="red">UPVOTE</font>

<a id="top"></a>

<div class="list-group" id="list-tab" role="tablist">
<h3 class="list-group-item list-group-item-action active" data-toggle="list"  role="tab" aria-controls="home">Table of Contents</h3>
    
&#9632; [1. First Look at Data](#1)<br>
&#9632; [2. Gender Ratio](#2)<br>
&#9632; [3. Gender Ratio vs Years](#3)<br>
&#9632; [4. Racial Distributions](#4)<br>
&#9632; [5. Shootings Per Month (2015-2020)](#5)<br>
&#9632; [6. Age Distribution (all)](#6)<br>
&#9632; [7. Age Distribution (Race)](#7)<br>
&#9632; [8. Age Distribution (Gender)](#8)<br>
&#9632; [9. Presence of Weapons ](#9)<br>
&#9632; [10. Statewise Number of Shootings](#10)<br>
&#9632; [11. Mental Conditions](#11)<br>
&#9632; [12. Mental Conditions (by Race)](#12)<br>
&#9632; [13. Year -> Gender -> Race](#13)<br>
&#9632; [14. Threat Level Analysis](#14)<br>
&#9632; [15. Threat Level Analysis by Race](#15)<br>
&#9632; [16. Top 10 States in Shootings](#16)<br>
&#9632; [17. Last 10 States in Shootings](#17)<br>
&#9632; [18. Top 10 Cities in Shootings](#18)<br>
&#9632; [19. Police Shootings of Black People in States](#19)<br>
&#9632; [20. Police Shootings of White People in States](#20)<br>
&#9632; [21. Acknowledgements](#21)<br>   
    
    


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 
sns.set(rc={'figure.figsize':(11.7,8.27)})
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.colors import n_colors
from plotly.subplots import make_subplots

#Calendar Heatmap
!pip install calmap
import calmap


data=pd.read_csv('../input/us-police-shootings/shootings.csv')

<a id="1"></a>
<font color="black" size=+2.5><b>1. First Look at Data </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a>

In [None]:
data.info()

In [None]:
data.head(10)

<a id="2"></a>
<font color="black" size=+2.5><b>2. Gender Ratio </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
In any study, it is important to look at the prevailing gender ratio. With that in mind, let's look at the gender ratio of the shootings.

In [None]:
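# rename_axis + reset_index(name=...) gives 'gender' and 'count' columns on both older and newer pandas versions
df=data['gender'].value_counts().rename_axis('gender').reset_index(name='count')
fig = go.Figure([go.Pie(labels=df['gender'].map({'M':'Male','F':'Female'}),values=df['count'], hole = 0.5)])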
fig.update_traces(hoverinfo='label+percent', textinfo='value+percent', textfont_size=15,insidetextorientation='radial')
fig.update_layout(title="Male to Female Ratio in Shootings",title_x=0.5)
fig.show()
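
To read the ratio as numbers rather than off the chart, a minimal sketch like the following could be used. It runs on a small made-up frame (`toy` is a hypothetical stand-in for `data`, not real records) and assumes the `gender` column holds `'M'`/`'F'` codes as in the cells above.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's `data` frame.
toy = pd.DataFrame({"gender": ["M", "M", "M", "F", "M", "M", "F", "M", "M", "M"]})

# normalize=True returns fractions instead of raw counts;
# multiply by 100 and round to get percentages.
gender_share = toy["gender"].value_counts(normalize=True).mul(100).round(1)
print(gender_share)
```

Replacing `toy` with `data` should give the same percentages that the pie chart displays.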

<a id="3"></a>
<font color="black" size=+2.5><b>3. Gender Ratio vs Years </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
Let's look at how the male-to-female ratio has changed over the years from 2015 to 2020.

In [None]:
# data generation code
data['date']=pd.to_datetime(data['date'])
data['year']=pd.to_datetime(data['date']).dt.year

shoot_gender=data.groupby(['year','gender']).agg('count')['id'].to_frame(name='count').reset_index()
shoot_gender_male=shoot_gender.loc[shoot_gender['gender']=='M']
shoot_gender_female=shoot_gender.loc[shoot_gender['gender']=='F']

# plotting part
male=go.Bar(x=shoot_gender_male['year'],y=shoot_gender_male['count'],marker=dict(color='brown'),name="male")
female=go.Bar(x=shoot_gender_female['year'],y=shoot_gender_female['count'],marker=dict(color='orange'),name="female")
data_genderwise =[male,female]

fig = go.Figure(data_genderwise)
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_layout(title="Gender Ratio vs Years",title_x=0.5,xaxis=dict(title="Year"),yaxis=dict(title="Number of Shootings"), barmode="group")
fig.show()

<a id="4"></a>
<font color="black" size=+2.5><b>4. Racial Distribution </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
As several events in 2020 showed, police shootings in the US are sometimes heavily influenced by racial factors. Let's try to find out how these racial factors show up in the dataset.

In [None]:
df=data['race'].value_counts().rename_axis('race').reset_index(name='count')

fig = go.Figure(go.Bar(x=df['race'],y=df['count'],
                       marker={'color': df['count'], 'colorscale': 'Viridis'},  
))
fig.update_layout(title_text='Distribution of Races in Shootings',xaxis_title="Race",yaxis_title="Number of Shootings")
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.show()

We visualize the same information in a pie chart to get a visual sense of the proportion of different races in the shootings.

In [None]:
fig = go.Figure([go.Pie(labels=df['race'],values=df['count'], hole = 0.4)])
fig.update_traces(hoverinfo='label+percent', textinfo='value+percent', textfont_size=15,insidetextorientation='radial')
fig.update_layout(title="Racial Proportions in Shootings",title_x=0.5)
fig.show()

<a id="5"></a>
<font color="black" size=+2.5><b>5. Shootings Per Month </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
Here we visualize the number of deaths in consecutive months. The brighter the color, the higher the number of police shootings in that month.


In [None]:
df=data.groupby('date')['manner_of_death'].count().reset_index()
df['date']=pd.to_datetime(df['date'])
df['year-month'] = df['date'].apply(lambda x: str(x.year) + '-' + str(x.month))
df_ym=df.groupby('year-month')[['manner_of_death']].sum().reset_index()
df_ym['year-month']=pd.to_datetime(df_ym['year-month'])
df_ym=df_ym.sort_values('year-month')



fig = go.Figure(go.Bar(
    x=df_ym['year-month'],y=df_ym['manner_of_death'],
    marker={'color': df_ym['manner_of_death'], 'colorscale': 'Viridis'},  
    text=df_ym['manner_of_death'],
    textposition = "outside",
))
fig.update_layout(title_text='No of deaths (2015-2020)',yaxis_title="no. of deaths", xaxis_title = 'Time')
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.show()
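
The monthly aggregation above builds a `year-month` string and parses it back into a date. As an alternative sketch (not the method used in this notebook), pandas periods can do the same grouping in one step. The frame `toy` below is hypothetical sample data; only the `date` column name is taken from the dataset.

```python
import pandas as pd

# Hypothetical toy data: one row per incident, as in the notebook's `data`.
toy = pd.DataFrame({"date": ["2015-01-02", "2015-01-15", "2015-02-03",
                             "2015-02-20", "2015-02-28", "2016-01-05"]})
toy["date"] = pd.to_datetime(toy["date"])

# Group rows by calendar month; a PeriodIndex sorts chronologically by itself.
monthly = toy.groupby(toy["date"].dt.to_period("M")).size()

# Convert periods to month-start timestamps so they plot on a date axis.
monthly.index = monthly.index.to_timestamp()
print(monthly)
```

The resulting series can be passed to `go.Bar` as `x=monthly.index, y=monthly.values`.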

<a id="6"></a>
<font color="black" size=+2.5><b>6. Age Distribution (all) </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
The following graph shows the distribution of age across all police shooting cases, combining a histogram with a density curve. The per-race box plots in the next section show that there are several outliers in terms of age among the White, Black and Hispanic people killed in the shootings.

In [None]:
hist_data = [data['age'].values]
group_labels = ['distribution'] # name of the dataset

fig = ff.create_distplot(hist_data, group_labels)
fig.update_layout(title_text='Distribution of age of all',xaxis_title="Age",yaxis_title="Probability")
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.show()

<a id="7"></a>
<font color="black" size=+2.5><b>7. Age Distribution vs Race </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
The age distributions are very similar for most of the races.

In [None]:
import plotly.figure_factory as ff

# Add histogram data
age_white = data[data['race'] =='White'].age.values
age_black = data[data['race'] =='Black'].age.values
age_hispanic = data[data['race'] =='Hispanic'].age.values
age_asian = data[data['race'] =='Asian'].age.values
age_native = data[data['race'] =='Native'].age.values
age_other = data[data['race'] =='Other'].age.values

# Group data together
hist_data = [age_white, age_black, age_hispanic, age_asian, age_native, age_other]

group_labels = ['White', 'Black', 'Hispanic', 'Asian', 'Native', 'Other']

# Create distplot with custom bin_size
fig = ff.create_distplot(hist_data, group_labels, bin_size=.2)
fig.show()

Now let's look at the age distribution by race using box plots. Among the people shot, White people have the highest mean age and Black people have the lowest mean age.

In [None]:
fig = go.Figure()
fig.add_trace(go.Box(y=data[data['race'] =='White'].age.values , name='White', marker_color = 'gray',boxmean=True))
fig.add_trace(go.Box(y=data[data['race'] =='Black'].age.values , name='Black', marker_color = 'brown',boxmean=True))
fig.add_trace(go.Box(y=data[data['race'] =='Hispanic'].age.values , name='Hispanic', marker_color = 'green',boxmean=True))
fig.add_trace(go.Box(y=data[data['race'] =='Asian'].age.values , name='Asian', marker_color = 'red',boxmean=True))
fig.add_trace(go.Box(y=data[data['race'] =='Native'].age.values , name='Native', marker_color = 'orange',boxmean=True))
fig.add_trace(go.Box(y=data[data['race'] =='Other'].age.values , name='Other', marker_color = 'violet',boxmean=True))
fig.update_layout(title_text='Age Distribution Race Wise', xaxis_title= "Race", yaxis_title="Age")
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.show()
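
The means and medians marked on the box plots can also be tabulated directly. This is a minimal sketch on made-up numbers (`toy` is hypothetical and its values are not from the dataset); the `race` and `age` column names follow the cells above.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook's `data`.
toy = pd.DataFrame({
    "race": ["White", "White", "White", "Black", "Black", "Hispanic", "Hispanic"],
    "age": [40, 50, 60, 25, 35, 30, 34],
})

# Per-race count, mean and median age, sorted from highest to lowest mean.
age_stats = (
    toy.groupby("race")["age"]
    .agg(["count", "mean", "median"])
    .sort_values("mean", ascending=False)
)
print(age_stats)
```

Running the same aggregation on `data` gives a table to check the highest and lowest mean ages against.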

<a id="8"></a>
<font color="black" size=+2.5><b>8. Age Distribution (Gender) </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
In the following graph, I look at the age distribution by gender in the police shooting cases. The central parts of the two distributions are very similar. However, the box plot shows several outliers among the men, indicating that a number of older people were shot.

In [None]:
fig = go.Figure()
fig.add_trace(go.Box(y=data[data['gender'] =='M'].age.values , name='Male', marker_color = 'blue',boxmean=True))
fig.add_trace(go.Box(y=data[data['gender'] =='F'].age.values , name='Female', marker_color = 'red',boxmean=True))
fig.update_layout(title_text='Age Distribution Gender wise',
                  xaxis_title= "Gender",
                  yaxis_title="Age")
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.show()

<a id="9"></a>
<font color="black" size=+2.5><b>9. Presence of Weapon </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
Now let's look at whether the people shot had any weapons with them. It seems that more than half of them had a gun. Among the other weapons, almost 15% of people had knives. What's more shocking is that more than 7% of people were shot while unarmed.

In [None]:
df=data['armed'].value_counts().rename_axis('weapons used').reset_index(name='count')
fig = go.Figure([go.Pie(labels=df['weapons used'],values=df['count'], hole=0.5)])
fig.update_traces(hoverinfo='label+percent', textinfo='value+percent', textfont_size=15,insidetextorientation='radial')
fig.update_layout(title="Different Weapons used in Shootings",title_x=0.5)
fig.show()



<a id="10"></a>
<font color="black" size=+2.5><b>10. Statewise Number of Shootings </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
Now let's look at the statewise number of shootings. The top four states are California, Texas, Florida and Arizona. Surprisingly, the number of deaths in California is almost twice that of Florida.

In [None]:
df=data['state'].value_counts().rename_axis('state').reset_index(name='deaths')

fig = go.Figure(go.Bar(x=df['state'], y=df['deaths'],
                       marker={'color': df['deaths'], 'colorscale': 'Viridis'},
                       text=df['deaths'],
                       textposition = "outside"))

fig.update_layout(title_text='Statewise Number of Deaths',xaxis_title="State",yaxis_title="Number of Shootings")
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.show()

<a id="11"></a>
<font color="black" size=+2.5><b>11. Mental Conditions </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
Let's look at signs of mental illness among the people shot. It seems that more than three quarters of the people shot by police showed no signs of mental illness.

In [None]:
df=data['signs_of_mental_illness'].value_counts().rename_axis('signs_of_mental_illness').reset_index(name='count')
fig = go.Figure([go.Pie(labels=df['signs_of_mental_illness'],values=df['count'], hole = 0.5)])

fig.update_traces(hoverinfo='label+percent', textinfo='value+percent', textfont_size=15,insidetextorientation='radial')

fig.update_layout(title="Signs_of_mental_illness",title_x=0.5)
fig.show()

<a id="12"></a>
<font color="black" size=+2.5><b>12. Mental Conditions (by Race) </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
Now it's time to look at mental conditions across the different races of people shot by police. Here is something surprising: more than 40% of the White people shot showed signs of mental illness.

In [None]:
#processing part
race_illness = data.groupby(by=["race", "signs_of_mental_illness"]).count()["id"].unstack()

# plotting part
no_illness = go.Bar(x=race_illness.index.values, y=race_illness.loc[:, 0].values ,marker=dict(color='green'),name="No Illness")
illness =go.Bar(x=race_illness.index.values, y=race_illness.loc[:, 1].values  ,marker=dict(color='orange'),name="Illness Present")
data_racewise =[no_illness, illness]

fig = go.Figure(data_racewise)
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_layout(title="Racewise Mental Conditions",title_x=0.5,xaxis=dict(title="Race"),
                  yaxis=dict(title="Number of Shootings"), barmode="group")
fig.show()
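
The grouped bars show raw counts, which makes a percentage within each race hard to read off. One possible way to get per-race rates is a row-normalized cross-tabulation, sketched below on made-up data (`toy` and the label strings are hypothetical; I assume `signs_of_mental_illness` is a boolean column).

```python
import pandas as pd

# Hypothetical toy data; values are illustrative only.
toy = pd.DataFrame({
    "race": ["White"] * 4 + ["Black"] * 5,
    "signs_of_mental_illness": [True, True, False, False,
                                True, False, False, False, False],
})

# Map the boolean flag to readable labels before tabulating.
illness = toy["signs_of_mental_illness"].map(
    {True: "Illness Present", False: "No Illness"}
)

# normalize="index" turns each row (race) into fractions that sum to 1.
rates = pd.crosstab(toy["race"], illness, normalize="index").mul(100).round(1)
print(rates)
```

Each row then sums to 100, so the share with signs of mental illness can be compared across races regardless of group size.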

<a id="13"></a>
<font color="black" size=+2.5><b>13. Year, Gender, Race </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>

Now let us look at gender, year and race together in one figure.

In [None]:
df=data.groupby(['date','gender','race'])['manner_of_death'].count().reset_index()
df['date']=pd.to_datetime(df['date'])
df['year-month'] = df['date'].apply(lambda x: str(x.year))
df_ym=df.groupby(['year-month','gender','race'])[['manner_of_death']].sum().reset_index()
df_ym['year-month']=pd.to_datetime(df_ym['year-month'])
df_ym=df_ym.sort_values('year-month')
df_ym['year-month']=df_ym['year-month'].astype('str').apply(lambda x: x.split('-')[0])

fig = px.sunburst(df_ym, path=['year-month','gender','race'], values='manner_of_death')
fig.update_layout(title="Number of deaths  by Gender,year,race",title_x=0.5)
fig.show()

<a id="14"></a>
<font color="black" size=+2.5><b>14. Threat Level Analysis </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
Now let's look at the threat levels. The analysis shows that in almost 65% of the cases, the police were attacked, which may partly explain the escalation of events.

In [None]:
df=data['threat_level'].value_counts().rename_axis('threat_level').reset_index(name='count')
fig = go.Figure([go.Pie(labels=df['threat_level'],values=df['count'], hole = 0.5)])

fig.update_traces(hoverinfo='label+percent', textinfo='value+percent', textfont_size=15,insidetextorientation='radial')
fig.update_layout(title="threat_level",title_x=0.5)
fig.show()

<a id="15"></a>
<font color="black" size=+2.5><b>15. Threat analysis by Race </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
Looking at the threat levels from a racial perspective, most of the attacks came from White people, while the second-highest number of attacks came from Black people.

In [None]:
data.groupby(by=["race", "threat_level"]).count()["id"].unstack().plot.bar(stacked=False, title = "Racewise Threat Level", xlabel="Race",  ylabel="Threat Counts")

<a id="16"></a>
<font color="black" size=+2.5><b>16. Top 10 States in Shooting </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
The top 10 states by number of shootings are listed in the following visualization.

In [None]:
df=data['state'].value_counts().rename_axis('state').reset_index(name='deaths').head(10)
fig = go.Figure(go.Bar(x=df['state'], y=df['deaths'],
                       marker={'color': df['deaths'], 'colorscale': 'Viridis'},
                       text=df['deaths'],
                       textposition = "outside"))

fig.update_layout(title_text='Top 10 States involved  in Police Shooting in the US',xaxis_title="States",yaxis_title="Number of Shootings")
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.show()

<a id="17"></a>
<font color="black" size=+2.5><b>17. Last 10 States in Shooting </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>

In [None]:
# value_counts() sorts in descending order, so the last 10 rows are the states with the fewest shootings
df=data['state'].value_counts().rename_axis('state').reset_index(name='deaths').tail(10)
fig = go.Figure(go.Bar(x=df['state'], y=df['deaths'],
                       marker={'color': df['deaths'], 'colorscale': 'Viridis'},
                       text=df['deaths'],
                       textposition = "outside"))

fig.update_layout(title_text='Last 10 States involved  in Police Shooting in the US',xaxis_title="States",yaxis_title="Number of Shootings")
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.show()
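
When selecting the "last N" entries from a sorted count, it is easy to pick the wrong end by accident. A quick sanity check, sketched here on made-up data (`toy` is hypothetical), is to compare against `nsmallest` and `nlargest`, which state the intent explicitly.

```python
import pandas as pd

# Hypothetical toy data: state codes repeated to give distinct counts.
toy = pd.DataFrame({"state": ["CA"] * 5 + ["TX"] * 4 + ["FL"] * 3 + ["AZ"] * 2 + ["RI"]})

state_counts = toy["state"].value_counts()

# nsmallest/nlargest pick the ends by value, independent of the current row order.
lowest = state_counts.nsmallest(3)
highest = state_counts.nlargest(3)
print(lowest)
print(highest)
```

If a "last 10" chart shows the same states as `nlargest(10)`, the selection is taking rows from the wrong end.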

<a id="18"></a>
<font color="black" size=+2.5><b>18. Top 10 Cities in Shooting </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
The top 10 cities by number of shootings are listed in the following visualization.

In [None]:
df=data['city'].value_counts().rename_axis('city').reset_index(name='deaths').head(10)
fig = go.Figure(go.Bar(x=df['city'], y=df['deaths'],
                       marker={'color': df['deaths'], 'colorscale': 'Viridis'},
                       text=df['deaths'],textposition = "outside"))

fig.update_layout(title_text='Top 10 Cities involved  in Police Shooting in the US',xaxis_title="Cities",yaxis_title="Number of Shootings")
fig.update_xaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.update_yaxes(showline=True, linewidth=2, linecolor='black', mirror=True)
fig.show()

<a id="19"></a>
<font color="black" size=+2.5><b>19. Police Shootings of Black People in States </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>

In [None]:
black_state=data[data['race']=='Black']['state'].value_counts().rename_axis('state').reset_index(name='count')

fig = go.Figure(go.Choropleth(
    locations=black_state['state'],
    z=black_state['count'].astype(float),
    locationmode='USA-states',
    colorscale='Reds',
    autocolorscale=False,
    text=black_state['state'], # hover text
    marker_line_color='white', # line markers between states
    colorbar_title="Number of Shootings",showscale = False,
))
fig.update_layout(title_text='US Police shooting cases of black people',    title_x=0.5,
    geo = dict( scope='usa', projection=go.layout.geo.Projection(type = 'albers usa'), showlakes=True, 
               lakecolor='rgb(255, 255, 255)'))
fig.update_layout(template="simple_white")
fig.show()

<a id="20"></a>
<font color="black" size=+2.5><b>20. Police Shootings of White People in States </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>

In [None]:
white_state=data[data['race']=='White']['state'].value_counts().rename_axis('state').reset_index(name='count')

fig = go.Figure(go.Choropleth(
    locations=white_state['state'],
    z=white_state['count'].astype(float),
    locationmode='USA-states',
    colorscale='Greens',
    autocolorscale=False,
    text=white_state['state'], # hover text
    marker_line_color='white', # line markers between states
    colorbar_title="Number of Shootings",showscale = False,
))
fig.update_layout(title_text='US Police shooting cases of White people',    title_x=0.5,
    geo = dict( scope='usa', projection=go.layout.geo.Projection(type = 'albers usa'), showlakes=True, 
               lakecolor='rgb(255, 255, 255)'))
fig.update_layout(template="simple_white")
fig.show()

<a id="21"></a>
<font color="black" size=+2.5><b>21. Acknowledgements </b></font>
<a href="#top" class="btn btn-primary btn-sm" role="button" aria-pressed="true" style="color:white" data-toggle="popover" title="go to Colors">Go to TOC</a><br>
* Thanks to [Raenish David](https://www.kaggle.com/raenish) for his excellent tutorials on Plotly. 