# Guns and The Police
<img src="https://image.cnbcfm.com/api/v1/image/106559035-1590929831184gettyimages-1216502171.jpeg?v=1591036727&w=1600&h=900" width=400><br>
With public anger over police brutality running high, exploring this dataset is important for understanding the picture behind police shootouts. In this kernel I explore the data and draw observations to identify patterns in it. In particular, the study looks at which communities are most affected in terms of age, race, etc.


In [None]:
!pip install calmap

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
import calmap
plt.rcParams['figure.figsize'] = 8, 5
plt.style.use("fivethirtyeight")
pd.options.plotting.backend = "plotly"
from wordcloud import WordCloud, STOPWORDS, ImageColorGenerator

In [None]:
data = pd.read_csv('../input/data-police-shootings/fatal-police-shootings-data.csv')
data.head()

In [None]:
data.date = pd.to_datetime(data.date)

In [None]:
data.shape

In [None]:
data.info()

## Null Values Exploration

In [None]:
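def nulls(df):
    # Print the percentage of missing values in each column of df
    for col in df.columns:
        nll = df[col].isnull().sum()  # use the argument df, not the global data
        print(f"{col} \t\t {round(nll/len(df)*100,2)}% Null")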
nulls(data)
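
The same percentages can also be computed without an explicit loop. This is a minimal sketch on a small made-up frame (`toy` is hypothetical and only stands in for `data`):

```python
import pandas as pd

# Hypothetical toy frame standing in for `data`
toy = pd.DataFrame({
    "age": [25, None, 40, 31],
    "race": ["W", "B", None, None],
    "id": [1, 2, 3, 4],
})

# isnull() gives a boolean frame; the column mean is the fraction of nulls
null_pct = (toy.isnull().mean() * 100).round(2)
print(null_pct.sort_values(ascending=False))
```

Sorting the result puts the columns with the most missing values first, which is usually what one wants to read off.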

# Exploring Shootout Frequency

In [None]:
fig,ax = calmap.calendarplot(data.groupby(['date']).id.count(), monthticks=1, daylabels='MTWTFSS',cmap='YlGn',
                    linewidth=0, fig_kws=dict(figsize=(20,20)))
fig.show()

**Observations**:
- The first quarter of each year shows more dark cells, indicating a higher shootout count.


In [None]:
freq = data[['date','id']].copy()  # copy so the new columns do not trigger SettingWithCopyWarning
freq['year'] = freq.date.dt.year
freq['month'] = freq.date.dt.month
freq.head()

In [None]:
fig = go.Figure(data=[go.Bar(x=freq.groupby(['year']).agg('count')['id'].index, y=freq.groupby(['year']).agg('count')['id'].values,)])
fig.update_layout(title_text='Shootouts by year')
fig.show()

**Observations**:
- No clear trend across the years.
- 2020 covers only about half a year, so its count looks low.
- Each full year sees up to 1000 shootout cases.
- 2020 stands at 481 cases halfway through the year.
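
Since 2020 is only partly covered, raw yearly totals are not directly comparable. One possible way to put the years on the same footing is to divide each year's count by the number of calendar months it spans. The sketch below uses invented dates, not the real dataset, and assumes each year's data starts and ends in a month that contains at least one case:

```python
import pandas as pd

# Hypothetical dates: a full year (2019) and a partial year (2020)
toy = pd.DataFrame({"date": pd.to_datetime([
    "2019-01-05", "2019-03-10", "2019-06-21", "2019-09-02",
    "2019-11-15", "2019-12-30", "2020-01-07", "2020-02-14", "2020-02-20",
])})
toy["year"] = toy.date.dt.year
toy["month"] = toy.date.dt.month

# Cases per year plus the first and last month seen in that year
per_year = toy.groupby("year").agg(
    cases=("date", "size"),
    first_month=("month", "min"),
    last_month=("month", "max"),
)
per_year["months_covered"] = per_year.last_month - per_year.first_month + 1
per_year["cases_per_month"] = per_year.cases / per_year.months_covered
print(per_year)
```

Comparing `cases_per_month` rather than `cases` keeps a partial year from looking artificially low.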

In [None]:
fig = go.Figure(data=[go.Bar(x=freq.groupby('month').agg('count')['id'].index, y=freq.groupby('month').agg('count')['id'].values,)])
fig.update_layout(title_text='Shootouts by Month')
fig.show()

**Observations**:
- This confirms that there are more shootouts in the first quarter of the year
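
One caveat: because 2020 only covers the first half of the year, summing by month across all years counts the early months once more than the late months. Averaging the per-month counts over the years in which each month was actually observed is one way to remove that effect. A sketch on invented dates (not the real data):

```python
import pandas as pd

# Hypothetical dates: two years with January and July, a third with January only
toy = pd.DataFrame({"date": pd.to_datetime([
    "2018-01-03", "2018-01-20", "2018-07-04",
    "2019-01-11", "2019-07-09", "2019-07-30",
    "2020-01-15", "2020-01-16",
])})
toy["year"] = toy.date.dt.year
toy["month"] = toy.date.dt.month

# Cases per (year, month), then the mean over the years where that month occurs
monthly = toy.groupby(["year", "month"]).size()
avg_per_month = monthly.groupby(level="month").mean()
print(avg_per_month)
```

In this toy example the raw totals are 5 for January and 3 for July, while the per-year averages are much closer, which shows how much of a gap can come from coverage alone.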

# Exploring Manner of Death

In [None]:
fig = go.Figure(data=[go.Pie(labels=data.manner_of_death.value_counts().index, values=data.manner_of_death.value_counts().values,textinfo='label+percent')])
fig.update_layout(title='How were they killed?')
fig.show()

**Observations**:
- Most shootouts involve the victim being shot. A taser is rarely used.

# What age group is affected?
#### Exploring Age

In [None]:
data.plot.hist(x="age")

**Observations**:
- The most affected age group is 20-40.
- There are fewer incidents involving older people.
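
The age-group reading can be checked numerically by binning ages and looking at the share in each bin. The sketch below uses made-up ages, and the bin edges are an assumption chosen for illustration:

```python
import pandas as pd

# Hypothetical ages, including one missing value
ages = pd.Series([17, 22, 25, 31, 38, 39, 45, 52, 67, None], name="age")

# Left-closed bins: [0, 20), [20, 40), [40, 60), [60, 120)
bins = [0, 20, 40, 60, 120]
labels = ["<20", "20-39", "40-59", "60+"]
age_groups = pd.cut(ages, bins=bins, labels=labels, right=False)

# Share of known ages per bin, in bin order (missing ages are dropped)
share = age_groups.value_counts(normalize=True).sort_index()
print(share)
```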

# Exploring if Gender Makes a Difference

In [None]:
fig = go.Figure([go.Bar(x=data.gender.value_counts().index, y=data.gender.value_counts().values)])
fig.update_layout(title="Number of Shootouts by Gender")
fig.show()

In [None]:
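# Male-to-female case ratio (5176 / 238 at the time of writing); assumes the labels are 'M' and 'F'
data.gender.value_counts()['M'] / data.gender.value_counts()['F']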

**Observations**:
- Men are shot about 22 times as often as women (5176 vs. 238 cases)

# Were they armed? Did They Flee?

In [None]:
print(f"{len(data.loc[data.armed=='unarmed'])/len(data)*100:.2f}% Cases were Unarmed")

In [None]:
armed=list(data['armed'].dropna().unique())
fig, (ax2) = plt.subplots(1,1,figsize=[17, 10])
wordcloud2 = WordCloud(width=1000,height=400).generate(" ".join(armed))
ax2.imshow(wordcloud2,interpolation='bilinear')
ax2.axis('off')
ax2.set_title('Most Used Arms',fontsize=20)

**Observations**:
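- The words that recur most across the distinct weapon categories are knife, gun and metal. Note that the cloud is built from unique values of `armed`, so word size reflects how many categories mention a word, not how many cases involved that weapon.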
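
To rank weapons by how often they appear in cases, the values can be counted directly. A minimal sketch on a made-up series (the values are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical `armed` column with one missing value
armed = pd.Series(
    ["gun", "knife", "gun", "unarmed", "gun", "toy weapon", None, "knife"]
)

# Number of cases per weapon, most frequent first
weapon_counts = armed.dropna().value_counts()
top_weapons = weapon_counts.head(3)

# Plain dict of weapon -> case count
freqs = weapon_counts.to_dict()
print(top_weapons)
```

A dict like `freqs` could then be passed to `WordCloud.generate_from_frequencies` so that word size follows the case counts.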

In [None]:
fig = go.Figure(data=[go.Pie(labels=data.flee.value_counts().index, values=data.flee.value_counts().values,textinfo='label+percent')])
fig.update_layout(title='Did they Flee?')
fig.show()

**Observations**:
- 66% of the victims did not flee, yet were KILLED
- Most of the remaining victims fled by car or on foot

In [None]:
print(f'{len(data.loc[(data.flee=="Not fleeing") & (data.armed=="unarmed")])} Cases were Unarmed and Did not Flee. Yet were Killed.')

### 191 Victims were not armed, did not flee, yet were KILLED.
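
The combination of fleeing status and armed status can also be laid out as a table, which shows the count above next to the other combinations. A sketch on invented rows (the `status` column is a hypothetical helper, not part of the dataset):

```python
import pandas as pd

# Hypothetical rows standing in for `data`
toy = pd.DataFrame({
    "flee": ["Not fleeing", "Car", "Not fleeing", "Foot", "Not fleeing"],
    "armed": ["unarmed", "gun", "gun", "unarmed", "unarmed"],
})

# Collapse every weapon into "armed", keep "unarmed" as is
toy["status"] = toy.armed.where(toy.armed.eq("unarmed"), "armed")

# Rows: fleeing status, columns: armed / unarmed, with totals
table = pd.crosstab(toy.flee, toy.status, margins=True)
print(table)
```

The cell at row `Not fleeing`, column `unarmed` corresponds to the headline count.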

In [None]:
fig = go.Figure([go.Bar(x=data.threat_level.value_counts().index, y=data.threat_level.value_counts().values)])
fig.update_layout(title="Threat Level Assessment")
fig.show()

# Does Race Play a Role?

In [None]:
fig = go.Figure(data=[go.Pie(labels=data.race.value_counts().index, values=data.race.value_counts().values,textinfo='label+percent')])
fig.update_layout(title='Victims by Race')
fig.show()

* W: White, non-Hispanic
* B: Black, non-Hispanic
* A: Asian
* N: Native American
* H: Hispanic
* O: Other
* None: unknown

**Observations**:
- About 50% of the victims are White
- The next most affected group is the Black community (26.5%), followed by Hispanic (18%)
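
Note that `value_counts()` drops missing values by default, so the pie's percentages are shares among cases with a known race. The sketch below, on a made-up series, shows how the shares change when unknowns are kept:

```python
import pandas as pd

# Hypothetical `race` column with two unknown entries
race = pd.Series(["W", "W", "B", "H", None, "W", "B", None])

# Shares among known values only (missing values excluded)
known_share = race.value_counts(normalize=True)

# Shares among all rows, with missing values as their own category
all_share = race.value_counts(normalize=True, dropna=False)

print(known_share)
print(all_share)
```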

# Geographic Trends

In [None]:
# Count cases per state once and reuse it for locations, values and hover text
state_counts = data.groupby(['state']).agg('count')['id']
fig = go.Figure([go.Choropleth(
    locations=state_counts.index,
    z=state_counts.values.astype(float),
    locationmode='USA-states',
    colorscale='Reds',
    autocolorscale=False,
    text=state_counts.index, # hover text, aligned with locations
    marker_line_color='white', # line markers between states
    showscale = True,
)])
fig.update_layout(geo_scope='usa',title='Shootouts across the States')
fig.show()

**Observations**:
- California, Texas and Florida have the highest recorded shootout counts
- Northern and NorthEastern States show lower shootout counts

# State vs Race

In [None]:
data.groupby(['state','race'])['id'].count().unstack('state').plot.bar()

**Observations**:
- This shows that most cases originate from California, Texas and Florida

In [None]:
# Top 20 cities by number of cases, computed once and reused
top_cities = data.groupby('city').agg('count')['id'].sort_values(ascending=False)[:20]
fig = go.Figure(go.Bar(
    x=top_cities.index,
    y=top_cities.values,
    text=top_cities.index,
    textposition='outside',
    marker_color=top_cities.values
))
fig.update_layout(title='Shootout by City Stats')
fig.show()

**Observations**:
- LA, Phoenix and Houston record the highest shootout counts

**Final Summary**:
- Over the past 5 years, police shootings have stayed around the 1000 kills/year mark
- California, Texas and Florida have recorded the highest number of shootout deaths
- Northern and Northeastern states show lower shootout counts
- LA, Phoenix and Houston record the highest shootout counts among cities
- White, Black and Hispanic victims account for the most deaths
- People in the 20-40 age bracket are most affected
- 66% of the victims did not flee, yet were KILLED
- 191 victims were not armed, did not flee, yet were KILLED.

References<br>
https://www.kaggle.com/raenish/don-t-shoot
