In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here are several helpful packages to load

# The basics
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import plotly.express as px
import missingno as msno

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

In [None]:
shootings = pd.read_csv("/kaggle/input/us-police-shootings/shootings.csv")

In [None]:
# Just checking
shootings.head()

There are many insights to be gained from this data. By the end of this EDA we will have a clearer picture of several aspects of police shootings in the US: where most shootings occurred, whether mental illness was involved, racial disparities, whether the officers were wearing body cameras, and so on.

**Which states are shooting hotspots?**

There are no missing values in any of the columns (variables), so we can go ahead with the data analysis. The first task is to determine which states have had the most police shootings. Rather than geocoding the state names into latitude and longitude coordinates, the map below uses Plotly's `locationmode='USA-states'`, which matches the two-letter state codes in the `state` column directly. That is all a state-level geographic visualization needs.
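The missing-value claim can be checked directly with `DataFrame.isna()`. This is a minimal sketch: `missing_per_column` is a hypothetical helper, and the tiny DataFrame is made-up illustration data (only the column names mirror the dataset) so the snippet runs on its own. In the notebook you would pass `shootings` instead.

```python
import pandas as pd

def missing_per_column(df: pd.DataFrame) -> pd.Series:
    """Return the number of missing values in each column."""
    return df.isna().sum()

# Hypothetical toy rows for illustration only; not taken from shootings.csv.
toy = pd.DataFrame({
    "state": ["CA", "TX", "CA"],
    "armed": ["gun", "unarmed", None],
    "body_camera": [False, True, False],
})

missing = missing_per_column(toy)
print(missing)
# A dataset with no gaps reports zero for every column.
print("No missing values:", bool((missing == 0).all()))
```

For a visual version of the same check, `msno.matrix(shootings)` from the already-imported missingno package draws one bar per column with gaps marked.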

In [None]:
# First we do a little Pandas trick to get the total shootings per state
state_shootings=shootings.state.value_counts().reset_index()
state_shootings.columns=['State','Total Shootings']
state_shootings.head()

In [None]:
import plotly.graph_objects as go
fig = go.Figure(data=go.Choropleth(
    locations=state_shootings['State'],
    z = state_shootings['Total Shootings'].astype(float),
    locationmode = 'USA-states',
    colorscale = 'Reds',
    colorbar_title = "Shootings",
))

fig.update_layout(
    title_text = 'Shootings per State',
    geo_scope='usa',
)

fig.show()

**Are Some of the Shootings Justified?**

It quickly becomes apparent that the map of police shootings per state can be sensationalized or misinterpreted. There is a need to answer some questions about these shootings. For example, are all police shootings in California and Texas (the top 2 states) justified? Let's find out in what percentage of the shootings the victims were unarmed, starting with the full dataset and then the top 5 states.
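For the top 5 states, the share of unarmed victims can be computed by ranking states by shooting count and then averaging a boolean "unarmed" flag within each state. The sketch below assumes the `state` and `armed` columns used elsewhere in this notebook; `category_share_top_states` is a hypothetical helper and the toy rows are invented for illustration.

```python
import pandas as pd

def category_share_top_states(df, n=5, categories=("unarmed",)):
    """Percentage of shootings whose 'armed' value is in `categories`,
    for the n states with the most shootings."""
    # Rank states by total shootings and keep the n largest
    top_states = df["state"].value_counts().head(n).index
    subset = df[df["state"].isin(top_states)]
    # Boolean flag per shooting; its mean within a state is the fraction flagged
    flagged = subset["armed"].isin(list(categories))
    share = flagged.groupby(subset["state"]).mean().mul(100)
    # Present the states in ranking order (most shootings first)
    return share.reindex(top_states).round(1)

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "state": ["CA", "CA", "CA", "CA", "TX", "TX", "TX", "FL", "FL", "AZ"],
    "armed": ["gun", "unarmed", "knife", "unarmed", "gun", "gun", "unarmed", "gun", "gun", "unarmed"],
})
result = category_share_top_states(toy, n=2)
print(result)
```

In the notebook, `category_share_top_states(shootings)` would give the unarmed share for the top 5 states, and passing `categories=("unarmed", "unknown", "toy weapon")` widens it to all three controversial categories.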

In [None]:
armed_not=shootings.armed.value_counts().reset_index()
armed_not.columns=['Armed','Total']
armed_not.head(10)

Naturally, most police officers shoot when they have been threatened. In 2755 shootings the victims had guns with them, and in 708 shootings they had knives. However, in 418 instances the assailant had an 'unknown' weapon, in 348 instances they were unarmed, and, shockingly, in 171 instances the assailant carried a toy weapon. These 3 categories (unknown, unarmed, and toy weapon) have generated the most controversy around police shootings, so they warrant their own investigation.
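The counts above can be turned into percentages with `value_counts(normalize=True)`, which divides each count by the number of rows. A minimal sketch, assuming the `armed` column holds the category strings shown above; `armed_percentages` is a hypothetical helper and the toy data is invented, so the printed figures are not results from the dataset.

```python
import pandas as pd

# The three 'armed' values treated as controversial in this notebook
CONTROVERSIAL = ["unknown", "unarmed", "toy weapon"]

def armed_percentages(df):
    """Share (in %) of all shootings for each value of the 'armed' column."""
    return df["armed"].value_counts(normalize=True).mul(100)

# Hypothetical toy data for illustration only
toy = pd.DataFrame({"armed": ["gun"] * 5 + ["knife"] * 2 + ["unarmed", "toy weapon", "unknown"]})
pct = armed_percentages(toy)
# reindex guards against a category that never appears (its share becomes 0)
controversial_pct = pct.reindex(CONTROVERSIAL, fill_value=0).sum()
print(pct.round(1))
print(f"Controversial categories: {controversial_pct:.1f}% of all shootings")
```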

In [None]:
# Create a dataset with only unarmed, unknown and toy weapon 'armed' categories
controversial_shootings = shootings[shootings['armed'].isin(['unknown','unarmed','toy weapon'])]

In [None]:
controversial_shootings.head()

**Which Races Face The Most Controversial Shootings?**


In [None]:
race_controversial=controversial_shootings.race.value_counts().reset_index()
race_controversial.columns=['Race','Total']
race_controversial.head(5)

In [None]:
# Shootings per race, for the whole dataset
race=shootings.race.value_counts().reset_index()
race.columns=['Race','Total']
race.head(5)

The two breakdowns look quite similar. Let's compare them with pie charts.

**Pie Charts Showing Total Shootings per Race vs Controversial Shootings per Race**

In [None]:
import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}]], subplot_titles = ("Controversial Shootings", "Total Shootings"))

fig.add_trace(go.Pie(labels=race_controversial['Race'], values=race_controversial['Total'], name="Controversial Shootings"),
              1, 1)
fig.add_trace(go.Pie(labels=race['Race'], values=race['Total'], name="Shootings"),
              1, 2)

fig.update_traces(hole=.4, hoverinfo="label+percent")

fig.update_layout(
    title_text="US Police Shootings")
fig.show()

The racial breakdown of the controversial shootings (those where the victim was unarmed, carried a toy weapon, or had an unknown weapon) is similar to that of all shootings. So while shootings of unarmed Black men have received the most attention in the news, it seems the US police generally need to stop shooting individuals who are not technically a threat to them, regardless of race.
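Pie charts make "similar" hard to judge precisely, so a table of percentages with the gap in percentage points can back up the comparison. A sketch under the assumption that both frames have a `race` column; `compare_race_shares` is a hypothetical helper and the toy rows are invented. In the notebook you would call `compare_race_shares(shootings, controversial_shootings)`.

```python
import pandas as pd

CONTROVERSIAL = ["unknown", "unarmed", "toy weapon"]

def compare_race_shares(all_df, subset_df):
    """Percentage of each race in the full data and in a subset,
    plus the gap in percentage points."""
    table = pd.DataFrame({
        "All (%)": all_df["race"].value_counts(normalize=True).mul(100),
        "Controversial (%)": subset_df["race"].value_counts(normalize=True).mul(100),
    }).fillna(0)  # a race absent from the subset gets 0%
    table["Difference (pp)"] = table["Controversial (%)"] - table["All (%)"]
    return table.round(1)

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "race": ["White", "White", "Black", "Hispanic", "White", "Black", "Asian", "White", "Black", "Hispanic"],
    "armed": ["gun", "unarmed", "unarmed", "gun", "toy weapon", "gun", "knife", "gun", "unknown", "gun"],
})
toy_controversial = toy[toy["armed"].isin(CONTROVERSIAL)]
comparison = compare_race_shares(toy, toy_controversial)
print(comparison)
```

A difference close to zero for every race is what "similar proportion" would mean numerically.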

**Body Cameras and Shootings**

After the murder of George Floyd by police in Minneapolis, body camera footage has increasingly been used to determine whether officers were justified in shooting, or whether a killing amounted to murder. However, there are also concerns that some police officers turn off their body cameras before a situation escalates.
Below, we look for a link between the shooting of unarmed people and body cameras being switched off.

In [None]:
# Count how often body_camera is True vs False across all shootings
cameras=shootings.body_camera.value_counts().reset_index()
cameras.columns=['Body Cameras','Total']
cameras.head(5)

These counts are hard to interpret. Body camera legislation is still new in many states, so a False value may simply mean that no camera was in use; treating every False value as 'Off' could lead to wrong conclusions. It is thus fitting to repeat the count (how often body cameras were off) for the controversial shootings subset.

In [None]:
# Count how often body_camera is True vs False in the controversial subset
c_cameras=controversial_shootings.body_camera.value_counts().reset_index()
c_cameras.columns=['Body Cameras','Total']
c_cameras.head(5)
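The two value counts above can be put side by side as row percentages with `pd.crosstab(..., normalize="index")`, which makes the controversial and other shootings directly comparable despite their different sizes. A minimal sketch: `body_camera_by_group` is a hypothetical helper, the toy rows are invented, and `astype(str)` turns the body_camera values into "True"/"False" labels. As noted above, "False" should not be read as "switched off".

```python
import pandas as pd

CONTROVERSIAL = ["unknown", "unarmed", "toy weapon"]

def body_camera_by_group(df):
    """Row percentages of body_camera values for controversial vs other shootings."""
    group = df["armed"].isin(CONTROVERSIAL).map({True: "Controversial", False: "Other"})
    camera = df["body_camera"].astype(str)  # "True" / "False" column labels
    # normalize="index" makes each row sum to 1, scaled here to 100%
    return pd.crosstab(group.rename("group"), camera, normalize="index").mul(100).round(1)

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "armed": ["gun", "unarmed", "gun", "unknown", "knife", "toy weapon", "gun", "gun"],
    "body_camera": [False, True, False, False, True, False, False, False],
})
table = body_camera_by_group(toy)
print(table)
```

In the notebook, `body_camera_by_group(shootings)` would cover both subsets in one table.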

It seems that any further investigation of body cameras will fall short without information on the body camera laws in force in each state at the time of each shooting.

**Were the unarmed assailants fleeing?**

One justification often given by police departments is that the suspects were fleeing and therefore had to be stopped. Besides the unarmed victims, I shall include those who had toy weapons or unknown weapons, i.e. use the controversial shootings data subset.

In [None]:
# Were the victims fleeing arrest?
fleeing=controversial_shootings.flee.value_counts().reset_index()
fleeing.columns=['Fleeing?','Total']
fleeing.head(5)

In [None]:
# Bar chart of whether victims were fleeing
fig = px.bar(fleeing, x='Fleeing?', y='Total')
fig.show()
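Raw counts can hide differences between the three controversial categories, so a row-normalized crosstab of `armed` against `flee` shows, for each category, what percentage of victims were fleeing and how. A sketch assuming the `armed` and `flee` columns used above; `flee_by_armed` is a hypothetical helper and the toy rows and flee labels are invented for illustration.

```python
import pandas as pd

def flee_by_armed(df):
    """Row percentages of 'flee' values within each 'armed' category."""
    return pd.crosstab(df["armed"], df["flee"], normalize="index").mul(100).round(1)

# Hypothetical toy data for illustration only
toy = pd.DataFrame({
    "armed": ["unarmed", "unarmed", "unarmed", "toy weapon", "toy weapon", "unknown"],
    "flee": ["Not fleeing", "Foot", "Not fleeing", "Not fleeing", "Car", "Other"],
})
table = flee_by_armed(toy)
print(table)
```

In the notebook, `flee_by_armed(controversial_shootings)` would give one row per controversial category.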

**Most of the victims in the controversial shootings were not fleeing: 125 were fleeing on foot and 74 are categorized as 'other'. It seems that many of the police departments involved in these controversial shootings could really use better training in handling volatile situations.**