<h1>Fatal Force</h1>
<center><img src=https://i.imgur.com/sX3K62b.png></center>



A database of every fatal shooting in the US by a police officer in the line of duty since 2015.
The Washington Post has been tracking more than a dozen details about each killing, including the race, age and gender of the deceased, whether the person was armed, and whether the victim was experiencing a mental-health crisis. The Post gathered this supplemental information from law enforcement websites, local news reports and social media, and by monitoring independent databases such as "Killed by police" and "Fatal Encounters". It has also conducted additional reporting in many cases.

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go

In [None]:
path = "../input/police-deadly-force-usage-us/fatal-police-shootings-data.csv"

data = pd.read_csv(path)

<h2>Data Exploration</h2>

In [None]:
data.head()

In [None]:
data.shape

In [None]:
data.dtypes

<h2>Data Cleaning - Checking for Missing Values and Duplicates</h2>

In [None]:
data.isna().any()

In [None]:
data[data.isna().any(axis=1)].shape

Five columns contain NaN values, which affect 1115 rows in total.
Replace the NaN values in the object-typed columns with "not specified".

In [None]:
data[["armed", "gender", "race", "flee"]] = data[["armed", "gender", "race", "flee"]].fillna(value="not specified")
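A more general variant of the fill step above selects the non-numeric columns by dtype instead of listing them by hand. This is only a sketch on a made-up toy frame (`toy` and its values are assumptions, not the real data); note that `exclude="number"` would also pick up boolean or datetime columns if they were present.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "armed": ["gun", np.nan, "knife"],
    "race": ["W", "B", np.nan],
    "age": [34.0, np.nan, 51.0],
})

# Pick every non-numeric column instead of hard-coding the names.
text_cols = toy.select_dtypes(exclude="number").columns
toy[text_cols] = toy[text_cols].fillna("not specified")

# Numeric columns such as "age" keep their NaN values.
remaining_nan = toy.isna().sum()
print(remaining_nan)
```

Leaving `age` untouched is deliberate here: a text placeholder in a numeric column would turn it into a mixed-type column.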

In [None]:
data.isna().any()

In [None]:
data.duplicated().any()

<h2>Converting to datetime type</h2>

In [None]:
data["date"] = pd.to_datetime(data["date"])

<h2>Race characterisation</h2>

In [None]:
data["race"].unique()

In [None]:
race = data["race"].value_counts()

In [None]:
rac = px.pie(race, values=race.values, names=["white", "black", "hispanic", "unknown", "asian", "native american", "other"], title="Racial distribution")
rac.update_layout(font_size=16)
rac.update_traces(textfont_size=18, hoverinfo='label+percent')
rac.show()

Almost half of the people killed were white Americans. Black and Hispanic people rank second and third.
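The pie chart above relies on a hand-written list of names, which is only correct if `value_counts()` happens to return the categories in that order. A safer pattern might be to map each code to its label explicitly. The mapping below is an assumption for illustration (`RACE_LABELS` and the toy series are hypothetical); it should be checked against the output of `data["race"].unique()`.

```python
import pandas as pd

# Hypothetical code-to-label mapping; verify against data["race"].unique().
RACE_LABELS = {"W": "white", "B": "black", "H": "hispanic", "A": "asian",
               "N": "native american", "O": "other"}

# Toy series standing in for data["race"] (values are made up).
toy_race = pd.Series(["W", "B", "W", "H", "not specified", "W", "B"])
counts = toy_race.value_counts()

# Look each code up; codes missing from the mapping keep their original text.
labels = [RACE_LABELS.get(code, code) for code in counts.index]
labelled = pd.Series(counts.values, index=labels)
print(labelled)
```

The resulting index can then be passed as `names=` so that labels and counts always stay aligned.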

<h2>Total Number of Deaths of Men and Women</h2>

In [None]:
gender = data["gender"].value_counts()

In [None]:
sex = px.pie(gender, values=gender.values, names=["male", "female", "unknown"], title="Gender distribution", color_discrete_sequence=["blue", "orange", "yellow"], hole=0.4)
sex.update_layout(font_size=16)
sex.update_traces(textfont_size=18, textposition="inside",hoverinfo='label+percent')
sex.show()

Most of the people killed by police were men. Women account for only 4.38% of the cases.
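A percentage like the one above can be read directly from the data with `value_counts(normalize=True)`. The sketch below uses a made-up toy series; the codes "M" and "F" are assumptions about how gender is encoded.

```python
import pandas as pd

# Toy series standing in for data["gender"] (values are made up).
toy_gender = pd.Series(["M"] * 18 + ["F"] * 2)

# normalize=True returns fractions; scale to percent and round for display.
share = toy_gender.value_counts(normalize=True).mul(100).round(2)
print(share)
```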

<h2>Distribution of Age and Manner of Death</h2>

In [None]:
manner2 = data.groupby(["age", "manner_of_death"])["name"].count().reset_index()

In [None]:
man2 = px.bar(manner2, color="manner_of_death", y="name",x="age")
man2.update_layout(yaxis_title="Number of deaths")
man2.show()

The graph shows that only a small share of people were tasered. People aged 25-50 were more likely to be tasered than others, and elderly people were only shot.
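The age comparison above can be checked numerically by binning ages and cross-tabulating them against the manner of death. This is a sketch on made-up values; the band edges and the category strings "shot" and "shot and Tasered" are assumptions.

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "age": [19, 27, 33, 41, 48, 62, 75, 30],
    "manner_of_death": ["shot", "shot and Tasered", "shot", "shot",
                        "shot and Tasered", "shot", "shot", "shot"],
})

# Bin ages into bands; the edges are arbitrary choices for illustration.
bands = pd.cut(toy["age"], bins=[0, 25, 50, 120], labels=["<=25", "26-50", ">50"])

# Row-normalised crosstab: share of each manner of death within an age band.
by_band = pd.crosstab(bands, toy["manner_of_death"], normalize="index")
print(by_band.round(2))
```

Comparing shares within each band avoids the problem that the largest age groups dominate a plot of raw counts.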

<h2>Were People Armed? What Kind of Weapons Did They Use?</h2>

In [None]:
armed = data["armed"].value_counts()

In [None]:
tab = go.Figure(data=[go.Table(
    header=dict(values=["Kind of weapons", "Count"], fill_color="lavender", font=dict(size=14, color='black')),
    cells=dict(values=[armed.index, armed.values], fill_color="#F5F5DC"))
    ])
tab.show()

In [None]:
(318*100)/armed.values.sum()

In [None]:
arm = px.bar(armed, x=armed.index[:3], y=armed.values[:3], title="Top 3 kinds of weapons")
arm.update_layout(xaxis_title="Kind of weapons", yaxis_title="Count")
arm.show()

The most common weapons were guns and knives. However, some people were armed with whatever surprising objects they could find. Only 5.95% of the people were completely unarmed.

<h2>Were People Fleeing?</h2>

In [None]:
flee = data["flee"].value_counts()
flee

In [None]:
(3356*100)/(flee.values.sum() - 250)

In [None]:
fle = px.bar(flee, x=flee.index[:3], y=flee.values[:3], title="Types of fleeing")
fle.update_layout(xaxis_title="Type of fleeing", yaxis_title="Count")
fle.show()

65.95% of people were not fleeing. The others fled by car or on foot.
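The percentage above is computed from counts typed in by hand. A sketch of the same calculation taken straight from the column, with the "not specified" placeholder excluded, could look like this. The toy values and the label "Not fleeing" are assumptions about the data.

```python
import pandas as pd

# Toy series standing in for data["flee"] (values are made up).
toy_flee = pd.Series(["Not fleeing"] * 6 + ["Car"] * 2 + ["Foot"]
                     + ["not specified"] * 3)

# Drop the placeholder rows so they do not dilute the percentage.
known = toy_flee[toy_flee != "not specified"]

# The mean of a boolean mask is the fraction of True values.
not_fleeing_pct = (known == "Not fleeing").mean() * 100
print(round(not_fleeing_pct, 2))
```

Computing the figure this way keeps it correct when the dataset is refreshed and the counts change.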

<h2>Age distribution</h2>

In [None]:
age_id = data.groupby("id").agg({"age": pd.Series.mean})
age_id = age_id.sort_values("age", ascending=False)
age_id.isna().sum()

In [None]:
age_id = age_id.dropna()

In [None]:
age_id.describe()

In [None]:
sns.histplot(data=age_id, x="age", bins=age_id["age"].nunique(), kde=True, alpha=0.4)
plt.show()

Ages range from 6 to 91. The most common age range is 30-40, and the average age is 37.

<h4>Race and age distribution</h4>

In [None]:
age_race = data.groupby(["race", "id"], as_index=False).agg({"age": pd.Series.mean})
age_race = age_race.dropna()

In [None]:
sns.histplot(data=age_race, x="age", hue="race", multiple="stack")
plt.show()

<h2>Mental Illness</h2>

In [None]:
mental = data["signs_of_mental_illness"].value_counts()

In [None]:
men = px.pie(mental, names=["Don't have", "Have"], values=mental.values, color_discrete_sequence=["#6dcbdb", "#ff8777"], title="Share of people killed who showed signs of mental illness")
men.update_traces(textfont_size=18, hoverinfo='label+percent')
men.show()

Most of the victims showed no signs of mental illness.

<h1>Geographical distribution</h1>

In [None]:
city = data["city"].value_counts()
city = city.sort_values(ascending=False)

In [None]:
cit = go.Figure(data=[go.Table(
    header=dict(values=["City", "Count"], fill_color="lavender", font=dict(size=14, color='black')),
    cells=dict(values=[city.index[:30], city.values[:30]], fill_color="#F5F5DC"))
    ])
cit.update_layout(title_text="Top 30 cities by number of deaths")
cit.show()

In [None]:
ci = px.bar(x=city.index[:10], y=city.values[:10], title="Top 10 cities by number of deaths")
ci.update_layout(xaxis_title="City", yaxis_title="Number of deaths")
ci.show()

In [None]:
race_city = data.groupby(["city", "race"])["name"].count().reset_index()
race_city = race_city.sort_values("name", ascending=False)
race_city.head()

In [None]:
rc = px.bar(race_city[:20], x="city", y="name", color="race", title="Top 20 city and race groups by number of deaths")
rc.update_layout(yaxis_title="Number of deaths")
rc.show()
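Slicing the first 20 rows of `race_city` keeps the 20 largest city and race pairs, which is not quite the same as the racial breakdown of the top cities. One way to get the latter is to pick the top cities first and then group only those rows. This is a sketch on made-up data; `top_n` and the `deaths` column name are hypothetical.

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "city": ["A", "A", "A", "B", "B", "C", "C", "C", "C", "D"],
    "race": ["W", "B", "B", "W", "W", "H", "H", "W", "B", "W"],
})

top_n = 2  # hypothetical cut-off; the notebook uses larger values

# Step 1: find the cities with the most records overall.
top_cities = toy["city"].value_counts().head(top_n).index

# Step 2: keep only those cities, then count records per city and race.
subset = toy[toy["city"].isin(top_cities)]
race_by_city = subset.groupby(["city", "race"]).size().reset_index(name="deaths")
print(race_by_city)
```

With this table every bar in the stacked chart covers all races recorded for that city, not just the pairs that made the cut.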

<h4>Choropleth Map of Police Killings by US State</h4>

In [None]:
state = data["state"].value_counts()


In [None]:
ma = px.choropleth(locations=state.index, color=state.values, hover_name=state.index, locationmode="USA-states", scope="usa", color_continuous_scale="Viridis", title="Map of Police Killings")
ma.show()

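The geographic analysis shows that the racial makeup of the people killed depends strongly on location. California has the highest number of police killings of any state in these raw counts, which are not adjusted for population.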
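The link between race and location can be made more concrete with a row-normalised crosstab, which gives the share of each race within every state. The sketch below runs on made-up values; the state and race codes are only placeholders.

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "state": ["CA", "CA", "CA", "CA", "TX", "TX", "NY", "NY"],
    "race": ["H", "H", "W", "B", "W", "H", "B", "B"],
})

# normalize="index" makes every row (state) sum to 1.
race_share = pd.crosstab(toy["state"], toy["race"], normalize="index")
print(race_share.round(2))
```

Shares per state are easier to compare than raw counts, because states with many records no longer dominate the picture.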

<h2>Number of Police Killings Over Time</h2>

In [None]:
data["year"] = data["date"].dt.year

In [None]:
time_c = data.groupby(['year'])['id'].count().reset_index()

In [None]:
tc = px.bar(x=time_c["year"], y=time_c["id"], title="Number of killed people over time")
tc.update_layout(xaxis_title="Year", yaxis_title="Number of deaths")
tc.show()

In [None]:
time_group = data.groupby(["year", "race"], as_index=False).agg({"id": pd.Series.count})


In [None]:
time_g = px.line(x=time_group.year, y=time_group.id, color=time_group.race, title="Number of killed people over time with race distribution")
time_g.update_layout(xaxis_title="Year", yaxis_title="Number of deaths")
time_g.show()

The yearly counts show that the number of killings did not decrease over time. The count for 2020 is lower only because the data does not cover the whole year. However, there are some differences between races over time: the number of white people killed decreases slightly, while the numbers for the other races stay roughly unchanged.
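Because the last year is incomplete, it can help to exclude it before comparing years. The sketch below builds a year-by-race table and drops the final year when the newest record falls before 31 December. The data is made up, and the cut-off rule is a crude assumption, not part of the original analysis.

```python
import pandas as pd

# Toy frame standing in for the real data (values are made up).
toy = pd.DataFrame({
    "date": pd.to_datetime(["2018-03-01", "2018-11-20", "2019-01-05",
                            "2019-07-14", "2019-12-30", "2020-02-02"]),
    "race": ["W", "B", "W", "W", "H", "B"],
})
toy["year"] = toy["date"].dt.year

# One row per year, one column per race; missing combinations become 0.
per_year = toy.groupby(["year", "race"]).size().unstack(fill_value=0)

# Treat the last year as partial if the newest record is before 31 December.
last_date = toy["date"].max()
last_year_partial = (last_date.month, last_date.day) != (12, 31)
complete_years = per_year.drop(index=last_date.year) if last_year_partial else per_year
print(complete_years)
```

Trends read from `complete_years` are not distorted by a year that simply has fewer months of data.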

*Thank you for looking at my Fatal Force analysis. Please leave some comments or feedback.*