This notebook explores fatal police shootings and identifies which states record the most shootings.

In [None]:
#For DataFrame
import pandas as pd
import numpy as np

#For Plotting
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
df = pd.read_csv("../input/data-police-shootings/fatal-police-shootings-data.csv")

In [None]:
df.head()

In [None]:
df.shape

# Exploratory Data Analysis 

In [None]:
df.isnull().sum()

This shows the number of missing values in each column. Rows with missing values can be discarded or filled with an appropriate value.
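The two options for handling missing values can be sketched as follows. This is a minimal illustration on a made-up toy frame (the `toy` DataFrame and the `"Unknown"` label are assumptions, not part of the dataset); the median fill is one possible choice among several.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the real data (values are made up for illustration)
toy = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 31.0],
    "race": ["W", None, "B", "H"],
    "armed": ["gun", "knife", None, "gun"],
})

# Option 1: discard every row that has at least one missing value
dropped = toy.dropna()

# Option 2: fill numeric gaps with the median and categorical gaps with a label
filled = toy.copy()
filled["age"] = filled["age"].fillna(filled["age"].median())
filled[["race", "armed"]] = filled[["race", "armed"]].fillna("Unknown")

print(len(dropped))
print(filled.isnull().sum())
```

Dropping rows is simpler but shrinks the data, while filling keeps every record at the cost of introducing assumed values.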

In [None]:
fig, ax = plt.subplots(figsize=(15,7.5))   
sns.heatmap(df.isnull(),yticklabels=False, ax=ax)

A heatmap of the missing values in each column. The contrasting marks show where values are missing, from the first row to the last.

In [None]:
df.groupby("race").count()

Grouping of DataFrame by race.

In [None]:
# Map race abbreviations to full names, restricted to the race column
# so that matching values in other columns are left untouched
race_names = {
    'A': 'Asian',
    'B': 'Black Non-Hispanic',
    'H': 'Hispanic',
    'N': 'Native American',
    'O': 'Other',
    'W': 'White Non-Hispanic',
}
df['race'] = df['race'].replace(race_names)

The abbreviations are replaced with the full category names they stand for, as these are much easier to read.

In [None]:
df.groupby("gender").count()

Grouping of DataFrame by Gender.

In [None]:
df.groupby("body_camera").count()

Grouping of DataFrame by body camera.

In [None]:
df["armed"].value_counts()

Grouping of DataFrame by the weapon being carried.

In [None]:
df["threat_level"].value_counts()

Grouping of DataFrame by threat level.

In [None]:
df['manner_of_death'].value_counts()

Grouping of DataFrame by manner of death.

In [None]:
df['flee'].value_counts()

Grouping of DataFrame by the attempt to flee.

In [None]:
df['age'].value_counts()

Grouping of DataFrame by Age.

In [None]:
df['signs_of_mental_illness'].value_counts()

Grouping of DataFrame by mental illness.

In [None]:
df['date'].value_counts()

Grouping of DataFrame by the date they were shot at.

In [None]:
# Parse the date column, keep only the year, then drop the original column
df['Year'] = pd.to_datetime(df['date']).dt.year
df.drop(columns = ['date'], inplace = True)

The date is reduced to its year only. The day and month are removed so that the records are easier to categorize.

In [None]:
df.head()

This shows Date being dropped from the DataFrame and Year being added at the end.

In [None]:
df['Year'].value_counts()

Grouping of DataFrame by cases in years.

In [None]:
df['city'].value_counts()

Grouping of DataFrame by City.

In [None]:
df['state'].value_counts()

Grouping of DataFrame by state.

# Univariate Data Analysis by plotting

In [None]:
sns.set(rc={'figure.figsize':(15,5)})
df["age"].plot.hist()

This shows the age distribution of the people killed. According to the data, the 20-40 age range has the highest number of cases.

In [None]:
sns.countplot(x = "manner_of_death", data = df)

The manner of death of the people.

In [None]:
sns.countplot(x = "gender", data = df)

The gender of the people killed.

In [None]:
sns.countplot(x = "race", data = df)

The race of the people killed.

In [None]:
sns.countplot(x = "state", data = df)

The states where the shootings occurred.

In [None]:
sns.countplot(x = "signs_of_mental_illness", data = df)

This shows whether the people shot showed signs of mental illness.

In [None]:
sns.countplot(x = "threat_level", data = df)

The threat levels posed by the people killed.

In [None]:
sns.countplot(x = "flee", data = df)

The cases of not fleeing are high whereas the cases of fleeing are at a low number.

In [None]:
sns.countplot(x = "body_camera", data = df)

This shows whether the officer had a body camera recording the incident when the person was shot.

In [None]:
sns.countplot(x = "Year", data = df)

The shootings that occurred over the years. The count for 2020 is lower because the data was only recorded up to June 2020.

In [None]:
sns.countplot(x = "gender", hue = "signs_of_mental_illness",data = df, palette = 'Set2')

Gender and signs of mental illness have little dependence on each other.
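One way to check this kind of dependence numerically is a row-normalized cross-tabulation, which gives the share of each gender with and without signs of mental illness. The sketch below uses a made-up toy frame (the values are assumptions for illustration only, not results from the dataset).

```python
import pandas as pd

# Toy data standing in for the real columns (values are made up)
toy = pd.DataFrame({
    "gender": ["M", "M", "M", "M", "F", "F"],
    "signs_of_mental_illness": [True, False, False, False, True, False],
})

# normalize="index" turns the counts in each row into proportions that sum to 1
shares = pd.crosstab(
    toy["gender"],
    toy["signs_of_mental_illness"],
    normalize="index",
)
print(shares)
```

If the rows of `shares` look similar for both genders, the two variables are close to independent; large differences between the rows would suggest a dependence.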

In [None]:
sns.countplot(x = "gender", hue = "manner_of_death",data = df, palette = 'Set2')

Only men were both tasered and shot. This was not the case for women.

In [None]:
sns.countplot(x = "race", hue = "manner_of_death",data = df, palette = 'Set2')

The races which were tasered and shot.

In [None]:
sns.countplot(x = "race", hue = "signs_of_mental_illness",data = df, palette = 'Set2')

The number of people of each race who did and did not show signs of mental illness.

In [None]:
sns.countplot(x = "race", hue = "threat_level",data = df, palette = 'Set2')

The threat level posed by the people by race.

In [None]:
sns.countplot(x = "race", hue = "flee",data = df, palette = 'Set2')

In [None]:
sns.countplot(x = "race", hue = "body_camera",data = df, palette = 'Set2')

This shows whether body camera was recording while the person was shot.

# Bivariate Data Analysis by plotting

In [None]:
# set_style applies the style globally; axes_style only returns the settings
sns.set_style('whitegrid')
sns.jointplot(x = df['Year'], y =df['age'], kind = 'hex',color = 'lightcoral')

The dependence of age on year. Darker hexagons mark the combinations of year and age where more records are concentrated.

In [None]:
sns.catplot(x = 'race',y='age', kind = 'strip', data = df,aspect = 4)

Relation of race and age of the person killed.

In [None]:
sns.catplot(x = 'state',y='age', kind = 'strip', data = df,aspect = 4)

Relation of state and age of the person killed.

In [None]:
sns.catplot(x = 'flee',y='age', kind = 'strip', data = df,aspect = 4)

Relation of attempt to flee and age of the person killed.

In [None]:
sns.catplot(x = 'age',y='armed', kind = 'strip', data = df,height = 15)

This shows the relation between the age of the person and the weapon, if any, being carried.

In [None]:
sns.catplot(x = 'Year',y='armed', kind = 'strip', data = df,height = 15)

The kind of weapon being carried and the Year in which such weapons were present.

![Explanation of Box Plot](https://www.simplypsychology.org/boxplot.jpg)

Max is the largest value shown for the category in the box plot.

Upper Quartile is the value below which 75% of the records fall.

Lower Quartile is the value below which 25% of the records fall.

Interquartile Range is the span between the lower and upper quartiles, covering the middle 50% of the records.

Median is the middle value of the records in the category.

Min is the smallest value shown for the category in the box plot.
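The box plot quantities can be computed directly with pandas. The sketch below uses a small made-up series of ages (an assumption for illustration). Note that seaborn's box plots by default draw whiskers only to the most extreme points within 1.5 times the interquartile range of the box and plot anything beyond as individual outlier points, so the whisker ends are not always the true minimum and maximum.

```python
import pandas as pd

# Made-up ages for illustration only
ages = pd.Series([18, 22, 25, 30, 34, 41, 60])

q1 = ages.quantile(0.25)      # lower quartile
median = ages.quantile(0.50)  # middle value
q3 = ages.quantile(0.75)      # upper quartile
iqr = q3 - q1                 # interquartile range (middle 50%)

# Limits beyond which seaborn's default box plot treats points as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = ages[(ages < lower_fence) | (ages > upper_fence)]

print(q1, median, q3, iqr)
print(outliers.tolist())
```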

In [None]:
sns.catplot(x = 'race',y='Year', kind = 'box', data = df,aspect = 4)

This shows the values over the years for the races being killed.

In [None]:
sns.catplot(x = 'Year',y='state', kind = 'box', data = df,height = 10, col = 'body_camera')

The distribution of incident years for each state. The two columns show whether or not the officer recorded the incident with a body camera.

In [None]:
sns.catplot(x = 'Year',y='state', kind = 'box', data = df,height = 10, col = 'signs_of_mental_illness')

The distribution of incident years for each state. The two columns show whether or not the person showed signs of mental illness.

In [None]:
sns.catplot(x = 'age',y='state', kind = 'strip', data = df,height = 10, col = 'threat_level')

This shows the ages of the people killed in each state, with one column per threat level, which helps check whether a threat level is more common among younger or older people.

In [None]:
sns.catplot(x = 'age',y='state', kind = 'strip', data = df, col_wrap = 2, col = 'race', height = 10)

This shows whether age was a factor for each race of the people killed. It also takes into account the state where the incident happened.

**Insights** ->

1.) The state of California had the highest number of police shootings.

2.) Across all races, people aged 20-40 were killed the most. White Non-Hispanic people were killed the most, followed by Black Non-Hispanic people.

3.) Almost 1000 people were killed by officers in each year.
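Insights like these can be checked numerically rather than read off the plots. The following is a minimal sketch on a made-up toy frame (all values are assumptions for illustration and are not results from the dataset); the same calls would be applied to `df` in this notebook.

```python
import pandas as pd

# Toy data with the same column names as the notebook (values are made up)
toy = pd.DataFrame({
    "state": ["CA", "TX", "CA", "FL", "CA", "TX"],
    "Year": [2015, 2015, 2016, 2016, 2016, 2017],
    "age": [23, 35, 52, 29, 38, 17],
})

# State with the most records
top_state = toy["state"].value_counts().idxmax()

# Number of records per year
per_year = toy.groupby("Year").size()

# Share of records with age between 20 and 40 (inclusive)
share_20_40 = toy["age"].between(20, 40).mean()

print(top_state)
print(per_year)
print(share_20_40)
```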