# Police Shootings in the United States

![](https://static.independent.co.uk/s3fs-public/thumbnails/image/2015/12/03/11/Shootings.jpg?w968)

One of the biggest news stories to shake the world a few days ago was the killing of George Floyd by a Minneapolis police officer. Such incidents are not rare: the United States records one of the highest numbers of police shootings in the world. This analysis summarizes those shootings and how the situation has changed over a span of almost 5 years.

# Importing the Libraries

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

import seaborn as sns

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 5GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

# Reading the File

In [None]:
shootings=pd.read_csv('/kaggle/input/us-police-shootings/shootings.csv')
shootings.head()

# Checking the dimensions of the dataset

In [None]:
shootings.shape

# Checking the data types

In [None]:
shootings.dtypes

**Observation** All the columns have the right data types except for the date column, which is stored as a string. We convert it to make it easier to work with.

Converting the "date" column from string to datetime format

In [None]:
shootings['date']=pd.to_datetime(shootings['date'])
shootings.dtypes
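
A minimal sketch of a more defensive conversion, in case some date strings cannot be parsed. The `toy` frame and its values are made up for illustration and are not part of the dataset; only `pd.to_datetime` and its `errors` parameter are taken from pandas.

```python
import pandas as pd

# Toy data standing in for the 'date' column (values are made up for illustration).
toy = pd.DataFrame({'date': ['2015-01-02', '2015-01-04', 'not a date']})

# errors='coerce' turns strings that cannot be parsed into NaT instead of raising an error.
toy['date'] = pd.to_datetime(toy['date'], errors='coerce')

# Count the rows that failed to parse so they can be inspected before the analysis.
n_bad = int(toy['date'].isna().sum())
print(toy.dtypes)
print('Unparseable dates:', n_bad)
```

If the count is zero, the plain `pd.to_datetime` call used above is sufficient.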

# Identifying the null values- and the Cleaning Requirement

In [None]:
shootings.isnull().sum()

**Observation:** The data is already clean, with no null values in any of the columns.
Now we can start our analysis.

In [None]:
yrs_count=shootings.groupby('date').apply(lambda x:x['name'].count()).reset_index(name='Count')
plt.figure(figsize=(15,15))
plt.scatter(yrs_count['date'],yrs_count['Count'],color='r')
plt.xlabel('Date',size=15)
plt.ylabel('Death Counts',size=15)
plt.title('Death over the Years',size=20)

**Observation** The number of shooting deaths in the US is very high. As the scatter plot above shows, at least one person has been shot on almost every day. The higher the daily death count, the fewer the days on which it occurs. The highest number of shooting deaths in a single day is 9, which occurred on 3 occasions between 2018 and 2019.
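
The scatter plot only contains dates that appear in the data, so days without any death are invisible in it. One possible way to check the "almost every day" claim is to fill in the missing calendar days with zero. This is a sketch on made-up toy records, not the real dataset; the names `toy`, `daily` and `share_with_death` are illustrative.

```python
import pandas as pd

# Toy records (made up): one row per death, as in the dataset.
toy = pd.DataFrame({
    'date': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-02', '2015-01-04']),
    'name': ['A', 'B', 'C', 'D'],
})

# size() counts the rows per date; only dates present in the data appear here.
daily = toy.groupby('date').size()

# Reindex over the full calendar range so that days with no deaths get a count of 0.
all_days = pd.date_range(daily.index.min(), daily.index.max(), freq='D')
daily = daily.reindex(all_days, fill_value=0)

# Fraction of calendar days on which at least one death was recorded.
share_with_death = (daily > 0).mean()
print(daily)
print('Share of days with at least one death:', share_with_death)
```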

# Number of People killed in Police Shootings and the Average Age

In [None]:
People_death=shootings.name.nunique()
Death_avg_age=np.average(shootings.age)
print('{} People with an average age of {} have been killed in Shooting'.format(People_death,round(Death_avg_age,2)))

# Count of deaths by age

In [None]:
age_death=shootings.groupby('age').apply(lambda x:x['name'].count()).reset_index(name='Counts')
sns.regplot(x=age_death['age'],y=age_death['Counts'],fit_reg=True) # pass x and y by keyword; recent seaborn releases do not accept them positionally
plt.title('Death Counts by Age',size=20)

**Observation** As the chart above shows, the death counts peak in the 30s age range and tail off towards older ages, with a few outliers in the 80+ range. Overall the distribution is right skewed. The average age of the victims computed in the previous cell, about 36.5, falls within this peak range. In a right-skewed distribution, however, the mean tends to sit above the most common age, so it should not be read as the age with the maximum death count.
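
To see whether the average age and the most common age actually coincide, the mean, median and mode can be compared directly. The sketch below uses made-up ages, so the numbers it prints say nothing about the real data; on the notebook's data the same calls would be applied to `shootings['age']`.

```python
import pandas as pd

# Made-up ages with a long right tail, for illustration only.
ages = pd.Series([22, 25, 25, 31, 31, 31, 40, 58, 84], name='age')

mean_age = ages.mean()          # pulled upwards by the few very old victims
median_age = ages.median()      # middle value, robust to outliers
modal_age = ages.mode().iloc[0] # most frequent age (first one if there are ties)
skewness = ages.skew()          # positive for a right-skewed distribution

print('mean:', round(mean_age, 2), 'median:', median_age, 'mode:', modal_age)
print('skewness:', round(skewness, 2))
```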

# Overall Number of Shooting Deaths by Gender

In [None]:
gender_dist=shootings.groupby('gender').apply(lambda x:x['name'].count()).reset_index(name='Counts')
plt.bar(gender_dist['gender'],gender_dist['Counts'],color=['b','r']) # one colour per bar
plt.xlabel('Gender',size=15)
plt.ylabel('# of Shooting Deaths',size=15)
plt.title('Genderwise Distribution of Shooting Deaths',size=20)

**Observation** The shooting deaths are very unevenly distributed by gender, with a very high percentage of the victims being male.
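
The percentage itself can be read off with `value_counts(normalize=True)`. This is a small sketch on made-up toy rows; the resulting share is not the real figure.

```python
import pandas as pd

# Toy gender column (made up for illustration).
toy = pd.DataFrame({'gender': ['M', 'M', 'M', 'F']})

counts = toy['gender'].value_counts()                      # absolute counts per gender
shares = toy['gender'].value_counts(normalize=True) * 100  # percentage of all rows

print(counts)
print(shares.round(1))
```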

# Genderwise Shooting Deaths over the last 5 years

In [None]:
shootings['year'] = pd.DatetimeIndex(shootings['date']).year
gender_yr_dist=shootings.groupby(['year','gender']).apply(lambda x:x['name'].count()).reset_index(name='Counts')
female_yr_dist=gender_yr_dist[gender_yr_dist['gender']=='F']
male_yr_dist=gender_yr_dist[gender_yr_dist['gender']=='M']
plt.plot(male_yr_dist['year'],male_yr_dist['Counts'],color='r')
plt.plot(female_yr_dist['year'],female_yr_dist['Counts'],color='b')
plt.legend(['Male','Female'])
plt.xlabel('Year',size=15)
plt.ylabel('Gender wise Death Counts',size=15)
plt.title('Genderwise Death Counts over the years',size=20)


**Observation** As discussed earlier, the number of male victims is far higher than the number of female victims. The pattern suggests that the overall number of shooting deaths has decreased over the years. Looking at the numbers, the decrease in male victims over the years is much larger than the decrease in female victims. If the most recent year in the data is only partially covered, part of this decline reflects missing months rather than a real drop.
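
The size of the year-on-year change can be made explicit rather than read off the line chart. The sketch below assumes a frame with `year` and `gender` columns like the one built above, but runs on made-up toy rows; `by_year`, `abs_change` and `pct_change` are illustrative names.

```python
import pandas as pd

# Toy rows (made up): one row per death with its year and gender.
toy = pd.DataFrame({
    'year':   [2015, 2015, 2015, 2015, 2016, 2016, 2016],
    'gender': ['M',  'M',  'M',  'F',  'M',  'M',  'F'],
})

# Rows are years, columns are genders, cells are death counts.
by_year = pd.crosstab(toy['year'], toy['gender'])

abs_change = by_year.diff()              # change in counts versus the previous year
pct_change = by_year.pct_change() * 100  # the same change expressed in percent

print(by_year)
print(abs_change)
print(pct_change.round(1))
```

Comparing the percentage change alongside the absolute change matters here because the male counts are much larger, so an equal relative decline shows up as a much bigger absolute drop.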

# Weapon vs Threat Analysis

In [None]:
armed_threat=shootings.groupby(['armed','threat_level']).apply(lambda x:x['name'].count()).reset_index(name='Deaths')
pivot_attack=pd.pivot(armed_threat,index='armed',columns='threat_level',values='Deaths')
plt.figure(figsize=(5,25))
sns.heatmap(pivot_attack,annot=True,fmt='.0f',cmap='GnBu')
plt.xlabel('Threat Level',size=10)
plt.ylabel('Attacking Weapon',size=15)
plt.title('Attacking Weapon vs Threat from Victims',size=25)

**Observation** Guns and knives are the two weapons most often reported as carried by the victims. As expected, for these weapons the death counts are higher when the victim attempted to attack than when the victim did not attack. Surprisingly, when the victim was unarmed or carried an unknown weapon, more deaths occurred when he/she was not attacking. One possible reason is a shot fired on suspicion of a threat.
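
Because guns and knives dominate the raw counts, it can help to look at each weapon's row as shares instead. One possible approach is `pd.crosstab` with `normalize='index'`, which also replaces the groupby-then-pivot step. The rows below are made up for illustration; only the column names `armed` and `threat_level` come from the dataset.

```python
import pandas as pd

# Toy rows (made up): weapon carried and recorded threat level for each death.
toy = pd.DataFrame({
    'armed':        ['gun', 'gun', 'gun', 'knife', 'unarmed', 'unarmed'],
    'threat_level': ['attack', 'attack', 'other', 'attack', 'other', 'other'],
})

# Raw counts: one row per weapon, one column per threat level, missing pairs become 0.
counts = pd.crosstab(toy['armed'], toy['threat_level'])

# normalize='index' divides each row by its total, so every row sums to 1.
row_share = pd.crosstab(toy['armed'], toy['threat_level'], normalize='index')

print(counts)
print(row_share.round(2))
```

Either table can be passed to `sns.heatmap` in place of the pivoted frame.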

# Attack vs Flee analysis

In [None]:
armed_threat=shootings.groupby(['arms_category','flee']).apply(lambda x:x['name'].count()).reset_index(name='Deaths')
pivot_attack=pd.pivot(armed_threat,index='arms_category',columns='flee',values='Deaths')
plt.figure(figsize=(5,25))
sns.heatmap(pivot_attack,annot=True,fmt='.0f',cmap='RdBu')
# Labels match the pivot: columns are 'flee', rows are 'arms_category'
plt.xlabel('Flee Status',size=10)
plt.ylabel('Arms Category',size=15)
plt.title('Arms Category vs Flee Status of Victims',size=25)

**Observations** Most shooting death victims were recorded as not trying to flee, with fleeing on foot the second most common outcome. For cases where the arms category is unknown, however, the most common mode of fleeing is by car.

# Is mental sickness a reason for attacks by Victims?

In [None]:
mental_attack=shootings.groupby(['signs_of_mental_illness','threat_level']).apply(lambda x:x['name'].count()).reset_index(name='Counts')
sick_threat=pd.pivot(mental_attack,index='threat_level',columns='signs_of_mental_illness',values='Counts')
sns.heatmap(sick_threat,annot=True,fmt='.0f',cmap='pink')

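**Observation** As seen in the heatmap above, in most of the cases where the victim attacked, no signs of mental illness were recorded. This suggests that mental illness alone does not explain the attacks, although the absence of recorded signs does not prove that the victim was in a sound state of mind.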

# Shooting Deaths by Race-Yearwise Summary

In [None]:
yr_race=shootings.groupby(['year','race']).apply(lambda x:x['name'].count()).reset_index(name='Counts')
pivoted_yr_race=pd.pivot(yr_race,columns='year',index='race',values='Counts')
plt.figure(figsize=(6,6))
plot=sns.heatmap(pivoted_yr_race,annot=pivoted_yr_race.values,fmt='d',cmap='YlOrBr')
plot.set_xlabel('Year',size=15)
plot.set_ylabel('Race',size=15)
plot.set_title('Count of Shooting Deaths by Race- Over the years',size=15)

**Observation** Among the shooting victims, Whites account for the largest number of deaths, followed by Blacks and Hispanics. Native, Asian and Other communities have comparatively fewer victims. These are raw counts and are not adjusted for the size of each group in the population.

# Cities with the Highest Shooting Deaths- Over the Years

In [None]:
ct_total_shoot=shootings.groupby(['city']).apply(lambda x:x['name'].count()).reset_index(name='Shoots')
ct_max=ct_total_shoot.sort_values(by='Shoots', ascending=False)
max_shoot=ct_max[0:10] # Only considering the 10 cities that have the highest shooting death counts
print('10 Cities with the most shooting deaths:\n',max_shoot)
city_target=shootings[shootings['city'].isin(max_shoot['city'])]
ct_shoot=city_target.groupby(['year','city']).apply(lambda x:x['name'].count()).reset_index(name='Shoot')
shoot_ct=pd.pivot(ct_shoot,values='Shoot',index='year',columns='city')
shoot_ct=shoot_ct.fillna(0) # years with no recorded deaths in a city count as 0
shoot_ct.reset_index(inplace=True) # reset once so that 'year' becomes a regular column
plt.figure(figsize=(10,10))
for city in max_shoot['city']: # one labelled line per city, so the legend matches the lines
    plt.plot(shoot_ct['year'],shoot_ct[city],marker='o',label=city)
plt.legend()
plt.xlabel('Year',size=15)
plt.ylabel('Shoot Count',size=15)
plt.title('Yearwise counts in the Cities with most Shooting Deaths',size=20)

**Observation** Los Angeles and Phoenix are the cities with the highest number of shooting death victims. The yearly counts have been fairly consistent, although they decrease over the years. Albuquerque and Columbus showed extremely high counts in 2018 and 2019.
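
One caveat with grouping by `city` alone is that cities sharing a name in different states are counted together. A hedged sketch of how to check for this, on made-up toy rows (the counts are not from the dataset):

```python
import pandas as pd

# Toy rows (made up): two different cities named Columbus, in different states.
toy = pd.DataFrame({
    'city':  ['Columbus', 'Columbus', 'Columbus', 'Phoenix'],
    'state': ['OH',       'OH',       'GA',       'AZ'],
})

# Grouping by city name only merges both Columbus cities into one count.
by_city = toy.groupby('city').size()

# Grouping by city and state keeps them apart.
by_city_state = toy.groupby(['city', 'state']).size()

print(by_city)
print(by_city_state)
```

If the two groupings give different totals for a city in the real data, the city-level chart mixes more than one place under that name.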

# States with the most number of Shooting Deaths

In [None]:
state_summ=shootings.groupby(['state']).apply(lambda x:x['manner_of_death'].count()).reset_index(name='Counts')
state_kill_max=state_summ.sort_values(by='Counts',ascending=False)
Top_state_kill=state_kill_max[:15]
print('States in US with the most number of Police Shootings\n',Top_state_kill)


# Yearwise Shooting Deaths in States of US

In [None]:
state_yr_summ=shootings.groupby(['year','state']).apply(lambda x:x['manner_of_death'].count()).reset_index(name='Counts')
pivot_st_yr=pd.pivot(state_yr_summ,columns='year',index='state',values='Counts')
pivot_st_yr=pivot_st_yr.fillna(0) # state-year combinations with no deaths become 0
plt.figure(figsize=(10,20))
sns.heatmap(pivot_st_yr,annot=True,fmt='.0f',cmap='RdPu')
plt.xlabel('Year',size=15)
plt.ylabel('State',size=15)
plt.title('Yearwise Shooting Deaths in US- States Data',size=20)

**Observation** The states in the southern and south-western parts of the country have the highest number of shooting deaths. This distribution is best understood on a map, which we attempt in the following section.

# Visualizing the Statewise Shooting Data on the Map (Using GeoPandas)

In [None]:
import geopandas as gpd
fp = "/kaggle/input/us-shape-files/USA_States.shp"
map_df = gpd.read_file(fp)
merged = map_df.set_index('STATE_ABBR').join(state_summ.set_index('state'))
variable = 'Counts'
vmin, vmax = merged['Counts'].min(), merged['Counts'].max() # scalar range for the colour bar
fig, ax = plt.subplots(1,figsize=(20,20))
pt=merged.plot(column=variable, cmap='Reds',ax=ax,linewidth=0.8, edgecolor='0.8')
ax.axis('on')
ax.set_title('US Statewise Shooting Counts', fontdict={'fontsize': '25', 'fontweight' : '3'})
ax.annotate('Source:US Police Shootings',xy=(0.01, .08),  xycoords='figure fraction', horizontalalignment='left', verticalalignment='top', fontsize=20, color='#555555')
sm = plt.cm.ScalarMappable(cmap='Reds', norm=plt.Normalize(vmin=vmin, vmax=vmax))
sm.set_array([]) # empty array so the colour bar can be drawn from the mappable
cbar = fig.colorbar(sm,ax=ax)

**Observation** California and Texas have recorded the most police shooting deaths over the years.
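
When the shapes are joined to the counts, any state in the shape file without a matching row in the counts ends up with a missing value, which the map then draws without a colour. The sketch below shows the same left join with plain pandas on made-up toy frames; `shapes` and `state_counts` are hypothetical stand-ins for `map_df` and `state_summ`, and the counts are invented.

```python
import pandas as pd

# Hypothetical stand-in for the shape file's attribute table.
shapes = pd.DataFrame({
    'STATE_ABBR': ['CA', 'TX', 'RI'],
    'STATE_NAME': ['California', 'Texas', 'Rhode Island'],
})

# Hypothetical stand-in for the per-state counts (made-up numbers).
state_counts = pd.DataFrame({'state': ['CA', 'TX'], 'Counts': [3, 2]})

# Left join on the index: every shape is kept, unmatched states get NaN in 'Counts'.
merged = shapes.set_index('STATE_ABBR').join(state_counts.set_index('state'))

# List the states that found no match, then treat them as zero deaths.
missing = merged[merged['Counts'].isna()].index.tolist()
merged['Counts'] = merged['Counts'].fillna(0)

print('States without a match:', missing)
print(merged)
```

Whether zero is the right fill value depends on why the state is missing, for example a genuine absence of cases versus a mismatch in the abbreviations.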

# Conclusion

1. The vast majority of the victims are male; only a small percentage are female.
2. Whites account for the largest number of victims of police shootings in the United States, followed by Blacks and Hispanics.
3. Most of the victims who were shot fall into the category of attacking with a weapon. Most of them did not attempt to flee after the attack and were killed on the spot.
4. The southern and south-western states have the highest numbers of killings, as evident from the heat map. Los Angeles and Phoenix top the list of cities where the most people have been killed in police shootings.
5. The count of deaths is decreasing over the years.

