# Geospatial Analysis of Police Shootings in the USA
Sadly, fatal police shootings in the United States appear to be on the rise. In 2018 there were 996 fatal police shootings, and in 2019 the figure rose to 1,004. As of June 30, 2020, a total of 506 civilians had been shot, 105 of whom were Black. The rate of fatal police shootings among Black Americans was also much higher than for any other ethnicity, standing at 31 fatal shootings per million of the population as of June 2020.

<img src="https://www.chinadaily.com.cn/world/images/attachement/jpg/site1/20160708/b083fe9fe78518e95a6503.jpg">


### Importing Libraries

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import os
import math

#list the available input files
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
#libraries to plot
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
import plotly
plotly.offline.init_notebook_mode(connected=True)

#Calendar Heatmap
!pip install calmap
import calmap

#GEOSPATIAL LIBRARIES
import geopandas as gpd
import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

%matplotlib inline

### Loading the Dataset and Its Basic Characteristics

In [None]:
#load dataset into pandas dataframe
df = pd.read_csv('/kaggle/input/data-police-shootings/fatal-police-shootings-data.csv', parse_dates=["date"])
df.head()

In [None]:
#check information such as the non-null count and datatype of each column
df.info()

In [None]:
#number of null values in each column
df.isnull().sum()

In [None]:
#remove null values, i.e. drop every row that has at least one NaN value
df=df.dropna(axis=0)
df.isnull().sum()

> *The dataset is now free of null values. The next task is to check the unique values in each column.*
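
Dropping every row that has any missing value can discard more data than a particular chart needs. As a rough illustration on a made-up three-row frame (the values are hypothetical, not taken from the dataset), `dropna(subset=...)` keeps the rows that are complete in just the columns being analysed:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, only to show how dropna behaves
toy = pd.DataFrame({
    "age": [34.0, np.nan, 51.0],
    "race": ["W", "B", None],
    "state": ["TX", "CA", "NY"],
})

# Drop a row if ANY column is missing (what dropna(axis=0) does)
strict = toy.dropna(axis=0)

# Drop a row only if a column needed for one specific analysis is missing
by_state = toy.dropna(subset=["state"])

print(len(toy), len(strict), len(by_state))
```

Whether the stricter or the per-column approach is preferable depends on how many rows each one removes from the real data.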

### Description of each column

1. "id": unique id for the incident
2. "name": name of the person who was shot
3. "date": date on which incident happened
4. "manner_of_death": Type: shot, shot and Tasered
5. "armed": Type: gun, knife etc.
6. "age": age of the person
7. "gender": male/female
8. "race": W: White, B: Black, A: Asian, N: Native American, H: Hispanic, O: Other
9. "city": city of incident
10. "state": state of incident
11. "signs_of_mental_illness": True/ False 
12. "threat_level": attack, other or undetermined
13. "flee": car, foot, not-fleeing, other
14. "body_camera": whether the officer had a body camera on

In [None]:
#number of unique values in each column
df.nunique()

# Relation Between Various Attributes

In [None]:
df.head()

In [None]:
df_tmp = df["body_camera"].value_counts()
fig = px.bar(df_tmp,title="Body Camera Available",color=df_tmp.index)
fig.show()

In [None]:
df_tmp = df["armed"].value_counts()[:10]
fig = px.bar(df_tmp,color=df_tmp.index,
             title="Top 10 Cases with Armed Types", color_discrete_sequence= px.colors.sequential.Plasma_r)
fig.show()

In [None]:
df_tmp = df[df['armed']=='unarmed']['race']
fig = px.histogram(df_tmp,x='race',title="Unarmed People Shot, by Race",color='race',color_discrete_sequence=px.colors.qualitative.T10)
fig.show()


In [None]:
df["manner_of_death"].value_counts()

In [None]:
df_tmp=df[df['manner_of_death']=='shot']
fig = px.histogram(df_tmp,x='race',title="People Shot (Manner of Death: shot), by Race",color='race',color_discrete_sequence=px.colors.qualitative.T10)
fig.show()

In [None]:
df_tmp=df["age"]
fig = px.histogram(df_tmp,histnorm='probability density', title="Probability Density of Age")
fig.show()
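
For reference, with a probability-density normalisation each bar's height is the bin count divided by (total count × bin width), so the bar areas sum to 1. The sketch below checks that arithmetic with NumPy on made-up ages; it mirrors the definition only and does not reproduce plotly's own binning.

```python
import numpy as np

ages = np.array([18, 22, 25, 31, 34, 40, 47, 58])  # hypothetical ages
counts, edges = np.histogram(ages, bins=4)
widths = np.diff(edges)

# Height of each bar under a probability-density normalisation
density = counts / (counts.sum() * widths)

# The bar areas (height x width) should add up to 1
print(density, (density * widths).sum())
```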

# Calendar Heatmap of Incidents

In [None]:
df_date_group = df.groupby(df["date"])
incidents = df_date_group["id"].count()
#only days with at least one incident appear here, so the minimum is never 0
print(incidents.max(), "is the maximum number of incidents on a single day")
print(incidents.min(), "is the minimum number of incidents on a day with at least one incident")
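
Because `groupby` only produces rows for dates that appear in the data, the minimum above can never be lower than 1. A minimal sketch, using made-up dates rather than the real dataset, of how days without incidents could be included by reindexing over the full date range:

```python
import pandas as pd

# Hypothetical incident dates: two on Jan 1, none on Jan 2, one on Jan 3
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-03"]),
})

per_day = toy.groupby("date")["id"].count()

# Add the missing calendar days with a count of 0
all_days = pd.date_range(per_day.index.min(), per_day.index.max(), freq="D")
per_day_full = per_day.reindex(all_days, fill_value=0)

print(per_day.min(), per_day_full.min())
```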

In [None]:
#calendar heatmap showing the number of cases on each day
fig,ax = calmap.calendarplot(incidents, monthticks=1, daylabels='MTWTFSS',
                    fillcolor='grey', linewidth=1,
                    fig_kws=dict(figsize=(15,15)))

#fig.colorbar(ax[0].get_children()[1],ax=ax, cmap=plt.cm.get_cmap('Reds', 9), orientation='horizontal',label='Number of incidents')

# Geospatial Analysis

##### The dataset gives the city in which each incident occurred, so the first task is to obtain each city's latitude and longitude so that it can be located on a map.

In [None]:
#with 2116 unique city values, the geocoder takes too long and raises timeout errors,
#so I used a Google Sheets add-on (Geocode by Awesome Table) to geocode the values instead.
"""geolocator = Nominatim(user_agent="my_application")
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)
df1['location'] = df1['city'].apply(geocode)
df1['latitude'] = df1['location'].apply(lambda loc: loc.latitude if loc else None)
df1['longitude'] = df1['location'].apply(lambda loc: loc.longitude if loc else None)
df1.head()
"""
print("The geocoded cities are available in the datasets as cities.csv")
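
One way to keep a geocoding run like the disabled one above manageable is to look up each distinct (city, state) pair only once and reuse the result. The sketch below illustrates that idea only: `fake_geocode` is a hypothetical stand-in for a real geocoder call, and its coordinates are approximate values included so the example runs offline.

```python
from functools import lru_cache

# Hypothetical stand-in for a geocoding service; approximate coordinates
_LOOKUP = {
    "Portland, OR": (45.52, -122.68),
    "Portland, ME": (43.66, -70.26),
}
calls = []  # records every lookup that actually reaches the "service"

def fake_geocode(query):
    calls.append(query)
    return _LOOKUP.get(query)  # None when the place is unknown

@lru_cache(maxsize=None)
def geocode_city(city, state):
    # Including the state disambiguates cities that share a name
    return fake_geocode(f"{city}, {state}")

rows = [("Portland", "OR"), ("Portland", "ME"), ("Portland", "OR"), ("Nowhere", "ZZ")]
coords = [geocode_city(city, state) for city, state in rows]

print(coords)
print(len(calls), "lookups for", len(rows), "rows")
```

With a real service, `fake_geocode` would be replaced by the rate-limited geocoder, and the cache would cut the number of requests to the number of distinct city and state pairs.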

In [None]:
#dataframe containing the latitude and longitude of each city in the dataset
city_df = pd.read_csv("/kaggle/input/cities-geocoded-for-data-police-shootings/cities.csv")
city_df.drop(["Unnamed: 0","geom","address"], axis=1, inplace=True)
city_df

In [None]:
#add the latitude and longitude of each city to the shootings data
df_with_cities = pd.merge(df, city_df, on=["city","state"])
df_with_cities.head()
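
The default inner join silently drops incidents whose city is missing from the lookup table. A small sketch, on made-up rows with approximate coordinates, of how a left join with `indicator=True` could be used to count such unmatched incidents:

```python
import pandas as pd

# Hypothetical incidents and a lookup table that lacks one of the cities
incidents_toy = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["Austin", "Salem", "Salem"],
    "state": ["TX", "OR", "MA"],
})
cities_toy = pd.DataFrame({
    "city": ["Austin", "Salem"],
    "state": ["TX", "OR"],
    "latitude": [30.27, 44.94],
    "longitude": [-97.74, -123.04],
})

# how="left" keeps every incident; _merge records whether a match was found
checked = incidents_toy.merge(cities_toy, on=["city", "state"], how="left", indicator=True)
unmatched = checked[checked["_merge"] == "left_only"]

print(len(checked), "incidents,", len(unmatched), "without coordinates")
```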

In [None]:
#count the cases in each city; grouping by state as well keeps same-named cities in different states apart
locations = df_with_cities.groupby(["city","state"])
cases = locations["id"].count()
print(cases.max(),"is the maximum number of cases in a city and\n",cases.min(),"is the minimum number of cases in a city")
cases

In [None]:
#attach the coordinates of each city to its total number of cases
data = pd.merge(cases.reset_index(), city_df, on=["city","state"])
data = data.rename(columns = {"id":"count"})
data.head()
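
City names repeat across states, so the grouping key matters for per-city counts. A toy comparison (made-up rows) of grouping by city alone versus by city and state:

```python
import pandas as pd

# Hypothetical rows: two different cities that share the name "Springfield"
toy = pd.DataFrame({
    "id": [1, 2, 3],
    "city": ["Springfield", "Springfield", "Springfield"],
    "state": ["IL", "IL", "MO"],
})

by_city = toy.groupby("city")["id"].count()
by_city_state = toy.groupby(["city", "state"])["id"].count()

print(by_city)        # one merged group for both cities
print(by_city_state)  # one group per (city, state) pair
```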

### 1. PIN MAP: Markers showing each city and its number of cases
> Clicking a marker opens a popup with the city name and its total number of cases.

In [None]:
m = folium.Map(location=[32, -100], tiles='openstreetmap', zoom_start=3)

for idx, row in data.iterrows():
    #a formatted string gives a readable popup instead of a printed Python list
    Marker([row['latitude'], row['longitude']], popup=f"{row['city']}: {row['count']} cases").add_to(m)
m

### 2. CLUSTER MAP: Clustering the incidents by location
> Zoom in to see each cluster break down into smaller clusters and, eventually, into individual markers showing the city and the name of the person for each incident.

In [None]:
df_cluster = df_with_cities[["name","city","longitude","latitude"]]

m = folium.Map(location=[32, -100], tiles='openstreetmap', zoom_start=3)

mc = MarkerCluster()

for idx, row in df_cluster.iterrows():
    #skip incidents without coordinates
    if not math.isnan(row['longitude']) and not math.isnan(row['latitude']):
        mc.add_child(folium.Marker([row['latitude'], row['longitude']], popup=f"{row['city']}: {row['name']}"))

m.add_child(mc)
m

### 3. HEATMAP: Hot-spots of incident occurrences


In [None]:
m = folium.Map(location=[39, -119], tiles='cartodbpositron', zoom_start=4)

#drop rows without coordinates, since the heatmap cannot plot NaN locations
HeatMap(data=df_with_cities[['latitude', 'longitude']].dropna(), radius=15).add_to(m)

m
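
Many incidents share identical city coordinates, so an alternative is to aggregate them first and give the heatmap one weighted point per location. The sketch below only builds that list from made-up rows. folium's `HeatMap` accepts rows of `[lat, lon, weight]`, but whether weighting gives a clearer picture here is a judgment call.

```python
import pandas as pd

# Hypothetical geocoded incidents; two share coordinates, one has no latitude
toy = pd.DataFrame({
    "latitude": [30.27, 30.27, 44.94, float("nan")],
    "longitude": [-97.74, -97.74, -123.04, -70.0],
})

weighted = (
    toy.dropna(subset=["latitude", "longitude"])  # keep only located incidents
       .groupby(["latitude", "longitude"])
       .size()
       .reset_index(name="weight")
)

# One [lat, lon, weight] row per distinct location
heat_points = weighted[["latitude", "longitude", "weight"]].values.tolist()
print(heat_points)
```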

# State-wise Analysis of Incidents
1. Get the total number of cases in each state.
2. Show the number of cases per state on a choropleth map.


In [None]:
states_full = gpd.read_file('/kaggle/input/us-administrative-boundaries/USA_adm1.shp')
states_geom = states_full[["NAME_1","geometry"]]
states_geom = states_geom.rename(columns={"NAME_1":"name"})
states_geom.head()

In [None]:
states = pd.read_csv("../input/states-geocoded-for-data-police-shootings/states.csv")
states.head()

In [None]:
state_count = df.groupby("state")
state_count = state_count["id"].count()
state_count = state_count.reset_index().rename(columns={"id":"count"})
print(state_count["count"].max(),"is the maximum number of incidents in a state and",state_count["count"].min(),"is the minimum.")
state_count.head()

In [None]:
#attach the count of cases to each state code ("id" was already renamed to "count" above)
df_with_states = pd.merge(states, state_count, on=["state"])
df_with_states.head()

### 4. CHOROPLETH MAP: represents number of cases in respective State in US

In [None]:
#URL of the GeoJSON data for the US state boundaries
url = 'https://raw.githubusercontent.com/python-visualization/folium/master/examples/data'
state_geo = f'{url}/us-states.json'

In [None]:
m = folium.Map(location=[48, -102], zoom_start=3)

folium.Choropleth(
    geo_data=state_geo,
    name='choropleth',
    data=state_count,
    columns=['state', 'count'],
    key_on='feature.id',
    fill_color='BuPu',
    fill_opacity=0.8,
    line_opacity=0.2,
    legend_name='Number of incidents'
).add_to(m)

folium.LayerControl().add_to(m)

m
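
A choropleth only colours features whose key matches a value in the data, so it can help to compare the two sets of keys before plotting. A minimal sketch with a hand-written, hypothetical two-feature GeoJSON (the real file is the one referenced by `state_geo`; its `id` values are assumed here to be two-letter state codes):

```python
# Hypothetical miniature GeoJSON; geometries omitted for brevity
toy_geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "id": "TX", "properties": {"name": "Texas"}},
        {"type": "Feature", "id": "CA", "properties": {"name": "California"}},
    ],
}
# Hypothetical per-state counts keyed by state code
toy_counts = {"TX": 5, "DC": 1}

feature_ids = {feature["id"] for feature in toy_geojson["features"]}
data_keys = set(toy_counts)

missing_in_map = sorted(data_keys - feature_ids)   # data rows with no polygon
missing_in_data = sorted(feature_ids - data_keys)  # polygons with no data

print(missing_in_map, missing_in_data)
```

Running the same comparison against the real boundary file and `state_count["state"]` would show whether any state is left uncoloured or any count is dropped.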

### Thanks for having a look :)
Please upvote if you liked this notebook; it is the first notebook I have published on Kaggle.