### This kernel is meant for educational purposes and isn't intended to hurt anyone's sentiments. Its objective is to explore the dataset, not to objectify or degrade any community.

# Minneapolis Police Interactions: A Detailed Analysis

Minneapolis is the largest city in the U.S. state of Minnesota and the principal city of the 16th-largest metropolitan area in the United States.

The dataset records police stops and use-of-force incidents involving the Minneapolis Police Department.

Let us start by importing the libraries we need.

In [1]:
import pandas as pd
import numpy as np
from collections import Counter
import plotly.express as px

Now we'll load the two .csv files into DataFrames.

In [2]:
df = pd.read_csv('/kaggle/input/minneapolis-police-stops-and-police-violence/police_stop_data.csv', low_memory = False)
force_df = pd.read_csv('/kaggle/input/minneapolis-police-stops-and-police-violence/police_use_of_force.csv')

In [3]:
df.head()

Unnamed: 0,OBJECTID,masterIncidentNumber,responseDate,reason,problem,callDisposition,citationIssued,personSearch,vehicleSearch,preRace,race,gender,lat,long,x,y,policePrecinct,neighborhood,lastUpdateDate
0,1,16-395258,2016/10/31 22:40:47+00,,Suspicious Person (P),BKG-Booking,,YES,NO,Black,Black,Male,44.97957,-93.27257,-10383060.0,5618306.0,1.0,Downtown West,2017/08/08 10:25:31+00
1,2,16-395296,2016/10/31 23:06:36+00,,Traffic Law Enforcement (P),TAG-Tagged,,NO,NO,Unknown,Black,Male,44.962689,-93.275921,-10383430.0,5615650.0,5.0,Steven's Square - Loring Heights,2017/08/08 10:26:13+00
2,3,16-395326,2016/10/31 23:20:54+00,,Attempt Pick-Up (P),RFD-Refused,,NO,NO,Unknown,Unknown,Unknown,45.024836,-93.288069,-10384780.0,5625432.0,4.0,Webber - Camden,2017/08/08 10:24:35+00
3,4,16-395328,2016/10/31 23:23:20+00,,Suspicious Person (P),BKG-Booking,,YES,NO,Black,Black,Male,44.94656,-93.24741,-10380250.0,5613112.0,3.0,Corcoran,2017/08/08 10:25:31+00
4,5,16-395333,2016/10/31 23:26:05+00,,Suspicious Vehicle (P),GOA-Gone on Arrival,,NO,NO,Other,Unknown,Male,44.90617,-93.25501,-10381100.0,5606762.0,3.0,Hale,2017/08/08 10:25:03+00


Let us inspect the DataFrame and check which columns have missing values.

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 161644 entries, 0 to 161643
Data columns (total 19 columns):
 #   Column                Non-Null Count   Dtype  
---  ------                --------------   -----  
 0   OBJECTID              161644 non-null  int64  
 1   masterIncidentNumber  161644 non-null  object 
 2   responseDate          161644 non-null  object 
 3   reason                125294 non-null  object 
 4   problem               161644 non-null  object 
 5   callDisposition       159402 non-null  object 
 6   citationIssued        110318 non-null  object 
 7   personSearch          141417 non-null  object 
 8   vehicleSearch         141417 non-null  object 
 9   preRace               141417 non-null  object 
 10  race                  141417 non-null  object 
 11  gender                141417 non-null  object 
 12  lat                   161644 non-null  float64
 13  long                  161644 non-null  float64
 14  x                     161644 non-null  float64
 15  
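
As a minimal sketch of the missing-value check, `isna().sum()` gives a per-column count that is easier to scan than the `info()` listing. The small `sample` frame below is made up for illustration (including the placeholder value `'Investigative'`); only the column names come from the dataset.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; only the column names match the stops dataset.
sample = pd.DataFrame({
    'reason': [np.nan, 'Investigative', np.nan],
    'problem': ['Suspicious Person (P)', 'Traffic Law Enforcement (P)', 'Truancy (P)'],
    'race': ['Black', np.nan, 'Unknown'],
})

# Count missing entries per column and express them as a share of all rows.
missing = sample.isna().sum()
percent = (missing / len(sample) * 100).round(1)

# Keep only the columns that actually have gaps, largest first.
summary = pd.DataFrame({'missing': missing, 'percent': percent})
summary = summary[summary['missing'] > 0].sort_values('missing', ascending=False)
print(summary)
```

Running the same three steps on `df` instead of `sample` should list the columns with missing values in the real data.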

# Distribution of Cases Per Year

Now we'll extract the year from each response date and count how many cases fall in each year.

The counts go into a DataFrame that we can plot.

In [5]:
year_values = []
for i in range(len(df)):
    date = df['responseDate'][i].split(" ")[0]
    year = date.split("/")[0]
    year_values.append(year)
    
year_counts = dict(Counter(year_values))
year_counts = {'year': list(year_counts.keys()), 'count': list(year_counts.values())}
years_df = pd.DataFrame(year_counts)
years_df

Unnamed: 0,year,count
0,2016,6822
1,2017,54156
2,2018,47977
3,2019,36540
4,2020,16149
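
The loop above can also be written without explicit iteration. The sketch below is one possible alternative, not the notebook's method: it takes the text before the first `/` as the year and lets `value_counts` do the counting. Apart from the first timestamp, which appears in `df.head()`, the sample rows are made up.

```python
import pandas as pd

# Illustrative timestamps in the same format as `responseDate`.
# Only the first one appears in the data preview; the rest are made up.
sample = pd.DataFrame({
    'responseDate': [
        '2016/10/31 22:40:47+00',
        '2017/01/15 08:12:03+00',
        '2017/06/02 17:45:10+00',
        '2018/03/09 11:00:00+00',
    ]
})

# The year is the text before the first '/', so no Python loop is needed.
sample['year'] = sample['responseDate'].str.split('/').str[0]

# Count rows per year and turn the result into a two-column DataFrame.
years_sample_df = (
    sample['year']
    .value_counts()
    .sort_index()
    .rename_axis('year')
    .reset_index(name='count')
)
print(years_sample_df)
```

As in the original loop, the years stay strings, which is convenient because Plotly then treats them as category labels rather than numbers.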


Here is our first donut chart, which contains the distribution of cases recorded in each year:

In [6]:
fig_yearly = px.pie(years_df, values = 'count', names = 'year', title = 'Yearly Cases Distribution', hole = .5, color_discrete_sequence = px.colors.diverging.Portland)
fig_yearly.show()

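Most cases were recorded in 2017 (54,156) and 2018 (47,977). The first rows shown earlier date from 31 October 2016, so the 2016 total likely covers only part of that year.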

We'll apply the same counting step to the other variables.

# Distribution of Case Types
Let us see the distribution of cases on the basis of the problem:

In [7]:
problem_counts_dict = dict(Counter(df['problem']))
problem_df_dict = {'problem': list(problem_counts_dict.keys()), 'count': list(problem_counts_dict.values())}

problem_df = pd.DataFrame(problem_df_dict)
problem_df

Unnamed: 0,problem,count
0,Suspicious Person (P),44509
1,Traffic Law Enforcement (P),79183
2,Attempt Pick-Up (P),3615
3,Suspicious Vehicle (P),34164
4,Curfew Violations (P),120
5,Truancy (P),53
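
To put these counts in proportion, here is a small sketch that converts them into percentage shares. The numbers are copied from the table above; the variable names are only illustrative.

```python
# Counts copied from the table above.
problem_counts = {
    'Suspicious Person (P)': 44509,
    'Traffic Law Enforcement (P)': 79183,
    'Attempt Pick-Up (P)': 3615,
    'Suspicious Vehicle (P)': 34164,
    'Curfew Violations (P)': 120,
    'Truancy (P)': 53,
}

total = sum(problem_counts.values())

# Sort by count, largest first, and convert each count to a percentage.
ranked = sorted(problem_counts.items(), key=lambda kv: kv[1], reverse=True)
shares = {problem: round(100 * count / total, 1) for problem, count in ranked}

for problem, pct in shares.items():
    print(f'{problem:<30}{pct:>6.1f}%')
```

The total matches the 161,644 rows reported by `df.info()`, which is a quick check that no category was dropped.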


In [8]:
fig_problem = px.pie(problem_df, values = 'count', names = 'problem', title = 'Type of Cases', hole = .5, color_discrete_sequence = px.colors.sequential.Agsunset)
fig_problem.show()

Traffic law enforcement accounts for the largest number of cases (79,183), followed by suspicious-person (44,509) and suspicious-vehicle (34,164) reports.

# Interactive Maps

Now we'll use an interactive map to see where the cases were recorded:

In [9]:
import folium
from folium.plugins import FastMarkerCluster
locations = df[['lat', 'long']]
locationlist = locations.values.tolist()

In [10]:
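# Use a descriptive name instead of `map`, which would shadow Python's built-in function.
stops_map = folium.Map(location=[44.986656, -93.258133], zoom_start=12)
FastMarkerCluster(data=list(zip(df['lat'].values, df['long'].values))).add_to(stops_map)
stops_map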

**This map is interactive. Click on the orange clusters to see more cases in the neighborhood.**

**Each cluster shows the total number of cases in the surrounding area, which is highlighted in blue on hover.**

# Distribution of Races

In [11]:
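# Assign the result back; fillna(inplace=True) on a selected column may not update df in recent pandas versions.
df['race'] = df['race'].fillna('No Data')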
race_counts_dict = dict(Counter(df['race']))

race_counts_dict['Unknown'] += race_counts_dict['No Data']
del race_counts_dict['No Data']

race_df_dict = {'race': list(race_counts_dict.keys()), 'count': list(race_counts_dict.values())}

race_df = pd.DataFrame(race_df_dict)
race_df

Unnamed: 0,race,count
0,Black,49940
1,Unknown,54287
2,East African,7280
3,White,34737
4,Latino,5664
5,Asian,1935
6,Native American,4183
7,Other,3618
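
The merge of missing values into `Unknown` can also be done in one step. The sketch below is an alternative under the assumption that missing entries should simply count as `Unknown`; the sample values are made up.

```python
import numpy as np
import pandas as pd

# Made-up sample; the real column is df['race'].
sample_race = pd.Series(
    ['Black', np.nan, 'Unknown', 'White', np.nan, 'Black'],
    name='race',
)

# Filling missing values with 'Unknown' directly removes the need for a
# temporary 'No Data' label and a manual merge afterwards.
race_sample_df = (
    sample_race
    .fillna('Unknown')
    .value_counts()
    .rename_axis('race')
    .reset_index(name='count')
)
print(race_sample_df)
```

This keeps the existing `Unknown` rows and the filled rows in a single category from the start.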


In [12]:
fig_race = px.pie(race_df, values = 'count', names = 'race', title = 'Distribution of Races', hole = .5, color_discrete_sequence = px.colors.diverging.Temps)
fig_race.show()

Let us now turn to the second dataset, which records use-of-force incidents:

In [13]:
force_new = force_df[['ForceType', 'EventAge', 'TypeOfResistance', 'Is911Call']]
force_new.head()

Unnamed: 0,ForceType,EventAge,TypeOfResistance,Is911Call
0,Bodily Force,44.0,Fled in Vehicle,No
1,Bodily Force,17.0,Commission of Crime,No
2,Chemical Irritant,24.0,Commission of Crime,Yes
3,Bodily Force,42.0,Unspecified,Yes
4,Taser,41.0,Commission of Crime,Yes


# Forces Used by the Police

Let us now see the types of force used by the police in these incidents:

In [14]:
force_counts_dict = dict(Counter(force_new['ForceType']))

force_counts_dict['Unknown'] = force_counts_dict[np.nan]
del force_counts_dict[np.nan]

force_df_dict = {'force': list(force_counts_dict.keys()), 'count': list(force_counts_dict.values())}

force_type_df = pd.DataFrame(force_df_dict)
force_type_df

Unnamed: 0,force,count
0,Bodily Force,20694
1,Chemical Irritant,4172
2,Taser,2698
3,Improvised Weapon,330
4,Baton,57
5,Police K9 Bite,289
6,Firearm,44
7,Less Lethal Projectile,16
8,Gun Point Display,439
9,Maximal Restraint Technique,128
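
A note on the `np.nan` lookup used above: NaN never compares equal to itself, so a dictionary can only find the key when it is given the very same NaN object, and the lookup raises `KeyError` when the column has no missing values at all. The sketch below shows the pitfall and one more robust option (filling before counting). The sample values are made up.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Two separately created NaN objects are neither equal nor identical,
# so Counter stores them under two different keys.
nan_a, nan_b = float('nan'), float('nan')
split_counts = Counter(['Taser', nan_a, nan_b])
print(len(split_counts))

# Filling missing values first avoids NaN keys altogether.
sample_force = pd.Series(
    ['Bodily Force', np.nan, 'Taser', 'Bodily Force', None],
    name='ForceType',
)
force_sample_counts = sample_force.fillna('Unknown').value_counts()
print(force_sample_counts)
```

With this approach an `Unknown` row appears only when the column really contains missing values.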


In [15]:
fig_force = px.bar(force_type_df, x = 'force', y = 'count')
fig_force.show()

Bodily force is by far the most common type recorded (20,694 incidents), followed by chemical irritants (4,172) and Tasers (2,698).

# Distribution of Ages of People Involved

The histogram below suggests that most people involved in these incidents were between 20 and 40 years old.

In [16]:
fig_age_hist = px.histogram(force_new, x = 'EventAge', nbins=10, opacity = 0.7)
fig_age_hist.show()
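
To put a number on an age range rather than reading it off the histogram, the ages can be grouped into explicit bins. This is a sketch only: the bin edges and labels are assumptions, and the five ages are the ones shown in `force_new.head()`, far too few to draw conclusions from.

```python
import pandas as pd

# The five ages visible in force_new.head(); for illustration only.
ages = pd.Series([44.0, 17.0, 24.0, 42.0, 41.0], name='EventAge')

# Assumed bin edges; each bin includes its upper edge, e.g. (20, 40].
bins = [0, 20, 40, 60, 120]
labels = ['0-20', '21-40', '41-60', '61+']

age_groups = pd.cut(ages, bins=bins, labels=labels)

# Share of people per age group, in bin order.
age_share = age_groups.value_counts(normalize=True).sort_index()
print(age_share)
```

Applying the same steps to `force_new['EventAge']` would give the actual share of each age group.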

# Distribution of Types of Resistance

The column contains the same category written in several different ways.

For example, some rows have 'Assaulted Police Horse' as the value, which means the same as 'Assaulting Police Horse'. We need to merge such values.

Otherwise the same type of resistance is split across different bins and the plot becomes misleading. We'll normalize the values and merge the duplicates before plotting.

Here is the DataFrame after pre-processing:

In [17]:
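# Fill missing values first; assigning back avoids relying on chained in-place modification.
force_df['TypeOfResistance'] = force_df['TypeOfResistance'].fillna('Unknown')

# Strip surrounding whitespace and apply title case so that variants
# differing only in spacing or capitalization collapse into one label.
force_df['TypeNew'] = force_df['TypeOfResistance'].str.strip().str.title()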

resistance_counts_dict = dict(Counter(force_df['TypeNew']))

resistance_counts_dict['Unspecified'] += resistance_counts_dict['Unknown']
del resistance_counts_dict['Unknown']

resistance_counts_dict['Commission Of Crime'] += resistance_counts_dict['Commission Of A Crime']
del resistance_counts_dict['Commission Of A Crime']

resistance_counts_dict['Fled In Vehicle'] += resistance_counts_dict['Fled In A Vehicle']
del resistance_counts_dict['Fled In A Vehicle']

resistance_counts_dict['Assaulting Police Horse'] += resistance_counts_dict['Assaulted Police Horse']
del resistance_counts_dict['Assaulted Police Horse']

resistance_counts_df_dict = {'type': list(resistance_counts_dict.keys()), 'count': list(resistance_counts_dict.values())}

resistance_df = pd.DataFrame(resistance_counts_df_dict)
resistance_df

Unnamed: 0,type,count
0,Fled In Vehicle,825
1,Commission Of Crime,6112
2,Unspecified,3629
3,Fled On Foot,4476
4,Tensed,7938
5,Verbal Non-Compliance,3168
6,Assaulted Officer,3225
7,Assaulting Police Horse,51
8,Assaulting Police K9,14
9,Other,692
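
The manual merges above can also be expressed as a single mapping from variant labels to canonical ones. The sketch below assumes the same four merges as the cell above; the raw sample strings are made up to illustrate spacing and capitalization variants.

```python
import pandas as pd

# Made-up raw values showing the kinds of variants that need merging.
raw = pd.Series([
    ' commission of a crime',
    'Commission of Crime',
    'Fled in a Vehicle',
    'fled in vehicle ',
    'Assaulted Police Horse',
    None,
    'Unspecified',
])

# Variant label -> canonical label (same merges as in the cell above).
canonical = {
    'Unknown': 'Unspecified',
    'Commission Of A Crime': 'Commission Of Crime',
    'Fled In A Vehicle': 'Fled In Vehicle',
    'Assaulted Police Horse': 'Assaulting Police Horse',
}

# Fill gaps, normalize spacing and case, then apply the mapping.
cleaned = raw.fillna('Unknown').str.strip().str.title().replace(canonical)
resistance_sample_counts = cleaned.value_counts()
print(resistance_sample_counts)
```

Keeping the merges in one dictionary makes it easy to add a new variant later, and a variant that is absent from the data does not raise a `KeyError`.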


In [18]:
fig_resistance = px.pie(resistance_df, values = 'count', names = 'type', title = 'Distribution of Resistance', hole = .5, color_discrete_sequence = px.colors.diverging.Picnic)
fig_resistance.show()

# How many people called 911?

In [19]:
_911_counts_dict = dict(Counter(force_new['Is911Call']))

_911_counts_dict['Unspecified'] = _911_counts_dict[np.nan]
del _911_counts_dict[np.nan]

_911_df_dict = {'val': list(_911_counts_dict.keys()), 'count': list(_911_counts_dict.values())}

_911_df = pd.DataFrame(_911_df_dict)
_911_df

Unnamed: 0,val,count
0,No,16305
1,Yes,12690
2,Unspecified,1135


In [20]:
fig_911 = px.pie(_911_df, values = 'count', names = 'val', title = 'Distribution of 911 Calls', hole = .5, color_discrete_sequence = ['#ff4757', '#10ac84', '#2f3542'])
fig_911.show()

## Feel free to give suggestions and upvote this kernel if you loved it!