<a href="https://colab.research.google.com/github/neerajthandayan/CourseProject/blob/main/Notebooks/Measurements_of_Bias.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Measurements of Police Bias
#### Following Bowling and Phillips' (2009) comparison of criteria for estimating police bias, this notebook explores the four criteria they describe for assessing police bias from Stop & Search data:


*   Resident Population Data
*   Available Population Data
*   Crime Rates
*   Stop & Search 'Hit Rate'



In [1]:
# Importing libraries

import pandas as pd
import numpy as np

## 1. Stop & Search vs. Resident Population (Over-estimation error)
#### Here, the resident population is defined as the total population of a Police Force Area (PFA). To compare it with stop and search counts, we take the ratio of stop and search incidents for each ethnicity to the total PFA population.

In [2]:
# Importing the PFA population data and Stop & Search Data

ss_data = pd.read_csv('https://raw.githubusercontent.com/neerajthandayan/CourseProject/main/Data/ss_data.csv', index_col=0)
pop_data = pd.read_csv('https://raw.githubusercontent.com/neerajthandayan/CourseProject/main/Data/pop_data.csv', index_col=0)
rs = ss_data.merge(pop_data, on='Geography')
rs.head()

Unnamed: 0,Geography,SS_White,SS_Black,SS_Asian,SS_Other,SS_Mixed,White,Black,Asian,Other,Mixed
0,Cleveland,28793.0,227.0,605.0,150.0,136.0,526456,3156,17419,4434,5762
1,Durham,4848.0,15.0,15.0,10.0,14.0,605364,1058,5114,3030,4240
2,Northumbria,42225.0,189.0,864.0,49.0,114.0,1343747,9006,37782,17871,12447
3,Cheshire,5746.0,64.0,63.0,78.0,51.0,995998,3264,12794,5230,10423
4,Cumbria,9652.0,29.0,54.0,11.0,31.0,492257,579,2913,1605,2504


In [3]:
# Total population of each PFA (sum over the five ethnicity columns)

total = rs.iloc[:,6:].sum(axis=1).values

In [4]:
# Comparing SS values based on the Resident population

rs1 = rs.iloc[:,:6].copy()

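# Divide each ethnicity's stop and search count by the PFA's total population
for col in rs1.columns[1:6]:
  rs1[col] = rs1[col].values/total

# For each PFA, record the ethnicity with the largest ratio
# (the list is named max_obs so that it does not shadow the built-in max)
coln = rs.columns[6:]
max_obs = []
for i in range(len(rs1)):
  max_obs.append(coln[np.argmax(rs1.iloc[i,1:])])

rs1['Max_obs'] = max_obs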
rs1.head(5)

Unnamed: 0,Geography,SS_White,SS_Black,SS_Asian,SS_Other,SS_Mixed,Max_obs
0,Cleveland,0.051672,0.000407,0.001086,0.000269,0.000244,White
1,Durham,0.007834,2.4e-05,2.4e-05,1.6e-05,2.3e-05,White
2,Northumbria,0.029718,0.000133,0.000608,3.4e-05,8e-05,White
3,Cheshire,0.005591,6.2e-05,6.1e-05,7.6e-05,5e-05,White
4,Cumbria,0.019309,5.8e-05,0.000108,2.2e-05,6.2e-05,White
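The loop above can also be written without explicit iteration. The following is a minimal sketch, not the notebook's own code: it rebuilds the first two rows of the merged table shown above in a hypothetical frame `rs_demo`, and the names `ss_cols`, `pop_cols`, `total_demo` and `max_obs_demo` are introduced only for illustration.

```python
import pandas as pd

# First two rows of the merged table shown above (Cleveland and Durham)
rs_demo = pd.DataFrame({
    'Geography': ['Cleveland', 'Durham'],
    'SS_White': [28793.0, 4848.0], 'SS_Black': [227.0, 15.0],
    'SS_Asian': [605.0, 15.0], 'SS_Other': [150.0, 10.0],
    'SS_Mixed': [136.0, 14.0],
    'White': [526456, 605364], 'Black': [3156, 1058],
    'Asian': [17419, 5114], 'Other': [4434, 3030], 'Mixed': [5762, 4240],
})

ss_cols = ['SS_White', 'SS_Black', 'SS_Asian', 'SS_Other', 'SS_Mixed']
pop_cols = ['White', 'Black', 'Asian', 'Other', 'Mixed']

# Total resident population of each PFA
total_demo = rs_demo[pop_cols].sum(axis=1)

# Divide every stop and search column by the same PFA total, row by row
ratios = rs_demo[ss_cols].div(total_demo, axis=0)

# Ethnicity with the largest ratio in each PFA, with the 'SS_' prefix removed
max_obs_demo = ratios.idxmax(axis=1).str.replace('SS_', '', regex=False)
print(max_obs_demo.tolist())
```

Because every group in a PFA shares one denominator, the largest ratio always belongs to the group with the most searches in absolute terms.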


In [5]:
rs1['Max_obs'].value_counts()

# This measure over-estimates one ethnic group, 'White' in this instance, because every group's count is divided by the same total population.
# Because of this caveat, the method is disregarded and not used in this study's analysis.

White    42
Name: Max_obs, dtype: int64

## 2. Stop and Search vs. Available Population
#### Here, we calculate the ratio of stop and search incidents for each ethnicity to the total population of the respective ethnic group.
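One detail of this calculation is easy to miss: the stop and search columns and the population columns carry different labels, so the division has to pair them by position. The sketch below illustrates this on the first two rows of the merged table from Section 1. The frame `rs_demo` and the names `ss_cols`, `pop_cols`, `naive` and `rates` are assumptions made for the example.

```python
import pandas as pd

# First two rows of the merged table from Section 1 (Cleveland and Durham)
rs_demo = pd.DataFrame({
    'Geography': ['Cleveland', 'Durham'],
    'SS_White': [28793.0, 4848.0], 'SS_Black': [227.0, 15.0],
    'SS_Asian': [605.0, 15.0], 'SS_Other': [150.0, 10.0],
    'SS_Mixed': [136.0, 14.0],
    'White': [526456, 605364], 'Black': [3156, 1058],
    'Asian': [17419, 5114], 'Other': [4434, 3030], 'Mixed': [5762, 4240],
})

ss_cols = ['SS_White', 'SS_Black', 'SS_Asian', 'SS_Other', 'SS_Mixed']
pop_cols = ['White', 'Black', 'Asian', 'Other', 'Mixed']

# Dividing the frames directly aligns on column labels; 'SS_White' never
# matches 'White', so every cell of the result is NaN
naive = rs_demo[ss_cols] / rs_demo[pop_cols]

# Dividing the underlying arrays pairs the columns by position instead
rates = pd.DataFrame(
    rs_demo[ss_cols].to_numpy() / rs_demo[pop_cols].to_numpy(),
    columns=pop_cols,
)
rates.insert(0, 'Geography', rs_demo['Geography'])

# Ethnicity with the highest per-group stop and search rate in each PFA
rates['Max_obs'] = rates[pop_cols].idxmax(axis=1)
print(rates)
```

With each group divided by its own population, the group with the highest rate is no longer tied to the group with the most searches in absolute terms.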

In [6]:
ap = rs.copy()
ap_rate =  pd.DataFrame(ap.iloc[:,1:6].values/ap.iloc[:,6:].values, columns=ap.columns[6:])
ap1 = pd.concat([ap['Geography'],ap_rate],axis=1)

apmax = []
for i in range(len(ap1)):
  coln = ap.columns[6:]
  apmax.append(coln[np.argmax(ap1.iloc[i,1:])])

ap1['Max_obs'] = apmax
ap1.head(5)

Unnamed: 0,Geography,White,Black,Asian,Other,Mixed,Max_obs
0,Cleveland,0.054692,0.071926,0.034732,0.033829,0.023603,Black
1,Durham,0.008008,0.014178,0.002933,0.0033,0.003302,Black
2,Northumbria,0.031423,0.020986,0.022868,0.002742,0.009159,White
3,Cheshire,0.005769,0.019608,0.004924,0.014914,0.004893,Black
4,Cumbria,0.019608,0.050086,0.018538,0.006854,0.01238,Black


In [7]:
ap1['Max_obs'].value_counts()

## In contrast to the 'Resident Population' measure, the values here are proportional to each group's own population.
## This clearly shows the disproportionate risk of stop and search faced by the Black ethnic group in comparison to the other ethnicities.
## There are exceptions, such as North Yorkshire and Northumbria, where the 'Asian' and 'White' populations respectively are at comparatively higher risk.

Black    40
Asian     1
White     1
Name: Max_obs, dtype: int64

In [8]:
# Using the variance of the five per-ethnicity rates to estimate the most biased police departments

ap1['Bias'] = np.var(ap1.iloc[:,1:-1].values, axis=1)
ap1[['Geography','Bias']].sort_values(by='Bias', ascending=False).head(10)

Unnamed: 0,Geography,Bias
27,Metropolitan Police,0.002172
35,Dorset,0.00114
25,Norfolk,0.000636
13,Leicestershire,0.000503
31,Sussex,0.000431
38,Dyfed-Powys,0.000325
0,Cleveland,0.0003
19,West Mercia,0.000269
28,Hampshire,0.000263
30,Surrey,0.000251
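As a check on what the `Bias` column measures, the sketch below recomputes the Cleveland value from the five rates printed in the Section 2 output. `np.var` uses the population variance (`ddof=0`) by default, whereas `pandas.Series.var` defaults to the sample variance (`ddof=1`), so the two give different numbers. The variable names are hypothetical and the rates are the rounded values shown above, so the result is approximate.

```python
import numpy as np
import pandas as pd

# Cleveland's rounded stop and search rates from the Section 2 output
# (White, Black, Asian, Other, Mixed)
cleveland_rates = [0.054692, 0.071926, 0.034732, 0.033829, 0.023603]

# Population variance (divides by n); this is what np.var computes by default
pop_var = np.var(cleveland_rates)

# Sample variance (divides by n - 1); this is the pandas default
sample_var = pd.Series(cleveland_rates).var()

print(round(pop_var, 6), round(sample_var, 6))
```

The population variance reproduces the 0.0003 reported for Cleveland. Because the variance depends on the absolute size of the rates, PFAs with high overall search rates will tend to score higher on this measure.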


In [9]:
# Creating csv file for bias

ap1[['Geography','Bias']].to_csv('Police_Bias_1.csv')

## 3. Stop & Search vs. Crime Rates
#### Here, we compare the crime rate (approximated by the ratio of the number of arrests to population, by ethnicity) with the stop and search rate. We use the Spearman rank correlation because we are comparing how the discrete ethnic categories rank in terms of their arrest and stop and search rates. From this we estimate bias based on the correlation coefficient: the lower the correlation, the higher the bias, and vice versa.
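For n paired observations without ties, Spearman's coefficient can be computed as 1 - 6 Σd² / (n(n² - 1)), where d is the difference between the two ranks of each observation. With n = 5 ethnic groups the denominator is 5 × 24 = 120. The sketch below works through one example; the function name `spearman_no_ties` and the rate values are purely illustrative and are not taken from the notebook's data.

```python
import numpy as np

def spearman_no_ties(x, y):
    """Spearman's rho via 1 - 6*sum(d^2) / (n*(n^2 - 1)).

    Exact only when neither x nor y contains tied values.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # argsort of argsort gives 0-based ranks when all values are distinct
    rank_x = np.argsort(np.argsort(x)) + 1
    rank_y = np.argsort(np.argsort(y)) + 1
    d_squared = np.square(rank_x - rank_y).sum()
    return 1 - 6 * d_squared / (n * (n**2 - 1))

# Hypothetical rates for five ethnic groups in one PFA (illustrative only)
arrest_rate = [0.020, 0.050, 0.015, 0.010, 0.030]
search_rate = [0.030, 0.070, 0.020, 0.025, 0.010]

rho = spearman_no_ties(arrest_rate, search_rate)
print(round(rho, 2))
```

Here the rank differences are -1, 0, 0, -2 and 3, so Σd² = 14 and rho = 1 - 84/120 = 0.3. When ties occur, pandas' `rank` assigns average ranks by default and this shortcut formula is only approximate; the Pearson correlation of the ranks is then the safer definition.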

In [10]:
# Fetching Arrest Rate And Stop & Search Rate Data from Git repo

ss_rate = pd.read_csv('https://raw.githubusercontent.com/neerajthandayan/CourseProject/main/Data/ss_rate.csv', index_col=0)
ar_rate = pd.read_csv('https://raw.githubusercontent.com/neerajthandayan/CourseProject/main/Data/ar_rate.csv', index_col=0)

In [11]:
# Calculating the Spearman rank correlation between arrest rate and SS rate

arss = ss_rate.merge(ar_rate, on='Geography')
arss.iloc[:,1:6] = arss.iloc[:,1:6].copy().rank(axis=1)
arss.iloc[:,6:] = arss.iloc[:,6:].copy().rank(axis=1)
diffsq =  pd.DataFrame(np.square(arss.iloc[:,1:6].values - arss.iloc[:,6:].values))
arssbias = pd.concat([arss['Geography'],diffsq], axis=1)
arssbias['sumd'] = arssbias.iloc[:,1:].sum(axis=1)
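# Spearman's rho = 1 - 6*sum(d^2) / (n*(n^2 - 1)); with n = 5 ethnic groups the denominator is 5*24 = 120
arssbias['ARSS_Bias'] = 1 - ((6*(arssbias['sumd']))/120)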
arssbias[['Geography','ARSS_Bias']].sort_values(by='ARSS_Bias').head(10)

Unnamed: 0,Geography,ARSS_Bias
21,Lincolnshire,-0.3
26,North Yorkshire,0.0
23,Metropolitan Police,0.4
38,West Midlands,0.4
4,Cleveland,0.6
34,Surrey,0.6
7,Devon and Cornwall,0.6
33,Suffolk,0.6
18,Kent,0.6
16,Hertfordshire,0.6


## 4. Stop & Search vs. Hit Rate
#### The methodology is the same as in the previous section.
#### Note: Hit rate is calculated from the ratio between stop and searches and the arrests that result from them, by ethnicity.

In [12]:
# Importing hit rate data

h_rate = pd.read_csv('https://raw.githubusercontent.com/neerajthandayan/CourseProject/main/Data/hit_rate.csv', index_col=0)

In [13]:
# Measuring bias using the Spearman rank correlation between the Stop & Search rate and the hit rate

sshr = ss_rate.merge(h_rate, on='Geography')
sshr.iloc[:,1:6] = sshr.iloc[:,1:6].copy().rank(axis=1)
sshr.iloc[:,6:] = sshr.iloc[:,6:].copy().rank(axis=1)
diffsq1 =  pd.DataFrame(np.square(sshr.iloc[:,1:6].values - sshr.iloc[:,6:].values))
sshrbias = pd.concat([sshr['Geography'],diffsq1], axis=1)
sshrbias['sumd'] = sshrbias.iloc[:,1:].sum(axis=1)
sshrbias['SSHR_Bias'] = 1 - ((6*(sshrbias['sumd']))/120)
sshrbias[['Geography','SSHR_Bias']].sort_values(by='SSHR_Bias').head(10)

Unnamed: 0,Geography,SSHR_Bias
28,Northumbria,-1.0
14,Gwent,-0.9
4,Cleveland,-0.8
23,Metropolitan Police,-0.8
27,Northamptonshire,-0.8
35,Sussex,-0.8
1,Bedfordshire,-0.7
25,North Wales,-0.7
13,Greater Manchester,-0.7
12,Gloucestershire,-0.5


In [14]:
# Creating csv file

sshrbias[['Geography','SSHR_Bias']].to_csv('Police_Bias_2.csv')