# Racism or not? - Fatal Police US

Is there a correlation between the victims killed by the police and the problem of racism in the USA?

In this analysis we will look for such a correlation using several datasets, starting with the one maintained by the Washington Post since January 2015. This database was created after the killing of Michael Brown in Ferguson, which gave rise to the Black Lives Matter movement. The main database, compiled by the Post, records every person shot and killed by police in the United States.

In [None]:
# import all the libraries needed to visualize and manipulate the data

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

Here is the main dataset:

In [None]:
kill = pd.read_csv('../input/data-police-shootings-washington-post/fatal-police-shootings-data.csv', parse_dates=True, index_col='date')  # load the CSV file used in this project
kill.head()  # show the first 5 rows of the file

Let's see how many victims there were for each race.

In [None]:
# rename the codes in the race column for clarity
kill['race'] = kill['race'].replace(['B', 'A', 'W', 'O', 'H', 'N'], ['Black', 'Asian', 'White', 'Other', 'Hispanic', 'Native American'])
sns.countplot(y='race', data=kill, order=kill['race'].value_counts().index)  # count the number of victims by race
plt.xlabel('kills')

From this graph, which shows only the raw number of victims by race, there is no strong sign of discrimination against Black people, as white people were the most affected in absolute terms.
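
Raw counts do not take into account how large each group is, so a per-capita rate is a useful complement to the chart. The following is only a minimal sketch: `deaths_per_million` is a hypothetical helper that is not part of this notebook, and the group names and numbers are placeholders, not real statistics. Real use would need the actual counts from `kill['race'].value_counts()` and census population figures.

```python
import pandas as pd


def deaths_per_million(counts: pd.Series, population: pd.Series) -> pd.Series:
    """Return deaths per million residents for each group (hypothetical helper)."""
    rates = counts / population * 1_000_000
    # Groups missing from either input produce NaN and are dropped.
    return rates.dropna().sort_values(ascending=False)


# Placeholder values for illustration only, NOT real data.
counts = pd.Series({"Group A": 500, "Group B": 250})
population = pd.Series({"Group A": 200_000_000, "Group B": 40_000_000})

rates = deaths_per_million(counts, population)
print(rates)
```

In this toy example the group with fewer victims in absolute terms ends up with the higher rate per million, which is why raw counts and rates can lead to different readings.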

Now let's compare how many men and women have been killed among the various races.

In [None]:
sns.countplot(x='race', data=kill, hue='gender')  # check whether more men or more women were killed within each race
plt.xticks(rotation=90)  # rotate the X-axis labels

We can see that most of the victims are male. Why?

This does not point to a police problem with the "stronger" sex. According to my research and this link https://www.bjs.gov/index.cfm?ty=tp&tid=955, men are more likely than women to commit serious and violent crimes, while women tend to commit less serious crimes and so risk their lives less.

So, according to this data and this research, we can rule out an abuse of power based on sex.

I now want to check if the number of victims has decreased or increased from 2015 to 2019.

In [None]:
df_kill = kill.groupby('date')['state'].count()  # count the rows for each date to get the number of victims per day
kill_year = df_kill.resample('Y').sum()  # sum the daily counts to get the number of victims per year
plt.figure(figsize=(15,5))
plt.plot(kill_year[:'2019'], marker='o', label='Number of kills at the end of the year')
plt.ylabel('Number of kills')
plt.xlabel('Year')
plt.legend()
plt.show()

The number of victims has remained more or less constant from the start of the dataset up to the date considered. This leads me to think that the crime rate in America has not dropped in recent years but has stayed about the same, and that therefore there is no evident problem of racism in the police, especially against Black people.

What weapon were the victims in possession of?

In [None]:
armed = kill.armed.value_counts()

plt.figure(figsize=(9,6))
sns.barplot(x=armed[:7].index,y=armed[:7].values)
plt.ylabel('Number of victims w/ arms')
plt.xlabel('Type of weapon')
plt.title('Weapons', color='blue', fontsize=15)

Why did the police shoot? Was the victim armed?

In [None]:
sns.countplot(x='armed', data=kill, hue='manner_of_death', order = kill['armed'].value_counts().index[:7])
plt.xticks(rotation=90)  # rotate the X-axis labels
plt.xlabel('type of weapon')
plt.ylabel('number of victims')

The graph shows, for each type of weapon, how often the police only shot and how often they also used a Taser.
The Taser was used relatively little compared to the gun, even when the victim was not carrying a dangerous or ranged weapon. Depending on the case, this behavior can mean that the police were prejudiced against the victim at that moment, so they did not hesitate to shoot and kill.
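
To put numbers on what the chart shows, the share of each manner of death within each weapon category can be computed with a normalized cross-tabulation. This is a sketch on toy rows, not on the real data: the values only mimic the `armed` and `manner_of_death` columns of `kill`, and the category labels are assumed to match those in the dataset.

```python
import pandas as pd

# Toy rows standing in for the `armed` and `manner_of_death` columns of `kill`.
toy = pd.DataFrame({
    "armed": ["gun", "gun", "gun", "unarmed", "unarmed", "knife", "knife", "knife"],
    "manner_of_death": ["shot", "shot", "shot", "shot", "shot and Tasered",
                        "shot", "shot and Tasered", "shot"],
})

# normalize="index" converts each row to shares that sum to 1,
# so weapon categories of different sizes become comparable.
shares = pd.crosstab(toy["armed"], toy["manner_of_death"], normalize="index")
print(shares.round(2))
```

On the real data the same call would be `pd.crosstab(kill["armed"], kill["manner_of_death"], normalize="index")`, possibly restricted to the most frequent weapon types.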

What was the average age of the victims by state?

In [None]:
plt.figure(figsize=(15,8))
agevictim = kill.groupby('state')['age'].mean()  # average age grouped by state
sns.barplot(x=agevictim.index,y=agevictim.values, order=agevictim.sort_values().index)
plt.ylabel('Average age')
plt.title('Average age for each state')

The average age may reveal another police bias. Most of the victims are around 35 years old.

It may be that the police are actually biased, and that some of these biases depend on the victim's age. Let's check this in the graph below by comparing age with signs of mental illness and threat level.

In [None]:
plt.figure(figsize=(6,6))
sns.barplot(x= 'age', y= 'threat_level',data= kill, hue='signs_of_mental_illness')
plt.title('Mental state and attack on the police')

Apparently there is indeed a bias!

The police seem to apply the same bias in every kind of case. My reading is that, because many of the people involved show signs of mental illness, even those who do not, and with whom one could have "reasoned" before things ended in a death, are caught up in the same bias.

Racism also plays a fundamental role here because, as we can see from the threat level, there have not been many cases of direct attack. It may also be that even the smallest offense, filtered through an officer's racism, was enough to trigger the shot.

To conclude, and to check whether or not the police acted out of racism, let's see whether the officers were wearing a body camera when the victims attacked.

- One hypothesis is that officers who were not wearing a camera deliberately killed the victim and then justified themselves by claiming that the victim was attacking them.

In [None]:
sns.countplot(x='threat_level', data=kill, hue='body_camera')
plt.ylabel('Number of victims & police w/ camera')
plt.title ('Number of deaths with the body camera')

Most of the victims were killed in circumstances where not even the superiors of the officer on duty could verify what really happened.
This graph may point to a racism problem on the part of law enforcement. An officer on duty is more likely to act on his racism when not wearing a camera, because this protects him from accusations of abuse of power and, above all, of aggravated murder.
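
The share of cases recorded by a body camera at each threat level can also be read off directly as a number. The sketch below uses toy rows that only mimic the `threat_level` and `body_camera` columns of `kill`; it assumes `body_camera` is boolean, and the values are illustrative, not real data.

```python
import pandas as pd

# Toy rows standing in for the `threat_level` and `body_camera` columns of `kill`.
toy = pd.DataFrame({
    "threat_level": ["attack", "attack", "attack", "attack",
                     "other", "other", "undetermined"],
    "body_camera": [True, False, False, False, True, False, False],
})

# The mean of a boolean column is the fraction of True values,
# i.e. the share of cases in which a camera was recording.
camera_share = toy.groupby("threat_level")["body_camera"].mean()
print(camera_share)
```

Comparing these shares across threat levels is a more direct check of the hypothesis than comparing bar heights, because the groups have very different sizes.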

## Further biases of poverty and school level?

Now let's analyze two other datasets, one on the poverty rate and one on the level of schooling, to see if there are any further biases.

In [None]:
# load the datasets on the share of people over 25 who completed high school, the poverty rate, and the population estimates
school= pd.read_csv('../input/fatal-police-shootings-in-the-us/PercentOver25CompletedHighSchool.csv', encoding= 'unicode_escape')
poverty = pd.read_csv('../input/fatal-police-shootings-in-the-us/PercentagePeopleBelowPovertyLevel.csv', encoding= 'unicode_escape')
tot_people = pd.read_csv('../input/total-people-usa-2019/sub-est2019_all.csv', encoding= 'unicode_escape')
tot_people.head()

We take the number of inhabitants of each city into account so that we can compute a population-weighted average for each state.

In [None]:
peoplepercity = tot_people[['NAME','POPESTIMATE2019' ]]  # keep only the columns I need in a new dataset
peoplepercity2=peoplepercity.rename(columns={'NAME':'City', 'POPESTIMATE2019': 'People_city'})  # rename the columns
peoplepercity2.head()

In [None]:
poverty.head()

In [None]:
school.head()

In [None]:
hs_pr = pd.merge(school, poverty)  # join the two dataframes with pd.merge
# replace any placeholder values for missing data with 0.0
hs_pr.replace(['-'],0.0,inplace = True)
hs_pr.replace(['(X)'],0.0,inplace = True)
hs_pr[['percent_completed_hs','poverty_rate']] = hs_pr[['percent_completed_hs','poverty_rate']].astype(float)  # convert the named columns to float
corr = pd.merge(hs_pr, peoplepercity2)  # join the poverty and school data with the population data

# drop the duplicate rows created by the merge, keeping one row per city
corr_drop=corr.drop_duplicates(subset =['City'])
corr_drop2= corr_drop.set_index('Geographic Area')
corr_drop2.head()

With the data prepared up to this point, we can compute, for each state, the population-weighted average of the poverty rate and of the percentage of people over 25 who have completed high school. See the table below.
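
A weighted average multiplies each city's rate by its population, sums the products, and divides by the total population, so large cities count for more than small ones. The sketch below shows the idea on toy data: the state codes, rates, and populations are made up, and `weighted_mean` is a hypothetical helper, not part of this notebook.

```python
import numpy as np
import pandas as pd

# Made-up cities in two fictional states, for illustration only.
toy = pd.DataFrame({
    "Geographic Area": ["AA", "AA", "BB"],
    "poverty_rate": [10.0, 20.0, 30.0],
    "People_city": [1_000, 3_000, 500],
})


def weighted_mean(group: pd.DataFrame, value_col: str, weight_col: str) -> float:
    """Average of value_col weighted by weight_col (hypothetical helper)."""
    return float(np.average(group[value_col], weights=group[weight_col]))


# One weighted average per state.
result = {
    state: weighted_mean(group, "poverty_rate", "People_city")
    for state, group in toy.groupby("Geographic Area")
}
print(result)
```

For state "AA" the weighted average is (10 × 1000 + 20 × 3000) / 4000 = 17.5, whereas the plain mean of the two cities would be 15.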

In [None]:
# compute the weighted average of the two columns to get a dataset with the average rates per state
poverty_average = corr_drop2.groupby('Geographic Area').apply(lambda x: np.average(x.poverty_rate, weights= x.People_city))
school_average = corr_drop2.groupby('Geographic Area').apply(lambda x: np.average(x.percent_completed_hs, weights= x.People_city))
average = pd.DataFrame({'poverty_average': poverty_average, 'school_average': school_average })
average.reset_index(inplace=True)
average.head()

In [None]:
# to correlate properly, extract the number of victims per state from the main dataset
killperstate= pd.DataFrame(kill.state.value_counts())
killperstate.reset_index(inplace=True)
killperstate.columns=['Geographic Area','Tot_kills']


killperstate.head()

In [None]:
average_kills = pd.merge(average, killperstate)  # join the dataset created earlier with the victims dataset
average_kills.set_index('Geographic Area', inplace=True)
average_kills.head()

Having also taken into account the number of victims by state, we begin to see if there is a poverty bias. Are the police crueler in the poorest states? Is there a correlation?

In [None]:
# jointplot creates its own figure, so the size and labels are set on the returned grid
g = sns.jointplot(x='Tot_kills', y='poverty_average', data=average_kills, kind="kde", height=8)
g.set_axis_labels('Number of kills', 'Poverty rate')

So is there a poverty bias when comparing the number of victims to the poverty rate?

NO!
There is no poverty bias. If there were one, the darkest region of the graph would sit higher and further to the right. This does not rule out that, when the police see a man or woman of any race who looks to be in "bad shape", they are (in most cases) more than ready to shoot.
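
A density plot gives a visual impression. A correlation coefficient summarizes the same relationship in one number. The sketch below computes Pearson's r on toy values that only mimic the `Tot_kills` and `poverty_average` columns of `average_kills`; the numbers are made up and say nothing about the real data.

```python
import pandas as pd

# Made-up values for five fictional states, for illustration only.
toy = pd.DataFrame({
    "Tot_kills": [10, 20, 30, 40, 50],
    "poverty_average": [12.0, 15.0, 11.0, 16.0, 13.0],
})

# Pearson's r lies between -1 and 1; values near 0 indicate
# little or no linear relationship between the two columns.
r = toy["Tot_kills"].corr(toy["poverty_average"])
print(round(r, 3))
```

On the real table the equivalent call would be `average_kills["Tot_kills"].corr(average_kills["poverty_average"])`.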

We also compare the number of victims with the percentage of people over 25 who have completed high school. Are people in states with fewer graduates more likely to be killed by the police?

In [None]:
# jointplot creates its own figure, so the size and labels are set on the returned grid
g = sns.jointplot(x='Tot_kills', y='school_average', data=average_kills, kind="kde", height=8)
g.set_axis_labels('Number of kills', 'Percentage of people over 25 who completed high school')

Here we have a second graph, similar to the previous one, which shows not the poverty rate but the percentage of people over 25 who have completed high school.

The rate is very high everywhere, and the density plot shows two dark spots. The two spots tell the same story. In the upper one, the more people have completed their studies, the fewer the victims. In the lower one, the fewer people have completed their studies, the more the victims, as the wider spread of this second spot also suggests.

## Conclusion

What conclusion can be drawn from this analysis?

In my view, based on what I have analyzed, racism is present in the police, partly because of the biases that exist and show themselves. Examples are the cases of the body camera and of unarmed people, where the police killed anyway even when the person who committed the "crime" could have been reasoned with.

Obviously we cannot tar everyone with the same brush, and for this reason we cannot condemn all the police in the United States.
