https://samuelkellum.github.io/FinalTutorial/

https://github.com/samuelkellum/FinalTutorial

<h1 align='center'> Police Brutality in New Orleans </h1>
<h3> Ajit Alapati and Samuel Kellum </h3>

---

<h4>Table of contents:</h4>
<ol>
<li> Introduction </li>
<li> Data Collection </li>
<li> Extraction, Transform and Load </li>
<li> Exploratory Data Analysis and Data Visualization </li>
<li> Hypothesis Testing </li>
<li> Conclusion </li>
</ol>

---

<h2 align='center'>1. Introduction </h2>
<br>
<p>Shortly following the death of George Floyd while in the custody of the police, the topic of police brutality has gained a lot of attention across the United States. According to <a href='https://thelawdictionary.org/article/what-is-police-brutality/'>The Law Dictionary</a>, "Police brutality is the use of excessive and/or unnecessary force by police when dealing with civilians." Ever since George Floyd's death, there have been protests across the country calling to reform the police system. One of the biggest issues raised during these protests is that black people are disproportionately targeted by the police, and that police are more aggressive when encountering black people.</p>

<p>This issue is important in New Orleans because the city has a high proportion of black people, almost 60% according to <a href='https://www.census.gov/quickfacts/fact/table/US,neworleanscitylouisiana/LND110210'>the US Census</a>, compared to about 13% nationally. compared to about 13% nationally. Therefore, this issue of the police disproportionately targeting black people is a serious issue in our community.</p>

<p>For this project, we will be analyzing whether police interactions in New Orleans involving use of force disproportionately affect people of different ethnicities at different levels. For example, if race is not a factor in police interactions involving use of force, the distribution of ethnicities of these interactions should be similar to the population demographics.</p>
<p>Additionally, we will analyze for all police interactions in New Orleans involving use of force, whether people of different races are more likely to get injured.</p>
<p>We will also compare the results in New Orleans to Austin and Orlando. According to <a href='https://www.datacenterresearch.org/'>The New Orleans Data Center</a>, Austin and Orlando are two cities that New Orleans aspires to be like due to the growth of their economy and opportunities. We want to explore how the police in these cities compare to the results we find in New Orleans. We believe that the relationship between the police and its citizens is very important in a city's growth and prosperity, and we hope to shine a light on this relationship in New Orleans.</p>

<p>The reason we are doing this analysis is to better understand cities' relationships to their local governments. Interactions with the police are important and can help define how a population relates to their government. We aim to analyze police violence in general, and then look more into the racial breakdown of police violence. If a discrepancy is found among the data, we have then found an actionable pursuit for policymakers and lawyers to make. After the analysis, these policymakers will have somewhere to look for policy or social discrepancies.</p>


<p>In this analysis, we will answer the following questions:</p>
<ul>
<li> Are black people in New Orleans more likely to be involved in police interactions involving use of force? </li>
<li> Are certain races more likely to be injured in police encounters involving use of force in Austin, New Orleans, or Orlando?</li>
<li> Is race a more significant factor in police interactions in New Orleans than in Austin or Orlando?</li>
</ul>

---

<h2 align='center'>2. Data Collection </h2>


---

The first thing we need to do is collect all of the data we want to use for this project. For this project, we will be using three CSV files. Each CSV file contains information on the city police department's Use of Force or Response to Resistance. We will also extract census data from one URL containing demographic information on all three cities.

The first dataset we collected is a CSV file called <a href='https://data.nola.gov/Public-Safety-and-Preparedness/NOPD-Use-of-Force-Incidents/9mnw-mbde'>NOPD Use of Force Incidents</a>. This dataset represents each use of force incident by the New Orleans Police Department reported per NOPD Use of Force policy (starting in August 2016).

The next two CSV files are <a href='https://data.austintexas.gov/Public-Safety/2018-Response-to-Resistance-Data/rus9-w6q5'>Austin Police Department Response to Resistance</a> and <a href='https://data.cityoforlando.net/Orlando-Police/OPD-Response-To-Resistance/ap4w-p9kt'>Orlando Police Department Response to Resistance</a>. Since almost all police response to resistance involves use of force, this data provides the best comparison to the New Orleans Use of Force data.

We will also use census data containing racial demographic information for each city. One great feature of <a href='https://www.census.gov/'>census.gov</a> is that it allows users to compare multiple cities on a single table side-by-side. Not only can create a <a href='https://www.census.gov/quickfacts/fact/table/orlandocityflorida,neworleanscitylouisiana,austincitytexas/LND110210'>side-by-side table</a> to compare the racial demographics on the website, but we can easily extract this table from the URL for this project.


<h2 align='center'>3. Extraction, Transform and Load </h2>

<h4> Extraction and Load </h4>

In the first block of code below, we are importing the libraries we will need for the project.

In [1]:
## Loading libraries
# Load Pandas
import pandas as pd

# Load MatPlotLib
import matplotlib
import matplotlib.pyplot as plt
matplotlib.style.use('fivethirtyeight')

# Load Requests
import requests

# Load Scipy Stats
import scipy.stats as stats

Next, we will load the CSV police files from each city. We can load a CSV file in Pandas by using ```pd.read_csv()```. After loading the file, we can display the head of the DataFrame before performing any transformation on the DataFrame, as shown below for New Orleans.

In [2]:
#New Orleans
nopd_df = pd.read_csv('../_data/NOPD_Use_Of_Force_Incidents.csv')
nopd_df.head()

FileNotFoundError: [Errno 2] No such file or directory: '../_data/NOPD_Use_Of_Force_Incidents.csv'

We can do the same thing for Austin's and Orlando's police data. Once again, we can display the heads of the DataFrame before any tidying.

In [None]:
#Austin
austin_police_df = pd.read_csv('../_data/Austin_Response_To_Resistance.csv')
austin_police_df.head()

In [None]:
#Orlando
orlando_police_df = pd.read_csv("../_data/OPD_Response_To_Resistance.csv")
orlando_police_df.head()

Next, we need to extract the census data from the <a href='https://www.census.gov/quickfacts/fact/table/orlandocityflorida,neworleanscitylouisiana,austincitytexas/LND110210'>side-by-side</a> table we created on <a href='https://www.census.gov/'>census.gov</a>. We can use ```requests.get()``` from the library we imported, to send a GET request (a request to get the information from a website) to the side-by-side table's URL that we will pass as a parameter.

In [None]:
r = requests.get('https://www.census.gov/quickfacts/fact/table/orlandocityflorida,neworleanscitylouisiana,austincitytexas/LND110210')

To test that the GET request was successful, we can use ```r.status_code()```. If the status code is 200, that means the request has succeeded.

In [None]:
r.status_code

To load the side-by-side table into the URL, we can use ```pd.read_html()```, where we pass in the content of the URL. If we take the 2nd element (1st index) of the result, we will get the entire table from the website, which we can tidy in the next step.

In [None]:
demographic_df = pd.read_html(r.content)[1]
demographic_df.head(20)

<h4> Transformation </h4>

Now that we have extracted and loaded all of the data we need for this project, now we need to transform all of the data, by tidying it, to make it useful for the rest of the project.

We can first tidy the demographic DataFrame by keeping the rows that rows that relate to <b>Race and Hispanic Origin</b>, then renaming the columns. we will call the <b>Race and Hispanic Origin</b> column <b>Ethnicity</b> to remain consistent with the other websites.

In [None]:
demographic_df = demographic_df.loc[11:17].copy()
demographic_df = demographic_df.rename(columns={'Population' : 'Ethnicity', 'Unnamed: 1' : 'Orlando', 'Unnamed: 2' : 'New Orleans', 'Unnamed: 3' : 'Austin'})
demographic_df

Next, we need to remove the symbols in front of the numbers of each column. We can do this by slicing each cell in the city columns. The result is a clean looking DataFrame without the symbols.

In [None]:
demographic_df['Orlando'] = demographic_df['Orlando'].str[3:]
demographic_df['New Orleans'] = demographic_df['New Orleans'].str[3:]
demographic_df['Austin'] = demographic_df['Austin'].str[3:]

demographic_df

We should next slice out the percent sign from each column, convert the numbers into numeric types, then divide the result by 100 to return each value into a proportion.

In [None]:
demographic_df['Orlando'] = demographic_df['Orlando'].str[:-1]
demographic_df['Orlando'] = pd.to_numeric(demographic_df['Orlando'])
demographic_df['Orlando'] = demographic_df['Orlando'] / 100

demographic_df['New Orleans'] = demographic_df['New Orleans'].str[:-1]
demographic_df['New Orleans'] = pd.to_numeric(demographic_df['New Orleans'])
demographic_df['New Orleans'] = demographic_df['New Orleans'] / 100

demographic_df['Austin'] = demographic_df['Austin'].str[:-1]
demographic_df['Austin'] = pd.to_numeric(demographic_df['Austin'])
demographic_df['Austin'] = demographic_df['Austin'] / 100

demographic_df

The last thing we should do to tidy this data is to map <b>Race and Hispanic Origin</b> into the following categories: Asian, Black, Hispanic, and White. We should do this because these are the categories for race/ethnicity in the Orlando DataFrame, whereas New Orleans and Austin provide more categories. Since Orlando has fewer categories for race/ethnicity than the rest of the data, we have to group the number of categories for race/ethncity for the rest of the data. 

Since the city police data does not contain information on mixed individuals, we will exclude "Two or more races" from our demographic DataFrame. 

After mapping the Data, which will automatically any column we do not list as a key (in the mapping dictionary), we should use ```groupby()``` to group each of the categories, then sum the combined proportions of each group. 

In [None]:
demographic_df["Ethnicity"] = demographic_df["Ethnicity"].map({
    'White alone, percent': 'White',
    'Black or African American alone, percent(a)': 'Black',
    'American Indian and Alaska Native alone, percent(a)': 'Asian',
    'Asian alone, percent(a)': 'Asian',
    'Native Hawaiian and Other Pacific Islander alone, percent(a)': 'Asian',
    'Hispanic or Latino, percent(b)': 'Hispanic'
})

demographic_df = demographic_df.groupby("Ethnicity").sum()

Lastly, we should normalize the values in our new categories so the proportions of each column add to 1.

In [None]:
demographic_df['Orlando'] = demographic_df['Orlando'] / demographic_df['Orlando'].sum()
demographic_df['New Orleans'] = demographic_df['New Orleans'] / demographic_df['New Orleans'].sum()
demographic_df['Austin'] = demographic_df['Austin'] / demographic_df['Austin'].sum()

Now we can move onto the CSV files for each city. For each city, although almost all of the columns are informative, there are way more columns than we plan on using. We can tidy the data by only keeping the columns we plan on using.

In [None]:
nopd_df = nopd_df[['Subject Ethnicity', 'Subject Injured']]
austin_police_df = austin_police_df[["Subject Race", "Subject Ethnicity", "Subject Effects"]]
orlando_police_df = orlando_police_df[["Offenders Race", "Offenders Ethnicity", "Offender Injured"]]

To keep Orlando consistent with New Orleans and Austin, we should rename the columns from <b>"Offender"</b> to <b>"Subject"</b>.

In [None]:
orlando_police_df = orlando_police_df.rename(columns={
    "Offenders Race": "Subject Race",
    "Offenders Ethnicity": "Subject Ethnicity",
    "Offender Injured": "Subject Injured"
})

Now we can look at the heads of the new DataFrames.

In [None]:
austin_police_df.head()

In [None]:
orlando_police_df.head()

In [None]:
nopd_df.head()

If you look at the row matching to index 3 for the NOPD DataFrame, you can tell that <b>Subject Ethnicity</b> is "White | Black" and <b>Subject Injured</b> is "Yes | Yes." This represents a NOPD Use of Force case involving two subjects. Where the first subject was white and was injured, and the second subject was black and injured. As shown below, there are many rows involving more than two subjects.

In [None]:
nopd_df['Subject Ethnicity'].value_counts().head(10)

We can write code to split rows with multiple subjects into separate rows by using ```.apply()```, which will apply the ```lambda``` function splitting entires delimited by the regular expression ```r'\s*\|\s*'``` into multiple elements. We will use ```.explode()``` to transform the delimited elements into new rows.

The result is a new row for each NOPD Use of Force case involving multiple subjects.

In [None]:
nopd_df = nopd_df.apply(lambda col: col.str.split(r'\s*\|\s*').explode())
nopd_df.head(6)

The Orlando Police Department also enters police encounters with multiple subjects into one row.

In [None]:
orlando_police_df['Subject Race'].value_counts().head(10)

We can use the same code above for Orlando and Austin, except this time, the delimiter is a semicolon. We can use the same regular expression, substituting the pipe with the semicolon.

In [None]:
orlando_police_df = orlando_police_df.apply(lambda col: col.str.split(r'\s*\;\s*').explode())

In [None]:
orlando_police_df['Subject Race'].value_counts()

In [None]:
austin_police_df = austin_police_df.apply(lambda col: col.str.split("r'\s*\;\s*'").explode())

In [None]:
austin_police_df['Subject Race'].value_counts()

Next, since the subject races/ethnicities were listed differently for each city, we need to map the <b>Subject Race</b> for each city in a consistent format. For Austin, we are excluding the seven mixed subjects since there is no information on mixed subjects for New Orleans or Orlando.

In [None]:
nopd_df["Subject Ethnicity"] = nopd_df['Subject Ethnicity'].map({
    "Black": "Black",
    "White": "White",
    "W": "White",
    "Asian": "Asian",
    "Indian": "Asian",
    "Hispanic": "Hispanic",   
})

In [None]:
austin_police_df["Subject Race"] = austin_police_df["Subject Race"].map({
    'W': 'White',
    'B': 'Black',
    'A': 'Asian',
    'P': 'Asian',
    'I': 'Asian'})

In [None]:
orlando_police_df["Subject Race"] = orlando_police_df["Subject Race"].map({
    'W': 'White',
    'B': 'Black',
    'A': 'Asian'
    })

Since Orlando and Austin have an additional column indicating whether or not the subject was Hispanic, whereas New Orleans had one column for race and ethnicity, we need to combine the <b>Subject Race</b> and <b>Subject Ethnicity</b> columns into one column. For Austin and Orlando's data, we can do that by converting the Subject's "Race" to Hispanic, based on whether their ethnicity is listed as Hispanic, then setting the <b>Subject Ethnicity</b> values to the result of the <b>Subject Race</b> column.

In [None]:
austin_police_df.loc[austin_police_df['Subject Ethnicity'] == "H", 'Subject Race'] = "Hispanic"
austin_police_df["Subject Ethnicity"] = austin_police_df["Subject Race"]
del austin_police_df["Subject Race"]

In [None]:
orlando_police_df.loc[orlando_police_df['Subject Ethnicity'] == "HI", 'Subject Race'] = "Hispanic"
orlando_police_df["Subject Ethnicity"] = orlando_police_df["Subject Race"]
del orlando_police_df["Subject Race"]

The last transformation to perform on the data is in the <b>Subject Injured</b> column.
This column for New Orleans requires no transformation, as shown below.

In [None]:
nopd_df["Subject Injured"].value_counts()

Orlando requires a simple mapping to convert the two "OffenderInjured" values, as shown below, into "Yes."

In [None]:
orlando_police_df["Subject Injured"].value_counts()

In [None]:
orlando_police_df["Subject Injured"] = orlando_police_df["Subject Injured"].map({
    "Yes" : "Yes",
    "No": "No",
    "OffenderInjured": "Yes"
})

The Austin DataFrame is more specific about the subject's injury status, as shown below.

In [None]:
austin_police_df["Subject Effects"].value_counts()

Therefore, we have to map the results in this column indicating whether or not the subject was injured, applying the same criteria Orlando or New Orleans would use to classify if someone was injured or not, which is through the perspective of the officer.

The biggest reason we want to do this is to keep Austin's data consistent with New Orleans and Orlando, but we are making a big assumption by classifying (as an example) "COMPLAINT OF INJURY/PAIN BUT NONE OBSERVED" as "No" in the new <b>Subject Injured</b> column.

In [None]:
austin_police_df['Subject Injured'] = austin_police_df['Subject Effects'].map({
"NO COMPLAINT OF INJURY/PAIN": "No",
"MINOR INJURY" : "Yes",
"COMPLAINT OF INJURY/PAIN": "Yes",
"COMPLAINT OF INJURY/PAIN BUT NONE OBSERVED" : "No",
"MINOR INJURY; COMPLAINT OF INJURY/PAIN": "Yes",
"SERIOUS INJURY": "Yes",
"DEATH": "Yes",
"COMPLAINT OF INJURY/PAIN; COMPLAINT OF INJURY/PAIN BUT NONE OBSERVED": "No",
"COMPLAINT OF INJURY/PAIN BUT NONE OBSERVED; NO COMPLAINT OF INJURY/PAIN": "Yes",
"SERIOUS INJURY; MINOR INJURY": "Yes",
"SERIOUS INJURY; COMPLAINT OF INJURY/PAIN": "Yes",
"MINOR INJURY; COMPLAINT OF INJURY/PAIN BUT NONE OBSERVED": "Yes"})
del austin_police_df["Subject Effects"]

That concludes the transformation of the data. Below are the heads of all of the DataFrames. 

In [None]:
demographic_df.head()

In [None]:
nopd_df.head()

In [None]:
austin_police_df.head()

In [None]:
orlando_police_df.head()


---

<h2 align='center'> 4. Exploratory Data Analysis and Data Visualization </h2>

To compare the distributions between subject ethnicity for each city's police department and racial demographics of that city, a side-by-side pie chart would be the best visualization.

In [None]:
#New Orleans
fig, axes = plt.subplots(1,2, figsize=(25,9))
fig.suptitle('New Orleans', fontweight='bold', fontsize=40)

axes[0].pie(demographic_df['New Orleans'].sort_index(), labels=demographic_df['New Orleans'].sort_index().index, autopct='%1.2f%%')
axes[0].set_title('Population Demographics')

axes[1].pie(nopd_df['Subject Ethnicity'].value_counts().sort_index(), labels=nopd_df['Subject Ethnicity'].value_counts().sort_index().index, autopct='%1.2f%%')
axes[1].set_title('Use of Force Subject Ethnicity')

As we can see from this figure, Black people in New Orleans are more likely to be involved in police interactions involving use of force, whereas White, Hispanic, and Asian people are less likely to be involved in police interactions involving use of force. 

In [None]:
#Austin
fig, axes = plt.subplots(1,2, figsize=(25,9))
fig.suptitle('Austin', fontweight='bold', fontsize=40)

axes[0].pie(demographic_df['Austin'].sort_index(), labels=demographic_df['Austin'].sort_index().index, autopct='%1.2f%%')
axes[0].set_title('Population Demographics')

axes[1].pie(austin_police_df['Subject Ethnicity'].value_counts().sort_index(), labels=austin_police_df['Subject Ethnicity'].value_counts().sort_index().index, autopct='%1.2f%%')
axes[1].set_title('Response to Resistance Subject Ethnicity')

From this figure, we can see that Black and Hispanic people in Austin are more likely to be involved in police interactions involving use of force, whereas White and Asian people are less likely to be involved in police interactions involving use of force.

In [None]:
#Orlando
fig, axes = plt.subplots(1,2, figsize=(25,9))
fig.suptitle('Orlando', fontweight='bold', fontsize=40)

axes[0].pie(demographic_df['Orlando'].sort_index(), labels=demographic_df['Orlando'].sort_index().index, autopct='%1.2f%%')
axes[0].set_title('Population Demographics')

axes[1].pie(orlando_police_df['Subject Ethnicity'].value_counts().sort_index(), labels=orlando_police_df['Subject Ethnicity'].value_counts().sort_index().index, autopct='%1.2f%%')
axes[1].set_title('Response to Resistance Subject Ethnicity')

In this figure, we can see that Black people in New Orleans are more likely to be involved in police interactions involving use of force, whereas White, Hispanic, and Asian people are less likely to be involved in police interactions involving use of force.

To visualize the difference in ethnicity of the subject when they were injured or not, as a result of use of force, we should create a stacked bar plot from a cross-tabulation between <b>"Subject Ethnicity"</b> and <b>"Subject Injured"</b> for each city.

To create the stacked bar plot from each city, we create a crosstab with the numbers normalized, meaning that we make the numbers appear as a proportion, rather than a raw count. To find the ethnicity breakdown for both injured and not injured, we can calculate the conditional probability, ```ethnicity_given_injured``` for each ethnicity for both injured and not injured. After that, we can plot the DataFrame transposed (indicated by ```.T```) and set ```stacked=True``` to plot the stacked bar graph, as shown below. 

In [None]:
#New Orleans
proportions = pd.crosstab(nopd_df['Subject Ethnicity'], nopd_df['Subject Injured'],normalize=True)
ethnicity_given_injured = proportions.divide(proportions.sum(axis=0), axis=1)

#Transposing the above table to make Subject Injured the two bars to compare
(ethnicity_given_injured.T).plot.bar(stacked=True, figsize = (20, 9))
plt.title('New Orleans', fontweight='bold', fontsize=40)

From this figure, we can see that White people in police interactions involving use of force were more likely to be classified as injured than not injured, whereas Black people in police interactions were less likely to be classified as injured.

In [None]:
#Austin
proportions = pd.crosstab(austin_police_df['Subject Ethnicity'], austin_police_df['Subject Injured'],normalize=True)
ethnicity_given_injured = proportions.divide(proportions.sum(axis=0), axis=1)

#Transposing the above table to make Subject Injured the two bars to compare
(ethnicity_given_injured.T).plot.bar(stacked=True, figsize = (20, 9))
plt.title('Austin', fontweight='bold', fontsize=40)

Looking at this figure, we can see that White people in police interactions involving use of force were more likely to be classified as injured than not injured, whereas Black people in police interactions were less likely to be classified as injured. However, it is not as easy to notice a difference in the two bar graphs.

In [None]:
#Orlando
proportions = pd.crosstab(orlando_police_df['Subject Ethnicity'], orlando_police_df['Subject Injured'],normalize=True)
ethnicity_given_injured = proportions.divide(proportions.sum(axis=0), axis=1)

#Transposing the above table to make Subject Injured the two bars to compare
(ethnicity_given_injured.T).plot.bar(stacked=True, figsize = (20, 9))
plt.title('Orlando', fontweight='bold', fontsize=40)

In this figure, we can see that White people in police interactions involving use of force were more likely to be classified as injured than not injured, whereas Black people in police interactions were less likely to be classified as injured.

In the next section, we will test to see whether the differences in proportions for each city we visualized were statistically significant. 

---

<h2 align='center'> 5. Hypothesis Testing </h2>

<p>In this section, we will conduct a hypothesis test on each visualization we created above. For each hypothesis test, the <b>null hypothesis</b> will be that there is no difference between the distributions, and the <b>alternative hypothesis</b> is that there is a difference between the distributions.</p>

If the p-value generated from the hypothesis test is less than <b>0.05</b>, then we will <b>reject</b> the null hypothesis and accept the alternative hypothesis. However, if the p-value is not less than 0.05, then we will <b>fail to reject</b> the null hypothesis.

For each city, we will conduct a chi-square goodness of fit test to test whether the difference in use of force ethnic breakdown was statistically significant from the population demographics. We will also perform a chi-square test of independence for each city to see if the differences in ethnic breakdown for injury status were statistically significant for each city.

To compute the expected breakdown of use of force, based on population ethnic breakdown, we multiply each row in the demographic DataFrame by the number of values in each city's police data. We will compare this to the actual breakdown, which we can compute just by using ```.value_counts()``` in the <b>Subject Ethnicity</b> column.

We start by creating an empty DataFrame, then adding the expected and actual breakdown for each city as columns, with the resulting DataFrame shown below.

In [None]:
expected_actual_df = pd.DataFrame()

#New Orleans
expected_actual_df['New Orleans Expected'] = demographic_df['New Orleans'] * nopd_df["Subject Ethnicity"].value_counts().sum() 
expected_actual_df['New Orleans Actual'] = nopd_df["Subject Ethnicity"].value_counts() 

#Austin
expected_actual_df['Austin Expected'] = demographic_df['Austin'] * austin_police_df["Subject Ethnicity"].value_counts().sum() 
expected_actual_df['Austin Actual'] = austin_police_df["Subject Ethnicity"].value_counts() 

#Orlando
expected_actual_df['Orlando Expected'] = demographic_df['Orlando'] * austin_police_df["Subject Ethnicity"].value_counts().sum() 
expected_actual_df['Orlando Actual'] = austin_police_df["Subject Ethnicity"].value_counts() 

expected_actual_df

For each city, we can perform the chi-square goodness of fit test by calling ```chisquare()``` from the ```scipy.stats``` library. The two parameters we enter are the actual frequencies and the expected frequencies. The test will give us the chi-square test statistic (the higher the test statistic, the lower the p-value), and the p-value, which we will compare to 0.05 to determine whether or not we <b> reject </b> the null hypothesis.

In [None]:
#New Orleans
stats.chisquare(f_obs=expected_actual_df['New Orleans Actual'], f_exp=expected_actual_df['New Orleans Expected'])

For New Orleans, since the chi-squared test statistic is 881.13, and the p-value is well below 0.05 ( 1.0 * 10<sup>-190</sup>), we <b>reject</b> the null hypothesis which is that there is no relation between the ethnic breakdown of police use of force cases and the population ethnic breakdown.

In [None]:
#Austin
stats.chisquare(f_obs=expected_actual_df['Austin Actual'], f_exp=expected_actual_df['Austin Expected'])

For Austin, since the chi-squared test statistic is 3546, and the p-value is well below 0.05 (0), we <b>reject</b> the null hypothesis which is that there is no relation between the ethnic breakdown of police response to resistance interactions and the population ethnic breakdown.

In [None]:
#Orlando
stats.chisquare(f_obs=expected_actual_df['Orlando Actual'], f_exp=expected_actual_df['Orlando Expected'])

For Orlando, since the chi-squared test statistic is 506.93, and the p-value is well below 0.05 (1.5 * 10<sup>109</sup>), we <b>reject</b> the null hypothesis which is that there is no relation between the ethnic breakdown of police response to resistance interactions and the population ethnic breakdown.

We will also perform a chi-square test of independence for each city to see if ethnicity is independent of whether or not the subject was injured in use of force or response to resistance cases.

All we need to do is create a contingency table (or cross-tabulation), between subject ethnicity and whether or not the subject was injured, and input the table into ```chi2_contingency()``` from ```scipy.stats```. This test will give us the chi-square test statistic (the higher the value, the lower the p-value), the p-value, and an array showing the expected ethnic breakdown if ethnicity and whether or not the subject was injured was truly independent. Once again, we will compare the p-value to 0.05 to determine whether or not we <b> reject </b> the null hypothesis.

In [None]:
#New Orleans
counts = pd.crosstab(nopd_df['Subject Ethnicity'], nopd_df['Subject Injured'])
counts

In [None]:
stats.chi2_contingency(counts)

For New Orleans, since the chi-squared test statistic is 26.8, and the p-value is well below 0.05 (0.000006), we <b>reject</b> the null hypothesis which is that subject ethnicity is independent of whether or not they were injured from the police encounter.

In [None]:
counts = pd.crosstab(austin_police_df['Subject Ethnicity'], austin_police_df['Subject Injured'])
counts

In [None]:
stats.chi2_contingency(counts)

For Austin, since the chi-squared test statistic is 4.62, and the p-value is not below 0.05 (0.20), we <b>fail to reject</b> the null hypothesis which is that subject ethnicity is independent of whether or not they were injured from the police encounter.

In [None]:
counts = pd.crosstab(orlando_police_df['Subject Ethnicity'], orlando_police_df['Subject Injured'])
counts

In [None]:
stats.chi2_contingency(counts)

For Orlando, since the chi-squared test statistic is 16.1, and the p-value is below 0.05 (0.001), we <b>fail to reject</b> the null hypothesis which is that subject ethnicity is independent of whether or not they were injured from the police encounter.

---


<h2 align='center'> 6. Conclusion </h2>


<p>In conclusion, we found that in all cities, the proportion of Black people who were involved in use of force incidents was disproportional to the given city's demographic. So, there are no avenues of further policy research because this is a consistent issue across every that we analyzed. Despite this, we found that in Austin, the proportion of ethnicities that were injured vs those who were not injured was not statistically significant. Therefore, we suggest to lawmakers and policymakers to investigate more into policy discrepancies between New Orleans and Austin to improve on our policing laws. </p>
   
<p>However, it is important to note that the outcomes that we analyzed, i.e. the police data, are representative of real human interactions. Since this is true, we can also assume that human assumptions and biases are embedded into the data. Essentially, we can't take this data as truth. The cities' population data and use of force data might be even more discrepant than they currently are, or more unlikely, they are less discrepant. Police could be underreporting injuries or overreporting for a given race. Another point that is important in this discussion is how complex the inputs are to this outcome (police data). Socioeconomic status, each city's history of racial equality, and practically countless more factors go into these outcomes. Despite this, with the data we have, we think that analyzing policy would be a good start for lawmakers in trying to create better policing for the city of New Orleans. </p>