# Statistics Hate Crime Analysis Project
***

## Setup

In [0]:
# Import all necessary packages, and set all necessary settings
%config IPCompleter.greedy=True
import pandas as pd
import numpy as np
import scipy.stats as st
pd.options.display.max_rows = 500
pd.options.display.max_columns = 500

## Intro

For this project, I wanted to test whether the number of hate crimes, relative to total crimes in America, has increased since Trump's inauguration. Additionally, I wanted to sample data from a dataset myself in order to make the problem more relevant to the real world. Unfortunately, finding a dataset of all reported crimes in the country from 2010 to now proved infeasible, so I narrowed the scope of the project to crimes reported to the LAPD.

## Part 1: Hypothesis

After Trump’s inauguration, the proportion of hate crimes reported to the LAPD relative to total crimes reported to the LAPD is significantly higher than the proportion during Obama’s presidency. For my test, I treat the proportion during Obama's presidency as the population proportion, and the proportion after Trump's inauguration as the sample proportion. To test the hypothesis, I will use a one-tailed one-proportion z-test rather than a t-test, since the sample size will be far greater than 30. The level of significance for the test is α = 0.01.
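
For reference, the test statistic of a one-proportion z-test can be written as follows. This is the standard textbook form, stated here with the assumption that p̂ denotes the sample proportion, p<sub>0</sub> the population proportion under the null hypothesis, and n the sample size:

$$
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 \, (1 - p_0)}{n}}}
$$

For the one-tailed alternative used here, the p-value is the upper-tail probability P(Z > z) of the standard normal distribution, and the null hypothesis is rejected when that probability falls below α.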

## Part 2: Define Null and Alternate Hypotheses

H<sub>0</sub>: p = 0.00029306487384288103

H<sub>1</sub>: p > 0.00029306487384288103

## Part 3: Dataset Processing

I gathered my data from the LAPD's crime database, which is updated daily and dates back to 2010. The database is large (1.94 million entries). I split it into a population dataset (crimes reported before 2017) and a sample dataset (a 10% sample of crimes reported during and after 2017). I classified a crime as a hate crime if it carries a hate-crime-associated Modus Operandi (MO) code.

* Dataset source: https://data.lacity.org/A-Safe-City/Crime-Data-from-2010-to-Present/y8tr-7khq
* LAPD Modus Operandi Codes source: https://data.lacity.org/api/views/y8tr-7khq/files/3a2acdce-ef66-4cd6-b32c-ca6fa9c5336f?download=true&filename=MO_CODES_Numerical_20180627.pdf

### Part 3.1: Importing data

In [0]:
# Import the data from the csv to a Pandas data structure
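# Malformed lines are skipped; on_bad_lines replaces error_bad_lines, which was removed in pandas 2.0 (requires pandas >= 1.3)
crime_data = pd.read_csv('Crime_Data_from_2010_to_Present.csv', on_bad_lines='skip')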

# Display the first 5 rows of the Pandas data structure (so you can understand what the data looks like)
crime_data.head()

Unnamed: 0,DR Number,Date Reported,Date Occurred,Time Occurred,Area ID,Area Name,Reporting District,Crime Code,Crime Code Description,MO Codes,Victim Age,Victim Sex,Victim Descent,Premise Code,Premise Description,Weapon Used Code,Weapon Description,Status Code,Status Description,Crime Code 1,Crime Code 2,Crime Code 3,Crime Code 4,Address,Cross Street,Location
0,151521112,11/04/2015,11/03/2015,2230,,N Hollywood,1555,330,BURGLARY FROM VEHICLE,0344,33.0,F,W,707.0,GARAGE/CARPORT,,,IC,Invest Cont,330.0,,,,11100 CAMARILLO ST,,"(34.1577, -118.3727)"
1,151521113,11/04/2015,10/30/2015,200,,N Hollywood,1548,330,BURGLARY FROM VEHICLE,0344 1609 1307,44.0,F,B,108.0,PARKING LOT,,,IC,Invest Cont,330.0,,,,11100 CHANDLER BL,,"(34.1681, -118.3724)"
2,151521117,11/04/2015,11/04/2015,1400,,N Hollywood,1506,930,CRIMINAL THREATS - NO WEAPON DISPLAYED,0421,0.0,M,H,704.0,ELEMENTARY SCHOOL,511.0,VERBAL THREAT,JA,Juv Arrest,930.0,,,,7300 BAKMAN AV,,"(34.203, -118.3779)"
3,151521121,11/04/2015,04/28/2015,2125,,N Hollywood,1567,121,"RAPE, FORCIBLE",2000 0429 1241 0416 0400 0527 1813 2002,23.0,F,O,501.0,SINGLE FAMILY DWELLING,400.0,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",AO,Adult Other,121.0,,,,10700 LANDALE ST,,"(34.1513, -118.3642)"
4,151521123,11/05/2015,10/27/2015,600,,N Hollywood,1515,354,THEFT OF IDENTITY,0100 1822,25.0,F,A,501.0,SINGLE FAMILY DWELLING,,,IC,Invest Cont,354.0,,,,11700 LEMAY ST,,"(34.1912, -118.3891)"


### Part 3.2: Splitting Data into Sample and Population Datasets

In [0]:
# Extract the year from the "Date Occurred" column and copy it to a new "Year Occurred" column
crime_data['Year Occurred'] = pd.DatetimeIndex(pd.to_datetime(crime_data['Date Occurred'])).year 

# Display the first 5 rows of the dataset again, but with the "Year Occurred" column
crime_data.head() 

Unnamed: 0,DR Number,Date Reported,Date Occurred,Time Occurred,Area ID,Area Name,Reporting District,Crime Code,Crime Code Description,MO Codes,Victim Age,Victim Sex,Victim Descent,Premise Code,Premise Description,Weapon Used Code,Weapon Description,Status Code,Status Description,Crime Code 1,Crime Code 2,Crime Code 3,Crime Code 4,Address,Cross Street,Location,Year Occurred
0,151521112,11/04/2015,11/03/2015,2230,,N Hollywood,1555,330,BURGLARY FROM VEHICLE,0344,33.0,F,W,707.0,GARAGE/CARPORT,,,IC,Invest Cont,330.0,,,,11100 CAMARILLO ST,,"(34.1577, -118.3727)",2015
1,151521113,11/04/2015,10/30/2015,200,,N Hollywood,1548,330,BURGLARY FROM VEHICLE,0344 1609 1307,44.0,F,B,108.0,PARKING LOT,,,IC,Invest Cont,330.0,,,,11100 CHANDLER BL,,"(34.1681, -118.3724)",2015
2,151521117,11/04/2015,11/04/2015,1400,,N Hollywood,1506,930,CRIMINAL THREATS - NO WEAPON DISPLAYED,0421,0.0,M,H,704.0,ELEMENTARY SCHOOL,511.0,VERBAL THREAT,JA,Juv Arrest,930.0,,,,7300 BAKMAN AV,,"(34.203, -118.3779)",2015
3,151521121,11/04/2015,04/28/2015,2125,,N Hollywood,1567,121,"RAPE, FORCIBLE",2000 0429 1241 0416 0400 0527 1813 2002,23.0,F,O,501.0,SINGLE FAMILY DWELLING,400.0,"STRONG-ARM (HANDS, FIST, FEET OR BODILY FORCE)",AO,Adult Other,121.0,,,,10700 LANDALE ST,,"(34.1513, -118.3642)",2015
4,151521123,11/05/2015,10/27/2015,600,,N Hollywood,1515,354,THEFT OF IDENTITY,0100 1822,25.0,F,A,501.0,SINGLE FAMILY DWELLING,,,IC,Invest Cont,354.0,,,,11700 LEMAY ST,,"(34.1912, -118.3891)",2015


In [0]:
# Create a new dataset for crime when Obama was in office, only containing instances before 2017 
population_data = crime_data[crime_data['Year Occurred'] < 2017]

# Create a new dataset containing a 10% sample of the crime during Trump's presidency (during and after 2017)
sample_data = crime_data[crime_data['Year Occurred'] >= 2017].sample(frac=.1)

### Part 3.3: Evaluating Sampling Conditions

* **Random:** The sample is a Simple Random Sample (I checked the documentation for the function I used). 
* **10% Condition:** The population is more than 10 times as large as the sample. 
* **Success/Failure:** n * p<sub>0</sub> ≈ 14.26 ≥ 10 and n * (1-p<sub>0</sub>) ≈ 48,637.74 ≥ 10.

All conditions are met, thus we can proceed with the hypothesis test.
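
The success/failure arithmetic above can be reproduced with a short sketch. It assumes the counts reported later in this notebook (421 hate crimes among 1,436,542 population entries, and a sample of 48,652 entries); the helper name `success_failure_counts` is hypothetical and only used for illustration.

```python
# Values reported elsewhere in this notebook (Parts 4.1 and 4.2)
p_0 = 421 / 1436542   # population proportion under the null hypothesis
n = 48652             # sample size


def success_failure_counts(n, p_0):
    """Return the expected numbers of successes and failures under H0."""
    return n * p_0, n * (1 - p_0)


expected_successes, expected_failures = success_failure_counts(n, p_0)

# The normal approximation is usually considered reasonable when both are at least 10
conditions_met = expected_successes >= 10 and expected_failures >= 10

print('Expected successes: ' + str(expected_successes))
print('Expected failures: ' + str(expected_failures))
print('Success/failure condition met: ' + str(conditions_met))
```

Note that the expected number of successes is only slightly above the threshold of 10, so the normal approximation is usable but not generous here.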


## Part 4: Collecting Metrics for Statistical Test

In [0]:
# Make a list of Modus Operandi codes that indicate a hate crime
hate_crime_codes = ['0921','1907','2031','2030','2035','2036']

### Part 4.1: Collecting Population Metrics

In [0]:
# Begin counting the number of successes (reported hate crimes) in population
x_0 = 0

# Iterate through every entry in the 'MO Codes' column and check if hate crime code is present
for entry in population_data['MO Codes'].values.astype(str):
    # For each entry, iterate through the hate crime code list, and check if that code is present
    for code in hate_crime_codes:
        # If a hate crime code is present...
        if code in entry:
            # Add one to the hate crime count...
            x_0 += 1
            # And stop searching for hate crime codes in that entry
            break

# Show the number of successes in population
print('Population successes: ' + str(x_0))

# Find the number of data points in population
n_0 = population_data.values.shape[0]

# Show number of samples in population
print('Population samples: ' + str(n_0))

# Calculate the population proportion
p_0 = x_0/n_0

# Show the population proportion
print('Population proportion: ' + str(p_0))

Population successes: 421
Population samples: 1436542
Population proportion: 0.00029306487384288103
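
The counting loop above checks each code as a substring of the entry. A possible alternative, sketched below, splits each entry into whole codes and tests set membership, which makes the "at most one count per entry" rule explicit and handles missing values directly. The names `HATE_CRIME_CODES`, `is_hate_crime`, `count_hate_crimes`, and the example entries are hypothetical; this is not the code that produced the numbers above.

```python
# Hate-crime MO codes, copied from the list defined in Part 4
HATE_CRIME_CODES = {'0921', '1907', '2031', '2030', '2035', '2036'}


def is_hate_crime(mo_codes):
    """Return True if a space-separated MO code string contains a hate-crime code.

    Missing values (e.g. NaN, which is a float) are treated as non-hate-crimes.
    """
    if not isinstance(mo_codes, str):
        return False
    # Split into whole codes so that only exact codes can match
    return not HATE_CRIME_CODES.isdisjoint(mo_codes.split())


def count_hate_crimes(entries):
    """Count the entries that carry at least one hate-crime MO code."""
    return sum(is_hate_crime(entry) for entry in entries)


# Hypothetical example entries in the same format as the 'MO Codes' column
example_entries = ['0344 1609 1307', '0421 0921', float('nan'), '2030 2031']
print(count_hate_crimes(example_entries))  # prints 2
```

With the notebook's data, the same helper could be applied as `count_hate_crimes(population_data['MO Codes'])`, since iterating over a pandas Series yields its values.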


### Part 4.2: Collecting Sample Metrics

In [0]:
# Begin counting the number of successes (reported hate crimes) in sample dataset
x_hat = 0

# Iterate through every entry in the 'MO Codes' column and check if hate crime code is present
for entry in sample_data['MO Codes'].values.astype(str):
    # For each entry, iterate through the hate crime code list, and check if that code is present
    for code in hate_crime_codes:
        # If a hate crime code is present...
        if code in entry:
            # Add one to the hate crime count...
            x_hat += 1
            # And stop searching for hate crime codes in that entry
            break

# Show the number of successes in sample
print('Sample successes: ' + str(x_hat))

# Find the number of data points in sample
n_hat = sample_data.values.shape[0]

# Show number of samples
print('Sample samples: ' + str(n_hat))

# Calculate the sample proportion
p_hat = x_hat/n_hat

# Show the sample proportion
print('Sample proportion: ' + str(p_hat))

Sample successes: 44
Sample samples: 48652
Sample proportion: 0.0009043821425635123


## Part 5: Calculating Test Statistic

In [0]:
# Use the 1 Proportion Z test statistic formula to calculate the z_score
z_score = (p_hat-p_0)/np.sqrt(p_0*(1-p_0)/n_hat)

# Show the z_score
print('z_score = ' + str(z_score))

z_score = 7.87768800658138


## Part 6: Calculating p-value

In [0]:
# Calculate p-value (one sided)
p_val = st.norm.cdf(-z_score)

# Show the p-value
print('p-value = %8.20f' % (p_val))

p-value = 0.00000000000000166747
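
As a cross-check, the z-score and one-sided p-value can be recomputed from the reported counts using only the Python standard library. This is a minimal sketch rather than the notebook's own method: the function name `one_proportion_z_test` is hypothetical, and the upper-tail probability is obtained from `math.erfc` instead of `scipy.stats`.

```python
import math


def one_proportion_z_test(successes, n, p_0):
    """Return (z, one-sided p-value) for the alternative hypothesis p > p_0."""
    p_hat = successes / n
    standard_error = math.sqrt(p_0 * (1 - p_0) / n)
    z = (p_hat - p_0) / standard_error
    # Upper-tail probability of the standard normal distribution: P(Z > z)
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Counts reported in Parts 4.1 and 4.2
p_0 = 421 / 1436542
z, p_value = one_proportion_z_test(44, 48652, p_0)

print('z = ' + str(z))
print('p-value = ' + str(p_value))
```

Both values should agree with the results above up to floating-point rounding.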


## Part 7: Conclusion

Our p-value of 0.00000000000000166747 is less than α = 0.01. Thus, we reject the null hypothesis in favor of the alternative hypothesis. At the 1% significance level, the data provide strong evidence that the proportion of hate crimes relative to all crimes reported to the LAPD has been higher since Trump’s inauguration than it was during Obama’s presidency.