

# Results:

By Friday, April 29 at 11:59PM, you must submit a draft of your results for at least one research question (we recommend trying to have a draft of both done by this time). If (and only if) you address all the criteria in the corresponding Results section above, you will receive full credit on the checkpoint.
- MULTIPLE HYPOTHESIS TESTING: 
  - Summarize and interpret the results from the hypothesis tests themselves.
  - For the two correction methods you chose, clearly explain what kind of error rate is being controlled by each one
- CAUSAL INFERENCE:
  - Summarize and interpret your results, providing a clear statement about causality (or a lack thereof) including any assumptions necessary.
  - Where possible, discuss the uncertainty in your estimate and/or the evidence against the hypotheses you are investigating.
  
You are free to change your results section or add (or remove) content between the checkpoint and your final submission. Course staff will not provide any feedback on the research question checkpoint.

## Multiple Hypothesis Testing

In [2]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import statsmodels.api as sm
from statsmodels.graphics.regressionplots import abline_plot
import datetime
import math

import matplotlib.pyplot as plt

In [3]:
import random 
df = pd.read_csv('https://raw.githubusercontent.com/haroldcha/gunviolencedata/master/gun_violence.csv')

In [4]:
# reminder of cleaned dataset
df

Unnamed: 0.1,Unnamed: 0,incident_id,date,state,city_or_county,address,n_killed,n_injured,congressional_district,gun_stolen,...,participant_status,participant_type,state_house_district,state_senate_district,datetime,year,state_pop,city_pop,area (sq. mi),pop_density
0,0,461105,2013-01-01,Pennsylvania,Mckeesport,1506 Versailles Avenue and Coursin Street,0,4,14.0,,...,0::Arrested||1::Injured||2::Injured||3::Injure...,0::Victim||1::Victim||2::Victim||3::Victim||4:...,,,2013-01-01,2013,12776309,,46058,277.396088
1,1,460726,2013-01-01,California,Hawthorne,13500 block of Cerise Avenue,1,3,43.0,,...,0::Killed||1::Injured||2::Injured||3::Injured,0::Victim||1::Victim||2::Victim||3::Victim||4:...,62.0,35.0,2013-01-01,2013,38260787,85863.0,163707,233.715034
2,2,478855,2013-01-01,Ohio,Lorain,1776 East 28th Street,1,3,9.0,0::Unknown||1::Unknown,...,"0::Injured, Unharmed, Arrested||1::Unharmed, A...",0::Subject-Suspect||1::Subject-Suspect||2::Vic...,56.0,13.0,2013-01-01,2013,11576684,63735.0,44828,258.246721
3,3,478925,2013-01-05,Colorado,Aurora,16000 block of East Ithaca Place,4,0,6.0,,...,0::Killed||1::Killed||2::Killed||3::Killed,0::Victim||1::Victim||2::Victim||3::Subject-Su...,40.0,28.0,2013-01-05,2013,5269035,345613.0,104100,50.615130
4,4,478959,2013-01-07,North Carolina,Greensboro,307 Mourning Dove Terrace,2,2,6.0,0::Unknown||1::Unknown,...,0::Injured||1::Injured||2::Killed||3::Killed,0::Victim||1::Victim||2::Victim||3::Subject-Su...,62.0,27.0,2013-01-07,2013,9843336,279244.0,53821,182.890247
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
239672,239672,1083142,2018-03-31,Louisiana,Rayne,North Riceland Road and Highway 90,0,0,,0::Unknown,...,"0::Unharmed, Arrested",0::Subject-Suspect,,,2018-03-31,2018,4659690,,51843,89.880794
239673,239673,1083139,2018-03-31,Louisiana,Natchitoches,247 Keyser Ave,1,0,4.0,0::Unknown,...,"0::Killed||1::Unharmed, Arrested",0::Victim||1::Subject-Suspect,23.0,31.0,2018-03-31,2018,4659690,,51843,89.880794
239674,239674,1083151,2018-03-31,Louisiana,Gretna,1300 block of Cook Street,0,1,2.0,0::Unknown,...,0::Injured,0::Victim,85.0,7.0,2018-03-31,2018,4659690,,51843,89.880794
239675,239675,1082514,2018-03-31,Texas,Houston,12630 Ashford Point Dr,1,0,9.0,0::Unknown,...,0::Killed,0::Victim,149.0,17.0,2018-03-31,2018,28628666,2318573.0,268601,106.584361


In [5]:
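def test_statistic(df, column):
    """Difference in group means: mean(label == 1) - mean(label == 0).

    `column` holds binary 0/1 labels. If one label is absent (possible
    after resampling), that group's mean is treated as 0.
    """
    means_table = df.groupby(column).mean()
    if 1 not in means_table.index:
        diff = 0 - means_table.loc[0].values
    elif 0 not in means_table.index:
        diff = means_table.loc[1].values - 0
    else:
        diff = means_table.loc[1].values - means_table.loc[0].values

    return diff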



In [6]:
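def simulated_test_stat(df, column):
    """Test statistic under the null, computed from resampled group labels."""
    # Labels are drawn with replacement here (replace=True); a strict
    # permutation test would shuffle them with replace=False.
    shuffled_labels = df.sample(frac=1, replace=True)[column].to_numpy()
    # Build a new frame so the caller's DataFrame is not modified.
    shuffled = df.drop(columns=column).assign(shuffled_labels=shuffled_labels)

    return test_statistic(shuffled, 'shuffled_labels')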

In [7]:
def p_value(df, column, observed_difference, comparison):
    """Empirical one-sided p-value from 1000 simulated test statistics."""
    differences = []

    repetitions = 1000
    for i in np.arange(repetitions):
        new_difference = simulated_test_stat(df, column)
        differences = np.append(differences, new_difference)
    if comparison == "greater":
        # Larger differences favor the alternative that the label-1 group
        # has the higher mean.
        empirical_p = np.count_nonzero(differences >= observed_difference) / repetitions
    else:
        # Smaller differences favor the alternative that the label-1 group
        # has the lower mean.
        empirical_p = np.count_nonzero(differences <= observed_difference) / repetitions
    return empirical_p
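
The helpers above estimate a p-value by recomputing the difference in group means under relabelled data. As a point of comparison, here is a minimal sketch of a strict permutation test, in which labels are shuffled without replacement so the group sizes never change. The function name `permutation_p_value`, the seed, and the toy data are assumptions made for illustration; none of them come from the notebook.

```python
import numpy as np


def permutation_p_value(values, labels, repetitions=1000, seed=0):
    """One-sided permutation p-value for mean(label 1) - mean(label 0).

    Hypothetical helper for illustration; it is not part of the notebook.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)

    def diff_in_means(lab):
        return values[lab == 1].mean() - values[lab == 0].mean()

    observed = diff_in_means(labels)
    # rng.permutation reorders the labels without replacement, so every
    # simulated sample keeps the original group sizes.
    simulated = np.array(
        [diff_in_means(rng.permutation(labels)) for _ in range(repetitions)]
    )
    # Share of simulated differences at least as large as the observed one.
    return observed, np.count_nonzero(simulated >= observed) / repetitions


# Toy data (made up): group 1 has clearly larger values than group 0.
toy_values = [10, 12, 11, 13, 1, 2, 3, 2]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
observed, p = permutation_p_value(toy_values, toy_labels)
print(observed, p)
```

Because the shuffle keeps exactly four labels in each group, neither group can come up empty, which is the case `test_statistic` has to guard against when labels are drawn with replacement.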

### 1. Are more populated cities associated with deadlier instances of gun violence?

Null hypothesis: In the U.S., the distribution of deadly instances of gun violence (according to our definition of deadly above) is the same for the most populous cities and the least populous cities. Any difference in the sample is due to chance.

Alternative hypothesis: In the U.S., instances of gun violence are deadlier in the most populous cities than in the least populous cities.



In [8]:
# .copy() gives each subset its own DataFrame, so adding the label column
# does not raise pandas' SettingWithCopyWarning.
most_populous = df.loc[df['city_or_county'].isin(['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix', 'Philadelphia', 'San Antonio', 'San Diego', 'Dallas', 'San Jose'])].copy()
most_populous['populous?'] = 1
least_populous = df.loc[df['city_or_county'].isin(['Charleston', 'Niagara Falls', 'Saginaw', 'Troy', 'Lakewood', 'Enid', 'Coral Gables', 'Cerritos', 'Texas City', 'Twin Falls'])].copy()
least_populous['populous?'] = 0
populous = pd.concat([most_populous, least_populous])
populous_df = populous.groupby(['city_or_county', 'populous?']).sum()
populous_df['num_incidents'] = populous.groupby(['city_or_county', 'populous?']).count()['incident_id'].values
populous_df['num_victims'] = populous_df['n_killed'].values + populous_df['n_injured'].values
populous_df = populous_df.loc[:,['num_incidents', 'num_victims']]
populous_df = populous_df.reset_index('populous?')
populous_df


city_or_county,populous?,num_incidents,num_victims
Cerritos,0,7,6
Charleston,0,519,350
Chicago,1,10814,12531
Coral Gables,0,14,12
Dallas,1,1179,999
Enid,0,60,24
Houston,1,2501,2400
Lakewood,0,113,85
Los Angeles,1,1066,1189
New York,1,377,327


In [9]:
victim_populous = populous_df.loc[:,['populous?', 'num_victims']]
test_victim_populous = test_statistic(victim_populous, 'populous?')
p_value(victim_populous, 'populous?', test_victim_populous, 'greater')

0.01

### 2. Are more populated cities associated with more instances of gun violence?

Null hypothesis: In the U.S., the distribution of the number of instances of gun violence is the same for the most populous cities and the least populous cities. Any difference in the sample is due to chance.

Alternative hypothesis: In the U.S., the number of instances of gun violence is higher in the most populous cities than in the least populous cities.

In [10]:
incidents_populous = populous_df.loc[:,['populous?', 'num_incidents']]
test_incidents_populous = test_statistic(incidents_populous, 'populous?')
p_value(incidents_populous, 'populous?', test_incidents_populous, 'greater')

0.005

### 3. Are more densely populated cities associated with deadlier instances of gun violence?

Null hypothesis: In the U.S., the distribution of deadly instances of gun violence (according to our definition of deadly above) is the same for the most densely populated cities and the least densely populated cities. Any difference in the sample is due to chance.

Alternative hypothesis: In the U.S., instances of gun violence are deadlier in the most densely populated cities than in the least densely populated cities.



In [12]:
# .copy() gives each subset its own DataFrame, so adding the label column
# does not raise pandas' SettingWithCopyWarning.
most_dense = df.loc[df['city_or_county'].isin(['New York','San Francisco','Boston','Miami','Chicago', 'Philadelphia', 'District of Columbia', 'Long Beach', 'Seattle', 'Los Angeles'])].copy()
most_dense['dense?'] = 1
least_dense = df.loc[df['city_or_county'].isin(['Oklahoma City', 'Jacksonville', 'Nashville', 'Kansas City', 'Virginia Beach', 'Tulsa', 'Memphis', 'Tucson', 'New Orleans', 'Aurora'])].copy()
least_dense['dense?'] = 0
dense = pd.concat([most_dense, least_dense])
dense_df = dense.groupby(['city_or_county', 'dense?']).sum()
dense_df['num_incidents'] = dense.groupby(['city_or_county', 'dense?']).count()['incident_id'].values
dense_df['num_victims'] = dense_df['n_killed'].values + dense_df['n_injured'].values
dense_df = dense_df.loc[:,['num_incidents', 'num_victims']]
dense_df = dense_df.reset_index('dense?')
dense_df


city_or_county,dense?,num_incidents,num_victims
Aurora,0,373,273
Boston,1,1737,762
Chicago,1,10814,12531
Jacksonville,0,2448,1643
Kansas City,0,1381,1442
Long Beach,1,343,339
Los Angeles,1,1066,1189
Memphis,0,2386,2313
Miami,1,846,922
Nashville,0,1329,1068


In [13]:
victims_dense = dense_df.loc[:,['dense?', 'num_victims']]
test_victim_dense = test_statistic(victims_dense, 'dense?')
p_value(victims_dense, 'dense?', test_victim_dense, 'greater')

0.291

### 4. Are more densely populated cities associated with more instances of gun violence?

Null hypothesis: In the U.S., the distribution of the number of instances of gun violence is the same for the most densely populated cities and the least densely populated cities. Any difference in the sample is due to chance.

Alternative hypothesis: In the U.S., the number of instances of gun violence is higher in the most densely populated cities than in the least densely populated cities.

In [14]:
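# NOTE: this selects 'num_victims', the same column used for hypothesis 3.
# Testing incident counts, as the hypothesis states, would use 'num_incidents'.
incidents_dense = dense_df.loc[:,['dense?', 'num_victims']]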
test_incidents_dense = test_statistic(incidents_dense, 'dense?')
p_value(incidents_dense, 'dense?', test_incidents_dense, 'greater')

0.302

### 5. Are crimes getting less violent over time (number of casualties)?

Null hypothesis: In the U.S., the distribution of instances of deadly gun violence is the same in the earlier period (2013 to 2015) as in the later period (2016 to 2018). Any difference in the sample is due to chance.

Alternative hypothesis: In the U.S., the number of instances of deadly gun violence was higher in the earlier period than in the later period.

We label the earlier years `before? == 1` and the later years `before? == 0`. If casualties went down over time, the mean for label 1 is larger than the mean for label 0, so mean(1) - mean(0) is positive. The p-value measures how likely it is, under the null hypothesis, to observe a difference at least as large as the one in our data.



In [16]:
# .copy() gives each subset its own DataFrame, so adding the label column
# does not raise pandas' SettingWithCopyWarning.
before = df.loc[df['year'].isin([2013, 2014, 2015])].copy()
before['before?'] = 1
after = df.loc[df['year'].isin([2016, 2017, 2018])].copy()
after['before?'] = 0
time = pd.concat([after, before])
time_df = time.groupby(['year', 'before?']).sum()
time_df['num_victims'] = time_df['n_killed'].values + time_df['n_injured'].values
time_df = time_df.loc[:,['num_victims']]
time_df = time_df.reset_index('before?')
time_df


year,before?,num_victims
2013,1,1296
2014,1,35559
2015,1,40451
2016,0,45646
2017,0,46214
2018,0,9704


In [17]:
test_time = test_statistic(time_df, 'before?')
p_value(time_df, 'before?', test_time, 'greater')

0.65
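
With only six yearly totals, the test for this hypothesis can be checked by hand. The sketch below copies the `num_victims` values from the table above, computes the observed difference mean(before) - mean(after), and then enumerates all 20 ways of labelling three of the six years as "before". This is an illustrative cross-check, not the notebook's procedure: it relabels without replacement, whereas `simulated_test_stat` draws labels with replacement, so the two p-values need not agree. The names `num_victims`, `scaled_diff`, and `exact_p` are introduced here for illustration.

```python
from itertools import combinations

# Yearly victim totals copied from the table above.
num_victims = {2013: 1296, 2014: 35559, 2015: 40451,
               2016: 45646, 2017: 46214, 2018: 9704}
before_years = (2013, 2014, 2015)
total = sum(num_victims.values())


def scaled_diff(group):
    """3 * (mean of `group` - mean of the other three years).

    Kept in integers so that ties are compared exactly.
    """
    group_sum = sum(num_victims[year] for year in group)
    return 2 * group_sum - total


observed = scaled_diff(before_years)
observed_difference = observed / 3

# With three years per group there are only C(6, 3) = 20 possible labellings.
all_groups = list(combinations(num_victims, 3))
at_least_as_large = sum(scaled_diff(group) >= observed for group in all_groups)
exact_p = at_least_as_large / len(all_groups)

print(observed_difference, exact_p)  # -8086.0 0.8
```

The observed difference is negative: the 2013 to 2015 years average fewer recorded victims than the 2016 to 2018 years, which is the opposite of what the alternative hypothesis predicts. A large p-value in the "greater" direction is therefore what one would expect from this table.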

### Summary and Interpretation of the Results for the 5 Hypotheses Based on P-Value Testing

For multiple hypothesis testing, we posed 5 hypotheses about the count of deadly gun-violence incidents and the total count of gun-violence incidents, based on the population and density of cities and on time. We judge significance at the 0.05 level.

For our first hypothesis, on the association between city population and deadly gun-violence incidents, the p-value was 0.01. This is statistically significant, so we reject the null hypothesis. The result suggests a positive association between populous cities and deadly gun-violence incidents.

Similarly, for our second hypothesis, on the association between city population and the total count of gun-violence incidents, the p-value was 0.005, which is again statistically significant. This result was predictable, because the counts of deadly incidents and of total incidents are similar in each city. Once again, we reject the null hypothesis.

For our third hypothesis, we studied the association between city density and gun-violence cases that led to death or severe injury. The p-value was 0.291, which is not statistically significant, so we fail to reject the null hypothesis: the data do not show a strong association between the two factors. We expected this result, because confounding variables, such as how strictly guns are regulated, are also involved.

Correspondingly, our fourth hypothesis, on the association between city density and the total count of gun-violence incidents, gave a result close to that of the third. The p-value was 0.302, which is not statistically significant, so this test also fails to reject the null hypothesis, for similar reasons. Note that the code for this test selects the `num_victims` column, the same column used for the third hypothesis, so the two p-values differ only by simulation noise.

The fifth hypothesis concerned gun violence over time: we tested whether gun violence has gone down over time. Contrary to our expectation, the p-value was 0.65, which is again not statistically significant, so we fail to reject the null hypothesis. This was unexpected, because our previous analysis of gun-violence case reports in Philadelphia and Camden showed the total number of gun-violence incidents going down over time.

### Corrections

In [18]:
def bonferroni(df, column, test, alpha):
    """
    Reruns the simulated test 10 times and applies the Bonferroni threshold
    to the resulting p-values.
    
    Inputs:
        df, column: data and 0/1 label column passed on to `p_value`
        test: observed test statistic
        alpha: desired family-wise error rate (FWER = P(at least one false discovery))
    
    Returns:
        fraction of the 10 p-values that are at or below `alpha / 10`
    """
    p_values = []

    for i in range(10):
        p = p_value(df, column, test, 'greater')
        p_values.append(p)
    
    # `p` rather than `p_value` as the loop name, to avoid shadowing the function.
    decisions = [(p <= alpha/len(p_values)) for p in p_values]
    return np.sum(decisions)/len(decisions)

In [19]:
def benjamini_hochberg(df, column, test, alpha):
    """
    Reruns the simulated test 10 times and applies Benjamini-Hochberg
    to the resulting p-values.
    
    Inputs:
        df, column: data and 0/1 label column passed on to `p_value`
        test: observed test statistic
        alpha: desired FDR (FDR = E[# false positives / # positives])
    
    Returns:
        fraction of the 10 p-values deemed significant
        (0 if none falls under its Benjamini-Hochberg threshold)
    """
    p_values = []
    for i in range(10):
        p = p_value(df, column, test, 'greater')
        p_values.append(p)
        
    p_list = np.array(p_values)
    p_list.sort()
    # Compare the k-th smallest p-value with k * alpha / m.
    k = np.arange(len(p_values)) + 1
    pk = k*alpha/len(p_values)
    boolean_array = list(p_list <= pk)
    if sum(boolean_array) == 0:
        return 0
    p_max = np.max(p_list[boolean_array])
    # `p` rather than `p_value` as the loop name, to avoid shadowing the function.
    decisions = [(p <= p_max) for p in p_values]
    return np.sum(decisions)/len(decisions)
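
The two functions above rerun a single test ten times and correct across those reruns. For comparison, the sketch below applies the textbook form of each procedure once, across the five empirical p-values reported for hypotheses 1 through 5. It is a minimal illustration under that assumption, not the notebook's method, and the function names `bonferroni_decisions` and `benjamini_hochberg_decisions` are introduced here for the example.

```python
import numpy as np

# Empirical p-values reported above for hypotheses 1 through 5.
p_values = np.array([0.01, 0.005, 0.291, 0.302, 0.65])
alpha = 0.05


def bonferroni_decisions(p_values, alpha):
    """Reject when p <= alpha / m; controls the family-wise error rate."""
    p_values = np.asarray(p_values)
    return p_values <= alpha / len(p_values)


def benjamini_hochberg_decisions(p_values, alpha):
    """Reject every p-value up to the largest sorted p_(k) with p_(k) <= k * alpha / m.

    Controls the false discovery rate when the tests are independent.
    """
    p_values = np.asarray(p_values)
    m = len(p_values)
    sorted_p = np.sort(p_values)
    thresholds = np.arange(1, m + 1) * alpha / m
    below = sorted_p <= thresholds
    if not below.any():
        return np.zeros(m, dtype=bool)
    p_max = sorted_p[below].max()
    return p_values <= p_max


print(bonferroni_decisions(p_values, alpha))
print(benjamini_hochberg_decisions(p_values, alpha))
```

With these inputs, Benjamini-Hochberg keeps hypotheses 1 and 2 and rejects none of the others. The Bonferroni threshold is 0.05 / 5 = 0.01, which equals the first p-value, so that decision depends on the `<=` convention and would easily flip under simulation noise in the p-value estimate.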

### 1. Are more populated cities associated with deadlier instances of gun violence?

In [20]:
alpha = 0.05


In [21]:
bonferroni_decisions_1 = bonferroni(victim_populous, 'populous?', test_victim_populous, alpha)
bonferroni_decisions_1

0.1

In [22]:
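# NOTE: this passes `incidents_populous` together with the victims test statistic;
# matching the Bonferroni call above would use `victim_populous` instead.
bh_decisions_1 = benjamini_hochberg(incidents_populous, 'populous?', test_victim_populous, alpha)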
bh_decisions_1

1.0

### 2. Are more populated cities associated with more instances of gun violence?

In [23]:
bonferroni_decisions_2 = bonferroni(incidents_populous, 'populous?', test_incidents_populous, alpha)
bonferroni_decisions_2

0.0

In [24]:
bh_decisions_2 = benjamini_hochberg(incidents_populous, 'populous?', test_incidents_populous, alpha)
bh_decisions_2

1.0

### 3. Are more densely populated cities associated with deadlier instances of gun violence?

In [25]:
bonferroni_decisions_3 = bonferroni(victims_dense, 'dense?', test_victim_dense, alpha)
bonferroni_decisions_3

0.0

In [26]:
bh_decisions_3 = benjamini_hochberg(victims_dense, 'dense?', test_victim_dense, alpha)
bh_decisions_3

0

### 4. Are more densely populated cities associated with more instances of gun violence?


In [27]:
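# NOTE: this passes `victims_dense` together with the incidents test statistic;
# the Benjamini-Hochberg call below uses `incidents_dense`.
bonferroni_decisions_4 = bonferroni(victims_dense, 'dense?', test_incidents_dense, alpha)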
bonferroni_decisions_4

0.0

In [28]:
bh_decisions_4 = benjamini_hochberg(incidents_dense, 'dense?', test_incidents_dense, alpha)
bh_decisions_4

0

### 5. Are crimes getting less violent over time (number of casualties)?


In [29]:
bonferroni_decisions_5 = bonferroni(time_df, 'before?', test_time, alpha)
bonferroni_decisions_5

0.0

In [30]:
bh_decisions_5 = benjamini_hochberg(time_df, 'before?', test_time, alpha)
bh_decisions_5

0

### Summary and Interpretation of the Result



– After applying your correction procedures, which discoveries remained significant? If none did, explain why.

~  


– What decisions can or should be made from the individual tests? What about from the results in aggregate?

~

– Discuss any limitations in your analysis, and if relevant, how you avoided p-hacking.

~

– What additional tests would you conduct if you had more data?

~


# Citations

- https://inferentialthinking.com/chapters/12/1/AB_Testing.html
- 'lab01', Data 102 https://data102.datahub.berkeley.edu/user/doud/notebooks/sp22/labs/lab01/lab01.ipynb