# Class 21: Hypothesis tests continued

Plan for today:
- Review of statistical inference and hypothesis tests for a single proportion
- Hypothesis tests for more than one proportion
- Hypothesis tests assessing causality  


## Notes on the class Jupyter setup

If you have the *ydata123_2023e* environment set up correctly, you can get the class code using the code below (which presumably you've already done given that you are seeing this notebook).  

In [None]:
import YData

# YData.download.download_class_code(21)   # get class code    
# YData.download.download_class_code(21, True) # get the code with the answers 

YData.download_data("bta.csv")

There are also similar functions to download the homework:

In [None]:
# YData.download.download_homework(8)  # downloads the homework 

If you are using Google Colab, you should install the YData package by uncommenting and running the code below.

In [None]:
# !pip install https://github.com/emeyers/YData_package/tarball/master

If you are using Google Colab, you should also uncomment and run the code below to mount your Google Drive.

In [None]:
# from google.colab import drive
# drive.mount('/content/drive')

In [None]:
import statistics
import pandas as pd
import numpy as np
import plotly.express as px
from urllib.request import urlopen

import matplotlib.pyplot as plt
%matplotlib inline

## Hypothesis tests

In hypothesis testing, we start with a claim about a population parameter (e.g., µ = 4.2, or π = 0.25).

This claim implies we should get a certain distribution of statistics, called "the null distribution". 

If our observed statistic is highly unlikely to come from the null distribution, we reject the claim. 

We can break down the process of running a hypothesis test into 5 steps. 

1. State the null and alternative hypotheses
2. Calculate the observed statistic of interest
3. Create the null distribution 
4. Calculate the p-value 
5. Make a decision
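As a review, here is a minimal sketch of these five steps for a test of a single proportion. The data (60 successes in 100 trials), the null value π = 0.5, and the function name `single_proportion_test` are hypothetical, and the null distribution here is simulated with NumPy's binomial sampler, which is not necessarily the method used elsewhere in this class.

```python
import numpy as np

def single_proportion_test(num_successes, sample_size, null_pi, num_sims=10_000, seed=0):
    """Hypothetical helper: simulation-based, one-sided test of
    H0: pi = null_pi vs. HA: pi > null_pi."""
    rng = np.random.default_rng(seed)

    # Step 2: observed statistic (the sample proportion p-hat)
    observed_stat = num_successes / sample_size

    # Step 3: null distribution of sample proportions, assuming H0 is true
    null_dist = rng.binomial(sample_size, null_pi, size=num_sims) / sample_size

    # Step 4: p-value = proportion of null statistics at least as extreme as observed
    p_value = np.mean(null_dist >= observed_stat)
    return observed_stat, null_dist, p_value

# Step 1 (hypothetical data): H0: pi = 0.5, HA: pi > 0.5, with 60 successes in 100 trials
obs, null_dist, p_value = single_proportion_test(60, 100, 0.5)

# Step 5: make a decision at the 0.05 significance level
print(obs, p_value, "reject H0" if p_value < 0.05 else "do not reject H0")
```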

Let's run through these steps now!


## 1. Hypothesis test for multiple proportions

In a hypothesis test for multiple proportions, we are testing whether each proportion is equal to a particular value. I.e., we are testing whether $\pi_1 = p_1$, $\pi_2 = p_2$, ..., $\pi_k = p_k$, for some proportions $p_1$, $p_2$, ..., $p_k$.

A special case of this is whether all population proportions are the same, which can be written as: $\pi_1 = \pi_2 = ... = \pi_k$.


### Motivating example: ACLU vs. Alameda County

As a motivating example, let's look at a report by the American Civil Liberties Union (ACLU) on jury selection in Alameda County. In particular, the ACLU claimed that jury panels in Alameda County were not representative of the demographics of the citizens who lived there. 

The demographics of Alameda County, and the proportions of people selected to be on jury panels, are shown in the DataFrame below; the jury panel proportions are based on 1,453 people selected to be on jury panels. Let's use these data to run a hypothesis test of whether the proportions of people selected to be on jury panels are consistent with the underlying demographics of Alameda County. 


In [None]:

ethnicities = np.array(['Asian', 'Black', 'Latino', 'White', 'Other'])
population_proportions = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
panel_proportions = np.array([0.26, 0.08, 0.08, 0.54, 0.04])


demographics = pd.DataFrame({"Ethnicity": ethnicities, 
                             "Population proportions": population_proportions, 
                             "Jury proportions": panel_proportions})

display(demographics)

# built in pandas plotting functions
demographics.plot.bar("Ethnicity");
plt.ylabel("Proportion");

### Step 1: State the null and alternative hypotheses

**In words** 

Null hypothesis: The proportions of members on the jury panels of different ethnicities match the underlying demographics. 

Alternative hypothesis: The proportion of at least one ethnicity does not match the underlying demographics. 


**In symbols**

$H_0$: $\pi_{Asian} = .15$,  $\pi_{Black} = .18$,  $\pi_{Latino} = .12$,  $\pi_{White} = .54$,  $\pi_{Other} = .01$

$H_A$: At least one $\pi_{i}$ is different from the values specified in the null hypothesis



### Step 2: Calculate the observed statistic

For our observed statistic we will use the Total Variation Distance (TVD), which is defined as:  $TVD ~ = ~ \frac{1}{2} \sum_{i = 1}^{k} |\pi_i - \hat{p}_i |$

Let's write a function `total_variation_distance(distribution_1, distribution_2)` that calculates the TVD. We can then use this function to calculate the TVD statistic for the jury panels in Alameda County.


In [None]:
def total_variation_distance(distribution_1, distribution_2):
    
    # half the sum of the absolute differences between corresponding proportions
    return 0.5 * np.sum(np.abs(distribution_1 - distribution_2))


observed_statistic_value = total_variation_distance(panel_proportions, population_proportions)

observed_statistic_value
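As a worked check of the formula, the per-category absolute differences for the jury data are 0.11 (Asian), 0.10 (Black), 0.04 (Latino), 0 (White) and 0.03 (Other), which sum to 0.28; the conventional TVD is half of that sum, 0.14. Whether or not the 1/2 is included, every statistic (observed and simulated) is scaled by the same factor, so the p-value is unchanged. The sketch below just repeats the arithmetic with the numbers from the table above.

```python
import numpy as np

# values copied from the demographics table above
population_proportions = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
panel_proportions = np.array([0.26, 0.08, 0.08, 0.54, 0.04])

# per-category absolute differences |pi_i - p_hat_i|
abs_diffs = np.abs(panel_proportions - population_proportions)

sum_abs_diffs = np.sum(abs_diffs)   # plain sum of absolute differences
tvd = 0.5 * sum_abs_diffs           # conventional TVD includes the factor 1/2

print(abs_diffs)
print(sum_abs_diffs, tvd)
```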

### Step 3: Create the null distribution 

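To create the null distribution, we need to simulate drawing random samples of 1,453 jurors from a population whose proportions match the null hypothesis, and compute the sample proportions of each sample.

To do this, we can generate uniform random numbers between 0 and 1 and split the interval from 0 to 1 into pieces whose lengths equal the population proportions (i.e., cut it at the cumulative proportions). The `pd.cut()` function assigns each random number to the ethnicity whose piece it falls in, which simulates the ethnicities of randomly selected jurors; we can then convert these to proportions. 

Once we have these proportions, we can calculate the TVD. If we repeat this process 1,000 times, we get a null distribution. 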

In [None]:
# calculate the cumulative proportions we can use to split the data into categories consistent with the null hypothesis

cumulative_proportions = np.append(0, np.cumsum(population_proportions))

cumulative_proportions
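To see how `pd.cut()` turns uniform random numbers into categories, here is a small deterministic check. The five numbers are hand-picked (hypothetical), one inside each interval, so they should map to Asian, Black, Latino, White and Other, in that order.

```python
import numpy as np
import pandas as pd

ethnicities = np.array(['Asian', 'Black', 'Latino', 'White', 'Other'])
population_proportions = np.array([0.15, 0.18, 0.12, 0.54, 0.01])

# interval edges 0, 0.15, 0.33, 0.45, 0.99, 1.0 (up to floating point rounding)
cumulative_proportions = np.append(0, np.cumsum(population_proportions))

# hand-picked "random" numbers, one inside each interval
example_numbers = np.array([0.05, 0.20, 0.40, 0.70, 0.995])

# each number is labeled with the category whose interval (left, right] contains it
example_labels = pd.cut(example_numbers, cumulative_proportions,
                        labels=ethnicities, ordered=False)
print(list(example_labels))
```

Because each interval's width equals a population proportion, a uniform random number lands in a category's interval with exactly that category's probability.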


In [None]:
# generate random jury panelist ethnicities

num_jury_members = 1453

rand_nums = np.random.rand(num_jury_members)

one_sample = pd.cut(rand_nums, cumulative_proportions, labels = ethnicities, ordered = False)

print(one_sample[0:5])

In [None]:
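# get the proportions from our sample, in the same order as `ethnicities`
# (np.unique would sort the labels alphabetically, putting "Other" before "White")

counts = np.array([np.sum(np.asarray(one_sample) == ethnicity) for ethnicity in ethnicities])

sample_proportions = counts/sum(counts)

sample_proportions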

In [None]:
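# Let's combine the steps above into one function

def get_sample_proportions(sample_size, true_proportions):
    
    # interval edges [0, p_1, p_1 + p_2, ..., 1]; each category owns one interval
    cumulative_proportions = np.append(0, np.cumsum(true_proportions))
    
    # simulate each sampled person with a uniform random number between 0 and 1
    rand_nums = np.random.rand(sample_size)
    
    # labels=False returns the index of the interval (category) each number falls in
    category_index = pd.cut(rand_nums, cumulative_proportions, labels=False)
    
    # count each category in the same order as true_proportions (minlength keeps zero counts)
    counts = np.bincount(category_index, minlength=len(true_proportions))
    return counts/sum(counts)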

    
get_sample_proportions(1453, population_proportions)
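`get_sample_proportions` draws one sample at a time using `pd.cut()`. As an alternative technique (not the one used in this notebook), NumPy's multinomial sampler returns the category counts directly, in the same order as the proportions, and can draw many samples at once. The helper name `get_sample_proportions_multinomial` and its `num_samples` parameter are hypothetical.

```python
import numpy as np

def get_sample_proportions_multinomial(sample_size, true_proportions, num_samples=1, rng=None):
    """Hypothetical helper: draw `num_samples` sets of sample proportions under H0.
    Returns an array of shape (num_samples, number of categories)."""
    rng = np.random.default_rng() if rng is None else rng
    # each row holds the counts per category for one simulated jury pool
    counts = rng.multinomial(sample_size, true_proportions, size=num_samples)
    return counts / sample_size

population_proportions = np.array([0.15, 0.18, 0.12, 0.54, 0.01])
null_samples = get_sample_proportions_multinomial(1453, population_proportions,
                                                  num_samples=1000,
                                                  rng=np.random.default_rng(0))

print(null_samples.shape)
print(null_samples.mean(axis=0))   # should be close to population_proportions
```

Each row can then be passed to a TVD function to build the null distribution without an explicit loop over `pd.cut()` calls.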



In [None]:
# Step 3: create null distribution 

null_dist = []

num_null_dist_points = 1000

for i in range(num_null_dist_points):
    
    curr_sample_props = get_sample_proportions(1453, population_proportions)
    curr_tvd = total_variation_distance(curr_sample_props, population_proportions)
    null_dist.append(curr_tvd)

In [None]:
# plot the null distribution as a histogram

plt.hist(null_dist, edgecolor = "black");


### Step 4: Calculate the p-value

The p-value is the proportion of points in the null distribution that are as extreme as, or more extreme than, the observed statistic (here, greater than or equal to it). 


In [None]:
p_value = np.mean(np.array(null_dist) >= observed_statistic_value)

p_value
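The p-value calculation can be wrapped in a small helper. This is a sketch; the name `simulated_p_value` and its `alternative` parameter are hypothetical. Converting with `np.asarray` lets it accept the Python lists built by the loops in this notebook. Note that a simulated p-value of exactly 0 only means that none of the simulated statistics were as extreme as the observed one, so the true p-value is probably smaller than about 1/(number of simulations), not literally 0.

```python
import numpy as np

def simulated_p_value(null_dist, observed_stat, alternative="greater"):
    """Hypothetical helper: proportion of simulated null statistics at least
    as extreme as observed_stat.
    alternative="greater": extreme means >= observed_stat
    alternative="less":    extreme means <= observed_stat"""
    null_dist = np.asarray(null_dist)   # accept a Python list as well as an array
    if alternative == "greater":
        return np.mean(null_dist >= observed_stat)
    if alternative == "less":
        return np.mean(null_dist <= observed_stat)
    raise ValueError("alternative must be 'greater' or 'less'")

# toy null distribution (hypothetical numbers)
toy_null = [0.1, 0.2, 0.3, 0.4, 0.5]
print(simulated_p_value(toy_null, 0.4))   # 0.4 (0.4 and 0.5 are >= 0.4)
print(simulated_p_value(toy_null, 0.6))   # 0.0 (no null statistic is >= 0.6)
```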

### Step 5: Draw a conclusion

Since the p-value is very small, it is very unlikely that our observed statistic comes from the null distribution. Thus we can reject the null hypothesis and conclude that the proportions of people of different ethnicities on jury panels in Alameda County do not reflect the underlying distribution of ethnicities in the population. 


## 2. Hypothesis test assessing causal relationships

To get at causality, we can run a Randomized Controlled Trial (RCT), in which about half of the participants are randomly assigned to a "treatment group" that receives an intervention and the other half are assigned to a "control group" that receives a placebo. Because the assignment is random, the two groups should differ systematically only in whether they received the treatment. So if the treatment group shows an improvement over the control group that is larger than what is expected by chance, this indicates that the treatment **causes** an improvement. 


#### Botulinum Toxin A (BTA) as a treatment for chronic back pain

A study by Foster et al. (2001) examined whether Botulinum Toxin A (BTA) was an effective treatment for chronic back pain.

In the study, participants were randomly assigned to be in a treatment or control group: 
- 15 in the treatment group
- 16 in the control group (normal saline)

Trials were run double-blind (neither the doctors nor the patients knew which group each patient was in).
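Random assignment itself is simple to sketch: shuffle the participants and give the first 15 the treatment. The participant IDs and the seed below are hypothetical, and only the group sizes come from the study; this illustrates the idea and is not the procedure Foster et al. actually used.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed so the assignment is reproducible

# hypothetical participant IDs 0..30; group sizes match the study (15 treatment, 16 control)
participant_ids = np.arange(31)
shuffled_ids = rng.permutation(participant_ids)

treatment_ids = np.sort(shuffled_ids[:15])   # first 15 shuffled IDs receive BTA
control_ids = np.sort(shuffled_ids[15:])     # remaining 16 receive saline

print("Treatment:", treatment_ids)
print("Control:  ", control_ids)
```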

Results from the study were coded as:
  - 1 indicates pain relief
  - 0 indicates lack of pain relief 


Let's run a hypothesis test to see if BTA causes a decrease in back pain.

### Step 1: State the null and alternative hypotheses


$H_0$: $\pi_{treat} =  \pi_{control}$   or    $H_0$: $\pi_{treat} -  \pi_{control} = 0$ 

$H_A$: $\pi_{control} < \pi_{treat}$    or    $H_A$: $\pi_{treat} -  \pi_{control} > 0$ 



Where $\pi_{treat}$ and $\pi_{control}$ are the population proportions of people who experience back pain relief after receiving BTA or the placebo (saline), respectively. 


### Step 2: Calculate the observed statistic

The code below loads the data from the study. We can use the difference in proportions  $\hat{p}_{treat} - \hat{p}_{control}$  as our observed statistic. 

Let's calculate the observed statistic and save it to the name `obs_stat`.


In [None]:
bta = pd.read_csv('bta.csv')
bta.sample(frac = 1)

In [None]:
# create a DataFrame with the proportion of people in the treatment and control groups that have pain relief 

results_table = bta.groupby("Group").mean()

results_table

In [None]:
# calculate the observed statistic from our DataFrame

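# .iloc selects by position; groupby sorts the group labels, so this assumes
# the treatment group is in row 1 and the control group is in row 0
obs_stat = results_table["Result"].iloc[1] - results_table["Result"].iloc[0]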

obs_stat

In [None]:
# let's write a function to make it easy to get statistic values

def get_prop_diff(bta_data):
    
    group_means = bta_data.groupby("Group").mean()
    
    return group_means["Result"].iloc[1] - group_means["Result"].iloc[0]

    
get_prop_diff(bta)
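Indexing the group means by position assumes the groups sort with the control group first. Here is a sketch that selects the groups by label instead. The labels `"Treatment"` and `"Control"`, the helper name `get_prop_diff_by_label`, and the toy data are assumptions for illustration, so check `bta["Group"].unique()` for the actual labels in `bta.csv`.

```python
import pandas as pd

def get_prop_diff_by_label(data, treat_label="Treatment", control_label="Control"):
    """Hypothetical helper: difference in proportions (treatment minus control),
    selecting the groups by label rather than by position."""
    # the mean of a 0/1 column is the proportion of 1s (pain relief)
    group_props = data.groupby("Group")["Result"].mean()
    return group_props[treat_label] - group_props[control_label]

# toy data (not from the BTA study): 3 of 4 treated and 1 of 4 controls report relief
toy = pd.DataFrame({"Group": ["Treatment"] * 4 + ["Control"] * 4,
                    "Result": [1, 1, 1, 0, 1, 0, 0, 0]})
print(get_prop_diff_by_label(toy))   # 0.75 - 0.25 = 0.5
```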


### Step 3: Create the null distribution 

To create the null distribution, we need to create statistics consistent with the null hypothesis. 

In this example, if the null hypothesis were true, then there would be no difference between the treatment and control groups. Thus, under the null hypothesis, we can shuffle the group labels and get statistics that are just as consistent with the null hypothesis as the original one. 

Let's create one statistic consistent with the null distribution to understand the process. We can then repeat this 10,000 times to get a full null distribution. 

In [None]:
# shuffle the data 

shuff_bta = bta.copy()
shuff_bta['Group'] = np.random.permutation(bta["Group"])

shuff_bta.head()


In [None]:
# get one statistic consistent with the null distribution 

get_prop_diff(shuff_bta)

In [None]:
%%time

# create a full null distribution 

null_dist = []

for i in range(10000):
    
    shuff_bta['Group'] = np.random.permutation(bta["Group"])
    
    shuff_stat = get_prop_diff(shuff_bta)
    
    null_dist.append(shuff_stat)
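The loop above calls `groupby` on every shuffle, which can be slow. Below is a minimal sketch of the same permutation test on plain NumPy arrays; the function name `permutation_test_diff_props`, the `"Treatment"` label, and the toy data are assumptions for illustration. With the toy data, the observed difference is 1, and a shuffle reproduces it only when all four 1s land in the treatment group, which happens with probability 1/70 (about 0.014), so the simulated p-value should be close to that.

```python
import numpy as np

def permutation_test_diff_props(groups, results, treat_label="Treatment",
                                num_perms=10_000, seed=0):
    """Hypothetical helper: one-sided permutation test of
    H0: pi_treat = pi_control vs. HA: pi_treat > pi_control.
    `groups` holds each participant's group label; `results` holds 0/1 outcomes."""
    rng = np.random.default_rng(seed)
    groups = np.asarray(groups)
    results = np.asarray(results, dtype=float)

    def prop_diff(labels):
        # proportion with relief in the treatment group minus everyone else (control)
        is_treat = labels == treat_label
        return results[is_treat].mean() - results[~is_treat].mean()

    observed = prop_diff(groups)

    # under H0 the group labels are exchangeable, so shuffle them and recompute
    null_dist = np.array([prop_diff(rng.permutation(groups)) for _ in range(num_perms)])

    # one-sided p-value: proportion of shuffled statistics >= the observed one
    p_value = np.mean(null_dist >= observed)
    return observed, null_dist, p_value

# toy data (not from the BTA study): all 4 treated patients improve, no controls do
toy_groups = ["Treatment"] * 4 + ["Control"] * 4
toy_results = [1, 1, 1, 1, 0, 0, 0, 0]
observed, null_dist, p_value = permutation_test_diff_props(toy_groups, toy_results)
print(observed, p_value)
```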


In [None]:
# visualize the null distribution 

plt.hist(null_dist, edgecolor = "black");


# put a line at the observed statistic value

plt.axvline(obs_stat, color = "red");
plt.xlabel("prop treat - prop control");
plt.ylabel("Count");

### Step 4: Calculate the p-value

The p-value is the proportion of points in the null distribution that are as extreme as, or more extreme than, the observed statistic (here, greater than or equal to it). 


In [None]:
p_value = np.mean(np.array(null_dist) >= obs_stat)

p_value

In [None]:
# for comparison: the proportion of null statistics that are at least 0.5
np.mean(np.array(null_dist) >= .5)

### Step 5: Draw a conclusion

Since the p-value is less than the typical significance level of 0.05, we can reject the null hypothesis and conclude that BTA does **cause** pain relief at a higher rate than the placebo. 