In [None]:
import numpy as np
import pandas as pd

%matplotlib inline
import matplotlib.pyplot as plots
plots.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)

<center> 

# Hypothesis Testing Part II

## Dr. Lange - University of Chicago
## Data 11800 - Winter 2024 

<img src="https://raw.githubusercontent.com/amandakube/Data118LectureImages/main/UChicago_DSI.png" alt="UC-DSI" width="500" height="600">
    
</center>

### Steps for Hypothesis Testing

 - Identify $H_0$ and $H_A$

 - Decide on significance level (convention is $0.05$)

 - Determine the test statistic
    
 - Model the null distribution

 - Determine your p-value (the probability, assuming $H_0$ is true, of getting a test statistic at least as extreme as the one you observed)
    
 - Conclusion (Determine if $H_0$ is to be rejected or not)
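The steps above can be sketched end to end on a small hypothetical example: a coin lands heads 61 times in 100 flips, and we ask whether it is fair. The data (61 heads), the statistic |heads - 50| and the use of `np.random.binomial` to model the null are illustrative assumptions, not one of the lecture's examples.

```python
import numpy as np

np.random.seed(118)  # fixed seed so the simulation is reproducible

# Step 1: H0: the coin is fair (p = 0.5); HA: the coin is not fair (p != 0.5)
# Step 2: significance level
alpha = 0.05

# Step 3: test statistic = distance of the number of heads from 50
# (hypothetical data: 61 heads in 100 flips)
n_flips = 100
observed_heads = 61
observed_stat = abs(observed_heads - n_flips / 2)

# Step 4: model the null distribution by simulating many fair-coin experiments
simulated_heads = np.random.binomial(n_flips, 0.5, size=10000)
null_stats = np.abs(simulated_heads - n_flips / 2)

# Step 5: p-value = proportion of simulated statistics at least as extreme as observed
p_value = np.mean(null_stats >= observed_stat)

# Step 6: conclusion
reject_h0 = p_value < alpha
print(p_value, reject_h0)
```

Using the distance from 50 as the statistic makes the test two-sided: unusually many and unusually few heads both count as extreme.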

<code style="background:Thistle;color:black"> Q: True/False? If H0 is rejected at the 5% level, then we can be certain that H0 is false.

</code>


In [None]:
answer

<code style="background:Thistle;color:black"> Q: True or False? If H0 is rejected at the 5% level, there is less than a 5% chance for H0 to be true. </code>

In [None]:
answer

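**Q1**: A: False. Rejecting H0 at the 5% level means that, if H0 were true, there would be less than a 5% chance of getting a result at least as extreme as the one observed.
      So we could reject H0 even though H0 is actually true (a Type 1 error).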


**Q2** A: False. 
Rejecting H0 at the 5% level does not tell us the probability that H0 is true or false: a p-value is not the chance of H0 being true.
In fact, the p-value is computed assuming H0 is true.
 

## Decisions: Type 1 and Type 2 Errors

In a hypothesis test, we could make a decision about which of H0 or HA might be true, but our decision might be incorrect.

<img src="https://raw.githubusercontent.com/amandakube/Data118LectureImages/main/Errors.png" alt="Type1 and Type2 Errors" width="500">


* A Type 1 Error is rejecting the H0 when it is true.
* A Type 2 Error is failing to reject the H0 when it is false.

We (almost) never know if H0 or HA is true, but we need to consider all possibilities.

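Significance level = Type 1 error rate: if H0 is true, a test at significance level $\alpha$ rejects H0 with probability (at most) $\alpha$.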
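One way to see that the significance level is the Type 1 error rate is to simulate many experiments in which H0 really is true and count how often the test rejects it. The sketch below assumes a hypothetical fair-coin test (100 flips, statistic |heads - 50|, null simulated with `np.random.binomial`); because the binomial distribution is discrete, the rejection rate comes out close to 5%, typically a little below it, rather than exactly 5%.

```python
import numpy as np

np.random.seed(70)  # fixed seed for reproducibility
alpha = 0.05
n_flips = 100

# Null distribution of the statistic |heads - 50|, simulated once
null_stats = np.abs(np.random.binomial(n_flips, 0.5, size=10000) - n_flips / 2)

# Simulate 2000 experiments in which H0 is TRUE (the coin really is fair)
n_experiments = 2000
rejections = 0
for _ in range(n_experiments):
    heads = np.random.binomial(n_flips, 0.5)
    stat = abs(heads - n_flips / 2)
    p_value = np.mean(null_stats >= stat)
    if p_value < alpha:
        rejections += 1  # rejecting a true H0 is a Type 1 error

type1_error_rate = rejections / n_experiments
print(type1_error_rate)
```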

## Consequences of Type 1 and 2 errors

Type 1 and Type 2 errors are different sorts of mistakes and have different consequences.

Usually H0 is the status quo (generally believed to be true).

Rejecting H0 means something generally accepted is overturned. It might be a scientific breakthrough.

A Type 1 error introduces a false conclusion and can lead to a tremendous waste of resources before further research invalidates the original finding.

A Type 2 error (failing to recognize a scientific breakthrough) represents a missed opportunity.

Type 2 errors can be costly as well, but generally go unnoticed.

Many scientists believe it is more important to control the Type 1 error rate than the Type 2 error rate.

## Two sample testing (A/B testing) 

Compare the values of sampled observations in two disjoint groups (say, Group A and Group B).


This applies when your null hypothesis equates a statistic computed from **one group** with the same statistic computed from **another group**:

$$H_0: \text{test statistic}_1 = \text{test statistic}_2$$

that is, the two statistics are the same.

You can rewrite your null hypothesis as $H_0: \text{test statistic}_1 - \text{test statistic}_2 = 0$.

**How do we simulate such a null distribution corresponding to the above?**

### Permutation Testing: Main idea

**Idea:** If there is no difference between the two distributions, then a value in Group A and a value in Group B can be swapped without changing the distribution of the test statistic.

**We can simulate the null distribution based on this!**


A permutation test does not make any assumptions about the shape of the distribution of the samples (binomial, uniform, ...); we actually don't care!

### Permutation Test: The process

If the null hypothesis is true: each observed value is equally likely to have come from Group A or Group B.

If the null is true: all rearrangements (permutations) of the observed values among the two groups are equally likely.

Simulate to learn the null distribution:
* Shuffle (permute) the pooled data.
* Assign some to "Group A" and the rest to "Group B", maintaining the two sample sizes.
* Find the difference between the test statistics of the two shuffled (permuted) groups.
* Repeat.

The generated distribution and the value of the test statistic are used to calculate a p-value.



With this distribution, you can then estimate the probability, under $H_0$, of getting a test statistic at least as extreme as the observed one.
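The process above can be sketched as a small reusable function. This is an illustration under stated assumptions, not the lecture's code: the name `permutation_test`, its parameters (`statistic`, `n_repetitions`, `seed`) and the two-sided p-value based on absolute differences are choices made here, and it works on plain NumPy arrays rather than DataFrames.

```python
import numpy as np

def permutation_test(group_a, group_b, statistic=np.mean, n_repetitions=5000, seed=None):
    """Hypothetical helper: two-sample permutation test.

    H0: statistic(Group A) - statistic(Group B) = 0.
    Returns (observed difference, simulated null differences, two-sided p-value).
    """
    rng = np.random.default_rng(seed)
    group_a = np.asarray(group_a, dtype=float)
    group_b = np.asarray(group_b, dtype=float)
    n_a = len(group_a)

    observed = statistic(group_a) - statistic(group_b)
    pooled = np.concatenate([group_a, group_b])

    differences = np.empty(n_repetitions)
    for i in range(n_repetitions):
        shuffled = rng.permutation(pooled)  # shuffle all values together
        new_a = shuffled[:n_a]              # first n_a values become "Group A"
        new_b = shuffled[n_a:]              # the rest become "Group B" (sizes preserved)
        differences[i] = statistic(new_a) - statistic(new_b)

    # two-sided p-value: shuffled differences at least as far from 0 as the observed one
    p_value = np.mean(np.abs(differences) >= abs(observed))
    return observed, differences, p_value


# Usage with hypothetical data: two clearly separated groups of 30 values each
group_a = np.arange(100, 130)
group_b = np.arange(0, 30)
observed, differences, p_value = permutation_test(group_a, group_b, seed=0)
print(observed, p_value)
```

Passing `statistic` as a function keeps the shuffling logic separate from the choice of test statistic.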



<img src = "https://raw.githubusercontent.com/SusannaLange/Data_118_images/main/images/Probability/perm.png" width="800" style="display: inline-block; margin: 0" />

Note: Under the null hypothesis, each value in your data could have come from either group, so the null distribution of the difference in test statistics is centered at 0.

## Example: Two sample testing (A/B testing) : Smokers and Nonsmokers


The data frame *baby_df* contains the following variables for 1,174 mother-baby pairs: the baby’s birth weight in ounces, the number of gestational days, the mother’s age in completed years, the mother’s height in inches, pregnancy weight in pounds, and whether or not the mother smoked during pregnancy.

**We first want to investigate the association of maternal smoking with birth weight**

In [None]:
baby_df=pd.read_csv("https://raw.githubusercontent.com/SusannaLange/Data_118_images/main/Data/baby.csv")
baby_df.head(8)

**We are interested in maternal smoking and birth weight. In particular, we consider the two groups:**

A: Birth weights of babies of mothers who smoked during
pregnancy


B: Birth weights of babies of mothers who did not smoke


Question: Do the two sets of values come from the same
distribution?

#### Setting up our test

We will use the mean, but we could also consider the median or even the variance.

**Null Hypothesis:** In the population, the distributions of the
birth weights of babies in the two groups are the same. In particular, the mean birth weight of babies of smoking mothers equals the mean birth weight of babies of non-smoking mothers.


**Alternative Hypothesis:** In the population, the two
distributions are different. In particular, the mean birth weight of babies of smoking mothers differs from the mean birth weight of babies of non-smoking mothers.

### We want to separate Birth Weight by Smokers vs. Non-smokers

In [None]:
# a data frame with the two columns of interest
smoking_weight = baby_df[['Maternal Smoker', 'Birth Weight']]

smoking_weight.sample(5)

### We can visualize these values

In [None]:
# the arrays of weights for non-smoker and smoker
nsw=smoking_weight[smoking_weight['Maternal Smoker']==False]['Birth Weight'].values
sw=smoking_weight[smoking_weight['Maternal Smoker']==True]['Birth Weight'].values

In [None]:
#overlapping histograms
bins = np.linspace(50, 180, 20)
plots.hist(nsw, bins, alpha=0.5, label='No') ## alpha controls opacity
plots.hist(sw, bins, alpha=0.5, label='Yes')
plots.scatter(np.mean(sw), -2, color='red', s=30);
plots.scatter(np.mean(nsw), -2, color='blue', s=30);
plots.legend(loc='upper right')
plots.title('Distributions of Birth weight by Maternal Smoker')
plots.xlabel('Birth weight (ounces)')
plots.ylabel('Frequency')
plots.show()

### Let's consider the means!

In [None]:
# the sample means in the two groups
means_df = smoking_weight.groupby('Maternal Smoker').mean()
means_df

**Other potential test statistics:**

In [None]:
# the sample medians
medians_df = smoking_weight.groupby('Maternal Smoker').median()
medians_df

In [None]:
# the sample standard deviation
sds_df = smoking_weight.groupby('Maternal Smoker').std()
sds_df

### In our case, the test statistic is the observed difference in sample means

In [None]:
## the difference in the means: non-smokers (False, row 0) minus smokers (True, row 1)
observed_difference = means_df.iloc[0,0] - means_df.iloc[1,0]
observed_difference 

**Goal**
Is this difference significant? To decide, we carry out a hypothesis test using the difference in means as our test statistic.


### How to find the null distribution?!

## Permutation test
- If the null hypothesis is true: a birth weight is equally likely to be sampled from group A or group B
- If the null is true: all rearrangements (permutations) of the birth weights among the two groups are equally likely

#### Simulate to learn the null distribution:
- Shuffle (permute) the birth weights
- Assign some to ``Group A`` and the rest to ``Group B`` maintaining the two sample sizes
- Find the differences between averages of the two shuffled (permuted) groups
- Repeat

The generated distribution and the value of the test statistic are used to calculate a p-value. 


**What does this look like for our example?**

In [None]:
weights = smoking_weight[['Birth Weight']]
group_labels = baby_df['Maternal Smoker'].values


differences = np.array([])

for i in np.arange(5000):
    shuffled_weights = weights.sample(len(weights),replace = False)['Birth Weight'].values
    shuffled_df = pd.DataFrame({"Shuffled Weights" : shuffled_weights, 
                            "Label" : group_labels})    
    smeans_df=shuffled_df.groupby('Label').mean()
    new_diff=smeans_df.iloc[0]-smeans_df.iloc[1]  ##calculating the difference here
    differences = np.append(differences, new_diff) #array of mean differences

Note: We are shuffling the weights and keeping the labels fixed, so the shuffled data have the same number of `False` and `True` labels as the original data. Shuffling the weights against fixed labels is equivalent to shuffling the labels against fixed weights.

If there really is no difference, shuffling and recalculating our test statistic won't matter: we'll obtain similar values for both groups.
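To see why shuffling the weights and shuffling the labels are interchangeable, here is a minimal NumPy sketch on a small hypothetical data set (the ten weights and labels are made up for illustration, not taken from `baby_df`). Either way, the pooled values and the group sizes stay the same; only which values end up in which group changes.

```python
import numpy as np

np.random.seed(0)  # fixed seed for reproducibility

# hypothetical birth weights (ounces) and smoker labels, for illustration only
weights = np.array([118, 105, 126, 110, 132, 99, 121, 115, 140, 108])
labels = np.array([False, True, False, True, False, True, False, True, False, True])

# Option 1 (as in the lecture): shuffle the weights, keep the labels fixed
shuffled_weights = np.random.permutation(weights)
diff_shuffle_weights = (shuffled_weights[~labels].mean()
                        - shuffled_weights[labels].mean())

# Option 2: shuffle the labels, keep the weights fixed
shuffled_labels = np.random.permutation(labels)
diff_shuffle_labels = (weights[~shuffled_labels].mean()
                       - weights[shuffled_labels].mean())

# In both cases the pooled values and the group sizes are unchanged
print(np.array_equal(np.sort(shuffled_weights), np.sort(weights)))  # True
print(shuffled_labels.sum() == labels.sum())                       # True
```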

#### Histogram of the simulated differences

In [None]:
plots.hist(differences,bins=np.arange(-4,4,0.25))
plots.title("Null Difference Between Means");
plots.ylabel('Frequency')
plots.xlabel('Difference in Means');

### What was our observed difference?

In [None]:
observed_difference

In [None]:
# Histogram of the simulated differences
plots.hist(differences,bins=np.arange(-4,4,0.25))
plots.scatter(observed_difference, -2, color='red', s=30);
plots.title("Null Difference Between Means");


And now we calculate the p-value based on this distribution and observed statistic.

In [None]:
# proportion of simulated differences at least as large as the observed one (right tail)
extreme = np.mean(differences >= observed_difference)
extreme

Our $H_A$ is that $\mu_\text{smokers} \neq \mu_\text{non-smokers}$ 

In words, $H_A$ states that the mean birth weight of babies of smokers is different from the mean birth weight of babies of non-smokers.

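This means we have a two-tailed test, so we need to consider both extremes. Because the null distribution is (roughly) symmetric about 0 and our observed difference is positive, we can double the right-tail proportion.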
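The lecture doubles the one-tailed proportion (`2*extreme`). An alternative that counts both tails directly compares absolute values, and it does not depend on which direction the observed difference points. The sketch below uses a hypothetical symmetric null distribution (normal, standard deviation 1.5) and a hypothetical observed statistic of 3.0 as stand-ins, so both approaches should give similar answers.

```python
import numpy as np

np.random.seed(296)  # fixed seed for reproducibility

# hypothetical stand-ins: a symmetric null distribution centered at 0 and an observed statistic
null_differences = np.random.normal(0, 1.5, size=5000)
observed = 3.0

# Approach used in the lecture: right tail, then double it (relies on symmetry about 0)
p_doubled = 2 * np.mean(null_differences >= observed)

# Alternative: count both tails directly using distances from 0
p_abs = np.mean(np.abs(null_differences) >= abs(observed))

print(p_doubled, p_abs)
```

If the observed difference were negative, `null_differences >= observed` would count the wrong tail, while the absolute-value version would still be correct.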


In [None]:
p_value=2*extreme
p_value

**Conclusion:**

**We reject the null hypothesis. At the 5% significance level, since the estimated p-value is 0 (none of the 5000 simulated differences was as extreme as the observed one, so p < 0.05), there is enough statistical evidence to support the claim that the mean birth weights of babies of smoking and non-smoking mothers are different.**

### AB Testing

 - A null hypothesis ($H_0$) 
 
 -  An alternative hypothesis ($H_A$) 
 
 - A test statistic (to be compared to the null distribution) **difference of test statistics!**
 
 -  A significance threshold ($\alpha=0.05$ default).
    

 - Model the null distribution  (Permutation Testing)

 - Determine your p-value (the probability, assuming $H_0$ is true, of getting a test statistic at least as extreme as the one you observed)
    
 - Conclusion (Determine if $H_0$ is to be rejected or not)

**The null distribution is the distribution of the test statistic assuming $H_0$ is true, e.g. that there is no difference between the groups, or that a previously accepted theory holds.**

In general, the null distribution could be simulated via:

 - `np.random.binomial`

 - `np.random.choice`

 - a permutation test (for AB testing)
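For instance, when $H_0$ specifies the chance of each outcome, `np.random.choice` can draw samples under that model. The sketch below assumes a hypothetical fair six-sided die rolled 60 times, with $H_A$ that the die favors sixes and the statistic "number of sixes"; the setting and the observed count of 17 are made up for illustration.

```python
import numpy as np

np.random.seed(332)  # fixed seed for reproducibility

faces = np.arange(1, 7)
n_rolls = 60
observed_sixes = 17  # hypothetical observed count

# Simulate the null distribution: under H0 each face has probability 1/6
simulated_sixes = np.array([
    np.sum(np.random.choice(faces, size=n_rolls) == 6)
    for _ in range(10000)
])

# One-sided p-value: how often does a fair die give at least this many sixes?
p_value = np.mean(simulated_sixes >= observed_sixes)
print(p_value)
```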

A few notes:
    
You can approach many problems with the classical approach to hypothesis testing (focus of last lecture) or with AB testing. 

Classical hypothesis testing:

- We know the distribution of the null

- Advantage of being a quick process


AB Testing:

- We don't need to know the distribution

- Need to be in a situation where we have two groups

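- Can be slow for a very large sample (e.g. 100,000 observations). Note that we simulate a random sample of permutations (e.g. 5000) rather than collecting all possible permutations, which would be far too many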
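To see why we sample random permutations instead of listing them all, count the distinct ways to split pooled data into two groups of fixed sizes: that count is the binomial coefficient $\binom{n_A+n_B}{n_A}$. The group sizes below are hypothetical, and `number_of_splits` is a helper name introduced here for illustration.

```python
import math

def number_of_splits(n_a, n_b):
    """Distinct ways to assign n_a + n_b pooled values to Group A (size n_a) and Group B (size n_b)."""
    return math.comb(n_a + n_b, n_a)

print(number_of_splits(10, 10))  # 184756
print(number_of_splits(50, 50))  # about 1.0e29
```

Even with only 100 observations the number of arrangements is astronomically large, so a few thousand random shuffles is the practical way to approximate the null distribution.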

Note: There are other methods besides a permutation test for this type of problem (AB testing); if you have taken a previous statistics course, you may have done a t-test. In this course, we present some basic methods rather than all possible methods.

<code style="background:Thistle;color:black"> Let's repeat this process, but instead consider the difference in sample medians. </code>





**Null Hypothesis:** The median birth weight of babies of smoking mothers is the same as the median birth weight of babies of non-smoking mothers.

<code style="background:Thistle;color:black">**Alternative Hypothesis:**  </code>

The test statistic is the difference in sample medians (non-smokers minus smokers).

In [None]:
medians_df = smoking_weight.groupby('Maternal Smoker').median()
medians_df

In [None]:
observed_difference_median = medians_df.iloc[0,0] - medians_df.iloc[1,0]  # non-smokers minus smokers
observed_difference_median

<code style="background:Thistle;color:black"> Goal: Find the null distribution (by permutation test)  </code>

In [None]:
## this is the code for the mean calculation above... how should we change it to use medians?
differences_mean = np.array([])

for i in np.arange(5000):
    shuffled_weights = weights.sample(len(weights),replace = False)['Birth Weight'].values 
    
    shuffled_df = pd.DataFrame({"Shuffled Weights" : shuffled_weights, 
                            "Label" : group_labels})
    
    smeans_df=shuffled_df.groupby('Label').mean()
    new_diff=smeans_df.iloc[0]-smeans_df.iloc[1] 
    differences_mean = np.append(differences_mean, new_diff) 

<code style="background:Thistle;color:black"> You will have to do this on the homework!! See if you can answer the below questions. </code> 

<code style="background:Thistle;color:black"> What is the 5000 doing in the above code? </code> 

In [None]:
answer

<code style="background:Thistle;color:black"> What is the output of the line of code below?  </code>

In [None]:
shuffled_weights = weights.sample(len(weights),replace = False)['Birth Weight'].values 

In [None]:
answer

In [None]:
# Histogram of the simulated differences
# (assumes you have built the array `differences_median` by adapting the loop above)
plots.hist(differences_median,bins=np.arange(-4,4,1))
plots.scatter(observed_difference_median, -2, color='red', s=30);
plots.title("Null Difference Between Medians");


<code style="background:Thistle;color:black">p-value: </code>

<code style="background:Thistle;color:black"> Conclusion:</code>

**We ____________________the null hypothesis. At the 5% significance level, since p =________, there is ____________________________.**

Still a little confused with permutation testing? See another [example here about llamas (alpacas actually)](https://www.jwilber.me/permutationtest/#:~:text=To%20calculate%20the%20p%2Dvalue,of%20test%2Dstatistics%20we%20calculated.)