# Hypothesis Testing

## Lesson Objectives

1. Establish the basic framework and vocabulary for hypothesis testing
2. State a Null and Alternative Hypothesis
3. Interpret p-value
4. Choose an alpha
5. Explain Type 1 and Type 2 Errors
6. Perform Z-tests and T-tests and interpret results

## Scenario

The Donut Fairy is a new gourmet donut shop. They want to increase sales at their store, so they decide to give out some coupons. They give twenty of their customers a 50% off coupon and another twenty a 10 dollars off coupon. After a month, The Donut Fairy looked at their sales and found that the customers who had a 50% off coupon spent 20 dollars on average, and those who got the 10 dollars off coupon spent 15 dollars on average. The Donut Fairy is now wondering how sure they can be that the difference in average spending was a result of the difference in coupon types rather than chance.


<img src="./img/coupon1.png" width="300"> <img src="./img/coupon2.png" width="300"> 

## Refresher on CLT

<img src="./img/sampling-dist.jpg" width="500" class="center">
Remember that the CLT states that as the sample size $n$ increases, the sampling distribution of the mean of a random sample of size $n$ more closely approximates a normal distribution.
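To see the CLT in action, here is a minimal simulation sketch. The skewed "population" below is made up for illustration (it is not Donut Fairy data), and the helper name `sample_means` is an assumption of this example. As $n$ grows, the sample means should cluster more tightly around the population mean.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # fixed seed so the sketch is repeatable

# A deliberately skewed, made-up population (exponential, mean around 20)
population = rng.exponential(scale=20, size=100_000)

def sample_means(n, n_samples=2_000):
    """Draw n_samples random samples of size n and return the mean of each."""
    samples = rng.choice(population, size=(n_samples, n), replace=True)
    return samples.mean(axis=1)

for n in (2, 20, 200):
    means = sample_means(n)
    print(f"n={n:>3}: mean of sample means={means.mean():.2f}, "
          f"spread of sample means={means.std(ddof=1):.2f}")
```

Plotting a histogram of `sample_means(200)` would show the familiar bell shape even though the population itself is strongly skewed.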

<img src="./img/Normal-Distribution.png" width="500" class="center">

# Hypothesis Testing Steps

1. Set up null and alternative hypotheses

2. Specify the appropriate statistical test

3. Choose a significance level (alpha)

4. Determine the critical value of test statistic or p-value (find the region of rejection)

5. Calculate the observed value of the test statistic

6. Make a decision (reject or fail to reject) regarding the Null Hypothesis by comparing the observed value of the test statistic with the critical value, or the observed p-value with alpha.

## Step 1: Define Null and Alternative Hypotheses


### The Null Hypothesis

![](./img/same.gif) 

There is NOTHING, **no** difference.


### The Alternative hypothesis

![](./img/giphy.gif)

For example if we're testing the function of a new drug designed to help lower depression, then the null hypothesis will say that the drug has _no effect_ on patients who have depression.  The alternative hypothesis would be that those who were given the drug would have lower depression.

![](./img/summary_one_two.jpg)

## Practice: Write the hypotheses

<img src="img/talking.jpeg" width="60" align='left'>

</br>

With a classmate write out the null and alternative hypotheses for The Donut Fairy example in your own words.


Null (H0):

Alternative (H1):

## Step 2: Choosing the appropriate test

There are a variety of statistical tests one can use to test their hypotheses.  It is important that you pick the one that fits your data and your hypothesis.  Today we will be focusing on just a few of these tests.

#### For our scenario we will use a t-test which is appropriate for comparing means of two groups.
<img src="./img/choosing_test.png" width=700  class="center">


## Step 3: Choosing a significance level

### How Unlikely Is Too Unlikely?

### $Alpha: \alpha$

In order to test our hypothesis we will need a way to decide if we should reject the null hypothesis.  The way we do this is by setting the $\alpha$ value, otherwise known as the significance level, before we conduct our analysis.  The $\alpha$ value is the probability of rejecting the null hypothesis when it is true.  For example, an $\alpha$ value of 0.05 indicates that there is a 5% risk of rejecting the null hypothesis when in fact it is true.  

Note:  We can change our alpha value based on how important it is to be accurate in our decision making. It is common to have alpha values of .10, .05, or .01, but you can set any value.

<img src="./img/alpha_regions.jpg" width=800  class="center">

### Since it is a common alpha level and we don't have a particularly risky decision to make we will use an alpha level of 0.05.

## Step 4: Determine the critical value of test statistic using p-values

![](./img/tpdftb.gif)

``` python
from scipy.stats import t

# tail probability: alpha = 0.05 split across two tails
a = 0.025
df = 19
# retrieve the lower critical value for our chosen alpha
value = t.ppf(a, df)
print(f'The critical t-statistic for a two-tailed test with alpha=0.05 is: {value}')
# confirm with the cdf: the area below the critical value should equal a
p = t.cdf(value, df)
print(f'The lower-tail probability at a critical value of {value} is: {p}')
```

```
The critical t-statistic for a two-tailed test with alpha=0.05 is: -2.0930240544082634
The lower-tail probability at a critical value of -2.0930240544082634 is: 0.02500000000000231
```


The basic idea of a p-value is that it is the probability of obtaining a value at least as extreme as the one observed in your sample if the null hypothesis is true of the population. So in our case, getting a t-statistic greater than 2.093 or less than -2.093 has a probability of .05 under the null hypothesis.
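The tail-area idea can be sketched in a few lines. The helper name `two_tailed_p` is an assumption of this example. It uses `scipy.stats.t.sf` (the survival function, `1 - cdf`) to add up the area in both tails beyond an observed t-statistic.

```python
from scipy.stats import t

def two_tailed_p(t_obs, df):
    """Probability, under the null, of a t-statistic at least as extreme as t_obs in either tail."""
    return 2 * t.sf(abs(t_obs), df)

df = 19
critical = t.ppf(1 - 0.025, df)  # upper critical value for alpha = 0.05, two-tailed

# At the critical value the two-tailed p-value equals alpha
print(f"p at the critical value: {two_tailed_p(critical, df):.4f}")
# A t-statistic closer to zero is less extreme, so its p-value is larger
print(f"p at t = 1.0: {two_tailed_p(1.0, df):.4f}")
```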

<img src="./img/one_tailed.png" width=800  class="center">


## Step 5: Calculating the observed value

``` python
from scipy import stats

# ad_1 - 50% off coupon
ad_1 = [22, 16, 26, 29, 14, 10, 23, 20, 17, 19, 27, 22, 15, 21, 24, 9, 29, 22, 19, 16]

# ad_2 - 10 dollars off coupon
ad_2 = [12, 14, 11, 22, 13, 17, 15, 20, 15, 13, 15, 14, 12, 14, 15, 16, 17, 14, 21, 10]

# conduct the t-test to get the observed t-statistic and p-value
stats.ttest_ind(ad_1, ad_2)
```

```
Ttest_indResult(statistic=3.4460121880225554, pvalue=0.0014032666129068826)
```
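To show what `stats.ttest_ind` is doing, here is a sketch that rebuilds the statistic by hand. It assumes the default equal-variance (pooled) form of the test; variable names such as `pooled_var` and `t_manual` are just for this illustration. Note that this pooled test has $n_1 + n_2 - 2 = 38$ degrees of freedom.

```python
import numpy as np
from scipy import stats

ad_1 = [22, 16, 26, 29, 14, 10, 23, 20, 17, 19, 27, 22, 15, 21, 24, 9, 29, 22, 19, 16]
ad_2 = [12, 14, 11, 22, 13, 17, 15, 20, 15, 13, 15, 14, 12, 14, 15, 16, 17, 14, 21, 10]

x1, x2 = np.array(ad_1), np.array(ad_2)
n1, n2 = len(x1), len(x2)
df = n1 + n2 - 2

# Pooled variance: the two sample variances weighted by their degrees of freedom
pooled_var = ((n1 - 1) * x1.var(ddof=1) + (n2 - 1) * x2.var(ddof=1)) / df
# Standard error of the difference between the two sample means
std_err = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_manual = (x1.mean() - x2.mean()) / std_err
p_manual = 2 * stats.t.sf(abs(t_manual), df)  # two-tailed p-value

print(f"t = {t_manual:.4f}, p = {p_manual:.4f}, df = {df}")
```

The numerator is the 5 dollar difference in average spending; the denominator measures how much that difference would be expected to vary by chance.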

## Step 6: Making A Decision!


Once we have our observed test value, we can compare it to the critical value, or compare the observed p-value to our alpha, to make a decision regarding our hypothesis.  In step 4 we determined that the critical t-statistic was $\pm$ 2.093, which leaves 0.025 in each tail for a total $\alpha$ of 0.05.  Our observed t-statistic of 3.45 is more extreme than the critical value, and our observed p-value of 0.0014 is smaller than $\alpha$, so the result falls in the region of rejection.  Thus we are able to reject the null hypothesis that the sales did not differ depending on the type of coupon the customers received.


<img src="./img/pval_1.jpg" width=500  class="center"> <img src="./img/pval_2.jpg" width=500  class="center">


If $p$ observed $\lt \alpha$, we reject the null hypothesis.

If $p$ observed $\geq \alpha$, we fail to reject the null hypothesis.
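The decision rule is simple enough to write as a tiny function. This is only a sketch; the function name `decide` is hypothetical.

```python
def decide(p_observed, alpha=0.05):
    """Apply the decision rule: reject the null only when p is below alpha."""
    if p_observed < alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.0014))  # the Donut Fairy p-value, rounded
print(decide(0.20))    # a made-up p-value that is larger than alpha
```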

> **We do not accept the alternative hypothesis, we only reject or fail to reject the null hypothesis in favor of the alternative.**

* We do not throw out failed experiments! 
* We say "this methodology, with this data, does not produce significant results" 
    * Maybe we need more data!


## Type 1 Errors (False Positives) and Type 2 Errors (False Negatives)
Most tests for the presence of some factor are imperfect. And in fact most tests are imperfect in two ways: They will sometimes fail to predict the presence of that factor when it is after all present, and they will sometimes predict the presence of that factor when in fact it is not. Clearly, the lower these error rates are, the better, but it is not uncommon for these rates to be between 1% and 5%, and sometimes they are even higher than that. (Of course, if they're higher than 50%, then we're better off just flipping a coin to run our test!)

Predicting the presence of some factor (i.e. counter to the null hypothesis) when in fact it is not there (i.e. the null hypothesis is true) is called a "false positive". Failing to predict the presence of some factor (i.e. in accord with the null hypothesis) when in fact it is there (i.e. the null hypothesis is false) is called a "false negative".
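A simulation can make the two error types concrete. The sketch below uses made-up normally distributed spending data (mean 15, standard deviation 5, 20 customers per group, all assumptions of this example) and a hypothetical helper `rejection_rate`. When the true difference is zero, every rejection is a Type 1 error; when there really is a difference, every failure to reject is a Type 2 error.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
alpha = 0.05
n_trials = 2_000

def rejection_rate(true_diff):
    """Share of simulated experiments in which the t-test rejects the null."""
    rejections = 0
    for _ in range(n_trials):
        group_a = rng.normal(loc=15, scale=5, size=20)
        group_b = rng.normal(loc=15 + true_diff, scale=5, size=20)
        if stats.ttest_ind(group_a, group_b).pvalue < alpha:
            rejections += 1
    return rejections / n_trials

# Null is true (no difference): rejections are false positives
type_1_rate = rejection_rate(true_diff=0)
# Null is false (a 3 dollar difference): non-rejections are false negatives
power = rejection_rate(true_diff=3)
type_2_rate = 1 - power

print(f"Type 1 error rate: {type_1_rate:.3f} (should be close to alpha)")
print(f"Type 2 error rate: {type_2_rate:.3f}")
```

Rerunning with a smaller `alpha` should lower the Type 1 rate and raise the Type 2 rate, which is the trade-off to keep in mind when choosing a significance level.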

![](./img/decisions.jpg)


![](./img/Type1_type2_errors.jpg)

#### Turn and Talk
<img src="img/talking.jpeg" width="60" align='left'>
</br>
</br>
</br>

1. In the context of our coupon scenario, what are the type 1 and type 2 errors?
2. How does changing our alpha value change the rate of type 1 and type 2 errors?

## Other types of hypothesis tests

###  Z-tests

![z](https://media.giphy.com/media/4oku9cpYuCNwc/giphy.gif)

A z-test is used when you know the population mean and standard deviation.

Our test statistic is the z-stat.

<br>Our z score tells us how many standard deviations away from the mean our point is.
<br>We assume that the population is normally distributed, and we are familiar with the empirical rule: <br>68:95:99.7

![](img/Empirical_Rule.png)


Because of this, we can say that a data point with a z-score of approximately 2 lies 2 standard deviations from the mean, and the probability of observing a value at least that far from the mean, in either direction, is about 1-.95, or .05.
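The empirical rule is an approximation, and the exact areas are easy to check with `scipy.stats.norm`. This short sketch prints the share of a normal distribution within 1, 2, and 3 standard deviations, then the two-tailed probability beyond a z-score of 2.

```python
from scipy.stats import norm

# Share of a normal distribution within 1, 2, and 3 standard deviations of the mean
for z in (1, 2, 3):
    within = norm.cdf(z) - norm.cdf(-z)
    print(f"within {z} standard deviation(s): {within:.4f}")

# Probability of landing at least 2 standard deviations from the mean, in either direction
p_two_tailed = 2 * norm.sf(2)
print(f"two-tailed probability beyond z = 2: {p_two_tailed:.4f}")

# The z-score that leaves exactly 0.05 in the two tails combined
z_crit = norm.ppf(0.975)
print(f"critical z for alpha = 0.05 (two-tailed): {z_crit:.3f}")
```

This is why 1.96, rather than exactly 2, is the usual critical z-value at $\alpha = 0.05$.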

## Connecting Z-Tests to T-Tests

According to the **Central Limit Theorem**, the sampling distribution of a statistic, like the sample mean, will follow a normal distribution _as long as the sample size is sufficiently large_. 

__What if we don't have large sample sizes?__

When we do not know the population standard deviation or we have a small sample size, the sampling distribution of the sample statistic will follow a t-distribution.  
* Smaller sample sizes have larger variance, and t-distributions account for that by having heavier tails than the normal distribution.
* t-distributions are parameterized by degrees of freedom: the fewer the degrees of freedom, the fatter the tails. As the degrees of freedom grow large, the t-distribution converges to a normal distribution.
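One way to see the heavier tails is to compare critical values. In this sketch (the chosen degrees of freedom are arbitrary), the two-tailed critical t-value at $\alpha = 0.05$ shrinks toward the normal critical value as the degrees of freedom increase.

```python
from scipy.stats import norm, t

z_crit = norm.ppf(0.975)  # normal critical value for alpha = 0.05, two-tailed
dfs = (2, 5, 19, 100, 1000)
t_crits = [t.ppf(0.975, df) for df in dfs]

for df, t_crit in zip(dfs, t_crits):
    print(f"df={df:>4}: critical t = {t_crit:.4f}")
print(f"normal : critical z = {z_crit:.4f}")
```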

<img src="./img/distributions1.png" width=600  class="center">


### How do we know whether we need to use a z-test or a t-test? 

<img src="img/z_or_t_test.png" width="500">


## One-sample z-test

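* For large enough sample sizes $n$ with known population standard deviation $\sigma$, the test statistic of the sample mean $\bar x$ is given by the **z-statistic**, $z = \dfrac{\bar x - \mu}{\sigma / \sqrt{n}}$, where $\mu$ is the population mean under the null hypothesis.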

* Our hypothesis test tries to answer the question of how likely we are to observe a z-statistic as extreme as our sample's given the null hypothesis that the sample and the population have the same mean, given a significance threshold of $\alpha$. This is a one-sample z-test.  
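A one-sample z-test can be sketched directly from the standard formula $z = (\bar x - \mu) / (\sigma / \sqrt{n})$. The function name `one_sample_z` and the data below are made up for illustration; the population mean and standard deviation are assumed to be known.

```python
import math
from scipy.stats import norm

def one_sample_z(sample, pop_mean, pop_std):
    """Return the z-statistic and two-tailed p-value for a one-sample z-test."""
    n = len(sample)
    sample_mean = sum(sample) / n
    z = (sample_mean - pop_mean) / (pop_std / math.sqrt(n))
    p = 2 * norm.sf(abs(z))
    return z, p

# Hypothetical sample of 25 values; assumed population mean 50 and standard deviation 5
sample = [50, 54, 52, 51, 53] * 5
z, p = one_sample_z(sample, pop_mean=50, pop_std=5)
print(f"z = {z:.2f}, p = {p:.4f}")
```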

## One-sample t-test

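* For small sample sizes or samples with unknown population standard deviation, the test statistic of the sample mean is given by the **t-statistic**, $t = \dfrac{\bar x - \mu}{s / \sqrt{n}}$, where $s$ is the sample standard deviation and the statistic has $n - 1$ degrees of freedom.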

* Our hypothesis test tries to answer the question of how likely we are to observe a t-statistic as extreme as our sample's given the null hypothesis that the sample and population have the same mean, given a significance threshold of $\alpha$. This is a one-sample t-test.
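Here is a sketch of a one-sample t-test on made-up data (hypothetical wait times in minutes, tested against an assumed mean of 5.0). It computes the statistic by hand with the sample standard deviation and then checks it against `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

# Hypothetical data: wait times in minutes; hypothesized population mean of 5.0
sample = np.array([4.8, 5.6, 5.1, 6.2, 4.9, 5.8, 5.4, 6.0])
pop_mean = 5.0

# By hand: sample standard deviation (ddof=1) replaces the unknown population sigma
n = len(sample)
t_manual = (sample.mean() - pop_mean) / (sample.std(ddof=1) / np.sqrt(n))

# With scipy: two-tailed by default, with n - 1 degrees of freedom
result = stats.ttest_1samp(sample, popmean=pop_mean)

print(f"manual t = {t_manual:.4f}")
print(f"scipy  t = {result.statistic:.4f}, p = {result.pvalue:.4f}")
```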

## Two-sample t-tests 

Sometimes, we are interested in determining whether two population means are equal. In this case, we use two-sample t-tests.

There are two types of two-sample t-tests: **paired** and **independent** (unpaired) tests. 

What's the difference?  

**Paired tests**: How is a sample affected by a certain treatment? The individuals in the sample remain the same and you compare how they change after treatment. 

**Independent tests**: When we compare two different, unrelated samples to each other, we use an independent (or unpaired) two-sample t-test.
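The choice between paired and independent tests matters in practice. In this sketch the `before` and `after` measurements are made up: each position is the same hypothetical individual measured twice. `scipy.stats.ttest_rel` runs the paired test, and `scipy.stats.ttest_ind` treats the two lists as unrelated groups.

```python
from scipy import stats

# Hypothetical measurements on the same 8 individuals before and after a treatment
before = [72, 68, 75, 80, 66, 71, 78, 74]
after = [70, 65, 74, 76, 65, 69, 75, 73]

# Paired test: works on the per-individual differences
paired = stats.ttest_rel(before, after)
# Independent test: ignores the pairing and compares the two group means
independent = stats.ttest_ind(before, after)

print(f"paired:      t = {paired.statistic:.3f}, p = {paired.pvalue:.4f}")
print(f"independent: t = {independent.statistic:.3f}, p = {independent.pvalue:.4f}")
```

Because every individual changes by a small, consistent amount, the paired test detects the shift, while the independent test is swamped by the variation between individuals.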

## Time to Practice!

I'm buying jeans from store A and store B. I know nothing about their inventory other than prices. 

``` python
store1 = [20,30,30,50,75,25,30,30,40,80]
store2 = [60,30,70,90,60,40,70,40]
```

Should I go to just one store for a less expensive pair of jeans? I'm pretty apprehensive about my decision, so $\alpha = 0.1$. It's okay to assume the samples have equal variances.

**Step 1: State the null and alternative hypotheses**

**Step 2: Choosing the appropriate test.  Which test should you select and why?** 

**Step 3: Choosing a Significance level**

**Step 4: Determine the critical values**

**Step 5:  Calculating the observed value**

**Step 6: Make a decision**

## Level Up: More practice!

A hockey coach wanted to determine whether the mean skating speed of his team was less than the hypothesized league mean speed of 12 meters per second.  He collected data on 16 players on his team.  The mean speed of the team was 10 meters per second.  Help the coach determine if his team is slower than the league's mean speed. Set your significance level to 0.01 because he wants to be extra certain!

``` python
team_speeds = [8, 12, 9, 7, 8, 10, 9, 11, 13.5, 8.5, 10.5, 9.5, 11.5, 12.5, 9.5, 10.5]

```

**Step 1: State the null and alternative hypotheses**

**Step 2: Choosing the appropriate test.  Which test should you select and why?** 

**Step 3: Choosing a Significance level**

**Step 4: Determine the critical values**

**Step 5:  Calculating the observed value**

**Step 6: Make a decision**