# Hypothesis Testing
---

## Import Python Libraries

```python
# import Python libraries
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from scipy import stats
```

## Left Align Cell Contents

```html
%%html
<style>
table {float:left}
</style>
```

---

## Inferential Statistics

**Inferential statistics** is the practice of using statistics calculated from collected samples to make inferences about the population the samples were drawn from.

A key process in inferential statistics is **hypothesis testing** which consists of 5 steps:
1. State the null and alternative hypotheses
2. Select a level of significance
3. Calculate the test statistic
4. Determine the critical value(s) and locate the regions of acceptance and rejection
5. State the conclusion

The **null hypothesis**, $ H_0 $, is a statement about the status quo for the population parameter we are interested in.

The **alternative hypothesis**, $ H_a $, is a statement about the population parameter we are testing for.

If the samples we have collected support the alternative hypothesis, then we can reject the null hypothesis.

Summary of how null and alternative hypotheses are stated:
- If $ H_0 $ is $ = \mu $ or $ = p $ then $ H_a $ is $ \neq \mu $ or $ \neq p $
- If $ H_0 $ is $ \le \mu $ or $ \le p $ then $ H_a $ is $ \gt \mu $ or $ \gt p $
- If $ H_0 $ is $ \ge \mu $ or $ \ge p $ then $ H_a $ is $ \lt \mu $ or $ \lt p $

Where:
- ($ \mu $) is the population mean
- ($ p $) is the population proportion
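
The five steps listed earlier in this section can be sketched for a simple upper-tailed test of a mean. This is only an illustration: the sample summary (`x_bar`, `sigma`, `n`) and the hypothesized mean `mu_0` are made-up values, and the population standard deviation is assumed to be known so that a z-statistic applies.

```python
from math import sqrt
from scipy import stats

# Step 1: state the hypotheses. H0: mu <= 100, Ha: mu > 100 (upper-tailed).
mu_0 = 100.0
# Hypothetical sample summary (assumed values, not from any real data set).
x_bar, sigma, n = 105.0, 15.0, 36

# Step 2: select a level of significance.
alpha = 0.05

# Step 3: calculate the test statistic.
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Step 4: determine the critical value for an upper-tailed test.
z_crit = stats.norm.ppf(1 - alpha)

# Step 5: state the conclusion.
reject_h0 = z >= z_crit
print(f"z = {z:.3f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

With these assumed numbers the test statistic lies beyond the critical value, so the null hypothesis would be rejected at the 0.05 level.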



---

## Type I and Type II Errors

**Type I error**, or **false positive**, is to reject a null hypothesis that is actually true and accept an alternative hypothesis that is actually false.

**Type II error**, or **false negative**, is to accept a null hypothesis that is actually false and reject an alternative hypothesis that is actually true.

$ \alpha $, or level of significance, is the probability of making a Type I error.
$ \beta $ is the probability of making a Type II error.

The **power** of a hypothesis test is the probability that the null hypothesis will be rejected when it is false. The power is $ 1 - \beta $ .

The higher the confidence level, the lower the $ \alpha $. This means that the confidence interval will be wider and the margin of error will be larger, but the probability of making a Type I error will be smaller.

There is an inverse relationship between $ \alpha $ and $ \beta $, or between the probability of making a Type I error and the probability of making a Type II error.

As $ \alpha $ decreases:
- the region of acceptance increases
- the probability of accepting a null hypothesis that is actually false increases
- $ \beta $ increases (or the probability of making a Type II error)
- the power of the test decreases

The only way to decrease $ \alpha $ and $ \beta $ simultaneously is to increase the sample size. When the sample size equals the size of the population both $ \alpha $ and $ \beta $ are 0.
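
The trade-off between $ \alpha $ and $ \beta $ can be illustrated numerically. The sketch below assumes an upper-tailed z-test with a known population standard deviation and a specific true mean; the function name `type_ii_error` and all numeric values are hypothetical choices made for illustration.

```python
from math import sqrt
from scipy import stats

def type_ii_error(alpha, mu_0, mu_true, sigma, n):
    """Beta for an upper-tailed z-test of H0: mu <= mu_0 when the true mean is mu_true."""
    z_crit = stats.norm.ppf(1 - alpha)
    # Under the true mean, the z-statistic is centered on this shift instead of 0.
    shift = (mu_true - mu_0) / (sigma / sqrt(n))
    # Beta is the probability that the statistic still falls below the critical value.
    return stats.norm.cdf(z_crit - shift)

for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_error(alpha, mu_0=100, mu_true=105, sigma=15, n=36)
    print(f"alpha = {alpha:.2f}  beta = {beta:.3f}  power = {1 - beta:.3f}")

# A larger sample lowers beta for the same alpha.
beta_large_n = type_ii_error(0.05, mu_0=100, mu_true=105, sigma=15, n=144)
print(f"n = 144: beta = {beta_large_n:.3f}")
```

Under these assumptions, lowering $ \alpha $ raises $ \beta $ and lowers the power, while increasing the sample size reduces $ \beta $ without changing $ \alpha $.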

---

## One and Two Tailed Tests

**Two-tailed test** is used when the null and alternative hypotheses are stated using $ = $ and $ \neq $. This test is non-directional as the test is only that the population parameter is different than the value stated in the null hypothesis. The region of rejection will include both tails.

**One-tailed test** is used when the null and alternative hypotheses are stated using either $ \le $ and $ \gt $ or $ \ge $ and $ \lt $. These are directional tests because what is being tested is whether the population parameter is less than or greater than the value in the null hypothesis. The region of rejection will be limited to one of the tails. Because all of the area that is the region of rejection is in a single tail, a one-tailed test is less conservative than a two-tailed test as there is a greater probability that a result will fall into the larger region of rejection. The two-tailed test requires a more extreme result to reject the null hypothesis. 

**Lower-tailed test** or **Left-tailed test** - If the alternative hypothesis is that the population parameter is less than the value in the null hypothesis. 

**Upper-tailed test** or **Right-tailed test** - If the alternative hypothesis is that the population parameter is greater than the value in the null hypothesis.

The $ \alpha $ for a two-tailed test is split between the two tails. The $ \alpha $ for a one-tailed test is entirely contained within one of the tails.


---

## Critical Value

A **critical value** is a threshold in hypothesis testing that determines the boundary or boundaries for rejecting the null hypothesis. It is a point on the test distribution that is compared to the test statistic to decide whether to reject the null hypothesis. The critical value depends on the chosen significance level (alpha, usually 1%, 5% or 10%) and the nature of the test (one-tailed or two-tailed).

Using a z-score as a critical value:
- For a lower-tailed test, if $ z \le -z_\alpha $ then reject the null hypothesis
- For an upper-tailed test, if $ z \ge z_\alpha $ then reject the null hypothesis
- For a two-tailed test, if $ z \le -z_{\frac{\alpha}{2}} $ or $ z \ge z_{\frac{\alpha}{2}} $ then reject the null hypothesis

The above also applies when the critical value is a t-score found in the t-table.

### Looking up critical values

- One-tailed test: Find the value in the z-table or t-table that is associated with the selected $ \alpha $.
- Two-tailed test: Find the values in the z-table or t-table that are associated with half the selected $ \alpha $, one for the lower tail and one for the upper tail.

#### Example

If the selected $ \alpha $ is 0.10 (for a 90% confidence level) using the z-distribution:
- One-tailed test: the critical value $ z_\alpha $ would be the value in the z-table for $ \alpha = 0.1000 $ for a lower-tailed test or $ 1 - \alpha = 0.9000 $ for an upper-tailed test
- Two-tailed test: the critical values, $ z_{\frac{\alpha}{2}} $, would be the value in the z-table for $ \frac{\alpha}{2} = 0.0500 $ for the lower tail and $ 1 - \frac{\alpha}{2} = 0.9500 $ for the upper tail.
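
Instead of a printed table, the same critical values can be looked up with the inverse cumulative distribution function (`ppf`) in `scipy.stats`. This is a small sketch for the $ \alpha = 0.10 $ example; the degrees of freedom used for the t-distribution line are an arbitrary assumption.

```python
from scipy import stats

alpha = 0.10

# One-tailed tests: all of alpha sits in a single tail.
z_lower = stats.norm.ppf(alpha)        # about -1.28
z_upper = stats.norm.ppf(1 - alpha)    # about  1.28

# Two-tailed test: alpha is split between the tails.
z_two_lower = stats.norm.ppf(alpha / 2)      # about -1.645
z_two_upper = stats.norm.ppf(1 - alpha / 2)  # about  1.645

# The same lookup for a t-distribution (df = 15 is an assumed example value).
t_two_upper = stats.t.ppf(1 - alpha / 2, df=15)

print(z_lower, z_upper, z_two_lower, z_two_upper, t_two_upper)
```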

---

## Calculating Test Statistics

The **test statistic** provides a standardized value that can be used to determine whether to reject the null hypothesis. It essentially quantifies the difference between the observed data and what is expected under the null hypothesis.

Once the test statistic is calculated, it is compared to critical values from the chosen statistical distribution. These critical values are determined by the significance level (alpha, usually 0.05) and the type of test (one-tailed or two-tailed). If the test statistic falls beyond the critical value(s), the null hypothesis is rejected.

In general the test statistic is

$$ \text{test statistic} = \frac{\text{observed} - \text{expected}}{\text{standard deviation}} $$

Where $ \text{expected} $ is the value hypothesized by the null hypothesis.

### Mean

When standard deviation is known:

$$ z = \frac{\bar{x} - \mu_0}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}} $$

When standard deviation is unknown or sample size is small:

$$ t = \frac{\bar{x} - \mu_0}{s_{\bar{x}}} = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} $$
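
As a rough illustration of the two formulas for a mean, the sketch below computes both statistics for a small made-up sample. The data, `mu_0`, and the "known" `sigma` are assumptions; `scipy.stats.ttest_1samp` is used only to cross-check the hand-computed t-statistic.

```python
from math import sqrt
import numpy as np
from scipy import stats

# Hypothetical sample and hypothesized mean (illustrative values only).
sample = np.array([52.1, 48.3, 53.7, 51.2, 49.8, 54.0, 50.6, 52.9])
mu_0 = 50.0

n = len(sample)
x_bar = sample.mean()
s = sample.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)

# t-statistic: population standard deviation unknown.
t_manual = (x_bar - mu_0) / (s / sqrt(n))
t_scipy, p_two_tailed = stats.ttest_1samp(sample, popmean=mu_0)

# z-statistic: only valid if sigma is truly known (assumed here).
sigma = 2.0
z = (x_bar - mu_0) / (sigma / sqrt(n))

print(f"t (manual) = {t_manual:.4f}, t (scipy) = {t_scipy:.4f}, z = {z:.4f}")
```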

### Proportion

When $ n\hat{p} \ge 5 $ and $ n(1 - \hat{p}) \ge 5 $:

$$ z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} $$
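
A minimal sketch of the proportion statistic follows. The helper name `proportion_z` and the counts passed to it are hypothetical; the function simply checks the sample-size condition stated above and then applies the formula.

```python
from math import sqrt

def proportion_z(successes, n, p_0):
    """z-statistic for a one-sample proportion test against the hypothesized p_0."""
    p_hat = successes / n
    # Normal approximation condition from the text.
    if n * p_hat < 5 or n * (1 - p_hat) < 5:
        raise ValueError("sample too small for the normal approximation")
    # The standard error uses p_0, the value assumed under the null hypothesis.
    return (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

# Hypothetical example: 58 successes in 100 trials, H0: p = 0.5.
z = proportion_z(successes=58, n=100, p_0=0.5)
print(f"z = {z:.3f}")
```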



---

## P-value

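The **p-value** is the probability, assuming the null hypothesis is true, of obtaining a test statistic at least as extreme as the one observed. Equivalently, it is the smallest level of significance at which the null hypothesis can be rejected.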

Calculating the p-value for a one-tailed test:
1. Calculate the z-score.
2. Look up the value for the z-score in the z-table, using the negative z-table for a lower-tailed test and a positive z-table for an upper-tailed test.
3. For a lower-tailed test the value in the z-table is the area of the region of rejection for the test. For an upper-tailed test, 1 minus the value in the z-table is the area of the region of rejection. In both cases the area is the p-value.

Calculating the p-value for a two-tailed test:
1. Calculate the z-score.
2. Look up the value for the z-score in the z-table.
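3. Find the tail area beyond the z-score: the value in the z-table for a negative z-score, or 1 minus the value in the z-table for a positive z-score. This area is half the total area of the region of rejection in both tails. Double the value to get the p-value.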

Using the p-value:
- If $ p \le \alpha $ then reject the null hypothesis
- If $ p \gt \alpha $ then do not reject the null hypothesis

Using critical value:
- For a lower-tailed test, if $ z \le -z_\alpha $ then reject the null hypothesis
- For an upper-tailed test, if $ z \ge z_\alpha $ then reject the null hypothesis
- For a two-tailed test, if $ z \le -z_{\frac{\alpha}{2}} $ or $ z \ge z_{\frac{\alpha}{2}} $ then reject the null hypothesis
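
The table lookups described above can be replaced by the normal distribution functions in `scipy.stats`. The sketch below uses a hypothetical helper `p_value` and an assumed z-score of 2.0 to show that the p-value rule and the critical-value rule lead to the same decision.

```python
from scipy import stats

def p_value(z, tail):
    """p-value of a z-statistic for a 'lower', 'upper', or 'two' tailed test."""
    if tail == "lower":
        return stats.norm.cdf(z)           # area to the left of z
    if tail == "upper":
        return stats.norm.sf(z)            # area to the right of z (1 - cdf)
    if tail == "two":
        return 2 * stats.norm.sf(abs(z))   # both tails beyond |z|
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

alpha = 0.05
z = 2.0  # assumed test statistic for illustration

p_upper = p_value(z, "upper")
p_two = p_value(z, "two")

# Decisions by p-value and by critical value for the upper-tailed case.
reject_by_p = p_upper <= alpha
reject_by_critical = z >= stats.norm.ppf(1 - alpha)

print(f"upper-tailed p = {p_upper:.4f}, two-tailed p = {p_two:.4f}")
print(f"reject by p-value: {reject_by_p}, reject by critical value: {reject_by_critical}")
```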

The **significance** or **statistical significance** of a test result is the probability of obtaining a result at least that extreme by chance alone when the null hypothesis is true.

The following are equivalent:
- The confidence level is 99%
- The Type I error rate is 0.01
- The alpha level or $ \alpha $ is 0.01
- The area of the region of rejection is 0.01
- The finding is significant at the 0.01 level
- The p-value is 0.01
- There is a 1 in 100 (0.01) chance of getting a result as extreme or more extreme than this one


---

## Hypothesis Testing For Population Proportion

To perform hypothesis testing for a population proportion, $ np \ge 5 $ and $ n(1 - p) \ge 5 $ must both be true to ensure an approximately normal distribution of the sample proportion.

If the population proportion is unknown then the sample proportion, $ \hat{p} $, can be used instead.

---

## Confidence Interval for the Difference of Means

We often want to compare two populations using the difference in their sample means, $ \bar{x}_1 - \bar{x}_2 $, as a way to see the effect of a treatment on the two groups.

If both populations are normally distributed, or if large enough samples are taken from both populations ($ n_1, n_2 \ge 30 $) for the Central Limit Theorem to apply, then the sampling distribution of the difference of means will be approximately normally distributed.

Mean of the sampling distribution of the difference of means:
$ \mu_{\bar{x}_1 - \bar{x}_2} = \mu_{\bar{x}_1} - \mu_{\bar{x}_2} $

The standard error of the sampling distribution of the difference of means:

$$ \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$

The confidence interval:

$$ (a,b) = (\bar{x}_1 - \bar{x}_2) \pm z_{\frac{\alpha}{2}} \cdot \sigma_{\bar{x}_1 - \bar{x}_2} $$

$$ (a,b) = (\bar{x}_1 - \bar{x}_2) \pm z_{\frac{\alpha}{2}} \cdot \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$

Where:
- $ (a,b) $ is the confidence interval
- ($ \bar{x}_1 $) is the sample mean from the first population 
- ($ \bar{x}_2 $) is the sample mean from the second population
- ($ z_{\frac{\alpha}{2}} $) is the critical value
- ($ \sigma_1 $) is the first population standard deviation
- ($ \sigma_2 $) is the second population standard deviation
- ($ n_1 $) is the sample size from the first population
- ($ n_2 $) is the sample size from the second population
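
The interval for known population standard deviations might be computed as in the sketch below. The function name `diff_means_ci_z` and the example inputs are assumptions made for illustration.

```python
from math import sqrt
from scipy import stats

def diff_means_ci_z(x_bar_1, x_bar_2, sigma_1, sigma_2, n_1, n_2, confidence=0.95):
    """Confidence interval for mu_1 - mu_2 when both population sigmas are known."""
    alpha = 1 - confidence
    z_crit = stats.norm.ppf(1 - alpha / 2)
    std_err = sqrt(sigma_1**2 / n_1 + sigma_2**2 / n_2)
    diff = x_bar_1 - x_bar_2
    return diff - z_crit * std_err, diff + z_crit * std_err

# Hypothetical summary statistics for two independent samples.
a, b = diff_means_ci_z(x_bar_1=82, x_bar_2=78, sigma_1=10, sigma_2=12, n_1=50, n_2=60)
print(f"95% CI for the difference of means: ({a:.3f}, {b:.3f})")
```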

If the population standard deviations are unknown and/or the sample sizes are small then we need to use the sample standard deviations.

There are two scenarios to consider:
1. Unequal population variances
2. Nearly equal population variances

### Unequal Population Variances

For unequal population variances, which can be assumed if the sample variances are significantly unequal:

$$ (a,b) = (\bar{x}_1 - \bar{x}_2) \pm t_{\frac{\alpha}{2}} \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

with degrees of freedom (df):

$$ \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 -1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 -1}\left(\frac{s_2^2}{n_2}\right)^2} $$

**Note**: Round down when df is not an integer.
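
The degrees-of-freedom formula is tedious by hand, so a short sketch may help. The helper names `welch_df` and `diff_means_ci_welch` and the example inputs are hypothetical; the degrees of freedom are rounded down as described above.

```python
from math import sqrt, floor
from scipy import stats

def welch_df(s_1, s_2, n_1, n_2):
    """Degrees of freedom for unequal variances (before rounding down)."""
    v_1, v_2 = s_1**2 / n_1, s_2**2 / n_2
    return (v_1 + v_2) ** 2 / (v_1**2 / (n_1 - 1) + v_2**2 / (n_2 - 1))

def diff_means_ci_welch(x_bar_1, x_bar_2, s_1, s_2, n_1, n_2, confidence=0.95):
    """Confidence interval for mu_1 - mu_2 using sample standard deviations."""
    df = floor(welch_df(s_1, s_2, n_1, n_2))
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)
    std_err = sqrt(s_1**2 / n_1 + s_2**2 / n_2)
    diff = x_bar_1 - x_bar_2
    return diff - t_crit * std_err, diff + t_crit * std_err

# Hypothetical summary statistics with clearly different sample variances.
df_raw = welch_df(s_1=3.0, s_2=7.5, n_1=12, n_2=15)
a, b = diff_means_ci_welch(20.5, 18.2, s_1=3.0, s_2=7.5, n_1=12, n_2=15)
print(f"df = {df_raw:.2f} (use {floor(df_raw)}), CI = ({a:.3f}, {b:.3f})")
```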

### Nearly Equal Population Variances

For nearly equal population variances, which can be assumed if the sample variances are very close to each other, a **pooled variance** is calculated.

Pooled variance:

$$ s_p^2 = \frac{(n_1 -1)s_1^2 + (n_2 -1)s_2^2}{n_1 + n_2 - 2} $$

Pooled standard deviation:

$$ s_p = \sqrt{\frac{(n_1 -1)s_1^2 + (n_2 -1)s_2^2}{n_1 + n_2 - 2}} $$

In general, the pooled variance can be used when the samples are taken from the same population or when neither sample variance is more than twice the other.

The standard error is:

$$ \sigma_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

The confidence interval is:

$$ (a,b) = (\bar{x}_1 - \bar{x}_2) \pm t_{\frac{\alpha}{2}} \cdot s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$

with $ \text{df} = n_1 + n_2 - 2 $
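
A sketch of the pooled calculation follows, using $ n_1 + n_2 - 2 $ degrees of freedom. The function names `pooled_variance` and `diff_means_ci_pooled` and the example inputs are assumptions.

```python
from math import sqrt
from scipy import stats

def pooled_variance(s_1, s_2, n_1, n_2):
    """Weighted average of the two sample variances."""
    return ((n_1 - 1) * s_1**2 + (n_2 - 1) * s_2**2) / (n_1 + n_2 - 2)

def diff_means_ci_pooled(x_bar_1, x_bar_2, s_1, s_2, n_1, n_2, confidence=0.95):
    """Confidence interval for mu_1 - mu_2 assuming nearly equal variances."""
    s_p = sqrt(pooled_variance(s_1, s_2, n_1, n_2))
    df = n_1 + n_2 - 2
    t_crit = stats.t.ppf(1 - (1 - confidence) / 2, df)
    std_err = s_p * sqrt(1 / n_1 + 1 / n_2)
    diff = x_bar_1 - x_bar_2
    return diff - t_crit * std_err, diff + t_crit * std_err

# Hypothetical summary statistics with similar sample variances.
a, b = diff_means_ci_pooled(20.5, 18.2, s_1=3.1, s_2=2.8, n_1=12, n_2=15)
print(f"95% CI for the difference of means: ({a:.3f}, {b:.3f})")
```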



---

## Hypothesis Testing For Difference of Means

### Hypothesis Statements

#### Two-tailed test

$ H_0: \mu_1 - \mu_2 = 0 $

$ H_a: \mu_1 - \mu_2 \ne 0 $

Or

$ H_0: \mu_1 = \mu_2 $

$ H_a: \mu_1 \ne \mu_2 $

#### Upper-tailed test

$ H_0: \mu_1 - \mu_2 \le 0 $

$ H_a: \mu_1 - \mu_2 \gt 0 $

Or

$ H_0: \mu_1 \le \mu_2 $

$ H_a: \mu_1 \gt \mu_2 $

#### Lower-tailed test

$ H_0: \mu_1 - \mu_2 \ge 0 $

$ H_a: \mu_1 - \mu_2 \lt 0 $

Or

$ H_0: \mu_1 \ge \mu_2 $

$ H_a: \mu_1 \lt \mu_2 $


### Calculating Test Statistic for Difference of Means

#### Large sample sizes and unequal variances

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} $$

#### Small sample sizes and unequal variances

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}} $$

$$ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 -1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 -1}\left(\frac{s_2^2}{n_2}\right)^2} $$

#### Large sample sizes and nearly equal variances

$$ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

#### Small sample sizes and nearly equal variances

$$ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$

$$ df = n_1 + n_2 -2 $$

Where $ s_p $ is the pooled standard deviation
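
For small samples, the t-statistics above can be cross-checked against `scipy.stats.ttest_ind`, whose `equal_var` argument switches between the pooled and unequal-variance versions. The two data arrays below are invented for illustration, and the null hypothesis is taken to be $ \mu_1 - \mu_2 = 0 $.

```python
from math import sqrt
import numpy as np
from scipy import stats

# Hypothetical independent samples.
group_1 = np.array([23.1, 25.4, 22.8, 26.0, 24.3, 25.1])
group_2 = np.array([21.0, 22.5, 20.7, 23.3, 21.9, 22.2, 20.4])

n_1, n_2 = len(group_1), len(group_2)
diff = group_1.mean() - group_2.mean()
var_1, var_2 = group_1.var(ddof=1), group_2.var(ddof=1)

# Unequal variances: standard error built from each sample variance.
t_unequal = diff / sqrt(var_1 / n_1 + var_2 / n_2)

# Nearly equal variances: standard error built from the pooled standard deviation.
s_p = sqrt(((n_1 - 1) * var_1 + (n_2 - 1) * var_2) / (n_1 + n_2 - 2))
t_pooled = diff / (s_p * sqrt(1 / n_1 + 1 / n_2))

# Cross-check with SciPy (two-tailed p-values by default).
res_unequal = stats.ttest_ind(group_1, group_2, equal_var=False)
res_pooled = stats.ttest_ind(group_1, group_2, equal_var=True)

print(f"unequal variances: t = {t_unequal:.4f}, p = {res_unequal.pvalue:.4f}")
print(f"pooled variance:   t = {t_pooled:.4f}, p = {res_pooled.pvalue:.4f}")
```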



---

## Matched-pair Hypothesis Testing

Hypothesis testing with dependent samples is called a **matched-pair test**, where each subject in the first sample is also a subject in the second sample. This test is commonly used to look at the before and after change of applying a treatment.

This is very similar to hypothesis testing for the difference of means, but in that case the samples are independent.

### Mean Difference

$$ \bar{d} = \frac{\sum_{i=1}^n d_i}{n} $$

### Sample Standard Deviation

$$ s_d = \sqrt{\frac{\sum_{i=1}^n (d_i - \bar{d})^2}{n - 1}} $$

### Use

Use the above to calculate the confidence interval and test statistic and perform hypothesis testing as usual.
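
As a sketch of how these pieces fit together, the example below computes the mean difference, its standard deviation, the t-statistic with $ n - 1 $ degrees of freedom, and a confidence interval. The before and after measurements are made up, and `scipy.stats.ttest_rel` is used only as a cross-check.

```python
from math import sqrt
import numpy as np
from scipy import stats

# Hypothetical before/after measurements for the same eight subjects.
before = np.array([72.0, 80.5, 68.3, 90.1, 77.4, 85.0, 79.2, 74.6])
after = np.array([70.1, 78.0, 69.0, 86.5, 75.2, 82.3, 78.8, 71.9])

d = after - before          # paired differences
n = len(d)
d_bar = d.mean()            # mean difference
s_d = d.std(ddof=1)         # sample standard deviation of the differences

# Test statistic for H0: mean difference = 0, with n - 1 degrees of freedom.
t_manual = d_bar / (s_d / sqrt(n))
t_scipy, p_two_tailed = stats.ttest_rel(after, before)

# 95% confidence interval for the mean difference.
t_crit = stats.t.ppf(0.975, df=n - 1)
margin = t_crit * s_d / sqrt(n)
ci = (d_bar - margin, d_bar + margin)

print(f"d_bar = {d_bar:.3f}, s_d = {s_d:.3f}, t = {t_manual:.3f}, p = {p_two_tailed:.4f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```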

---

## Confidence Interval for Difference of Proportions

### Population Proportion Difference Point Estimator

$$ \hat{p}_1 = \frac{x_1}{n_1} \text{ and } \hat{p}_2 = \frac{x_2}{n_2} $$

Where:
- ($ x_1 $) is the number of 'successes' in the sample from population 1
- ($ x_2 $) is the number of 'successes' in the sample from population 2
- ($ n_1 $) is the size of the sample from population 1
- ($ n_2 $) is the size of the sample from population 2

The point estimator for the difference of the population proportions is then:

$$ p_1 - p_2 \approx \hat{p}_1 - \hat{p}_2 $$

### Standard Error

The sampling distribution of $ \hat{p}_1 - \hat{p}_2 $ has a mean of $ p_1 - p_2 $ and the standard error is:

$$ \sigma_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}} $$

Which will be normal if:
- $ n_1p_1 \ge 5 \text{ and } n_1(1 - p_1) \ge 5 $
- $ n_2p_2 \ge 5 \text{ and } n_2(1 - p_2) \ge 5 $

### Margin of Error

$$ \text{ME}_{\hat{p}_1 - \hat{p}_2} = z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} $$

### Confidence Interval

$$ (a,b) = (\hat{p}_1 - \hat{p}_2) \pm z_{\frac{\alpha}{2}}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} $$

**Note**: When the confidence interval contains 0, there is likely no difference in proportions. However, if the confidence interval does not contain 0, then there likely is a difference in proportions.

---

## Hypothesis Testing for Difference of Proportions

### Hypothesis Statements

#### Two-tailed test

$ H_0: p_1 - p_2 = 0 $

$ H_a: p_1 - p_2 \ne 0 $

Or

$ H_0: p_1 = p_2 $

$ H_a: p_1 \ne p_2 $

#### Upper-tailed test

$ H_0: p_1 - p_2 \le 0 $

$ H_a: p_1 - p_2 \gt 0 $

Or

$ H_0: p_1 \le p_2 $

$ H_a: p_1 \gt p_2 $

#### Lower-tailed test

$ H_0: p_1 - p_2 \ge 0 $

$ H_a: p_1 - p_2 \lt 0 $

Or

$ H_0: p_1 \ge p_2 $

$ H_a: p_1 \lt p_2 $

### Calculating the Test Statistic

If
- The samples are independent and random
- $ n_1\hat{p}_1 \ge 5 $ and $ n_1(1 - \hat{p}_1) \ge 5 $
- $ n_2\hat{p}_2 \ge 5 $ and $ n_2(1 - \hat{p}_2) \ge 5 $

Then the test statistic is:

$$ z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})(\frac{1}{n_1} + \frac{1}{n_2})}} $$

Where the proportion of the combined sample, $ \hat{p} $, is given by:

$$ \hat{p} = \frac{\hat{p}_1n_1 + \hat{p}_2n_2}{n_1 + n_2} $$

which can also be written as:

$$ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} $$

where $ x_1 $ and $ x_2 $ are the number of 'successes' in each sample.
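
A minimal sketch of this test statistic, assuming the null hypothesis $ p_1 - p_2 = 0 $, is shown below. The function name `two_proportion_z` and the example counts are hypothetical.

```python
from math import sqrt
from scipy import stats

def two_proportion_z(x_1, n_1, x_2, n_2):
    """z-statistic for H0: p_1 - p_2 = 0 using the combined sample proportion."""
    p_hat_1, p_hat_2 = x_1 / n_1, x_2 / n_2
    # Normal approximation conditions for each sample.
    for n, p_hat in ((n_1, p_hat_1), (n_2, p_hat_2)):
        if n * p_hat < 5 or n * (1 - p_hat) < 5:
            raise ValueError("sample too small for the normal approximation")
    p_hat = (x_1 + x_2) / (n_1 + n_2)  # proportion of the combined sample
    std_err = sqrt(p_hat * (1 - p_hat) * (1 / n_1 + 1 / n_2))
    return (p_hat_1 - p_hat_2) / std_err

# Hypothetical counts: 45 of 100 successes versus 30 of 100 successes.
z = two_proportion_z(x_1=45, n_1=100, x_2=30, n_2=100)
p_two_tailed = 2 * stats.norm.sf(abs(z))
print(f"z = {z:.4f}, two-tailed p-value = {p_two_tailed:.4f}")
```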





---