# Hypothesis Testing

## Review:

### Normal Distribution for Population

$$X \backsim N(\mu, \sigma)$$

### Standardized Normal Distribution for Population Data

$$Z \backsim N(0, 1)$$

$$z = \frac{(x - \mu)}{\sigma}$$

where $\sigma$ is the standard deviation of the population.
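A short sketch of what standardization buys us: the left-tail probability of an observation is the same whether it is read from the original $N(\mu, \sigma)$ or from $N(0, 1)$ at the standardized value $z$. The values of `mu`, `sigma`, and `x` are illustrative assumptions, not data from these notes.

```python
from scipy.stats import norm

# Hypothetical population parameters and a single observation (illustrative values)
mu = 100.0
sigma = 15.0
x = 130.0

# Standardize the observation: number of standard deviations x lies from mu
z = (x - mu) / sigma

# Left-tail probability from N(mu, sigma) directly and from N(0, 1) at z
p_original = norm.cdf(x, loc=mu, scale=sigma)
p_standard = norm.cdf(z)

print(f'z = {z}')
print(f'P(X <= x) = {round(p_original, 4)}, P(Z <= z) = {round(p_standard, 4)}')
```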

### Standardized Normal Distribution for Sample Means (Known $\sigma$)

$$Z \backsim N(0, 1)$$

$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$$

where $\sigma = \sqrt{\frac{\sum{(x - \mu)^2}}{N}}$ is the population standard deviation ($N$ is the population size), $n$ is the sample size, and $\frac{\sigma}{\sqrt{n}}$ is called the standard error.
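As a sketch of the formula above, the code computes the population standard deviation by dividing by $N$ (NumPy's `ddof=0`), then the standard error and the z statistic of a sample mean. The small population and the sample are hypothetical values chosen so the arithmetic is easy to check by hand.

```python
import numpy as np

# Hypothetical finite population (illustrative values only)
population = np.array([4.0, 6.0, 7.0, 8.0, 9.0, 10.0, 12.0])
mu = population.mean()

# Population standard deviation divides by N (ddof=0 is NumPy's default)
sigma = np.sqrt(np.sum((population - mu) ** 2) / len(population))
assert np.isclose(sigma, population.std(ddof=0))

# A hypothetical sample of size n taken from this population
sample = np.array([6.0, 9.0, 10.0, 12.0])
n = len(sample)
x_bar = sample.mean()

# Standard error of the mean and the z statistic for the sample mean
standard_error = sigma / np.sqrt(n)
z = (x_bar - mu) / standard_error
print(f'sigma = {sigma:.4f}, SE = {standard_error:.4f}, z = {z:.4f}')
```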

### Student's t Distribution for Sample Means (Unknown $\sigma$)

$$T \backsim t(n-1)$$

$$t = \frac{\bar{x} - \mu}{\frac{s}{\sqrt{n}}}$$

where $s = \sqrt{\frac{\sum{(x - \bar{x})^2}}{n - 1}}$ is the sample standard deviation and $\frac{s}{\sqrt{n}}$ is called the (estimated) standard error.
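When $\sigma$ is unknown, $s$ divides by $n - 1$ (NumPy's `ddof=1`), and `scipy.stats.sem` returns $s/\sqrt{n}$ with that same default. The sample values and the hypothesized mean `mu_0` below are illustrative assumptions.

```python
import numpy as np
from scipy.stats import sem

# Hypothetical sample (illustrative values); the population sigma is unknown
sample = np.array([7.1, 8.4, 6.9, 7.8, 8.2, 7.5, 6.6, 8.0])
mu_0 = 8.0  # hypothesized population mean (an assumption for this sketch)

n = len(sample)
x_bar = sample.mean()

# Sample standard deviation divides by n - 1 (ddof=1)
s = np.sqrt(np.sum((sample - x_bar) ** 2) / (n - 1))
assert np.isclose(s, sample.std(ddof=1))

# Standard error; scipy.stats.sem uses ddof=1 by default
standard_error = s / np.sqrt(n)
assert np.isclose(standard_error, sem(sample))

# t statistic, referred to a t distribution with n - 1 degrees of freedom
t_stat = (x_bar - mu_0) / standard_error
print(f's = {s:.4f}, SE = {standard_error:.4f}, t = {t_stat:.4f}, df = {n - 1}')
```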


### Confidence Intervals 

#### Known $\sigma$

$$x_{low} = \bar{x} - z_{\alpha/2} *  (\frac{\sigma}{\sqrt{n}})$$

$$x_{up} = \bar{x} + z_{\alpha/2} *  (\frac{\sigma}{\sqrt{n}})$$

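where $z_{\alpha/2}$ is the critical value that leaves $\alpha/2$ in the upper tail of the standard normal distribution. It can be found with the following code:

``` Python
from scipy.stats import norm

# Significance level for a 90% confidence interval
alpha = 0.10

# z_{alpha/2} is the (1 - alpha/2) quantile, i.e., the 95th percentile here
z_crit = norm.ppf(q=1 - alpha / 2)
```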
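Putting the pieces together, a minimal sketch of a 90% confidence interval for the mean with known $\sigma$. The inputs `x_bar`, `sigma`, and `n` are illustrative; the result is cross-checked against `scipy.stats.norm.interval`, which builds the same symmetric interval from the normal distribution.

```python
import math
from scipy.stats import norm

# Hypothetical inputs (illustrative values): sample mean, known sigma, sample size
x_bar = 7.4
sigma = 5.4
n = 30
alpha = 0.10  # 90% confidence level

# Two-sided critical value: the (1 - alpha/2) quantile of N(0, 1)
z_crit = norm.ppf(1 - alpha / 2)
standard_error = sigma / math.sqrt(n)

x_low = x_bar - z_crit * standard_error
x_up = x_bar + z_crit * standard_error

# scipy can build the same interval directly from the normal distribution
low_check, up_check = norm.interval(1 - alpha, loc=x_bar, scale=standard_error)
print(f'90% CI: ({x_low:.4f}, {x_up:.4f})')
```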

#### Unknown $\sigma$

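$$x_{low} = \bar{x} - t_{\alpha/2, n-1} *  (\frac{s}{\sqrt{n}})$$

$$x_{up} = \bar{x} + t_{\alpha/2, n-1} *  (\frac{s}{\sqrt{n}})$$

where $t_{\alpha/2, n-1}$ is the critical value that leaves $\alpha/2$ in the upper tail of the t distribution with $n - 1$ degrees of freedom. It can be found with the following code:

``` Python
from scipy.stats import t

# Significance level for a 90% confidence interval and a sample of size n = 30
alpha = 0.10
df = 30 - 1

# t_{alpha/2, n-1} is the (1 - alpha/2) quantile, i.e., the 95th percentile here
t_crit = t.ppf(q=1 - alpha / 2, df=df)
```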

``` Python
from scipy.stats import t

# Calculate the 95th percentile of the t distribution with 29 degrees of freedom
q = 1 - 0.05
df = 30 - 1

t_crit = t.ppf(q=q, df=df)

print(f'The t critical value that leaves 5% in the right tail: {round(t_crit, 4)}')
```
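A sketch of the full 90% interval when $\sigma$ is unknown. The sample is simulated with a fixed seed purely for illustration (an assumption, not data from these notes). Importing `scipy.stats` as a module avoids rebinding the name `t`, and the result is cross-checked against `scipy.stats.t.interval`.

```python
import numpy as np
from scipy import stats

# Hypothetical sample of size 30, simulated with a fixed seed for reproducibility
rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=8.0, scale=1.5, size=30)

alpha = 0.10
n = len(sample)
df = n - 1
x_bar = sample.mean()
standard_error = stats.sem(sample)  # s / sqrt(n) with ddof=1

# Two-sided critical value from the t distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=df)
x_low = x_bar - t_crit * standard_error
x_up = x_bar + t_crit * standard_error

# Cross-check against scipy's built-in interval for the t distribution
low_check, up_check = stats.t.interval(1 - alpha, df, loc=x_bar, scale=standard_error)
print(f'90% CI for the mean: ({x_low:.4f}, {x_up:.4f})')
```

The t critical value is larger than the matching z value, so the interval is wider, reflecting the extra uncertainty from estimating $\sigma$ with $s$.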

## Hypothesis Test Procedures

### Step-by-Step Process

#### Define Null and Alternative Hypotheses

We can usually write our hypothesis statements in one of the following forms:

1. Two-Tailed Test:
   - Null Hypothesis: The population mean is equal to the hypothesized mean
   - Alternative Hypothesis: The population mean is not equal to the hypothesized mean
   - $H_0: \mu = \mu_0$
   - $H_a: \mu \neq \mu_0$

2. Left-Tailed Test:
   - Null Hypothesis: The population mean is greater than or equal to the hypothesized mean
   - Alternative Hypothesis: The population mean is less than the hypothesized mean
   - $H_0: \mu \geq \mu_0$
   - $H_a: \mu < \mu_0$

3. Right-Tailed Test:
   - Null Hypothesis: The population mean is less than or equal to the hypothesized mean
   - Alternative Hypothesis: The population mean is greater than the hypothesized mean
   - $H_0: \mu \leq \mu_0$
   - $H_a: \mu > \mu_0$
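Each form of $H_a$ determines which tail(s) of the distribution the p-value comes from. A minimal sketch for a z statistic follows; the helper name `z_test_p_value` and its tail labels are hypothetical, chosen for illustration.

```python
from scipy.stats import norm

def z_test_p_value(z, tail):
    """Return the p-value of a z statistic for the given test type.

    tail: 'two' (H_a: mu != mu_0), 'left' (H_a: mu < mu_0), or 'right' (H_a: mu > mu_0).
    Hypothetical helper written for this sketch, not a library function.
    """
    if tail == 'two':
        # Probability in both tails beyond |z|
        return 2 * norm.sf(abs(z))
    if tail == 'left':
        # Probability in the left tail below z
        return norm.cdf(z)
    if tail == 'right':
        # Probability in the right tail above z
        return norm.sf(z)
    raise ValueError("tail must be 'two', 'left', or 'right'")

for tail in ('two', 'left', 'right'):
    print(tail, round(z_test_p_value(-1.8, tail), 4))
```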

#### Define $\alpha$ Level for the Test

State the significance level ($\alpha$ level) of the test. The significance level is the probability of a Type I error: rejecting the null hypothesis when it is actually true.

Typical $\alpha$ Levels: 0.10, 0.05, 0.01, 0.001
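To see what "probability of a Type I error" means in practice, this simulation sketch draws many samples with $H_0$ true and counts how often a two-tailed z test at level $\alpha$ rejects anyway. The parameters (`mu_0`, `sigma`, `n`, the seed, and the number of trials) are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm

# Illustrative settings; every sample is drawn with H_0: mu = mu_0 true
rng = np.random.default_rng(seed=42)
mu_0, sigma, n, alpha = 8.0, 1.5, 30, 0.05
n_trials = 20_000

# Each row is one sample of size n from the null distribution
samples = rng.normal(loc=mu_0, scale=sigma, size=(n_trials, n))
z = (samples.mean(axis=1) - mu_0) / (sigma / np.sqrt(n))
p_values = 2 * norm.sf(np.abs(z))

# Fraction of (incorrect) rejections estimates the Type I error rate
type_i_rate = np.mean(p_values <= alpha)
print(f'Estimated Type I error rate: {type_i_rate:.4f} (alpha = {alpha})')
```

The estimated rate should land close to $\alpha$, up to simulation noise.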

#### Calculate the Related Statistic

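The test statistic depends on the sampling distribution used for the test: use the z statistic when the population standard deviation $\sigma$ is known, and the t statistic with $n - 1$ degrees of freedom when $\sigma$ is unknown and estimated by $s$.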
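A sketch of that choice: the hypothetical helper `one_sample_statistic` returns a z statistic when `sigma` is supplied and a t statistic otherwise. The sample values are illustrative. For the unknown-$\sigma$ case, the manual t statistic is cross-checked against `scipy.stats.ttest_1samp`.

```python
import numpy as np
from scipy import stats

def one_sample_statistic(sample, mu_0, sigma=None):
    """Return ('z', value) when sigma is known, else ('t', value).

    A hypothetical helper for this sketch, not a library function.
    """
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    x_bar = sample.mean()
    if sigma is not None:
        # Known population sigma: standard normal reference distribution
        return 'z', (x_bar - mu_0) / (sigma / np.sqrt(n))
    # Unknown sigma: estimate it with s (ddof=1); t distribution with n - 1 df
    s = sample.std(ddof=1)
    return 't', (x_bar - mu_0) / (s / np.sqrt(n))

sample = [7.1, 8.4, 6.9, 7.8, 8.2, 7.5, 6.6, 8.0]  # illustrative data
kind, value = one_sample_statistic(sample, mu_0=8.0)

# scipy's one-sample t test computes the same t statistic
result = stats.ttest_1samp(sample, popmean=8.0)
print(kind, round(value, 4), round(result.statistic, 4))
```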

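#### Make the Final Decision of the Test

Compare the p-value of the test statistic with $\alpha$: reject $H_0$ if the p-value is less than or equal to $\alpha$; otherwise, fail to reject $H_0$. Equivalently, reject $H_0$ when the test statistic falls beyond the critical value(s).

For example, a z score with a known population mean and standard deviation can be computed in two ways.

Relating a single point to the population:

$$z = \dfrac{x - \mu}{\sigma}$$

Relating the sample mean to the population:

$$z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$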

``` Python
# Import norm from scipy.stats and numpy for the square root
from scipy.stats import norm
import numpy as np

# Set up the parameters (x_bar is the sample mean)
x_bar = 5
mu = 8
n = 30
sigma = 1.5

# Calculate the z score of the sample mean
z = (x_bar - mu) / (sigma / np.sqrt(n))

# Calculate the cumulative probability in the left tail
left = norm.cdf(z)

# Calculate the probability in the right tail
right = norm.sf(z)
```

``` Python
# Import norm from scipy.stats
from scipy.stats import norm
import math

# Set up the parameters (x_bar is the sample mean)
x_bar = 7.4
mu = 8
n = 30
sigma = 5.4

# Calculate the z score relating the sample mean to the population
z = (x_bar - mu) / (sigma / math.sqrt(n))
print(f'The associated z score is {round(z, 4)}')

# Calculate the cumulative probability in the left tail
left = norm.cdf(z)

# Calculate the probability in the right tail
right = norm.sf(z)

print(f'The probabilities in the left and right tails: '
      f'({round(left, 4)}, {round(right, 4)})')
```
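The steps can be sketched end to end with the numbers from the example above ($\bar{x} = 7.4$, $\mu_0 = 8$, $\sigma = 5.4$, $n = 30$). The two-tailed alternative and $\alpha = 0.05$ are assumptions added for this sketch. The code makes the decision with the p-value and also computes the critical value, which leads to the same conclusion.

```python
import math
from scipy.stats import norm

# Inputs from the example above; the two-tailed H_a and alpha = 0.05 are assumptions
x_bar, mu_0, sigma, n = 7.4, 8.0, 5.4, 30
alpha = 0.05

# Test statistic and two-tailed p-value
z = (x_bar - mu_0) / (sigma / math.sqrt(n))
p_value = 2 * norm.sf(abs(z))

# Equivalent critical-value approach: reject when |z| exceeds z_{alpha/2}
z_crit = norm.ppf(1 - alpha / 2)

if p_value <= alpha:
    decision = 'Reject H0'
else:
    decision = 'Fail to reject H0'

print(f'z = {z:.4f}, p-value = {p_value:.4f}, critical |z| = {z_crit:.4f}: {decision}')
```

Here $|z|$ is well inside the critical value, so the sample gives no evidence at the 5% level that the population mean differs from 8.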