# t-Test: Population Mean (Variance Unknown)

## Overview

The goal of this test is to determine if there is a significant difference \
between an assumed (known) population mean $(\mu_0)$ and a measured \
sample mean $(\bar{x})$.

## Limitation

This test assumes the population is normally distributed.\
If the population is not normally distributed, the test is only approximate.
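
When raw measurements are available, the normality assumption can be checked before running the test. The sketch below uses the Shapiro-Wilk test from SciPy. The `heights` array is synthetic data generated only for illustration; it is not the penguin sample from the example problem, which is given only as summary statistics.

```python
import numpy as np
import scipy.stats as ss

# hypothetical raw heights (inches); synthetic data for illustration only
rng = np.random.default_rng(seed=0)
heights = rng.normal(loc=35, scale=1.5, size=15)

# Shapiro-Wilk test: the null hypothesis is that the sample is normally distributed
w_statistic, p_value = ss.shapiro(heights)

# a small p-value (e.g. below 0.05) would cast doubt on the normality assumption
looks_normal = p_value > 0.05
print(f'W = {w_statistic:.3f}, p = {p_value:.3f}, looks normal: {looks_normal}')
```

With only 15 observations this check has limited power, so a large p-value suggests no obvious problem rather than proving normality.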

## Example Problem 

My wife assumes that a specific type of penguin from "region a" has an \
average height of 30 inches. I collected a sample of 15 of these \
penguins and found that the mean height was 35 inches, with a standard \
deviation of 1.5 inches. Did my wife approximate the penguin height correctly, \
or was her estimate significantly higher or lower than the average?

Let's solve in Python...

## Step 0: Import Python Libraries

In [1]:
import numpy as np
import scipy.stats as ss

## Step 1: Establish Hypothesis

Null Hypothesis $(H_0)$ :  $\mu = \mu_0$ \
Alternate Hypothesis $(H_1)$ :  $\mu \ne \mu_0$

## Step 2: Compute Test Statistic

In [2]:
# population mean 
mu_0 = 30

# sample size 
n = 15 

# sample mean
x_bar = 35

# sample standard deviation 
std = 1.5

# sample average deviation from population mean
mean_deviation = x_bar - mu_0

# standard error
se = std / np.sqrt(n)
 
# test statistic (t-statistic)
t_statistic = mean_deviation / se
t_statistic = t_statistic.round(2)

t_statistic

12.91
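
Written as a formula, the computation above is the standard one-sample t-statistic. Here $s$ denotes the sample standard deviation (the variable `std` in the code); the symbol is introduced only for this illustration.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{35 - 30}{1.5 / \sqrt{15}} \approx \frac{5}{0.3873} \approx 12.91$$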

## Step 3: Determine Critical t-value

In [3]:
# assumed alpha (type I - error level) = 5%
alpha = 0.05

# degrees of freedom
nu = n - 1  

# cumulative probability at the upper critical value (2-tailed test)
q = 1 - (alpha / 2)

# critical t-value: t-distribution quantile at cumulative probability q
t_critical = ss.t.ppf(q = q, df = nu).round(2)

t_critical

2.14
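
An equivalent way to reach the same decision is to compute a two-tailed p-value and compare it with alpha. The sketch below is a minimal illustration that reuses the rounded t-statistic and the degrees of freedom from this example; the names `p_value` and `reject` are introduced here and are not part of the original walkthrough.

```python
import scipy.stats as ss

# values taken from the worked example
t_statistic = 12.91
nu = 14          # degrees of freedom (n - 1)
alpha = 0.05

# two-tailed p-value: probability of a |t| at least this large under H0
p_value = 2 * ss.t.sf(abs(t_statistic), df=nu)

# reject H0 when the p-value falls below the significance level
reject = p_value < alpha
print(f'p-value = {p_value:.2e}, reject H0: {reject}')
```

Comparing `p_value` with `alpha` leads to the same decision as comparing the t-statistic with the critical value.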

## Step 4: Establish Rejection Region

Reject the null hypothesis $(H_0)$ if $|t_{\text{statistic}}| > |t_{\text{critical}}|$.
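
The rejection rule can also be read as a confidence interval: the null hypothesis is rejected exactly when $\mu_0$ falls outside the $(1 - \alpha)$ confidence interval around the sample mean. The sketch below illustrates this with the example's numbers; the names `ci_lower`, `ci_upper`, and `reject` are introduced here for illustration.

```python
import numpy as np
import scipy.stats as ss

# values from the example problem
mu_0, n, x_bar, std, alpha = 30, 15, 35, 1.5, 0.05

# standard error and (unrounded) two-tailed critical t-value
se = std / np.sqrt(n)
t_critical = ss.t.ppf(q=1 - alpha / 2, df=n - 1)

# confidence interval for the population mean
ci_lower = x_bar - t_critical * se
ci_upper = x_bar + t_critical * se

# H0 is rejected when mu_0 lies outside the interval
reject = not (ci_lower <= mu_0 <= ci_upper)
print(f'{1 - alpha:.0%} CI: ({ci_lower:.2f}, {ci_upper:.2f}), reject H0: {reject}')
```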

## Step 5: Conclusion

In [4]:
if np.abs(t_statistic) > np.abs(t_critical):
  decision = 'reject'
else:
  decision = 'fail to reject'

# state conclusion
txt = f'At significance level alpha = {alpha}, given sample size of n = {n},\n\
sample average = {x_bar}, and sample standard deviation = {std}, we\n\
{decision} null hypothesis that average region a penguin height\n\
is equal to {mu_0}.'

print(txt)

At significance level alpha = 0.05, given sample size of n = 15,
sample average = 35, and sample standard deviation = 1.5, we
reject null hypothesis that average region a penguin height
is equal to 30.
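
As a cross-check, SciPy's built-in `ttest_1samp` should give the same t-statistic when run on raw data. The example problem only provides summary statistics, so the sketch below builds a synthetic sample that is rescaled to have exactly the stated mean (35) and sample standard deviation (1.5). The `heights` values are fabricated for illustration and are not real measurements.

```python
import numpy as np
import scipy.stats as ss

# synthetic sample, rescaled so that mean = 35 and sample std (ddof=1) = 1.5
rng = np.random.default_rng(seed=0)
raw = rng.normal(size=15)
z = (raw - raw.mean()) / raw.std(ddof=1)
heights = 35 + 1.5 * z

# one-sample, two-sided t-test against the assumed population mean of 30
result = ss.ttest_1samp(heights, popmean=30)
print(f't = {result.statistic:.2f}, p = {result.pvalue:.2e}')
```

Because the t-statistic depends only on the sample mean, standard deviation, and size, any sample with these summary statistics yields the same value as the manual computation.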


## Convert Code To Function

In [5]:
def jra_t_test_population_mean(mu_0, n, x_bar, std, alpha):
  # sample average deviation from population mean
  mean_deviation = x_bar - mu_0

  # standard error
  se = std / np.sqrt(n)
  
  # test statistic (t-statistic)
  t_statistic = mean_deviation / se
  t_statistic = t_statistic.round(2)

  # degrees of freedom
  nu = n - 1  

  # cumulative probability at the upper critical value (2-tailed test)
  q = 1 - (alpha / 2)

  # critical t-value: t-distribution quantile at cumulative probability q
  t_critical = ss.t.ppf(q = q, df = nu).round(2)
  
  if np.abs(t_statistic) > np.abs(t_critical):
    decision = 'reject'
  else:
    decision = 'fail to reject'

  # state conclusion
  a = f'At significance level alpha = {alpha}, given sample size of n = {n},\n'
  b = f'sample average = {x_bar}, and sample standard deviation = {std}, we\n'
  c = f'{decision} null hypothesis that average region a penguin height\n'
  d = f'is equal to {mu_0}.'
  txt = a + b + c + d
  
  print(txt)

## Test Function

In [6]:
for x_bar in np.arange(30, 32, 0.5):
  jra_t_test_population_mean(mu_0, n, x_bar.round(2), std, alpha)
  # jra_t_test_population_mean(mu_0, n, x_bar, std, alpha) # has precision error
  print('\n')

At significance level alpha = 0.05, given sample size of n = 15,
sample average = 30.0, and sample standard deviation = 1.5, we
fail to reject null hypothesis that average region a penguin height
is equal to 30.


At significance level alpha = 0.05, given sample size of n = 15,
sample average = 30.5, and sample standard deviation = 1.5, we
fail to reject null hypothesis that average region a penguin height
is equal to 30.


At significance level alpha = 0.05, given sample size of n = 15,
sample average = 31.0, and sample standard deviation = 1.5, we
reject null hypothesis that average region a penguin height
is equal to 30.


At significance level alpha = 0.05, given sample size of n = 15,
sample average = 31.5, and sample standard deviation = 1.5, we
reject null hypothesis that average region a penguin height
is equal to 30.
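
The switch from "fail to reject" to "reject" between 30.5 and 31.0 follows from the rejection rule: the sample mean must differ from $\mu_0$ by more than the critical t-value times the standard error. A minimal sketch of that threshold, using the same inputs as above (the name `margin` is introduced here for illustration):

```python
import numpy as np
import scipy.stats as ss

# same inputs as the test loop
mu_0, n, std, alpha = 30, 15, 1.5, 0.05

# standard error and (unrounded) two-tailed critical t-value
se = std / np.sqrt(n)
t_critical = ss.t.ppf(q=1 - alpha / 2, df=n - 1)

# smallest |x_bar - mu_0| that leads to rejecting H0
margin = t_critical * se
print(f'reject when x_bar < {mu_0 - margin:.2f} or x_bar > {mu_0 + margin:.2f}')
```

The margin falls between 0.5 and 1.0, which is why a sample mean of 30.5 fails to reject while 31.0 rejects.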


