# t-Test: 2 Population Means (Variance Unknown and Equal)

## Overview

The goal of this test is to determine whether there is a significant difference \
between two population means $(\mu_1,\mu_2)$.

## Limitation

This test assumes both populations are normally distributed with equal variances.\
If the populations are not normally distributed, the test is only approximate.
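When raw measurements are available, one possible way to probe these assumptions is a Shapiro-Wilk test for normality on each sample plus Levene's test for equal variances. The sketch below is illustrative only: `sample_1` and `sample_2` are synthetic placeholders drawn from a random generator (they are not data from this notebook), and the 0.05 cutoff is an assumed convention.

```python
import numpy as np
import scipy.stats as ss

# Synthetic placeholder samples (assumption: real measurements would go here)
rng = np.random.default_rng(seed=0)
sample_1 = rng.normal(loc=10.0, scale=2.0, size=20)
sample_2 = rng.normal(loc=12.0, scale=2.0, size=20)

# Shapiro-Wilk: a small p-value suggests the sample is not normally distributed
for label, sample in (("sample 1", sample_1), ("sample 2", sample_2)):
    shapiro_stat, shapiro_p = ss.shapiro(sample)
    print(f"{label}: Shapiro-Wilk p-value = {shapiro_p:.3f}")

# Levene: a small p-value suggests the two variances differ
levene_stat, levene_p = ss.levene(sample_1, sample_2)
print(f"Levene p-value = {levene_p:.3f}")
```

A large p-value does not prove an assumption holds; it only means the sample gives no strong evidence against it.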

## Example Problem 

My wife assumes that a specific type of penguin from "region b" is \
taller on average than penguins from "region a." I was able to collect \
a sample of 20 penguin heights from each region, summarized as follows. \
The mean height of penguins from "region a" was 37 inches, with standard \
deviation of 0.8 inches. The mean height of penguins from "region b" was \
40 inches, with standard deviation of 0.8 inches. Was my wife's assumption \
correct?

Let's solve it in Python.

## Step 0: Import Python Libraries

In [1]:
import numpy as np
import scipy.stats as ss

## Step 1: Establish Hypothesis

Null Hypothesis $(H_0)$ :  $\mu_2 = \mu_1$ \
Alternate Hypothesis $(H_1)$ :  $\mu_2 > \mu_1$

or

Null Hypothesis $(H_0)$ :  $(\mu_2 - \mu_1) = 0$ \
Alternate Hypothesis $(H_1)$ : $(\mu_2 - \mu_1) > 0$

## Step 2: Compute Test Statistic

In [2]:
# sample 1 size 
n1 = 20

# sample 2 size 
n2 = n1

# sample 1 mean
x_bar_1 = 37

# sample 2 mean
x_bar_2 = 40

# sample standard deviation (same for both groups)
std = 0.8

# difference in sample means (hypothesized mu2 - mu1 = 0, so it is not included in the numerator)
mean_difference = x_bar_2 - x_bar_1

# standard error (std acts as the pooled standard deviation, since both samples share it)
a = (1 / n1) + (1 / n2)
se = std * np.sqrt(a)
 
# test statistic (t-statistic)
t_statistic = mean_difference / se
t_statistic = t_statistic.round(2)

t_statistic

11.86
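The hand computation above relies on both samples sharing the same standard deviation. As a hedged cross-check, the sketch below computes a pooled standard deviation $s_p = \sqrt{((n_1-1)s_1^2 + (n_2-1)s_2^2)/(n_1+n_2-2)}$, which reduces to 0.8 here, and compares the result with `scipy.stats.ttest_ind_from_stats` (the `alternative` argument assumes SciPy 1.6 or later). The names `s1`, `s2`, `s_pooled`, and `t_manual` are introduced for illustration only.

```python
import numpy as np
import scipy.stats as ss

# Summary statistics from the example problem
n1, n2 = 20, 20
x_bar_1, x_bar_2 = 37, 40
s1, s2 = 0.8, 0.8

# Pooled standard deviation (equal-variance assumption)
s_pooled = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Standard error and t-statistic for the difference x_bar_2 - x_bar_1
se = s_pooled * np.sqrt(1 / n1 + 1 / n2)
t_manual = (x_bar_2 - x_bar_1) / se

# Same test from summary statistics; region b is passed first so that
# alternative="greater" corresponds to H1: mu_2 > mu_1
result = ss.ttest_ind_from_stats(
    mean1=x_bar_2, std1=s2, nobs1=n2,
    mean2=x_bar_1, std2=s1, nobs2=n1,
    equal_var=True, alternative="greater",
)

print(round(float(t_manual), 2), round(float(result.statistic), 2))
# prints: 11.86 11.86
```

The pooled form matters once the two sample standard deviations differ; with identical values it collapses to the single `std` used above.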

## Step 3: Determine Critical t-value

In [3]:
# assumed significance level alpha (type I error rate) = 5%
alpha = 0.05

# degrees of freedom
nu = n1 + n2 - 2  

# lower-tail probability (1-tailed test)
q = 1 - alpha

# critical t-value: the (1 - alpha) quantile of the t-distribution with nu degrees of freedom
t_critical = ss.t.ppf(q = q, df = nu).round(2)

t_critical

1.69
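An equivalent way to reach a decision is to compare a p-value with alpha rather than comparing the statistic with a critical value. The following is a minimal sketch, assuming the same upper-tailed test; it re-declares the values from the earlier cells so that it runs on its own, and `p_value` is a name introduced here.

```python
import scipy.stats as ss

alpha = 0.05
nu = 20 + 20 - 2          # degrees of freedom, n1 + n2 - 2
t_statistic = 11.86       # value computed in Step 2

# Upper-tail probability P(T > t_statistic) under the null hypothesis
p_value = ss.t.sf(t_statistic, df=nu)

# Critical value for comparison: the (1 - alpha) quantile
t_critical = ss.t.ppf(1 - alpha, df=nu)

# Both rules lead to the same decision
print(bool(p_value < alpha), bool(t_statistic > t_critical))
# prints: True True
```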

## Step 4: Establish Rejection Region

Reject the null hypothesis if t_statistic > t_critical (upper-tailed test, since $H_1$ is $\mu_2 > \mu_1$).
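The rejection region depends on the direction of the alternative hypothesis. The helper below is a hypothetical illustration (the name `rejects_null` and its `alternative` labels are not part of the original notebook) of how the rule changes for upper-tailed, lower-tailed, and two-tailed tests.

```python
import scipy.stats as ss

def rejects_null(t_statistic, df, alpha=0.05, alternative="greater"):
    """Return True if t_statistic falls in the rejection region."""
    if alternative == "greater":      # H1: mu_2 - mu_1 > 0
        return bool(t_statistic > ss.t.ppf(1 - alpha, df))
    if alternative == "less":         # H1: mu_2 - mu_1 < 0
        return bool(t_statistic < ss.t.ppf(alpha, df))
    if alternative == "two-sided":    # H1: mu_2 - mu_1 != 0
        return bool(abs(t_statistic) > ss.t.ppf(1 - alpha / 2, df))
    raise ValueError(f"unknown alternative: {alternative!r}")

print(rejects_null(11.86, df=38))                            # True
print(rejects_null(-11.86, df=38))                           # False
print(rejects_null(-11.86, df=38, alternative="two-sided"))  # True
```

Taking the absolute value of the statistic is appropriate only in the two-tailed case; in an upper-tailed test a large negative statistic is not evidence for $\mu_2 > \mu_1$.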

## Step 5: Conclusion

In [31]:
# upper-tailed test: reject only when the statistic exceeds the critical value
if t_statistic > t_critical:
  decision = 'reject'
  claim = 'greater than'
else:
  decision = 'fail to reject'
  claim = 'not significantly greater than'

# state conclusion
txt = f'At significance level alpha = {alpha}, given sample sizes n1 = {n1},\n\
n2 = {n2}, sample 1 and sample 2 averages = {{{x_bar_1}, {x_bar_2}}} and \n\
sample 1 and sample 2 standard deviations = {std}, we {decision} the null \n\
hypothesis that average region b penguin height is equal to region a\n\
penguin height. We conclude that region b penguin height is {claim}\n\
region a penguin height.'

print(txt)

At significance level alpha = 0.05, given sample sizes n1 = 20,
n2 = 20, sample 1 and sample 2 averages = {37, 40} and 
sample 1 and sample 2 standard deviations = 0.8, we reject the null 
hypothesis that average region b penguin height is equal to region a
penguin height. We conclude that region b penguin height is greater than
region a penguin height.


## Convert Code To Function

In [53]:
def jra_t_test_population_mean(x_bar_1, x_bar_2, n1, n2, std, alpha):
  # difference in sample means (hypothesized mu2 - mu1 = 0, so it is not included in the numerator)
  mean_difference = x_bar_2 - x_bar_1

  # standard error (std acts as the pooled standard deviation)
  a = (1 / n1) + (1 / n2)
  se = std * np.sqrt(a)

  # test statistic (t-statistic)
  t_statistic = mean_difference / se
  t_statistic = t_statistic.round(2)

  # degrees of freedom
  nu = n1 + n2 - 2

  # lower-tail probability (1-tailed test), using the alpha passed by the caller
  q = 1 - alpha

  # critical t-value: the (1 - alpha) quantile of the t-distribution
  t_critical = ss.t.ppf(q = q, df = nu).round(2)

  # upper-tailed test: reject only when the statistic exceeds the critical value
  if t_statistic > t_critical:
    decision = 'reject'
    claim = 'greater than'
  else:
    decision = 'fail to reject'
    claim = 'not significantly greater than'

  # state conclusion
  txt = f'  At significance level alpha = {alpha}, given sample sizes n1 = {n1},\n\
  n2 = {n2}, sample 1 and sample 2 averages = {{{x_bar_1}, {x_bar_2}}} and \n\
  sample 1 and sample 2 standard deviations = {std}, we {decision} the null \n\
  hypothesis that average region b penguin height is equal to region a\n\
  penguin height. We conclude that region b penguin height is {claim}\n\
  region a penguin height.'

  print(txt)

## Test Function

In [54]:
jra_t_test_population_mean(x_bar_1, x_bar_2, n1, n2, std, alpha)

  At significance level alpha = 0.05, given sample sizes n1 = 20,
  n2 = 20, sample 1 and sample 2 averages = {37, 40} and 
  sample 1 and sample 2 standard deviations = 0.8, we reject the null 
  hypothesis that average region b penguin height is equal to region a
  penguin height. We conclude that region b penguin height is greater than
  region a penguin height.
