# t-Test: 2 Population Means (Variance Unknown and Unequal)

## Overview

The goal of this test is to determine whether there is a significant difference \
between two population means $(\mu_1,\mu_2)$ when the population variances are unknown and not assumed equal.

## Limitations

(1) \
This test assumes both populations are normally distributed.\
If the populations are not normally distributed, the test is only approximate.

(2) \
As set up here, this test only distinguishes $\mu_2 = \mu_1$ from $\mu_2 \ne \mu_1$ (a two-tailed test); it does not say which mean is larger.



## Example Problem 

My wife would like to buy a penguin; however, she is unable to decide between\
a Species A penguin and a Species B penguin. She cares about the speed at which\
the penguins can walk, but I told her they all walk slowly and it doesn't matter.\
My wife "insists", so we observe 10 penguins from Species A and 13 from\
Species B and measure their speed over 5 meters. The results are as follows.

Species A: average speed = 0.67 m/s, standard deviation = 0.3 m/s\
Species B: average speed = 0.58 m/s, standard deviation = 0.6 m/s

Does it really matter which penguin we get?\
Let's solve it in Python.

## Step 0: Import Python Libraries

In [1]:
import numpy as np
import scipy.stats as ss

## Step 1: Establish Hypothesis

Null Hypothesis $(H_0)$ :  $\mu_2 = \mu_1$ \
Alternate Hypothesis $(H_1)$ :  $\mu_2 \ne \mu_1$

## Step 2: Compute Test Statistic
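
For reference, the unequal-variance (Welch) t statistic is usually written as follows, where $\bar{x}_i$, $s_i$ and $n_i$ are the sample mean, sample standard deviation and sample size of group $i$. The symbol names are notation chosen here, not names taken from the code.

$$
t = \frac{(\bar{x}_2 - \bar{x}_1) - (\mu_2 - \mu_1)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}
$$

Under $H_0$ the term $\mu_2 - \mu_1$ is zero. The denominator adds the two variance terms under a single square root; adding the two standard errors $s_1/\sqrt{n_1}$ and $s_2/\sqrt{n_2}$ directly would overstate it.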

In [2]:
# sample 1 size 
n1 = 10

# sample 2 size 
n2 = 13

# sample 1 mean
x_bar_1 = 0.67

# sample 2 mean
x_bar_2 = 0.58

# sample 1 standard deviation
std1 = 0.3

# sample 2 standard deviation
std2 = 0.6

# difference in sample means (note mu2 - mu1 = 0 under H0, so it is not included in the numerator)
mean_difference = x_bar_2 - x_bar_1

# standard error of the difference: variances add, not standard deviations,
# so se = sqrt(std1^2 / n1 + std2^2 / n2)
a = np.power(std1, 2) / n1
b = np.power(std2, 2) / n2
se = np.sqrt(a + b)

# test statistic (t-statistic)
t_statistic = mean_difference / se
t_statistic = t_statistic.round(2)

t_statistic

-0.47

## Step 3: Determine Critical t-value

In [5]:
# assumed alpha (type I error level) = 5%
alpha = 0.05

# degrees of freedom (Welch-Satterthwaite approximation)
part_1 = np.power(se, 4)
part_21 = np.power(std1, 4) / (np.power(n1, 3) - np.power(n1, 2))
part_22 = np.power(std2, 4) / (np.power(n2, 3) - np.power(n2, 2))
nu = part_1 / (part_21 + part_22)

# cumulative probability at the upper critical value (2-tailed test)
q = 1 - (alpha / 2)

# critical value: the t quantile at cumulative probability q with nu degrees of freedom
t_critical = ss.t.ppf(q = q, df = nu).round(2)

t_critical

2.1
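
The degrees of freedom computed above follow the Welch-Satterthwaite approximation. Written with $s_i$ for the sample standard deviation and $n_i$ for the sample size of group $i$ (notation assumed here), it reads:

$$
\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}
$$

Each denominator term matches the code because $\dfrac{(s_i^2/n_i)^2}{n_i - 1} = \dfrac{s_i^4}{n_i^3 - n_i^2}$, and the numerator corresponds to `np.power(se, 4)` provided `se` is $\sqrt{s_1^2/n_1 + s_2^2/n_2}$. In general $\nu$ is not an integer, which `ss.t.ppf` accepts.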

## Step 4: Establish Rejection Region

Reject Null Hypothesis if |t_statistic| > |t_critical|
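
The same decision rule can also be expressed with a p-value: reject $H_0$ when the two-sided p-value is below alpha. The following is a minimal sketch, assuming the summary statistics from the example problem and the standard Welch formulas. The names `v1`, `v2`, `p_value` and `reject_null` are illustrative and not part of the original notebook.

```python
import numpy as np
import scipy.stats as ss

# summary statistics from the example problem
n1, n2 = 10, 13
x_bar_1, x_bar_2 = 0.67, 0.58
std1, std2 = 0.3, 0.6
alpha = 0.05

# variance of each sample mean
v1 = std1**2 / n1
v2 = std2**2 / n2

# Welch t statistic (left unrounded) and Welch-Satterthwaite degrees of freedom
t_stat = (x_bar_2 - x_bar_1) / np.sqrt(v1 + v2)
nu = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# two-sided p-value: probability of a |t| at least this large under H0
p_value = 2 * ss.t.sf(np.abs(t_stat), df=nu)

# same decision as comparing |t_stat| with the critical value for this alpha
reject_null = p_value < alpha

print(f"t = {t_stat:.3f}, df = {nu:.2f}, p = {p_value:.3f}, reject H0: {reject_null}")
```

Comparing `p_value` with `alpha` avoids rounding the statistic and the critical value before the comparison, which matters when the two are close.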

## Step 5: Conclusion

In [15]:
if np.abs(t_statistic) > np.abs(t_critical):
  decision = 'reject null hypothesis so it matters which penguin we buy.'
else:
  decision = 'fail to reject null hypothesis so it does not matter which penguin we buy.'

# state conclusion
txt = f'Given all statistical circumstances around the problem we\n{decision}'

print(txt)

Given all statistical circumstances around the problem we
fail to reject null hypothesis so it does not matter which penguin we buy.


## Convert Code To Function

In [19]:
def jra_t_test_population_mean(x_bar_1, x_bar_2, n1, n2, std1, std2, alpha):
  # difference in sample means (note mu2 - mu1 = 0 under H0, so it is not included in the numerator)
  mean_difference = x_bar_2 - x_bar_1

  # standard error of the difference: sqrt(std1^2 / n1 + std2^2 / n2)
  a = np.power(std1, 2) / n1
  b = np.power(std2, 2) / n2
  se = np.sqrt(a + b)

  # test statistic (t-statistic)
  t_statistic = mean_difference / se
  t_statistic = t_statistic.round(2)

  # alpha (type I error level) comes from the function argument and is not overwritten

  # degrees of freedom (Welch-Satterthwaite approximation)
  part_1 = np.power(se, 4)
  part_21 = np.power(std1, 4) / (np.power(n1, 3) - np.power(n1, 2))
  part_22 = np.power(std2, 4) / (np.power(n2, 3) - np.power(n2, 2))
  nu = part_1 / (part_21 + part_22)

  # cumulative probability at the upper critical value (2-tailed test)
  q = 1 - (alpha / 2)

  # critical value: the t quantile at cumulative probability q with nu degrees of freedom
  t_critical = ss.t.ppf(q = q, df = nu).round(2)

  if np.abs(t_statistic) > np.abs(t_critical):
    decision = 'reject null hypothesis so it matters which penguin we buy.'
  else:
    decision = 'fail to reject null hypothesis so it does not matter which penguin we buy.'

  # state conclusion
  txt = f'Given all statistical circumstances around the problem we\n{decision}'

  print(txt)

## Test Function

In [21]:
jra_t_test_population_mean(x_bar_1, x_bar_2, n1, n2, std1, std2, alpha)

Given all statistical circumstances around the problem we
fail to reject null hypothesis so it does not matter which penguin we buy.
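
As a cross-check, SciPy provides `scipy.stats.ttest_ind_from_stats`, which runs a two-sample t-test directly from summary statistics; passing `equal_var=False` selects the Welch (unequal-variance) version. The sketch below assumes the example's numbers and assumes the reported standard deviations are sample standard deviations (n - 1 in the denominator), which is what this SciPy function expects. Species B is passed first so the sign matches `x_bar_2 - x_bar_1`. The variable names are illustrative.

```python
import scipy.stats as ss

alpha = 0.05

# Species B is passed first so the statistic has the sign of (x_bar_2 - x_bar_1)
result = ss.ttest_ind_from_stats(
    mean1=0.58, std1=0.6, nobs1=13,   # Species B
    mean2=0.67, std2=0.3, nobs2=10,   # Species A
    equal_var=False,                  # Welch's test: variances not assumed equal
)

t_stat = result.statistic
p_value = result.pvalue

# two-sided decision at the chosen alpha
reject_null = p_value < alpha

print(f"t = {t_stat:.2f}, p = {p_value:.2f}, reject H0: {reject_null}")
```

The statistic returned here should agree with a hand computation whose standard error is $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, so a mismatch is a quick signal that the manual formula needs another look.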
