**Hypothesis Tests for One Population Mean when Sigma is Unknown**

A hypothesis test for a population mean when the population standard deviation, $\sigma$, is unknown is conducted in the same way as when $\sigma$ is known. The only differences are that the sample standard deviation, $s$, takes the place of $\sigma$, and that the **$t$-distribution** is used instead of the standard normal distribution ($z$-distribution).

For a test with null hypothesis $H_0 : \mu = \mu_0$, the test statistic $t$ is calculated as

$$t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$$

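where $\bar{x}$ is the sample mean, $s$ the sample standard deviation, and $n$ the sample size. If $H_0$ is true and the variable is (approximately) normally distributed, $t$ follows a $t$-distribution with $n-1$ degrees of freedom. This hypothesis testing procedure is called the **one-mean $t$-test** or simply the **$t$-test**. Recall that hypothesis tests follow a stepwise procedure, which is summarized as follows: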

\begin{array}{ll}
\hline
\text{Step 1}  & \text{State the null hypothesis } H_0 \text{ and alternative hypothesis } H_A \text{.} \\
\text{Step 2}  & \text{Decide on the significance level, } \alpha \text{.} \\
\text{Step 3}  & \text{Compute the value of the test statistic.} \\
\text{Step 4a} & \text{Critical value approach: Determine the critical value.} \\
\text{Step 4b} & \text{P-value approach: Determine the p-value.} \\
\text{Step 5a} & \text{Critical value approach: If the value of the test statistic falls in the rejection region, reject } H_0 \text{; otherwise, do not reject } H_0 \text{.} \\
\text{Step 5b} & \text{P-value approach: If } p \le \alpha \text{, reject } H_0 \text{; otherwise, do not reject } H_0 \text{.} \\
\text{Step 6}  & \text{Interpret the result of the hypothesis test.} \\
\hline
\end{array}
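
The steps above can be sketched as a single function. This is a minimal illustration rather than a reference implementation: the function name `one_mean_t_test`, its arguments, the labels `"two-sided"`, `"less"` and `"greater"`, and the nine sample weights are all assumptions made for the example.

```python
import numpy as np
from scipy.stats import t


def one_mean_t_test(sample, mu_0, alpha=0.05, alternative="two-sided"):
    """Illustrative one-mean t-test following Steps 3 to 5 (hypothetical helper)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    df = n - 1  # degrees of freedom

    # Step 3: test statistic t = (x_bar - mu_0) / (s / sqrt(n))
    x_bar = sample.mean()
    s = sample.std(ddof=1)  # sample standard deviation
    t_stat = (x_bar - mu_0) / (s / np.sqrt(n))

    # Steps 4a and 4b: critical value and p-value for the chosen alternative
    if alternative == "two-sided":
        critical = t.ppf(1 - alpha / 2, df)
        reject_cv = abs(t_stat) >= critical
        p_value = 2 * t.sf(abs(t_stat), df)
    elif alternative == "less":
        critical = t.ppf(alpha, df)
        reject_cv = t_stat <= critical
        p_value = t.cdf(t_stat, df)
    elif alternative == "greater":
        critical = t.ppf(1 - alpha, df)
        reject_cv = t_stat >= critical
        p_value = t.sf(t_stat, df)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")

    # Steps 5a and 5b: both approaches lead to the same decision
    reject_p = p_value <= alpha
    return {
        "t": float(t_stat),
        "df": df,
        "critical": float(critical),
        "p_value": float(p_value),
        "reject_cv": bool(reject_cv),
        "reject_p": bool(reject_p),
    }


# Hypothetical sample of nine weights in kg, used only to exercise the function
weights = [68.2, 74.5, 71.0, 79.3, 65.8, 72.4, 70.1, 77.6, 69.9]
result = one_mean_t_test(weights, mu_0=70.8, alpha=0.05, alternative="two-sided")
print(result)
```

Returning both decisions makes it easy to check that the critical value approach and the $p$-value approach agree, as they always should for the same $\alpha$.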

**One-mean $t$-test: An example**

We work with the *students* data set.

We examine the average weight of students and compare it to the average weight of European adults. [Walpole et al. (2012)](https://bmcpublichealth.biomedcentral.com/articles/10.1186/1471-2458-12-439) published data on the average body mass *(kg)* per region, including Europe. They report the average body mass of the European adult population to be $70.8$ *kg*. We set the hypothesized population mean accordingly: $\mu_0 = 70.8$.

We take a random sample $x$ with a sample size of $n=9$. The sample consists of the weights in *kg* of $9$ randomly picked students from the *students* data set.

In [1]:
import pandas as pd
import numpy as np

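# Load the students data set
students_df = pd.read_csv("https://userpage.fu-berlin.de/soga/200/2010_data_sets/students.csv")

n = 9        # sample size
mu_0 = 70.8  # hypothesized population mean in kg (Walpole et al., 2012)

# Draw n weights at random (np.random.choice samples with replacement by default)
x = np.random.choice(students_df['weight'], n)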
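
Because the draw is unseeded, the sample changes on every run, and `np.random.choice` samples with replacement unless told otherwise. The following is a minimal sketch of a reproducible draw without replacement. It assumes a seeded NumPy generator. The seed `42` is arbitrary, and the `weights` array is a synthetic stand-in for `students_df['weight']` so that the block runs without downloading the data.

```python
import numpy as np

# Seeded generator; the seed value is arbitrary and only ensures reproducibility
rng = np.random.default_rng(42)

# Synthetic stand-in for students_df['weight'] (hypothetical values, in kg)
weights = rng.normal(loc=73.0, scale=9.0, size=500)

n = 9
# replace=False draws n distinct observations from the data
x_sample = rng.choice(weights, size=n, replace=False)

print(x_sample.round(1))
```

The outputs shown in this section come from the unseeded draw, so a seeded run gives different numbers.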

**Hypothesis testing: The critical value approach**

**Step 1: State the null hypothesis, $H_0$, and alternative hypothesis, $H_A$.**

The null hypothesis states that the average weight of students $(\mu)$ equals the average weight of European adults of $70.8$ *kg* $(\mu_0)$ as reported by Walpole et al. ($2012$). In other words, there is no difference between the mean weight of students and the mean weight of European adults.

$$H_0 : \mu = 70.8$$

For the purpose of illustration we test three alternative hypotheses.

**Alternative hypothesis 1**: The average weight of students does not equal the average weight of European adults. In other words, there is a difference between the mean weight of students and the mean weight of European adults.

$$H_{A_1} : \mu \neq 70.8$$

**Alternative hypothesis 2**: The average weight of students is less than the average weight of European adults.

$$H_{A_2}: \mu < 70.8$$

**Alternative hypothesis 3**: The average weight of students is higher than the average weight of European adults.

$$H_{A_3} : \mu > 70.8$$

**Step 2: Decide on the significance level, $\alpha$.**

$$\alpha = 0.05$$

In [2]:
alpha = 0.05

**Steps 3, 4, and 5: Compute the value of the test statistic, determine the critical value, and evaluate the value of the test statistic. If it falls in the rejection region, reject $H_0$; otherwise, do not reject $H_0$.**

In [4]:
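from scipy.stats import t

df = n - 1  # degrees of freedom

# Critical values of the t-distribution
t_A1 = t.ppf(1 - alpha / 2, df)  # two-sided: reject if |t| >= t_A1
t_A2 = -t.ppf(1 - alpha, df)     # left-tailed: reject if t <= t_A2
t_A3 = t.ppf(1 - alpha, df)      # right-tailed: reject if t >= t_A3

print(t_A1, t_A2, t_A3)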

2.3060041350333704 -1.8595480375228424 1.8595480375228424


In [11]:
import math

x_bar = np.mean(x)     # sample mean
s = np.std(x, ddof=1)  # sample standard deviation (ddof=1), not the population sigma

t_statistic = (x_bar - mu_0) / (s / math.sqrt(n))

t_statistic

0.9431506816302334

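The test statistic $t \approx 0.943$ does not fall in any of the three rejection regions: $|t| < 2.306$ for $H_{A_1}$, $t > -1.860$ for $H_{A_2}$, and $t < 1.860$ for $H_{A_3}$. In all three cases, we conclude that at the $5$% significance level, the data do not provide sufficient evidence to reject $H_0$.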
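
The three comparisons behind this conclusion can be written out explicitly. The sketch below hard-codes the test statistic and degrees of freedom reported above so that it runs on its own. The `reject_*` names are introduced here for illustration only.

```python
from scipy.stats import t

# Values reported above
t_statistic = 0.9431506816302334
df = 8
alpha = 0.05

t_A1 = t.ppf(1 - alpha / 2, df)  # two-sided critical value
t_A2 = -t.ppf(1 - alpha, df)     # left-tailed critical value
t_A3 = t.ppf(1 - alpha, df)      # right-tailed critical value

reject_A1 = abs(t_statistic) >= t_A1  # rejection region: |t| >= t_A1
reject_A2 = t_statistic <= t_A2       # rejection region: t <= t_A2
reject_A3 = t_statistic >= t_A3       # rejection region: t >= t_A3

print(reject_A1, reject_A2, reject_A3)  # False False False
```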

**Hypothesis testing: The $p$-value approach**

In [12]:
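from scipy.stats import t

# Two-sided p-value: area in both tails beyond |t| (abs() keeps this valid for negative t)
upper = 1 - t.cdf(abs(t_statistic), df)
lower = t.cdf(-abs(t_statistic), df)
p_A1 = upper + lower

# Left-tailed p-value: P(T <= t)
p_A2 = t.cdf(t_statistic, df)

# Right-tailed p-value: P(T >= t)
p_A3 = 1 - t.cdf(t_statistic, df)

print(p_A1, p_A2, p_A3)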

0.3732104831406428 0.8133947584296786 0.18660524157032143


In [13]:
from scipy.stats import ttest_1samp

ttest_1samp(x,mu_0)

Ttest_1sampResult(statistic=0.9431506816302333, pvalue=0.3732104831406429)

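The built-in `ttest_1samp` function reproduces the test statistic and the two-sided $p$-value $p_{A_1}$. All three $p$-values ($\approx 0.373$, $0.813$, and $0.187$) exceed $\alpha = 0.05$, so in all three scenarios we do not have sufficient evidence to reject the null hypothesis at the $5$% significance level.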
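
By default, `ttest_1samp` returns the two-sided $p$-value. In SciPy 1.6.0 and later, its `alternative` argument also gives the one-sided $p$-values directly. The sketch below illustrates this on a hypothetical sample of nine weights, since the random sample drawn above is not reproducible. The names `x_demo`, `p_two_sided`, `p_less` and `p_greater` are made up for the example.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Hypothetical sample of nine weights in kg (not the sample drawn above)
x_demo = np.array([68.2, 74.5, 71.0, 79.3, 65.8, 72.4, 70.1, 77.6, 69.9])
mu_0 = 70.8

# The `alternative` argument requires SciPy >= 1.6.0
p_two_sided = ttest_1samp(x_demo, mu_0, alternative="two-sided").pvalue
p_less = ttest_1samp(x_demo, mu_0, alternative="less").pvalue
p_greater = ttest_1samp(x_demo, mu_0, alternative="greater").pvalue

print(p_two_sided, p_less, p_greater)
```

The two one-sided $p$-values sum to $1$, and the two-sided $p$-value is twice the smaller of the two. This matches the manual calculation with `t.cdf`.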