
# Hypothesis tests for the mean



## 1 Imports and data

Required imports

In [1]:
import numpy as np
import scipy.stats as stats
import math
import matplotlib.pyplot as plt

Here are some samples to work with.

In [2]:
l_sample = np.array([459.93937471, 428.56911056, 441.33412389, 443.39645684,
       448.76532849, 461.89116293, 460.17701055, 447.27797476,
       439.97474578, 446.18116418, 450.84726679, 444.12490592,
       447.6593213 , 435.0076449 , 442.86110955, 453.7571971 ,
       449.71292478, 453.2010883 , 459.21085915, 447.12120059,
       457.94628451, 462.76464606, 443.56578646, 460.61908841,
       454.01523285, 456.27753429, 455.83920688, 463.99479717,
       443.32681673, 425.16538452, 430.32581718, 451.99434723,
       456.33306817, 461.0676949 , 458.58939368, 449.34334726,
       466.17066526, 466.63469241, 458.02337508, 439.28442225,
       437.05792495, 448.37091484, 445.05077328, 456.44145098,
       443.71392147, 459.60968608, 455.88773547, 453.03291995,
       454.15425348, 442.19805884, 437.75805556, 437.29066819,
       432.19569756, 444.17177696, 453.42677273, 471.02255555,
       433.49383922, 445.29045676, 439.57916517, 432.60572773,
       447.94654466, 441.79983365, 445.90910055, 462.04020037,
       447.58976195, 440.291391  , 451.92558266, 454.98075125,
       456.17392379, 453.503855  , 449.99363999, 440.83880686,
       426.06544724, 468.00954174, 449.83632402, 445.32911325,
       453.03749456, 431.71877637, 466.18331614, 452.74054317,
       458.33794027, 455.92566141, 465.38373873, 448.95775753,
       429.93375205, 454.73857533, 466.70740866, 447.02532815,
       468.2701383 , 458.93974002, 439.20753806, 453.6336253 ,
       444.82446294, 462.2869467 , 452.57589117, 457.92968932,
       454.70956354, 444.92896199, 437.50753704, 447.58023815])

s_sample = np.array([456.58180245, 464.50546396, 456.14594757, 449.25808585,
       453.10372431, 463.00835959, 447.70233383, 450.86259841,
       454.43326052, 452.47623745, 429.33418203, 461.56528077,
       450.92313554, 428.44846109, 448.76978192, 466.86569181,
       440.93813882, 446.04164502, 439.17304359, 439.69070222,
       433.53070749, 441.36511285, 453.81691262, 465.00803267,
       438.93570373])

## 2 Hypothesis formulation

A hypothesis about the value of the population mean is formulated as a pair of null and alternative hypotheses:

- in the case of a two-tailed test (we do not expect the value to be 'off' in any particular direction)

    - H<sub>0</sub>: &mu; = &mu;<sub>0</sub>
    - H<sub>a</sub>: &mu; &ne; &mu;<sub>0</sub>
    
- in the case of a one-tailed test (we are testing for the value being 'off' in one particular direction; testing for a mean larger than hypothesised gives an upper-tailed test, as in the example below, and testing for a mean smaller than hypothesised gives a lower-tailed test with the inequalities reversed)

    - H<sub>0</sub>: &mu; &le; &mu;<sub>0</sub>
    - H<sub>a</sub>: &mu; &gt; &mu;<sub>0</sub>
    

## 3 Using the normal distribution

We can use the normal distribution for the test for the mean in either of these cases:

- the sample size is large (this is our case - we will be using `l_sample` where `n = 100`)
- the sample size is small, the distribution is known to be normal and the population standard deviation is known

### Test `l_sample (n=100)` for population mean of 452 (two tailed), with significance of 0.05

   - H<sub>0</sub>: &mu; = 452
   - H<sub>a</sub>: &mu; &ne; 452

Calculate the test statistic

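**NOTE:** Here we are using the standard deviation estimated from the sample (S), but if the population standard deviation (&sigma;) is known, it should be used in the calculations instead. Also note that NumPy's `.std()` defaults to `ddof=0` (division by n); passing `ddof=1` gives the usual sample estimate with division by n - 1.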
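To make the difference between the two standard deviation estimates concrete, here is a minimal sketch on a small made-up array (`data` and the variable names are hypothetical and not part of the notebook's samples). It only illustrates how `ddof` changes the result.

```python
import numpy as np

# Hypothetical data, used only to illustrate the two estimators.
data = np.array([448.0, 452.5, 455.1, 449.3, 451.7, 446.9])
n = len(data)

s_ddof0 = data.std()         # default ddof=0: divides by n
s_ddof1 = data.std(ddof=1)   # ddof=1: divides by n - 1 (sample estimate S)

# The two estimates differ by the factor sqrt(n / (n - 1)).
ratio = s_ddof1 / s_ddof0
print(s_ddof0, s_ddof1, ratio, np.sqrt(n / (n - 1)))
```

For n = 100 the factor is about 1.005 and for n = 25 about 1.021, so the choice matters most for small samples or for a statistic that sits close to its critical value.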

In [3]:
mu_zero = 452
T1 = abs((l_sample.mean() - mu_zero)/(l_sample.std()/math.sqrt(len(l_sample))))
T1

2.1428837142679193

Find the two-tailed critical value for significance

In [4]:
alpha = 0.05
alpha_half = alpha/2
critical_value = stats.norm.ppf(1 - alpha_half)
critical_value

1.959963984540054

Compare to see if the value is significant

In [5]:
T1 > critical_value

True

The statistic is more extreme (in this case greater) than the critical value, which means that the result is **significant**. We **reject the null hypothesis**. Thus, there is statistical evidence that the mean is different from the hypothesised value, at the 0.05 level of significance.

### Test `l_sample (n=100)` for population mean of 451 or greater (lower tail), with significance of 0.05

   - H<sub>0</sub>: &mu; &ge; 451
   - H<sub>a</sub>: &mu; &lt; 451

Calculate the test statistic

In [6]:
mu_zero = 451
T2 = (l_sample.mean() - mu_zero)/(l_sample.std()/math.sqrt(len(l_sample)))
T2

-1.169012238528652

Find the **lower-tailed** critical value for significance

In [7]:
alpha = 0.05
critical_value = stats.norm.ppf(alpha)
critical_value

-1.6448536269514729

Compare to see if the value is significant (we are testing if the statistic is **smaller** than the critical value, as it is a lower-tailed test)

In [8]:
T2 < critical_value

False

The statistic is not more extreme (in this case not smaller) than the critical value, which means that the result is **not significant**. We **fail to reject the null hypothesis**. Thus, there is not enough statistical evidence to reject the hypothesis (&mu; &ge; 451) at the 0.05 level of significance.
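The two tests above follow the same three steps (statistic, critical value, comparison), so they can be wrapped in one helper. The sketch below is illustrative only: `z_test`, its signature and the `demo` sample are assumptions and not part of SciPy or of this notebook. Unlike the cells above, it uses `ddof=1` for the sample standard deviation.

```python
import math
import numpy as np
from scipy import stats

def z_test(sample, mu_zero, alpha=0.05, tail="two"):
    """Return (statistic, critical_value, reject) for a normal-based test of the mean.

    `tail` is "two", "lower" or "upper". Hypothetical helper, not a SciPy function.
    """
    sample = np.asarray(sample, dtype=float)
    std_error = sample.std(ddof=1) / math.sqrt(len(sample))
    statistic = (sample.mean() - mu_zero) / std_error
    if tail == "two":
        critical = stats.norm.ppf(1 - alpha / 2)
        reject = abs(statistic) > critical
    elif tail == "lower":
        critical = stats.norm.ppf(alpha)
        reject = statistic < critical
    elif tail == "upper":
        critical = stats.norm.ppf(1 - alpha)
        reject = statistic > critical
    else:
        raise ValueError("tail must be 'two', 'lower' or 'upper'")
    return statistic, critical, bool(reject)

# Hypothetical large sample; the seed is fixed only for reproducibility.
rng = np.random.default_rng(0)
demo = rng.normal(loc=450, scale=10, size=100)
for tail in ("two", "lower", "upper"):
    print(tail, z_test(demo, mu_zero=452, tail=tail))
```

Returning the signed statistic for every tail type keeps the direction of the deviation visible; the two-tailed branch takes the absolute value only for the comparison.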

## 4 Using the t-distribution

We must use the t-distribution for the test for the mean in the following case:

- the sample size is small, the distribution is known to be normal but the population standard deviation is not known
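A short sketch of why the distinction matters: for small samples the t-distribution has heavier tails, so its critical values are larger than the normal ones, and the gap closes as n grows. The sample sizes in the loop are arbitrary choices for illustration.

```python
from scipy import stats

alpha = 0.05
z_crit = stats.norm.ppf(1 - alpha / 2)   # two-tailed normal critical value

t_crits = {}
for n in (5, 10, 25, 100, 1000):
    # Degrees of freedom for a one-sample test of the mean are n - 1.
    t_crits[n] = stats.t.ppf(1 - alpha / 2, n - 1)
    print(f"n={n:5d}  t critical={t_crits[n]:.4f}  normal critical={z_crit:.4f}")
```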


### Test `s_sample (n=25)` for population mean of 452 (two tailed), with significance of 0.05

   - H<sub>0</sub>: &mu; = 452
   - H<sub>a</sub>: &mu; &ne; 452

Calculate the test statistic (note that this is the same as when using the normal distribution)

In [9]:
mu_zero = 452
T3 = abs((s_sample.mean() - mu_zero)/(s_sample.std()/math.sqrt(len(s_sample))))
T3

1.2687248567337825

Find the two-tailed critical value for significance

In [10]:
alpha = 0.05
alpha_half = alpha/2
critical_value = stats.t.ppf(1 - alpha_half, len(s_sample) - 1)
critical_value

2.0638985616280205

Compare to see if the value is significant

In [11]:
T3 > critical_value

False

The statistic is not more extreme (in this case not greater) than the critical value, which means that the result is **not significant**. We **fail to reject the null hypothesis**. Thus, there is not enough statistical evidence to reject the hypothesis (&mu; = 452) at the 0.05 level of significance.

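If we repeat this experiment many times with different random samples, the statistic, and possibly the decision, will vary from sample to sample. Even when the null hypothesis is true, a test at the 0.05 level is expected to reject it in about 5% of samples.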
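The variation between samples can be made concrete with a small simulation. The sketch below assumes a hypothetical normal population with mean 452 and standard deviation 10 (both values are assumptions chosen for illustration), draws many samples of size 25 with the null hypothesis true, and counts how often the two-tailed t-test rejects it.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
mu_zero, sigma, n, trials, alpha = 452.0, 10.0, 25, 5000, 0.05

# Each row is one simulated sample drawn with the null hypothesis true.
samples = rng.normal(loc=mu_zero, scale=sigma, size=(trials, n))
std_errors = samples.std(axis=1, ddof=1) / np.sqrt(n)
statistics = (samples.mean(axis=1) - mu_zero) / std_errors

critical_value = stats.t.ppf(1 - alpha / 2, n - 1)
rejection_rate = np.mean(np.abs(statistics) > critical_value)
print(rejection_rate)   # should be close to alpha
```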

### Test `s_sample (n=25)` for population mean of 453 or greater (lower tail), with significance of 0.05

   - H<sub>0</sub>: &mu; &ge; 453
   - H<sub>a</sub>: &mu; &lt; 453

Calculate the test statistic

In [12]:
mu_zero = 453
T4 = (s_sample.mean() - mu_zero)/(s_sample.std()/math.sqrt(len(s_sample)))
T4

-1.738514003262983

Find the **lower-tailed** critical value for significance

In [13]:
alpha = 0.05
critical_value = stats.t.ppf(alpha, len(s_sample) - 1)
critical_value

-1.7108820799094282

Compare to see if the value is significant (we are testing if the statistic is **smaller** than the critical value, as it is a lower-tailed test)

In [14]:
T4 < critical_value

True

The statistic is more extreme (in this case smaller) than the critical value, which means that the result is **significant**. We **reject the null hypothesis**. Thus, there is statistical evidence that the mean is smaller than 453, at the 0.05 level of significance.
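As a cross-check, SciPy provides `stats.ttest_1samp`, which performs the same kind of test in one call. The sketch below runs on a hypothetical sample (`demo` is not one of the notebook's samples) and assumes a SciPy version recent enough to support the `alternative` argument. Note that `ttest_1samp` uses the `ddof=1` standard deviation and reports a p-value (see section 5) rather than a critical value.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
demo = rng.normal(loc=450, scale=10, size=25)   # hypothetical small sample
mu_zero, alpha = 453, 0.05
n = len(demo)

# Manual lower-tailed test, following the steps used above (with ddof=1).
manual_t = (demo.mean() - mu_zero) / (demo.std(ddof=1) / np.sqrt(n))
critical_value = stats.t.ppf(alpha, n - 1)
manual_reject = bool(manual_t < critical_value)

# The same test in one call; alternative="less" selects the lower tail.
result = stats.ttest_1samp(demo, popmean=mu_zero, alternative="less")
scipy_reject = bool(result.pvalue < alpha)

print(manual_t, result.statistic)
print(manual_reject, scipy_reject)
```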

## 5 Inspecting p-values (an alternative way of testing)

When computationally working with distributions (as we are here), instead of looking for critical values and comparing statistics with these, we can find p-values for the statistics and compare them with our significance (alpha) values. 

A two-tailed test requires multiplying the one-tail probability by 2. `T1` is an absolute value and therefore non-negative, so we use the upper tail, i.e. the complement of the cdf.

In [15]:
p1 = (1 - stats.norm.cdf(T1)) * 2
print (p1, p1 < 0.05)

0.03212243699528594 True
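The p-value comparison and the critical-value comparison always lead to the same decision. A minimal sketch, reusing the value of `T1` printed in section 3 and SciPy's survival function `stats.norm.sf` (equal to `1 - cdf`):

```python
from scipy import stats

alpha = 0.05
T1 = 2.1428837142679193   # two-tailed statistic, copied from the output of In [3]

# sf(x) = 1 - cdf(x); it is numerically more accurate far out in the tail.
p1 = 2 * stats.norm.sf(T1)
critical_value = stats.norm.ppf(1 - alpha / 2)

# Both comparisons give the same answer.
print(p1, p1 < alpha, T1 > critical_value)
```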


This is a lower-tailed test, so the p-value is the cdf evaluated at the statistic.

In [16]:
p2 = (stats.norm.cdf(T2))
print (p2, p2 < 0.05)

0.1211993502643135 False


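Another two-tailed test. Note that `T3` was tested against the t-distribution in section 4, so the normal cdf used here only approximates the p-value; `(1 - stats.t.cdf(T3, len(s_sample) - 1)) * 2` is the p-value that matches that test. The decision is the same in both cases.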

In [17]:
p3 = (1 - stats.norm.cdf(T3)) * 2
print (p3, p3 < 0.05)

0.20453921383298113 False


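Another lower-tailed test. `T4` was also tested against the t-distribution in section 4, so `stats.t.cdf(T4, len(s_sample) - 1)` is the matching p-value; the normal cdf used here is an approximation that leads to the same decision.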

In [18]:
p4 = (stats.norm.cdf(T4))
print (p4, p4 < 0.05)

0.04106014241758428 True
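For the two small-sample statistics, p-values consistent with section 4 come from the t-distribution with n - 1 = 24 degrees of freedom. The sketch below copies `T3` and `T4` from the outputs above and compares the t-based p-values with the normal-based ones.

```python
from scipy import stats

alpha = 0.05
df = 24                      # len(s_sample) - 1
T3 = 1.2687248567337825      # copied from the output of In [9]
T4 = -1.738514003262983      # copied from the output of In [12]

p3_t = 2 * stats.t.sf(T3, df)      # two-tailed, t-distribution
p4_t = stats.t.cdf(T4, df)         # lower-tailed, t-distribution
p3_norm = 2 * stats.norm.sf(T3)    # normal approximation, as in In [17]
p4_norm = stats.norm.cdf(T4)       # normal approximation, as in In [18]

print(p3_t, p3_norm, p3_t < alpha)
print(p4_t, p4_norm, p4_t < alpha)
```

The t-based p-values are larger than the normal-based ones because the t-distribution has heavier tails, but both lead to the same decisions here.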


## 6 Cases where hypothesis tests cannot be performed with normal or t-distribution

The type of hypothesis test described here cannot be performed in the following case:

- sample size is small (<30) and the distribution of the variable is not known
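The conditions from sections 3, 4 and 6 can be summarised as a small decision rule. The function below is a hypothetical helper written for this summary (its name and parameters are assumptions); it uses the threshold of 30 mentioned above to separate small from large samples.

```python
from typing import Optional

def choose_distribution(n: int, population_normal: bool, sigma_known: bool,
                        large_n: int = 30) -> Optional[str]:
    """Return "normal", "t", or None if neither test described here applies."""
    if n >= large_n:
        return "normal"                  # large sample (section 3)
    if population_normal:
        # Small sample from a normal population (sections 3 and 4).
        return "normal" if sigma_known else "t"
    return None                          # small sample, unknown distribution (section 6)

print(choose_distribution(100, population_normal=False, sigma_known=False))
print(choose_distribution(25, population_normal=True, sigma_known=False))
print(choose_distribution(25, population_normal=False, sigma_known=False))
```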
