# Hypothesis testing and *p* value

How surprising is my result? Calculating a p-value

There are many circumstances where we simply want to check whether an observation looks like it is compatible with the null hypothesis, $H_{0}$.

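Having decided on a significance level $\alpha$ and whether the situation warrants a one-tailed or a two-tailed test, we can use the cumulative distribution function (cdf) of the null distribution to calculate a p-value for the observation: the probability, under $H_{0}$, of a result at least as extreme as the one observed.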
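As a minimal sketch of this recipe, the snippet below computes one-tailed and two-tailed p-values from the cdf of a null distribution. The standard normal null and the observed value `x_obs = 2.1` are illustrative assumptions and are not part of the examples that follow.

```python
from scipy import stats

# Hypothetical null distribution and observation, chosen only for illustration
null_dist = stats.norm(loc=0, scale=1)
x_obs = 2.1

p_left = null_dist.cdf(x_obs)      # P(X <= x_obs): left-tailed p-value
p_right = null_dist.sf(x_obs)      # P(X > x_obs) = 1 - cdf: right-tailed p-value
p_two = 2 * min(p_left, p_right)   # two-tailed p-value for a symmetric null

print("left: %.4f, right: %.4f, two-tailed: %.4f" % (p_left, p_right, p_two))
```

For a discrete null distribution the right tail needs `sf(x_obs - 1)` so that the observed value itself is included.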

Acknowledgement: the examples are from Dr John Pinney ([link here](https://github.com/johnpinney/sampling_and_hypothesis_testing/blob/master/python_version/hypothesis_testing_python.html)).

In [None]:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

## Example 1: probability of rolling a six?
Your arch-nemesis Blofeld always seems to win at ludo, and you have started to suspect him of using a loaded die.

You observe the following outcomes from 100 rolls of his die:

In [None]:
data = np.array([6, 1, 5, 6, 2, 6, 4, 3, 4, 6, 1, 2, 5, 6, 6, 3, 6, 2, 6, 4, 6, 2,
       5, 4, 2, 3, 3, 6, 6, 1, 2, 5, 6, 4, 6, 2, 1, 3, 6, 5, 4, 5, 6, 3,
       6, 6, 1, 4, 6, 6, 6, 6, 6, 2, 3, 1, 6, 4, 3, 6, 2, 4, 6, 6, 6, 5,
       6, 2, 1, 6, 6, 4, 3, 6, 5, 6, 6, 2, 6, 3, 6, 6, 1, 4, 6, 4, 2, 6,
       6, 5, 2, 6, 6, 4, 3, 1, 6, 6, 5, 5])

Do you have enough evidence to confront him?

In [None]:
# We will work with the binomial distribution for the observed number of sixes

# Write down the hypotheses
# H0: p = 1/6
# H1: p > 1/6

# choose a significance level
# alpha = 0.01

In [None]:
# number of sixes
_stat_k = np.sum(data == 6)

# number of trials
_trials = len(data)

print("number of sixes: %d" %_stat_k)
print("number of trials: %d" %_trials)

In [None]:
# test statistic: number of sixes out of 100 trials
# null distribution: binomial(k, n=100, p=1/6)
# calculate p value

p_val = 1 - stats.binom.cdf(_stat_k - 1, n=_trials, p=1/6)

print("Observed statistic is %d" %_stat_k)
print("p value is %.3e" %p_val)
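As a cross-check, the same upper-tail probability can be obtained from the survival function, which avoids the subtraction `1 - cdf` when the p-value is very small, and from SciPy's exact binomial test. This is a sketch that assumes SciPy 1.7 or later (for `stats.binomtest`); the names `k`, `n` and `p0` are introduced here for illustration.

```python
from scipy import stats

k, n, p0 = 43, 100, 1/6   # 43 sixes in 100 rolls, as counted above

# Survival function: sf(k - 1) = P(X > k - 1) = P(X >= k)
p_sf = stats.binom.sf(k - 1, n=n, p=p0)

# Exact binomial test with a one-sided alternative (H1: p > 1/6)
res = stats.binomtest(k, n=n, p=p0, alternative='greater')

print("sf-based p value:  %.3e" % p_sf)
print("binomtest p value: %.3e" % res.pvalue)
```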

#### Visualize the null distribution and the test statistic

In [None]:
# plot the probability mass function of the null distribution;
# outcomes at least as extreme as the observation are drawn in red

fig = plt.figure(dpi=70)

# outcomes below the observed number of sixes
x = np.arange(_stat_k)
_pmf = stats.binom.pmf(x, n=_trials, p=1/6)
plt.plot(x, _pmf, 'ko', ms=1)
plt.vlines(x, 0, _pmf, colors='k', lw=0.1)

# outcomes at or above the observed number of sixes
x = np.arange(_stat_k, _trials + 1)
_pmf = stats.binom.pmf(x, n=_trials, p=1/6)
plt.plot(x, _pmf, 'ro', ms=1)
plt.vlines(x, 0, _pmf, colors='r', lw=0.1)
plt.xlabel('Number of sixes')
plt.ylabel('Probability Mass Function')
plt.title('Distribution of n_six under the null hypothesis')
plt.show()
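The significance level can also be turned into a rejection threshold: the smallest number of sixes whose upper-tail probability under $H_{0}$ does not exceed $\alpha$. The sketch below finds it by a direct search over all possible counts; the name `k_crit` is introduced here for illustration.

```python
import numpy as np
from scipy import stats

n, p0, alpha = 100, 1/6, 0.01

ks = np.arange(n + 1)
# P(X >= k) under H0 for every possible count k
upper_tail = stats.binom.sf(ks - 1, n=n, p=p0)

# Smallest count whose upper-tail probability does not exceed alpha
k_crit = ks[upper_tail <= alpha][0]

print("Reject H0 at alpha = %.2f if the number of sixes is at least %d" % (alpha, k_crit))
```

Any observed count at or above `k_crit` gives a p-value no larger than $\alpha$.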

## Example 2: difference in birth weight

The birth weights of babies (in kg) have been measured for a sample of mothers split into two categories: nonsmoking and heavy smoking.

* The two categories are measured independently from each other.
* Both groups are drawn from normal distributions.
* The two groups are assumed to have the same unknown variance.

In [None]:
data_heavysmoking = np.array([3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23, 2.76, 
                              3.60, 3.75, 3.59, 3.63, 2.38, 2.34, 2.44]) 
data_nonsmoking   = np.array([3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61, 
                              3.83, 3.31, 4.13, 3.26, 3.54])

We want to know whether there is a significant difference in mean birth weight between the two categories.

In [None]:
# Write down the hypotheses, with d = mean(heavy smoking) - mean(nonsmoking)
# H0: there is no difference in mean birth weight between groups: d == 0
# H1: there is a difference: d != 0

# choose a significance level
# alpha = 0.05

In [None]:
# Define test statistic: difference of group mean

_stat_mu = data_heavysmoking.mean() - data_nonsmoking.mean()
_stat_mu

### Permutation test: null distribution approximated by resampling

In [None]:
def get_permutation_null(x1, x2, n_permute=1000):
    """Simple function to generate permutation distribution
    """
    _n1, _n2 = len(x1), len(x2)
    x_pool = np.append(x1, x2)
    
    RV = np.zeros(n_permute)
    for i in range(n_permute):
        _x_perm = np.random.permutation(x_pool)
        RV[i] = _x_perm[:_n1].mean() - _x_perm[_n1:].mean()
    return RV

In [None]:
np.random.seed(10)
null_distr = get_permutation_null(data_heavysmoking, data_nonsmoking)

In [None]:
fig = plt.figure(dpi=70)
plt.hist(null_distr, bins=20)
plt.axvline(x=_stat_mu, color='tab:orange', ls='--')
plt.axvline(x=-_stat_mu, color='tab:orange', ls='--')
plt.xlabel('Difference of group mean')
plt.ylabel('Resampling frequency')
plt.title(r'Distribution of $\mu$ under the null hypothesis')
plt.show()

In [None]:
# Two-tailed and one (left) tailed p values:
# fraction of permutations at least as extreme as the observed difference
p_two_tailed = np.mean(np.abs(null_distr) >= np.abs(_stat_mu))
p_one_tailed = np.mean(null_distr <= _stat_mu)

print("Two tailed p value: %.4f" %p_two_tailed)
print("One (left) tailed p value: %.4f" %p_one_tailed)
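A vectorised variant of the permutation test is sketched below. It shuffles all permutations in one call and uses the commonly applied `(count + 1) / (n_permute + 1)` estimate, which prevents a reported p-value of exactly zero when no permutation is as extreme as the observation. The sketch assumes NumPy 1.20 or later (for `Generator.permuted`); the seed and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

heavy = np.array([3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23, 2.76,
                  3.60, 3.75, 3.59, 3.63, 2.38, 2.34, 2.44])
non = np.array([3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
                3.83, 3.31, 4.13, 3.26, 3.54])

rng = np.random.default_rng(10)   # arbitrary seed, for reproducibility only
n_permute = 1000
n1 = len(heavy)
pooled = np.concatenate([heavy, non])

# Each row is an independent shuffle of the pooled sample
perms = rng.permuted(np.tile(pooled, (n_permute, 1)), axis=1)
null = perms[:, :n1].mean(axis=1) - perms[:, n1:].mean(axis=1)

observed = heavy.mean() - non.mean()
n_extreme = np.sum(np.abs(null) >= np.abs(observed))

# Two-tailed p-value with the +1 correction
p_corrected = (n_extreme + 1) / (n_permute + 1)
print("Two tailed p value (corrected): %.4f" % p_corrected)
```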

### *t* test: null distribution approximated by *t* distribution


We use the t test to assess whether two samples taken from normal distributions have significantly different means.

The test statistic follows a Student's t-distribution, provided that the variances of the two groups are equal.

Other variants of the t-test are applicable under different conditions.
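One such variant is Welch's t-test, which drops the equal-variance assumption. As a hedged illustration, `scipy.stats.ttest_ind` switches between the two through its `equal_var` argument; the short array names `heavy` and `non` are introduced here and hold the birth weight data from above.

```python
import numpy as np
from scipy import stats

heavy = np.array([3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23, 2.76,
                  3.60, 3.75, 3.59, 3.63, 2.38, 2.34, 2.44])
non = np.array([3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
                3.83, 3.31, 4.13, 3.26, 3.54])

# Student's t-test: assumes both groups share the same variance
pooled = stats.ttest_ind(heavy, non, equal_var=True)

# Welch's t-test: allows the two variances to differ
welch = stats.ttest_ind(heavy, non, equal_var=False)

print("Student: t = %.3f, p = %.4f" % (pooled.statistic, pooled.pvalue))
print("Welch:   t = %.3f, p = %.4f" % (welch.statistic, welch.pvalue))
```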

The test statistic is
$$ t = \frac{\bar{X}_{1} - \bar{X}_{2}}{s_p \cdot \sqrt{\frac{1}{n_{1}} + \frac{1}{n_{2}}}} $$

where
$$ s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$

is an estimator of the pooled standard deviation.

Under the null hypothesis of equal means, the statistic follows a Student's t-distribution with $(n_{1} + n_{2} - 2)$ degrees of freedom.


In [None]:
# Same test statistic: difference of group mean

_stat_t = data_heavysmoking.mean() - data_nonsmoking.mean()

_stat_t

Calculate parameters for approximate t distribution

In [None]:
n_ns = len(data_nonsmoking)
n_hs = len(data_heavysmoking)

s_ns = np.std(data_nonsmoking, ddof=1)
s_hs = np.std(data_heavysmoking, ddof=1)

# the pooled standard deviation
sp = np.sqrt(((n_ns - 1)*s_ns**2 + (n_hs - 1)*s_hs**2) / (n_ns + n_hs - 2))
print("Pooled standard deviation: %.3f" %sp)

_std = sp * np.sqrt(1/n_ns + 1/n_hs)
print("Estimated standard error of mean difference: %.3f" %_std)

In [None]:
_xx = np.arange(-0.8, 0.8, 0.01)
_pdf = stats.t.pdf(_xx, df=n_hs+n_ns-2, loc=0, scale=_std)

fig = plt.figure(dpi=70)
plt.plot(_xx, _pdf)
plt.axvline(x=_stat_t, color='tab:orange', ls='--')
plt.axvline(x=-_stat_t, color='tab:orange', ls='--')
plt.xlabel('Difference of group mean')
plt.ylabel('PDF approximated by t distr.')
plt.title(r'Distribution of $\mu$ under the null hypothesis')
plt.show()

In [None]:
print('t-test p value (two-tailed):')
# use -|statistic| so the doubled tail is correct whatever the sign of the difference
2 * stats.t.cdf(-np.abs(_stat_t), df=n_hs+n_ns-2, loc=0, scale=_std)

#### Equivalent to normalised *t* statistic

In [None]:
_xx = np.arange(-0.8/_std, 0.8/_std, 0.01)
_pdf = stats.t.pdf(_xx, df=n_hs+n_ns-2, loc=0, scale=1)

fig = plt.figure(dpi=70)
plt.plot(_xx, _pdf)
plt.axvline(x=_stat_t/_std, color='tab:orange', ls='--')
plt.axvline(x=-_stat_t/_std, color='tab:orange', ls='--')
plt.xlabel('t statistic (mean difference / standard error)')
plt.ylabel('PDF approximated by t distr.')
plt.title(r'Distribution of the $t$ statistic under the null hypothesis')
plt.show()

In [None]:
# test statistic: mean_diff / standard_error
# null distribution: standard t distribution

print('t-test p value (two-tailed):')
# use -|statistic| so the doubled tail is correct whatever the sign of the difference
2 * stats.t.cdf(-np.abs(_stat_t/_std), df=n_hs+n_ns-2, loc=0, scale=1)

#### Direct use of ``scipy.stats.ttest_ind``

In your future analysis, you can directly use this ``scipy`` function.

In [None]:
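# same group order as the statistic above, so the sign of t matches;
# equal_var=True is the default (pooled variance)
stats.ttest_ind(data_heavysmoking, data_nonsmoking, equal_var=True)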
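To confirm that the library call agrees with the pooled-variance formula, the sketch below recomputes the statistic by hand and compares the p-value with the chosen significance level. The names `heavy`, `non`, `t_manual` and `p_manual` are introduced here for illustration.

```python
import numpy as np
from scipy import stats

heavy = np.array([3.18, 2.84, 2.90, 3.27, 3.85, 3.52, 3.23, 2.76,
                  3.60, 3.75, 3.59, 3.63, 2.38, 2.34, 2.44])
non = np.array([3.99, 3.79, 3.60, 3.73, 3.21, 3.60, 4.08, 3.61,
                3.83, 3.31, 4.13, 3.26, 3.54])

res = stats.ttest_ind(heavy, non)   # equal_var=True by default

# Manual calculation with the pooled standard deviation
n1, n2 = len(heavy), len(non)
sp = np.sqrt(((n1 - 1) * heavy.var(ddof=1) + (n2 - 1) * non.var(ddof=1))
             / (n1 + n2 - 2))
t_manual = (heavy.mean() - non.mean()) / (sp * np.sqrt(1/n1 + 1/n2))
p_manual = 2 * stats.t.sf(np.abs(t_manual), df=n1 + n2 - 2)

alpha = 0.05
print("scipy:  t = %.4f, p = %.4f" % (res.statistic, res.pvalue))
print("manual: t = %.4f, p = %.4f" % (t_manual, p_manual))
print("Reject H0" if res.pvalue < alpha else "Fail to reject H0")
```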