In [1]:
import sys
sys.path.append('./lib/')

with open('./lib/common_imports.py') as f:
    exec(f.read())

# P-values

## What is a P-value?

The central idea of a P-value is to assume that the null hypothesis is true and calculate how unusual it would be to see data (in the form of a test statistic) as extreme as was seen in favor of the alternative hypothesis. The formal definition is: 

> A P-value is the probability of observing a test statistic as or more extreme in favor of the alternative than was actually obtained, where the probability is calculated assuming that the null hypothesis is true.

A P-value then requires a few steps. 
1. Decide on a statistic that evaluates support of the null or alternative hypothesis.
2. Decide on a distribution of that statistic under the null hypothesis (null distribution).
3. Calculate the probability of obtaining a statistic as extreme as or more extreme than the one observed, using the distribution from step 2.

The way to interpret P-values is as follows. If the P-value is small, then either $H_{0}$ is true and we have observed a rare event or $H_{0}$ is false (or possibly the null model is incorrect). Let’s do a quick example. Suppose that you get a $t$ statistic of 2.5 for 15 degrees of freedom testing $H_{0} : \mu = \mu_{0}$ versus $H_{a} : \mu > \mu_{0}$. What’s the probability of getting a $t$ statistic as large as 2.5?

In [2]:
from scipy.stats import t as t_distribution
# Upper-tail probability P(T >= 2.5) under the null t distribution with 15 degrees of freedom
print(1-t_distribution.cdf(2.5, df = 15))

0.012252901623256984


Therefore, the probability of seeing evidence as extreme or more extreme than that actually obtained under $H_{0}$ is $0.0123$. So, assuming our model is correct, either we observed data that were pretty unlikely under the null, or the null hypothesis is false.
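To make the tail probability concrete, here is a minimal sketch that recomputes the same area under the null $t$ density by numerical integration, using only the Python standard library. The helper names `t_pdf` and `simpson` are illustrative and not part of the original notebook; the density is the standard Student $t$ density, and Simpson's rule is just one convenient choice of quadrature.

```python
import math

def t_pdf(x, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

t_obs, df = 2.5, 15
# The density is symmetric about 0, so P(T >= t_obs) = 0.5 - P(0 <= T <= t_obs)
central_area = simpson(lambda x: t_pdf(x, df), 0.0, t_obs)
p_value = 0.5 - central_area
print(round(p_value, 5))
```

Up to integration error, this should agree with the `scipy` value above.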

## The attained significance level

Recall from a previous chapter that our test statistic was $2$ for $H_{0} : \mu = 30$ versus $H_{a} : \mu > 30$ using a normal test ($n$ was 100). We rejected the one sided test at $\alpha = 0.05$. Would we still reject at $\alpha = 0.01$? How about $0.001$?

The smallest value of $\alpha$ at which you still reject the null hypothesis is called the ***attained significance level***.

> This is mathematically equivalent to, but philosophically a little different from, the P-value. Whereas the P-value is interpreted in terms of how probabilistically extreme our test statistic is under the null, the attained significance level merely conveys the smallest level of $\alpha$ at which one could reject.

This equivalence makes P-values very convenient to convey. The reader of the results can perform the test at whatever $\alpha$ he or she chooses. This is especially useful in multiple testing circumstances.
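As a rough check of the questions posed for the normal test statistic of $2$, the sketch below computes the one sided P-value and compares it with each candidate $\alpha$. It uses `statistics.NormalDist` from the standard library rather than `scipy`, and the variable names are illustrative.

```python
from statistics import NormalDist

z_obs = 2.0
# Upper-tail P-value: P(Z >= 2) under the standard normal null distribution
p_value = 1 - NormalDist().cdf(z_obs)
print(round(p_value, 5))

# Reject whenever the P-value is below alpha
decisions = {alpha: p_value < alpha for alpha in (0.05, 0.01, 0.001)}
for alpha, reject in decisions.items():
    print(alpha, "reject" if reject else "fail to reject")
```

The P-value is about $0.0228$, so the test rejects at $\alpha = 0.05$ but not at $0.01$ or $0.001$. That same number is the attained significance level.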

Here are the two rules for performing hypothesis tests with P-values.
1. If the P-value for a test is less than $\alpha$, you reject the null hypothesis.
2. For a two sided hypothesis test, double the smaller of the two one sided hypothesis test P-values.
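The two rules can be sketched as a pair of small helpers. The function names `two_sided_p_value` and `reject_null` are hypothetical, and capping the doubled value at 1 is an added safeguard that the rules above do not state explicitly. The example reuses a normal test statistic of $2$ purely for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(p_lower, p_upper):
    """Rule 2: double the smaller one sided P-value (capped at 1, an added assumption)."""
    return min(1.0, 2 * min(p_lower, p_upper))

def reject_null(p_value, alpha):
    """Rule 1: reject when the P-value is less than alpha."""
    return p_value < alpha

z_obs = 2.0  # illustrative normal test statistic
p_upper = 1 - NormalDist().cdf(z_obs)  # one sided P-value for H_a: mu > mu_0
p_lower = NormalDist().cdf(z_obs)      # one sided P-value for H_a: mu < mu_0
p_two_sided = two_sided_p_value(p_lower, p_upper)

print(round(p_two_sided, 4))
print(reject_null(p_two_sided, 0.05), reject_null(p_two_sided, 0.01))
```

Here the two sided P-value is roughly $0.0455$, so the two sided test still rejects at $\alpha = 0.05$ but not at $\alpha = 0.01$.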

## Binomial P-value

Suppose a friend has 8 children, 7 of whom are girls and none of whom are twins. If each birth independently has a 50% probability of being a girl, what’s the probability of getting 7 or more girls out of 8 births?

This calculation is a P-value where the statistic is the number of girls and the null distribution is binomial with 8 trials and success probability $0.5$ (a fair coin flip for each birth). We want to test $H_{0} : p = 0.5$ versus $H_{a} : p > 0.5$, where $p$ is the probability of having a girl for each birth.

Here’s the calculation:

In [5]:
from scipy.stats import binom

# P(X >= 7) = 1 - P(X <= 6) for X ~ Binomial(8, 0.5)
np.round(1-binom.cdf(6,8, p=0.5),5)

0.03516

Since our P-value is less than $0.05$, we would reject at the $5 \%$ level. Note, however, that if we were doing a two sided test, we would have to double the P-value to about $0.07$ and would then fail to reject.
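The same tail probability can be checked by summing the binomial probabilities directly. This is a minimal sketch using only the standard library; the variable names are illustrative, and the doubling follows the two sided rule given earlier.

```python
from math import comb

n, p, observed = 8, 0.5, 7
# P(X >= 7) = sum over k = 7, 8 of C(n, k) * p^k * (1 - p)^(n - k)
p_one_sided = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(observed, n + 1))
# Two sided P-value: double the smaller one sided P-value
p_two_sided = min(1.0, 2 * p_one_sided)
print(round(p_one_sided, 5), round(p_two_sided, 5))
# prints 0.03516 0.07031
```

The one sided value is exactly $9/256$, since there are $8 + 1 = 9$ outcomes with at least 7 girls out of $2^{8} = 256$ equally likely sequences.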

## Poisson example

Suppose that a hospital has an infection rate of $10$ infections per $100$ person-days at risk (rate of $0.1$) during the last monitoring period. Assume that an infection rate of $0.05$ is an important benchmark.

Given a Poisson model, could the observed rate being larger than $0.05$ be attributed to chance? We want to test $H_{0} : \lambda = 0.05$, where $\lambda$ is the rate of infections per person-day, so that $5$ is the expected count per $100$ person-days. Thus we want to know whether $10$ events per $100$ person-days is unusual with respect to a Poisson distribution with a mean of $5$ events per $100$ person-days. Consider $H_{a} : \lambda > 0.05$.

In [4]:
from scipy.stats import poisson
# P(X >= 10) = 1 - P(X <= 9) for X ~ Poisson(5)
np.round(1-poisson.cdf(9,5),5)

0.03183

Again, since this P-value is less than $0.05$, we reject the null hypothesis. The P-value would be about $0.064$ for a two sided hypothesis (double), and so we would fail to reject in that case.
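As a cross-check, the Poisson tail can also be computed from the probability mass function with the standard library alone. This is a sketch under the same assumptions as the example (a mean of $5$ events per $100$ person-days under the null and $10$ observed events); the variable names are illustrative.

```python
from math import exp, factorial

mean_count, observed = 5.0, 10
# P(X <= 9) for X ~ Poisson(5), summing e^(-mean) * mean^k / k! over k = 0..9
cdf_below = sum(exp(-mean_count) * mean_count**k / factorial(k) for k in range(observed))
p_one_sided = 1 - cdf_below
# Two sided P-value: double the one sided P-value
p_two_sided = min(1.0, 2 * p_one_sided)
print(round(p_one_sided, 5), round(p_two_sided, 5))
```

The one sided value should match the `scipy` result of $0.03183$, and the doubled value lands just above $0.06$, on the other side of the $0.05$ threshold.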