# Lab 12 - hypothesis testing


<img src="img/12/significant.png">

## Data science is OSEMN
According to a popular model, the elements of data science are

* Obtaining data
* Scrubbing data
* Exploring data
* Modeling data
* Interpreting data

We have looked at every step of this process! In this lab we look more closely at something useful for interpreting data: hypothesis testing. The main point of the lab is to learn how to interpret p-values and what they **really** mean.

# Hypothesis testing

Great sources of information: https://en.wikipedia.org/wiki/P-value, https://en.wikipedia.org/wiki/Statistical_hypothesis_testing, http://ipython-books.github.io/featured-07/

Hypothesis testing is about... testing hypotheses. The process has the following steps:

1. Writing down the hypotheses, notably the null hypothesis, which is the opposite of the hypothesis we want to prove (with a certain degree of confidence).
2. Computing a test statistic, a mathematical formula depending on the test type, the model, the hypotheses, and the data.
3. Using the computed value to reject the null hypothesis or fail to reject it. Note that failing to reject the null does not prove that it is true.

## Example

Is the coin fair? We flip it 100 times and get 61 heads.

Null hypothesis: the coin is fair.

In [4]:
import numpy as np
import scipy.stats as st
import scipy.special as sp
n = 100  # number of coin flips
h = 61  # number of heads
q = .5  # probability of heads under the null hypothesis (fair coin)

Let's compute the z-score, $z = (\bar{x} - q) \sqrt{n / (q(1-q))}$, where $\bar{x}$ is the observed fraction of heads:

In [5]:
xbar = float(h)/n
z = (xbar - q) * np.sqrt(n / (q*(1-q))); z

2.1999999999999997

In [6]:
pval = 2 * (1 - st.norm.cdf(z))
pval

0.02780689502699718

This p-value is less than 0.05, so we reject the null hypothesis and conclude that the coin is probably not fair.

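Note that the p-value is not the probability of getting exactly 61 heads. It is the probability of a result at least as extreme as the observed one (here, two-sided), computed with the normal approximation:

pval = $P(|n_{heads} - 50| \geq 11 \mid \mbox{is fair}) \approx 0.0278$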
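The z-test above relies on a normal approximation to the binomial distribution. As a sanity check, here is a minimal sketch of the exact two-sided p-value computed directly from the binomial distribution with `scipy.stats.binom`. The variable names `upper_tail` and `pval_exact` are only illustrative; doubling one tail is valid here because the distribution is symmetric for `q = 0.5`.

```python
import scipy.stats as st

n = 100  # number of coin flips
h = 61   # number of heads observed
q = 0.5  # probability of heads under the null hypothesis

# P(X >= h) under the null. sf(k) returns P(X > k), hence h - 1.
upper_tail = st.binom.sf(h - 1, n, q)

# For q = 0.5 the binomial distribution is symmetric,
# so the two-sided p-value is twice the upper tail.
pval_exact = 2 * upper_tail
print(pval_exact)
```

The exact value comes out somewhat larger than the normal approximation, but still below 0.05, so the conclusion does not change in this example.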

<img src="http://ipython-books.github.io/images/gaussian.png" width=600>

## Interpreting p-values

Sources: http://www.johndcook.com/blog/2008/11/18/five-criticisms-of-significance-testing/, http://blog.minitab.com/blog/adventures-in-statistics/how-to-correctly-interpret-p-values

P-values address only one question: how likely are your data (or data even more extreme), assuming a true null hypothesis? They do not measure support for the alternative hypothesis.

A low p-value indicates that your data are unlikely assuming a true null, but it can’t tell you which of two competing cases is more likely:

* The null is true but your sample was unusual.
* The null is false.

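This is quite tricky: what we really want to know is $P(H_0 | D)$, while the p-value is based on $P(D | H_0)$. Getting from one to the other requires Bayes' rule, and therefore a prior probability for $H_0$. Usually: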

<img width=600 src="img/12/img2.png">
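To put numbers on this, here is a small sketch that applies Bayes' rule to the event "the test came out significant". All three inputs are assumptions chosen purely for illustration: 90% of the hypotheses we test have a true null (`prior_h0`), the significance threshold is 0.05 (`alpha`), and the test detects a real effect 80% of the time (`power`).

```python
prior_h0 = 0.9  # assumed fraction of tested hypotheses where the null is true
alpha = 0.05    # P(significant | H0 true), the false positive rate
power = 0.8     # assumed P(significant | H0 false)

# Joint probabilities of a significant result with a true / false null.
p_sig_and_h0 = alpha * prior_h0
p_sig_and_h1 = power * (1 - prior_h0)

# Bayes' rule: P(H0 | significant).
p_h0_given_sig = p_sig_and_h0 / (p_sig_and_h0 + p_sig_and_h1)
print(round(p_h0_given_sig, 2))  # 0.36
```

Under these assumed numbers, more than a third of the "significant" results come from a true null, even though the threshold was 0.05.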

So please be cautious :)

<img width=300 src="img/12/img1.png">

# Exercise 1, 1 point

Read http://www.nature.com/news/scientific-method-statistical-errors-1.14700#/b1 and explain in 2 sentences why p-values are dangerous and often lead to incorrect results.

# Multiple testing

One of the problems with p-values is that if you perform enough tests, you will almost certainly reject some null hypothesis that is actually true, unless you correct for multiple testing. The simplest correction is Bonferroni, which divides the significance threshold by the number of tests performed (this follows directly from the union bound).
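A quick back-of-the-envelope calculation shows how fast the problem grows. The sketch below assumes `m` independent tests in which every null hypothesis is true, and computes the probability of at least one false positive with and without the Bonferroni threshold. The choice `m = 20` is just an illustrative value.

```python
alpha = 0.05  # per-test significance threshold
m = 20        # number of independent tests, all with a true null (assumption)

# Probability of at least one false positive without any correction.
fwer = 1 - (1 - alpha) ** m

# Bonferroni: test each hypothesis at alpha / m instead.
alpha_bonf = alpha / m
fwer_bonf = 1 - (1 - alpha_bonf) ** m

print(fwer, fwer_bonf)
```

Without correction the chance of at least one spurious "discovery" is roughly 64%; with the Bonferroni threshold it stays just below 5%.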

<img src="https://imgs.xkcd.com/comics/p_values.png">

## Exercise 2, 3 points

Read http://multithreaded.stitchfix.com/blog/2015/10/15/multiple-hypothesis-testing/ and:

* Replicate the experiment (as Python code, remember to include plots) from the "Simple example: Difference between p-value and being “right”" section
* Create an artificial example that shows that correcting for multiple tests is necessary (you can get inspiration from the IPython notebook that is provided with the blog post or from the xkcd strip that is linked in this notebook)