# Hypothesis Testing

This notebook demonstrates how hypothesis testing is done (https://en.wikipedia.org/wiki/Statistical_hypothesis_testing).
<br>
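It covers the simplest case: testing a hypothesis about the true, unknown mean of a distribution (not necessarily normal),
<br>
when the distribution's true variance is known. Because each sample is large (N=100), the central limit theorem makes the sample mean approximately normal, which is why the test statistic is compared against a standard normal distribution.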
<br>
<br>
Run the code cell below, adjust the sliders, and press "Run Interact" to create the visualization.
<br> In this example, a gamma distribution is used with a mean of your choice.
<br>
mean: the true mean of the gamma distribution (unknown to the tester)
<br>
hypothesis: the tester's hypothesized mean of the distribution
<br>
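pvalue (https://en.wikipedia.org/wiki/P-value): the significance level of the test, which sets the two green thresholds in the plot. Strictly speaking, a p-value is the probability, assuming the hypothesized mean is correct, of obtaining a test statistic at least as extreme as the one actually observed; the hypothesis is rejected when that probability falls below the chosen level.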
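
To make the link between a p-value and the chosen threshold concrete, here is a minimal sketch of a two-sided z-test with known variance. The function name `z_test_pvalue` and the numbers passed to it are illustrative assumptions, not part of the notebook. It uses only the standard library: `math.erfc(abs(z) / math.sqrt(2))` equals the two-sided tail probability of a standard normal, the same quantity as `2 * stats.norm.sf(abs(z))` in SciPy.

```python
import math

def z_test_pvalue(sample_mean, hypothesized_mean, true_var, n):
    """Return the z statistic and its two-sided p-value (variance assumed known)."""
    # Standardize the sample mean under the hypothesis
    z = (sample_mean - hypothesized_mean) / math.sqrt(true_var / n)
    # Two-sided tail probability of a standard normal beyond |z|
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# Illustrative numbers (assumptions, not taken from the notebook)
z, p = z_test_pvalue(sample_mean=3.2, hypothesized_mean=3.0, true_var=4.0, n=100)
print("z =", round(z, 3), " two-sided p-value =", round(p, 4))

# Decision rule: reject only if the p-value is below the chosen significance level
alpha = 0.05
print("reject" if p < alpha else "fail to reject")
```

Comparing the p-value with the significance level is equivalent to checking whether the test statistic falls outside the two thresholds drawn in the plot.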

In [1]:
import numpy as np
import math 
import matplotlib.pyplot as plt
import ipywidgets as widgets
from ipywidgets import interact, interactive, fixed, interact_manual
import scipy.stats as stats

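def HT(mean, hypothesis, pvalue):
    # Gamma parameters chosen so that shape * scale equals the requested mean
    shape = math.sqrt(mean)
    scale = math.sqrt(mean)
    # Variance of a gamma distribution is shape * scale^2 (treated as known)
    truevar = shape*scale*scale
    N=100  # sample size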
    
    sample1 = np.random.gamma(shape=shape, scale=scale, size=N)
    sample2 = np.random.gamma(shape=shape, scale=scale, size=N)
    mean1 = np.mean(sample1)
    mean2 = np.mean(sample2)
    
    print ("Below are the histograms of two sets of samples from a gamma distribution with your chosen mean="+str(mean))
    
    f, axarr = plt.subplots(1, 2, figsize=(20,5), sharey=True)
    axarr[0].hist(sample1)
    axarr[0].axvline(x=mean1, color='b', label='sample mean='+str(mean1))
    axarr[1].hist(sample2)
    axarr[1].axvline(x=mean2, color='b', label='sample mean='+str(mean2))
    axarr[0].legend()
    axarr[1].legend()
    plt.show()
    
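    # Standardize each sample mean under the hypothesis: z = (mean - hypothesis) / (sigma / sqrt(N))
    nmean1 = (mean1 - hypothesis) / (math.sqrt(truevar) / math.sqrt(N))
    nmean2 = (mean2 - hypothesis) / (math.sqrt(truevar) / math.sqrt(N))
    
    # Two-sided critical values of the standard normal for significance level `pvalue`
    p1 = stats.norm(0, 1).ppf(pvalue/2.0)
    p2 = stats.norm(0, 1).ppf(1-pvalue/2.0)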
    
    print ("normalized test statistics for the two sample means (blue line): "+str(nmean1)+", "+str(nmean2))
    print ("p-value threshold for a standard normal distribution (green): "+str(p1)+", "+str(p2))
    
    plt.figure(figsize=(12, 5))
    
    x = np.linspace(-3.5, 3.5, 300)
    y = stats.norm.pdf(x)  # standard normal density
    plt.plot(x, y)
    plt.axvline(x=0, color='r', label='mean=0 of standard normal')
    plt.axvline(x=nmean1, color='b', label='test statistics based on sample')
    plt.axvline(x=nmean2, color='b')
    plt.xlabel("x")
    plt.ylabel("P(x)")

    plt.axvline(x=p1, color='g', label='p-value threshold (p='+str(pvalue)+')')
    plt.axvline(x=p2, color='g')       
    plt.legend()
    plt.show()

interact_manual(HT, mean=(1.0, 5.0), hypothesis=(1.0, 5.0), pvalue=widgets.FloatSlider(min=0.01, max=0.15, step=0.01))


If your hypothesized mean exactly matches the true mean,
the test statistics will fall between the two thresholds with probability of about 1 - pvalue,
in which case you correctly fail to reject your hypothesis.
However, the further the hypothesized mean is from the true mean,
the more likely the test statistics are to fall outside the thresholds, in which case you should reject the hypothesis.
<br>
The outcome also depends on the chosen pvalue: a larger value moves the thresholds closer to zero and may reject a hypothesis that a smaller value would not reject.
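
The behaviour described above can be checked with a small simulation. This is a rough sketch, not part of the notebook. The function name `rejection_rate`, the number of trials, and the seed are assumptions chosen for illustration. It uses the standard library so that it runs without extra packages: `random.gammavariate` plays the role of `np.random.gamma`, and `NormalDist().inv_cdf` plays the role of `stats.norm.ppf`.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, hypothesis, alpha, n=100, trials=1000, seed=0):
    """Fraction of simulated samples whose z statistic falls outside the thresholds."""
    rng = random.Random(seed)
    # Same parameterization as HT: shape * scale equals the true mean
    shape = scale = math.sqrt(true_mean)
    true_var = shape * scale ** 2              # gamma variance, treated as known
    z_crit = NormalDist().inv_cdf(1 - alpha / 2.0)
    std_err = math.sqrt(true_var / n)
    rejections = 0
    for _ in range(trials):
        sample_mean = fmean([rng.gammavariate(shape, scale) for _ in range(n)])
        z = (sample_mean - hypothesis) / std_err
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# Illustrative settings (assumptions): true mean 3.0, two hypotheses, two levels
r_null = rejection_rate(3.0, 3.0, alpha=0.05)    # hypothesis is correct
r_alt = rejection_rate(3.0, 3.5, alpha=0.05)     # hypothesis is wrong
r_loose = rejection_rate(3.0, 3.0, alpha=0.15)   # correct hypothesis, larger level
print("correct hypothesis, alpha=0.05:", r_null)
print("wrong hypothesis,   alpha=0.05:", r_alt)
print("correct hypothesis, alpha=0.15:", r_loose)
```

When the hypothesis is correct, the rejection rate should land near the chosen level; when it is wrong, rejections become much more frequent. Because all three calls reuse the same seed and therefore the same samples, the larger level can only reject at least as often as the smaller one.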