# Null hypothesis testing example

1. We see a difference between the two groups.
2. But let's still assume the treatment had no effect.  This is our null hypothesis.
3. Assuming the treatment had no effect, what's the probability of seeing a difference at least this large?
4. If that probability is small, then we reject the null hypothesis and conclude there is evidence for the alternative hypothesis.
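
The four steps above can be sketched in code. This is only a minimal sketch, assuming the sample mean is approximately normally distributed under the null hypothesis and that we run a one-sided (upper-tail) test. The function names `one_sided_p_value` and `reject_null` are our own, chosen for illustration.

```python
from scipy import stats


def one_sided_p_value(observed_mean, null_mean, standard_error):
    """Step 3: probability of a sample mean at least as large as the one
    observed, assuming the null hypothesis is true (normal approximation)."""
    null_dist = stats.norm(loc=null_mean, scale=standard_error)
    # sf is the survival function, equivalent to 1 - cdf
    return null_dist.sf(observed_mean)


def reject_null(p_value, alpha=0.05):
    """Step 4: reject the null hypothesis when the probability is small."""
    return p_value < alpha
```

Here `alpha` plays the role of "small": it is the cutoff we pick before looking at the data.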

### Example

Ok, so let's say that we are thinking of trying a different marketing tag for a product we are selling online.  Currently we emphasize the low price, and we want to set up an experiment to see if we should emphasize the quality instead.  We decide to send two different emails, randomly assigning users to our treatment and control groups.

The click through rate for the control group is $.20$.  

We send emails to a total of 100 users, 50 in the treatment and 50 in the control.  

We see that the click through rate for the treatment group is $.22$, and that the pooled standard deviation of both the treatment and the control is $.10$.

We set a significance level of .05, or $\alpha = .05$.

### Performing the analysis

Ok, now with this information, we can see if we have evidence for the alternative hypothesis that advertising quality improves our click through rate.

First, we state our null and alternative hypotheses.

* $H_0: CTR \le .20$
* $H_A: CTR > .20$

And remember that we observed a click through rate of .22 in the treatment group.

* $x_T = .22$

So the question that we have is, what is the probability that we would see a CTR of at least $.22$, if there were no change?  In other words:

* What is $P(x \ge .22 \mid H_0)$?


We can answer this question like so.  

First, what is the distribution of the means?  We model it as a normal distribution (because of the central limit theorem), with a standard deviation of the means, also called the standard error, of $se = \frac{S}{\sqrt{n}}$.  

In [2]:
import numpy as np
sd = .1
se = sd/np.sqrt(100)

In [3]:
se

0.01

From there, we can model the distribution of the means.  We assume the null hypothesis, that the average is .20, and use the standard deviation of the means that we calculated above.

In [4]:
import scipy.stats as stats
import numpy as np
x_null = .20
norm_dist = stats.norm(x_null, se)

We can plot the corresponding distribution of the means, assuming the null hypothesis, like so.

In [20]:
x_vals = np.linspace(norm_dist.ppf(0.00001), norm_dist.ppf(0.99999), 100)
pdf_nums_norm = norm_dist.pdf(x_vals)

In [1]:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
# density of the sample means under the null hypothesis
ax.plot(x_vals, pdf_nums_norm,
        'r-', lw=5, alpha=0.6, label='norm pdf')
# vertical line at the observed treatment click through rate
ax.axvline(x=.22)
ax.set(title='normal distribution')

In [23]:
1 - norm_dist.cdf(.22)

0.02275013194817932
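
To finish the analysis, we compare this probability to our significance level.  The snippet below is a small sketch of that last step, restating the numbers used above (`x_treatment` is our own variable name).  It also shows the same probability computed from a z-score on the standard normal distribution, which is an equivalent way to get the p-value.

```python
from scipy import stats

alpha = 0.05        # significance level chosen above
x_null = 0.20       # click through rate under the null hypothesis
x_treatment = 0.22  # observed treatment click through rate
se = 0.01           # standard error of the mean computed earlier

# number of standard errors the observation sits above the null mean
z = (x_treatment - x_null) / se

# one-sided upper-tail probability; sf(z) is the same as 1 - cdf(z)
p_value = stats.norm.sf(z)

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(z, p_value, decision)
```

The observed rate sits about two standard errors above the null mean, and the resulting probability of roughly .023 is below $\alpha = .05$.  So we reject the null hypothesis and conclude there is evidence that emphasizing quality improves the click through rate.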