# Distribution of Proportions

## References

* https://stats.stackexchange.com/questions/165142/is-the-standard-deviation-of-a-binomial-dataset-informative#comment314138_165142
* https://stats.stackexchange.com/questions/29641/standard-error-for-the-mean-of-a-sample-of-binomial-random-variables
* https://en.wikipedia.org/wiki/Standard_error
* http://www.statisticshowto.com/satterthwaite-approximation/

## Distribution of Sample Means for Binomial Random Variables

$X$ is a binomial random variable where $X \sim Binomial(n, p)$.

In other words, $X$ is the number of successes we get if we run an experiment with two outcomes (success and failure) $n$ times, where each trial succeeds with probability $p$.

The mean or expected value of $X$ is $\mu = np$ and the variance of $X$ is $\sigma^2 = np(1-p)$ according to the equations for binomial distributions.
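As a quick sanity check, the mean $np$ and variance $np(1-p)$ can be recomputed directly from the binomial probability mass function. The sketch below is only illustrative; the function names and the values $n = 20$, $p = 0.3$ are assumptions chosen for the example.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binomial_mean_and_variance(n, p):
    """Compute the mean and variance by summing over every possible outcome k."""
    mean = sum(k * binomial_pmf(k, n, p) for k in range(n + 1))
    variance = sum((k - mean) ** 2 * binomial_pmf(k, n, p) for k in range(n + 1))
    return mean, variance

# Hypothetical example values (not from the text above).
n, p = 20, 0.3
mean, variance = binomial_mean_and_variance(n, p)
print(f"mean = {mean:.4f}, n*p = {n * p:.4f}")
print(f"variance = {variance:.4f}, n*p*(1-p) = {n * p * (1 - p):.4f}")
```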

In order to compare two independent samples or two independent binomial distributions, we want to compare the proportion of success for each distribution.

The proportion of successes is therefore $\frac{X}{n}$, the number of successes divided by the number of trials (the sample size).

If we run the binomial experiment many times, the distribution of $\frac{X}{n}$ will be approximately normal per the central limit theorem, provided $n$ is large enough.

A normal distribution is defined by its mean and standard deviation.

The mean of this distribution would be:

$$ Mean\left(\frac{X}{n}\right) = \frac{Mean(X)}{n} = \frac{np}{n} = p $$
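The claim that the proportions $\frac{X}{n}$ are centered on $p$ can be checked with a small simulation. This is a minimal sketch using only the standard library; the function name, the fixed seed, and the values $n = 200$, $p = 0.3$ are assumptions made for illustration.

```python
import random
import statistics

def simulate_sample_proportions(n, p, repetitions, seed=0):
    """Run the binomial experiment `repetitions` times and return each X/n."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    proportions = []
    for _ in range(repetitions):
        # X is the number of successes among n independent trials.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions

# Hypothetical example values (not from the text above).
proportions = simulate_sample_proportions(n=200, p=0.3, repetitions=5000)
simulated_mean = statistics.mean(proportions)
print(f"mean of X/n over all runs: {simulated_mean:.4f} (theory: p = 0.3)")
```

Plotting a histogram of `proportions` should show the roughly bell-shaped curve that the central limit theorem predicts.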

The standard deviation can be found as follows:

According to one [rule for variances](https://en.wikipedia.org/wiki/Variance): "the variance of (a random variable multiplied by a constant) is equal to (the variance of the random variable) multiplied by the square of the constant".

$$ Var\left(\frac{X}{n}\right) = \frac{Var(X)}{n^2} $$

$$\frac{Var(X)}{n^2} = \frac{np(1-p)}{n^2} = \frac{p(1-p)}{n}$$

The [standard error](https://en.wikipedia.org/wiki/Standard_error) of the proportion is the standard deviation of this normal distribution, which would be the square root of the variance.

$$SE = \sqrt{Var\left(\frac{X}{n}\right)} = \frac{\sqrt{p(1-p)}}{\sqrt{n}} = \frac{s}{\sqrt{n}} $$

where $s = \sqrt{p(1-p)}$ is the standard deviation of a single trial.
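The standard error formula translates directly into code. The helper below is a minimal sketch; its name and the example values are assumptions, not part of any library.

```python
import math

def standard_error_of_proportion(p, n):
    """Standard error of the sample proportion X/n: sqrt(p * (1 - p) / n)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return math.sqrt(p * (1 - p) / n)

# Hypothetical example: p = 0.5 with 100 trials.
se = standard_error_of_proportion(p=0.5, n=100)
print(f"SE = {se:.4f}")
```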

## Pooled Standard Error

The pooled standard error of two independent samples can be computed with the formula used in the [Satterthwaite approximation](http://www.statisticshowto.com/satterthwaite-approximation/):

$$ SE_p = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} $$

In other words, the pooled standard error is the square root of the sum of squares (SRSS) of the individual standard errors of the proportion, where $s_i^2 = p_i(1-p_i)$ and $n_i$ is the size of sample $i$. This equation can be used when the variances and sample sizes of the two samples differ.
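The following sketch applies the pooled standard error formula above to two proportions, taking $s_i^2 = p_i(1-p_i)$ for each sample. The function name and the example rates and sample sizes are hypothetical.

```python
import math

def pooled_standard_error(p1, n1, p2, n2):
    """Combine the standard errors of two independent sample proportions."""
    variance_1 = p1 * (1 - p1) / n1  # squared standard error of sample 1
    variance_2 = p2 * (1 - p2) / n2  # squared standard error of sample 2
    return math.sqrt(variance_1 + variance_2)

# Hypothetical example: unequal proportions and unequal sample sizes.
se_p = pooled_standard_error(p1=0.10, n1=1000, p2=0.12, n2=1500)
print(f"SE_p = {se_p:.5f}")
```

Because the two squared standard errors are added before taking the square root, the result is always at least as large as either individual standard error.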

## In the Context of A/B Testing

When we run an A/B test, we set up a control group and a variant group. The control group will see or experience the current design while the variant group sees a new design.

The pooled standard error is used to construct the null hypothesis, which is represented as a normal distribution with mean $0$ and standard deviation $SE_p$, written $N(0,\, SE_p)$. We pool the standard error for the two groups because our null hypothesis states that there is no difference between the results from the two groups, i.e. that both proportions come from the same population. Under the null hypothesis, the observed difference between the groups is a draw from this normal distribution.

The alternative hypothesis is also a normal distribution with the same standard deviation, but with a mean of $\hat{d}$, the observed difference between the proportions of the two groups. The alternative hypothesis represents the distribution of the possible improvement over the control group.
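To make the two hypotheses concrete, the sketch below computes $\hat{d}$, the pooled standard error, and a two-sided p-value for $\hat{d}$ under the null distribution. It uses `statistics.NormalDist` from the Python standard library; the function name and the conversion counts are hypothetical, and the two-sided p-value is one common way to judge the result, not something prescribed by the text above.

```python
from math import sqrt
from statistics import NormalDist

def ab_test(successes_a, n_a, successes_b, n_b):
    """Compare control (a) and variant (b); assumes both rates are strictly between 0 and 1."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    d_hat = p_b - p_a  # observed difference between the two groups
    se_p = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    null = NormalDist(mu=0.0, sigma=se_p)  # null hypothesis: no difference
    z_score = d_hat / se_p
    # Probability, under the null, of a difference at least as extreme as d_hat.
    p_value = 2 * (1 - null.cdf(abs(d_hat)))
    return d_hat, se_p, z_score, p_value

# Hypothetical counts: 100/1000 conversions for control, 130/1000 for variant.
d_hat, se_p, z_score, p_value = ab_test(100, 1000, 130, 1000)
print(f"d_hat = {d_hat:.3f}, SE_p = {se_p:.4f}, z = {z_score:.2f}, p-value = {p_value:.4f}")
```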



As the sample size for each group increases, the standard error shrinks, so the normal distributions for the null and alternative hypotheses become tighter and the overlap between them decreases.
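The effect of sample size can be illustrated by computing the overlapping area of the two distributions for a few group sizes. This is a rough sketch that relies on `NormalDist.overlap` from the Python standard library; the function name, the conversion rates (10% and 12%), and the group sizes are assumptions for the example.

```python
from math import sqrt
from statistics import NormalDist

def hypothesis_overlap(p_control, p_variant, n_per_group):
    """Return the pooled standard error and the overlap of the null and alternative curves."""
    se_p = sqrt(
        p_control * (1 - p_control) / n_per_group
        + p_variant * (1 - p_variant) / n_per_group
    )
    null = NormalDist(mu=0.0, sigma=se_p)
    alternative = NormalDist(mu=p_variant - p_control, sigma=se_p)
    return se_p, null.overlap(alternative)  # overlap is an area between 0 and 1

# Hypothetical rates and group sizes.
results = [(n, *hypothesis_overlap(0.10, 0.12, n)) for n in (500, 2000, 8000)]
for n, se_p, overlap in results:
    print(f"n = {n:5d}  SE_p = {se_p:.5f}  overlap = {overlap:.3f}")
```

Quadrupling the sample size halves the standard error, because $SE_p$ scales with $1/\sqrt{n}$.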