In [None]:
%matplotlib inline
import numpy as np
import pandas as pd
import math
import matplotlib.pyplot as plt
import scipy.stats

### Hypothesis test for the sample proportion

In this notebook we demonstrate how to carry out a hypothesis test based on the sample proportion. If the conditions that make the sampling distribution of the proportion approximately normal hold, we can apply the same techniques we used in [this notebook](./02_Hypothesis_testing_based_on_the_normal_model.ipynb) for the population mean. The main change is in how the standard error is calculated.
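The test itself can be sketched as a one-sample z-test. This is a minimal sketch, assuming a two-sided alternative and using the common rule of thumb of at least 10 expected successes and 10 expected failures as the normality condition. Under the null hypothesis the standard error is computed from the hypothesized proportion `p0`. The function name `proportion_z_test` and the counts are hypothetical.

```python
import math
from scipy import stats

def proportion_z_test(successes, n, p0):
    """Two-sided one-sample z-test for a proportion (normal approximation)."""
    # Rule-of-thumb check (assumption): at least 10 expected successes and failures under H0
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("normal approximation may be poor for these n and p0")
    p_hat = successes / n
    # Under H0 the standard error uses the hypothesized proportion p0
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    # Two-sided p-value from the standard normal survival function
    p_value = 2 * stats.norm.sf(abs(z))
    return z, p_value

# Hypothetical data: 7120 successes in 10000 trials, testing H0: p = 0.7
z, p_value = proportion_z_test(7120, 10000, 0.7)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")
```

The only difference from the test for a mean is the standard error: it comes from `p0` rather than from a sample standard deviation.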

The sample proportion can be viewed as the mean of a sample of independent trials, each recorded as a success (1) or a failure (0), where the probability of success in every trial is the population proportion.
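A small illustration of this equivalence, using a hypothetical sample of ten trials: the mean of the 0/1 outcomes is exactly the fraction of successes.

```python
import numpy as np

# Hypothetical sample of 10 independent trials: 1 = success, 0 = failure
outcomes = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# The proportion of successes is simply the mean of the 0/1 outcomes
p_hat = outcomes.mean()
print(p_hat)  # 0.7 (7 successes out of 10)
```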

#### The sampling distribution of the sample proportion

If the sample is large enough, the sampling distribution of the sample proportion is approximately normal and centered on the population proportion. Its standard deviation, called the standard error, is:

$$SE = \sqrt{\frac{p(1-p)}{n}}$$

where p is the population proportion and n is the size of the sample.
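As a worked example of the formula, here is a minimal sketch with the values used in the simulation below (p = 0.7, n = 10000). The helper name `proportion_standard_error` is hypothetical. It also shows that the standard error scales as 1/sqrt(n), so quadrupling the sample size halves it.

```python
import math

def proportion_standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

# p = 0.7, n = 10000 gives sqrt(0.21 / 10000), about 0.00458
se = proportion_standard_error(0.7, 10000)
print(se)

# Quadrupling n halves the standard error, since SE scales as 1 / sqrt(n)
ratio = proportion_standard_error(0.7, 40000) / se
print(ratio)  # about 0.5
```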

In this example we assume that the population proportion is p = 0.7. We simulate drawing 10000 samples of size 10000 from that population and plot the resulting sampling distribution.

In [None]:
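# Population proportion and simulation settings
P = 0.7
NUMBER_SAMPLES = 10000
SIZE_SAMPLE = 10000

# Each sample is SIZE_SAMPLE Bernoulli(P) trials; its mean is the sample proportion
p_sample = []
for _ in range(NUMBER_SAMPLES):
    p_sample.append(np.mean(np.random.binomial(size=SIZE_SAMPLE, n=1, p=P)))

# Plot the relative-frequency histogram of the simulated sample proportions
fig, ax = plt.subplots()
mean = np.mean(p_sample)
std = np.std(p_sample)
weights = np.ones_like(p_sample) / float(len(p_sample))
ax.hist(p_sample, bins=50, weights=weights)
ax.set_xlabel('mean = ' + str(mean) + ', \nstd = ' + str(std))

# Theoretical standard error computed from the population proportion P
se = math.sqrt(P * (1 - P) / SIZE_SAMPLE)
print('Standard error from population proportion = ' + str(se))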
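The loop above generates 10000 × 10000 individual 0/1 values. A faster sketch, assuming NumPy's `Generator` API is available, draws each sample's success count directly: the number of successes in SIZE_SAMPLE Bernoulli(P) trials follows a Binomial(SIZE_SAMPLE, P) distribution, so the resulting sample proportions have the same distribution. The seed value is arbitrary and only makes the run reproducible.

```python
import math
import numpy as np

P = 0.7
NUMBER_SAMPLES = 10000
SIZE_SAMPLE = 10000

# Seeded generator so the run is reproducible (seed choice is arbitrary)
rng = np.random.default_rng(42)

# One binomial draw per sample gives its number of successes directly
counts = rng.binomial(n=SIZE_SAMPLE, p=P, size=NUMBER_SAMPLES)
p_sample = counts / SIZE_SAMPLE

# Compare the simulated spread with the theoretical standard error
se_theory = math.sqrt(P * (1 - P) / SIZE_SAMPLE)
print('simulated mean =', p_sample.mean())
print('simulated std  =', p_sample.std())
print('theoretical SE =', se_theory)
```

The simulated standard deviation should agree closely with the theoretical standard error, which is the point the histogram above is meant to illustrate.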