## Confidence intervals

Suppose you have three numbers: a, b and c. Then, "b is within c of a" means the same as "a is within c of b". Both of those statements are equivalent to "the absolute value of the difference between a and b is less than c".

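If the sampling distribution is approximately normal, $\hat{p}$ falls within $2\sigma_{\hat{p}}$ of $p$ about 95% of the time. If somebody is within $2\sigma$ of you, what is the probability that you are within $2\sigma$ of them? The same: ~95%.

The above also means that there is a ~95% probability that $p$ is within $2\sigma_{\hat{p}}$ of $\hat{p}$.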
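
Written out, the symmetry argument is a single rearrangement of the same inequality. This is a sketch that assumes the sampling distribution of $\hat{p}$ is close enough to normal for the "two standard deviations is about 95%" rule to hold:

$$
|\hat{p} - p| \le 2\sigma_{\hat{p}}
\iff
\hat{p} - 2\sigma_{\hat{p}} \le p \le \hat{p} + 2\sigma_{\hat{p}}
$$

$$
P\left(|\hat{p} - p| \le 2\sigma_{\hat{p}}\right) \approx 0.95
\implies
P\left(\hat{p} - 2\sigma_{\hat{p}} \le p \le \hat{p} + 2\sigma_{\hat{p}}\right) \approx 0.95
$$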

In [66]:
import numpy as np

n = 30
N = 1000
p = .3
polls = np.random.binomial(n, p, N)
p_hat = polls / n

# standard error
se = np.sqrt(p_hat*(1-p_hat)/n)

conf95_range = np.zeros((N, 2))
conf95_range[:,0] = p_hat - 2*se
conf95_range[:,1] = p_hat + 2*se

# check whether the true proportion p falls inside each poll's interval
p_in_range = (conf95_range[:,0] <= p) & (p <= conf95_range[:,1])
np.sum(p_in_range)/N

0.952

This helps with a problem like this:

> How confident can we be that a sample proportion $\hat{p} = .33$ (support for Donald Trump, from a sample of $n = 30$) represents the actual support in the whole population, if we are OK with being right 95% of the time and wrong 5% of the time?

We can compute the standard error of the sample statistic ($\hat{p}$), use it as the standard deviation of the sampling distribution, and check, for a given confidence level (number of standard deviations), where the actual population statistic would fall.

$$
\text{SE} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
= \sqrt{\frac{.33(1-.33)}{30}} = 0.086
$$

We can say that we are 95% confident that the actual support for Trump in the population lies within $2\,\text{SE}$ of $.33$, that is, between 0.158 and 0.502.
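
The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the helper name `approx_conf_interval` and its parameter `z` are illustrative and not part of the original notebook, and it uses the same "2 standard errors" approximation for 95% as the rest of this section.

```python
import math

def approx_conf_interval(p_hat, n, z=2.0):
    """Return (lower, upper) as p_hat -/+ z standard errors.

    Hypothetical helper for illustration; z=2.0 is the rough
    multiplier for ~95% confidence used in this section.
    """
    # standard error of a sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

lower, upper = approx_conf_interval(0.33, 30)
print(f"{lower:.3f} to {upper:.3f}")  # prints "0.158 to 0.502"
```

With only 30 respondents the interval is wide; since the standard error shrinks with $\sqrt{n}$, quadrupling the sample size would halve its width.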