# Approximate Confidence Intervals

Recall the *Wald confidence interval* for the parameter $p$ in a Bernoulli distribution:

$$\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1 - \hat p)}{n}},$$

where $n$ is the sample size, $\hat p$ is the sample proportion (which is just the sample mean, i.e., the number of "successes" divided by the sample size), and $z_{\alpha/2}$ satisfies $P(Z > z_{\alpha/2})=\frac{\alpha}{2}$ for a standard normal random variable $Z$.

We are frequently interested in $1 - \alpha = 0.95$, i.e., $\alpha = 0.05$, so we'd like to find $z_{0.025}$:

In [3]:
z <- qnorm(0.025, lower.tail = FALSE)
z

Statistics textbooks usually round this to $z_{0.025} \approx 1.96$.
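As a quick sanity check, the upper-tail call above should agree with the 0.975 quantile and, by symmetry of the standard normal, with the negated 0.025 quantile. A minimal sketch; the variable names `z_upper`, `z_quantile`, and `z_negated` are illustrative and not part of the original notebook:

```r
# Three equivalent ways to get the critical value z_{0.025}
z_upper    <- qnorm(0.025, lower.tail = FALSE)  # P(Z > z)  = 0.025
z_quantile <- qnorm(0.975)                      # P(Z <= z) = 0.975
z_negated  <- -qnorm(0.025)                     # symmetry of the standard normal

all.equal(z_upper, z_quantile)   # TRUE
all.equal(z_upper, z_negated)    # TRUE
round(z_upper, 2)                # 1.96
```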

Now suppose we have $n=8000$ and $p=0.63$. Pretending we don't actually know the value of $p$, let's use a Wald confidence interval to estimate it. In particular, we'll draw 8000 samples from a $\operatorname{Bernoulli}(p)$ distribution using **rbinom**, use **mean** to compute the sample proportion, and then compute the bounds of our confidence interval:

In [7]:
pHat <- mean(rbinom(8000, 1, 0.63))
E <- z * sqrt(pHat * (1 - pHat)/8000)
pHat - E
pHat + E

So we're 95% confident that $0.6238 < p < 0.6449$. (Because the sample is random, rerunning the cell gives slightly different bounds.)
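The same computation can be wrapped in a small reusable function. This is only a sketch: `wald_ci` and its arguments `x` and `conf_level` are hypothetical names introduced here, and the function assumes `x` is a vector of 0s and 1s.

```r
# Hypothetical helper (not part of the original notebook): Wald interval
# for a proportion, given a 0/1 vector x and a confidence level.
wald_ci <- function(x, conf_level = 0.95) {
  n     <- length(x)
  p_hat <- mean(x)                               # sample proportion
  alpha <- 1 - conf_level
  z     <- qnorm(alpha / 2, lower.tail = FALSE)  # critical value z_{alpha/2}
  E     <- z * sqrt(p_hat * (1 - p_hat) / n)     # margin of error
  c(lower = p_hat - E, upper = p_hat + E)
}

x <- rbinom(8000, 1, 0.63)  # one simulated sample; results vary run to run
wald_ci(x)
```

With $n = 8000$ and $\hat p$ near $0.63$, the margin of error is roughly $0.0106$, which matches the width of the interval reported above.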

Now let's perform a Monte Carlo simulation. We'll repeat the procedure above 10000 times and compute the proportion of times that the resulting interval actually contains the population parameter $p = 0.63$:

In [9]:
mean(replicate(10000, {
  pHat <- mean(rbinom(8000, 1, 0.63))    # sample proportion for one simulated sample
  E <- z * sqrt(pHat * (1 - pHat)/8000)  # margin of error
  pHat - E < 0.63 && 0.63 < pHat + E     # TRUE if the interval covers p
}))

So 95.14% of the time, the confidence interval we computed contained $p = 0.63$, which is very close to the nominal 95%. Not bad!
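To judge how close the simulated coverage is to 95%, one option is to look at the Monte Carlo standard error of the estimated proportion. The sketch below assumes the 10000 replications are independent; `coverage`, `reps`, and `mc_se` are illustrative names, and `coverage` is simply the 95.14% reported above.

```r
# Monte Carlo standard error of an estimated coverage proportion,
# assuming independent replications.
coverage <- 0.9514   # coverage reported above
reps     <- 10000    # number of simulated intervals
mc_se    <- sqrt(coverage * (1 - coverage) / reps)

mc_se                        # about 0.002
(coverage - 0.95) / mc_se    # distance from the nominal level, in standard errors
```

The estimate sits less than one standard error from 0.95, so the simulation is consistent with the interval achieving its nominal coverage for this $n$ and $p$.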