# Confidence Intervals

This tutorial demonstrates how to create confidence intervals for estimators.

Suppose that two candidates, candidate A and candidate B, are running in an election, and that you want to determine the probability that a randomly chosen voter will vote for candidate A. Suppose that a total of 100,000 people, also known as the population, can vote in this election. You could try to count the number of people in the population who will vote for candidate A and then divide this count by 100,000, but this is not practical. Instead, you can select a small subset of the population, called a sample, count how many people in it will vote for candidate A, and divide that count by the sample size. Suppose that you select a sample of 1,000 people.

Let $X_i$ be $1$ if the $i^{th}$ person in the sample will vote for candidate A and $0$ if they will vote for candidate B, for $i = 1,2,\ldots,1000$. Since the sample is chosen at random, $X_1,X_2,\ldots,X_{1000}$ are random variables. More precisely, they are **Bernoulli** random variables. For example, after picking a random sample of 1,000 people and asking each of them who they will vote for, the observed voting choices could be:

$$\mathcal{D} = \{X_1 = 1,X_2 = 0,X_3 = 1,...,X_{1000} = 1\}$$
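As a minimal sketch of this encoding, the Python snippet below builds a hypothetical population of 100,000 voters coded as 1 (candidate A) or 0 (candidate B) and draws a random sample of 1,000 of them. The 80% share of candidate A supporters, the seed, and the helper names `build_population` and `draw_sample` are assumptions made only for illustration.

```python
import random

def build_population(size: int, supporters_of_a: int) -> list[int]:
    """Hypothetical population: 1 = votes for candidate A, 0 = votes for candidate B."""
    return [1] * supporters_of_a + [0] * (size - supporters_of_a)

def draw_sample(population: list[int], n: int, rng: random.Random) -> list[int]:
    """Pick n distinct people uniformly at random; each entry is one observed X_i."""
    return rng.sample(population, n)

rng = random.Random(0)                           # fixed seed so the sample is reproducible
population = build_population(100_000, 80_000)   # assumed 80% support for candidate A
sample = draw_sample(population, 1_000, rng)
print(sample[:10])                               # the first ten observed X_i values (0 or 1)
```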

A good estimate of the probability that a randomly chosen voter will vote for candidate A can then be computed by summing all the $X_i$'s for $i=1,2,\ldots,1000$ and dividing the sum by 1000. Mathematically, if $\hat{\Theta}=f(\mathcal{D},1000)$ is an estimator, or **statistic**, then:

$$\hat{\Theta} = \frac{\sum\limits_{i = 1}^{1000} X_i}{1000}$$

More generally, if $\mathcal{D}$ is a sample of size $n$, then:

$$\hat{\Theta} = \frac{\sum\limits_{i = 1}^{n} X_i}{n}$$
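The estimator for a general sample size $n$ translates directly into code. This is a sketch: the function name `theta_hat` is hypothetical, and the input is assumed to be a list of 0/1 voting choices like the ones in $\mathcal{D}$.

```python
def theta_hat(sample: list[int]) -> float:
    """Sample proportion: the sum of the X_i divided by the sample size n."""
    if not sample:
        raise ValueError("sample must be non-empty")
    return sum(sample) / len(sample)

# A small hand-written sample of n = 5 voting choices (hypothetical data)
print(theta_hat([1, 0, 1, 1, 1]))  # 0.8
```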

If 800 out of 1,000 people say that they will vote for candidate A, then $\hat{\Theta} = 0.8$. Note, however, that since $\hat{\Theta}$ is a function of random variables, it is also a random variable: if you select a different sample of 1,000 people from the population, you may observe a different set of voting choices and therefore a different value of $\hat{\Theta}$. Thus, $\hat{\Theta}$ has an associated probability distribution $p\left(\hat{\Theta}\right)$, called the **sampling distribution**. The sampling distribution depends on the sample size $n$, so that $p\left(\hat{\Theta}\right) = p\left(\hat{\Theta};n\right)$. Looking more closely at the definition of $\hat{\Theta}$:

$$\hat{\Theta} = \frac{\sum\limits_{i = 1}^{1000} X_i}{1000}$$
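The sampling distribution of $\hat{\Theta}$ can be explored by simulation: repeatedly draw a fresh sample, compute $\hat{\Theta}$ for each one, and look at how the estimates spread out. The sketch below assumes a hypothetical population in which 80% of the 100,000 voters support candidate A and compares the spread for $n = 100$ and $n = 1000$; the helper `sampling_distribution`, the seed, and the choice of 500 repetitions are illustrative assumptions.

```python
import random
import statistics

def sampling_distribution(population: list[int], n: int, num_samples: int,
                          rng: random.Random) -> list[float]:
    """Return theta_hat for num_samples independently drawn samples of size n."""
    estimates = []
    for _ in range(num_samples):
        sample = rng.sample(population, n)  # one fresh sample of n voters
        estimates.append(sum(sample) / n)   # theta_hat for this sample
    return estimates

rng = random.Random(42)                      # fixed seed for reproducibility
population = [1] * 80_000 + [0] * 20_000     # hypothetical: 80% support candidate A

means, spreads = {}, {}
for n in (100, 1_000):
    estimates = sampling_distribution(population, n, 500, rng)
    means[n] = statistics.mean(estimates)
    spreads[n] = statistics.stdev(estimates)
    print(f"n={n}: mean={means[n]:.3f}, std={spreads[n]:.4f}")
```

Under these assumptions, the estimates for $n = 1000$ should cluster more tightly around 0.8 than those for $n = 100$, which is one way to see that $p\left(\hat{\Theta};n\right)$ depends on $n$.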

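Since $\sum\limits_{i = 1}^{1000} X_i$ is a sum of i.i.d. Bernoulli random variables (strictly, this requires sampling with replacement; sampling without replacement from a population 100 times the size of the sample makes the $X_i$ only very slightly dependent), the random variable:

$$Y = \sum\limits_{i = 1}^{1000} X_i$$

is a binomial random variable with mean equal to: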

\begin{align}
E[Y] &= E\left[\sum\limits_{i = 1}^{1000} X_i\right] \\
&= \sum\limits_{i = 1}^{1000} E[X_i] \\
&= \sum\limits_{i = 1}^{1000} \mu = 1000\mu
\end{align}

Where:

$$E[X_1] = E[X_2] = ... = E[X_{1000}] = \mu$$
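To check $E[Y] = 1000\mu$ without relying on linearity of expectation, the mean can also be computed directly from the binomial probability mass function $P(Y = k) = \binom{n}{k}\mu^k(1-\mu)^{n-k}$. This sketch uses exact rational arithmetic (`fractions.Fraction`) so that the comparison is exact rather than approximate; the value $\mu = 4/5$ and the helper names are hypothetical choices.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k: int, n: int, mu: Fraction) -> Fraction:
    """P(Y = k) for Y ~ Binomial(n, mu), computed exactly."""
    return comb(n, k) * mu**k * (1 - mu) ** (n - k)

def binomial_mean(n: int, mu: Fraction) -> Fraction:
    """E[Y] = sum over k of k * P(Y = k), evaluated term by term."""
    return sum(k * binomial_pmf(k, n, mu) for k in range(n + 1))

mu = Fraction(4, 5)  # hypothetical value of mu, for illustration only
print(binomial_mean(1000, mu) == 1000 * mu)  # True
```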

Additionally, the variance of $Y$ is:

$$
Var(Y) = Var(X_1 + X_2 + ... + X_{1000})
$$

Since $X_1,X_2,\ldots,X_{1000}$ are independent, and therefore uncorrelated, the [variance of their sum is equal to the sum of their variances](https://en.wikipedia.org/wiki/Variance#Sum_of_uncorrelated_variables_(Bienaym%C3%A9_formula)):

\begin{equation}
Var(X_1 + X_2 + ... + X_{1000}) = Var(X_1) + Var(X_2) + ... + Var(X_{1000})
\end{equation}
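The variance-of-a-sum identity can be checked exactly for a handful of independent Bernoulli variables by enumerating every possible outcome and multiplying the marginal probabilities, which is exactly where independence enters. The sketch below assumes three variables with hypothetical success probabilities; the helper `variance_of_sum` is illustrative, and calling it on a single variable gives that variable's own variance.

```python
from fractions import Fraction
from itertools import product

def variance_of_sum(mus: list[Fraction]) -> Fraction:
    """Exact Var(X_1 + ... + X_m) for independent Bernoulli(mu_i), via all 2^m outcomes."""
    mean = Fraction(0)
    second_moment = Fraction(0)
    for outcome in product((0, 1), repeat=len(mus)):
        # Independence: the joint probability is the product of the marginals.
        prob = Fraction(1)
        for x, mu in zip(outcome, mus):
            prob *= mu if x == 1 else 1 - mu
        s = sum(outcome)
        mean += s * prob
        second_moment += s * s * prob
    return second_moment - mean**2

mus = [Fraction(1, 2), Fraction(4, 5), Fraction(1, 10)]  # hypothetical probabilities
lhs = variance_of_sum(mus)                          # variance of the sum
rhs = sum(variance_of_sum([mu]) for mu in mus)      # sum of the individual variances
print(lhs == rhs)  # True
```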

It therefore suffices to find the variance of a single $X_i$. For $X_1$:

\begin{align}
Var(X_1) &= E\left[(X_1 - \mu)^2\right] \\
&= E\left[X_1^2 - 2X_1\mu + \mu^2\right] \\
&= E\left[X_1^2\right] - 2 \mu E[X_1] + \mu^2
\end{align}

Using [LOTUS](https://en.wikipedia.org/wiki/Law_of_the_unconscious_statistician):

$$
E[X_1^2] = 0^2\cdot p(X_1 = 0) + 1^2\cdot p(X_1 = 1) = p(X_1 = 1) = \mu
$$

So:
$$
Var(X_1) = \mu - 2\mu^2 + \mu^2 = \mu - \mu^2 = \mu(1-\mu)
$$
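LOTUS states that $E[g(X)] = \sum_x g(x)\,p(X = x)$ for a discrete random variable, so both $E[X_1^2]$ and $E[(X_1 - \mu)^2]$ can be computed from the two-point pmf of $X_1$. A minimal sketch with exact fractions, assuming the hypothetical value $\mu = 4/5$ (the helper name `expect` is also illustrative):

```python
from fractions import Fraction

def expect(pmf, g):
    """LOTUS: E[g(X)] = sum over x of g(x) * p(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())

mu = Fraction(4, 5)                 # hypothetical value of mu
bernoulli = {0: 1 - mu, 1: mu}      # pmf of X_1

mean = expect(bernoulli, lambda x: x)                     # E[X_1]
second_moment = expect(bernoulli, lambda x: x**2)         # E[X_1^2]
variance = expect(bernoulli, lambda x: (x - mean) ** 2)   # E[(X_1 - mu)^2]

print(second_moment == mu, variance == mu * (1 - mu))  # True True
```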

Because the $X_i$ are identically distributed, $Var(X_i) = \mu(1-\mu)$ for every $i$. Since:

$$
Var(Y) = \sum_{i=1}^{1000} Var(X_i)
$$

Then:

$$
Var(Y) = 1000\mu(1-\mu)
$$
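A quick Monte Carlo check of $Var(Y) = 1000\mu(1-\mu)$: simulate many draws of $Y$ as a sum of 1,000 independent Bernoulli variables and compare the sample variance of the draws with the formula. The value $\mu = 0.8$, the seed, and the 2,000 repetitions are assumptions for illustration, and the helper `simulate_y` is hypothetical.

```python
import random
import statistics

def simulate_y(n: int, mu: float, reps: int, rng: random.Random) -> list[int]:
    """Draw reps values of Y = X_1 + ... + X_n with X_i ~ Bernoulli(mu), i.i.d."""
    return [sum(rng.random() < mu for _ in range(n)) for _ in range(reps)]

rng = random.Random(1)   # fixed seed for reproducibility
mu = 0.8                 # hypothetical value of mu
ys = simulate_y(1_000, mu, 2_000, rng)

print(statistics.variance(ys))   # sample variance of the simulated Y values
print(1_000 * mu * (1 - mu))     # formula: 1000 * mu * (1 - mu)
```

The two printed numbers should be close but not identical, because the simulated variance is itself an estimate.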

Since:

$$\hat{\Theta} = \frac{Y}{1000}$$

