# Confidence intervals

## 0. Intro

A **confidence interval** is an interval, computed from the observed data, that is constructed to contain the true value of an unknown population parameter with a specified long-run frequency. That frequency is the **confidence level**: the proportion of intervals, over repeated sampling, that contain the true value of the parameter.

**Interpretation**: (Taking the 90% confidence interval as an example.) "There is a 90% probability that the calculated confidence interval from some future experiment encompasses the true value of the population parameter."

## 1. For mean

### z-interval

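Use when the population variance $\sigma^2 = \mathbb{D} X$ is known; $z_{1-\frac{\alpha}{2}}$ is the $1-\frac{\alpha}{2}$ quantile of the standard normal distribution.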

$$\bar{X_n} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\mathbb{D} X}{n}}$$

$$\bar{X_n} \pm z_{1-\frac{\alpha}{2}} \frac{\sigma}{\sqrt{n}}$$
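The z-interval above can be computed with the standard library alone. This is a minimal sketch, assuming the data are a plain list of numbers; the function name `z_interval` and the sample values are illustrative, not part of the notes.

```python
from math import sqrt
from statistics import NormalDist, fmean


def z_interval(sample, sigma, alpha=0.05):
    """Two-sided (1 - alpha) z-interval for the mean, sigma assumed known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    half_width = z * sigma / sqrt(n)
    mean = fmean(sample)
    return mean - half_width, mean + half_width


sample = [4.9, 5.1, 5.0, 5.2, 4.8]  # made-up measurements
low, high = z_interval(sample, sigma=0.2)
```

The interval is centred on the sample mean, and its half-width shrinks as $1/\sqrt{n}$.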

### t-interval

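Use when the variance is unknown and is estimated by the sample variance $S^2$; $t_{1-\frac{\alpha}{2}}$ is the $1-\frac{\alpha}{2}$ quantile of Student's t-distribution with $n-1$ degrees of freedom.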

$$\bar{X_n} \pm t_{1-\frac{\alpha}{2}} \frac{S}{\sqrt{n}}$$
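A sketch of the t-interval, assuming $S^2$ is the usual sample variance with an $n-1$ denominator. The standard library has no t-distribution, so the critical value is passed in as an argument; here it is taken from a t-table (with SciPy installed, `scipy.stats.t.ppf(1 - alpha / 2, df=n - 1)` gives the same quantile). The names `t_interval` and `T_975_DF4` are illustrative.

```python
from math import sqrt
from statistics import fmean, stdev


def t_interval(sample, t_crit):
    """Two-sided t-interval for the mean; t_crit is t_{1 - alpha/2} with n-1 df."""
    n = len(sample)
    mean = fmean(sample)
    s = stdev(sample)  # sample standard deviation (n - 1 denominator)
    half_width = t_crit * s / sqrt(n)
    return mean - half_width, mean + half_width


sample = [4.9, 5.1, 5.0, 5.2, 4.8]  # made-up measurements
T_975_DF4 = 2.776  # tabulated 0.975 quantile of Student's t with 4 df
low, high = t_interval(sample, T_975_DF4)
```

For small $n$ the t quantile is noticeably larger than the normal one (2.776 versus about 1.96 here), so the interval is wider than a z-interval with the same spread.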

## 2. For a proportion

If the confidence intervals for the proportions of two groups do not overlap, the groups differ significantly.

### normal

$$\hat{p} \pm z_{1-\frac{\alpha}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
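The normal-approximation interval translates directly into code. A minimal sketch, assuming the data are summarised as a success count out of `n` trials; `normal_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def normal_interval(successes, n, alpha=0.05):
    """Normal-approximation interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


low, high = normal_interval(40, 100)  # 40 successes in 100 trials (made up)
```

Nothing here clips the result to $[0, 1]$, so for small counts the bounds can fall outside that range.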

### Wilson

Preferred when $\hat{p}$ is close to $0$ or $1$, where the normal interval is unreliable.

$$
\frac{1}{1 + \frac{z^2}{n}}
\left( 
\hat{p} + \frac{z^2}{2n} \pm z \sqrt{\frac{\hat{p}(1-\hat{p})}{n} + \frac{z^2}{4n^2}} 
\right), \ z \equiv z_{1 - \frac{\alpha}{2}}
$$
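The Wilson formula is easier to check term by term in code. A sketch under the same assumptions as before (a success count out of `n` trials); `wilson_interval` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, alpha=0.05):
    """Wilson score interval for a proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    scale = 1 / (1 + z**2 / n)
    centre = p_hat + z**2 / (2 * n)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return scale * (centre - half_width), scale * (centre + half_width)


low, high = wilson_interval(1, 20)  # 1 success in 20 trials (made up)
```

For 1 success in 20 trials this gives roughly $(0.009, 0.236)$, which stays inside $[0, 1]$; the normal interval for the same data would have a negative lower bound.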

## 3. For the difference of two proportions

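Use when the confidence intervals for the proportions of the two groups overlap: overlap alone does not show that the groups are the same, so build an interval for the difference itself and check whether it contains $0$.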

### independent samples

$$\hat{p}_1 - \hat{p}_2 \pm z_{1-\frac{\alpha}{2}} 
\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
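A sketch of the interval for independent samples, assuming each group is summarised as a success count and a sample size; `diff_independent` and the counts are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def diff_independent(x1, n1, x2, n2, alpha=0.05):
    """Interval for p1 - p2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


low, high = diff_independent(60, 100, 45, 100)  # made-up counts
```

With these counts the individual 95% normal intervals, about $(0.50, 0.70)$ and $(0.35, 0.55)$, overlap, yet the interval for the difference is roughly $(0.013, 0.287)$ and excludes $0$.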

### dependent samples

E.g. both measurements are taken on the same people. The table counts the paired outcomes:

| $X_1$ \ $X_2$ | 1 | 0 | $\Sigma$ |
| --- | --- | --- | --- |
| 1 | e | f | e+f |
| 0 | g | h | g+h |
| $\Sigma$ | e+g | f+h | n |

$$\frac{f-g}{n} \pm z_{1-\frac{\alpha}{2}} 
\sqrt{\frac{f+g}{n^2} - \frac{(f-g)^2}{n^3}}$$
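Only the discordant counts $f$ and $g$ and the total $n$ enter this formula, which the sketch below makes explicit. The function name `diff_dependent` and the counts are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def diff_dependent(f, g, n, alpha=0.05):
    """Interval for the difference of proportions in paired samples.

    f: pairs with X1 = 1 and X2 = 0; g: pairs with X1 = 0 and X2 = 1;
    n: total number of pairs.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{1 - alpha/2}
    diff = (f - g) / n
    se = sqrt((f + g) / n**2 - (f - g) ** 2 / n**3)
    return diff - z * se, diff + z * se


low, high = diff_dependent(f=15, g=5, n=100)  # made-up counts
```

The concordant counts $e$ and $h$ affect the result only through $n$.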

## 4. For an arbitrary statistic

Bootstrap the statistic $T$ using the sample $X^n$.

- Repeatedly draw resamples $X^k$ from $X^n$ with replacement.
- For each resample, calculate $T(X^k)$; together these values give the empirical distribution $F_{T(X^k)}(x)$.
- Use this distribution to obtain the confidence interval.
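The three steps can be sketched as follows. Two choices here are assumptions, not part of the notes: resamples have the same size as the original sample ($k = n$), and the interval is read off as the $\frac{\alpha}{2}$ and $1-\frac{\alpha}{2}$ quantiles of the bootstrap values (the percentile method, one of several ways to use the distribution). The name `bootstrap_interval` and the data are illustrative.

```python
import random
from statistics import median


def bootstrap_interval(sample, statistic, alpha=0.05, n_resamples=2000, seed=0):
    """Percentile bootstrap interval for an arbitrary statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    # Steps 1-2: resample with replacement (k = n) and evaluate the statistic.
    values = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Step 3: take empirical quantiles of the bootstrap distribution.
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = round((1 - alpha / 2) * n_resamples) - 1
    return values[lo_idx], values[hi_idx]


sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.6]  # made-up data
low, high = bootstrap_interval(sample, median)
```

Any function of a list can be passed as `statistic`, which is what makes the approach suitable for statistics without a closed-form interval.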