
## Confidence intervals ##

**General idea**


The $p \%$ confidence interval $[a, b]$ for a parameter $y$ is an interval constructed so that it contains the true value of the parameter $p \%$ of the time. In math speak,

$$
\operatorname{Pr}(a \leqslant y \leqslant b)=p \%
$$

For better or worse, you will find that the $p=95 \%$ confidence interval is used the majority of the time.

With real data, if you have enough of it, you can calculate the confidence interval directly: compute the $2.5 \%$-tile and the $97.5 \%$-tile of the data, and those two values are the bounds of your $95 \%$ confidence interval.
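As a minimal sketch of this percentile approach, the Python snippet below draws a large synthetic sample and reads off the $2.5 \%$-tile and $97.5 \%$-tile using only the standard library. The synthetic data stand in for real measurements; the sample size, seed, and variable names are illustrative assumptions.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic stand-in for "real data": 100,000 draws from a standard normal.
data = [random.gauss(0.0, 1.0) for _ in range(100_000)]

# n=40 splits the data into 40 equal-probability bins, so the 39 cut points
# sit at 2.5%, 5%, ..., 97.5%. The first and last are the bounds we want.
cuts = statistics.quantiles(data, n=40)
lower, upper = cuts[0], cuts[-1]

# Fraction of the data that falls inside [lower, upper].
inside = sum(lower <= x <= upper for x in data) / len(data)

print(f"95% interval: [{lower:.2f}, {upper:.2f}], fraction inside: {inside:.3f}")
```

Because the synthetic data are standard normal, the two bounds should land close to the tabulated value of $\pm 1.96$, and roughly $95 \%$ of the points should fall between them.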

**Relationship with $\alpha$ and the t-score and z-score**

Recall from our discussion of the cumulative distribution function $F$ (the integral of the probability density function $f$) that

$$
\begin{aligned}
\operatorname{Pr}(a \leqslant x \leqslant b) &=\int_{a}^{b} f(x) d x \\
\operatorname{Pr}(a \leqslant x \leqslant b) &=F(b)-F(a)
\end{aligned}
$$

For a standard normal random variable $z$, we determined that

$$
\begin{aligned}
&\operatorname{Pr}(-1 \leqslant z \leqslant 1)=68.27 \% \\
&\operatorname{Pr}(-2 \leqslant z \leqslant 2)=95.45 \%
\end{aligned}
$$

These are confidence intervals for $z$. The first is the $68.27 \%$ confidence interval, and the second is the $95.45 \%$ interval.
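These two probabilities can be checked numerically from $\operatorname{Pr}(a \leqslant z \leqslant b)=F(b)-F(a)$. The sketch below uses `statistics.NormalDist` from the Python standard library for $F$; the helper name `prob_between` is a hypothetical name chosen for illustration.

```python
from statistics import NormalDist

F = NormalDist(mu=0.0, sigma=1.0).cdf  # standard normal CDF, F(z)

def prob_between(a, b):
    """Pr(a <= z <= b) = F(b) - F(a) for a standard normal z."""
    return F(b) - F(a)

p1 = prob_between(-1.0, 1.0)  # within one standard deviation
p2 = prob_between(-2.0, 2.0)  # within two standard deviations

print(f"{100 * p1:.2f}%  {100 * p2:.2f}%")
```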

To find the $95 \%$ confidence interval for $z$, or $\alpha=0.05$, we need to use our $F(z)$ table and find the upper bound $b$ where

$$
\operatorname{Pr}(z \leqslant b)=F(b)=1-\frac{\alpha}{2}=0.975
$$

Recall, we look for $0.975$ because we want the total area outside the interval to add to $5 \%$ ($\alpha=0.05$), and we want the interval to be symmetric, so $2.5 \%$ ($\alpha / 2$) in the upper tail and $2.5 \%$ ($\alpha / 2$) in the lower tail.

If you look at your table, you will find that $z_{\alpha / 2}=b=1.96$. Thus, the $95 \%$ confidence bounds on $z$ are

$$
\begin{aligned}
-z_{\alpha / 2} & \leqslant z \leqslant z_{\alpha / 2} \\
-z_{0.025} & \leqslant z \leqslant z_{0.025} \\
-1.96 & \leqslant z \leqslant 1.96
\end{aligned}
$$
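Instead of reading the table, the same upper bound can be obtained by inverting $F$. This is a small sketch using the standard library's `NormalDist.inv_cdf`; the function name `z_critical` is a hypothetical helper, not something defined in these notes.

```python
from statistics import NormalDist

def z_critical(alpha):
    """Upper bound b with F(b) = 1 - alpha/2 for a standard normal z."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

z95 = z_critical(0.05)  # two-sided 95% bound, alpha = 0.05

print(f"95% bounds on z: [{-z95:.2f}, {z95:.2f}]")
```

Calling `z_critical` with a different $\alpha$ (for example `0.10` for a $90 \%$ interval) gives the matching bound for other confidence levels.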

A useful way to interpret this confidence interval is that $95 \%$ of the time, $z$ falls between $-1.96$ and $1.96$, so the probability of measuring a $z$ outside of this range by random chance is $5 \%$. These are often considered small odds. Thus, a measured $z$ outside this range is often interpreted as interesting, potentially implying that something other than random chance is at work.

For the standard t-statistic $t$, the confidence intervals differ from those for $z$ when the number in the sample $\mathrm{N} \leqslant 30$. Also, recall that the t-statistic requires the degrees of freedom $v$ as well. However, these slight differences notwithstanding, calculating the confidence interval works exactly the same way as for the z-score. For example, say we have a random variable $t$ with $\mathrm{N}=12$ (degrees of freedom $v=11$). Using our table, we find that the $95 \%$ confidence interval ($\alpha=0.05$) for $t$ is

$$
\begin{aligned}
-t_{\alpha / 2} & \leqslant t \leqslant t_{\alpha / 2} \\
-t_{0.025} & \leqslant t \leqslant t_{0.025} \\
-2.20 & \leqslant t \leqslant 2.20
\end{aligned}
$$

Likewise, for the $90 \%$ confidence interval ($\alpha=0.10$) with $v=11$

$$
\begin{aligned}
&-t_{\alpha / 2} \leqslant t \leqslant t_{\alpha / 2} \\
&-t_{0.05} \leqslant t \leqslant t_{0.05} \\
&-1.80 \leqslant t \leqslant 1.80
\end{aligned}
$$
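The table values for $t$ can be reproduced numerically. The Python standard library has no t-distribution, so the sketch below integrates the Student t density with Simpson's rule and inverts the result by bisection. The function names, step count, and search bracket are assumptions made for illustration; in practice a statistics library (for example `scipy.stats.t.ppf`) would normally do this.

```python
import math

def t_pdf(t, nu):
    """Student t probability density with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

def t_cdf(t, nu, steps=1000):
    """F(t) for t >= 0: 0.5 (by symmetry) plus Simpson's rule on [0, t]."""
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, nu) + t_pdf(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3

def t_critical(alpha, nu):
    """Find b with F(b) = 1 - alpha/2 by bisection on [0, 50]."""
    target = 1.0 - alpha / 2.0
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

t95 = t_critical(0.05, 11)  # 95% interval, v = 11
t90 = t_critical(0.10, 11)  # 90% interval, v = 11

print(f"t_0.025 = {t95:.2f}, t_0.05 = {t90:.2f}")
```

Rounded to two decimals, these should agree with the table values used above for $v=11$.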

**Applying confidence intervals to non-standardized variables**

The above examples are a relatively straightforward application of the t-score and z-score because we are dealing with standardized data. However, what if your data are not standardized? You have two options: (1) you can standardize your data and then do all of your analysis using standard normal variables (as above), or (2) you can use a modified equation for the confidence interval that takes into consideration the real data's mean $\mu$ and standard deviation $\sigma$.

Recall that for the sample mean, we know that the standardized z-score is

$$
z=\frac{\bar{\chi}-\mu}{\frac{\sigma}{\sqrt{N}}}
$$

So, if you want to calculate the $95 \%$ confidence interval using the z-statistic for the true distribution mean $\mu$ given your measured sample mean $\bar{\chi}$ (from $\mathrm{N}$ samples), you would write the following

$$
\bar{\chi}-z_{0.025} \cdot \frac{\sigma}{\sqrt{N}} \leqslant \mu \leqslant \bar{\chi}+z_{0.025} \cdot \frac{\sigma}{\sqrt{N}}
$$

Note that while this equation looks messy, it is really just the z-distribution shifted over by $\bar{\chi}$ and scaled by $\sigma / \sqrt{\mathrm{N}}$. As before $z_{\alpha / 2}=1.96$.
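To make the shift-and-scale picture concrete, here is a minimal sketch of the z-based interval for the mean. The function name `mean_ci_z` and the input numbers (sample mean 10, $\sigma=2$, $\mathrm{N}=25$) are hypothetical values chosen only for illustration.

```python
import math
from statistics import NormalDist

def mean_ci_z(xbar, sigma, n, alpha=0.05):
    """Two-sided z-based confidence interval for the true mean.

    xbar  : measured sample mean
    sigma : standard deviation (assumed known)
    n     : number of samples
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)            # scale by sigma / sqrt(N)
    return xbar - margin, xbar + margin          # shift by the sample mean

# Hypothetical example values.
low, high = mean_ci_z(xbar=10.0, sigma=2.0, n=25)

print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

With these inputs the half-width is about $1.96 \times 2 / \sqrt{25} \approx 0.784$, so the interval is centered on the sample mean and shrinks as $\mathrm{N}$ grows.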

For the t-statistic, where $\sigma$ is estimated from the sample and $t_{0.025}$ depends on the degrees of freedom $v=\mathrm{N}-1$, the true distribution mean has the following $95 \%$ confidence interval

$$
\bar{\chi}-t_{0.025} \cdot \frac{\sigma}{\sqrt{\mathrm{N}-1}} \leqslant \mu \leqslant \bar{\chi}+t_{0.025} \cdot \frac{\sigma}{\sqrt{\mathrm{N}-1}}
$$

Finally, since the normal and t distributions are both symmetric, the above confidence intervals can be re-written more simply as

$$
\begin{aligned}
&\mu=\bar{\chi} \pm z_{0.025} \cdot \frac{\sigma}{\sqrt{\mathrm{N}}} \\
&\mu=\bar{\chi} \pm t_{0.025} \cdot \frac{\sigma}{\sqrt{\mathrm{N}-1}}
\end{aligned}
$$
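The t-based form can be sketched the same way. The sample values below are hypothetical, and the code assumes that $\sigma$ is the sample standard deviation computed with $\mathrm{N}$ in the denominator (`statistics.pstdev`); under that assumption, dividing by $\sqrt{\mathrm{N}-1}$ is equivalent to dividing the $\mathrm{N}-1$ standard deviation (`statistics.stdev`) by $\sqrt{\mathrm{N}}$. The critical value $2.20$ is the table value for $v=11$ quoted earlier.

```python
import math
import statistics

# Hypothetical sample of N = 12 measurements (illustrative values only).
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 4.9, 5.2, 4.7, 5.6, 5.1]
n = len(sample)
xbar = statistics.fmean(sample)

# Assumption: sigma is the standard deviation with N in the denominator.
sigma = statistics.pstdev(sample)

t_crit = 2.20  # t_{0.025} for v = N - 1 = 11, taken from the table
margin = t_crit * sigma / math.sqrt(n - 1)
low, high = xbar - margin, xbar + margin

# Equivalent form: the N-1 standard deviation divided by sqrt(N).
margin_alt = t_crit * statistics.stdev(sample) / math.sqrt(n)

print(f"mean = {xbar:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```

The two margins agree to floating-point precision, which is why the choice of denominator in $\sigma$ has to match the choice between $\sqrt{\mathrm{N}-1}$ and $\sqrt{\mathrm{N}}$.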