# Confidence intervals

## Overview

We already know that when estimating a parameter $\theta$ of a population using an estimator $\hat{\theta}$, we are subject to errors. In fact, we have also derived some error estimates, which we called standard errors. The question we now want to address is how much we can trust $\hat{\theta}$. Specifically, we are interested in the following questions:

- How much can we trust the reported estimate?
- How far can it be from the actual parameter of interest?
- What is the probability that it will be reasonably close?
- If we observe an estimate $\hat{\theta}$, what values can the actual parameter $\theta$ plausibly take?

We can address the questions above by constructing <a href="https://en.wikipedia.org/wiki/Confidence_interval">_confidence intervals_</a> (CI). Thus, in this section, we will review the following:

- What a CI is
- Constructing a CI for the population mean
- Constructing a CI for the difference between means
- Constructing a CI for proportions
- Constructing a CI for the difference between proportions
- Sample size selection

## Confidence intervals

To make the notion of a confidence interval precise, we use the following definition [1, 2, 4].

----

**Definition.**

An interval $[a,b]$ is a $(1-\alpha)$ confidence interval for the parameter $\theta$ if it contains the parameter with probability at least $(1-\alpha)$, i.e.

$$P(a \leq \theta \leq b) \geq 1 - \alpha$$

We usually refer to $(1-\alpha)$ as the **coverage probability** or as the **confidence level**.

Notice that both $a$ and $b$ are random variables, since they depend on the data.

----

Since the population parameter of interest is a constant rather than a random variable, the randomness lies entirely in the interval $[a,b]$. The coverage probability therefore refers to the chance that our random interval covers the constant parameter $\theta$; it is not a probability statement about $\theta$ itself.

----

**Remark.**

When interpreting confidence intervals, it is more accurate to say that there is a $(1-\alpha)$ probability that the interval computed from a sample is one of the intervals that actually capture the population parameter [2]. In other words, $(1-\alpha)$ describes the long-run proportion of intervals, constructed this way over repeated samples, that contain $\theta$.


----
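The long-run interpretation can be checked with a small Monte Carlo experiment. The sketch below rests on assumptions of our own: the data are drawn from $N(\mu, \sigma^2)$ with known $\sigma$, the 95% critical value $z_{0.025} \approx 1.96$ is hard-coded, and `CoverageSimulation` is a hypothetical name. It builds one interval per simulated sample and reports the fraction of those intervals that contain the fixed $\mu$.

```scala
import scala.util.Random

// Monte Carlo sketch: repeatedly sample from N(mu, sigma^2) with known sigma,
// build a 95% z-interval from each sample, and count how often it covers mu.
object CoverageSimulation {

  // Upper 0.025 critical value of N(0, 1), hard-coded for this sketch
  val zCrit: Double = 1.959964

  def coverage(mu: Double, sigma: Double, n: Int, reps: Int, seed: Long): Double = {
    val rng = new Random(seed)
    var hits = 0
    for (_ <- 1 to reps) {
      // One sample of size n from N(mu, sigma^2)
      val sample = Array.fill(n)(mu + sigma * rng.nextGaussian())
      val xbar = sample.sum / n
      val margin = zCrit * sigma / math.sqrt(n.toDouble)
      // Does this particular interval contain the fixed parameter mu?
      if (xbar - margin <= mu && mu <= xbar + margin) hits += 1
    }
    hits.toDouble / reps
  }
}
```

For example, `CoverageSimulation.coverage(10.0, 2.0, 25, 10000, 42L)` should return a value close to 0.95, while each individual interval either contains $\mu$ or does not.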

----

**Remark: Credibility set**

We will see that when adopting a Bayesian framework, we no longer have to explain the confidence level $(1-\alpha)$ in terms of a long run of samples. Instead, we can construct an interval that has posterior probability $(1-\alpha)$ and state that the parameter $\theta$ belongs to this set with that probability. This interval or, more abstractly, this set $C$ is called a **credible set** [1].

----

## Construction of confidence intervals


Now that we know how to interpret confidence intervals, let's turn our attention to how to construct one. In particular, we are interested in how to construct a CI that satisfies

$$P(a \leq \theta \leq b) = 1 - \alpha$$

We will look into various cases, starting with the general one.

**Normally distributed data**

In this section, we show the general mechanics of constructing a confidence interval; we will specialize it further below. Assume that there is an unbiased estimator $\hat{\theta}$ that is normally distributed. We can standardize it and get a standard normal variable

$$ z = \frac{\hat{\theta} - E\left[\hat{\theta}\right]}{se(\hat{\theta})} \sim N(0, 1)$$

where $se(\hat{\theta})$ is the standard error associated with the estimator and, since $\hat{\theta}$ is unbiased, $E\left[\hat{\theta}\right] = \theta$. Furthermore, if $z_{\alpha/2}$ denotes the critical value that cuts off area $\alpha/2$ in the right tail of $N(0,1)$, i.e. $P(z > z_{\alpha/2}) = \alpha/2$, then by symmetry

$$P(-z_{\alpha/2} \leq z \leq z_{\alpha/2}) = 1 - \alpha$$

By substituting the representation for $z$ in the equation above and solving the inequality for $\theta$, we obtain the bounds of the interval:

$$a = \hat{\theta} - z_{\alpha/2}se(\hat{\theta}), ~~ b = \hat{\theta} + z_{\alpha/2}se(\hat{\theta})$$
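Spelling out the algebra, using $E\left[\hat{\theta}\right] = \theta$ and $se(\hat{\theta}) > 0$: multiply each part of the inequality by $se(\hat{\theta})$, subtract $\hat{\theta}$, and multiply by $-1$, which reverses the inequalities:

$$1-\alpha = P\left(-z_{\alpha/2} \leq \frac{\hat{\theta} - \theta}{se(\hat{\theta})} \leq z_{\alpha/2}\right) = P\left(\hat{\theta} - z_{\alpha/2}se(\hat{\theta}) \leq \theta \leq \hat{\theta} + z_{\alpha/2}se(\hat{\theta})\right)$$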

Thus, we arrive at the following conclusion.


----

**CI for normal distribution.**

If $\hat{\theta}$ is an unbiased estimator for $\theta$ and $\hat{\theta}$ is normally distributed, then

$$\left[\hat{\theta} - z_{\alpha/2}se(\hat{\theta}), ~~ \hat{\theta} + z_{\alpha/2}se(\hat{\theta})\right]$$

is a $(1-\alpha)100\%$ C.I. for $\theta$.

If the distribution of $\hat{\theta}$ is only approximately normal, we get an approximate $(1-\alpha)100\%$ CI.

----
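The recipe in the box can be sketched in plain Scala without external libraries. This is a minimal sketch under stated assumptions: `NormalCI` and its methods are hypothetical names, and $z_{\alpha/2}$ is found by bisection on an approximation of the standard normal CDF built from the Abramowitz and Stegun 7.1.26 approximation to $\text{erf}$ (absolute error below about $1.5 \times 10^{-7}$). In practice a statistics library would normally supply the normal quantile directly.

```scala
// Sketch: (1 - alpha) CI of the form thetaHat +/- z_{alpha/2} * se(thetaHat),
// with z_{alpha/2} computed numerically using only the standard library.
object NormalCI {

  // Abramowitz & Stegun 7.1.26 approximation to erf (|error| < 1.5e-7)
  def erf(x: Double): Double = {
    val sign = if (x < 0) -1.0 else 1.0
    val ax = math.abs(x)
    val t = 1.0 / (1.0 + 0.3275911 * ax)
    val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))))
    sign * (1.0 - poly * math.exp(-ax * ax))
  }

  // Standard normal CDF: Phi(z) = (1 + erf(z / sqrt(2))) / 2
  def phi(z: Double): Double = 0.5 * (1.0 + erf(z / math.sqrt(2.0)))

  // Upper critical value z with P(Z > z) = alpha / 2, by bisection on [0, 10]
  def zCritical(alpha: Double): Double = {
    val target = 1.0 - alpha / 2.0
    var lo = 0.0
    var hi = 10.0
    for (_ <- 1 to 100) {
      val mid = 0.5 * (lo + hi)
      if (phi(mid) < target) lo = mid else hi = mid
    }
    0.5 * (lo + hi)
  }

  // Returns the interval as (lower, upper)
  def interval(thetaHat: Double, se: Double, alpha: Double): (Double, Double) = {
    val margin = zCritical(alpha) * se
    (thetaHat - margin, thetaHat + margin)
  }
}
```

For instance, `NormalCI.zCritical(0.05)` is approximately 1.96, so `NormalCI.interval(10.0, 2.0, 0.05)` is roughly $(6.08, 13.92)$.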

In this formula, $\hat{\theta}$ is the center of the interval and $z_{\alpha/2}se(\hat{\theta})$ is the margin of error. The margin of error is often reported along with poll and survey results; in newspapers and press releases, it is usually computed for a 95\% confidence interval. We will see below that by constraining the margin not to exceed a certain threshold, we can derive the sample size required to satisfy this threshold.

We now turn our attention to constructing confidence intervals for some frequently occurring cases.
In general, we distinguish between the following scenarios:

- Large sample and known variance
- Large sample and unknown variance
- Small sample and unknown variance

When the variance is unknown, we need to estimate it from the sample. However, when the sample is small, the sample standard deviation $s$ is not an accurate estimate of the population standard deviation $\sigma$, and we need to adjust the confidence interval accordingly.

###  CI for population mean

One of the most common scenarios is to construct a C.I. for the population mean $\mu$. Let's assume that the variance $\sigma^2$ is known and that the data are normal or the sample size is large. Then, the CI is given by the formula

$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} $$

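The following Scala code, which uses the Breeze library, shows how to calculate the C.I. It assumes a simple `Interval` type holding the lower bound `L` and the upper bound `U`.

```scala
import breeze.linalg.DenseVector
import breeze.stats.distributions.Gaussian

// Simple container for the interval bounds [L, U]
case class Interval(L: Double, U: Double)

class CINormalDistBuilder(val alpha: Double, val sigma: Double) {

  def build(sample: DenseVector[Double]): Interval = {

    val mu = breeze.stats.mean(sample)
    val n: Int = sample.size

    // z_{alpha/2} is the upper alpha/2 quantile of the *standard* normal N(0, 1).
    // Note: recent Breeze versions need an implicit RandBasis in scope
    // to construct a Gaussian.
    val standardNormal = Gaussian(mu = 0.0, sigma = 1.0)
    val zAlpha2 = standardNormal.inverseCdf(1.0 - 0.5 * alpha)

    // Margin of error z_{alpha/2} * sigma / sqrt(n)
    val margin = zAlpha2 * (sigma / math.sqrt(n))
    Interval(L = mu - margin, U = mu + margin)
  }

}
```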

----

**Remark.**

If a sample comes from any distribution, but the sample size $n$ is large, then $\bar{x}$ has an approximately normal distribution according to the CLT.


----

**Unknown variance**

The calculation above is rather straightforward. However, what happens when $\sigma$ is not known? In this case we need to estimate it from the sample. If the sample size is large, we can use the sample standard deviation

$$\hat{s} = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Thus, the formula becomes

$$\bar{x} \pm z_{\alpha/2}\frac{\hat{s}}{\sqrt{n}} $$

However, when the sample size is small, the sample standard deviation $\hat{s}$ is not an accurate estimator of the population standard deviation $\sigma$, and we need to adjust the C.I. Assuming the data are normally distributed, we do so by using the <a href="https://en.wikipedia.org/wiki/Student%27s_t-distribution">Student's t-distribution</a> with $n-1$ degrees of freedom and arrive at the following interval

$$\bar{x} \pm t_{\alpha/2}\frac{\hat{s}}{\sqrt{n}} $$
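Both intervals share the same structure and differ only in the critical value. The sketch below assumes the caller supplies that value: $z_{\alpha/2}$ for large samples, or $t_{\alpha/2}$ with $n-1$ degrees of freedom (from a $t$ table or a statistics library) for small normal samples. `MeanCIUnknownSigma` is a hypothetical name.

```scala
// Sketch: CI for a population mean when sigma is unknown.
// `critical` is z_{alpha/2} (large n) or t_{alpha/2} with n - 1 df (small n);
// it is supplied by the caller, not computed here.
object MeanCIUnknownSigma {

  // Sample standard deviation with the n - 1 denominator
  def sampleSd(xs: Seq[Double]): Double = {
    val n = xs.size
    val mean = xs.sum / n
    math.sqrt(xs.map(x => (x - mean) * (x - mean)).sum / (n - 1))
  }

  // Returns (lower, upper) for xbar +/- critical * s / sqrt(n)
  def interval(xs: Seq[Double], critical: Double): (Double, Double) = {
    val n = xs.size
    val mean = xs.sum / n
    val margin = critical * sampleSd(xs) / math.sqrt(n.toDouble)
    (mean - margin, mean + margin)
  }
}
```

For a sample of 8 observations, a 95% interval would use $t_{0.025} = 2.365$ with 7 degrees of freedom, e.g. `MeanCIUnknownSigma.interval(xs, 2.365)`.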

----

**Remark.**

The density function of the $t$-distribution is similar to that of the standard normal distribution. However, its peak is lower and its tails are thicker. Thus, a larger $t_{\alpha}$ is needed to cut area $\alpha$ from the right tail, meaning

$$t_{\alpha} > z_{\alpha}$$

for small $\alpha$. Consequently, the confidence interval that uses $t_{\alpha/2}$ is wider than the interval that uses $z_{\alpha/2}$ when $\sigma$ is known. This is exactly the price we pay for not knowing the standard deviation $\sigma$: when we lack this piece of information, our interval estimate is less precise. As the sample size grows, the $t$-distribution approaches the standard normal and the difference vanishes.


----
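As a worked illustration, take the standard table values $z_{0.025} = 1.960$ and $t_{0.025} = 2.262$ with 9 degrees of freedom ($n = 10$) or $t_{0.025} = 2.045$ with 29 degrees of freedom ($n = 30$). For the same value of the standard deviation, the ratio of the 95% margins is

$$\frac{t_{0.025,\,9}}{z_{0.025}} = \frac{2.262}{1.960} \approx 1.154, \qquad \frac{t_{0.025,\,29}}{z_{0.025}} = \frac{2.045}{1.960} \approx 1.043$$

so with $n = 10$ the $t$-interval is roughly 15% wider than the $z$-interval, while with $n = 30$ the difference shrinks to about 4%.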

### <a name="subsec2"></a> Selection of sample size

Very often we are interested in how large the sample size should be in order to achieve a desired precision of our estimator. Each of the confidence intervals above has the form

$$\text{center} ~~ \pm ~~ \text{margin}$$

where $\text{center}=\hat{\theta}$ and $\text{margin}=z_{\alpha/2}\sigma(\hat{\theta})$. 

What we now ask is: what sample size $n$ guarantees that the margin of a $(1-\alpha)100\%$ confidence interval does not exceed a specified limit $\Delta$? To answer this question, we solve the inequality

$$\text{margin} \leq \Delta$$

for $n$.

Typically, parameters are estimated more accurately from larger samples, so the standard error $\sigma(\hat{\theta})$ and the margin are decreasing functions of the sample size $n$.
Then, the inequality above is satisfied for all sufficiently large $n$. For the population mean with known $\sigma$, the margin is $z_{\alpha/2}\sigma/\sqrt{n}$, and solving for $n$ gives the following result.

----

**Sample size for a given precision**


In order to attain a margin of error $\Delta$ when estimating a population mean with known standard deviation $\sigma$ at confidence level $(1-\alpha)$, a sample of size

$$n \geq \left(\frac{z_{\alpha/2}\sigma}{\Delta}\right)^2$$

is required. Since $n$ must be an integer, we round this bound up to the nearest integer; rounding down would make the margin exceed $\Delta$.


----
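The bound translates directly into code. A minimal sketch, assuming $z_{\alpha/2}$ is supplied by the caller (e.g. $1.96$ for 95%); `SampleSize.forMean` is a hypothetical helper:

```scala
// Sketch: smallest integer n with z * sigma / sqrt(n) <= delta,
// i.e. n = ceil((z * sigma / delta)^2), assuming sigma is known.
object SampleSize {
  def forMean(z: Double, sigma: Double, delta: Double): Int = {
    val nReal = math.pow(z * sigma / delta, 2)
    // Always round up; rounding down would let the margin exceed delta
    math.ceil(nReal).toInt
  }
}
```

For example, with $\sigma = 10$, $\Delta = 2$ and 95% confidence, $(1.96 \cdot 10 / 2)^2 = 96.04$, so `SampleSize.forMean(1.96, 10.0, 2.0)` returns 97.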

### <a name="subsec3"></a> C.I. for the difference between means

Let's now consider the scenario where we want to compare two populations, e.g. a comparison of two
materials, two suppliers, two service providers, two communication channels, two labs, etc. We collect a sample from each population, and we assume that the samples are collected independently of each other.

Again we assume that 

- Normal distribution of data or
- Sufficiently large sample size

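Let $n$ and $m$ be the sizes of the samples from the populations $X$ and $Y$, respectively. An unbiased estimator for the difference between the two means $\mu_X - \mu_Y$ is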

$$\hat{\theta} = \bar{x} - \bar{y}$$

The standard error associated with $\hat{\theta}$ is given by

$$\sigma(\hat{\theta}) = \sqrt{\frac{\sigma_{X}^2}{n} +\frac{\sigma_{Y}^2}{m}}$$

Thus we arrive at the following formula

$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sigma(\hat{\theta})$$

----

**C.I. for the difference between means with known standard deviations**


$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sigma(\hat{\theta})$$

where

$$\sigma(\hat{\theta}) = \sqrt{\frac{\sigma_{X}^2}{n} +\frac{\sigma_{Y}^2}{m}}$$

----

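The following Scala code, which uses the Breeze library, shows how to calculate the C.I. As before, it assumes a simple `Interval` type holding the bounds `L` and `U`.

```scala
import breeze.linalg.DenseVector
import breeze.stats.distributions.Gaussian

// Simple container for the interval bounds [L, U]
case class Interval(L: Double, U: Double)

class CINormalDistTwoMeansBuilder(val alpha: Double, val sigma1: Double, val sigma2: Double) {

  def build(sample1: DenseVector[Double], sample2: DenseVector[Double]): Interval = {

    val mu1 = breeze.stats.mean(sample1)
    val mu2 = breeze.stats.mean(sample2)

    // Standard error of xbar - ybar with known population standard deviations
    val se = math.sqrt((sigma1 * sigma1) / sample1.size + (sigma2 * sigma2) / sample2.size)

    // Critical value from the standard normal N(0, 1), not from N(mu1 - mu2, se).
    // Recent Breeze versions need an implicit RandBasis in scope here.
    val zAlpha2 = Gaussian(mu = 0.0, sigma = 1.0).inverseCdf(1.0 - 0.5 * alpha)

    val margin = zAlpha2 * se
    Interval(L = mu1 - mu2 - margin, U = mu1 - mu2 + margin)
  }

}
```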

**Unknown standard deviation** 

Knowing the standard deviations of the two populations is an exception rather than the rule. When the standard deviations are unknown, we replace them with estimates. When the sample sizes are large, we can compute the sample standard deviations $s_X$ and $s_Y$ and use them in our formula. Hence we have the following C.I.

----

**C.I. for the difference between means with unknown standard deviations (large samples)**


$$\bar{x} - \bar{y} \pm z_{\alpha/2}\sigma(\hat{\theta})$$

where

$$\sigma(\hat{\theta}) = \sqrt{\frac{s_{X}^2}{n} +\frac{s_{Y}^2}{m}}$$

----

When the sample sizes are small, and in contrast to the single population mean case, we need to distinguish between two important cases:

- If the two variances are equal, there exists an exact and simple solution based on the $t$-distribution.

- If the variances are unequal, we face the famous Behrens-Fisher problem, for which no exact solution exists and only approximations are available.

**Case 1: Equal variances**

Suppose there are reasons to assume that the two populations have equal, but unknown, variances. For example, two sets of data are collected with the same measurement device. Thus, measurements have different means but the same precision. In this case, 

$$\sigma^{2}_{X} = \sigma^{2}_{Y} = \sigma^{2}$$

In this case, there is only one variance $\sigma^2$ to estimate instead of two. We should use both
samples $X$ and $Y$ to estimate their common variance. This estimator of $\sigma^2$  is called a
**pooled sample variance**, and it is computed as

$$s^{2}_p = \frac{(n-1)s^{2}_X + (m-1)s^{2}_Y}{n + m -2}$$

Thus, we can construct the following C.I.

----

**Confidence interval for the difference of means; equal, unknown
standard deviations**


$$\bar{x} - \bar{y} \pm t_{\alpha/2}s_p \sqrt{\frac{1}{n} + \frac{1}{m}}$$


where $s_p$ is the pooled standard deviation, i.e. the square root of the pooled variance, and $t_{\alpha/2}$ is a critical value from the $t$-distribution with $(n + m - 2)$ degrees of freedom.


----
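The pooled interval can be sketched as follows, assuming the caller supplies $t_{\alpha/2}$ with $n + m - 2$ degrees of freedom from a table or a statistics library. `PooledCI` and its methods are hypothetical names.

```scala
// Sketch: CI for mu_X - mu_Y assuming equal but unknown variances.
object PooledCI {

  // Sample mean and sample variance (n - 1 denominator)
  private def meanAndVar(xs: Seq[Double]): (Double, Double) = {
    val n = xs.size
    val m = xs.sum / n
    (m, xs.map(x => (x - m) * (x - m)).sum / (n - 1))
  }

  // s_p^2 = ((n - 1) s_X^2 + (m - 1) s_Y^2) / (n + m - 2)
  def pooledVariance(xs: Seq[Double], ys: Seq[Double]): Double = {
    val (_, vx) = meanAndVar(xs)
    val (_, vy) = meanAndVar(ys)
    ((xs.size - 1) * vx + (ys.size - 1) * vy) / (xs.size + ys.size - 2)
  }

  // t: critical value t_{alpha/2} with n + m - 2 degrees of freedom
  def interval(xs: Seq[Double], ys: Seq[Double], t: Double): (Double, Double) = {
    val (mx, _) = meanAndVar(xs)
    val (my, _) = meanAndVar(ys)
    val sp = math.sqrt(pooledVariance(xs, ys))
    val margin = t * sp * math.sqrt(1.0 / xs.size + 1.0 / ys.size)
    (mx - my - margin, mx - my + margin)
  }
}
```

For two samples of size 3, a 95% interval would use $t_{0.025} = 2.776$ with 4 degrees of freedom.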

**Case 2: Unequal variances**

The most difficult case is when both variances are unknown and unequal. This is known as the <a href="https://en.wikipedia.org/wiki/Behrens%E2%80%93Fisher_problem">Behrens-Fisher</a> problem. If we replace $\sigma_X$ and $\sigma_Y$ by their estimates $s_X$ and $s_Y$ and form the ratio

$$t = \frac{(\bar{x} - \bar{y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{s_{X}^{2}}{n} + \frac{s_{Y}^{2}}{m}}}$$

then $t$ does not follow a $t$-distribution exactly. Instead, its distribution is approximated by a $t$-distribution whose number of degrees of freedom is given by

$$\nu = \frac{\left(\frac{s^{2}_X}{n} + \frac{s^{2}_Y}{m}\right)^2}{\frac{s^{4}_X}{n^2(n-1)} + \frac{s^{4}_Y}{m^2(m-1)}}$$

This is known as the  <a href="https://en.wikipedia.org/wiki/Welch%E2%80%93Satterthwaite_equation">Satterthwaite approximation</a>.

----

**C.I. for the difference between means with unequal and unknown standard deviations (small samples)**


$$\bar{x} - \bar{y} \pm t_{\alpha/2}\sigma(\hat{\theta})$$

where

$$\sigma(\hat{\theta}) = \sqrt{\frac{s_{X}^2}{n} +\frac{s_{Y}^2}{m}}$$

and the degrees of freedom for the $t$-distribution are given by the formula for $\nu$ above.

----
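A minimal sketch of the unequal-variance interval follows. It assumes the caller looks up $t_{\alpha/2}$ for the (generally non-integer) $\nu$ returned by `satterthwaiteDf`, for example from a statistics library or a $t$ table; `WelchCI` and its methods are hypothetical names.

```scala
// Sketch: CI for mu_X - mu_Y with unequal, unknown variances,
// using the Satterthwaite approximation for the degrees of freedom.
object WelchCI {

  // Sample mean and sample variance (n - 1 denominator)
  private def meanAndVar(xs: Seq[Double]): (Double, Double) = {
    val n = xs.size
    val m = xs.sum / n
    (m, xs.map(x => (x - m) * (x - m)).sum / (n - 1))
  }

  // nu = (a + b)^2 / (a^2 / (n - 1) + b^2 / (m - 1)), with a = s_X^2 / n, b = s_Y^2 / m
  def satterthwaiteDf(vx: Double, n: Int, vy: Double, m: Int): Double = {
    val a = vx / n
    val b = vy / m
    (a + b) * (a + b) / (a * a / (n - 1) + b * b / (m - 1))
  }

  // t: critical value t_{alpha/2} for nu degrees of freedom, supplied by the caller
  def interval(xs: Seq[Double], ys: Seq[Double], t: Double): (Double, Double) = {
    val (mx, vx) = meanAndVar(xs)
    val (my, vy) = meanAndVar(ys)
    val se = math.sqrt(vx / xs.size + vy / ys.size)
    (mx - my - t * se, mx - my + t * se)
  }
}
```

As a sanity check, with equal sample variances and equal sample sizes $n = m$, the formula reduces to $\nu = 2(n-1)$, the same value as in the pooled case.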

### <a name="subsec4"></a> C.I for proportions 

We now look into proportions and how to construct confidence intervals for them.  Assume that we have a subpopulation $A$ of items with  a certain attribute. By the population proportion we mean the probability 

$$p = P\{ i \in A\}$$

for a randomly selected item $i$ to have this attribute. A sample proportion $\hat{p}$ is used to estimate $p$ according to 

$$\hat{p} = \frac{\text{number of sampled items from }A}{n}$$

The difficulty here is that the standard error of $\hat{p}$, which equals $\sqrt{p(1-p)/n}$, depends on the unknown $p$. As usual, we estimate it by

$$s(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

and construct an approximate $(1-\alpha)100\%$ C.I. as follows.

----

**Confidence interval for a population proportion**

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

----
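A short sketch of this normal-approximation interval (often called the Wald interval), assuming the caller passes $z_{\alpha/2}$; `ProportionCI` is a hypothetical name.

```scala
// Sketch: approximate CI for a population proportion,
// pHat +/- z * sqrt(pHat * (1 - pHat) / n).
object ProportionCI {
  def interval(successes: Int, n: Int, z: Double): (Double, Double) = {
    val pHat = successes.toDouble / n
    val margin = z * math.sqrt(pHat * (1.0 - pHat) / n)
    (pHat - margin, pHat + margin)
  }
}
```

For 50 successes out of 100 at 95% confidence, the margin is $1.96 \cdot 0.05 = 0.098$, giving approximately $(0.402, 0.598)$.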

### <a name="subsec5"></a> C.I for difference between two proportions 

Similarly, we can construct a confidence interval for the difference between two proportions.
In this scenario, we have proportions $p_1$ and $p_2$ of items with an attribute in two populations. We assume that independent samples of sizes $n_1$ and $n_2$ are collected from them, respectively. Both parameters are estimated by the sample proportions $\hat{p}_1$ and $\hat{p}_2$. Thus, we can construct the following approximate C.I.

----

**Confidence interval for the difference of proportions**


$$\hat{p}_1 - \hat{p}_2 \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$


----
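The same pattern extends to two independent samples. A minimal sketch, assuming the caller passes $z_{\alpha/2}$; `TwoProportionCI` is a hypothetical name.

```scala
// Sketch: approximate CI for p1 - p2 from two independent samples.
object TwoProportionCI {
  def interval(x1: Int, n1: Int, x2: Int, n2: Int, z: Double): (Double, Double) = {
    val p1 = x1.toDouble / n1
    val p2 = x2.toDouble / n2
    // Standard error of pHat1 - pHat2
    val se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    (p1 - p2 - z * se, p1 - p2 + z * se)
  }
}
```

With $\hat{p}_1 = 60/100$ and $\hat{p}_2 = 50/100$, the standard error is $\sqrt{0.0024 + 0.0025} = 0.07$, so the 95% interval is roughly $(-0.037, 0.237)$; it contains 0, so these data do not clearly separate the two proportions.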


### <a name="subsec6"></a> Sample size for estimating proportions

A related question is how to choose the sample size $n$ so that the margin does not exceed a desired level $\Delta$. We proceed just like we did for the population mean in section <a href="#subsec2">Selection of sample size</a>. In the same way, the margin is

$$\text{margin} = z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

and the standard way of finding the sample size that provides the desired margin $\Delta$ is by solving the inequality

$$\text{margin} \leq \Delta$$

which leads to 

$$n \geq \hat{p}(1 - \hat{p})\left(\frac{z_{\alpha/2}}{\Delta}\right)^2$$

However, the difficulty here is that the inequality involves $\hat{p}$. To know $\hat{p}$, we first need to collect a sample, but to know the sample size, we first need to know $\hat{p}$. The way to break this circularity is to observe that the function $\hat{p}(1-\hat{p})$ attains its global maximum value of $0.25$ at $\hat{p} = 0.5$. Thus, we can replace $\hat{p}(1-\hat{p})$ in the formula above with this value. We may then end up with a larger sample size than we actually need; however, this ensures that we estimate $\hat{p}$ with a margin that does not exceed $\Delta$. Therefore, we conclude with the following sample size

$$ n \geq 0.25 \left( \frac{z_{\alpha/2}}{\Delta}\right)^2$$
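A sketch of this calculation, assuming the caller passes $z_{\alpha/2}$. The optional `pGuess` parameter is our own addition for when a prior estimate of $p$ is available; its default of $0.5$ reproduces the conservative bound above. `ProportionSampleSize` is a hypothetical name.

```scala
// Sketch: smallest n with z * sqrt(p(1 - p) / n) <= delta.
// pGuess = 0.5 gives the conservative bound p(1 - p) <= 0.25.
object ProportionSampleSize {
  def required(z: Double, delta: Double, pGuess: Double = 0.5): Int =
    math.ceil(pGuess * (1.0 - pGuess) * math.pow(z / delta, 2)).toInt
}
```

For a margin of $\Delta = 0.03$ at 95% confidence, $0.25 \cdot (1.96 / 0.03)^2 \approx 1067.1$, so `ProportionSampleSize.required(1.96, 0.03)` returns 1068; for $\Delta = 0.05$ it returns 385.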

## <a name="sum"></a> Summary

In this section we reviewed various methods to construct confidence intervals. In general, at a given confidence level, we want a confidence interval that is as narrow as possible. Wider intervals may be more likely to contain the value of the population parameter, but they are less precise and thus less informative.

For statistics with symmetric sampling distributions, such as the normal or the $t$-distribution, we cut off equal areas from the two tails of the distribution. When the sampling distribution is not symmetric, such as the $\chi^2$ distribution, we need to choose the tail areas differently in order to get the smallest interval.

We also introduced the pooled sample variance, given by

$$s^{2}_p = \frac{(n-1)s^{2}_X + (m-1)s^{2}_Y}{n + m -2}$$

and the Satterthwaite approximation for the degrees of freedom of the $t-$distribution given by 

$$\nu = \frac{\left(\frac{s^{2}_X}{n} + \frac{s^{2}_Y}{m}\right)^2}{\frac{s^{4}_X}{n^2(n-1)} + \frac{s^{4}_Y}{m^2(m-1)}}$$

Finally, we saw how to obtain the sample size $n$ such that the constructed C.I. satisfies a certain margin.

When we deal with confidence intervals, we should think in terms of precision. A narrow confidence interval tells us that the estimate is fairly precise, whereas a wide confidence interval tells us that the estimate is relatively imprecise.

## <a name="refs"></a> References

1. Michael Baron, _Probability and statistics for computer scientists_, 2nd Edition, CRC Press.
2. Larry Hatcher, _Advanced statistics in research_, Shadow Finch Media.
3. Murray R. Spiegel, _Probability and statistics_, Schaum's Outline Series.
4. Larry Wasserman, _All of Statistics: A Concise Course in Statistical Inference_, Springer, 2003.