# Confidence Intervals - Central Limit Theorem vs Bootstrapping 
#### \**NOTE: most of this text is copied and pasted (and lightly edited) from wiki or the references listed at the bottom!

## Confidence Interval

A [confidence interval](https://en.wikipedia.org/wiki/Confidence_interval) is a type of [interval estimate](https://en.wikipedia.org/wiki/Interval_estimation) (of a population parameter $\mu$) that is computed from the sample data.

Confidence intervals consist of a range of values that act as good estimates of the unknown population parameter. However, **the interval computed from a particular sample does not necessarily include the true value of the parameter.**

The confidence level is the proportion of possible confidence intervals that contain the true value of their corresponding parameter.

![confidence_interval.gif](attachment:confidence_interval.gif)

**Confidence intervals only assess sampling error** in relation to the parameter of interest.  As you increase the sample size, the sampling error decreases and the intervals become narrower. 
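
Both claims can be checked with a small simulation. The sketch below is illustrative only: the population (normal, mean 50, standard deviation 10), the seed, and the helper names `z_interval` and `coverage` are assumptions made for the example. It uses the known-standard-deviation interval described later in this document.

```python
import random
from statistics import NormalDist, mean


def z_interval(sample, sigma, level=0.95):
    """Interval for the mean when the population sd (sigma) is known."""
    z_star = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z_star * sigma / len(sample) ** 0.5
    centre = mean(sample)
    return centre - half_width, centre + half_width


def coverage(mu, sigma, n, trials, rng):
    """Fraction of simulated intervals that contain the true mean mu."""
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = z_interval(sample, sigma)
        if low <= mu <= high:
            hits += 1
    return hits / trials


rng = random.Random(0)  # hypothetical seed, for reproducibility only
for n in (10, 100):
    low, high = z_interval([rng.gauss(50, 10) for _ in range(n)], sigma=10)
    print(f"n={n}: width={high - low:.2f}, coverage={coverage(50, 10, n, 2000, rng):.3f}")
```

The coverage should stay near 0.95 for both sample sizes, while the interval width shrinks as n grows.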

## Misunderstanding the Confidence Interval

Confidence intervals are frequently misunderstood; published studies have shown that even professional scientists often misinterpret them.

A 95% confidence interval does not mean that for a given realized interval there is a 95% probability that the population parameter lies within the interval - the interval either covers the parameter value or it does not; it is no longer a matter of probability.  The **95% probability relates to the reliability of the estimation procedure**, not to a specific calculated interval.

- A 95% confidence interval does not mean that 95% of the sample data lie within the interval.
- A confidence interval is not a definitive range of plausible values for the sample parameter, though it may be understood as an estimate of plausible values for the population parameter.
- A particular confidence interval of 95% calculated from an experiment does not mean that there is a 95% probability of a sample parameter from a repeat of the experiment falling within this interval.

## Calculation of the Confidence Interval

When the population standard deviation ($\sigma$) is known, a confidence interval is calculated via:

![ci_known_sd.svg](attachment:ci_known_sd.svg)

where $\bar{x}$ is the sample mean, $z^{*}$ is the critical z value (for a 95% confidence interval, $z^{*} \approx 1.96$, often rounded to 2), and $n$ is the sample size.
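
As a worked illustration, assuming the formula has the standard form built from the symbols just defined, and using hypothetical values $\bar{x} = 50$, $\sigma = 10$, $n = 25$:

$$\bar{x} \pm z^{*}\frac{\sigma}{\sqrt{n}} = 50 \pm 1.96 \cdot \frac{10}{\sqrt{25}} = 50 \pm 3.92$$

which gives a 95% confidence interval of roughly $(46.08, 53.92)$.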

When the population standard deviation ($\sigma$) is unknown, a confidence interval is calculated via a similar method, using t values (with $n - 1$ degrees of freedom) instead of z values, and the sample standard deviation ($s$) in place of the population standard deviation:

![ci_uknown_sd.svg](attachment:ci_uknown_sd.svg)
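
A minimal sketch of the unknown-$\sigma$ case follows. The data values and the helper name `t_interval` are hypothetical. The standard library has no t distribution, so the critical value is passed in; 2.262 is the usual t-table entry for a 95% interval with 9 degrees of freedom.

```python
from statistics import mean, stdev


def t_interval(sample, t_star):
    """Interval for the mean using the sample sd and a t critical value."""
    n = len(sample)
    centre = mean(sample)
    half_width = t_star * stdev(sample) / n ** 0.5  # stdev uses n - 1
    return centre - half_width, centre + half_width


# Hypothetical measurements, n = 10, so df = n - 1 = 9.
sample = [4.9, 5.1, 5.6, 4.7, 5.3, 5.0, 5.4, 4.8, 5.2, 5.0]

# 95% two-sided critical value for df = 9, taken from a t table.
# If SciPy is available, scipy.stats.t.ppf(0.975, df=9) gives the same value.
T_STAR_DF9 = 2.262

low, high = t_interval(sample, T_STAR_DF9)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```

Passing the critical value explicitly keeps the example dependency-free; for other sample sizes or confidence levels the value has to be looked up again.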

**Both of these methods rely on the central limit theorem.**

## Central Limit Theorem

The [central limit theorem](https://en.wikipedia.org/wiki/Central_limit_theorem) tells us that if the sample size $n$ is "sufficiently large", the sampling distribution of the sample mean is approximately normal, regardless of the distribution of the underlying population (provided it has finite variance). $n = 30$ is a commonly used rule of thumb for "sufficiently large", although strongly skewed distributions may need more.
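
The theorem can be illustrated by simulation. In the sketch below, the choice of an exponential population (rate 1, so mean 1 and standard deviation 1), the seed, and the number of trials are all assumptions made for the example.

```python
import random
from statistics import mean, stdev

rng = random.Random(1)  # hypothetical seed
n, trials = 30, 5000
mu, sigma = 1.0, 1.0  # mean and sd of an Exponential(rate=1) population

# Draw many samples of size n from a strongly right-skewed population
# and record the mean of each one.
sample_means = [
    mean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(trials)
]

standard_error = sigma / n ** 0.5
# If the sample means are roughly normal, about 95% of them should lie
# within 1.96 standard errors of the population mean.
within = sum(abs(m - mu) <= 1.96 * standard_error for m in sample_means) / trials

print(f"mean of sample means: {mean(sample_means):.3f} (population mean {mu})")
print(f"sd of sample means:   {stdev(sample_means):.3f} (sigma/sqrt(n) = {standard_error:.3f})")
print(f"fraction within 1.96 standard errors: {within:.3f}")
```

Even though individual draws are far from normal, the sample means should centre on the population mean with spread close to $\sigma/\sqrt{n}$.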

## Bootstrapping Confidence Intervals

In some cases we either lack a large enough sample to assume that the central limit theorem applies, or we do not know the sampling distribution of the statistic of interest (e.g. the median or a correlation coefficient), so we need a different method to create a confidence interval. In these cases we can bootstrap a confidence interval.

## Bootstrapping

Generally, [bootstrapping](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)) is any test or metric that relies on [random sampling with replacement](https://en.wikipedia.org/wiki/Bootstrapping_(statistics)). It allows estimation of the sampling distribution of almost any statistic, and it falls in the broader class of [resampling methods](https://en.wikipedia.org/wiki/Resampling_(statistics)).

The bootstrap idea is elegantly simple. 

1. Gather a sample of size n and use it to compute an estimate $\hat{\theta}$ of the parameter $\theta$.
2. Draw a new random sample of size n from your original sample, with replacement.
3. This second sample is called a bootstrap sample. Use it to calculate a new estimate of the parameter of interest, denoted $\hat{\theta}^{*}_{1}$.
4. Draw as many bootstrap samples of size n as you want, obtaining M estimates $\hat{\theta}^{*}_{1}$,...,$\hat{\theta}^{*}_{M}$.
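
The steps above can be sketched in a few lines. The function name `bootstrap_estimates`, the data, and the seed are hypothetical; the median is used as the statistic only as an example.

```python
import random
from statistics import median


def bootstrap_estimates(sample, statistic, m, rng):
    """Return m bootstrap estimates of `statistic`.

    Each estimate comes from a resample of the same size as `sample`,
    drawn from it with replacement.
    """
    n = len(sample)
    return [statistic(rng.choices(sample, k=n)) for _ in range(m)]


rng = random.Random(7)  # hypothetical seed
sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 6.1, 3.9, 4.4]  # hypothetical data

theta_hat = median(sample)  # step 1: estimate from the original sample
theta_star = bootstrap_estimates(sample, median, m=1000, rng=rng)  # steps 2-4

print(f"original estimate: {theta_hat}")
print(f"number of bootstrap estimates: {len(theta_star)}")
```

`random.Random.choices` samples with replacement, so a given observation can appear several times in one bootstrap sample and not at all in another.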

## Percentile Method

For example, we can draw M = 1000 bootstrap samples of size n. For each such sample we calculate a new estimate of the parameter of interest.

We rank these estimates from smallest to largest, and denote the ordered bootstrap estimates by:

$$\hat{\theta}^{*}_{(1)},...,\hat{\theta}^{*}_{(1000)}$$

where the number in parentheses shows the order in terms of size. Thus $\hat{\theta}^{*}_{(1)}$ is the smallest estimate of the parameter found in any of the 1000 bootstrap samples, and $\hat{\theta}^{*}_{(1000)}$ is the largest.

The percentile method uses the middle 95% of these bootstrap estimates as an approximate 95% confidence interval for the population parameter: the interval runs from the 2.5th percentile of the bootstrap estimates to the 97.5th percentile.

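Therefore the 95% confidence interval can be constructed as:

$$\text{Lower bound} = \hat{\theta}^{*}_{(0.025)}$$

$$\text{Upper bound} = \hat{\theta}^{*}_{(0.975)}$$

Here the subscripts denote quantiles of the ordered bootstrap estimates rather than positions; with M = 1000 they correspond to roughly the 25th and 975th ordered estimates.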
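
A minimal sketch of the percentile method for a median is shown below. The function name `percentile_interval`, the simulated data, and the seed are assumptions made for the example. The percentile cut points come from `statistics.quantiles`, whose interpolation rule is one of several reasonable conventions.

```python
import random
from statistics import median, quantiles


def percentile_interval(sample, statistic, m=1000, rng=None):
    """Approximate 95% bootstrap percentile interval for `statistic`."""
    rng = rng or random.Random()
    n = len(sample)
    # m bootstrap estimates, each from a resample drawn with replacement.
    estimates = [statistic(rng.choices(sample, k=n)) for _ in range(m)]
    # n=40 yields 39 cut points spaced 2.5% apart, so the first is the
    # 2.5th percentile and the last is the 97.5th percentile.
    cuts = quantiles(estimates, n=40)
    return cuts[0], cuts[-1]


rng = random.Random(42)  # hypothetical seed
# Hypothetical skewed data: 40 draws from an exponential with mean 10.
sample = [rng.expovariate(0.1) for _ in range(40)]

low, high = percentile_interval(sample, median, m=1000, rng=rng)
print(f"sample median: {median(sample):.2f}")
print(f"95% bootstrap CI for the median: ({low:.2f}, {high:.2f})")
```

Any statistic that takes a list and returns a number can be passed in place of `median`, which is what makes the approach useful when no closed-form interval is available.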

## Resources
Penn State Stats 506 - Sampling Theory and Methods: [Confidence Intervals and the Central Limit Theorem](https://onlinecourses.science.psu.edu/stat506/node/8)

Penn State Stats 414 - Probability Theory and Mathematical Statistics: [Central Limit Theorem](https://onlinecourses.science.psu.edu/stat414/node/176)

The Minitab Blog: [When Should I Use Confidence Intervals, Prediction Intervals, and Tolerance Intervals](http://blog.minitab.com/blog/adventures-in-statistics-2/when-should-i-use-confidence-intervals-prediction-intervals-and-tolerance-intervals)

Duke Statistics 111/130 - Probability and Statistics Lecture Notes: [Bootstrap Confidence Intervals](http://www2.stat.duke.edu/~banks/111-lectures.dir/lect13.pdf)
