# Large sample confidence intervals

Reference: [link](https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading23b.pdf)

One typical goal in statistics is to estimate the mean of a distribution. When the data follow a normal distribution, we can use confidence intervals based on standardized statistics to estimate the mean.

But suppose the data $X_1, X_2, \ldots, X_n$ are drawn from a distribution that may not be normal. If the distribution has finite mean and variance and if $n$ is sufficiently large, then the central limit theorem suggests that we can still use the normal distribution to approximate the distribution of the standardized statistic, where $\bar{X}$ is the sample mean and $s$ is the sample standard deviation:

$\frac{(\bar{X}-\mu)\sqrt{n}}{s} \approx N(0,1)$

So, for large $n$, the $(1-\alpha)$ confidence interval for $\mu$ is *approximately*

$[\bar{X}-\frac{s}{\sqrt{n}}z_{1-\alpha/2} ; \bar{X}+\frac{s}{\sqrt{n}}z_{1-\alpha/2}]$
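
The interval follows from inverting the normal approximation. Here $z_{1-\alpha/2}$ is taken to be the $(1-\alpha/2)$ quantile of $N(0,1)$; this is the usual meaning of the notation, though it is an assumption here, since the text does not define it. By the symmetry of the standard normal distribution,

$P\left(-z_{1-\alpha/2} \le \frac{(\bar{X}-\mu)\sqrt{n}}{s} \le z_{1-\alpha/2}\right) \approx 1-\alpha$

and solving the two inequalities for $\mu$ gives

$P\left(\bar{X}-\frac{s}{\sqrt{n}}z_{1-\alpha/2} \le \mu \le \bar{X}+\frac{s}{\sqrt{n}}z_{1-\alpha/2}\right) \approx 1-\alpha$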

But how good is such an approximation? Let's run some experiments to find out.

In [None]:
import numpy as np
import scipy.stats

To begin with, implement the function *large_sample_ci()* that constructs the large sample CI as described above, given the sample size $n$, the sample mean $\bar{X}$, the sample standard deviation $s$, and the significance level $\alpha$ (so that the confidence level is $1-\alpha$).
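
One possible shape for this function is sketched below; treat it as an illustration rather than the reference solution. The argument order and the tuple return value are assumptions, and $z_{1-\alpha/2}$ is computed with `scipy.stats.norm.ppf`, the standard normal quantile function. The summary statistics in the usage line are hypothetical.

```python
import numpy as np
import scipy.stats


def large_sample_ci(n, x_bar, s, alpha):
    """Return (lower, upper) bounds of the approximate (1 - alpha) CI for the mean."""
    # z_{1 - alpha/2}: the (1 - alpha/2) quantile of the standard normal distribution
    z = scipy.stats.norm.ppf(1 - alpha / 2)
    half_width = z * s / np.sqrt(n)
    return x_bar - half_width, x_bar + half_width


# Hypothetical summary statistics, chosen only for illustration
lower, upper = large_sample_ci(n=100, x_bar=1.0, s=1.0, alpha=0.05)
print(lower, upper)
```

With these inputs the half-width is about $1.96 \cdot 1/\sqrt{100} \approx 0.196$, so the interval is roughly $[0.804; 1.196]$.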

In [None]:
# Your code here

## How large should $n$ be?

Since our intervals are now approximate, we shouldn’t expect the true confidence level to be exactly $1-\alpha$; it only approaches $1-\alpha$ as the sample size $n$ grows. In other words, the share of intervals that don't contain the true value of the parameter $\mu$ may differ from $\alpha$.

Let's perform an experiment to show that.

Run simulations for $X_1,..., X_n$ drawn from the exponential distribution $Exp(1)$. Note that this distribution is "far away" from normal. 

For several values of $n$ (e.g., $20, 50, 100, 500$) and several significance levels (e.g., $\alpha = 0.1, 0.05, 0.01$), run $N = 10000$ trials. Each trial should consist of the following steps:

1. Draw $n$ samples from $Exp(1)$.
2. Compute sample mean $\bar{X}$ and sample standard deviation $s$.
3. Construct the large sample confidence interval $\bar{X}\pm\frac{s}{\sqrt{n}}z_{1-\alpha/2}$
4. Check if the true mean $\mu=1$ is not in the interval.

Report the nominal ($1-\alpha$) and empirical confidence levels of the obtained intervals. 
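
The trial loop above can be sketched in vectorized form as follows. This is one possible approach, not the intended solution: the helper name `empirical_coverage`, its `sampler` argument, and the fixed seed are all assumptions made for illustration. It counts the intervals that do contain $\mu$, so the returned share is the empirical confidence level directly.

```python
import numpy as np
import scipy.stats


def empirical_coverage(sampler, true_mean, n, alpha, n_trials=10_000):
    """Estimate the share of large sample CIs that contain true_mean."""
    # One row per trial, n observations per row
    data = sampler((n_trials, n))
    x_bar = data.mean(axis=1)
    s = data.std(axis=1, ddof=1)  # sample standard deviation (n - 1 in the denominator)
    half_width = scipy.stats.norm.ppf(1 - alpha / 2) * s / np.sqrt(n)
    # The interval contains true_mean exactly when |x_bar - true_mean| <= half_width
    covered = np.abs(x_bar - true_mean) <= half_width
    return covered.mean()


rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice for reproducibility

for n in (20, 50, 100, 500):
    for alpha in (0.1, 0.05, 0.01):
        coverage = empirical_coverage(
            lambda size: rng.exponential(scale=1.0, size=size),
            true_mean=1.0, n=n, alpha=alpha,
        )
        print(f"n={n:4d}  nominal={1 - alpha:.2f}  empirical={coverage:.4f}")
```

Because the sampler is passed in as a function, the same helper can be reused with another distribution, for example `rng.standard_normal` together with `true_mean=0.0`.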

In [None]:
# Your code here


What do you observe? How does the difference between the two change as $n$ increases?

Now, repeat the experiment, but this time sample the data from the standard normal distribution instead of the exponential one.

In [None]:
# Your code here


Do you see any difference in the results of the two experiments? Why?