<h2> Exploring the Central Limit Theorem </h2>

In this notebook, we're going to explore the central limit theorem. The theorem says that as $N \to \infty$, an appropriately scaled average of $N$ IID random variables will converge to a normal distribution. A natural question that comes up is how large $N$ should be in order to use the central limit theorem accurately and efficiently: if $N$ is too small, then the limit is a bad approximation and if $N$ is too large, we require too much data to be practical.

In this notebook, you'll explore averages of exponential random variables -- similar to the waiting time examples we've talked about.

If you are averaging $30$ exponential random variables with distribution $\operatorname{Exp}(1/10)$, you can approximate the distribution of the average as $\operatorname{N}(10, 100/30)$. This is because the underlying distribution has mean $10$ and variance $100$; remember that the variance of an average of $N$ IID random variables is $\sigma^2 / N$.
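To make the approximation concrete, here is a minimal sketch that turns it into a tail probability, using a cutoff of $14$ for the average as an illustration. The helper name `normal_upper_tail` is an assumption introduced here, not part of the notebook; it relies on the standard identity $P(Z > z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$ for a standard normal $Z$.

```python
from math import erfc, sqrt

def normal_upper_tail(z):
    # P(Z > z) for a standard normal Z, via the complementary error function
    # (hypothetical helper, named here for illustration only)
    return 0.5 * erfc(z / sqrt(2))

mu, var, n = 10, 100, 30   # mean and variance of Exp(1/10), and the sample size
threshold = 14             # illustrative cutoff for the average

# Same formula as a z-score for an average: (observed - mu) / sqrt(var / N)
z = (threshold - mu) / sqrt(var / n)
clt_estimate = normal_upper_tail(z)

print(f'z-score: {z:.3f}')
print(f'CLT estimate of P(average > {threshold}): {clt_estimate:.4f}')
```

The z-score comes out near $2.19$, which puts the normal tail probability at roughly $1.4\%$.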

We can also estimate $P(\overline{X}_{30} > 14)$ directly by simulation: generate $30$ exponential random variables, average them, and repeat many times. The code cells below set this up.

The central limit theorem estimate for the probability is about $1.4\%$, while the simulated probability for the average of $30$ exponential variables is around $2.2\%$ (using $10^6$ trials).

Although these numbers are on the same scale (each represents a fairly unlikely but not extreme outcome), the simulated probability is nearly $60\%$ larger than the central limit theorem estimate. This is because $30$ is a small number of random variables to average, and the central limit theorem approximation improves as we average more of them.
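One way the direct simulation might look is sketched below. The function name `simulate_upper_tail`, the seed, and the reduced trial count are assumptions chosen to keep the run short. The sketch uses the standard library's `expovariate` in place of the notebook's own `exp` helper so that it stands alone.

```python
from random import Random

def simulate_upper_tail(n, threshold, trials, rate=1/10, seed=0):
    # Monte Carlo estimate of P(average of n Exp(rate) draws > threshold).
    # Hypothetical helper; the seed is fixed only so reruns are reproducible.
    rng = Random(seed)
    hits = 0
    for _ in range(trials):
        average = sum(rng.expovariate(rate) for _ in range(n)) / n
        if average > threshold:
            hits += 1
    return hits / trials

# 20,000 trials keeps this quick; raise it toward 10**6 for a tighter estimate
estimate = simulate_upper_tail(n=30, threshold=14, trials=20_000)
print(f'Simulated P(average > 14): {estimate:.4f}')
```

With this few trials the estimate is noisy in the third decimal place, so expect it to move a little from seed to seed.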

<h3> Questions </h3>

You'll explore some of these probabilities and see how the estimates improve as $N$ increases; this will give you a feel for when the CLT is actually applicable in practice. 

* Verify the numbers above: the average of $30$ IID $\operatorname{Exp}(1/10)$ random variables should have probability around $0.022$ of being $\ge 14$.

* For an average of $100$ IID $\operatorname{Exp}(1/10)$ random variables, simulate the probability that the average is $\ge 11$ and the probability that it is $\le 9$. Compare these to the central limit theorem estimates; is there better agreement than with $30$ random variables?
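As one possible starting point for the second question, the sketch below counts both tails in a single pass and prints them next to the normal approximation. The name `two_tail_frequencies`, the seed, and the trial count are assumptions made for illustration; treat it as a template to adapt rather than a finished answer.

```python
from math import erfc, sqrt
from random import Random

def two_tail_frequencies(n, low, high, trials, rate=1/10, seed=1):
    # Fraction of trials where the average of n Exp(rate) draws is <= low,
    # and the fraction where it is >= high (hypothetical helper).
    rng = Random(seed)
    below = above = 0
    for _ in range(trials):
        average = sum(rng.expovariate(rate) for _ in range(n)) / n
        if average <= low:
            below += 1
        elif average >= high:
            above += 1
    return below / trials, above / trials

n = 100
p_low, p_high = two_tail_frequencies(n, low=9, high=11, trials=10_000)

# CLT estimate: 9 and 11 are the same distance from the mean of 10,
# so the normal approximation assigns both tails the same probability
z = (11 - 10) / sqrt(100 / n)
clt_tail = 0.5 * erfc(z / sqrt(2))

print(f'Simulated P(average <= 9):  {p_low:.4f}')
print(f'Simulated P(average >= 11): {p_high:.4f}')
print(f'CLT estimate for each tail: {clt_tail:.4f}')
```

Increasing `trials` reduces the sampling noise, which makes it easier to judge how much of any remaining gap comes from the approximation itself.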

To help get you started, the following code implements the z-score calculation. You can also look at recent notebooks to find code that implements a running sum of samples and code that implements the uniform distribution. One useful trick to remember is that if $R$ is a $\operatorname{Unif}(0, 1)$ random variable, then $2R - 1$ has a $\operatorname{Unif}(-1, 1)$ distribution.

In [2]:
# The packages we'll need
from math import sqrt, log
from random import random

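def exp(lamb):
    # Generate an exponentially distributed random variable 
    # from the Exp(lamb) distribution by inverse transform sampling
    # random() lies in [0, 1), so 1 - random() lies in (0, 1] and log never sees 0
    r = 1 - random()
    return -log(r) / lamb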
    
def zScore(mu, var, observed, N):
    # Compute the z-score of an observation given 
    # the number of trials (N) of IID random variables
    # all having mean mu and variance var
    return (observed - mu) / sqrt(var / N)

# Example usage
# print(f'z-score: {zScore(10, 100, 14, 30)}')

In [14]:
# Starting to implement this: get our data and run one trial
data = [exp(1/10) for _ in range(30)]
average = sum(data) / len(data)

# Put the rest of your code here
print(f'First random average: {average}')

First random average: 6.037214750271872


*Put your analysis here.*