<h1> Exploring the Central Limit Theorem </h1>

In this notebook, we're going to explore the central limit theorem numerically. In class, we've seen that the theorem says that as $N \to \infty$, an appropriately scaled average of $N$ IID random variables will converge to a normal distribution. A natural question that comes up is how large $N$ should be in order to use the central limit theorem accurately and efficiently: if $N$ is too small, then the limit is a bad approximation and if $N$ is too large, we require too much data to be practical.

In this notebook, you'll explore averages of exponential random variables. Remember that these frequently occur in waiting-time examples, and many real-world systems rely on knowing how quickly randomly arriving events happen. Computer networks are one such case: timing randomly arriving events is core to how they work!

<h2> An approximation </h2>

If you average $30$ exponential random variables, each having distribution $\operatorname{Exp}(1/10)$, you can approximate the distribution of the average as $\operatorname{N}(10, 100/30)$. This is because the underlying distribution has mean $10$ and variance $100$; remember that the variance of the average of $n$ IID random variables is $\sigma^2 / n$.
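
As a worked check of these parameters, using the standard facts that an $\operatorname{Exp}(\lambda)$ random variable has mean $1/\lambda$ and variance $1/\lambda^2$, with $\lambda = 1/10$:

$$
\mu = \frac{1}{\lambda} = 10, \qquad \sigma^2 = \frac{1}{\lambda^2} = 100, \qquad \operatorname{Var}\left(\overline{X}_{30}\right) = \frac{\sigma^2}{30} = \frac{100}{30} \approx 3.33.
$$

So the standard deviation of the average is roughly $\sqrt{3.33} \approx 1.83$.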

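We can also directly simulate $P(\overline{X}_{30} > 14)$ by generating exponential random variables. The code below provides the building blocks: a sampler for exponential random variables, a z-score helper, and a single simulated average of $30$ samples.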

The central limit theorem estimate for this probability is about $1.4\%$, while the simulated probability for the average of $30$ exponential variables is around $2.2\%$ (using $10^6$ trials).

Although these numbers are on the same scale (each representing a fairly unlikely outcome that's not too extreme), they still differ substantially: the simulated value is nearly $60\%$ larger than the CLT estimate. This is because $30$ is a small sample size, and the central limit theorem approximation improves as we average a larger number of variables.
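
The CLT estimate of about $1.4\%$ quoted above can be reproduced with the standard library alone. The sketch below is one way to do it, assuming we compute the standard normal upper tail from `math.erfc`; the helper name `normal_upper_tail` is made up for this illustration and is not part of the notebook's own code.

```python
from math import erfc, sqrt

def normal_upper_tail(z):
    # P(Z > z) for a standard normal Z, written in terms of the
    # complementary error function: P(Z > z) = erfc(z / sqrt(2)) / 2
    return 0.5 * erfc(z / sqrt(2))

# Mean and variance of Exp(1/10), and the number of variables averaged
mu, var, n = 10, 100, 30
threshold = 14

# Same formula as the notebook's zScore helper
z = (threshold - mu) / sqrt(var / n)
clt_estimate = normal_upper_tail(z)

print(f'z-score: {z:.3f}')
print(f'CLT estimate of P(average > {threshold}): {clt_estimate:.4f}')
```

Only the upper tail is needed here because the question concerns the average exceeding a threshold above the mean; a lower-tail probability follows by symmetry, as `normal_upper_tail(-z)`.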

In [None]:
# The packages we'll need
from math import sqrt, log
from random import random

def exp_rv(lamb):
    # Generate an exponentially distributed random variable 
    # from the Exp(lamb) distribution
    # random() lies in [0, 1), so 1 - random() lies in (0, 1]
    # and log(r) is always defined
    r = 1.0 - random()
    return -log(r) / lamb
    
def zScore(mu, var, observed, N):
    # Compute the z-score of an observation given 
    # the number of trials (N) of IID random variables
    # all having mean mu and variance var
    return (observed - mu) / sqrt(var / N)

# Example usage
# print(f'z-score: {zScore(10, 100, 14, 30)}')

In [None]:
# This code generates a SINGLE average of 30 exponentially distributed 
# data points each coming from an Exp(1/10) distribution:

rv_count = 30
samples = []

for _ in range(rv_count):
    samples.append(exp_rv(1/10))
    
average = sum(samples) / len(samples)

# Print out the average:
print(f'First random average: {average}')

<h3> Questions </h3>

You'll explore some of these probabilities and see how the estimates improve as $N$ increases; this will give you a feel for when the CLT is actually applicable in practice. 

<b>Question 1</b>: Verify the claim above: with a large enough number of simulated trials, the average of $30$ IID $\operatorname{Exp}(1/10)$ random variables has probability around $0.022$ of being $\ge 14$.

In [None]:
# Put your code for Question 1 here.



Written answer for question 1: 

<b>Question 2:</b> Now take an average of $100$ IID $\operatorname{Exp}(1/10)$ random variables. Simulate the probability that the average is $\ge 11$ or $\le 9$ (that is, at least $1$ away from the mean of $10$). Compare this to the estimate from the central limit theorem; how far off are the results?

In [None]:
# Put your code for Question 2 here.



Written answer for question 2: 