<h2> Variance of averages </h2>

In this week's notebook, you'll explore how averages of random data tend towards their limits. We'll generate data from different distributions, repeatedly compute averages, and see how tightly those averages settle down around a particular value compared with the individual data points.

<h3> Generating random data </h3>

The uniform distribution $\operatorname{Unif}(0, 1)$ has mean $0.5$ and variance $1/12 \approx 0.0833$; remember that this is the distribution Python's `random()` draws from (it returns a float in $[0, 1)$).
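
As a quick check on these numbers, the mean and variance of $X \sim \operatorname{Unif}(0, 1)$ follow from integrating against its density, which equals $1$ on $[0, 1]$:
$$E[X] = \int_0^1 x\,dx = \frac{1}{2}, \qquad E[X^2] = \int_0^1 x^2\,dx = \frac{1}{3}, \qquad \operatorname{Var}(X) = E[X^2] - E[X]^2 = \frac{1}{3} - \frac{1}{4} = \frac{1}{12} \approx 0.0833.$$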

Instead of generating just one random number, let's generate many of them (one hundred) and average them. This gives a new random variable
$$Y = \frac{X_1 + X_2 + \cdots + X_{100}}{100},$$
where $X_1, \dots, X_{100}$ are independent and each has the same $\operatorname{Unif}(0, 1)$ distribution as a single draw $X$. In this notebook, we'll explore how the variation of $Y$ compares to the variation of $X$.
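
By linearity of expectation, averaging does not move the center: each $X_i$ has mean $0.5$, so
$$E[Y] = \frac{E[X_1] + E[X_2] + \cdots + E[X_{100}]}{100} = \frac{100 \times 0.5}{100} = 0.5.$$
The interesting question is therefore how tightly $Y$ clusters around $0.5$.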

The following code block generates one outcome of $Y$:

In [None]:
from random import random

# Generate 100 random data points and average them as a 
# single outcome for Y

s = 0
for i in range(100):
    s = s + random()
y = s / 100
print(y)
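
If you find yourself repeating this loop for different distributions, one option is to wrap it in a function that takes the sampler as an argument. The sketch below uses a hypothetical helper, `average_of` (the name and its parameters are our own, not part of any library); it reproduces the loop above for any zero-argument function `draw`.

```python
from random import random

def average_of(draw, n=100):
    """Return the average of n independent calls to draw().

    Hypothetical helper: `draw` is any zero-argument function that
    returns one random outcome, e.g. `random` for Unif(0, 1).
    """
    s = 0
    for _ in range(n):
        s = s + draw()
    return s / n

# One outcome of Y, equivalent to the cell above
y = average_of(random)
print(y)
```

Once `Exp` is defined in the next cell, `average_of(lambda: Exp(5))` would play the same role as its averaging loop.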

We can also do this for other kinds of random variables! Although `random()` generates a single uniformly distributed outcome, we can build other distributions from it. For example, the following code block generates random numbers according to an exponential distribution and averages 100 of them:

In [None]:
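from math import log
from random import random

# Generate exponential data with rate parameter ell.
# We're using ell rather than "lambda" because
# that's a reserved word in Python.

def Exp(ell):
    # 1 - random() lies in (0, 1], so log(r) is always defined
    # (random() itself can return exactly 0.0, and log(0) raises an error)
    r = 1 - random()
    # If you're interested in why this next line
    # works, look at page 74 of the course textbook!
    return -(1/ell) * log(r)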

# Generate 100 random exponentials and average them:
s = 0
for i in range(100):
    s = s + Exp(5)
z = s/100
print(z)
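
For the curious, here is one standard way to see why `-(1/ell) * log(r)` produces exponential data (the inverse-transform argument), assuming $U$ is uniform on $(0, 1)$ and $\ell > 0$. For any $t \ge 0$,
$$P\left(-\frac{1}{\ell}\log U > t\right) = P\left(\log U < -\ell t\right) = P\left(U < e^{-\ell t}\right) = e^{-\ell t},$$
which is exactly the tail probability $P(T > t)$ of an $\operatorname{Exp}(\ell)$ random variable $T$. With $\ell = 5$, the average `z` printed above should therefore land near the mean $1/\ell = 0.2$.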

You can stitch together code from past weeks' notebooks to answer the following questions. In particular, we've already generated random numbers to simulate data, stored them, and averaged them, so you shouldn't have to write much new code.
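
As a reminder of that pattern, here is a minimal sketch that stores many samples in a list and summarizes them with Python's built-in `statistics` module. It uses single uniform draws $X$ rather than averages $Y$, so you can check the output against the mean $0.5$ and variance of about $0.0833$ quoted earlier; the sample size `N` is an arbitrary choice.

```python
from random import random
from statistics import fmean, pvariance, pstdev

N = 100_000  # number of samples to store (arbitrary; larger is more accurate)

# Store N independent Unif(0, 1) outcomes in a list
xs = [random() for _ in range(N)]

mean_x = fmean(xs)     # sample mean
var_x = pvariance(xs)  # variance of the stored data (divides by N)
sd_x = pstdev(xs)      # square root of the same variance

print(mean_x, var_x, sd_x)
```

`statistics.variance` and `statistics.stdev` divide by `N - 1` instead of `N`; with this many samples the difference is negligible.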

<h4> Question 1 </h4>

* Generate at least one million samples of $Y$; how close is their sample mean to $0.5$?

* Estimate the variance and standard deviation of $Y$; how do they compare to the variance and standard deviation of just one uniform random variable?

* If $X$ has the original $\operatorname{Unif}(0, 1)$ distribution, then there is an $80\%$ chance that $X$ differs from its mean by at least $0.1$: $$P(|X - 0.5| \ge 0.1) = 80\%.$$
Estimate this quantity with $Y$ replacing $X$.
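
Estimating a probability from data means counting how often the event happens. As a sanity check of the $80\%$ figure for a single uniform $X$ (the version with $Y$ is left to you), a minimal sketch might look like this; `N` is again an arbitrary choice:

```python
from random import random

N = 100_000  # number of simulated outcomes (arbitrary)

# Count how many outcomes land at least 0.1 away from the mean 0.5
hits = 0
for _ in range(N):
    x = random()
    if abs(x - 0.5) >= 0.1:
        hits = hits + 1

p_hat = hits / N  # fraction of outcomes where the event occurred
print(p_hat)      # should be close to 0.8
```

Replacing `random()` with code that produces one outcome of $Y$ turns this into the estimate the question asks for.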

In [None]:
# Put your answers for question 1 here!


<h4> Question 2 </h4>
Repeat Question 1 using data from the exponential distribution $\operatorname{Exp}(0.5)$, which you can sample with `Exp(0.5)` using the function defined above. This distribution has mean $2$, variance $4$, and standard deviation $2$, so in the first part, compare the sample mean to $2$ instead of $0.5$.

The third part also has to be modified: if $X \sim \operatorname{Exp}(0.5)$, then $P(|X - 2| \ge 0.1) \approx 96.3\%$ rather than $80\%$.
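
The $96.3\%$ figure can be checked directly from the exponential CDF, $P(X \le t) = 1 - e^{-\ell t}$ for $t \ge 0$. With $\ell = 0.5$, the event $|X - 2| < 0.1$ is $1.9 < X < 2.1$, so the sketch below computes that probability and subtracts it from $1$:

```python
from math import exp

ell = 0.5  # rate parameter of Exp(0.5)

mean = 1 / ell         # 2.0
variance = 1 / ell**2  # 4.0

# P(1.9 < X < 2.1) = F(2.1) - F(1.9) = e^{-0.95} - e^{-1.05}
p_close = exp(-ell * 1.9) - exp(-ell * 2.1)
p_far = 1 - p_close    # P(|X - 2| >= 0.1)

print(mean, variance, round(p_far, 3))  # prints 2.0 4.0 0.963
```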

In [None]:
# Put your answers for question 2 here!

<h2> Submitting this to Gradescope </h2>

Once you've finished modifying your notebook and answering the questions, you'll need to submit it to Gradescope along with your other homework. To do this, generate a PDF file by clicking `File -> Save and Export Notebook as... -> PDF`. Then upload that PDF to Gradescope and submit it to the assignment `Jupyter 5 - Variance`. As always, if you have any questions or run into any issues, you can
* ask during discussion,
* email your TA or instructor,
* or bring them to student hours!