<h1> Simulating means and variances </h1>

<h2> Introduction </h2>

We're almost finished establishing some of our big techniques in simulation. We've learned how to simulate easy-to-describe events (like coin flips and dice rolls), and in the last notebook we learned how to generate exponentially distributed random numbers through a change of variables.

In this notebook, we'll take one last look at simulation for a while: we'll keep generating random data and studying it. Our next step after this will be to study actual real-world data sets and analyze them statistically. But we need one last detail first: how to describe the *summary statistics* of a data set, which give a high-level overview of the data. We'll also get more experience working with normally distributed data. Happily, Python's `random` module can generate such data for us with `random.gauss()`. It's named this way because normal distributions are also called *Gaussian* distributions, after Carl Friedrich Gauss.
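Here is a short sketch of how `random.gauss(mu, sigma)` is called: `mu` is the mean and `sigma` is the standard deviation. The defaults `mu=0.0, sigma=1.0` were only added in Python 3.11, so passing both arguments explicitly is the portable choice. The seed value `2024` is an arbitrary choice for illustration; seeding just makes repeated runs produce the same numbers.

```python
import random

# Seed the generator so repeated runs give the same numbers (optional).
random.seed(2024)

# gauss(mu, sigma): a normal random number with mean mu and standard deviation sigma.
# Passing both arguments works on every Python 3 version; the defaults
# mu=0.0, sigma=1.0 only exist in Python 3.11 and later.
z = random.gauss(0, 1)    # standard normal, N(0, 1)
x = random.gauss(10, 2)   # normal with mean 10 and standard deviation 2

print(z, x)
```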

<h2> Part 1: Generating data </h2>

To begin, let's generate some normally distributed data and compare it to our Table B.1 by estimating $P(Z \ge 0.29)$ for a standard normal random variable $Z$. Table B.1 gives this probability as $0.3859$; we'll generate $10{,}000$ random numbers and compare.

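```python
# Import the necessary function
from random import gauss

N = 10000      # number of trials
success = 0    # number of samples with Z >= 0.29

for _ in range(N):
    # gauss(0, 1) draws from the standard normal distribution N(0, 1)
    if gauss(0, 1) >= 0.29:
        success += 1

print(f'Estimated probability: {success / N}')
```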

After one run, I got an estimated probability of $0.3861$, which is pretty close to the actual value!
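For comparison, the exact value can also be computed without a table. This sketch uses `statistics.NormalDist` from the standard library (Python 3.8 and later), whose `cdf` method gives $P(Z \le z)$; the tail probability is then $1 - P(Z \le 0.29)$, which should agree with the table's $0.3859$ to four decimal places.

```python
from statistics import NormalDist

# The standard normal distribution N(0, 1).
standard_normal = NormalDist(mu=0, sigma=1)

# P(Z >= 0.29) = 1 - P(Z <= 0.29), using the cumulative distribution function.
exact = 1 - standard_normal.cdf(0.29)

print(f'Exact probability: {exact:.4f}')
```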

Let's continue: we can compute the mean and variance of a data set directly from their definitions. For the mean, we'll add up the data set with `sum` and divide by the number of data points. For the variance, we'll sum the squared deviations $(r - \bar{x})^2$ of each data point $r$ from the computed mean $\bar{x}$, then divide by the number of data points as well.
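Written as formulas, with $r_1, \dots, r_n$ the generated numbers and $n = 10{,}000$, the mean $\bar{x}$ and the variance $\hat{\sigma}^2$ computed by the code are:

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} r_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} \left(r_i - \bar{x}\right)^2$$

Dividing by $n$ gives the variance of the data set itself; the *sample* variance divides by $n - 1$ instead, which makes almost no difference when $n = 10{,}000$.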

```python
from random import gauss

# Keep going: generate 10,000 more random numbers and compute the mean and variance of the data set.
N = 10000
randoms = [gauss(0, 1) for _ in range(N)]
computed_mean = sum(randoms) / N

# Keep track of the deviations from the mean: subtract and square
squared_deviations = [(r - computed_mean)**2 for r in randoms]
computed_variance = sum(squared_deviations) / N

print(f'Computed mean: {computed_mean}')
print(f'Computed variance: {computed_variance}')
```

After running this with 10,000 numbers, I came up with a computed mean of $0.0227$ and a variance of $1.0107$. These are both really close to the actual parameters of the distribution -- which are $0$ and $1$, respectively!
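As a sanity check, the hand computation can be compared with Python's built-in `statistics` module: `fmean` gives the mean, `pvariance` divides by $n$ exactly as the code above does, and `variance` divides by $n - 1$ (the sample variance). This is a sketch for checking your work, not a replacement for understanding the formulas.

```python
from random import gauss
from statistics import fmean, pvariance, variance

n = 10000
randoms = [gauss(0, 1) for _ in range(n)]

# By hand, as above: divide by n.
computed_mean = sum(randoms) / n
computed_variance = sum((r - computed_mean) ** 2 for r in randoms) / n

# statistics module: pvariance divides by n, variance divides by n - 1.
print(f'Mean:     {computed_mean:.4f} vs fmean     {fmean(randoms):.4f}')
print(f'Variance: {computed_variance:.4f} vs pvariance {pvariance(randoms):.4f}')
print(f'Sample variance (divide by n - 1): {variance(randoms):.4f}')
```

With $10{,}000$ data points the two variances differ only by the factor $10000/9999$.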

In fact, this is a really important technique we'll use later: take some data from the real world, make a guess at the distribution it follows, estimate the parameters of that distribution, and then see how well our hypothesis fits the data. This is one of the core ideas of applied statistics. 
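Here is a minimal sketch of that workflow. Since we don't have real data yet, it assumes a stand-in data set simulated with `gauss(5, 2)`; the values $5$ and $2$ and the cutoff `threshold = 7` are hypothetical choices for illustration. It guesses a normal model, estimates its parameters from the data, and then checks the fit by comparing one predicted probability with the observed fraction of the data.

```python
from random import gauss
from statistics import NormalDist, fmean, pstdev

# Stand-in for real-world data: pretend we don't know it came from N(5, 2^2).
data = [gauss(5, 2) for _ in range(10000)]

# Steps 1 and 2: guess a normal distribution and estimate its parameters.
mu_hat = fmean(data)
sigma_hat = pstdev(data)
fitted = NormalDist(mu_hat, sigma_hat)

# Step 3: compare a probability predicted by the fitted model
# with the fraction of the data that actually falls in that range.
threshold = 7  # hypothetical cutoff chosen for illustration
predicted = 1 - fitted.cdf(threshold)
observed = sum(1 for x in data if x > threshold) / len(data)

print(f'Estimated mean {mu_hat:.3f}, standard deviation {sigma_hat:.3f}')
print(f'P(X > {threshold}): model {predicted:.4f}, data {observed:.4f}')
```

A single probability is only a rough check; later we'll see more systematic ways to measure how well a distribution fits.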

<h2> Part 2: Questions </h2>

> <b>Question 1:</b> A standard rule of thumb is the [68-95-99.7](https://en.wikipedia.org/wiki/68–95–99.7_rule) rule, which says that the probabilities $P(-1 \le Z \le 1)$, $P(-2 \le Z \le 2)$, and $P(-3 \le Z \le 3)$ for a random variable $Z \sim N(0, 1)$ are approximately $0.68$, $0.95$, and $0.997$, respectively.

> Generate at least $100{,}000$ random numbers from a standard normal distribution and **estimate these three probabilities**; how well does your computed data match the rule?

<h3><i> Put your answer to Question 1 here! </i></h3>

> <b> Question 2:</b> **Estimate a threshold** $\alpha$ for which $P(Z \le \alpha) = 0.80$, again using at least $100{,}000$ trials. Compare your result to Table B.1 (or any other $z$-table!).


<h3><i> Put your answer to Question 2 here! </i></h3>

> <b> Question 3: </b> Many important distributions can be found by modifying the standard normal one. Two of these are the [chi-squared distribution](https://en.wikipedia.org/wiki/Chi-squared_distribution) (the distribution of the sum of squares of $k$ independent standard normal random variables) and the [folded normal distribution](https://en.wikipedia.org/wiki/Folded_normal_distribution), the distribution of the absolute value $|Z|$ of a normal random variable. **Estimate the means and variances** of these two distributions (the linked articles list the exact values if you'd like to compare!).

Note: the absolute value in Python is given by `abs()`.

<h3><i> Put your answer to Question 3 here! </i></h3>

<h2> Submitting this to Gradescope </h2>

Once you've finished modifying your notebook and answering the questions, you'll need to submit it to Gradescope along with your other homework. To do this, generate a PDF file by clicking `File -> Save and Export Notebook as... -> PDF`. Then upload that PDF to Gradescope and submit it to the assignment `Jupyter 3 - Binomial`. As always, if you have any questions or run into any issues, you can
* ask during discussion,
* email your TA or instructor,
* or bring them to student hours!