<h2> Exploring confidence intervals </h2>

This week, we're going to experiment a bit with confidence intervals and generating them from data. One of the most subtle things about confidence intervals is that they do *not* represent the probability that a parameter $\theta$ is in a particular interval $(\ell, u)$ -- it either is or it isn't. What *is* true is that if we generate a large number of confidence intervals at level $\gamma$, each from a fresh sample, then about a fraction $\gamma$ of them should contain the parameter.

Let's demonstrate this with our standard normal. We'll do the following as a single trial:
* Generate $30$ normally distributed numbers from an $N(0, 1)$ distribution using `random.gauss(0, 1)`.
* Compute the mean of these $30$ data points and then the corresponding $95\%$ confidence interval $(\overline{x}_{30} \pm 1.96 / \sqrt{30})$.
* Count this as a success if $0$ is in the confidence interval, because $0$ is the true mean.
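
If you're wondering where the $1.96$ comes from: it's the point that leaves $2.5\%$ of a standard normal in each tail. Here's a small sketch of that computation, assuming Python 3.8+ so that `statistics.NormalDist` is available; the names `level`, `z`, and `half_width` are just for illustration and aren't used elsewhere in this notebook.

```python
from math import sqrt
from statistics import NormalDist

level = 0.95

# Two-sided interval: leave (1 - level) / 2 of the probability in each tail
z = NormalDist().inv_cdf(1 - (1 - level) / 2)

# Half-width of the interval for n = 30 points with known sigma = 1
half_width = z * 1 / sqrt(30)

print(round(z, 3), round(half_width, 3))
```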

We'll then carry out $100{,}000$ trials of this and see how close we came:

In [1]:
import random
from math import sqrt

def trial():
    # Generate 30 random data points
    data = []
    for _ in range(30):
        data.append(random.gauss(0, 1))

    # Compute the sample mean
    mean = sum(data) / 30

    # Check if 0 is in the confidence interval
    # Return 1 if true, 0 if false
    w = 1.960 / sqrt(30)
    if mean - w < 0 < mean + w:
        return 1
    else:
        return 0

# Run this 100K times and count the successes
count = 0
for _ in range(100000):
    count += trial()

print(count)

95038


On my first run of this, I got $95{,}038$ successful confidence intervals out of $100{,}000$ trials. This is extremely close to the nominal $95\%$ level!
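
How close is "extremely close"? Each trial is a success with some probability $p$, so the fraction of successes over $N$ independent trials has standard deviation $\sqrt{p(1 - p)/N}$. Here's a rough sketch of that check using the count from my run; the variable names are illustrative, and your own count will differ from run to run.

```python
from math import sqrt

n_trials = 100000
count = 95038   # the count from the run reported above
p = 0.95        # nominal confidence level

observed = count / n_trials

# Standard deviation of the success fraction if the true coverage is p
se = sqrt(p * (1 - p) / n_trials)

# Distance of the observed fraction from 0.95, in standard errors
print(observed, se, (observed - p) / se)
```

For this run the difference comes out to well under one standard error, so the result is consistent with a true coverage of $95\%$.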

<h3> Questions </h3>

* **Question 1**: Now take the mean of $10$ data points instead of $30$. Construct the $90\%$ confidence interval for the mean and verify experimentally that about $90\%$ of the intervals contain $0$.
* **Question 2**: Returning to a mean of $10$ data points: replace the $95\%$ confidence interval $(\overline{x}_{10} \pm 1.96 \cdot 1 / \sqrt{10})$ with $(\overline{x}_{10} \pm 1.96 \cdot S_{10} / \sqrt{10})$ where $S_{10}$ is the sample standard deviation. Estimate the corresponding confidence level; is it higher or lower than $95\%$? Does this match your expectation?
* **Question 3**: Adapting your code from the previous part, estimate a value of $t$ so that $(\overline{x}_{10} \pm t \cdot S_{10} / \sqrt{10})$ is a $98\%$ confidence interval for the mean. (What you've estimated is the critical value $t_{9, 0.01}$ from Table B.2!)

To get you started, some code to compute the sample standard deviation is below:

In [2]:
def sample_sd(data):
    # Return the sample standard deviation of the data
    # passed as a list / array
    N = len(data)
    mean = sum(data) / N

    return sqrt(sum([(d - mean)**2 for d in data]) / (N - 1))

# Put your code here!
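
If you'd like to double-check `sample_sd` before using it, the standard library's `statistics.stdev` uses the same $N - 1$ denominator, so the two should agree up to rounding. A minimal sketch follows; the function is repeated so the snippet runs on its own, and the seed and the sample size of $10$ are arbitrary choices.

```python
import random
import statistics
from math import sqrt

def sample_sd(data):
    # Same formula as the cell above, repeated so this snippet is self-contained
    N = len(data)
    mean = sum(data) / N
    return sqrt(sum([(d - mean)**2 for d in data]) / (N - 1))

random.seed(0)  # arbitrary seed, only so the check is repeatable
data = [random.gauss(0, 1) for _ in range(10)]

# The two values should differ only by floating-point rounding
diff = abs(sample_sd(data) - statistics.stdev(data))
print(diff < 1e-9)
```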

In [3]:
# Put your code for Question 1 here.

*Put your answer to Question 1 here!*

In [4]:
# Put your code for Question 2 here.

*Put your answer to Question 2 here!*

In [5]:
# Put your code for Question 3 here.

*Put your answer to Question 3 here!*

<h2> Submitting this to Gradescope </h2>

Once you've finished modifying your notebook and answering the questions, you'll need to submit it to Gradescope along with your other homework. To do this, generate a PDF file by clicking `File -> Save and Export Notebook as... -> PDF`. Then upload that PDF to Gradescope and submit it to the assignment `Jupyter 8 - Confidence Intervals`. As always, if you have any questions or run into any issues, you can
* ask during discussion,
* email your TA or instructor,
* or bring them to student hours!