<h1> Exploring Hypothesis Testing </h1>

In this notebook, we're going to explore what hypothesis testing actually tells us. The goal of a hypothesis test is to choose between two competing explanations of how the data were generated, and any such choice comes with some probability of error. In a **Type 1** error, we reject the null hypothesis even though it is true, because ordinary random variation in sampling happens to produce a test statistic far from what the null hypothesis predicts.

<h2> Forming a decision rule </h2>

Suppose we're running a hypothesis test for the mean at a 95% confidence level. We want to choose between two hypotheses for a distribution $X$ with known variance $1$:
* $H_0 : \text{ mean } = \mu$
* $H_1 : \text{ mean } \ne \mu.$

If our data are normally distributed and we draw $n$ samples, the corresponding decision rule is to reject the null hypothesis if the observed sample mean $\overline{X}_n$ is too far from $\mu$ (here $\sigma = 1$ is the known standard deviation):
$$\text{ Reject $H_0$ if } |\overline{X}_n - \mu| > 1.960 \frac{\sigma}{\sqrt{n}}.$$
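For intuition on where the constant $1.960$ comes from (the standard argument, assuming $H_0$ is true and the data are normal): the standardized sample mean is standard normal, so the rule above rejects a true null hypothesis, i.e. commits a Type 1 error, about 5% of the time. Here $\Phi$ denotes the standard normal CDF.

$$Z = \frac{\overline{X}_n - \mu}{\sigma/\sqrt{n}} \sim N(0, 1), \qquad P\left(|\overline{X}_n - \mu| > 1.960\,\frac{\sigma}{\sqrt{n}}\right) = P(|Z| > 1.960) = 2\bigl(1 - \Phi(1.960)\bigr) \approx 2(1 - 0.975) = 0.05.$$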

We can generate normally distributed data with variance $1$ using `numpy.random.normal()`, which defaults to mean $0$ and standard deviation (hence variance) $1$. Suppose our null hypothesis (which is true in this case!) is that $\mu = 0$. If we take $20$ samples from this population, then $1.960/\sqrt{20} \approx 0.438$, so our decision rule is to reject the null hypothesis if $|\overline{X}_{20}| > 0.438.$
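As a minimal sketch of this decision rule applied to a single sample (assuming $\mu = 0$, $\sigma = 1$ and $n = 20$ as above; the names `threshold` and `reject_null` are hypothetical, chosen only for illustration), the check looks like this:

```python
from math import sqrt

import numpy

# Assumed setup from the text: H0 says mu = 0, sigma = 1 is known, n = 20.
mu, sigma, n = 0.0, 1.0, 20
threshold = 1.960 * sigma / sqrt(n)  # 1.960 / sqrt(20), about 0.438


def reject_null(x_bar: float) -> bool:
    """Two-sided rule: reject H0 when |x_bar - mu| exceeds the threshold."""
    return bool(abs(x_bar - mu) > threshold)


# One sample of size 20 from N(0, 1); numpy.random.normal defaults to loc=0.0, scale=1.0.
sample = numpy.random.normal(size=n)
x_bar = sample.mean()
print(f"threshold = {threshold:.3f}, sample mean = {x_bar:.3f}, reject H0: {reject_null(x_bar)}")
```

Because the rule is two-sided, a sample mean of $-0.5$ leads to rejection just as $0.5$ does, while $-0.3$ does not.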

<h2> Questions </h2>

**Question 1:** Generate 10,000 samples of $\overline{X}_{20}$. How many of these samples would lead to rejecting the null hypothesis? Briefly explain why this result is close to what you'd expect.

```python
# The packages we'll need
from math import sqrt, log
from random import random
import numpy

# Put your code for question 1 here.
```

*Put your answer for question 1 here!*

**Question 2:** The distribution $Y \sim \operatorname{Exp}(1)$ also has variance $1$. Since its mean is $1$, the corresponding decision rule would be to reject the null hypothesis that $\mu = 1$ if our sample average is more than $1.438$ or less than $0.562$. Generate 10,000 samples of $\overline{Y}_{20}$. How many of these samples would lead to rejecting the null hypothesis using our decision rule? Briefly explain why the confidence level of this process is no longer exactly the 95% it was in Question 1.
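The bounds $0.562$ and $1.438$ come from the same rule with $\mu = 1$ and $\sigma = 1$. A quick arithmetic check, as a sketch (variable names are illustrative only):

```python
from math import sqrt

# Exp(1) has mean 1 and standard deviation 1; n = 20 as in Question 1.
mu, sigma, n = 1.0, 1.0, 20
half_width = 1.960 * sigma / sqrt(n)
lower, upper = mu - half_width, mu + half_width
print(f"reject H0 if the sample mean is below {lower:.3f} or above {upper:.3f}")
```

This prints `reject H0 if the sample mean is below 0.562 or above 1.438`.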

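```python
# Finish the code for question 2 here.

def exponential_random():
    # Inverse-transform sampling: if U is uniform on (0, 1], then -log(U) ~ Exp(1).
    # random() returns values in [0.0, 1.0), so 1.0 - random() is never 0 and log is safe.
    return -log(1.0 - random())
```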
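To sanity-check the claim that $\operatorname{Exp}(1)$ has mean and variance $1$, and that inverse-transform sampling produces it, one can draw many values and compare. This is a self-contained sketch; `sample_exp1` is a hypothetical stand-in that mirrors `exponential_random()`, and the seed and sample size are arbitrary choices.

```python
from math import fsum, log
from random import random, seed


def sample_exp1() -> float:
    """Hypothetical stand-in for exponential_random(): inverse-transform sampling for Exp(1)."""
    # 1.0 - random() lies in (0, 1], so log() is never called on 0.
    return -log(1.0 - random())


seed(0)  # fixed seed so the check is reproducible
draws = [sample_exp1() for _ in range(100_000)]
m = fsum(draws) / len(draws)
v = fsum((d - m) ** 2 for d in draws) / len(draws)
print(f"sample mean = {m:.3f}, sample variance = {v:.3f}")  # both should be close to 1
```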

<h2> Submitting this to Gradescope </h2>

Once you've finished modifying your notebook and answering the questions, submit it to Gradescope along with your other homework. To do this, generate a PDF by clicking `File -> Save and Export Notebook as... -> PDF`. Then upload that PDF to Gradescope and submit it to the assignment `Jupyter 7 - Hypothesis Testing`. As always, if you have any questions or run into any issues, you can
* ask during discussion,
* email your TA or instructor,
* or bring them to student hours!