In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("hw05.ipynb")

# Homework 5 – Hypothesis Testing


You must submit this assignment to Gradescope by the on-time deadline. **We strongly encourage you to plan to submit your work to Gradescope several hours, and ideally days, before the stated deadline.** This way, you will have ample time to reach out to staff for support if you encounter difficulties with submission. While course staff are happy to help guide you with submitting your assignment ahead of the deadline, we will not respond to last-minute requests for assistance (TAs need to sleep, after all!).

Please read the instructions carefully when you are submitting your work to Gradescope.



In [None]:
import pandas as pd
import numpy as np
import io
from pathlib import Path
import os

import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.style.use('fivethirtyeight')

import plotly
import plotly.figure_factory as ff
import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots

pd.options.plotting.backend = "plotly"

In [None]:
# Set random number generator seed
np.random.seed(16)


<div class="alert alert-block alert-danger" markdown="1">

You cannot use `for`-loops in this homework.
</div>




<br/><br/>
<hr style="border: 5px solid #8a8c8c;" />
<hr style="border: 1px solid #ffcd00;" />

##  Hypothesis Testing

In this section, you'll practice using the terms and structure of hypothesis testing.

The first step is always to define what you're looking at, create your hypotheses, and set a level of significance (i.e. a p-value cutoff). Once you've done that, you can find a p-value.

If all of these words are foreign, look at the Lecture 9 notebook and the readings, and don't forget to think about the real-world meaning of these terms!  The following example describes a real-world scenario, which should help keep it easy to interpret.

### Question 1 – Bake Sale 🧁

At MTU, students are looking to buy treats at a bake sale in the Library. There is a pop-up bake stand selling cookies and cupcakes to students. Last Saturday, this stand sold 250 cookies to MTU students. After eating the cookies, 15 students complained that their cookies were burnt, leaving a bitter taste in their mouths. In response to the student dissatisfaction, the stand claims that 96% of their cookies are baked perfectly without any burning. You think this seems unlikely and decide to investigate.

First, select a significance level for your investigation. You don't need to turn this in anywhere. Then, answer the following questions.

<br> 

--- 

#### Question 1a 


Return your answer(s) to the following question **as a list** in the variable `q1a_result`.

What are reasonable choices for the **null hypothesis** for your investigation? Select all that apply.
1. The stand sells cookies that are approximately 4% burnt. 
2. The stand sells cookies that are 96% perfectly baked.
3. The stand sells cookies that are less than 96% perfectly baked. 
4. The stand sells cookies that are at least 4% burnt.


In [None]:
q1a_result = ...

In [None]:
grader.check("q1a")

<br> 

---

#### Question 1b

<br>

#### `cookies_p_value`

Complete the implementation of the function `cookies_p_value`, which takes in an integer `N` and returns the estimated p-value of your investigation upon simulating the null hypothesis `N` times. (The p-value is an estimate of the true theoretical p-value of your test since it relies on simulation.)

***Note***: When thinking about the null distribution for this problem, recall the "Coin Flipping" example in the hypothesis testing review notebook, which it most closely resembles. The difference is that the coin here is not fair: instead of 0.5, use the probability of burnt / not-burnt under the null hypothesis.

***Note***: Plot the null distribution and your observed statistic to check your work. 
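As a rough illustration of the coin-flipping analogy, here is a minimal sketch that simulates a biased coin without any `for`-loops. The scenario, the numbers, and the variable names are all made up for this example and are not the homework's values; it also uses `np.random.default_rng`, which is one of several ways to draw random numbers in NumPy.

```python
import numpy as np

# Hypothetical scenario (NOT the homework's numbers): a coin is claimed to
# land heads 70% of the time, but we observed only 60 heads in 100 flips.
rng = np.random.default_rng(0)   # seeded generator for reproducibility
n_flips = 100
p_null = 0.7                     # probability of heads under the null
observed_heads = 60

# Simulate the null N times in a single call: each draw is the number of
# heads in n_flips flips of a coin with P(heads) = p_null.
N = 10_000
simulated_heads = rng.binomial(n_flips, p_null, size=N)

# Estimated p-value: the share of simulations at least as extreme as the
# observation, in the direction of the alternative (here, fewer heads).
p_value = np.mean(simulated_heads <= observed_heads)
print(p_value)
```

The comparison direction (`<=` here) depends on the alternative hypothesis, so it needs to be chosen to match the investigation at hand.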

In [None]:
def cookies_p_value(N): 
    # Input: int N 
    # Returns estimated p-value upon simulating the null hypothesis N times
    
    ...
    return None 
cookies_p_value(1000)

In [None]:
grader.check("q1b")

Now that we've gotten our feet wet with hypothesis testing, let's take a closer look at how to choose null and alternative hypotheses and test statistics.

### Question 2 – Tires 🚗

A tire manufacturer, TritonTire, claims that their tires are so good, they will bring a Toyota Highlander from 60 MPH to a complete stop in under 106 feet, 95% of the time.

Now, you own a Toyota Highlander equipped with TritonTire tires, and you decide to test this claim. You take your car to an empty Vons parking lot, speed up to exactly 60 MPH, hit the brakes, and measure the stopping distance. As illegal as it is, you repeat this process 50 times and find that **you stopped in under 106 feet only 47 of the 50 times**.

Livid, you call TritonTire and say that their claim is false. They say, no, that you were just unlucky: your experiment is consistent with their claim. But they didn't realize that they are dealing with a *data scientist*.

To settle the matter, you decide to unleash the power of the hypothesis test. The following three subparts ask you to answer a total of four select-all multiple choice questions.



<br>

---

#### Question 2a: `car_null_hypothesis` and `car_alt_hypothesis`

You will set up a hypothesis test in order to test your suspicion that the tires are actually worse than claimed. Which of the following are valid null and alternative hypotheses for this hypothesis test?

1. The tires will stop your car in under 106 feet exactly 95% of the time.
2. The tires will stop your car in under 106 feet less than 95% of the time.
3. The tires will stop your car in under 106 feet greater than 95% of the time.
4. The tires will stop your car in more than 106 feet exactly 5% of the time.
5. The tires will stop your car in more than 106 feet less than 5% of the time.
6. The tires will stop your car in more than 106 feet greater than 5% of the time.

Set the variable `car_null_hypothesis` to a list of integers corresponding to the valid null hypotheses above.

Set the variable, `car_alt_hypothesis`, to a list of integers, corresponding to the valid alternative hypotheses above given your observation.



In [None]:
car_null_hypothesis = ...
car_alt_hypothesis = ...

In [None]:
grader.check("q2a")

<br>

--- 

#### Question 2b: `car_test_stat`

Which of the following are valid test statistics for our question?

1. The number of times the car stopped in under 106 feet in 50 attempts.
2. The average number of feet the car took to come to a complete stop in 50 attempts.
3. The number of attempts it took before the car stopped in under 95 feet.
4. The proportion of attempts in which the car stopped in under 106 feet in 50 attempts.

Set the variable, `car_test_stat`, to a list of integers, corresponding to the valid test statistics above.



In [None]:
car_test_stat = ...

In [None]:
grader.check("q2b")

<br>

---

#### Question 2c: `car_p_value`

The p-value is the probability, under the assumption the null hypothesis is true, of observing a test statistic **equal to our observed statistic, or more extreme in the direction of the alternative hypothesis**.

Why don't we just look at the probability of observing a test statistic equal to our observed statistic? That is, why is the "more extreme in the direction of the alternative hypothesis" part necessary?

1. Because our observed test statistic isn't extreme.
2. Because our null hypothesis isn't suggesting equality.
3. Because the probability of finding our observed test statistic equals the probability of finding something more extreme.
4. Because if we run more and more trials, the probability of observing any particular test statistic gets closer and closer to zero.

Set the variable `car_p_value` to the correct reason as an **integer** (not a list).
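To make the p-value definition above concrete, the following sketch computes an empirical p-value from an array of simulated statistics. The helper `empirical_p_value` and its `alternative` parameter are hypothetical names introduced only for this illustration, and the tiny array stands in for a real simulation.

```python
import numpy as np

def empirical_p_value(simulated, observed, alternative="less"):
    """Share of simulated statistics at least as extreme as `observed`."""
    simulated = np.asarray(simulated)
    if alternative == "less":
        # Extreme means "as small as the observation, or smaller".
        return np.mean(simulated <= observed)
    if alternative == "greater":
        # Extreme means "as large as the observation, or larger".
        return np.mean(simulated >= observed)
    raise ValueError("alternative must be 'less' or 'greater'")

# A made-up set of eight simulated statistics.
sims = np.array([3, 5, 5, 6, 7, 8, 8, 9])
print(empirical_p_value(sims, 5, "less"))     # 3 of 8 values are <= 5
print(empirical_p_value(sims, 8, "greater"))  # 3 of 8 values are >= 8
```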

In [None]:
car_p_value = ...

In [None]:
grader.check("q2c")

<br> 

### Question 3 – Superheroes 🦸

In the previous two questions, we ran hypothesis tests that didn't require us to look at stored data. In this next question, we'll return to the `heroes` DataFrame from Lab 5, which is read in from the file `data/superheroes.csv`.

Our goal in this section will be to answer the question:

> Are there significantly **more** "good" blond-haired, blue-eyed characters than the general pool of characters?

To answer this question, we will conduct a hypothesis test. You choose the following null hypothesis:

> The proportion of "good" characters among blond-haired, blue-eyed characters is equal to the proportion of "good" characters in the sample population from the current `heroes` DataFrame.

and alternative hypothesis: 

> The proportion of "good" characters among blond-haired, blue-eyed characters is greater than the proportion of "good" characters in the overall population.

To proceed with the hypothesis test, we will need to determine the test statistics for our test.



<br>

--- 

#### Question 3a `superheroes_test_stat`

Which of the following are valid test statistics for our question?

1. The difference in proportions for "good" characters among blond-haired, blue-eyed characters and "good" characters in the overall population. 
2. The number of "good" characters that are blond-haired, blue-eyed.
3. The proportion of blond-haired, blue-eyed characters among all "good" characters.
4. The absolute difference in proportions for "good" characters among blond-haired, blue-eyed characters and "good" characters in the overall population.

Assign the variable `superheroes_test_stat`, to a list of integers, corresponding to all the valid test statistics above.



In [None]:
superheroes_test_stat = ...

In [None]:
grader.check("q3a")

<br> 

---

#### Question 3b `bhbe_col`

Regardless of your choice for the above question, we will use the test statistic stated below to complete the implementations of the following functions:

> The proportion of "good" characters among blond-haired, blue-eyed characters.

To start, complete the implementation of the function `bhbe_col`, which takes in a DataFrame like `heroes` and returns a Boolean Series that contains `True` for characters that have **both** blond hair and blue eyes, and `False` for all other characters. 

***Note***: If a character's hair color contains the word `'blond'`, uppercase or lowercase, we consider their hair to be blond for the purposes of this question. Similarly, if a character's eye color contains the word `'blue'`, uppercase or lowercase, we consider their eye color to be blue for the purposes of this question.
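One way to do case-insensitive substring matching in pandas is sketched below on a toy DataFrame. The `pets` data and its column names are invented for this example and are not the columns of `heroes`; the sketch also assumes that missing values should count as non-matches.

```python
import pandas as pd

# Toy data with hypothetical column names (not the `heroes` columns).
pets = pd.DataFrame({
    "coat": ["Brown", "golden brown", "White", None],
    "size": ["Small", "LARGE", "small", "Small"],
})

# Lowercase first so that 'Brown' and 'brown' both match; na=False treats
# missing values as non-matches instead of propagating NaN.
has_brown = pets["coat"].str.lower().str.contains("brown", na=False)
is_small = pets["size"].str.lower().str.contains("small", na=False)

# Element-wise AND keeps only rows where both conditions hold.
both = has_brown & is_small
print(both.tolist())  # [True, False, False, False]
```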

Fix a significance level (i.e. p-value cutoff) of 1%.



In [None]:
def bhbe_col(df): 
    # Input a DataFrame like "heroes"
    # Returns a Boolean Series
    #  True for characters that have both blond hair and blue eyes 
    #  False for all other characters

    return None

superheroes_fp = Path('data') / 'superheroes.csv'
heroes = pd.read_csv(superheroes_fp, index_col=0)
bhbe_out = bhbe_col(heroes)

In [None]:
grader.check("q3b")

<br>

---

#### Question 3c `superheroes_observed_stat`

Complete the implementation of the function `superheroes_observed_stat`, which takes in a DataFrame like `heroes` and returns the observed test statistic.
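A proportion within a subgroup can be computed by filtering and then averaging a Boolean Series. The sketch below shows the idea on invented data; the `pets` DataFrame and its columns are hypothetical and only illustrate the pattern.

```python
import pandas as pd

# Toy data with hypothetical column names.
pets = pd.DataFrame({
    "is_small": [True, True, False, True, False],
    "temper": ["calm", "feisty", "calm", "calm", "calm"],
})

# Restrict to the subgroup, then take the mean of a Boolean Series:
# True counts as 1 and False as 0, so the mean is a proportion.
subgroup = pets[pets["is_small"]]
prop_calm = (subgroup["temper"] == "calm").mean()
print(prop_calm)  # 2 of the 3 small pets are calm
```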



In [None]:
def superheroes_observed_stat(df): 
    # Input a DataFrame like "heroes"
    # Returns the observed test statistic

    return None

obs_stat_out = superheroes_observed_stat(heroes)
obs_stat_out

In [None]:
grader.check("q3c")

<br>

---

#### Question 3d `simulate_bhbe_null` 
Complete the implementation of the function `simulate_bhbe_null`, which takes in a DataFrame like `heroes` and a positive integer `N` and returns an array of length `N`, where each element is a simulated test statistic according to the null hypothesis.

***Hint***: Like in `superheroes_observed_stat`, you'll need to use both `bhbe_col` and information in the `heroes` DataFrame to complete your simulation. Remember that you cannot use `for`-loops in this question.
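Avoiding `for`-loops usually means drawing every simulated sample in one call. The sketch below does this for a made-up setting in which each member of a subgroup has some trait with the overall population's probability; `p_overall`, `n`, and `N` are illustrative values, not quantities from `heroes`.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: under the null, each of the n members of a subgroup
# is "calm" with the overall population proportion p_overall.
p_overall = 0.6
n = 40        # subgroup size
N = 5_000     # number of simulated statistics

# Draw an N-by-n grid of Booleans in one call, then average across each
# row to get N simulated proportions.
draws = rng.random((N, n)) < p_overall
simulated_props = draws.mean(axis=1)

print(simulated_props.shape)  # (5000,)
```

Drawing counts with `rng.binomial(n, p_overall, size=N)` and dividing by `n` should give statistics with the same distribution while using less memory.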



In [None]:
def simulate_bhbe_null(df, N): 
    # Input a DataFrame like "heroes" and an integer N
    # Returns array of length N with each element a simulated test statistic

    return None

simulate_bhbe_out = simulate_bhbe_null(heroes, 10)
simulate_bhbe_out

In [None]:
grader.check("q3d")

<br>

--- 

#### Question 3e `superheroes_p_value` 
Complete the implementation of the function `superheroes_p_value`, which takes in a DataFrame like `heroes` and returns a list where:
* The first element is the p-value for the hypothesis test, using 10,000 simulations.
* The second element is `'Reject'` if you reject the null hypothesis and `'Fail to reject'` if you fail to reject the null hypothesis, at the 1% significance level.
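The decision step can be sketched as a comparison between the p-value and the cutoff. The helper `decide` is a hypothetical name, and the choice to reject when the p-value is at or below the cutoff (rather than strictly below) is an assumption; conventions differ, so check the course's definition.

```python
def decide(p_value, alpha=0.01):
    """Return the test decision for a given p-value and cutoff."""
    # Assumed convention: reject when the p-value is at or below alpha.
    return "Reject" if p_value <= alpha else "Fail to reject"

print([0.003, decide(0.003)])  # [0.003, 'Reject']
print([0.2, decide(0.2)])      # [0.2, 'Fail to reject']
```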

In [None]:
def superheroes_p_value(df): 
    # Input a DataFrame like "heroes" 
    # Returns list where the first element is the p-value using 10,000 sims 
    #  the 2nd element is a string to 'Reject' or 'Fail to reject' the null hyp

    return None 
    

pval_out = superheroes_p_value(heroes)
pval_out

In [None]:
grader.check("q3e")

## Congratulations! You're done with HW 5!

### Submission Instructions

Below, you will see a cell. Running this cell will automatically generate a zip file with your autograded answers.  If you run into any issues when running this cell, feel free to check the [Debugging Guide](https://mtu.instructure.com/courses/1527249/pages/debugging-guide).

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**


In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export()