In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("pre07.ipynb")

# Lab 7: Great British Bake Off (A/B Test)

**Helpful Resources:**
- [Python Reference](http://www.cs.williams.edu/~cs104/auto/python-library-ref.html): Cheat sheet of helpful library methods.

**Readings:**
* [Ch 12.1. A/B Testing](https://inferentialthinking.com/chapters/12/1/AB_Testing.html)

Please complete this notebook by filling in the cells provided. Before you begin, execute the following cell to set up the notebook by importing some helpful libraries. Each time you start your server, you will need to execute this cell again. For all problems that ask for written explanations, you **must** provide your answer in the designated space. **Moreover, throughout this prelab and all future ones, please be sure not to reassign variables!** For example, if you use `max_temperature` in your answer to one question, do not reassign it later on. Otherwise, you will fail tests that you thought you were passing previously!

In [None]:
# Run this cell to set up the notebook.

# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *

# These lines make plots look nice and hide some messy Python warnings.
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
import warnings
warnings.simplefilter('ignore', FutureWarning)
warnings.simplefilter('ignore', np.VisibleDeprecationWarning)

## 1. A/B Testing (10 pts)


A/B testing is a form of hypothesis testing that allows you to make comparisons between two distributions. We may also refer to an A/B test as a permutation test.

You'll almost never be explicitly asked to perform an A/B test. Make sure you can identify situations where the test is appropriate and know how to correctly implement each step. Oftentimes, we use an A/B test to determine whether or not two samples came from the same underlying distribution.

#### Part 1.1 (5 pts)


 The following statements are the steps of an A/B hypothesis test presented in a *random order*:

1. Choose a test statistic (typically the difference in means between two categories)

2. Shuffle the labels of the original sample, find your simulated test statistic, and repeat many times

3. Find the value of the observed test statistic

4. Calculate the p-value based on your observed and simulated test statistics

5. Define a null and an alternative hypothesis

6. Use the p-value and p-value cutoff to draw a conclusion about the null hypothesis

Assign `ab_test_order` to an array of integers that contains the correct order of an A/B test, where the first item of the array is the first step of an A/B test and the last item of the array is the last step of an A/B test.



In [None]:
ab_test_order = ...

In [None]:
grader.check("q1.1")

#### Part 1.2 (5 pts)


Which of the following two statements is correct? Assign your answer to `null_hyp_answer` below:

1. If the null hypothesis of an A/B test is correct, the order of labels affects the difference in means between the two groups.
2. If the null hypothesis of an A/B test is correct, the order of labels does not affect the difference in means between the two groups.

In [None]:
null_hyp_answer = ...

In [None]:
grader.check("q1.2")

## 2. The Great British Bake Off (40 pts)


>"The Great British Bake Off (often abbreviated to Bake Off or GBBO) is a British television baking competition, produced by Love Productions, in which a group of amateur bakers compete against each other in a series of rounds, attempting to impress a group of judges with their baking skills" [Wikipedia](https://en.wikipedia.org/wiki/The_Great_British_Bake_Off)

For every week of the competition, the judges assign one contestant the title "Star Baker". Ultimately, one winner is crowned every season. Using this information, we would like to investigate how winning Star Baker awards affects the odds of winning a season of the show. Answering that question requires more than just comparing Star Baker award rates for season winners and losers because, without a randomized controlled experiment, we may be misled by confounding factors or reverse causation. This leads us to...

### Running an Experiment

We are going to run the following hypothesis test to determine the association between winning and number of Star Baker awards. The population we are examining is every contestant from seasons 2 through 11 of GBBO. We are going to use the following null and alternative hypotheses:

**Null hypothesis:** The distribution of Star Baker awards between contestants who won their season and contestants who did not win their season is the same.

**Alternative hypothesis:** Contestants who win their season of the show will win more Star Baker awards on average.

Our alternative hypothesis is related to our suspicion that contestants who win more Star Baker awards are more skilled, so they are more likely to win the season.

The `bakers` table below describes the number of Star Baker awards each contestant won and whether or not they won their season (`1` if they won, `0` if they did not win). The data was manually aggregated from Wikipedia for seasons 2-11 of the show. We randomized the order of the rows so as not to spoil the outcome of the show.

In [None]:
bakers = Table.read_table("star_bakers.csv")
bakers.show(3)

#### Part 2.1 (5 pts)


Create a new table called `means` that contains the mean number of star baker awards for bakers who did not win (`won==0`) and bakers who did win (`won==1`). The table should have the column names `won` and `star baker awards mean`.


In [None]:
means = ...
means

In [None]:
grader.check("q2.1")

#### Part 2.2 (5 pts)


Using the original `bakers` table, visualize the distribution of Star Baker awards for winners and non-winners. You should use the bins we provided.

Hint: You will want to use the `group` argument of `tbl.hist`. To produce several overlaid histograms based on the unique values in a given column, we can do something like `tbl.hist(..., group=<col_name>, bins=...)`!



In [None]:
useful_bins = np.arange(0, 7)
...

#### Part 2.3 (5 pts)


We want to figure out whether Star Baker awards are distributed differently between winners and non-winners. As the test statistic, we can use the difference between the mean number of awards in our two groups (winners minus non-winners). Differences close to 0 support the null hypothesis that there is no distinction between winners and non-winners. Large positive differences support the alternative hypothesis that winners earn more Star Baker awards on average.
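
As a minimal sketch of this test statistic on made-up data, the difference of group means can be computed with plain NumPy. The arrays `values` and `is_winner` are hypothetical and are not taken from `bakers`; the lab itself works with `Table` methods instead.

```python
import numpy as np

# Hypothetical toy data: one value per individual and a 0/1 group label.
values = np.array([0, 1, 0, 2, 3, 1])
is_winner = np.array([0, 0, 0, 1, 1, 0])

# Mean of each group, selected with a boolean mask.
mean_winners = values[is_winner == 1].mean()   # (2 + 3) / 2 = 2.5
mean_others = values[is_winner == 0].mean()    # (0 + 1 + 0 + 1) / 4 = 0.5

# Winners minus non-winners, so a positive value points toward the alternative.
toy_difference = mean_winners - mean_others
print(toy_difference)  # 2.0
```

The order of subtraction matters: with winners first, a one-sided alternative of "winners earn more" corresponds to large positive values.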

#### Part 2.4 (5 pts)


 Set `observed_difference` to the observed test statistic using the `means` table. 



In [None]:
observed_difference = ...
observed_difference

In [None]:
grader.check("q2.3")

#### Part 2.5 (5 pts)


We'll now generalize the computation you did above. Given a table like `bakers`, a group column `group_col`, and a value column `values_col`, write a function that calculates the appropriate test statistic.

*Hint:* Make sure that you are taking the directionality of our alternative hypothesis into account.

In [None]:
def find_test_stat(table, group_col, values_col):
    """Takes: the table, the column indicating which of two groups
    each row belongs, and the column containing the values.
    Returns: Difference of the means of the two groups."""
    means_table = table.group(group_col, np.mean)
    means = ...
    return means.item(1) - means.item(0)

find_test_stat(bakers, "won", "star baker awards")

In [None]:
grader.check("q2.5")

When we run a simulation for A/B testing, we resample by **shuffling the labels** of the original sample. If the null hypothesis is true and the star baker award distributions are the same, we expect that the difference in mean star baker awards will not change much when the `"won"` labels are shuffled.
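
To illustrate what shuffling the labels does, here is a small sketch on made-up arrays using plain NumPy rather than the `Table` methods the lab asks for. The names `values`, `labels`, and `rng` are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

# Hypothetical toy data, not the `bakers` table.
values = np.array([0, 1, 0, 2, 3, 1])
labels = np.array([0, 0, 0, 1, 1, 0])

# Shuffle the labels without replacement; the values stay where they are.
shuffled_labels = rng.permutation(labels)

# The group sizes are unchanged: still four 0s and two 1s.
print(np.sort(shuffled_labels))  # [0 0 0 0 1 1]

# One simulated test statistic under the null hypothesis.
simulated_difference = (values[shuffled_labels == 1].mean()
                        - values[shuffled_labels == 0].mean())
print(simulated_difference)
```

Because only the pairing between labels and values changes, each shuffle gives one draw of the statistic from a world in which group membership is unrelated to the values.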

#### Part 2.6 (5 pts)


Write a function `simulate_and_test_statistic` to compute one trial of our A/B test. Your function should run a simulation and return a test statistic. The first step is to create a new version of the given `table` in which the `labels_col` column is shuffled. Recall that, given a table `table`, you can obtain a shuffled version of a column via `table.sample(with_replacement=False).column(labels_col)`. The second step is to use your `find_test_stat` function to compute the test statistic on the shuffled table.



In [None]:
def simulate_and_test_statistic(table, labels_col, values_col):
    shuffled_labels = ...
    shuffled_table = ...
    ...

simulate_and_test_statistic(bakers, "won", "star baker awards")

In [None]:
grader.check("q2.6")

#### Part 2.7 (5 pts)


 Simulate 5000 trials of our A/B test and store the test statistics in an array called `differences`.



In [None]:
# This cell might take a couple seconds to run
differences = make_array()

repetitions = 5000
for i in np.arange(repetitions):
    new_difference = ...
    differences = np.append(differences, new_difference)

differences

In [None]:
grader.check("q2.7")

Run the cell below to view a histogram of your simulated test statistics plotted with your observed test statistic.

In [None]:
# matplotlib.pyplot was imported as `plt` in the setup cell.
Table().with_column('Difference Between Group Means', differences).hist(bins=20)
plt.scatter(observed_difference, 0, color='red', s=30, zorder=2)
plt.ylim(-0.1, 1.35);

#### Part 2.8 (5 pts)


Find the p-value for your test and assign it to `empirical_p`. In this case, small differences in the means support the null hypothesis, and large differences support the alternative. So, to compute the p-value, we'll need to count the number of values in our `differences` distribution that are **at least as large** as our `observed_difference`. The p-value is the number of such values in `differences` divided by `repetitions`, the size of `differences`.
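
The counting step can be sketched on a small made-up array. The names `toy_differences` and `toy_observed` are hypothetical, and the comparison below uses `>=`, following the usual "at least as extreme" convention; this is an illustration only, not the lab's data.

```python
import numpy as np

# Hypothetical simulated statistics and a hypothetical observed statistic.
toy_differences = np.array([-0.5, 0.0, 0.25, 1.0, 1.5, -1.0, 0.75, 2.0])
toy_observed = 1.0

# Count the simulated values at least as large as the observed one
# (here 1.0, 1.5 and 2.0) ...
count_extreme = np.count_nonzero(toy_differences >= toy_observed)

# ... and divide by the number of simulations.
toy_p = count_extreme / len(toy_differences)
print(toy_p)  # 0.375
```

The comparison inside `np.count_nonzero` produces a boolean array, and counting its `True` entries gives the number of simulations that are at least as extreme as the observation.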



In [None]:
empirical_p = np.count_nonzero(...) / ...

empirical_p

In [None]:
grader.check("q2.8")

#### Part 2.9 (5 pts)


Using a 5% p-value cutoff, draw a conclusion about the null and alternative hypotheses. Store in the `conclusion` variable which of the following two conclusions is supported by your analysis:

1. Winning star baker awards does not increase the likelihood of being the season winner.
2. Winning star baker awards increases the likelihood of being the season winner.


In [None]:
conclusion = ...

In [None]:
grader.check("q2.9")

## 3. You're Done!


**Important submission information:** Follow these steps to submit your work:
* Run the tests and verify that they pass as you expect. 
* Choose **Save Notebook** from the **File** menu.
* **Run the final cell** and click the link below to download the zip file. 

Once you have downloaded that file, go to [Gradescope](https://www.gradescope.com/) and submit the zip file to the corresponding assignment. The name of this assignment is "Prelab 7 Autograder". **Be sure your work is saved before running the last cell!**

Once you have submitted, your Gradescope assignment should show you passing all the tests you passed in your assignment notebook.

---

To double-check your work, the cell below will rerun all of the autograder tests.

In [None]:
grader.check_all()

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.export(pdf=False)