In [None]:
# Initialize Otter
import otter
grader = otter.Notebook("lab07.ipynb")

# Lab 07: Helicopter Drop

Welcome to Lab 07.

In [None]:
# Run this cell to set up the notebook, but please don't change it.

# These lines import the Numpy and Datascience modules.
import numpy as np
from datascience import *

# These lines do some fancy plotting
import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')

## 1. A/B Testing

A/B testing is a form of hypothesis testing that lets you compare the distributions of a numerical variable in two groups. It is typically carried out as a permutation test.

As with other forms of hypothesis testing, you will rarely be asked to perform an A/B test by name. Make sure you can recognize situations where this type of test is appropriate and know how to carry out each step correctly. Remember, the goal of an A/B test is to determine whether two samples came from the same underlying distribution or from different distributions.

### Question 1.1.

The following statements are the unordered steps of an A/B hypothesis test:

1. Choose a test statistic (typically the difference in means between two categories)

2. Shuffle the labels of the original sample, find your simulated test statistic, and repeat many times

3. Find the value of the observed test statistic

4. Calculate the p-value based on your observed and simulated test statistics

5. Define a null and an alternative model

6. Use the p-value and p-value cutoff to draw a conclusion about the null hypothesis

Make an array called `ab_test_order` that lists the step numbers above in the correct order of an A/B test, where the first item of the array is the first step and the last item of the array is the last step.


In [None]:
ab_test_order = ...

In [None]:
grader.check("q1_1")

<!-- BEGIN QUESTION -->

### Question 1.2.

If the null hypothesis of an A/B test is true, what value would you expect, on average, for the difference between the means of the two categories?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### Question 1.3

Why do we shuffle labels in an A/B test? 


_Type your answer here, replacing this text._

<!-- END QUESTION -->

## 2. Paper Helicopters

The Paper Helicopter Experiment (https://www.paperhelicopterexperiment.com/) provides different templates for constructing paper helicopters that can be dropped. The designs can be modified to create helicopters with different dimensions for the wings, body, weighting, and more. In this lab, you'll investigate whether the rotor length and the amount of weight attached to the helicopter result in a difference in the time it takes to fall from a third-floor stairwell. To conduct your analysis, you'll use an A/B test.

### The Data
This data was collected by a group of NCSSM-Online students during an online weekend held on October 2, 2021. Five groups of students dropped 143 paper helicopters in total. Each group was assigned one of 4 unique configurations:

1. The Default: Full length (long) rotors, unweighted body
2. The Short: Short rotors, unweighted body
3. The Heavy: Long rotors, weighted body
4. The Short Heavy: Short rotors, weighted body

The following features were recorded for each helicopter dropped:
* `Rotor`: Long or Short (string). Indicates if the helicopter had full length or shortened rotors.
* `Weight`: Weighted or Unweighted (string). Indicates if the helicopter had a weighted (3 paperclips) or unweighted body.
* `Team`: 1, 2, 3, 4, or 5 (int). Indicates which team dropped the helicopter.
* `Location`: Side A or Side B (string). Indicates which side of the building the helicopters were dropped from.
* `Obstruction`: True or False (Boolean). Indicates if the helicopter hit the stairwell or another object on the way down.
* `Time`: (float). The time, in seconds, that it took the helicopter to hit the ground after being released.

Run the following cell to load the data in as the table named `helicopters_raw_data`.

In [None]:
helicopters_raw_data = Table.read_table('helicopters.csv')
helicopters_raw_data

You'll notice that, because of variation in helicopter construction, wind and weather, and the randomness of the universe, not every drop time was identical, even among helicopters with the same configuration. Let's explore the data a bit to see how the drop times varied.

### Question 2.1.

Let's start by cleaning the data a little. This dataset contains several measurements in which the helicopter hit the stairwell, a nearby building, or some other obstruction that affected the fall time. These results should be discarded, since they might introduce unexpected variation in the fall times that would undermine our ability to draw a causal inference from the analysis that follows.

Create a new table called `helicopters` that only contains the rows of `helicopters_raw_data` that correspond to a helicopter that did **not** hit an obstruction during the fall.
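Filtering like this boils down to keeping only the rows where a Boolean condition holds. As a minimal sketch, the code below uses plain NumPy boolean masks on hypothetical toy arrays (`obstruction`, `time`, and `rotor` are made-up stand-ins, not the lab data); in the lab itself you would do the same thing with the `Table` method `where` rather than raw arrays.

```python
import numpy as np

# Hypothetical toy columns standing in for the real table (NOT the lab data).
obstruction = np.array([False, True, False, False, True])
time = np.array([4.1, 2.7, 5.3, 3.9, 3.0])
rotor = np.array(["Long", "Long", "Short", "Short", "Long"])

# Boolean mask: True for rows that did NOT hit an obstruction.
keep = ~obstruction

# Applying the same mask to every column keeps the rows aligned.
clean_time = time[keep]
clean_rotor = rotor[keep]

print(clean_time)   # [4.1 5.3 3.9]
print(clean_rotor)  # ['Long' 'Short' 'Short']
```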

In [None]:
helicopters = ...
helicopters

In [None]:
grader.check("q2_1")

<!-- BEGIN QUESTION -->

### Question 2.2
Suppose you're interested in determining how rotor length might affect fall time. Start by creating a histogram that overlays the distributions of fall times for helicopters with long rotors and helicopters with short rotors. Use the bins stored in `my_bins` and the `group` argument to create the histogram.

In [None]:
my_bins = np.arange(2, 9.5, 0.5)
...

<!-- END QUESTION -->

You can see it's fairly easy to compare these two distributions now that they're on the same scale and set of axes.
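Under the hood, an overlaid histogram counts how many values from each group fall into each bin. As a sketch with hypothetical fall times (not the lab data), the code below calls `np.histogram` with the same `my_bins` edges for both groups; sharing the bin edges is what makes the bars directly comparable. The `group` argument of the lab's `hist` call handles this for you, and the plotted bar heights may be scaled as densities rather than raw counts.

```python
import numpy as np

# Hypothetical fall times in seconds (NOT the lab data).
long_times = np.array([5.2, 6.1, 5.8, 6.4, 7.0])
short_times = np.array([3.1, 3.6, 4.2, 3.9, 4.8])

# Same bin edges as the lab's my_bins: 2.0, 2.5, ..., 9.0 (14 bins).
my_bins = np.arange(2, 9.5, 0.5)

# Count each group's times in the SAME bins so the bars line up.
long_counts, edges = np.histogram(long_times, bins=my_bins)
short_counts, _ = np.histogram(short_times, bins=my_bins)

# Print only the bins that contain at least one drop.
for left, n_long, n_short in zip(edges[:-1], long_counts, short_counts):
    if n_long or n_short:
        print(f"[{left:.1f}, {left + 0.5:.1f}): long={n_long}, short={n_short}")
```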

## 3. The Rotor Question

You might now be wondering whether there is a difference between the fall times of helicopters with long (full-length) and short rotors. The distributions certainly *might* be different from each other, but visual inspection can be deceiving. Use the collected data to make a more statistically rigorous determination.

### Running an experiment

We are going to run a hypothesis test to investigate whether there is an association, or even a causal link, between a helicopter's rotor length and the time it takes to fall to the ground. We will use the following null and alternative hypotheses:

**Null hypothesis**: The distribution of fall times is the same for helicopters with long rotors and helicopters with short rotors; any difference in the samples is due to chance.

**Alternative hypothesis**: Helicopters with long rotors take more time to fall to the ground on average.

Our alternative hypothesis reflects our suspicion that helicopters with long rotors fall more slowly, since their larger rotors provide more air resistance as the helicopter falls to the ground.

<!-- BEGIN QUESTION -->

### Question 3.1.

Why is an A/B test appropriate for this situation? Which group would you use as the "A" group (the control group), and which as the "B" group (the treatment group)?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

### Question 3.2.

We want to figure out whether the distributions of fall times differ for full-length and shortened rotor helicopters. Specifically, we want to test whether fall times were longer for helicopters with longer rotors than for helicopters with shorter rotors.

What should the test statistic be to test this hypothesis? What values of this statistic would support the alternative hypothesis?

_Type your answer here, replacing this text._

<!-- END QUESTION -->

### Question 3.3.

Create a Table named `time_means` that contains the average fall time for each category of helicopter: those with full-length rotors and those with shortened rotors. The Table should have two columns: one indicating the rotor length, and one containing the average fall time for the corresponding group.

**Hint:** Use a combination of `.select()` and `.group()` to generate this Table.
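Grouping and averaging amounts to computing one mean per distinct label. The sketch below does this with NumPy on hypothetical arrays (`rotor` and `time` are toy stand-ins, not the lab data). `np.unique` returns the labels in sorted order, so the order of the means is predictable; we assume `Table.group` also sorts its groups, but check the row order of your own `time_means` table before subtracting.

```python
import numpy as np

# Hypothetical labels and fall times in seconds (NOT the lab data).
rotor = np.array(["Long", "Short", "Long", "Short", "Long", "Short"])
time = np.array([6.0, 4.0, 5.5, 3.5, 6.5, 4.5])

# One mean per distinct label; np.unique returns the labels sorted.
labels = np.unique(rotor)
means = np.array([time[rotor == label].mean() for label in labels])

for label, m in zip(labels, means):
    print(label, m)
# Long 6.0
# Short 4.0
```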

In [None]:
time_means = ...
time_means

In [None]:
grader.check("q3_3")

### Question 3.4.

Calculate the test statistic for our observed data using the `time_means` table. Set `observed_difference` to the value you compute. 


In [None]:
observed_difference = ...
observed_difference

In [None]:
grader.check("q3_4")

### Question 3.5.

To perform an A/B test, we'll need to compute the same statistic for each of the many simulations we'll run. Writing a function that returns the test statistic will save time. Write a function named `find_test_stat` that takes the arguments `table`, `labels_col`, and `values_col` and calculates the test statistic required for the A/B test.

The `table` passed into this function will have the same structure as our original table, though its labels may have been shuffled. `labels_col` will be a string matching the label of the column in `table` that contains the category labels you'll group by. `values_col` will be a string matching the label of the column that contains the values you'll use to compute the test statistic.

Your function should work for any table and any two column labels: it should compute the test statistic required for an A/B test not just for this problem, but for any problem! For example, running `find_test_stat(helicopters, "Rotor", "Time")` should return exactly the test statistic you computed in an earlier question, and running `find_test_stat(helicopters, "Weight", "Time")` should compute the test statistic for the `Time` values grouped by the **"Weight"** column.
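One design question for `find_test_stat` is the order of subtraction. Because it receives only the table and two column labels, it has to pick an order itself, for example the sorted order of the group labels. The hypothetical helper below, `difference_of_means`, is a NumPy sketch that instead takes the two group names explicitly, which makes the direction of the subtraction visible at the call site. It illustrates the statistic; it is not the required `find_test_stat` signature.

```python
import numpy as np

def difference_of_means(labels, values, first, second):
    """Mean of `values` for rows labeled `first` minus the mean for `second`.

    Hypothetical helper for illustration; naming both groups explicitly
    makes the direction of the subtraction impossible to miss.
    """
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    return values[labels == first].mean() - values[labels == second].mean()

# Hypothetical toy data (NOT the lab data).
rotor = np.array(["Long", "Short", "Long", "Short", "Long", "Short"])
time = np.array([6.0, 4.0, 5.5, 3.5, 6.5, 4.5])
print(difference_of_means(rotor, time, "Long", "Short"))  # 2.0
```

Swapping `first` and `second` flips the sign of the result, which is exactly the kind of mistake the directionality note in Question 3.6 warns about.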

In [None]:
def find_test_stat(table, labels_col, values_col):
    ...

find_test_stat(helicopters, "Rotor", "Time")

In [None]:
grader.check("q3_5")

### Question 3.6.

Write a function `simulate_and_test_statistic` to compute one trial of our A/B test. Your function should run one simulation under the null hypothesis and return the simulated test statistic.

**Hint:** You can "shuffle" the labels by using `.sample(with_replacement = False)` on the entire Table, and then select the column that contains the newly shuffled labels. Then, you can either overwrite the existing labels, **or**, extend the table with a new column labeled something similar to "shuffled labels". Just make sure you pass the correct label on to the `find_test_stat` function!

Note: The autograder test here is fairly lenient. If you run into issues with the following questions, revisit your answer to the previous question. Specifically, make sure you account for the direction of our alternative hypothesis, that is, check the order in which you subtract the average times.
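A single simulated trial shuffles the labels and recomputes the statistic. As a sketch, assuming NumPy arrays instead of a `Table`, `rng.permutation` plays the role of `.sample(with_replacement = False)`: it returns the same labels in a random order, so each group keeps its original size. The helper names below are hypothetical.

```python
import numpy as np

def difference_of_means(labels, values, first, second):
    # Mean for group `first` minus mean for group `second`.
    return values[labels == first].mean() - values[labels == second].mean()

def one_shuffled_statistic(labels, values, first, second, rng):
    """One simulated test statistic under the null hypothesis (hypothetical helper).

    The labels are shuffled (a permutation, i.e. sampling without
    replacement), while the values stay in place.
    """
    shuffled = rng.permutation(labels)
    return difference_of_means(shuffled, values, first, second)

rng = np.random.default_rng(2021)  # fixed seed so the run is reproducible
rotor = np.array(["Long", "Short", "Long", "Short", "Long", "Short"])
time = np.array([6.0, 4.0, 5.5, 3.5, 6.5, 4.5])
print(one_shuffled_statistic(rotor, time, "Long", "Short", rng))
```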

In [None]:
def simulate_and_test_statistic(table, labels_col, values_col):
    ...
    
simulate_and_test_statistic(helicopters, "Rotor", "Time")

In [None]:
grader.check("q3_6")

### Question 3.7.

Use `simulate_and_test_statistic` to simulate 5,000 trials of our A/B test and assign an array of the simulated test statistics to `differences`.
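Repeating the trial many times is a simple loop that appends each simulated statistic to an array, mirroring the `make_array()` starting point in the cell below. The sketch uses hypothetical toy data and NumPy's `np.append`; the fixed seed is only there to make the run reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility
# Hypothetical toy data (NOT the lab data).
rotor = np.array(["Long", "Short", "Long", "Short", "Long", "Short"])
time = np.array([6.0, 4.0, 5.5, 3.5, 6.5, 4.5])

def shuffled_difference():
    # One simulated statistic: shuffle labels, then mean(Long) - mean(Short).
    shuffled = rng.permutation(rotor)
    return time[shuffled == "Long"].mean() - time[shuffled == "Short"].mean()

num_trials = 5000
differences = np.array([])
for _ in range(num_trials):
    differences = np.append(differences, shuffled_difference())

print(len(differences))  # 5000
```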

In [None]:
# This cell might take a couple seconds to run
differences = make_array()
...
differences

In [None]:
grader.check("q3_7")

Run the cell below to view a histogram of your simulated test statistics plotted with your observed test statistic. Think about what this might imply about the p-value and if there is sufficient evidence to reject the null hypothesis.

In [None]:
Table().with_column('Difference Between Group Means', differences).hist()
plt.scatter(observed_difference, -0.01, color = 'red', s = 60, marker="^", zorder = 2);

### Question 3.8.

Compute the empirical p-value using the statistics from your simulation and the observed value of the test statistic. Assign it to `empirical_p`.
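The empirical p-value is the proportion of simulated statistics that are at least as extreme as the observed one, in the direction of the alternative hypothesis. A minimal sketch, assuming the statistic is oriented so that large positive values favor the alternative, counts the simulated values greater than or equal to the observed value. The function name and the simulated numbers below are made up for illustration.

```python
import numpy as np

def empirical_p_value(simulated, observed):
    """Fraction of simulated statistics at least as large as `observed`.

    Assumes a one-sided alternative where large values of the statistic
    favor the alternative hypothesis (hypothetical helper).
    """
    simulated = np.asarray(simulated)
    return np.count_nonzero(simulated >= observed) / len(simulated)

# Hypothetical simulated statistics (NOT the lab's results).
simulated = np.array([-0.8, 0.1, 0.4, -0.2, 1.1, 0.0, -1.3, 0.6])
print(empirical_p_value(simulated, 0.5))  # 0.25
```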

In [None]:
empirical_p = ...
empirical_p

In [None]:
grader.check("q3_8")

<!-- BEGIN QUESTION -->

### Question 3.9.

Your p-value should be very small, perhaps even 0! Since this value is less than the standard p-value cutoff of 5 percent, we reject the null hypothesis. Is there enough evidence to claim that the difference in rotor length **causes** the difference in fall time, or does the test only show an association? Explain your answer using both the design of the experiment and the result of the A/B test.

_Type your answer here, replacing this text._

<!-- END QUESTION -->

## 4. Bonus Investigation (Optional, Not graded)

Collect your own helicopter data as part of a class activity. Load the data in the cell below and perform a similar A/B test using *your* data. You should be able to reuse the functions you wrote in this lab to perform the analysis quickly.

In [None]:
# Your analysis could go here if you choose to complete it.


Complete a similar analysis by:

* selecting a category to group your data by (rotor length, color of paper, weight of helicopter, etc.)
* computing the observed value of the test statistic
* simulating 5,000 A/B tests to create a distribution of the test statistic under the assumptions of the null hypothesis
* computing the empirical p-value
* determining whether there is a difference in drop times between the two groups in the selected category.
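The steps above can be bundled into one function. The sketch below, `ab_test`, is a hypothetical NumPy helper (not part of this lab or the `datascience` library) that computes the observed difference of means, simulates 5,000 label shuffles by default, and returns the one-sided empirical p-value. The paper-color data are invented for illustration.

```python
import numpy as np

def ab_test(labels, values, first, second, num_trials=5000, seed=None):
    """Permutation (A/B) test on mean(first) - mean(second).

    One-sided: large values favor the alternative that group `first` has
    the larger mean. Returns (observed statistic, empirical p-value).
    Hypothetical helper for illustration.
    """
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)

    def stat(lbls):
        # Difference of group means for a given labeling.
        return values[lbls == first].mean() - values[lbls == second].mean()

    observed = stat(labels)
    simulated = np.array([stat(rng.permutation(labels)) for _ in range(num_trials)])
    p_value = np.count_nonzero(simulated >= observed) / num_trials
    return observed, p_value

# Hypothetical class data: fall times (seconds) for two paper colors.
color = np.array(["Blue"] * 6 + ["White"] * 6)
times = np.array([4.9, 5.1, 5.4, 4.8, 5.0, 5.3, 4.7, 5.2, 4.9, 5.0, 4.6, 5.1])
observed, p = ab_test(color, times, "Blue", "White", seed=7)
print(observed, p)
```

Compare the returned p-value with your cutoff (for example, 5 percent) to reach a conclusion; swapping `first` and `second` tests the opposite one-sided alternative.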

# Submitting your work
You're done with this assignment! Assignments should be turned in using the following best practices:
1. Save your notebook.
2. Restart the kernel and run all cells up to this one.
3. Run the cell below with the code `grader.export(...)`. This will re-run all the tests. Make sure they are passing as you expect them to.
4. Download the file named `lab07_<date-time-stamp>.zip`, found in the explorer pane on the left side of the screen. **Note**: Clicking the link in this notebook may result in an error; it's best to download the file from the file explorer panel.
5. Upload `lab07_<date-time-stamp>.zip` to the corresponding assignment on Canvas.

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a zip file for you to submit.

In [None]:
grader.export(pdf=False, force_save=True)