# Introduction
In this case study, you'll apply what you've learned about confidence intervals and hypothesis testing to help a company decide whether to launch two new features on their website. To do this, you'll analyze results from A/B testing, a valuable and widely practiced method in industry.

# A/B Testing
A/B tests are used to test changes on a web page by running an experiment where a **control group** sees the old version, while the **experiment group** sees the new version. A **metric** is then chosen to measure the level of engagement from users in each group. These results are then used to judge whether one version is more effective than the other. A/B testing is very much like hypothesis testing with the following hypotheses:

* **Null Hypothesis**: The new version is no better, or even worse, than the old version
* **Alternative Hypothesis**: The new version is better than the old version

If we fail to reject the null hypothesis, the results would suggest keeping the old version. If we reject the null hypothesis, the results would suggest launching the change. These tests can be used for a wide variety of changes, from large feature additions to small adjustments in color, to see which change improves your metric the most.

A/B testing also has its drawbacks. It can help you compare two options, but it can't tell you about an option you haven’t considered. It can also produce biased results when tested on existing users, due to factors like change aversion and the novelty effect.

* **Change Aversion**: Existing users may give an unfair advantage to the old version, simply because they are unhappy with change, even if it’s ultimately for the better.
* **Novelty Effect**: Existing users may give an unfair advantage to the new version, because they’re excited or drawn to the change, even if it isn’t any better in the long run.

You'll learn more about factors like these later.



# Business Example
In this case study, you’ll analyze A/B test results for Audacity. Here's the customer funnel for typical new users on their site:

**View home page > Explore courses > View course overview page > Enroll in course > Complete course**

Audacity loses users as they go down the stages of this funnel, with only a few making it to the end. To increase student engagement, Audacity is performing A/B tests to try out changes that will hopefully increase conversion rates from one stage to the next.

We’ll analyze test results for two changes they have in mind, and then make a recommendation on whether they should launch each change.

# Experiment I
The first change Audacity wants to try is on their homepage. They hope that this new, more engaging design will increase the number of users that explore their courses, that is, move on to the second stage of the funnel.

The metric we will use is the click through rate for the Explore Courses button on the home page. **Click through rate (CTR)** is often defined as the number of clicks divided by the number of views. Since Audacity uses cookies, we can identify unique users and make sure we don't count the same one multiple times. For this experiment, we'll define our click through rate as:

**CTR: # clicks by unique users / # views by unique users**
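As a minimal sketch of this definition, the snippet below computes the CTR per group from a tiny, made-up event log. The `events` list, its `(user_id, group, action)` layout, and the `click_through_rate` helper are assumptions for illustration only, not Audacity's actual data or schema.

```python
# Hypothetical event log: (user_id, group, action) tuples.
events = [
    (1, "control", "view"),
    (1, "control", "view"),       # repeat view by the same user
    (1, "control", "click"),
    (2, "control", "view"),
    (3, "experiment", "view"),
    (3, "experiment", "click"),
    (3, "experiment", "click"),   # repeat click by the same user
    (4, "experiment", "view"),
    (5, "experiment", "view"),
]


def click_through_rate(events, group):
    """CTR = # unique users who clicked / # unique users who viewed."""
    # Sets keep each user_id once, so repeat views and clicks are not double counted.
    viewers = {uid for uid, g, action in events if g == group and action == "view"}
    clickers = {uid for uid, g, action in events if g == group and action == "click"}
    if not viewers:
        raise ValueError(f"no views recorded for group {group!r}")
    return len(clickers) / len(viewers)


control_ctr = click_through_rate(events, "control")
experiment_ctr = click_through_rate(events, "experiment")
obs_diff = experiment_ctr - control_ctr

print(f"control CTR:    {control_ctr:.3f}")
print(f"experiment CTR: {experiment_ctr:.3f}")
print(f"difference:     {obs_diff:.3f}")
```

Collecting user IDs into sets is what enforces the "unique users" part of the definition.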

Now that we have our metric, let's set up our null and alternative hypotheses:

$H_0: \text{CTR}_{\text{new}} \leq \text{CTR}_{\text{old}}$

$H_1: \text{CTR}_{\text{new}} > \text{CTR}_{\text{old}}$

Our alternative hypothesis is what we want to prove to be true: in this case, that the new homepage design has a higher click through rate than the old homepage design. The null hypothesis is what we assume to be true before analyzing data, which is that the new homepage design has a click through rate that is less than or equal to that of the old homepage design. As you’ve seen before, we can rearrange our hypotheses to look like this:

$H_0: \text{CTR}_{\text{new}} -\text{CTR}_{\text{old}} \leq 0$

$H_1: \text{CTR}_{\text{new}} -\text{CTR}_{\text{old}} > 0$

# Metric - Click Through Rate

Let's recap the steps we took to analyze the results of this A/B test.

1. We computed the **observed difference** in the metric, click through rate, between the control and experiment groups.
1. We simulated the **sampling distribution** for the difference in proportions (or difference in click through rates).
1. We used this sampling distribution to simulate the **distribution under the null** hypothesis, by drawing values from a normal distribution centered at 0 with the same standard deviation and the same number of values as the sampling distribution.
1. We computed the **p-value** by finding the proportion of values in the null distribution that were greater than our observed difference.
1. We used this p-value to determine the **statistical significance** of our observed difference.
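The steps above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the `control` and `experiment` lists are made-up per-user outcomes, each group is resampled separately with replacement, and the number of bootstrap iterations and the 0.05 significance level are arbitrary choices.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical per-user outcomes: 1 = clicked, 0 = viewed without clicking.
control = [1] * 28 + [0] * 72      # 28% CTR
experiment = [1] * 31 + [0] * 69   # 31% CTR


def ctr(outcomes):
    return sum(outcomes) / len(outcomes)


# Step 1: observed difference in click through rates.
obs_diff = ctr(experiment) - ctr(control)

# Step 2: bootstrap the sampling distribution of the difference.
n_boot = 2000
diffs = []
for _ in range(n_boot):
    c_sample = random.choices(control, k=len(control))
    e_sample = random.choices(experiment, k=len(experiment))
    diffs.append(ctr(e_sample) - ctr(c_sample))

# Step 3: null distribution, normal and centered at 0,
# with the same spread and size as the sampling distribution.
spread = statistics.stdev(diffs)
null_vals = [random.gauss(0, spread) for _ in range(len(diffs))]

# Step 4: one-sided p-value, the share of null values above the observed difference.
p_value = sum(v > obs_diff for v in null_vals) / len(null_vals)

# Step 5: compare against the chosen significance level (assumed here).
alpha = 0.05
print(f"observed difference: {obs_diff:.3f}")
print(f"p-value: {p_value:.3f}, significant: {p_value < alpha}")
```

Because the alternative hypothesis is one-sided (the new version is better), only null values greater than the observed difference count toward the p-value.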

# Experiment II

The second change Audacity is A/B testing is a more career-focused description on a course overview page. They hope that this change may encourage more users to enroll in and complete this course. In this experiment, we’re going to analyze the following metrics:

1. **Enrollment Rate**: Click through rate for the Enroll button on the course overview page
1. **Average Reading Duration**: Average number of seconds spent on the course overview page
1. **Average Classroom Time**: Average number of days spent in the classroom for students enrolled in the course
1. **Completion Rate**: Course completion rate for students enrolled in the course

First, let's determine if the difference observed for each metric is statistically significant individually.

# Metric - Average Reading Duration
Again, let's recap the steps we took to analyze the results of this A/B test.

1. We computed the **observed difference** in the metric, average reading duration, between the control and experiment groups.
1. We simulated the **sampling distribution** for the difference in means (or difference in average reading durations).
1. We used this sampling distribution to simulate the **distribution under the null** hypothesis, by drawing values from a normal distribution centered at 0 with the same standard deviation and the same number of values as the sampling distribution.
1. We computed the **p-value** by finding the proportion of values in the null distribution that were greater than our observed difference.
1. We used this p-value to determine the **statistical significance** of our observed difference.
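For a metric that is a mean rather than a proportion, the same recipe can be wrapped in a small helper. The sketch below is an assumption-laden illustration: `one_sided_p_value` is a hypothetical function name, the reading durations are randomly generated rather than taken from the experiment, and each group is resampled separately.

```python
import random
import statistics


def one_sided_p_value(control, experiment, n_boot=2000, seed=0):
    """Return (observed difference in means, one-sided p-value)."""
    rng = random.Random(seed)
    obs_diff = statistics.fmean(experiment) - statistics.fmean(control)

    # Bootstrap the sampling distribution of the difference in means.
    diffs = []
    for _ in range(n_boot):
        c_sample = rng.choices(control, k=len(control))
        e_sample = rng.choices(experiment, k=len(experiment))
        diffs.append(statistics.fmean(e_sample) - statistics.fmean(c_sample))

    # Null distribution: normal, centered at 0, same spread and size.
    spread = statistics.stdev(diffs)
    null_vals = [rng.gauss(0, spread) for _ in range(n_boot)]

    # Share of null values greater than the observed difference.
    p_value = sum(v > obs_diff for v in null_vals) / n_boot
    return obs_diff, p_value


# Hypothetical reading durations in seconds (synthetic, for illustration only).
data_rng = random.Random(1)
control = [data_rng.gauss(110, 25) for _ in range(200)]
experiment = [data_rng.gauss(130, 25) for _ in range(200)]

obs_diff, p_value = one_sided_p_value(control, experiment)
print(f"observed difference: {obs_diff:.1f} seconds, p-value: {p_value:.4f}")
```

The only change from the click through rate analysis is the statistic being compared: a mean of durations instead of a proportion of clicks.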

# Analyzing Multiple Metrics
The more metrics you evaluate, the more likely you are to observe significant differences just by chance - similar to what you saw in previous lessons with multiple tests. Luckily, this [multiple comparisons](https://en.wikipedia.org/wiki/Multiple_comparisons_problem) problem can be handled in several ways.

# Drawing Conclusions
The Bonferroni correction, which divides the significance level by the number of metrics tested, is one common fix, but it is too conservative when we expect correlation among metrics. We can better approach this problem with more sophisticated methods, such as the [closed testing procedure](http://en.wikipedia.org/wiki/Closed_testing_procedure), [Boole-Bonferroni bound](http://en.wikipedia.org/wiki/Bonferroni_bound), and the [Holm-Bonferroni method](http://en.wikipedia.org/wiki/Holm%E2%80%93Bonferroni_method). These are less conservative than the plain Bonferroni correction, and some of them can take this correlation into account.

If you do choose to use a less conservative method, just make sure the assumptions of that method are truly met in your situation, and that you're not just trying to [cheat on a p-value](http://freakonometrics.hypotheses.org/19817). Choosing a poorly suited test just to get significant results will only lead to misguided decisions that harm your company's performance in the long run.
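To make the difference between corrections concrete, here is a minimal sketch comparing the Bonferroni correction with the Holm-Bonferroni method on four metrics. The p-values are invented for illustration and are not results from this case study; the function names and the 0.05 significance level are likewise assumptions.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject each null whose p-value is at most alpha / m."""
    m = len(p_values)
    return {name: p <= alpha / m for name, p in p_values.items()}


def holm(p_values, alpha=0.05):
    """Holm-Bonferroni step-down: compare the k-th smallest p-value
    (k = 1..m) with alpha / (m - k + 1) and stop at the first failure."""
    m = len(p_values)
    ordered = sorted(p_values.items(), key=lambda item: item[1])
    rejected = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):  # rank is 0-based
        if p <= alpha / (m - rank):
            rejected[name] = True
        else:
            break  # all larger p-values are retained as well
    return rejected


# Made-up p-values, one per metric (NOT the case study's results).
p_values = {
    "enrollment_rate": 0.060,
    "average_reading_duration": 0.001,
    "average_classroom_time": 0.015,
    "completion_rate": 0.020,
}

bonferroni_result = bonferroni(p_values)
holm_result = holm(p_values)

for name in p_values:
    print(f"{name}: bonferroni={bonferroni_result[name]}, holm={holm_result[name]}")
```

With these invented numbers, Bonferroni flags only average reading duration as significant, while Holm-Bonferroni also flags average classroom time and completion rate, which shows how the step-down thresholds are less conservative at the same overall significance level.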