<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Challenge: A/B Testing Hypothesis Tests - Solutions

_Authors: Alexander Egorenkov (DC)_

---

### Scenario

You are a data science team working for a web-based company that is planning to roll out a new website design. Each user in a random sample was shown one of two competing designs, and that user's total purchase amount (if any) was recorded.

Your task is to determine which of the two designs yields higher total purchases and whether the difference is statistically significant.

In [None]:
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd
from scipy import stats
import seaborn as sns

%matplotlib inline
np.random.seed(42)

In [None]:
## generate some data and randomize

# some people bought nothing, the others bought 
# with some distribution
data1 = [0] * 50
data1.extend(np.random.normal(14, 4, 150))
np.random.shuffle(data1)

# the second design hooked fewer people,
# but those who were hooked bought more stuff
data2 = [0] * 100
data2.extend(np.random.normal(20, 5, 100))
np.random.shuffle(data2)

# make a DataFrame
df = pd.DataFrame()
df["A"] = data1
df["B"] = data2

df.head()

#### Plot out the distributions of group A and group B.

- Plot a histogram of ONLY the group A column, and ONLY the group B column.

In [None]:
# let's plot the data first
plt.hist(df["A"], bins=50, label="A", color=['darkblue']);
plt.ylabel("A counts");
plt.xlabel("Total Purchase");

In [None]:
# exercise: make the same plot for data set B


#### Make a box plot of the two groups using Seaborn:

In [None]:
# the heading asks for Seaborn, so use sns.boxplot on the wide-form DataFrame
sns.boxplot(data=df);
plt.ylabel("Total Purchase");

#### Are our data sets (approximately) normal? Use what we learned in the previous lesson to decide:

In [None]:
# work out the exercises here
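
One possible way to check normality is the Shapiro-Wilk test from SciPy. This is only a sketch: the previous lesson may have used a different method, and the `sample` array below is a hypothetical stand-in built the same way as column A, not the notebook's `df`.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-in for column A: 50 non-buyers plus 150 purchases.
rng = np.random.default_rng(0)
sample = np.concatenate([np.zeros(50), rng.normal(14, 4, 150)])

# Shapiro-Wilk tests the null hypothesis that the data are normally distributed.
stat_all, p_all = stats.shapiro(sample)

# The buyers alone (nonzero totals) may look much closer to normal.
buyers = sample[sample != 0]
stat_buyers, p_buyers = stats.shapiro(buyers)

print(f"all users:   W={stat_all:.3f}, p={p_all:.3g}")
print(f"buyers only: W={stat_buyers:.3f}, p={p_buyers:.3g}")
```

A small p value for the full sample suggests that the spike at zero makes the data non-normal, which matters when choosing a test.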


<a id="statistical-tests"></a>
### Statistical Tests

There are a few good statistical tests for A/B testing:
* [ANOVA](https://en.wikipedia.org/wiki/Analysis_of_variance)
* [Welch's t-test](https://en.wikipedia.org/wiki/Welch's_t-test)
* [Mann-Whitney test](https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test)

**Each test makes various assumptions:**
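* ANOVA assumes the residuals are normally distributed and the groups have equal variances.
* Welch's t-test assumes normally distributed data but does not require equal variances, which makes it more reliable when the groups differ in variance or sample size.
* The Mann-Whitney test makes no assumption about the shape of the distributions. Its p value is often computed from a normal approximation, which is usually considered reliable with at least 20 data points in each set, and it has somewhat less statistical power than a t-test when the data really are normal.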

Typically you need to choose the most appropriate test. Tests that make more assumptions have more statistical power (they detect a real difference more readily, giving smaller p values) but can be misleading on data sets that don't satisfy those assumptions.
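
To see how the choice of test changes the answer, the three tests can be run side by side. The sketch below uses hypothetical samples `a` and `b`, generated with the same shape as the notebook's groups but a different random generator, so the numbers will not match `df` exactly.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Hypothetical zero-inflated samples shaped like the notebook's two groups.
a = np.concatenate([np.zeros(50), rng.normal(14, 4, 150)])
b = np.concatenate([np.zeros(100), rng.normal(20, 5, 100)])

# One-way ANOVA: assumes normal residuals and equal variances.
f_stat, p_anova = stats.f_oneway(a, b)

# Welch's t-test: assumes normality but not equal variances.
t_stat, p_welch = stats.ttest_ind(a, b, equal_var=False)

# Mann-Whitney U test: rank-based, no normality assumption.
u_stat, p_mwu = stats.mannwhitneyu(a, b, alternative="two-sided")

for name, p in [("ANOVA", p_anova), ("Welch", p_welch), ("Mann-Whitney", p_mwu)]:
    print(f"{name:>12}: p = {p:.4f}")
```

If the p values disagree noticeably, that is a hint that at least one test's assumptions do not hold for the data.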

#### Which test is most appropriate for our data?

In [None]:
# work out the exercises here
# [it may be useful to start with df.describe()]


In statistics, **one-way analysis of variance** (abbreviated one-way **ANOVA**) is a technique used to compare the means of two or more samples (using the **F distribution**). The **ANOVA** tests the **null hypothesis** (the default position that there is no relationship) that samples in two or more groups are drawn from populations with the same mean values. Typically, however, the **one-way ANOVA** is used to test for differences among at least three groups, as the two-group case can be covered by a **t-test**. When there are only two means to compare, the pooled-variance **t-test** and the **F-test** are equivalent: the F statistic equals the square of the t statistic, and the p values are identical.
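
The two-group equivalence can be checked numerically. This is a minimal sketch on hypothetical normal samples `x` and `y`; it uses the equal-variance (Student) t-test, because the equivalence does not hold for Welch's version.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Two hypothetical groups with the same spread and slightly different means.
x = rng.normal(10, 2, 40)
y = rng.normal(11, 2, 40)

# One-way ANOVA on two groups.
f_stat, p_f = stats.f_oneway(x, y)

# Student's t-test with pooled variance (equal_var=True).
t_stat, p_t = stats.ttest_ind(x, y, equal_var=True)

print("F   =", f_stat)
print("t^2 =", t_stat ** 2)
print("p (ANOVA) =", p_f)
print("p (t-test) =", p_t)
```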

> **Note:**
> - One-way ANOVA: tests for a difference in population means across groups defined by one characteristic (factor).
> - Two-way ANOVA: tests for differences in population means across groups defined by two characteristics (factors).

#### Use the Mann-Whitney test on our data.

- Look up the function in SciPy [here](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.mannwhitneyu.html).
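- Statistic: the Mann-Whitney U statistic, returned as a float. In SciPy 1.7 and later this is the U statistic for the first sample (`x`). Older releases returned min(U for x, U for y) when `alternative=None` (a deprecated default kept for backward compatibility) and U for `y` otherwise.
- P value: a float, one-sided or two-sided depending on the choice of `alternative`. Older releases always assumed an asymptotic normal distribution; newer releases can also compute an exact p value (see the `method` parameter).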

In [None]:
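# state the alternative explicitly: older SciPy versions default to a one-sided p value
u, p = stats.mannwhitneyu(df["A"], df["B"], alternative="two-sided")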

print(u)
print(p)

The Mann–Whitney U test (also called the Mann–Whitney–Wilcoxon (MWW) test, the Wilcoxon rank-sum test, or the Wilcoxon–Mann–Whitney test) is a nonparametric test of the null hypothesis that a randomly selected value from one sample is equally likely to be less than or greater than a randomly selected value from a second sample.

Unlike the t-test, it does not require the assumption of normal distributions. It is also nearly as efficient as the t-test on normal distributions.
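
The U statistic has a direct reading in terms of that null hypothesis: it counts the pairs in which a value from one sample exceeds a value from the other. The sketch below computes it by hand on hypothetical samples `x` and `y` instead of calling `stats.mannwhitneyu`, since which sample's U is returned by that function depends on the SciPy version.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Two hypothetical samples; y is shifted upward relative to x.
x = rng.normal(0.0, 1.0, 30)
y = rng.normal(0.5, 1.0, 40)

# Count the (x, y) pairs where x is larger; ties count as one half.
diff = x[:, None] - y[None, :]
u_pairs = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)

# Same quantity from ranks: rank sum of x minus its smallest possible value.
ranks = stats.rankdata(np.concatenate([x, y]))
n_x = len(x)
u_ranks = ranks[:n_x].sum() - n_x * (n_x + 1) / 2

# Dividing by the number of pairs estimates P(random x > random y).
prob_x_greater = u_pairs / (len(x) * len(y))

print("U from pairs:", u_pairs)
print("U from ranks:", u_ranks)
print("estimated P(x > y):", prob_x_greater)
```

Under the null hypothesis this estimated probability should be close to 0.5; values far from 0.5 indicate that one group tends to produce larger values.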

<a id="interpret-your-results"></a>
### Interpret Your Results
* Is there a significant difference in mean total purchases between the two designs?
* Which design do you recommend? Why? 
* Write two sentences explaining your results and your recommendation.