# <img style="float: left; padding-right: 10px; width: 45px" src="https://raw.githubusercontent.com/Harvard-IACS/2018-CS109A/master/content/styles/iacs.png"> CS109A Introduction to Data Science: 

## Homework 8: AB Testing


**Harvard University**<br/>
**Fall 2019**<br/>
**Instructors**: Pavlos Protopapas, Kevin Rader and Chris Tanner

<hr style="height:2pt">



In [43]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import statsmodels as sm
import scipy as sp
import scipy.stats  # load the stats submodule so sp.stats and scipy.stats both resolve
import math

### Problem 1: Amazon expenditures

Amazon is considering updating its app.  It decides to perform an AB test to determine whether the new version of the app will lead to more expenditures per customer.  Unbeknownst to the customers, each one is randomly assigned to either the `old` or `new` version of the app and interacts with that version for an entire month.  Total expenditure is measured at the end of the month.

The data set ``amazon.csv`` contains a subset of data collected for this AB test.  The variables measured are:

- **expenditure**: the amount spent by the customer that month, in dollars
- **treatment**: either 'new' or 'old' representing the version of the app
- **prime**: a binary indicator for whether or not the customer is an Amazon Prime member.

**Note**: do not do any transformations on the data for this problem.

**1.1** Perform a formal hypothesis test to determine whether the new app has a different average monthly expenditure per customer than the old version of the app.  Report the test statistic and the conclusion from the test in the separate quiz for this assignment.  

**1.2** Perform a formal hypothesis test to determine whether the new app has a different average monthly expenditure per customer than the old version of the app within the Prime customers subgroup.  Report the test statistic in the separate quiz for this assignment.

**1.3** Perform a formal hypothesis test to determine whether the new app has a different average monthly expenditure per customer than the old version of the app within the non-Prime customers subgroup.  Report the test statistic in the separate quiz for this assignment.

**1.4** Provide one glaring way to improve this study based on the results seen above.  This explanation only needs 1 or 2 sentences of justification.
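
The three tests in **1.1** to **1.3** share one shape: split `expenditure` by `treatment`, optionally after filtering on `prime`, and compare the two means. The sketch below is one possible setup, not necessarily the test the course expects. It assumes Welch's two-sample t-test is acceptable, uses a synthetic stand-in for `amazon.csv` (the values are made up; only the column names come from the description above), and the helper name `welch_by_treatment` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for amazon.csv: hypothetical values, documented column names
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "expenditure": rng.gamma(shape=2.0, scale=50.0, size=n),
    "treatment": rng.choice(["old", "new"], size=n),
    "prime": rng.integers(0, 2, size=n),
})

def welch_by_treatment(data):
    """Two-sided Welch t-test of mean expenditure, new vs. old (hypothetical helper)."""
    new = data.loc[data["treatment"] == "new", "expenditure"]
    old = data.loc[data["treatment"] == "old", "expenditure"]
    return stats.ttest_ind(new, old, equal_var=False)

overall = welch_by_treatment(df)                     # all customers
prime_only = welch_by_treatment(df[df["prime"] == 1])  # Prime subgroup
non_prime = welch_by_treatment(df[df["prime"] == 0])   # non-Prime subgroup

print(overall.statistic, overall.pvalue)
```

With the real file, `df = pd.read_csv("amazon.csv")` would replace the synthetic frame, assuming the columns are named as listed.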

---

### Problem 2: Obama 2008 Data (click-through rate)

In this problem, we will step you through analyzing the data we saw in lecture 23 regarding the Obama 2008 data set.

**2.1** Perform a formal hypothesis test to determine whether there is a difference in sign-up rates across the 4 button choices.  Report the test statistic and the conclusion from the test in the separate quiz for this assignment.  The sign-up rates were:

- `sign up`: $5851/77858 = 0.07515$ 
- `sign up now`: $5660/77151 = 0.07336$ 
- `join us now`: $5915/77644 = 0.07618$
- `learn more`: $6927/77729 = 0.08912$

**2.2** Perform a formal hypothesis test to determine whether the ``learn more`` button has a different true sign-up rate than the default ``sign up`` button that was originally used.  Report the test statistic in the separate quiz for this assignment.


In [51]:
# data
n1 = 77729
n2 = 77151
n3 = 77644
n4 = 77858

s1 = 6927
s2 = 5660
s3 = 5915
s4 = 5851

p1 = s1/n1
p2 = s2/n2
p3 = s3/n3
p4 = s4/n4

In [52]:
# question 2.1
# chi-square test of homogeneity on the 2 x 4 table of sign-ups vs. non-sign-ups
from scipy.stats import chi2_contingency

table = [[s1, s2, s3, s4], [n1 - s1, n2 - s2, n3 - s3, n4 - s4]]
chi2, p, dof, expected = chi2_contingency(table)
message = "Test Statistic: {}\np-value: {}\nDegrees of Freedom: {}\n"
print(message.format(chi2, p, dof))
print(expected)

Test Statistic: 167.20012877877562
p-value: 5.117834726303442e-36
Degrees of Freedom: 3

[[ 6098.72459421  6053.37391666  6092.05537692  6108.84611221]
 [71630.27540579 71097.62608334 71551.94462308 71749.15388779]]
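
As a cross-check on the output above, the statistic can be rebuilt from the observed counts: each expected count is (row total × column total) / grand total, and the statistic sums (O − E)²/E over all eight cells. A minimal sketch, with array names chosen here for illustration:

```python
import numpy as np

# Observed counts, columns ordered as in the data cell:
# learn more, sign up now, join us now, sign up
signups = np.array([6927, 5660, 5915, 5851])
visitors = np.array([77729, 77151, 77644, 77858])
observed = np.vstack([signups, visitors - signups])  # 2 x 4 table

# Expected counts under equal sign-up rates across buttons
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic and its degrees of freedom
chi2_manual = ((observed - expected) ** 2 / expected).sum()
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(chi2_manual, dof)
```

The manual value should agree with the reported test statistic because no continuity correction applies to a table with 3 degrees of freedom.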


In [58]:
# question 2.2
# 2 sample z test for proportions

p1 = s1/n1
p4 = s4/n4

p_total = (s1 + s4)/(n1 + n4)

z_stat = (p1-p4)/(math.sqrt(p_total*(1-p_total)*((1/n1) + (1/n4))))
print(z_stat)

pvalue = 2*(1-sp.stats.norm.cdf(abs(z_stat)))  # two-sided; abs() keeps this valid when z_stat is negative
print(pvalue)

10.033329217953517
0.0
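
The printed p-value of `0.0` is a floating-point artifact: for a z statistic this large, `norm.cdf` rounds to exactly 1, so `1 - cdf` underflows. A small sketch of the same pooled test using the survival function `norm.sf`, which keeps the tail probability representable (the variable names below are illustrative, and the counts are copied from the data cell):

```python
import math
from scipy import stats

# learn more vs. sign up (counts copied from the data cell)
s_learn, n_learn = 6927, 77729
s_signup, n_signup = 5851, 77858

p_learn = s_learn / n_learn
p_signup = s_signup / n_signup
p_pooled = (s_learn + s_signup) / (n_learn + n_signup)

# Pooled two-sample z statistic for a difference in proportions
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_learn + 1 / n_signup))
z = (p_learn - p_signup) / se

p_cdf = 2 * (1 - stats.norm.cdf(abs(z)))  # underflows to 0.0
p_sf = 2 * stats.norm.sf(abs(z))          # tiny but nonzero

print(z, p_cdf, p_sf)
```

The conclusion is the same either way; the survival function only matters when the actual magnitude of the p-value needs to be reported.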


---

### Problem 3: Multiple Comparisons Simulation

As mentioned in class, a sequential randomized design is prone to the issue of multiple comparisons.  This problem steps you through demonstrating this issue and correcting for it in two different ways.  Imagine the following AB test with a binary outcome and 2 treatment groups: after a burn-in period of 1200 subjects (600 in each arm), the appropriate $z$-test for proportions is performed to compare the two groups after every 200 additional subjects until 3000 total subjects are enrolled (so at $n=1200,1400,...,3000$ with exactly half in each group).  If either treatment is deemed statistically significantly better than the other (at a two-sided $\alpha=0.05$ level), then the study is stopped. 

Assume that the null hypothesis is true (there is no actual effect of treatment) and the rate of success is truly 20% in both groups (you can think of the response variable as the click-through rate and the treatment groups represent two different website layouts).

**3.1** Determine the overall Type I error under these conditions when each test is performed at the $\alpha=0.05$ level.  Perform 2,000 simulation replicates.  Report this overall Type I error rate in the separate quiz.

**3.2** Determine a single $\alpha$ such that the overall Type I error will be 0.05.  Report this adjusted $\alpha$ rate in the separate quiz.

**3.3** An alternative approach to adjusting $\alpha$ for multiple comparisons is to vary the $\alpha$ of each test as the study progresses.  One common approach to doing this is to increase the $\alpha$ by a specific amount each time a test is run.  So for example, the first test could be run at $\alpha_1=0.001$, the second at $\alpha_2=0.002$, the third at $\alpha_3=0.003$, $\dots$, the tenth at $\alpha_{10}=0.010$ (these are not necessarily the values needed to fix the overall Type I error rate at 0.05).  Provide one advantage of each of the two approaches to adjusting $\alpha$: the single adjusted $\alpha$ from **3.2** and the varying $\alpha$ described here. These can be short 1-2 sentence explanations for each.
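
One way to approach **3.2**, sketched under assumptions rather than as the intended solution: under the null, a replicate commits a Type I error exactly when its largest $|z|$ across the 10 looks exceeds the critical value. The empirical 95th percentile of that maximum therefore gives a common critical value, which converts back to a single per-test $\alpha$. The seed and all variable names below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(109)   # arbitrary seed (assumption)
p, reps = 0.2, 2000
looks = np.arange(600, 1501, 100)  # per-arm n at each look (total n = 1200, ..., 3000)

# Cumulative successes per arm at each look, shape (reps, 10)
x1 = rng.binomial(1, p, size=(reps, looks[-1])).cumsum(axis=1)[:, looks - 1]
x2 = rng.binomial(1, p, size=(reps, looks[-1])).cumsum(axis=1)[:, looks - 1]

# Pooled two-sample z statistic at every look
pooled = (x1 + x2) / (2 * looks)
se = np.sqrt(pooled * (1 - pooled) * 2 / looks)
z = (x1 - x2) / looks / se
max_abs_z = np.abs(z).max(axis=1)

# Sanity check: overall Type I error when every look uses alpha = 0.05
overall_type1 = np.mean(max_abs_z > stats.norm.ppf(0.975))

# Common critical value that caps the overall error near 0.05, and its alpha
crit = np.quantile(max_abs_z, 0.95)
alpha_adj = 2 * stats.norm.sf(crit)

print(overall_type1, crit, alpha_adj)
```

Because the estimate comes from 2,000 replicates, both numbers carry simulation noise and will shift slightly with a different seed.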



In [36]:
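# question 3.1
from statsmodels.stats.proportion import proportions_ztest

def sim(rng, p=0.2, alpha=0.05, n_start=600, n_end=1500, step=100):
    """Run one sequential study under the null; return True if any look rejects."""
    x1 = rng.binomial(1, p, size=n_end)  # arm 1 outcomes
    x2 = rng.binomial(1, p, size=n_end)  # arm 2 outcomes
    for n in range(n_start, n_end + 1, step):  # per-arm n: 600, 700, ..., 1500
        _, pval = proportions_ztest([x1[:n].sum(), x2[:n].sum()], [n, n])
        if pval < alpha:
            return True  # study stops early, which is a Type I error here
    return False

rng = np.random.default_rng(109)  # seed chosen arbitrarily for reproducibility
type1_error = np.mean([sim(rng) for _ in range(2000)])
print(type1_error)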

In [None]:
# question 3.2

In [None]:
# question 3.3