## Experiment Design

### 1. Metric Choice

**Invariant metric**
> Number of unique cookies to view the course overview page <br> Number of unique cookies to click the "Start free trial" button <br> Click-through probability (CTR): clicks divided by page views

**Hypothesis**
1. H0: no effect on the fraction of students who leave the free trial <br> H1: the fraction decreases
2. H0: no effect on the fraction of students who pay and complete the course <br> H1: there is an effect

**Evaluation metrics**
> **Gross conversion**: number of user ids that complete checkout and enroll in the free trial, divided by the number of unique cookies that click the "Start free trial" button (for hypothesis 1) <br>**Net conversion**: number of user ids that remain enrolled and make at least one payment, divided by the number of unique cookies that click the "Start free trial" button (for hypothesis 2)<br> **Retention**: number of user ids that remain enrolled, divided by the number of user ids that complete checkout. It is a useful metric to watch, and it is expected to increase because the number of students who complete checkout should decrease.
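
As a quick illustration of these definitions, the sketch below computes the three ratios from raw counts. The function names are hypothetical, and the example call assumes the baseline counts of 660 enrollments and 3200 clicks listed in the next section.

```python
def gross_conversion(enrollments: float, clicks: float) -> float:
    """Enrolled user ids divided by cookies that clicked the button."""
    return enrollments / clicks


def net_conversion(payments: float, clicks: float) -> float:
    """Paying user ids divided by cookies that clicked the button."""
    return payments / clicks


def retention(remaining: float, enrollments: float) -> float:
    """User ids that remain enrolled divided by user ids that completed checkout."""
    return remaining / enrollments


# Baseline counts (assumed from the baseline table): 660 enrollments, 3200 clicks.
baseline_gc = gross_conversion(enrollments=660, clicks=3200)
print(baseline_gc)  # 0.20625
```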

### 2. Measuring Variability

In [37]:
d = {"col_name": ["cookies", "click_cookies", "user_ids", "ctr", "gc", "rr", "nc"], 
     "baseline": [40000, 3200, 660, 0.08, 0.20625, 0.53, 0.109313],
     "dmin": [3000, 240, -50, 0.01, -0.01, 0.01, 0.0075]}

In [38]:
import pandas as pd
b_df = pd.DataFrame(d, index = d['col_name']).drop(columns = "col_name")
b_df

| | baseline | dmin |
|---|---|---|
| cookies | 40000.0 | 3000.0 |
| click_cookies | 3200.0 | 240.0 |
| user_ids | 660.0 | -50.0 |
| ctr | 0.08 | 0.01 |
| gc | 0.20625 | -0.01 |
| rr | 0.53 | 0.01 |
| nc | 0.109313 | 0.0075 |


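For gross conversion and net conversion, the unit of analysis (a cookie that clicks the button) is the same as the unit of diversion, so the analytical standard deviation is good enough. Retention uses enrolled user ids as its denominator, so the analytical estimate below is only an approximation for that metric.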

In [39]:
import numpy as np
b_df['adj_baseline'] = np.nan
adj_factor = b_df.loc['cookies', 'baseline']/5000
for i in ['cookies', 'click_cookies', 'user_ids']:
    b_df.loc[i, 'adj_baseline'] = b_df.loc[i, 'baseline']/adj_factor

Given a sample of 5000 cookies, we can assume that the sample proportions (the evaluation metrics) are approximately normally distributed.
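
Under that normal approximation, the standard error of a proportion `p` estimated from `n` units is `sqrt(p * (1 - p) / n)`. A minimal sketch follows, assuming the baseline values from the table above scaled down to 5000 page views; the helper name `analytic_se` is hypothetical.

```python
import math


def analytic_se(p: float, n: float) -> float:
    """Analytical standard error of a sample proportion p based on n units."""
    return math.sqrt(p * (1 - p) / n)


# Scale the daily baseline counts (40000 page views) down to 5000 page views.
scale = 5000 / 40000
clicks = 3200 * scale        # 400 cookies expected to click
enrollments = 660 * scale    # 82.5 user ids expected to enroll

se_gc = analytic_se(0.20625, clicks)      # gross conversion, per click
se_nc = analytic_se(0.109313, clicks)     # net conversion, per click
se_rr = analytic_se(0.53, enrollments)    # retention, per enrollment
print(round(se_gc, 6), round(se_rr, 6), round(se_nc, 6))
```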

In [51]:
b_df['se'] = np.nan
b_df.loc['gc', 'se'] = np.sqrt(b_df.loc['gc', 'baseline']*(1-b_df.loc['gc', 'baseline'])/400)
b_df.loc['nc', 'se'] = np.sqrt(b_df.loc['nc', 'baseline']*(1-b_df.loc['nc', 'baseline'])/400)
b_df.loc['rr', 'se'] = np.sqrt(b_df.loc['rr', 'baseline']*(1-b_df.loc['rr', 'baseline'])/82.5)
b_df

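| | baseline | dmin | adj_baseline | se |
|---|---|---|---|---|
| cookies | 40000.0 | 3000.0 | 5000.0 | NaN |
| click_cookies | 3200.0 | 240.0 | 400.0 | NaN |
| user_ids | 660.0 | -50.0 | 82.5 | NaN |
| ctr | 0.08 | 0.01 | NaN | NaN |
| gc | 0.20625 | -0.01 | NaN | 0.020231 |
| rr | 0.53 | 0.01 | NaN | 0.054949 |
| nc | 0.109313 | 0.0075 | NaN | 0.015602 |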


### 3. How many page views?

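Use alpha = 0.05 and beta = 0.2 with the online sample-size calculator at https://www.evanmiller.org/ab-testing/sample-size.html. The per-group results are entered by hand below and doubled to cover both the control and the experiment group.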
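
For readers who prefer to compute the figure rather than look it up, here is a rough sketch of a common normal-approximation formula for the per-group sample size of a two-proportion test. It is an assumption that this mirrors what an online calculator does; results may differ from figures obtained there. The function name is hypothetical, and `dmin` is passed as a positive absolute change.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p: float, dmin: float,
                          alpha: float = 0.05, beta: float = 0.2) -> int:
    """Approximate units needed per group to detect an absolute change of dmin."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = NormalDist().inv_cdf(1 - beta)         # power = 1 - beta
    p_alt = p + dmin
    sd_null = math.sqrt(2 * p * (1 - p))                     # both groups at baseline
    sd_alt = math.sqrt(p * (1 - p) + p_alt * (1 - p_alt))    # one group shifted by dmin
    return math.ceil(((z_alpha * sd_null + z_beta * sd_alt) / dmin) ** 2)


# Gross conversion baseline with a 1 percentage point minimum detectable change.
n_gc = sample_size_per_group(p=0.20625, dmin=0.01)
print(n_gc)
```

The unit here is a cookie that clicks the button, so the result still has to be converted into page views.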

In [58]:
b_df['sample_size'] = np.nan
b_df.loc['gc', 'sample_size'] = 28538*2
b_df.loc['rr', 'sample_size'] = 39115*2
b_df.loc['nc', 'sample_size'] = 27411*2
b_df

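| | baseline | dmin | adj_baseline | se | sample_size |
|---|---|---|---|---|---|
| cookies | 40000.0 | 3000.0 | 5000.0 | NaN | NaN |
| click_cookies | 3200.0 | 240.0 | 400.0 | NaN | NaN |
| user_ids | 660.0 | -50.0 | 82.5 | NaN | NaN |
| ctr | 0.08 | 0.01 | NaN | NaN | NaN |
| gc | 0.20625 | -0.01 | NaN | 0.020231 | 57076.0 |
| rr | 0.53 | 0.01 | NaN | 0.054949 | 78230.0 |
| nc | 0.109313 | 0.0075 | NaN | 0.015602 | 54822.0 |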


In [59]:
b_df['page_views'] = np.nan
b_df.loc['gc', 'page_views'] = b_df.loc['gc', 'sample_size']*(5000/400)
b_df.loc['rr', 'page_views'] = b_df.loc['rr', 'sample_size']*(5000/82.5)
b_df.loc['nc', 'page_views'] = b_df.loc['nc', 'sample_size']*(5000/400)
b_df

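| | baseline | dmin | adj_baseline | se | sample_size | page_views |
|---|---|---|---|---|---|---|
| cookies | 40000.0 | 3000.0 | 5000.0 | NaN | NaN | NaN |
| click_cookies | 3200.0 | 240.0 | 400.0 | NaN | NaN | NaN |
| user_ids | 660.0 | -50.0 | 82.5 | NaN | NaN | NaN |
| ctr | 0.08 | 0.01 | NaN | NaN | NaN | NaN |
| gc | 0.20625 | -0.01 | NaN | 0.020231 | 57076.0 | 713450.0 |
| rr | 0.53 | 0.01 | NaN | 0.054949 | 78230.0 | 4741212.0 |
| nc | 0.109313 | 0.0075 | NaN | 0.015602 | 54822.0 | 685275.0 |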


With 100% of Udacity traffic diverted evenly into the experiment and control groups, the retention test would need about 118.5 days (4,741,212 page views at 40,000 per day). This is unreasonably long, so retention is not considered further as an evaluation metric.
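
The conversion from required units to page views and days can be sketched as follows. The helper name and its `traffic_fraction` parameter are hypothetical; the inputs are the retention figures from the tables above (78230 enrollments in total, 82.5 enrollments per 5000 page views, 40000 page views per day).

```python
import math


def page_views_and_days(units_needed: float, units_per_5000_views: float,
                        daily_page_views: float = 40000,
                        traffic_fraction: float = 1.0):
    """Convert a required number of units into page views and whole days."""
    page_views = units_needed * 5000 / units_per_5000_views
    days = math.ceil(page_views / (daily_page_views * traffic_fraction))
    return page_views, days


# Retention: the unit is an enrolled user id.
rr_views, rr_days = page_views_and_days(78230, 82.5)
# Gross conversion: the unit is a cookie that clicks the button.
gc_views, gc_days = page_views_and_days(57076, 400)
print(round(rr_views), rr_days, round(gc_views), gc_days)
```

Lowering `traffic_fraction` below 1.0 lengthens the experiment proportionally, which is one way to explore the trade-off between exposure and duration.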

In [61]:
b_df['days'] = b_df.page_views/40000
b_df

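| | baseline | dmin | adj_baseline | se | sample_size | page_views | days |
|---|---|---|---|---|---|---|---|
| cookies | 40000.0 | 3000.0 | 5000.0 | NaN | NaN | NaN | NaN |
| click_cookies | 3200.0 | 240.0 | 400.0 | NaN | NaN | NaN | NaN |
| user_ids | 660.0 | -50.0 | 82.5 | NaN | NaN | NaN | NaN |
| ctr | 0.08 | 0.01 | NaN | NaN | NaN | NaN | NaN |
| gc | 0.20625 | -0.01 | NaN | 0.020231 | 57076.0 | 713450.0 | 17.83625 |
| rr | 0.53 | 0.01 | NaN | 0.054949 | 78230.0 | 4741212.0 | 118.530303 |
| nc | 0.109313 | 0.0075 | NaN | 0.015602 | 54822.0 | 685275.0 | 17.131875 |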


### 4. Analysis

#### 4.1 Sanity Check

In [68]:
con = pd.read_csv("../ab-testing/Final Project Results - Control.csv") 
exp = pd.read_csv("../ab-testing/Final Project Results - Experiment.csv") 

In [104]:
res = {
    "control": pd.Series([con.Pageviews.sum(), con.Clicks.sum(), con.Enrollments.sum(), con.Payments.sum()], index = ['Pageviews', 'Clicks', 'Enrollments', 'Payments']),
    "exp":pd.Series([exp.Pageviews.sum(), exp.Clicks.sum(), exp.Enrollments.sum(), exp.Payments.sum()], index = ['Pageviews', 'Clicks', 'Enrollments', 'Payments'])  
}

In [123]:
res_df = pd.DataFrame(res)
res_df

| | control | exp |
|---|---|---|
| Pageviews | 345543.0 | 344660.0 |
| Clicks | 28378.0 | 28325.0 |
| Enrollments | 3785.0 | 3423.0 |
| Payments | 2033.0 | 1945.0 |


In [124]:
res_df['diff'] = res_df.exp-res_df.control
res_df['total'] = res_df.exp+res_df.control
res_df['exp_ratio'] = res_df.exp/res_df.total

In [125]:
res_df

| | control | exp | diff | total | exp_ratio |
|---|---|---|---|---|---|
| Pageviews | 345543.0 | 344660.0 | -883.0 | 690203.0 | 0.49936 |
| Clicks | 28378.0 | 28325.0 | -53.0 | 56703.0 | 0.499533 |
| Enrollments | 3785.0 | 3423.0 | -362.0 | 7208.0 | 0.474889 |
| Payments | 2033.0 | 1945.0 | -88.0 | 3978.0 | 0.488939 |


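Focus on page views and clicks as invariant metrics: for both, roughly 50% of the cookies fall in each group. To check whether the observed split could have happened by chance, model the number of cookies assigned to the experiment group as binomial with probability 0.5, and use the normal approximation to build a 95% confidence interval around 0.5.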
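
Written out for the page-view row, using the totals from the table above and z = 1.96 for 95% confidence (a worked instance of the normal approximation, not an additional test):

$$
SE = \sqrt{\frac{0.5\,(1 - 0.5)}{N}} = \sqrt{\frac{0.25}{690203}} \approx 0.000602,
\qquad
CI = 0.5 \pm 1.96 \cdot SE \approx [0.49882,\ 0.50118]
$$

The observed experiment share, 344660 / 690203 ≈ 0.49936, lies inside this interval.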

In [141]:
res_df['se'] = np.sqrt(0.5 * (1-0.5) / res_df.total)
res_df['me'] = res_df.se * 1.96
res_df['ci_lower'] = 0.5 - res_df.me
res_df['ci_upper'] = 0.5 + res_df.me
res_df['pass'] = res_df.apply(lambda x: x.exp_ratio>x.ci_lower and x.exp_ratio<x.ci_upper, axis = 1)

In [142]:
res_df

| | control | exp | diff | total | exp_ratio | se | me | ci_lower | ci_upper | pass |
|---|---|---|---|---|---|---|---|---|---|---|
| Pageviews | 345543.0 | 344660.0 | -883.0 | 690203.0 | 0.49936 | 0.000602 | 0.00118 | 0.49882 | 0.50118 | True |
| Clicks | 28378.0 | 28325.0 | -53.0 | 56703.0 | 0.499533 | 0.0021 | 0.004116 | 0.495884 | 0.504116 | True |
| Enrollments | 3785.0 | 3423.0 | -362.0 | 7208.0 | 0.474889 | 0.005889 | 0.011543 | 0.488457 | 0.511543 | False |
| Payments | 2033.0 | 1945.0 | -88.0 | 3978.0 | 0.488939 | 0.007928 | 0.015538 | 0.484462 | 0.515538 | True |


Both page views and clicks pass the sanity check. Enrollments and payments are not invariant metrics, so their rows are not part of the check.

Next, check CTR by comparing the two sample proportions.
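
The comparison uses a pooled standard error for the difference between two proportions. A self-contained sketch of that check is shown here with the click and page-view totals from the table above; the function name is hypothetical.

```python
import math


def pooled_diff_check(x_con: float, n_con: float,
                      x_exp: float, n_exp: float, z: float = 1.96):
    """Return (difference, margin of error, True if the difference is within the margin)."""
    diff = x_exp / n_exp - x_con / n_con
    pooled = (x_con + x_exp) / (n_con + n_exp)              # pooled proportion
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_con + 1 / n_exp))
    margin = z * se
    return diff, margin, abs(diff) < margin


# Clicks and page views for control and experiment.
diff, margin, passed = pooled_diff_check(28378, 345543, 28325, 344660)
print(diff, margin, passed)
```

A difference inside the margin means the two click-through probabilities are not distinguishable at the 95% level.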

In [185]:
ctr_con = res_df.loc['Clicks', 'control'] / res_df.loc['Pageviews', 'control']
ctr_exp = res_df.loc['Clicks', 'exp'] / res_df.loc['Pageviews', 'exp']
ctr_diff = ctr_exp - ctr_con

In [165]:
ctr_pooled = (res_df.loc['Clicks', 'control']+res_df.loc['Clicks', 'exp']) / (res_df.loc['Pageviews', 'control'] + res_df.loc['Pageviews', 'exp'])

In [196]:
ctr_se_pooled = np.sqrt(ctr_pooled*(1-ctr_pooled)*((1/res_df.loc['Pageviews', 'control'])+(1/res_df.loc['Pageviews', 'exp'])))
ctr_me = ctr_se_pooled *1.96
ctr_ci_lower, ctr_ci_upper = -1*ctr_me, ctr_me
if ctr_diff > ctr_ci_lower and ctr_diff < ctr_ci_upper:
    print(True)
else:
    print(False)

True


Thus there is no significant difference in CTR between the two groups.

#### 4.2 Effect size test

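The last 14 days of data are excluded because those students have not yet reached the end of the free trial, so their outcomes are not yet recorded. Dropping rows with missing values removes them.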

In [217]:
con_test = con.dropna()
exp_test = exp.dropna()

In [238]:
sig_test_df = pd.DataFrame(columns = ["control", "exp", "diff", "dmin", "pooled_p", "pooled_se", "ci_lower", "ci_upper", "statistically_sig?", "practically_sig?"], index = ['GC', 'NC'])

In [257]:
dmin = {"GC": 0.01, "NC": 0.0075}  # practical significance boundaries (absolute change)
n_con, n_exp = con_test['Clicks'].sum(), exp_test['Clicks'].sum()
for i, j in zip(["GC", "NC"], ["Enrollments", "Payments"]):
    x_con, x_exp = con_test[j].sum(), exp_test[j].sum()
    diff = x_exp / n_exp - x_con / n_con
    pooled_p = (x_con + x_exp) / (n_con + n_exp)
    pooled_se = np.sqrt(pooled_p * (1 - pooled_p) * (1 / n_con + 1 / n_exp))
    ci_lower, ci_upper = diff - 1.96 * pooled_se, diff + 1.96 * pooled_se
    # statistically significant: the 95% CI excludes 0
    stat_sig = ci_lower > 0 or ci_upper < 0
    # practically significant: the whole CI lies beyond -dmin or +dmin
    prac_sig = ci_upper < -dmin[i] or ci_lower > dmin[i]
    sig_test_df.loc[i] = [x_con / n_con, x_exp / n_exp, diff, dmin[i], pooled_p, pooled_se, ci_lower, ci_upper, stat_sig, prac_sig]


In [258]:
sig_test_df

| | control | exp | diff | dmin | pooled_p | pooled_se | ci_lower | ci_upper | statistically_sig? | practically_sig? |
|---|---|---|---|---|---|---|---|---|---|---|
| GC | 0.218875 | 0.19832 | -0.0205549 | 0.01 | 0.208607 | 0.00437168 | -0.0291234 | -0.0119864 | True | True |
| NC | 0.117562 | 0.112688 | -0.00487372 | 0.0075 | 0.115127 | 0.00343413 | -0.0116046 | 0.00185718 | False | False |


#### 4.3 Sign test

In [None]:
# for GC

In [282]:
gc_con = con_test.set_index("Date").loc[:, 'Enrollments'] / con_test.set_index("Date").loc[:, 'Clicks']
gc_exp = exp_test.set_index("Date").loc[:, 'Enrollments'] / exp_test.set_index("Date").loc[:, 'Clicks']
gc_diff = gc_exp - gc_con
gc_len = len(gc_exp)
gc_negative_len = len(gc_diff[gc_diff < 0])
gc_negative_p = gc_negative_len / gc_len

In [284]:
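from scipy.stats import binomtest  # binom_test was removed in SciPy 1.12

def binom_test(x, n, p=0.5):  # thin wrapper keeping the old call signature
    return binomtest(k=x, n=n, p=p).pvalue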

In [285]:
p_value=binom_test(x=gc_negative_len, n=gc_len, p=0.5)

In [286]:
p_value

0.0025994777679443364

As the p-value is less than 0.05, the change in gross conversion is statistically significant.
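
For reference, the two-sided sign test can also be computed exactly with the standard library. The sketch below assumes 19 negative daily differences out of 23 days; these counts are illustrative assumptions, chosen because they reproduce the p-value reported above, and are not read from the data here. The function name is hypothetical.

```python
from math import comb


def sign_test_p_value(successes: int, n: int) -> float:
    """Two-sided exact binomial p-value for H0: P(success) = 0.5."""
    k = min(successes, n - successes)                     # size of the smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)                             # double the tail, cap at 1


# Assumed counts for illustration: 19 negative differences in 23 days.
p = sign_test_p_value(successes=19, n=23)
print(p)
```

Because the null distribution is symmetric at probability 0.5, doubling the smaller tail gives the two-sided p-value.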

In [287]:
# for NC
nc_con = con_test.set_index("Date").loc[:, 'Payments'] / con_test.set_index("Date").loc[:, 'Clicks']
nc_exp = exp_test.set_index("Date").loc[:, 'Payments'] / exp_test.set_index("Date").loc[:, 'Clicks']
nc_diff = nc_exp - nc_con
nc_len = len(nc_exp)
nc_negative_len = len(nc_diff[nc_diff < 0])
nc_negative_p = nc_negative_len / nc_len

In [288]:
p_value=binom_test(x=nc_negative_len, n=nc_len, p=0.5)

In [289]:
p_value

0.6776394844055175

As the p-value is greater than 0.05, the change in net conversion is not statistically significant.

### Recommendation: launch