E-Learning Company Webpage A/B Testing - Part 2
-----------

### Key Project Steps
1. Problem statement
2. Baseline data
3. Standard error of metrics
4. Experiment sizing, duration and exposure
5. Experimental analysis
6. Recommendation and conclusion

### 1. Problem Statement

**The company's strategy pivoted from bringing more users into its free course to maximizing user retention. The goal was to set clearer expectations for students upfront, thereby reducing the number of frustrated students who leave during the free trial. This should improve the overall student experience and free up coaches' capacity to support students who are likely to complete the course. An experimental change to the website design was implemented to achieve this. The goal here is to determine whether to launch this change.**

In [2]:
# import relevant libraries
import numpy as np
import pandas as pd

### 2. Baseline Data

In [3]:
df_basevals = pd.read_csv('data/part2_baseline_values.csv', header = None, names = ['metric','baseline_val'])
print(df_basevals.head(10))

                                              metric  baseline_val
0  Unique cookies to view course overview page pe...  40000.000000
1  Unique cookies to click "Start free trial" per...   3200.000000
2                               Enrollments per day:    660.000000
3   Click-through-probability on "Start free trial":      0.080000
4             Probability of enrolling, given click:      0.206250
5              Probability of payment, given enroll:      0.530000
6                Probability of payment, given click      0.109313


### 3. Standard Error of Metrics

**The metrics chosen were 1) Gross conversion (enrollments / clicks), 2) Retention (payments / enrollments) and 3) Net conversion (payments / clicks)**

In [4]:
total_no_of_cookies = df_basevals[df_basevals['metric']=='Unique cookies to view course overview page per day:'].baseline_val.item()
no_of_free_trial_clicks = df_basevals[df_basevals['metric']=='Unique cookies to click "Start free trial" per day:'].baseline_val.item()
no_of_enrollments = df_basevals[df_basevals['metric']=='Enrollments per day:'].baseline_val.item()
no_of_payments = df_basevals[df_basevals['metric']=='Probability of payment, given click'].baseline_val.item() * no_of_free_trial_clicks

p_gross_conversion = no_of_enrollments / no_of_free_trial_clicks
p_retention = no_of_payments / no_of_enrollments
p_net_conversion = no_of_payments / no_of_free_trial_clicks

print(f"Baseline gross conversion rate: {p_gross_conversion}")
print(f"Baseline retention rate: {p_retention}")
print(f"Baseline net conversion rate: {p_net_conversion}")

Baseline gross conversion rate: 0.20625
Baseline retention rate: 0.53
Baseline net conversion rate: 0.1093125


In [5]:
# standard error = sqrt(p(1-p)/n)
# the variance term p(1-p) peaks at p = 0.5, so estimates from populations with a success rate close to 50% have a larger
# standard error than those from populations with a success rate closer to 0% or 100%
# n is each metric's unit of analysis (clicks or enrollments), scaled to a sample of 5000 page views

n_gross_conversion = 5000 * (no_of_free_trial_clicks / total_no_of_cookies)
n_retention = 5000 * (no_of_enrollments / total_no_of_cookies)
n_net_conversion = 5000 * (no_of_free_trial_clicks / total_no_of_cookies)

SE_gross_conversion = round(np.sqrt(p_gross_conversion*(1-p_gross_conversion)/n_gross_conversion), 4)
SE_retention = round(np.sqrt(p_retention*(1-p_retention)/n_retention), 4)
SE_net_conversion = round(np.sqrt(p_net_conversion*(1-p_net_conversion)/n_net_conversion), 4)

print(f"Std. error for gross conversion: {SE_gross_conversion}")
print(f"Std. error for retention: {SE_retention}")
print(f"Std. error for net conversion: {SE_net_conversion}")

Std. error for gross conversion: 0.0202
Std. error for retention: 0.0549
Std. error for net conversion: 0.0156
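
The calculation above can be wrapped in a small reusable helper. This is a minimal sketch that assumes each metric follows a binomial distribution; the function name `binomial_se` is hypothetical, and the baseline values are copied from section 2.

```python
import math

def binomial_se(p: float, n: float) -> float:
    """Analytic standard error of a proportion p estimated from n units."""
    return math.sqrt(p * (1 - p) / n)

# Baseline values from section 2, scaled to a sample of 5000 pageviews.
pageviews = 5000
clicks = pageviews * 3200 / 40000        # unit of analysis for gross and net conversion
enrollments = pageviews * 660 / 40000    # unit of analysis for retention

se_gross = binomial_se(0.20625, clicks)
se_retention = binomial_se(0.53, enrollments)
se_net = binomial_se(0.1093125, clicks)

print(round(se_gross, 4), round(se_retention, 4), round(se_net, 4))
```

Retention has the largest standard error because its denominator (enrollments) is much smaller than the number of clicks.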


### 4. Experiment Sizing, Duration and Exposure

#### 4.1 Sizing

**Using sample size calculator from https://www.evanmiller.org/ab-testing/sample-size.html**

**Pageviews to achieve target statistical power**

**Gross Conversion**
* Baseline conversion: 20.625%
* Minimum detectable effect: 1%
* Alpha: 5%
* Beta: 20%
* Sensitivity: 80%
* Sample size: 25,835 clicks/group
* Number of groups = 2 (experiment and control)
* Total sample size = 51,670 clicks
* Clicks/pageview = 3200/40000 = 0.08
* Pageviews required = 51,670/0.08 = 645,875

**Retention**
* Baseline conversion: 53%
* Minimum detectable effect: 1%
* Alpha: 5%
* Beta: 20%
* Sensitivity: 80%
* Sample size: 39,115 enrollments/group
* Number of groups = 2 (experiment and control)
* Total sample size = 78,230 enrollments
* Enrollments/pageview = 660/40000 = 0.0165
* Pageviews required = 78,230/0.0165 = 4,741,212

**Net Conversion**
* Baseline conversion: 10.93%
* Minimum detectable effect: 0.75%
* Alpha: 5%
* Beta: 20%
* Sensitivity: 80%
* Sample size: 27,413 clicks/group
* Number of groups = 2 (experiment and control)
* Total sample size = 54,826 clicks
* Clicks/pageview = 3200/40000 = 0.08
* Pageviews required = 54,826/0.08 = 685,325

**Pageviews required is maximum of pageviews for the different metrics. Therefore the required pageviews is 4,741,212**
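
The sizing figures above can be approximated in code. The sketch below uses a standard two-proportion sample size formula and assumes, as the online calculator appears to, that a baseline above 50% is mirrored to `1 - p` before the effect is added. The function names are hypothetical, and the results should be read as an approximation rather than the calculator's exact method.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p, d_min, alpha=0.05, beta=0.20):
    """Approximate units per group needed to detect an absolute change d_min."""
    # Assumption: mirror baselines above 0.5 so the effect moves toward 0.5,
    # where the variance (and therefore the required sample) is larger.
    if p > 0.5:
        p = 1.0 - p
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance
    z_beta = NormalDist().inv_cdf(1 - beta)         # power = 1 - beta
    sd_null = math.sqrt(2 * p * (1 - p))
    sd_alt = math.sqrt(p * (1 - p) + (p + d_min) * (1 - p - d_min))
    return ((z_alpha * sd_null + z_beta * sd_alt) / d_min) ** 2

def pageviews_required(n_per_group, units_per_pageview, groups=2):
    """Convert a per-group sample size into total pageviews."""
    return groups * n_per_group / units_per_pageview

n_gross = sample_size_per_group(0.20625, 0.01)
n_retention = sample_size_per_group(0.53, 0.01)
n_net = sample_size_per_group(0.1093125, 0.0075)

print(round(n_gross), round(n_retention), round(n_net))
print(round(pageviews_required(round(n_retention), 660 / 40000)))
```

Each sample size is expressed in the metric's unit of analysis (clicks for gross and net conversion, enrollments for retention), which is why the conversion to pageviews divides by a different ratio for retention.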

#### 4.2 Duration and Exposure

* 100% diversion of traffic at 40,000 pageviews/day would require 119 days
* On eliminating retention (which has the max pageview requirement currently), the pageview requirement becomes 685,325 and there are two options:
    * 18 day experiment with 100% diversion
    * 35 day experiment with 50% diversion (685,325 / 20,000 pageviews per day, rounded up)
    
**From a time perspective, the 18 day experiment with 100% diversion seems the most reasonable**
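
The duration arithmetic can be checked with a short helper. This is a sketch that assumes traffic is constant at the baseline of 40,000 pageviews per day; the function name `days_required` is hypothetical.

```python
import math

def days_required(pageviews, daily_pageviews=40000, traffic_fraction=1.0):
    """Whole days needed to collect `pageviews` at the given diversion rate."""
    return math.ceil(pageviews / (daily_pageviews * traffic_fraction))

print(days_required(4741212))                        # all three metrics, 100% diversion
print(days_required(685325))                         # without retention, 100% diversion
print(days_required(685325, traffic_fraction=0.5))   # without retention, 50% diversion
```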

### 5. Experimental Analysis

In [6]:
# load experimental data
df_control = pd.read_csv('data/part2_results_control.csv')
df_experiment = pd.read_csv('data/part2_results_experiment.csv')

results = {"Control":pd.Series([df_control.Pageviews.sum(),df_control.Clicks.sum(),
                                  df_control.Enrollments.sum(),df_control.Payments.sum()],
                                  index = ["cookies","clicks","enrollments","payments"]),
           "Experiment":pd.Series([df_experiment.Pageviews.sum(),df_experiment.Clicks.sum(),
                               df_experiment.Enrollments.sum(),df_experiment.Payments.sum()],
                               index = ["cookies","clicks","enrollments","payments"])}
df_results = pd.DataFrame(results)
df_results

              Control  Experiment
cookies      345543.0    344660.0
clicks        28378.0     28325.0
enrollments    3785.0      3423.0
payments       2033.0      1945.0


#### 5.1 Sanity Checks

#### 5.1.1 Count Metrics

In [36]:
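# For invariant metrics we expect equal diversion into the experiment and control groups. We test this at the 95% confidence level.
# The invariant metrics tested here are cookies and clicks; enrollments and payments also appear in the table, but they are not invariants
# because the change is expected to affect them.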
df_results['Total']=df_results.Control + df_results.Experiment
df_results['Prob'] = 0.5
df_results['StdErr'] = np.sqrt((df_results.Prob * (1- df_results.Prob))/df_results.Total)
df_results["MargErr"] = 1.96 * df_results.StdErr
df_results["CI_lower"] = df_results.Prob - df_results.MargErr
df_results["CI_upper"] = df_results.Prob + df_results.MargErr
df_results["Obs_val"] = df_results.Experiment/df_results.Total
df_results["Pass_Sanity"] = df_results.apply(lambda x: (x.Obs_val > x.CI_lower) and (x.Obs_val < x.CI_upper),axis=1)
df_results['Diff'] = abs((df_results.Experiment - df_results.Control)/df_results.Total)
df_results

              Control  Experiment     Total  Prob    StdErr   MargErr  CI_lower  CI_upper   Obs_val  Pass_Sanity      Diff
cookies      345543.0    344660.0  690203.0   0.5  0.000602   0.00118   0.49882   0.50118   0.49936         True  0.001279
clicks        28378.0     28325.0   56703.0   0.5    0.0021  0.004116  0.495884  0.504116  0.499533         True  0.000935
enrollments    3785.0      3423.0    7208.0   0.5  0.005889  0.011543  0.488457  0.511543  0.474889        False  0.050222
payments       2033.0      1945.0    3978.0   0.5  0.007928  0.015538  0.484462  0.515538  0.488939         True  0.022122


#### 5.1.2 Other metrics

In [10]:
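# function to return the standard error, margin of error and confidence interval for a difference in proportions
def stats_prop(p1,p2,z_score,n1,n2):
    # difference in proportions (p1 - p2)
    diff = p1 - p2
    # unpooled standard error of the difference
    std_err = np.sqrt((p1*(1-p1)/n1) + (p2*(1-p2)/n2))
    marg_err = z_score * std_err
    ci_lower = diff - marg_err
    ci_upper = diff + marg_err
    return std_err,marg_err,ci_lower,ci_upper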

In [35]:
# the invariant metric being tested here is click through probability (clicks/cookies)
cont_p = df_results['Control']['clicks']/df_results['Control']['cookies']
exp_p = df_results['Experiment']['clicks']/df_results['Experiment']['cookies']
cont_n = df_results['Control']['cookies']
exp_n = df_results['Experiment']['cookies']

std_err,marg_err,ci_lower,ci_upper = stats_prop(cont_p, exp_p, 1.96, cont_n, exp_n)
print(f"std_err: {std_err}")
print(f"marg_err: {marg_err}")
print(f"ci_lower: {ci_lower}")
print(f"ci_upper: {ci_upper}")

std_err: 0.0006610610775037591
marg_err: 0.0012956797119073678
ci_lower: -0.0013523068034943038
ci_upper: 0.0012390526203204318


* Sanity check for cookies and clicks passes since the observed fraction of each assigned to the experiment group lies within the 95% confidence interval around 0.5. This implies that these invariant metrics are evenly split between the experiment and control groups.
* Sanity check for click-through probability (ctp) passes since zero lies in the confidence interval for the difference. This implies that the ctp is not statistically different between the experiment and control groups.
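
The count check described above can be expressed as a standalone function. This is a minimal sketch, assuming an expected 50/50 split and a normal approximation to the binomial; the name `count_sanity_check` is hypothetical and the counts are taken from the results table.

```python
import math

def count_sanity_check(n_control, n_experiment, expected=0.5, z=1.96):
    """Check whether the experiment group's share of a count is consistent with `expected`."""
    total = n_control + n_experiment
    margin = z * math.sqrt(expected * (1 - expected) / total)
    lower, upper = expected - margin, expected + margin
    observed = n_experiment / total
    return lower <= observed <= upper, observed, lower, upper

cookies_ok, cookies_obs, cookies_lo, cookies_hi = count_sanity_check(345543, 344660)
clicks_ok, clicks_obs, clicks_lo, clicks_hi = count_sanity_check(28378, 28325)

print(f"cookies: pass={cookies_ok}, observed={cookies_obs:.5f}, CI=({cookies_lo:.5f}, {cookies_hi:.5f})")
print(f"clicks:  pass={clicks_ok}, observed={clicks_obs:.5f}, CI=({clicks_lo:.5f}, {clicks_hi:.5f})")
```

Only invariant metrics should be passed to this check. Enrollments would fail it, which is expected because the change is designed to affect enrollment.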

#### 5.2 AB Testing

In [23]:
# keep only the days on which enrollments were recorded; each group is filtered on its own Enrollments column
df_control_notnull = df_control[df_control.Enrollments.notnull()]
df_experiment_notnull = df_experiment[df_experiment.Enrollments.notnull()]
results_notnull = {"Control":pd.Series([df_control_notnull.Pageviews.sum(),df_control_notnull.Clicks.sum(),
                                  df_control_notnull.Enrollments.sum(),df_control_notnull.Payments.sum()],
                                  index = ["cookies","clicks","enrollments","payments"]),
           "Experiment":pd.Series([df_experiment_notnull.Pageviews.sum(),df_experiment_notnull.Clicks.sum(),
                               df_experiment_notnull.Enrollments.sum(),df_experiment_notnull.Payments.sum()],
                               index = ["cookies","clicks","enrollments","payments"])}
df_results_notnull = pd.DataFrame(results_notnull)
df_results_notnull['Total']=df_results_notnull.Control + df_results_notnull.Experiment
df_results_notnull

  


              Control  Experiment     Total
cookies      212163.0    211362.0  423525.0
clicks        17293.0     17260.0   34553.0
enrollments    3785.0      3423.0    7208.0
payments       2033.0      1945.0    3978.0


In [26]:
# experiment values
enrollments_exp = df_results_notnull.loc["enrollments"].Experiment
clicks_exp = df_results_notnull.loc["clicks"].Experiment
payments_exp = df_results_notnull.loc["payments"].Experiment

# control values
enrollments_cont = df_results_notnull.loc["enrollments"].Control
clicks_cont = df_results_notnull.loc["clicks"].Control
payments_cont = df_results_notnull.loc["payments"].Control

# metrics
GrossConversion_exp = enrollments_exp/clicks_exp
NetConversion_exp = payments_exp/clicks_exp
GrossConversion_cont = enrollments_cont/clicks_cont
NetConversion_cont = payments_cont/clicks_cont

#GrossConversion = (enrollments_exp + enrollments_cont)/(clicks_cont + clicks_exp)
#NetConversion = (payments_cont + payments_exp)/(clicks_cont + clicks_exp)

print(f"Gross conversion for control: {GrossConversion_cont}")
print(f"Net conversion for control: {NetConversion_cont}")
print(f"Gross conversion for experiment: {GrossConversion_exp}")
print(f"Net conversion for experiment: {NetConversion_exp}")

Gross conversion for control: 0.2188746891805933
Net conversion for control: 0.11756201931417337
Gross conversion for experiment: 0.19831981460023174
Net conversion for experiment: 0.1126882966396292


#### Two-sided hypothesis test to check if difference in conversion rates between the two groups is significant

In [31]:
# test for gross conversion
std_err,marg_err,ci_lower,ci_upper = stats_prop(GrossConversion_exp, GrossConversion_cont, 1.96, clicks_exp, clicks_cont)
diff = GrossConversion_exp - GrossConversion_cont
print("2 sided test results for gross conversion")
print(f"std_err: {std_err}")
print(f"marg_err: {marg_err}")
print(f"difference: {diff}")
print(f"ci_lower: {ci_lower}")
print(f"ci_upper: {ci_upper}")

2 sided test results for gross conversion
std_err: 0.004370125116166828
marg_err: 0.008565445227686982
difference: -0.020554874580361565
ci_lower: -0.029120319808048547
ci_upper: -0.011989429352674583


In [32]:
# test for net conversion
std_err,marg_err,ci_lower,ci_upper = stats_prop(NetConversion_exp, NetConversion_cont, 1.96, clicks_exp, clicks_cont)
diff = NetConversion_exp - NetConversion_cont
print("2 sided test results for net conversion")
print(f"std_err: {std_err}")
print(f"marg_err: {marg_err}")
print(f"difference: {diff}")
print(f"ci_lower: {ci_lower}")
print(f"ci_upper: {ci_upper}")

2 sided test results for net conversion
std_err: 0.0034339730295116894
marg_err: 0.006730587137842911
difference: -0.0048737226745441675
ci_lower: -0.011604309812387078
ci_upper: 0.0018568644632987437


#### 5.3 Inference

**Gross conversion**
* Since the confidence interval does not contain zero, the difference is statistically significant. Further, since the difference is negative, it can be inferred that the experiment reduced the gross conversion rate.
* Also, since the magnitude of the difference (0.0206) exceeds the minimum detectable effect of 0.01, and the entire confidence interval lies below -0.01, it is considered practically significant as well.

**Net conversion**
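* Since the confidence interval contains zero, the difference is not statistically significant.
* Also, since the magnitude of the difference (0.0049) is smaller than the minimum detectable effect of 0.0075, it is not considered practically significant either. Note, however, that the lower bound of the interval (-0.0116) falls below -0.0075, so a practically significant decrease cannot be ruled out.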
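
The two criteria used above can be written as a small decision helper. This sketch assumes a strict reading of practical significance, where the whole confidence interval must lie beyond the minimum detectable effect; the function name `classify` is hypothetical and the interval bounds are rounded from the outputs in section 5.2.

```python
def classify(ci_lower, ci_upper, d_min):
    """Return (statistically significant, practically significant) for a confidence interval."""
    # Statistically significant: the interval excludes zero.
    stat_sig = not (ci_lower <= 0 <= ci_upper)
    # Practically significant (strict assumption): the interval lies entirely beyond +/- d_min.
    prac_sig = ci_upper < -d_min or ci_lower > d_min
    return stat_sig, prac_sig

gross_result = classify(-0.029120, -0.011989, d_min=0.01)
net_result = classify(-0.011604, 0.001857, d_min=0.0075)

print(f"gross conversion: statistical={gross_result[0]}, practical={gross_result[1]}")
print(f"net conversion:   statistical={net_result[0]}, practical={net_result[1]}")
```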

### 6. Recommendation and Conclusion

* Based on the analysis of the AB test, it can be concluded that the experiment decreased the gross conversion rate. In other words, the ratio of users enrolling in the free trial to users clicking "Start free trial" went down.
* However, the experiment did not change net conversion in a statistically significant manner.
* Therefore, it can be considered partly successful. The recommendation is to launch the experiment while continuing to design additional experiments to achieve the goal of improved net conversion.