# Load Packages

In [24]:
import pandas as pd
import numpy as np
import math as mt
from scipy.stats import norm

# Experiment Description

### "Free Trial" Screener Test

##### Control

* The website has two options on the overview page: "start free trial", and "access materials".
* If the person clicks "start free trial", they will be asked to enter their credit card information, and then they will be enrolled in a free trial for the paid version. After 14 days, they will automatically be charged unless they cancel first.
* If the person clicks "access materials", they will be able to view part of the content for free.

##### Experiment
* If the person clicks "start free trial", they see a pop-up message reminding them that the "access materials" option gives them part of the content for free.
* At this point, the person can either continue enrolling in the free trial or access the materials for free instead.

##### Hypothesis
* This change would set clearer expectations upfront, reducing the number of frustrated people who leave the free trial without significantly reducing the number of people who continue past the free trial. If this hypothesis holds, the change could improve the overall user experience.

# Define Metrics

* Invariant metrics: used for "sanity checks", that is, to make sure the experiment setup is not inherently flawed. We pick metrics that the experiment should not affect and later verify that they do not differ significantly between the control and experiment groups.
* Evaluation metrics: metrics in which we expect to see a change and which are relevant to the business goals we aim to achieve.
* For each metric we state a __Minimum Change__ that is practically significant to the business. For instance, an increase in retention of less than 2%, even if statistically significant, would not be worth acting on.

### Load Baseline Values

In [7]:
df_basevals = pd.read_csv("baseline_vals.csv", index_col=False,header = None, names = ['metric','baseline_val'])
df_basevals.metric = df_basevals.metric.map(lambda x: x.lower())
df_basevals

Unnamed: 0,metric,baseline_val
0,unique cookies to view page per day:,40000.0
1,"unique cookies to click ""start free trial"" per...",3200.0
2,enrollments per day:,660.0
3,"click-through-probability on ""start free trial"":",0.08
4,"probability of enrolling, given click:",0.20625
5,"probability of payment, given enroll:",0.53
6,"probability of payment, given click",0.109313


##### Invariant Metrics
* Number of Cookies in Overview Page
* Number of Clicks on Free Trial Button
* Free Trial button Click-Through-Probability

##### Evaluation Metrics
* Gross Conversion - #enrolled/#clicked
* Net Conversion - #paid/#clicked
* Retention - #paid/#enrolled
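
The three evaluation metrics are linked: since retention is #paid/#enrolled, net conversion equals gross conversion times retention. A minimal sketch that recomputes them from the baseline values loaded above (the variable names are illustrative and not part of the original notebook):

```python
# Baseline values copied from baseline_vals.csv as shown above
clicks_per_day = 3200.0          # unique cookies clicking "start free trial"
enrollments_per_day = 660.0      # enrollments per day
retention = 0.53                 # probability of payment, given enroll

# Gross conversion = #enrolled / #clicked
gross_conversion = enrollments_per_day / clicks_per_day

# Net conversion = #paid / #clicked = gross conversion * retention
net_conversion = gross_conversion * retention

print('Gross Conversion:', gross_conversion)
print('Net Conversion:', net_conversion)
```

Both values should agree with the baseline table up to rounding.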

### Check Standard Deviation

* We should estimate the standard deviation of each evaluation metric. The higher a metric's variance, the harder it is to reach a statistically significant result. Each metric here is a probability, so we use the analytic binomial estimate sqrt(p*(1-p)/n).
* The sample size we consider should be smaller than the total traffic we collect, and small enough that both groups can have that size.

In [15]:
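# Assume we want 10000 page views (out of 40000 page views) per day in each group

def check_std(probability,sample_size):
    # Analytic standard deviation of a binomial proportion: sqrt(p*(1-p)/n)
    return round(np.sqrt((probability*(1-probability))/sample_size),4)


# Gross and net conversion are measured per click,
# so scale the 10000 page views by clicks/page views
GC_probability = 0.206250
GC_sample_size = 10000*(3200/40000)

NC_probability = 0.109313
NC_sample_size = 10000*(3200/40000)

# Retention is measured per enrollment,
# so scale the 10000 page views by enrollments/page views
R_probability = 0.530000
R_sample_size = 10000*(660/40000)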


GC_std = check_std(GC_probability,GC_sample_size)
NC_std = check_std(NC_probability,NC_sample_size)
R_std = check_std(R_probability,R_sample_size)

print('std for Gross Conversion:',GC_std,'\n'
      'std for Net Conversion:',NC_std,'\n'
      'std for Retention:',R_std)


std for Gross Conversion: 0.0143 
std for Net Conversion: 0.011 
std for Retention: 0.0389


# Calculate Sample Size and Test Duration

* Significance level α=0.05
* Type II error rate β=0.2 (power 1-β=0.8)
* Minimum detectable effect (absolute): Gross Conversion (1%), Retention (1%), Net Conversion (0.75%)

* We can calculate the sample size with the online calculator at https://www.evanmiller.org/ab-testing/sample-size.html, or with our own function.
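
For reference, the `get_sampsize` function in the next cell evaluates the two-proportion sample-size formula below. The notation is ours and is read directly off the code: $p$ is the baseline probability, $d$ the minimum detectable effect, and $z_q$ the $q$-quantile of the standard normal distribution.

$$
n = \frac{\left( z_{1-\alpha/2}\,\sqrt{2p(1-p)} \;+\; z_{1-\beta}\,\sqrt{p(1-p) + (p+d)\bigl(1-(p+d)\bigr)} \right)^{2}}{d^{2}}
$$

Here $n$ is the number of units (clicks or enrollments) needed in each group, which is why the page view totals further down are multiplied by 2.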

In [18]:
# p - probability (conversion rate), d - minimum detectable effect

def get_sampsize(p,d,alpha,beta):
    
    sd1=mt.sqrt(2*p*(1-p))
    sd2=mt.sqrt(p*(1-p)+(p+d)*(1-(p+d)))
    
    z_score1 = norm.ppf(1-alpha/2)
    z_score2 = norm.ppf(1-beta)
    
    n=pow((z_score1*sd1+z_score2*sd2),2)/pow(d,2)
    return n

In [25]:
GC_dmin = 0.01
NC_dmin = 0.0075
R_dmin = 0.01

GC_size = get_sampsize(GC_probability,GC_dmin,0.05,0.2)
NC_size = get_sampsize(NC_probability,NC_dmin,0.05,0.2)
R_size = get_sampsize(R_probability,R_dmin,0.05,0.2)

print('#clicks for Gross Conversion:',round(GC_size,0),'\n'
      '#clicks for Net Conversion:',round(NC_size,0),'\n'
      '#enrolled for Retention:',round(R_size,0),'\n')

print('#page views for Gross Conversion:',round(GC_size/(3200/40000)*2,0),'\n'
      '#page views for Net Conversion:',round(NC_size/(3200/40000)*2,0),'\n'
      '#page views for Retention:',round(R_size/(3200/40000)/0.20625*2,0))


#clicks for Gross Conversion: 25835.0 
#clicks for Net Conversion: 27413.0 
#enrolled for Retention: 39087.0 

#page views for Gross Conversion: 645868.0 
#page views for Net Conversion: 685336.0 
#page views for Retention: 4737771.0


With 40000 page views per day and 100% of traffic diverted to the experiment, the required durations are:
- Gross Conversion: 16.1 days
- Net Conversion: 17.1 days
- Retention: 118 days

118 days is too long for a test, so we drop retention and focus only on gross conversion (GC) and net conversion (NC). Diverting 100% of traffic is also risky, so we use 50% of the traffic, which gives durations of:
- Gross Conversion: 33 days
- Net Conversion: 35 days 
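
The durations above follow from dividing the required page views by the daily traffic that enters the experiment. A minimal sketch of that arithmetic, assuming a constant 40000 page views per day (the helper `days_needed` is hypothetical and rounds up to whole days, so its 100% figures are one day higher than the fractional values quoted above):

```python
import math

PAGEVIEWS_PER_DAY = 40000  # baseline unique cookies viewing the page per day

def days_needed(total_pageviews, traffic_fraction):
    """Whole days needed to collect total_pageviews when only
    traffic_fraction of the daily page views enters the experiment."""
    return math.ceil(total_pageviews / (PAGEVIEWS_PER_DAY * traffic_fraction))

# Total page views required (both groups), copied from the output above
required = {
    'Gross Conversion': 645868,
    'Net Conversion': 685336,
    'Retention': 4737771,
}

for metric, pageviews in required.items():
    print(metric,
          '| 100% traffic:', days_needed(pageviews, 1.0), 'days',
          '| 50% traffic:', days_needed(pageviews, 0.5), 'days')
```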

# Experiment Analysis

### Load and Inspect data

In [28]:
control=pd.read_csv("Final Project Results - Control.csv")
experiment=pd.read_csv("Final Project Results - Experiment.csv")
experiment.head()

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
0,"Sat, Oct 11",7716,686,105.0,34.0
1,"Sun, Oct 12",9288,785,116.0,91.0
2,"Mon, Oct 13",10480,884,145.0,79.0
3,"Tue, Oct 14",9867,827,138.0,92.0
4,"Wed, Oct 15",9793,832,140.0,94.0


In [39]:
experiment.describe()

Unnamed: 0,Pageviews,Clicks,Enrollments,Payments
count,37.0,37.0,23.0,23.0
mean,9315.135135,765.540541,148.826087,84.565217
std,708.070781,64.578374,33.234227,23.060841
min,7664.0,642.0,94.0,34.0
25%,8881.0,722.0,127.0,69.0
50%,9359.0,770.0,142.0,91.0
75%,9737.0,827.0,172.0,99.0
max,10551.0,884.0,213.0,123.0


In [40]:
control.describe()

Unnamed: 0,Pageviews,Clicks,Enrollments,Payments
count,37.0,37.0,23.0,23.0
mean,9339.0,766.972973,164.565217,88.391304
std,740.239563,68.286767,29.977,20.650202
min,7434.0,632.0,110.0,56.0
25%,8896.0,708.0,146.5,70.0
50%,9420.0,759.0,162.0,91.0
75%,9871.0,825.0,175.0,102.5
max,10667.0,909.0,233.0,128.0


In [47]:
# experiment.isnull().sum()
# control.isnull().sum()

# Keep only the days for which both Enrollments and Payments were recorded
control_notnull = control[control.Enrollments.notnull() & control.Payments.notnull()]
experiment_notnull = experiment[experiment.Enrollments.notnull() & experiment.Payments.notnull()]

### Sanity Check

In [35]:
# Number of Cookies in Overview Page
# Number of Clicks on Free Trial Button
# Click-Through-Probability

print('Total Pageviews:','\n','\n',
      'Experiment:',experiment.Pageviews.sum(),'\n',
      'Control:',control.Pageviews.sum(),'\n','\n'
      'Total Clicks:','\n','\n',
      'Experiment:',experiment.Clicks.sum(),'\n',
      'Control:',control.Clicks.sum(),'\n','\n'
      'Total CTP:','\n','\n',
      'Experiment:',experiment.Clicks.sum()/experiment.Pageviews.sum(),'\n',
      'Control:',control.Clicks.sum()/control.Pageviews.sum())

Total Pageviews: 
 
 Experiment: 344660 
 Control: 345543 
 
Total Clicks: 
 
 Experiment: 28325 
 Control: 28378 
 
Total CTP: 
 
 Experiment: 0.08218244066616376 
 Control: 0.08212581357457682
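
The totals above look close, but a sanity check is more convincing when it is tested formally. The sketch below is one possible way to do that, not the notebook's original method. It assumes traffic was meant to be split 50/50 between the groups (the source does not state the split explicitly) and uses only the totals printed above; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def count_sanity_check(n_control, n_experiment, alpha=0.05):
    """Is the control share of a count consistent with an assumed 50/50 split?"""
    total = n_control + n_experiment
    z = NormalDist().inv_cdf(1 - alpha / 2)
    margin = z * math.sqrt(0.5 * 0.5 / total)   # binomial standard error at p = 0.5
    observed = n_control / total
    lower, upper = 0.5 - margin, 0.5 + margin
    return observed, lower, upper, lower <= observed <= upper

def ctp_sanity_check(clicks_c, views_c, clicks_e, views_e, alpha=0.05):
    """Does the confidence interval of the CTP difference contain 0?"""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    pooled = (clicks_c + clicks_e) / (views_c + views_e)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_c + 1 / views_e))
    diff = clicks_e / views_e - clicks_c / views_c
    lower, upper = diff - z * std_err, diff + z * std_err
    return diff, lower, upper, lower <= 0 <= upper

# Totals copied from the output above
pageviews_check = count_sanity_check(345543, 344660)
clicks_check = count_sanity_check(28378, 28325)
ctp_check = ctp_sanity_check(28378, 345543, 28325, 344660)

print('Pageviews pass:', pageviews_check[3])
print('Clicks pass:', clicks_check[3])
print('CTP pass:', ctp_check[3])
```

Under these assumptions all three invariant metrics pass, which supports comparing the two groups on the evaluation metrics.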


### Calculate Evaluation Metrics

In [48]:
GC_exp = experiment_notnull.Enrollments.sum()/experiment_notnull.Clicks.sum()
NC_exp = experiment_notnull.Payments.sum()/experiment_notnull.Clicks.sum()
GC_cont = control_notnull.Enrollments.sum()/control_notnull.Clicks.sum()
NC_cont = control_notnull.Payments.sum()/control_notnull.Clicks.sum()

# Pooled probabilities across both groups, used for the pooled standard error below
GC = (experiment_notnull.Enrollments.sum() + control_notnull.Enrollments.sum())/(experiment_notnull.Clicks.sum() + control_notnull.Clicks.sum())
NC = (experiment_notnull.Payments.sum() + control_notnull.Payments.sum())/(experiment_notnull.Clicks.sum() + control_notnull.Clicks.sum())


In [50]:
GC_diff = GC_exp - GC_cont
NC_diff = NC_exp - NC_cont

print('Gross Conversion Difference: ', GC_diff,'\n'
     'Net Conversion Difference: ', NC_diff)

Gross Conversion Difference:  -0.020554874580361565 
Net Conversion Difference:  -0.0048737226745441675


In [52]:
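def stats_prop(p_hat,z_score,N_cont,N_exp,diff):
    # p_hat: pooled probability, diff: observed difference (experiment - control)
    # Pooled standard error of the difference between two proportions
    std_err = np.sqrt((p_hat * (1- p_hat ))*(1/N_cont + 1/N_exp))
    # Margin of error and confidence interval around the observed difference
    marg_err = z_score * std_err
    ci_lower = diff - marg_err
    ci_upper = diff + marg_err
    
    return std_err,marg_err,ci_lower,ci_upper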

In [53]:
stats_prop(GC,1.96,control_notnull.Clicks.sum(),experiment_notnull.Clicks.sum(),GC_diff)

(0.004371675385225936,
 0.008568483755042836,
 -0.0291233583354044,
 -0.01198639082531873)

In [55]:
stats_prop(NC,1.96,control_notnull.Clicks.sum(),experiment_notnull.Clicks.sum(),NC_diff)

(0.0034341335129324238,
 0.0067309016853475505,
 -0.011604624359891718,
 0.001857179010803383)

In [58]:
import statsmodels.api as sm
sm.stats.proportions_ztest([experiment_notnull.Enrollments.sum(), control_notnull.Enrollments.sum()], 
                           [experiment_notnull.Clicks.sum(), control_notnull.Clicks.sum()], 
                           alternative='two-sided')

(-4.701830023753982, 2.578401033720593e-06)

In [59]:
sm.stats.proportions_ztest([experiment_notnull.Payments.sum(), control_notnull.Payments.sum()], 
                           [experiment_notnull.Clicks.sum(), control_notnull.Clicks.sum()], 
                           alternative='two-sided')

(-1.4192001144365733, 0.15584068262150205)

# Summary & Recommendations

The test showed that, at the 95% confidence level, the decrease in gross conversion is both statistically and practically significant. The change in net conversion, however, is neither statistically nor practically significant. In other words, reminding people upfront and filtering out those unlikely to commit reduces enrollments but does not lead to a higher payment rate, so we do not recommend launching the new feature.
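
The reasoning behind this conclusion can be sketched as a simple rule applied to the confidence intervals computed earlier. This is an illustration under an assumed convention, not the notebook's own code: a result counts as statistically significant when the interval excludes 0, and as practically significant when the whole interval lies beyond the minimum detectable effect. The function `assess` is hypothetical.

```python
def assess(ci_lower, ci_upper, d_min):
    """Return (statistically_significant, practically_significant)."""
    # Statistically significant: the interval excludes 0
    stat_sig = ci_lower > 0 or ci_upper < 0
    # Practically significant (assumed rule): the whole interval
    # lies beyond +d_min or below -d_min
    pract_sig = ci_lower > d_min or ci_upper < -d_min
    return stat_sig, pract_sig

# Confidence intervals copied from the stats_prop outputs above
gc_result = assess(-0.0291233583354044, -0.01198639082531873, 0.01)
nc_result = assess(-0.011604624359891718, 0.001857179010803383, 0.0075)

print('Gross Conversion:', gc_result)   # (True, True)
print('Net Conversion:', nc_result)     # (False, False)
```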