<a href="https://colab.research.google.com/github/jounb/udacity_ab_test/blob/main/Udacity_AB_Testing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Table of Contents

1. [Background](https://colab.research.google.com/drive/1wK8YTt1Yc7zp1ziNEllBI_ar-eYzHUb_#scrollTo=yOzzsCdVHJOR)
2. [Experiment Overview](https://colab.research.google.com/drive/1wK8YTt1Yc7zp1ziNEllBI_ar-eYzHUb_#scrollTo=i3cbjNUmKhsv)
3. [Experiment Design](https://colab.research.google.com/drive/1wK8YTt1Yc7zp1ziNEllBI_ar-eYzHUb_#scrollTo=yS37kvPtkjuz)
4. [Data Analysis](https://colab.research.google.com/drive/1wK8YTt1Yc7zp1ziNEllBI_ar-eYzHUb_#scrollTo=jPVZGBBHg7Ez&line=7&uniqifier=1)
5. [Summary and Recommendations](https://colab.research.google.com/drive/1wK8YTt1Yc7zp1ziNEllBI_ar-eYzHUb_#scrollTo=ErZReIECCLaJ&line=16&uniqifier=1)

## 1. Background

This analysis is the final project for Udacity's A/B testing course and is based on an actual experiment that Udacity ran. The specific numbers have been changed, but the patterns have not. The experiment overview and resulting data are provided by Udacity.


## 2. Experiment Overview 

**2.1 Current State**

At the time of this experiment, Udacity courses had two options on the course overview page: 1) "start free trial", and 2) "access course materials".

* If the student clicks "start free trial", they are asked for their credit card information, and then enrolled in a free trial for the paid version of the course. After 14 days, they are automatically charged unless they cancel. 

* If the student clicks "access course materials", they are able to view the videos and take the quizzes for free, but they do not receive the paid benefits (e.g. coaching support, verified certificate).


**2.2 The Experiment**
* If the student clicks "start free trial", they are asked how much time they have to devote to the course.
 * If the student chooses 5+ hours per week, they are taken through the checkout process as usual.
 * If they choose fewer than 5 hours per week, a message says that Udacity courses usually need more time for completion and suggests accessing the course materials for free. The student then has the option to continue enrolling or to access the free course materials instead. This [screenshot](https://drive.google.com/file/d/0ByAfiG8HpNUMakVrS0s4cGN2TjQ/view) shows what the experiment looks like.

**2.3 The Hypothesis**

By setting clearer expectations for students upfront, the change should 1) reduce the number of frustrated students who leave the free trial because they didn't have enough time, and 2) not significantly reduce the number of students who continue past the free trial and eventually complete the course.

If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.

## 3. Experiment Design

**3.1 Metric Choice**

First, the following metrics and definitions will be used as invariant metrics for sanity checking the experiment setup. We'll check that these metrics are not affected by the experiment.

* Number of cookies: Number of unique cookies to view the course overview page. (dmin=3000)
* Number of user-ids: Number of users who enroll in the free trial. (dmin=50)
* Number of clicks: Number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered). (dmin=240)
* Click-through-probability: Number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page. (dmin=0.01)


The following metrics and definitions will be used as our evaluation metrics to test our hypotheses:
* Gross conversion: # of user-ids to complete checkout and enroll in the free trial divided by # of unique cookies to click the "Start free trial" button. (dmin= 0.01)
* Retention: # of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by # of user-ids to complete checkout. (dmin=0.01)
* Net conversion: # of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the # of unique cookies to click the "Start free trial" button. (dmin= 0.0075)

If our hypotheses hold true, we expect:
* Gross conversion should decrease since we expect our enrollments to decrease. 
* Retention should increase since we expect users who don't have enough time would not enroll.
* Net conversion should not change significantly.
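
As a minimal sketch of the definitions above, the three evaluation metrics are plain ratios of counts. The function name `evaluation_metrics` and the payment count of 350 are hypothetical and used only for illustration; 3,200 clicks and 660 enrollments per day are the baseline values provided by Udacity.

```python
def evaluation_metrics(clicks, enrollments, payments):
    """Return (gross conversion, retention, net conversion) from raw counts."""
    gross_conversion = enrollments / clicks   # enrolled / clicked
    retention = payments / enrollments        # paid / enrolled
    net_conversion = payments / clicks        # paid / clicked
    return gross_conversion, retention, net_conversion

# 3200 clicks and 660 enrollments are baseline values; 350 payments is illustrative only
gc, ret, nc = evaluation_metrics(clicks=3200, enrollments=660, payments=350)
print(f"GC={gc:.5f}, retention={ret:.4f}, NC={nc:.6f}")
```

Note that net conversion is the product of gross conversion and retention, which is why net conversion can stand in for retention when judging the effect on paying students.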


The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.





**3.2 Sample Size**

Now that we have our metrics, let's compute the necessary sample size for our evaluation metrics. First we'll create a dataframe using the baseline data that Udacity provided.

In [44]:
import pandas as pd
import numpy as np
import math
import seaborn as sns
# `statistics` is not imported as `stats`: that name is bound to scipy.stats later in the notebook

In [2]:
ab_df = pd.read_csv("https://raw.githubusercontent.com/jounb/udacity_ab_test/main/Udacity%20AB%20Test%20Data/Final%20Project%20Baseline%20Values.csv")

ab_df

| | Metric | Value |
|---|---|---|
| 0 | Course overview page views per day (unique coo... | 40000.0 |
| 1 | "Start free trial" clicks per day (unqiue cook... | 3200.0 |
| 2 | Enrollments per day | 660.0 |
| 3 | Click-through-probability on "Start free trial" | 0.08 |
| 4 | Probability of enrolling, given click | 0.20625 |
| 5 | Probability of payment, given enroll | 0.53 |
| 6 | Probability of payment, given click | 0.109313 |


Let's add an abbreviated metric column for readability and add the dmin values as well.

In [3]:
ab_df.insert(0, 'ab_metric', ['Page_views', 'Clicks', 'Enrollments', 'CTP', 'GC','Retention', 'NC'])

# dmin values follow the row order above: clicks (240) come before enrollments / user-ids (50)
ab_df.insert(3, 'dmin', [3000, 240, 50, 0.01, 0.01, 0.01, 0.0075])

ab_df.set_index('ab_metric', inplace = True)

ab_df

| ab_metric | Metric | Value | dmin |
|---|---|---|---|
| Page_views | Course overview page views per day (unique coo... | 40000.0 | 3000.0 |
| Clicks | "Start free trial" clicks per day (unqiue cook... | 3200.0 | 240.0 |
| Enrollments | Enrollments per day | 660.0 | 50.0 |
| CTP | Click-through-probability on "Start free trial" | 0.08 | 0.01 |
| GC | Probability of enrolling, given click | 0.20625 | 0.01 |
| Retention | Probability of payment, given enroll | 0.53 | 0.01 |
| NC | Probability of payment, given click | 0.109313 | 0.0075 |


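For each evaluation metric, we'll calculate the sample size needed using this online calculator: https://www.evanmiller.org/ab-testing/sample-size.html. The calculator returns a per-group sample size in units of the metric's denominator (clicks or enrollments), so we convert each result to page views for both groups combined and store the values in our dataframe.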
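
For readers who would rather compute the number than read it off the calculator, the sketch below implements a standard two-proportion sample size formula. It assumes a two-sided alpha of 0.05 and 80% power, neither of which is stated above, and `sample_size_per_group` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p, dmin, alpha=0.05, power=0.80):
    """Approximate per-group sample size to detect an absolute change of dmin."""
    if p > 0.5:
        p = 1.0 - p  # use the mirrored rate so that p + dmin moves toward 0.5
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    sd_null = math.sqrt(2 * p * (1 - p))
    sd_alt = math.sqrt(p * (1 - p) + (p + dmin) * (1 - p - dmin))
    return math.ceil((z_alpha * sd_null + z_beta * sd_alt) ** 2 / dmin ** 2)

# Gross conversion: baseline 0.20625, dmin 0.01 -> clicks needed per group
gc_n = sample_size_per_group(0.20625, 0.01)
print(gc_n)
```

Under these assumptions the gross conversion case comes out at about 25,835 clicks per group, in line with the calculator value used in the next cell.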


In [4]:
#GC page view sample size
gc_clicks = 25835
gc_page_view = round( gc_clicks / 0.08 * 2) #clicks divided by click thru rate, x2 for test & control

#Retention page view sample size
ret_enrolls = 39115
ret_page_view = round(ret_enrolls / (660/40000) * 2) #enrollments divided by enroll rate, x2 for test & control

# NC page view sample size
nc_clicks = 27413
nc_page_view = round(nc_clicks / 0.08 * 2) #clicks divided by click thru rate, x2 for test & control


sample_size = [np.nan, np.nan, np.nan, np.nan, gc_page_view, ret_page_view, nc_page_view]

ab_df['sample_size'] = sample_size

ab_df

| ab_metric | Metric | Value | dmin | sample_size |
|---|---|---|---|---|
| Page_views | Course overview page views per day (unique coo... | 40000.0 | 3000.0 | NaN |
| Clicks | "Start free trial" clicks per day (unqiue cook... | 3200.0 | 240.0 | NaN |
| Enrollments | Enrollments per day | 660.0 | 50.0 | NaN |
| CTP | Click-through-probability on "Start free trial" | 0.08 | 0.01 | NaN |
| GC | Probability of enrolling, given click | 0.20625 | 0.01 | 645875.0 |
| Retention | Probability of payment, given enroll | 0.53 | 0.01 | 4741212.0 |
| NC | Probability of payment, given click | 0.109313 | 0.0075 | 685325.0 |


**3.3 Experiment Duration**

Now that we have the needed sample size, let's calculate the duration needed for the experiment, assuming we divert 50% of the traffic to this experiment.

We do not want to divert all traffic because of opportunity cost (i.e. Udacity may want to run other experiments) and risk (e.g. something may go wrong with the experiment, or enrollment might fall more than expected).


In [5]:
# Days needed per metric = required page views / page views diverted per day (50% of traffic)
daily_pageviews = ab_df.loc['Page_views', 'Value'] * 0.5

gc_dur = ab_df.loc['GC', 'sample_size'] / daily_pageviews
ret_dur = ab_df.loc['Retention', 'sample_size'] / daily_pageviews
nc_dur = ab_df.loc['NC', 'sample_size'] / daily_pageviews

ab_df['duration'] = [np.nan, np.nan, np.nan, np.nan, gc_dur, ret_dur, nc_dur]

ab_df


| ab_metric | Metric | Value | dmin | sample_size | duration |
|---|---|---|---|---|---|
| Page_views | Course overview page views per day (unique coo... | 40000.0 | 3000.0 | NaN | NaN |
| Clicks | "Start free trial" clicks per day (unqiue cook... | 3200.0 | 240.0 | NaN | NaN |
| Enrollments | Enrollments per day | 660.0 | 50.0 | NaN | NaN |
| CTP | Click-through-probability on "Start free trial" | 0.08 | 0.01 | NaN | NaN |
| GC | Probability of enrolling, given click | 0.20625 | 0.01 | 645875.0 | 32.29375 |
| Retention | Probability of payment, given enroll | 0.53 | 0.01 | 4741212.0 | 237.0606 |
| NC | Probability of payment, given click | 0.109313 | 0.0075 | 685325.0 | 34.26625 |


To reach the required sample size for retention with 50% of traffic diverted, we would have to run this experiment for 237 days, or roughly 8 months!

Since that duration is not realistic given the opportunity cost, we will drop retention as an evaluation metric. This should be fine, since net conversion will tell us whether there is any negative impact on users who stay enrolled past the 14-day boundary.
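
To see how sensitive the durations are to the share of traffic diverted, here is a small sketch that recomputes them for a few fractions, rounding up to whole days. The helper `days_needed` and the fractions other than 0.5 are illustrative assumptions; the page view requirements are the ones computed above.

```python
import math

DAILY_PAGEVIEWS = 40000  # baseline unique cookies viewing the overview page per day
# Required page views (control + experiment) per evaluation metric
required = {"GC": 645875, "Retention": 4741212, "NC": 685325}

def days_needed(pageviews_required, fraction):
    """Whole days needed when `fraction` of daily traffic enters the experiment."""
    return math.ceil(pageviews_required / (DAILY_PAGEVIEWS * fraction))

for fraction in (0.5, 0.75, 1.0):
    print(fraction, {metric: days_needed(n, fraction) for metric, n in required.items()})
```

Even with all traffic diverted, retention would still need about 119 days, so raising the traffic share alone would not make it a practical metric.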

## 4. Data Analysis

**Sanity Checks**

Now we'll bring in the experiment results data from Udacity. Before we analyze the control vs. experiment data, we'll do a few checks on the data and perform sanity checks using the invariant metrics previously defined.

First, let's do some general data checks.

In [67]:
control_df= pd.read_csv("https://raw.githubusercontent.com/jounb/udacity_ab_test/main/Udacity%20AB%20Test%20Data/Final%20Project%20Results_Control.csv")

exp_df = pd.read_csv("https://raw.githubusercontent.com/jounb/udacity_ab_test/main/Udacity%20AB%20Test%20Data/Final%20Project%20Results_Exp.csv")

print(control_df.info(), exp_df.info())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 39 entries, 0 to 38
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Date         37 non-null     object 
 1   Pageviews    37 non-null     float64
 2   Clicks       37 non-null     float64
 3   Enrollments  23 non-null     float64
 4   Payments     23 non-null     float64
dtypes: float64(4), object(1)
memory usage: 1.6+ KB
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 37 entries, 0 to 36
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   Date         37 non-null     object 
 1   Pageviews    37 non-null     int64  
 2   Clicks       37 non-null     int64  
 3   Enrollments  23 non-null     float64
 4   Payments     23 non-null     float64
dtypes: float64(2), int64(2), object(1)
memory usage: 1.6+ KB
None None


In [70]:
print(control_df.describe())
print(exp_df.describe())

          Pageviews      Clicks  Enrollments    Payments
count     37.000000   37.000000    23.000000   23.000000
mean    9339.000000  766.972973   164.565217   88.391304
std      740.239563   68.286767    29.977000   20.650202
min     7434.000000  632.000000   110.000000   56.000000
25%     8896.000000  708.000000   146.500000   70.000000
50%     9420.000000  759.000000   162.000000   91.000000
75%     9871.000000  825.000000   175.000000  102.500000
max    10667.000000  909.000000   233.000000  128.000000
          Pageviews      Clicks  Enrollments    Payments
count     37.000000   37.000000    23.000000   23.000000
mean    9315.135135  765.540541   148.826087   84.565217
std      708.070781   64.578374    33.234227   23.060841
min     7664.000000  642.000000    94.000000   34.000000
25%     8881.000000  722.000000   127.000000   69.000000
50%     9359.000000  770.000000   142.000000   91.000000
75%     9737.000000  827.000000   172.000000   99.000000
max    10551.000000  884.000000

The data provided by Udacity appears to divert roughly 50% (or slightly less) of traffic to the experiment: average daily page views are about 9,300 each for control and experiment, which together is about 47% of the baseline 40,000 daily page views. The clicks are in line with the baseline data as well.

However, the provided data has a few issues.

1. There are 37 days of data and 14 NAs for Enrollments and Payments. The NAs for payments make sense since there is a 14-day wait period, but enrollment data should be available for the full period. Potentially, the data only includes enrollments that stayed enrolled for 14 days, which is not what we need to measure gross conversion or retention.

2. Reaching the required sample size takes about 32 days for gross conversion and 34 days for net conversion, but we only have 23 days of complete data.

At this point, the recommended next steps would be to 1) discuss the availability and definition of the enrollment data with engineers, and 2) wait 14 days to gather a sufficient sample size.

However, since we do not have that option, we'll continue with the analysis and drop the NAs.

In [53]:
print(control_df.isna().sum())
print(exp_df.isna().sum())

Date            0
Pageviews       0
Clicks          0
Enrollments    14
Payments       14
group           0
dtype: int64
Date            0
Pageviews       0
Clicks          0
Enrollments    14
Payments       14
group           0
dtype: int64


In [54]:
# Drop every row that has a missing value. This removes the 14 days without
# Enrollments and Payments (expected for Payments, given the 14-day boundary),
# leaving 23 complete days per group.
exp_df = exp_df.dropna()
control_df = control_df.dropna()
print(control_df.isna().sum())

Date           0
Pageviews      0
Clicks         0
Enrollments    0
Payments       0
group          0
dtype: int64


We'll perform t-tests on the invariant metrics that are counts to check that control and experiment do not differ significantly. All p-values are greater than 0.05, so we cannot reject the null hypothesis that the means are equal. This passes our sanity check.

In [55]:
from scipy import stats

print(stats.ttest_ind(control_df.Pageviews, exp_df.Pageviews))
print(stats.ttest_ind(control_df.Clicks, exp_df.Clicks))
print(stats.ttest_ind(control_df.Enrollments, exp_df.Enrollments, nan_policy='omit'))

Ttest_indResult(statistic=0.14289030617246118, pvalue=0.8870291739409888)
Ttest_indResult(statistic=0.06598153974790888, pvalue=0.9476914165204888)
Ttest_indResult(statistic=1.686512674290538, pvalue=0.09877614030153456)
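
A complementary check for count metrics, sketched here under the assumption that each cookie is assigned to control or experiment with probability 0.5, is to compare the observed control share with a binomial confidence interval around 0.5. The totals are reconstructed as mean times 37 days from the `describe()` output above, so they cover the full period rather than the filtered 23 days, and `count_sanity_check` is a hypothetical helper.

```python
import math

def count_sanity_check(control_total, exp_total, p=0.5, z=1.96):
    """Check whether the control share of a count lies inside the 95% CI around p."""
    n = control_total + exp_total
    margin = z * math.sqrt(p * (1 - p) / n)
    observed = control_total / n
    passed = (p - margin) <= observed <= (p + margin)
    return observed, (p - margin, p + margin), passed

# Totals reconstructed as mean * 37 days from the describe() output
pageviews = count_sanity_check(round(9339.0 * 37), round(9315.135135 * 37))
clicks = count_sanity_check(round(766.972973 * 37), round(765.540541 * 37))
print(pageviews)
print(clicks)
```

Both control shares fall inside their intervals, which agrees with the t-test results.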


For click-through-probability, we'll perform a two-sample proportions z-test.

With a p-value > 0.05, we fail to reject our null hypothesis that the CTPs are equal, thus passing the sanity check.

In [56]:
from statsmodels.stats.proportion import proportions_ztest

success = np.array([control_df.Clicks.sum(), exp_df.Clicks.sum()])
sample = np.array([control_df.Pageviews.sum(),exp_df.Pageviews.sum()])

proportions_ztest(count = success, nobs = sample, alternative='two-sided')


(-0.1815932392462904, 0.855901954934631)

**A/B Testing**

To recap, we have two metrics we want to test for significant differences.

* Gross conversion: (# of user-ids to enroll) / (# of "Start free trial" clicks)

* Net conversion: (# of user-ids enrolled past 14-day boundary w/ payment) / (# of "Start free trial" clicks)

The hypotheses we will be testing are:

**1. Gross Conversion**

* $H_0: \mu_1 = \mu_2$  -> Gross conversion is equal for users who received the new pop-up asking for time commitment vs. not

* $H_1: \mu_1 \neq \mu_2$  -> Gross conversion is not equal for users who received the new pop-up asking for time commitment vs. not

**2. Net Conversion**

* $H_0: \mu_1 = \mu_2$  -> Net conversion is equal for users who received the new pop-up asking for time commitment vs. not

* $H_1: \mu_1 \neq \mu_2$ -> Net conversion is not equal for users who received the new pop-up asking for time commitment vs. not
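
For reference, both hypotheses can be tested with the pooled two-proportion z statistic, which is what `proportions_ztest` computes with its default settings. The notation is introduced here for illustration: $x_i$ is the number of successes (enrollments or payments) and $n_i$ the number of clicks in group $i$, so that $\hat{p}_i$ estimates $\mu_i$.

$$
\hat{p}_i = \frac{x_i}{n_i}, \qquad
\hat{p}_{pool} = \frac{x_1 + x_2}{n_1 + n_2}, \qquad
z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_{pool}\,(1 - \hat{p}_{pool})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
$$

With a two-sided test at the 5% level, $H_0$ is rejected when $|z| > 1.96$.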

Let's test our first hypothesis, for gross conversion.


In [57]:
# GC: (# of user-ids to enroll) / (# of "Start free trial" clicks)

gc_success = np.array([control_df.Enrollments.sum(),exp_df.Enrollments.sum() ])
gc_sample = np.array([control_df.Clicks.sum(), exp_df.Clicks.sum()])


print(proportions_ztest(count = gc_success, nobs = gc_sample, alternative='two-sided'))

(4.701830023753982, 2.578401033720593e-06)


At a p-value < 0.05, we reject our null hypothesis. The gross conversion for the control group is 0.0206 higher than for the experiment group (95% CI: 0.0120 to 0.0291). The result is also practically significant, since the entire confidence interval lies beyond the dmin of 0.01. The next cell computes experiment minus control, so its output has the opposite sign.


In [63]:
# Gross conversion per group and the difference (experiment - control)
control_gc = control_df.Enrollments.sum() / control_df.Clicks.sum()
exp_gc = exp_df.Enrollments.sum() / exp_df.Clicks.sum()
control_n = control_df.Clicks.sum()
exp_n = exp_df.Clicks.sum()
prop_diff = exp_gc - control_gc

# Unpooled standard error of the difference, used for the 95% confidence interval
std_err = math.sqrt(control_gc*(1-control_gc)/control_n + exp_gc*(1-exp_gc)/exp_n)
margin_err = 1.96 * std_err
upper_lim = prop_diff + margin_err
lower_lim = prop_diff - margin_err

print(prop_diff, lower_lim, upper_lim)

-0.020554874580361565 -0.029120319808048547 -0.011989429352674583


In [64]:
# Net conversion: (# of user-ids enrolled past 14-day boundary w/ payment) / (# of "Start free trial" clicks)

nc_success = np.array([control_df.Payments.sum(),exp_df.Payments.sum() ])
nc_sample = np.array([control_df.Clicks.sum(), exp_df.Clicks.sum()])

print(proportions_ztest(count = nc_success, nobs = nc_sample, alternative='two-sided'))

(1.4192001144365733, 0.15584068262150205)


At a p-value > 0.05, we fail to reject our null hypothesis. Net conversion for users who received the pop-up and for those who did not was not different at a statistically significant level (95% CI for control minus experiment: -0.0019 to 0.0116).

These results are in line with the hypothesis that the additional pop-up to check time commitment does not change the share of users who ultimately stay on.

However, although the difference is not statistically significant, the measured net conversion is 0.0049 lower for the experiment group, and the confidence interval includes the dmin of 0.0075, so a practically significant decrease cannot be ruled out with this sample.

In [65]:
# Net conversion per group and the difference (experiment - control)
control_nc = control_df.Payments.sum() / control_df.Clicks.sum()
exp_nc = exp_df.Payments.sum() / exp_df.Clicks.sum()
control_n = control_df.Clicks.sum()
exp_n = exp_df.Clicks.sum()
nc_diff = exp_nc - control_nc

# Unpooled standard error of the difference, used for the 95% confidence interval
std_err_nc = math.sqrt(control_nc*(1-control_nc)/control_n + exp_nc*(1-exp_nc)/exp_n)
margin_err_nc = 1.96 * std_err_nc
lower_lim_nc = nc_diff - margin_err_nc
upper_lim_nc = nc_diff + margin_err_nc

print(nc_diff, lower_lim_nc, upper_lim_nc, margin_err_nc)

-0.0048737226745441675 -0.011604309812387078 0.0018568644632987437 0.006730587137842911
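
The way the two confidence intervals are read against dmin can be sketched as a small helper. The function `assess` is hypothetical, and the interval bounds are the rounded outputs of the cells above (experiment minus control).

```python
def assess(ci_lower, ci_upper, dmin):
    """Classify a confidence interval for (experiment - control) against dmin."""
    # Statistically significant if the interval excludes zero
    statistically_significant = not (ci_lower <= 0 <= ci_upper)
    # Practically significant only if the whole interval lies beyond +/- dmin
    practically_significant = ci_upper < -dmin or ci_lower > dmin
    # A practically relevant change cannot be excluded if +/- dmin is inside the interval
    dmin_in_interval = ci_lower <= -dmin <= ci_upper or ci_lower <= dmin <= ci_upper
    return statistically_significant, practically_significant, dmin_in_interval

gc_result = assess(-0.02912, -0.01199, dmin=0.01)    # gross conversion
nc_result = assess(-0.01160, 0.00186, dmin=0.0075)   # net conversion
print(gc_result)
print(nc_result)
```

The third flag is False for gross conversion and True for net conversion, meaning that -dmin lies inside the net conversion interval.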


## 5. Summary and Recommendations

The initial readout of the data is promising:

* Gross conversion: At a p-value < 0.05, we reject our null hypothesis. The gross conversion for the experiment group is lower than for the control group at a statistically and practically significant level.

* Net conversion: At a p-value > 0.05, we fail to reject our null hypothesis. While the difference is not statistically significant, the net conversion for the experiment group was lower than for the control group.

These results are in line with our hypothesis that showing the pop-up will reduce the number of users who enroll, while not impacting users who ultimately stay.

Based on the initial results, the recommendations are the following:

1. Confirm the enrollment data, wait 14 more days, and rerun the analysis.

2. Run more experiments that may lead to better outcomes. A user who signs up for the free trial and drops out before 14 days was not necessarily frustrated by a lack of time.

Potential experiments:
* Instead of a pop-up, test a disclaimer under "start free trial" that indicates the recommended time commitment. This may set expectations better without risking a negative impact on the enrollment rate.

* Once a user is enrolled and inactive, test notifications or wording that remind users to commit more time (e.g. "You've spent 2 hours this week on X vs. the recommended time of Y. Continue course now"). This may increase net conversion and retention.




