# Experiment Design
## Hypothesis
The modification might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time, without significantly reducing the number of students who continue past the free trial and eventually complete the course.

## Metric Choice

* Number of Cookies: number of unique cookies to view the course overview page. (dmin=3000)
    * Invariant: Yes. 
        * It is the unit of diversion that is randomly assigned. It should be roughly the same for each group.
* Number of user-ids: number of users who enroll in the free trial. (dmin=50)
    * Invariant: No 
        * It is tracked only after enrollment in the free trial, so the two groups may have different counts, because not all users choose to enroll.
    * Evaluation Metric: No.
        * The goal of this experiment is to minimize the number of frustrated students who leave the free trial for lack of time while keeping the students who want to continue past the free trial. As a raw count, this metric is not well defined for that goal. A clearer definition would be, for example, the difference in user-ids before and after the 14-day boundary.
* Number of Clicks: number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered). (dmin=240)
    * Invariant: Yes
        * This happens before the student is asked how much time they can spend on the course, so it should split evenly between the two groups.
* Click-through-probability: number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page. (dmin=0.01)
    * Invariant: Yes
        * The clicks happen before the student is asked how much time they can spend on the course, so this should be about the same in the two groups.
* Gross conversion: number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button. (dmin= 0.01)
    * Invariant: No
        * This could be affected by the change 
    * Evaluation Metric: Yes
        * The treatment group is expected to have a lower gross conversion rate, since the numerator (number of user-ids that complete checkout and enroll in the free trial) will decrease for the treatment group while the denominator stays the same for both groups.
* Retention: number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by number of user-ids to complete checkout. (dmin=0.01)
    * Invariant: No
        * This could be affected by the change  
    * Evaluation Metric: Yes
        * The treatment group is expected to have a higher retention rate, since the denominator (number of user-ids enrolled) will be smaller for the treatment group than for the control group, while the number of payments should stay roughly the same.
* Net conversion: number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button. (dmin= 0.0075)
    * Invariant: No
        * This could be affected by the change  
    * Evaluation Metric: Yes
        * It is the product of the two previous metrics (gross conversion and retention). The number of payments is expected to stay roughly the same, so the change in this metric should be insignificant.
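
The claim that net conversion is the product of gross conversion and retention can be checked against the baseline rates used later in this notebook. This is only a small arithmetic sketch; the variable names are illustrative.

```python
# Baseline rates quoted in the "Measuring Standard Deviation" section
gross_conversion = 0.20625   # enrollments / clicks
retention = 0.53             # payments / enrollments
net_conversion = 0.1093125   # payments / clicks

# Enrollments cancel out: (enroll / click) * (pay / enroll) = pay / click
product = gross_conversion * retention
print(round(product, 7), net_conversion)
```

Both printed values agree, which is why a drop in gross conversion combined with a rise in retention can leave net conversion roughly unchanged.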

![](Evaluation.PNG)
![](Process.PNG)
Image Source: https://www.kaggle.com/code/mariusmesserschmied/udacity-a-b-testing-final-course-project/notebook

## Measuring Standard Deviation

![](sd_calculation.PNG)
![](baseline.PNG)
For each of your evaluation metrics, indicate whether you think the analytic estimate would be comparable to the empirical variability, or whether you expect them to be different (in which case it might be worth doing an empirical estimate if there is time). Briefly give your reasoning in each case.


In [19]:
import math

### Gross Conversion SD

In [18]:
#gross conversion: # enroll_uid / # clicks 'start free' button
p_enroll = 0.20625
p_click_start_now = 3200/40000
n_for_click = 5000*p_click_start_now
gc_sd = round(math.sqrt((p_enroll*(1-p_enroll)/n_for_click)),4)
gc_sd

0.0202

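### Retention SD

In [31]:
# Retention: paid user-ids / enrolled user-ids
p_pay_enrolled = 0.53
# Note: p_enroll is reassigned here to enrollments per pageview (above it was enrollments per click)
p_enroll = 660/40000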
n_for_enroll = 5000*p_enroll
gc_rt = round(math.sqrt((p_pay_enrolled*(1-p_pay_enrolled)/n_for_enroll)),4)
gc_rt

0.0549

### Net Conversion SD

In [22]:
#net conversion: # paid_uid / # clicks 'start free' button
p_pay_clicked = 0.1093125
gc_nc = round(math.sqrt((p_pay_clicked*(1-p_pay_clicked)/n_for_click)),4)
gc_nc

0.0156
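
The three cells above repeat the same binomial formula with different denominators. A minimal sketch of a shared helper, assuming the same sample of 5000 pageviews; the function and constant names are illustrative and not part of the original notebook.

```python
import math

def analytic_sd(p, n_pageviews, units_per_pageview):
    """Binomial standard deviation of a proportion whose denominator is
    n_pageviews * units_per_pageview (clicks or enrollments)."""
    n = n_pageviews * units_per_pageview
    return math.sqrt(p * (1 - p) / n)

CLICKS_PER_PAGEVIEW = 3200 / 40000   # baseline clicks per pageview
ENROLL_PER_PAGEVIEW = 660 / 40000    # baseline enrollments per pageview

sd_table = {
    "gross_conversion": round(analytic_sd(0.20625, 5000, CLICKS_PER_PAGEVIEW), 4),
    "retention": round(analytic_sd(0.53, 5000, ENROLL_PER_PAGEVIEW), 4),
    "net_conversion": round(analytic_sd(0.1093125, 5000, CLICKS_PER_PAGEVIEW), 4),
}
print(sd_table)
```

Retention has the largest standard deviation because its denominator (enrollments) is the smallest of the three.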

## Sizing

### Will you use the Bonferroni correction during your analysis phase? 
No. The evaluation metrics are all correlated, so the Bonferroni correction would be too conservative.
### Give the number of pageviews you will need to power your experiment appropriately.
Sample size calculator: https://www.evanmiller.org/ab-testing/sample-size.html

In [26]:
# Gross Conversion
min_n_based_click_one_group = 25835
# One group: min_n_based_view * p_click_start_now = 25835
min_n_based_view_one_group = 25835/p_click_start_now
min_n_based_view_one_group*2

645875.0

In [34]:
# Retention
min_n_based_enroll_one_group = 39115
min_n_based_view_one_group = 39115/p_enroll
min_n_based_view_one_group*2

4741212.121212121

In [37]:
# Net Conversion
min_n_based_click_one_group = 27413
min_n_based_view_one_group = 27413/p_click_start_now
min_n_based_view_one_group*2

685325.0
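
The per-group sample sizes above are hard-coded. The sketch below uses a common two-proportion sample-size approximation that happens to reproduce those three numbers. It assumes alpha = 0.05 and power = 0.8, which the notebook does not state, and it is not confirmed to be the calculator's internal method. The function name and the choice to mirror rates above 0.5 are assumptions for illustration.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p, d_min, alpha=0.05, power=0.8):
    """Approximate units needed per group to detect an absolute change d_min."""
    p = min(p, 1 - p)  # assumption: mirror rates above 0.5 (same variance)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    sd_null = math.sqrt(2 * p * (1 - p))
    sd_alt = math.sqrt(p * (1 - p) + (p + d_min) * (1 - p - d_min))
    return round((z_alpha * sd_null + z_beta * sd_alt) ** 2 / d_min ** 2)

sizes = {
    "gross_conversion": sample_size_per_group(0.20625, 0.01),
    "retention": sample_size_per_group(0.53, 0.01),
    "net_conversion": sample_size_per_group(0.1093125, 0.0075),
}
print(sizes)
```

These are counts of clicks (or enrollments, for retention), so they still have to be divided by the per-pageview rate and doubled to get total pageviews, as in the cells above.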

### Duration vs. Exposure
#### Indicate what fraction of traffic you would divert to this experiment
I am going to divert 50% of the traffic.

#### How many days you would need to run the experiment. (These should be the answers from the "Choosing Duration and Exposure" quiz.)

In [12]:
# Days needed for the three metrics
print('Gross Conversion:',645875/40000)
print('Retention:',4741213/40000)
print('Net Conversion:',685325/40000)

Gross Conversion: 16.146875
Retention: 118.530325
Net Conversion: 17.133125


Retention needs 119 days, which is too long, so retention is dropped as an evaluation metric. Gross conversion and net conversion need 17 and 18 days respectively. The last 14 days of data cannot be used because users who entered during that time are still in the free trial, so the experiment needs to run for 18 + 14 = 32 days. Udacity's business cycle is one week long and there may be a day-of-week effect, so the final experiment length is rounded up to whole weeks: 7 * 5 = 35 days. I don't think this is a risky experiment.
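
The day counts above divide the required pageviews by the full 40,000 baseline pageviews per day. A small sketch of how the duration scales with the diverted fraction of traffic; `fraction_diverted` and the function name are illustrative and not part of the original notebook.

```python
import math

DAILY_PAGEVIEWS = 40000  # baseline unique cookies viewing the overview page per day

def days_needed(total_pageviews, fraction_diverted=1.0):
    """Whole days required to collect total_pageviews at the given diversion."""
    return math.ceil(total_pageviews / (DAILY_PAGEVIEWS * fraction_diverted))

required = {"Gross Conversion": 645875, "Retention": 4741213, "Net Conversion": 685325}
for name, pageviews in required.items():
    print(name, days_needed(pageviews, 1.0), days_needed(pageviews, 0.5))
```

Halving the diverted traffic roughly doubles each duration, so the diversion fraction and the run length have to be chosen together.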

# Experiment Analysis
## Sanity Checks

In [1]:
import pandas as pd
import numpy as np
control = pd.read_csv('Final Project Results - Control.csv')
experiment = pd.read_csv('Final Project Results - Experiment.csv')

In [3]:
control_pageview = control['Pageviews'].sum()
experiment_pageview = experiment['Pageviews'].sum()
control_enroll = control['Enrollments'].sum()
experiment_enroll = experiment['Enrollments'].sum()
control_click = control['Clicks'].sum()
experiment_click= experiment['Clicks'].sum()

### Number of Cookies

In [5]:
# 1) Compute standard deviation of a binomial with probability 0.5 of success
cookie_sd = np.sqrt((0.5*0.5)/(control_pageview+experiment_pageview))
# 2) Multiply by z-score to get margin of error
cookie_MOE = cookie_sd * 1.96
# 3) Compute confidence interval around 0.5
print(0.5-cookie_MOE, 0.5+cookie_MOE)
# 4) Check whether observed fraction is within interval
cookie_p_hat = control_pageview/(control_pageview + experiment_pageview)
print('Yes',cookie_p_hat)

0.49882039214902313 0.5011796078509769
Yes 0.5006396668806133


### Number of Clicks

In [7]:
# 1) Compute standard deviation of a binomial with probability 0.5 of success
click_sd = np.sqrt((0.5*0.5)/(control_click+experiment_click))
# 2) Multiply by z-score to get margin of error
click_MOE = click_sd * 1.96
# 3) Compute confidence interval around 0.5
print(0.5-click_MOE, 0.5+click_MOE)
# 4) Check whether observed fraction is within interval
click_p_hat = control_click/(control_click+experiment_click)
print('Yes',click_p_hat)

0.49588449572378945 0.5041155042762105
Yes 0.5004673474066628


### Click Through Probability

![](two_proportion_compare.PNG)
image source: https://medium.com/@zhouyuchen999/a-b-testing-experiment-a-udacity-course-project-f958f7236278

In [10]:
p_control = control_click/control_pageview
p_experiment = experiment_click/experiment_pageview
print('Control:',p_control)
print('Experiment',p_experiment)

Control: 0.08212581357457682
Experiment 0.08218244066616376


In [12]:
# 1) Compute the pooled click-through probability and the standard error of the difference
ctp_pool = (control_click + experiment_click)/(control_pageview + experiment_pageview)
ctp_sd = np.sqrt(ctp_pool*(1-ctp_pool)*((1/control_pageview)+(1/experiment_pageview)))
# 2) Multiply by z-score to get margin of error
ctp_MOE = ctp_sd * 1.96
# 3) Compute confidence interval around zero (the expected difference)
diff = p_experiment-p_control
print(-ctp_MOE, ctp_MOE)
# 4) Check whether observed difference is within interval
print('Yes',diff)

-0.0012956791986518956 0.0012956791986518956
Yes 5.662709158693602e-05


| Invariant | Lower Bound | Upper Bound | Observed | Passed? |
| --- | --- | --- | --- | --- |
| Number of Cookies | 0.4988 | 0.5012 | 0.5006 | Yes |
| Number of Clicks | 0.4959 | 0.5041 | 0.5005 | Yes |
| Click Through Probability | -0.0013 | 0.0013 | 0.0001 | Yes |

## Effect Size Tests

In [18]:
control_df = control.dropna()
experiment_df = experiment.dropna()

In [21]:
control_df

Unnamed: 0,Date,Pageviews,Clicks,Enrollments,Payments
0,"Sat, Oct 11",7723,687,134.0,70.0
1,"Sun, Oct 12",9102,779,147.0,70.0
2,"Mon, Oct 13",10511,909,167.0,95.0
3,"Tue, Oct 14",9871,836,156.0,105.0
4,"Wed, Oct 15",10014,837,163.0,64.0
5,"Thu, Oct 16",9670,823,138.0,82.0
6,"Fri, Oct 17",9008,748,146.0,76.0
7,"Sat, Oct 18",7434,632,110.0,70.0
8,"Sun, Oct 19",8459,691,131.0,60.0
9,"Mon, Oct 20",10667,861,165.0,97.0


### Gross Conversion

In [27]:
control_enroll = control_df['Enrollments'].sum()
experiment_enroll = experiment_df['Enrollments'].sum()
control_click = control_df['Clicks'].sum()
experiment_click= experiment_df['Clicks'].sum()
p_control = control_enroll/control_click
p_expriment = experiment_enroll/experiment_click
p_pool = (control_enroll+experiment_enroll)/(control_click+experiment_click)

In [28]:
diff = p_expriment-p_control
sd = np.sqrt(p_pool*(1-p_pool)*(1/control_click+1/experiment_click))
MOE = sd*1.96
print('Difference:',diff)
print('95% CI:',diff-MOE,diff+MOE)

Difference: -0.020554874580361565
95% CI: -0.0291233583354044 -0.01198639082531873


The confidence interval contains neither zero nor -d_min = -0.01, meaning the result is both statistically and practically significant.

### Net Conversion

In [37]:
control_paid = control_df['Payments'].sum()
experiment_paid = experiment_df['Payments'].sum()

In [38]:
p_control = control_paid/control_click
p_expriment = experiment_paid/experiment_click
p_pool = (control_paid+experiment_paid)/(control_click+experiment_click)

In [39]:
diff = p_expriment-p_control
sd = np.sqrt(p_pool*(1-p_pool)*(1/control_click+1/experiment_click))
MOE = sd*1.96
print('Difference:',diff)
print('95% CI:',diff-MOE,diff+MOE)

Difference: -0.0048737226745441675
95% CI: -0.011604624359891718 0.001857179010803383


The confidence interval contains 0, so there is no evidence of a statistically significant difference between the control and experiment groups. It also contains -d_min = -0.0075, so practical significance cannot be established either.
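
The two effect size tests share the same pooled-standard-error interval and the same reading of the result. A sketch of both steps as functions, assuming a 95% interval (z = 1.96) as in the cells above; the function names are illustrative, and the intervals passed to `classify` are the rounded ones printed above.

```python
import math

def pooled_diff_ci(x_control, n_control, x_experiment, n_experiment, z=1.96):
    """Confidence interval for (experiment rate - control rate), pooled SE."""
    p_pool = (x_control + x_experiment) / (n_control + n_experiment)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_control + 1 / n_experiment))
    diff = x_experiment / n_experiment - x_control / n_control
    return diff - z * se, diff + z * se

def classify(lower, upper, d_min):
    """Return (statistically significant, practically significant)."""
    statistically_significant = not (lower <= 0 <= upper)
    # Practically significant only if the whole interval lies beyond +/- d_min
    practically_significant = upper < -d_min or lower > d_min
    return statistically_significant, practically_significant

print("Gross conversion:", classify(-0.0291, -0.0120, 0.01))
print("Net conversion:", classify(-0.0116, 0.0019, 0.0075))
```

Gross conversion comes out significant on both counts, while net conversion is significant on neither.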

### Sign Test
Sign test calculator: https://www.graphpad.com/quickcalcs/binomial1/

#### Gross Conversion

In [51]:
gc_sign = 0
exp_days = len(control_df['Payments'])
for i in range(exp_days):
    daily_control_gc = control_df['Enrollments'].iloc[i]/control_df['Clicks'].iloc[i]
    daily_experiment_gc = experiment_df['Enrollments'].iloc[i]/experiment_df['Clicks'].iloc[i]
    if daily_experiment_gc < daily_control_gc:
        gc_sign += 1
print('Total Days:',exp_days)
print('Number of Successes:', gc_sign)

Total Days: 23
Number of Successes: 19


The experiment group had lower gross conversion on 19 of 23 days, which is consistent with the effect size test.

![](gc_sign_test.PNG)

#### Net Conversion

In [55]:
nc_sign = 0
exp_days = len(control_df['Payments'])
for i in range(exp_days):
    daily_control_nc = control_df['Payments'].iloc[i]/control_df['Clicks'].iloc[i]
    daily_experiment_nc = experiment_df['Payments'].iloc[i]/experiment_df['Clicks'].iloc[i]
    if daily_control_nc < daily_experiment_nc:
        nc_sign += 1
print('Total Days:',exp_days)
print('Number of Successes:', nc_sign)

Total Days: 23
Number of Successes: 10


The experiment group had higher net conversion on only 10 of 23 days, which is consistent with the effect size test (no significant difference).

![](nc_sign_test.PNG)
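
The sign test p-values shown in the screenshots come from an online calculator. A minimal sketch of the same two-tailed test using an exact binomial with probability 0.5, so the result can be reproduced in the notebook. The function name is illustrative, and doubling the smaller tail is an assumption that matches the symmetric p = 0.5 case.

```python
from math import comb

def sign_test_p_value(successes, n):
    """Two-tailed exact binomial p-value under H0: P(success) = 0.5."""
    k = min(successes, n - successes)           # size of the smaller tail
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)                   # double the tail, cap at 1

print("Gross conversion:", round(sign_test_p_value(19, 23), 4))
print("Net conversion:", round(sign_test_p_value(10, 23), 4))
```

A p-value below 0.05 for gross conversion and well above it for net conversion would match the conclusions of the effect size tests.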

# Recommendations

I would recommend launching. Based on the tests performed on gross conversion, the number of frustrated students who left the free trial has been successfully reduced. Based on the tests performed on net conversion, the number of students who continue past the free trial and eventually complete the course was not significantly reduced by the change.