# Experiment Overview: Free Trial Screener

At the time of this experiment, Udacity courses had two options on the course overview page: "start free trial" and "access course materials". If the student clicks "start free trial", they are asked to enter their credit card information and are then enrolled in a free trial for the paid version of the course. After 14 days, they are automatically charged unless they cancel first. If the student clicks "access course materials", they can view the videos and take the quizzes for free, but they do not receive coaching support or a verified certificate, and they cannot submit their final project for feedback.

In the experiment, Udacity tested a change where if the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead.

The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time, without significantly reducing the number of students who continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.



The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

# Metric Choice

### User funnel analysis

![title](user_funnel.png)

### Hypothesis 

The treatment has no effect on the number of students who continue past the free trial

### Invariant metrics

- The number of unique cookies that view the course overview page
- The click-through probability
- The number of unique cookies that click the "start free trial" button


### Evaluation metrics

- Gross conversion 
- Retention 
- Net conversion 
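The three evaluation metrics are ratios of funnel counts. Matching them to the baseline values used in the next section (probability of enrolling given click, of payment given enroll, and of payment given click), a minimal sketch of their definitions might look like the following. The function name `evaluation_metrics` is illustrative and not part of the original notebook; the counts in the usage line are the first row of the experiment data shown later.

```python
def evaluation_metrics(clicks, enrollments, payments):
    """Return the three evaluation metrics as ratios of funnel counts."""
    return {
        "gross_conversion": enrollments / clicks,  # P(enroll | click)
        "retention": payments / enrollments,       # P(payment | enroll)
        "net_conversion": payments / clicks,       # P(payment | click)
    }


# Usage with one day's counts: 686 clicks, 105 enrollments, 34 payments
metrics = evaluation_metrics(clicks=686, enrollments=105, payments=34)
print(metrics)
```

Note that net conversion is the product of gross conversion and retention, which is why the baseline value 0.1093125 equals 0.20625 × 0.53.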

### Measuring Standard Deviation of evaluation metrics 

- Unique cookies to view course overview page per day: 40000
- Unique cookies to click "Start free trial" per day: 3200
- Enrollments per day: 660
- Click-through-probability on "Start free trial": 0.08
- Probability of enrolling, given click: 0.20625
- Probability of payment, given enroll: 0.53
- Probability of payment, given click: 0.1093125

In [23]:
import numpy as np
import pandas as pd

d = {"Metric": ["Cookies", "Clicks", "Enrollments", "Click-through-probability", "Gross conversion", "Retention", "Net conversion"], 
     "Population": [40000, 3200, 660, 0.08, 0.20625, 0.53, 0.109313],
     "Sample": [5000, 400, 82.5, np.nan, np.nan, np.nan, np.nan]}
dat = pd.DataFrame(data=d)
dat

|   | Metric | Population | Sample |
|---|---|---|---|
| 0 | Cookies | 40000.0 | 5000.0 |
| 1 | Clicks | 3200.0 | 400.0 |
| 2 | Enrollments | 660.0 | 82.5 |
| 3 | Click-through-probability | 0.08 | |
| 4 | Gross conversion | 0.20625 | |
| 5 | Retention | 0.53 | |
| 6 | Net conversion | 0.109313 | |


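Given a sample size of 5000 cookies visiting the course overview page, the analytic standard error of each probability metric is $\sqrt{\frac{p(1-p)}{n}}$, where $p$ is the baseline probability and $n$ is the number of units in the metric's denominator at that sample size: 400 clicks for gross conversion and net conversion, and 82.5 enrollments for retention.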

In [24]:
dat['standard error'] = np.nan

In [26]:
dat.iloc[4,3] = np.sqrt(dat.iloc[4,1]*(1-dat.iloc[4,1])/400)

In [27]:
dat.iloc[5,3] = np.sqrt(dat.iloc[5,1]*(1-dat.iloc[5,1])/82.5)

In [29]:
dat.iloc[6,3] = np.sqrt(dat.iloc[6,1]*(1-dat.iloc[6,1])/400)

In [30]:
dat

|   | Metric | Population | Sample | standard error |
|---|---|---|---|---|
| 0 | Cookies | 40000.0 | 5000.0 | |
| 1 | Clicks | 3200.0 | 400.0 | |
| 2 | Enrollments | 660.0 | 82.5 | |
| 3 | Click-through-probability | 0.08 | | |
| 4 | Gross conversion | 0.20625 | | 0.020231 |
| 5 | Retention | 0.53 | | 0.054949 |
| 6 | Net conversion | 0.109313 | | 0.015602 |
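The denominators 400 and 82.5 used above are the baseline daily counts scaled down to a 5000-cookie sample. As a rough sketch of that arithmetic (the helper names `scaled_count` and `analytic_se` are assumptions for illustration, not functions from the notebook):

```python
import math

# Baseline daily values from the list above
BASELINE_COOKIES = 40000
BASELINE_CLICKS = 3200
BASELINE_ENROLLMENTS = 660


def scaled_count(baseline_count, sample_cookies, baseline_cookies=BASELINE_COOKIES):
    """Scale a baseline daily count proportionally to a smaller cookie sample."""
    return baseline_count * sample_cookies / baseline_cookies


def analytic_se(p, n):
    """Binomial standard error sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


sample_cookies = 5000
clicks = scaled_count(BASELINE_CLICKS, sample_cookies)            # denominator for gross/net conversion
enrollments = scaled_count(BASELINE_ENROLLMENTS, sample_cookies)  # denominator for retention

standard_errors = {
    "gross_conversion": analytic_se(0.20625, clicks),
    "retention": analytic_se(0.53, enrollments),
    "net_conversion": analytic_se(0.1093125, clicks),
}
print(standard_errors)
```

Retention has the largest standard error because its denominator (enrollments) is the smallest count in the funnel.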


### Choosing Number of Samples given Power

Set $\alpha = 0.05$ and $\beta = 0.2$.

- $\alpha$: the probability of rejecting the null hypothesis when it is true, bounded at 0.05

- $1 - \beta$: the probability of rejecting the null hypothesis when it is false (the power), set to 0.8

In [33]:
dat['Minimum Detectable Effect'] = [np.nan, np.nan, np.nan, np.nan, 0.01, 0.01, 0.0075]
dat['Required pageviews'] = [np.nan, np.nan, np.nan, np.nan, 645875, 4741212, 685325]

In [34]:
dat

|   | Metric | Population | Sample | standard error | Minimum Detectable Effect | Required pageviews |
|---|---|---|---|---|---|---|
| 0 | Cookies | 40000.0 | 5000.0 | | | |
| 1 | Clicks | 3200.0 | 400.0 | | | |
| 2 | Enrollments | 660.0 | 82.5 | | | |
| 3 | Click-through-probability | 0.08 | | | | |
| 4 | Gross conversion | 0.20625 | | 0.020231 | 0.01 | 645875.0 |
| 5 | Retention | 0.53 | | 0.054949 | 0.01 | 4741212.0 |
| 6 | Net conversion | 0.109313 | | 0.015602 | 0.0075 | 685325.0 |
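The required pageviews above are entered as constants. As a hedged sketch of how such numbers can be approximated, the following uses a common two-proportion sample-size formula and then converts units per group into total pageviews. This is not necessarily the calculator that produced the figures in the table, so small differences are to be expected; the function names are illustrative.

```python
import math
from statistics import NormalDist


def required_units_per_group(p, d_min, alpha=0.05, beta=0.2):
    """Approximate units (clicks or enrollments) needed in each group.

    p     : baseline probability
    d_min : minimum detectable effect (absolute)
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided test
    z_beta = NormalDist().inv_cdf(1 - beta)
    sd_null = math.sqrt(2 * p * (1 - p))
    sd_alt = math.sqrt(p * (1 - p) + (p + d_min) * (1 - (p + d_min)))
    return math.ceil(((z_alpha * sd_null + z_beta * sd_alt) / d_min) ** 2)


def required_pageviews(units_per_group, units_per_pageview):
    """Convert units per group into total pageviews across both groups."""
    return math.ceil(2 * units_per_group / units_per_pageview)


# Gross conversion is measured per click; 0.08 clicks are expected per pageview
gross_units = required_units_per_group(p=0.20625, d_min=0.01)
gross_pageviews = required_pageviews(gross_units, units_per_pageview=0.08)

# Retention is measured per enrollment; 660 / 40000 enrollments per pageview
retention_units = required_units_per_group(p=0.53, d_min=0.01)
retention_pageviews = required_pageviews(retention_units, units_per_pageview=660 / 40000)

print(gross_pageviews, retention_pageviews)
```

The conversion step explains why retention is so expensive: only a small share of pageviews turn into enrollments, so each required enrollment costs many pageviews.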


### Choosing Duration vs. Exposure

If the change is not risky, we might want a larger fraction of the population to be exposed to the experiment so that we can get results faster. However, even if the change is made visible to the entire population, retention requires 4,741,212 pageviews, which at 40,000 pageviews per day takes about 119 days. Therefore, we should focus on gross conversion and net conversion instead.

If the entire population is exposed to the change, the 685,325 pageviews required for net conversion (the larger of the two remaining requirements) can be collected within 18 days.

In [35]:
def calculate_days(fraction, required_size):
    return required_size/(fraction*40000)

In [37]:
calculate_days(1, 685325)

17.133125
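To compare durations across metrics and traffic fractions, the same calculation can be rounded up to whole days. This is a small sketch; `days_needed` is a hypothetical variant of `calculate_days`, and the required pageviews are the values from the table above.

```python
import math

DAILY_PAGEVIEWS = 40000  # baseline unique cookies per day


def days_needed(required_pageviews, fraction=1.0, daily_pageviews=DAILY_PAGEVIEWS):
    """Whole days needed to collect the required pageviews at a given traffic fraction."""
    return math.ceil(required_pageviews / (fraction * daily_pageviews))


required = {
    "gross conversion": 645875,
    "retention": 4741212,
    "net conversion": 685325,
}

# Duration per metric with all traffic diverted to the experiment
days_full_traffic = {metric: days_needed(pv) for metric, pv in required.items()}
print(days_full_traffic)

# Duration for net conversion if only half of the traffic is diverted
print(days_needed(required["net conversion"], fraction=0.5))
```

Halving the exposed fraction doubles the duration, which is the trade-off between risk and speed discussed above.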

# Experiment Analysis

### Sanity checks 

Check whether the invariant metrics are equivalent between the two groups by computing confidence intervals around their expected values. We should not proceed to the rest of the analysis unless all sanity checks pass.

Given the data observed, all the invariant metrics pass the sanity check: each observed value lies inside its confidence interval.

In [38]:
raw_data = pd.read_excel('results.xlsx', sheet_name=None)

In [44]:
control = raw_data['Control']

In [45]:
exp = raw_data['Experiment']

In [46]:
exp.head()

|   | Date | Pageviews | Clicks | Enrollments | Payments |
|---|---|---|---|---|---|
| 0 | Sat, Oct 11 | 7716 | 686 | 105.0 | 34.0 |
| 1 | Sun, Oct 12 | 9288 | 785 | 116.0 | 91.0 |
| 2 | Mon, Oct 13 | 10480 | 884 | 145.0 | 79.0 |
| 3 | Tue, Oct 14 | 9867 | 827 | 138.0 | 92.0 |
| 4 | Wed, Oct 15 | 9793 | 832 | 140.0 | 94.0 |


In [70]:
sanity_check = pd.DataFrame()

In [71]:
# construct confidence interval for pageviews
control_pageviews = control['Pageviews'].sum()
exp_pageviews = exp['Pageviews'].sum()
observed_pageview = control_pageviews/(exp_pageviews + control_pageviews)
mu = 0.5
se = np.sqrt(mu*(1-mu)/(exp_pageviews+control_pageviews))
pageview_right = se*1.96 + 0.5
pageview_left = 0.5 - 1.96*se

In [72]:
sanity_check['metrics'] = ['pageview', 'clicks', 'click_through_prob']
sanity_check['CI_left'] = [pageview_left, np.nan, np.nan]
sanity_check['CI_right'] = [pageview_right, np.nan, np.nan]
sanity_check['observed'] = [observed_pageview, np.nan, np.nan]

In [73]:
# construct confidence interval for clicks 
control_clicks = control['Clicks'].sum()
exp_clicks = exp['Clicks'].sum()
observed_click = control_clicks/(exp_clicks + control_clicks)
mu = 0.5
se = np.sqrt(mu*(1-mu)/(exp_clicks + control_clicks))
click_right = se*1.96 + 0.5
click_left = 0.5 - 1.96*se

In [74]:
sanity_check['metrics'] = ['pageview', 'clicks', 'click_through_prob']
sanity_check['CI_left'] = [pageview_left, click_left, np.nan]
sanity_check['CI_right'] = [pageview_right, click_right, np.nan]
sanity_check['observed'] = [observed_pageview, observed_click, np.nan]

In [83]:
# construct confidence interval for click through probability
control_prob = control["Clicks"].sum()/control["Pageviews"].sum()
exp_prob = exp["Clicks"].sum()/exp["Pageviews"].sum()
observed_prob = exp_prob -control_prob 
mu = 0
se_control = (control_prob*(1-control_prob))**0.5
se_exp = (exp_prob*(1-exp_prob))**0.5
se_diff = (se_control**2/control["Pageviews"].sum()+se_exp**2/exp["Pageviews"].sum())**0.5
prob_left = -1.96*se_diff
prob_right = 1.96*se_diff

In [85]:
sanity_check['metrics'] = ['pageview', 'clicks', 'click_through_prob']
sanity_check['CI_left'] = [pageview_left, click_left, prob_left]
sanity_check['CI_right'] = [pageview_right, click_right, prob_right]
sanity_check['observed'] = [observed_pageview, observed_click, observed_prob]

In [86]:
sanity_check

|   | metrics | CI_left | CI_right | observed |
|---|---|---|---|---|
| 0 | pageview | 0.49882 | 0.50118 | 0.50064 |
| 1 | clicks | 0.495884 | 0.504116 | 0.500467 |
| 2 | click_through_prob | -0.001296 | 0.001296 | 5.7e-05 |
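The pageview and click checks above repeat the same steps: build a 95% interval around the expected 0.5 share and test whether the observed control share falls inside it. A minimal sketch of that logic as one reusable function follows. The name `count_sanity_check` and the example counts are hypothetical and are not taken from `results.xlsx`.

```python
import math


def count_sanity_check(control_count, experiment_count, expected_share=0.5, z=1.96):
    """Check that the control group's share of a count matches the expected split.

    Returns (lower, upper, observed, passed).
    """
    total = control_count + experiment_count
    se = math.sqrt(expected_share * (1 - expected_share) / total)
    lower = expected_share - z * se
    upper = expected_share + z * se
    observed = control_count / total
    return lower, upper, observed, lower <= observed <= upper


# Hypothetical counts for illustration only
lower, upper, observed, passed = count_sanity_check(5020, 4980)
print(lower, upper, observed, passed)
```

With the real data, the function would be called once with the summed `Pageviews` columns and once with the summed `Clicks` columns.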


### Check for Practical and Statistical Significance

For each evaluation metric, calculate a confidence interval for the difference between the experiment and control groups, and check whether the metric is statistically and/or practically significant. A metric is statistically significant if the confidence interval does not include 0 (that is, we can be confident there was a change), and it is practically significant if the confidence interval does not include the practical significance boundary (that is, we can be confident there is a change that matters to the business).


In [88]:
# use the first 23 rows (days) only; the resulting sample is smaller than the desired sample size
test_result = pd.DataFrame()
sample_size = control.iloc[:23]["Pageviews"].sum()+exp.iloc[:23]["Pageviews"].sum()
sample_size

423525

In [110]:
# two_tailed Z test for gross conversion
control_gross = control.iloc[:23]['Enrollments'].sum()/control.iloc[:23]["Clicks"].sum()
exp_gross = exp.iloc[:23]['Enrollments'].sum()/exp.iloc[:23]["Clicks"].sum()
observed_gross = exp_gross - control_gross
mu = 0
se_control = (control_gross*(1-control_gross))**0.5
se_exp = (exp_gross*(1-exp_gross))**0.5
se_diff = (se_control**2/control.iloc[:23]["Clicks"].sum()+se_exp**2/exp.iloc[:23]["Clicks"].sum())**0.5

gross_left = -1.96*se_diff + observed_gross
gross_right = 1.96*se_diff + observed_gross

In [111]:
test_result['metric'] = ['gross conversion', 'net conversion']
test_result['CL_left'] = [gross_left, np.nan]
test_result['CL_right'] = [gross_right, np.nan]
test_result['observed_difference'] = [observed_gross, np.nan]
test_result['minimal_pratical_diff'] = [0.01, np.nan]
test_result['statistically_significant'] = [True, np.nan]
test_result['practically_significant'] = [True, np.nan]

In [112]:
# two_tailed Z test for net conversion
control_net = control.iloc[:23]['Payments'].sum()/control.iloc[:23]["Clicks"].sum()
exp_net = exp.iloc[:23]['Payments'].sum()/exp.iloc[:23]["Clicks"].sum()
observed_net = exp_net - control_net
se_control = (control_net*(1-control_net))**0.5
se_exp = (exp_net*(1-exp_net))**0.5
se_diff = (se_control**2/control.iloc[:23]["Clicks"].sum()+se_exp**2/exp.iloc[:23]["Clicks"].sum())**0.5

net_left = -1.96*se_diff + observed_net
net_right = 1.96*se_diff + observed_net
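The gross conversion and net conversion cells follow the same recipe, so the interval could be factored into one helper. The sketch below mirrors the unpooled standard error used in this notebook (a pooled standard error is another common choice). The function name and the example counts are hypothetical.

```python
import math


def diff_confidence_interval(x_control, n_control, x_exp, n_exp, z=1.96):
    """Confidence interval for (experiment rate - control rate), unpooled SE.

    x_* are numerator counts (e.g. enrollments), n_* denominators (e.g. clicks).
    Returns (lower, upper, observed_difference).
    """
    p_control = x_control / n_control
    p_exp = x_exp / n_exp
    diff = p_exp - p_control
    se = math.sqrt(
        p_control * (1 - p_control) / n_control + p_exp * (1 - p_exp) / n_exp
    )
    return diff - z * se, diff + z * se, diff


# Hypothetical counts for illustration only
lower, upper, diff = diff_confidence_interval(200, 1000, 180, 1000)
print(lower, upper, diff)
```

With the notebook's data, the numerators would be the summed `Enrollments` or `Payments` and the denominators the summed `Clicks` over the same rows.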


In [117]:
test_result['metric'] = ['gross conversion', 'net conversion']
test_result['CL_left'] = [gross_left, net_left]
test_result['CL_right'] = [gross_right, net_right]
test_result['observed_difference'] = [observed_gross, observed_net]
test_result['minimal_pratical_diff'] = [0.01, 0.0075]
test_result['statistically_significant'] = [True, False]
test_result['practically_significant'] = [True, False]

In [118]:
test_result

|   | metric | CL_left | CL_right | observed_difference | minimal_pratical_diff | statistically_significant | practically_significant |
|---|---|---|---|---|---|---|---|
| 0 | gross conversion | -0.02912 | -0.011989 | -0.020555 | 0.01 | True | True |
| 1 | net conversion | -0.011604 | 0.001857 | -0.004874 | 0.0075 | False | False |
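The significance flags in the table were entered by hand. One way to derive them from the interval is sketched below, under the assumption that "practically significant" means the whole interval lies beyond the practical significance boundary in either direction. The function name `classify_significance` is illustrative.

```python
def classify_significance(ci_left, ci_right, d_min):
    """Return (statistically_significant, practically_significant) for a CI.

    Statistically significant: the interval excludes 0.
    Practically significant (assumed reading): the whole interval lies
    beyond +d_min or below -d_min.
    """
    statistically_significant = not (ci_left <= 0 <= ci_right)
    practically_significant = ci_left > d_min or ci_right < -d_min
    return statistically_significant, practically_significant


# Intervals and boundaries taken from the result table above
gross_flags = classify_significance(-0.02912, -0.011989, 0.01)
net_flags = classify_significance(-0.011604, 0.001857, 0.0075)
print(gross_flags, net_flags)
```

For these inputs the function agrees with the hand-entered flags: both are `True` for gross conversion and both are `False` for net conversion.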


#  Interpretation of Test Results 

In this experiment, we are testing the assumption that adding a Free Trial Screener will reduce the number of frustrated students who leave the free trial because they don't have enough time, without significantly reducing the number of students who continue past the free trial and eventually complete the course. If the assumption is true, we would expect to see a decrease in Gross Conversion and an increase in Retention, while Net Conversion should remain unchanged.

Based on the results, a statistically and practically significant decrease in Gross Conversion was observed, but the effect of the screener on Net Conversion is uncertain: its confidence interval (-0.0116 to 0.0019) includes 0 and also the negative practical significance boundary of -0.0075, so a decrease that matters to the business cannot be ruled out. Given these results, we can only say that the Free Trial Screener appears to set clearer expectations for students upfront. I would therefore recommend not rolling out this feature, since it might actually decrease revenue.


