# Experiment Design

## Experiment Overview: Free Trial Screener

At the time of this experiment, Udacity courses had two options on the home page: "start free trial" and "access course materials". If the student clicked "start free trial", they were asked to enter their credit card information and were then enrolled in a free trial for the paid version of the course. After 14 days, they were automatically charged unless they canceled first. If the student clicked "access course materials", they could view the videos and take the quizzes for free, but they did not receive coaching support or a verified certificate, and they could not submit their final project for feedback.


In the experiment, Udacity tested a change where if the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead.
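The screener flow described above can be sketched as a small routing function. This is only an illustration: the function name, the `continue_anyway` flag, and the returned labels are hypothetical, and only the 5-hour threshold comes from the description.

```python
# Hypothetical names throughout; only the 5-hour threshold comes from the text.
HOURS_THRESHOLD = 5


def screener_next_step(hours_per_week, continue_anyway=False):
    """Return the next step after a student clicks "start free trial"."""
    if hours_per_week >= HOURS_THRESHOLD:
        # Enough time available: proceed through checkout as usual.
        return "checkout"
    # Below the threshold the student sees the warning message and may
    # either continue enrolling or switch to the free course materials.
    return "checkout" if continue_anyway else "free_materials"


print(screener_next_step(6))
print(screener_next_step(3))
print(screener_next_step(3, continue_anyway=True))
```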


The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time—without significantly reducing the number of students to continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.


The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

## Metric Choice

#### Invariant Metrics:

1. *Number of cookies* (number of unique cookies to view the course overview page)
2. *Number of clicks* (number of unique cookies to click the "Start free trial" button, which happens before the free trial screener is triggered)
3. *Click-through-probability* (number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page)

#### Evaluation Metrics:

1. *Gross conversion*  (number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button) 
2. *Retention* (number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by number of user-ids to complete checkout)
3. *Net conversion* (number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button)
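The ratio metrics defined above can be written as plain functions of the underlying counts. This is a minimal sketch: the function names are my own, and the counts in the usage lines are made-up illustrative numbers, not experiment data.

```python
def click_through_probability(clicks, pageviews):
    """Unique cookies that clicked / unique cookies that viewed the page."""
    return clicks / pageviews


def gross_conversion(enrollments, clicks):
    """User-ids that enrolled in the free trial / unique cookies that clicked."""
    return enrollments / clicks


def retention(payments, enrollments):
    """User-ids that stayed past 14 days / user-ids that enrolled."""
    return payments / enrollments


def net_conversion(payments, clicks):
    """User-ids that stayed past 14 days / unique cookies that clicked."""
    return payments / clicks


# Illustrative counts only (not taken from the experiment).
pageviews, clicks, enrollments, payments = 1000, 100, 20, 10

print(click_through_probability(clicks, pageviews))
print(gross_conversion(enrollments, clicks))
print(retention(payments, enrollments))
print(net_conversion(payments, clicks))
```

Written this way, it is easy to see that net conversion is the product of gross conversion and retention, since the enrollment counts cancel.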


>For each metric, explain both why you did or did not use it as an invariant metric and why you did or did not use it as an evaluation metric. Also, state what results you will look for in your evaluation metrics in order to launch the experiment.


* Number of cookies is an invariant metric because a student views the course overview page before the screener is shown, so the experiment cannot change it. In addition, the cookie is the unit of diversion used to split users into control and experiment groups, so the number of cookies should be roughly equal in both groups.
* Number of user-ids is neither an invariant nor an evaluation metric. It counts the users who enroll in a free trial, so it depends on the experiment and cannot be invariant. Moreover, a user-id is tracked only once the user enrolls, so the number of user-ids is expected to differ between the control and experiment groups, which could skew the results of the experiment.
* Number of clicks is an invariant metric. Clicking the "Start free trial" button happens before the screener is shown and does not depend on the experiment.
* Click-through-probability is an invariant metric. It is the ratio of the number of clicks to the number of cookies, both of which are invariant metrics as stated above. Viewing the course overview page and clicking the "Start free trial" button both happen before the screener is shown and are not influenced by the experiment.
* Gross conversion is an evaluation metric. It is the fraction of users who clicked the "Start free trial" button and then enrolled in the trial. Enrollment happens after the screener, so the metric depends on the experiment and cannot be invariant. It is a good evaluation metric because it shows whether the screener reduced enrollment by students unlikely to finish, and therefore whether it reduced costs for Udacity.
* Retention is an evaluation metric. It cannot be invariant because the number of users who enroll in a free trial depends directly on the experiment. It is a good evaluation metric because an increase means that a larger share of enrolled users stayed past the trial and paid, which means more revenue.
* Net conversion is an evaluation metric. It cannot be invariant because the number of users who enroll in the trial, and therefore the number who go on to pay, depends on the experiment. It is a good evaluation metric because an increase means more revenue for Udacity.

To launch the experiment, I need Gross conversion to show a practically significant decrease. I expect the gross conversion rate to be lower in the experiment group because, if the hypothesis holds, the screener should prevent users who do not have enough time to commit to the course from enrolling, which decreases costs for Udacity.

Net conversion, on the other hand, is required to show a statistically significant increase. I expect net conversion to be higher in the experiment group because users in this group know what it takes to finish the course and are more likely to continue after the trial is over, which will increase revenue.
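The launch criteria above could be checked mechanically once confidence intervals for the difference (experiment minus control) are available. The sketch below is a hypothetical helper, not part of the original analysis: the function names, the `gross_d_min` practical-significance boundary, and the example intervals are all assumptions chosen for illustration.

```python
def is_practically_significant_decrease(ci_upper, d_min):
    """True when the whole confidence interval lies below -d_min."""
    return ci_upper < -d_min


def is_statistically_significant_increase(ci_lower):
    """True when the whole confidence interval lies above zero."""
    return ci_lower > 0


def should_launch(gross_ci, net_ci, gross_d_min):
    """Apply the two launch criteria to (lower, upper) confidence intervals.

    gross_d_min is a hypothetical practical-significance boundary; the
    text above does not state its value.
    """
    gross_ok = is_practically_significant_decrease(gross_ci[1], gross_d_min)
    net_ok = is_statistically_significant_increase(net_ci[0])
    return gross_ok and net_ok


# Made-up intervals, for illustration only.
print(should_launch((-0.03, -0.015), (0.001, 0.010), gross_d_min=0.01))
print(should_launch((-0.03, -0.005), (0.001, 0.010), gross_d_min=0.01))
```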


## Measuring Standard Deviation

```python
import math

# Sample number of cookies visiting the course overview page:
N_sample = 5000.0
# Unique cookies to view page per day:
N_unique = 40000.0
# Unique cookies to click "Start free trial" per day:
n_click = 3200.0
# Enrollments per day:
n_enr = 660
# Probability of enrolling, given click:
p_enr_click = 0.20625
# Probability of payment, given enroll:
p_pay_enr = 0.53
# Probability of payment, given click:
p_pay_click = 0.1093125

# Click-through probability:
ctp = n_click / N_unique

# Probability of enrolling, given pageview:
p_enr_view = n_enr / N_unique

# Gross conversion standard error (ctp * N_sample = expected clicks in the sample):
SE_gc = math.sqrt(p_enr_click * (1 - p_enr_click) / (ctp * N_sample))

# Net conversion standard error (same denominator: expected clicks in the sample):
SE_nc = math.sqrt(p_pay_click * (1 - p_pay_click) / (ctp * N_sample))

# Retention standard error (p_enr_view * N_sample = expected enrollments in the sample):
SE_ret = math.sqrt(p_pay_enr * (1 - p_pay_enr) / (p_enr_view * N_sample))

print("Gross Conversion standard error:", round(SE_gc, 4))
print("Net Conversion standard error:", round(SE_nc, 4))
print("Retention standard error:", round(SE_ret, 4))
```

```
Gross Conversion standard error: 0.0202
Net Conversion standard error: 0.0156
Retention standard error: 0.0549
```
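The three calculations share one formula, the binomial standard error sqrt(p(1-p)/n), and differ only in which count serves as n. As a sketch, the same numbers can be reproduced with a small helper; the name `analytic_se` and the intermediate variable names are my own.

```python
import math


def analytic_se(p, n):
    """Binomial standard error sqrt(p * (1 - p) / n) for a proportion p over n units."""
    return math.sqrt(p * (1 - p) / n)


# Scale the daily baseline counts down to a sample of 5000 pageviews.
pageviews_sample = 5000
clicks_sample = pageviews_sample * 3200 / 40000      # 400 clicks
enrollments_sample = pageviews_sample * 660 / 40000  # 82.5 enrollments

print(round(analytic_se(0.20625, clicks_sample), 4))      # gross conversion: 0.0202
print(round(analytic_se(0.1093125, clicks_sample), 4))    # net conversion: 0.0156
print(round(analytic_se(0.53, enrollments_sample), 4))    # retention: 0.0549
```

Retention has by far the largest standard error because its denominator, about 82 enrollments, is much smaller than the 400 clicks behind the two conversion metrics.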


>For each of your evaluation metrics, indicate whether you think the analytic estimate would be comparable to the empirical variability, or whether you expect them to be different (in which case it might be worth doing an empirical estimate if there is time). Briefly give your reasoning in each case.

Both Gross conversion and Net conversion use the unit of diversion (number of cookies) as the denominator, so their unit of analysis matches the unit of diversion and the analytic estimate should be comparable to the empirical variability.

For Retention, the unit of analysis (number of users who complete checkout) is different from the unit of diversion, so the analytic and empirical estimates are likely to differ.
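If time allowed, one common way to obtain an empirical estimate for a metric such as Retention is to bootstrap over the unit of diversion. The sketch below is an assumption about how that could be done, not something performed in this analysis: `bootstrap_se` is a hypothetical helper, and it expects one (numerator, denominator) pair per cookie, for example (payments, enrollments).

```python
import random


def bootstrap_se(units, n_resamples=1000, seed=0):
    """Empirical standard error of sum(numerators) / sum(denominators).

    `units` holds one (numerator, denominator) pair per diversion unit.
    Whole units are resampled with replacement, so any correlation inside
    a unit is preserved in every resample.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        sample = [rng.choice(units) for _ in units]
        denominator = sum(d for _, d in sample)
        if denominator == 0:
            continue  # skip degenerate resamples with an empty denominator
        estimates.append(sum(n for n, _ in sample) / denominator)
    mean = sum(estimates) / len(estimates)
    variance = sum((e - mean) ** 2 for e in estimates) / (len(estimates) - 1)
    return variance ** 0.5


# Synthetic units for illustration only: (payments, enrollments) per cookie.
synthetic_units = [(0, 1), (1, 1)] * 20
print(bootstrap_se(synthetic_units, n_resamples=200))
```

Comparing this figure with the analytic standard error would show how far the two estimates diverge for a given metric.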
