In [3]:
import numpy as np

## Experiment Overview: Free Trial Screener

At the time of this experiment, Udacity courses had two options on the course overview page: "start free trial" and "access course materials". If the student clicked "start free trial", they were asked to enter their credit card information and were then enrolled in a free trial of the paid version of the course. After 14 days, they were automatically charged unless they canceled first. If the student clicked "access course materials", they could view the videos and take the quizzes for free, but they did not receive coaching support or a verified certificate, and they did not submit their final project for feedback.

In the experiment, Udacity tested a change where if the student clicked "start free trial", they were asked how much time they had available to devote to the course. If the student indicated 5 or more hours per week, they would be taken through the checkout process as usual. If they indicated fewer than 5 hours per week, a message would appear indicating that Udacity courses usually require a greater time commitment for successful completion, and suggesting that the student might like to access the course materials for free. At this point, the student would have the option to continue enrolling in the free trial, or access the course materials for free instead. [This screenshot](img/experiment.png) shows what the experiment looked like.

The hypothesis was that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who left the free trial because they didn't have enough time, without significantly reducing the number of students who continue past the free trial and eventually complete the course. If this hypothesis held true, Udacity could improve the overall student experience and improve coaches' capacity to support students who are likely to complete the course.

The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users that do not enroll, their user-id is not tracked in the experiment, even if they were signed in when they visited the course overview page.

## Experiment Design

### Metric Choice

#### Invariant metrics
- **Number of cookies:** That is, number of unique cookies to view the course overview page. (dmin=3000)
- **Number of clicks:** That is, number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered). (dmin=240)
- **Click through probability:** That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page. (dmin=0.01)

#### Evaluation metrics
- **Gross conversion:** That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button. (dmin= 0.01)
- **Net conversion:** That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button. (dmin= 0.0075)


## Measuring Standard Deviation

The baseline values are:

- Unique cookies to view course overview page per day: **40000**
- Unique cookies to click "Start free trial" per day: **3200**
- Enrollments per day: **660**
- Click-through probability on "Start free trial": **0.08**
- Probability of enrolling, given click: **0.20625**
- Probability of payment, given enroll: **0.53**
- Probability of payment, given click: **0.1093125**

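To calculate the standard deviation of each evaluation metric analytically, we use the standard error of a proportion:

$$SD = \sqrt{\frac{p(1-p)}{n}}$$

Here $p$ is the baseline probability of the metric and $n$ is the number of units in its denominator (cookies that click "Start free trial"), both taken from the baseline data above.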

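For a sample of 5,000 cookies viewing the course overview page, we first need the expected number of cookies that click the "Start free trial" button, since clicks are the denominator of both evaluation metrics: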

In [1]:
n = 5000
clicks = n * 0.08  # expected clicks at the 0.08 click-through probability

clicks

400.0

The standard deviation for **Gross Conversion** is:

In [4]:
p = 0.20625
round(np.sqrt(p * (1-p) / clicks),4)

0.0202

And for **Net Conversion**:

In [5]:
p = 0.1093125
round(np.sqrt(p * (1-p) / clicks),4)

0.0156

## Sizing

### Number of Samples vs. Power

- Gross Conversion. **Baseline**: 0.20625 - **dmin**: 0.01
- Net Conversion. **Baseline**: 0.1093125 - **dmin**: 0.0075
- Not using Bonferroni correction.
- Using alpha = 0.05 and beta = 0.2

I used [this calculator](http://www.evanmiller.org/ab-testing/sample-size.html) to compute the required sample size per group:

- **Gross Conversion**: 25,835 clicks per group
- **Net Conversion**: 27,411 clicks per group
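The calculator's numbers can also be approximated in code. The sketch below assumes the calculator uses the common two-proportion formula: a two-sided test, an absolute minimum detectable effect, and the variance under the alternative computed from `baseline + d_min`. That is an assumption about the site, not its published source. Under this formula the sketch gives 25,835 for gross conversion and a figure within a few units of 27,411 for net conversion.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline, d_min, alpha=0.05, beta=0.2):
    """Units per group needed to detect an absolute change of d_min.

    Assumes a two-sided test. The variance under H0 uses the baseline
    for both groups; the variance under H1 uses baseline and baseline + d_min.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(1 - beta)
    p1, p2 = baseline, baseline + d_min
    sd_null = math.sqrt(2 * p1 * (1 - p1))
    sd_alt = math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    n = (z_alpha * sd_null + z_beta * sd_alt) ** 2 / d_min ** 2
    return math.ceil(n)  # round up to whole clicks


gross_n = sample_size_per_group(0.20625, 0.01)
net_n = sample_size_per_group(0.1093125, 0.0075)
print(gross_n, net_n)
```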

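To size the experiment, I use the larger of the two figures (27,411 clicks per group) so that both metrics have enough power. Doubling it for the two groups and dividing by the 0.08 click-through probability gives the number of pageviews needed: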

In [10]:
print('Total pageviews required:', int((27411 * 2) / 0.08))

Total pageviews required: 685275


### Duration vs. Exposure

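80% of Udacity's traffic to the course overview page will be exposed to the experiment, split evenly between the control and experiment groups.

- **Fraction**: 0.8 (low risk: the screener only adds a question before checkout and does not change payments or touch sensitive data)
- **Duration**: 22 days (0.8 × 40,000 = 32,000 pageviews per day, and 685,275 / 32,000 ≈ 21.4, rounded up)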
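The duration is the required pageviews divided by the pageviews diverted each day. The small sketch below shows how exposure trades off against duration. The helper name `days_needed` is hypothetical, and the 0.5 and 1.0 fractions are illustrative alternatives, not part of the original design.

```python
import math


def days_needed(fraction, pageviews_needed=685275, daily_pageviews=40000):
    """Days required when `fraction` of daily pageviews enters the experiment."""
    return math.ceil(pageviews_needed / (fraction * daily_pageviews))


for fraction in (0.5, 0.8, 1.0):
    print(f"fraction={fraction:.1f} -> {days_needed(fraction)} days")
```

Diverting all traffic would shorten the test to 18 days. Halving the exposure stretches it to 35 days.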

## Experiment Analysis

### Sanity Checks

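For each invariant metric, I give the 95% confidence interval for the value we expect to observe, the actual observed value, and whether the metric passes the sanity check. For the two count metrics, the value is the fraction of the total that landed in the control group, which should be close to 0.5.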

**Number of Cookies**
- Bounds = (0.4988,0.5012)
- Observed = 0.5006

**Number of clicks on “Start free trial”**
- Bounds = (0.4959,0.5041)
- Observed = 0.5005

**Click-through-probability on “Start free trial”**
- Bounds = (0.0812,0.0830)
- Observed = 0.0822

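All sanity checks passed! :)

Sanity checks make sure that the control and experiment groups are comparable, i.e. that cookies were split between them in the expected proportions. Otherwise, a difference in the evaluation metrics could come from a biased split rather than from the change itself.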

Now I'll load the experiment and control data from [Experiment](experiment.csv) and [Control](control.csv).

In [16]:
import pandas as pd

experiment = pd.read_csv('experiment.csv', sep = ',')
control = pd.read_csv('control.csv', sep = ',')

In [17]:
experiment.head()

| | Date | Pageviews | Clicks | Enrollments | Payments |
|---|---|---|---|---|---|
| 0 | Sat, Oct 11 | 7716 | 686 | 105.0 | 34.0 |
| 1 | Sun, Oct 12 | 9288 | 785 | 116.0 | 91.0 |
| 2 | Mon, Oct 13 | 10480 | 884 | 145.0 | 79.0 |
| 3 | Tue, Oct 14 | 9867 | 827 | 138.0 | 92.0 |
| 4 | Wed, Oct 15 | 9793 | 832 | 140.0 | 94.0 |


In [18]:
control.head()

| | Date | Pageviews | Clicks | Enrollments | Payments |
|---|---|---|---|---|---|
| 0 | Sat, Oct 11 | 7723 | 687 | 134.0 | 70.0 |
| 1 | Sun, Oct 12 | 9102 | 779 | 147.0 | 70.0 |
| 2 | Mon, Oct 13 | 10511 | 909 | 167.0 | 95.0 |
| 3 | Tue, Oct 14 | 9871 | 836 | 156.0 | 105.0 |
| 4 | Wed, Oct 15 | 10014 | 837 | 163.0 | 64.0 |


Next, we calculate the total views and total clicks for each group:

In [19]:
experiment_views = experiment['Pageviews'].sum()
experiment_clicks = experiment['Clicks'].sum()

control_views = control['Pageviews'].sum()
control_clicks = control['Clicks'].sum()

In [21]:
print('Total experiment views:', experiment_views)
print('Total experiment clicks:', experiment_clicks)
print('Total control views:', control_views)
print('Total control clicks:', control_clicks)

Total experiment views: 344660
Total experiment clicks: 28325
Total control views: 345543
Total control clicks: 28378


Now I'll run the sanity checks on our data.

I'll check views first, then clicks, then click-through probability. If cookies were split evenly, the fraction of views (and of clicks) that landed in the control group should be close to 0.5, inside a 95% confidence interval around 0.5 :)

Let's see!

#### Views

In [26]:
goal = 0.5  # expected fraction of views in the control group under a 50/50 split
SE = np.sqrt(goal * (1 - goal) / (control_views + experiment_views))
ME = 1.96 * SE

print(f'Views sanity check: ({goal - ME:.4f}, {goal + ME:.4f})')
print(f'Actual control proportion: {control_views / (control_views + experiment_views):.4f}')

Views sanity check: (0.4988, 0.5012)
Actual control proportion: 0.5006


#### Clicks

In [27]:
goal = 0.5  # expected fraction of clicks in the control group under a 50/50 split
SE = np.sqrt(goal * (1 - goal) / (control_clicks + experiment_clicks))
ME = 1.96 * SE

print(f'Clicks sanity check: ({goal - ME:.4f}, {goal + ME:.4f})')
print(f'Actual control proportion: {control_clicks / (control_clicks + experiment_clicks):.4f}')

Clicks sanity check: (0.4959, 0.5041)
Actual control proportion: 0.5005


#### Click-Through Probability

In [51]:
ctp_control = control_clicks / control_views
ctp_experiment = experiment_clicks / experiment_views

# 95% interval around the control CTP; the experiment CTP should fall inside it
SE = np.sqrt(ctp_control * (1 - ctp_control) / control_views)
ME = 1.96 * SE

print(f'CTP sanity check: ({ctp_control - ME:.4f}, {ctp_control + ME:.4f})')
print(f'Actual experiment CTP: {ctp_experiment:.4f}')

CTP sanity check: (0.0812, 0.0830)
Actual experiment CTP: 0.0822


As we saw, all of the sanity checks pass, so I'll move on to analyzing the experiment :)

## Result Analysis

### Effect Size Tests

For each evaluation metric, I give a 95% confidence interval around the difference between the experiment and control groups (experiment minus control), and indicate whether the metric is statistically and practically significant.

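**Gross Conversion**
- Bounds = (-0.0291, -0.0120)
- Observed difference = -0.0206
- Statistically significant (the interval excludes 0) and practically significant (the whole interval lies below -d_min = -0.01)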

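**Net Conversion**
- Bounds = (-0.0116, 0.0019)
- Observed difference = -0.0049
- Not statistically significant (the interval contains 0) and not practically significant (the interval also contains -d_min = -0.0075)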
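The sketch below computes both intervals and the significance calls in one place. It uses the pooled standard error from the cells below. It also follows the usual convention that a metric is practically significant only when the entire interval lies beyond the d_min boundary. The click totals 17,293 and 17,260 are assumed to be the totals over the 23 days that have enrollment and payment data, since they reproduce the gross conversion rates of 0.2189 and 0.1983 computed below. The helper names are hypothetical.

```python
import math


def diff_ci(x_cont, n_cont, x_exp, n_exp, z=1.96):
    """Difference p_exp - p_cont and its CI using a pooled standard error."""
    p_pool = (x_cont + x_exp) / (n_cont + n_exp)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_cont + 1 / n_exp))
    diff = x_exp / n_exp - x_cont / n_cont
    return diff, diff - z * se, diff + z * se


def significance(lower, upper, d_min):
    """Statistical: 0 lies outside the CI.
    Practical: the whole CI lies beyond -d_min or beyond +d_min."""
    statistical = not (lower <= 0 <= upper)
    practical = upper < -d_min or lower > d_min
    return statistical, practical


# Totals over the 23 days with enrollment and payment data
CONTROL_CLICKS, EXPERIMENT_CLICKS = 17293, 17260

gross = diff_ci(3785, CONTROL_CLICKS, 3423, EXPERIMENT_CLICKS)
net = diff_ci(2033, CONTROL_CLICKS, 1945, EXPERIMENT_CLICKS)

for name, (diff, lo, hi), d_min in (("gross", gross, 0.01), ("net", net, 0.0075)):
    stat, prac = significance(lo, hi, d_min)
    print(f"{name}: diff={diff:.4f} CI=({lo:.4f}, {hi:.4f}) "
          f"statistical={stat} practical={prac}")
```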

### Gross Conversion

In [82]:
# Enrollments and payments are only recorded for 23 of the days,
# so keep only those days before summing clicks and enrollments
control_23 = control.dropna()
experiment_23 = experiment.dropna()

control_clicks = control_23['Clicks'].sum()
control_enrollments = control_23['Enrollments'].sum()

experiment_clicks = experiment_23['Clicks'].sum()
experiment_enrollments = experiment_23['Enrollments'].sum()

print('Control Clicks:', int(control_clicks))
print('Control Enrollments:', int(control_enrollments))
print('Experiment Clicks:', int(experiment_clicks))
print('Experiment Enrollments:', int(experiment_enrollments))

Control Clicks: 17293
Control Enrollments: 3785
Experiment Clicks: 17260
Experiment Enrollments: 3423


In [57]:
print(f'Control gross conversion: {control_enrollments / control_clicks:.4f}')
print(f'Experiment gross conversion: {experiment_enrollments / experiment_clicks:.4f}')

Control gross conversion: 0.2189
Experiment gross conversion: 0.1983


Now let's compute the difference between the two groups and its confidence interval :)

In [59]:
# first of all, we'll get the difference between control and experiment files

diff = experiment_enrollments/experiment_clicks - control_enrollments/control_clicks
diff

-0.020554874580361565

The magnitude of *diff* (about 0.0206) is larger than *d_min* = 0.01, the practical significance boundary.

In [76]:
# pooled probability of enrolling, given click, across both groups
prop = (control_enrollments + experiment_enrollments) / (control_clicks + experiment_clicks)
SE = np.sqrt(prop * (1 - prop) * (1 / control_clicks + 1 / experiment_clicks))
ME = 1.96 * SE

print(f'Margin of error: {ME:.4f}')
print(f'95% CI: ({diff - ME:.4f}, {diff + ME:.4f})')

Margin of error: 0.0086
95% CI: (-0.0291, -0.0120)


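Zero lies outside the confidence interval, so the drop in gross conversion is statistically significant. The whole interval also lies below -d_min = -0.01, so it is practically significant too.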

### Net Conversion

In [87]:
# Payments are only recorded for 23 of the days,
# so keep only those days before summing clicks and payments
control_23 = control.dropna()
experiment_23 = experiment.dropna()

control_clicks = control_23['Clicks'].sum()
control_payments = control_23['Payments'].sum()

experiment_clicks = experiment_23['Clicks'].sum()
experiment_payments = experiment_23['Payments'].sum()

print('Control Clicks:', int(control_clicks))
print('Control Payments:', int(control_payments))
print('Experiment Clicks:', int(experiment_clicks))
print('Experiment Payments:', int(experiment_payments))

Control Clicks: 17293
Control Payments: 2033
Experiment Clicks: 17260
Experiment Payments: 1945


In [88]:
print(f'Control net conversion: {control_payments / control_clicks:.4f}')
print(f'Experiment net conversion: {experiment_payments / experiment_clicks:.4f}')

Control net conversion: 0.1176
Experiment net conversion: 0.1127


In [89]:
# difference in net conversion: experiment minus control
diff = experiment_payments / experiment_clicks - control_payments / control_clicks
print(f'{diff:.4f}')

-0.0049

In [90]:
# pooled probability of payment, given click, across both groups
prop = (control_payments + experiment_payments) / (control_clicks + experiment_clicks)
SE = np.sqrt(prop * (1 - prop) * (1 / control_clicks + 1 / experiment_clicks))
ME = 1.96 * SE

print(f'Margin of error: {ME:.4f}')
print(f'95% CI: ({diff - ME:.4f}, {diff + ME:.4f})')

Margin of error: 0.0067
95% CI: (-0.0116, 0.0019)

Zero lies inside the confidence interval, so the change in net conversion is not statistically significant. The interval also contains -d_min = -0.0075, so a practically significant drop cannot be ruled out.


### Sign Tests

For each evaluation metric, I run a sign test on the day-by-day data and report the p-value and whether the result is statistically significant. The sign test is a check on the effect size test, and the two should agree.

To compare day by day, I'll create a new DataFrame with the daily gross and net conversion of each group:
- Control gross and net conversion
- Experiment gross and net conversion

Let's do this! :)

In [94]:
# keep only the 23 days that have enrollment and payment data
control_days = control.dropna()
experiment_days = experiment.dropna()

sign = pd.DataFrame()
sign['control_gross'] = control_days['Enrollments'] / control_days['Clicks']
sign['control_net'] = control_days['Payments'] / control_days['Clicks']
sign['experiment_gross'] = experiment_days['Enrollments'] / experiment_days['Clicks']
sign['experiment_net'] = experiment_days['Payments'] / experiment_days['Clicks']

sign.head()

| | control_gross | control_net | experiment_gross | experiment_net |
|---|---|---|---|---|
| 0 | 0.195051 | 0.101892 | 0.153061 | 0.049563 |
| 1 | 0.188703 | 0.089859 | 0.147771 | 0.115924 |
| 2 | 0.183718 | 0.10451 | 0.164027 | 0.089367 |
| 3 | 0.186603 | 0.125598 | 0.166868 | 0.111245 |
| 4 | 0.194743 | 0.076464 | 0.168269 | 0.112981 |


### Gross Conversion

In [98]:
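print('Days where experiment > control:', (sign['experiment_gross'] > sign['control_gross']).sum())
print('Days where experiment < control:', (sign['experiment_gross'] < sign['control_gross']).sum())

Days where experiment > control: 4
Days where experiment < control: 19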


With 4 of 23 days favoring the experiment, the two-tailed sign test gives *p* = *0.0026*, which is statistically significant at alpha = 0.05.

### Net Conversion

In [99]:
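print('Days where experiment > control:', (sign['experiment_net'] > sign['control_net']).sum())
print('Days where experiment < control:', (sign['experiment_net'] < sign['control_net']).sum())

Days where experiment > control: 10
Days where experiment < control: 13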


With 10 of 23 days favoring the experiment, the two-tailed sign test gives *p* = *0.6776*, which is not statistically significant.
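Both p-values can be reproduced with an exact binomial calculation. Under the null hypothesis, each day is equally likely to favor either group, so the number of days on which the experiment wins follows a Binomial(23, 0.5) distribution. The two-tailed p-value is twice the smaller tail. Below is a minimal sketch using only the standard library; `sign_test_p_value` is a hypothetical helper, not a library function.

```python
from math import comb


def sign_test_p_value(successes, trials):
    """Exact two-tailed sign test (binomial with p = 0.5)."""
    k = min(successes, trials - successes)
    tail = sum(comb(trials, i) for i in range(k + 1)) / 2 ** trials
    return min(1.0, 2 * tail)


p_gross = sign_test_p_value(4, 23)   # 4 of 23 days: experiment > control
p_net = sign_test_p_value(10, 23)    # 10 of 23 days: experiment > control
print(f"gross: p = {p_gross:.4f}, net: p = {p_net:.4f}")
```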

## Conclusion

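In this experiment, I **didn't use the Bonferroni correction**. Bonferroni guards against false positives when a launch would follow from *any one* of several metrics being significant. Here the launch decision requires *all* of the evaluation metrics to meet their criteria, so the correction would only make the analysis more conservative:
- **Gross conversion:** must decrease by a statistically and practically significant amount.
- **Net conversion:** must not decrease by a statistically significant amount.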
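For comparison, the sketch below shows what the Bonferroni correction would have done to the gross conversion interval. With two evaluation metrics, it tests each one at alpha / 2 = 0.025, which raises the z-value from about 1.96 to about 2.24. The standard error is recovered by dividing the reported margin of error, 0.00856848, by 1.96.

```python
from statistics import NormalDist

alpha, n_metrics = 0.05, 2
alpha_bonf = alpha / n_metrics                       # per-metric level under Bonferroni
z_plain = NormalDist().inv_cdf(1 - alpha / 2)       # about 1.96
z_bonf = NormalDist().inv_cdf(1 - alpha_bonf / 2)   # about 2.24

# Gross conversion: observed difference and SE from the analysis above
diff = -0.020554874580361565
se = 0.00856848 / 1.96

ci_bonf = (diff - z_bonf * se, diff + z_bonf * se)
print(f"z: {z_plain:.2f} -> {z_bonf:.2f}; "
      f"gross CI with Bonferroni: ({ci_bonf[0]:.4f}, {ci_bonf[1]:.4f})")
```

Even with the stricter threshold, the interval stays below -0.01, so the correction would not have changed the gross conversion conclusion.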

## Recommendation

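- Gross conversion decreased by a statistically and practically significant amount, which is the intended effect of the screener.
- Net conversion did not change by a statistically significant amount, but its confidence interval extends below zero. A drop in paying students cannot be ruled out, so Udacity could lose money with this change.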

## Follow-Up Experiment

free trial = 14 days
frustrated students = students who canceled during the free period because they did not have enough time

In theory, a frustrated student is one who cancels during the free period (the 14 days after signing up) because they do not have enough time to dedicate to the course, or because of the payment (hypothesis).
To reduce the number of frustrated students, or rather to try to reduce it, I believe we could do the following:
- What if we offered students their first course to complete for free, and once they finished it, they had to come back and be a Code Reviewer for that course?

This idea leads to the following hypotheses:
- We will reduce the number of users who cancel very early in their courses.
- Users will be more engaged after receiving the incentive.

This addresses two points:
- Udacity Code Reviewers are paid for the reviews they do. With this scheme, Udacity would not need to pay them, so it would save money (which would offset the cost of the free course).
- With a free course, the user will keep going until the end, even if they take longer than suggested to finish.

However, this idea carries some risks:
- The user may cancel their plan halfway through.
- The user may finish the course but never come back to be a Code Reviewer.

For metrics, we can use the following invariant metrics:
- **Number of cookies:** That is, number of unique cookies to view the course overview page.
- **Number of clicks:** That is, number of unique cookies to click the "Start free trial" button (which happens before the free trial screener is triggered).
- **Click-through probability:** That is, number of unique cookies to click the "Start free trial" button divided by number of unique cookies to view the course overview page.
- **Gross conversion:** That is, number of user-ids to complete checkout and enroll in the free trial divided by number of unique cookies to click the "Start free trial" button.

And the following evaluation metrics:
- **Debt Conversion:** That is, number of user-ids to click “Start Debt Program” divided by number of user-ids that enroll in the free trial.
- **Debt-Net conversion:** That is, number of user-ids to click “Start Debt Program” divided by number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment).
- **Net conversion:** That is, number of user-ids to remain enrolled past the 14-day boundary (and thus make at least one payment) divided by the number of unique cookies to click the "Start free trial" button.

