# Udacity A/B Testing Report

## Experiment Design
In this experiment, Udacity changes the process so that if a student clicks "start free trial," they are asked how much time they have available to devote to the course. If the student indicates 5 or more hours per week, they go through the checkout process as usual. If they indicate fewer than 5 hours per week, a message appears indicating that Udacity courses usually require a greater time commitment for successful completion, and suggests that the student might like to access the course materials for free. At this point, the student has the option to continue enrolling in the free trial or access the course materials for free instead. This screenshot shows what the experiment looks like.

The hypothesis is that this might set clearer expectations for students upfront, thus reducing the number of frustrated students who leave the free trial because they don't have enough time—without significantly reducing the number of students who continue past the free trial and eventually complete the course. If this hypothesis holds true, Udacity can improve the overall student experience and increase coaches' capacity to support students who are likely to complete the course.

The unit of diversion is a cookie, although if the student enrolls in the free trial, they are tracked by user-id from that point forward. The same user-id cannot enroll in the free trial twice. For users who do not enroll, their user-id is not tracked in the experiment, even if they are signed in when they visit the course overview page.

## Metric Choice
**Invariant Metrics:**
- Number of Cookies
- Number of Clicks on “Start Free Trial”
- Click-through-probability on "Start free trial"

These are the metrics that we do not expect to differ between the control and treatment groups. The cookie is the unit of diversion, so the number of cookies should be split evenly between the two groups. Given that the groups are split evenly, the number of clicks on “Start Free Trial” should also be similar, because the click happens before the student sees the change. The click-through probability on "Start Free Trial" should be the same for the same reason.

**Evaluation Metrics:**
- Retention
- Gross Conversion
- Net Conversion

These are the metrics that we expect may change. Retention (payments divided by enrollments) is our main metric of concern, and our objective is to make it go up: if fewer students leave during the free trial, a higher proportion of enrolled students should continue past it. Gross Conversion (enrollments divided by clicks) could go down, since some students are steered away from enrolling. Net Conversion (payments divided by clicks) may go up or down, and it is the product of Gross Conversion and Retention.
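The product relationship can be checked numerically. The sketch below is a minimal check using the baseline estimates reported in the Measuring Standard Deviation table of this report; the variable names are illustrative only.

```python
# Baseline estimates as reported in this report's standard deviation table.
gross_conversion = 0.206250  # enrollments / clicks
retention = 0.530000         # payments / enrollments
net_conversion = 0.109313    # payments / clicks (reported, rounded to 6 decimals)

# Net Conversion = (enrollments / clicks) * (payments / enrollments)
implied_net = gross_conversion * retention

print(f"Implied net conversion:  {implied_net:.6f}")
print(f"Reported net conversion: {net_conversion:.6f}")
```

The implied and reported values should agree up to rounding in the last reported digit.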

## Measuring Standard Deviation

Based on a sample of 5,000 cookies:

| Metric Name      | Estimate | dmin    | SE       |
|------------------|----------|---------|----------|
| Gross conversion | 0.206250 | -0.0100 | 0.020231 |
| Retention        | 0.530000 | 0.0100  | 0.054949 |
| Net conversion   | 0.109313 | 0.0075  | 0.015602 |

The empirical variability of Retention is likely to differ from the analytic estimate because its unit of analysis (user-id) is different from the unit of diversion (cookie). Its denominator (users who enrolled) is also determined after randomization, since enrollment happens after treatment assignment.

This creates dependence between the numerator and the denominator, violating the independence assumption behind the analytic formula. In that case, the analytic variance likely underestimates the true variability.

Gross Conversion and Net Conversion likely have good analytic estimates because their unit of analysis (a cookie that clicks “Start Free Trial”) matches the unit of diversion.
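The analytic standard errors in the table above can be reproduced with the binomial formula sqrt(p(1 - p) / n). The sketch below is one way to do it, not necessarily the exact computation used for the table. It assumes a click-through probability of 0.08 and an enrollment rate of 0.0165 per pageview. Neither rate is stated in this report; both are inferred from the pageview scaling in the Sizing section.

```python
import math


def analytic_se(p: float, n: float) -> float:
    """Binomial standard error sqrt(p * (1 - p) / n) of a proportion p over n units."""
    return math.sqrt(p * (1 - p) / n)


# ASSUMPTION: these rates are not given in the report; they are the ratios
# implied by the per-group to pageview scaling in the Sizing section.
CLICKS_PER_PAGEVIEW = 0.08
ENROLLMENTS_PER_PAGEVIEW = 0.0165

sample_cookies = 5000
clicks = sample_cookies * CLICKS_PER_PAGEVIEW            # denominator for gross and net conversion
enrollments = sample_cookies * ENROLLMENTS_PER_PAGEVIEW  # denominator for retention

se_gross = analytic_se(0.206250, clicks)
se_retention = analytic_se(0.530000, enrollments)
se_net = analytic_se(0.109313, clicks)

print(f"Gross conversion SE: {se_gross:.6f}")
print(f"Retention SE:        {se_retention:.6f}")
print(f"Net conversion SE:   {se_net:.6f}")
```

Retention has the largest standard error because its denominator (enrollments) is much smaller than the number of clicks in the same sample.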

## Sizing
**Retention:**
We need 39,115 user-ids per group, which corresponds to 2,370,607 cookies viewing the course overview page per group. In total, we would need 4,741,214 pageviews to detect a minimum effect of 0.01 at 80% power and a 5% significance level.

**Gross Conversion:**
We need 25,835 cookies per group to click “Start Free Trial”, which corresponds to 322,938 cookies viewing the course overview page per group. In total, we would need 645,876 pageviews to detect a minimum effect of -0.01 at 80% power and a 5% significance level.

**Net Conversion:**
We need 27,413 cookies per group to click “Start Free Trial”, which corresponds to 342,663 cookies viewing the course overview page per group (using the same clicks-to-pageviews scaling as for Gross Conversion). In total, we would need 685,326 pageviews to detect a minimum effect of 0.0075 at 80% power and a 5% significance level.
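The scaling from per-group units to total pageviews can be sketched as follows, shown here for Retention and Gross Conversion. The helper name and the two rates are assumptions: the rates are not stated in this report, but they are the ratios implied by the figures above (about 0.08 clicks and 0.0165 enrollments per pageview).

```python
import math


def pageviews_needed(units_per_group: int, units_per_pageview: float) -> int:
    """Total pageviews across both groups, rounding each group up to a whole pageview."""
    per_group = math.ceil(units_per_group / units_per_pageview)
    return 2 * per_group


# ASSUMPTION: rates inferred from the figures in this section, not stated directly.
CLICKS_PER_PAGEVIEW = 0.08
ENROLLMENTS_PER_PAGEVIEW = 0.0165

retention_total = pageviews_needed(39_115, ENROLLMENTS_PER_PAGEVIEW)
gross_total = pageviews_needed(25_835, CLICKS_PER_PAGEVIEW)

print(f"Retention:        {retention_total:,} pageviews")
print(f"Gross Conversion: {gross_total:,} pageviews")
```

Rounding up per group before doubling matches how the totals in this section are stated. Doubling first and then rounding can give a total that is one pageview lower.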

## Duration and Exposure
With 100% of traffic diverted, we would need:
- Gross Conversion: 16.15 days
- Net Conversion: 17.13 days
- Retention: 118.53 days

Retention is infeasible as an evaluation metric, as it would take too long to see the effect. It would also be ill-advised to divert 100% of traffic because we wouldn’t be able to run any other experiments.

With 80% of traffic diverted, we could complete the experiment for measuring Net Conversion and Gross Conversion in about 21.4 days, or about 3 weeks. The experiment itself is not very risky, so diverting 80% would be ambitious but feasible.
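The duration arithmetic can be sketched as below, assuming roughly 40,000 pageviews per day. That figure is not stated in this report; it is inferred from the pageview totals in the Sizing section and the durations listed above. The function name is illustrative.

```python
def days_needed(total_pageviews: int, daily_pageviews: int, fraction_diverted: float) -> float:
    """Days required to collect total_pageviews when only a fraction of daily traffic is diverted."""
    if not 0 < fraction_diverted <= 1:
        raise ValueError("fraction_diverted must be in (0, 1]")
    return total_pageviews / (daily_pageviews * fraction_diverted)


# ASSUMPTION: daily traffic inferred from the reported totals and durations.
DAILY_PAGEVIEWS = 40_000

gross_days_full = days_needed(645_876, DAILY_PAGEVIEWS, 1.0)
retention_days_full = days_needed(4_741_214, DAILY_PAGEVIEWS, 1.0)
gross_days_80 = days_needed(645_876, DAILY_PAGEVIEWS, 0.8)

print(f"Gross Conversion at 100%: {gross_days_full:.2f} days")
print(f"Retention at 100%:        {retention_days_full:.2f} days")
print(f"Gross Conversion at 80%:  {gross_days_80:.2f} days")
```

Because Net Conversion needs more pageviews than Gross Conversion, it is the metric that sets the overall run time once Retention is dropped.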


## Analysis

In [2]:
import numpy as np
import pandas as pd

In [17]:
control = pd.read_csv('control.csv')
control.head()


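|   | Date        | Pageviews | Clicks | Enrollments | Payments |
|---|-------------|-----------|--------|-------------|----------|
| 0 | Sat, Oct 11 | 7723      | 687    | 134.0       | 70.0     |
| 1 | Sun, Oct 12 | 9102      | 779    | 147.0       | 70.0     |
| 2 | Mon, Oct 13 | 10511     | 909    | 167.0       | 95.0     |
| 3 | Tue, Oct 14 | 9871      | 836    | 156.0       | 105.0    |
| 4 | Wed, Oct 15 | 10014     | 837    | 163.0       | 64.0     |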


In [18]:
treatment = pd.read_csv('treatment.csv')
treatment.head()


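|   | Date        | Pageviews | Clicks | Enrollments | Payments |
|---|-------------|-----------|--------|-------------|----------|
| 0 | Sat, Oct 11 | 7716      | 686    | 105.0       | 34.0     |
| 1 | Sun, Oct 12 | 9288      | 785    | 116.0       | 91.0     |
| 2 | Mon, Oct 13 | 10480     | 884    | 145.0       | 79.0     |
| 3 | Tue, Oct 14 | 9867      | 827    | 138.0       | 92.0     |
| 4 | Wed, Oct 15 | 9793      | 832    | 140.0       | 94.0     |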


In [5]:
#check sample size and store it as sample_size
sample_size_control = control["Pageviews"].sum()
sample_size_treatment = treatment["Pageviews"].sum()
sample_size = sample_size_control+sample_size_treatment
print(sample_size)

690203


### Sanity Checks

In [8]:
# Calculate Click-Through-Probability (CTP) for both groups
control_ctp = control['Clicks'].sum() / control['Pageviews'].sum()
treatment_ctp = treatment['Clicks'].sum() / treatment['Pageviews'].sum()

# Calculate sample sizes
n_control = control['Pageviews'].sum()
n_treatment = treatment['Pageviews'].sum()

# Calculate the difference in CTPs
ctp_difference = treatment_ctp - control_ctp

# Standard error for the difference between two proportions
# SE = sqrt(p1(1-p1)/n1 + p2(1-p2)/n2)
se_diff = np.sqrt(
    (control_ctp * (1 - control_ctp)) / n_control + 
    (treatment_ctp * (1 - treatment_ctp)) / n_treatment
)

# Calculate 95% confidence interval for the difference
z_score = 1.96
ci_diff = (ctp_difference - z_score * se_diff, ctp_difference + z_score * se_diff)

print(f"Control CTP: {control_ctp:.6f}")
print(f"Treatment CTP: {treatment_ctp:.6f}")
print(f"CTP Difference (Treatment - Control): {ctp_difference:.6f}")
print(f"95% CI for difference: ({ci_diff[0]:.6f}, {ci_diff[1]:.6f})")



Control CTP: 0.082126
Treatment CTP: 0.082182
CTP Difference (Treatment - Control): 0.000057
95% CI for difference: (-0.001239, 0.001352)


In [11]:
from scipy import stats

# Pair the daily metrics and calculate differences
pageview_differences = control['Pageviews'] - treatment['Pageviews']
click_differences = control['Clicks'] - treatment['Clicks']

# Perform sign test for pageviews
# Note: n counts every paired day, including tied days; a strict sign test
# would drop ties before testing.
pageview_sign_test = stats.binomtest(
    (pageview_differences > 0).sum(),  # number of positive differences
    n=len(pageview_differences),       # total number of pairs (ties included)
    p=0.5                              # null hypothesis probability
)

# Perform sign test for clicks (same treatment of ties as above)
click_sign_test = stats.binomtest(
    (click_differences > 0).sum(),     # number of positive differences
    n=len(click_differences),          # total number of pairs (ties included)
    p=0.5                              # null hypothesis probability
)

print("Sign Test Results:")
print(f"Pageviews: p-value = {pageview_sign_test.pvalue:.4f}")
print(f"Number of days control > treatment (pageviews): {(pageview_differences > 0).sum()}")
print(f"Number of days control < treatment (pageviews): {(pageview_differences < 0).sum()}")
print(f"Number of days equal (pageviews): {(pageview_differences == 0).sum()}\n")

print(f"Clicks: p-value = {click_sign_test.pvalue:.4f}")
print(f"Number of days control > treatment (clicks): {(click_differences > 0).sum()}")
print(f"Number of days control < treatment (clicks): {(click_differences < 0).sum()}")
print(f"Number of days equal (clicks): {(click_differences == 0).sum()}")

Sign Test Results:
Pageviews: p-value = 0.3240
Number of days control > treatment (pageviews): 22
Number of days control < treatment (pageviews): 14
Number of days equal (pageviews): 1

Clicks: p-value = 1.0000
Number of days control > treatment (clicks): 18
Number of days control < treatment (clicks): 18
Number of days equal (clicks): 1


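The sanity checks pass. The 95% confidence interval for the difference in click-through probability, (-0.001239, 0.001352), contains zero, and the sign tests show no statistically significant difference between the two groups in daily pageviews (p = 0.3240) or daily clicks (p = 1.0000).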
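The sign tests above count the one tied day in the number of pairs. As a robustness check, the sketch below reruns the test with ties excluded, using the day counts printed above. It is an alternative to the cell above, not the computation that produced the reported p-values. It uses `math.comb` instead of `scipy.stats.binomtest` so that it has no dependencies; for a null probability of 0.5 the two approaches should agree.

```python
from math import comb


def sign_test_pvalue(n_positive: int, n_negative: int) -> float:
    """Exact two-sided sign test p-value under p = 0.5, with tied pairs excluded."""
    n = n_positive + n_negative  # tied days are left out of the count
    if n == 0:
        return 1.0
    k = max(n_positive, n_negative)
    # Probability of a split at least as lopsided as the observed one, in one direction.
    upper_tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double for the two-sided test; cap at 1 for perfectly even splits.
    return min(1.0, 2 * upper_tail)


# Day counts taken from the sign test output above.
pageview_p = sign_test_pvalue(22, 14)
click_p = sign_test_pvalue(18, 18)

print(f"Pageviews (ties dropped): p-value = {pageview_p:.4f}")
print(f"Clicks (ties dropped):    p-value = {click_p:.4f}")
```

With ties dropped, both p-values remain well above 0.05, so the conclusion of the sanity check does not depend on how the tied day is handled.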

### A/B Test

In [20]:
# Sample sizes for denominator (Clicks with non-null Enrollments/Payments)
control_clicks = control.loc[control['Enrollments'].notnull(), 'Clicks'].sum()
treatment_clicks = treatment.loc[treatment['Enrollments'].notnull(), 'Clicks'].sum()

# Calculate Gross Conversion and Net Conversion for both groups
control_gross = control['Enrollments'].sum() / control_clicks
treatment_gross = treatment['Enrollments'].sum() / treatment_clicks
control_net = control['Payments'].sum() / control_clicks
treatment_net = treatment['Payments'].sum() / treatment_clicks



# Standard errors
se_gross = np.sqrt(
    (control_gross * (1 - control_gross)) / control_clicks +
    (treatment_gross * (1 - treatment_gross)) / treatment_clicks
)
se_net = np.sqrt(
    (control_net * (1 - control_net)) / control_clicks +
    (treatment_net * (1 - treatment_net)) / treatment_clicks
)

# Differences
gross_diff = treatment_gross - control_gross
net_diff = treatment_net - control_net

# 95% confidence intervals
ci_gross = (gross_diff - z_score * se_gross, gross_diff + z_score * se_gross)
ci_net = (net_diff - z_score * se_net, net_diff + z_score * se_net)

print(f"Control Gross Conversion: {control_gross:.6f}")
print(f"Treatment Gross Conversion: {treatment_gross:.6f}")
print(f"Gross Conversion Diff: {gross_diff:.6f}")
print(f"95% CI for Gross Conversion Diff: ({ci_gross[0]:.6f}, {ci_gross[1]:.6f})\n")

print(f"Control Net Conversion: {control_net:.6f}")
print(f"Treatment Net Conversion: {treatment_net:.6f}")
print(f"Net Conversion Diff: {net_diff:.6f}")
print(f"95% CI for Net Conversion Diff: ({ci_net[0]:.6f}, {ci_net[1]:.6f})")

Control Gross Conversion: 0.218875
Treatment Gross Conversion: 0.198320
Gross Conversion Diff: -0.020555
95% CI for Gross Conversion Diff: (-0.029120, -0.011989)

Control Net Conversion: 0.117562
Treatment Net Conversion: 0.112688
Net Conversion Diff: -0.004874
95% CI for Net Conversion Diff: (-0.011604, 0.001857)


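The difference in Gross Conversion is both statistically and practically significant: its 95% confidence interval, (-0.029120, -0.011989), excludes zero and lies entirely beyond the practical significance boundary of -0.01. The difference in Net Conversion is neither: its interval, (-0.011604, 0.001857), contains zero.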
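The classification can be made explicit with a small helper. This is a sketch under one assumed convention: a result is statistically significant when the confidence interval excludes zero, and practically significant when the whole interval lies beyond the practical significance boundary (dmin). The function name is hypothetical.

```python
def assess(ci_low: float, ci_high: float, d_min: float) -> tuple[bool, bool]:
    """Return (statistically_significant, practically_significant) for a CI on a difference."""
    boundary = abs(d_min)
    statistically = ci_low > 0 or ci_high < 0
    # ASSUMPTION: practical significance requires the entire interval beyond the boundary.
    practically = ci_low > boundary or ci_high < -boundary
    return statistically, practically


# Confidence intervals from the output above; dmin values from the standard deviation table.
gross_result = assess(-0.029120, -0.011989, 0.01)
net_result = assess(-0.011604, 0.001857, 0.0075)

print(f"Gross Conversion: statistical={gross_result[0]}, practical={gross_result[1]}")
print(f"Net Conversion:   statistical={net_result[0]}, practical={net_result[1]}")
```

Under this convention, Gross Conversion comes out significant on both counts and Net Conversion on neither.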

## Conclusion

Our hypothesis that Gross Conversion would decrease as students less suited for enrollment were filtered out was supported by the A/B test. However, this experiment alone does not give a sufficient indication of whether the enrolled student experience improved, and it does not show a change in the number of paying users. The Net Conversion confidence interval contains zero, and its lower bound (-0.011604) falls below the negative practical significance boundary of -0.0075, so a meaningful drop in paying users cannot be ruled out. Given this result, we recommend that Udacity not launch the change, as these results alone show no clear benefit. Further experiments could explore whether enrolled students had a better experience and how student coaching factors into satisfaction.

## Follow-up Experiment

We could test whether users who start a free trial are more likely to become paying users if they receive a prompt for student coaching during the free trial. A significant effect would have a meaningful impact on business revenue.