In [3]:
import numpy as np
import pandas as pd
import math
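try:  # binom_test was removed in SciPy 1.12; wrap its replacement, binomtest
    from scipy.stats import binomtest
    def binom_test(k, n, p=0.5):
        return binomtest(k, n, p).pvalue
except ImportError:  # SciPy < 1.7 only provides binom_test
    from scipy.stats import binom_test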

from IPython.display import Image

![title](./business_process.jpeg)

# Calculating Standard Deviation


The baseline values for the metrics are estimated as follows:

| Metric | Baseline value |
| :- | :- |
| Unique cookies to view course overview page per day | 40000 |
| Unique cookies to click "Start free trial" per day | 3200 |
| Enrollments per day | 660 |
| Click-through-probability on "Start free trial"| 0.08 |
| Probability of enrolling, given click| 0.20625 |
| Probability of payment, given enroll| 0.53 |
| Probability of payment, given click | 0.1093125  |


Using these baseline values, we make an analytic estimate of the standard deviation of each evaluation metric for a sample of 5,000 cookies visiting the course overview page.

Retention and Net conversion are the two selected evaluation metrics. Their standard deviations are estimated as follows:

In [2]:
n_total_observe = 40000
n_click = 3200
n_enroll = 660
p_click_through = n_click / n_total_observe
p_enroll = n_enroll / n_click
p_retention = 0.53
p_net_conversion = p_retention * p_enroll

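n_exp = 5000  # cookies viewing the course overview page in the sample

# units of analysis in the sample: enrollments for retention, clicks for net conversion
n_retention = n_enroll / n_total_observe * n_exp
n_net_conversion = n_click / n_total_observe * n_exp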

In [5]:
std_retention = np.sqrt(p_retention * (1-p_retention) / n_retention)
std_net_conversion = np.sqrt(p_net_conversion * (1 - p_net_conversion) / n_net_conversion)
print (f'Retention sd: {std_retention}')
print (f'Net conversion sd: {std_net_conversion}')

Retention sd: 0.05494901217850908
Net conversion sd: 0.015601544582488459
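The code above applies the binomial standard error of a proportion, where $n$ is the number of units of analysis in the 5,000-cookie sample (enrollments for retention, clicks for net conversion). Worked through with the baseline values:

$$
\hat{\sigma} = \sqrt{\frac{p\,(1-p)}{n}}, \qquad
n_{\text{retention}} = \frac{660}{40000} \times 5000 = 82.5, \qquad
n_{\text{net conversion}} = \frac{3200}{40000} \times 5000 = 400
$$

$$
\hat{\sigma}_{\text{retention}} = \sqrt{\frac{0.53 \times 0.47}{82.5}} \approx 0.0549, \qquad
\hat{\sigma}_{\text{net conversion}} = \sqrt{\frac{0.1093125 \times 0.8906875}{400}} \approx 0.0156
$$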


# Sizing

## Multiple metrics correction

Because Retention and Net conversion are highly correlated, a Bonferroni correction would be too conservative, so no multiple-comparison correction is applied.
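For reference, a Bonferroni correction over the two evaluation metrics would test each one at alpha / 2. A short stdlib sketch (assuming alpha = 5%) of how much that would raise the two-sided critical value:

```python
from statistics import NormalDist

alpha = 0.05
n_metrics = 2  # Retention and Net conversion

# Bonferroni splits the overall alpha evenly across the metrics
alpha_bonferroni = alpha / n_metrics

# Two-sided critical z values without and with the correction
z_unadjusted = NormalDist().inv_cdf(1 - alpha / 2)
z_bonferroni = NormalDist().inv_cdf(1 - alpha_bonferroni / 2)

print(f"Per-metric alpha: {alpha_bonferroni}")
print(f"z without correction: {z_unadjusted:.4f}, with Bonferroni: {z_bonferroni:.4f}")
```

The critical value rises from about 1.96 to about 2.24, which is the extra conservatism the correlated metrics would not need.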


## Size estimation


Evan Miller's sample size calculator is used to estimate the sample size per group: https://www.evanmiller.org/ab-testing/sample-size.html

A significance level of alpha = 5% and a statistical power of 1 - beta = 80% are used throughout.
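As a rough cross-check of the calculator, the sketch below uses a standard normal-approximation formula for comparing two proportions with an absolute minimum detectable effect. This is an assumption: the calculator's exact method may differ, so expect values close to, but not identical with, its figures. The helper name `sample_size_per_group` is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_baseline, d_min, alpha=0.05, power=0.80):
    """Approximate per-group size to detect an absolute change d_min (two-sided test)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance
    z_beta = NormalDist().inv_cdf(power)           # power
    p_alt = p_baseline + d_min
    # Spread of the difference under the null and under the alternative
    sd_null = math.sqrt(2 * p_baseline * (1 - p_baseline))
    sd_alt = math.sqrt(p_baseline * (1 - p_baseline) + p_alt * (1 - p_alt))
    return math.ceil((z_alpha * sd_null + z_beta * sd_alt) ** 2 / d_min ** 2)

size_retention = sample_size_per_group(0.53, 0.01)
size_net_conversion = sample_size_per_group(0.1093125, 0.0075)
print(size_retention, size_net_conversion)
```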

### Retention

Given:

* Baseline conversion rate: 53%
* Minimum detectable effect: 1%

Estimated: 

* Sample size per variation: 39,115

Therefore,

* Experiment and control group size: 78,230 (= 39115 * 2)
* Number of pageviews: 4,741,212 (= 78230 / 660 * 40000)

### Net Conversion

Given:

* Baseline conversion rate: 10.93%
* Minimum detectable effect: 0.75%

Estimated: 

* Sample size per variation: 27,411

Therefore,

* Experiment and control group size: 54,822 (= 27411 * 2)
* Number of pageviews: 685,275 (= 54822 / 3200 * 40000)
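Both pageview figures follow the same arithmetic: double the per-group size (experiment plus control), then scale from units of analysis (enrollments or clicks) back to pageviews using the daily baseline counts. A small sketch with a hypothetical helper, `pageviews_needed`:

```python
def pageviews_needed(size_per_group, units_per_day, pageviews_per_day=40000):
    """Pageviews needed so that each group collects size_per_group units."""
    total_units = 2 * size_per_group  # experiment + control
    # Multiply before dividing to keep the arithmetic exact where possible
    return total_units * pageviews_per_day / units_per_day

pv_retention = pageviews_needed(39115, units_per_day=660)        # unit: enrollment
pv_net_conversion = pageviews_needed(27411, units_per_day=3200)  # unit: click
print(round(pv_retention), round(pv_net_conversion))  # 4741212 685275
```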

## Conclusion

Because retention requires more pageviews than net conversion, the experiment needs the larger of the two: 4,741,212 pageviews.

# Duration and exposure

## Retention required duration

In [14]:
pageview_retention = 4741212
fraction = 1  # fraction of traffic diverted to the experiment
duration_retention = pageview_retention / (n_total_observe * fraction)

print (f'With {fraction:.0%} of traffic diverted, it takes {math.ceil(duration_retention)} days to collect enough pageviews to reach a conclusion for retention.')


With 100% of traffic diverted, it takes 119 days to collect enough pageviews to reach a conclusion for retention.


## Net conversion required duration

In [15]:
pageview_net_conversion = 685275
fraction = 1  # fraction of traffic diverted to the experiment
duration_net_conversion = pageview_net_conversion / (n_total_observe * fraction)

print (f'With {fraction:.0%} of traffic diverted, it takes {math.ceil(duration_net_conversion)} days to collect enough pageviews to reach a conclusion for net conversion.')


With 100% of traffic diverted, it takes 18 days to collect enough pageviews to reach a conclusion for net conversion.


## Conclusion

At 119 days, the experiment would take too long to reach a conclusion for retention, so an earlier decision is needed. Evaluating net conversion, which takes 18 days at full traffic, should be sufficient.
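Exposure can also be limited by diverting only a fraction of traffic, which stretches the duration proportionally. A minimal sketch with a hypothetical helper, `days_needed`; the fractions are illustrative choices, not ones made in this analysis:

```python
import math

def days_needed(pageviews_required, fraction, pageviews_per_day=40000):
    """Days to collect the required pageviews when `fraction` of traffic is diverted."""
    return math.ceil(pageviews_required / (pageviews_per_day * fraction))

for fraction in (0.5, 0.75, 1.0):
    print(f"{fraction:.0%} of traffic: {days_needed(685275, fraction)} days")
```

For net conversion this prints 35, 23, and 18 days for 50%, 75%, and 100% of traffic.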

# Sanity checks

For each invariant metric, check whether the observed experiment value falls within its confidence interval; if it does, the metric passes the sanity check. The invariant metrics are:

* Number of cookies: number of unique cookies to view the course overview page
* Number of clicks: number of unique cookies to click "Start free trial"
* Click-through-probability: number of clicks divided by the number of unique cookies to view the course overview page

The data collected during the experiment are:

In [16]:
pageviews_cont=[ 7723,  9102, 10511,  9871, 10014,  9670,  9008,  7434,  8459,
       10667, 10660,  9947,  8324,  9434,  8687,  8896,  9535,  9363,
        9327,  9345,  8890,  8460,  8836,  9437,  9420,  9570,  9921,
        9424,  9010,  9656, 10419,  9880, 10134,  9717,  9192,  8630,
        8970]
pageviews_exp=[ 7716,  9288, 10480,  9867,  9793,  9500,  9088,  7664,  8434,
       10496, 10551,  9737,  8176,  9402,  8669,  8881,  9655,  9396,
        9262,  9308,  8715,  8448,  8836,  9359,  9427,  9633,  9842,
        9272,  8969,  9697, 10445,  9931, 10042,  9721,  9304,  8668,
        8988]
clicks_cont=[687, 779, 909, 836, 837, 823, 748, 632, 691, 861, 867, 838, 665,
       673, 691, 708, 759, 736, 739, 734, 706, 681, 693, 788, 781, 805,
       830, 781, 756, 825, 874, 830, 801, 814, 735, 743, 722]
clicks_exp=[686, 785, 884, 827, 832, 788, 780, 652, 697, 860, 864, 801, 642,
       697, 669, 693, 771, 736, 727, 728, 722, 695, 724, 789, 743, 808,
       831, 767, 760, 850, 851, 831, 802, 829, 770, 724, 710]
# Enrollments and payments are only recorded for the first 23 days
enrolls_cont=[134, 147, 167, 156, 163, 138, 146, 110, 131, 165, 196, 162, 127,
       220, 176, 161, 233, 154, 196, 167, 174, 156, 206]
enrolls_exp=[105, 116, 145, 138, 140, 129, 127,  94, 120, 153, 143, 128, 122,
       194, 127, 153, 213, 162, 201, 207, 182, 142, 182]
payment_cont=[ 70,  70,  95, 105,  64,  82,  76,  70,  60,  97, 105,  92,  56,
       122, 128, 104, 124,  91,  86,  75, 101,  93,  67]
payment_exp=[ 34,  91,  79,  92,  94,  61,  44,  62,  77,  98,  71,  70,  68,
        94,  81, 101, 119, 120,  96,  67, 123, 100, 103]

ctr_exp = [i/j for (i, j) in zip(clicks_exp, pageviews_exp)]
ctr_cont = [i/j for (i, j) in zip(clicks_cont, pageviews_cont)]


## Number of cookies


In [18]:
n_pageview_cont = sum(pageviews_cont)
n_pageview_exp = sum(pageviews_exp)
pageview_sd = np.sqrt (0.5 * 0.5 / (n_pageview_cont + n_pageview_exp))
pageview_margin = 1.96 * pageview_sd
ci_left, ci_right = 0.5 - pageview_margin, 0.5 + pageview_margin

print (f'Pageview confidence interval: [{round(ci_left, 4)}, {round(ci_right, 4)}]')
print (f'Observed pageview % in experiment group: {round(n_pageview_exp / (n_pageview_exp + n_pageview_cont), 4)}')

Pageview confidence interval: [0.4988, 0.5012]
Observed pageview % in experiment group: 0.4994
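The pageview check above and the click check below share the same logic: under a 50/50 assignment, the experiment group's share of units should fall within 0.5 ± 1.96 · sqrt(0.25 / N). A minimal sketch wrapping that logic in a hypothetical helper, `split_sanity_check`; the counts in the usage lines are made up for illustration:

```python
import math

def split_sanity_check(n_cont, n_exp, z=1.96):
    """Check that the experiment group's share of units is consistent with a 50/50 split."""
    total = n_cont + n_exp
    margin = z * math.sqrt(0.5 * 0.5 / total)  # binomial SE of a 0.5 share
    lo, hi = 0.5 - margin, 0.5 + margin
    observed = n_exp / total
    return (lo, hi), observed, lo <= observed <= hi

print(split_sanity_check(500, 500))  # hypothetical counts: passes
print(split_sanity_check(600, 400))  # hypothetical counts: fails
```

Passing the notebook's `n_pageview_cont` and `n_pageview_exp` (or the click totals) reproduces the intervals computed in these cells.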


## Number of clicks

In [20]:
n_click_cont = sum(clicks_cont)
n_click_exp = sum(clicks_exp)
click_sd = np.sqrt (0.5 * 0.5 / (n_click_cont + n_click_exp))
click_margin = 1.96 * click_sd
ci_left, ci_right = 0.5 - click_margin, 0.5 + click_margin

print (f'Number of clicks on "Start free trial" confidence interval: [{round(ci_left, 4)}, {round(ci_right, 4)}]')
print (f'Observed clicks on "Start free trial" % in experiment group: {round(n_click_exp / (n_click_exp + n_click_cont), 4)}')

Number of clicks on "Start free trial" confidence interval: [0.4959, 0.5041]
Observed clicks on "Start free trial" % in experiment group: 0.4995


## Click-through-probability

In [23]:
p_cont = n_click_cont / n_pageview_cont
p_exp = n_click_exp / n_pageview_exp
p_pool = (n_click_cont + n_click_exp) / (n_pageview_cont + n_pageview_exp)
se_pool = np.sqrt(p_pool * (1-p_pool) * (1/n_pageview_cont + 1/n_pageview_exp))
diff = p_exp - p_cont
ci_left, ci_right = -1.96 * se_pool, 1.96 * se_pool

print (f'Click-through-probability difference confidence interval: [{round(ci_left, 4)}, {round(ci_right, 4)}]')
print (f'Observed click-through-probability difference (exp - cont): {round(diff, 4)}')


Click-through-probability difference confidence interval: [-0.0013, 0.0013]
Observed click-through-probability difference (exp - cont): 0.0001


# Effect size tests

For each evaluation metric, compute a confidence interval around the difference between the experiment and control groups.

Evaluation metrics:

* Retention
* Net conversion

A metric is statistically significant if the confidence interval does not include 0 (that is, you can be confident there was a change), and it is practically significant if the confidence interval does not include the practical significance boundary (that is, you can be confident there is a change that matters to the business.)
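The retention and net conversion cells below repeat the same pooled two-proportion calculation. A minimal sketch that factors it into one hypothetical helper, `diff_confidence_interval`, assuming a 95% interval (z = 1.96) and treating ±d_min as the practical significance boundary; the counts in the usage line are made up:

```python
import math

def diff_confidence_interval(x_cont, n_cont, x_exp, n_exp, d_min, z=1.96):
    """Pooled CI for p_exp - p_cont, with statistical and practical significance flags."""
    p_cont, p_exp = x_cont / n_cont, x_exp / n_exp
    p_pool = (x_cont + x_exp) / (n_cont + n_exp)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_cont + 1 / n_exp))
    diff = p_exp - p_cont
    lo, hi = diff - z * se_pool, diff + z * se_pool
    statistically_significant = lo > 0 or hi < 0  # CI excludes 0
    # Practically significant only if the whole CI lies beyond +/- d_min
    practically_significant = lo > d_min or hi < -d_min
    return diff, (lo, hi), statistically_significant, practically_significant

print(diff_confidence_interval(500, 1000, 600, 1000, d_min=0.02))  # hypothetical counts
```

Calling it with the summed payments and enrollments (or clicks) gives the same intervals as the cells below.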

In [37]:
retention_exp = [i/j for (i,j) in zip(payment_exp , enrolls_exp)]
retention_cont = [i/j for (i,j) in zip(payment_cont , enrolls_cont)]
rtt_p_cont = sum(payment_cont) / sum(enrolls_cont)
rtt_p_exp = sum(payment_exp) / sum(enrolls_exp)
rtt_diff = rtt_p_exp - rtt_p_cont
rtt_p_pool = (sum(payment_cont) + sum(payment_exp))/ (sum(enrolls_cont) + sum(enrolls_exp))
rtt_se_pool = np.sqrt(rtt_p_pool * (1-rtt_p_pool) * (1/sum(enrolls_cont) + 1/sum(enrolls_exp)))
ci_left, ci_right = rtt_diff-1.96*rtt_se_pool, rtt_diff+1.96* rtt_se_pool

print (f'Retention confidence interval: [{round(ci_left, 4)}, {round(ci_right, 4)}]')
print (f'Observed Retention diff: {round(rtt_diff, 4)}')
print (f'Statistical significance: {0< ci_left or 0 > ci_right}')
print (f'Practical significance (d_min = 0.01): {ci_left > 0.01 or ci_right < -0.01}')  # whole CI beyond +/- d_min

Retention confidence interval: [0.0081, 0.0541]
Observed Retention diff: 0.0311
Statistical significance: True
Practical significance (d_min = 0.01): False


In [41]:
cnt = len(payment_cont)
nc_p_cont = sum(payment_cont) / sum(clicks_cont[:cnt])
nc_p_exp = sum(payment_exp) / sum(clicks_exp[:cnt])
nc_diff = nc_p_exp - nc_p_cont
nc_p_pool = (sum(payment_cont) + sum(payment_exp))/ (sum(clicks_cont[:cnt]) + sum(clicks_exp[:cnt]))
nc_se_pool = np.sqrt(nc_p_pool * (1-nc_p_pool) * (1/sum(clicks_cont[:cnt]) + 1/sum(clicks_exp[:cnt])))
ci_left, ci_right = nc_diff-1.96*nc_se_pool, nc_diff+1.96* nc_se_pool

print (f'Net conversion confidence interval: [{round(ci_left, 4)}, {round(ci_right, 4)}]')
print (f'Observed Net conversion diff: {round(nc_diff, 4)}')
print (f'Statistical significance: {0< ci_left or 0 > ci_right}')
print (f'Practical significance (d_min = 0.0075): {ci_left > 0.0075 or ci_right < -0.0075}')  # whole CI beyond +/- d_min

Net conversion confidence interval: [-0.0116, 0.0019]
Observed Net conversion diff: -0.0049
Statistical significance: False
Practical significance (d_min = 0.0075): False


# Sign test

In [43]:
alpha=0.05
beta = 0.2

In [46]:
print(len(enrolls_cont), len(enrolls_exp))  # days with enrollment data in each group

23 23


In [48]:
rtt_exp = [i/j for (i,j) in zip(payment_exp , enrolls_exp)]
rtt_cont = [i/j for (i,j) in zip(payment_cont , enrolls_cont)]
rtt_sign = sum([i>j for (i,j) in zip(rtt_exp, rtt_cont)])
days = cnt
p_value = binom_test(rtt_sign, n=days, p=0.5)

print(f'Retention p-value: {round(p_value, 4)}, statistically significant: {p_value < alpha}')


Retention p-value: 0.6776, statistically significant: False


In [52]:
nc_exp = [i/j for (i,j) in zip(payment_exp , clicks_exp)]
nc_cont = [i/j for (i,j) in zip(payment_cont , clicks_cont)]
nc_sign = sum([i>j for (i,j) in zip(nc_exp, nc_cont)])
days = cnt
p_value = binom_test(nc_sign, n=days, p=0.5)

print(f'Net Conversion p-value: {round(p_value, 4)}, statistically significant: {p_value < alpha}')


Net Conversion p-value: 0.6776, statistically significant: False
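The sign test counts the days on which the experiment group's daily value exceeds the control group's and asks how likely that count is if each day were a fair coin flip. A minimal stdlib sketch of the exact two-sided p-value for p = 0.5 (the name `sign_test_pvalue` is hypothetical, not a SciPy function) doubles the smaller binomial tail:

```python
from math import comb

def sign_test_pvalue(successes, trials):
    """Exact two-sided sign test p-value under H0: P(exp > cont) = 0.5."""
    lower = sum(comb(trials, i) for i in range(0, successes + 1))       # P(X <= k) * 2**n
    upper = sum(comb(trials, i) for i in range(successes, trials + 1))  # P(X >= k) * 2**n
    # Binomial(n, 0.5) is symmetric, so doubling the smaller tail gives the two-sided value
    return min(1.0, 2 * min(lower, upper) / 2 ** trials)

print(round(sign_test_pvalue(13, 23), 4))  # 0.6776
```

A count of 13 (or, symmetrically, 10) out of 23 days gives 0.6776, the same p-value reported above.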
