# Delta Method   


In this notebook, I simulate confidence intervals for a percent change between two samples.    



The *Delta method* is used in large-scale A/B testing to build confidence intervals for the percent change of a metric.  
In practice, we want to measure the average treatment effect as the relative difference of the same metric between the control and treatment groups.  
Let $X$ and $Y$ be the control and treatment groups, each of size $n$, with sample means $\overline{X}$ and $\overline{Y}$, respectively.  
$s_x$ and $s_y$ are their sample standard deviations and $s_{xy}$ is their sample covariance.  

A **confidence interval** is given by    
Point estimate $\pm$ Margin of error      

**Point estimate**     
$\frac{\overline{Y}}{\overline{X}} - 1$    

**Margin of error**     
$\frac{\displaystyle z_{\alpha/2}}{\displaystyle \sqrt{n}\,\overline{X}}
\sqrt{s^2_y 
- 2 \frac{\overline{Y}}{\overline{X}} s_{xy} + 
\frac{\overline{Y}^2}{\overline{X}^2} s^2_x}
$

where $z_{\alpha/2}$ is the upper $\alpha/2$ quantile of $N(0, 1)$, that is, its $1 - \alpha/2$ quantile (about 1.96 for $\alpha = 0.05$).
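
To sketch where this margin of error comes from: write $\mu_x$, $\mu_y$, $\sigma_x^2$, $\sigma_y^2$ and $\sigma_{xy}$ for the population counterparts of the sample quantities (notation introduced here for illustration only). A first-order Taylor expansion of the ratio around $(\mu_x, \mu_y)$ gives

$$
\frac{\overline{Y}}{\overline{X}} \approx \frac{\mu_y}{\mu_x}
+ \frac{1}{\mu_x}\left(\overline{Y} - \mu_y\right)
- \frac{\mu_y}{\mu_x^2}\left(\overline{X} - \mu_x\right)
$$

so that, for $n$ i.i.d. observations,

$$
\mathrm{Var}\left(\frac{\overline{Y}}{\overline{X}}\right) \approx
\frac{1}{n\,\mu_x^2}\left(\sigma_y^2 - 2\frac{\mu_y}{\mu_x}\sigma_{xy} + \frac{\mu_y^2}{\mu_x^2}\sigma_x^2\right)
$$

Replacing the population quantities with their sample estimates and multiplying the square root by $z_{\alpha/2}$ yields the margin of error. This is the quantity computed as `z_critical * np.sqrt(vest / size)` in the code below.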

**Source:**
Alex Deng, Ulf Knoblich and Jiannan Lu, 
Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas,
Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining,
2018
https://arxiv.org/pdf/1803.06336.pdf
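
To make the formulas concrete before the simulation, here is a minimal sketch that evaluates the point estimate and margin of error on a tiny made-up sample. It uses only the standard library (`statistics.NormalDist` for the normal quantile). The data values and the helper name `delta_ci_percent_change` are illustrative assumptions, and the sketch omits the bias correction used in the notebook code below.

```python
from math import sqrt
from statistics import NormalDist, mean


def delta_ci_percent_change(x, y, alpha=0.05):
    """Delta-method CI for mean(y) / mean(x) - 1, without bias correction.

    Hypothetical helper for illustration; x and y must have equal length.
    """
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Sample variances and covariance, all with the n - 1 denominator
    s2_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    s2_y = sum((yi - y_bar) ** 2 for yi in y) / (n - 1)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)

    ratio = y_bar / x_bar
    point_estimate = ratio - 1
    # First-order variance estimate of sqrt(n) * ratio
    v_est = (s2_y - 2 * ratio * s_xy + ratio**2 * s2_x) / x_bar**2
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    moe = z * sqrt(v_est / n)
    return point_estimate - moe, point_estimate + moe


# Made-up data, for illustration only
x = [1.0, 0.9, 1.1, 1.0, 1.05, 0.95]
y = [1.1, 1.0, 1.2, 1.15, 1.1, 1.05]
low, high = delta_ci_percent_change(x, y)
print(f"95% CI for the percent change: [{low:.3f}, {high:.3f}]")
```

The interval is symmetric around the plain ratio estimate `mean(y) / mean(x) - 1`. The notebook function below shifts that centre by a small bias-correction term.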


In [2]:
import numpy as np
from scipy import stats
import pandas as pd

In [24]:
bootstrap_iterations = 500000  # number of simulated sample pairs per sample size

def confidence_interval_delta(X, Y, alpha=0.05):
    """Construct a confidence interval for the percent change Y/X - 1
    using the Delta method with bias correction.
    Algorithm 1 of Deng et al. (see reference above)."""
    size = len(X)
    xmean = X.mean()
    ymean = Y.mean()
    # Note: np.std defaults to ddof=0, whereas sxy below divides by size - 1
    sx = X.std()
    sy = Y.std()
    sxy = np.sum((X - xmean) * (Y - ymean)) / (size - 1)
    # Second-order correction term for the ratio of means
    bias_correction = ymean / xmean**3 * sx**2 / size - 1 / xmean**2 * sxy / size
    point_estimate = ymean / xmean - 1 + bias_correction
    # Delta-method variance estimate of sqrt(size) * (ymean / xmean)
    vest = sy**2 / xmean**2 - 2 * ymean / xmean**3 * sxy + ymean**2 / xmean**4 * sx**2
    # Two-sided critical value of N(0, 1), about 1.96 for alpha = 0.05
    z_critical = stats.norm.ppf(1 - alpha / 2)
    moe = z_critical * np.sqrt(vest / size)
    return point_estimate - moe, point_estimate + moe

def interval_contains_true_p(ci, p):
    return ci[0] <= p <= ci[1]

def get_coverage(loc1, loc2, scale, size):
    """Estimate the coverage of the Delta method with bias correction.
    Coverage is the proportion of constructed confidence intervals that contain the true percent change.
    """
    # True percent change of the mean of Y relative to the mean of X
    true_percentage_change = loc2 / loc1 - 1
    coverage = 0
    for _ in range(bootstrap_iterations):
        X = np.random.normal(loc=loc1, scale=scale, size=size)
        Y = np.random.normal(loc=loc2, scale=scale, size=size)
        confidence_interval = confidence_interval_delta(X, Y)
        if interval_contains_true_p(confidence_interval, true_percentage_change):
            coverage += 1
    return coverage / bootstrap_iterations

In [25]:
alpha = 0.05 #significance level
loc1 = 1.0 # mean for X
loc2 = 1.1 # mean for Y
scale = 0.1 # std for both X and Y

delta_coverage = []
for n in [20, 50, 200, 2000]:  # sample sizes
    coverage = get_coverage(loc1, loc2, scale, n)
    delta_coverage.append({'n': n, 'coverage': coverage})
    break  # stops after the first sample size, so only n = 20 is evaluated
delta_coverage = pd.DataFrame(delta_coverage)
delta_coverage

|   | coverage | n  |
|---|----------|----|
| 0 | 0.928288 | 20 |
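
The loop above draws one sample pair per iteration, which is slow for 500000 repetitions. Below is a vectorized sketch of the same coverage estimate, assuming NumPy's `default_rng` with a fixed seed. The function name `coverage_vectorized`, the seed and the smaller repetition count are illustrative choices rather than part of the original run, and the sketch uses `ddof=1` variances throughout, so its output will not match the result above exactly.

```python
import numpy as np


def coverage_vectorized(loc1, loc2, scale, size, reps=20_000, seed=0):
    """Monte Carlo coverage of the 95% Delta-method interval (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    # One row per simulated experiment, one column per observation
    X = rng.normal(loc1, scale, size=(reps, size))
    Y = rng.normal(loc2, scale, size=(reps, size))

    xmean, ymean = X.mean(axis=1), Y.mean(axis=1)
    s2x = X.var(axis=1, ddof=1)
    s2y = Y.var(axis=1, ddof=1)
    sxy = ((X - xmean[:, None]) * (Y - ymean[:, None])).sum(axis=1) / (size - 1)

    # Same formulas as confidence_interval_delta, but with ddof=1 variances
    bias = ymean / xmean**3 * s2x / size - sxy / xmean**2 / size
    point = ymean / xmean - 1 + bias
    vest = s2y / xmean**2 - 2 * ymean / xmean**3 * sxy + ymean**2 / xmean**4 * s2x
    moe = 1.96 * np.sqrt(vest / size)

    true_change = loc2 / loc1 - 1
    covered = (point - moe <= true_change) & (true_change <= point + moe)
    return covered.mean()


print(coverage_vectorized(1.0, 1.1, 0.1, size=20))
```

Each call allocates arrays of shape `(reps, size)`, so for large sample sizes it may be necessary to lower `reps` or process the repetitions in chunks.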
