# Delta method
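The delta method is one of the most practical tools in A/B testing. As a data scientist, you will encounter experiments where users are assigned to variants by user_id to keep the experience consistent, while the unit of analysis is more granular, such as a page view or a session. Observations from the same user are then correlated, so a standard error that treats them as independent is typically too small; the delta method gives an approximate variance for the resulting ratio metric from user-level data.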

#### When to use ztest_delta()

- ✔ The metric is expressed as a ratio
- ✔ The numerator and denominator are not independent
- ✔ Large sample sizes are available
- ✔ Fast and reproducible results are required

#### Common examples:
- CTR = clicks / impressions
- CVR = conversions / sessions
- AOV = revenue / orders
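
To illustrate why these metrics need special treatment, here is a minimal sketch on simulated data. The data-generating process (Poisson impressions, Beta-distributed click propensity per user) and all variable names are assumptions made for illustration only. It compares a naive standard error that treats every impression as independent with a delta-method standard error that treats the user as the unit.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users = 5000

# Hypothetical user-level data: each user has their own click propensity,
# so impressions from the same user are correlated.
impressions = rng.poisson(10, n_users) + 1
propensity = rng.beta(2, 8, n_users)
clicks = rng.binomial(impressions, propensity)

# Ratio metric: total clicks / total impressions
ctr = clicks.sum() / impressions.sum()

# Naive SE: pretends every impression is an independent Bernoulli trial
se_naive = np.sqrt(ctr * (1 - ctr) / impressions.sum())

# Delta-method SE: uses per-user totals and their covariance
x_bar, y_bar = clicks.mean(), impressions.mean()
cov = np.cov(clicks, impressions, ddof=1)
var_ratio = (cov[0, 0] / y_bar**2
             + cov[1, 1] * x_bar**2 / y_bar**4
             - 2 * cov[0, 1] * x_bar / y_bar**3) / n_users
se_delta = np.sqrt(var_ratio)

print(f"CTR={ctr:.4f}  naive SE={se_naive:.5f}  delta SE={se_delta:.5f}")
```

Under these assumptions the naive standard error comes out noticeably smaller than the delta-method one, which is how within-user correlation inflates false-positive rates when it is ignored.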

In [1]:
import pandas as pd
import numpy as np

## Delta method variance of ratio metric

In [2]:
# Delta method variance of ratio metric
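def var_delta(x, y):
    """Approximate variance of mean(x) / mean(y) via the delta method.

    x and y are paired per-user totals (numerator and denominator) of equal length.
    """
    x_bar = np.mean(x)
    y_bar = np.mean(y)
    x_var = np.var(x, ddof=1)
    y_var = np.var(y, ddof=1)
    cov_xy = np.cov(x, y, ddof=1)[0][1]
    # Dividing by len(x) gives the variance of the ratio of means, so that its
    # square root is the standard error used in the denominator of the test statistic
    var_ratio = (x_var/y_bar**2 + y_var*(x_bar**2/y_bar**4) - 2*cov_xy*(x_bar/y_bar**3))/len(x)
    return var_ratio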
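
Written out, the expression computed by var_delta corresponds to the first-order delta-method approximation for a ratio of sample means. The symbols are introduced here for readability: $s_x^2$, $s_y^2$ and $s_{xy}$ denote the sample variances and covariance of the per-user values, and $n$ is the number of users.

```latex
\operatorname{Var}\!\left(\frac{\bar{x}}{\bar{y}}\right)
\approx \frac{1}{n}\left[
\frac{s_x^2}{\bar{y}^2}
+ \frac{\bar{x}^2}{\bar{y}^4}\, s_y^2
- 2\,\frac{\bar{x}}{\bar{y}^3}\, s_{xy}
\right]
```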


## Delta method z-test

In [13]:
from scipy.stats import norm

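def ztest_delta(xA, yA, xB, yB, alpha=0.05):
    """Two-sided z-test for the difference in a ratio metric, B (treatment) minus A (control).

    Each x/y pair holds the per-user numerator and denominator totals of one variant.
    """
    # Ratio metric per variant: ratio of means, not mean of per-user ratios
    rA = np.mean(xA) / np.mean(yA)
    rB = np.mean(xB) / np.mean(yB)

    # Delta-method variance of each ratio
    varA = var_delta(xA, yA)
    varB = var_delta(xB, yB)

    # Users in A and B are independent, so the variances add
    se = np.sqrt(varA + varB)
    diff = rB - rA
    z = diff / se
    # Two-sided p-value; this form rounds to exactly 0.0 for very large |z|
    p_value = 2 * (1 - norm.cdf(abs(z)))

    # Confidence interval for the absolute difference at level 1 - alpha
    zcrit = norm.ppf(1 - alpha/2)
    ci_low = diff - zcrit * se
    ci_high = diff + zcrit * se

    # Relative change of B over A
    uplift = (rB / rA) - 1

    return {
        "ratio_A": rA,
        "ratio_B": rB,
        "diff": diff,
        "uplift": uplift,
        "se": se,
        "z": z,
        "p_value": p_value,
        "ci_95": (ci_low, ci_high)  # a 95% interval only for the default alpha=0.05
    }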
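
One way to sanity-check a test like this is an A/A simulation: draw both groups from the same distribution and confirm that the share of significant results is close to alpha. The sketch below is self-contained and makes several assumptions: the helper names `ratio_and_var` and `simulate_group` are hypothetical, the data-generating process is invented, and the variance formula is re-implemented inline rather than imported from the notebook.

```python
import numpy as np
from scipy.stats import norm

def ratio_and_var(x, y):
    """Ratio of means and its delta-method variance (same formula as var_delta)."""
    n = len(x)
    x_bar, y_bar = np.mean(x), np.mean(y)
    cov = np.cov(x, y, ddof=1)
    var = (cov[0, 0] / y_bar**2
           + cov[1, 1] * x_bar**2 / y_bar**4
           - 2 * cov[0, 1] * x_bar / y_bar**3) / n
    return x_bar / y_bar, var

def simulate_group(rng, n_users):
    # Hypothetical generator: conversions are correlated within a user
    sessions = rng.poisson(3, n_users) + 1
    rate = rng.beta(2, 18, n_users)
    conversions = rng.binomial(sessions, rate)
    return conversions, sessions

rng = np.random.default_rng(42)
alpha, n_sims, n_users = 0.05, 1000, 2000

rejections = 0
for _ in range(n_sims):
    # Both groups come from the same process, so the null hypothesis is true
    xA, yA = simulate_group(rng, n_users)
    xB, yB = simulate_group(rng, n_users)
    rA, vA = ratio_and_var(xA, yA)
    rB, vB = ratio_and_var(xB, yB)
    z = (rB - rA) / np.sqrt(vA + vB)
    p = 2 * norm.sf(abs(z))
    rejections += int(p < alpha)

false_positive_rate = rejections / n_sims
print(false_positive_rate)
```

If the variance estimate is reasonable, the printed rate should land near 0.05; a value far above that would suggest the standard error is too small.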


#### Exercise
In this exercise, you will analyze the difference in a ratio metric, total order_value per page view, between variants A and C.

In [4]:
checkout_data = pd.read_csv('../data/checkout.csv')
checkout_data.head(3)

Unnamed: 0.1,Unnamed: 0,user_id,checkout_page,order_value,purchased,gender,browser,time_on_page
0,0,877621,A,29.410131,1.0,F,chrome,66.168628
1,1,876599,A,,0.0,M,firefox,49.801887
2,2,905407,A,27.446845,1.0,M,chrome,56.744856


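Note: a single user can have multiple page views, and page views from the same user are not independent. To keep the unit of analysis aligned with the unit of randomization, the data is aggregated at the user level (total order value and number of page views per user) before the test is applied.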
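
As a small illustration of this aggregation step, the sketch below uses a made-up page-view table (the values and the `page_views` name are hypothetical, not taken from checkout.csv) and collapses it to one row per user.

```python
import pandas as pd

# Hypothetical page-view-level rows; a missing order_value means no purchase
page_views = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 3, 3],
    "order_value": [20.0, None, 10.0, None, 5.0, 5.0],
})

# One row per user: total order value (NaN is skipped) and number of page views
per_user = page_views.groupby("user_id").agg(
    order_value=("order_value", "sum"),
    page_view=("order_value", "size"),
)

# Ratio metric: total order value divided by total page views
ratio = per_user["order_value"].sum() / per_user["page_view"].sum()
print(per_user)
print(ratio)
```

Users without any purchase contribute 0 to the numerator but still count in the denominator, which is what an "order value per page view" metric calls for.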

In [12]:
# Create per-user DataFrames (total order value and page-view count) for variants A and C
def per_user_metrics(data, variant):
    grouped = data[data['checkout_page'] == variant].groupby('user_id')
    return pd.DataFrame({'order_value': grouped['order_value'].sum(),
                         'page_view': grouped['user_id'].count()})

A_per_user = per_user_metrics(checkout_data, 'A')
C_per_user = per_user_metrics(checkout_data, 'C')

# Assign the control and treatment ratio columns 
x_control = A_per_user['order_value']
y_control = A_per_user['page_view']
x_treatment = C_per_user['order_value']
y_treatment = C_per_user['page_view']

res = ztest_delta(x_control, y_control, x_treatment, y_treatment)
res

{'ratio_A': np.float64(20.472597188012),
 'ratio_B': np.float64(30.296828047168336),
 'diff': np.float64(9.824230859156337),
 'uplift': np.float64(0.4798722296411442),
 'se': np.float64(0.29296814692733913),
 'z': np.float64(33.53344369411227),
 'p_value': np.float64(0.0),
 'ci_95': (np.float64(9.250023842561314), np.float64(10.39843787575136))}

#### Statistical Interpretation
Significance:
- The z-statistic of 33.53 is extremely large.
- The p-value is effectively zero, well below any conventional significance level.
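
The reported p-value of exactly 0.0 is a floating-point artifact: `norm.cdf` rounds to 1.0 for such a large z, so `1 - norm.cdf(...)` loses all precision. A minimal sketch of an alternative, using the survival function from scipy.stats with the z-value copied from the output above:

```python
import numpy as np
from scipy.stats import norm

z = 33.53344369411227  # z-statistic from the result above

# Form used in ztest_delta: the subtraction cancels to exactly zero
p_cdf = 2 * (1 - norm.cdf(abs(z)))

# Survival function: evaluates the upper tail directly and stays positive
p_sf = 2 * norm.sf(abs(z))

# Log-scale version, useful when even the survival function would underflow
log10_p = np.log10(2) + norm.logsf(abs(z)) / np.log(10)

print(p_cdf, p_sf, log10_p)
```

The conclusion does not change, but reporting "p < 0.001" or an order of magnitude is more accurate than reporting a p-value of zero.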

#### Conclusion:
We reject the null hypothesis and conclude that the difference between checkout pages A and C is statistically significant. (In the output above, ratio_A is the control, variant A, and ratio_B is the treatment, variant C.)

#### Confidence Interval
- The 95% confidence interval does not include zero.
- The interval is very narrow, indicating a highly precise estimate.

#### Interpretation:
With 95% confidence, the true increase in order value per page view when moving from variant A to variant C lies between +9.25 and +10.40 units.
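
The interval bounds can be reproduced from the reported difference and standard error. This is a small check, with both inputs copied from the output above:

```python
from scipy.stats import norm

diff = 9.824230859156337   # 'diff' from the result above
se = 0.29296814692733913   # 'se' from the result above

# Critical value for a two-sided 95% interval (about 1.96)
zcrit = norm.ppf(0.975)

ci_low = diff - zcrit * se
ci_high = diff + zcrit * se
print(ci_low, ci_high)
```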

#### Business Interpretation
Effect Size
- Checkout page C generates approximately 48% more order value per page view than page A.
- This is a large and practically meaningful effect, not just a statistically significant one.
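
The effect size follows directly from the two ratios in the output above, rounded here for readability. The symbols $r_A$ and $r_C$ are notation introduced for this note: they stand for ratio_A (variant A) and ratio_B (the treatment, variant C).

```latex
\text{diff} = r_C - r_A \approx 30.297 - 20.473 = 9.824,
\qquad
\text{uplift} = \frac{r_C}{r_A} - 1 \approx \frac{30.297}{20.473} - 1 \approx 0.480
```

Note that the confidence interval reported by ztest_delta applies to the absolute difference, not to the relative uplift.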

#### Practical Meaning
For every view of the checkout page, version C produces substantially more monetary value than version A.

#### Executive Conclusion (Stakeholder-Ready)
Checkout page C clearly outperforms checkout page A.
The order value per page view metric increases by approximately 9.8 units (+48%), with an extremely high level of statistical confidence (z = 33.5, p < 0.001) and a tight confidence interval.
We recommend rolling out checkout page C, as the improvement is large and statistically robust.