# A/B testing
![bnet-hypothesis-testing](../pics/bnet-hypothesis-testing.png)
This figure comes from the chapter entitled "A/B testing" in my book "Bayesuvius".
The purpose of this notebook is to calculate the quantities at the leaf nodes, and some intermediate quantities, from the quantities at the root nodes, using the statistics packages `scipy` and `statsmodels`. A large fraction of this notebook was written by ChatGPT.


In [1]:
# this makes sure it starts looking for things from the project folder down.
import os
import sys
os.chdir('../')
sys.path.insert(0,os.getcwd())
print(os.getcwd())

C:\Users\rrtuc\Desktop\backed-up\python-projects\uplift_rocket


In [2]:
import numpy as np
from scipy import stats
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportions_ztest, confint_proportions_2indep

## Scenario: Email Campaign

**Goal:** Increase click-through rate (CTR) for a promotional email.

**A/B Test Setup:**

* **Version A (Control):** Subject line: "Get 20% Off Your Next Order!"
* **Version B (Variant):** Subject line: "Your Exclusive 20% Discount Is Waiting!"

**Execution:**

* Version A is sent to 50% of the email list.
* Version B is sent to the other 50%.
* All other elements in the email are kept constant.
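In practice, the 50/50 split is usually made by random assignment, so that the two groups differ only in the subject line. The following is a minimal sketch. It assumes the email list is a plain Python list of addresses. The helper `split_ab` and the `example.com` addresses are hypothetical.

```python
import numpy as np

def split_ab(email_list, seed=0):
    """Hypothetical helper: randomly assign each address to group A or B, 50/50."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(email_list))  # random permutation of the indices
    half = len(email_list) // 2
    group_a = [email_list[i] for i in order[:half]]
    group_b = [email_list[i] for i in order[half:]]
    return group_a, group_b

# Hypothetical list of 2000 addresses
emails = [f"user{i}@example.com" for i in range(2000)]
group_a, group_b = split_ab(emails)
print(len(group_a), len(group_b))  # 1000 1000
```

Shuffling indices rather than addresses keeps each address in exactly one group.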

**Outcome:**

* Version A: 12\% CTR.
* Version B: 17\% CTR.

**Conclusion:** Version B performs better and is chosen for future campaigns.

## Hypotheses
* Null Hypothesis $H_0: p_A = p_B$
* Alternative Hypothesis $H_1: p_B > p_A$
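A standard statistic for these hypotheses is the pooled two-proportion z-statistic. This is what `proportions_ztest` in `statsmodels` computes by default. Write $x_A, x_B$ for the clicks and $n_A, n_B$ for the emails sent (notation introduced here), with $\hat p_A = x_A/n_A$, $\hat p_B = x_B/n_B$ and the pooled CTR $\hat p = (x_A + x_B)/(n_A + n_B)$. Then:

$$
z = \frac{\hat p_B - \hat p_A}{\sqrt{\hat p\,(1-\hat p)\left(\frac{1}{n_A} + \frac{1}{n_B}\right)}}
$$

Under $H_0$, $z$ is approximately standard normal. $H_0$ is therefore rejected at level $\alpha$ when $z > z_{1-\alpha}$, which is about $1.645$ for $\alpha = 0.05$.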

## Parameter values

In [3]:
p1 = 0.12  # CTR of Version A
p2 = 0.17  # CTR of Version B
effect_size = abs(p2 - p1)  # Raw CTR difference (the power analysis below uses Cohen's h instead)
alpha = 0.05  # Significance level
power = 0.8   # Desired power

## Difference $p_2 - p_1$ and standard deviation

In [5]:
print("p2-p1=", f"{p2-p1:.4f}")

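# Pooled proportion: a simple average, since both groups have the same size
p_pool = (p1 + p2) / 2
# Standard deviation of (click_B - click_A) for a single email per group under H0.
# Dividing by sqrt(n) gives the standard error of the difference in CTRs.
std_dev = np.sqrt(2 * p_pool * (1 - p_pool))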
print("p_pool=", f"{p_pool:.4f}")
print("std_dev=", f"{std_dev:.4f}")

p2-p1= 0.0500
p_pool= 0.1450
std_dev= 0.4979
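`std_dev` describes a single email per group. To get the standard error of the difference between the two observed CTRs, divide by $\sqrt{n}$. The following sketch assumes $n = 1000$ emails per group, which is the value used in the later cells. With that $n$, $(p_2 - p_1)/\text{SE}$ has the same magnitude as the z-statistic that `proportions_ztest` reports below.

```python
import numpy as np

p1, p2 = 0.12, 0.17
n = 1000  # assumed emails per group (as in the later cells)

p_pool = (p1 + p2) / 2                        # equal group sizes, so a simple average
std_dev = np.sqrt(2 * p_pool * (1 - p_pool))  # sd of (click_B - click_A), one email per group
se = std_dev / np.sqrt(n)                     # sd of the difference of the two sample CTRs
z = (p2 - p1) / se                            # pooled z-statistic, B minus A

print(f"se={se:.4f}, z={z:.4f}")  # se=0.0157, z=3.1753
```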


## Required Sample Size $n$

In [6]:
# Calculate Cohen's h
cohen_h = 2 * np.arcsin(np.sqrt(p2)) - 2 * np.arcsin(np.sqrt(p1))

# Power analysis
power_analysis = NormalIndPower()
sample_size = power_analysis.solve_power(effect_size=cohen_h, 
                                         alpha=alpha, 
                                         power=power, 
                                         alternative='larger')

print(f"Required sample size per group: {int(np.ceil(sample_size))}")


Required sample size per group: 609
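For a one-sided test with two equal groups, the normal approximation has a closed form on the arcsine scale: $n = 2\,\big((z_{1-\alpha} + z_{1-\beta})/h\big)^2$ per group. The sketch below evaluates that formula directly with `scipy.stats.norm` as a cross-check on `NormalIndPower.solve_power`. It is an illustration, not the way `statsmodels` computes the result internally.

```python
import numpy as np
from scipy.stats import norm

p1, p2 = 0.12, 0.17
alpha, power = 0.05, 0.8

h = 2 * np.arcsin(np.sqrt(p2)) - 2 * np.arcsin(np.sqrt(p1))  # Cohen's h (B minus A)
z_alpha = norm.ppf(1 - alpha)  # one-sided critical value, about 1.645
z_beta = norm.ppf(power)       # about 0.842 for 80% power
# Each arm contributes variance 1/n on the arcsine scale, hence the factor 2
n_per_group = 2 * ((z_alpha + z_beta) / h) ** 2

print(int(np.ceil(n_per_group)))  # 609
```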


## Confidence Interval

In [7]:
# Observed values (index 0 = Version A, index 1 = Version B)
successes = np.array([120, 170])  # clicks
samples = np.array([1000, 1000])  # total emails sent

# 95% Wald confidence interval for p_B - p_A.
# confint_proportions_2indep returns the interval for count1/nobs1 - count2/nobs2,
# so Version B is passed as group 1.
ci_low, ci_upp = confint_proportions_2indep(count1=successes[1], nobs1=samples[1],
                                            count2=successes[0], nobs2=samples[0],
                                            method='wald')

print(f"95% confidence interval for p_B - p_A: ({ci_low:.4f}, {ci_upp:.4f})")

95% confidence interval for p_B - p_A: (0.0192, 0.0808)
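The Wald interval is simple enough to compute by hand. This makes explicit which difference it refers to: swapping the groups only flips the sign of the interval. Here is a minimal sketch for $p_B - p_A$, using the same counts. It uses an unpooled standard error, in which each group keeps its own variance.

```python
import numpy as np
from scipy.stats import norm

x_a, n_a = 120, 1000  # clicks / emails sent, Version A
x_b, n_b = 170, 1000  # clicks / emails sent, Version B
p_a, p_b = x_a / n_a, x_b / n_b

diff = p_b - p_a
# Unpooled (Wald) standard error of the difference
se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
z = norm.ppf(0.975)  # two-sided 95% critical value
ci = (diff - z * se, diff + z * se)

print(f"95% Wald CI for p_B - p_A: ({ci[0]:.4f}, {ci[1]:.4f})")  # (0.0192, 0.0808)
```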


In [8]:
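# One-sided z-test of H1: p_B > p_A.
# proportions_ztest compares successes[0] against successes[1] (A against B),
# so H1: p_A < p_B corresponds to alternative='smaller'.
z_stat, p_value = proportions_ztest(successes, samples, alternative='smaller')

print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_value:.4f}")

Z-statistic: -3.1753
P-value: 0.0007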
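`proportions_ztest` compares the first entry against the second, so the correct `alternative` depends on the order of the arrays. Computing the pooled statistic by hand, with B minus A in the numerator, makes the direction explicit: $H_1: p_B > p_A$ is then the upper tail. Here is a minimal sketch with the same counts.

```python
import numpy as np
from scipy.stats import norm

x_a, n_a = 120, 1000  # clicks / emails sent, Version A
x_b, n_b = 170, 1000  # clicks / emails sent, Version B
p_a, p_b = x_a / n_a, x_b / n_b

p_pool = (x_a + x_b) / (n_a + n_b)  # pooled CTR under H0: p_A = p_B
se0 = np.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se0   # B minus A, so a large positive z favours H1
p_value = norm.sf(z)    # one-sided upper-tail p-value for H1: p_B > p_A

print(f"z={z:.4f}, p={p_value:.4f}")  # z=3.1753, p=0.0007
```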


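## Interpretation
If the one-sided p-value is below $\alpha = 0.05$, reject $H_0$ and conclude that Version B has a significantly higher CTR than Version A.

If the 95% confidence interval for $p_B - p_A$ (`p2 - p1` in the code) does not include 0, the difference is also significant at the (two-sided) 5% level.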
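A two-sided 95% interval pairs with a two-sided test. The exact counterpart of the one-sided test at $\alpha = 0.05$ is a one-sided 95% lower confidence bound on $p_B - p_A$, which is the lower end of a 90% two-sided interval. B is declared better when that bound is above 0. Below is a minimal sketch with the observed counts. Note that it uses the unpooled (Wald) standard error, while the z-test uses the pooled one, so the two rules can disagree very close to the boundary.

```python
import numpy as np
from scipy.stats import norm

alpha = 0.05
x_a, n_a = 120, 1000  # clicks / emails sent, Version A
x_b, n_b = 170, 1000  # clicks / emails sent, Version B
p_a, p_b = x_a / n_a, x_b / n_b

diff = p_b - p_a
se = np.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)  # unpooled (Wald) SE
lower = diff - norm.ppf(1 - alpha) * se  # one-sided 95% lower confidence bound

print(f"lower bound for p_B - p_A: {lower:.4f}")  # 0.0242
print("B significantly better" if lower > 0 else "no significant improvement")
```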