# A/B Testing (Hypothesis Testing)

**Goal**: A/B Testing compares two versions (A and B) of a process or product to determine if a change has a statistically significant effect.

**Use Cases**:
- Product change impact (UI, algorithms, pricing)
- Conversion rate optimization (CTR, signups)
- Finance: effect of signals/strategies on return or volatility

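**Steps**:
1. Choose a metric (e.g., conversion rate)
2. Define null and alternative hypotheses about that metric
3. Fix the significance level $\alpha$ and the sample size before collecting data
4. Calculate the test statistic (z-test for proportions, t-test for means)
5. Compute the p-value or confidence interval
6. Interpret the result (reject or fail to reject the null)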

**Key Concepts**:
- Null hypothesis $H_0$
- Alternative hypothesis $H_1$
- p-value and significance level ($\alpha$)
- Type I / Type II errors
- Confidence intervals


## Introduction to A/B Testing

A/B testing is a **controlled statistical experiment** where two groups (A: control, B: variant) are compared to determine if a new change has a meaningful effect.

---

### Hypothesis Setup

Let’s say we are measuring **conversion rate**:

- $p_A$: true conversion rate for control
- $p_B$: true conversion rate for variant

We test:

- **Null hypothesis ($H_0$)**: $p_A = p_B$
- **Alternative hypothesis ($H_1$)**: $p_A \ne p_B$ (two-sided)  
  or $p_A < p_B$ (one-sided)

---

### Test Statistic (for proportions)

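If $n_A$, $n_B$ are the sample sizes, $x_A$, $x_B$ the numbers of conversions, and $\hat{p}_A = x_A / n_A$, $\hat{p}_B = x_B / n_B$ the observed proportions:

Pooled proportion (the common rate estimated under $H_0$):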
$$
\hat{p} = \frac{x_A + x_B}{n_A + n_B}
$$

Standard error:
$$
SE = \sqrt{\hat{p}(1 - \hat{p}) \left( \frac{1}{n_A} + \frac{1}{n_B} \right)}
$$

Z-statistic:
$$
z = \frac{\hat{p}_A - \hat{p}_B}{SE}
$$
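
The three formulas above can be computed by hand as a check. The following is a minimal sketch using only the Python standard library; the function name `pooled_z_statistic` is illustrative, and the counts (130 of 1000 for A, 160 of 1000 for B) are the simulated data used in the worked example later in these notes.

```python
import math

def pooled_z_statistic(x_a, n_a, x_b, n_b):
    """Two-proportion z-statistic using the pooled standard error."""
    p_a_hat = x_a / n_a
    p_b_hat = x_b / n_b
    # Pooled proportion: the single rate implied by H0 (p_A = p_B)
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a_hat - p_b_hat) / se

z = pooled_z_statistic(x_a=130, n_a=1000, x_b=160, n_b=1000)
print(f"z = {z:.4f}")
```

The sign of $z$ follows the order $\hat{p}_A - \hat{p}_B$, so a negative value means the variant converted better than the control.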

---

### p-value and Significance

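- Choose the significance level $\alpha$ (e.g., 0.05) before looking at the data
- Compute the p-value from $z$: for a two-sided test, $p = 2\,(1 - \Phi(|z|))$, where $\Phi$ is the standard normal CDF
- Reject $H_0$ if p-value < $\alpha$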
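
As a sketch of how a p-value follows from $z$, the snippet below uses `statistics.NormalDist` from the standard library. The helper name `p_value_from_z` and its `alternative` labels are assumptions made for this illustration; the value $z = -1.9052$ is the statistic from the worked example in these notes.

```python
from statistics import NormalDist

def p_value_from_z(z, alternative="two-sided"):
    """p-value of a z-statistic defined as (p_A_hat - p_B_hat) / SE."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":   # H1: p_A != p_B
        return 2 * (1 - std_normal.cdf(abs(z)))
    if alternative == "smaller":     # H1: p_A < p_B, so very negative z is evidence
        return std_normal.cdf(z)
    if alternative == "larger":      # H1: p_A > p_B
        return 1 - std_normal.cdf(z)
    raise ValueError(f"unknown alternative: {alternative!r}")

alpha = 0.05
z = -1.9052
p_two_sided = p_value_from_z(z)
p_one_sided = p_value_from_z(z, alternative="smaller")
print(f"two-sided p = {p_two_sided:.4f}, reject H0: {p_two_sided < alpha}")
print(f"one-sided p = {p_one_sided:.4f}, reject H0: {p_one_sided < alpha}")
```

With this $z$, the two-sided p-value is roughly 0.057 and the one-sided p-value roughly 0.028, so the decision at $\alpha = 0.05$ depends on which alternative was chosen. This is why the alternative should be fixed before the data are seen.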

---

### Confidence Interval for Difference in Proportions

The interval uses the unpooled standard error, because it does not assume $p_A = p_B$:
$$
CI = (\hat{p}_A - \hat{p}_B) \pm z^* \cdot \sqrt{ \frac{\hat{p}_A(1 - \hat{p}_A)}{n_A} + \frac{\hat{p}_B(1 - \hat{p}_B)}{n_B} }
$$

Here $z^*$ is the critical value of the standard normal distribution (e.g., $z^* \approx 1.96$ for a 95% CI).
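
The interval can be sketched for an arbitrary confidence level by looking up $z^*$ with the inverse normal CDF. The function name `diff_ci` is illustrative, and the counts are again the simulated data from the worked example in these notes.

```python
import math
from statistics import NormalDist

def diff_ci(x_a, n_a, x_b, n_b, level=0.95):
    """Normal-approximation CI for p_A - p_B with the unpooled standard error."""
    p_a_hat, p_b_hat = x_a / n_a, x_b / n_b
    # Two-sided critical value, e.g. about 1.96 for level=0.95
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)
    se = math.sqrt(p_a_hat * (1 - p_a_hat) / n_a + p_b_hat * (1 - p_b_hat) / n_b)
    diff = p_a_hat - p_b_hat
    return diff - z_star * se, diff + z_star * se

for level in (0.90, 0.95, 0.99):
    low, high = diff_ci(130, 1000, 160, 1000, level)
    print(f"{level:.0%} CI for p_A - p_B: ({low:.4f}, {high:.4f})")
```

A higher confidence level gives a wider interval, since $z^*$ grows while the standard error stays the same.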

---

### Key Errors

- **Type I error**: Reject $H_0$ when it is true (false positive; its probability is controlled by $\alpha$)
- **Type II error**: Fail to reject $H_0$ when $H_1$ is true (false negative; its probability is $\beta$, and power is $1 - \beta$)
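
Both error rates can be estimated by simulation. The sketch below is only an illustration under assumed settings: the true rates (13% and 16%), the group size of 500, the number of simulations, and the seed are all arbitrary choices, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def two_sided_p_value(x_a, n_a, x_b, n_b):
    """Two-sided p-value of the pooled two-proportion z-test."""
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:  # no conversions (or only conversions) in both groups
        return 1.0
    z = (x_a / n_a - x_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_rejection_rate(p_a, p_b, n, alpha=0.05, n_sims=1000, seed=0):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        x_a = sum(rng.random() < p_a for _ in range(n))
        x_b = sum(rng.random() < p_b for _ in range(n))
        if two_sided_p_value(x_a, n, x_b, n) < alpha:
            rejections += 1
    return rejections / n_sims

# H0 true (same rate in both groups): every rejection is a Type I error
type_1_rate = simulate_rejection_rate(0.13, 0.13, n=500)
# H1 true (rates differ): every non-rejection is a Type II error
power = simulate_rejection_rate(0.13, 0.16, n=500)
type_2_rate = 1 - power
print(f"Type I rate ~ {type_1_rate:.3f}, power ~ {power:.3f}, Type II rate ~ {type_2_rate:.3f}")
```

The estimated Type I rate should land near $\alpha$, while the Type II rate depends on the effect size and the sample size.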


```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest
from scipy.stats import norm

# Simulated data: conversions out of n users per group
n_A, n_B = 1000, 1000
conv_A = 130
conv_B = 160

# Proportion estimates
p_A = conv_A / n_A
p_B = conv_B / n_B

# Perform two-sided z-test (pooled standard error; statistic is for p_A - p_B)
count = np.array([conv_A, conv_B])
nobs = np.array([n_A, n_B])
z_stat, p_val = proportions_ztest(count, nobs)

# 95% confidence interval for p_A - p_B (unpooled standard error)
z_crit = norm.ppf(0.975)  # approximately 1.96
p_diff = p_A - p_B
se = np.sqrt(p_A*(1-p_A)/n_A + p_B*(1-p_B)/n_B)
ci_low, ci_high = p_diff - z_crit*se, p_diff + z_crit*se

print(f"Z-statistic: {z_stat:.4f}")
print(f"P-value: {p_val:.4f}")
print(f"95% CI for difference: ({ci_low:.4f}, {ci_high:.4f})")
```


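Output:

```text
Z-statistic: -1.9052
P-value: 0.0568
95% CI for difference: (-0.0608, 0.0008)
```

At $\alpha = 0.05$ the two-sided test fails to reject $H_0$ (p-value 0.0568 > 0.05), which agrees with the 95% CI for $p_A - p_B$ narrowly including 0.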


## Additional Notes

### One-sided vs Two-sided Tests
- Use **two-sided** when testing for any difference
- Use **one-sided** when only a change in one prespecified direction matters (e.g., only an increase); choose the direction before looking at the data

### Sample Size and Power
- A small sample gives a high-variance estimate, which lowers power and raises the risk of a Type II error
- Use power analysis to determine required $n$ before launching test
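
One common way to get the required $n$ is the normal-approximation formula for comparing two proportions. The sketch below assumes equal group sizes and a two-sided test; the function name, the 80% power target, and the example rates (13% versus 16%) are assumptions made for illustration, and other power-analysis methods can give slightly different numbers.

```python
import math
from statistics import NormalDist

def sample_size_per_group(p_a, p_b, alpha=0.05, power=0.80):
    """Approximate n per group to detect p_a vs p_b with a two-sided z-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value of the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    p_bar = (p_a + p_b) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))
    ) ** 2
    return math.ceil(numerator / (p_a - p_b) ** 2)

n_needed = sample_size_per_group(0.13, 0.16)
n_needed_small_effect = sample_size_per_group(0.13, 0.14)
print(f"n per group for 13% vs 16%: {n_needed}")
print(f"n per group for 13% vs 14%: {n_needed_small_effect}")
```

The required sample size grows roughly with the inverse square of the difference to be detected, so small effects need much larger experiments.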

### Multiple Testing
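- Running many tests (several metrics, segments, or variants) inflates the chance of at least one false positive
- Use the Bonferroni correction, or control the False Discovery Rate (FDR), e.g. with the Benjamini-Hochberg procedure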
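
Both corrections are short enough to sketch directly. The p-values below are hypothetical, made up only to show how the two procedures differ; the Benjamini-Hochberg procedure is used here as one standard way of controlling the FDR.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Reject the k smallest p-values, where k is the largest rank with
    p_(k) <= (k / m) * alpha (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    reject = [False] * m
    for idx in order[:max_k]:
        reject[idx] = True
    return reject

# Hypothetical p-values from eight simultaneous tests
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
bonf_reject = bonferroni(p_values)
bh_reject = benjamini_hochberg(p_values)
print("Uncorrected:", sum(p < 0.05 for p in p_values), "rejections")
print("Bonferroni: ", sum(bonf_reject), "rejections")
print("BH (FDR):   ", sum(bh_reject), "rejections")
```

Bonferroni is the more conservative of the two; FDR control rejects at least as many hypotheses at the cost of tolerating a small expected share of false discoveries.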

### Sequential Testing / Early Stopping
- Repeatedly checking the p-value and stopping as soon as it falls below $\alpha$ inflates the Type I error rate
- Use techniques like group sequential designs or Bayesian A/B testing
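
The effect of peeking can be seen in a small simulation in which $H_0$ is true and the p-value is checked at several interim looks. Everything in this sketch is an assumption chosen for illustration: the 10% base rate, the five equally spaced looks, the number of simulations, and the seed.

```python
import math
import random
from statistics import NormalDist

def two_sided_p_value(x_a, n_a, x_b, n_b):
    """Two-sided p-value of the pooled two-proportion z-test."""
    p_pool = (x_a + x_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (x_a / n_a - x_b / n_b) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

def simulate_peeking(p=0.10, n_per_group=1000, looks=(200, 400, 600, 800, 1000),
                     alpha=0.05, n_sims=1000, seed=1):
    """Return (rate of rejecting at ANY look, rate of rejecting at the final look)."""
    rng = random.Random(seed)
    any_look = final_look = 0
    for _ in range(n_sims):
        # Both groups share the same true rate, so every rejection is a false positive
        a = [rng.random() < p for _ in range(n_per_group)]
        b = [rng.random() < p for _ in range(n_per_group)]
        p_vals = [two_sided_p_value(sum(a[:n]), n, sum(b[:n]), n) for n in looks]
        any_look += any(pv < alpha for pv in p_vals)
        final_look += p_vals[-1] < alpha
    return any_look / n_sims, final_look / n_sims

peek_rate, fixed_rate = simulate_peeking()
print(f"False positive rate, stop at first significant look: {peek_rate:.3f}")
print(f"False positive rate, single test at the end:        {fixed_rate:.3f}")
```

The single final test stays close to $\alpha$, while stopping at the first significant look rejects noticeably more often. Group sequential designs address this by using stricter thresholds at each look.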

### Business Interpretation
- A statistically significant result does **not always** imply practical importance: with a large enough sample, even a negligible difference becomes significant
- Consider effect size and confidence intervals in decision-making
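
A sketch of that distinction, using made-up numbers: one million users per group, conversion rates of 10.0% and 10.1%, and a hypothetical business threshold `min_worthwhile_lift` of 0.5 percentage points. Note that the lift here is defined as $\hat{p}_B - \hat{p}_A$ so that a positive value means the variant is better.

```python
import math
from statistics import NormalDist

# Hypothetical large experiment
n_a = n_b = 1_000_000
x_a, x_b = 100_000, 101_000
min_worthwhile_lift = 0.005  # assumed smallest lift worth shipping (0.5 points)

p_a_hat, p_b_hat = x_a / n_a, x_b / n_b
lift = p_b_hat - p_a_hat

# Two-sided z-test with the pooled standard error
p_pool = (x_a + x_b) / (n_a + n_b)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
p_value = 2 * (1 - NormalDist().cdf(abs(lift / se_pooled)))

# 95% CI for the lift with the unpooled standard error
se_unpooled = math.sqrt(p_a_hat * (1 - p_a_hat) / n_a + p_b_hat * (1 - p_b_hat) / n_b)
z_star = NormalDist().inv_cdf(0.975)
ci_low, ci_high = lift - z_star * se_unpooled, lift + z_star * se_unpooled

print(f"p-value: {p_value:.4f}")
print(f"95% CI for lift: ({ci_low:.5f}, {ci_high:.5f})")
print("Statistically significant:", p_value < 0.05)
print("Plausibly reaches the worthwhile lift:", ci_high >= min_worthwhile_lift)
```

In this scenario the test is significant, yet the whole confidence interval lies below the assumed threshold, so the data suggest the effect is real but too small to matter for the decision.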


## Further Topics to Explore

1. Bayesian A/B Testing (Posterior difference in success rates)
2. A/A Testing and sanity checks
3. Power analysis and sample size calculation
4. Time-based vs random A/B splits
5. CUPED (variance reduction technique for A/B testing)
