# Power Analysis for Difference-in-Differences

This notebook demonstrates how to use the power analysis tools in `diff-diff` for study design. We'll cover:

1. Computing minimum detectable effects (MDE)
2. Calculating required sample sizes
3. Estimating statistical power
4. Creating power curves for visualization
5. Panel data considerations (ICC, multiple periods)
6. Simulation-based power analysis for complex designs

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from diff_diff import (
    PowerAnalysis,
    DifferenceInDifferences,
    simulate_power,
    compute_mde,
    compute_power,
    compute_sample_size,
    plot_power_curve,
)

## 1. The Power Analysis Problem

Before running a DiD study, researchers need to answer key design questions:

- **"How many units do I need?"** (sample size calculation)
- **"What's the smallest effect I can detect?"** (minimum detectable effect)
- **"What's my chance of finding a significant result?"** (power calculation)

The `PowerAnalysis` class provides analytical formulas for these calculations in basic DiD designs.

In [None]:
# Create a PowerAnalysis object with standard settings
# alpha = 0.05 (5% significance level)
# power = 0.80 (80% power - conventional target)
pa = PowerAnalysis(alpha=0.05, power=0.80)

print(f"Significance level: {pa.alpha}")
print(f"Target power: {pa.target_power}")
print(f"Alternative hypothesis: {pa.alternative}")

## 2. Minimum Detectable Effect (MDE)

The MDE is the smallest treatment effect you can detect with your specified power and sample size. It depends on:

- Sample sizes (n_treated, n_control)
- Residual variance (sigma)
- Significance level (alpha)
- Target power
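To make these dependencies concrete, here is a minimal sketch of a textbook normal-approximation MDE. It assumes the standard error of the effect estimate is `sigma * sqrt(1/n_treated + 1/n_control)`. That is an illustrative assumption, not the formula `diff-diff` is known to use, so the numbers need not match `pa.mde()`. The helper name `approx_mde` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def approx_mde(n_treated, n_control, sigma, alpha=0.05, power=0.80):
    """Textbook two-sided MDE under a normal approximation (illustrative only)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the target power
    # Assumed standard error of the effect estimate (not taken from diff-diff)
    se = sigma * sqrt(1 / n_treated + 1 / n_control)
    return (z_alpha + z_power) * se


mde_50 = approx_mde(50, 50, sigma=10.0)
mde_200 = approx_mde(200, 200, sigma=10.0)
print(f"n=50 per group:  {mde_50:.2f}")
print(f"n=200 per group: {mde_200:.2f}")
```

Under this approximation, quadrupling the sample size halves the MDE, because the standard error scales with 1/sqrt(n).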

In [None]:
# Calculate MDE for a basic 2x2 DiD design
# Assume: 50 treated units, 50 control units, residual SD of 10
result = pa.mde(
    n_treated=50,
    n_control=50,
    sigma=10.0  # Residual standard deviation
)

print(result.summary())

In [None]:
# Access individual results
print(f"Minimum Detectable Effect: {result.mde:.2f}")
print(f"This is {result.mde / 10.0:.2f} standard deviations")
print()
print("With 50 treated and 50 control units, and sigma=10,")
print(f"you can detect an effect of {result.mde:.2f} or larger")
print(f"with {result.power:.0%} power at alpha={result.alpha:.2f}.")

### How MDE Changes with Sample Size

In [None]:
# Compare MDE across different sample sizes
sample_sizes = [25, 50, 100, 200, 500]

print(f"{'N per group':>12} {'Total N':>10} {'MDE':>10} {'MDE/SD':>10}")
print("-" * 45)

for n in sample_sizes:
    result = pa.mde(n_treated=n, n_control=n, sigma=10.0)
    print(f"{n:>12} {2*n:>10} {result.mde:>10.2f} {result.mde/10.0:>10.2f}")

## 3. Required Sample Size

Given a target effect size to detect, calculate the required sample size.
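The same logic can be run in reverse. The sketch below inverts a textbook normal-approximation formula to get the number of units per group, assuming equal group sizes and a standard error of `sigma * sqrt(2 / n)`. Both the formula and the helper name `approx_n_per_group` are assumptions for illustration; `pa.sample_size()` may compute this differently.

```python
from math import ceil
from statistics import NormalDist


def approx_n_per_group(effect_size, sigma, alpha=0.05, power=0.80):
    """Units per group for equal-sized groups (normal approximation, illustrative)."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Solve effect_size = z_total * sigma * sqrt(2 / n) for n, then round up
    return ceil(2 * (z_total * sigma / effect_size) ** 2)


n_per_group = approx_n_per_group(effect_size=5.0, sigma=10.0)
print(f"Units per group: {n_per_group}, total: {2 * n_per_group}")
```

Because the required n grows with the inverse square of the effect size, halving the target effect roughly quadruples the sample you need.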

In [None]:
# How many units do we need to detect an effect of 5 units?
result = pa.sample_size(
    effect_size=5.0,  # Effect we want to detect
    sigma=10.0  # Residual standard deviation
)

print(result.summary())

In [None]:
# Compare sample sizes needed for different effect sizes
effect_sizes = [2.0, 3.0, 5.0, 7.0, 10.0]

print(f"{'Effect Size':>12} {'Effect/SD':>10} {'Required N':>12} {'N per group':>12}")
print("-" * 50)

for effect in effect_sizes:
    result = pa.sample_size(effect_size=effect, sigma=10.0)
    print(f"{effect:>12.1f} {effect/10.0:>10.2f} {result.required_n:>12} {result.n_treated:>12}")

## 4. Power Calculation

Given a specific effect size and sample size, calculate the statistical power.
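As a rough cross-check, power can be approximated by hand with a normal distribution. This sketch assumes a two-sided test and a standard error of `sigma * sqrt(1/n_treated + 1/n_control)`; the helper `approx_power` is hypothetical and is not the calculation `diff-diff` necessarily performs.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(effect_size, n_treated, n_control, sigma, alpha=0.05):
    """Two-sided power under a normal approximation (illustrative only)."""
    z = NormalDist()
    se = sigma * sqrt(1 / n_treated + 1 / n_control)  # assumed standard error
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(effect_size) / se  # standardized effect
    # Probability the test statistic lands in either rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


power_est = approx_power(effect_size=4.0, n_treated=75, n_control=75, sigma=10.0)
print(f"Approximate power: {power_est:.1%}")
```

With a true effect of zero, this function returns alpha, which is a quick sanity check that the rejection regions are set up correctly.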

In [None]:
# What's our power to detect an effect of 4 with 75 units per group?
pa_calc = PowerAnalysis(alpha=0.05)  # Target power is not needed when computing power

result = pa_calc.power(
    effect_size=4.0,
    n_treated=75,
    n_control=75,
    sigma=10.0
)

print(f"Power to detect effect of 4.0: {result.power:.1%}")
print(f"This is {'adequate' if result.power >= 0.80 else 'below the conventional 80% threshold'}")

## 5. Power Curves

Power curves show how power changes with effect size (or sample size). They're essential for understanding the trade-offs in study design.
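A power curve is just the power calculation repeated over a grid. The sketch below builds one by hand under a normal approximation, assuming sigma=10, 50 units per group, and a standard error of `sigma * sqrt(1/n_treated + 1/n_control)`. These are illustrative assumptions, independent of how `pa.power_curve()` works.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()
SE = 10.0 * sqrt(1 / 50 + 1 / 50)  # assumed standard error: sigma=10, 50 units per group
Z_CRIT = Z.inv_cdf(0.975)          # two-sided test at alpha = 0.05

# Power at each effect size on a grid from 0 to 10 in steps of 0.5
effects = [0.5 * k for k in range(21)]
curve = [(e, Z.cdf(e / SE - Z_CRIT) + Z.cdf(-e / SE - Z_CRIT)) for e in effects]

# The first grid point reaching 80% power approximates the MDE
crossing = next(e for e, p in curve if p >= 0.80)
for e, p in curve[::4]:
    print(f"effect={e:>4.1f}  power={p:.2f}")
print(f"Power first reaches 80% at effect = {crossing}")
```

The point where the curve crosses the target power line is the MDE, up to the resolution of the grid.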

In [None]:
# Generate power curve data
curve_df = pa.power_curve(
    n_treated=50,
    n_control=50,
    sigma=10.0
)

# Get MDE for reference
mde_result = pa.mde(n_treated=50, n_control=50, sigma=10.0)

print("First few rows of power curve:")
print(curve_df.head(10))

In [None]:
# Plot the power curve
plot_power_curve(
    curve_df,
    mde=mde_result.mde,
    target_power=0.80,
    title="Power Curve: 50 Treated, 50 Control, SD=10",
    xlabel="Treatment Effect Size",
    figsize=(10, 6)
)

### Power vs Sample Size

In [None]:
# How does power change with sample size for a fixed effect?
sample_curve = pa.sample_size_curve(
    effect_size=5.0,
    sigma=10.0
)

# Plot
plot_power_curve(
    sample_curve,
    target_power=0.80,
    show_mde_line=False,
    title="Power vs Sample Size (Effect=5, SD=10)",
    figsize=(10, 6)
)

## 6. Panel Data Considerations

For panel DiD with multiple time periods, power depends on:
- Number of pre/post periods
- Intra-cluster correlation (ICC): the correlation of outcomes within a unit over time

More periods generally improve precision, but a high ICC means repeated observations of the same unit carry largely redundant information, which reduces the effective sample size.
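One common way to quantify this trade-off is the Kish design effect, `1 + (T - 1) * rho`, where `T` is the number of periods per unit. The sketch below shows how many independent observations `T` correlated ones are worth under that formula. Whether `diff-diff` applies exactly this adjustment internally is an assumption, and both helper names are hypothetical.

```python
def design_effect(n_periods, rho):
    """Kish design effect for equally correlated observations within a unit."""
    return 1 + (n_periods - 1) * rho


def effective_obs_per_unit(n_periods, rho):
    """Number of independent observations that n_periods correlated ones are worth."""
    return n_periods / design_effect(n_periods, rho)


for rho in [0.0, 0.3, 0.7]:
    eff = effective_obs_per_unit(6, rho)
    print(f"rho={rho:.1f}: 6 periods ~ {eff:.2f} independent observations per unit")
```

At `rho=0` every period counts in full; as `rho` approaches 1, extra periods add almost nothing.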

In [None]:
# Compare MDE with different numbers of periods
print("Effect of number of periods on MDE (50 treated, 50 control, sigma=10):")
print(f"{'Periods (pre+post)':>20} {'MDE':>10} {'Improvement':>15}")
print("-" * 50)

baseline_mde = None
for n_pre, n_post in [(1, 1), (2, 2), (3, 3), (5, 5)]:
    result = pa.mde(
        n_treated=50, n_control=50, sigma=10.0,
        n_pre=n_pre, n_post=n_post
    )
    if baseline_mde is None:
        baseline_mde = result.mde
        improvement = "-"
    else:
        improvement = f"{(1 - result.mde/baseline_mde)*100:.1f}%"
    print(f"{n_pre + n_post:>20} {result.mde:>10.2f} {improvement:>15}")

In [None]:
# Effect of intra-cluster correlation (ICC)
print("Effect of ICC on MDE (50 treated, 50 control, 6 periods, sigma=10):")
print(f"{'ICC':>10} {'MDE':>10} {'Design Effect':>15}")
print("-" * 40)

n_pre, n_post = 3, 3
T = n_pre + n_post  # Total periods per unit

for rho in [0.0, 0.1, 0.3, 0.5, 0.7]:
    result = pa.mde(
        n_treated=50, n_control=50, sigma=10.0,
        n_pre=n_pre, n_post=n_post, rho=rho
    )
    # Design effect, computed by hand for reference: how much ICC inflates variance
    design_effect = 1 + (T - 1) * rho
    print(f"{rho:>10.1f} {result.mde:>10.2f} {design_effect:>15.2f}")

## 7. Simulation-Based Power Analysis

For complex designs (staggered adoption, synthetic DiD, etc.), analytical formulas may not exist. Use Monte Carlo simulation instead.

The `simulate_power` function:
1. Generates synthetic data with known treatment effect
2. Fits your estimator
3. Records whether the effect is statistically significant
4. Repeats many times
5. Reports the proportion of significant results (= power)
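The loop above can be sketched without any library support. This standalone example simulates a two-period, two-group design, estimates the effect as the difference in mean changes, and counts rejections using a normal critical value. The data-generating process, the default values, and the name `simulate_power_2x2` are all assumptions for illustration; this is not how `simulate_power` is implemented.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, variance


def simulate_power_2x2(effect, n_per_group=50, sigma=5.0, n_sims=200,
                       alpha=0.05, seed=42):
    """Monte Carlo power for a two-period, two-group DiD (illustrative only)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _sim in range(n_sims):
        changes = {0: [], 1: []}  # post-minus-pre change per unit, by group
        for treated in (0, 1):
            for _unit in range(n_per_group):
                unit_effect = rng.gauss(0, sigma)  # cancels in the difference
                pre = unit_effect + rng.gauss(0, sigma)
                # Common time trend of 1.0, plus the effect for treated units
                post = unit_effect + 1.0 + effect * treated + rng.gauss(0, sigma)
                changes[treated].append(post - pre)
        # DiD estimate: difference in mean changes between groups
        estimate = mean(changes[1]) - mean(changes[0])
        se = sqrt(variance(changes[1]) / n_per_group
                  + variance(changes[0]) / n_per_group)
        rejections += abs(estimate / se) > z_crit
    return rejections / n_sims


power_hat = simulate_power_2x2(effect=5.0)
false_positive_rate = simulate_power_2x2(effect=0.0)
print(f"Power at effect=5: {power_hat:.1%}")
print(f"Rejection rate at effect=0: {false_positive_rate:.1%}")
```

Running the same simulation with a true effect of zero is a useful check: the rejection rate should land near alpha.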

In [None]:
# Simulation-based power analysis
did = DifferenceInDifferences()

results = simulate_power(
    estimator=did,
    n_units=100,
    n_periods=4,
    treatment_effect=5.0,
    treatment_fraction=0.5,
    sigma=5.0,  # Noise level
    n_simulations=200,  # More simulations = more precise power estimate
    seed=42,
    progress=False
)

print(results.summary())

In [None]:
# Key metrics from simulation
print("Simulation Results:")
print(f"  Power (rejection rate): {results.power:.1%}")
print(f"  95% CI for power: [{results.power_ci[0]:.1%}, {results.power_ci[1]:.1%}]")
print()
print("Estimator Performance:")
print(f"  True effect: {results.true_effect:.2f}")
print(f"  Mean estimate: {results.mean_estimate:.2f}")
print(f"  Bias: {results.bias:.4f}")
print(f"  RMSE: {results.rmse:.4f}")
print(f"  Coverage: {results.coverage:.1%}")

### Power Curve via Simulation

In [None]:
# Simulate power for multiple effect sizes
results_multi = simulate_power(
    estimator=did,
    n_units=100,
    n_periods=4,
    effect_sizes=[1.0, 2.0, 3.0, 5.0, 7.0, 10.0],  # Multiple effects
    sigma=5.0,
    n_simulations=100,  # Fewer per effect size for speed
    seed=42,
    progress=False
)

# Get power curve data
power_curve_sim = results_multi.power_curve_df()
print(power_curve_sim)

In [None]:
# Plot simulation-based power curve
plot_power_curve(
    power_curve_sim,
    target_power=0.80,
    title="Simulation-Based Power Curve (100 units, 4 periods, SD=5)",
    figsize=(10, 6)
)

## 8. Convenience Functions

For quick calculations, use the convenience functions:

In [None]:
# Quick MDE calculation
mde = compute_mde(n_treated=100, n_control=100, sigma=10.0, power=0.80)
print(f"MDE: {mde:.2f}")

# Quick power calculation
power = compute_power(effect_size=5.0, n_treated=100, n_control=100, sigma=10.0)
print(f"Power: {power:.1%}")

# Quick sample size calculation
n = compute_sample_size(effect_size=5.0, sigma=10.0, power=0.80)
print(f"Required N: {n}")

## 9. Practical Recommendations

### Estimating Sigma (Residual SD)

The residual standard deviation is crucial for power calculations. Options:

1. **Pilot data**: Fit a model on historical data and get residual SD
2. **Literature**: Find similar studies and use their reported SDs
3. **Domain knowledge**: Expert judgment about outcome variability
4. **Sensitivity analysis**: Calculate power for a range of sigma values
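For the pilot-data option, one simple approach is to remove unit and period means from a balanced panel and take the SD of what is left. The sketch below does this with plain Python on made-up numbers. The data, the two-way demeaning choice, and the helper name `two_way_residual_sd` are assumptions for illustration, and your own model may call for a different residual definition.

```python
from math import sqrt


def two_way_residual_sd(panel):
    """Residual SD after removing unit and period means from a balanced panel.

    `panel` is a list of rows, one per unit, each holding one outcome per period.
    """
    n_units, n_periods = len(panel), len(panel[0])
    grand = sum(map(sum, panel)) / (n_units * n_periods)
    unit_means = [sum(row) / n_periods for row in panel]
    period_means = [sum(row[t] for row in panel) / n_units for t in range(n_periods)]
    ss = sum(
        (panel[i][t] - unit_means[i] - period_means[t] + grand) ** 2
        for i in range(n_units)
        for t in range(n_periods)
    )
    # Degrees of freedom left after fitting unit and period effects
    return sqrt(ss / ((n_units - 1) * (n_periods - 1)))


# Hypothetical pilot data: 4 units observed over 3 periods
pilot = [
    [10.0, 12.0, 11.0],
    [20.0, 21.0, 23.0],
    [15.0, 18.0, 16.0],
    [30.0, 31.0, 32.0],
]
sigma_hat = two_way_residual_sd(pilot)
print(f"Estimated residual SD: {sigma_hat:.2f}")
```

The resulting estimate can then be passed as `sigma` in the calculations above, ideally alongside a few larger values as a sensitivity check.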

In [None]:
# Sensitivity to sigma
print("MDE sensitivity to residual SD (50 treated, 50 control):")
print(f"{'Sigma':>10} {'MDE':>10} {'Effect/SD':>12}")
print("-" * 35)

for sigma in [5.0, 7.5, 10.0, 12.5, 15.0]:
    result = pa.mde(n_treated=50, n_control=50, sigma=sigma)
    print(f"{sigma:>10.1f} {result.mde:>10.2f} {result.mde/sigma:>12.3f}")

### Choosing a Target Power

- **80% power** is conventional but arbitrary
- **90% power** is more conservative and recommended for important studies
- Consider the cost of Type II errors (missing a real effect) vs Type I errors (false positives)
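To see what a higher power target costs, it helps to express the required sample size relative to the 80% baseline. Under a normal approximation, required n scales with the square of `z_alpha + z_power`. The sketch below uses that relationship; it is a textbook approximation and the helper `n_multiplier` is hypothetical.

```python
from statistics import NormalDist

Z = NormalDist()


def n_multiplier(power, baseline_power=0.80, alpha=0.05):
    """Required sample size relative to the baseline power (normal approximation)."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    # Required n scales with the square of (z_alpha + z_power)
    return ((z_alpha + Z.inv_cdf(power)) / (z_alpha + Z.inv_cdf(baseline_power))) ** 2


for target in [0.70, 0.80, 0.90, 0.95]:
    print(f"{target:.0%} power: {n_multiplier(target):.2f}x the sample needed for 80%")
```

Under this approximation, moving from 80% to 90% power costs roughly a third more units, regardless of the effect size or sigma.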

In [None]:
# Compare required sample sizes for different power levels
print("Required sample size by target power (effect=5, sigma=10):")
print(f"{'Power':>10} {'Required N':>15}")
print("-" * 30)

for power in [0.70, 0.80, 0.90, 0.95]:
    pa_custom = PowerAnalysis(power=power)
    result = pa_custom.sample_size(effect_size=5.0, sigma=10.0)
    print(f"{power:>10.0%} {result.required_n:>15}")

## Summary

Key takeaways for DiD power analysis:

1. **Always do a power analysis** before running a study
2. **MDE decreases** with larger samples, more periods, and lower variance
3. **ICC matters** for panel data - high within-unit correlation reduces effective sample size
4. **Use simulation** for complex designs (staggered, synthetic DiD)
5. **Be realistic about sigma** - err on the side of larger values
6. **Consider your smallest meaningful effect** - don't just target statistical significance

For more on DiD estimation, see the other tutorials:
- `01_basic_did.ipynb` - Basic DiD estimation
- `02_staggered_did.ipynb` - Staggered adoption designs
- `03_synthetic_did.ipynb` - Synthetic DiD
- `04_parallel_trends.ipynb` - Testing assumptions
- `05_honest_did.ipynb` - Sensitivity analysis
- `07_pretrends_power.ipynb` - Pre-trends power analysis (Roth 2022)