7 changes: 6 additions & 1 deletion docs/methodology/REGISTRY.md
@@ -229,6 +229,11 @@ Aggregations:
- "universal": All comparisons use g-anticipation-1 as base
- Both produce identical post-treatment ATT(g,t); differ only pre-treatment
- Matches R `did::att_gt()` base_period parameter
- Base period interaction with Sun-Abraham comparison:
- CS with `base_period="varying"` produces different pre-treatment estimates than SA
- This is expected: CS uses consecutive comparisons, SA uses fixed reference (e=-1-anticipation)
- Use `base_period="universal"` for methodologically comparable pre-treatment effects (see the sketch after this list)
- Post-treatment effects match regardless of base_period setting
- Control group with `control_group="not_yet_treated"`:
- Always excludes cohort g from controls when computing ATT(g,t)
- This applies to both pre-treatment (t < g) and post-treatment (t >= g) periods
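In code, the two base-period settings are selected at construction time. A minimal sketch, assuming the `CallawaySantAnna` API used in the tutorial and tests (the `df` and column names here are illustrative):

```python
from diff_diff import CallawaySantAnna

# "varying" (default): each pre-period t is compared to t-1.
cs_varying = CallawaySantAnna(base_period="varying")

# "universal": every period is compared to the fixed base g-1-anticipation.
cs_universal = CallawaySantAnna(base_period="universal")

for est in (cs_varying, cs_universal):
    results = est.fit(
        df, outcome="outcome", unit="unit",
        time="period", first_treat="first_treat",
        aggregate="event_study",
    )
    # Post-treatment ATT(g,t) agrees across both settings;
    # only the pre-treatment estimates differ.
    print(results.overall_att)
```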
@@ -257,7 +262,7 @@
*Assumption checks / warnings:*
- Requires never-treated units as control group
- Warns if treatment effects may be heterogeneous across cohorts (which the method handles)
- Reference period must be specified (default: e=-1)
- Reference period: e=-1-anticipation (defaults to e=-1 when anticipation=0)
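For example, a minimal sketch of the default reference-period behavior (assuming `SunAbraham` is importable from `diff_diff` alongside `CallawaySantAnna`, as in the test suite; `df` and column names are illustrative):

```python
from diff_diff import SunAbraham

sa = SunAbraham()  # with anticipation = 0, the reference period is e = -1
results_sa = sa.fit(
    df, outcome="outcome", unit="unit",
    time="period", first_treat="first_treat",
)
# Every event-study coefficient is a deviation from the reference period;
# with anticipation a > 0, the reference shifts to e = -1 - a.
```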

*Estimator equation (as implemented):*

100 changes: 5 additions & 95 deletions docs/tutorials/02_staggered_did.ipynb
@@ -3,31 +3,7 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Staggered Difference-in-Differences\n",
"\n",
"This notebook demonstrates how to handle **staggered treatment adoption** using modern DiD estimators. In staggered DiD settings:\n",
"\n",
"- Different units get treated at different times\n",
"- Traditional TWFE can give biased estimates due to \"forbidden comparisons\"\n",
"- Modern estimators compute group-time specific effects and aggregate them properly\n",
"\n",
"We'll cover:\n",
"1. Understanding staggered adoption\n",
"2. The problem with TWFE (and Goodman-Bacon decomposition)\n",
"3. The Callaway-Sant'Anna estimator\n",
"4. Group-time effects ATT(g,t)\n",
"5. Aggregating effects (simple, group, event-study)\n",
"6. Bootstrap inference for valid standard errors\n",
"7. Visualization\n",
"8. **Pre-treatment effects and parallel trends testing**\n",
"9. Different control group options\n",
"10. Handling anticipation effects\n",
"11. Adding covariates\n",
"12. Comparing with MultiPeriodDiD\n",
"13. Sun-Abraham interaction-weighted estimator\n",
"14. Comparing CS and SA as a robustness check"
]
"source": "# Staggered Difference-in-Differences\n\nThis notebook demonstrates how to handle **staggered treatment adoption** using modern DiD estimators. In staggered DiD settings:\n\n- Different units get treated at different times\n- Traditional TWFE can give biased estimates due to \"forbidden comparisons\"\n- Modern estimators compute group-time specific effects and aggregate them properly\n\nWe'll cover:\n1. Understanding staggered adoption\n2. The problem with TWFE (and Goodman-Bacon decomposition)\n3. The Callaway-Sant'Anna estimator\n4. Group-time effects ATT(g,t)\n5. Aggregating effects (simple, group, event-study)\n6. Bootstrap inference for valid standard errors\n7. Visualization\n8. Pre-treatment effects and parallel trends testing\n9. Different control group options\n10. Handling anticipation effects\n11. Adding covariates\n12. Comparing with MultiPeriodDiD\n13. Sun-Abraham interaction-weighted estimator\n14. Comparing CS and SA as a robustness check"
},
{
"cell_type": "code",
@@ -834,85 +810,19 @@
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 14. Comparing CS and SA as a Robustness Check\n",
"\n",
"Running both estimators provides a useful robustness check. When they agree, results are more credible."
]
"source": "## 14. Comparing CS and SA as a Robustness Check\n\nRunning both estimators provides a useful robustness check. When they agree, results are more credible.\n\n### Understanding Pre-Period Differences\n\nYou may notice that **post-treatment effects align closely** between CS and SA, but **pre-treatment effects can differ in magnitude and significance**. This is expected methodological behavior, not a bug.\n\n**Why the difference?**\n\n1. **Callaway-Sant'Anna with `base_period=\"varying\"` (default)**:\n - Pre-treatment effects use **consecutive period comparisons** (period t vs period t-1)\n - Each pre-period coefficient represents a one-period change\n - These smaller incremental changes often yield lower t-statistics\n\n2. **Sun-Abraham**:\n - Uses a **fixed reference period** (e=-1 when anticipation=0, or e=-1-anticipation otherwise)\n - All coefficients are deviations from this single reference\n - Pre-period coefficients show cumulative difference from the reference\n\n**To make CS pre-periods more comparable to SA**, use `base_period=\"universal\"`:\n\n```python\ncs_universal = CallawaySantAnna(base_period=\"universal\")\n```\n\nThis makes CS compare all periods to g-1 (like SA), producing more similar pre-treatment estimates."
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Compare overall ATT from both estimators\n",
"print(\"Robustness Check: CS vs SA\")\n",
"print(\"=\" * 50)\n",
"print(f\"{'Estimator':<25} {'Overall ATT':>12} {'SE':>10}\")\n",
"print(\"-\" * 50)\n",
"print(f\"{'Callaway-SantAnna':<25} {results_cs.overall_att:>12.4f} {results_cs.overall_se:>10.4f}\")\n",
"print(f\"{'Sun-Abraham':<25} {results_sa.overall_att:>12.4f} {results_sa.overall_se:>10.4f}\")\n",
"\n",
"# Compare event study effects\n",
"print(\"\\n\\nEvent Study Comparison:\")\n",
"print(f\"{'Rel. Time':>12} {'CS ATT':>10} {'SA ATT':>10} {'Difference':>12}\")\n",
"print(\"-\" * 50)\n",
"\n",
"# Use the pre-computed event_study_effects from results_cs\n",
"for rel_time in sorted(results_sa.event_study_effects.keys()):\n",
" sa_eff = results_sa.event_study_effects[rel_time]['effect']\n",
" if results_cs.event_study_effects and rel_time in results_cs.event_study_effects:\n",
" cs_eff = results_cs.event_study_effects[rel_time]['effect']\n",
" diff = sa_eff - cs_eff\n",
" print(f\"{rel_time:>12} {cs_eff:>10.4f} {sa_eff:>10.4f} {diff:>12.4f}\")\n",
"\n",
"print(\"\\nSimilar results indicate robust findings across estimation methods\")"
]
"source": "# Compare overall ATT from both estimators\nprint(\"Robustness Check: CS vs SA\")\nprint(\"=\" * 60)\nprint(f\"{'Estimator':<30} {'Overall ATT':>12} {'SE':>10}\")\nprint(\"-\" * 60)\nprint(f\"{'Callaway-Sant\\\\'Anna (varying)':<30} {results_cs.overall_att:>12.4f} {results_cs.overall_se:>10.4f}\")\nprint(f\"{'Sun-Abraham':<30} {results_sa.overall_att:>12.4f} {results_sa.overall_se:>10.4f}\")\n\n# Also fit CS with universal base period for comparison\ncs_universal = CallawaySantAnna(control_group=\"never_treated\", base_period=\"universal\")\nresults_cs_univ = cs_universal.fit(\n df, outcome=\"outcome\", unit=\"unit\",\n time=\"period\", first_treat=\"first_treat\",\n aggregate=\"event_study\"\n)\n\n# Compare event study effects\nprint(\"\\n\\nEvent Study Comparison:\")\nprint(\"Note: Pre-periods differ due to base period methodology (see explanation above)\")\nprint(f\"{'Rel. Time':>10} {'CS (vary)':>12} {'CS (univ)':>12} {'SA':>10} {'Note':>20}\")\nprint(\"-\" * 70)\n\nfor rel_time in sorted(results_sa.event_study_effects.keys()):\n sa_eff = results_sa.event_study_effects[rel_time]['effect']\n cs_vary = results_cs.event_study_effects.get(rel_time, {}).get('effect', np.nan)\n cs_univ = results_cs_univ.event_study_effects.get(rel_time, {}).get('effect', np.nan)\n \n note = \"pre (differs)\" if rel_time < 0 else \"post (matches)\"\n print(f\"{rel_time:>10} {cs_vary:>12.4f} {cs_univ:>12.4f} {sa_eff:>10.4f} {note:>20}\")\n\nprint(\"\\nPost-treatment effects should be similar across all methods\")\nprint(\"Pre-treatment differences are expected due to base period methodology\")"
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\n",
"\n",
"Key takeaways:\n",
"\n",
"1. **TWFE can be biased** with staggered adoption and heterogeneous effects\n",
"2. **Goodman-Bacon decomposition** reveals *why* TWFE fails by showing:\n",
" - The implicit 2x2 comparisons and their weights\n",
" - How much weight falls on \"forbidden comparisons\" (already-treated as controls)\n",
"3. **Callaway-Sant'Anna** properly handles staggered adoption by:\n",
" - Computing group-time specific effects ATT(g,t)\n",
" - Only using valid comparison groups\n",
" - Properly aggregating effects\n",
"4. **Sun-Abraham** provides an alternative approach using:\n",
" - Interaction-weighted regression with cohort x relative-time indicators\n",
" - Different weighting scheme than CS\n",
" - More efficient under homogeneous effects\n",
"5. **Run both CS and SA** as a robustness check—when they agree, results are more credible\n",
"6. **Aggregation options**:\n",
" - `\"simple\"`: Overall ATT\n",
" - `\"group\"`: ATT by cohort\n",
" - `\"event\"`: ATT by event time (for event-study plots)\n",
"7. **Bootstrap inference** provides valid standard errors and confidence intervals:\n",
" - Use `n_bootstrap` parameter to enable multiplier bootstrap\n",
" - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n",
" - Bootstrap results include SEs, CIs, and p-values for all aggregations\n",
"8. **Pre-treatment effects** provide parallel trends diagnostics:\n",
" - Use `base_period=\"varying\"` for consecutive period comparisons\n",
" - Pre-treatment ATT(g,t) should be near zero\n",
" - 95% CIs including zero is consistent with parallel trends\n",
" - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n",
"9. **Control group choices** affect efficiency and assumptions:\n",
" - `\"never_treated\"`: Stronger parallel trends assumption\n",
" - `\"not_yet_treated\"`: Weaker assumption, uses more data\n",
"\n",
"For more details, see:\n",
"- Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. *Journal of Econometrics*.\n",
"- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*.\n",
"- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. *Journal of Econometrics*."
]
"source": "## Summary\n\nKey takeaways:\n\n1. **TWFE can be biased** with staggered adoption and heterogeneous effects\n2. **Goodman-Bacon decomposition** reveals *why* TWFE fails by showing:\n - The implicit 2x2 comparisons and their weights\n - How much weight falls on \"forbidden comparisons\" (already-treated as controls)\n3. **Callaway-Sant'Anna** properly handles staggered adoption by:\n - Computing group-time specific effects ATT(g,t)\n - Only using valid comparison groups\n - Properly aggregating effects\n4. **Sun-Abraham** provides an alternative approach using:\n - Interaction-weighted regression with cohort x relative-time indicators\n - Different weighting scheme than CS\n - More efficient under homogeneous effects\n5. **Run both CS and SA** as a robustness check—when they agree, results are more credible\n6. **Aggregation options**:\n - `\"simple\"`: Overall ATT\n - `\"group\"`: ATT by cohort\n - `\"event\"`: ATT by event time (for event-study plots)\n7. **Bootstrap inference** provides valid standard errors and confidence intervals:\n - Use `n_bootstrap` parameter to enable multiplier bootstrap\n - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n - Bootstrap results include SEs, CIs, and p-values for all aggregations\n8. **Pre-treatment effects** provide parallel trends diagnostics:\n - Use `base_period=\"varying\"` for consecutive period comparisons\n - Pre-treatment ATT(g,t) should be near zero\n - 95% CIs including zero is consistent with parallel trends\n - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n9. **Control group choices** affect efficiency and assumptions:\n - `\"never_treated\"`: Stronger parallel trends assumption\n - `\"not_yet_treated\"`: Weaker assumption, uses more data\n10. **CS vs SA pre-period differences are expected**:\n - Post-treatment effects should be similar (robustness check)\n - Pre-treatment effects differ due to base period methodology\n - CS (varying): consecutive comparisons → one-period changes\n - SA: fixed reference (e=-1-anticipation) → cumulative deviations\n - Use `base_period=\"universal\"` in CS for comparable pre-periods\n\nFor more details, see:\n- Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. *Journal of Econometrics*.\n- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*.\n- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. *Journal of Econometrics*."
}
],
"metadata": {
@@ -922,4 +832,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
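The varying-vs-universal distinction discussed in the notebook reduces to a simple accounting identity: the universal-base pre-period estimate is the (negative) cumulative sum of the varying-base one-period changes. A toy sketch in pure NumPy, with a single cohort and plain difference-in-means (the library's estimators add weighting and inference on top, so this is illustrative only):

```python
import numpy as np

# Toy group means: one cohort first treated at g = 3, periods 0..5.
g = 3
y_treat = np.array([1.00, 1.10, 1.30, 2.50, 2.80, 3.00])  # treated-group means
y_ctrl = np.array([1.00, 1.05, 1.15, 1.20, 1.30, 1.35])   # never-treated means

# "varying" base: pre-period t is compared to t-1 (one-period change).
varying = {t: (y_treat[t] - y_treat[t - 1]) - (y_ctrl[t] - y_ctrl[t - 1])
           for t in range(1, g)}

# "universal" base: every pre-period t is compared to the fixed period g-1.
universal = {t: (y_treat[t] - y_treat[g - 1]) - (y_ctrl[t] - y_ctrl[g - 1])
             for t in range(g - 1)}

# Identity: universal(t) = -sum of varying increments from t+1 to g-1.
# This is why the two settings report different magnitudes (and t-stats)
# for pre-periods while encoding the same pre-trend information.
for t in range(g - 1):
    accumulated = -sum(varying[s] for s in range(t + 1, g))
    assert np.isclose(universal[t], accumulated)
```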
101 changes: 101 additions & 0 deletions tests/test_sun_abraham.py
@@ -526,6 +526,107 @@ def test_both_recover_treatment_effect(self):
assert abs(sa_results.overall_att - 3.0) < 2.0
assert abs(cs_results.overall_att - 3.0) < 2.0

def test_pre_period_difference_expected_between_cs_sa(self):
"""Pre-periods differ between CS (varying) and SA; post-periods match.

This is expected: CS uses consecutive comparisons, SA uses fixed reference.
CS with base_period="universal" should be closer to SA for pre-periods.
"""
from diff_diff import CallawaySantAnna

data = generate_staggered_data(
n_units=200, treatment_effect=3.0, seed=42
)

# Sun-Abraham (uses fixed reference period e=-1)
sa = SunAbraham()
sa_results = sa.fit(
data,
outcome="outcome",
unit="unit",
time="time",
first_treat="first_treat",
)

# Callaway-Sant'Anna with varying base (default: consecutive comparisons)
cs_varying = CallawaySantAnna(base_period="varying")
cs_varying_results = cs_varying.fit(
data,
outcome="outcome",
unit="unit",
time="time",
first_treat="first_treat",
aggregate="event_study",
)

# Callaway-Sant'Anna with universal base (all compare to g-1)
cs_universal = CallawaySantAnna(base_period="universal")
cs_universal_results = cs_universal.fit(
data,
outcome="outcome",
unit="unit",
time="time",
first_treat="first_treat",
aggregate="event_study",
)

# Find common event times
sa_times = set(sa_results.event_study_effects.keys())
cs_varying_times = set(cs_varying_results.event_study_effects.keys())
cs_universal_times = set(cs_universal_results.event_study_effects.keys())
common_times = sa_times & cs_varying_times & cs_universal_times

# Separate pre and post periods
pre_times = [t for t in common_times if t < 0]
post_times = [t for t in common_times if t > 0]

# Post-treatment effects should match across all methods
for t in post_times:
sa_eff = sa_results.event_study_effects[t]["effect"]
cs_vary_eff = cs_varying_results.event_study_effects[t]["effect"]
cs_univ_eff = cs_universal_results.event_study_effects[t]["effect"]

# All three should be similar for post-treatment
max_se = max(
sa_results.event_study_effects[t]["se"],
cs_varying_results.event_study_effects[t]["se"],
cs_universal_results.event_study_effects[t]["se"],
)
assert abs(sa_eff - cs_vary_eff) < 3 * max_se, (
f"Post-period t={t}: SA and CS(varying) differ too much: "
f"SA={sa_eff:.4f}, CS(vary)={cs_vary_eff:.4f}"
)
assert abs(sa_eff - cs_univ_eff) < 3 * max_se, (
f"Post-period t={t}: SA and CS(universal) differ too much: "
f"SA={sa_eff:.4f}, CS(univ)={cs_univ_eff:.4f}"
)

# Require pre-periods exist for this test to be meaningful
assert len(pre_times) > 0, (
"Test requires pre-treatment periods to validate methodology difference. "
"Increase n_periods or adjust cohort timing in test data."
)

# Compute total absolute differences
total_diff_varying = 0.0
total_diff_universal = 0.0
for t in pre_times:
sa_eff = sa_results.event_study_effects[t]["effect"]
cs_vary_eff = cs_varying_results.event_study_effects[t]["effect"]
cs_univ_eff = cs_universal_results.event_study_effects[t]["effect"]

total_diff_varying += abs(sa_eff - cs_vary_eff)
total_diff_universal += abs(sa_eff - cs_univ_eff)

# CS(universal) should generally be closer to SA than CS(varying)
# for pre-treatment periods (due to similar reference period approach)
# Allow some tolerance since weighting schemes still differ
assert total_diff_universal <= total_diff_varying + 0.5, (
f"Expected CS(universal) to be closer to SA than CS(varying) for pre-periods. "
f"Got: CS(univ)-SA diff={total_diff_universal:.4f}, "
f"CS(vary)-SA diff={total_diff_varying:.4f}"
)

def test_agreement_under_homogeneous_effects(self):
"""Test that SA and CS agree under homogeneous treatment effects."""
from diff_diff import CallawaySantAnna