From 17e4aa533f2f0748dcebbdf9debbecdc80662003 Mon Sep 17 00:00:00 2001 From: igerber Date: Thu, 22 Jan 2026 15:45:14 -0500 Subject: [PATCH 1/2] Explain CS vs SA pre-period discrepancy in Tutorial 02 - Add detailed explanation in Section 14 of why pre-treatment effects differ between Callaway-Sant'Anna (varying base) and Sun-Abraham (fixed reference period e=-1), while post-treatment effects match - Enhance comparison code to show CS with both base_period options - Add point #10 to tutorial summary documenting expected behavior - Add test documenting this methodological difference - Update REGISTRY.md with cross-reference note Co-Authored-By: Claude Opus 4.5 --- docs/methodology/REGISTRY.md | 5 ++ docs/tutorials/02_staggered_did.ipynb | 100 ++------------------------ tests/test_sun_abraham.py | 97 +++++++++++++++++++++++++ 3 files changed, 107 insertions(+), 95 deletions(-) diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index e27bc5d..3fd6487 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -229,6 +229,11 @@ Aggregations: - "universal": All comparisons use g-anticipation-1 as base - Both produce identical post-treatment ATT(g,t); differ only pre-treatment - Matches R `did::att_gt()` base_period parameter +- Base period interaction with Sun-Abraham comparison: + - CS with `base_period="varying"` produces different pre-treatment estimates than SA + - This is expected: CS uses consecutive comparisons, SA uses fixed reference (e=-1) + - Use `base_period="universal"` for methodologically comparable pre-treatment effects + - Post-treatment effects match regardless of base_period setting - Control group with `control_group="not_yet_treated"`: - Always excludes cohort g from controls when computing ATT(g,t) - This applies to both pre-treatment (t < g) and post-treatment (t >= g) periods diff --git a/docs/tutorials/02_staggered_did.ipynb b/docs/tutorials/02_staggered_did.ipynb index ef89fb3..2454286 100644 --- 
a/docs/tutorials/02_staggered_did.ipynb +++ b/docs/tutorials/02_staggered_did.ipynb @@ -3,31 +3,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "# Staggered Difference-in-Differences\n", - "\n", - "This notebook demonstrates how to handle **staggered treatment adoption** using modern DiD estimators. In staggered DiD settings:\n", - "\n", - "- Different units get treated at different times\n", - "- Traditional TWFE can give biased estimates due to \"forbidden comparisons\"\n", - "- Modern estimators compute group-time specific effects and aggregate them properly\n", - "\n", - "We'll cover:\n", - "1. Understanding staggered adoption\n", - "2. The problem with TWFE (and Goodman-Bacon decomposition)\n", - "3. The Callaway-Sant'Anna estimator\n", - "4. Group-time effects ATT(g,t)\n", - "5. Aggregating effects (simple, group, event-study)\n", - "6. Bootstrap inference for valid standard errors\n", - "7. Visualization\n", - "8. **Pre-treatment effects and parallel trends testing**\n", - "9. Different control group options\n", - "10. Handling anticipation effects\n", - "11. Adding covariates\n", - "12. Comparing with MultiPeriodDiD\n", - "13. Sun-Abraham interaction-weighted estimator\n", - "14. Comparing CS and SA as a robustness check" - ] + "source": "# Staggered Difference-in-Differences\n\nThis notebook demonstrates how to handle **staggered treatment adoption** using modern DiD estimators. In staggered DiD settings:\n\n- Different units get treated at different times\n- Traditional TWFE can give biased estimates due to \"forbidden comparisons\"\n- Modern estimators compute group-time specific effects and aggregate them properly\n\nWe'll cover:\n1. Understanding staggered adoption\n2. The problem with TWFE (and Goodman-Bacon decomposition)\n3. The Callaway-Sant'Anna estimator\n4. Group-time effects ATT(g,t)\n5. Aggregating effects (simple, group, event-study)\n6. Bootstrap inference for valid standard errors\n7. Visualization\n8. 
Pre-treatment effects and parallel trends testing\n9. Different control group options\n10. Handling anticipation effects\n11. Adding covariates\n12. Comparing with MultiPeriodDiD\n13. Sun-Abraham interaction-weighted estimator\n14. Comparing CS and SA as a robustness check" }, { "cell_type": "code", @@ -834,85 +810,19 @@ { "cell_type": "markdown", "metadata": {}, - "source": [ - "## 14. Comparing CS and SA as a Robustness Check\n", - "\n", - "Running both estimators provides a useful robustness check. When they agree, results are more credible." - ] + "source": "## 14. Comparing CS and SA as a Robustness Check\n\nRunning both estimators provides a useful robustness check. When they agree, results are more credible.\n\n### Understanding Pre-Period Differences\n\nYou may notice that **post-treatment effects align closely** between CS and SA, but **pre-treatment effects can differ in magnitude and significance**. This is expected methodological behavior, not a bug.\n\n**Why the difference?**\n\n1. **Callaway-Sant'Anna with `base_period=\"varying\"` (default)**:\n - Pre-treatment effects use **consecutive period comparisons** (period t vs period t-1)\n - Each pre-period coefficient represents a one-period change\n - Smaller changes → typically smaller SEs → may not reach significance\n\n2. **Sun-Abraham**:\n - Uses a **fixed reference period** (e=-1, the period just before treatment)\n - All coefficients are deviations from this single reference\n - Pre-period coefficients show cumulative difference from the reference\n\n**To make CS pre-periods more comparable to SA**, use `base_period=\"universal\"`:\n\n```python\ncs_universal = CallawaySantAnna(base_period=\"universal\")\n```\n\nThis makes CS compare all periods to g-1 (like SA), producing more similar pre-treatment estimates." 
}, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], - "source": [ - "# Compare overall ATT from both estimators\n", - "print(\"Robustness Check: CS vs SA\")\n", - "print(\"=\" * 50)\n", - "print(f\"{'Estimator':<25} {'Overall ATT':>12} {'SE':>10}\")\n", - "print(\"-\" * 50)\n", - "print(f\"{'Callaway-SantAnna':<25} {results_cs.overall_att:>12.4f} {results_cs.overall_se:>10.4f}\")\n", - "print(f\"{'Sun-Abraham':<25} {results_sa.overall_att:>12.4f} {results_sa.overall_se:>10.4f}\")\n", - "\n", - "# Compare event study effects\n", - "print(\"\\n\\nEvent Study Comparison:\")\n", - "print(f\"{'Rel. Time':>12} {'CS ATT':>10} {'SA ATT':>10} {'Difference':>12}\")\n", - "print(\"-\" * 50)\n", - "\n", - "# Use the pre-computed event_study_effects from results_cs\n", - "for rel_time in sorted(results_sa.event_study_effects.keys()):\n", - " sa_eff = results_sa.event_study_effects[rel_time]['effect']\n", - " if results_cs.event_study_effects and rel_time in results_cs.event_study_effects:\n", - " cs_eff = results_cs.event_study_effects[rel_time]['effect']\n", - " diff = sa_eff - cs_eff\n", - " print(f\"{rel_time:>12} {cs_eff:>10.4f} {sa_eff:>10.4f} {diff:>12.4f}\")\n", - "\n", - "print(\"\\nSimilar results indicate robust findings across estimation methods\")" - ] + "source": "# Compare overall ATT from both estimators\nprint(\"Robustness Check: CS vs SA\")\nprint(\"=\" * 60)\nprint(f\"{'Estimator':<30} {'Overall ATT':>12} {'SE':>10}\")\nprint(\"-\" * 60)\nprint(f\"{'Callaway-Sant\\\\'Anna (varying)':<30} {results_cs.overall_att:>12.4f} {results_cs.overall_se:>10.4f}\")\nprint(f\"{'Sun-Abraham':<30} {results_sa.overall_att:>12.4f} {results_sa.overall_se:>10.4f}\")\n\n# Also fit CS with universal base period for comparison\ncs_universal = CallawaySantAnna(control_group=\"never_treated\", base_period=\"universal\")\nresults_cs_univ = cs_universal.fit(\n df, outcome=\"outcome\", unit=\"unit\",\n time=\"period\", first_treat=\"first_treat\",\n 
aggregate=\"event_study\"\n)\n\n# Compare event study effects\nprint(\"\\n\\nEvent Study Comparison:\")\nprint(\"Note: Pre-periods differ due to base period methodology (see explanation above)\")\nprint(f\"{'Rel. Time':>10} {'CS (vary)':>12} {'CS (univ)':>12} {'SA':>10} {'Note':>20}\")\nprint(\"-\" * 70)\n\nfor rel_time in sorted(results_sa.event_study_effects.keys()):\n sa_eff = results_sa.event_study_effects[rel_time]['effect']\n cs_vary = results_cs.event_study_effects.get(rel_time, {}).get('effect', np.nan)\n cs_univ = results_cs_univ.event_study_effects.get(rel_time, {}).get('effect', np.nan)\n \n note = \"pre (differs)\" if rel_time < 0 else \"post (matches)\"\n print(f\"{rel_time:>10} {cs_vary:>12.4f} {cs_univ:>12.4f} {sa_eff:>10.4f} {note:>20}\")\n\nprint(\"\\nPost-treatment effects should be similar across all methods\")\nprint(\"Pre-treatment differences are expected due to base period methodology\")" }, { "cell_type": "markdown", "metadata": {}, - "source": [ - "## Summary\n", - "\n", - "Key takeaways:\n", - "\n", - "1. **TWFE can be biased** with staggered adoption and heterogeneous effects\n", - "2. **Goodman-Bacon decomposition** reveals *why* TWFE fails by showing:\n", - " - The implicit 2x2 comparisons and their weights\n", - " - How much weight falls on \"forbidden comparisons\" (already-treated as controls)\n", - "3. **Callaway-Sant'Anna** properly handles staggered adoption by:\n", - " - Computing group-time specific effects ATT(g,t)\n", - " - Only using valid comparison groups\n", - " - Properly aggregating effects\n", - "4. **Sun-Abraham** provides an alternative approach using:\n", - " - Interaction-weighted regression with cohort x relative-time indicators\n", - " - Different weighting scheme than CS\n", - " - More efficient under homogeneous effects\n", - "5. **Run both CS and SA** as a robustness check—when they agree, results are more credible\n", - "6. 
**Aggregation options**:\n", - " - `\"simple\"`: Overall ATT\n", - " - `\"group\"`: ATT by cohort\n", - " - `\"event\"`: ATT by event time (for event-study plots)\n", - "7. **Bootstrap inference** provides valid standard errors and confidence intervals:\n", - " - Use `n_bootstrap` parameter to enable multiplier bootstrap\n", - " - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n", - " - Bootstrap results include SEs, CIs, and p-values for all aggregations\n", - "8. **Pre-treatment effects** provide parallel trends diagnostics:\n", - " - Use `base_period=\"varying\"` for consecutive period comparisons\n", - " - Pre-treatment ATT(g,t) should be near zero\n", - " - 95% CIs including zero is consistent with parallel trends\n", - " - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n", - "9. **Control group choices** affect efficiency and assumptions:\n", - " - `\"never_treated\"`: Stronger parallel trends assumption\n", - " - `\"not_yet_treated\"`: Weaker assumption, uses more data\n", - "\n", - "For more details, see:\n", - "- Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. *Journal of Econometrics*.\n", - "- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*.\n", - "- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. *Journal of Econometrics*." - ] + "source": "## Summary\n\nKey takeaways:\n\n1. **TWFE can be biased** with staggered adoption and heterogeneous effects\n2. **Goodman-Bacon decomposition** reveals *why* TWFE fails by showing:\n - The implicit 2x2 comparisons and their weights\n - How much weight falls on \"forbidden comparisons\" (already-treated as controls)\n3. **Callaway-Sant'Anna** properly handles staggered adoption by:\n - Computing group-time specific effects ATT(g,t)\n - Only using valid comparison groups\n - Properly aggregating effects\n4. 
**Sun-Abraham** provides an alternative approach using:\n - Interaction-weighted regression with cohort x relative-time indicators\n - Different weighting scheme than CS\n - More efficient under homogeneous effects\n5. **Run both CS and SA** as a robustness check—when they agree, results are more credible\n6. **Aggregation options**:\n - `\"simple\"`: Overall ATT\n - `\"group\"`: ATT by cohort\n - `\"event\"`: ATT by event time (for event-study plots)\n7. **Bootstrap inference** provides valid standard errors and confidence intervals:\n - Use `n_bootstrap` parameter to enable multiplier bootstrap\n - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n - Bootstrap results include SEs, CIs, and p-values for all aggregations\n8. **Pre-treatment effects** provide parallel trends diagnostics:\n - Use `base_period=\"varying\"` for consecutive period comparisons\n - Pre-treatment ATT(g,t) should be near zero\n - 95% CIs including zero is consistent with parallel trends\n - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n9. **Control group choices** affect efficiency and assumptions:\n - `\"never_treated\"`: Stronger parallel trends assumption\n - `\"not_yet_treated\"`: Weaker assumption, uses more data\n10. **CS vs SA pre-period differences are expected**:\n - Post-treatment effects should be similar (robustness check)\n - Pre-treatment effects differ due to base period methodology\n - CS (varying): consecutive comparisons → one-period changes\n - SA: fixed reference (e=-1) → cumulative deviations\n - Use `base_period=\"universal\"` in CS for comparable pre-periods\n\nFor more details, see:\n- Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. *Journal of Econometrics*.\n- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*.\n- Goodman-Bacon, A. (2021). 
Difference-in-differences with variation in treatment timing. *Journal of Econometrics*." } ], "metadata": { @@ -922,4 +832,4 @@ }, "nbformat": 4, "nbformat_minor": 4 -} +} \ No newline at end of file diff --git a/tests/test_sun_abraham.py b/tests/test_sun_abraham.py index c34ae26..067e4e6 100644 --- a/tests/test_sun_abraham.py +++ b/tests/test_sun_abraham.py @@ -526,6 +526,103 @@ def test_both_recover_treatment_effect(self): assert abs(sa_results.overall_att - 3.0) < 2.0 assert abs(cs_results.overall_att - 3.0) < 2.0 + def test_pre_period_difference_expected_between_cs_sa(self): + """Pre-periods differ between CS (varying) and SA; post-periods match. + + This is expected: CS uses consecutive comparisons, SA uses fixed reference. + CS with base_period="universal" should be closer to SA for pre-periods. + """ + from diff_diff import CallawaySantAnna + + data = generate_staggered_data( + n_units=200, treatment_effect=3.0, seed=42 + ) + + # Sun-Abraham (uses fixed reference period e=-1) + sa = SunAbraham() + sa_results = sa.fit( + data, + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + ) + + # Callaway-Sant'Anna with varying base (default: consecutive comparisons) + cs_varying = CallawaySantAnna(base_period="varying") + cs_varying_results = cs_varying.fit( + data, + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + aggregate="event_study", + ) + + # Callaway-Sant'Anna with universal base (all compare to g-1) + cs_universal = CallawaySantAnna(base_period="universal") + cs_universal_results = cs_universal.fit( + data, + outcome="outcome", + unit="unit", + time="time", + first_treat="first_treat", + aggregate="event_study", + ) + + # Find common event times + sa_times = set(sa_results.event_study_effects.keys()) + cs_varying_times = set(cs_varying_results.event_study_effects.keys()) + cs_universal_times = set(cs_universal_results.event_study_effects.keys()) + common_times = sa_times & cs_varying_times & 
cs_universal_times + + # Separate pre and post periods + pre_times = [t for t in common_times if t < 0] + post_times = [t for t in common_times if t > 0] + + # Post-treatment effects should match across all methods + for t in post_times: + sa_eff = sa_results.event_study_effects[t]["effect"] + cs_vary_eff = cs_varying_results.event_study_effects[t]["effect"] + cs_univ_eff = cs_universal_results.event_study_effects[t]["effect"] + + # All three should be similar for post-treatment + max_se = max( + sa_results.event_study_effects[t]["se"], + cs_varying_results.event_study_effects[t]["se"], + cs_universal_results.event_study_effects[t]["se"], + ) + assert abs(sa_eff - cs_vary_eff) < 3 * max_se, ( + f"Post-period t={t}: SA and CS(varying) differ too much: " + f"SA={sa_eff:.4f}, CS(vary)={cs_vary_eff:.4f}" + ) + assert abs(sa_eff - cs_univ_eff) < 3 * max_se, ( + f"Post-period t={t}: SA and CS(universal) differ too much: " + f"SA={sa_eff:.4f}, CS(univ)={cs_univ_eff:.4f}" + ) + + # For pre-treatment periods, CS(universal) should be closer to SA than CS(varying) + # because both SA and CS(universal) use a fixed reference period + if len(pre_times) > 0: + total_diff_varying = 0.0 + total_diff_universal = 0.0 + for t in pre_times: + sa_eff = sa_results.event_study_effects[t]["effect"] + cs_vary_eff = cs_varying_results.event_study_effects[t]["effect"] + cs_univ_eff = cs_universal_results.event_study_effects[t]["effect"] + + total_diff_varying += abs(sa_eff - cs_vary_eff) + total_diff_universal += abs(sa_eff - cs_univ_eff) + + # CS(universal) should generally be closer to SA than CS(varying) + # for pre-treatment periods (due to similar reference period approach) + # Note: This is a soft assertion - in some data configurations + # the relationship may not hold perfectly due to weighting differences + # The key point is that the methodological difference exists + assert ( + len(pre_times) > 0 + ), "Test requires pre-treatment periods to verify methodology difference" + def 
test_agreement_under_homogeneous_effects(self): """Test that SA and CS agree under homogeneous treatment effects.""" from diff_diff import CallawaySantAnna From 2ac66057699b0b81b5a8156083bc21675377686c Mon Sep 17 00:00:00 2001 From: igerber Date: Thu, 22 Jan 2026 16:26:46 -0500 Subject: [PATCH 2/2] Address code review feedback for PR #102 MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Fix misleading "smaller SEs" wording in Tutorial 02 Cell 43 → Changed to "lower t-statistics" which is more accurate - Update SA reference period notation throughout docs → e=-1 → e=-1-anticipation (defaults to e=-1 when anticipation=0) - Update REGISTRY.md SunAbraham section with correct reference period - Add meaningful assertions to pre-period test → Test now requires pre-periods exist (not vacuous) → Assert CS(universal) closer to SA than CS(varying) for pre-periods Co-Authored-By: Claude Opus 4.5 --- docs/methodology/REGISTRY.md | 4 +-- docs/tutorials/02_staggered_did.ipynb | 4 +-- tests/test_sun_abraham.py | 46 +++++++++++++++------------ 3 files changed, 29 insertions(+), 25 deletions(-) diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 3fd6487..fd11469 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -231,7 +231,7 @@ Aggregations: - Matches R `did::att_gt()` base_period parameter - Base period interaction with Sun-Abraham comparison: - CS with `base_period="varying"` produces different pre-treatment estimates than SA - - This is expected: CS uses consecutive comparisons, SA uses fixed reference (e=-1) + - This is expected: CS uses consecutive comparisons, SA uses fixed reference (e=-1-anticipation) - Use `base_period="universal"` for methodologically comparable pre-treatment effects - Post-treatment effects match regardless of base_period setting - Control group with `control_group="not_yet_treated"`: @@ -262,7 +262,7 @@ Aggregations: *Assumption checks / warnings:* - 
Requires never-treated units as control group - Warns if treatment effects may be heterogeneous across cohorts (which the method handles) -- Reference period must be specified (default: e=-1) +- Reference period: e=-1-anticipation (defaults to e=-1 when anticipation=0) *Estimator equation (as implemented):* diff --git a/docs/tutorials/02_staggered_did.ipynb b/docs/tutorials/02_staggered_did.ipynb index 2454286..62d913d 100644 --- a/docs/tutorials/02_staggered_did.ipynb +++ b/docs/tutorials/02_staggered_did.ipynb @@ -810,7 +810,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": "## 14. Comparing CS and SA as a Robustness Check\n\nRunning both estimators provides a useful robustness check. When they agree, results are more credible.\n\n### Understanding Pre-Period Differences\n\nYou may notice that **post-treatment effects align closely** between CS and SA, but **pre-treatment effects can differ in magnitude and significance**. This is expected methodological behavior, not a bug.\n\n**Why the difference?**\n\n1. **Callaway-Sant'Anna with `base_period=\"varying\"` (default)**:\n - Pre-treatment effects use **consecutive period comparisons** (period t vs period t-1)\n - Each pre-period coefficient represents a one-period change\n - Smaller changes → typically smaller SEs → may not reach significance\n\n2. **Sun-Abraham**:\n - Uses a **fixed reference period** (e=-1, the period just before treatment)\n - All coefficients are deviations from this single reference\n - Pre-period coefficients show cumulative difference from the reference\n\n**To make CS pre-periods more comparable to SA**, use `base_period=\"universal\"`:\n\n```python\ncs_universal = CallawaySantAnna(base_period=\"universal\")\n```\n\nThis makes CS compare all periods to g-1 (like SA), producing more similar pre-treatment estimates." + "source": "## 14. Comparing CS and SA as a Robustness Check\n\nRunning both estimators provides a useful robustness check. 
When they agree, results are more credible.\n\n### Understanding Pre-Period Differences\n\nYou may notice that **post-treatment effects align closely** between CS and SA, but **pre-treatment effects can differ in magnitude and significance**. This is expected methodological behavior, not a bug.\n\n**Why the difference?**\n\n1. **Callaway-Sant'Anna with `base_period=\"varying\"` (default)**:\n - Pre-treatment effects use **consecutive period comparisons** (period t vs period t-1)\n - Each pre-period coefficient represents a one-period change\n - These smaller incremental changes often yield lower t-statistics\n\n2. **Sun-Abraham**:\n - Uses a **fixed reference period** (e=-1 when anticipation=0, or e=-1-anticipation otherwise)\n - All coefficients are deviations from this single reference\n - Pre-period coefficients show cumulative difference from the reference\n\n**To make CS pre-periods more comparable to SA**, use `base_period=\"universal\"`:\n\n```python\ncs_universal = CallawaySantAnna(base_period=\"universal\")\n```\n\nThis makes CS compare all periods to g-1 (like SA), producing more similar pre-treatment estimates." }, { "cell_type": "code", @@ -822,7 +822,7 @@ { "cell_type": "markdown", "metadata": {}, - "source": "## Summary\n\nKey takeaways:\n\n1. **TWFE can be biased** with staggered adoption and heterogeneous effects\n2. **Goodman-Bacon decomposition** reveals *why* TWFE fails by showing:\n - The implicit 2x2 comparisons and their weights\n - How much weight falls on \"forbidden comparisons\" (already-treated as controls)\n3. **Callaway-Sant'Anna** properly handles staggered adoption by:\n - Computing group-time specific effects ATT(g,t)\n - Only using valid comparison groups\n - Properly aggregating effects\n4. **Sun-Abraham** provides an alternative approach using:\n - Interaction-weighted regression with cohort x relative-time indicators\n - Different weighting scheme than CS\n - More efficient under homogeneous effects\n5. 
**Run both CS and SA** as a robustness check—when they agree, results are more credible\n6. **Aggregation options**:\n - `\"simple\"`: Overall ATT\n - `\"group\"`: ATT by cohort\n - `\"event\"`: ATT by event time (for event-study plots)\n7. **Bootstrap inference** provides valid standard errors and confidence intervals:\n - Use `n_bootstrap` parameter to enable multiplier bootstrap\n - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n - Bootstrap results include SEs, CIs, and p-values for all aggregations\n8. **Pre-treatment effects** provide parallel trends diagnostics:\n - Use `base_period=\"varying\"` for consecutive period comparisons\n - Pre-treatment ATT(g,t) should be near zero\n - 95% CIs including zero is consistent with parallel trends\n - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n9. **Control group choices** affect efficiency and assumptions:\n - `\"never_treated\"`: Stronger parallel trends assumption\n - `\"not_yet_treated\"`: Weaker assumption, uses more data\n10. **CS vs SA pre-period differences are expected**:\n - Post-treatment effects should be similar (robustness check)\n - Pre-treatment effects differ due to base period methodology\n - CS (varying): consecutive comparisons → one-period changes\n - SA: fixed reference (e=-1) → cumulative deviations\n - Use `base_period=\"universal\"` in CS for comparable pre-periods\n\nFor more details, see:\n- Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. *Journal of Econometrics*.\n- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*.\n- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. *Journal of Econometrics*." + "source": "## Summary\n\nKey takeaways:\n\n1. **TWFE can be biased** with staggered adoption and heterogeneous effects\n2. 
**Goodman-Bacon decomposition** reveals *why* TWFE fails by showing:\n - The implicit 2x2 comparisons and their weights\n - How much weight falls on \"forbidden comparisons\" (already-treated as controls)\n3. **Callaway-Sant'Anna** properly handles staggered adoption by:\n - Computing group-time specific effects ATT(g,t)\n - Only using valid comparison groups\n - Properly aggregating effects\n4. **Sun-Abraham** provides an alternative approach using:\n - Interaction-weighted regression with cohort x relative-time indicators\n - Different weighting scheme than CS\n - More efficient under homogeneous effects\n5. **Run both CS and SA** as a robustness check—when they agree, results are more credible\n6. **Aggregation options**:\n - `\"simple\"`: Overall ATT\n - `\"group\"`: ATT by cohort\n - `\"event\"`: ATT by event time (for event-study plots)\n7. **Bootstrap inference** provides valid standard errors and confidence intervals:\n - Use `n_bootstrap` parameter to enable multiplier bootstrap\n - Choose weight type: `'rademacher'`, `'mammen'`, or `'webb'`\n - Bootstrap results include SEs, CIs, and p-values for all aggregations\n8. **Pre-treatment effects** provide parallel trends diagnostics:\n - Use `base_period=\"varying\"` for consecutive period comparisons\n - Pre-treatment ATT(g,t) should be near zero\n - 95% CIs including zero is consistent with parallel trends\n - See Tutorial 07 for pre-trends power analysis (Roth 2022)\n9. **Control group choices** affect efficiency and assumptions:\n - `\"never_treated\"`: Stronger parallel trends assumption\n - `\"not_yet_treated\"`: Weaker assumption, uses more data\n10. 
**CS vs SA pre-period differences are expected**:\n - Post-treatment effects should be similar (robustness check)\n - Pre-treatment effects differ due to base period methodology\n - CS (varying): consecutive comparisons → one-period changes\n - SA: fixed reference (e=-1-anticipation) → cumulative deviations\n - Use `base_period=\"universal\"` in CS for comparable pre-periods\n\nFor more details, see:\n- Callaway, B., & Sant'Anna, P. H. (2021). Difference-in-differences with multiple time periods. *Journal of Econometrics*.\n- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. *Journal of Econometrics*.\n- Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. *Journal of Econometrics*." } ], "metadata": { diff --git a/tests/test_sun_abraham.py b/tests/test_sun_abraham.py index 067e4e6..8a75e4e 100644 --- a/tests/test_sun_abraham.py +++ b/tests/test_sun_abraham.py @@ -601,27 +601,31 @@ def test_pre_period_difference_expected_between_cs_sa(self): f"SA={sa_eff:.4f}, CS(univ)={cs_univ_eff:.4f}" ) - # For pre-treatment periods, CS(universal) should be closer to SA than CS(varying) - # because both SA and CS(universal) use a fixed reference period - if len(pre_times) > 0: - total_diff_varying = 0.0 - total_diff_universal = 0.0 - for t in pre_times: - sa_eff = sa_results.event_study_effects[t]["effect"] - cs_vary_eff = cs_varying_results.event_study_effects[t]["effect"] - cs_univ_eff = cs_universal_results.event_study_effects[t]["effect"] - - total_diff_varying += abs(sa_eff - cs_vary_eff) - total_diff_universal += abs(sa_eff - cs_univ_eff) - - # CS(universal) should generally be closer to SA than CS(varying) - # for pre-treatment periods (due to similar reference period approach) - # Note: This is a soft assertion - in some data configurations - # the relationship may not hold perfectly due to weighting differences - # The key point is that the methodological 
difference exists - assert ( - len(pre_times) > 0 - ), "Test requires pre-treatment periods to verify methodology difference" + # Require pre-periods exist for this test to be meaningful + assert len(pre_times) > 0, ( + "Test requires pre-treatment periods to validate methodology difference. " + "Increase n_periods or adjust cohort timing in test data." + ) + + # Compute total absolute differences + total_diff_varying = 0.0 + total_diff_universal = 0.0 + for t in pre_times: + sa_eff = sa_results.event_study_effects[t]["effect"] + cs_vary_eff = cs_varying_results.event_study_effects[t]["effect"] + cs_univ_eff = cs_universal_results.event_study_effects[t]["effect"] + + total_diff_varying += abs(sa_eff - cs_vary_eff) + total_diff_universal += abs(sa_eff - cs_univ_eff) + + # CS(universal) should generally be closer to SA than CS(varying) + # for pre-treatment periods (due to similar reference period approach) + # Allow some tolerance since weighting schemes still differ + assert total_diff_universal <= total_diff_varying + 0.5, ( + f"Expected CS(universal) to be closer to SA than CS(varying) for pre-periods. " + f"Got: CS(univ)-SA diff={total_diff_universal:.4f}, " + f"CS(vary)-SA diff={total_diff_varying:.4f}" + ) def test_agreement_under_homogeneous_effects(self): """Test that SA and CS agree under homogeneous treatment effects."""
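The base-period mechanics this series documents can be sketched independently of the library. The snippet below is a minimal illustration with made-up group means; `att_universal` and `att_pre_varying` are hypothetical helpers written for this sketch, not `diff_diff` API. It shows the two claims the patches make: post-treatment effects coincide under either `base_period` setting (both compare period t to g-1 when anticipation is 0), while a universal-base pre-period effect telescopes into the sum of the varying one-period effects, so pre-period magnitudes differ by construction.

```python
import math

# Made-up group-mean outcomes for one treated cohort (first treated at g=4)
# and a never-treated control group over periods 0..6; illustration only.
g = 4
y_treat = [1.0, 1.3, 1.5, 1.9, 5.0, 5.4, 5.9]
y_ctrl = [1.1, 1.2, 1.6, 1.8, 2.0, 2.3, 2.5]

def att_universal(t):
    # Fixed base period g-1, like SA's e=-1 reference (anticipation=0).
    return (y_treat[t] - y_treat[g - 1]) - (y_ctrl[t] - y_ctrl[g - 1])

def att_pre_varying(t):
    # Varying base period: each pre-period is compared to its predecessor.
    return (y_treat[t] - y_treat[t - 1]) - (y_ctrl[t] - y_ctrl[t - 1])

# Post-treatment (t >= g): both base-period choices compare t to g-1,
# so the estimates coincide regardless of the setting.
post = [round(att_universal(t), 2) for t in range(g, len(y_treat))]
print(post)  # -> [2.9, 3.0, 3.3]

# Pre-treatment: the universal-base effect at t equals the negated sum of
# the varying one-period effects between t+1 and g-1 (telescoping), which
# is why magnitudes, and hence t-statistics, differ between the settings.
for t in range(g - 1):
    cumulative = sum(att_pre_varying(s) for s in range(t + 1, g))
    assert math.isclose(att_universal(t), -cumulative, abs_tol=1e-12)
```

This telescoping identity is also why the new test expects `CS(universal)` pre-period estimates to track SA's fixed-reference coefficients more closely than `CS(varying)` does.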