Skip to content

Conversation

@igerber
Copy link
Owner

@igerber igerber commented Jan 22, 2026

Summary

  • Add detailed explanation in Section 14 of Tutorial 02 explaining why pre-treatment effects differ between CS and SA
  • Enhance comparison code to show CS with both base_period="varying" and base_period="universal" options
  • Add point Add data preparation utilities for easier onboarding #10 to tutorial summary documenting expected behavior
  • Add test test_pre_period_difference_expected_between_cs_sa documenting this methodological difference
  • Update REGISTRY.md with cross-reference note about base_period + SA comparison

Methodology references (required if estimator / math changes)

  • Method name(s): Callaway-Sant'Anna base period handling, Sun-Abraham fixed reference period
  • Paper / source link(s):
    • Callaway, B., & Sant'Anna, P.H.C. (2021). Difference-in-Differences with multiple time periods. Journal of Econometrics
    • Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics
  • Any intentional deviations from the source (and why): None - this PR documents expected methodological differences

Validation

  • Tests added/updated: tests/test_sun_abraham.py - added test_pre_period_difference_expected_between_cs_sa
  • Backtest / simulation / notebook evidence (if applicable): Tutorial 02 updated with enhanced comparison output

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

- Add detailed explanation in Section 14 of why pre-treatment effects
  differ between Callaway-Sant'Anna (varying base) and Sun-Abraham
  (fixed reference period e=-1), while post-treatment effects match
- Enhance comparison code to show CS with both base_period options
- Add point #10 to tutorial summary documenting expected behavior
- Add test documenting this methodological difference
- Update REGISTRY.md with cross-reference note

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Copy link

Overall assessment: ⚠️ Needs changes

Executive summary:

  • Methodology registry says SA uses fixed reference e=-1, but implementation shifts with anticipation; update docs for accuracy.
  • New pre-period test computes differences but never asserts them and passes even if no pre-periods exist.
  • Tutorial explanation about “smaller SEs” is misleading; rephrase to avoid incorrect inference intuition.

Methodology

  • P2: REGISTRY states SA uses fixed reference period e=-1, but implementation sets self._reference_period = -1 - anticipation. Impact: methodology registry misstates the reference period when anticipation ≠ 0, which can mislead CS vs SA comparability guidance. Fix: update registry to say the reference is e=-1-anticipation (or “last pre-treatment period”; note default e=-1 when anticipation=0). docs/methodology/REGISTRY.md:L232-L235, diff_diff/sun_abraham.py:L540-L542

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P2: test_pre_period_difference_expected_between_cs_sa computes pre-period diffs but never asserts them; if pre_times is empty, the test still passes. Impact: regression coverage for the intended methodological difference is effectively absent. Fix: assert pre_times is non-empty and add a real check (e.g., total_diff_universal <= total_diff_varying + tol or at least one pre-period differs above a threshold), or remove the unused diff calculations. tests/test_sun_abraham.py:L529-L624
  • P3: Tutorial claims “Smaller changes → typically smaller SEs → may not reach significance,” which is internally inconsistent and not supported by the methodology. Impact: users get incorrect inference intuition. Fix: rephrase to “smaller estimates → smaller t-stats → may not reach significance,” or remove the SE statement. docs/tutorials/02_staggered_did.ipynb:L813-L825
  • P3: Tutorial text and summary state SA uses fixed reference e=-1 without noting anticipation. Impact: same mismatch as registry for users following the tutorial with anticipation settings. Fix: align to e=-1-anticipation / “last pre-treatment period.” docs/tutorials/02_staggered_did.ipynb:L813-L826, diff_diff/sun_abraham.py:L540-L542

Open questions / assumptions:

  • Do you want the pre-period comparison test to enforce “CS(universal) closer to SA” (with a tolerance), or simply assert that pre-period differences exist?

- Fix misleading "smaller SEs" wording in Tutorial 02 Cell 43
  → Changed to "lower t-statistics" which is more accurate
- Update SA reference period notation throughout docs
  → e=-1 → e=-1-anticipation (defaults to e=-1 when anticipation=0)
- Update REGISTRY.md SunAbraham section with correct reference period
- Add meaningful assertions to pre-period test
  → Test now requires pre-periods exist (not vacuous)
  → Assert CS(universal) closer to SA than CS(varying) for pre-periods

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@igerber
Copy link
Owner Author

igerber commented Jan 22, 2026

/ai-review

@github-actions
Copy link

🔁 AI review rerun (requested by @igerber)

Head SHA: 2ac66057699b0b81b5a8156083bc21675377686c


Overall assessment: ✅ Looks good

Executive Summary

  • Methodology notes in the registry align with base_period handling and SA reference-period logic; no estimator changes.
  • Two low-severity test robustness gaps in the new CS/SA comparison test (post-period guard, fixed tolerance).
  • Residual risk: future changes to event-study support or variance estimation could make the new test pass vacuously or fail despite correct behavior.

Methodology

  • No findings; registry update matches base_period and SA reference-period behavior (docs/methodology/REGISTRY.md:L227-L266, diff_diff/staggered.py:L212-L220, diff_diff/sun_abraham.py:L540-L548).

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • P3 Impact: Post-treatment alignment checks are skipped if post_times is empty, so the test can pass without validating its core claim; Fix: assert len(post_times) > 0 (or skip with a clear message) before the loop. tests/test_sun_abraham.py:L579-L608
  • P3 Impact: The +0.5 absolute-difference threshold is arbitrary and may become brittle if the DGP or weighting changes; Fix: compare per-period differences against SE-based bounds (as done for post periods) or scale the tolerance with a documented rationale. tests/test_sun_abraham.py:L610-L628

@igerber igerber merged commit 5a3af95 into main Jan 22, 2026
4 checks passed
@igerber igerber deleted the docs/cs-sa-preperiod-explanation branch January 22, 2026 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants