Explain CS vs SA pre-period discrepancy in Tutorial 02 #102

igerber · 2026-01-22T20:45:29Z

Summary

Add detailed explanation in Section 14 of Tutorial 02 explaining why pre-treatment effects differ between CS and SA
Enhance comparison code to show CS with both base_period="varying" and base_period="universal" options
Add point Add data preparation utilities for easier onboarding #10 to tutorial summary documenting expected behavior
Add test test_pre_period_difference_expected_between_cs_sa documenting this methodological difference
Update REGISTRY.md with cross-reference note about base_period + SA comparison

Methodology references (required if estimator / math changes)

Method name(s): Callaway-Sant'Anna base period handling, Sun-Abraham fixed reference period
Paper / source link(s):
- Callaway, B., & Sant'Anna, P.H.C. (2021). Difference-in-Differences with multiple time periods. Journal of Econometrics
- Sun, L., & Abraham, S. (2021). Estimating dynamic treatment effects in event studies with heterogeneous treatment effects. Journal of Econometrics
Any intentional deviations from the source (and why): None - this PR documents expected methodological differences

Validation

Tests added/updated: tests/test_sun_abraham.py - added test_pre_period_difference_expected_between_cs_sa
Backtest / simulation / notebook evidence (if applicable): Tutorial 02 updated with enhanced comparison output

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

- Add detailed explanation in Section 14 of why pre-treatment effects differ between Callaway-Sant'Anna (varying base) and Sun-Abraham (fixed reference period e=-1), while post-treatment effects match - Enhance comparison code to show CS with both base_period options - Add point #10 to tutorial summary documenting expected behavior - Add test documenting this methodological difference - Update REGISTRY.md with cross-reference note Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

github-actions · 2026-01-22T20:51:04Z

Overall assessment: ⚠️ Needs changes

Executive summary:

Methodology registry says SA uses fixed reference e=-1, but implementation shifts with anticipation; update docs for accuracy.
New pre-period test computes differences but never asserts them and passes even if no pre-periods exist.
Tutorial explanation about “smaller SEs” is misleading; rephrase to avoid incorrect inference intuition.

Methodology

P2: REGISTRY states SA uses fixed reference period e=-1, but implementation sets self._reference_period = -1 - anticipation. Impact: methodology registry misstates the reference period when anticipation ≠ 0, which can mislead CS vs SA comparability guidance. Fix: update registry to say the reference is e=-1-anticipation (or “last pre-treatment period”; note default e=-1 when anticipation=0). docs/methodology/REGISTRY.md:L232-L235, diff_diff/sun_abraham.py:L540-L542

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P2: test_pre_period_difference_expected_between_cs_sa computes pre-period diffs but never asserts them; if pre_times is empty, the test still passes. Impact: regression coverage for the intended methodological difference is effectively absent. Fix: assert pre_times is non-empty and add a real check (e.g., total_diff_universal <= total_diff_varying + tol or at least one pre-period differs above a threshold), or remove the unused diff calculations. tests/test_sun_abraham.py:L529-L624
P3: Tutorial claims “Smaller changes → typically smaller SEs → may not reach significance,” which is internally inconsistent and not supported by the methodology. Impact: users get incorrect inference intuition. Fix: rephrase to “smaller estimates → smaller t-stats → may not reach significance,” or remove the SE statement. docs/tutorials/02_staggered_did.ipynb:L813-L825
P3: Tutorial text and summary state SA uses fixed reference e=-1 without noting anticipation. Impact: same mismatch as registry for users following the tutorial with anticipation settings. Fix: align to e=-1-anticipation / “last pre-treatment period.” docs/tutorials/02_staggered_did.ipynb:L813-L826, diff_diff/sun_abraham.py:L540-L542

Open questions / assumptions:

Do you want the pre-period comparison test to enforce “CS(universal) closer to SA” (with a tolerance), or simply assert that pre-period differences exist?

- Fix misleading "smaller SEs" wording in Tutorial 02 Cell 43 → Changed to "lower t-statistics" which is more accurate - Update SA reference period notation throughout docs → e=-1 → e=-1-anticipation (defaults to e=-1 when anticipation=0) - Update REGISTRY.md SunAbraham section with correct reference period - Add meaningful assertions to pre-period test → Test now requires pre-periods exist (not vacuous) → Assert CS(universal) closer to SA than CS(varying) for pre-periods Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

igerber · 2026-01-22T21:27:24Z

/ai-review

github-actions · 2026-01-22T21:36:18Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2ac66057699b0b81b5a8156083bc21675377686c

Overall assessment: ✅ Looks good

Executive Summary

Methodology notes in the registry align with base_period handling and SA reference-period logic; no estimator changes.
Two low-severity test robustness gaps in the new CS/SA comparison test (post-period guard, fixed tolerance).
Residual risk: future changes to event-study support or variance estimation could make the new test pass vacuously or fail despite correct behavior.

Methodology

No findings; registry update matches base_period and SA reference-period behavior (docs/methodology/REGISTRY.md:L227-L266, diff_diff/staggered.py:L212-L220, diff_diff/sun_abraham.py:L540-L548).

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

P3 Impact: Post-treatment alignment checks are skipped if post_times is empty, so the test can pass without validating its core claim; Fix: assert len(post_times) > 0 (or skip with a clear message) before the loop. tests/test_sun_abraham.py:L579-L608
P3 Impact: The +0.5 absolute-difference threshold is arbitrary and may become brittle if the DGP or weighting changes; Fix: compare per-period differences against SE-based bounds (as done for post periods) or scale the tolerance with a documented rationale. tests/test_sun_abraham.py:L610-L628

igerber merged commit 5a3af95 into main Jan 22, 2026
4 checks passed

igerber deleted the docs/cs-sa-preperiod-explanation branch January 22, 2026 22:40

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Explain CS vs SA pre-period discrepancy in Tutorial 02 #102

Explain CS vs SA pre-period discrepancy in Tutorial 02 #102

Uh oh!

igerber commented Jan 22, 2026

Uh oh!

github-actions bot commented Jan 22, 2026

Uh oh!

igerber commented Jan 22, 2026

Uh oh!

github-actions bot commented Jan 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Explain CS vs SA pre-period discrepancy in Tutorial 02 #102

Explain CS vs SA pre-period discrepancy in Tutorial 02 #102

Uh oh!

Conversation

igerber commented Jan 22, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions bot commented Jan 22, 2026

Uh oh!

igerber commented Jan 22, 2026

Uh oh!

github-actions bot commented Jan 22, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants