Port R did package tests, fix not_yet_treated bugs #207
Port ~21 tests from R's `did` package (bcallaway11/did) to validate CallawaySantAnna estimator. Tests organized as Tier 1 (Python DGP, loose tolerance, always-run) and Tier 2 (R golden values, strict tolerance).

Found and fixed two bugs via the ported tests:
- not_yet_treated control group now works without never-treated units (requires ≥2 treatment cohorts)
- not_yet_treated control mask now uses max(t, base_period) to prevent contamination when base_period="universal" exceeds evaluation period

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
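A minimal sketch of the control-mask fix described in this commit, assuming units are coded by their first treatment period and never-treated units by `math.inf`. The function name `not_yet_treated_mask` and its arguments are hypothetical and not taken from the library source; only the `max(t, base_period)` cutoff comes from the commit message.

```python
import math

NEVER_TREATED = math.inf  # assumed coding for never-treated units


def not_yet_treated_mask(first_treat, g, t, base_period):
    """Mark units usable as controls for cohort g evaluated at period t.

    A unit qualifies when it is not in cohort g and is still untreated at
    both the evaluation period and the base period.
    """
    cutoff = max(t, base_period)
    return [ft != g and ft > cutoff for ft in first_treat]


# Cohorts first treated at 3, 5 and 7, plus one never-treated unit.
first_treat = [3, 3, 5, 5, 7, NEVER_TREATED]

# Pre-treatment cell for cohort 5: evaluation period 2, universal base period 4.
fixed = not_yet_treated_mask(first_treat, g=5, t=2, base_period=4)
# fixed == [False, False, False, False, True, True]

# Using t alone as the cutoff would admit cohort 3, which is already
# treated at the base period.
naive = [ft != 5 and ft > 2 for ft in first_treat]
# naive == [True, True, False, False, True, True]

# With no never-treated units, a second cohort still supplies controls.
no_never = not_yet_treated_mask([3, 3, 5, 5], g=3, t=3, base_period=2)
# no_never == [False, False, True, True]
```

With a single cohort and no never-treated units the mask would be all `False`, which is consistent with the stated requirement of at least two treatment cohorts.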
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings in the estimator change. The methodology-sensitive
Executive Summary
Methodology
Code Quality
Performance
No findings.
Maintainability
No findings beyond the metadata issue above.
Tech Debt
Security
No findings.
Documentation/Tests
…d panel=FALSE golden values

- Rename summary label "Control units:" → "Never-treated units:" in CallawaySantAnnaResults to avoid confusion when not_yet_treated control group has zero never-treated units
- Clarify n_control_units docstring to note it excludes not-yet-treated dynamic controls
- Remove panel=FALSE from scenarios 4 and 5 in R golden value generator so golden values match the Python panel estimator (previously overstated validation)
- Regenerate csdid_golden_values.json

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
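The two-tier comparison against reference values can be sketched as follows. This is an illustration under stated assumptions: the tolerance values, the helper name `estimates_match`, and the `(g, t)`-keyed dictionaries are hypothetical. The PR only states that Tier 1 uses a loose tolerance and Tier 2 a strict one, and the layout of `csdid_golden_values.json` is not shown here.

```python
import math

# Hypothetical tolerances; the PR only says "loose" and "strict".
TIER1_REL_TOL = 0.2
TIER2_REL_TOL = 1e-6


def estimates_match(python_att, reference_att, rel_tol, abs_tol=1e-12):
    """Return True when both sides cover the same (g, t) cells and every
    ATT(g, t) agrees within the given tolerance."""
    if python_att.keys() != reference_att.keys():
        return False
    return all(
        math.isclose(python_att[key], reference_att[key],
                     rel_tol=rel_tol, abs_tol=abs_tol)
        for key in reference_att
    )


# Made-up numbers keyed by (cohort, period), for illustration only.
reference = {(2, 2): 1.0, (2, 3): 2.0}
estimate = {(2, 2): 1.0000001, (2, 3): 2.05}

loose_ok = estimates_match(estimate, reference, TIER1_REL_TOL)
strict_ok = estimates_match(estimate, reference, TIER2_REL_TOL)
# loose_ok is True; strict_ok is False because the (2, 3) cell differs by 0.05.
```

A strict check of this kind only validates the estimator if the reference values come from the same design, which is why mixing `panel=FALSE` reference values with a panel estimator would overstate the validation.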
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good
No unmitigated P0/P1 findings. The methodology-sensitive estimator change is aligned with the local Methodology Registry, the prior re-review issues appear addressed, and the remaining concern is a minor test-coverage gap rather than a correctness defect.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Summary
- `did` package (bcallaway11/did) to cross-validate CallawaySantAnna estimator
- `not_yet_treated` control group now works without never-treated units (requires ≥2 cohorts)
- `not_yet_treated` control mask uses `max(t, base_period)` to prevent contamination when `base_period="universal"` exceeds evaluation period

Methodology references (required if estimator / math changes)
- `did::att_gt()`
- `did` package behavior

Validation
- `tests/test_csdid_ported.py`: 30 tests (20 Tier 1 + 10 Tier 2), all passing
- `tests/helpers/csdid_dgp.py`: R DGP translation for Tier 1 tests
- `benchmarks/R/generate_csdid_test_values.R`: generates golden values for Tier 2
- `benchmarks/data/csdid_golden_values.json`: committed golden values (786KB)
- `test_staggered.py` + `test_methodology_callaway.py`
- `DIFF_DIFF_BACKEND=python pytest tests/test_csdid_ported.py`: all pass

Security / privacy
🤖 Generated with Claude Code