Add EDID validation tests against paper results #221
Conversation
Validate the EfficientDiD estimator against published results from the paper's Table 6 (HRS empirical application) and the patterns in Tables 4/5 (Compustat MC simulations).

- HRS replication: point estimates match within 0.1–1.3% for all targets except ATT(9,10), which is a near-zero, noisy estimate (90 ± 641). The CS cross-validation matches to < $1 on all ATT(g,t), confirming data-loading correctness. The fixture is a 656-individual subset of the publicly available Dobkin et al. (2018) replication kit.
- MC simulations validate unbiasedness, RMSE dominance over CS, efficiency gains that increase with serial correlation, ~95% coverage, and SE calibration across rho values.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
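The MC claims above (unbiasedness, ~95% coverage) reduce to checks of the following shape. This is a minimal sketch with a stand-in DGP and estimator — it is not the EfficientDiD API, and every name in it is illustrative:

```python
import numpy as np

# Minimal Monte Carlo coverage check with a stand-in DGP and estimator,
# NOT the EfficientDiD API. The true ATT is known by construction, so we
# can verify that the 95% CI covers it ~95% of the time and that the
# point estimate is unbiased up to MC noise.
rng = np.random.default_rng(0)
TRUE_ATT = 2.0
N, REPS = 500, 1000

covered = 0
estimates = []
for _ in range(REPS):
    treated = rng.normal(TRUE_ATT, 1.0, N)   # treated outcomes, mean = ATT
    control = rng.normal(0.0, 1.0, N)        # control outcomes, mean = 0
    att_hat = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / N + control.var(ddof=1) / N)
    if abs(att_hat - TRUE_ATT) < 1.96 * se:
        covered += 1
    estimates.append(att_hat)

coverage = covered / REPS
bias = np.mean(estimates) - TRUE_ATT
assert 0.92 < coverage < 0.98   # ~95% nominal coverage
assert abs(bias) < 0.05         # unbiased up to MC noise
```

The real tests presumably do the same over the serially correlated Compustat DGP, additionally comparing RMSE against the CS estimator.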
Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings.

Sections: Executive Summary · Methodology · Code Quality · Performance · Maintainability · Tech Debt · Security · Documentation/Tests
- P2: Assert exact cohort counts (656/252/176/163/65) and wave support, since the CSV fixture is deterministic — approximate tolerances could mask fixture drift.
- P3: Derive _TRUE_ES_AVG_COMPUSTAT programmatically from the DGP parameters instead of hard-coding it, so changes to the DGP definition propagate automatically.
- P3: Add tests/data/README.md documenting the HRS fixture source, sample-selection steps, and expected counts for future audit/rebuild.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment Looks good — no unmitigated P0/P1 findings.

Sections: Executive Summary · Methodology · Code Quality · Performance · Maintainability · Tech Debt · Security · Documentation/Tests
- P2: Replace the loose max(10% × estimate, 200) tolerance with 0.1 × SE (10% of one published standard error). Our actual diffs are all < 0.03 SE, so this catches real drift while absorbing minor sample differences. ATT(9,10) = 90 now accepts [26, 154] instead of [-110, 290].
- P3: Extract the Compustat DGP into tests/edid_dgp.py as the single source of truth. Both test_efficient_did.py and test_efficient_did_validation.py import from it. Truth values (ES_avg, overall_att) are derived programmatically from the shared DGP parameters.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
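The SE-based tolerance reduces to a one-line check. In this sketch, only the ATT(9,10) point estimate (90) and SE (641) come from the thread; the helper name and the estimate being tested are illustrative:

```python
def assert_matches_published(estimate: float, published: float,
                             published_se: float, frac: float = 0.1) -> None:
    """Flag drift larger than `frac` of one published standard error."""
    tol = frac * published_se
    assert abs(estimate - published) <= tol, (
        f"estimate {estimate:.1f} outside "
        f"[{published - tol:.1f}, {published + tol:.1f}]"
    )

# ATT(9,10) from the thread: published 90 with SE 641.
# 0.1 * 641 = 64.1, so the accepted band is [25.9, 154.1] -- the
# [26, 154] quoted above -- instead of the old [-110, 290].
assert_matches_published(100.0, published=90.0, published_se=641.0)
```

Scaling the tolerance by the published SE makes the test units-aware: noisy targets like ATT(9,10) get a wide band, while precisely estimated targets get a tight one.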
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. This re-review addresses the prior review's substantive items: the paper-match tolerances are now SE-based, and the Compustat DGP is shared instead of duplicated. I have only one remaining P3 maintainability note.

Sections: Executive Summary · Methodology · Code Quality · Performance · Maintainability · Tech Debt · Security · Documentation/Tests
P3: Move edid_dgp.py into tests/helpers/ and add the helpers directory to sys.path via conftest.py instead of using the global pythonpath = ["tests"] setting. This avoids making every module under tests/ importable as a top-level module.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
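The suggested change amounts to a few lines in tests/conftest.py. This is a sketch of the idea, assuming the helper module ends up at tests/helpers/edid_dgp.py:

```python
# tests/conftest.py
import sys
from pathlib import Path

# Make modules under tests/helpers/ importable (e.g. `import edid_dgp`)
# without exposing every module in tests/ as a top-level import, which
# is what the global `pythonpath = ["tests"]` setting would do.
HELPERS_DIR = Path(__file__).parent / "helpers"
if str(HELPERS_DIR) not in sys.path:
    sys.path.insert(0, str(HELPERS_DIR))
```

Because pytest imports conftest.py before collecting tests, the path tweak is in place by the time any test module runs its imports, and it is scoped to the test session rather than the whole project.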
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings.

- Executive Summary
- Methodology
- Code Quality: No findings.
- Performance: No findings.
- Maintainability: No findings. The prior import-path issue has been addressed by narrowing the pytest-time path mutation to …
- Tech Debt: No findings.
- Security: No findings.
- Documentation/Tests:
P3: Replace the reference to the non-committed plan file with the exact Python command sequence used to build hrs_edid_validation.csv.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
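A regeneration recipe of the requested kind might take the shape below. Every name here — the source path, the column names, the wave window, the balanced-panel rule — is a placeholder for illustration; the committed README must encode the actual Dobkin et al. (2018) selection steps:

```python
import pandas as pd

def build_fixture(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical selection sketch: keep a wave window and balanced
    4-wave panels. Placeholder rules only -- the real recipe must
    document the actual Dobkin et al. (2018) source and filters."""
    out = raw[raw["wave"].between(7, 11)]                 # placeholder wave window
    return out.groupby("id").filter(lambda g: len(g) == 4)  # balanced 4-wave panel

# Placeholder usage: read the raw replication extract, select, and write
# the deterministic fixture that the validation tests load.
# raw = pd.read_csv("replication_data/<source file>.csv")
# build_fixture(raw).to_csv("tests/data/hrs_edid_validation.csv", index=False)
```

Committing the recipe as runnable commands (rather than pointing at a plan file) lets a future auditor rebuild the fixture and diff it against the checked-in CSV.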
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA:

Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings.

- Executive Summary
- Methodology
- Code Quality: No findings.
- Performance: No findings.
- Maintainability: No findings. The helper extraction is clean, and the pytest import-path change stays narrowly scoped to …
- Tech Debt: No findings. This diff does not introduce any new deferred-correctness item that should be tracked in …
- Security: No findings.
- Documentation/Tests: No findings. The previous documentation gap is resolved: the fixture README now includes provenance, sample-selection rules, expected cohort counts, and a concrete regeneration recipe (tests/data/README.md:5, tests/data/README.md:19, tests/data/README.md:35).
Summary
- tests/data/hrs_edid_validation.csv (new fixture)
- replication_data/ added to .gitignore

HRS Replication Results (vs Table 6)
All differences are < 0.03 standard errors. The CS cross-validation matches to < $1 on all ATT(g,t).
Methodology references
Validation
- tests/test_efficient_did_validation.py (7 HRS replication + 5 MC simulation tests)
- tests/data/hrs_edid_validation.csv (656 individuals × 4 waves from Dobkin et al. 2018)
- (… -m '')

Security / privacy
🤖 Generated with Claude Code