SyntheticControl: leave-one-out + in-time placebo (ADH 2015 §4) by igerber · Pull Request #514 · igerber/diff-diff

igerber · 2026-06-01T00:36:23Z

Summary

Adds the two ADH-2015 §4 robustness diagnostics to the classic SyntheticControl estimator (the agreed ship-blockers before launch) as opt-in SyntheticControlResults methods. Both re-run the validated solver and leave the no-analytical-inference contract intact (se/t_stat/p_value/conf_int/is_significant stay bound to the NaN p_value).
- leave_one_out(): drops each reportably-weighted donor (the >1e-6 support, frozen on the fit snapshot so it is immune to post-fit donor_weights mutation) and re-fits the treated unit. Returns a baseline plus a per-drop ATT/delta_att table; the reporting headline is the baseline-relative max_abs_delta_att. Accessors: get_leave_one_out_df() / get_leave_one_out_gaps().
- in_time_placebo(): reassigns the intervention to an earlier pre-date and measures the placebo effect over the held-out window (~0 if there is no real pre-period effect). TRUNCATE windowing re-cuts predictor specs to the pre-fake window (custom_v subset in lockstep, raveled for array-like inputs), excludes the true post-periods entirely (no peeking), and requires ≥2 pre-fake periods. Sweeps feasible dates by default; explicit dates are validated, de-duplicated, and canonicalized; empty input raises. Statuses distinguish ran / infeasible / failed and the mixed all_dates_unusable case, with n_failed / n_infeasible counts. Accessors: get_in_time_placebo_df() / get_in_time_placebo_gaps().
- Both fail closed (non-converged treated fit, too-few donors/pre-periods, all-failed refits). Wired into DiagnosticReport (_scm_native opt-in blocks with machine-readable reason_code), BusinessReport, and practitioner_next_steps.
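For readers unfamiliar with the leave-one-out diagnostic, here is a minimal NumPy sketch of the idea described above. It is not the diff-diff implementation: the solver is a toy projected-gradient fit on pre-period outcomes only, and every name in it (`REPORTABLE_FLOOR`, `fit_weights`, `att`, the returned dict layout) is hypothetical. Only the >1e-6 floor, the frozen support, and the baseline-relative `max_abs_delta_att` headline come from the PR text.

```python
import numpy as np

REPORTABLE_FLOOR = 1e-6  # threshold quoted in the PR; the constant name is hypothetical


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)


def fit_weights(y_pre, donors_pre, n_iter=5000):
    """Toy solver: minimise ||y_pre - donors_pre @ w||^2 over the simplex."""
    n_donors = donors_pre.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)
    # Step size 1/L, where L is the squared spectral norm of the design matrix.
    step = 1.0 / (np.linalg.norm(donors_pre, 2) ** 2 + 1e-12)
    for _ in range(n_iter):
        grad = donors_pre.T @ (donors_pre @ w - y_pre)
        w = project_to_simplex(w - step * grad)
    return w


def att(y, donors, w, t0):
    """Mean post-period gap between the treated unit and its synthetic control."""
    return float(np.mean(y[t0:] - donors[t0:] @ w))


def leave_one_out(y, donors, t0):
    """Drop each reportably-weighted donor in turn and re-fit on the rest."""
    if donors.shape[1] < 2:
        raise ValueError("need at least two donors to drop one")  # fail closed
    w_base = fit_weights(y[:t0], donors[:t0])
    # Snapshot the support once, so later changes to w_base cannot alter it.
    support = tuple(int(j) for j in np.flatnonzero(w_base > REPORTABLE_FLOOR))
    base_att = att(y, donors, w_base, t0)
    rows = []
    for j in support:
        keep = [i for i in range(donors.shape[1]) if i != j]
        w = fit_weights(y[:t0], donors[:t0][:, keep])
        drop_att = att(y, donors[:, keep], w, t0)
        rows.append({"dropped": j, "att": drop_att, "delta_att": drop_att - base_att})
    max_abs = max((abs(r["delta_att"]) for r in rows), default=float("nan"))
    return {"baseline_att": base_att, "rows": rows, "max_abs_delta_att": max_abs}
```

A large `max_abs_delta_att` relative to the baseline ATT would suggest the estimate leans heavily on a single donor; donors below the floor are skipped because dropping them should barely move the fit.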

Methodology references (required if estimator / math changes)

Method name(s): SyntheticControl — Abadie, Diamond & Hainmueller (2015) §4 robustness diagnostics (leave-one-out donor robustness; in-time / backdating placebo).
Paper / source link(s): Abadie, Diamond & Hainmueller (2015), AJPS 59(2), 495–510 (doi:10.1111/ajps.12116). On-file review: docs/methodology/papers/abadie-diamond-hainmueller-2015-review.md. Documented in docs/methodology/REGISTRY.md §SyntheticControl.
Any intentional deviations from the source (and why):
(1) TRUNCATE in-time windowing for absolute-period predictor specs (vs. literal relative-time relabeling). R Synth has no in-time function, so truncate corresponds to a manual dataprep+synth re-run for outcome-predictor fits; documented as a **Note:**.
(2) Leave-one-out drops the reportable (>1e-6) support rather than every strictly-positive weight (sub-floor weights are numerical dust with ~0 delta_att); documented as a **Note:**.
(3) In-time placebo requires ≥2 pre-fake periods, which is stricter than the base T0≥1, because an auto-swept single-pre-fake date is non-credible; documented as a **Note:**.
Validation: deterministic self-consistency (each diagnostic == a fresh fit on the equivalent sub-problem, to 1e-7) plus an R Synth drop-donor LOO golden on Basque; the custom-V solver's existing Basque R-parity transitively anchors both. Remaining ADH-2015 items (CV V-selection, W^reg extrapolation, sparse-SC) are deferred in TODO.md.
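The TRUNCATE windowing in deviation (1) and the ≥2 pre-fake rule in (3) can be pictured with the small sketch below. It only enumerates windows and subsets a predictor spec; it does not fit anything. The function names, the dict layout, and the representation of predictors as a list of period labels are assumptions for illustration, not the library's API.

```python
def placebo_windows(periods, true_t0, min_pre=2):
    """Enumerate candidate fake intervention dates and their truncated windows.

    periods : ordered list of period labels
    true_t0 : index of the first truly treated period
    min_pre : minimum number of pre-fake periods (the PR requires >= 2)
    """
    pre = periods[:true_t0]  # true post-periods are excluded entirely (no peeking)
    windows = []
    for k in range(1, len(pre)):
        windows.append({
            "fake_date": pre[k],
            "fit_window": pre[:k],      # predictors are re-cut to this window
            "holdout_window": pre[k:],  # placebo effect is measured here
            "feasible": k >= min_pre,
        })
    return windows


def truncate_predictors(predictor_periods, v_weights, fit_window):
    """Keep only predictors dated inside the fit window, subsetting V in lockstep.

    Whether the surviving V weights are renormalised is not addressed here.
    """
    allowed = set(fit_window)
    keep = [i for i, p in enumerate(predictor_periods) if p in allowed]
    return [predictor_periods[i] for i in keep], [v_weights[i] for i in keep]
```

With ten periods and treatment starting at index 6, the single-pre-period candidate is flagged infeasible and no window ever touches a true post-period.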

Validation

Tests added/updated: tests/test_methodology_synthetic_control.py (LOO + in-time behavioral/edge/fail-closed/pickle/determinism/self-consistency + Tier-1 R LOO parity), tests/test_diagnostic_report.py, tests/test_business_report.py, tests/test_practitioner.py; R golden benchmarks/R/generate_synth_basque_golden.R + tests/data/synth_basque_golden.json (drop-donor LOO block). Local: 478 passed (pure-Python) + Rust/@slow tier green; black/ruff clean; mypy no new errors. Iterated /ai-review-local --backend codex to a clean verdict (9 rounds).
Backtest / simulation / notebook evidence (if applicable): N/A.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…5 §4)

Adds the two ADH-2015 §4 robustness diagnostics to the classic SyntheticControl estimator (the agreed ship-blockers before launch) as opt-in results methods that re-run the validated solver and leave the no-analytical-inference contract intact (se/t_stat/p_value/conf_int/is_significant stay bound to the NaN p_value).

- leave_one_out(): drops each reportably-weighted donor (the >1e-6 support, frozen on the fit snapshot at fit time so it is immune to post-fit donor_weights mutation) and re-fits the treated unit; returns a baseline + per-drop ATT/delta_att table; the headline single-donor-dependence metric is the baseline-relative max_abs_delta_att.
- in_time_placebo(): reassigns the intervention to an earlier pre-date and measures the placebo effect over the held-out window (~0 if no real pre-period effect). TRUNCATE windowing re-cuts predictor specs to the pre-fake window (custom_v subset in lockstep, raveled to support array-like inputs), excludes the true post-periods entirely (no peeking), and requires >=2 pre-fake periods (documented Note). Sweeps feasible dates by default; explicit dates are validated, de-duplicated, and canonicalized; empty explicit input raises. Statuses distinguish ran / infeasible / failed and the mixed all_dates_unusable case with n_failed / n_infeasible counts.

Both fail closed (non-converged treated fit, too-few donors/pre-periods, all-failed refits). Wired into DiagnosticReport (_scm_native opt-in blocks with machine-readable reason_code), BusinessReport, and practitioner_next_steps.

Validation: deterministic self-consistency (each diagnostic == a fresh fit on the equivalent sub-problem, 1e-7) plus an R Synth drop-donor LOO golden on Basque; the custom-V solver's existing Basque R-parity transitively anchors both. R Synth has no in-time/LOO function (documented). Remaining ADH-2015 items (CV V-selection, W_reg extrapolation, sparse-SC) deferred in TODO.md.

Docs: REGISTRY §SyntheticControl, REPORTING.md, api/synthetic_control.rst, LLM guides, README, CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T00:45:01Z

Overall Assessment

✅ Looks good

Executive Summary

No P0/P1 issues found. The new SCM diagnostics are consistent with the library’s documented methodology contract; the ADH-2015 deviations I checked are explicitly recorded in REGISTRY/TODO.
DiagnosticReport / BusinessReport underreport mixed in_time_placebo() sweeps by dropping n_infeasible as soon as any placebo date succeeds.
Public docs still say leave-one-out drops “positively-weighted” donors even though the shipped rule is the documented reportable >1e-6 support.
Analytical inference remains correctly NaN-gated; I did not find a new inline t-stat/p-value anti-pattern in the changed SCM path.
I could not run pytest locally because the environment does not have it installed.
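As an illustration of what "NaN-gated" inference means in the summary above, the sketch below shows one way a results object can expose inference fields without ever reporting significance. The class and its fields are hypothetical and are not taken from the diff-diff source; the only facts borrowed from the PR are the field names and that they all derive from a NaN p_value.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PointEstimateOnly:
    """Hypothetical results object with no analytical inference."""

    att: float
    p_value: float = math.nan  # never populated analytically

    @property
    def se(self) -> float:
        return math.nan

    @property
    def t_stat(self) -> float:
        return math.nan

    @property
    def conf_int(self) -> tuple:
        return (math.nan, math.nan)

    @property
    def is_significant(self) -> bool:
        # NaN compares False against any threshold, so this can never be True.
        return self.p_value < 0.05
```

Deriving `is_significant` from the stored p-value, rather than recomputing a t-statistic inline, keeps a single gate that cannot be bypassed by accident.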

Methodology

Severity: P3. Impact: none for merge. ADH 2015 Section 4 describes the added diagnostics as re-estimating after dropping each donor with positive weight and re-estimating in-time placebos with the same validation technique and predictors lagged accordingly. The PR’s two implementation deviations I checked, TRUNCATE windowing and the >1e-6 reportable-weight floor, are explicitly documented in docs/methodology/REGISTRY.md:L1991-L2010, and the still-deferred ADH-2015 tail is tracked in TODO.md:L87. Concrete fix: none required.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P2. Impact: mixed-success in-time-placebo sweeps are summarized too optimistically. SyntheticControlResults.in_time_placebo() tracks both failed and infeasible dates (diff_diff/synthetic_control_results.py:L1192-L1296), but _scm_native() only exposes n_infeasible on terminal failure (diff_diff/diagnostic_report.py:L2393-L2444). Once any date runs, the report becomes plain status="ran" with n_dates=len(itp) and gives no indication that part of the requested grid never ran. That can overstate robustness coverage in both DiagnosticReport and BusinessReport. Concrete fix: include n_infeasible on the status="ran" path as well, and ideally add n_ran or rename n_dates to n_requested; add regression coverage next to tests/test_diagnostic_report.py:L2105-L2112 and tests/test_business_report.py:L829-L839.
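The suggested fix amounts to always reporting how many dates ran, failed, and were infeasible, whatever the headline status. A minimal sketch follows, assuming a flat list of per-date status strings; the function name and the returned dict layout are hypothetical, while the status vocabulary (ran / infeasible / failed / all_dates_unusable) and the count names are the ones used in this PR.

```python
from collections import Counter


def summarize_sweep(statuses):
    """Summarise a placebo sweep without hiding partially-usable grids."""
    if not statuses:
        raise ValueError("empty placebo grid")  # the PR says empty input raises
    counts = Counter(statuses)
    n_ran = counts["ran"]
    n_failed = counts["failed"]
    n_infeasible = counts["infeasible"]
    if n_ran > 0:
        status = "ran"
    elif n_failed and n_infeasible:
        status = "all_dates_unusable"  # mixed terminal case
    elif n_failed:
        status = "failed"
    else:
        status = "infeasible"
    # Counts are emitted on every path, including status == "ran".
    return {
        "status": status,
        "n_dates": len(statuses),
        "n_ran": n_ran,
        "n_failed": n_failed,
        "n_infeasible": n_infeasible,
    }
```

A reader of the report can then compare n_ran against n_dates to see how much of the requested grid actually contributed.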

Tech Debt

Severity: P3. Impact: none for this PR. The remaining ADH-2015 items (V cross-validation, W^reg, sparse-SC) are already tracked in TODO.md:L87. Concrete fix: none in this PR.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: user-facing docs overstate the leave-one-out drop set. The implementation and REGISTRY intentionally use the reportable >1e-6 support in diff_diff/synthetic_control_results.py:L790-L805 and docs/methodology/REGISTRY.md:L2009, but public docs still say “positively-weighted donor” in docs/api/synthetic_control.rst:L28-L34 and CHANGELOG.md:L13. Concrete fix: change the public wording to “reportably-weighted donor (weight > 1e-6)” and update mirrored guide text; while there, the stale SCM summary in diff_diff/diagnostic_report.py:L2143-L2156 should also mention the new ADH-2015 blocks.
Verification note: I could not execute the tests locally because pytest is unavailable in this environment (python -m pytest reports No module named pytest).

…odex review)

P2: _scm_native's in_time_placebo "ran" block now surfaces n_ran / n_infeasible (not just n_dates / n_failed), so a partially-usable sweep (some dates ran, some infeasible) is not summarized as full coverage. Regression test added.

P3: aligned the remaining "positively-weighted" copy (docs/api/synthetic_control.rst, CHANGELOG, llms-full) to the documented "reportably-weighted (>1e-6)" contract, and refreshed the _check_estimator_native SCM summary to mention the leave_one_out / in_time_placebo blocks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-01T00:53:54Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6c49acb0ebbf047b9b5ccc4afac5f024cfaade49

Overall Assessment

✅ Looks good

Executive Summary

Previous re-review items are resolved. _scm_native() now reports partial in-time-placebo coverage via n_dates/n_ran/n_failed/n_infeasible, and BusinessReport now lifts both ADH-2015 diagnostic blocks into the robustness payload. diff_diff/diagnostic_report.py:L2395-L2448 diff_diff/business_report.py:L1018-L1024
The public SCM docs now match the documented methodology note for leave-one-out by using the reportable-weight rule instead of “positively-weighted donor.” docs/api/synthetic_control.rst:L28-L35 docs/methodology/REGISTRY.md:L2007-L2010
I did not find a new undocumented methodology mismatch in the implemented ADH-2015 diagnostics. LOO freezes fit-time donor support, in-time placebo truncates predictor windows without post-treatment peeking, and both preserve the no-analytical-inference contract. diff_diff/synthetic_control.py:L526-L528 diff_diff/synthetic_control.py:L1324-L1388 diff_diff/synthetic_control_results.py:L786-L974 diff_diff/synthetic_control_results.py:L1043-L1296
The added tests cover the main edge cases for this feature: malformed/empty placebo grids, mixed failed/infeasible sweeps, zero-mass truncated custom_v, mutation-proof donor support, and all-refits-failed status handling. tests/test_methodology_synthetic_control.py:L1773-L1805 tests/test_methodology_synthetic_control.py:L1840-L1986 tests/test_methodology_synthetic_control.py:L2149-L2206
I could not run pytest locally because this environment does not have it installed (python -m pytest -> No module named pytest).

Methodology

Severity: P3. Impact: none for merge. Cross-checking against the ADH-2015 review, the new leave-one-out and in-time-placebo paths match the documented library contract. The only ADH-2015 gaps I see are the already tracked deferred tail: CV-based V selection, W^reg extrapolation diagnostics, and sparse-SC. Concrete fix: none required in this PR; keep the existing REGISTRY/TODO tracking. docs/methodology/papers/abadie-diamond-hainmueller-2015-review.md:L41-L57 docs/methodology/REGISTRY.md:L1991-L2010 TODO.md:L87-L87
No P0/P1 findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings. The prior reporting-layer concern is addressed by explicit reason_code handling for infeasible/no-success runs and by preserving partial in-time-placebo coverage in SCM-native diagnostics. diff_diff/diagnostic_report.py:L2358-L2450

Tech Debt

Severity: P3. Impact: none for merge. The PR reduces the SCM ADH-2015 gap and leaves the remaining tail explicitly tracked rather than implicit. Concrete fix: none in this PR. TODO.md:L87-L87

Security

No findings.

Documentation/Tests

No findings. The public docs are now consistent with the REGISTRY note, and the reporting/test updates cover the previously identified re-review issues. Concrete fix: none required; verify in CI since I could not execute the tests locally. docs/api/synthetic_control.rst:L28-L35 tests/test_diagnostic_report.py:L2078-L2112 tests/test_business_report.py:L828-L839

igerber added the ready-for-ci label (Triggers CI test workflows) on Jun 1, 2026

igerber merged commit 4d76ce3 into main Jun 1, 2026
52 of 71 checks passed

igerber deleted the feature/synthetic-control-loo-in-time branch June 1, 2026 12:21

igerber merged 2 commits into main from feature/synthetic-control-loo-in-time
