
SyntheticControl: leave-one-out + in-time placebo (ADH 2015 §4) #514

Merged
igerber merged 2 commits into main from feature/synthetic-control-loo-in-time
Jun 1, 2026

Conversation

Owner

@igerber igerber commented Jun 1, 2026

Summary

  • Adds the two ADH-2015 §4 robustness diagnostics to the classic SyntheticControl estimator (the agreed ship-blockers before launch), as opt-in SyntheticControlResults methods that re-run the validated solver and leave the no-analytical-inference contract intact (se/t_stat/p_value/conf_int/is_significant stay bound to the NaN p_value).
  • leave_one_out(): drops each reportably-weighted donor (the >1e-6 support, frozen on the fit snapshot so it is immune to post-fit donor_weights mutation) and re-fits the treated unit. Returns a baseline plus a per-drop ATT/delta_att table; the reporting headline is the baseline-relative max_abs_delta_att. Accessors: get_leave_one_out_df() / get_leave_one_out_gaps().
  • in_time_placebo(): reassigns the intervention to an earlier pre-period date and measures the placebo effect over the held-out window (~0 if there is no real pre-period effect). TRUNCATE windowing re-cuts predictor specs to the pre-fake window (custom_v subset in lockstep, raveled for array-like inputs), excludes the true post-periods entirely (no peeking), and requires ≥2 pre-fake periods. Sweeps all feasible dates by default; explicit dates are validated, de-duplicated, and canonicalized; empty explicit input raises. Statuses distinguish ran / infeasible / failed and the mixed all_dates_unusable case, with n_failed / n_infeasible counts. Accessors: get_in_time_placebo_df() / get_in_time_placebo_gaps().
  • Both fail closed (non-converged treated fit, too-few donors/pre-periods, all-failed refits). Wired into DiagnosticReport (_scm_native opt-in blocks with machine-readable reason_code), BusinessReport, and practitioner_next_steps.
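
For readers new to the leave-one-out diagnostic, the loop described above can be sketched roughly as follows. This is a toy illustration, not the library's implementation: `leave_one_out_table`, the `Fitter` signature, and `toy_fit` are hypothetical names, and the sign convention `delta_att = drop ATT - baseline ATT` is an assumption. Only the >1e-6 floor, the frozen support, and the `max_abs_delta_att` headline come from the description.

```python
from typing import Callable, Dict, List, Tuple

REPORT_FLOOR = 1e-6  # reportable-weight floor quoted in the PR description

# Hypothetical fitter signature: donor names -> (ATT, donor weights).
Fitter = Callable[[List[str]], Tuple[float, Dict[str, float]]]


def leave_one_out_table(fit: Fitter, donors: List[str]) -> dict:
    """Refit once per reportably weighted donor and tabulate the ATT shifts."""
    baseline_att, weights = fit(donors)
    # Freeze the support as an immutable tuple right after the baseline fit,
    # so later mutation of `weights` cannot change which donors get dropped.
    support = tuple(d for d in donors if weights.get(d, 0.0) > REPORT_FLOOR)
    rows = []
    for dropped in support:
        att, _ = fit([d for d in donors if d != dropped])
        # Assumed convention: shift of the refit ATT relative to the baseline.
        rows.append({"dropped": dropped, "att": att, "delta_att": att - baseline_att})
    max_abs = max((abs(r["delta_att"]) for r in rows), default=float("nan"))
    return {"baseline_att": baseline_att, "rows": rows, "max_abs_delta_att": max_abs}


def toy_fit(donors: List[str]) -> Tuple[float, Dict[str, float]]:
    """Stand-in for the real solver: equal weights over a fixed toy panel."""
    gaps = {"A": 1.0, "B": 3.0, "C": 5.0}  # made-up treated-minus-donor gaps
    w = {d: 1.0 / len(donors) for d in donors}
    att = sum(w[d] * gaps[d] for d in donors)
    return att, w


result = leave_one_out_table(toy_fit, ["A", "B", "C"])
```

In the toy panel every donor carries weight 1/3, so all three are dropped in turn; a donor whose weight sat below the floor would be skipped rather than refit.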

Methodology references (required if estimator / math changes)

  • Method name(s): SyntheticControl — Abadie, Diamond & Hainmueller (2015) §4 robustness diagnostics (leave-one-out donor robustness; in-time / backdating placebo).
  • Paper / source link(s): Abadie, Diamond & Hainmueller (2015), AJPS 59(2), 495–510 (doi:10.1111/ajps.12116). On-file review: docs/methodology/papers/abadie-diamond-hainmueller-2015-review.md. Documented in docs/methodology/REGISTRY.md §SyntheticControl.
  • Any intentional deviations from the source (and why): (1) TRUNCATE in-time windowing for absolute-period predictor specs (vs literal relative-time relabeling) — R Synth has no in-time function, so truncate = a manual dataprep+synth re-run for outcome-predictor fits; documented **Note:**. (2) Leave-one-out drops the reportable (>1e-6) support rather than every strictly-positive weight (sub-floor = numerical dust, ~0 delta_att); documented **Note:**. (3) In-time placebo requires ≥2 pre-fake periods (stricter than the base T0≥1, an auto-swept single-pre-fake date is non-credible); documented **Note:**. Validation: deterministic self-consistency (each diagnostic == a fresh fit on the equivalent sub-problem, to 1e-7) + an R Synth drop-donor LOO golden on Basque; the custom-V solver's existing Basque R-parity transitively anchors both. Remaining ADH-2015 items (CV V-selection, W^reg extrapolation, sparse-SC) deferred in TODO.md.
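
As a rough illustration of deviations (1) and (3), the date handling for the in-time placebo could look like the sketch below. All function names are hypothetical and integer periods are assumed; the real method also re-cuts predictor specs and custom_v, which is not shown. Whether an explicit but infeasible date is rejected here or later marked `infeasible` is also an assumption (this sketch defers it to the caller).

```python
from typing import List, Optional, Sequence, Tuple

MIN_PRE_FAKE = 2  # the PR requires at least two pre-fake periods


def _true_pre(periods: Sequence[int], true_start: int) -> List[int]:
    """Sorted, de-duplicated periods strictly before the true intervention."""
    return sorted(p for p in set(periods) if p < true_start)


def feasible_fake_dates(periods: Sequence[int], true_start: int) -> List[int]:
    """Fake intervention dates that leave >= MIN_PRE_FAKE periods to fit on."""
    pre = _true_pre(periods, true_start)
    # Using pre[i] as the fake date leaves exactly i periods before it.
    return [p for i, p in enumerate(pre) if i >= MIN_PRE_FAKE]


def truncate_windows(
    periods: Sequence[int], true_start: int, fake_date: int
) -> Tuple[List[int], List[int]]:
    """Split the true pre-period into a fit window and a held-out window.

    True post-periods are never included in either window (no peeking).
    """
    pre = _true_pre(periods, true_start)
    fit_window = [p for p in pre if p < fake_date]
    held_out = [p for p in pre if p >= fake_date]
    return fit_window, held_out


def resolve_dates(
    periods: Sequence[int], true_start: int, dates: Optional[Sequence[int]] = None
) -> List[int]:
    """Default to the full feasible sweep; otherwise validate and canonicalize."""
    if dates is None:
        return feasible_fake_dates(periods, true_start)
    if len(dates) == 0:
        raise ValueError("explicit placebo dates must not be empty")
    unique = sorted(set(dates))  # de-duplicate and put in canonical order
    pre = set(_true_pre(periods, true_start))
    unknown = [d for d in unique if d not in pre]
    if unknown:
        raise ValueError(f"not pre-treatment periods: {unknown}")
    return unique


periods = range(2000, 2010)
sweep = resolve_dates(periods, true_start=2006)
fit_window, held_out = truncate_windows(periods, 2006, 2003)
```

With a true start of 2006, the earliest feasible fake date is 2002, since 2000 and 2001 would leave fewer than two periods to fit on.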

Validation

  • Tests added/updated: tests/test_methodology_synthetic_control.py (LOO + in-time behavioral/edge/fail-closed/pickle/determinism/self-consistency + Tier-1 R LOO parity), tests/test_diagnostic_report.py, tests/test_business_report.py, tests/test_practitioner.py; R golden benchmarks/R/generate_synth_basque_golden.R + tests/data/synth_basque_golden.json (drop-donor LOO block). Local: 478 passed (pure-Python) + Rust/@slow tier green; black/ruff clean; mypy no new errors. Iterated /ai-review-local --backend codex to a clean verdict (9 rounds).
  • Backtest / simulation / notebook evidence (if applicable): N/A.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…5 §4)

Adds the two ADH-2015 §4 robustness diagnostics to the classic SyntheticControl
estimator — the agreed ship-blockers before launch — as opt-in results methods
that re-run the validated solver and leave the no-analytical-inference contract
intact (se/t_stat/p_value/conf_int/is_significant stay bound to the NaN p_value).

- leave_one_out(): drops each reportably-weighted donor (the >1e-6 support, frozen
  on the fit snapshot at fit time so it is immune to post-fit donor_weights mutation)
  and re-fits the treated unit; returns a baseline + per-drop ATT/delta_att table; the
  headline single-donor-dependence metric is the baseline-relative max_abs_delta_att.
- in_time_placebo(): reassigns the intervention to an earlier pre-date and measures
  the placebo effect over the held-out window (~0 if no real pre-period effect). TRUNCATE
  windowing re-cuts predictor specs to the pre-fake window (custom_v subset in lockstep,
  raveled to support array-like inputs), excludes the true post-periods entirely (no
  peeking), and requires >=2 pre-fake periods (documented Note). Sweeps feasible dates
  by default; explicit dates are validated, de-duplicated, and canonicalized; empty
  explicit input raises. Statuses distinguish ran / infeasible / failed and the mixed
  all_dates_unusable case with n_failed / n_infeasible counts.

Both fail closed (non-converged treated fit, too-few donors/pre-periods, all-failed
refits). Wired into DiagnosticReport (_scm_native opt-in blocks with machine-readable
reason_code), BusinessReport, and practitioner_next_steps. Validation: deterministic
self-consistency (each diagnostic == a fresh fit on the equivalent sub-problem, 1e-7)
plus an R Synth drop-donor LOO golden on Basque; the custom-V solver's existing Basque
R-parity transitively anchors both. R Synth has no in-time/LOO function (documented).
Remaining ADH-2015 items (CV V-selection, W_reg extrapolation, sparse-SC) deferred in
TODO.md. Docs: REGISTRY §SyntheticControl, REPORTING.md, api/synthetic_control.rst,
LLM guides, README, CHANGELOG.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 1, 2026

Overall Assessment

✅ Looks good

Executive Summary

  • No P0/P1 issues found. The new SCM diagnostics are consistent with the library’s documented methodology contract; the ADH-2015 deviations I checked are explicitly recorded in REGISTRY/TODO.
  • DiagnosticReport / BusinessReport underreport mixed in_time_placebo() sweeps by dropping n_infeasible as soon as any placebo date succeeds.
  • Public docs still say leave-one-out drops “positively-weighted” donors even though the shipped rule is the documented reportable >1e-6 support.
  • Analytical inference remains correctly NaN-gated; I did not find a new inline t-stat/p-value anti-pattern in the changed SCM path.
  • I could not run pytest locally because the environment does not have it installed.

Methodology

  • Severity: P3. Impact: none for merge. ADH 2015 Section 4 describes the added diagnostics as re-estimating after dropping each donor with positive weight and re-estimating in-time placebos with the same validation technique and predictors lagged accordingly. The PR’s two implementation deviations I checked, TRUNCATE windowing and the >1e-6 reportable-weight floor, are explicitly documented in docs/methodology/REGISTRY.md:L1991-L2010, and the still-deferred ADH-2015 tail is tracked in TODO.md:L87. Concrete fix: none required.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

Tech Debt

  • Severity: P3. Impact: none for this PR. The remaining ADH-2015 items (V cross-validation, W^reg, sparse-SC) are already tracked in TODO.md:L87. Concrete fix: none in this PR.

Security

  • No findings.

Documentation/Tests

…odex review)

P2: _scm_native's in_time_placebo "ran" block now surfaces n_ran / n_infeasible
(not just n_dates / n_failed), so a partially-usable sweep (some dates ran, some
infeasible) is not summarized as full coverage. Regression test added. P3: aligned
the remaining "positively-weighted" copy (docs/api/synthetic_control.rst, CHANGELOG,
llms-full) to the documented "reportably-weighted (>1e-6)" contract, and refreshed
the _check_estimator_native SCM summary to mention the leave_one_out / in_time_placebo
blocks.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
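
The P2 fix in the commit above concerns how a sweep with mixed per-date outcomes is summarized. A minimal sketch of a summary that keeps partial coverage visible, assuming per-date statuses are plain strings: the `summarize_sweep` helper and the exact mapping to an overall status are hypothetical, while the `n_dates` / `n_ran` / `n_infeasible` / `n_failed` field names follow the commit message.

```python
from collections import Counter
from typing import Dict, Iterable, Union


def summarize_sweep(statuses: Iterable[str]) -> Dict[str, Union[str, int]]:
    """Collapse per-date statuses into one block without hiding partial coverage."""
    counts = Counter(statuses)
    unknown = set(counts) - {"ran", "infeasible", "failed"}
    if unknown:
        raise ValueError(f"unexpected statuses: {sorted(unknown)}")
    n_ran = counts["ran"]
    n_infeasible = counts["infeasible"]
    n_failed = counts["failed"]
    n_dates = n_ran + n_infeasible + n_failed
    if n_dates == 0:
        raise ValueError("no placebo dates to summarize")
    # Assumed mapping: any successful date makes the block "ran"; a mix of
    # failed and infeasible dates with no success is "all_dates_unusable".
    if n_ran > 0:
        overall = "ran"
    elif n_failed > 0 and n_infeasible > 0:
        overall = "all_dates_unusable"
    elif n_failed > 0:
        overall = "failed"
    else:
        overall = "infeasible"
    return {
        "status": overall,
        "n_dates": n_dates,
        "n_ran": n_ran,
        "n_infeasible": n_infeasible,
        "n_failed": n_failed,
    }


partial = summarize_sweep(["ran", "infeasible", "infeasible"])
unusable = summarize_sweep(["failed", "infeasible"])
```

Reporting `n_ran` and `n_infeasible` alongside `n_dates` lets a reader see that `partial` covers one of three dates, which is the under-reporting the review flagged.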
@github-actions

github-actions Bot commented Jun 1, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 6c49acb0ebbf047b9b5ccc4afac5f024cfaade49


Overall Assessment

✅ Looks good

Executive Summary

Methodology

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings. The prior reporting-layer concern is addressed by explicit reason_code handling for infeasible/no-success runs and by preserving partial in-time-placebo coverage in SCM-native diagnostics. diff_diff/diagnostic_report.py:L2358-L2450

Tech Debt

  • Severity: P3. Impact: none for merge. The PR reduces the SCM ADH-2015 gap and leaves the remaining tail explicitly tracked rather than implicit. Concrete fix: none in this PR. TODO.md:L87-L87

Security

  • No findings.

Documentation/Tests

@igerber igerber added the ready-for-ci label (Triggers CI test workflows) Jun 1, 2026
@igerber igerber merged commit 4d76ce3 into main Jun 1, 2026
52 of 71 checks passed
@igerber igerber deleted the feature/synthetic-control-loo-in-time branch June 1, 2026 12:21
