diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7bd213d8..789da6f4 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - **PowerAnalysis methodology-review-tracker promotion: In Progress → Complete, with a panel-variance correction (behavior change).** Closes the Bloom (1995) + Burlig, Preonas & Woerman (2020) source audits on the tracker (PR-A #506 added both paper reviews + under-review Notes; this PR validates the source against the code and reconciles the discrepancies). **Behavior change:** the analytical *panel* DiD variance was the Moulton design-effect factor `(1+(T−1)·rho)/T`, wrong two ways versus the source — wrong period-scaling (~4× too small at `rho=0`, `m=r=5` versus the iid DiD benchmark) and the **opposite `rho`-sign** (it *raised* the MDE as within-unit correlation grew). It is replaced by the within-unit equicorrelated special case of Burlig et al. Eq. 2, `Var(ATT) = sigma² · (1/n_T + 1/n_C) · (1/n_pre + 1/n_post) · (1 − rho)`, in which within-unit (serial) correlation *lowers* the MDE because the difference-in-differences cancels the shared within-unit component. So `PowerAnalysis.mde` / `power` / `sample_size` (and the `compute_*` wrappers) now return a **smaller** MDE / required N as `rho` rises for **all** designs; the 2×2 path matches Bloom's `2σ²` at the default `rho = 0` and is continuous with the panel form at `n_pre = n_post = 1`. New input validation, enforced for **all** designs *before* the 2×2-vs-panel router: `n_pre >= 1`, `n_post >= 1`, `rho ∈ [−1/(T−1), 1)` (`T = n_pre + n_post`), finite `sigma >= 0`, positive group counts, and `treat_frac ∈ (0, 1)` now raise `ValueError` (previously invalid two-period shapes and out-of-range `rho` fell through to `basic_did` silently). The `(1 − rho)` factor applies at `T = 2` too — the 2×2 path is Burlig's `m = r = 1` special case (footnote 11), so a nonzero `rho` is no longer silently ignored there, while `rho = 0` still recovers Bloom's `2σ²`. The MDE multiplier stays the **normal (z)** Bloom multiplier (a deliberate large-sample approximation to Burlig's t, documented as `**Deviation from R:**`) — unchanged. 
New `tests/test_methodology_power.py` (Bloom Table 1 multipliers; 2×2 + panel closed forms; a literal-equicorrelated Monte-Carlo validation of the panel variance; `sample_size`↔`mde` round-trip; input-guard + `rho`-at-`T=2` + `compute_*` wrapper validation; base-R `qnorm` parity at `benchmarks/data/r_power_golden.json`, generator `benchmarks/R/generate_power_golden.R`); the two `tests/test_power.py` ICC-direction tests were inverted to Burlig's sign. REGISTRY `## PowerAnalysis` equation block rewritten (z not t; corrected 2×2 / panel SE + sample-size; removed the cluster-`m` and inverted-`R²` terms that matched neither code nor source); `docs/references.rst` adds Frison & Pocock (1992) + McKenzie (2012) as the equicorrelated lineage; tutorial `06_power_analysis.ipynb` corrected. `METHODOLOGY_REVIEW.md` row promoted to **Complete** (`Last Review = 2026-05-31`); priority queue pruned; the PR-A under-review Notes removed across REGISTRY / `power.py` / `references.rst`.
 - **`WooldridgeDiD` outcome-fit hint:** `WooldridgeDiD(method="ols")` now emits a `UserWarning` when the outcome is binary (`{0, 1}`) or a non-negative integer count, noting that a matching nonlinear model (`method="logit"` / `method="poisson"`) is often the **more appropriate specification** for such outcomes. Following Wooldridge (2023): the nonlinear paths impose parallel trends on the link/index scale rather than in levels (level-PT is only valid for continuous/unbounded outcomes), and the paper's Section 5 simulations show the linear model both biased and less precise where the nonlinear mean holds. It is a **different identifying assumption** than linear OLS — which one fits depends on which parallel-trends restriction holds — so the warning frames it as a recommended comparison, not an automatic switch or free efficiency upgrade. OLS remains a valid QMLE for *any* response (Table 1). Always-on (suppress via `warnings.filterwarnings`); detection is high-signal (binary requires exactly `{0, 1}`; the count branch suggests Poisson — the natural unbounded-count model — for *any* non-negative integers with >2 distinct values, so bounded binomial / known-upper-bound integer outcomes are not separately distinguished from unbounded counts; fractional / continuous outcomes are not flagged).
+- **`SyntheticControl` leave-one-out + in-time placebo robustness diagnostics (ADH 2015 §4).** Two opt-in `SyntheticControlResults` methods, each a thin re-run of the validated solver (the analytical `se`/`t_stat`/`p_value`/`conf_int` stay NaN, and `is_significant` stays bound to that NaN `p_value`). **`leave_one_out()`** drops each reportably-weighted donor (weight above the 1e-6 floor, i.e. the donors in `donor_weights`) in turn and re-fits the treated unit against the reduced pool, returning a per-drop ATT / `delta_att` table (a `status="baseline"` row first, then one row per dropped donor sorted by `|delta_att|`; non-converged refits → `status="failed"` with NaN metrics); a large `delta_att` flags single-donor dependence (a single *dominant* donor is still dropped, with the others absorbing its mass, and its large `delta_att` is the intended signal). **`in_time_placebo()`** reassigns the intervention to an earlier pre-date `t_f`, re-fits using only pre-`t_f` information, and reports the placebo "effect" over the held-out window `[t_f, T0)`, which should be ~0 when there is no real pre-period effect (ADH 2015 Fig. 4). It sweeps every feasible interior pre-date by default (≥2 pre-fake + ≥1 post-fake); an explicit post-period / non-pre date raises, while a dimensionally-infeasible valid date yields a `status="infeasible"` row. **Windowing = TRUNCATE** (documented `**Note:**` in REGISTRY): predictor specs are re-cut to the pre-`t_f` window (pre-period-outcome predictors become the pre-`t_f` outcomes; covariate/special windows are intersected), a window lying entirely in the held-out region is **dropped** (surfaced in `n_dropped_specs` + an aggregated warning), and `custom_v` is subset in lockstep with the surviving specs; the true post-treatment periods are excluded from the placebo fit entirely (no peeking). Both fail closed on a non-converged treated fit (and `leave_one_out` on `<2` donors). 
New accessors `get_leave_one_out_df()` / `get_in_time_placebo_df()` (survive pickling) and long-form `get_leave_one_out_gaps()` / `get_in_time_placebo_gaps()` for the overlay/backdating plots (panel-derived, dropped on pickle). **Validation:** R `Synth` has no in-time/LOO function (verified against its full CRAN function index), so — beyond the solver's existing Basque R parity — the diagnostics are anchored by deterministic self-consistency tests proving each equals a from-scratch `synthetic_control()` fit on the equivalent sub-problem (reduced donor pool / backdated panel) to 1e-7. **Reporting-stack integration:** `_scm_native` surfaces opt-in `leave_one_out` + `in_time_placebo` blocks (`status="not_run"` stub until run), `BusinessReport` lifts them into the SCM native robustness block, and `practitioner_next_steps` emits both as steps (non-`STEPS` tags so a caller's `completed_steps` cannot suppress them). The remaining ADH-2015 items (CV `V`-selection, `W^reg` extrapolation diagnostic, sparse-SC) are tracked in `TODO.md`. Documented in `docs/methodology/REGISTRY.md` §SyntheticControl, `docs/methodology/REPORTING.md`, `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
 - **New tutorial: `docs/tutorials/24_staggered_vs_collapsed_power.ipynb` — "Staggered Rollout or a Simple 2×2? A Power-Analysis Decision Guide".** A practitioner walkthrough for geo experiments (framed on a 50-state staggered rollout) on when to reach for Callaway-Sant'Anna vs collapsing to a familiar pre/post 2×2. Shows, with live paired Monte Carlo on `generate_staggered_data`, that the collapsed 2×2 silently targets a *diluted* estimand (reports ~60–94% of the true effect-on-treated as the rollout staggers, with near-zero CI coverage of the truth under a slow rollout), and that CS's minimum-detectable-lift penalty is a *fast-rollout* phenomenon that shrinks to parity as the rollout becomes more staggered. Fully self-contained (runs live, no committed data files); ends with a CS-vs-2×2 decision guide.
 - **`SyntheticControl` in-space placebo permutation inference + reporting-stack integration (ADH 2010 §2.4).** New `SyntheticControlResults.in_space_placebo()` provides the significance test classic SCM lacks an analytical SE for: it reassigns treatment to each donor, refits a synthetic control for that pseudo-treated donor against the **other `J−1` donors** (the real treated unit is excluded from every placebo pool — its post-period is treatment-contaminated; matches `SCtools::generate.placebos`), and ranks the treated unit's post/pre **RMSPE ratio** among the `J+1` units. New fields `placebo_p_value` (`= rank/(n_placebos+1)`, an upper-tail rank test on the unsigned RMSPE ratio — direction-agnostic, so it detects an effect of *either* sign rather than a signed/one-directional hypothesis; ties counted via `≥`), `rmspe_ratio` (the treated statistic, set at fit), and `n_placebos`/`n_failed` (effective reference-set sizes; non-converged placebos are excluded from BOTH numerator and denominator, never penalized into the rank). `placebo_p_value` is a **separate field** from the (always-NaN) `p_value` — it is a permutation p-value with no SE/t-stat and does not flow through `safe_inference`; `is_significant` stays bound to `p_value`. Edge cases fail closed: scale-aware RMSPE-ratio floor (a perfect pre-fit gives a finite ratio, not `inf`), `J<2` → NaN+warn, `J==2` → degenerate+coarse warn, deterministic given `seed`. New `get_placebo_df()` returns the per-unit RMSPE-ratio summary table (incl. the treated row and any failed donors) used for the rank. The design keeps the placebo *compute* opt-in — the per-donor refit loop runs only on the explicit `in_space_placebo()` call. To support that opt-in call, every fit retains a `_SyntheticControlFitSnapshot` of the pivoted panel (memory O(units × periods × predictor-vars), like `SyntheticDiD`'s snapshot for `in_time_placebo`; excluded from pickling). A compact/lazy snapshot representation is tracked as a follow-up in `TODO.md`. 
**Reporting-stack integration:** `SyntheticControlResults` is now routed through `DiagnosticReport` (fit-based `scm_fit` parallel-trends analogue → verdict `design_enforced_pt` reading `pre_rmspe`; `_scm_native` surfaces `pre_rmspe` + donor-weight concentration + the placebo p-value when already computed — never triggering the refit loop implicitly), `practitioner_next_steps` (`_handle_synthetic_control` with the placebo as the headline significance step), and `BusinessReport` (fit-based assumption block, ADH 2010 attribution, robustness via `estimator_native_diagnostics`; HonestDiD passthrough rejected like SDiD/TROP). Also fixes a latent BR bug where the headline `is_significant` was a non-JSON-serializable numpy `bool_` when `p_value` is a numpy `NaN`. Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (new `**Note:**` labels for the donor-pool construction, failure handling, RMSPE-ratio floor, and the non-analytical-p-value split), `docs/methodology/REPORTING.md`, `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
 - **New estimator: `SyntheticControl` — classic Synthetic Control Method (Abadie, Diamond & Hainmueller 2010; Abadie & Gardeazabal 2003).** Standalone estimator (`diff_diff/synthetic_control.py`) + `SyntheticControlResults` (`diff_diff/synthetic_control_results.py`) + `synthetic_control()` convenience function, exported from `diff_diff`. Builds a single treated unit's counterfactual as a convex combination of never-treated donor units — **donor (unit) weights only**, no time weights or ridge, distinct from `SyntheticDiD`. The inner simplex-constrained weighted-LS solve `W*(V)` reuses `utils._sc_weight_fw` (folding `V^½` into the predictor matrix, `intercept=False`, `zeta=0`); the diagonal predictor-importance matrix `V` is selected data-driven by minimizing pre-period outcome MSPE (`v_method="nested"`, softmax-on-simplex multistart Nelder-Mead + Powell polish) or supplied by the user (`v_method="custom"`). Predictors are built from `predictors`/`predictor_window`/`predictors_op`, `special_predictors`, and per-period outcome lags (`pre_period_outcomes`), in the R `Synth::dataprep` row order; per-row standardization (SD over donors+treated, ddof=1) matches the R `Synth::synth` source. Reports the gap path (`α̂_1t = Y_1t − Σ_j w_j Y_jt`), `att` (mean post-period gap), `pre_rmspe`, donor weights, `v_weights`, and a predictor-balance table. **No analytical standard error** — `se`/`t_stat`/`p_value`/`conf_int` are NaN; significance comes from in-space placebo permutation inference via `in_space_placebo()` (see the dedicated entry below). 
Ten validation gates baked in: predictor-period leakage, absorbing post-period suffix + no-anticipation cross-check against the treatment column, post-period canonicalization, donor-pool filtering before period derivation, empty-window rejection, poor-pre-fit `UserWarning` (RMSPE > SD of treated pre-outcomes), duplicate-predictor-label rejection, inner-solve non-convergence warning, order-independent gap-path rebuild, and the `standardize="none"` deviation; plus fail-closed `custom_v` cross-field rules and degenerate single-donor / single-pre-period handling. **R-`Synth` parity** (`tests/test_methodology_synthetic_control.py`, fixtures generated by `benchmarks/R/generate_synth_basque_golden.R` into `tests/data/`): two-tier on the Basque Country study — Tier-1 feeds R's `solution.v` via `custom_v` and reproduces the published donor weights (region 10 Cataluña 0.851 + region 14 Madrid 0.149) to `atol=1e-3` deterministically; Tier-2 (`@pytest.mark.slow`) checks the data-driven nested fit lands in a tolerance band (the nested `V` legitimately differs because the outer objective uses all pre periods, not R's `time.optimize.ssr` window). Documented in `docs/methodology/REGISTRY.md` §SyntheticControl (with `**Deviation from R:** standardize="none"` and `**Note:**` labels for the standardization formula, objective window, softmax `V` parametrization, and 1×SD poor-fit threshold), `docs/api/synthetic_control.rst`, the LLM guides, and `README.md`.
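Reviewer sketch for the PowerAnalysis entry above: a minimal, standalone reading of the stated panel variance `Var(ATT) = sigma² · (1/n_T + 1/n_C) · (1/n_pre + 1/n_post) · (1 − rho)` combined with a normal (z) multiplier. The function names `panel_did_variance` and `mde_from_variance` are hypothetical and are not the library's API; the two-sided `alpha` convention is an assumption, and the guards only paraphrase the validation rules listed in the entry.

```python
import math
from statistics import NormalDist


def panel_did_variance(sigma, n_treated, n_control, n_pre, n_post, rho=0.0):
    """Equicorrelated panel DiD variance, as written in the changelog entry."""
    if n_pre < 1 or n_post < 1:
        raise ValueError("n_pre and n_post must both be >= 1")
    if n_treated <= 0 or n_control <= 0:
        raise ValueError("group counts must be positive")
    if not math.isfinite(sigma) or sigma < 0:
        raise ValueError("sigma must be finite and >= 0")
    total_periods = n_pre + n_post  # T in the entry; always >= 2 here
    if not (-1.0 / (total_periods - 1) <= rho < 1.0):
        raise ValueError("rho must lie in [-1/(T-1), 1)")
    return (
        sigma**2
        * (1.0 / n_treated + 1.0 / n_control)
        * (1.0 / n_pre + 1.0 / n_post)
        * (1.0 - rho)
    )


def mde_from_variance(variance, alpha=0.05, power=0.80):
    """MDE = (z_{1-alpha/2} + z_{power}) * SE; two-sided alpha is assumed."""
    z = NormalDist()
    multiplier = z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)
    return multiplier * math.sqrt(variance)


# 2x2 case (n_pre = n_post = 1, rho = 0): the period factor is 2, i.e. 2*sigma^2.
var_2x2 = panel_did_variance(1.0, 50, 50, 1, 1, rho=0.0)

# Panel case: a higher within-unit correlation gives a smaller MDE.
mde_rho_low = mde_from_variance(panel_did_variance(1.0, 50, 50, 5, 5, rho=0.0))
mde_rho_high = mde_from_variance(panel_did_variance(1.0, 50, 50, 5, 5, rho=0.5))
```

Under these assumptions `mde_rho_high` is below `mde_rho_low`, which is the sign the entry describes: the difference-in-differences cancels the shared within-unit component, so serial correlation lowers the MDE.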
diff --git a/README.md b/README.md
index 1e738591..8330f441 100644
--- a/README.md
+++ b/README.md
@@ -108,7 +108,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
 - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance
 - [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC — `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=<col>`); `SurveyDesign.subpopulation()` preserves full-design `n_psu` / `df_survey` via zero-padded scores (Wave E.3, R `svyrecvar(subset())` form)
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units
-- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html) - Abadie, Diamond & Hainmueller (2010) classic synthetic control for a single treated unit (donor-weight counterfactual, nested/custom V; in-space placebo permutation inference via `in_space_placebo()`)
+- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html) - Abadie, Diamond & Hainmueller (2010) classic synthetic control for a single treated unit (donor-weight counterfactual, nested/custom V; in-space placebo permutation inference via `in_space_placebo()`, plus ADH-2015 `leave_one_out()` + `in_time_placebo()` robustness)
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
 - [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html) - de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲ = 0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲ > 0`, continuous-near-d̲ or mass-point), with a multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release - repeated cross-sections rejected by the validator. Alias `HAD`.
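Reviewer sketch for the SyntheticControl entry above: the in-space placebo p-value is described in this PR as `rank/(n_placebos+1)`, an upper-tail rank test on the unsigned post/pre RMSPE ratio with ties counted via `≥` and non-converged placebos excluded from both numerator and denominator. The helper below is a hypothetical illustration of that arithmetic only; representing a failed placebo as `NaN` is an assumption, and the ratios are made-up toy numbers.

```python
import math


def placebo_p_value(treated_ratio, donor_ratios):
    """Upper-tail rank p-value on unsigned RMSPE ratios.

    Non-finite donor ratios stand in for non-converged placebos (an
    assumption of this sketch) and are dropped from the reference set.
    """
    usable = [r for r in donor_ratios if math.isfinite(r)]
    n_placebos = len(usable)
    n_failed = len(donor_ratios) - n_placebos
    # The treated unit counts itself once; ties with donors count via >=.
    rank = 1 + sum(1 for r in usable if r >= treated_ratio)
    return rank / (n_placebos + 1), n_placebos, n_failed


# Toy ratios: one donor ties the treated unit, one placebo failed.
p_value, n_placebos, n_failed = placebo_p_value(
    9.0, [1.2, 0.8, 3.5, float("nan"), 9.0]
)
# Four usable placebos, rank 2 (the treated unit plus the tied donor) -> 2/5.
```

With `J` usable donors the smallest attainable value is `1/(J+1)`, which is why the changelog warns that small donor pools give a coarse test.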
diff --git a/TODO.md b/TODO.md
index 17fed4c2..a1b6f545 100644
--- a/TODO.md
+++ b/TODO.md
@@ -84,7 +84,7 @@ Deferred items from PR reviews that were not addressed before merge.
 | ImputationDiD dense `(A0'A0).toarray()` scales O((U+T+K)^2), OOM risk on large panels | `imputation.py` | #141 | Medium (deferred — only triggers when sparse solver fails) |
 | Multi-absorb weighted demeaning needs iterative alternating projections for N > 1 absorbed FE with survey weights; unweighted multi-absorb also uses single-pass (pre-existing, exact only for balanced panels) | `estimators.py` | #218 | Medium |
 | Survey design resolution/collapse patterns are inconsistent across panel estimators — ContinuousDiD rebuilds unit-level design in SE code, EfficientDiD builds once in fit(), StackedDiD re-resolves on stacked data; extract shared helpers for panel-to-unit collapse, post-filter re-resolution, and metadata recomputation | `continuous_did.py`, `efficient_did.py`, `stacked_did.py` | #226 | Low |
-| SyntheticControl: in-time placebo + leave-one-out donor-robustness diagnostics are not implemented (ADH 2015, not the ADH 2010 scope of the current estimator), so `_scm_native` surfaces only pre-fit + in-space placebo. The practitioner / DiagnosticReport / BusinessReport routing and the in-space placebo permutation layer landed in PR-2; this remaining row covers adding the two ADH-2015 diagnostics (and surfacing them under `estimator_native_diagnostics`) in a later 2015-sourced PR. | `synthetic_control.py`, `diagnostic_report.py` | ADH-2015 follow-up | Low |
+| SyntheticControl: the remaining ADH-2015 §4 items are out of scope of the leave-one-out + in-time-placebo PR — out-of-sample cross-validation `V`-selection (training/validation split), the regression-weight `W^reg = X_0'(X_0 X_0')^{-1} X_1` extrapolation diagnostic (flag implied OLS weights outside `[0,1]`), and sparse-SC subset search (`l < J`, holding `V` fixed). Leave-one-out (`leave_one_out()`) and the in-time placebo (`in_time_placebo()`) landed and are surfaced under `estimator_native_diagnostics`; these three are the deferred tail. | `synthetic_control.py`, `synthetic_control_results.py` | ADH-2015 follow-up | Low |
 | ContinuousDiD deferred CGBS 2024 extensions: (a) `covariates=` kwarg not implemented (matches R `contdid` v0.1.0); (b) discrete-treatment saturated regression deferred (integer-valued dose currently warned, not routed to per-level coefficients); (c) lowest-dose-as-control per CGBS 2024 Remark 3.1 (when `P(D=0) = 0`) not implemented — estimator requires never-treated controls. REGISTRY `## ContinuousDiD` → Implementation Checklist marks these as deferred `[ ]` items. | `diff_diff/continuous_did.py` | — | Low |
 | Survey-weighted Silverman bandwidth in EfficientDiD conditional Omega* — `_silverman_bandwidth()` uses unweighted mean/std for bandwidth selection; survey-weighted statistics would better reflect the population distribution but is a second-order refinement | `efficient_did_covariates.py` | — | Low |
 | TROP: extend Wave 4's `_setup_trop_data` helper to also cover the duplicated bootstrap resampling loop in `_bootstrap_variance` / `_bootstrap_variance_global` (~40 LoC dedup; mirrors the data-setup helper pattern with a `fit_callable` parameter for the per-draw refit step). | `trop_local.py`, `trop_global.py` | follow-up | Low |
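Reviewer sketch for the deferred `W^reg` row above: a pure-Python illustration of the regression-weight formula `W^reg = X_0'(X_0 X_0')^{-1} X_1` and of flagging implied weights outside `[0,1]`. Nothing here exists in the library; `solve_linear`, `regression_weights`, and `extrapolating_donors` are hypothetical names, `X_0` is taken to be a `k × J` list of rows, the first row is assumed to be a constant, and the data are toy values.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X_0 X_0' is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [a - factor * b for a, b in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def regression_weights(x0, x1):
    """W^reg = X_0' (X_0 X_0')^{-1} X_1 for a k x J donor matrix X_0."""
    k, n_donors = len(x0), len(x0[0])
    gram = [
        [sum(x0[a][j] * x0[b][j] for j in range(n_donors)) for b in range(k)]
        for a in range(k)
    ]
    coef = solve_linear(gram, x1)  # (X_0 X_0')^{-1} X_1
    return [sum(x0[a][j] * coef[a] for a in range(k)) for j in range(n_donors)]


def extrapolating_donors(weights, tol=1e-9):
    """Indices of donors whose implied weight falls outside [0, 1]."""
    return [j for j, w in enumerate(weights) if w < -tol or w > 1.0 + tol]


# Toy problem: a constant row plus one predictor; the treated value (4.0)
# lies outside the donor range [1, 3], so the fit must extrapolate.
x0 = [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]]
x1 = [1.0, 4.0]
weights = regression_weights(x0, x1)
flagged = extrapolating_donors(weights)
```

In this toy case the weights still sum to one because of the constant row, but two of them leave `[0,1]`, which is the condition the deferred diagnostic would flag.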
diff --git a/benchmarks/R/generate_synth_basque_golden.R b/benchmarks/R/generate_synth_basque_golden.R
index 3f524ff1..8bd8473b 100644
--- a/benchmarks/R/generate_synth_basque_golden.R
+++ b/benchmarks/R/generate_synth_basque_golden.R
@@ -79,6 +79,37 @@ synthetic_path <- as.numeric(dp$Y0plot %*% so$solution.w)
 treated_path <- as.numeric(dp$Y1plot)
 years <- as.integer(rownames(dp$Y1plot))
 
+# --- Leave-one-out golden (ADH 2015 §4 donor robustness) ---------------------
+# Drop the highest-weight donor (region 10, Cataluna) and re-fit with the
+# ORIGINAL solution.v held fixed (custom.v), so the reduced-pool W-solve is
+# deterministic and directly comparable to SyntheticControlResults.leave_one_out()
+# on a v_method="custom" fit (which likewise reuses the original custom_v on the
+# donor pool minus the dropped unit — specs/V are unchanged, only the donors shrink).
+loo_drop <- 10L
+controls_loo <- controls[controls != loo_drop]
+invisible(capture.output({
+  dp_loo <- dataprep(
+    foo = basque,
+    predictors = predictors,
+    predictors.op = "mean",
+    time.predictors.prior = 1964:1969,
+    special.predictors = special,
+    dependent = "gdpcap",
+    unit.variable = "regionno",
+    unit.names.variable = "regionname",
+    time.variable = "year",
+    treatment.identifier = 17,
+    controls.identifier = controls_loo,
+    time.optimize.ssr = 1960:1969,
+    time.plot = 1955:1997
+  )
+  so_loo <- synth(dp_loo, custom.v = as.numeric(so$solution.v))
+}))
+w_loo <- as.numeric(so_loo$solution.w)
+synthetic_path_loo <- as.numeric(dp_loo$Y0plot %*% so_loo$solution.w)
+gap_loo <- as.numeric(dp_loo$Y1plot) - synthetic_path_loo
+att_loo <- mean(gap_loo[years >= 1970])  # mean post-period gap (treatment year 1970)
+
 golden <- list(
   config = list(
     treated_regionno = 17,
@@ -104,7 +135,13 @@ golden <- list(
   years = years,
   treated_path = treated_path,
   synthetic_path = synthetic_path,
-  gap = treated_path - synthetic_path
+  gap = treated_path - synthetic_path,
+  leave_one_out = list(
+    dropped_regionno = loo_drop,
+    solution_w = as.list(setNames(w_loo, colnames(dp_loo$X0))),
+    att = att_loo,
+    gap = gap_loo
+  )
 )
 
 dir.create("tests/data", showWarnings = FALSE, recursive = TRUE)
diff --git a/diff_diff/business_report.py b/diff_diff/business_report.py
index f6edb275..b3160a6e 100644
--- a/diff_diff/business_report.py
+++ b/diff_diff/business_report.py
@@ -1019,6 +1019,9 @@ def _lift_robustness(dr: Optional[Dict[str, Any]]) -> Dict[str, Any]:
         native_block["pre_rmspe"] = native.get("pre_rmspe")
         native_block["weight_concentration"] = native.get("weight_concentration")
         native_block["in_space_placebo"] = native.get("in_space_placebo")
+        # ADH-2015 robustness diagnostics (opt-in; "not_run" stub until run).
+        native_block["leave_one_out"] = native.get("leave_one_out")
+        native_block["in_time_placebo"] = native.get("in_time_placebo")
     return {
         "bacon": {
             "status": bacon.get("status"),
diff --git a/diff_diff/diagnostic_report.py b/diff_diff/diagnostic_report.py
index f474e91b..fd65ea61 100644
--- a/diff_diff/diagnostic_report.py
+++ b/diff_diff/diagnostic_report.py
@@ -2152,8 +2152,10 @@ def _check_estimator_native(self) -> Dict[str, Any]:
         selected ``lambda_*``).
 
         SyntheticControl: pre-treatment fit (``pre_rmspe``), donor-weight
-        concentration, and — when already computed — the in-space placebo
-        permutation p-value (``in_space_placebo``).
+        concentration, and — each surfaced only when already computed — the
+        in-space placebo permutation p-value (``in_space_placebo``), the ADH-2015
+        leave-one-out donor robustness (``leave_one_out``), and the in-time
+        backdating placebo (``in_time_placebo``).
         """
         r = self._results
         name = type(r).__name__
@@ -2251,8 +2253,11 @@ def _scm_native(self, r: Any) -> Dict[str, Any]:
         DR never triggers it implicitly, because it refits one synthetic control
         per donor (potentially many nested V searches) and the placebo layer is
         opt-in by design. (This differs from SDiD's cheaper in-time-placebo sweep,
-        which ``_sdid_native`` runs inline.) Only the in-space placebo is exposed;
-        in-time placebo and leave-one-out are ADH 2015 (not implemented).
+        which ``_sdid_native`` runs inline.) The ADH-2015 §4 diagnostics —
+        leave-one-out donor robustness (``leave_one_out()``) and the in-time
+        (backdating) placebo (``in_time_placebo()``) — are surfaced the same way:
+        opt-in, reported only once the user has run them (each refits the synthetic
+        control one or more times), else a ``status="not_run"`` stub.
         """
         out: Dict[str, Any] = {"status": "ran", "estimator": "SyntheticControl"}
         out["pre_rmspe"] = _to_python_float(getattr(r, "pre_rmspe", None))
@@ -2327,6 +2332,132 @@ def _scm_native(self, r: Any) -> Dict[str, Any]:
                     "per donor)."
                 ),
             }
+
+        # Leave-one-out donor robustness (ADH 2015 §4): opt-in, surfaced once run.
+        if getattr(r, "_loo_df", None) is not None:
+            loo_status = getattr(r, "_loo_status", None)
+            if loo_status == "ran":
+                att_range = getattr(r, "_loo_att_range", None)
+                out["leave_one_out"] = {
+                    "status": "ran",
+                    # Headline single-donor-dependence metric: the largest baseline-
+                    # relative swing (max |delta_att|). Preferred over att_range, which
+                    # can look narrow even when every drop shifts the ATT far from the
+                    # full-fit baseline in the same direction.
+                    "max_abs_delta_att": _to_python_float(
+                        getattr(r, "_loo_max_abs_delta_att", None)
+                    ),
+                    "att_range": (
+                        [_to_python_float(att_range[0]), _to_python_float(att_range[1])]
+                        if att_range is not None
+                        else None
+                    ),
+                    "n_failed": _to_python_scalar(getattr(r, "_loo_n_failed", None)),
+                }
+            else:
+                _loo_reasons = {
+                    "treated_fit_nonconverged": (
+                        "leave_one_out() was run but the treated unit's own SCM fit "
+                        "did not converge at fit time, so the baseline ATT is not a "
+                        "valid reference for the leave-one-out deltas."
+                    ),
+                    "too_few_donors": (
+                        "leave_one_out() was run but fewer than 2 donors are available "
+                        "(dropping one must leave a non-empty pool)."
+                    ),
+                    "all_refits_failed": (
+                        "leave_one_out() was run but every donor-drop refit failed to "
+                        "converge, so no valid leave-one-out estimate was produced "
+                        "(see the status='failed' rows); raise n_starts or loosen the "
+                        "optimizer tolerances."
+                    ),
+                }
+                out["leave_one_out"] = {
+                    "status": "infeasible",
+                    # Machine-readable code so consumers can distinguish a numerical
+                    # convergence failure ("all_refits_failed") from structural
+                    # infeasibility ("too_few_donors") without parsing `reason`.
+                    "reason_code": loo_status,
+                    "reason": _loo_reasons.get(
+                        loo_status, "leave_one_out() produced no valid refits."
+                    ),
+                }
+        else:
+            out["leave_one_out"] = {
+                "status": "not_run",
+                "reason": (
+                    "Call results.leave_one_out() to run leave-one-out donor "
+                    "robustness (opt-in; refits once per reportably-weighted donor)."
+                ),
+            }
+
+        # In-time (backdating) placebo (ADH 2015 §4): opt-in, surfaced once run.
+        if getattr(r, "_in_time_df", None) is not None:
+            in_time_status = getattr(r, "_in_time_status", None)
+            if in_time_status == "ran":
+                itp = r._in_time_df
+                ran = itp[itp["status"] == "ran"] if "status" in itp else itp
+                max_abs_att = float(ran["placebo_att"].abs().max()) if len(ran) else None
+                out["in_time_placebo"] = {
+                    "status": "ran",
+                    # Full coverage breakdown so a partially-usable sweep is not
+                    # overstated: n_dates is the requested grid; n_ran are the usable
+                    # placebos; n_failed / n_infeasible are the dropped remainder.
+                    "n_dates": _to_python_scalar(int(len(itp))),
+                    "n_ran": _to_python_scalar(int(len(ran))),
+                    "n_failed": _to_python_scalar(getattr(r, "_in_time_n_failed", None)),
+                    "n_infeasible": _to_python_scalar(getattr(r, "_in_time_n_infeasible", None)),
+                    "max_abs_placebo_att": _to_python_float(max_abs_att),
+                }
+            else:
+                _it_reasons = {
+                    "treated_fit_nonconverged": (
+                        "in_time_placebo() was run but the treated unit's own SCM fit "
+                        "did not converge at fit time."
+                    ),
+                    "too_few_pre_periods": (
+                        "in_time_placebo() was run but there are too few pre-treatment "
+                        "periods for any feasible placebo date (need >=3)."
+                    ),
+                    "all_dates_infeasible": (
+                        "in_time_placebo() was run but every placebo date was "
+                        "infeasible (no pre-fake period, all predictors dropped, or "
+                        "the supplied custom_v had zero mass on the surviving "
+                        "predictors after truncation)."
+                    ),
+                    "all_dates_failed": (
+                        "in_time_placebo() was run but every placebo refit failed to "
+                        "converge (none was dimensionally infeasible); raise n_starts "
+                        "or loosen the optimizer tolerances."
+                    ),
+                    "all_dates_unusable": (
+                        "in_time_placebo() was run but no placebo date produced a usable "
+                        "result: some refits failed to converge AND some dates were "
+                        "dimensionally infeasible (see n_failed / n_infeasible)."
+                    ),
+                }
+                out["in_time_placebo"] = {
+                    "status": "infeasible",
+                    # Machine-readable code distinguishing a numerical convergence
+                    # failure ("all_dates_failed") from structural infeasibility
+                    # ("all_dates_infeasible" / "too_few_pre_periods") or a mix
+                    # ("all_dates_unusable"), without parsing `reason`. The n_failed /
+                    # n_infeasible counts give the exact breakdown.
+                    "reason_code": in_time_status,
+                    "n_failed": _to_python_scalar(getattr(r, "_in_time_n_failed", None)),
+                    "n_infeasible": _to_python_scalar(getattr(r, "_in_time_n_infeasible", None)),
+                    "reason": _it_reasons.get(
+                        in_time_status, "in_time_placebo() produced no valid refits."
+                    ),
+                }
+        else:
+            out["in_time_placebo"] = {
+                "status": "not_run",
+                "reason": (
+                    "Call results.in_time_placebo() to run the in-time (backdating) "
+                    "placebo (opt-in; refits per backdated date)."
+                ),
+            }
         return out
 
     # -- Heterogeneity helpers --------------------------------------------
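Reviewer note: the `reason_code` / `n_failed` / `n_infeasible` fields added in the hunk above are meant to be branched on without parsing `reason`. A minimal consumer-side sketch follows, assuming only the dict keys and code strings visible in that hunk. The function `next_action` and the labels it returns are hypothetical and are not part of the library.

```python
from typing import Any, Dict


def next_action(block: Dict[str, Any]) -> str:
    """Map an in_time_placebo report block to a follow-up label (illustrative only)."""
    status = block.get("status")
    if status == "not_run":
        return "run_placebo"
    if status != "infeasible":
        return "inspect_results"
    code = block.get("reason_code")
    if code == "all_dates_failed":
        # Numerical failure: every refit failed to converge.
        return "retune_optimizer"
    if code in ("all_dates_infeasible", "too_few_pre_periods"):
        # Structural problem: no placebo date can be fitted on this panel.
        return "change_design"
    if code == "all_dates_unusable":
        # Mixed case: n_failed / n_infeasible carry the exact breakdown.
        return "retune_and_redesign"
    # e.g. "treated_fit_nonconverged": the baseline fit itself needs attention.
    return "refit_baseline"


example = {
    "status": "infeasible",
    "reason_code": "all_dates_unusable",
    "n_failed": 2,
    "n_infeasible": 3,
    "reason": "...",
}
print(next_action(example))
```

Keying on `reason_code` keeps downstream tooling stable if the wording of `reason` changes.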
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index c0e2caf9..ff1b1ad2 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -616,7 +616,7 @@ scm.fit(
 ) -> SyntheticControlResults
 ```
 
-**Inference:** NONE analytical — `se`/`t_stat`/`p_value`/`conf_int` are always NaN. `att` is the mean post-period gap. Significance via in-space placebo permutation inference: `results.in_space_placebo()` reassigns treatment to each donor, refits against the other J-1 donors (the real treated unit is excluded from every placebo pool), and sets `placebo_p_value = rank/(n_placebos+1)` from the post/pre RMSPE-ratio. The permutation `placebo_p_value` is a SEPARATE field from the (NaN) `p_value`; `is_significant` stays bound to `p_value`. Predictor periods must lie within the pre window; `post_periods` must be a contiguous suffix cross-checked against `D` (no anticipation).
+**Inference:** NONE analytical — `se`/`t_stat`/`p_value`/`conf_int` are always NaN. `att` is the mean post-period gap. Significance via in-space placebo permutation inference: `results.in_space_placebo()` reassigns treatment to each donor, refits against the other J-1 donors (the real treated unit is excluded from every placebo pool), and sets `placebo_p_value = rank/(n_placebos+1)` from the post/pre RMSPE-ratio. The permutation `placebo_p_value` is a SEPARATE field from the (NaN) `p_value`; `is_significant` stays bound to `p_value`. **ADH-2015 §4 robustness (opt-in, analytical inference unchanged):** `results.leave_one_out()` drops each reportably-weighted donor (weight > 1e-6) and re-fits (per-drop ATT/`delta_att` — large `delta_att` ⇒ single-donor dependence); `results.in_time_placebo()` backdates the intervention and checks for a spurious pre-period gap (TRUNCATE windowing — predictor windows in the held-out region are dropped). Predictor periods must lie within the pre window; `post_periods` must be a contiguous suffix cross-checked against `D` (no anticipation).
 
 **Usage:**
 
@@ -1298,7 +1298,7 @@ Returned by `SyntheticControl.fit()`.
 | `pre_periods`, `post_periods` | `list` | Calendar-sorted periods |
 | `v_method`, `standardize` | `str` | Echoed configuration |
 
-**Methods:** `in_space_placebo()` (opt-in permutation inference; refits one synthetic control per donor), `get_placebo_df()` (per-unit RMSPE-ratio table incl. the treated row), `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`, `get_gap_df()`, `get_weights_df()`
+**Methods:** `in_space_placebo()` (opt-in permutation inference; refits one synthetic control per donor), `get_placebo_df()` (per-unit RMSPE-ratio table incl. the treated row), `leave_one_out()` (ADH-2015 §4 donor robustness; drops each reportably-weighted donor (weight > 1e-6) → per-drop ATT/`delta_att` table) + `get_leave_one_out_df()`/`get_leave_one_out_gaps()`, `in_time_placebo()` (ADH-2015 §4 backdating placebo; reassigns the intervention earlier, TRUNCATE windowing, placebo ATT ~0 if no real pre-effect) + `get_in_time_placebo_df()`/`get_in_time_placebo_gaps()`, `summary()`, `print_summary()`, `to_dict()`, `to_dataframe()`, `get_gap_df()`, `get_weights_df()`
 
 ### TripleDifferenceResults
 
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index 5261b046..7fb2922b 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -60,7 +60,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
 - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance
 - [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=<col>` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2 cross-sectional `conley_lag_cutoff=0`) extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC (Newey-West 1987 separable form; `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=<col>`), both composed with the Wave D Gardner GMM correction; `SurveyDesign.subpopulation()` preserves full-design `n_psu` / `df_survey` via zero-padded scores at the meat-helper boundary (Wave E.3, R `svyrecvar(subset())` form) (replicate weights queued as follow-up)
 - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units
-- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html): Abadie, Diamond & Hainmueller (2010) classic synthetic control for ONE treated unit — donor-weight counterfactual, nested or custom predictor-importance V, gap path + pre-RMSPE; no analytical SE (inference fields NaN), significance via in-space placebo permutation inference (`in_space_placebo()`, post/pre RMSPE-ratio, p = rank/(n_placebos+1))
+- [SyntheticControl](https://diff-diff.readthedocs.io/en/stable/api/synthetic_control.html): Abadie, Diamond & Hainmueller (2010) classic synthetic control for ONE treated unit — donor-weight counterfactual, nested or custom predictor-importance V, gap path + pre-RMSPE; no analytical SE (inference fields NaN), significance via in-space placebo permutation inference (`in_space_placebo()`, post/pre RMSPE-ratio, p = rank/(n_placebos+1)); ADH-2015 §4 robustness: `leave_one_out()` donor-robustness + `in_time_placebo()` backdating placebo
 - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
 - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
 - [HeterogeneousAdoptionDiD](https://diff-diff.readthedocs.io/en/stable/api/had.html): de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) for designs where **no unit remains untreated**; local-linear estimator at the dose support boundary returning Weighted Average Slope (WAS) on Design 1' (`d̲=0` / QUG) or `WAS_{d̲}` on Design 1 (`d̲>0`, continuous-near-d̲ or mass-point), with multi-period event-study extension (last-treatment cohort, pointwise CIs). **Panel-only** in this release (repeated cross-sections rejected by the validator). Alias `HAD`.
diff --git a/diff_diff/practitioner.py b/diff_diff/practitioner.py
index b0d6811f..3a636e3c 100644
--- a/diff_diff/practitioner.py
+++ b/diff_diff/practitioner.py
@@ -718,6 +718,42 @@ def _handle_synthetic_control(results: Any):
             priority="medium",
             step_name="estimator_selection",
         ),
+        _step(
+            baker_step=6,
+            label="Leave-one-out donor robustness (ADH 2015)",
+            why=(
+                "Re-fit dropping each reportably-weighted donor (weight above the 1e-6 "
+                "floor) in turn to confirm the "
+                "estimate is not driven by a single donor (Abadie-Diamond-Hainmueller "
+                "2015, Section 4); a large delta_att when one donor is removed flags "
+                "single-donor dependence."
+            ),
+            code=(
+                "loo_df = results.leave_one_out()\n"
+                "print(loo_df)  # baseline + per-dropped-donor ATT and delta_att"
+            ),
+            priority="medium",
+            # Not a standard STEPS tag, so a caller's completed_steps (validated
+            # against STEPS) can never auto-suppress this opt-in recommendation.
+            step_name="loo_jackknife",
+        ),
+        _step(
+            baker_step=6,
+            label="In-time (backdating) placebo (ADH 2015)",
+            why=(
+                "Reassign the intervention to an earlier pre-period and confirm no "
+                "spurious gap appears before the true treatment date (Abadie-Diamond-"
+                "Hainmueller 2015, Section 4, Figure 4)."
+            ),
+            code=(
+                "itp_df = results.in_time_placebo()\n"
+                "print(itp_df)  # per-backdated-date placebo ATT (should be ~0)"
+            ),
+            priority="medium",
+            # Non-standard tag (not in STEPS) -> never auto-suppressed; deliberately
+            # NOT "sensitivity" (a caller could mark that done and drop this step).
+            step_name="in_time_placebo",
+        ),
         _robustness_compare_step("SyntheticDiD or CS"),
     ]
     warnings = _check_nan_att(results)
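Reviewer note: the leave-one-out step above flags single-donor dependence through a large `delta_att`. The sketch below shows, on made-up numbers, why a baseline-relative maximum is more informative than the raw ATT range. `summarize_loo` is a hypothetical helper written in plain Python. It does not call the library and only mirrors the `att_loo - full_att` definition given in the `leave_one_out()` docstring.

```python
from typing import Dict, Optional


def summarize_loo(baseline_att: float, loo_atts: Dict[str, Optional[float]]):
    """Summarize per-drop ATTs; None marks a refit that failed to converge."""
    ok = {d: a for d, a in loo_atts.items() if a is not None}
    if not ok:
        return None  # every refit failed: nothing to summarize
    deltas = {d: a - baseline_att for d, a in ok.items()}
    most_influential = max(deltas, key=lambda d: abs(deltas[d]))
    return {
        "att_range": (min(ok.values()), max(ok.values())),
        "max_abs_delta_att": abs(deltas[most_influential]),
        "most_influential": most_influential,
    }


# Illustrative numbers only: every drop shifts the ATT by a similar amount, so the
# raw range is narrow while the swing relative to the baseline is large.
summary = summarize_loo(-2.0, {"A": -1.0, "B": -0.9, "C": None})
print(summary)
```

Here the range spans only 0.1, yet dropping either donor moves the estimate by about 1 relative to the full fit, which is the signal the diagnostic is meant to surface.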
diff --git a/diff_diff/synthetic_control.py b/diff_diff/synthetic_control.py
index 164568a2..4004446c 100644
--- a/diff_diff/synthetic_control.py
+++ b/diff_diff/synthetic_control.py
@@ -30,7 +30,7 @@
 """
 
 import warnings
-from dataclasses import dataclass
+from dataclasses import dataclass, replace
 from typing import Any, Dict, List, Optional, Tuple, cast
 
 import numpy as np
@@ -523,6 +523,9 @@ def fit(
             pre_periods=list(pre_periods),
             post_periods=list(post_periods),
             donor_ids=list(donor_ids),
+            # Freeze the reportably-weighted support (donor_weights keys, in donor_ids
+            # order) so leave_one_out() is immune to post-fit mutation of donor_weights.
+            weighted_donor_ids=[d for d in donor_ids if d in donor_weights],
             treated_id=treated_id,
             standardize=self.standardize,
             v_method=self.v_method,
@@ -1310,3 +1313,88 @@ def _placebo_fit_unit(
     post_gaps = np.array([gap_path[p] for p in snap.post_periods], dtype=float)
     scale = float(np.max(np.abs(Z1))) if Z1.size else 0.0
     return gap_path, _rmspe_ratio(pre_gaps, post_gaps, scale)
+
+
+# =============================================================================
+# in-time (backdating) placebo (ADH 2015 §4) — used by
+# SyntheticControlResults.in_time_placebo() via function-level import
+# =============================================================================
+
+
+def _truncate_snapshot_in_time(
+    snap: _SyntheticControlFitSnapshot,
+    t_f: Any,
+) -> Tuple[Optional[_SyntheticControlFitSnapshot], List[str]]:
+    """Build a snapshot for an in-time placebo reassigning the intervention to ``t_f``.
+
+    Re-cuts the panel so the **pre-fake** window is the pre-periods strictly before
+    ``t_f`` and the **post-fake** window is the held-out pre-periods from ``t_f``
+    onward (``t_f`` is the first post-fake period). The true post-treatment periods
+    are EXCLUDED entirely — ``all_periods`` is set to ``pre_fake + post_fake`` so the
+    placebo refit (which fits V/W on ``pre_periods`` and measures the gap over
+    ``all_periods``) never sees treatment-contaminated data.
+
+    Predictor specs are TRUNCATED to the pre-fake window (ADH 2015 §4 "lag the
+    predictors accordingly"; for our absolute-period specs this means intersecting
+    each spec's ``periods`` with the pre-fake window). A spec whose window lies
+    ENTIRELY in the held-out region is DROPPED (its label is collected for an
+    aggregated warning), and ``custom_v`` is subset in lockstep with the surviving
+    specs so its length stays ``== len(kept_specs)``.
+
+    Returns ``(modified_snapshot, dropped_spec_labels)``, or ``(None, dropped)`` when
+    the date is infeasible: fewer than 2 pre-fake periods, no post-fake period, every
+    predictor dropped, or a ``custom_v`` with no mass on the surviving specs. ``t_f``
+    must be an element of ``snap.pre_periods`` (the caller validates membership).
+    """
+    pre = snap.pre_periods
+    idx = pre.index(t_f)  # positional split (period labels may be non-numeric)
+    new_pre = list(pre[:idx])
+    new_post = list(pre[idx:])  # t_f is the first post-fake period
+    pre_set = set(new_pre)
+
+    kept_specs: List[_PredictorSpec] = []
+    keep_mask: List[int] = []
+    dropped: List[str] = []
+    for i, spec in enumerate(snap.specs):
+        # Intersect with the pre-fake window (preserves the spec's canonical order).
+        kept_periods = [p for p in spec.periods if p in pre_set]
+        if kept_periods:
+            # NEW spec object — never mutate the shared snapshot specs in place.
+            kept_specs.append(replace(spec, periods=kept_periods))
+            keep_mask.append(i)
+        else:
+            dropped.append(spec.label)
+
+    # Feasibility: >=2 pre-fake periods, >=1 post-fake period (to measure the placebo
+    # gap), and at least one surviving predictor. The >=2 pre-fake rule is DELIBERATELY
+    # stricter than the base estimator's T0>=1 allowance (which warns that single-pre
+    # nested-V selection is unreliable): an auto-swept placebo date with a single
+    # pre-fake period is a trivially-matchable, non-credible pre-fit, so it is dropped
+    # as infeasible rather than surfaced as a "ran" placebo. Documented as a Note in
+    # docs/methodology/REGISTRY.md §SyntheticControl.
+    if len(new_pre) < 2 or len(new_post) < 1 or not kept_specs:
+        return None, dropped
+
+    # Subset custom_v IN LOCKSTEP with the surviving specs (custom_v length must stay
+    # == k = len(specs); a desync silently misaligns V on the custom path). RAVEL first:
+    # fit() accepts array-like custom_v (e.g. a (1, k) row vector) and the snapshot keeps
+    # its original shape, so `[keep_mask]` would otherwise index the wrong axis of a 2D
+    # custom_v and raise (matches _placebo_fit_unit, which also ravels at use).
+    new_custom_v = (
+        None if snap.custom_v is None else np.asarray(snap.custom_v, dtype=float).ravel()[keep_mask]
+    )
+    # A custom V whose surviving entries sum to ~0 (all mass was on dropped specs)
+    # cannot fit — _placebo_fit_unit's `v / v.sum()` would be 0/0. This is the date
+    # being INFEASIBLE UNDER THE SUPPLIED custom_v, not a solver-convergence failure,
+    # so return None (→ status="infeasible") rather than letting the refit "fail".
+    if new_custom_v is not None and float(np.sum(new_custom_v)) <= 0.0:
+        return None, dropped
+    snap_mod = replace(
+        snap,
+        specs=kept_specs,
+        all_periods=new_pre + new_post,
+        pre_periods=new_pre,
+        post_periods=new_post,
+        custom_v=new_custom_v,
+    )
+    return snap_mod, dropped
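Reviewer note: a standalone sketch of the TRUNCATE windowing performed by `_truncate_snapshot_in_time` above, reduced to plain lists so the positional split, the per-spec intersection, and the lockstep `custom_v` subsetting can be checked by hand. `Spec` and `truncate_in_time` are hypothetical stand-ins, not the library's `_PredictorSpec` or snapshot types, and the real function additionally rebuilds the snapshot with `replace`.

```python
from dataclasses import dataclass, replace
from typing import Any, List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Spec:
    """Hypothetical stand-in for a predictor spec: a label plus its periods."""

    label: str
    periods: Tuple[Any, ...]


def truncate_in_time(
    pre_periods: Sequence[Any],
    specs: Sequence[Spec],
    t_f: Any,
    custom_v: Optional[Sequence[float]] = None,
):
    """Return (new_pre, new_post, kept_specs, new_v, dropped), or None if infeasible."""
    idx = list(pre_periods).index(t_f)  # positional split; labels may be non-numeric
    new_pre, new_post = list(pre_periods[:idx]), list(pre_periods[idx:])
    pre_set = set(new_pre)
    kept: List[Spec] = []
    keep_idx: List[int] = []
    dropped: List[str] = []
    for i, spec in enumerate(specs):
        kept_periods = tuple(p for p in spec.periods if p in pre_set)
        if kept_periods:
            kept.append(replace(spec, periods=kept_periods))  # new object, no mutation
            keep_idx.append(i)
        else:
            dropped.append(spec.label)
    if len(new_pre) < 2 or len(new_post) < 1 or not kept:
        return None
    new_v = None if custom_v is None else [float(custom_v[i]) for i in keep_idx]
    if new_v is not None and sum(new_v) <= 0.0:
        return None  # all custom_v mass sat on dropped specs
    return new_pre, new_post, kept, new_v, dropped


pre = [2001, 2002, 2003, 2004, 2005]
specs = [
    Spec("early_outcome", (2001, 2002)),
    Spec("late_covariate", (2004, 2005)),
    Spec("avg_outcome", (2002, 2003, 2004)),
]
result = truncate_in_time(pre, specs, t_f=2004, custom_v=[0.5, 0.3, 0.2])
print(result)
```

With `t_f=2004` the pre-fake window is 2001 to 2003, `late_covariate` is dropped because its window lies entirely in the held-out region, and the `custom_v` entries for the two surviving specs are kept in order.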
diff --git a/diff_diff/synthetic_control_results.py b/diff_diff/synthetic_control_results.py
index 8e412057..6dfa10b5 100644
--- a/diff_diff/synthetic_control_results.py
+++ b/diff_diff/synthetic_control_results.py
@@ -48,6 +48,12 @@ class _SyntheticControlFitSnapshot:
     pre_periods: List[Any]
     post_periods: List[Any]
     donor_ids: List[Any]
+    # The treated unit's reportably-weighted donor support (donor ids with weight above
+    # the 1e-6 interpretability floor), FROZEN at fit time and ordered by donor_ids.
+    # leave_one_out() iterates this immutable list — NOT the mutable, presentation-level
+    # results.donor_weights dict — so post-fit mutation cannot change which donors are
+    # dropped, and the robustness result depends only on the fit.
+    weighted_donor_ids: List[Any]
     treated_id: Any
     standardize: str
     v_method: str
@@ -195,6 +201,39 @@ def __post_init__(self) -> None:
         # "all_placebos_failed". A small string, so it survives pickling.
         self._placebo_status: Optional[str] = None
 
+        # --- ADH 2015 §4 robustness diagnostics (opt-in, populated by ---
+        # --- leave_one_out() / in_time_placebo()). Same panel-vs-scalar split as ---
+        # --- the in-space placebo: the small per-row tables (_loo_df / _in_time_df), ---
+        # --- scalar summaries and status strings survive pickling; the per-refit ---
+        # --- gap-path dicts (_loo_gaps / _in_time_gaps) are panel-derived and nulled ---
+        # --- by __getstate__. analytical se/t/p/ci stay NaN throughout.
+        self._loo_df: Optional[pd.DataFrame] = None
+        self._loo_gaps: Optional[Dict[Any, Dict[Any, float]]] = None
+        # Reason a leave-one-out run was infeasible/absent. Values: None (not run),
+        # "ran", "treated_fit_nonconverged", "too_few_donors", "all_refits_failed".
+        self._loo_status: Optional[str] = None
+        # (min, max) ATT across the successful leave-one-out refits (the absolute
+        # spread of counterfactual ATTs); None until run.
+        self._loo_att_range: Optional[Tuple[float, float]] = None
+        # The headline single-donor-dependence number: max |att_loo - baseline_att|
+        # over the successful drops. Baseline-RELATIVE, so a uniform shift of every
+        # drop away from the baseline is NOT masked the way a narrow raw att_range
+        # would be. None until run.
+        self._loo_max_abs_delta_att: Optional[float] = None
+        self._loo_n_failed: int = 0
+        self._in_time_df: Optional[pd.DataFrame] = None
+        self._in_time_gaps: Optional[Dict[Any, Dict[Any, float]]] = None
+        # Reason an in-time placebo run was infeasible/absent. Values: None (not run),
+        # "ran", "treated_fit_nonconverged", "too_few_pre_periods",
+        # "all_dates_infeasible", "all_dates_failed", "all_dates_unusable" (a mix of
+        # failed + infeasible dates with none usable).
+        self._in_time_status: Optional[str] = None
+        self._in_time_n_failed: int = 0
+        # Number of placebo dates that were dimensionally infeasible (too few pre-fake
+        # periods, all predictors dropped, or a zero-mass surviving custom_v). Surfaced
+        # alongside _in_time_n_failed so a mixed no-success run reports an accurate mix.
+        self._in_time_n_infeasible: int = 0
+
     def __getstate__(self) -> Dict[str, Any]:
         """Exclude panel-derived internal state from pickling.
 
@@ -209,6 +248,12 @@ def __getstate__(self) -> Dict[str, Any]:
         state = self.__dict__.copy()
         state["_fit_snapshot"] = None
         state["_placebo_gaps"] = None
+        # ADH-2015 diagnostic gap paths are panel-derived (same hazard as
+        # _placebo_gaps); the small _loo_df / _in_time_df tables + scalar summaries
+        # survive so a round-tripped result still reports the diagnostic, but the
+        # overlay gap accessors raise (re-fit to recompute).
+        state["_loo_gaps"] = None
+        state["_in_time_gaps"] = None
         return state
 
     def __repr__(self) -> str:
@@ -727,3 +772,583 @@ def in_space_placebo(
         self._placebo_status = "ran" if n_placebos > 0 else "all_placebos_failed"
         self._placebo_df = pd.DataFrame(rows, columns=self._PLACEBO_COLS)
         return self._placebo_df.copy()
+
+    _LOO_COLS = [
+        "dropped_unit",
+        "att",
+        "pre_rmspe",
+        "post_rmspe",
+        "rmspe_ratio",
+        "delta_att",
+        "status",
+    ]
+
+    def leave_one_out(self, n_starts: Optional[int] = None) -> pd.DataFrame:
+        """
+        Leave-one-out donor robustness (Abadie-Diamond-Hainmueller 2015, Section 4).
+
+        Drops each **reportably-weighted** donor, one at a time, and re-fits the
+        treated unit's synthetic control against the remaining donor pool. The
+        per-drop ATTs reveal whether the estimated effect is driven by any single
+        donor (ADH 2015 overlay the leave-one-out counterfactual trajectories for
+        this purpose; :meth:`get_leave_one_out_gaps` returns those paths). This is a
+        thin re-run of the validated SCM solver — it has **no analytical standard
+        error**; ``se``/``t_stat``/``p_value``/``conf_int`` and ``is_significant``
+        are unaffected (still bound to the NaN analytical ``p_value``).
+
+        The drop set is exactly the donors in ``donor_weights`` — those above the
+        ``1e-6`` interpretability floor (``synthetic_control._MIN_REPORT_WEIGHT``).
+        A donor with negligible weight ``0 < w ≤ 1e-6`` is excluded (its removal
+        moves the ATT by ~the weight, so its ``delta_att`` would be ~0 — an
+        uninformative row), keeping the LOO table aligned with the reported support;
+        a zero-weight donor's removal leaves the synthetic unchanged. (This ``1e-6``
+        approximation of "positive weight" is documented in REGISTRY §SyntheticControl.)
+        A donor that carries ALL the weight is still dropped (the others absorb its
+        mass on re-fit); its large ``delta_att`` is exactly the single-donor-dependence
+        signal this diagnostic exists to surface, NOT a failure.
+
+        Parameters
+        ----------
+        n_starts : int, optional
+            Override the multistart count for each leave-one-out refit's nested V
+            search. Default None inherits the original fit's ``n_starts``.
+
+        Returns
+        -------
+        pandas.DataFrame
+            One ``status="baseline"`` row (the full fit, ``delta_att=0``) followed by
+            one row per dropped donor (``status="loo"``, or ``"failed"`` with NaN
+            metrics when its refit did not converge), sorted by ``|delta_att|``
+            descending (failed rows last). Columns: ``dropped_unit``, ``att``,
+            ``pre_rmspe``, ``post_rmspe``, ``rmspe_ratio``, ``delta_att``
+            (``att_loo - full_att``), ``status``.
+
+        Raises
+        ------
+        ValueError
+            If the fit snapshot is unavailable (e.g. unpickled) or ``n_starts`` is invalid.
+        """
+        if self._fit_snapshot is None:
+            raise ValueError(
+                "leave_one_out() requires the fit snapshot on the results object. "
+                "This result appears to have been loaded from serialization (which "
+                "excludes the snapshot) or produced by an older estimator version. "
+                "Re-fit to enable leave-one-out donor robustness."
+            )
+        from diff_diff.synthetic_control import _mspe, _placebo_fit_unit
+
+        snap = self._fit_snapshot
+        if n_starts is None:
+            n_starts_eff = snap.n_starts
+        else:
+            # Mirror the estimator constructor's validation so a bad override fails
+            # fast instead of silently coercing into a degenerate refit (cf.
+            # in_space_placebo()).
+            if not isinstance(n_starts, (int, np.integer)) or n_starts < 1:
+                raise ValueError(f"n_starts override must be a positive integer, got {n_starts!r}")
+            n_starts_eff = int(n_starts)
+
+        # Baseline row: read DIRECTLY from the full fit (do NOT re-fit), so the
+        # reference ATT — and therefore delta_att=0.0 — is exact.
+        baseline_row = {
+            "dropped_unit": None,
+            "att": float(self.att),
+            "pre_rmspe": float(self.pre_rmspe),
+            "post_rmspe": float(np.sqrt(_mspe(self.gap_path, snap.post_periods))),
+            "rmspe_ratio": float(self.rmspe_ratio),
+            "delta_att": 0.0,
+            "status": "baseline",
+        }
+
+        # Fail closed when the treated unit's own fit did not converge: a truncated /
+        # under-optimized baseline ATT makes every leave-one-out delta meaningless.
+        if not self._fit_converged:
+            warnings.warn(
+                "Leave-one-out skipped: the treated unit's own SCM fit did not "
+                "converge at fit time (inner Frank-Wolfe weight solve and/or outer V "
+                "search), so the baseline ATT is not a valid optimum to compare "
+                "leave-one-out refits against. Re-fit with a larger inner_max_iter / "
+                "looser inner_min_decrease (inner) and/or a larger "
+                "optimizer_options['maxiter'] / more n_starts (outer V search).",
+                UserWarning,
+                stacklevel=2,
+            )
+            self._loo_status = "treated_fit_nonconverged"
+            self._loo_att_range = None
+            self._loo_n_failed = 0
+            self._loo_gaps = {}
+            self._loo_df = pd.DataFrame([baseline_row], columns=self._LOO_COLS)
+            return self._loo_df.copy()
+
+        # Dropping any donor requires at least one donor left in the pool.
+        if len(snap.donor_ids) < 2:
+            warnings.warn(
+                "Leave-one-out donor robustness requires at least 2 donors (dropping "
+                f"one must leave a non-empty pool); only {len(snap.donor_ids)} "
+                "available. Returning the baseline fit only.",
+                UserWarning,
+                stacklevel=2,
+            )
+            self._loo_status = "too_few_donors"
+            self._loo_att_range = None
+            self._loo_n_failed = 0
+            self._loo_gaps = {}
+            self._loo_df = pd.DataFrame([baseline_row], columns=self._LOO_COLS)
+            return self._loo_df.copy()
+
+        # Drop the FROZEN reportably-weighted support captured at fit time (donor ids
+        # with weight above the 1e-6 floor, in donor_ids order). Reading the snapshot —
+        # NOT the mutable presentation-level self.donor_weights — makes the result
+        # depend only on the fit and immune to post-fit mutation of donor_weights.
+        pos_donors = list(snap.weighted_donor_ids)
+        loo_gaps: Dict[Any, Dict[Any, float]] = {}
+        loo_rows: List[Dict[str, Any]] = []
+        atts: List[float] = []
+        n_failed = 0
+
+        for d in pos_donors:
+            pool = [x for x in snap.donor_ids if x != d]
+            fitted = _placebo_fit_unit(snap, snap.treated_id, pool, n_starts_eff)
+            if fitted is None:
+                n_failed += 1
+                loo_rows.append(
+                    {
+                        "dropped_unit": d,
+                        "att": np.nan,
+                        "pre_rmspe": np.nan,
+                        "post_rmspe": np.nan,
+                        "rmspe_ratio": np.nan,
+                        "delta_att": np.nan,
+                        "status": "failed",
+                    }
+                )
+                continue
+            gap_path_d, ratio_d = fitted
+            loo_gaps[d] = gap_path_d
+            att_d = float(np.mean([gap_path_d[p] for p in snap.post_periods]))
+            atts.append(att_d)
+            loo_rows.append(
+                {
+                    "dropped_unit": d,
+                    "att": att_d,
+                    "pre_rmspe": float(np.sqrt(_mspe(gap_path_d, snap.pre_periods))),
+                    "post_rmspe": float(np.sqrt(_mspe(gap_path_d, snap.post_periods))),
+                    "rmspe_ratio": ratio_d,
+                    "delta_att": att_d - float(self.att),
+                    "status": "loo",
+                }
+            )
+
+        # Sort successful drops by |delta_att| desc (most influential donor first);
+        # non-converged drops sort last.
+        finite_rows = sorted(
+            (r for r in loo_rows if r["status"] == "loo"),
+            key=lambda r: abs(r["delta_att"]),
+            reverse=True,
+        )
+        failed_rows = [r for r in loo_rows if r["status"] == "failed"]
+        ordered = [baseline_row] + finite_rows + failed_rows
+
+        if n_failed > 0:
+            warnings.warn(
+                f"{n_failed} of {len(pos_donors)} leave-one-out refits failed to "
+                "converge and are reported with NaN metrics (status='failed'); the "
+                "ATT range uses the remaining refits.",
+                UserWarning,
+                stacklevel=2,
+            )
+
+        self._loo_gaps = loo_gaps
+        self._loo_n_failed = int(n_failed)
+        self._loo_att_range = (min(atts), max(atts)) if atts else None
+        # Baseline-relative headline: the largest swing of any single donor-drop from
+        # the full-fit ATT (max |delta_att|). Robust to a uniform shift that a raw
+        # att_range would understate.
+        self._loo_max_abs_delta_att = max(abs(a - float(self.att)) for a in atts) if atts else None
+        # Distinguish a real run from "every donor-drop refit failed to converge"
+        # (no valid leave-one-out estimate produced) so DR/BR do not report an empty
+        # diagnostic as completed. (An empty pos_donors should not arise from a converged
+        # fit, which always has >=1 positive weight; it would fall through to "ran".)
+        self._loo_status = "all_refits_failed" if (pos_donors and not atts) else "ran"
+        self._loo_df = pd.DataFrame(ordered, columns=self._LOO_COLS)
+        return self._loo_df.copy()
+
+    def get_leave_one_out_df(self) -> pd.DataFrame:
+        """
+        Get the leave-one-out donor-robustness table (see :meth:`leave_one_out`).
+
+        Survives pickling. Raises if :meth:`leave_one_out` has not been run.
+
+        Returns
+        -------
+        pandas.DataFrame
+        """
+        if self._loo_df is None:
+            raise ValueError("No leave-one-out results yet; call leave_one_out() first.")
+        return self._loo_df.copy()
+
+    def get_leave_one_out_gaps(self) -> pd.DataFrame:
+        """
+        Long-form leave-one-out gap paths, for the overlay ("spaghetti") plot.
+
+        One row per (dropped donor, period) for every converged leave-one-out refit.
+        Columns: ``dropped_unit``, ``period``, ``gap``, ``phase`` (``"pre"``/
+        ``"post"``) — mirroring :meth:`get_gap_df`. These per-period paths are
+        panel-derived and are NOT retained after pickling.
+
+        Returns
+        -------
+        pandas.DataFrame
+
+        Raises
+        ------
+        ValueError
+            If :meth:`leave_one_out` has not been run, or if the gap paths were
+            dropped on pickling (re-fit and re-run to recompute them).
+        """
+        if self._loo_df is None:
+            raise ValueError("No leave-one-out results yet; call leave_one_out() first.")
+        if self._loo_gaps is None:
+            raise ValueError(
+                "Leave-one-out gap paths are not retained after pickling "
+                "(panel-derived); re-run leave_one_out() on a freshly fitted result "
+                "to recompute them."
+            )
+        rows: List[Dict[str, Any]] = []
+        for unit, gap_path in self._loo_gaps.items():
+            for period in list(self.pre_periods) + list(self.post_periods):
+                if period in gap_path:
+                    phase = "post" if period in self.post_periods else "pre"
+                    rows.append(
+                        {
+                            "dropped_unit": unit,
+                            "period": period,
+                            "gap": gap_path[period],
+                            "phase": phase,
+                        }
+                    )
+        return pd.DataFrame(rows, columns=["dropped_unit", "period", "gap", "phase"])
+
+    _IN_TIME_COLS = [
+        "placebo_period",
+        "placebo_att",
+        "pre_fit_rmspe",
+        "rmspe_ratio",
+        "n_pre_fake",
+        "n_post_fake",
+        "n_dropped_specs",
+        "status",
+    ]
+
+    def in_time_placebo(
+        self,
+        placebo_periods: Optional[Any] = None,
+        n_starts: Optional[int] = None,
+    ) -> pd.DataFrame:
+        """
+        In-time (backdating) placebo (Abadie-Diamond-Hainmueller 2015, Section 4).
+
+        Reassigns the intervention to an earlier pre-treatment date ``t_f`` and re-fits
+        the synthetic control using ONLY pre-``t_f`` information, then measures the
+        "effect" over the held-out window ``[t_f, T0)``. A credible synthetic control
+        should show **no spurious gap** there (ADH 2015 Figure 4, German reunification
+        backdated to 1975). This is a thin re-run of the validated SCM solver — it has
+        **no analytical standard error**; ``se``/``t_stat``/``p_value``/``conf_int`` and
+        ``is_significant`` are unaffected.
+
+        **Windowing convention (TRUNCATE).** The placebo fit uses only periods strictly
+        before ``t_f``: pre-period-outcome predictors become the pre-``t_f`` outcomes,
+        and covariate / special predictor windows are intersected with the pre-``t_f``
+        window. A predictor window lying ENTIRELY in the held-out region ``[t_f, T0)``
+        is dropped (surfaced in ``n_dropped_specs`` + an aggregated warning). For
+        outcome-predictor fits this equals the literal "lag the predictors" re-run of a
+        manual ``Synth::synth`` (R has no in-time-placebo function); see
+        ``docs/methodology/REGISTRY.md`` for the recognized deviation note.
+
+        Parameters
+        ----------
+        placebo_periods : period value or list of period values, optional
+            The pseudo-intervention date(s), each a member of ``pre_periods``. Default
+            None sweeps every feasible interior pre-date (at least 2 pre-fake periods to
+            fit + at least 1 post-fake period to measure the gap). A date that is a true
+            post-treatment period, or not a pre-period at all, raises ``ValueError``; a
+            valid pre-date that is dimensionally infeasible (too few pre-fake periods, or
+            all predictors dropped) yields a ``status="infeasible"`` row (no raise).
+        n_starts : int, optional
+            Override the multistart count for each placebo refit's nested V search.
+            Default None inherits the original fit's ``n_starts``.
+
+        Returns
+        -------
+        pandas.DataFrame
+            One row per placebo date. Columns: ``placebo_period``, ``placebo_att`` (mean
+            gap over the held-out window — should be ~0 if no real pre-period effect),
+            ``pre_fit_rmspe``, ``rmspe_ratio`` (post-fake/pre-fake), ``n_pre_fake``,
+            ``n_post_fake``, ``n_dropped_specs``, ``status`` (``"ran"`` / ``"infeasible"``
+            / ``"failed"``).
+
+        Raises
+        ------
+        ValueError
+            If the fit snapshot is unavailable (e.g. this result was unpickled), if
+            ``n_starts`` is not a positive integer, or if ``placebo_periods`` is an empty
+            container or contains a post-treatment period / non-pre-period date.
+        """
+        if self._fit_snapshot is None:
+            raise ValueError(
+                "in_time_placebo() requires the fit snapshot on the results object. "
+                "This result appears to have been loaded from serialization (which "
+                "excludes the snapshot) or produced by an older estimator version. "
+                "Re-fit to enable the in-time placebo."
+            )
+        from diff_diff.synthetic_control import (
+            _mspe,
+            _placebo_fit_unit,
+            _truncate_snapshot_in_time,
+        )
+
+        snap = self._fit_snapshot
+        if n_starts is None:
+            n_starts_eff = snap.n_starts
+        else:
+            if not isinstance(n_starts, (int, np.integer)) or n_starts < 1:
+                raise ValueError(f"n_starts override must be a positive integer, got {n_starts!r}")
+            n_starts_eff = int(n_starts)
+
+        pre = list(snap.pre_periods)
+        empty = pd.DataFrame([], columns=self._IN_TIME_COLS)
+
+        # Fail closed when the treated unit's own fit did not converge: a truncated /
+        # under-optimized baseline makes the placebo comparison meaningless.
+        if not self._fit_converged:
+            warnings.warn(
+                "In-time placebo skipped: the treated unit's own SCM fit did not "
+                "converge at fit time (inner Frank-Wolfe weight solve and/or outer V "
+                "search). Re-fit with a larger inner_max_iter / looser "
+                "inner_min_decrease (inner) and/or a larger optimizer_options['maxiter'] "
+                "/ more n_starts (outer V search).",
+                UserWarning,
+                stacklevel=2,
+            )
+            self._in_time_status = "treated_fit_nonconverged"
+            self._in_time_n_failed = 0
+            self._in_time_gaps = {}
+            self._in_time_df = empty
+            return empty.copy()
+
+        # A feasible date needs >=2 pre-fake + >=1 post-fake period -> >=3 pre periods.
+        # The >=2 pre-fake rule is a deliberate Note-documented restriction (an auto-
+        # swept single-pre-fake placebo is a non-credible pre-fit; see REGISTRY).
+        if len(pre) < 3:
+            warnings.warn(
+                "In-time placebo requires at least 3 pre-treatment periods (a feasible "
+                "placebo date needs >=2 pre-fake periods to fit and >=1 post-fake period "
+                f"to measure the gap); only {len(pre)} available.",
+                UserWarning,
+                stacklevel=2,
+            )
+            self._in_time_status = "too_few_pre_periods"
+            self._in_time_n_failed = 0
+            self._in_time_gaps = {}
+            self._in_time_df = empty
+            return empty.copy()
+
+        if placebo_periods is None:
+            # Sweep every feasible pre-date (positional: idx>=2 gives >=2 pre-fake +
+            # >=1 post-fake; idx<2 would leave fewer than 2 pre-fake periods).
+            dates: List[Any] = [pre[i] for i in range(2, len(pre))]
+        else:
+            if isinstance(placebo_periods, (list, tuple, set, np.ndarray, pd.Index, pd.Series)):
+                dates = list(placebo_periods)
+            else:
+                dates = [placebo_periods]
+            # An explicit but EMPTY container is a malformed request (NOT "every date
+            # was infeasible") — fail fast, consistent with the post-date / non-pre
+            # date raises below. Pass None to sweep all feasible pre-dates.
+            if not dates:
+                raise ValueError(
+                    "placebo_periods is empty; pass None to sweep all feasible "
+                    "pre-dates, or a non-empty list of pre-period date(s)."
+                )
+            pre_set = set(pre)
+            post_set = set(snap.post_periods)
+            for d in dates:
+                if d in post_set:
+                    raise ValueError(
+                        f"placebo_period {d!r} is a true post-treatment period; an "
+                        "in-time placebo date must lie in the pre-treatment window."
+                    )
+                if d not in pre_set:
+                    raise ValueError(
+                        f"placebo_period {d!r} is not a pre-treatment period "
+                        f"(pre_periods = {pre})."
+                    )
+            # De-duplicate + canonicalize to pre-period order (mirrors _resolve_periods):
+            # duplicate / unordered explicit dates must not trigger duplicate refits or
+            # inflate n_dates.
+            _requested = set(dates)
+            dates = [p for p in pre if p in _requested]
+
+        in_time_gaps: Dict[Any, Dict[Any, float]] = {}
+        rows: List[Dict[str, Any]] = []
+        dropped_all: set = set()
+        n_failed = 0
+        n_infeasible = 0
+        n_ran = 0
+
+        for t_f in dates:
+            idx = pre.index(t_f)
+            n_pre_fake = idx
+            n_post_fake = len(pre) - idx
+            snap_mod, dropped = _truncate_snapshot_in_time(snap, t_f)
+            dropped_all.update(dropped)
+            if snap_mod is None:
+                n_infeasible += 1
+                rows.append(
+                    {
+                        "placebo_period": t_f,
+                        "placebo_att": np.nan,
+                        "pre_fit_rmspe": np.nan,
+                        "rmspe_ratio": np.nan,
+                        "n_pre_fake": n_pre_fake,
+                        "n_post_fake": n_post_fake,
+                        "n_dropped_specs": len(dropped),
+                        "status": "infeasible",
+                    }
+                )
+                continue
+            fitted = _placebo_fit_unit(snap_mod, snap.treated_id, snap.donor_ids, n_starts_eff)
+            if fitted is None:
+                n_failed += 1
+                rows.append(
+                    {
+                        "placebo_period": t_f,
+                        "placebo_att": np.nan,
+                        "pre_fit_rmspe": np.nan,
+                        "rmspe_ratio": np.nan,
+                        "n_pre_fake": n_pre_fake,
+                        "n_post_fake": n_post_fake,
+                        "n_dropped_specs": len(dropped),
+                        "status": "failed",
+                    }
+                )
+                continue
+            gap_path, ratio = fitted
+            in_time_gaps[t_f] = gap_path
+            placebo_att = float(np.mean([gap_path[p] for p in snap_mod.post_periods]))
+            rows.append(
+                {
+                    "placebo_period": t_f,
+                    "placebo_att": placebo_att,
+                    "pre_fit_rmspe": float(np.sqrt(_mspe(gap_path, snap_mod.pre_periods))),
+                    "rmspe_ratio": ratio,
+                    "n_pre_fake": n_pre_fake,
+                    "n_post_fake": n_post_fake,
+                    "n_dropped_specs": len(dropped),
+                    "status": "ran",
+                }
+            )
+            n_ran += 1
+
+        if dropped_all:
+            warnings.warn(
+                "In-time placebo (TRUNCATE convention): predictor(s) "
+                f"{sorted(map(str, dropped_all))} fell entirely in the held-out "
+                "post-fake window for some placebo date(s) and were dropped from those "
+                "refits (see the n_dropped_specs column).",
+                UserWarning,
+                stacklevel=2,
+            )
+        if n_infeasible > 0:
+            warnings.warn(
+                f"{n_infeasible} in-time placebo date(s) were dimensionally infeasible "
+                "(too few pre-fake periods or all predictors dropped) and are reported "
+                "with status='infeasible' (NaN metrics).",
+                UserWarning,
+                stacklevel=2,
+            )
+        if n_failed > 0:
+            warnings.warn(
+                f"{n_failed} in-time placebo refit(s) failed to converge and are "
+                "reported with status='failed' (NaN metrics).",
+                UserWarning,
+                stacklevel=2,
+            )
+
+        self._in_time_gaps = in_time_gaps
+        self._in_time_n_failed = int(n_failed)
+        self._in_time_n_infeasible = int(n_infeasible)
+        # When no date ran, classify the cause precisely so the downstream reason text
+        # is never false: a pure convergence failure ("all_dates_failed", actionable —
+        # raise n_starts / loosen tolerances) and pure dimensional infeasibility
+        # ("all_dates_infeasible", structural) are distinct; a MIX of both gets its own
+        # "all_dates_unusable" code (both counters are surfaced) rather than being
+        # mislabeled as exclusively one or the other.
+        if n_ran > 0:
+            self._in_time_status = "ran"
+        elif n_failed > 0 and n_infeasible > 0:
+            self._in_time_status = "all_dates_unusable"
+        elif n_failed > 0:
+            self._in_time_status = "all_dates_failed"
+        else:
+            self._in_time_status = "all_dates_infeasible"
+        self._in_time_df = pd.DataFrame(rows, columns=self._IN_TIME_COLS)
+        return self._in_time_df.copy()
+
+    def get_in_time_placebo_df(self) -> pd.DataFrame:
+        """
+        Get the in-time placebo table (see :meth:`in_time_placebo`).
+
+        Survives pickling. Raises ``ValueError`` if :meth:`in_time_placebo` has not been run.
+
+        Returns
+        -------
+        pandas.DataFrame
+        """
+        if self._in_time_df is None:
+            raise ValueError("No in-time placebo results yet; call in_time_placebo() first.")
+        return self._in_time_df.copy()
+
+    def get_in_time_placebo_gaps(self) -> pd.DataFrame:
+        """
+        Long-form in-time placebo gap paths, for the backdating overlay plot.
+
+        One row per (placebo date, period) for every converged in-time refit. Columns:
+        ``placebo_period``, ``period``, ``gap``, ``phase`` (``"pre_fake"`` for periods
+        before the placebo date, ``"post_fake"`` for the held-out window from it on).
+        These per-period paths are panel-derived and are NOT retained after pickling.
+
+        Returns
+        -------
+        pandas.DataFrame
+
+        Raises
+        ------
+        ValueError
+            If :meth:`in_time_placebo` has not been run, or if the gap paths were
+            dropped on pickling (re-fit and re-run to recompute them).
+        """
+        if self._in_time_df is None:
+            raise ValueError("No in-time placebo results yet; call in_time_placebo() first.")
+        if self._in_time_gaps is None:
+            raise ValueError(
+                "In-time placebo gap paths are not retained after pickling "
+                "(panel-derived); re-run in_time_placebo() on a freshly fitted result "
+                "to recompute them."
+            )
+        pre = list(self.pre_periods)
+        rows: List[Dict[str, Any]] = []
+        for t_f, gap_path in self._in_time_gaps.items():
+            split = pre.index(t_f)
+            for pos, period in enumerate(pre):
+                if period in gap_path:
+                    phase = "post_fake" if pos >= split else "pre_fake"
+                    rows.append(
+                        {
+                            "placebo_period": t_f,
+                            "period": period,
+                            "gap": gap_path[period],
+                            "phase": phase,
+                        }
+                    )
+        return pd.DataFrame(rows, columns=["placebo_period", "period", "gap", "phase"])
diff --git a/docs/api/synthetic_control.rst b/docs/api/synthetic_control.rst
index 506c7422..b727315d 100644
--- a/docs/api/synthetic_control.rst
+++ b/docs/api/synthetic_control.rst
@@ -25,6 +25,15 @@ reported estimate. Significance comes from **in-space placebo permutation infere
 ``placebo_p_value = rank/(n_placebos+1)``). This permutation p-value is a separate field
 from the (NaN) ``p_value``; ``is_significant`` stays bound to ``p_value``.
 
+**Robustness diagnostics (ADH 2015 §4, opt-in):**
+:meth:`~diff_diff.SyntheticControlResults.leave_one_out` drops each reportably-weighted (weight > 1e-6)
+donor and re-fits (per-drop ATT / ``delta_att`` table — a large ``delta_att`` flags
+single-donor dependence), and
+:meth:`~diff_diff.SyntheticControlResults.in_time_placebo` reassigns the intervention to an
+earlier pre-date and checks for a spurious gap before the true treatment date (the
+backdating placebo; ``placebo_att`` should be ~0). Both re-run the validated solver and
+leave the analytical inference fields NaN.
+
 **Distinct from** :class:`~diff_diff.SyntheticDiD` (Arkhangelsky et al. 2021), which adds
 time weights and ridge regularization; classic SCM uses **donor weights only** plus the
 outer ``V`` search.
@@ -71,6 +80,12 @@ Results container for synthetic control estimation.
 
       ~SyntheticControlResults.in_space_placebo
       ~SyntheticControlResults.get_placebo_df
+      ~SyntheticControlResults.leave_one_out
+      ~SyntheticControlResults.get_leave_one_out_df
+      ~SyntheticControlResults.get_leave_one_out_gaps
+      ~SyntheticControlResults.in_time_placebo
+      ~SyntheticControlResults.get_in_time_placebo_df
+      ~SyntheticControlResults.get_in_time_placebo_gaps
       ~SyntheticControlResults.summary
       ~SyntheticControlResults.print_summary
       ~SyntheticControlResults.to_dict
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 38c8022f..f33f263b 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -1988,6 +1988,10 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d
 
 **Inference:** **No analytical standard error** (Section 2.4) — `se`/`t_stat`/`p_value`/`conf_int` are always NaN. Significance comes from **in-space placebo permutation inference** via `SyntheticControlResults.in_space_placebo()`: reassign treatment to each donor, refit a synthetic control for it, and rank the treated unit's post/pre RMSPE ratio (`rmspe_ratio` = `RMSPE_post / RMSPE_pre` = `sqrt(MSPE_post / MSPE_pre)`) among all units; `placebo_p_value = rank / (n_placebos + 1)`, where `rank = 1 + #{placebos with ratio ≥ treated ratio}` — an **upper-tail rank test on the (unsigned) RMSPE-ratio statistic**, ties counted conservatively via `≥`. Because the ratio squares the gaps it is direction-agnostic: a large ratio signals an effect of *either* sign, so this is NOT a signed/one-directional ("one-sided") hypothesis test on the treatment-effect direction. ADH 2010 §3.4 reports the *MSPE* ratio (the square of `rmspe_ratio`); the two are monotone-equivalent, so the rank and p-value are identical — only the reported statistic's scale differs. `rmspe_ratio` (the treated statistic) is computed at fit time; `placebo_p_value` / `n_placebos` / `n_failed` are populated by the opt-in `in_space_placebo()` call. `get_placebo_df()` returns the per-unit ratio table used for the rank (the per-period placebo gap paths for a "spaghetti" plot are retained internally, not in this summary).
 
+**ADH-2015 §4 robustness diagnostics (opt-in):** Two further diagnostics from Abadie-Diamond-Hainmueller (2015, §4), each a thin re-run of the validated solver, populated only when called and surfaced under `estimator_native_diagnostics` (the analytical inference contract is unchanged: `se`/`t_stat`/`p_value`/`conf_int` stay NaN, and `is_significant` stays bound to the NaN analytical `p_value`):
+- **Leave-one-out donor robustness** (`leave_one_out()`): drops each **reportably-weighted** donor and re-fits the treated unit against the reduced pool, returning a per-drop ATT / `delta_att` table (a `status="baseline"` row first, then one row per dropped donor sorted by `|delta_att|`). A large `delta_att` flags single-donor dependence (a single *dominant* donor is still dropped — the others absorb its mass — and its large `delta_att` is the intended signal, not a failure). The reporting stack's headline donor-sensitivity number is `max_abs_delta_att` = `max |delta_att|` over the drops (baseline-relative, so a uniform shift of every drop away from the full-fit ATT is not masked the way a raw ATT range would be). `get_leave_one_out_gaps()` returns the per-drop trajectories for the overlay plot. Fails closed on a non-converged treated fit or `< 2` donors.
+- **In-time (backdating) placebo** (`in_time_placebo()`): reassigns the intervention to an earlier pre-date `t_f`, re-fits using ONLY pre-`t_f` information (TRUNCATE convention — see Note), and reports the placebo "effect" over the held-out window `[t_f, T0)` — ~0 if there is no real pre-period effect (ADH 2015 Fig. 4, German reunification backdated to 1975). Sweeps every feasible interior pre-date by default (≥2 pre-fake + ≥1 post-fake); an explicit post-period / non-pre date raises, a valid-but-dimensionally-infeasible date yields a `status="infeasible"` row (no raise).
+
 **Notes / deviations:**
 - **Note:** The standardization divisor `divisor = sqrt(apply(cbind(X0,X1), 1, var))` (per-predictor SD over donors+treated, ddof=1) and the inner/outer optimizer are **not specified in ADH 2010** (which defers these numerics to Abadie & Gardeazabal 2003 App. B / the `Synth` software). The divisor is pinned from the R `Synth::synth` source; `solution.v` lives in this scaled predictor space, so the deterministic R-parity test feeds `custom_v` in the same scaled space.
 - **Note:** The outer objective minimizes the pre-period outcome MSPE over **all** pre periods, whereas R `Synth` uses a `time.optimize.ssr` window (1960–1969 in the Basque example). The nested `V` therefore differs from R by an efficiency-only choice (the paper notes inferential validity holds for *any* `V`), so end-to-end nested parity is a tolerance band, not equality.
@@ -2000,6 +2004,10 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d
 - **Note (placebo failure handling):** a placebo is **excluded from both the numerator and the denominator** of the rank (never penalized into it) and tallied in `n_failed` when its fit is not a valid optimum — EITHER its **inner Frank-Wolfe weight solve** did not converge (a truncated `W` is unusable) OR its **outer `V` search** did not converge (an under-optimized `V` fits the pre-period worse, shrinking the RMSPE ratio and biasing the p-value anti-conservatively, so it must not silently enter the rank). The reported p-value uses the **effective** count `rank / (n_placebos + 1)`, where `n_placebos` is the number of placebos that entered the reference set. Failed donors still appear in `get_placebo_df()` (`status="failed"`, NaN metrics), so once a reference set is produced the table is the full treated + every-donor unit set (`n_donors + 1` rows). In the fail-closed cases the placebo loop does not run and only the treated row is returned: `J < 2` → `placebo_p_value` is NaN with a warning (no placebo distribution; `J == 2` warns the distribution is coarse), and a treated fit whose own **inner OR outer** search did not converge also fails closed (ranking a truncated / under-optimized treated statistic would not be a valid permutation). **Caveat:** each placebo refit inherits the original fit's `optimizer_options` / `n_starts`, so valid inference requires settings adequate for the outer `V` search to converge to a comparable-quality synthetic (production defaults do; cheap settings under-optimize placebo `V` and those placebos are dropped as failed — raise `n_starts` on `in_space_placebo()` or re-fit with a larger `optimizer_options['maxiter']`).
 - **Note (RMSPE-ratio floor):** the reported `rmspe_ratio = sqrt(MSPE_post / MSPE_pre)` floors the pre-period MSPE denominator at a scale-aware `1e-8 · max(|pre-outcomes|, 1)²` (before the square root) so a (near-)perfect pre-fit (`pre-MSPE → 0`) yields a large-but-FINITE ratio rather than `inf`/`nan` (which would corrupt the rank). Ties (`ratio_j ≥ treated_ratio`) are counted, making the p-value conservative. Mirrors the `_fit_tol` poor-fit guard.
 - **Note (placebo p-value is non-analytical):** `placebo_p_value` is deliberately a SEPARATE field from `p_value` (which stays NaN) — it is a permutation p-value with no SE / t-stat, so it does not flow through `safe_inference`. `is_significant` likewise stays bound to the (NaN) `p_value`, NOT `placebo_p_value`; a tool gating on `is_significant` will see `False` even when `placebo_p_value` is small. The reporting stack surfaces the placebo p-value through `estimator_native_diagnostics`, never the analytical headline.
+- **Note (in-time placebo windowing — TRUNCATE):** ADH 2015 §4 says to re-estimate the in-time placebo "with the same predictors lagged accordingly." Because `diff_diff`'s predictor specs reference **absolute** periods, the in-time placebo re-cuts them by TRUNCATION: pre-period-outcome predictors become the pre-`t_f` outcomes, and covariate / special-predictor windows are intersected with the pre-`t_f` window; a window lying ENTIRELY in the held-out region `[t_f, T0)` is **dropped** (surfaced in the `n_dropped_specs` column + an aggregated warning), and `custom_v` is subset in lockstep with the surviving specs. For an outcome-predictor fit (the R-anchorable case) TRUNCATE is identical to ADH's "lag" — both equal a manual `Synth::synth` re-run with `time.optimize.ssr` cut at `t_f`. The held-out window never enters the fit (the placebo's `all_periods` is the pre-fake + post-fake span; the true post-treatment periods are excluded entirely), so there is no "peeking." This concrete convention is NOT spelled out in ADH 2015 (which gives only the qualitative "lag accordingly").
+- **Note (in-time placebo requires ≥2 pre-fake periods):** the in-time placebo treats a date with fewer than 2 pre-fake periods as `status="infeasible"` (the default sweep starts at the 3rd pre-period). This is DELIBERATELY stricter than the base estimator's `T0 ≥ 1` allowance (which permits a single-pre-period fit but warns that nested-`V` selection is unreliable): an auto-swept placebo date with a single pre-fake period is a trivially-matchable, non-credible pre-fit, so it is dropped rather than surfaced as a `ran` placebo (mirrors `SyntheticDiD.in_time_placebo`'s `i ≥ 2` rule). A date whose surviving `custom_v` has zero mass after truncation is likewise infeasible (not a convergence failure).
+- **Note (leave-one-out weight floor):** ADH 2015 §4 leave-one-out omits "each donor that received positive weight." This implementation drops each donor with **reportable** weight — above the `1e-6` interpretability floor (`synthetic_control._MIN_REPORT_WEIGHT`), i.e. exactly the donors in `donor_weights` — rather than every strictly-positive weight. A donor with `0 < w ≤ 1e-6` is numerical dust whose removal moves the ATT by ~its weight (its `delta_att` would be ~0, an uninformative row), and the floor keeps the LOO table aligned with the reported donor support. The drop-set is **frozen at fit time** on the fit snapshot (`weighted_donor_ids`), so `leave_one_out()` is immune to post-fit mutation of the presentation-level `donor_weights` dict.
+- **Note (ADH-2015 diagnostics validation):** R `Synth` has **no** in-time-placebo or leave-one-out function (verified against its full CRAN function index; `SCtools` adds only the *in-space* placebo battery, `scpi` only prediction-interval uncertainty), so there is no canonical R *output* to match for these diagnostics — in R they are hand-rolled by re-running `dataprep()`+`synth()`. They are validated instead by (a) the solver's existing Basque R parity (above), and (b) deterministic **self-consistency** tests proving each diagnostic equals a from-scratch `synthetic_control()` fit on the equivalent sub-problem — `leave_one_out()` drop-`d` == a fit on the donor pool minus `d`; `in_time_placebo([t_f])` == a fit on the backdated/truncated panel — both via a fixed `custom_v` (match to 1e-7). The deferred ADH-2015 items (out-of-sample CV `V`-selection, regression-weight `W^reg` extrapolation diagnostic, sparse-SC subset search) are tracked in `TODO.md`.
 
 **Reference implementation:** authors' `Synth` package for R/MATLAB/Stata (`Synth::synth`); in-space placebo construction follows `SCtools::generate.placebos`. **R-parity anchor:** the Basque Country study (Abadie-Gardeazabal 2003, `data("basque")`) — published synthetic = region 10 (Cataluña) 0.851 + region 14 (Madrid) 0.149, `loss.v` 0.0089. Two-tier test (`tests/test_methodology_synthetic_control.py`): Tier-1 feeds R's `solution.v` via `custom_v` → donor weights match to atol 1e-3 (deterministic); Tier-2 checks the nested fit in a band.
 
@@ -2010,6 +2018,9 @@ Classic synthetic control (donor/unit weights only) for a single treated unit, d
 - [x] Outer nested `V` (pre-period outcome MSPE) + user-supplied `custom_v`.
 - [x] Gap path + pre-period RMSPE + predictor-balance table.
 - [x] No analytical SE (NaN inference); in-space placebo permutation inference (`in_space_placebo()`, `rank/(n_placebos+1)`) with the real treated unit excluded from every placebo pool, effective-count denominator, and a scale-aware RMSPE-ratio floor.
+- [x] Leave-one-out donor robustness (`leave_one_out()`, ADH 2015 §4): per-drop ATT / `delta_att` table + overlay gaps; fail-closed.
+- [x] In-time (backdating) placebo (`in_time_placebo()`, ADH 2015 §4): TRUNCATE windowing (drop held-out-window predictors + lockstep `custom_v` subset), feasible-date sweep, fail-closed.
+- [ ] *Deferred (ADH 2015):* out-of-sample CV `V`-selection, regression-weight `W^reg` extrapolation diagnostic, sparse-SC subset search (see `TODO.md`).
 - [x] Predictor-leakage, absorbing-suffix/no-anticipation, empty-window, duplicate-label, and inner-non-convergence validation gates.
 
 ---
diff --git a/docs/methodology/REPORTING.md b/docs/methodology/REPORTING.md
index 58058312..17af4cd9 100644
--- a/docs/methodology/REPORTING.md
+++ b/docs/methodology/REPORTING.md
@@ -266,11 +266,13 @@ a library setting.
   under `estimator_native_diagnostics`. `SyntheticControlResults`
   routes parallel-trends to the `scm_fit` analogue (`pre_rmspe`,
   verdict `design_enforced_pt`) and surfaces `pre_rmspe`, donor-weight
-  concentration, and the in-space placebo permutation p-value under
-  `estimator_native_diagnostics` — the placebo block is populated only
-  when the caller has already run `in_space_placebo()` (opt-in; DR never
-  triggers the per-donor refit loop implicitly), and it omits
-  HonestDiD-style `sensitivity` (significance IS the placebo).
+  concentration, the in-space placebo permutation p-value, and the
+  ADH-2015 leave-one-out (`leave_one_out`) and in-time placebo
+  (`in_time_placebo`) blocks under `estimator_native_diagnostics`. Each
+  opt-in block is populated only when the caller has already run the
+  corresponding method and is otherwise a `status="not_run"` stub (DR
+  never triggers a refit loop implicitly). `SyntheticControlResults`
+  also omits HonestDiD-style `sensitivity` (significance IS the placebo).
   `EfficientDiDResults` PT runs through `EfficientDiD.hausman_pretest`
   (the estimator's native PT-All vs PT-Post check).
 
diff --git a/tests/data/synth_basque_golden.json b/tests/data/synth_basque_golden.json
index c9694aa1..ef52a8b4 100644
--- a/tests/data/synth_basque_golden.json
+++ b/tests/data/synth_basque_golden.json
@@ -332,5 +332,27 @@
   "years": [1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997],
   "treated_path": [3.853184630005, 3.945658296151, 4.033561734873, 4.023421896897, 4.013781968405, 4.285918396223, 4.574336095797, 4.898957353563, 5.197014981629, 5.338902978753, 5.465153005252, 5.545915627064, 5.614895726639, 5.852184933072, 6.08140541737, 6.17009424135, 6.283633404546, 6.555555398653, 6.810768561103, 7.105184302811, 7.377891682176, 7.232933621923, 7.089831372119, 6.786703607145, 6.639817386857, 6.56283917137, 6.500785454993, 6.545058607, 6.595329801139, 6.761496750091, 6.937160671728, 7.332191151301, 7.742788123594, 8.120536640759, 8.509711162324, 8.776777889074, 9.025278666196, 8.873892824706, 8.718223539089, 9.018137849286, 9.440873861653, 9.686518137675, 10.170665872809],
   "synthetic_path": [3.702941626236, 3.85396941359, 3.996388245904, 4.029401685261, 4.059672772757, 4.378924693221, 4.733052176452, 4.987634927316, 5.222027456975, 5.298503665995, 5.362138883475, 5.448530450656, 5.523111080498, 5.760711870916, 5.993100448641, 6.137916329668, 6.294338685153, 6.620780377198, 6.933000322789, 7.087051801242, 7.228035388368, 7.220668043084, 7.211129382309, 7.07464338848, 7.057308667347, 7.129300695567, 7.234425280207, 7.325333530431, 7.421847797791, 7.516347610321, 7.610133692242, 8.117951258459, 8.623595409787, 9.086804616765, 9.545547512504, 9.788247373834, 10.037692891258, 9.838212341386, 9.639052163101, 9.987880532374, 10.303885350264, 10.538434998253, 10.998744701142],
-  "gap": [0.1502430037689, 0.09168888256074, 0.03717348896852, -0.005979788364376, -0.04589080435155, -0.09300629699853, -0.1587160806545, -0.08867757375288, -0.02501247534587, 0.0403993127577, 0.1030141217766, 0.09738517640774, 0.0917846461413, 0.09147306215556, 0.08830496872896, 0.03217791168141, -0.01070528060644, -0.06522497854548, -0.1222317616855, 0.01813250156919, 0.1498562938071, 0.01226557883842, -0.1212980101894, -0.2879397813353, -0.4174912804894, -0.5664615241969, -0.7336398252139, -0.7802749234314, -0.8265179966512, -0.7548508602291, -0.6729730205141, -0.7857601071586, -0.8808072861931, -0.9662679760057, -1.035836350179, -1.01146948476, -1.012414225062, -0.96431951668, -0.9208286240117, -0.969742683088, -0.863011488611, -0.851916860578, -0.8280788283333]
+  "gap": [0.1502430037689, 0.09168888256074, 0.03717348896852, -0.005979788364376, -0.04589080435155, -0.09300629699853, -0.1587160806545, -0.08867757375288, -0.02501247534587, 0.0403993127577, 0.1030141217766, 0.09738517640774, 0.0917846461413, 0.09147306215556, 0.08830496872896, 0.03217791168141, -0.01070528060644, -0.06522497854548, -0.1222317616855, 0.01813250156919, 0.1498562938071, 0.01226557883842, -0.1212980101894, -0.2879397813353, -0.4174912804894, -0.5664615241969, -0.7336398252139, -0.7802749234314, -0.8265179966512, -0.7548508602291, -0.6729730205141, -0.7857601071586, -0.8808072861931, -0.9662679760057, -1.035836350179, -1.01146948476, -1.012414225062, -0.96431951668, -0.9208286240117, -0.969742683088, -0.863011488611, -0.851916860578, -0.8280788283333],
+  "leave_one_out": {
+    "dropped_regionno": 10,
+    "solution_w": {
+      "2": 9.808425765417e-09,
+      "3": 2.549786470331e-08,
+      "4": 1.480664733111e-06,
+      "5": 1.139816545885e-07,
+      "6": 1.578269127035e-08,
+      "7": 0.7023493162417,
+      "8": 1.197763079811e-08,
+      "9": 1.711227888294e-08,
+      "11": 4.911054589331e-07,
+      "12": 6.004338842587e-09,
+      "13": 1.948545433164e-08,
+      "14": 0.2976484098422,
+      "15": 1.699497289433e-08,
+      "16": 3.294816157091e-08,
+      "18": 3.255185607848e-08
+    },
+    "att": 0.6138519220436,
+    "gap": [0.6880403421591, 0.628878774828, 0.5753339248836, 0.54101400783, 0.5037249840412, 0.546426742406, 0.5606408216439, 0.6633437692491, 0.7549698385142, 0.8268369239666, 0.8905574520592, 0.8951691657033, 0.8929960283248, 0.9175321635515, 0.9274417393066, 0.8572191113726, 0.8064665804174, 0.8252573922895, 0.8361572960057, 0.9576844174138, 1.065360164047, 0.8680617841655, 0.6817614133823, 0.431542742956, 0.3121824139876, 0.2118891193835, 0.1045310836943, 0.1442119442113, 0.172403844142, 0.2354616807978, 0.2925811556038, 0.3381223905461, 0.3936699900311, 0.4330380719963, 0.4842571460249, 0.6248622785575, 0.7242896438824, 0.7039633742619, 0.6787159280904, 0.7734776694026, 0.9728259048368, 1.059949514994, 1.197909760725]
+  }
 }
diff --git a/tests/test_business_report.py b/tests/test_business_report.py
index d5668e95..80b04fdf 100644
--- a/tests/test_business_report.py
+++ b/tests/test_business_report.py
@@ -825,6 +825,19 @@ def test_scm_robustness_block_surfaces_native_fields(self, scm_fit):
         assert "weight_concentration" in native
         assert native["in_space_placebo"]["n_placebos"] == res.n_placebos
 
+    def test_scm_robustness_block_surfaces_adh2015_diagnostics(self, scm_fit):
+        # After running the opt-in ADH-2015 diagnostics, the robustness block must
+        # carry their native sub-blocks (lifted from estimator_native_diagnostics).
+        res, _ = scm_fit
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            res.leave_one_out()
+            res.in_time_placebo()
+            rob = BusinessReport(res, auto_diagnostics=True).to_dict()["robustness"]
+        native = rob["estimator_native"]
+        assert native["leave_one_out"]["status"] == "ran"
+        assert native["in_time_placebo"]["status"] == "ran"
+
     def test_staggered_triple_diff_assumption_uses_ddd_not_generic_pt(self):
         class StaggeredTripleDiffResults:
             pass
diff --git a/tests/test_diagnostic_report.py b/tests/test_diagnostic_report.py
index 66e594a5..a0a78fcd 100644
--- a/tests/test_diagnostic_report.py
+++ b/tests/test_diagnostic_report.py
@@ -2077,8 +2077,10 @@ def test_scm_native_section_populated(self, scm_fit):
         assert "herfindahl" in native["weight_concentration"]
         # Placebo is opt-in: NOT auto-run inside the report.
         assert native["in_space_placebo"]["status"] == "not_run"
-        # In-time placebo / leave-one-out are ADH 2015 (not implemented here).
-        assert "in_time_placebo" not in native and "leave_one_out" not in native
+        # The ADH-2015 diagnostics are also opt-in: surfaced as "not_run" stubs until
+        # the user calls leave_one_out() / in_time_placebo().
+        assert native["leave_one_out"]["status"] == "not_run"
+        assert native["in_time_placebo"]["status"] == "not_run"
 
     def test_scm_native_surfaces_placebo_after_optin_run(self, scm_fit):
         res, _ = scm_fit
@@ -2090,6 +2092,25 @@ def test_scm_native_surfaces_placebo_after_optin_run(self, scm_fit):
         assert block["n_placebos"] == res.n_placebos
         assert block["placebo_p_value"] == pytest.approx(res.placebo_p_value)
 
+    def test_scm_native_surfaces_leave_one_out_after_optin_run(self, scm_fit):
+        res, _ = scm_fit
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            res.leave_one_out()
+            native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+        block = native["leave_one_out"]
+        assert block["status"] == "ran"
+        assert block["att_range"] is not None and len(block["att_range"]) == 2
+
+    def test_scm_native_surfaces_in_time_placebo_after_optin_run(self, scm_fit):
+        res, _ = scm_fit
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore")
+            res.in_time_placebo()
+            native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+        block = native["in_time_placebo"]
+        assert block["status"] == "ran" and block["n_dates"] >= 1
+
     def test_scm_does_not_call_honest_did(self, scm_fit):
         """HonestDiD sensitivity should NOT run on SCM (fit-based / native path)."""
         res, _ = scm_fit
@@ -2108,18 +2129,22 @@ def test_scm_significance_not_marked_done_until_placebo_run(self, scm_fit):
         res, _ = scm_fit  # the fixture does NOT run the placebo
         schema = DiagnosticReport(res).to_dict()
         labels = " ".join(s.get("label", "") for s in schema.get("next_steps", [])).lower()
-        assert "placebo" in labels  # the significance recommendation still surfaces
+        # Target the IN-SPACE step specifically (the in-time-placebo step's label also
+        # contains "placebo" but is a different, always-on ADH-2015 recommendation).
+        assert "in-space placebo" in labels  # the significance recommendation surfaces
 
     def test_scm_placebo_step_completes_after_run(self, scm_fit):
-        """Once the opt-in placebo has been run, DR stops recommending it."""
+        """Once the opt-in in-space placebo has been run, DR stops recommending it."""
         res, _ = scm_fit
         before = DiagnosticReport(res).to_dict()["next_steps"]
-        assert any("placebo" in s.get("label", "").lower() for s in before)
+        assert any("in-space placebo" in s.get("label", "").lower() for s in before)
         with warnings.catch_warnings():
             warnings.simplefilter("ignore")
             res.in_space_placebo()  # opt-in significance procedure now done
         after = DiagnosticReport(res).to_dict()["next_steps"]
-        assert not any("placebo" in s.get("label", "").lower() for s in after)
+        # The in-space step is suppressed; the (differently-tagged) in-time-placebo
+        # and leave-one-out recommendations are unaffected.
+        assert not any("in-space placebo" in s.get("label", "").lower() for s in after)
 
     def test_scm_rejects_precomputed_parallel_trends_and_sensitivity(self, scm_fit):
         # Like SDiD/TROP, SCM computes its PT verdict internally (the scm_fit
diff --git a/tests/test_methodology_synthetic_control.py b/tests/test_methodology_synthetic_control.py
index ae563d01..ee85ea78 100644
--- a/tests/test_methodology_synthetic_control.py
+++ b/tests/test_methodology_synthetic_control.py
@@ -32,7 +32,12 @@
 import pandas as pd
 import pytest
 
-from diff_diff import SyntheticControl, SyntheticControlResults, synthetic_control
+from diff_diff import (
+    DiagnosticReport,
+    SyntheticControl,
+    SyntheticControlResults,
+    synthetic_control,
+)
 from tests.conftest import assert_nan_inference
 
 DATA_DIR = Path(__file__).parent / "data"
@@ -845,6 +850,35 @@ def test_basque_tier2_nested_band():
     assert res.donor_weights.get(10, 0) + res.donor_weights.get(14, 0) > 0.7
 
 
+def test_basque_tier1_leave_one_out_parity():
+    """Tier-1 LOO (deterministic): dropping the dominant donor (region 10) with R's
+    ``solution.v`` held fixed, the reduced-pool refit's ATT and gap path match R's
+    drop-donor ``synth`` to tolerance (a direct R anchor on the reduced-pool W-solve;
+    ``leave_one_out()`` on a custom-V fit reuses that fixed V on the donor pool minus
+    the dropped unit). Region 10 carries ~85% of the full-pool weight, so dropping it
+    swings the synthetic onto regions 7+14 — the single-donor-dependence signal LOO
+    exists to surface."""
+    golden, df = _load_golden()
+    if "leave_one_out" not in golden:
+        pytest.skip("LOO golden missing — regenerate via the R script.")
+    loo_g = golden["leave_one_out"]
+    dropped = int(loo_g["dropped_regionno"])
+    custom_v = np.asarray(golden["solution_v"], dtype=float)
+    res = SyntheticControl(v_method="custom", custom_v=custom_v).fit(
+        df, "gdpcap", "treated", "regionno", "year", **_basque_kwargs(golden)
+    )
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        loo = res.leave_one_out()
+    row = loo[(loo["status"] == "loo") & (loo["dropped_unit"] == dropped)]
+    assert len(row) == 1
+    assert float(row["att"].iloc[0]) == pytest.approx(float(loo_g["att"]), abs=1e-2)
+    # Full reduced-pool gap trajectory (1955-1997) matches R's drop-donor synth.
+    gaps = res.get_leave_one_out_gaps()
+    gap_py = gaps[gaps["dropped_unit"] == dropped].sort_values("period")["gap"].to_numpy()
+    np.testing.assert_allclose(gap_py, np.asarray(loo_g["gap"], dtype=float), atol=2e-2)
+
+
 # ---------------------------------------------------------------------------
 # In-space placebo permutation inference (Abadie-Diamond-Hainmueller 2010 §2.4)
 # ---------------------------------------------------------------------------
@@ -1328,4 +1362,845 @@ def test_rmspe_ratio_is_root_scale():
     assert _rmspe_ratio(pre, post, scale=10.0) == pytest.approx(1.5)
     # Zero post-effect -> ratio 0; perfect pre-fit -> finite (floored), not inf.
     assert _rmspe_ratio(pre, np.zeros(2), scale=10.0) == pytest.approx(0.0)
+    # Perfect pre-fit (zero pre-gaps) -> floored denominator -> finite, not inf.
     assert np.isfinite(_rmspe_ratio(np.zeros(2), post, scale=10.0))
+
+
+# ---------------------------------------------------------------------------
+# Leave-one-out donor robustness (ADH 2015 §4)
+# ---------------------------------------------------------------------------
+
+
+def _equal_mix_panel(n_donors=5, T=8, T0=6, effect=3.0, seed=1):
+    """Near-identical donors -> equal-ish weights -> dropping any one barely moves
+    the synthetic (the LOO-stable regime)."""
+    rng = np.random.default_rng(seed)
+    years = list(range(2000, 2000 + T))
+    base = rng.normal(10, 0.4, n_donors)
+    common = np.cumsum(rng.normal(0, 0.2, T))  # shared trend
+    donors = {j: base[j] + common + rng.normal(0, 0.08, T) for j in range(n_donors)}
+    treated = np.mean([donors[j] for j in range(n_donors)], axis=0) + rng.normal(0, 0.04, T)
+    treated = treated.copy()
+    treated[T0:] += effect
+    rows = []
+    for j in range(n_donors):
+        for t in range(T):
+            rows.append({"unit": f"d{j}", "year": years[t], "y": donors[j][t], "treated": 0})
+    for t in range(T):
+        rows.append({"unit": "treated", "year": years[t], "y": treated[t], "treated": int(t >= T0)})
+    return pd.DataFrame(rows)
+
+
+def _single_donor_panel(n_donors=4, T=8, T0=6, effect=3.0, seed=2):
+    """One donor (d0) tracks the treated unit; the rest are far away -> weight
+    concentrates on d0 -> dropping d0 swings the result (the LOO-fragile regime)."""
+    rng = np.random.default_rng(seed)
+    years = list(range(2000, 2000 + T))
+    d0_path = 10 + np.cumsum(rng.normal(0, 0.3, T))
+    donors = {0: d0_path + rng.normal(0, 0.03, T)}
+    for j in range(1, n_donors):
+        donors[j] = (25.0 + 6.0 * j) + np.cumsum(rng.normal(0, 0.3, T))  # far from treated
+    treated = d0_path + rng.normal(0, 0.03, T)
+    treated = treated.copy()
+    treated[T0:] += effect
+    rows = []
+    for j in range(n_donors):
+        for t in range(T):
+            rows.append({"unit": f"d{j}", "year": years[t], "y": donors[j][t], "treated": 0})
+    for t in range(T):
+        rows.append({"unit": "treated", "year": years[t], "y": treated[t], "treated": int(t >= T0)})
+    return pd.DataFrame(rows)
+
+
+def _fit_cheap(df):
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        return synthetic_control(df, "y", "treated", "unit", "year", seed=0, **_FAST)
+
+
+_LOO_COLS = ["dropped_unit", "att", "pre_rmspe", "post_rmspe", "rmspe_ratio", "delta_att", "status"]
+
+
+def test_leave_one_out_baseline_row_and_structure():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        loo = res.leave_one_out()
+    assert list(loo.columns) == _LOO_COLS
+    # Exactly one baseline row, first, reading directly from the full fit.
+    base = loo.iloc[0]
+    # dropped_unit is "not applicable" for the baseline row (pandas renders the
+    # None as NA in the donor-id column).
+    assert base["status"] == "baseline" and pd.isna(base["dropped_unit"])
+    assert base["att"] == pytest.approx(res.att) and base["delta_att"] == 0.0
+    assert base["pre_rmspe"] == pytest.approx(res.pre_rmspe)
+    assert base["rmspe_ratio"] == pytest.approx(res.rmspe_ratio)
+    # One LOO row per positively-weighted donor (no failures on this clean panel).
+    pos = [d for d in res._fit_snapshot.donor_ids if d in res.donor_weights]
+    loo_rows = loo[loo["status"] == "loo"]
+    assert set(loo_rows["dropped_unit"]) == set(pos)
+    assert res._loo_n_failed == 0 and res._loo_status == "ran"
+    # delta_att == att - full-fit att (to float tolerance).
+    for _, r in loo_rows.iterrows():
+        assert r["delta_att"] == pytest.approx(r["att"] - res.att)
+    # Sorted by |delta_att| descending.
+    deltas = loo_rows["delta_att"].abs().to_numpy()
+    assert np.all(np.diff(deltas) <= 1e-12)
+    # att_range spans the LOO refits.
+    lo, hi = res._loo_att_range
+    assert lo <= hi and lo == pytest.approx(loo_rows["att"].min())
+    assert hi == pytest.approx(loo_rows["att"].max())
+
+
+def test_leave_one_out_stable_when_no_donor_dominates():
+    res = _fit_cheap(_equal_mix_panel(n_donors=5, effect=3.0))
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        loo = res.leave_one_out()
+    loo_rows = loo[loo["status"] == "loo"]
+    # Near-identical donors -> dropping any one barely moves the ATT (well under the
+    # 3.0 effect). att_range is correspondingly tight.
+    assert loo_rows["delta_att"].abs().max() < 1.0
+    lo, hi = res._loo_att_range
+    assert (hi - lo) < 1.0
+
+
+def test_leave_one_out_swings_when_one_donor_dominates():
+    res = _fit_cheap(_single_donor_panel(n_donors=4, effect=3.0))
+    # Weight concentrates on d0.
+    assert res.donor_weights.get("d0", 0.0) > 0.5
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        loo = res.leave_one_out()
+    loo_rows = loo[loo["status"] == "loo"]
+    # Dropping the dominant donor is the most influential drop (top finite row) and
+    # moves the ATT by a non-trivial amount.
+    top = loo_rows.iloc[0]
+    assert top["dropped_unit"] == "d0"
+    assert abs(top["delta_att"]) > 0.2
+
+
+def test_leave_one_out_deterministic():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        loo1 = res.leave_one_out()
+        loo2 = res.leave_one_out()
+    pd.testing.assert_frame_equal(loo1, loo2)
+
+
+def test_leave_one_out_requires_two_donors():
+    res = _fit_for_placebo(n_donors=1)
+    with pytest.warns(UserWarning, match="at least 2 donors"):
+        loo = res.leave_one_out()
+    assert len(loo) == 1 and loo.iloc[0]["status"] == "baseline"
+    assert res._loo_status == "too_few_donors" and res._loo_att_range is None
+
+
+def test_leave_one_out_fails_closed_on_nonconverged_treated_fit():
+    df, _, _ = _make_panel(n_donors=4, effect=3.0)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df, "y", "treated", "unit", "year", seed=0, inner_max_iter=1, **_FAST_CHURN
+        )
+    assert res._fit_converged is False
+    with pytest.warns(UserWarning, match="did not converge at fit time"):
+        loo = res.leave_one_out()
+    assert len(loo) == 1 and loo.iloc[0]["status"] == "baseline"
+    assert res._loo_status == "treated_fit_nonconverged"
+
+
+def test_leave_one_out_refit_failure_tallied(monkeypatch):
+    import importlib
+
+    sc = importlib.import_module("diff_diff.synthetic_control")
+    res = _fit_for_placebo(n_donors=4)
+    real_fit_unit = sc._placebo_fit_unit
+    calls = {"n": 0}
+
+    def flaky_fit_unit(snap, unit, donor_pool, n_starts):
+        calls["n"] += 1
+        if calls["n"] == 1:  # first leave-one-out refit "fails"
+            return None
+        return real_fit_unit(snap, unit, donor_pool, n_starts)
+
+    monkeypatch.setattr(sc, "_placebo_fit_unit", flaky_fit_unit)
+    with pytest.warns(UserWarning, match="failed to converge"):
+        loo = res.leave_one_out()
+    assert res._loo_n_failed == 1
+    failed = loo[loo["status"] == "failed"]
+    assert len(failed) == 1
+    assert failed[["att", "pre_rmspe", "rmspe_ratio", "delta_att"]].isna().all().all()
+    # Failed rows sort last (after the baseline + the converged LOO rows).
+    assert loo.iloc[-1]["status"] == "failed"
+
+
+def test_leave_one_out_pickle_drops_gaps_keeps_table():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.leave_one_out()
+    restored = pickle.loads(pickle.dumps(res))
+    # The summary table + scalars survive; panel-derived gap paths do not.
+    pd.testing.assert_frame_equal(restored.get_leave_one_out_df(), res.get_leave_one_out_df())
+    assert restored._loo_gaps is None
+    assert restored._loo_att_range == res._loo_att_range
+    with pytest.raises(ValueError, match="not retained after pickling"):
+        restored.get_leave_one_out_gaps()
+
+
+def test_leave_one_out_gaps_long_form():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.leave_one_out()
+    gaps = res.get_leave_one_out_gaps()
+    assert list(gaps.columns) == ["dropped_unit", "period", "gap", "phase"]
+    pos = [d for d in res._fit_snapshot.donor_ids if d in res.donor_weights]
+    assert set(gaps["dropped_unit"]) == set(pos)
+    # Every dropped donor has a full pre+post trajectory.
+    n_periods = len(res.pre_periods) + len(res.post_periods)
+    assert (gaps.groupby("dropped_unit").size() == n_periods).all()
+    assert set(gaps["phase"]) == {"pre", "post"}
+
+
+def test_leave_one_out_accessor_before_run_raises():
+    res = _fit_for_placebo(n_donors=4)
+    with pytest.raises(ValueError, match="call leave_one_out"):
+        res.get_leave_one_out_df()
+    with pytest.raises(ValueError, match="call leave_one_out"):
+        res.get_leave_one_out_gaps()
+
+
+def test_leave_one_out_does_not_touch_analytical_inference():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.leave_one_out()
+    assert_nan_inference(
+        {"se": res.se, "t_stat": res.t_stat, "p_value": res.p_value, "conf_int": res.conf_int}
+    )
+    assert res.is_significant is False
+
+
+def test_leave_one_out_requires_snapshot():
+    res = _fit_for_placebo(n_donors=4)
+    restored = pickle.loads(pickle.dumps(res))
+    with pytest.raises(ValueError, match="requires the fit snapshot"):
+        restored.leave_one_out()
+
+
+# ---------------------------------------------------------------------------
+# In-time placebo: snapshot-truncation helper (ADH 2015 §4)
+# ---------------------------------------------------------------------------
+
+
+def _snap_for_in_time(**kw):
+    return _fit_for_placebo(n_donors=4, **kw)._fit_snapshot
+
+
+def test_truncate_snapshot_positional_split():
+    from diff_diff.synthetic_control import _truncate_snapshot_in_time
+
+    snap = _snap_for_in_time()
+    assert list(snap.pre_periods) == [2000, 2001, 2002, 2003, 2004, 2005]
+    mod, _ = _truncate_snapshot_in_time(snap, 2003)
+    assert mod is not None
+    assert mod.pre_periods == [2000, 2001, 2002]  # pre-fake = strictly before t_f
+    assert mod.post_periods == [2003, 2004, 2005]  # post-fake = held-out pre, t_f first
+    # all_periods EXCLUDES the true post periods (2006, 2007) -> airtight no-peeking.
+    assert mod.all_periods == [2000, 2001, 2002, 2003, 2004, 2005]
+    assert 2006 not in mod.all_periods and 2007 not in mod.all_periods
+
+
+def test_truncate_snapshot_drops_specs_in_held_out_window():
+    from diff_diff.synthetic_control import _truncate_snapshot_in_time
+
+    snap = _snap_for_in_time()  # default pre_period_outcomes="all": one lag per pre period
+    mod, dropped = _truncate_snapshot_in_time(snap, 2003)
+    for spec in mod.specs:  # surviving specs reference only pre-fake periods
+        assert all(p < 2003 for p in spec.periods)
+    assert len(dropped) == 3  # lags at 2003/2004/2005 dropped
+    assert len(mod.specs) == len(snap.specs) - 3
+
+
+def test_truncate_snapshot_custom_v_lockstep():
+    from diff_diff.synthetic_control import _truncate_snapshot_in_time
+
+    df, _, _ = _make_panel(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=np.arange(1.0, 7.0),  # distinct entries to verify the subset
+            inner_min_decrease=1e-3,
+        )
+    snap = res._fit_snapshot
+    mod, _ = _truncate_snapshot_in_time(snap, 2003)
+    # custom_v subset IN LOCKSTEP with the surviving specs (the default lag specs are
+    # ordered by ascending pre period, so the first three entries survive).
+    assert mod.custom_v is not None and len(mod.custom_v) == len(mod.specs)
+    np.testing.assert_array_equal(mod.custom_v, np.array([1.0, 2.0, 3.0]))
+
+
+def test_truncate_snapshot_straddling_window_partial_keep():
+    from diff_diff.synthetic_control import _truncate_snapshot_in_time
+
+    df, _, _ = _make_panel(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            special_predictors=[("y", [2002, 2003, 2004], "mean")],
+            pre_period_outcomes=[2000, 2001],
+            inner_min_decrease=1e-3,
+        )
+    snap = res._fit_snapshot
+    mod, _ = _truncate_snapshot_in_time(snap, 2003)
+    # The special predictor straddles t_f -> truncated to its pre-fake part [2002].
+    special = [s for s in mod.specs if s.kind == "special"]
+    assert len(special) == 1 and special[0].periods == [2002]
+
+
+def test_truncate_snapshot_infeasible_too_few_pre_fake():
+    from diff_diff.synthetic_control import _truncate_snapshot_in_time
+
+    snap = _snap_for_in_time()
+    # Fewer than 2 pre-fake periods -> infeasible (the deliberate >=2 rule; an
+    # auto-swept single-pre-fake placebo is a non-credible pre-fit — documented Note).
+    assert _truncate_snapshot_in_time(snap, 2000)[0] is None  # 0 pre-fake
+    assert _truncate_snapshot_in_time(snap, 2001)[0] is None  # 1 pre-fake
+
+
+def test_truncate_snapshot_infeasible_all_specs_dropped():
+    from diff_diff.synthetic_control import _truncate_snapshot_in_time
+
+    df, _, _ = _make_panel(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            special_predictors=[("y", [2004, 2005], "mean")],
+            pre_period_outcomes=[2004, 2005],
+            inner_min_decrease=1e-3,
+        )
+    snap = res._fit_snapshot
+    # t_f=2003 leaves >=2 pre-fake periods, but every spec lives in [2004, 2005]
+    # -> all dropped -> infeasible (cannot fit with zero predictors).
+    mod, dropped = _truncate_snapshot_in_time(snap, 2003)
+    assert mod is None and len(dropped) == len(snap.specs)
+
+
+def test_truncate_snapshot_does_not_mutate_original():
+    from diff_diff.synthetic_control import _truncate_snapshot_in_time
+
+    snap = _snap_for_in_time()
+    before = [list(s.periods) for s in snap.specs]
+    _truncate_snapshot_in_time(snap, 2003)
+    after = [list(s.periods) for s in snap.specs]
+    assert before == after  # shared spec objects are never mutated in place
+
+
+# ---------------------------------------------------------------------------
+# In-time placebo: end-to-end (ADH 2015 §4)
+# ---------------------------------------------------------------------------
+
+_IN_TIME_COLS = [
+    "placebo_period",
+    "placebo_att",
+    "pre_fit_rmspe",
+    "rmspe_ratio",
+    "n_pre_fake",
+    "n_post_fake",
+    "n_dropped_specs",
+    "status",
+]
+
+
+def test_in_time_placebo_near_zero_when_effect_post_only():
+    # The effect is only in the TRUE post window (>=2006); every backdated placebo
+    # falls in the clean pre window, so the placebo "effect" should be ~0.
+    res = _fit_for_placebo(n_donors=4, effect=3.0)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        itp = res.in_time_placebo()
+    assert list(itp.columns) == _IN_TIME_COLS
+    ran = itp[itp["status"] == "ran"]
+    assert len(ran) > 0
+    assert ran["placebo_att"].abs().max() < 1.0  # well below the 3.0 true effect
+
+
+def test_in_time_placebo_sweep_feasibility():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        itp = res.in_time_placebo()
+    # pre = [2000..2005] -> feasible dates = pre[2:] = [2002, 2003, 2004, 2005]
+    # (>=2 pre-fake periods — the deliberate Note-documented restriction).
+    assert list(itp["placebo_period"]) == [2002, 2003, 2004, 2005]
+    assert (itp["status"] == "ran").all()
+    # n_pre_fake + n_post_fake == n_pre for every row, with >=2 pre-fake + >=1 post-fake.
+    assert ((itp["n_pre_fake"] + itp["n_post_fake"]) == len(res.pre_periods)).all()
+    assert (itp["n_pre_fake"] >= 2).all() and (itp["n_post_fake"] >= 1).all()
+
+
+def test_in_time_placebo_explicit_post_date_raises():
+    res = _fit_for_placebo(n_donors=4)
+    with pytest.raises(ValueError, match="true post-treatment period"):
+        res.in_time_placebo([2006])
+
+
+def test_in_time_placebo_date_not_in_pre_raises():
+    res = _fit_for_placebo(n_donors=4)
+    with pytest.raises(ValueError, match="not a pre-treatment period"):
+        res.in_time_placebo([1999])
+
+
+def test_in_time_placebo_empty_explicit_input_raises():
+    # An explicit but EMPTY container is malformed (NOT "every date infeasible") -> raise
+    # (codex R6 P1). None still means "sweep all feasible dates".
+    res = _fit_for_placebo(n_donors=4)
+    for empty in ([], (), pd.Index([]), np.array([])):
+        with pytest.raises(ValueError, match="placebo_periods is empty"):
+            res.in_time_placebo(empty)
+    # The malformed call must not leave any in-time state behind.
+    assert res._in_time_df is None and res._in_time_status is None
+
+
+def test_in_time_placebo_dedups_and_canonicalizes_explicit_dates():
+    # Duplicate / unordered explicit dates -> de-duplicated + pre-period-ordered, so no
+    # duplicate refits and n_dates is not inflated (codex R7 P3).
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        itp = res.in_time_placebo([2004, 2002, 2004])  # duplicate 2004, unordered
+    assert list(itp["placebo_period"]) == [2002, 2004]  # unique, canonical pre-period order
+
+
+def test_in_time_placebo_ran_block_reports_partial_coverage():
+    # CI codex P2: a sweep where SOME dates ran and SOME were infeasible must surface
+    # n_ran / n_infeasible on the status="ran" block so coverage is not overstated.
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.in_time_placebo([2001, 2003])  # 2001 infeasible (1 pre-fake), 2003 runs
+    assert res._in_time_status == "ran"  # at least one date ran
+    block = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]["in_time_placebo"]
+    assert block["status"] == "ran"
+    assert block["n_dates"] == 2 and block["n_ran"] == 1
+    assert block["n_infeasible"] == 1 and block["n_failed"] == 0
+
+
+def test_leave_one_out_immune_to_donor_weights_mutation():
+    # Codex R8 P1: the LOO drop-set is FROZEN at fit time (snap.weighted_donor_ids =
+    # the >1e-6 reportable support), NOT read from the mutable presentation-level
+    # donor_weights dict. So mutating donor_weights after the fit must NOT change which
+    # donors are dropped — the robustness result depends only on the fit.
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        before = set(res.leave_one_out()[lambda d: d["status"] != "baseline"]["dropped_unit"])
+    assert before == set(res._fit_snapshot.weighted_donor_ids)  # drops the frozen support
+    # Mutate the public dict: drop a real donor, inject a bogus one.
+    victim = next(iter(res.donor_weights))
+    res.donor_weights = {k: v for k, v in res.donor_weights.items() if k != victim}
+    res.donor_weights["bogus_donor"] = 0.99
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        after = set(res.leave_one_out()[lambda d: d["status"] != "baseline"]["dropped_unit"])
+    assert after == before  # unchanged by the mutation
+    assert "bogus_donor" not in after  # a donor not in the fit is never dropped
+    assert victim in after  # still dropped despite removal from donor_weights
+
+
+def test_in_time_placebo_early_date_infeasible_no_raise():
+    res = _fit_for_placebo(n_donors=4)
+    # A valid pre-date with too few (<2) pre-fake periods -> NaN infeasible row +
+    # warning, NOT a raise.
+    with pytest.warns(UserWarning, match="infeasible"):
+        itp = res.in_time_placebo([2001])  # 1 pre-fake period
+    assert len(itp) == 1 and itp.iloc[0]["status"] == "infeasible"
+    assert np.isnan(itp.iloc[0]["placebo_att"])
+
+
+def test_in_time_placebo_custom_v_zero_mass_is_infeasible_not_failed():
+    # A custom_v whose mass lies entirely on specs that TRUNCATE drops leaves a
+    # zero-mass surviving V -> the date is INFEASIBLE under the supplied custom_v,
+    # NOT a convergence failure (codex R2 P1b: v/v.sum() would be 0/0).
+    df, _, _ = _make_panel(n_donors=4)  # default: 6 lag specs (2000..2005)
+    v = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])  # all mass on the 2003/2004/2005 lags
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=v,
+            inner_min_decrease=1e-3,
+        )
+        itp = res.in_time_placebo([2003])  # keeps lags 2000/2001/2002 -> all zero weight
+    row = itp[itp["placebo_period"] == 2003]
+    assert len(row) == 1 and row.iloc[0]["status"] == "infeasible"  # NOT "failed"
+    assert res._in_time_status == "all_dates_infeasible"
+
+
+def test_leave_one_out_uniform_shift_surfaced_by_delta_not_range(monkeypatch):
+    # Codex R3 P1b: when every donor-drop shifts the ATT the SAME way, the raw
+    # att_range has ~zero width (looks stable) but the donor dependence is large.
+    # The headline metric must be baseline-relative (max |delta_att|), not the range.
+    import importlib
+
+    sc = importlib.import_module("diff_diff.synthetic_control")
+    res = _fit_for_placebo(n_donors=4)
+    baseline = float(res.att)
+    snap = res._fit_snapshot
+    shift = 5.0  # same large shift for EVERY drop -> uniform
+
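+    def uniform_shift(snap_arg, unit, pool, n_starts):
+        # Stub refit: ignores its arguments and returns the same post-period path
+        # (baseline + shift) for every drop, so each LOO ATT moves by exactly `shift`.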
+        gp = {p: 0.0 for p in snap.pre_periods}
+        gp.update({p: baseline + shift for p in snap.post_periods})
+        return gp, 1.0
+
+    monkeypatch.setattr(sc, "_placebo_fit_unit", uniform_shift)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.leave_one_out()
+    lo, hi = res._loo_att_range
+    assert (hi - lo) == pytest.approx(0.0, abs=1e-9)  # raw range would hide the shift
+    assert res._loo_max_abs_delta_att == pytest.approx(shift, abs=1e-9)  # delta reveals it
+    native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+    assert native["leave_one_out"]["max_abs_delta_att"] == pytest.approx(shift, abs=1e-9)
+
+
+def test_in_time_placebo_windowed_covariate_dropped_and_warns():
+    # A special predictor measured over [2004, 2005] falls entirely in the held-out
+    # window for t_f=2003 -> dropped (TRUNCATE) + warning + n_dropped_specs reflects it.
+    df, _, _ = _make_panel(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            special_predictors=[("y", [2004, 2005], "mean")],
+            pre_period_outcomes=[2000, 2001, 2002, 2003],
+            inner_min_decrease=1e-3,
+        )
+    with pytest.warns(UserWarning, match="dropped"):
+        itp = res.in_time_placebo([2003])
+    row = itp.iloc[0]
+    # The special predictor (and the lag at 2003) lie in [2003, 2005] -> dropped.
+    assert row["n_dropped_specs"] >= 1 and row["status"] == "ran"
+
+
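+def test_in_time_placebo_all_specs_dropped_infeasible():
+    # Every predictor (the special predictor and both lags) sits at 2004/2005, inside
+    # the held-out window for t_f=2003, so TRUNCATE drops them all -> infeasible row.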
+    df, _, _ = _make_panel(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            special_predictors=[("y", [2004, 2005], "mean")],
+            pre_period_outcomes=[2004, 2005],
+            inner_min_decrease=1e-3,
+        )
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        itp = res.in_time_placebo([2003])  # every predictor is at 2004/2005
+    assert itp.iloc[0]["status"] == "infeasible"
+
+
+def test_in_time_placebo_custom_v_runs_without_shape_error():
+    # End-to-end guard for the custom_v lockstep subset: without it the custom path
+    # would raise a shape mismatch once specs are dropped.
+    df, _, _ = _make_panel(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=np.ones(6),
+            inner_min_decrease=1e-3,
+        )
+        itp = res.in_time_placebo()
+    assert (itp["status"] == "ran").any()
+
+
+def test_in_time_placebo_accepts_2d_custom_v():
+    # fit() accepts an array-like custom_v (e.g. a (1, k) row vector, raveled during
+    # validation); the in-time TRUNCATE subset must ravel before indexing or a 2D
+    # custom_v raises IndexError (codex R5 P1). Must match the 1D result exactly.
+    df, _, _ = _make_panel(n_donors=4)
+    v1d = np.arange(1.0, 7.0)
+    v2d = v1d.reshape(1, 6)  # row-vector form accepted at fit time
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res1 = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=v1d,
+            inner_min_decrease=1e-3,
+        )
+        res2 = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=v2d,
+            inner_min_decrease=1e-3,
+        )
+        itp1 = res1.in_time_placebo([2003])
+        itp2 = res2.in_time_placebo([2003])  # would IndexError before the ravel fix
+    pd.testing.assert_frame_equal(itp1, itp2)
+
+
+def test_in_time_placebo_deterministic():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        itp1 = res.in_time_placebo()
+        itp2 = res.in_time_placebo()
+    pd.testing.assert_frame_equal(itp1, itp2)
+
+
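+def test_in_time_placebo_fails_closed_on_nonconverged_treated_fit():
+    # A treated fit that did not converge (inner_max_iter=1) must yield a warning, an
+    # empty placebo table, and the "treated_fit_nonconverged" status, NOT placebo rows.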
+    df, _, _ = _make_panel(n_donors=4, effect=3.0)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df, "y", "treated", "unit", "year", seed=0, inner_max_iter=1, **_FAST_CHURN
+        )
+    assert res._fit_converged is False
+    with pytest.warns(UserWarning, match="did not converge"):
+        itp = res.in_time_placebo()
+    assert len(itp) == 0 and res._in_time_status == "treated_fit_nonconverged"
+
+
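+def test_in_time_placebo_pickle_drops_gaps_keeps_table():
+    # Pickling keeps the summary table but drops the per-period gaps and the fit
+    # snapshot, so the gaps accessor and a re-run of in_time_placebo() both raise.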
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.in_time_placebo()
+    restored = pickle.loads(pickle.dumps(res))
+    pd.testing.assert_frame_equal(restored.get_in_time_placebo_df(), res.get_in_time_placebo_df())
+    assert restored._in_time_gaps is None
+    with pytest.raises(ValueError, match="not retained after pickling"):
+        restored.get_in_time_placebo_gaps()
+    with pytest.raises(ValueError, match="requires the fit snapshot"):
+        restored.in_time_placebo()
+
+
+def test_in_time_placebo_gaps_long_form():
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.in_time_placebo([2003])
+    gaps = res.get_in_time_placebo_gaps()
+    assert list(gaps.columns) == ["placebo_period", "period", "gap", "phase"]
+    assert set(gaps["phase"]) == {"pre_fake", "post_fake"}
+    # Periods before t_f=2003 are pre_fake; 2003+ are post_fake.
+    assert set(gaps.loc[gaps["phase"] == "pre_fake", "period"]) == {2000, 2001, 2002}
+    assert set(gaps.loc[gaps["phase"] == "post_fake", "period"]) == {2003, 2004, 2005}
+
+
+def test_in_time_placebo_accessor_before_run_raises():
+    res = _fit_for_placebo(n_donors=4)
+    with pytest.raises(ValueError, match="call in_time_placebo"):
+        res.get_in_time_placebo_df()
+    with pytest.raises(ValueError, match="call in_time_placebo"):
+        res.get_in_time_placebo_gaps()
+
+
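+def test_in_time_placebo_does_not_touch_analytical_inference():
+    # Running the placebo must leave the analytical inference fields (se, t_stat,
+    # p_value, conf_int) NaN and is_significant False.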
+    res = _fit_for_placebo(n_donors=4)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res.in_time_placebo()
+    assert_nan_inference(
+        {"se": res.se, "t_stat": res.t_stat, "p_value": res.p_value, "conf_int": res.conf_int}
+    )
+    assert res.is_significant is False
+
+
+# ---------------------------------------------------------------------------
+# Self-consistency parity: the ADH-2015 diagnostics are EXACT re-runs of the
+# validated solver on the equivalent sub-problem.
+#
+# R `Synth` has NO in-time-placebo or leave-one-out function (verified against its
+# full CRAN function index), so there is no canonical R *output* to match for these
+# diagnostics specifically. Instead we prove (deterministically, via a fixed custom
+# V) that leave_one_out() equals a from-scratch fit on the reduced donor pool, and
+# in_time_placebo() equals a from-scratch fit on the backdated/truncated panel.
+# Because the custom-V solver is itself R-anchored on Basque
+# (test_basque_tier1_custom_v_parity), this transitively anchors the diagnostics to
+# R while directly validating that the re-run mechanism is exact (not approximate).
+# ---------------------------------------------------------------------------
+
+
+def test_leave_one_out_matches_fresh_reduced_pool_fit():
+    df, _, _ = _make_panel(n_donors=4)
+    v = np.arange(1.0, 7.0)  # k = 6 default lag predictors; fixed V -> deterministic
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=v,
+            inner_min_decrease=1e-3,
+        )
+        loo = res.leave_one_out()
+    donor_ids = list(res._fit_snapshot.donor_ids)
+    d = [x for x in donor_ids if x in res.donor_weights][0]  # a positively-weighted donor
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        fresh = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=v,
+            inner_min_decrease=1e-3,
+            donor_pool=[x for x in donor_ids if x != d],
+        )
+    loo_att = loo.loc[loo["dropped_unit"] == d, "att"].iloc[0]
+    assert loo_att == pytest.approx(fresh.att, abs=1e-7)
+
+
+def test_in_time_placebo_matches_fresh_backdated_fit():
+    df, _, _ = _make_panel(n_donors=4)  # years 2000-2007, T0=6 -> pre = 2000..2005
+    v = np.arange(1.0, 7.0)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        res = synthetic_control(
+            df,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=v,
+            inner_min_decrease=1e-3,
+        )
+        itp = res.in_time_placebo([2003])
+    placebo_att = itp.loc[itp["placebo_period"] == 2003, "placebo_att"].iloc[0]
+    # Fresh backdated fit: drop the true post periods, treat 2003 as the intervention,
+    # feed the pre-fake-subset V (lags at 2000/2001/2002 -> v[:3]).
+    back = df[df["year"] <= 2005].copy()
+    back["treated"] = ((back["unit"] == "treated") & (back["year"] >= 2003)).astype(int)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        fresh = synthetic_control(
+            back,
+            "y",
+            "treated",
+            "unit",
+            "year",
+            v_method="custom",
+            custom_v=v[:3],
+            inner_min_decrease=1e-3,
+        )
+    assert placebo_att == pytest.approx(fresh.att, abs=1e-7)
+
+
+# ---------------------------------------------------------------------------
+# All-refits-failed branches (codex R1 P1): when EVERY refit fails to converge,
+# the status must NOT be reported as "ran" / mislabeled as dimensional infeasibility.
+# ---------------------------------------------------------------------------
+
+
+def test_leave_one_out_all_refits_failed_status(monkeypatch):
+    import importlib
+
+    sc = importlib.import_module("diff_diff.synthetic_control")
+    res = _fit_for_placebo(n_donors=4)
+    monkeypatch.setattr(sc, "_placebo_fit_unit", lambda *a, **k: None)  # every drop fails
+    with pytest.warns(UserWarning, match="failed to converge"):
+        loo = res.leave_one_out()
+    # Distinct status (NOT "ran"); att_range is None; the table holds the baseline row
+    # plus failed rows only.
+    assert res._loo_status == "all_refits_failed"
+    assert res._loo_att_range is None
+    assert (loo["status"] != "loo").all()  # no successful drop
+    assert (loo.iloc[1:]["status"] == "failed").all()
+    # DiagnosticReport must surface it as NOT "ran", with the convergence reason.
+    native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+    assert native["leave_one_out"]["status"] != "ran"
+    # Machine-readable code distinguishes numerical failure from structural infeasibility.
+    assert native["leave_one_out"]["reason_code"] == "all_refits_failed"
+    assert "failed to converge" in native["leave_one_out"]["reason"]
+
+
+def test_in_time_placebo_all_dates_failed_status(monkeypatch):
+    import importlib
+
+    sc = importlib.import_module("diff_diff.synthetic_control")
+    res = _fit_for_placebo(n_donors=4)
+    monkeypatch.setattr(sc, "_placebo_fit_unit", lambda *a, **k: None)  # every refit fails
+    with pytest.warns(UserWarning, match="failed to converge"):
+        itp = res.in_time_placebo()
+    # Convergence failure must NOT be mislabeled as dimensional infeasibility.
+    assert res._in_time_status == "all_dates_failed"
+    assert (itp["status"] == "failed").all() and len(itp) > 0
+    native = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]
+    assert native["in_time_placebo"]["status"] != "ran"
+    assert native["in_time_placebo"]["reason_code"] == "all_dates_failed"
+    assert "failed to converge" in native["in_time_placebo"]["reason"]
+
+
+def test_in_time_placebo_mixed_failed_and_infeasible_status(monkeypatch):
+    # Codex R8 P2: a no-success run with BOTH a dimensionally-infeasible date AND a
+    # convergence-failed date must report the mixed "all_dates_unusable" status with
+    # both counts — NOT be mislabeled as exclusively failed (which would falsely claim
+    # "none was dimensionally infeasible").
+    import importlib
+
+    sc = importlib.import_module("diff_diff.synthetic_control")
+    res = _fit_for_placebo(n_donors=4)
+    # Feasible dates "fail" to converge; 2001 (1 pre-fake) is dimensionally infeasible.
+    monkeypatch.setattr(sc, "_placebo_fit_unit", lambda *a, **k: None)
+    with warnings.catch_warnings():
+        warnings.simplefilter("ignore")
+        itp = res.in_time_placebo([2001, 2003])  # 2001 infeasible, 2003 fails
+    assert res._in_time_status == "all_dates_unusable"
+    assert res._in_time_n_failed == 1 and res._in_time_n_infeasible == 1
+    assert set(itp["status"]) == {"infeasible", "failed"}
+    block = DiagnosticReport(res).to_dict()["estimator_native_diagnostics"]["in_time_placebo"]
+    assert block["reason_code"] == "all_dates_unusable"
+    assert block["n_failed"] == 1 and block["n_infeasible"] == 1
diff --git a/tests/test_practitioner.py b/tests/test_practitioner.py
index 4153d840..7cb7d734 100644
--- a/tests/test_practitioner.py
+++ b/tests/test_practitioner.py
@@ -356,6 +356,9 @@ def test_synthetic_control_results(self, mock_scm_results):
         all_labels = " ".join(s.get("label", "") for s in output["next_steps"]).lower()
         assert "in_space_placebo" in all_code
         assert "placebo" in all_labels
+        # The ADH-2015 robustness steps are surfaced too. They are opt-in diagnostics
+        # with non-STEPS tags, so a caller's completed_steps can never suppress them.
+        assert "leave_one_out" in all_code and "in_time_placebo" in all_code
         # SCM is not a staggered DiD: no control-group / anticipation knobs.
         handler_steps = [s for s in output["next_steps"] if s["baker_step"] > 2]
         handler_code = " ".join(s.get("code", "") for s in handler_steps)