diff --git a/CHANGELOG.md b/CHANGELOG.md
index 39888e0d..11e3a78f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,11 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 - **`DifferenceInDifferences(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:382`). Previously raised `NotImplementedError` because the HC2 leverage correction and CR2 Bell-McCaffrey DOF depend on the FULL FE hat matrix, while within-transformation (FWL) preserves coefficients and residuals but not the hat. Lift via internal auto-route: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, the fit promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov. Empirically matches `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` at ~1e-10 (verified via new `tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `absorbed_fe_did`, with the R generator using the singleton-cluster CR2 trick for one-way HC2-BM Satterthwaite DOF). HC1/CR1 paths unchanged. `MultiPeriodDiD(absorb=...)` and `TwoWayFixedEffects` rejections remain as follow-ups (different fit-path structure). **Behavioral note (full `DiDResults` surface change under auto-route):** under the auto-route, the entire returned `DiDResults` reflects the full-dummy fit rather than the within-transformed fit. Specifically, `result.coefficients` and `result.vcov` include the FE-dummy entries (matching the `fixed_effects=` path), `result.residuals` and `result.fitted_values` are on the un-demeaned outcome scale, and `result.r_squared` is computed on the un-demeaned outcome (so it absorbs the FE variance and will typically be higher than the within-R²). `result.att` is invariant to this routing (FWL guarantee). Downstream consumers reading `result.att` are unaffected; consumers reading the broader result surface should expect the full-dummy values. 
**Survey-design scope:** the auto-route changes the FE handling (and removes the prior absorbed-FE rejection), but `survey_design=` continues to drive its own variance path (Taylor-series linearization or replicate-weight variance, per the existing survey contract) rather than the analytical HC2/HC2-BM sandwich. The auto-route is therefore methodologically meaningful for non-survey fits and for the FE-handling side of survey fits; analytical small-sample inference under `vcov_type in {"hc2","hc2_bm"}` is bypassed when a survey design is supplied.
+- **`SpilloverDiD` Gardner GMM first-stage uncertainty correction across HC1 / Conley / cluster (Wave D).** Closes the documented Wave B/C "SEs biased downward by a few percent" caveat. **Documented synthesis** of Butts (2021) Section 3.1 (the IF construction for spillover-aware DiD) + Gardner (2022) Section 4 (the two-stage GMM sandwich) + Conley (1999) (the spatial kernel). No reference software combines all three — `did2s` (Butts & Gardner) implements the Gardner correction without rings or Conley; `conleyreg` and `acreg` implement Conley without the two-stage correction. Wave D is the synthesis. Applies unconditionally under `vcov_type ∈ {"hc1", "conley", "cluster"}` for both `event_study=False` AND `event_study=True`. **Formula** (Butts 2021 §3.1 + Gardner 2022 §4): `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` where `gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)` is the stage-1-projection-of-stage-2 cross-moment; meat = `Psi' K Psi` with `K` dispatched by `vcov_type` (identity for HC1, block-indicator for cluster, spatial kernel for Conley); vcov = `(X_2' X_2)^{-1} @ meat @ (X_2' X_2)^{-1}`. **Finite-sample multipliers:** `n/(n-p)` for HC1; `G/(G-1) * (n-1)/(n-p)` for cluster CR1; no multiplier for Conley (preserves `conleyreg` / Wave B convention). **Public surface:** `vcov_type="classical"` now raises `NotImplementedError` upfront (the Wave D synthesis has not been derived for the homoskedastic meat structure `sigma_hat^2 * (X_10' X_10)`); REGISTRY's "vcov_type restrictions" block updated accordingly. **Point estimates unchanged** (`tau_total`, `delta_j`, event-study `tau_k` / `delta_jk` are byte-identical to Wave B/C); SE values shift upward by one to a few percent depending on first-stage residual variance. 
**Implementation:** new module-level helper `_compute_gmm_corrected_meat` in `diff_diff/two_stage.py` (NOT a modification of the existing `_compute_gmm_variance` method — TwoStageDiD's path is unchanged); new module-level helper `_build_butts_fe_design_csr` in `diff_diff/spillover.py`; new module-level helper `_compute_conley_meat` in `diff_diff/conley.py` factored out of `_compute_conley_vcov` so the same kernel-application code path handles both standard sandwich (`X * residuals`) and Wave D IF outer product (`Psi`) cases. **No new public API kwarg** — the correction is unconditional. Wave D variance mode dispatch derives from the public contract: `vcov_type="conley"` → `"conley"`; `cluster=<col>` → `"cluster"` (CR1); otherwise `"hc1"`. **Wave B/C SE goldens re-pinned** at `tests/test_spillover.py::TestSpilloverDiDEventStudyBackwardCompat` (constants renamed `_WAVE_B_GOLDEN_*` → `_WAVE_D_GOLDEN_*`; pre-Wave-D references retained as commented baselines for the directional inflation invariant `_WAVE_B_UNCORRECTED_*`). **Tests:** new test classes `TestSpilloverDiDWaveDGmmCorrectedHc1Hand` (hand-derived `Psi` on a 4-unit × 3-period over-identified panel — matches at `atol=1e-12`), `TestSpilloverDiDWaveDGmmCorrectedEventStudy` (vcov shape on event-study path), `TestSpilloverDiDWaveDGmmCorrectedNanInferenceContract` (rank-deficient column propagation), `TestSpilloverDiDWaveDGmmCorrectedValidatorWiring` (Conley validator fires from the new helper), `TestSpilloverDiDWaveDGmmCorrectedFitIdempotence` (clone + repeat-fit bit-identity per `feedback_fit_does_not_mutate_config`), `TestSpilloverDiDWaveDPublicVarianceContract` (end-to-end public `cluster=<col>` CR1 routing, single-cluster rejection, classical NotImplementedError). Closes the Gardner-GMM follow-up row in `TODO.md`.
 - **BaconDecomposition R parity goldens.** Closes the PR-B deferral row in `TODO.md`. JSON goldens at `benchmarks/data/r_bacondecomp_golden.json` generated from the committed `benchmarks/R/generate_bacon_golden.R` script (3 fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`) against `bacondecomp 0.1.1` on R 4.5.2. `tests/test_methodology_bacon.py::TestBaconParityR` now active (4 tests, no skips): TWFE coefficient parity at `atol=1e-6` across all 3 fixtures; weights-sum parity at `atol=1e-6` across all 3 fixtures; per-component estimate + weight parity at `atol=1e-6` on the 2 non-remap fixtures **and on the 6 timing-vs-timing rows of `always_treated_remapped`** (carve-out narrowed to U-bucket rows only); plus a dedicated fold-back test (`test_always_treated_remapped_fold_back_matches_r`) that pins the **documented convention divergence** on `always_treated_remapped` (R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort) by aggregating R's split rows per cohort and asserting they match Python's single fold at `atol=1e-6`. The aggregate is invariant per Theorem 1; the per-component breakdown differs structurally between conventions but the fold-back is now directly asserted. New `**Note (R parity convention divergence on always-treated)**` and `**Deviation (first-period boundary extension on always-treated remap)**` in `docs/methodology/REGISTRY.md`. **First-period boundary deviation:** the paper uses strict `t_i < 1` for the always-treated bucket; the library uses the inclusive `first_treat <= min(time)` rule and folds `first_treat == min(time)` cohorts into `U`. R does NOT apply this fold (it keeps such cohorts as their own bucket). When `min(time) > 1` the rules coincide. 
Explicitly labeled in REGISTRY's Deviations block and mirrored in `METHODOLOGY_REVIEW.md` and `bacon.py`. METHODOLOGY_REVIEW.md tracker row promoted `**Complete** (R parity goldens pending)` → `**Complete**`.
 - **`generate_ddd_panel_data` — panel-structured DGP for Triple-Difference power analysis** (`diff_diff/prep_dgp.py`). New public function exported from `diff_diff` and `diff_diff.prep` for panel DDD simulations. Cross-sectional `generate_ddd_data` remains available unchanged. Produces a balanced panel of `n_units × n_periods` with two unit-level binary dimensions (`group`, `partition`) and a derived `post = 1[period >= treatment_period]` indicator; columns: `unit, period, outcome, group, partition, post, treated, true_effect` (+ `x1, x2` when `add_covariates=True`). DDD-CPT identification holds because the `group * partition` interaction enters as a unit-level (time-invariant) term, leaving the triple-interaction `treatment_effect * group * partition * post` as the sole source of differential group × partition trend. Compatible with `TripleDifference(cluster="unit").fit(..., time="post")` (the cluster kwarg is required because `TripleDifference` is the repeated-cross-section `panel=FALSE` estimator and unclustered SE on panel-generated rows understates variance under within-unit serial correlation; the point estimate `att` is invariant to clustering — see the new `TripleDifference` REGISTRY note on panel-shaped input). Users get panel-realistic unit fixed effects and within-unit serial correlation while the binary 2×2×2 estimator surface is unchanged. **Stratified allocation:** the partition split is drawn stratified-by-group at the requested `partition_frac` so every `(group, partition)` cell receives at least one unit; a targeted `ValueError` is raised at fit-time when the rounded cell counts (`n_units`, `group_frac`, `partition_frac`) would leave any cell empty. This guarantees the 2x2x2 DDD surface is populated for any valid input — independent marginal sampling (the cross-sectional `generate_ddd_data` convention) could collapse cells when marginals are small (e.g., `n_units=4, group_frac=partition_frac=0.25`). 
Validates `1 <= treatment_period < n_periods`, `group_frac` and `partition_frac` strictly in `(0, 1)`, and `n_units >= 4`. Deterministic recovery (`noise_sd=0`) matches `treatment_effect` to ~1e-15 (covered by `tests/test_prep.py::TestGenerateDddPanelData`, 16 tests including infeasible-config rejection and smallest-feasible-config round-trip through `TripleDifference.fit`). `power.simulate_power` is NOT yet auto-routed to the panel DGP for `TripleDifference` (the existing `_ddd_dgp_kwargs` registry entry still ignores `n_periods` and the existing `_check_ddd_dgp_compat` warning still fires on non-default kwargs) — that wiring is tracked as a follow-up in TODO.md.
 - **BaconDecomposition: Goodman-Bacon (2021) methodology audit (PR-B).** Closes the BaconDecomposition row in `METHODOLOGY_REVIEW.md` (status flipped from **In Progress** → **Complete** — initially with an R-parity-goldens caveat that was closed by the parity-goldens bullet above in this same release). Builds on the PR #451 paper review at `docs/methodology/papers/goodman-bacon-2021-review.md`. **Audit outcomes:** (1) Rewrote `_recompute_exact_weights` in `bacon.py` to actually implement Theorem 1 (Eqs. 7-9 + 10e-g) — the prior "exact" implementation was missing the `(1-n_kU)` factor in the subsample variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper; the post-hoc sum-to-1 normalization masked the relative-weight error but produced ~0.3% decomposition error vs TWFE on a 3-cohort + never-treated DGP. The rewrite computes the exact numerators of Eqs. 10e/f/g and lets the post-hoc normalization handle the `V̂^D` denominator (Theorem 1's identity guarantees `V̂^D = Σ numerators`). The TWFE-vs-weighted-sum identity now holds at `atol=1e-10` on both noisy and hand-calculable DGPs. (2) Added always-treated warn+remap per paper footnote 11: units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` (untreated) bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning`. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (event-time encodings) are handled correctly; the `0` sentinel restriction applies only to `first_treat`, not to `time`, and a real treatment cohort with `first_treat == 0` would still be folded into U today (re-label such cohorts to a non-sentinel value before fitting). The user's original `first_treat` column is preserved unchanged. 
The count is surfaced as a new `BaconDecompositionResults.n_always_treated_remapped` dataclass field, rendered in `summary()` output when nonzero. **`n_never_treated` reports TRUE never-treated only**, computed from the original user column before remap — remapped always-treated units appear separately as `n_always_treated_remapped`, no double-counting. (3) New methodology test file `tests/test_methodology_bacon.py` (34 tests across 6 classes post-release; the audit added ~24 tests and the R-parity-goldens bullet above expanded coverage: `TestBaconHandCalculation` hand-checks Eqs. 7-9 + 10b-d on a minimal balanced panel at `atol=1e-10`; `TestBaconParityR` (4 tests, all active post-release once the R parity goldens bullet above landed; skips cleanly with a regenerate-instructions pointer in partial-checkout scenarios where the JSON is unavailable); `TestBaconAlwaysTreatedRemap` regression-tests warn+remap mechanics including user-data-preservation; `TestBaconEdgeCases` exercises no-untreated, single-cohort, unbalanced panel, constant-ATT recovery; `TestBaconWeightModes` locks the new exact-is-default contract; `TestBaconSurveyDesignNarrowing` confirms survey_design composes with exact mode and warn+remap). (4) R `bacondecomp::bacon()` parity generator committed at `benchmarks/R/generate_bacon_golden.R` covering three DGP fixtures (3-groups-with-U, 2-groups-no-U, always-treated-remapped); the JSON goldens deferral at audit time was closed in this same release by the parity-goldens bullet above. (5) `docs/methodology/REGISTRY.md` `## BaconDecomposition` block replaced with the paper-review-sourced entry plus three new sub-notes: weight modes (exact vs approximate), always-treated remap, R parity status. 
**Explicit removal:** the prior REGISTRY block's "Weights may be negative for later-vs-earlier comparisons" claim was incorrect per Theorem 1 (decomposition weights are strictly positive and sum to 1; negative weights are an estimand-level phenomenon, not estimator-level) and is dropped from the new entry. Closes the BaconDecomposition follow-up tracked at `TODO.md` (the prior row added in PR #451 is replaced by a narrower R-parity-goldens deferral row).
-- **`SpilloverDiD(event_study=True)` — per-event-time × ring decomposition (Butts 2021 Section 5 / Table 2).** Replaces the Wave B `NotImplementedError` gate with the full per-event-time × ring decomposition. Emits per-event-time direct effects `tau_k` and per-(ring, event-time) spillover effects `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) and a MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). A TwoStageDiD-compatible `event_study_effects: Dict[int, Dict[str, Any]]` alias (matching `two_stage.py:1355-1389` schema with `conf_int = (low, high)` tuple) is also emitted for consumption by `plot_event_study` (`SpilloverDiDResults` is wired into `_extract_plot_data` and prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic). `DiagnosticReport` integration is NOT wired in this PR — registering `SpilloverDiDResults` in `DiagnosticReport`'s applicability/method tables is queued as a follow-up. **Methodology spec:** the implementation operationalizes Butts Section 5's single `K_it` symbol as TWO event-time clocks — `K_direct = t - effective_first_treat(i)` for ever-treated unit rows, and `K_spill = t - earliest-in-range-cohort-onset(i)` for spillover rows (running min across activated cohorts; NaN for pre-trigger and far-away rows). `K_spill >= 0` structurally; negative-k spillover cells emit rectangularly with `coef = NaN, n_obs = 0`. **Reference period:** `ref_period = -1 - anticipation` (mirrors `TwoStageDiD` at `two_stage.py:486`); when `horizon_max` is set, `ref_period` must fall inside `[-horizon_max, +horizon_max]` or fit raises `ValueError` — silent floor-shift to `-horizon_max` would change identification (rejected per `feedback_no_silent_failures`). The reference row in `att_dynamic` / `event_study_effects` uses `coef = 0.0, se = 0.0, n_obs = 0, conf_int = (0.0, 0.0)` for TwoStageDiD parity. 
**`horizon_max` semantics (divergence from TwoStageDiD):** SpilloverDiD bins event-times outside `[-horizon_max, +horizon_max]` into endpoint pools (no observations dropped); TwoStageDiD filters those rows. The divergence is intentional and cross-documented. With `horizon_max=None`, the helper auto-detects the bin set from observed K values. **Scalar `att` aggregation:** when `event_study=True`, the top-level `att` is the **sample-share-weighted average** of post-treatment `tau_k` (`att = sum_{k >= 0} w_k * tau_k` with `w_k = n_treated_at_k / total`). SE comes from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment block of the stage-2 vcov — no separate fit. **Reduce-to-aggregate equivalence:** under a constant-tau DGP with `horizon_max=None`, the lincom-weighted scalar `att` reproduces Wave B's aggregate `tau_total` bit-identically in the deterministic limit (verified by `TestSpilloverDiDEventStudyReduceToAggregate`). Note: `horizon_max=0` is **not supported** under `event_study=True` (rejected at validation): the single bin `k=0` leaves no event-time pair to anchor the reference period against. Use `event_study=False` for a single aggregate direct effect (Wave B static spec); event-study mode requires `horizon_max>=1` or `horizon_max=None`. **Post-finite_mask sample contract:** `att_dynamic["n_obs"]`, `event_study_effects[k]["n_obs"]`, AND the scalar `att` share weights all reflect the POST-`finite_mask` stage-2 estimation sample (not the pre-mask design). On warn-and-drop fits (baseline-treated units without Omega_0 rows excluded), the reported `n_obs` per cell counts only rows that actually entered `solve_ols`. 
**Fail-closed scalar `att`:** if any post-treatment direct-effect coefficient is NaN (rank-deficient drop by `solve_ols`), the scalar `att` is set to NaN with an explicit warning rather than silently zeroing the dropped column's contribution via `np.nansum` on a fixed weight vector — inspect `att_dynamic` for the per-event-time coefficients and re-aggregate manually if appropriate. **Backward compatibility:** `event_study=False` leaves all Wave C fields (`att_dynamic`, `event_study_effects`, `horizon_max`, `reference_period`) as `None`. The aggregate stage-2 design construction, fit, and extraction logic on this path are byte-identical to Wave B; `TestSpilloverDiDEventStudyBackwardCompat` pins att / se / per-ring goldens captured on the unchanged aggregate path so any future drift fails the regression. **Variance:** same caveat as Wave B — per-event-time SEs use `solve_ols`'s standard variance (HC1 / Conley / cluster paths) WITHOUT the Gardner GMM first-stage uncertainty correction; planned Wave D follow-up closes this. **Tests:** `tests/test_spillover.py` adds 30 new test methods across event-study API, two-clock K helper, horizon binning, design builder, reference period, reduce-to-aggregate, identification MC (50 seeds, per-event-time tau_k recovery within 0.025), placebo pre-trends (Type I rate ≤ 0.30 over 50 seeds at alpha=0.10), singularity (rectangular schema), Conley integration (vcov shape + non-negative diagonal), summary/to_dict/pickle round-trip, event_study_effects schema parity with TwoStageDiD, lincom-att hand-computed, validation (`horizon_max < 0`, `ref_period < -horizon_max`), and fit idempotence. DGP factory `generate_butts_staggered_dgp` extended with `tau_per_event_time` and `delta_per_ring_per_event_time` callable kwargs (backward-compatible — both default to `None`, producing the Wave B scalar DGP bit-identically; verified by `tests/test_dgp_utils.py` with pinned SHA-256 baselines).
-- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_<ring_label>"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). 
Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. 
`[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). 
DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
+- **`SpilloverDiD(event_study=True)` — per-event-time × ring decomposition (Butts 2021 Section 5 / Table 2).** Replaces the Wave B `NotImplementedError` gate with the full per-event-time × ring decomposition. Emits per-event-time direct effects `tau_k` and per-(ring, event-time) spillover effects `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) and a MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). A TwoStageDiD-compatible `event_study_effects: Dict[int, Dict[str, Any]]` alias (matching `two_stage.py:1355-1389` schema with `conf_int = (low, high)` tuple) is also emitted for consumption by `plot_event_study` (`SpilloverDiDResults` is wired into `_extract_plot_data` and prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic). `DiagnosticReport` integration is NOT wired in this PR — registering `SpilloverDiDResults` in `DiagnosticReport`'s applicability/method tables is queued as a follow-up. **Methodology spec:** the implementation operationalizes Butts Section 5's single `K_it` symbol as TWO event-time clocks — `K_direct = t - effective_first_treat(i)` for ever-treated unit rows, and `K_spill = t - earliest-in-range-cohort-onset(i)` for spillover rows (running min across activated cohorts; NaN for pre-trigger and far-away rows). `K_spill >= 0` structurally; negative-k spillover cells emit rectangularly with `coef = NaN, n_obs = 0`. **Reference period:** `ref_period = -1 - anticipation` (mirrors `TwoStageDiD` at `two_stage.py:486`); when `horizon_max` is set, `ref_period` must fall inside `[-horizon_max, +horizon_max]` or fit raises `ValueError` — silent floor-shift to `-horizon_max` would change identification (rejected per `feedback_no_silent_failures`). The reference row in `att_dynamic` / `event_study_effects` uses `coef = 0.0, se = 0.0, n_obs = 0, conf_int = (0.0, 0.0)` for TwoStageDiD parity. 
**`horizon_max` semantics (divergence from TwoStageDiD):** SpilloverDiD bins event-times outside `[-horizon_max, +horizon_max]` into endpoint pools (no observations dropped); TwoStageDiD filters those rows. The divergence is intentional and cross-documented. With `horizon_max=None`, the helper auto-detects the bin set from observed K values. **Scalar `att` aggregation:** when `event_study=True`, the top-level `att` is the **sample-share-weighted average** of post-treatment `tau_k` (`att = sum_{k >= 0} w_k * tau_k` with `w_k = n_treated_at_k / total`). SE comes from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment block of the stage-2 vcov — no separate fit. **Reduce-to-aggregate equivalence:** under a constant-tau DGP with `horizon_max=None`, the lincom-weighted scalar `att` reproduces Wave B's aggregate `tau_total` bit-identically in the deterministic limit (verified by `TestSpilloverDiDEventStudyReduceToAggregate`). Note: `horizon_max=0` is **not supported** under `event_study=True` (rejected at validation): the single bin `k=0` leaves no event-time pair to anchor the reference period against. Use `event_study=False` for a single aggregate direct effect (Wave B static spec); event-study mode requires `horizon_max>=1` or `horizon_max=None`. **Post-finite_mask sample contract:** `att_dynamic["n_obs"]`, `event_study_effects[k]["n_obs"]`, AND the scalar `att` share weights all reflect the POST-`finite_mask` stage-2 estimation sample (not the pre-mask design). On warn-and-drop fits (baseline-treated units without Omega_0 rows excluded), the reported `n_obs` per cell counts only rows that actually entered `solve_ols`. 
**Fail-closed scalar `att`:** if any post-treatment direct-effect coefficient is NaN (rank-deficient drop by `solve_ols`), the scalar `att` is set to NaN with an explicit warning rather than silently zeroing the dropped column's contribution via `np.nansum` on a fixed weight vector — inspect `att_dynamic` for the per-event-time coefficients and re-aggregate manually if appropriate. **Backward compatibility:** `event_study=False` leaves all Wave C fields (`att_dynamic`, `event_study_effects`, `horizon_max`, `reference_period`) as `None`. The aggregate stage-2 design construction, fit, and extraction logic on this path are byte-identical to Wave B; `TestSpilloverDiDEventStudyBackwardCompat` pins att / se / per-ring goldens captured on the unchanged aggregate path so any future drift fails the regression. **Variance:** at original Wave C ship time, per-event-time SEs used `solve_ols`'s standard variance (HC1 / Conley / cluster paths) WITHOUT the Gardner GMM first-stage uncertainty correction. **Superseded by the Wave D Gardner GMM first-stage correction in this same release** (see the Wave D bullet above): per-event-time SEs now apply the IF outer-product correction unconditionally and shift upward by one to a few percent relative to the original Wave C ship-time values. **Tests:** `tests/test_spillover.py` adds 30 new test methods across event-study API, two-clock K helper, horizon binning, design builder, reference period, reduce-to-aggregate, identification MC (50 seeds, per-event-time tau_k recovery within 0.025), placebo pre-trends (Type I rate ≤ 0.30 over 50 seeds at alpha=0.10), singularity (rectangular schema), Conley integration (vcov shape + non-negative diagonal), summary/to_dict/pickle round-trip, event_study_effects schema parity with TwoStageDiD, lincom-att hand-computed, validation (`horizon_max < 0`, `ref_period < -horizon_max`), and fit idempotence. 
DGP factory `generate_butts_staggered_dgp` extended with `tau_per_event_time` and `delta_per_ring_per_event_time` callable kwargs (backward-compatible — both default to `None`, producing the Wave B scalar DGP bit-identically; verified by `tests/test_dgp_utils.py` with pinned SHA-256 baselines).
+- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_<ring_label>"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). 
Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. 
`[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP ship-time):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through) WITHOUT the Gardner GMM first-stage uncertainty correction. **Superseded by the Wave D Gardner GMM first-stage correction in this same release** (see the Wave D bullet above): the GMM correction now applies unconditionally across HC1 / Conley / CR1 (via `cluster=<col>`), shifting SE values upward by 1-few percent relative to the original Wave B ship-time values. **Deferred features (planned follow-ups, as of Wave B ship-time):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Shipped in same release:** `event_study=True` (Wave C bullet above) + Gardner GMM first-stage correction (Wave D bullet above); remaining items still queued. 
**Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
 - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. 
Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422.
 
 ### Changed
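Reviewer note on the scalar `att` aggregation described in the Wave C changelog entry above: a minimal sketch of the share-weighted lincom and its fail-closed NaN behavior, assuming the post-treatment coefficients, their vcov block, and the per-event-time treated counts are already extracted. The helper name `lincom_att` and its arguments are hypothetical and are not part of the library's API.

```python
import numpy as np


def lincom_att(tau, vcov, n_treated):
    """Share-weighted average of post-treatment tau_k with a lincom SE.

    Hypothetical helper for illustration only:
    att = sum_k w_k * tau_k, Var(att) = w' V w, w_k = n_treated_at_k / total.
    """
    tau = np.asarray(tau, dtype=float)
    vcov = np.asarray(vcov, dtype=float)
    n_treated = np.asarray(n_treated, dtype=float)
    w = n_treated / n_treated.sum()
    if np.isnan(tau).any():
        # Fail closed: a dropped (NaN) coefficient makes the aggregate NaN
        # instead of silently contributing zero on a fixed weight vector.
        return float("nan"), float("nan")
    att = float(w @ tau)
    se = float(np.sqrt(w @ vcov @ w))
    return att, se


# Three post-treatment event times with 50 / 30 / 20 treated observations.
att, se = lincom_att(
    tau=[1.0, 2.0, 3.0],
    vcov=np.diag([0.04, 0.09, 0.16]),
    n_treated=[50, 30, 20],
)
# att is approximately 1.7 (0.5*1 + 0.3*2 + 0.2*3)
```

No second regression is fit: both the point estimate and the SE come from the same weight vector applied to the stage-2 output.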
diff --git a/TODO.md b/TODO.md
index b1079ad3..e9304f0f 100644
--- a/TODO.md
+++ b/TODO.md
@@ -127,7 +127,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
 | Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
 | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
-| `SpilloverDiD` Gardner GMM first-stage uncertainty correction at stage 2. Wave B MVP uses standard `solve_ols` variance (HC1 / Conley / cluster) without the influence-function adjustment for stage-1 FE estimation. Extending `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the IF outer-product step gives the full Butts (2021) Section 3.1 + Gardner (2022) Section 4 composition. See plan Risks #2 for the IF formula. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_variance` | follow-up (Wave B) | Medium |
 | `SpilloverDiD(survey_design=...)` integration. Currently raises `NotImplementedError`. Requires threading survey weights through the inline stage 1 + stage 2 and lifting `two_stage.py`'s survey path patterns. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Low |
 | `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
 | `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
diff --git a/diff_diff/conley.py b/diff_diff/conley.py
index 812a8f16..1df54697 100644
--- a/diff_diff/conley.py
+++ b/diff_diff/conley.py
@@ -765,14 +765,12 @@ def _validate_conley_kwargs(
             )
 
 
-def _compute_conley_vcov(
-    X: np.ndarray,
-    residuals: np.ndarray,
+def _compute_conley_meat(
+    scores: np.ndarray,
     coords: np.ndarray,
     cutoff: float,
     metric: ConleyMetric,
     kernel: str,
-    bread_matrix: np.ndarray,
     *,
     time: Optional[np.ndarray] = None,
     unit: Optional[np.ndarray] = None,
@@ -780,21 +778,27 @@ def _compute_conley_vcov(
     cluster_ids: Optional[np.ndarray] = None,
     _conley_sparse: Optional[bool] = None,
 ) -> np.ndarray:
-    """Conley (1999) spatial HAC sandwich variance.
+    """Conley (1999) spatial HAC meat — ``scores' K scores`` for the product kernel.
+
+    Factors the kernel-construction-and-application step out of
+    :func:`_compute_conley_vcov` so a second caller (SpilloverDiD's Wave D
+    Gardner GMM first-stage correction) can reuse the cross-sectional /
+    panel-block / sparse k-d-tree / cluster-product code path with an
+    arbitrary score matrix substituted for the canonical ``X * residuals``.
 
     Two operating modes:
 
     **Cross-sectional** (``time`` / ``unit`` / ``lag_cutoff`` all None):
 
-        Var̂(β) = bread_inv · (Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j') · bread_inv
+        meat = Σ_{i,j} K(d_ij/h) · scores_i scores_j'
 
-    Implemented via ``meat = S' K S`` where ``S = X * residuals[:, None]``.
+    Implemented via ``meat = scores' K scores``.
 
     **Panel block-decomposed** (all three keyword-only args set):
 
-        XeeX_spatial = Σ_t  S_t' · K_space_t · S_t                    (within-period sum)
-        XeeX_serial  = Σ_u  S_u' · K_time_u · S_u   if lag_cutoff > 0  (within-unit sum)
-        Var̂(β) = bread_inv · (XeeX_spatial + XeeX_serial) · bread_inv
+        meat_spatial = Σ_t  scores_t' · K_space_t · scores_t   (within-period sum)
+        meat_serial  = Σ_u  scores_u' · K_time_u · scores_u    if lag_cutoff > 0
+        meat = meat_spatial + meat_serial
 
     The serial Bartlett kernel ``K_time_u[i, j] = 1{|t_i-t_j| <= L, i != j} ·
     (1 - |t_i-t_j|/(L+1))`` is hardcoded regardless of the user-supplied
@@ -815,23 +819,17 @@ def _compute_conley_vcov(
     Inputs are assumed already validated by :func:`_validate_conley_kwargs`;
     the helper only does the math. Caller is responsible for the validator.
 
+    Emits a ``UserWarning`` if the smallest meat eigenvalue is materially
+    negative (< -1e-12), regardless of kernel: neither the uniform kernel
+    nor the radial 1-D Bartlett specialization of Conley (1999) is
+    formally PSD-guaranteed.
+
     Returns
     -------
-    vcov : ndarray of shape (k, k)
-
-    Notes
-    -----
-    Neither the uniform kernel (negative spectral regions, Conley 1999
-    footnote 11) nor the **radial 1-D Bartlett** specialization implemented
-    here is PSD-guaranteed. Conley's explicit PSD formula (Eq 3.14) is the
-    2-D separable product window on a lattice; the radial pairwise form is
-    a practitioner specialization (R ``conleyreg``, Stata ``acreg``, Hsiang
-    2010) that is not formally PSD. We emit a ``UserWarning`` if the smallest
-    meat eigenvalue is materially negative (< -1e-12) regardless of kernel.
+    meat : ndarray of shape (p, p)
     """
     coords_arr = np.asarray(coords, dtype=np.float64)
-    S = X * residuals[:, np.newaxis]
-    n = X.shape[0]
+    n = scores.shape[0]
 
     # Factorize cluster_ids once if supplied, so per-slice mask construction
     # below can use integer comparisons rather than re-factorizing per call.
@@ -877,7 +875,7 @@ def _kernel_fn(u: np.ndarray) -> np.ndarray:
         )
 
     def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
-        """Compute the spatial meat ``S' K S`` for the given subset of rows.
+        """Compute the spatial meat ``scores' K scores`` for the given subset.
 
         ``mask`` may be ``None`` (use all rows) or a boolean array of length n.
         Dispatches to the sparse helper when ``use_sparse`` is True, otherwise
@@ -886,11 +884,11 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
         (per-slice mask, NOT full n×n — saves memory for panel paths).
         """
         if mask is None:
-            S_sub = S
+            scores_sub = scores
             coords_sub = coords_arr
             cluster_sub = cluster_codes
         else:
-            S_sub = S[mask]
+            scores_sub = scores[mask]
             coords_sub = coords_arr[mask]
             cluster_sub = cluster_codes[mask] if cluster_codes is not None else None
         if use_sparse:
@@ -902,7 +900,7 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
             # would use more memory than dense); fall through to dense in
             # that case (the warning is already emitted by the helper).
             sparse_meat = _compute_spatial_bartlett_meat_sparse(
-                S_sub, coords_sub, cutoff, cast(str, metric), cluster_codes=cluster_sub
+                scores_sub, coords_sub, cutoff, cast(str, metric), cluster_codes=cluster_sub
             )
             if sparse_meat is not None:
                 return sparse_meat
@@ -912,7 +910,7 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
         if cluster_sub is not None:
             cluster_mask = cluster_sub[:, None] == cluster_sub[None, :]
             K = K * cluster_mask
-        return S_sub.T @ K @ S_sub
+        return scores_sub.T @ K @ scores_sub
 
     # Suppress spurious BLAS-level "divide by zero / overflow" warnings on
     # macOS Accelerate when K is sparse-ish (most off-diagonals are exactly
@@ -920,11 +918,11 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
     # the warning is a subnormal-handling false-positive in the AVX path.
     # We verify finiteness immediately after.
     if time is None:
-        # Phase 1 cross-sectional path: full n×n spatial sandwich.
+        # Cross-sectional path: full n×n spatial sandwich.
         with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
             meat = _spatial_meat_for_mask(None)
     else:
-        # Phase 2 panel block-decomposed path (matches R conleyreg).
+        # Panel block-decomposed path (matches R conleyreg).
         time_arr = np.asarray(time)
         unit_arr = np.asarray(unit)
         # Normalize time labels to dense panel-period codes (0..T-1) so that
@@ -940,8 +938,8 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
         # `conley_lag_cutoff` is meaningfully a "number of observed panel
         # periods" regardless of label scale.
         _, time_codes = np.unique(time_arr, return_inverse=True)
-        k = X.shape[1]
-        meat = np.zeros((k, k))
+        p = scores.shape[1]
+        meat = np.zeros((p, p))
         # Spatial component: within-period sandwich, summed across periods.
         # _spatial_meat_for_mask dispatches to sparse or dense per the toggle.
         with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
@@ -957,14 +955,14 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
             if L > 0:
                 for u_val in np.unique(unit_arr):
                     mask_u = unit_arr == u_val
-                    S_u = S[mask_u]
+                    scores_u = scores[mask_u]
                     # Use dense panel-period codes (NOT raw labels) for lag math.
                     t_u = time_codes[mask_u].astype(np.float64)
                     lag_mat = np.abs(t_u[:, None] - t_u[None, :])
                     K_u = ((lag_mat <= L) & (lag_mat != 0)).astype(np.float64) * (
                         1.0 - lag_mat / (L + 1.0)
                     )
-                    meat += S_u.T @ K_u @ S_u
+                    meat += scores_u.T @ K_u @ scores_u
     if not np.all(np.isfinite(meat)):
         raise ValueError(
             "Conley meat contains non-finite values; check residuals and "
@@ -991,6 +989,66 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
             stacklevel=3,
         )
 
+    return meat
+
+
+def _compute_conley_vcov(
+    X: np.ndarray,
+    residuals: np.ndarray,
+    coords: np.ndarray,
+    cutoff: float,
+    metric: ConleyMetric,
+    kernel: str,
+    bread_matrix: np.ndarray,
+    *,
+    time: Optional[np.ndarray] = None,
+    unit: Optional[np.ndarray] = None,
+    lag_cutoff: Optional[int] = None,
+    cluster_ids: Optional[np.ndarray] = None,
+    _conley_sparse: Optional[bool] = None,
+) -> np.ndarray:
+    """Conley (1999) spatial HAC sandwich variance.
+
+    Thin wrapper around :func:`_compute_conley_meat`: builds
+    ``scores = X * residuals[:, None]``, delegates the meat construction,
+    then wraps with the supplied bread inverse.
+
+    Two operating modes (both delegated to :func:`_compute_conley_meat`):
+
+    **Cross-sectional** (``time`` / ``unit`` / ``lag_cutoff`` all None):
+
+        Var̂(β) = bread_inv · (Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j') · bread_inv
+
+    **Panel block-decomposed** (all three keyword-only args set):
+
+        XeeX_spatial = Σ_t  scores_t' · K_space_t · scores_t                    (within-period sum)
+        XeeX_serial  = Σ_u  scores_u' · K_time_u · scores_u   if lag_cutoff > 0  (within-unit sum)
+        Var̂(β) = bread_inv · (XeeX_spatial + XeeX_serial) · bread_inv
+
+    See :func:`_compute_conley_meat` for the kernel choice, sparse
+    k-d-tree fallback, cluster-product kernel, and PSD-warning details.
+
+    Inputs are assumed already validated by :func:`_validate_conley_kwargs`;
+    this wrapper only does the math. Caller is responsible for the validator.
+
+    Returns
+    -------
+    vcov : ndarray of shape (k, k)
+    """
+    scores = X * residuals[:, np.newaxis]
+    meat = _compute_conley_meat(
+        scores,
+        coords,
+        cutoff,
+        metric,
+        kernel,
+        time=time,
+        unit=unit,
+        lag_cutoff=lag_cutoff,
+        cluster_ids=cluster_ids,
+        _conley_sparse=_conley_sparse,
+    )
+
     # Sandwich via two solves (mirrors _compute_cr2_bm pattern in linalg.py)
     try:
         temp = np.linalg.solve(bread_matrix, meat)
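Reviewer note on the meat/vcov split in `conley.py`: the sketch below shows the shape of the factoring under simplifying assumptions (dense cross-sectional path only, Euclidean distance, radial Bartlett kernel, bread taken as `X'X`). The function names `conley_meat_dense` and `conley_vcov_dense` are hypothetical stand-ins, not the library's private helpers, and the panel, sparse, and cluster-product paths are omitted.

```python
import numpy as np


def conley_meat_dense(scores, coords, cutoff):
    """meat = scores' K scores with a radial Bartlett kernel (illustrative)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Bartlett weight 1 - d/h inside the cutoff, zero outside.
    K = np.clip(1.0 - dist / cutoff, 0.0, None)
    return scores.T @ K @ scores


def conley_vcov_dense(X, residuals, coords, cutoff):
    """Sandwich wrapper: build the canonical scores, delegate the meat."""
    scores = X * residuals[:, None]
    meat = conley_meat_dense(scores, coords, cutoff)
    bread = X.T @ X
    # Two solves instead of an explicit inverse: bread^{-1} meat bread^{-1}.
    temp = np.linalg.solve(bread, meat)
    return np.linalg.solve(bread, temp.T).T


rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
residuals = rng.normal(size=20)
coords = rng.normal(size=(20, 2))
vcov = conley_vcov_dense(X, residuals, coords, cutoff=1.0)
```

Because the meat function only sees a score matrix, a second caller can pass any per-row scores (for example influence-function rows) without touching the kernel code, which is the reuse the refactor is after.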
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index 4539681e..c19c54cc 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -498,12 +498,13 @@ sp.fit(
 ) -> SpilloverDiDResults
 ```
 
-**Restrictions (Wave B MVP — planned follow-ups):**
+**Restrictions and Wave C/D status:**
 
-- `covariates=` raises `NotImplementedError`. Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates. Planned follow-up.
-- `survey_design=` raises `NotImplementedError` (planned: SurveyDesign integration)
-- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (DiagnosticReport integration: NOT yet wired; queued as a follow-up.) (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. 
**Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs share the same Wave B Gardner-GMM caveat (biased downward by a few percent; Wave D follow-up).
-- Stage-2 variance is `solve_ols` HC1 / Conley / cluster — Gardner GMM first-stage uncertainty correction NOT applied (planned follow-up; SE is biased downward / too small, CIs too narrow, p-values too small — treat reported significance conservatively until the GMM correction lands)
+- `covariates=` raises `NotImplementedError` (planned follow-up). Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates.
+- `survey_design=` raises `NotImplementedError` (planned follow-up — SurveyDesign integration)
+- `vcov_type="classical"` raises `NotImplementedError` (Wave D restriction). Wave D GMM first-stage correction has not been derived for the homoskedastic meat structure `sigma_hat^2 * (X_10' X_10)`. Use `vcov_type="hc1"`, `vcov_type="conley"`, or pair with `cluster=<col>` for CR1 — all three apply the Wave D GMM correction.
+- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (DiagnosticReport integration: NOT yet wired; queued as a follow-up.) (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. 
**Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs include the Wave D Gardner GMM first-stage correction (see next bullet).
+- Stage-2 variance applies the Gardner GMM first-stage uncertainty correction across HC1 / Conley / cluster (Wave D, SHIPPED). The IF outer-product formula `psi_i = gamma_hat' X_{10,i} eps_{10,i} - X_{2,i} eps_{2,i}` is used unconditionally; kernel `K` is path-dependent (identity for HC1, block-indicator for cluster, spatial kernel for Conley). Documented synthesis of Butts (2021) §3.1 + Gardner (2022) §4 + Conley (1999); no reference software combines all three. Point estimates are unchanged from Wave B/C; SE values shift upward by one to a few percent.
 - Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed
 
 **Usage:**
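Reviewer note on the `horizon_max` and reference-period rules in the guide text above: a minimal sketch of endpoint binning and the two validation checks, assuming integer event times. The helper names `bin_event_time` and `reference_period` are hypothetical and only illustrate the documented semantics; the additional event-study rejection of `horizon_max=0` mentioned in the changelog is not modeled here.

```python
import numpy as np


def bin_event_time(k, horizon_max=None):
    """Pool event times outside [-H, +H] into the endpoint bins (no rows dropped)."""
    k = np.asarray(k)
    if horizon_max is None:
        # Auto-detect: keep the observed event times as the bin set.
        return k.copy()
    if horizon_max < 0:
        raise ValueError("horizon_max must be non-negative")
    return np.clip(k, -horizon_max, horizon_max)


def reference_period(anticipation, horizon_max=None):
    """ref_period = -1 - anticipation, rejected if it falls below -horizon_max."""
    ref = -1 - anticipation
    if horizon_max is not None and ref < -horizon_max:
        # Silently floor-shifting the reference would change identification.
        raise ValueError("ref_period < -horizon_max")
    return ref


binned = bin_event_time([-5, -1, 0, 3, 7], horizon_max=2)
ref = reference_period(anticipation=0, horizon_max=2)
```

Clipping rather than filtering is what keeps every observation in the stage-2 sample, which is the stated divergence from TwoStageDiD.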
diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py
index 15a32cd0..541dce5e 100644
--- a/diff_diff/spillover.py
+++ b/diff_diff/spillover.py
@@ -30,6 +30,7 @@
 
 import numpy as np
 import pandas as pd
+from scipy import sparse
 
 from diff_diff.conley import (
     _CONLEY_EARTH_RADIUS_KM,
@@ -39,6 +40,7 @@
 )
 from diff_diff.linalg import solve_ols
 from diff_diff.results import SpilloverDiDResults
+from diff_diff.two_stage import _compute_gmm_corrected_meat
 from diff_diff.utils import safe_inference
 
 # Type alias mirroring diff_diff.conley.ConleyMetric so callers can supply
@@ -1495,6 +1497,89 @@ def _residualize_butts(
     return y_full - mu_per_row - lambda_per_row
 
 
+def _build_butts_fe_design_csr(
+    unit_codes: np.ndarray,
+    time_codes: np.ndarray,
+    omega_0_mask: np.ndarray,
+) -> Tuple[sparse.csr_matrix, sparse.csr_matrix]:
+    """Build sparse FE design matrices for Wave D Gardner GMM correction.
+
+    Column layout: ``[unit_1, ..., unit_{U-1}, time_1, ..., time_{T-1}]``.
+    Drops the first unit dummy AND the first time dummy for identification
+    (mirrors ``TwoStageDiD._build_fe_design`` at ``two_stage.py:2046``).
+
+    Parameters
+    ----------
+    unit_codes : np.ndarray of shape (n,)
+        Integer codes 0..U-1 (from ``pd.factorize``).
+    time_codes : np.ndarray of shape (n,)
+        Integer codes 0..T-1 (from ``pd.factorize``).
+    omega_0_mask : np.ndarray of shape (n,)
+        Boolean mask. ``X_10`` rows where this is False are zeroed out
+        (treated AND exposed rows). ``X_1`` keeps all rows.
+
+    Returns
+    -------
+    X_1 : sparse.csr_matrix, shape (n, (U-1) + (T-1))
+        Full-sample FE design with identification dropping.
+    X_10 : sparse.csr_matrix, shape (n, (U-1) + (T-1))
+        Same column space as ``X_1`` but with ``~omega_0_mask`` rows zeroed.
+        Sharing column space is required for the Gardner cross-moment
+        ``gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)``.
+
+    Notes
+    -----
+    Rank-deficient ``X_10' X_10`` (e.g. warn-and-drop units with no
+    Omega_0 rows) is detected downstream by ``_compute_gmm_corrected_meat``
+    via ``sparse_factorized`` failure → ``np.linalg.lstsq`` fallback with
+    a documented ``UserWarning``.
+
+    **Re-factorization on entry:** when callers pass pre-mask integer
+    codes that have had interior values dropped via ``finite_mask`` (a
+    supported warn-and-drop fit), the input code arrays can be sparse —
+    e.g. ``unit_codes = [0, 1, 3, 4]`` with code 2 dropped. Building
+    ``X_10`` on the raw codes would materialize an all-zero FE column at
+    index 2, forcing ``sparse_factorized`` onto the dense
+    ``lstsq``/``XtX_10.toarray()`` fallback unnecessarily (large-memory
+    path on big panels). To avoid this, re-factorize via
+    :func:`pd.factorize` on entry to compact the code space to
+    ``0..n_unique-1`` (no-op when codes are already contiguous; mirrors
+    the column-space convention of ``TwoStageDiD._build_fe_design``).
+    """
+    # Compact the code space before column construction — see Notes.
+    unit_codes = pd.factorize(unit_codes)[0]
+    time_codes = pd.factorize(time_codes)[0]
+
+    n = unit_codes.shape[0]
+    n_units = int(unit_codes.max()) + 1 if n > 0 else 0
+    n_times = int(time_codes.max()) + 1 if n > 0 else 0
+    n_fe_cols = max(n_units - 1, 0) + max(n_times - 1, 0)
+
+    def _build(mask: Optional[np.ndarray]) -> sparse.csr_matrix:
+        # Unit dummies (drop unit_code == 0 for identification).
+        u_keep = unit_codes > 0
+        if mask is not None:
+            u_keep = u_keep & mask
+        u_rows = np.flatnonzero(u_keep)
+        u_cols = unit_codes[u_keep] - 1
+
+        # Time dummies (drop time_code == 0 for identification).
+        t_keep = time_codes > 0
+        if mask is not None:
+            t_keep = t_keep & mask
+        t_rows = np.flatnonzero(t_keep)
+        t_cols = (max(n_units - 1, 0)) + (time_codes[t_keep] - 1)
+
+        rows = np.concatenate([u_rows, t_rows])
+        cols = np.concatenate([u_cols, t_cols])
+        data = np.ones(len(rows), dtype=np.float64)
+        return sparse.csr_matrix((data, (rows, cols)), shape=(n, n_fe_cols))
+
+    X_1 = _build(mask=None)
+    X_10 = _build(mask=omega_0_mask)
+    return X_1, X_10
+
+
 # =============================================================================
 # Public estimator (skeleton — fit() implemented in Step 3)
 # =============================================================================
@@ -2026,11 +2111,19 @@ def fit(
 
         Notes
         -----
-        Wave B MVP: stage-2 variance is the standard solve_ols estimator
-        (HC1 / Conley / cluster). The Gardner GMM sandwich first-stage
-        uncertainty correction is NOT applied (planned follow-up; see
-        TODO and plan Risks #2). Variance is therefore approximate (likely
-        underestimated by a few percent in typical settings).
+        Stage-2 variance applies the Wave D Gardner (2022) GMM first-stage
+        uncertainty correction across all supported ``vcov_type`` paths
+        (``"hc1"``, ``"conley"``, ``"cluster"`` via ``cluster=<col>``). The
+        unified IF outer-product formula is
+        ``psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}``
+        with ``meat = Psi' K Psi`` where ``K`` is path-dependent (identity
+        for HC1, block-indicator for cluster, spatial kernel for Conley).
+        Documented synthesis of Butts (2021) §3.1 + Gardner (2022) §4 +
+        Conley (1999); no reference software combines all three.
+        ``vcov_type="classical"`` raises ``NotImplementedError`` because
+        the Wave D synthesis has not been derived for the homoskedastic
+        meat structure ``sigma_hat^2 * (X_10' X_10)``; use ``"hc1"`` for
+        heteroskedasticity-robust SE with the GMM correction.
         """
         if survey_design is not None:
             raise NotImplementedError(
@@ -2122,9 +2215,31 @@ def fit(
                 "degrees of freedom for correct p-values and CIs. Routing "
                 "stage 2 through LinearRegression (which supplies the "
                 "per-coefficient DOF metadata) is queued as a follow-up "
-                "extension. Use vcov_type='hc1', 'classical', 'conley', or "
+                "extension. Use vcov_type='hc1' or 'conley', or "
                 "leave default; combine with cluster=<col> for CR1."
             )
+        if self.vcov_type == "classical":
+            # Wave D scope (user-confirmed 2026-05-17): the Gardner GMM
+            # first-stage uncertainty correction is implemented for HC1,
+            # Conley, and CR1 only. The classical (homoskedastic) variance
+            # has not been derived for the IF outer-product form in this
+            # PR — under classical assumptions the meat structure changes
+            # (`sigma_hat^2 * (X_10' X_10)` rather than `Psi' Psi`) and
+            # the Wave D synthesis (Butts §3.1 + Gardner §4 + Conley 1999)
+            # does not carry through directly. Reject upfront with a clear
+            # remediation message rather than silently HC1-ifying the
+            # request (per `feedback_no_silent_failures`).
+            raise NotImplementedError(
+                "SpilloverDiD does not support vcov_type='classical' under "
+                "the Wave D Gardner GMM first-stage uncertainty correction. "
+                "Wave D applies the GMM correction unconditionally and the "
+                "classical homoskedastic variance does not have a derived "
+                "IF outer-product form in the Wave D synthesis (Butts §3.1 "
+                "+ Gardner §4 + Conley 1999). Use vcov_type='hc1' for "
+                "heteroskedasticity-robust SE with the GMM correction, or "
+                "combine with cluster=<col> for CR1 with the GMM correction. "
+                "Future PR may extend Wave D to the classical path."
+            )
 
         # Step 0: defensive copy so the caller's DataFrame is never mutated.
         data = data.copy(deep=False)
@@ -2590,29 +2705,151 @@ def fit(
         if self.event_study and event_study_meta is not None:
             event_study_meta["n_obs_per_col"] = (X_2_fit != 0).sum(axis=0).astype(np.int64)
 
-        # Step 15: stage-2 OLS with configured vcov via solve_ols.
+        # Step 15: stage-2 OLS — coef + residuals only. Wave D computes the
+        # vcov below via the Gardner GMM first-stage uncertainty correction
+        # (documented synthesis of Butts §3.1 + Gardner §4 + Conley 1999).
+        # `solve_ols` returns vcov=None when return_vcov=False.
         solve_kwargs: Dict[str, Any] = {
-            "return_vcov": True,
+            "return_vcov": False,
             "rank_deficient_action": self.rank_deficient_action,
             "column_names": col_names_all,
-            "vcov_type": self.vcov_type,
-            "cluster_ids": cluster_ids_fit,
         }
+        coef, residuals, _ = solve_ols(X_2_fit, y_tilde_fit, **solve_kwargs)  # type: ignore[misc]
+
+        # Wave D: Gardner GMM first-stage uncertainty correction.
+        #
+        # Reconstruct the stage-1 residual `eps_10` on the FULL sample:
+        #   - On Omega_0 rows: eps_10 = y - mu_hat[i] - lambda_hat[t]
+        #   - On ~Omega_0 rows: eps_10 = y (since X_10[i, :] = 0 collapses
+        #     the IF product to just the stage-2 term; matches the Gardner
+        #     formula at `two_stage.py:1633-1637`).
+        # unit_fe_arr / time_fe_arr may have NaN at warn-and-drop units;
+        # the downstream `finite_mask` subset drops those rows BEFORE the
+        # GMM helper builds Psi (NaN in eps_10 is intentionally tolerated
+        # at this stage — it is masked out before any matrix operation).
+        alpha_full = unit_fe_arr[np.asarray(unit_codes_full)]
+        beta_full = time_fe_arr[np.asarray(time_codes_full)]
+        eps_10_full = np.where(omega_0_mask, y_full - alpha_full - beta_full, y_full)
+
+        # Subset stage-1 inputs to the fit sample (post-finite_mask).
+        if n_nan > 0:
+            eps_10_fit = eps_10_full[finite_mask]
+            unit_codes_fit = np.asarray(unit_codes_full)[finite_mask]
+            time_codes_fit = np.asarray(time_codes_full)[finite_mask]
+            omega_0_mask_fit = omega_0_mask[finite_mask]
+        else:
+            eps_10_fit = eps_10_full
+            unit_codes_fit = np.asarray(unit_codes_full)
+            time_codes_fit = np.asarray(time_codes_full)
+            omega_0_mask_fit = omega_0_mask
+
+        # Handle rank-deficient column drops from solve_ols (NaN coefs).
+        # Subset to kept columns before building Psi; re-inflate vcov with
+        # NaN at dropped positions at the end so downstream indexing
+        # (vcov[0, 0] for tau_se, etc.) behaves like the pre-Wave-D path.
+        kept_col_mask = np.isfinite(coef)
+        n_kept = int(kept_col_mask.sum())
+        if n_kept < len(coef):
+            X_2_kept = X_2_fit[:, kept_col_mask]
+            coef_kept = coef[kept_col_mask]
+        else:
+            X_2_kept = X_2_fit
+            coef_kept = coef
+        eps_2_fit = y_tilde_fit - X_2_kept @ coef_kept
+
+        # Build stage-1 FE designs on the fit sample. Column space:
+        # [unit_1, ..., unit_{U-1}, time_1, ..., time_{T-1}] (drop-first
+        # identification, matches `TwoStageDiD._build_fe_design`).
+        X_1_sparse_fit, X_10_sparse_fit = _build_butts_fe_design_csr(
+            unit_codes_fit,
+            time_codes_fit,
+            omega_0_mask_fit,
+        )
+
+        # Conley spatial kwargs only when vcov_type == "conley".
         if self.vcov_type == "conley":
             coord_array_full = np.asarray(data[list(self.conley_coords)].values, dtype=np.float64)
             coord_array_fit = coord_array_full[finite_mask] if n_nan > 0 else coord_array_full
-            solve_kwargs.update(
-                {
-                    "conley_coords": coord_array_fit,
-                    "conley_cutoff_km": self.conley_cutoff_km,
-                    "conley_metric": self.conley_metric,
-                    "conley_time": time_vals_fit,
-                    "conley_unit": unit_vals_fit,
-                    "conley_lag_cutoff": self.conley_lag_cutoff,
-                }
-            )
+            _conley_coords_arg = coord_array_fit
+            _conley_cutoff_arg = self.conley_cutoff_km
+            _conley_metric_arg = self.conley_metric
+            _conley_time_arg = time_vals_fit
+            _conley_unit_arg = unit_vals_fit
+            _conley_lag_arg = self.conley_lag_cutoff
+        else:
+            _conley_coords_arg = None
+            _conley_cutoff_arg = None
+            _conley_metric_arg = None
+            _conley_time_arg = None
+            _conley_unit_arg = None
+            _conley_lag_arg = None
+
+        # Derive the Wave D variance mode from the PUBLIC contract:
+        #   - vcov_type="conley"          → "conley" (Conley spatial-HAC + GMM)
+        #   - cluster=<col> supplied      → "cluster" (CR1 + GMM)
+        #   - vcov_type="hc1" (default)   → "hc1"
+        # `self.vcov_type` is "hc1" or "conley" here ("classical" is rejected
+        # upfront); the public `cluster=<col>` kwarg ORTHOGONALLY selects CR1.
+        # Pre-Wave-D the routing happened inside solve_ols; Wave D bypasses
+        # that path, so the dispatch must be reconstructed here. (Round 1 codex P0 fix:
+        # without this derivation, a user-supplied `cluster=<col>` was
+        # silently ignored on the default hc1 path, yielding HC1 SEs when
+        # CR1 was requested.)
+        if self.vcov_type == "conley":
+            _wave_d_vcov_mode: "Literal['hc1', 'conley', 'cluster']" = "conley"
+        elif cluster_ids_fit is not None:
+            _wave_d_vcov_mode = "cluster"
+        else:
+            _wave_d_vcov_mode = "hc1"
+
+        # Compute the GMM-corrected meat (Psi' K Psi). Caller-side bread
+        # sandwich below mirrors `TwoStageDiD._compute_gmm_variance`
+        # at `two_stage.py:1763-1791`.
+        meat_kept = _compute_gmm_corrected_meat(
+            X_1_sparse=X_1_sparse_fit,
+            X_10_sparse=X_10_sparse_fit,
+            eps_10=eps_10_fit,
+            X_2=X_2_kept,
+            eps_2=eps_2_fit,
+            vcov_type=_wave_d_vcov_mode,
+            cluster_ids=cluster_ids_fit,
+            conley_coords=_conley_coords_arg,
+            conley_cutoff_km=_conley_cutoff_arg,
+            conley_metric=_conley_metric_arg,
+            conley_kernel="bartlett",
+            conley_time=_conley_time_arg,
+            conley_unit=_conley_unit_arg,
+            conley_lag_cutoff=_conley_lag_arg,
+        )
 
-        coef, residuals, vcov = solve_ols(X_2_fit, y_tilde_fit, **solve_kwargs)  # type: ignore[misc]
+        # Bread sandwich: A_22^{-1} = (X_2' X_2)^{-1} via `np.linalg.solve`
+        # with dense lstsq fallback + UserWarning (mirrors the bread-fallback
+        # pattern at `two_stage.py:1763-1788`).
+        A_22_kept = X_2_kept.T @ X_2_kept
+        eye_kept = np.eye(A_22_kept.shape[0])
+        try:
+            bread_kept = np.linalg.solve(A_22_kept, eye_kept)
+        except np.linalg.LinAlgError:
+            warnings.warn(
+                "SpilloverDiD Wave D bread: A_22 = X_2' X_2 is singular; "
+                "falling back to dense lstsq. SE may be unreliable.",
+                UserWarning,
+                stacklevel=2,
+            )
+            bread_kept = np.linalg.lstsq(A_22_kept, eye_kept, rcond=None)[0]
+        vcov_kept = bread_kept @ meat_kept @ bread_kept
+
+        # Re-inflate to (k, k) with NaN at rank-deficient column positions
+        # so downstream code (which indexes vcov[i, i] for per-coef SE) sees
+        # NaN for dropped columns — matches the pre-Wave-D solve_ols
+        # behavior at `linalg.py` (rank-deficient drops produce NaN coefs +
+        # NaN vcov entries).
+        if n_kept < len(coef):
+            vcov = np.full((len(coef), len(coef)), np.nan)
+            kept_idx = np.flatnonzero(kept_col_mask)
+            vcov[np.ix_(kept_idx, kept_idx)] = vcov_kept
+        else:
+            vcov = vcov_kept
 
         # Step 16a: shared df_resid computation.
         n_obs_eff = int(finite_mask.sum())
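Reviewer note on `_build_butts_fe_design_csr`: the re-factorization rationale in its docstring can be checked on a toy panel. The following is a minimal dense sketch, not the PR's implementation. `build_fe_design_dense` is a hypothetical helper, and `np.unique(..., return_inverse=True)` stands in for `pd.factorize` (the code ordering differs, but both compact the codes to `0..n_unique-1`). It assumes unit code 2 was dropped upstream and that two rows lie outside Omega_0.

```python
import numpy as np


def build_fe_design_dense(unit_codes, time_codes, mask=None):
    """Dense drop-first FE design; rows where `mask` is False are zeroed.

    Hypothetical illustration helper, not part of diff_diff.
    """
    # Compact possibly gappy codes (np.unique is a stand-in for pd.factorize).
    _, u = np.unique(np.asarray(unit_codes), return_inverse=True)
    _, t = np.unique(np.asarray(time_codes), return_inverse=True)
    u, t = u.ravel(), t.ravel()

    n = u.shape[0]
    n_units = int(u.max()) + 1 if n > 0 else 0
    n_times = int(t.max()) + 1 if n > 0 else 0
    n_unit_cols = max(n_units - 1, 0)
    X = np.zeros((n, n_unit_cols + max(n_times - 1, 0)))

    keep = np.ones(n, dtype=bool) if mask is None else np.asarray(mask, dtype=bool)

    # Unit dummies: the first unit is the dropped reference category.
    u_rows = np.flatnonzero((u > 0) & keep)
    X[u_rows, u[u_rows] - 1] = 1.0

    # Time dummies: the first period is the dropped reference category.
    t_rows = np.flatnonzero((t > 0) & keep)
    X[t_rows, n_unit_cols + t[t_rows] - 1] = 1.0
    return X


# Unit code 2 is missing, as after a warn-and-drop fit.
unit_codes = np.array([0, 0, 1, 1, 3, 3, 4, 4])
time_codes = np.array([0, 1, 0, 1, 0, 1, 0, 1])
omega_0 = np.array([True, True, True, True, True, False, True, False])

X_1 = build_fe_design_dense(unit_codes, time_codes)
X_10 = build_fe_design_dense(unit_codes, time_codes, mask=omega_0)

print(X_1.shape)                             # columns after compaction
print(np.linalg.matrix_rank(X_10.T @ X_10))  # rank of the stage-1 Gram matrix
```

Without the compaction step, the raw codes would allocate a column for the missing unit code 2. That column would be all zero in both designs, which is the singular `X_10' X_10` case the docstring says the re-factorization avoids.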
diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py
index 9fe3991f..6f29b397 100644
--- a/diff_diff/two_stage.py
+++ b/diff_diff/two_stage.py
@@ -23,18 +23,18 @@
 
 import warnings
 from dataclasses import replace
-from typing import Any, Dict, List, Optional, Tuple
+from typing import Any, Dict, List, Literal, Optional, Tuple
 
 import numpy as np
 import pandas as pd
 from scipy import sparse
 from scipy.sparse.linalg import factorized as sparse_factorized
 
-# Maximum number of elements before falling back to per-column sparse aggregation.
-# 10M float64 elements ≈ 80 MB peak allocation. Above this, per-column .getcol()
-# trades throughput for bounded memory. Keep in sync with two_stage_bootstrap.py.
-_SPARSE_DENSE_THRESHOLD = 10_000_000
-
+from diff_diff.conley import (
+    ConleyMetric,
+    _compute_conley_meat,
+    _validate_conley_kwargs,
+)
 from diff_diff.linalg import solve_ols
 from diff_diff.two_stage_bootstrap import TwoStageDiDBootstrapMixin
 from diff_diff.two_stage_results import (
@@ -43,6 +43,253 @@
 )  # noqa: F401 (re-export)
 from diff_diff.utils import safe_inference, warn_if_not_converged
 
+# Maximum number of elements before falling back to per-column sparse aggregation.
+# 10M float64 elements ≈ 80 MB peak allocation. Above this, per-column .getcol()
+# trades throughput for bounded memory. Keep in sync with two_stage_bootstrap.py.
+_SPARSE_DENSE_THRESHOLD = 10_000_000
+
+# =============================================================================
+# Wave D — Gardner GMM-corrected meat for SpilloverDiD
+# =============================================================================
+
+
+def _compute_gmm_corrected_meat(
+    *,
+    X_1_sparse: sparse.csr_matrix,
+    X_10_sparse: sparse.csr_matrix,
+    eps_10: np.ndarray,
+    X_2: np.ndarray,
+    eps_2: np.ndarray,
+    vcov_type: Literal["hc1", "conley", "cluster"],
+    cluster_ids: Optional[np.ndarray] = None,
+    conley_coords: Optional[np.ndarray] = None,
+    conley_cutoff_km: Optional[float] = None,
+    conley_metric: Optional[ConleyMetric] = None,
+    conley_kernel: str = "bartlett",
+    conley_time: Optional[np.ndarray] = None,
+    conley_unit: Optional[np.ndarray] = None,
+    conley_lag_cutoff: Optional[int] = None,
+) -> np.ndarray:
+    """Gardner (2022) GMM first-stage uncertainty correction — unified IF meat.
+
+    Returns the (p_2, p_2) meat matrix ``Psi' K Psi`` where:
+
+        psi_i  = gamma_hat' @ x_{10,i} * eps_{10,i} - x_{2,i} * eps_{2,i}
+        Psi    = [psi_1; ...; psi_n]                    shape (n, p_2)
+        K      = path-dependent kernel matrix
+        meat   = Psi' @ K @ Psi                         shape (p_2, p_2)
+
+    The caller wraps with the bread ``A_22^{-1} = (X_2' X_2)^{-1}``:
+    ``V = A_22^{-1} @ meat @ A_22^{-1}``.
+
+    **Methodology synthesis** (Wave D): no reference software combines all
+    three ingredients. Butts (2021) §3.1 gives the IF construction for
+    spillover-aware DiD; Gardner (2022) §4 gives the two-stage GMM sandwich;
+    Conley (1999) gives the spatial kernel.
+
+    **Kernel dispatch:**
+
+    - ``vcov_type="hc1"``: ``K = I_n``; ``meat = Psi' @ Psi`` with
+      ``n / (n - p_2)`` finite-sample multiplier.
+    - ``vcov_type="cluster"``: ``K_ij = 1{cluster_i = cluster_j}``;
+      ``meat = S_cluster' @ S_cluster`` where ``S_cluster[g] = sum_{i in g} psi_i``,
+      with ``G/(G-1) * (n-1)/(n - p_2)`` finite-sample multiplier.
+    - ``vcov_type="conley"``: ``K_ij = K_space(d_ij/h) * 1{cluster_i = cluster_j}``
+      (cross-sectional) or panel-block decomposed (``conley_time`` /
+      ``conley_unit`` / ``conley_lag_cutoff`` set). No finite-sample
+      multiplier — preserves the ``conleyreg`` / Wave B convention.
+
+    **`gamma_hat` solve** (mirror of `TwoStageDiD._compute_gmm_variance`
+    pattern at `two_stage.py:1648-1670`): factorize ``X_10' X_10`` via
+    ``sparse_factorized`` (fast path); fall back to dense ``lstsq`` with
+    UserWarning when factorization fails. ``gamma_hat`` has shape
+    ``(p_1, p_2)``.
+
+    **Survey weights:** NOT supported in Wave D (parameter omitted).
+    Wave E will extend with stratified-survey × GMM × Conley methodology.
+
+    Parameters
+    ----------
+    X_1_sparse : sparse.csr_matrix, shape (n, p_1)
+        Full-sample FE design (drop-first-unit + drop-first-time
+        identification).
+    X_10_sparse : sparse.csr_matrix, shape (n, p_1)
+        FE design with treated AND exposed rows zeroed. Same column space
+        as X_1_sparse.
+    eps_10 : np.ndarray, shape (n,)
+        Stage-1 residual on Omega_0 rows; equal to y on ~Omega_0 rows
+        (the X_{10,i} = 0 product collapses the IF contribution to just
+        the stage-2 term).
+    X_2 : np.ndarray, shape (n, p_2)
+        Stage-2 design (treatment + ring columns for SpilloverDiD).
+    eps_2 : np.ndarray, shape (n,)
+        Stage-2 residual ``y_tilde - X_2 @ coef``.
+    vcov_type : {"hc1", "conley", "cluster"}
+        Kernel dispatch.
+    cluster_ids : np.ndarray of shape (n,), optional
+        Cluster identifiers. Required for ``vcov_type="cluster"``;
+        used as the product-kernel cluster mask under ``vcov_type="conley"``
+        when supplied. HC1 path passes ``None``.
+    conley_coords, conley_cutoff_km, conley_metric, conley_kernel,
+    conley_time, conley_unit, conley_lag_cutoff
+        Conley spatial-HAC kwargs. Required when ``vcov_type="conley"``.
+        See :func:`diff_diff.conley._compute_conley_meat` for semantics.
+
+    Returns
+    -------
+    meat : np.ndarray of shape (p_2, p_2)
+        The IF outer-product meat, including any finite-sample multiplier.
+        Caller wraps with the bread for the full vcov.
+    """
+    n, p_2 = X_2.shape
+
+    # Validate Conley kwargs explicitly here. SpilloverDiD's Wave D path
+    # bypasses solve_ols's vcov computation, so _validate_vcov_args /
+    # _validate_conley_kwargs would not otherwise fire on this call.
+    if vcov_type == "conley":
+        _validate_conley_kwargs(
+            conley_coords,
+            conley_cutoff_km,
+            conley_metric,  # type: ignore[arg-type]  # validator raises ValueError if None
+            conley_kernel,
+            n,
+            time=conley_time,
+            unit=conley_unit,
+            lag_cutoff=conley_lag_cutoff,
+            cluster_ids=cluster_ids,
+        )
+
+    # 1. gamma_hat = (X_10' X_10)^{-1} (X_1' X_2). Mirror the existing
+    #    TwoStageDiD method at two_stage.py:1648-1670 — sparse_factorized
+    #    fast path with dense lstsq fallback + UserWarning on singular.
+    XtX_10 = X_10_sparse.T @ X_10_sparse  # (p_1, p_1) sparse
+    Xt1_X2 = X_1_sparse.T @ X_2  # (p_1, p_2) dense
+
+    try:
+        solve_XtX = sparse_factorized(XtX_10.tocsc())
+        if Xt1_X2.ndim == 1:
+            gamma_hat = solve_XtX(Xt1_X2).reshape(-1, 1)
+        else:
+            gamma_hat = np.column_stack([solve_XtX(Xt1_X2[:, j]) for j in range(Xt1_X2.shape[1])])
+    except RuntimeError as exc:
+        warnings.warn(
+            "SpilloverDiD Wave D GMM sandwich: sparse factorization of "
+            f"(X_10' X_10) failed ({type(exc).__name__}); falling back to "
+            "dense lstsq. This may indicate a rank-deficient or "
+            "near-singular Stage 1 design and SE estimates may be less "
+            "reliable.",
+            UserWarning,
+            stacklevel=2,
+        )
+        gamma_hat = np.linalg.lstsq(XtX_10.toarray(), Xt1_X2, rcond=None)[0]
+        if gamma_hat.ndim == 1:
+            gamma_hat = gamma_hat.reshape(-1, 1)
+
+    # 2. Psi = (X_10 @ gamma_hat) * eps_10[:, None] - X_2 * eps_2[:, None].
+    #    sparse @ dense = dense; element-wise scale by eps_10; subtract
+    #    the stage-2 contribution. Shape (n, p_2).
+    Psi_stage1 = X_10_sparse @ gamma_hat  # (n, p_2) dense
+    Psi = Psi_stage1 * eps_10[:, None] - X_2 * eps_2[:, None]
+
+    if not np.all(np.isfinite(Psi)):
+        # Defensive: NaN in Psi would propagate silently through Psi.T @ Psi.
+        # Surface as a warning + return NaN meat so the downstream
+        # safe_inference path NaN-propagates per `feedback_no_silent_failures`.
+        warnings.warn(
+            "SpilloverDiD Wave D GMM sandwich: Psi matrix contains "
+            "non-finite values. Returning NaN meat; downstream inference "
+            "will NaN-propagate. This usually indicates rank-deficient "
+            "stage-1 FE design or non-finite residuals upstream.",
+            UserWarning,
+            stacklevel=2,
+        )
+        return np.full((p_2, p_2), np.nan)
+
+    # 3. Kernel dispatch.
+    if vcov_type == "hc1":
+        # K = I_n: meat = Psi' Psi with HC1 finite-sample multiplier.
+        # Fail closed when n - p_2 <= 0 (saturated design — every degree
+        # of freedom consumed by the stage-2 design): the multiplier
+        # n / (n - p_2) is undefined, so NaN-propagate per
+        # `feedback_no_silent_failures` rather than clamping the
+        # denominator and emitting finite SE on an underdetermined fit.
+        if n - p_2 <= 0:
+            warnings.warn(
+                "SpilloverDiD Wave D HC1 sandwich: saturated stage-2 design "
+                f"(n_obs={n}, effective_rank={p_2}, n-p_2={n - p_2} <= 0). "
+                "The HC1 finite-sample multiplier n/(n-p) is undefined. "
+                "Returning NaN meat so downstream inference NaN-propagates.",
+                UserWarning,
+                stacklevel=2,
+            )
+            return np.full((p_2, p_2), np.nan)
+        meat_unscaled = Psi.T @ Psi
+        multiplier = n / (n - p_2)
+        meat = multiplier * meat_unscaled
+    elif vcov_type == "cluster":
+        if cluster_ids is None:
+            raise ValueError(
+                "_compute_gmm_corrected_meat: vcov_type='cluster' requires "
+                "cluster_ids; got None."
+            )
+        # K_ij = 1{cluster_i = cluster_j}: aggregate Psi per-cluster then
+        # outer-product. S_cluster[g] = sum_{i in g} psi_i.
+        unique_clusters, cluster_indices = np.unique(cluster_ids, return_inverse=True)
+        G = len(unique_clusters)
+        # Mirror linalg.py:1942 — reject G<2 so the CR1 finite-sample
+        # multiplier G/(G-1) doesn't fabricate finite output on a degenerate
+        # one-cluster sample.
+        if G < 2:
+            raise ValueError(f"Need at least 2 clusters for cluster-robust SEs, got {G}")
+        # Fail closed on saturated design (n - p_2 <= 0). The CR1
+        # multiplier (n-1)/(n-p) is undefined; emitting finite SE here
+        # would be silently wrong.
+        if n - p_2 <= 0:
+            warnings.warn(
+                "SpilloverDiD Wave D CR1 sandwich: saturated stage-2 design "
+                f"(n_obs={n}, effective_rank={p_2}, n-p_2={n - p_2} <= 0). "
+                "The CR1 finite-sample multiplier (n-1)/(n-p) is undefined. "
+                "Returning NaN meat so downstream inference NaN-propagates.",
+                UserWarning,
+                stacklevel=2,
+            )
+            return np.full((p_2, p_2), np.nan)
+        S_cluster = np.zeros((G, p_2))
+        for j in range(p_2):
+            np.add.at(S_cluster[:, j], cluster_indices, Psi[:, j])
+        meat_unscaled = S_cluster.T @ S_cluster
+        # CR1 finite-sample multiplier: G/(G-1) * (n-1)/(n-p_2). Standard
+        # cluster-robust convention (Stata, R `sandwich::vcovCL(type="HC1")`).
+        multiplier = (G / (G - 1)) * ((n - 1) / (n - p_2))
+        meat = multiplier * meat_unscaled
+    elif vcov_type == "conley":
+        if conley_coords is None or conley_cutoff_km is None or conley_metric is None:
+            raise ValueError(
+                "_compute_gmm_corrected_meat: vcov_type='conley' requires "
+                "conley_coords, conley_cutoff_km, and conley_metric."
+            )
+        # Delegate to the shared kernel-application helper. No finite-sample
+        # multiplier on the Conley path (matches conleyreg / Wave B convention).
+        meat = _compute_conley_meat(
+            Psi,
+            conley_coords,
+            conley_cutoff_km,
+            conley_metric,
+            conley_kernel,
+            time=conley_time,
+            unit=conley_unit,
+            lag_cutoff=conley_lag_cutoff,
+            cluster_ids=cluster_ids,
+        )
+    else:
+        raise ValueError(
+            f"_compute_gmm_corrected_meat: vcov_type must be one of "
+            f"'hc1', 'conley', 'cluster'; got {vcov_type!r}."
+        )
+
+    return meat
+
+
 # =============================================================================
 # Main Estimator
 # =============================================================================
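Reviewer note on the kernel dispatch in `_compute_gmm_corrected_meat`: the cluster branch never materializes `K`, so it is worth confirming that the per-cluster score aggregation matches the `Psi' K Psi` form stated in the docstring. The sketch below is a minimal check on random data, assuming a toy `Psi` with 12 rows, 2 stage-2 columns, and 4 equal-sized clusters. All names are illustrative and it is not the PR's code path.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p_2 = 12, 2
Psi = rng.normal(size=(n, p_2))          # stand-in influence-function rows
cluster_ids = np.repeat(np.arange(4), 3)  # 4 clusters of 3 rows each

# Explicit block-indicator kernel: K_ij = 1{cluster_i == cluster_j}.
K = (cluster_ids[:, None] == cluster_ids[None, :]).astype(float)
meat_kernel = Psi.T @ K @ Psi

# Aggregated form: sum psi_i within each cluster, then take the outer product.
_, idx = np.unique(cluster_ids, return_inverse=True)
idx = idx.ravel()
G = int(idx.max()) + 1
S_cluster = np.zeros((G, p_2))
np.add.at(S_cluster, idx, Psi)
meat_scores = S_cluster.T @ S_cluster

# CR1 finite-sample multiplier as described in the docstring.
multiplier = (G / (G - 1)) * ((n - 1) / (n - p_2))
meat_cr1 = multiplier * meat_scores

# HC1 is the special case K = I_n, i.e. every row is its own cluster.
meat_hc1 = (n / (n - p_2)) * (Psi.T @ Psi)

print(np.allclose(meat_kernel, meat_scores))
```

The aggregated form costs O(n * p_2) memory instead of the O(n^2) needed for an explicit `K`, which is presumably why the helper uses `np.add.at` on the cluster path.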
diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst
index dfb8e163..89f2a7da 100644
--- a/docs/api/spillover.rst
+++ b/docs/api/spillover.rst
@@ -160,30 +160,33 @@ Estimator Comparison
      - ``D=0`` (untreated)
      - N/A (single stage)
    * - Conley spatial-HAC SE
-     - Yes (via solve_ols at stage 2)
+     - Yes (Wave D GMM-corrected sandwich)
      - Not yet supported
      - Yes
    * - Cluster-robust SE
-     - Yes (HC1 + CR1 via solve_ols)
+     - Yes (HC1 + CR1, Wave D GMM-corrected sandwich)
      - Yes (GMM sandwich + clusters)
      - Yes
 
-Wave B MVP limitations
-----------------------
-
-The current implementation has the following documented limitations,
-planned as follow-up enhancements:
-
-- **Gardner GMM first-stage correction at stage 2** — stage-2 variance
-  is the standard ``solve_ols`` HC1 / Conley / cluster estimator without
-  the influence-function adjustment for stage-1 FE estimation
-  uncertainty. The full GMM sandwich (Butts & Gardner 2022) is planned
-  as a follow-up; until then, reported SEs are biased downward
-  (underestimated by a few percent in typical settings) because they
-  omit the additional variance contribution from estimating the stage-1
-  fixed effects. Confidence intervals are correspondingly too narrow
-  and p-values too small. Users should treat reported significance
-  conservatively until the GMM correction lands.
+Restrictions and follow-ups
+---------------------------
+
+The current implementation has the following documented restrictions
+and planned follow-up enhancements:
+
+- **Gardner GMM first-stage correction at stage 2** — SHIPPED in Wave D.
+  Stage-2 variance now applies the influence-function-based correction
+  for stage-1 FE estimation uncertainty across all three ``vcov_type``
+  paths (HC1, Conley, cluster) on both ``event_study=False`` AND
+  ``event_study=True``. The IF formula is
+  ``psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}``
+  with ``gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)``; the meat is
+  ``Psi' K Psi`` where ``K`` is the path-dependent kernel matrix
+  (identity for HC1, block-indicator for cluster, spatial kernel for
+  Conley). Documented synthesis of Butts (2021) Section 3.1 + Gardner
+  (2022) Section 4 + Conley (1999); no reference software combines all
+  three ingredients. Point estimates are unchanged; SE values shift upward
+  by roughly one to a few percent, depending on first-stage residual variance.
 - **Event-study mode** — ``event_study=True`` is SHIPPED in Wave C.
   The per-event-time × ring decomposition (Butts Section 5 / Table 2)
   emits per-event-time direct effects ``tau_k`` and per-(ring,
@@ -204,8 +207,9 @@ planned as follow-up enhancements:
   ``event_study=False`` instead). Scalar ``att`` becomes a
   sample-share-weighted average of post-treatment ``tau_k`` with SE
   from linear-combination inference on the post-treatment vcov block.
-  Per-event-time SEs share the same Wave B Gardner-GMM caveat
-  (biased downward by a few percent; Wave D follow-up will close).
+  Per-event-time SEs apply the Wave D Gardner GMM first-stage
+  uncertainty correction (see the "Gardner GMM first-stage correction"
+  entry above).
 - **Survey-design integration** — ``survey_design=`` raises
   ``NotImplementedError``.
 - **Count-of-treated-in-ring** — only the "nearest-treated ring"
@@ -220,6 +224,15 @@ planned as follow-up enhancements:
   per-coefficient BM / CR2 DOF and raise ``NotImplementedError``.
   Routing stage 2 through ``LinearRegression`` (which supplies the
   per-coefficient DOF metadata) is queued.
+- **vcov_type="classical" (Wave D restriction)** — raises
+  ``NotImplementedError``. The Wave D Gardner GMM first-stage
+  uncertainty correction has not been derived for the classical
+  homoskedastic variance (different meat structure
+  ``sigma_hat^2 * (X_10' X_10)`` vs the Wave D IF outer product
+  ``Psi' Psi``). Use ``vcov_type="hc1"`` for heteroskedasticity-robust
+  SE with the GMM correction, or combine with ``cluster=<col>`` for
+  CR1 with the GMM correction; both apply the Wave D synthesis
+  (Butts §3.1 + Gardner §4 + Conley 1999) unconditionally.
 - **Balanced panel required** — every unit must observe every period.
   An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected
   FE components and unidentified stage-1 residuals on treated rows.
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 478716b8..b5633576 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -3034,7 +3034,7 @@ The standard error comes from **linear-combination inference** on the post-treat
 
 **Reduce-to-aggregate equivalence:** Under a **constant-tau DGP** with `horizon_max=None`, the sample-share-weighted scalar `att` reproduces Wave B's aggregate `tau_total` (bit-identical at machine precision in the deterministic limit; small MC noise on realized panels). This is the canonical equivalence path. Note: `horizon_max=0` is **not supported** under `event_study=True` (rejected at validation with a clear remediation message): the single bin `k=0` leaves no event-time pair to anchor the reference period `ref_period = -1 - anticipation` against. Users wanting a single aggregate direct effect should use `event_study=False` (the Wave B static spec); event-study mode requires `horizon_max>=1` or `horizon_max=None`.
 
-**Variance:** Same caveat as Wave B — per-event-time SEs use the standard `solve_ols` estimator (HC1 / Conley / cluster paths) WITHOUT the Gardner GMM first-stage uncertainty correction. Per-`tau_k` and per-`delta_jk` SEs are biased downward by the same few percent. The Wave D follow-up will close this.
+**Variance:** Per-event-time SEs apply the Wave D Gardner GMM first-stage uncertainty correction (see "Variance (Wave D)" subsection below). Per-`tau_k` and per-`delta_jk` SEs are shifted upward by a few percent relative to Wave C uncorrected SEs.
 
 **Assumptions (Butts 2021):**
 
@@ -3045,9 +3045,31 @@ The standard error comes from **linear-combination inference** on the post-treat
 - **Assumption 7 (Spillover Effect Parallel Trends)** — counterfactual trends do not depend on `(D_i, S_i)` for `S_i ∈ {0, 1}`. Required to identify `gamma_0` / `delta_j`.
 - **Assumption 8 (Parallel Counterfactual Trends, Staggered)** — additive unit + time FE structure on untreated/unexposed potential outcomes. Stronger than Assumption 3.
 
-**Variance (Wave B MVP — documented limitation):**
+**Variance (Wave D — Gardner GMM first-stage correction across HC1 / Conley / cluster):**
 
-The stage-2 variance is the standard `solve_ols` estimator (HC1 / Conley / cluster paths, all dispatched via `vcov_type`). The **Gardner GMM sandwich first-stage uncertainty correction is NOT applied** at stage 2 in this PR. The full GMM + Conley composition is queued as a follow-up enhancement that extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step (see Wave B plan Risks #2 for the IF formula). The reported SE is therefore **biased downward** (underestimated by a few percent in typical settings) because it omits the additional variance contribution from estimating the stage-1 FE; confidence intervals are correspondingly too narrow and p-values too small. Treat reported significance conservatively until the GMM correction lands.
+Stage-2 variance applies the Gardner (2022) GMM sandwich influence-function correction for stage-1 FE estimation uncertainty across all three `vcov_type` paths. The unified IF outer-product formula:
+
+```
+psi_i  = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}     # shape (p_2,)
+Psi    = [psi_1; ...; psi_n]                                          # (n, p_2)
+gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)                              # (p_1, p_2)
+meat   = Psi' @ K @ Psi                                               # (p_2, p_2)
+vcov   = (X_2' X_2)^{-1} @ meat @ (X_2' X_2)^{-1}
+```
+
+where the kernel `K` is path-dependent:
+
+- **HC1**: `K = I_n` → `meat = Psi' Psi` with `n / (n - p_2)` finite-sample multiplier.
+- **Cluster CR1**: `K_ij = 1{cluster_i = cluster_j}` → per-cluster sum + outer product, with `G / (G-1) * (n-1) / (n - p_2)` finite-sample multiplier.
+- **Conley**: `K_ij = kernel(d_ij / cutoff) * 1{cluster_i = cluster_j}` (cross-sectional) or panel-block decomposed (`conley_time` / `conley_unit` / `conley_lag_cutoff` set). No finite-sample multiplier — matches `conleyreg` convention.
+
+The correction applies unconditionally (no opt-out kwarg). Point estimates (`tau_total`, `delta_j`, event-study `tau_k` / `delta_jk`) are byte-identical to the pre-Wave-D path; SE values typically shift upward by a few percent (3.8% to 16.5% on the seed-42 aggregate test fixture).
+
+- **Note (documented synthesis):** no R / Stata software combines all three ingredients (Butts (2021) §3.1 IF construction for spillover-aware DiD + Gardner (2022) §4 two-stage GMM sandwich + Conley (1999) spatial kernel). `did2s` (Gardner) implements GMM with HC1/cluster but no Conley. `conleyreg` / `acreg` implement Conley but no two-stage correction. Wave D is the documented synthesis.
+- **Note (no finite-sample multiplier on Conley path):** preserves the `conleyreg` / Wave B convention. HC1 and cluster paths apply the standard `n / (n - p_2)` and `G / (G-1) * (n-1) / (n - p_2)` multipliers respectively.
+- **Note (Conley meat may be non-PSD):** the radial 1-D Bartlett and uniform kernels are practitioner specializations of Conley 1999 and are not formally PSD-guaranteed; a `UserWarning` fires when the smallest meat eigenvalue is < -1e-12. Applies on both standard-sandwich and GMM-corrected sandwich paths.
+
+**Implementation:** new module-level helper `_compute_gmm_corrected_meat` at `diff_diff/two_stage.py` (NOT a modification of the existing `_compute_gmm_variance` method — TwoStageDiD's path is unchanged); new helper `_build_butts_fe_design_csr` at `diff_diff/spillover.py`; `_compute_conley_meat` factored out of `_compute_conley_vcov` at `diff_diff/conley.py` so the same kernel-application code path handles both standard sandwich (`X * residuals`) and Wave D IF outer product (`Psi`).
 
 **Edge cases (from paper Section 3.2 / Discussion):**
 
@@ -3067,12 +3089,12 @@ The stage-2 variance is the standard `solve_ols` estimator (HC1 / Conley / clust
 - `survey_design=` raises `NotImplementedError` — planned follow-up.
 - `covariates=` raises `NotImplementedError` — Gardner-style stage-1 residualization not yet wired through; planned follow-up.
 - `ring_method="count"` not exposed — only the nearest-treated-ring specification.
-- `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`} raises `NotImplementedError` — current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF. Use `"hc1"`, `"classical"`, `"conley"`, or pair with `cluster=` for CR1.
+- `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`, `"classical"`} raises `NotImplementedError` — `hc2`/`hc2_bm` because current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF; `classical` because the Wave D Gardner GMM first-stage correction has not been derived for the classical homoskedastic variance (different meat structure `sigma_hat^2 * (X_10' X_10)` vs the Wave D IF outer product `Psi' Psi`). Use `"hc1"` or `"conley"`, or pair with `cluster=` for CR1 — all three apply the Wave D GMM correction.
 - **`rings[0]` must equal 0** — the partition must cover treated locations (`d_it = 0` belongs to Ring 1). Rings starting at a nonzero inner edge would leave units in `0 <= d_it < rings[0]` as exposed-but-unmodeled, silently biasing the estimator. Validator rejects such inputs.
 - **Balanced panel required (Wave B MVP)** — every unit must observe every period. An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected FE components and unidentified stage-1 residuals on treated rows. Exact graph-connectivity-based identification (which would relax this to a strictly weaker condition) is queued as a follow-up extension. Validator rejects unbalanced inputs.
 - **One row per `(unit, time)` cell required** — duplicate cells silently re-weight stage-1 FE estimation AND stage-2 OLS. Validator rejects duplicate cells.
 - Data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight) not exposed.
-- Gardner GMM first-stage correction at stage 2 not applied (HC1/Conley/cluster only; documented limitation).
+- Gardner GMM first-stage correction at stage 2 SHIPPED in Wave D — see "Variance (Wave D)" subsection above. Applies unconditionally across HC1 / Conley / cluster.
 
 **Implementation:** `diff_diff/spillover.py`. Public class `SpilloverDiD`; result class `SpilloverDiDResults(DiDResults)` at `diff_diff/results.py`. Tests at `tests/test_spillover.py`; DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` (satisfy Butts Assumptions 1/3/5/7 by construction).
 
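Illustrative sketch of the "Variance (Wave D)" influence-function meat documented in the REGISTRY.md hunk above. This is a dense NumPy reading of the formula only, not the library's `_compute_gmm_corrected_meat`: the helper name `if_meat` is hypothetical, the production helper takes sparse inputs, and the Conley kernel is omitted. The panel values are copied from the 4-unit × 3-period fixture in `tests/test_spillover.py`.

```python
import numpy as np

# Toy 4-unit x 3-period panel (values copied from the PR's hand-derivation fixture).
y = np.array([1.0, 2.5, 2.6, 1.5, 1.7, 1.9, 0.5, 0.6, 0.85, 2.0, 2.1, 2.2])
D = np.array([0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])  # treated rows
S = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0])  # exposed (ring) rows
omega_0 = (D == 0) & (S == 0)                        # untreated and unexposed
units = np.repeat(np.arange(4), 3)
times = np.tile(np.arange(3), 4)

# Stage-1 FE design: intercept + unit dummies + time dummies (first level dropped).
X_1 = np.column_stack(
    [np.ones(12)]
    + [(units == u).astype(float) for u in (1, 2, 3)]
    + [(times == t).astype(float) for t in (1, 2)]
)
X_10 = X_1 * omega_0[:, None]                        # rows outside omega_0 zeroed

# Stage 1: fit FE on omega_0 only. X_10 rows are zero elsewhere, so eps_10 = y there.
theta = np.linalg.solve(X_10.T @ X_10, X_10.T @ y)
eps_10 = y - X_10 @ theta

# Stage 2: regress the FE-adjusted outcome on treatment and ring indicators.
X_2 = np.column_stack([D.astype(float), ((1 - D) * S).astype(float)])
y_tilde = y - X_1 @ theta
beta, *_ = np.linalg.lstsq(X_2, y_tilde, rcond=None)
eps_2 = y_tilde - X_2 @ beta


def if_meat(X_1, X_10, eps_10, X_2, eps_2, cluster_ids=None):
    """Hypothetical helper: IF outer-product meat for the HC1 or CR1 kernel."""
    n, p_2 = X_2.shape
    gamma_hat = np.linalg.solve(X_10.T @ X_10, X_1.T @ X_2)            # (p_1, p_2)
    Psi = (X_10 * eps_10[:, None]) @ gamma_hat - X_2 * eps_2[:, None]  # (n, p_2)
    if cluster_ids is None:
        # HC1: K = I_n, multiplier n / (n - p_2).
        return n / (n - p_2) * (Psi.T @ Psi)
    # CR1: sum psi_i within each cluster, then take the outer product.
    labels, inverse = np.unique(cluster_ids, return_inverse=True)
    G = labels.size
    sums = np.zeros((G, p_2))
    np.add.at(sums, inverse, Psi)
    return G / (G - 1) * (n - 1) / (n - p_2) * (sums.T @ sums)


meat_hc1 = if_meat(X_1, X_10, eps_10, X_2, eps_2)
meat_cluster = if_meat(X_1, X_10, eps_10, X_2, eps_2, cluster_ids=np.arange(12))
```

With one cluster per row, `G = n`, so the CR1 multiplier `G / (G-1) * (n-1) / (n - p_2)` reduces to the HC1 multiplier `n / (n - p_2)` and the two meats coincide. That is the property `test_cluster_singletons_equals_hc1` relies on.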
diff --git a/tests/test_spillover.py b/tests/test_spillover.py
index 53546743..267ca3eb 100644
--- a/tests/test_spillover.py
+++ b/tests/test_spillover.py
@@ -869,13 +869,14 @@ def test_conley_fit_runs(self):
         assert result.conley_lag_cutoff == 0
         assert np.isfinite(result.se)
 
-    def test_conley_kwargs_threaded_to_solve_ols(self):
-        """Round-8 CI review P3 (test coverage gap): the previous test was a
-        smoke test that only asserted finite SE + ATT invariance — a silent
-        fallback to HC1 would have passed. This test plumbing-verifies that
-        `solve_ols` is actually called with `vcov_type="conley"` AND the
-        Conley-specific kwargs (`conley_coords`, `conley_cutoff_km`,
-        `conley_metric`, `conley_time`, `conley_unit`, `conley_lag_cutoff`).
+    def test_conley_kwargs_threaded_to_gmm_helper(self):
+        """PR #456 R8 plumbing test, updated for Wave D: verifies that Conley
+        kwargs flow to ``_compute_gmm_corrected_meat`` (the Wave D entry
+        point) rather than ``solve_ols``'s vcov path. Pre-Wave-D this test
+        patched ``solve_ols`` directly; Wave D bypasses solve_ols's vcov
+        computation in favor of the GMM-corrected sandwich, so the spy now
+        wraps the GMM helper. The test's purpose — proving no silent HC1
+        fallback — is preserved.
         """
         from unittest.mock import patch
 
@@ -890,28 +891,25 @@ def test_conley_kwargs_threaded_to_solve_ols(self):
             conley_metric="haversine",
             conley_lag_cutoff=0,
         )
-        # Patch solve_ols at the import site in spillover.py so we can
-        # observe the kwargs SpilloverDiD passes through at stage 2.
+
         import diff_diff.spillover as spillover_mod
 
         captured: dict = {}
 
-        original_solve_ols = spillover_mod.solve_ols
+        original_helper = spillover_mod._compute_gmm_corrected_meat
 
-        def spy_solve_ols(*args, **kwargs):
-            # Capture the LAST call's kwargs (stage 2 is the last solve_ols
-            # invocation in fit()).
+        def spy_helper(*args, **kwargs):
             captured.clear()
             captured.update(kwargs)
-            return original_solve_ols(*args, **kwargs)
+            return original_helper(*args, **kwargs)
 
-        with patch.object(spillover_mod, "solve_ols", side_effect=spy_solve_ols):
+        with patch.object(spillover_mod, "_compute_gmm_corrected_meat", side_effect=spy_helper):
             result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
 
-        # Conley kwargs reached solve_ols (no silent HC1 fallback).
+        # Conley kwargs reached the GMM helper (no silent HC1 fallback).
         assert (
             captured.get("vcov_type") == "conley"
-        ), f"expected solve_ols vcov_type='conley', got {captured.get('vcov_type')!r}"
+        ), f"expected vcov_type='conley', got {captured.get('vcov_type')!r}"
         assert captured.get("conley_cutoff_km") == 200.0
         assert captured.get("conley_metric") == "haversine"
         assert captured.get("conley_lag_cutoff") == 0
@@ -923,8 +921,8 @@ def spy_solve_ols(*args, **kwargs):
         conley_unit = captured.get("conley_unit")
         assert conley_time is not None and len(conley_time) == result.n_obs
         assert conley_unit is not None and len(conley_unit) == result.n_obs
-        # And the reported SE is finite (the actual Conley computation
-        # completed end-to-end).
+        # And the reported SE is finite (the actual GMM-corrected Conley
+        # computation completed end-to-end).
         assert np.isfinite(result.se)
 
     def test_conley_att_invariant_vs_hc1(self):
@@ -3484,27 +3482,35 @@ def test_none_anticipation_raises_targeted_value_error(self):
 
 
 class TestSpilloverDiDEventStudyBackwardCompat:
-    """event_study=False reproduces the unchanged Wave B aggregate path.
-
-    The golden values below were captured against the current (Wave C)
-    `event_study=False` path on `generate_butts_nonstaggered_dgp(seed=42)`.
-    Wave C does not modify the aggregate stage-2 design construction
-    (``spillover.py`` lines around the ``else`` branch at the `event_study`
-    dispatch), the stage-2 fit, or the aggregate extraction logic — those
-    lines are byte-identical to Wave B in this PR. The PIN therefore anchors
-    the unchanged aggregate path against accidental drift, but it is not a
-    literal "pre-Wave-C" checkout artifact. Any future change to the
-    aggregate path must update both these goldens and the CHANGELOG
-    aggregate-path bit-identity claim simultaneously.
+    """event_study=False reproduces the aggregate path; SEs reflect Wave D.
+
+    The COEF golden values are byte-identical to the Wave B/C pin (Wave D
+    changes only the variance estimator; point estimates are unchanged).
+    The SE golden values are re-pinned for Wave D: the Gardner GMM
+    first-stage uncertainty correction raises SEs by 3.8% to 16.5% on this
+    fixture relative to Wave B/C, closing the "biased downward" caveat.
+
+    Pre-Wave-D references (commented for the directional-inflation invariant):
+      ATT       : -0.08620379515400438     (unchanged)
+      SE        :  0.017812406263278957    → Wave D 0.01849079486245095  (+3.8%)
+      inner SE  :  0.008298917907045593    → Wave D 0.009669525127172741 (+16.5%)
+      outer SE  :  0.015538307675860204    → Wave D 0.016311550606451834 (+5.0%)
     """
 
-    # PR #456 R3 golden capture (event_study=False on the seed-42 fixture).
-    _WAVE_B_GOLDEN_ATT = -0.08620379515400438
-    _WAVE_B_GOLDEN_SE = 0.017812406263278957
-    _WAVE_B_GOLDEN_RING_INNER_COEF = -0.0371780776943839
-    _WAVE_B_GOLDEN_RING_INNER_SE = 0.008298917907045593
-    _WAVE_B_GOLDEN_RING_OUTER_COEF = -0.009441319618178406
-    _WAVE_B_GOLDEN_RING_OUTER_SE = 0.015538307675860204
+    # Wave D golden capture (event_study=False on the seed-42 fixture, with
+    # GMM first-stage correction applied on the HC1 path).
+    _WAVE_D_GOLDEN_ATT = -0.08620379515400438
+    _WAVE_D_GOLDEN_SE = 0.01849079486245095
+    _WAVE_D_GOLDEN_RING_INNER_COEF = -0.0371780776943839
+    _WAVE_D_GOLDEN_RING_INNER_SE = 0.009669525127172741
+    _WAVE_D_GOLDEN_RING_OUTER_COEF = -0.009441319618178406
+    _WAVE_D_GOLDEN_RING_OUTER_SE = 0.016311550606451834
+
+    # Pre-Wave-D (uncorrected) SE references — used by the directional
+    # inflation invariant to prove the correction moved SE upward.
+    _WAVE_B_UNCORRECTED_SE = 0.017812406263278957
+    _WAVE_B_UNCORRECTED_INNER_SE = 0.008298917907045593
+    _WAVE_B_UNCORRECTED_OUTER_SE = 0.015538307675860204
 
     def test_event_study_false_matches_wave_b_golden(self):
         """Pre-Wave-C golden parity (not just determinism): pin att/se on a
@@ -3536,40 +3542,40 @@ def test_event_study_false_matches_wave_b_golden(self):
         # Linux py3.14 drifts ~1 ULP from macOS Accelerate captures).
         np.testing.assert_allclose(
             res.att,
-            self._WAVE_B_GOLDEN_ATT,
+            self._WAVE_D_GOLDEN_ATT,
             rtol=1e-14,
             atol=1e-14,
-            err_msg=f"event_study=False att drift: got {res.att!r}, expected {self._WAVE_B_GOLDEN_ATT!r}",
+            err_msg=f"event_study=False att drift: got {res.att!r}, expected {self._WAVE_D_GOLDEN_ATT!r}",
         )
         np.testing.assert_allclose(
             res.se,
-            self._WAVE_B_GOLDEN_SE,
+            self._WAVE_D_GOLDEN_SE,
             rtol=1e-14,
             atol=1e-14,
-            err_msg=f"event_study=False se drift: got {res.se!r}, expected {self._WAVE_B_GOLDEN_SE!r}",
+            err_msg=f"event_study=False se drift: got {res.se!r}, expected {self._WAVE_D_GOLDEN_SE!r}",
         )
         # Per-ring entries must also match.
         inner = res.spillover_effects.loc["[0, 50)"]
         np.testing.assert_allclose(
             inner["coef"],
-            self._WAVE_B_GOLDEN_RING_INNER_COEF,
+            self._WAVE_D_GOLDEN_RING_INNER_COEF,
             rtol=1e-14,
             atol=1e-14,
-            err_msg=f"inner ring coef drift: got {inner['coef']!r}, expected {self._WAVE_B_GOLDEN_RING_INNER_COEF!r}",
+            err_msg=f"inner ring coef drift: got {inner['coef']!r}, expected {self._WAVE_D_GOLDEN_RING_INNER_COEF!r}",
         )
         np.testing.assert_allclose(
             inner["se"],
-            self._WAVE_B_GOLDEN_RING_INNER_SE,
+            self._WAVE_D_GOLDEN_RING_INNER_SE,
             rtol=1e-14,
             atol=1e-14,
-            err_msg=f"inner ring se drift: got {inner['se']!r}, expected {self._WAVE_B_GOLDEN_RING_INNER_SE!r}",
+            err_msg=f"inner ring se drift: got {inner['se']!r}, expected {self._WAVE_D_GOLDEN_RING_INNER_SE!r}",
         )
         outer = res.spillover_effects.loc["[50, 200]"]
         np.testing.assert_allclose(
-            outer["coef"], self._WAVE_B_GOLDEN_RING_OUTER_COEF, rtol=1e-14, atol=1e-14
+            outer["coef"], self._WAVE_D_GOLDEN_RING_OUTER_COEF, rtol=1e-14, atol=1e-14
         )
         np.testing.assert_allclose(
-            outer["se"], self._WAVE_B_GOLDEN_RING_OUTER_SE, rtol=1e-14, atol=1e-14
+            outer["se"], self._WAVE_D_GOLDEN_RING_OUTER_SE, rtol=1e-14, atol=1e-14
         )
 
     def test_event_study_false_bit_identical_to_wave_b_fixture(self):
@@ -3596,6 +3602,44 @@ def test_event_study_false_bit_identical_to_wave_b_fixture(self):
         assert res_a.att == res_b.att
         assert res_a.se == res_b.se
 
+    def test_wave_d_se_inflates_relative_to_wave_b_uncorrected(self):
+        """Wave D directional invariant: GMM-corrected SE > uncorrected SE.
+
+        Locks the methodological direction of the Wave D correction:
+        accounting for first-stage FE estimation uncertainty inflates SE
+        upward. The pre-Wave-D SE references (captured on the bit-identical
+        point estimate) are pinned as commented references in the class
+        docstring above; this test asserts the inequality holds at every
+        coefficient surface (top-level att, inner ring, outer ring).
+        """
+        df = generate_butts_nonstaggered_dgp(seed=42)
+        est = SpilloverDiD(
+            rings=[0.0, 50.0, 200.0],
+            d_bar=200.0,
+            conley_coords=("lat", "lon"),
+            event_study=False,
+        )
+        import warnings as _w
+
+        with _w.catch_warnings():
+            _w.simplefilter("ignore", UserWarning)
+            res = est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+
+        assert res.se > self._WAVE_B_UNCORRECTED_SE, (
+            f"Wave D top-level SE {res.se!r} should exceed pre-Wave-D "
+            f"uncorrected SE {self._WAVE_B_UNCORRECTED_SE!r}"
+        )
+        inner_se = float(res.spillover_effects.loc["[0, 50)"]["se"])
+        outer_se = float(res.spillover_effects.loc["[50, 200]"]["se"])
+        assert inner_se > self._WAVE_B_UNCORRECTED_INNER_SE, (
+            f"Wave D inner ring SE {inner_se!r} should exceed pre-Wave-D "
+            f"uncorrected SE {self._WAVE_B_UNCORRECTED_INNER_SE!r}"
+        )
+        assert outer_se > self._WAVE_B_UNCORRECTED_OUTER_SE, (
+            f"Wave D outer ring SE {outer_se!r} should exceed pre-Wave-D "
+            f"uncorrected SE {self._WAVE_B_UNCORRECTED_OUTER_SE!r}"
+        )
+
 
 class TestSpilloverDiDEventStudyIdentification:
     """100-seed MC verifies per-event-time tau_k recovery on a known DGP."""
@@ -4058,3 +4102,425 @@ def test_plot_event_study_uses_explicit_reference_period(self):
             f"plot_event_study picked reference_period={ref_period}, "
             f"expected {res.reference_period} from explicit attribute"
         )
+
+
+# =============================================================================
+# Wave D — Gardner GMM first-stage uncertainty correction tests
+# =============================================================================
+
+
+class TestSpilloverDiDWaveDGmmCorrectedHc1Hand:
+    """Hand-derived Psi values on a 4-unit × 3-period over-identified panel.
+
+    The pre-flight hand-derivation worksheet (Phase 1 of the Wave D plan)
+    fixed the expected `Psi` matrix at numpy float64 precision. This test
+    pins those expected values against the runtime helper output so the IF
+    formula `psi_i = gamma_hat' x_{10,i} eps_{10,i} - x_{2,i} eps_{2,i}`
+    is locked at machine precision. P0: any drift here invalidates every
+    downstream Wave D SE.
+    """
+
+    def test_psi_matches_hand_derivation(self):
+        """4-unit × 3-period over-identified fixture → Psi closed-form match."""
+        from scipy import sparse
+
+        from diff_diff.two_stage import _compute_gmm_corrected_meat
+
+        # Fixture (matches /tmp/wave_d_phase1_handderivation.py).
+        y = np.array([1.0, 2.5, 2.6, 1.5, 1.7, 1.9, 0.5, 0.6, 0.85, 2.0, 2.1, 2.2])
+        D = np.array([0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
+        S = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0])
+        omega_0 = (D == 0) & (S == 0)
+        units = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3)
+        times = np.tile(np.array([0, 1, 2]), 4)
+
+        # Stage-1 FE design with drop-first-unit + drop-first-time.
+        mu_B = (units == "B").astype(float)
+        mu_C = (units == "C").astype(float)
+        mu_D = (units == "D").astype(float)
+        lam_1 = (times == 1).astype(float)
+        lam_2 = (times == 2).astype(float)
+        X_1 = np.column_stack([np.ones(12), mu_B, mu_C, mu_D, lam_1, lam_2])
+        X_10 = X_1.copy()
+        X_10[~omega_0] = 0.0
+
+        # Stage-1 solve + eps_10 reconstruction.
+        theta = np.linalg.solve(X_10.T @ X_10, X_10.T @ y)
+        eps_10 = np.empty(12)
+        eps_10[omega_0] = y[omega_0] - (X_10 @ theta)[omega_0]
+        eps_10[~omega_0] = y[~omega_0]
+
+        # Stage-2 design + residual.
+        Ring = np.zeros(12)
+        Ring[4] = 1.0
+        Ring[5] = 1.0
+        X_2 = np.column_stack([D.astype(float), (1 - D) * Ring])
+        y_tilde = y - X_1 @ theta
+        beta, *_ = np.linalg.lstsq(X_2, y_tilde, rcond=None)
+        eps_2 = y_tilde - X_2 @ beta
+
+        # Call the helper (HC1 path).
+        meat = _compute_gmm_corrected_meat(
+            X_1_sparse=sparse.csr_matrix(X_1),
+            X_10_sparse=sparse.csr_matrix(X_10),
+            eps_10=eps_10,
+            X_2=X_2,
+            eps_2=eps_2,
+            vcov_type="hc1",
+        )
+
+        # Hand-computed HC1 meat (with finite-sample multiplier n/(n-p_2)
+        # = 12/10 = 1.2). The pre-multiplier meat is Psi.T @ Psi which on
+        # this fixture equals:
+        expected_unscaled = np.array([[0.005625, 0.0028125], [0.0028125, 0.003125]])
+        expected = (12 / 10) * expected_unscaled
+        np.testing.assert_allclose(meat, expected, atol=1e-12, rtol=1e-12)
+
+    def test_cluster_singletons_equals_hc1(self):
+        """Cluster-by-row equals HC1 on the same fixture (singleton CR1
+        multiplier `G/(G-1) * (n-1)/(n-p)` collapses to `n/(n-p)` when
+        `G = n`)."""
+        from scipy import sparse
+
+        from diff_diff.two_stage import _compute_gmm_corrected_meat
+
+        y = np.array([1.0, 2.5, 2.6, 1.5, 1.7, 1.9, 0.5, 0.6, 0.85, 2.0, 2.1, 2.2])
+        D = np.array([0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
+        S = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0])
+        omega_0 = (D == 0) & (S == 0)
+        units = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3)
+        times = np.tile(np.array([0, 1, 2]), 4)
+        mu_B = (units == "B").astype(float)
+        mu_C = (units == "C").astype(float)
+        mu_D = (units == "D").astype(float)
+        lam_1 = (times == 1).astype(float)
+        lam_2 = (times == 2).astype(float)
+        X_1 = np.column_stack([np.ones(12), mu_B, mu_C, mu_D, lam_1, lam_2])
+        X_10 = X_1.copy()
+        X_10[~omega_0] = 0.0
+        theta = np.linalg.solve(X_10.T @ X_10, X_10.T @ y)
+        eps_10 = np.empty(12)
+        eps_10[omega_0] = y[omega_0] - (X_10 @ theta)[omega_0]
+        eps_10[~omega_0] = y[~omega_0]
+        Ring = np.zeros(12)
+        Ring[4] = 1.0
+        Ring[5] = 1.0
+        X_2 = np.column_stack([D.astype(float), (1 - D) * Ring])
+        y_tilde = y - X_1 @ theta
+        beta, *_ = np.linalg.lstsq(X_2, y_tilde, rcond=None)
+        eps_2 = y_tilde - X_2 @ beta
+
+        common = dict(
+            X_1_sparse=sparse.csr_matrix(X_1),
+            X_10_sparse=sparse.csr_matrix(X_10),
+            eps_10=eps_10,
+            X_2=X_2,
+            eps_2=eps_2,
+        )
+        meat_hc1 = _compute_gmm_corrected_meat(vcov_type="hc1", **common)
+        meat_cluster = _compute_gmm_corrected_meat(
+            vcov_type="cluster", cluster_ids=np.arange(12), **common
+        )
+        np.testing.assert_allclose(meat_hc1, meat_cluster, atol=1e-14, rtol=1e-14)
+
+
+class TestSpilloverDiDWaveDGmmCorrectedEventStudy:
+    """Wave D applies the GMM correction on the `event_study=True` path."""
+
+    def test_vcov_shape_matches_kept_columns(self):
+        """Scalar att SE is finite and att_dynamic carries at least two
+        finite per-event-time SEs (the Wave D bread sandwich produces a
+        well-formed result on a non-degenerate event-study design)."""
+        df = generate_butts_staggered_dgp(
+            seed=0,
+            tau_per_event_time=lambda k: -0.07 if k >= 0 else 0.0,
+        )
+        est = SpilloverDiD(
+            rings=[0.0, 50.0, 200.0],
+            d_bar=200.0,
+            conley_coords=("lat", "lon"),
+            event_study=True,
+            horizon_max=2,
+        )
+        import warnings as _w
+
+        with _w.catch_warnings():
+            _w.simplefilter("ignore", UserWarning)
+            res = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat")
+
+        # The att_dynamic block has at least one finite SE per post-treatment
+        # event-time (the lincom scalar att SE is finite — the underlying
+        # sub-vcov block must therefore be finite at those positions).
+        assert np.isfinite(res.se), f"scalar att SE should be finite, got {res.se!r}"
+        finite_se_count = res.att_dynamic["se"].apply(np.isfinite).sum()
+        assert finite_se_count >= 2, (
+            f"expected ≥2 finite SE rows in att_dynamic (post-treatment k=0,1,2), "
+            f"got {finite_se_count}"
+        )
+
+    def test_event_study_se_inflates_over_pre_wave_d(self):
+        """Event-study smoke check: the GMM-corrected scalar att SE is
+        positive and finite on the staggered fixture.
+
+        Uses the same DGP as the pre-Wave-D event-study tests. No
+        pre-Wave-D event-study SE is pinned (Wave D landed with this PR),
+        so this test does NOT verify upward inflation; that directional
+        invariant is exercised on the aggregate path in
+        test_wave_d_se_inflates_relative_to_wave_b_uncorrected.
+        """
+        df = generate_butts_staggered_dgp(
+            seed=0,
+            tau_per_event_time=lambda k: -0.07,
+        )
+        est = SpilloverDiD(
+            rings=[0.0, 50.0, 200.0],
+            d_bar=200.0,
+            conley_coords=("lat", "lon"),
+            event_study=True,
+            horizon_max=2,
+        )
+        import warnings as _w
+
+        with _w.catch_warnings():
+            _w.simplefilter("ignore", UserWarning)
+            res = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat")
+
+        # Loose lower-bound check: SE > 0 and finite. The directional
+        # inflation invariant is exercised on the aggregate path in
+        # TestSpilloverDiDEventStudyBackwardCompat::test_wave_d_se_inflates_...
+        assert res.se > 0
+        assert np.isfinite(res.se)
+
+
+class TestSpilloverDiDWaveDGmmCorrectedNanInferenceContract:
+    """Wave D NaN-propagation contract per `feedback_no_silent_failures`."""
+
+    def test_rank_deficient_design_yields_nan_se_not_zero(self):
+        """When solve_ols drops a rank-deficient column, the corresponding
+        vcov diagonal entry is NaN (re-inflation pattern). Downstream
+        per-coefficient SE for that column is NaN — never silently 0.
+        """
+        # Use the existing fail-closed fixture infrastructure: monkeypatch
+        # solve_ols to return a coef vector with a NaN entry.
+        from unittest.mock import patch
+
+        import diff_diff.spillover as spillover_mod
+
+        df = generate_butts_nonstaggered_dgp(seed=0)
+
+        original_solve_ols = spillover_mod.solve_ols
+
+        def coef_nan_solve_ols(X, y, **kwargs):
+            coef, residuals, vcov = original_solve_ols(X, y, **kwargs)
+            # Inject NaN into the LAST coefficient column to simulate a
+            # rank-deficient drop. solve_ols normally sets NaN on coefs it
+            # dropped; we forcibly do so here.
+            coef = coef.copy()
+            coef[-1] = np.nan
+            return coef, residuals, vcov
+
+        with patch.object(spillover_mod, "solve_ols", side_effect=coef_nan_solve_ols):
+            est = SpilloverDiD(
+                rings=[0.0, 50.0, 200.0],
+                d_bar=200.0,
+                conley_coords=("lat", "lon"),
+                event_study=False,
+            )
+            import warnings as _w
+
+            with _w.catch_warnings():
+                _w.simplefilter("ignore", UserWarning)
+                res = est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+
+        # The OUTER ring (last column) was forced rank-deficient; its SE
+        # must be NaN, not 0. The other coefficients should still have
+        # finite SE (the Wave D re-inflation pattern preserves them).
+        outer_se = float(res.spillover_effects.loc["[50, 200]"]["se"])
+        assert np.isnan(outer_se), f"rank-deficient outer ring SE should be NaN, got {outer_se!r}"
+
+
+class TestSpilloverDiDWaveDGmmCorrectedValidatorWiring:
+    """Wave D bypasses solve_ols's Conley vcov path; the Conley validator
+    must still fire from `_compute_gmm_corrected_meat`."""
+
+    def test_conley_without_cutoff_raises(self):
+        """vcov_type='conley' with conley_cutoff_km=None raises ValueError."""
+        df = generate_butts_nonstaggered_dgp(seed=0)
+        est = SpilloverDiD(
+            rings=[0.0, 50.0, 200.0],
+            d_bar=200.0,
+            conley_coords=("lat", "lon"),
+            vcov_type="conley",
+            conley_cutoff_km=None,
+            conley_metric="euclidean",
+            conley_lag_cutoff=0,
+        )
+        import warnings as _w
+
+        with _w.catch_warnings():
+            _w.simplefilter("ignore", UserWarning)
+            with pytest.raises(ValueError, match="conley_cutoff_km"):
+                est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+
+
+class TestSpilloverDiDWaveDGmmCorrectedFitIdempotence:
+    """fit() must not mutate estimator state; clone + repeat-fit produces
+    bit-identical Wave D vcov per `feedback_fit_does_not_mutate_config`."""
+
+    def test_clone_repeat_fit_bit_identical(self):
+        df = generate_butts_nonstaggered_dgp(seed=42)
+        kwargs = dict(
+            rings=[0.0, 50.0, 200.0],
+            d_bar=200.0,
+            conley_coords=("lat", "lon"),
+            event_study=False,
+        )
+        est_a = SpilloverDiD(**kwargs)
+        est_b = SpilloverDiD(**kwargs)
+        import warnings as _w
+
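+        with _w.catch_warnings():
+            _w.simplefilter("ignore", UserWarning)
+            res_a = est_a.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+            res_b = est_b.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+            # Repeat the fit on est_a: if fit() mutated estimator state, the
+            # second fit would diverge from the first.
+            res_a_refit = est_a.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+        # Same-machine determinism: bit-identical att and se across
+        # identically configured estimators and across repeat fits.
+        assert res_a.att == res_b.att
+        assert res_a.se == res_b.se
+        assert res_a.att == res_a_refit.att
+        assert res_a.se == res_a_refit.se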
+        # Per-ring entries also bit-identical.
+        for label in ["[0, 50)", "[50, 200]"]:
+            assert (
+                res_a.spillover_effects.loc[label, "se"] == res_b.spillover_effects.loc[label, "se"]
+            )
+
+
+class TestSpilloverDiDWaveDPublicVarianceContract:
+    """End-to-end fit() coverage for the PUBLIC vcov_type / cluster contract.
+
+    Round-1 codex review caught two regressions where the helper-level
+    tests passed but the public-API contract was broken:
+      P0 — `cluster=<col>` silently routed to HC1 instead of CR1.
+      P1 — `vcov_type="classical"` raised an unhandled error inside
+            `_compute_gmm_corrected_meat` instead of failing fast at
+            validation time.
+
+    This class exercises the public surface to lock the contract.
+    """
+
+    def test_cluster_kwarg_routes_to_cr1_not_hc1(self):
+        """`SpilloverDiD(..., cluster="unit")` MUST produce CR1 SE, not HC1.
+
+        On a fixture with within-cluster correlation, CR1 SE is generically
+        DIFFERENT from HC1 SE — if both fits return the same SE to machine
+        precision, the cluster kwarg was silently ignored (the P0
+        regression that codex Round 1 surfaced).
+        """
+        df = generate_butts_nonstaggered_dgp(seed=42)
+        common = dict(
+            rings=[0.0, 50.0, 200.0],
+            d_bar=200.0,
+            conley_coords=("lat", "lon"),
+            event_study=False,
+        )
+        import warnings as _w
+
+        with _w.catch_warnings():
+            _w.simplefilter("ignore", UserWarning)
+            est_hc1 = SpilloverDiD(**common)  # vcov_type="hc1" default, no cluster
+            res_hc1 = est_hc1.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+            est_cr1 = SpilloverDiD(cluster="unit", **common)
+            res_cr1 = est_cr1.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+
+        # Point estimates match (cluster kwarg only affects variance).
+        assert res_hc1.att == res_cr1.att
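+        # Both SEs must be finite: NaN != NaN is True, so a NaN SE would
+        # otherwise satisfy the inequality below vacuously.
+        assert np.isfinite(res_hc1.se) and np.isfinite(res_cr1.se)
+        # SE values must DIFFER; if equal, the cluster kwarg was a no-op.
+        assert res_hc1.se != res_cr1.se, (
+            f"HC1 SE {res_hc1.se!r} == CR1 SE {res_cr1.se!r}; "
+            "cluster=<col> appears to be silently ignored"
+        )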
+
+    def test_single_cluster_sample_raises(self):
+        """CR1 path on a single-cluster sample raises ValueError per
+        the standard `n_clusters >= 2` rejection (mirrors linalg.py:1942)."""
+        df = generate_butts_nonstaggered_dgp(seed=0)
+        df = df.copy()
+        df["fake_cluster"] = 0  # collapse all rows to a single cluster
+        est = SpilloverDiD(
+            rings=[0.0, 50.0, 200.0],
+            d_bar=200.0,
+            conley_coords=("lat", "lon"),
+            cluster="fake_cluster",
+            event_study=False,
+        )
+        import warnings as _w
+
+        with _w.catch_warnings():
+            _w.simplefilter("ignore", UserWarning)
+            with pytest.raises(ValueError, match="at least 2 clusters"):
+                est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+
+    def test_saturated_design_yields_nan_se_not_finite(self):
+        """`n_obs == p_2` saturated stage-2 design: HC1 multiplier
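+        ``n/(n-p)`` divides by zero. Wave D fails closed by returning a NaN
+        meat (hence NaN SE downstream) rather than clamping the denominator
+        to 1 and emitting a finite SE on an underdetermined fit. Both the
+        `hc1` and `cluster` modes are checked, and each must also emit a
+        saturation warning.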
+        """
+        from scipy import sparse
+
+        from diff_diff.two_stage import _compute_gmm_corrected_meat
+
+        # Construct a saturated synthetic Psi fixture directly through the
+        # helper (avoids manufacturing a real saturated SpilloverDiD panel,
+        # which is constrained by the validator). n_obs == p_2 == 4.
+        n, p_1, p_2 = 4, 3, 4
+        rng = np.random.default_rng(0)
+        X_1 = sparse.csr_matrix(rng.standard_normal((n, p_1)))
+        X_10 = sparse.csr_matrix(rng.standard_normal((n, p_1)))
+        eps_10 = rng.standard_normal(n)
+        X_2 = rng.standard_normal((n, p_2))
+        eps_2 = rng.standard_normal(n)
+
+        import warnings as _w
+
+        for vmode, kwargs in [
+            ("hc1", {}),
+            ("cluster", {"cluster_ids": np.array([0, 0, 1, 1])}),
+        ]:
+            with _w.catch_warnings(record=True) as caught:
+                _w.simplefilter("always")
+                meat = _compute_gmm_corrected_meat(
+                    X_1_sparse=X_1,
+                    X_10_sparse=X_10,
+                    eps_10=eps_10,
+                    X_2=X_2,
+                    eps_2=eps_2,
+                    vcov_type=vmode,
+                    **kwargs,
+                )
+            assert np.all(np.isnan(meat)), (
+                f"vcov_type={vmode!r} saturated design (n=p_2={n}) returned "
+                f"finite meat instead of NaN: {meat!r}"
+            )
+            saturation_warning_fired = any("saturated" in str(w.message) for w in caught)
+            assert saturation_warning_fired, (
+                f"vcov_type={vmode!r} saturated design did not emit the "
+                "expected saturation warning"
+            )
+
+    def test_classical_vcov_raises_with_clear_message(self):
+        """`vcov_type="classical"` raises NotImplementedError upfront with a
+        clear remediation message rather than failing deep inside the GMM
+        helper (the P1 regression that codex Round 1 surfaced)."""
+        df = generate_butts_nonstaggered_dgp(seed=0)
+        est = SpilloverDiD(
+            rings=[0.0, 50.0, 200.0],
+            d_bar=200.0,
+            conley_coords=("lat", "lon"),
+            vcov_type="classical",
+            event_study=False,
+        )
+        import warnings as _w
+
+        with _w.catch_warnings():
+            _w.simplefilter("ignore", UserWarning)
+            with pytest.raises(NotImplementedError, match="classical"):
+                est.fit(df, outcome="y", unit="unit", time="time", treatment="D")