igerber · igerber · May 17, 2026 · May 17, 2026 · May 17, 2026 · May 17, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/TODO.md b/TODO.md
@@ -127,7 +127,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
 | Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
 | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
-| `SpilloverDiD` Gardner GMM first-stage uncertainty correction at stage 2. Wave B MVP uses standard `solve_ols` variance (HC1 / Conley / cluster) without the influence-function adjustment for stage-1 FE estimation. Extending `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the IF outer-product step gives the full Butts (2021) Section 3.1 + Gardner (2022) Section 4 composition. See plan Risks #2 for the IF formula. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_variance` | follow-up (Wave B) | Medium |
 | `SpilloverDiD(survey_design=...)` integration. Currently raises `NotImplementedError`. Requires threading survey weights through the inline stage 1 + stage 2 and lifting `two_stage.py`'s survey path patterns. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Low |
 | `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
 | `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |

diff --git a/diff_diff/conley.py b/diff_diff/conley.py
@@ -765,36 +765,40 @@ def _validate_conley_kwargs(
             )
 
 
-def _compute_conley_vcov(
-    X: np.ndarray,
-    residuals: np.ndarray,
+def _compute_conley_meat(
+    scores: np.ndarray,
     coords: np.ndarray,
     cutoff: float,
     metric: ConleyMetric,
     kernel: str,
-    bread_matrix: np.ndarray,
     *,
     time: Optional[np.ndarray] = None,
     unit: Optional[np.ndarray] = None,
     lag_cutoff: Optional[int] = None,
     cluster_ids: Optional[np.ndarray] = None,
     _conley_sparse: Optional[bool] = None,
 ) -> np.ndarray:
-    """Conley (1999) spatial HAC sandwich variance.
+    """Conley (1999) spatial HAC meat — ``scores' K scores`` for the product kernel.
+
+    Factors the kernel-construction-and-application step out of
+    :func:`_compute_conley_vcov` so a second caller (SpilloverDiD's Wave D
+    Gardner GMM first-stage correction) can reuse the cross-sectional /
+    panel-block / sparse k-d-tree / cluster-product code path with an
+    arbitrary score matrix substituted for the canonical ``X * residuals``.
 
     Two operating modes:
 
     **Cross-sectional** (``time`` / ``unit`` / ``lag_cutoff`` all None):
 
-        Var̂(β) = bread_inv · (Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j') · bread_inv
+        meat = Σ_{i,j} K(d_ij/h) · scores_i scores_j'
 
-    Implemented via ``meat = S' K S`` where ``S = X * residuals[:, None]``.
+    Implemented via ``meat = scores' K scores``.
 
     **Panel block-decomposed** (all three keyword-only args set):
 
-        XeeX_spatial = Σ_t  S_t' · K_space_t · S_t                    (within-period sum)
-        XeeX_serial  = Σ_u  S_u' · K_time_u · S_u   if lag_cutoff > 0  (within-unit sum)
-        Var̂(β) = bread_inv · (XeeX_spatial + XeeX_serial) · bread_inv
+        meat_spatial = Σ_t  scores_t' · K_space_t · scores_t   (within-period sum)
+        meat_serial  = Σ_u  scores_u' · K_time_u · scores_u    if lag_cutoff > 0
+        meat = meat_spatial + meat_serial
 
     The serial Bartlett kernel ``K_time_u[i, j] = 1{|t_i-t_j| <= L, i != j} ·
     (1 - |t_i-t_j|/(L+1))`` is hardcoded regardless of the user-supplied
@@ -815,23 +819,17 @@ def _compute_conley_vcov(
     Inputs are assumed already validated by :func:`_validate_conley_kwargs`;
     the helper only does the math. Caller is responsible for the validator.
 
+    Emits a ``UserWarning`` if the smallest meat eigenvalue is materially
+    negative (< -1e-12) — radial 1-D Bartlett and uniform kernels are
+    practitioner specializations of Conley 1999 and are not formally
+    PSD-guaranteed.
+
     Returns
     -------
-    vcov : ndarray of shape (k, k)
-
-    Notes
-    -----
-    Neither the uniform kernel (negative spectral regions, Conley 1999
-    footnote 11) nor the **radial 1-D Bartlett** specialization implemented
-    here is PSD-guaranteed. Conley's explicit PSD formula (Eq 3.14) is the
-    2-D separable product window on a lattice; the radial pairwise form is
-    a practitioner specialization (R ``conleyreg``, Stata ``acreg``, Hsiang
-    2010) that is not formally PSD. We emit a ``UserWarning`` if the smallest
-    meat eigenvalue is materially negative (< -1e-12) regardless of kernel.
+    meat : ndarray of shape (p, p)
     """
     coords_arr = np.asarray(coords, dtype=np.float64)
-    S = X * residuals[:, np.newaxis]
-    n = X.shape[0]
+    n = scores.shape[0]
 
     # Factorize cluster_ids once if supplied, so per-slice mask construction
     # below can use integer comparisons rather than re-factorizing per call.
@@ -877,7 +875,7 @@ def _kernel_fn(u: np.ndarray) -> np.ndarray:
         )
 
     def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
-        """Compute the spatial meat ``S' K S`` for the given subset of rows.
+        """Compute the spatial meat ``scores' K scores`` for the given subset.
 
         ``mask`` may be ``None`` (use all rows) or a boolean array of length n.
         Dispatches to the sparse helper when ``use_sparse`` is True, otherwise
@@ -886,11 +884,11 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
         (per-slice mask, NOT full n×n — saves memory for panel paths).
         """
         if mask is None:
-            S_sub = S
+            scores_sub = scores
             coords_sub = coords_arr
             cluster_sub = cluster_codes
         else:
-            S_sub = S[mask]
+            scores_sub = scores[mask]
             coords_sub = coords_arr[mask]
             cluster_sub = cluster_codes[mask] if cluster_codes is not None else None
         if use_sparse:
@@ -902,7 +900,7 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
             # would use more memory than dense); fall through to dense in
             # that case (the warning is already emitted by the helper).
             sparse_meat = _compute_spatial_bartlett_meat_sparse(
-                S_sub, coords_sub, cutoff, cast(str, metric), cluster_codes=cluster_sub
+                scores_sub, coords_sub, cutoff, cast(str, metric), cluster_codes=cluster_sub
             )
             if sparse_meat is not None:
                 return sparse_meat
@@ -912,19 +910,19 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
         if cluster_sub is not None:
             cluster_mask = cluster_sub[:, None] == cluster_sub[None, :]
             K = K * cluster_mask
-        return S_sub.T @ K @ S_sub
+        return scores_sub.T @ K @ scores_sub
 
     # Suppress spurious BLAS-level "divide by zero / overflow" warnings on
     # macOS Accelerate when K is sparse-ish (most off-diagonals are exactly
     # 0 outside the cutoff). The matmul result is mathematically correct;
     # the warning is a subnormal-handling false-positive in the AVX path.
     # We verify finiteness immediately after.
     if time is None:
-        # Phase 1 cross-sectional path: full n×n spatial sandwich.
+        # Cross-sectional path: full n×n spatial sandwich.
         with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
             meat = _spatial_meat_for_mask(None)
     else:
-        # Phase 2 panel block-decomposed path (matches R conleyreg).
+        # Panel block-decomposed path (matches R conleyreg).
         time_arr = np.asarray(time)
         unit_arr = np.asarray(unit)
         # Normalize time labels to dense panel-period codes (0..T-1) so that
@@ -940,8 +938,8 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
         # `conley_lag_cutoff` is meaningfully a "number of observed panel
         # periods" regardless of label scale.
         _, time_codes = np.unique(time_arr, return_inverse=True)
-        k = X.shape[1]
-        meat = np.zeros((k, k))
+        p = scores.shape[1]
+        meat = np.zeros((p, p))
         # Spatial component: within-period sandwich, summed across periods.
         # _spatial_meat_for_mask dispatches to sparse or dense per the toggle.
         with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
@@ -957,14 +955,14 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
             if L > 0:
                 for u_val in np.unique(unit_arr):
                     mask_u = unit_arr == u_val
-                    S_u = S[mask_u]
+                    scores_u = scores[mask_u]
                     # Use dense panel-period codes (NOT raw labels) for lag math.
                     t_u = time_codes[mask_u].astype(np.float64)
                     lag_mat = np.abs(t_u[:, None] - t_u[None, :])
                     K_u = ((lag_mat <= L) & (lag_mat != 0)).astype(np.float64) * (
                         1.0 - lag_mat / (L + 1.0)
                     )
-                    meat += S_u.T @ K_u @ S_u
+                    meat += scores_u.T @ K_u @ scores_u
     if not np.all(np.isfinite(meat)):
         raise ValueError(
             "Conley meat contains non-finite values; check residuals and "
@@ -991,6 +989,66 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
             stacklevel=3,
         )
 
+    return meat
+
+
+def _compute_conley_vcov(
+    X: np.ndarray,
+    residuals: np.ndarray,
+    coords: np.ndarray,
+    cutoff: float,
+    metric: ConleyMetric,
+    kernel: str,
+    bread_matrix: np.ndarray,
+    *,
+    time: Optional[np.ndarray] = None,
+    unit: Optional[np.ndarray] = None,
+    lag_cutoff: Optional[int] = None,
+    cluster_ids: Optional[np.ndarray] = None,
+    _conley_sparse: Optional[bool] = None,
+) -> np.ndarray:
+    """Conley (1999) spatial HAC sandwich variance.
+
+    Thin wrapper around :func:`_compute_conley_meat`: builds
+    ``scores = X * residuals[:, None]``, delegates the meat construction,
+    then wraps with the supplied bread inverse.
+
+    Two operating modes (both delegated to :func:`_compute_conley_meat`):
+
+    **Cross-sectional** (``time`` / ``unit`` / ``lag_cutoff`` all None):
+
+        Var̂(β) = bread_inv · (Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j') · bread_inv
+
+    **Panel block-decomposed** (all three keyword-only args set):
+
+        XeeX_spatial = Σ_t  S_t' · K_space_t · S_t                    (within-period sum)
+        XeeX_serial  = Σ_u  S_u' · K_time_u · S_u   if lag_cutoff > 0  (within-unit sum)
+        Var̂(β) = bread_inv · (XeeX_spatial + XeeX_serial) · bread_inv
+
+    See :func:`_compute_conley_meat` for the kernel choice, sparse
+    k-d-tree fallback, cluster-product kernel, and PSD-warning details.
+
+    Inputs are assumed already validated by :func:`_validate_conley_kwargs`;
+    this wrapper only does the math. Caller is responsible for the validator.
+
+    Returns
+    -------
+    vcov : ndarray of shape (k, k)
+    """
+    scores = X * residuals[:, np.newaxis]
+    meat = _compute_conley_meat(
+        scores,
+        coords,
+        cutoff,
+        metric,
+        kernel,
+        time=time,
+        unit=unit,
+        lag_cutoff=lag_cutoff,
+        cluster_ids=cluster_ids,
+        _conley_sparse=_conley_sparse,
+    )
+
     # Sandwich via two solves (mirrors _compute_cr2_bm pattern in linalg.py)
     try:
         temp = np.linalg.solve(bread_matrix, meat)

diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
@@ -498,12 +498,13 @@ sp.fit(
 ) -> SpilloverDiDResults
 ```
 
-**Restrictions (Wave B MVP — planned follow-ups):**
+**Restrictions and Wave C/D status:**
 
-- `covariates=` raises `NotImplementedError`. Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates. Planned follow-up.
-- `survey_design=` raises `NotImplementedError` (planned: SurveyDesign integration)
-- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (DiagnosticReport integration: NOT yet wired; queued as a follow-up.) (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. **Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs share the same Wave B Gardner-GMM caveat (biased downward by a few percent; Wave D follow-up).
-- Stage-2 variance is `solve_ols` HC1 / Conley / cluster — Gardner GMM first-stage uncertainty correction NOT applied (planned follow-up; SE is biased downward / too small, CIs too narrow, p-values too small — treat reported significance conservatively until the GMM correction lands)
+- `covariates=` raises `NotImplementedError` (planned follow-up). Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates.
+- `survey_design=` raises `NotImplementedError` (planned follow-up — SurveyDesign integration)
+- `vcov_type="classical"` raises `NotImplementedError` (Wave D restriction). Wave D GMM first-stage correction has not been derived for the homoskedastic meat structure `sigma_hat^2 * (X_10' X_10)`. Use `vcov_type="hc1"`, `vcov_type="conley"`, or pair with `cluster=<col>` for CR1 — all three apply the Wave D GMM correction.
+- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (DiagnosticReport integration: NOT yet wired; queued as a follow-up.) (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. **Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs include the Wave D Gardner GMM first-stage correction (see next bullet).
+- Stage-2 variance applies the Gardner GMM first-stage uncertainty correction across HC1 / Conley / cluster (Wave D, SHIPPED). The IF outer-product formula `psi_i = gamma_hat' X_{10,i} eps_{10,i} - X_{2,i} eps_{2,i}` is used unconditionally; kernel `K` is path-dependent (identity for HC1, block-indicator for cluster, spatial kernel for Conley). Documented synthesis of Butts (2021) §3.1 + Gardner (2022) §4 + Conley (1999); no reference software combines all three. Point estimates unchanged from Wave B/C; SE values shift upward by 1-few percent.
 - Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed
 
 **Usage:**