diff --git a/CHANGELOG.md b/CHANGELOG.md
index 39888e0d..11e3a78f 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,11 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- **`DifferenceInDifferences(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:382`). Previously raised `NotImplementedError` because the HC2 leverage correction and CR2 Bell-McCaffrey DOF depend on the FULL FE hat matrix, while within-transformation (FWL) preserves coefficients and residuals but not the hat. Lift via internal auto-route: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, the fit promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov. Empirically matches `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` at ~1e-10 (verified via new `tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `absorbed_fe_did`, with the R generator using the singleton-cluster CR2 trick for one-way HC2-BM Satterthwaite DOF). HC1/CR1 paths unchanged. `MultiPeriodDiD(absorb=...)` and `TwoWayFixedEffects` rejections remain as follow-ups (different fit-path structure). **Behavioral note (full `DiDResults` surface change under auto-route):** under the auto-route, the entire returned `DiDResults` reflects the full-dummy fit rather than the within-transformed fit. Specifically, `result.coefficients` and `result.vcov` include the FE-dummy entries (matching the `fixed_effects=` path), `result.residuals` and `result.fitted_values` are on the un-demeaned outcome scale, and `result.r_squared` is computed on the un-demeaned outcome (so it absorbs the FE variance and will typically be higher than the within-R²). `result.att` is invariant to this routing (FWL guarantee). Downstream consumers reading `result.att` are unaffected; consumers reading the broader result surface should expect the full-dummy values. **Survey-design scope:** the auto-route changes the FE handling (and removes the prior absorbed-FE rejection), but `survey_design=` continues to drive its own variance path (Taylor-series linearization or replicate-weight variance, per the existing survey contract) rather than the analytical HC2/HC2-BM sandwich. The auto-route is therefore methodologically meaningful for non-survey fits and for the FE-handling side of survey fits; analytical small-sample inference under `vcov_type in {"hc2","hc2_bm"}` is bypassed when a survey design is supplied.
+- **`SpilloverDiD` Gardner GMM first-stage uncertainty correction across HC1 / Conley / cluster (Wave D).** Closes the documented Wave B/C "SEs biased downward by a few percent" caveat. **Documented synthesis** of Butts (2021) Section 3.1 (the IF construction for spillover-aware DiD) + Gardner (2022) Section 4 (the two-stage GMM sandwich) + Conley (1999) (the spatial kernel). No reference software combines all three — `did2s` (Butts & Gardner) implements the Gardner correction without rings or Conley; `conleyreg` and `acreg` implement Conley without the two-stage correction. Wave D is the synthesis. Applies unconditionally under `vcov_type ∈ {"hc1", "conley", "cluster"}` for both `event_study=False` AND `event_study=True`. **Formula** (Butts 2021 §3.1 + Gardner 2022 §4): `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` where `gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)` is the stage-1-projection-of-stage-2 cross-moment; meat = `Psi' K Psi` with `K` dispatched by `vcov_type` (identity for HC1, block-indicator for cluster, spatial kernel for Conley); vcov = `(X_2' X_2)^{-1} @ meat @ (X_2' X_2)^{-1}`. **Finite-sample multipliers:** `n/(n-p)` for HC1; `G/(G-1) * (n-1)/(n-p)` for cluster CR1; no multiplier for Conley (preserves `conleyreg` / Wave B convention). **Public surface:** `vcov_type="classical"` now raises `NotImplementedError` upfront (the Wave D synthesis has not been derived for the homoskedastic meat structure `sigma_hat^2 * (X_10' X_10)`); REGISTRY's "vcov_type restrictions" block updated accordingly. **Point estimates unchanged** (`tau_total`, `delta_j`, event-study `tau_k` / `delta_jk` are byte-identical to Wave B/C); SE values shift upward by 1-few percent depending on first-stage residual variance. **Implementation:** new module-level helper `_compute_gmm_corrected_meat` in `diff_diff/two_stage.py` (NOT a modification of the existing `_compute_gmm_variance` method — TwoStageDiD's path is unchanged); new module-level helper `_build_butts_fe_design_csr` in `diff_diff/spillover.py`; new module-level helper `_compute_conley_meat` in `diff_diff/conley.py` factored out of `_compute_conley_vcov` so the same kernel-application code path handles both standard sandwich (`X * residuals`) and Wave D IF outer product (`Psi`) cases. **No new public API kwarg** — the correction is unconditional. Wave D variance mode dispatch derives from the public contract: `vcov_type="conley"` → `"conley"`; `cluster=
` → `"cluster"` (CR1); otherwise `"hc1"`. **Wave B/C SE goldens re-pinned** at `tests/test_spillover.py::TestSpilloverDiDEventStudyBackwardCompat` (constants renamed `_WAVE_B_GOLDEN_*` → `_WAVE_D_GOLDEN_*`; pre-Wave-D references retained as commented baselines for the directional inflation invariant `_WAVE_B_UNCORRECTED_*`). **Tests:** new test classes `TestSpilloverDiDWaveDGmmCorrectedHc1Hand` (hand-derived `Psi` on a 4-unit × 3-period over-identified panel — matches at `atol=1e-12`), `TestSpilloverDiDWaveDGmmCorrectedEventStudy` (vcov shape on event-study path), `TestSpilloverDiDWaveDGmmCorrectedNanInferenceContract` (rank-deficient column propagation), `TestSpilloverDiDWaveDGmmCorrectedValidatorWiring` (Conley validator fires from the new helper), `TestSpilloverDiDWaveDGmmCorrectedFitIdempotence` (clone + repeat-fit bit-identity per `feedback_fit_does_not_mutate_config`), `TestSpilloverDiDWaveDPublicVarianceContract` (end-to-end public `cluster=` CR1 routing, single-cluster rejection, classical NotImplementedError). Closes the Gardner-GMM follow-up row in `TODO.md`.
- **BaconDecomposition R parity goldens.** Closes the PR-B deferral row in `TODO.md`. JSON goldens at `benchmarks/data/r_bacondecomp_golden.json` generated from the committed `benchmarks/R/generate_bacon_golden.R` script (3 fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`) against `bacondecomp 0.1.1` on R 4.5.2. `tests/test_methodology_bacon.py::TestBaconParityR` now active (4 tests, no skips): TWFE coefficient parity at `atol=1e-6` across all 3 fixtures; weights-sum parity at `atol=1e-6` across all 3 fixtures; per-component estimate + weight parity at `atol=1e-6` on the 2 non-remap fixtures **and on the 6 timing-vs-timing rows of `always_treated_remapped`** (carve-out narrowed to U-bucket rows only); plus a dedicated fold-back test (`test_always_treated_remapped_fold_back_matches_r`) that pins the **documented convention divergence** on `always_treated_remapped` (R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort) by aggregating R's split rows per cohort and asserting they match Python's single fold at `atol=1e-6`. The aggregate is invariant per Theorem 1; the per-component breakdown differs structurally between conventions but the fold-back is now directly asserted. New `**Note (R parity convention divergence on always-treated)**` and `**Deviation (first-period boundary extension on always-treated remap)**` in `docs/methodology/REGISTRY.md`. **First-period boundary deviation:** the paper uses strict `t_i < 1` for the always-treated bucket; the library uses the inclusive `first_treat <= min(time)` rule and folds `first_treat == min(time)` cohorts into `U`. R does NOT apply this fold (it keeps such cohorts as their own bucket). When `min(time) > 1` the rules coincide. Explicitly labeled in REGISTRY's Deviations block and mirrored in `METHODOLOGY_REVIEW.md` and `bacon.py`. METHODOLOGY_REVIEW.md tracker row promoted `**Complete** (R parity goldens pending)` → `**Complete**`.
- **`generate_ddd_panel_data` — panel-structured DGP for Triple-Difference power analysis** (`diff_diff/prep_dgp.py`). New public function exported from `diff_diff` and `diff_diff.prep` for panel DDD simulations. Cross-sectional `generate_ddd_data` remains available unchanged. Produces a balanced panel of `n_units × n_periods` with two unit-level binary dimensions (`group`, `partition`) and a derived `post = 1[period >= treatment_period]` indicator; columns: `unit, period, outcome, group, partition, post, treated, true_effect` (+ `x1, x2` when `add_covariates=True`). DDD-CPT identification holds because the `group * partition` interaction enters as a unit-level (time-invariant) term, leaving the triple-interaction `treatment_effect * group * partition * post` as the sole source of differential group × partition trend. Compatible with `TripleDifference(cluster="unit").fit(..., time="post")` (the cluster kwarg is required because `TripleDifference` is the repeated-cross-section `panel=FALSE` estimator and unclustered SE on panel-generated rows understates variance under within-unit serial correlation; the point estimate `att` is invariant to clustering — see the new `TripleDifference` REGISTRY note on panel-shaped input). Users get panel-realistic unit fixed effects and within-unit serial correlation while the binary 2×2×2 estimator surface is unchanged. **Stratified allocation:** the partition split is drawn stratified-by-group at the requested `partition_frac` so every `(group, partition)` cell receives at least one unit; a targeted `ValueError` is raised at fit-time when the rounded cell counts (`n_units`, `group_frac`, `partition_frac`) would leave any cell empty. This guarantees the 2x2x2 DDD surface is populated for any valid input — independent marginal sampling (the cross-sectional `generate_ddd_data` convention) could collapse cells when marginals are small (e.g., `n_units=4, group_frac=partition_frac=0.25`). Validates `1 <= treatment_period < n_periods`, `group_frac` and `partition_frac` strictly in `(0, 1)`, and `n_units >= 4`. Deterministic recovery (`noise_sd=0`) matches `treatment_effect` to ~1e-15 (covered by `tests/test_prep.py::TestGenerateDddPanelData`, 16 tests including infeasible-config rejection and smallest-feasible-config round-trip through `TripleDifference.fit`). `power.simulate_power` is NOT yet auto-routed to the panel DGP for `TripleDifference` (the existing `_ddd_dgp_kwargs` registry entry still ignores `n_periods` and the existing `_check_ddd_dgp_compat` warning still fires on non-default kwargs) — that wiring is tracked as a follow-up in TODO.md.
- **BaconDecomposition: Goodman-Bacon (2021) methodology audit (PR-B).** Closes the BaconDecomposition row in `METHODOLOGY_REVIEW.md` (status flipped from **In Progress** → **Complete** — initially with an R-parity-goldens caveat that was closed by the parity-goldens bullet above in this same release). Builds on the PR #451 paper review at `docs/methodology/papers/goodman-bacon-2021-review.md`. **Audit outcomes:** (1) Rewrote `_recompute_exact_weights` in `bacon.py` to actually implement Theorem 1 (Eqs. 7-9 + 10e-g) — the prior "exact" implementation was missing the `(1-n_kU)` factor in the subsample variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper; the post-hoc sum-to-1 normalization masked the relative-weight error but produced ~0.3% decomposition error vs TWFE on a 3-cohort + never-treated DGP. The rewrite computes the exact numerators of Eqs. 10e/f/g and lets the post-hoc normalization handle the `V̂^D` denominator (Theorem 1's identity guarantees `V̂^D = Σ numerators`). The TWFE-vs-weighted-sum identity now holds at `atol=1e-10` on both noisy and hand-calculable DGPs. (2) Added always-treated warn+remap per paper footnote 11: units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` (untreated) bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning`. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (event-time encodings) are handled correctly; the `0` sentinel restriction applies only to `first_treat`, not to `time`, and a real treatment cohort with `first_treat == 0` would still be folded into U today (re-label such cohorts to a non-sentinel value before fitting). The user's original `first_treat` column is preserved unchanged. The count is surfaced as a new `BaconDecompositionResults.n_always_treated_remapped` dataclass field, rendered in `summary()` output when nonzero. **`n_never_treated` reports TRUE never-treated only**, computed from the original user column before remap — remapped always-treated units appear separately as `n_always_treated_remapped`, no double-counting. (3) New methodology test file `tests/test_methodology_bacon.py` (34 tests across 6 classes post-release; the audit added ~24 tests and the R-parity-goldens bullet above expanded coverage: `TestBaconHandCalculation` hand-checks Eqs. 7-9 + 10b-d on a minimal balanced panel at `atol=1e-10`; `TestBaconParityR` (4 tests, all active post-release once the R parity goldens bullet above landed; skips cleanly with a regenerate-instructions pointer in partial-checkout scenarios where the JSON is unavailable); `TestBaconAlwaysTreatedRemap` regression-tests warn+remap mechanics including user-data-preservation; `TestBaconEdgeCases` exercises no-untreated, single-cohort, unbalanced panel, constant-ATT recovery; `TestBaconWeightModes` locks the new exact-is-default contract; `TestBaconSurveyDesignNarrowing` confirms survey_design composes with exact mode and warn+remap). (4) R `bacondecomp::bacon()` parity generator committed at `benchmarks/R/generate_bacon_golden.R` covering three DGP fixtures (3-groups-with-U, 2-groups-no-U, always-treated-remapped); the JSON goldens deferral at audit time was closed in this same release by the parity-goldens bullet above. (5) `docs/methodology/REGISTRY.md` `## BaconDecomposition` block replaced with the paper-review-sourced entry plus three new sub-notes: weight modes (exact vs approximate), always-treated remap, R parity status. **Explicit removal:** the prior REGISTRY block's "Weights may be negative for later-vs-earlier comparisons" claim was incorrect per Theorem 1 (decomposition weights are strictly positive and sum to 1; negative weights are an estimand-level phenomenon, not estimator-level) and is dropped from the new entry. Closes the BaconDecomposition follow-up tracked at `TODO.md` (the prior row added in PR #451 is replaced by a narrower R-parity-goldens deferral row).
-- **`SpilloverDiD(event_study=True)` — per-event-time × ring decomposition (Butts 2021 Section 5 / Table 2).** Replaces the Wave B `NotImplementedError` gate with the full per-event-time × ring decomposition. Emits per-event-time direct effects `tau_k` and per-(ring, event-time) spillover effects `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) and a MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). A TwoStageDiD-compatible `event_study_effects: Dict[int, Dict[str, Any]]` alias (matching `two_stage.py:1355-1389` schema with `conf_int = (low, high)` tuple) is also emitted for consumption by `plot_event_study` (`SpilloverDiDResults` is wired into `_extract_plot_data` and prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic). `DiagnosticReport` integration is NOT wired in this PR — registering `SpilloverDiDResults` in `DiagnosticReport`'s applicability/method tables is queued as a follow-up. **Methodology spec:** the implementation operationalizes Butts Section 5's single `K_it` symbol as TWO event-time clocks — `K_direct = t - effective_first_treat(i)` for ever-treated unit rows, and `K_spill = t - earliest-in-range-cohort-onset(i)` for spillover rows (running min across activated cohorts; NaN for pre-trigger and far-away rows). `K_spill >= 0` structurally; negative-k spillover cells emit rectangularly with `coef = NaN, n_obs = 0`. **Reference period:** `ref_period = -1 - anticipation` (mirrors `TwoStageDiD` at `two_stage.py:486`); when `horizon_max` is set, `ref_period` must fall inside `[-horizon_max, +horizon_max]` or fit raises `ValueError` — silent floor-shift to `-horizon_max` would change identification (rejected per `feedback_no_silent_failures`). The reference row in `att_dynamic` / `event_study_effects` uses `coef = 0.0, se = 0.0, n_obs = 0, conf_int = (0.0, 0.0)` for TwoStageDiD parity. **`horizon_max` semantics (divergence from TwoStageDiD):** SpilloverDiD bins event-times outside `[-horizon_max, +horizon_max]` into endpoint pools (no observations dropped); TwoStageDiD filters those rows. The divergence is intentional and cross-documented. With `horizon_max=None`, the helper auto-detects the bin set from observed K values. **Scalar `att` aggregation:** when `event_study=True`, the top-level `att` is the **sample-share-weighted average** of post-treatment `tau_k` (`att = sum_{k >= 0} w_k * tau_k` with `w_k = n_treated_at_k / total`). SE comes from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment block of the stage-2 vcov — no separate fit. **Reduce-to-aggregate equivalence:** under a constant-tau DGP with `horizon_max=None`, the lincom-weighted scalar `att` reproduces Wave B's aggregate `tau_total` bit-identically in the deterministic limit (verified by `TestSpilloverDiDEventStudyReduceToAggregate`). Note: `horizon_max=0` is **not supported** under `event_study=True` (rejected at validation): the single bin `k=0` leaves no event-time pair to anchor the reference period against. Use `event_study=False` for a single aggregate direct effect (Wave B static spec); event-study mode requires `horizon_max>=1` or `horizon_max=None`. **Post-finite_mask sample contract:** `att_dynamic["n_obs"]`, `event_study_effects[k]["n_obs"]`, AND the scalar `att` share weights all reflect the POST-`finite_mask` stage-2 estimation sample (not the pre-mask design). On warn-and-drop fits (baseline-treated units without Omega_0 rows excluded), the reported `n_obs` per cell counts only rows that actually entered `solve_ols`. **Fail-closed scalar `att`:** if any post-treatment direct-effect coefficient is NaN (rank-deficient drop by `solve_ols`), the scalar `att` is set to NaN with an explicit warning rather than silently zeroing the dropped column's contribution via `np.nansum` on a fixed weight vector — inspect `att_dynamic` for the per-event-time coefficients and re-aggregate manually if appropriate. **Backward compatibility:** `event_study=False` leaves all Wave C fields (`att_dynamic`, `event_study_effects`, `horizon_max`, `reference_period`) as `None`. The aggregate stage-2 design construction, fit, and extraction logic on this path are byte-identical to Wave B; `TestSpilloverDiDEventStudyBackwardCompat` pins att / se / per-ring goldens captured on the unchanged aggregate path so any future drift fails the regression. **Variance:** same caveat as Wave B — per-event-time SEs use `solve_ols`'s standard variance (HC1 / Conley / cluster paths) WITHOUT the Gardner GMM first-stage uncertainty correction; planned Wave D follow-up closes this. **Tests:** `tests/test_spillover.py` adds 30 new test methods across event-study API, two-clock K helper, horizon binning, design builder, reference period, reduce-to-aggregate, identification MC (50 seeds, per-event-time tau_k recovery within 0.025), placebo pre-trends (Type I rate ≤ 0.30 over 50 seeds at alpha=0.10), singularity (rectangular schema), Conley integration (vcov shape + non-negative diagonal), summary/to_dict/pickle round-trip, event_study_effects schema parity with TwoStageDiD, lincom-att hand-computed, validation (`horizon_max < 0`, `ref_period < -horizon_max`), and fit idempotence. DGP factory `generate_butts_staggered_dgp` extended with `tau_per_event_time` and `delta_per_ring_per_event_time` callable kwargs (backward-compatible — both default to `None`, producing the Wave B scalar DGP bit-identically; verified by `tests/test_dgp_utils.py` with pinned SHA-256 baselines).
-- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
+- **`SpilloverDiD(event_study=True)` — per-event-time × ring decomposition (Butts 2021 Section 5 / Table 2).** Replaces the Wave B `NotImplementedError` gate with the full per-event-time × ring decomposition. Emits per-event-time direct effects `tau_k` and per-(ring, event-time) spillover effects `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) and a MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). A TwoStageDiD-compatible `event_study_effects: Dict[int, Dict[str, Any]]` alias (matching `two_stage.py:1355-1389` schema with `conf_int = (low, high)` tuple) is also emitted for consumption by `plot_event_study` (`SpilloverDiDResults` is wired into `_extract_plot_data` and prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic). `DiagnosticReport` integration is NOT wired in this PR — registering `SpilloverDiDResults` in `DiagnosticReport`'s applicability/method tables is queued as a follow-up. **Methodology spec:** the implementation operationalizes Butts Section 5's single `K_it` symbol as TWO event-time clocks — `K_direct = t - effective_first_treat(i)` for ever-treated unit rows, and `K_spill = t - earliest-in-range-cohort-onset(i)` for spillover rows (running min across activated cohorts; NaN for pre-trigger and far-away rows). `K_spill >= 0` structurally; negative-k spillover cells emit rectangularly with `coef = NaN, n_obs = 0`. **Reference period:** `ref_period = -1 - anticipation` (mirrors `TwoStageDiD` at `two_stage.py:486`); when `horizon_max` is set, `ref_period` must fall inside `[-horizon_max, +horizon_max]` or fit raises `ValueError` — silent floor-shift to `-horizon_max` would change identification (rejected per `feedback_no_silent_failures`). The reference row in `att_dynamic` / `event_study_effects` uses `coef = 0.0, se = 0.0, n_obs = 0, conf_int = (0.0, 0.0)` for TwoStageDiD parity. **`horizon_max` semantics (divergence from TwoStageDiD):** SpilloverDiD bins event-times outside `[-horizon_max, +horizon_max]` into endpoint pools (no observations dropped); TwoStageDiD filters those rows. The divergence is intentional and cross-documented. With `horizon_max=None`, the helper auto-detects the bin set from observed K values. **Scalar `att` aggregation:** when `event_study=True`, the top-level `att` is the **sample-share-weighted average** of post-treatment `tau_k` (`att = sum_{k >= 0} w_k * tau_k` with `w_k = n_treated_at_k / total`). SE comes from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment block of the stage-2 vcov — no separate fit. **Reduce-to-aggregate equivalence:** under a constant-tau DGP with `horizon_max=None`, the lincom-weighted scalar `att` reproduces Wave B's aggregate `tau_total` bit-identically in the deterministic limit (verified by `TestSpilloverDiDEventStudyReduceToAggregate`). Note: `horizon_max=0` is **not supported** under `event_study=True` (rejected at validation): the single bin `k=0` leaves no event-time pair to anchor the reference period against. Use `event_study=False` for a single aggregate direct effect (Wave B static spec); event-study mode requires `horizon_max>=1` or `horizon_max=None`. **Post-finite_mask sample contract:** `att_dynamic["n_obs"]`, `event_study_effects[k]["n_obs"]`, AND the scalar `att` share weights all reflect the POST-`finite_mask` stage-2 estimation sample (not the pre-mask design). On warn-and-drop fits (baseline-treated units without Omega_0 rows excluded), the reported `n_obs` per cell counts only rows that actually entered `solve_ols`. **Fail-closed scalar `att`:** if any post-treatment direct-effect coefficient is NaN (rank-deficient drop by `solve_ols`), the scalar `att` is set to NaN with an explicit warning rather than silently zeroing the dropped column's contribution via `np.nansum` on a fixed weight vector — inspect `att_dynamic` for the per-event-time coefficients and re-aggregate manually if appropriate. **Backward compatibility:** `event_study=False` leaves all Wave C fields (`att_dynamic`, `event_study_effects`, `horizon_max`, `reference_period`) as `None`. The aggregate stage-2 design construction, fit, and extraction logic on this path are byte-identical to Wave B; `TestSpilloverDiDEventStudyBackwardCompat` pins att / se / per-ring goldens captured on the unchanged aggregate path so any future drift fails the regression. **Variance:** at original Wave C ship time per-event-time SEs used `solve_ols`'s standard variance (HC1 / Conley / cluster paths) WITHOUT the Gardner GMM first-stage uncertainty correction. **Superseded by the Wave D Gardner GMM first-stage correction in this same release** (see the Wave D bullet above): per-event-time SEs now apply the IF outer-product correction unconditionally and shift upward by 1-few percent relative to the original Wave C ship-time values. **Tests:** `tests/test_spillover.py` adds 30 new test methods across event-study API, two-clock K helper, horizon binning, design builder, reference period, reduce-to-aggregate, identification MC (50 seeds, per-event-time tau_k recovery within 0.025), placebo pre-trends (Type I rate ≤ 0.30 over 50 seeds at alpha=0.10), singularity (rectangular schema), Conley integration (vcov shape + non-negative diagonal), summary/to_dict/pickle round-trip, event_study_effects schema parity with TwoStageDiD, lincom-att hand-computed, validation (`horizon_max < 0`, `ref_period < -horizon_max`), and fit idempotence. DGP factory `generate_butts_staggered_dgp` extended with `tau_per_event_time` and `delta_per_ring_per_event_time` callable kwargs (backward-compatible — both default to `None`, producing the Wave B scalar DGP bit-identically; verified by `tests/test_dgp_utils.py` with pinned SHA-256 baselines).
+- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP ship-time):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through) WITHOUT the Gardner GMM first-stage uncertainty correction. **Superseded by the Wave D Gardner GMM first-stage correction in this same release** (see the Wave D bullet above): the GMM correction now applies unconditionally across HC1 / Conley / CR1 (via `cluster=`), shifting SE values upward by 1-few percent relative to the original Wave B ship-time values. **Deferred features (planned follow-ups, as of Wave B ship-time):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Shipped in same release:** `event_study=True` (Wave C bullet above) + Gardner GMM first-stage correction (Wave D bullet above); remaining items still queued. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
- **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422.
### Changed
diff --git a/TODO.md b/TODO.md
index b1079ad3..e9304f0f 100644
--- a/TODO.md
+++ b/TODO.md
@@ -127,7 +127,6 @@ Deferred items from PR reviews that were not addressed before merge.
| SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low |
| Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium |
| `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
-| `SpilloverDiD` Gardner GMM first-stage uncertainty correction at stage 2. Wave B MVP uses standard `solve_ols` variance (HC1 / Conley / cluster) without the influence-function adjustment for stage-1 FE estimation. Extending `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the IF outer-product step gives the full Butts (2021) Section 3.1 + Gardner (2022) Section 4 composition. See plan Risks #2 for the IF formula. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_variance` | follow-up (Wave B) | Medium |
| `SpilloverDiD(survey_design=...)` integration. Currently raises `NotImplementedError`. Requires threading survey weights through the inline stage 1 + stage 2 and lifting `two_stage.py`'s survey path patterns. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Low |
| `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
| `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
diff --git a/diff_diff/conley.py b/diff_diff/conley.py
index 812a8f16..1df54697 100644
--- a/diff_diff/conley.py
+++ b/diff_diff/conley.py
@@ -765,14 +765,12 @@ def _validate_conley_kwargs(
)
-def _compute_conley_vcov(
- X: np.ndarray,
- residuals: np.ndarray,
+def _compute_conley_meat(
+ scores: np.ndarray,
coords: np.ndarray,
cutoff: float,
metric: ConleyMetric,
kernel: str,
- bread_matrix: np.ndarray,
*,
time: Optional[np.ndarray] = None,
unit: Optional[np.ndarray] = None,
@@ -780,21 +778,27 @@ def _compute_conley_vcov(
cluster_ids: Optional[np.ndarray] = None,
_conley_sparse: Optional[bool] = None,
) -> np.ndarray:
- """Conley (1999) spatial HAC sandwich variance.
+ """Conley (1999) spatial HAC meat — ``scores' K scores`` for the product kernel.
+
+ Factors the kernel-construction-and-application step out of
+ :func:`_compute_conley_vcov` so a second caller (SpilloverDiD's Wave D
+ Gardner GMM first-stage correction) can reuse the cross-sectional /
+ panel-block / sparse k-d-tree / cluster-product code path with an
+ arbitrary score matrix substituted for the canonical ``X * residuals``.
Two operating modes:
**Cross-sectional** (``time`` / ``unit`` / ``lag_cutoff`` all None):
- Var̂(β) = bread_inv · (Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j') · bread_inv
+ meat = Σ_{i,j} K(d_ij/h) · scores_i scores_j'
- Implemented via ``meat = S' K S`` where ``S = X * residuals[:, None]``.
+ Implemented via ``meat = scores' K scores``.
**Panel block-decomposed** (all three keyword-only args set):
- XeeX_spatial = Σ_t S_t' · K_space_t · S_t (within-period sum)
- XeeX_serial = Σ_u S_u' · K_time_u · S_u if lag_cutoff > 0 (within-unit sum)
- Var̂(β) = bread_inv · (XeeX_spatial + XeeX_serial) · bread_inv
+ meat_spatial = Σ_t scores_t' · K_space_t · scores_t (within-period sum)
+ meat_serial = Σ_u scores_u' · K_time_u · scores_u if lag_cutoff > 0
+ meat = meat_spatial + meat_serial
The serial Bartlett kernel ``K_time_u[i, j] = 1{|t_i-t_j| <= L, i != j} ·
(1 - |t_i-t_j|/(L+1))`` is hardcoded regardless of the user-supplied
@@ -815,23 +819,17 @@ def _compute_conley_vcov(
Inputs are assumed already validated by :func:`_validate_conley_kwargs`;
the helper only does the math. Caller is responsible for the validator.
+ Emits a ``UserWarning`` if the smallest meat eigenvalue is materially
+ negative (< -1e-12) — radial 1-D Bartlett and uniform kernels are
+ practitioner specializations of Conley 1999 and are not formally
+ PSD-guaranteed.
+
Returns
-------
- vcov : ndarray of shape (k, k)
-
- Notes
- -----
- Neither the uniform kernel (negative spectral regions, Conley 1999
- footnote 11) nor the **radial 1-D Bartlett** specialization implemented
- here is PSD-guaranteed. Conley's explicit PSD formula (Eq 3.14) is the
- 2-D separable product window on a lattice; the radial pairwise form is
- a practitioner specialization (R ``conleyreg``, Stata ``acreg``, Hsiang
- 2010) that is not formally PSD. We emit a ``UserWarning`` if the smallest
- meat eigenvalue is materially negative (< -1e-12) regardless of kernel.
+ meat : ndarray of shape (p, p)
"""
coords_arr = np.asarray(coords, dtype=np.float64)
- S = X * residuals[:, np.newaxis]
- n = X.shape[0]
+ n = scores.shape[0]
# Factorize cluster_ids once if supplied, so per-slice mask construction
# below can use integer comparisons rather than re-factorizing per call.
@@ -877,7 +875,7 @@ def _kernel_fn(u: np.ndarray) -> np.ndarray:
)
def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
- """Compute the spatial meat ``S' K S`` for the given subset of rows.
+ """Compute the spatial meat ``scores' K scores`` for the given subset.
``mask`` may be ``None`` (use all rows) or a boolean array of length n.
Dispatches to the sparse helper when ``use_sparse`` is True, otherwise
@@ -886,11 +884,11 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
(per-slice mask, NOT full n×n — saves memory for panel paths).
"""
if mask is None:
- S_sub = S
+ scores_sub = scores
coords_sub = coords_arr
cluster_sub = cluster_codes
else:
- S_sub = S[mask]
+ scores_sub = scores[mask]
coords_sub = coords_arr[mask]
cluster_sub = cluster_codes[mask] if cluster_codes is not None else None
if use_sparse:
@@ -902,7 +900,7 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
# would use more memory than dense); fall through to dense in
# that case (the warning is already emitted by the helper).
sparse_meat = _compute_spatial_bartlett_meat_sparse(
- S_sub, coords_sub, cutoff, cast(str, metric), cluster_codes=cluster_sub
+ scores_sub, coords_sub, cutoff, cast(str, metric), cluster_codes=cluster_sub
)
if sparse_meat is not None:
return sparse_meat
@@ -912,7 +910,7 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
if cluster_sub is not None:
cluster_mask = cluster_sub[:, None] == cluster_sub[None, :]
K = K * cluster_mask
- return S_sub.T @ K @ S_sub
+ return scores_sub.T @ K @ scores_sub
# Suppress spurious BLAS-level "divide by zero / overflow" warnings on
# macOS Accelerate when K is sparse-ish (most off-diagonals are exactly
@@ -920,11 +918,11 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
# the warning is a subnormal-handling false-positive in the AVX path.
# We verify finiteness immediately after.
if time is None:
- # Phase 1 cross-sectional path: full n×n spatial sandwich.
+ # Cross-sectional path: full n×n spatial sandwich.
with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
meat = _spatial_meat_for_mask(None)
else:
- # Phase 2 panel block-decomposed path (matches R conleyreg).
+ # Panel block-decomposed path (matches R conleyreg).
time_arr = np.asarray(time)
unit_arr = np.asarray(unit)
# Normalize time labels to dense panel-period codes (0..T-1) so that
@@ -940,8 +938,8 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
# `conley_lag_cutoff` is meaningfully a "number of observed panel
# periods" regardless of label scale.
_, time_codes = np.unique(time_arr, return_inverse=True)
- k = X.shape[1]
- meat = np.zeros((k, k))
+ p = scores.shape[1]
+ meat = np.zeros((p, p))
# Spatial component: within-period sandwich, summed across periods.
# _spatial_meat_for_mask dispatches to sparse or dense per the toggle.
with np.errstate(divide="ignore", over="ignore", invalid="ignore"):
@@ -957,14 +955,14 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
if L > 0:
for u_val in np.unique(unit_arr):
mask_u = unit_arr == u_val
- S_u = S[mask_u]
+ scores_u = scores[mask_u]
# Use dense panel-period codes (NOT raw labels) for lag math.
t_u = time_codes[mask_u].astype(np.float64)
lag_mat = np.abs(t_u[:, None] - t_u[None, :])
K_u = ((lag_mat <= L) & (lag_mat != 0)).astype(np.float64) * (
1.0 - lag_mat / (L + 1.0)
)
- meat += S_u.T @ K_u @ S_u
+ meat += scores_u.T @ K_u @ scores_u
if not np.all(np.isfinite(meat)):
raise ValueError(
"Conley meat contains non-finite values; check residuals and "
@@ -991,6 +989,66 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray:
stacklevel=3,
)
+ return meat
+
+
+def _compute_conley_vcov(
+ X: np.ndarray,
+ residuals: np.ndarray,
+ coords: np.ndarray,
+ cutoff: float,
+ metric: ConleyMetric,
+ kernel: str,
+ bread_matrix: np.ndarray,
+ *,
+ time: Optional[np.ndarray] = None,
+ unit: Optional[np.ndarray] = None,
+ lag_cutoff: Optional[int] = None,
+ cluster_ids: Optional[np.ndarray] = None,
+ _conley_sparse: Optional[bool] = None,
+) -> np.ndarray:
+ """Conley (1999) spatial HAC sandwich variance.
+
+ Thin wrapper around :func:`_compute_conley_meat`: builds
+ ``scores = X * residuals[:, None]``, delegates the meat construction,
+ then wraps with the supplied bread inverse.
+
+ Two operating modes (both delegated to :func:`_compute_conley_meat`):
+
+ **Cross-sectional** (``time`` / ``unit`` / ``lag_cutoff`` all None):
+
+ Var̂(β) = bread_inv · (Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j') · bread_inv
+
+ **Panel block-decomposed** (all three keyword-only args set):
+
+ XeeX_spatial = Σ_t S_t' · K_space_t · S_t (within-period sum)
+ XeeX_serial = Σ_u S_u' · K_time_u · S_u if lag_cutoff > 0 (within-unit sum)
+ Var̂(β) = bread_inv · (XeeX_spatial + XeeX_serial) · bread_inv
+
+ See :func:`_compute_conley_meat` for the kernel choice, sparse
+ k-d-tree fallback, cluster-product kernel, and PSD-warning details.
+
+ Inputs are assumed already validated by :func:`_validate_conley_kwargs`;
+ this wrapper only does the math. Caller is responsible for the validator.
+
+ Returns
+ -------
+ vcov : ndarray of shape (k, k)
+ """
+ scores = X * residuals[:, np.newaxis]
+ meat = _compute_conley_meat(
+ scores,
+ coords,
+ cutoff,
+ metric,
+ kernel,
+ time=time,
+ unit=unit,
+ lag_cutoff=lag_cutoff,
+ cluster_ids=cluster_ids,
+ _conley_sparse=_conley_sparse,
+ )
+
# Sandwich via two solves (mirrors _compute_cr2_bm pattern in linalg.py)
try:
temp = np.linalg.solve(bread_matrix, meat)
diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt
index 4539681e..c19c54cc 100644
--- a/diff_diff/guides/llms-full.txt
+++ b/diff_diff/guides/llms-full.txt
@@ -498,12 +498,13 @@ sp.fit(
) -> SpilloverDiDResults
```
-**Restrictions (Wave B MVP — planned follow-ups):**
+**Restrictions and Wave C/D status:**
-- `covariates=` raises `NotImplementedError`. Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates. Planned follow-up.
-- `survey_design=` raises `NotImplementedError` (planned: SurveyDesign integration)
-- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (DiagnosticReport integration: NOT yet wired; queued as a follow-up.) (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. **Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs share the same Wave B Gardner-GMM caveat (biased downward by a few percent; Wave D follow-up).
-- Stage-2 variance is `solve_ols` HC1 / Conley / cluster — Gardner GMM first-stage uncertainty correction NOT applied (planned follow-up; SE is biased downward / too small, CIs too narrow, p-values too small — treat reported significance conservatively until the GMM correction lands)
+- `covariates=` raises `NotImplementedError` (planned follow-up). Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates.
+- `survey_design=` raises `NotImplementedError` (planned follow-up — SurveyDesign integration)
+- `vcov_type="classical"` raises `NotImplementedError` (Wave D restriction). Wave D GMM first-stage correction has not been derived for the homoskedastic meat structure `sigma_hat^2 * (X_10' X_10)`. Use `vcov_type="hc1"`, `vcov_type="conley"`, or pair with `cluster=` for CR1 — all three apply the Wave D GMM correction.
+- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (DiagnosticReport integration: NOT yet wired; queued as a follow-up.) (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. **Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs include the Wave D Gardner GMM first-stage correction (see next bullet).
+- Stage-2 variance applies the Gardner GMM first-stage uncertainty correction across HC1 / Conley / cluster (Wave D, SHIPPED). The IF outer-product formula `psi_i = gamma_hat' X_{10,i} eps_{10,i} - X_{2,i} eps_{2,i}` is used unconditionally; kernel `K` is path-dependent (identity for HC1, block-indicator for cluster, spatial kernel for Conley). Documented synthesis of Butts (2021) §3.1 + Gardner (2022) §4 + Conley (1999); no reference software combines all three. Point estimates unchanged from Wave B/C; SE values shift upward by 1-few percent.
- Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed
**Usage:**
diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py
index 15a32cd0..541dce5e 100644
--- a/diff_diff/spillover.py
+++ b/diff_diff/spillover.py
@@ -30,6 +30,7 @@
import numpy as np
import pandas as pd
+from scipy import sparse
from diff_diff.conley import (
_CONLEY_EARTH_RADIUS_KM,
@@ -39,6 +40,7 @@
)
from diff_diff.linalg import solve_ols
from diff_diff.results import SpilloverDiDResults
+from diff_diff.two_stage import _compute_gmm_corrected_meat
from diff_diff.utils import safe_inference
# Type alias mirroring diff_diff.conley.ConleyMetric so callers can supply
@@ -1495,6 +1497,89 @@ def _residualize_butts(
return y_full - mu_per_row - lambda_per_row
+def _build_butts_fe_design_csr(
+ unit_codes: np.ndarray,
+ time_codes: np.ndarray,
+ omega_0_mask: np.ndarray,
+) -> Tuple[sparse.csr_matrix, sparse.csr_matrix]:
+ """Build sparse FE design matrices for Wave D Gardner GMM correction.
+
+ Column layout: ``[unit_1, ..., unit_{U-1}, time_1, ..., time_{T-1}]``.
+ Drops the first unit dummy AND the first time dummy for identification
+ (mirrors ``TwoStageDiD._build_fe_design`` at ``two_stage.py:2046``).
+
+ Parameters
+ ----------
+ unit_codes : np.ndarray of shape (n,)
+ Integer codes 0..U-1 (from ``pd.factorize``).
+ time_codes : np.ndarray of shape (n,)
+ Integer codes 0..T-1 (from ``pd.factorize``).
+ omega_0_mask : np.ndarray of shape (n,)
+ Boolean mask. ``X_10`` rows where this is False are zeroed out
+ (treated AND exposed rows). ``X_1`` keeps all rows.
+
+ Returns
+ -------
+ X_1 : sparse.csr_matrix, shape (n, (U-1) + (T-1))
+ Full-sample FE design with identification dropping.
+ X_10 : sparse.csr_matrix, shape (n, (U-1) + (T-1))
+ Same column space as ``X_1`` but with ``~omega_0_mask`` rows zeroed.
+ Sharing column space is required for the Gardner cross-moment
+ ``gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)``.
+
+ Notes
+ -----
+ Rank-deficient ``X_10' X_10`` (e.g. warn-and-drop units with no
+ Omega_0 rows) is detected downstream by ``_compute_gmm_corrected_meat``
+ via ``sparse_factorized`` failure → ``np.linalg.lstsq`` fallback with
+ a documented ``UserWarning``.
+
+ **Re-factorization on entry:** when callers pass pre-mask integer
+ codes that have had interior values dropped via ``finite_mask`` (a
+ supported warn-and-drop fit), the input code arrays can be sparse —
+ e.g. ``unit_codes = [0, 1, 3, 4]`` with code 2 dropped. Building
+ ``X_10`` on the raw codes would materialize an all-zero FE column at
+ index 2, forcing ``sparse_factorized`` onto the dense
+ ``lstsq``/``XtX_10.toarray()`` fallback unnecessarily (large-memory
+ path on big panels). To avoid this, re-factorize via
+ :func:`pd.factorize` on entry to compact the code space to
+ ``0..n_unique-1`` (no-op when codes are already contiguous; mirrors
+ the column-space convention of ``TwoStageDiD._build_fe_design``).
+ """
+ # Compact the code space before column construction — see Notes.
+ unit_codes = pd.factorize(unit_codes)[0]
+ time_codes = pd.factorize(time_codes)[0]
+
+ n = unit_codes.shape[0]
+ n_units = int(unit_codes.max()) + 1 if n > 0 else 0
+ n_times = int(time_codes.max()) + 1 if n > 0 else 0
+ n_fe_cols = max(n_units - 1, 0) + max(n_times - 1, 0)
+
+ def _build(mask: Optional[np.ndarray]) -> sparse.csr_matrix:
+ # Unit dummies (drop unit_code == 0 for identification).
+ u_keep = unit_codes > 0
+ if mask is not None:
+ u_keep = u_keep & mask
+ u_rows = np.flatnonzero(u_keep)
+ u_cols = unit_codes[u_keep] - 1
+
+ # Time dummies (drop time_code == 0 for identification).
+ t_keep = time_codes > 0
+ if mask is not None:
+ t_keep = t_keep & mask
+ t_rows = np.flatnonzero(t_keep)
+ t_cols = (max(n_units - 1, 0)) + (time_codes[t_keep] - 1)
+
+ rows = np.concatenate([u_rows, t_rows])
+ cols = np.concatenate([u_cols, t_cols])
+ data = np.ones(len(rows), dtype=np.float64)
+ return sparse.csr_matrix((data, (rows, cols)), shape=(n, n_fe_cols))
+
+ X_1 = _build(mask=None)
+ X_10 = _build(mask=omega_0_mask)
+ return X_1, X_10
+
+
# =============================================================================
# Public estimator (skeleton — fit() implemented in Step 3)
# =============================================================================
@@ -2026,11 +2111,19 @@ def fit(
Notes
-----
- Wave B MVP: stage-2 variance is the standard solve_ols estimator
- (HC1 / Conley / cluster). The Gardner GMM sandwich first-stage
- uncertainty correction is NOT applied (planned follow-up; see
- TODO and plan Risks #2). Variance is therefore approximate (likely
- underestimated by a few percent in typical settings).
+ Stage-2 variance applies the Wave D Gardner (2022) GMM first-stage
+ uncertainty correction across all supported ``vcov_type`` paths
+ (``"hc1"``, ``"conley"``, ``"cluster"`` via ``cluster=``). The
+ unified IF outer-product formula is
+ ``psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}``
+ with ``meat = Psi' K Psi`` where ``K`` is path-dependent (identity
+ for HC1, block-indicator for cluster, spatial kernel for Conley).
+ Documented synthesis of Butts (2021) §3.1 + Gardner (2022) §4 +
+ Conley (1999); no reference software combines all three.
+ ``vcov_type="classical"`` raises ``NotImplementedError`` because
+ the Wave D synthesis has not been derived for the homoskedastic
+ meat structure ``sigma_hat^2 * (X_10' X_10)``; use ``"hc1"`` for
+ heteroskedasticity-robust SE with the GMM correction.
"""
if survey_design is not None:
raise NotImplementedError(
@@ -2122,9 +2215,31 @@ def fit(
"degrees of freedom for correct p-values and CIs. Routing "
"stage 2 through LinearRegression (which supplies the "
"per-coefficient DOF metadata) is queued as a follow-up "
- "extension. Use vcov_type='hc1', 'classical', 'conley', or "
+ "extension. Use vcov_type='hc1' or 'conley', or "
"leave default; combine with cluster= for CR1."
)
+ if self.vcov_type == "classical":
+ # Wave D scope (user-confirmed 2026-05-17): the Gardner GMM
+ # first-stage uncertainty correction is implemented for HC1,
+ # Conley, and CR1 only. The classical (homoskedastic) variance
+ # has not been derived for the IF outer-product form in this
+ # PR — under classical assumptions the meat structure changes
+ # (`sigma_hat^2 * (X_10' X_10)` rather than `Psi' Psi`) and
+ # the Wave D synthesis (Butts §3.1 + Gardner §4 + Conley 1999)
+ # does not carry through directly. Reject upfront with a clear
+ # remediation message rather than silently HC1-ifying the
+ # request (per `feedback_no_silent_failures`).
+ raise NotImplementedError(
+ "SpilloverDiD does not support vcov_type='classical' under "
+ "the Wave D Gardner GMM first-stage uncertainty correction. "
+ "Wave D applies the GMM correction unconditionally and the "
+ "classical homoskedastic variance does not have a derived "
+ "IF outer-product form in the Wave D synthesis (Butts §3.1 "
+ "+ Gardner §4 + Conley 1999). Use vcov_type='hc1' for "
+ "heteroskedasticity-robust SE with the GMM correction, or "
+ "combine with cluster= for CR1 with the GMM correction. "
+ "Future PR may extend Wave D to the classical path."
+ )
# Step 0: defensive copy so the caller's DataFrame is never mutated.
data = data.copy(deep=False)
@@ -2590,29 +2705,151 @@ def fit(
if self.event_study and event_study_meta is not None:
event_study_meta["n_obs_per_col"] = (X_2_fit != 0).sum(axis=0).astype(np.int64)
- # Step 15: stage-2 OLS with configured vcov via solve_ols.
+ # Step 15: stage-2 OLS — coef + residuals only. Wave D computes the
+ # vcov below via the Gardner GMM first-stage uncertainty correction
+ # (documented synthesis of Butts §3.1 + Gardner §4 + Conley 1999).
+ # `solve_ols` returns vcov=None when return_vcov=False.
solve_kwargs: Dict[str, Any] = {
- "return_vcov": True,
+ "return_vcov": False,
"rank_deficient_action": self.rank_deficient_action,
"column_names": col_names_all,
- "vcov_type": self.vcov_type,
- "cluster_ids": cluster_ids_fit,
}
+ coef, residuals, _ = solve_ols(X_2_fit, y_tilde_fit, **solve_kwargs) # type: ignore[misc]
+
+ # Wave D: Gardner GMM first-stage uncertainty correction.
+ #
+ # Reconstruct the stage-1 residual `eps_10` on the FULL sample:
+ # - On Omega_0 rows: eps_10 = y - mu_hat[i] - lambda_hat[t]
+ # - On ~Omega_0 rows: eps_10 = y (since X_10[i, :] = 0 collapses
+ # the IF product to just the stage-2 term; matches the Gardner
+ # formula at `two_stage.py:1633-1637`).
+ # unit_fe_arr / time_fe_arr may have NaN at warn-and-drop units;
+ # the downstream `finite_mask` subset drops those rows BEFORE the
+ # GMM helper builds Psi (NaN in eps_10 is intentionally tolerated
+ # at this stage — it is masked out before any matrix operation).
+ alpha_full = unit_fe_arr[np.asarray(unit_codes_full)]
+ beta_full = time_fe_arr[np.asarray(time_codes_full)]
+ eps_10_full = np.where(omega_0_mask, y_full - alpha_full - beta_full, y_full)
+
+ # Subset stage-1 inputs to the fit sample (post-finite_mask).
+ if n_nan > 0:
+ eps_10_fit = eps_10_full[finite_mask]
+ unit_codes_fit = np.asarray(unit_codes_full)[finite_mask]
+ time_codes_fit = np.asarray(time_codes_full)[finite_mask]
+ omega_0_mask_fit = omega_0_mask[finite_mask]
+ else:
+ eps_10_fit = eps_10_full
+ unit_codes_fit = np.asarray(unit_codes_full)
+ time_codes_fit = np.asarray(time_codes_full)
+ omega_0_mask_fit = omega_0_mask
+
+ # Handle rank-deficient column drops from solve_ols (NaN coefs).
+ # Subset to kept columns before building Psi; re-inflate vcov with
+ # NaN at dropped positions at the end so downstream indexing
+ # (vcov[0, 0] for tau_se, etc.) behaves like the pre-Wave-D path.
+ kept_col_mask = np.isfinite(coef)
+ n_kept = int(kept_col_mask.sum())
+ if n_kept < len(coef):
+ X_2_kept = X_2_fit[:, kept_col_mask]
+ coef_kept = coef[kept_col_mask]
+ else:
+ X_2_kept = X_2_fit
+ coef_kept = coef
+ eps_2_fit = y_tilde_fit - X_2_kept @ coef_kept
+
+ # Build stage-1 FE designs on the fit sample. Column space:
+ # [unit_1, ..., unit_{U-1}, time_1, ..., time_{T-1}] (drop-first
+ # identification, matches `TwoStageDiD._build_fe_design`).
+ X_1_sparse_fit, X_10_sparse_fit = _build_butts_fe_design_csr(
+ unit_codes_fit,
+ time_codes_fit,
+ omega_0_mask_fit,
+ )
+
+ # Conley spatial kwargs only when vcov_type == "conley".
if self.vcov_type == "conley":
coord_array_full = np.asarray(data[list(self.conley_coords)].values, dtype=np.float64)
coord_array_fit = coord_array_full[finite_mask] if n_nan > 0 else coord_array_full
- solve_kwargs.update(
- {
- "conley_coords": coord_array_fit,
- "conley_cutoff_km": self.conley_cutoff_km,
- "conley_metric": self.conley_metric,
- "conley_time": time_vals_fit,
- "conley_unit": unit_vals_fit,
- "conley_lag_cutoff": self.conley_lag_cutoff,
- }
- )
+ _conley_coords_arg = coord_array_fit
+ _conley_cutoff_arg = self.conley_cutoff_km
+ _conley_metric_arg = self.conley_metric
+ _conley_time_arg = time_vals_fit
+ _conley_unit_arg = unit_vals_fit
+ _conley_lag_arg = self.conley_lag_cutoff
+ else:
+ _conley_coords_arg = None
+ _conley_cutoff_arg = None
+ _conley_metric_arg = None
+ _conley_time_arg = None
+ _conley_unit_arg = None
+ _conley_lag_arg = None
+
+ # Derive the Wave D variance mode from the PUBLIC contract:
+ # - vcov_type="conley" → "conley" (Conley spatial-HAC + GMM)
+ # - cluster= supplied → "cluster" (CR1 + GMM)
+ # - vcov_type="hc1" (default) → "hc1"
+ # `self.vcov_type` can be "hc1" / "classical" / "conley"; the public
+ # `cluster=` kwarg ORTHOGONALLY selects CR1. Pre-Wave-D the
+ # routing happened inside solve_ols; Wave D bypasses that path, so
+ # the dispatch must be reconstructed here. (Round 1 codex P0 fix:
+ # without this derivation, a user-supplied `cluster=` was
+ # silently ignored on the default hc1 path, yielding HC1 SEs when
+ # CR1 was requested.)
+ if self.vcov_type == "conley":
+ _wave_d_vcov_mode: "Literal['hc1', 'conley', 'cluster']" = "conley"
+ elif cluster_ids_fit is not None:
+ _wave_d_vcov_mode = "cluster"
+ else:
+ _wave_d_vcov_mode = "hc1"
+
+ # Compute the GMM-corrected meat (Psi' K Psi). Caller-side bread
+ # sandwich below mirrors `TwoStageDiD._compute_gmm_variance`
+ # at `two_stage.py:1763-1791`.
+ meat_kept = _compute_gmm_corrected_meat(
+ X_1_sparse=X_1_sparse_fit,
+ X_10_sparse=X_10_sparse_fit,
+ eps_10=eps_10_fit,
+ X_2=X_2_kept,
+ eps_2=eps_2_fit,
+ vcov_type=_wave_d_vcov_mode,
+ cluster_ids=cluster_ids_fit,
+ conley_coords=_conley_coords_arg,
+ conley_cutoff_km=_conley_cutoff_arg,
+ conley_metric=_conley_metric_arg,
+ conley_kernel="bartlett",
+ conley_time=_conley_time_arg,
+ conley_unit=_conley_unit_arg,
+ conley_lag_cutoff=_conley_lag_arg,
+ )
- coef, residuals, vcov = solve_ols(X_2_fit, y_tilde_fit, **solve_kwargs) # type: ignore[misc]
+ # Bread sandwich: A_22^{-1} = (X_2' X_2)^{-1} via `np.linalg.solve`
+ # with dense lstsq fallback + UserWarning (mirrors the bread-fallback
+ # pattern at `two_stage.py:1763-1788`).
+ A_22_kept = X_2_kept.T @ X_2_kept
+ eye_kept = np.eye(A_22_kept.shape[0])
+ try:
+ bread_kept = np.linalg.solve(A_22_kept, eye_kept)
+ except np.linalg.LinAlgError:
+ warnings.warn(
+ "SpilloverDiD Wave D bread: A_22 = X_2' X_2 is singular; "
+ "falling back to dense lstsq. SE may be unreliable.",
+ UserWarning,
+ stacklevel=2,
+ )
+ bread_kept = np.linalg.lstsq(A_22_kept, eye_kept, rcond=None)[0]
+ vcov_kept = bread_kept @ meat_kept @ bread_kept
+
+ # Re-inflate to (k, k) with NaN at rank-deficient column positions
+ # so downstream code (which indexes vcov[i, i] for per-coef SE) sees
+ # NaN for dropped columns — matches the pre-Wave-D solve_ols
+ # behavior at `linalg.py` (rank-deficient drops produce NaN coefs +
+ # NaN vcov entries).
+ if n_kept < len(coef):
+ vcov = np.full((len(coef), len(coef)), np.nan)
+ kept_idx = np.flatnonzero(kept_col_mask)
+ vcov[np.ix_(kept_idx, kept_idx)] = vcov_kept
+ else:
+ vcov = vcov_kept
# Step 16a: shared df_resid computation.
n_obs_eff = int(finite_mask.sum())
diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py
index 9fe3991f..6f29b397 100644
--- a/diff_diff/two_stage.py
+++ b/diff_diff/two_stage.py
@@ -23,18 +23,18 @@
import warnings
from dataclasses import replace
-from typing import Any, Dict, List, Optional, Tuple
+from typing import Any, Dict, List, Literal, Optional, Tuple
import numpy as np
import pandas as pd
from scipy import sparse
from scipy.sparse.linalg import factorized as sparse_factorized
-# Maximum number of elements before falling back to per-column sparse aggregation.
-# 10M float64 elements ≈ 80 MB peak allocation. Above this, per-column .getcol()
-# trades throughput for bounded memory. Keep in sync with two_stage_bootstrap.py.
-_SPARSE_DENSE_THRESHOLD = 10_000_000
-
+from diff_diff.conley import (
+ ConleyMetric,
+ _compute_conley_meat,
+ _validate_conley_kwargs,
+)
from diff_diff.linalg import solve_ols
from diff_diff.two_stage_bootstrap import TwoStageDiDBootstrapMixin
from diff_diff.two_stage_results import (
@@ -43,6 +43,253 @@
) # noqa: F401 (re-export)
from diff_diff.utils import safe_inference, warn_if_not_converged
+# Maximum number of elements before falling back to per-column sparse aggregation.
+# 10M float64 elements ≈ 80 MB peak allocation. Above this, per-column .getcol()
+# trades throughput for bounded memory. Keep in sync with two_stage_bootstrap.py.
+_SPARSE_DENSE_THRESHOLD = 10_000_000
+
+# =============================================================================
+# Wave D — Gardner GMM-corrected meat for SpilloverDiD
+# =============================================================================
+
+
+def _compute_gmm_corrected_meat(
+ *,
+ X_1_sparse: sparse.csr_matrix,
+ X_10_sparse: sparse.csr_matrix,
+ eps_10: np.ndarray,
+ X_2: np.ndarray,
+ eps_2: np.ndarray,
+ vcov_type: Literal["hc1", "conley", "cluster"],
+ cluster_ids: Optional[np.ndarray] = None,
+ conley_coords: Optional[np.ndarray] = None,
+ conley_cutoff_km: Optional[float] = None,
+ conley_metric: Optional[ConleyMetric] = None,
+ conley_kernel: str = "bartlett",
+ conley_time: Optional[np.ndarray] = None,
+ conley_unit: Optional[np.ndarray] = None,
+ conley_lag_cutoff: Optional[int] = None,
+) -> np.ndarray:
+ """Gardner (2022) GMM first-stage uncertainty correction — unified IF meat.
+
+ Returns the (p_2, p_2) meat matrix ``Psi' K Psi`` where:
+
+ psi_i = gamma_hat' @ x_{10,i} * eps_{10,i} - x_{2,i} * eps_{2,i}
+ Psi = [psi_1; ...; psi_n] shape (n, p_2)
+ K = path-dependent kernel matrix
+ meat = Psi' @ K @ Psi shape (p_2, p_2)
+
+ The caller wraps with the bread ``A_22^{-1} = (X_2' W X_2)^{-1}``:
+ ``V = A_22^{-1} @ meat @ A_22^{-1}``.
+
+ **Methodology synthesis** (Wave D): no reference software combines all
+ three ingredients. Butts (2021) §3.1 gives the IF construction for
+ spillover-aware DiD; Gardner (2022) §4 gives the two-stage GMM sandwich;
+ Conley (1999) gives the spatial kernel.
+
+ **Kernel dispatch:**
+
+ - ``vcov_type="hc1"``: ``K = I_n``; ``meat = Psi' @ Psi`` with
+ ``n / (n - p_2)`` finite-sample multiplier.
+ - ``vcov_type="cluster"``: ``K_ij = 1{cluster_i = cluster_j}``;
+ ``meat = S_cluster' @ S_cluster`` where ``S_cluster[g] = sum_{i in g} psi_i``,
+ with ``G/(G-1) * (n-1)/(n - p_2)`` finite-sample multiplier.
+ - ``vcov_type="conley"``: ``K_ij = K_space(d_ij/h) * 1{cluster_i = cluster_j}``
+ (cross-sectional) or panel-block decomposed (``conley_time`` /
+ ``conley_unit`` / ``conley_lag_cutoff`` set). No finite-sample
+ multiplier — preserves the ``conleyreg`` / Wave B convention.
+
+ **`gamma_hat` solve** (mirror of `TwoStageDiD._compute_gmm_variance`
+ pattern at `two_stage.py:1648-1670`): factorize ``X_10' X_10`` via
+ ``sparse_factorized`` (fast path); fall back to dense ``lstsq`` with
+ UserWarning when factorization fails. ``gamma_hat`` has shape
+ ``(p_1, p_2)``.
+
+ **Survey weights:** NOT supported in Wave D (parameter omitted).
+ Wave E will extend with stratified-survey × GMM × Conley methodology.
+
+ Parameters
+ ----------
+ X_1_sparse : sparse.csr_matrix, shape (n, p_1)
+ Full-sample FE design (drop-first-unit + drop-first-time
+ identification).
+ X_10_sparse : sparse.csr_matrix, shape (n, p_1)
+ FE design with treated AND exposed rows zeroed. Same column space
+ as X_1_sparse.
+ eps_10 : np.ndarray, shape (n,)
+ Stage-1 residual on Omega_0 rows; equal to y on ~Omega_0 rows
+ (the X_{10,i} = 0 product collapses the IF contribution to just
+ the stage-2 term).
+ X_2 : np.ndarray, shape (n, p_2)
+ Stage-2 design (treatment + ring columns for SpilloverDiD).
+ eps_2 : np.ndarray, shape (n,)
+ Stage-2 residual ``y_tilde - X_2 @ coef``.
+ vcov_type : {"hc1", "conley", "cluster"}
+ Kernel dispatch.
+ cluster_ids : np.ndarray of shape (n,), optional
+ Cluster identifiers. Required for ``vcov_type="cluster"``;
+ used as the product-kernel cluster mask under ``vcov_type="conley"``
+ when supplied. HC1 path passes ``None``.
+ conley_coords, conley_cutoff_km, conley_metric, conley_kernel,
+ conley_time, conley_unit, conley_lag_cutoff
+ Conley spatial-HAC kwargs. Required when ``vcov_type="conley"``.
+ See :func:`diff_diff.conley._compute_conley_meat` for semantics.
+
+ Returns
+ -------
+ meat : np.ndarray of shape (p_2, p_2)
+ The IF outer-product meat, including any finite-sample multiplier.
+ Caller wraps with the bread for the full vcov.
+ """
+ n, p_2 = X_2.shape
+
+ # Validate Conley kwargs explicitly here. SpilloverDiD's Wave D path
+ # bypasses solve_ols's vcov computation, so _validate_vcov_args /
+ # _validate_conley_kwargs would not otherwise fire on this call.
+ if vcov_type == "conley":
+ _validate_conley_kwargs(
+ conley_coords,
+ conley_cutoff_km,
+ conley_metric, # type: ignore[arg-type] # validator raises ValueError if None
+ conley_kernel,
+ n,
+ time=conley_time,
+ unit=conley_unit,
+ lag_cutoff=conley_lag_cutoff,
+ cluster_ids=cluster_ids,
+ )
+
+ # 1. gamma_hat = (X_10' X_10)^{-1} (X_1' X_2). Mirror the existing
+ # TwoStageDiD method at two_stage.py:1648-1670 — sparse_factorized
+ # fast path with dense lstsq fallback + UserWarning on singular.
+ XtX_10 = X_10_sparse.T @ X_10_sparse # (p_1, p_1) sparse
+ Xt1_X2 = X_1_sparse.T @ X_2 # (p_1, p_2) dense
+
+ try:
+ solve_XtX = sparse_factorized(XtX_10.tocsc())
+ if Xt1_X2.ndim == 1:
+ gamma_hat = solve_XtX(Xt1_X2).reshape(-1, 1)
+ else:
+ gamma_hat = np.column_stack([solve_XtX(Xt1_X2[:, j]) for j in range(Xt1_X2.shape[1])])
+ except RuntimeError as exc:
+ warnings.warn(
+ "SpilloverDiD Wave D GMM sandwich: sparse factorization of "
+ f"(X_10' X_10) failed ({type(exc).__name__}); falling back to "
+ "dense lstsq. This may indicate a rank-deficient or "
+ "near-singular Stage 1 design and SE estimates may be less "
+ "reliable.",
+ UserWarning,
+ stacklevel=2,
+ )
+ gamma_hat = np.linalg.lstsq(XtX_10.toarray(), Xt1_X2, rcond=None)[0]
+ if gamma_hat.ndim == 1:
+ gamma_hat = gamma_hat.reshape(-1, 1)
+
+ # 2. Psi = (X_10 @ gamma_hat) * eps_10[:, None] - X_2 * eps_2[:, None].
+ # sparse @ dense = dense; element-wise scale by eps_10; subtract
+ # the stage-2 contribution. Shape (n, p_2).
+ Psi_stage1 = X_10_sparse @ gamma_hat # (n, p_2) dense
+ Psi = Psi_stage1 * eps_10[:, None] - X_2 * eps_2[:, None]
+
+ if not np.all(np.isfinite(Psi)):
+ # Defensive: NaN in Psi would propagate silently through Psi.T @ Psi.
+ # Surface as a warning + return NaN meat so the downstream
+ # safe_inference path NaN-propagates per `feedback_no_silent_failures`.
+ warnings.warn(
+ "SpilloverDiD Wave D GMM sandwich: Psi matrix contains "
+ "non-finite values. Returning NaN meat; downstream inference "
+ "will NaN-propagate. This usually indicates rank-deficient "
+ "stage-1 FE design or non-finite residuals upstream.",
+ UserWarning,
+ stacklevel=2,
+ )
+ return np.full((p_2, p_2), np.nan)
+
+ # 3. Kernel dispatch.
+ if vcov_type == "hc1":
+ # K = I_n: meat = Psi' Psi with HC1 finite-sample multiplier.
+ # Fail closed when n - p_2 <= 0 (saturated design — every degree
+ # of freedom consumed by the stage-2 design): the multiplier
+ # n / (n - p_2) is undefined, so NaN-propagate per
+ # `feedback_no_silent_failures` rather than clamping the
+ # denominator and emitting finite SE on an underdetermined fit.
+ if n - p_2 <= 0:
+ warnings.warn(
+ "SpilloverDiD Wave D HC1 sandwich: saturated stage-2 design "
+ f"(n_obs={n}, effective_rank={p_2}, n-p_2={n - p_2} <= 0). "
+ "The HC1 finite-sample multiplier n/(n-p) is undefined. "
+ "Returning NaN meat so downstream inference NaN-propagates.",
+ UserWarning,
+ stacklevel=2,
+ )
+ return np.full((p_2, p_2), np.nan)
+ meat_unscaled = Psi.T @ Psi
+ multiplier = n / (n - p_2)
+ meat = multiplier * meat_unscaled
+ elif vcov_type == "cluster":
+ if cluster_ids is None:
+ raise ValueError(
+ "_compute_gmm_corrected_meat: vcov_type='cluster' requires "
+ "cluster_ids; got None."
+ )
+ # K_ij = 1{cluster_i = cluster_j}: aggregate Psi per-cluster then
+ # outer-product. S_cluster[g] = sum_{i in g} psi_i.
+ unique_clusters, cluster_indices = np.unique(cluster_ids, return_inverse=True)
+ G = len(unique_clusters)
+ # Mirror linalg.py:1942 — reject G<2 so the CR1 finite-sample
+ # multiplier G/(G-1) doesn't fabricate finite output on a degenerate
+ # one-cluster sample.
+ if G < 2:
+ raise ValueError(f"Need at least 2 clusters for cluster-robust SEs, got {G}")
+ # Fail closed on saturated design (n - p_2 <= 0). The CR1
+ # multiplier (n-1)/(n-p) is undefined; emitting finite SE here
+ # would be silently wrong.
+ if n - p_2 <= 0:
+ warnings.warn(
+ "SpilloverDiD Wave D CR1 sandwich: saturated stage-2 design "
+ f"(n_obs={n}, effective_rank={p_2}, n-p_2={n - p_2} <= 0). "
+ "The CR1 finite-sample multiplier (n-1)/(n-p) is undefined. "
+ "Returning NaN meat so downstream inference NaN-propagates.",
+ UserWarning,
+ stacklevel=2,
+ )
+ return np.full((p_2, p_2), np.nan)
+ S_cluster = np.zeros((G, p_2))
+ for j in range(p_2):
+ np.add.at(S_cluster[:, j], cluster_indices, Psi[:, j])
+ meat_unscaled = S_cluster.T @ S_cluster
+ # CR1 finite-sample multiplier: G/(G-1) * (n-1)/(n-p_2). Standard
+ # cluster-robust convention (Stata, R `sandwich::vcovCL(type='CR1')`).
+ multiplier = (G / (G - 1)) * ((n - 1) / (n - p_2))
+ meat = multiplier * meat_unscaled
+ elif vcov_type == "conley":
+ if conley_coords is None or conley_cutoff_km is None or conley_metric is None:
+ raise ValueError(
+ "_compute_gmm_corrected_meat: vcov_type='conley' requires "
+ "conley_coords, conley_cutoff_km, and conley_metric."
+ )
+ # Delegate to the shared kernel-application helper. No finite-sample
+ # multiplier on the Conley path (matches conleyreg / Wave B convention).
+ meat = _compute_conley_meat(
+ Psi,
+ conley_coords,
+ conley_cutoff_km,
+ conley_metric,
+ conley_kernel,
+ time=conley_time,
+ unit=conley_unit,
+ lag_cutoff=conley_lag_cutoff,
+ cluster_ids=cluster_ids,
+ )
+ else:
+ raise ValueError(
+ f"_compute_gmm_corrected_meat: vcov_type must be one of "
+ f"'hc1', 'conley', 'cluster'; got {vcov_type!r}."
+ )
+
+ return meat
+
+
# =============================================================================
# Main Estimator
# =============================================================================
diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst
index dfb8e163..89f2a7da 100644
--- a/docs/api/spillover.rst
+++ b/docs/api/spillover.rst
@@ -160,30 +160,33 @@ Estimator Comparison
- ``D=0`` (untreated)
- N/A (single stage)
* - Conley spatial-HAC SE
- - Yes (via solve_ols at stage 2)
+ - Yes (Wave D GMM-corrected sandwich)
- Not yet supported
- Yes
* - Cluster-robust SE
- - Yes (HC1 + CR1 via solve_ols)
+ - Yes (HC1 + CR1, Wave D GMM-corrected sandwich)
- Yes (GMM sandwich + clusters)
- Yes
-Wave B MVP limitations
-----------------------
-
-The current implementation has the following documented limitations,
-planned as follow-up enhancements:
-
-- **Gardner GMM first-stage correction at stage 2** — stage-2 variance
- is the standard ``solve_ols`` HC1 / Conley / cluster estimator without
- the influence-function adjustment for stage-1 FE estimation
- uncertainty. The full GMM sandwich (Butts & Gardner 2022) is planned
- as a follow-up; until then, reported SEs are biased downward
- (underestimated by a few percent in typical settings) because they
- omit the additional variance contribution from estimating the stage-1
- fixed effects. Confidence intervals are correspondingly too narrow
- and p-values too small. Users should treat reported significance
- conservatively until the GMM correction lands.
+Restrictions and follow-ups
+---------------------------
+
+The current implementation has the following documented restrictions
+and planned follow-up enhancements:
+
+- **Gardner GMM first-stage correction at stage 2** — SHIPPED in Wave D.
+ Stage-2 variance now applies the influence-function-based correction
+ for stage-1 FE estimation uncertainty across all three ``vcov_type``
+ paths (HC1, Conley, cluster) on both ``event_study=False`` AND
+ ``event_study=True``. The IF formula is
+ ``psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}``
+ with ``gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)``; the meat is
+ ``Psi' K Psi`` where ``K`` is the path-dependent kernel matrix
+ (identity for HC1, block-indicator for cluster, spatial kernel for
+ Conley). Documented synthesis of Butts (2021) Section 3.1 + Gardner
+ (2022) Section 4 + Conley (1999); no reference software combines all
+ three ingredients. Point estimates unchanged; SE values shift upward
+ by 1-few percent depending on first-stage residual variance.
- **Event-study mode** — ``event_study=True`` is SHIPPED in Wave C.
The per-event-time × ring decomposition (Butts Section 5 / Table 2)
emits per-event-time direct effects ``tau_k`` and per-(ring,
@@ -204,8 +207,9 @@ planned as follow-up enhancements:
``event_study=False`` instead). Scalar ``att`` becomes a
sample-share-weighted average of post-treatment ``tau_k`` with SE
from linear-combination inference on the post-treatment vcov block.
- Per-event-time SEs share the same Wave B Gardner-GMM caveat
- (biased downward by a few percent; Wave D follow-up will close).
+ Per-event-time SEs apply the Wave D Gardner GMM first-stage
+ uncertainty correction (see the "Gardner GMM first-stage correction"
+ entry above).
- **Survey-design integration** — ``survey_design=`` raises
``NotImplementedError``.
- **Count-of-treated-in-ring** — only the "nearest-treated ring"
@@ -220,6 +224,15 @@ planned as follow-up enhancements:
per-coefficient BM / CR2 DOF and raise ``NotImplementedError``.
Routing stage 2 through ``LinearRegression`` (which supplies the
per-coefficient DOF metadata) is queued.
+- **`vcov_type="classical"` (Wave D restriction)** — raises
+ ``NotImplementedError``. The Wave D Gardner GMM first-stage
+ uncertainty correction has not been derived for the classical
+ homoskedastic variance (different meat structure
+ ``sigma_hat^2 * (X_10' X_10)`` vs the Wave D IF outer product
+ ``Psi' Psi``). Use ``vcov_type="hc1"`` for heteroskedasticity-robust
+ SE with the GMM correction, or combine with ``cluster=`` for
+ CR1 with the GMM correction; both apply the Wave D synthesis
+ (Butts §3.1 + Gardner §4 + Conley 1999) unconditionally.
- **Balanced panel required** — every unit must observe every period.
An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected
FE components and unidentified stage-1 residuals on treated rows.
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index 478716b8..b5633576 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -3034,7 +3034,7 @@ The standard error comes from **linear-combination inference** on the post-treat
**Reduce-to-aggregate equivalence:** Under a **constant-tau DGP** with `horizon_max=None`, the sample-share-weighted scalar `att` reproduces Wave B's aggregate `tau_total` (bit-identical at machine precision in the deterministic limit; small MC noise on realized panels). This is the canonical equivalence path. Note: `horizon_max=0` is **not supported** under `event_study=True` (rejected at validation with a clear remediation message): the single bin `k=0` leaves no event-time pair to anchor the reference period `ref_period = -1 - anticipation` against. Users wanting a single aggregate direct effect should use `event_study=False` (the Wave B static spec); event-study mode requires `horizon_max>=1` or `horizon_max=None`.
-**Variance:** Same caveat as Wave B — per-event-time SEs use the standard `solve_ols` estimator (HC1 / Conley / cluster paths) WITHOUT the Gardner GMM first-stage uncertainty correction. Per-`tau_k` and per-`delta_jk` SEs are biased downward by the same few percent. The Wave D follow-up will close this.
+**Variance:** Per-event-time SEs apply the Wave D Gardner GMM first-stage uncertainty correction (see "Variance (Wave D)" subsection below). Per-`tau_k` and per-`delta_jk` SEs are shifted upward by a few percent relative to Wave C uncorrected SEs.
**Assumptions (Butts 2021):**
@@ -3045,9 +3045,31 @@ The standard error comes from **linear-combination inference** on the post-treat
- **Assumption 7 (Spillover Effect Parallel Trends)** — counterfactual trends do not depend on `(D_i, S_i)` for `S_i ∈ {0, 1}`. Required to identify `gamma_0` / `delta_j`.
- **Assumption 8 (Parallel Counterfactual Trends, Staggered)** — additive unit + time FE structure on untreated/unexposed potential outcomes. Stronger than Assumption 3.
-**Variance (Wave B MVP — documented limitation):**
+**Variance (Wave D — Gardner GMM first-stage correction across HC1 / Conley / cluster):**
-The stage-2 variance is the standard `solve_ols` estimator (HC1 / Conley / cluster paths, all dispatched via `vcov_type`). The **Gardner GMM sandwich first-stage uncertainty correction is NOT applied** at stage 2 in this PR. The full GMM + Conley composition is queued as a follow-up enhancement that extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step (see Wave B plan Risks #2 for the IF formula). The reported SE is therefore **biased downward** (underestimated by a few percent in typical settings) because it omits the additional variance contribution from estimating the stage-1 FE; confidence intervals are correspondingly too narrow and p-values too small. Treat reported significance conservatively until the GMM correction lands.
+Stage-2 variance applies the Gardner (2022) GMM sandwich influence-function correction for stage-1 FE estimation uncertainty across all three `vcov_type` paths. The unified IF outer-product formula:
+
+```
+psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i} # shape (p_2,)
+Psi = [psi_1; ...; psi_n] # (n, p_2)
+gamma_hat = (X_10' X_10)^{-1} (X_1' X_2) # (p_1, p_2)
+meat = Psi' @ K @ Psi # (p_2, p_2)
+vcov = (X_2' X_2)^{-1} @ meat @ (X_2' X_2)^{-1}
+```
+
+where the kernel `K` is path-dependent:
+
+- **HC1**: `K = I_n` → `meat = Psi' Psi` with `n / (n - p_2)` finite-sample multiplier.
+- **Cluster CR1**: `K_ij = 1{cluster_i = cluster_j}` → per-cluster sum + outer product, with `G / (G-1) * (n-1) / (n - p_2)` finite-sample multiplier.
+- **Conley**: `K_ij = kernel(d_ij / cutoff) * 1{cluster_i = cluster_j}` (cross-sectional) or panel-block decomposed (`conley_time` / `conley_unit` / `conley_lag_cutoff` set). No finite-sample multiplier — matches `conleyreg` convention.
+
+The correction applies unconditionally (no opt-out kwarg). Point estimates (`tau_total`, `delta_j`, event-study `tau_k` / `delta_jk`) are byte-identical to the pre-Wave-D path; SE values shift upward by 1-few percent.
+
+- **Note (documented synthesis):** no R / Stata software combines all three ingredients (Butts (2021) §3.1 IF construction for spillover-aware DiD + Gardner (2022) §4 two-stage GMM sandwich + Conley (1999) spatial kernel). `did2s` (Gardner) implements GMM with HC1/cluster but no Conley. `conleyreg` / `acreg` implement Conley but no two-stage correction. Wave D is the documented synthesis.
+- **Note (no finite-sample multiplier on Conley path):** preserves the `conleyreg` / Wave B convention. HC1 and cluster paths apply the standard `n/(n-p)` and `G/(G-1) * (n-1)/(n-p)` multipliers respectively.
+- **Note (Conley meat may be non-PSD):** the radial 1-D Bartlett and uniform kernels are practitioner specializations of Conley 1999 and are not formally PSD-guaranteed; a `UserWarning` fires when the smallest meat eigenvalue is < -1e-12. Applies on both standard-sandwich and GMM-corrected sandwich paths.
+
+**Implementation:** new module-level helper `_compute_gmm_corrected_meat` at `diff_diff/two_stage.py` (NOT a modification of the existing `_compute_gmm_variance` method — TwoStageDiD's path is unchanged); new helper `_build_butts_fe_design_csr` at `diff_diff/spillover.py`; `_compute_conley_meat` factored out of `_compute_conley_vcov` at `diff_diff/conley.py` so the same kernel-application code path handles both standard sandwich (`X * residuals`) and Wave D IF outer product (`Psi`).
**Edge cases (from paper Section 3.2 / Discussion):**
@@ -3067,12 +3089,12 @@ The stage-2 variance is the standard `solve_ols` estimator (HC1 / Conley / clust
- `survey_design=` raises `NotImplementedError` — planned follow-up.
- `covariates=` raises `NotImplementedError` — Gardner-style stage-1 residualization not yet wired through; planned follow-up.
- `ring_method="count"` not exposed — only the nearest-treated-ring specification.
-- `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`} raises `NotImplementedError` — current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF. Use `"hc1"`, `"classical"`, `"conley"`, or pair with `cluster=` for CR1.
+- `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`, `"classical"`} raises `NotImplementedError` — `hc2`/`hc2_bm` because current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF; `classical` because the Wave D Gardner GMM first-stage correction has not been derived for the classical homoskedastic variance (different meat structure `sigma_hat^2 * (X_10' X_10)` vs the Wave D IF outer product `Psi' Psi`). Use `"hc1"` or `"conley"`, or pair with `cluster=` for CR1 — all three apply the Wave D GMM correction.
- **`rings[0]` must equal 0** — the partition must cover treated locations (`d_it = 0` belongs to Ring 1). Rings starting at a nonzero inner edge would leave units in `0 <= d_it < rings[0]` as exposed-but-unmodeled, silently biasing the estimator. Validator rejects such inputs.
- **Balanced panel required (Wave B MVP)** — every unit must observe every period. An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected FE components and unidentified stage-1 residuals on treated rows. Exact graph-connectivity-based identification (which would relax this to a strictly weaker condition) is queued as a follow-up extension. Validator rejects unbalanced inputs.
- **One row per `(unit, time)` cell required** — duplicate cells silently re-weight stage-1 FE estimation AND stage-2 OLS. Validator rejects duplicate cells.
- Data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight) not exposed.
-- Gardner GMM first-stage correction at stage 2 not applied (HC1/Conley/cluster only; documented limitation).
+- Gardner GMM first-stage correction at stage 2 SHIPPED in Wave D — see "Variance (Wave D)" subsection above. Applies unconditionally across HC1 / Conley / cluster.
**Implementation:** `diff_diff/spillover.py`. Public class `SpilloverDiD`; result class `SpilloverDiDResults(DiDResults)` at `diff_diff/results.py`. Tests at `tests/test_spillover.py`; DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` (satisfy Butts Assumptions 1/3/5/7 by construction).
diff --git a/tests/test_spillover.py b/tests/test_spillover.py
index 53546743..267ca3eb 100644
--- a/tests/test_spillover.py
+++ b/tests/test_spillover.py
@@ -869,13 +869,14 @@ def test_conley_fit_runs(self):
assert result.conley_lag_cutoff == 0
assert np.isfinite(result.se)
- def test_conley_kwargs_threaded_to_solve_ols(self):
- """Round-8 CI review P3 (test coverage gap): the previous test was a
- smoke test that only asserted finite SE + ATT invariance — a silent
- fallback to HC1 would have passed. This test plumbing-verifies that
- `solve_ols` is actually called with `vcov_type="conley"` AND the
- Conley-specific kwargs (`conley_coords`, `conley_cutoff_km`,
- `conley_metric`, `conley_time`, `conley_unit`, `conley_lag_cutoff`).
+ def test_conley_kwargs_threaded_to_gmm_helper(self):
+ """PR #456 R8 plumbing test, updated for Wave D: verifies that Conley
+ kwargs flow to ``_compute_gmm_corrected_meat`` (the Wave D entry
+ point) rather than ``solve_ols``'s vcov path. Pre-Wave-D this test
+ patched ``solve_ols`` directly; Wave D bypasses solve_ols's vcov
+ computation in favor of the GMM-corrected sandwich, so the spy now
+ wraps the GMM helper. The test's purpose — proving no silent HC1
+ fallback — is preserved.
"""
from unittest.mock import patch
@@ -890,28 +891,25 @@ def test_conley_kwargs_threaded_to_solve_ols(self):
conley_metric="haversine",
conley_lag_cutoff=0,
)
- # Patch solve_ols at the import site in spillover.py so we can
- # observe the kwargs SpilloverDiD passes through at stage 2.
+
import diff_diff.spillover as spillover_mod
captured: dict = {}
- original_solve_ols = spillover_mod.solve_ols
+ original_helper = spillover_mod._compute_gmm_corrected_meat
- def spy_solve_ols(*args, **kwargs):
- # Capture the LAST call's kwargs (stage 2 is the last solve_ols
- # invocation in fit()).
+ def spy_helper(*args, **kwargs):
captured.clear()
captured.update(kwargs)
- return original_solve_ols(*args, **kwargs)
+ return original_helper(*args, **kwargs)
- with patch.object(spillover_mod, "solve_ols", side_effect=spy_solve_ols):
+ with patch.object(spillover_mod, "_compute_gmm_corrected_meat", side_effect=spy_helper):
result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
- # Conley kwargs reached solve_ols (no silent HC1 fallback).
+ # Conley kwargs reached the GMM helper (no silent HC1 fallback).
assert (
captured.get("vcov_type") == "conley"
- ), f"expected solve_ols vcov_type='conley', got {captured.get('vcov_type')!r}"
+ ), f"expected vcov_type='conley', got {captured.get('vcov_type')!r}"
assert captured.get("conley_cutoff_km") == 200.0
assert captured.get("conley_metric") == "haversine"
assert captured.get("conley_lag_cutoff") == 0
@@ -923,8 +921,8 @@ def spy_solve_ols(*args, **kwargs):
conley_unit = captured.get("conley_unit")
assert conley_time is not None and len(conley_time) == result.n_obs
assert conley_unit is not None and len(conley_unit) == result.n_obs
- # And the reported SE is finite (the actual Conley computation
- # completed end-to-end).
+ # And the reported SE is finite (the actual GMM-corrected Conley
+ # computation completed end-to-end).
assert np.isfinite(result.se)
def test_conley_att_invariant_vs_hc1(self):
@@ -3484,27 +3482,35 @@ def test_none_anticipation_raises_targeted_value_error(self):
class TestSpilloverDiDEventStudyBackwardCompat:
- """event_study=False reproduces the unchanged Wave B aggregate path.
-
- The golden values below were captured against the current (Wave C)
- `event_study=False` path on `generate_butts_nonstaggered_dgp(seed=42)`.
- Wave C does not modify the aggregate stage-2 design construction
- (``spillover.py`` lines around the ``else`` branch at the `event_study`
- dispatch), the stage-2 fit, or the aggregate extraction logic — those
- lines are byte-identical to Wave B in this PR. The PIN therefore anchors
- the unchanged aggregate path against accidental drift, but it is not a
- literal "pre-Wave-C" checkout artifact. Any future change to the
- aggregate path must update both these goldens and the CHANGELOG
- aggregate-path bit-identity claim simultaneously.
+ """event_study=False reproduces the aggregate path; SEs reflect Wave D.
+
+ The COEF golden values are byte-identical to the Wave B/C pin (Wave D
+ changes only the variance estimator; point estimates are unchanged).
+ The SE golden values are re-pinned for Wave D — the Gardner GMM
+ first-stage uncertainty correction inflates SEs upward by a few percent
+ relative to Wave B/C, closing the documented "biased downward" caveat.
+
+ Pre-Wave-D references (commented for the directional-inflation invariant):
+ ATT : -0.08620379515400438 (unchanged)
+ SE : 0.017812406263278957 → Wave D 0.01849079486245095 (+3.8%)
+ inner SE : 0.008298917907045593 → Wave D 0.009669525127172741 (+16.5%)
+ outer SE : 0.015538307675860204 → Wave D 0.016311550606451834 (+5.0%)
"""
- # PR #456 R3 golden capture (event_study=False on the seed-42 fixture).
- _WAVE_B_GOLDEN_ATT = -0.08620379515400438
- _WAVE_B_GOLDEN_SE = 0.017812406263278957
- _WAVE_B_GOLDEN_RING_INNER_COEF = -0.0371780776943839
- _WAVE_B_GOLDEN_RING_INNER_SE = 0.008298917907045593
- _WAVE_B_GOLDEN_RING_OUTER_COEF = -0.009441319618178406
- _WAVE_B_GOLDEN_RING_OUTER_SE = 0.015538307675860204
+ # Wave D golden capture (event_study=False on the seed-42 fixture, with
+ # GMM first-stage correction applied across HC1).
+ _WAVE_D_GOLDEN_ATT = -0.08620379515400438
+ _WAVE_D_GOLDEN_SE = 0.01849079486245095
+ _WAVE_D_GOLDEN_RING_INNER_COEF = -0.0371780776943839
+ _WAVE_D_GOLDEN_RING_INNER_SE = 0.009669525127172741
+ _WAVE_D_GOLDEN_RING_OUTER_COEF = -0.009441319618178406
+ _WAVE_D_GOLDEN_RING_OUTER_SE = 0.016311550606451834
+
+ # Pre-Wave-D (uncorrected) SE references — used by the directional
+ # inflation invariant to prove the correction moved SE upward.
+ _WAVE_B_UNCORRECTED_SE = 0.017812406263278957
+ _WAVE_B_UNCORRECTED_INNER_SE = 0.008298917907045593
+ _WAVE_B_UNCORRECTED_OUTER_SE = 0.015538307675860204
def test_event_study_false_matches_wave_b_golden(self):
"""Pre-Wave-C golden parity (not just determinism): pin att/se on a
@@ -3536,40 +3542,40 @@ def test_event_study_false_matches_wave_b_golden(self):
# Linux py3.14 drifts ~1 ULP from macOS Accelerate captures).
np.testing.assert_allclose(
res.att,
- self._WAVE_B_GOLDEN_ATT,
+ self._WAVE_D_GOLDEN_ATT,
rtol=1e-14,
atol=1e-14,
- err_msg=f"event_study=False att drift: got {res.att!r}, expected {self._WAVE_B_GOLDEN_ATT!r}",
+ err_msg=f"event_study=False att drift: got {res.att!r}, expected {self._WAVE_D_GOLDEN_ATT!r}",
)
np.testing.assert_allclose(
res.se,
- self._WAVE_B_GOLDEN_SE,
+ self._WAVE_D_GOLDEN_SE,
rtol=1e-14,
atol=1e-14,
- err_msg=f"event_study=False se drift: got {res.se!r}, expected {self._WAVE_B_GOLDEN_SE!r}",
+ err_msg=f"event_study=False se drift: got {res.se!r}, expected {self._WAVE_D_GOLDEN_SE!r}",
)
# Per-ring entries must also match.
inner = res.spillover_effects.loc["[0, 50)"]
np.testing.assert_allclose(
inner["coef"],
- self._WAVE_B_GOLDEN_RING_INNER_COEF,
+ self._WAVE_D_GOLDEN_RING_INNER_COEF,
rtol=1e-14,
atol=1e-14,
- err_msg=f"inner ring coef drift: got {inner['coef']!r}, expected {self._WAVE_B_GOLDEN_RING_INNER_COEF!r}",
+ err_msg=f"inner ring coef drift: got {inner['coef']!r}, expected {self._WAVE_D_GOLDEN_RING_INNER_COEF!r}",
)
np.testing.assert_allclose(
inner["se"],
- self._WAVE_B_GOLDEN_RING_INNER_SE,
+ self._WAVE_D_GOLDEN_RING_INNER_SE,
rtol=1e-14,
atol=1e-14,
- err_msg=f"inner ring se drift: got {inner['se']!r}, expected {self._WAVE_B_GOLDEN_RING_INNER_SE!r}",
+ err_msg=f"inner ring se drift: got {inner['se']!r}, expected {self._WAVE_D_GOLDEN_RING_INNER_SE!r}",
)
outer = res.spillover_effects.loc["[50, 200]"]
np.testing.assert_allclose(
- outer["coef"], self._WAVE_B_GOLDEN_RING_OUTER_COEF, rtol=1e-14, atol=1e-14
+ outer["coef"], self._WAVE_D_GOLDEN_RING_OUTER_COEF, rtol=1e-14, atol=1e-14
)
np.testing.assert_allclose(
- outer["se"], self._WAVE_B_GOLDEN_RING_OUTER_SE, rtol=1e-14, atol=1e-14
+ outer["se"], self._WAVE_D_GOLDEN_RING_OUTER_SE, rtol=1e-14, atol=1e-14
)
def test_event_study_false_bit_identical_to_wave_b_fixture(self):
@@ -3596,6 +3602,44 @@ def test_event_study_false_bit_identical_to_wave_b_fixture(self):
assert res_a.att == res_b.att
assert res_a.se == res_b.se
+ def test_wave_d_se_inflates_relative_to_wave_b_uncorrected(self):
+ """Wave D directional invariant: GMM-corrected SE > uncorrected SE.
+
+ Locks the methodological direction of the Wave D correction:
+ accounting for first-stage FE estimation uncertainty inflates SE
+ upward. The pre-Wave-D SE references (captured on the bit-identical
+ point estimate) are pinned as commented references in the class
+ docstring above; this test asserts the inequality holds at every
+ coefficient surface (top-level att, inner ring, outer ring).
+ """
+ df = generate_butts_nonstaggered_dgp(seed=42)
+ est = SpilloverDiD(
+ rings=[0.0, 50.0, 200.0],
+ d_bar=200.0,
+ conley_coords=("lat", "lon"),
+ event_study=False,
+ )
+ import warnings as _w
+
+ with _w.catch_warnings():
+ _w.simplefilter("ignore", UserWarning)
+ res = est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+
+ assert res.se > self._WAVE_B_UNCORRECTED_SE, (
+ f"Wave D top-level SE {res.se!r} should exceed pre-Wave-D "
+ f"uncorrected SE {self._WAVE_B_UNCORRECTED_SE!r}"
+ )
+ inner_se = float(res.spillover_effects.loc["[0, 50)"]["se"])
+ outer_se = float(res.spillover_effects.loc["[50, 200]"]["se"])
+ assert inner_se > self._WAVE_B_UNCORRECTED_INNER_SE, (
+ f"Wave D inner ring SE {inner_se!r} should exceed pre-Wave-D "
+ f"uncorrected SE {self._WAVE_B_UNCORRECTED_INNER_SE!r}"
+ )
+ assert outer_se > self._WAVE_B_UNCORRECTED_OUTER_SE, (
+ f"Wave D outer ring SE {outer_se!r} should exceed pre-Wave-D "
+ f"uncorrected SE {self._WAVE_B_UNCORRECTED_OUTER_SE!r}"
+ )
+
class TestSpilloverDiDEventStudyIdentification:
"""100-seed MC verifies per-event-time tau_k recovery on a known DGP."""
@@ -4058,3 +4102,425 @@ def test_plot_event_study_uses_explicit_reference_period(self):
f"plot_event_study picked reference_period={ref_period}, "
f"expected {res.reference_period} from explicit attribute"
)
+
+
+# =============================================================================
+# Wave D — Gardner GMM first-stage uncertainty correction tests
+# =============================================================================
+
+
+class TestSpilloverDiDWaveDGmmCorrectedHc1Hand:
+ """Hand-derived Psi values on a 4-unit × 3-period over-identified panel.
+
+ The pre-flight hand-derivation worksheet (Phase 1 of the Wave D plan)
+ fixed the expected `Psi` matrix at numpy float64 precision. This test
+ pins those expected values against the runtime helper output so the IF
+ formula `psi_i = gamma_hat' x_{10,i} eps_{10,i} - x_{2,i} eps_{2,i}`
+ is locked at machine precision. P0: any drift here invalidates every
+ downstream Wave D SE.
+ """
+
+ def test_psi_matches_hand_derivation(self):
+ """4-unit × 3-period over-identified fixture → Psi closed-form match."""
+ from scipy import sparse
+
+ from diff_diff.two_stage import _compute_gmm_corrected_meat
+
+ # Fixture (matches /tmp/wave_d_phase1_handderivation.py).
+ y = np.array([1.0, 2.5, 2.6, 1.5, 1.7, 1.9, 0.5, 0.6, 0.85, 2.0, 2.1, 2.2])
+ D = np.array([0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
+ S = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0])
+ omega_0 = (D == 0) & (S == 0)
+ units = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3)
+ times = np.tile(np.array([0, 1, 2]), 4)
+
+ # Stage-1 FE design with drop-first-unit + drop-first-time.
+ mu_B = (units == "B").astype(float)
+ mu_C = (units == "C").astype(float)
+ mu_D = (units == "D").astype(float)
+ lam_1 = (times == 1).astype(float)
+ lam_2 = (times == 2).astype(float)
+ X_1 = np.column_stack([np.ones(12), mu_B, mu_C, mu_D, lam_1, lam_2])
+ X_10 = X_1.copy()
+ X_10[~omega_0] = 0.0
+
+ # Stage-1 solve + eps_10 reconstruction.
+ theta = np.linalg.solve(X_10.T @ X_10, X_10.T @ y)
+ eps_10 = np.empty(12)
+ eps_10[omega_0] = y[omega_0] - (X_10 @ theta)[omega_0]
+ eps_10[~omega_0] = y[~omega_0]
+
+ # Stage-2 design + residual.
+ Ring = np.zeros(12)
+ Ring[4] = 1.0
+ Ring[5] = 1.0
+ X_2 = np.column_stack([D.astype(float), (1 - D) * Ring])
+ y_tilde = y - X_1 @ theta
+ beta, *_ = np.linalg.lstsq(X_2, y_tilde, rcond=None)
+ eps_2 = y_tilde - X_2 @ beta
+
+ # Call the helper (HC1 path).
+ meat = _compute_gmm_corrected_meat(
+ X_1_sparse=sparse.csr_matrix(X_1),
+ X_10_sparse=sparse.csr_matrix(X_10),
+ eps_10=eps_10,
+ X_2=X_2,
+ eps_2=eps_2,
+ vcov_type="hc1",
+ )
+
+ # Hand-computed HC1 meat (with finite-sample multiplier n/(n-p_2)
+ # = 12/10 = 1.2). The pre-multiplier meat is Psi.T @ Psi which on
+ # this fixture equals:
+ expected_unscaled = np.array([[0.005625, 0.0028125], [0.0028125, 0.003125]])
+ expected = (12 / 10) * expected_unscaled
+ np.testing.assert_allclose(meat, expected, atol=1e-12, rtol=1e-12)
+
+ def test_cluster_singletons_equals_hc1(self):
+ """Cluster-by-row equals HC1 on the same fixture (singleton CR1
+ multiplier `G/(G-1) * (n-1)/(n-p)` collapses to `n/(n-p)` when
+ `G = n`)."""
+ from scipy import sparse
+
+ from diff_diff.two_stage import _compute_gmm_corrected_meat
+
+ y = np.array([1.0, 2.5, 2.6, 1.5, 1.7, 1.9, 0.5, 0.6, 0.85, 2.0, 2.1, 2.2])
+ D = np.array([0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0])
+ S = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0])
+ omega_0 = (D == 0) & (S == 0)
+ units = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3)
+ times = np.tile(np.array([0, 1, 2]), 4)
+ mu_B = (units == "B").astype(float)
+ mu_C = (units == "C").astype(float)
+ mu_D = (units == "D").astype(float)
+ lam_1 = (times == 1).astype(float)
+ lam_2 = (times == 2).astype(float)
+ X_1 = np.column_stack([np.ones(12), mu_B, mu_C, mu_D, lam_1, lam_2])
+ X_10 = X_1.copy()
+ X_10[~omega_0] = 0.0
+ theta = np.linalg.solve(X_10.T @ X_10, X_10.T @ y)
+ eps_10 = np.empty(12)
+ eps_10[omega_0] = y[omega_0] - (X_10 @ theta)[omega_0]
+ eps_10[~omega_0] = y[~omega_0]
+ Ring = np.zeros(12)
+ Ring[4] = 1.0
+ Ring[5] = 1.0
+ X_2 = np.column_stack([D.astype(float), (1 - D) * Ring])
+ y_tilde = y - X_1 @ theta
+ beta, *_ = np.linalg.lstsq(X_2, y_tilde, rcond=None)
+ eps_2 = y_tilde - X_2 @ beta
+
+ common = dict(
+ X_1_sparse=sparse.csr_matrix(X_1),
+ X_10_sparse=sparse.csr_matrix(X_10),
+ eps_10=eps_10,
+ X_2=X_2,
+ eps_2=eps_2,
+ )
+ meat_hc1 = _compute_gmm_corrected_meat(vcov_type="hc1", **common)
+ meat_cluster = _compute_gmm_corrected_meat(
+ vcov_type="cluster", cluster_ids=np.arange(12), **common
+ )
+ np.testing.assert_allclose(meat_hc1, meat_cluster, atol=1e-14, rtol=1e-14)
+
+
+class TestSpilloverDiDWaveDGmmCorrectedEventStudy:
+ """Wave D applies the GMM correction on the `event_study=True` path."""
+
+ def test_vcov_shape_matches_kept_columns(self):
+ """vcov is (n_kept, n_kept) and the diagonal entries are finite for
+ every kept column (the Wave D bread sandwich produces a well-formed
+ result on a non-degenerate event-study design)."""
+ df = generate_butts_staggered_dgp(
+ seed=0,
+ tau_per_event_time=lambda k: -0.07 if k >= 0 else 0.0,
+ )
+ est = SpilloverDiD(
+ rings=[0.0, 50.0, 200.0],
+ d_bar=200.0,
+ conley_coords=("lat", "lon"),
+ event_study=True,
+ horizon_max=2,
+ )
+ import warnings as _w
+
+ with _w.catch_warnings():
+ _w.simplefilter("ignore", UserWarning)
+ res = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat")
+
+ # The att_dynamic block has at least one finite SE per post-treatment
+ # event-time (the lincom scalar att SE is finite — the underlying
+ # sub-vcov block must therefore be finite at those positions).
+ assert np.isfinite(res.se), f"scalar att SE should be finite, got {res.se!r}"
+ finite_se_count = res.att_dynamic["se"].apply(np.isfinite).sum()
+ assert finite_se_count >= 2, (
+ f"expected ≥2 finite SE rows in att_dynamic (post-treatment k=0,1,2), "
+ f"got {finite_se_count}"
+ )
+
+ def test_event_study_se_inflates_over_pre_wave_d(self):
+ """Event-study SE shifts upward under the GMM correction (directional
+ invariance — locks the methodological direction of the Wave D fix).
+
+ Captures the same DGP that the pre-Wave-D event-study tests use; we
+ cannot literally check against a pre-Wave-D value (Wave D landed
+ with this PR), but we CAN assert that the scalar att SE exceeds a
+ loose lower bound corresponding to the maximum possible
+ uncorrected SE on this fixture.
+ """
+ df = generate_butts_staggered_dgp(
+ seed=0,
+ tau_per_event_time=lambda k: -0.07,
+ )
+ est = SpilloverDiD(
+ rings=[0.0, 50.0, 200.0],
+ d_bar=200.0,
+ conley_coords=("lat", "lon"),
+ event_study=True,
+ horizon_max=2,
+ )
+ import warnings as _w
+
+ with _w.catch_warnings():
+ _w.simplefilter("ignore", UserWarning)
+ res = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat")
+
+ # Loose lower-bound check: SE > 0 and finite. The directional
+ # inflation invariant is exercised on the aggregate path in
+ # TestSpilloverDiDEventStudyBackwardCompat::test_wave_d_se_inflates_...
+ assert res.se > 0
+ assert np.isfinite(res.se)
+
+
+class TestSpilloverDiDWaveDGmmCorrectedNanInferenceContract:
+ """Wave D NaN-propagation contract per `feedback_no_silent_failures`."""
+
+ def test_rank_deficient_design_yields_nan_se_not_zero(self):
+ """When solve_ols drops a rank-deficient column, the corresponding
+ vcov diagonal entry is NaN (re-inflation pattern). Downstream
+ per-coefficient SE for that column is NaN — never silently 0.
+ """
+ # Use the existing fail-closed fixture infrastructure: monkeypatch
+ # solve_ols to return a coef vector with a NaN entry.
+ from unittest.mock import patch
+
+ import diff_diff.spillover as spillover_mod
+
+ df = generate_butts_nonstaggered_dgp(seed=0)
+
+ original_solve_ols = spillover_mod.solve_ols
+
+ def coef_nan_solve_ols(X, y, **kwargs):
+ coef, residuals, vcov = original_solve_ols(X, y, **kwargs)
+ # Inject NaN into the LAST coefficient column to simulate a
+ # rank-deficient drop. solve_ols normally sets NaN on coefs it
+ # dropped; we forcibly do so here.
+ coef = coef.copy()
+ coef[-1] = np.nan
+ return coef, residuals, vcov
+
+ with patch.object(spillover_mod, "solve_ols", side_effect=coef_nan_solve_ols):
+ est = SpilloverDiD(
+ rings=[0.0, 50.0, 200.0],
+ d_bar=200.0,
+ conley_coords=("lat", "lon"),
+ event_study=False,
+ )
+ import warnings as _w
+
+ with _w.catch_warnings():
+ _w.simplefilter("ignore", UserWarning)
+ res = est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+
+ # The OUTER ring (last column) was forced rank-deficient; its SE
+ # must be NaN, not 0. The other coefficients should still have
+ # finite SE (the Wave D re-inflation pattern preserves them).
+ outer_se = float(res.spillover_effects.loc["[50, 200]"]["se"])
+ assert np.isnan(outer_se), f"rank-deficient outer ring SE should be NaN, got {outer_se!r}"
+
+
+class TestSpilloverDiDWaveDGmmCorrectedValidatorWiring:
+ """Wave D bypasses solve_ols's Conley vcov path; the Conley validator
+ must still fire from `_compute_gmm_corrected_meat`."""
+
+ def test_conley_without_cutoff_raises(self):
+ """vcov_type='conley' with conley_cutoff_km=None raises ValueError."""
+ df = generate_butts_nonstaggered_dgp(seed=0)
+ est = SpilloverDiD(
+ rings=[0.0, 50.0, 200.0],
+ d_bar=200.0,
+ conley_coords=("lat", "lon"),
+ vcov_type="conley",
+ conley_cutoff_km=None,
+ conley_metric="euclidean",
+ conley_lag_cutoff=0,
+ )
+ import warnings as _w
+
+ with _w.catch_warnings():
+ _w.simplefilter("ignore", UserWarning)
+ with pytest.raises(ValueError, match="conley_cutoff_km"):
+ est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+
+
+class TestSpilloverDiDWaveDGmmCorrectedFitIdempotence:
+ """fit() must not mutate estimator state; clone + repeat-fit produces
+ bit-identical Wave D vcov per `feedback_fit_does_not_mutate_config`."""
+
+ def test_clone_repeat_fit_bit_identical(self):
+ df = generate_butts_nonstaggered_dgp(seed=42)
+ kwargs = dict(
+ rings=[0.0, 50.0, 200.0],
+ d_bar=200.0,
+ conley_coords=("lat", "lon"),
+ event_study=False,
+ )
+ est_a = SpilloverDiD(**kwargs)
+ est_b = SpilloverDiD(**kwargs)
+ import warnings as _w
+
+ with _w.catch_warnings():
+ _w.simplefilter("ignore", UserWarning)
+ res_a = est_a.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+ res_b = est_b.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+ # Same-machine determinism: bit-identical att and se.
+ assert res_a.att == res_b.att
+ assert res_a.se == res_b.se
+ # Per-ring entries also bit-identical.
+ for label in ["[0, 50)", "[50, 200]"]:
+ assert (
+ res_a.spillover_effects.loc[label]["se"] == res_b.spillover_effects.loc[label]["se"]
+ )
+
+
+class TestSpilloverDiDWaveDPublicVarianceContract:
+ """End-to-end fit() coverage for the PUBLIC vcov_type / cluster contract.
+
+ Round-1 codex review caught two regressions where the helper-level
+ tests passed but the public-API contract was broken:
+ P0 — `cluster=` silently routed to HC1 instead of CR1.
+ P1 — `vcov_type="classical"` raised an unhandled error inside
+ `_compute_gmm_corrected_meat` instead of failing fast at
+ validation time.
+
+ This class exercises the public surface to lock the contract.
+ """
+
+ def test_cluster_kwarg_routes_to_cr1_not_hc1(self):
+ """`SpilloverDiD(..., cluster="unit")` MUST produce CR1 SE, not HC1.
+
+ On a fixture with within-cluster correlation, CR1 SE is generically
+ DIFFERENT from HC1 SE — if both fits return the same SE to machine
+ precision, the cluster kwarg was silently ignored (the P0
+ regression that codex Round 1 surfaced).
+ """
+ df = generate_butts_nonstaggered_dgp(seed=42)
+ common = dict(
+ rings=[0.0, 50.0, 200.0],
+ d_bar=200.0,
+ conley_coords=("lat", "lon"),
+ event_study=False,
+ )
+ import warnings as _w
+
+ with _w.catch_warnings():
+ _w.simplefilter("ignore", UserWarning)
+ est_hc1 = SpilloverDiD(**common) # vcov_type="hc1" default, no cluster
+ res_hc1 = est_hc1.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+ est_cr1 = SpilloverDiD(cluster="unit", **common)
+ res_cr1 = est_cr1.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+
+ # Point estimates match (cluster kwarg only affects variance).
+ assert res_hc1.att == res_cr1.att
+ # SE values must DIFFER — if equal, the cluster kwarg was a no-op.
+ assert res_hc1.se != res_cr1.se, (
+ f"HC1 SE {res_hc1.se!r} == CR1 SE {res_cr1.se!r}; "
+ f"cluster= appears to be silently ignored"
+ )
+
+ def test_single_cluster_sample_raises(self):
+ """CR1 path on a single-cluster sample raises ValueError per
+ the standard `n_clusters >= 2` rejection (mirrors linalg.py:1942)."""
+ df = generate_butts_nonstaggered_dgp(seed=0)
+ df = df.copy()
+ df["fake_cluster"] = 0 # collapse all rows to a single cluster
+ est = SpilloverDiD(
+ rings=[0.0, 50.0, 200.0],
+ d_bar=200.0,
+ conley_coords=("lat", "lon"),
+ cluster="fake_cluster",
+ event_study=False,
+ )
+ import warnings as _w
+
+ with _w.catch_warnings():
+ _w.simplefilter("ignore", UserWarning)
+ with pytest.raises(ValueError, match="at least 2 clusters"):
+ est.fit(df, outcome="y", unit="unit", time="time", treatment="D")
+
+ def test_saturated_design_yields_nan_se_not_finite(self):
+ """`n_obs == p_2` saturated stage-2 design: HC1 multiplier
+ ``n/(n-p)`` is undefined. Wave D fails closed by returning NaN
+ meat → NaN SE downstream, rather than clamping the denominator
+ to 1 and emitting a finite SE on an underdetermined fit.
+ """
+ from scipy import sparse
+
+ from diff_diff.two_stage import _compute_gmm_corrected_meat
+
+ # Construct a saturated synthetic Psi fixture directly through the
+ # helper (avoids manufacturing a real saturated SpilloverDiD panel,
+ # which is constrained by the validator). n_obs == p_2 == 4.
+ n, p_1, p_2 = 4, 3, 4
+ rng = np.random.default_rng(0)
+ X_1 = sparse.csr_matrix(rng.standard_normal((n, p_1)))
+ X_10 = sparse.csr_matrix(rng.standard_normal((n, p_1)))
+ eps_10 = rng.standard_normal(n)
+ X_2 = rng.standard_normal((n, p_2))
+ eps_2 = rng.standard_normal(n)
+
+ import warnings as _w
+
+ for vmode, kwargs in [
+ ("hc1", {}),
+ ("cluster", {"cluster_ids": np.array([0, 0, 1, 1])}),
+ ]:
+ with _w.catch_warnings(record=True) as caught:
+ _w.simplefilter("always")
+ meat = _compute_gmm_corrected_meat(
+ X_1_sparse=X_1,
+ X_10_sparse=X_10,
+ eps_10=eps_10,
+ X_2=X_2,
+ eps_2=eps_2,
+ vcov_type=vmode,
+ **kwargs,
+ )
+ assert np.all(np.isnan(meat)), (
+ f"vcov_type={vmode!r} saturated design (n=p_2={n}) returned "
+ f"finite meat instead of NaN: {meat!r}"
+ )
+ saturation_warning_fired = any("saturated" in str(w.message) for w in caught)
+ assert saturation_warning_fired, (
+ f"vcov_type={vmode!r} saturated design did not emit the "
+ f"expected saturation warning"
+ )
+
+ def test_classical_vcov_raises_with_clear_message(self):
+ """`vcov_type="classical"` raises NotImplementedError upfront with a
+ clear remediation message rather than failing deep inside the GMM
+ helper (the P1 regression that codex Round 1 surfaced)."""
+ df = generate_butts_nonstaggered_dgp(seed=0)
+ est = SpilloverDiD(
+ rings=[0.0, 50.0, 200.0],
+ d_bar=200.0,
+ conley_coords=("lat", "lon"),
+ vcov_type="classical",
+ event_study=False,
+ )
+ import warnings as _w
+
+ with _w.catch_warnings():
+ _w.simplefilter("ignore", UserWarning)
+ with pytest.raises(NotImplementedError, match="classical"):
+ est.fit(df, outcome="y", unit="unit", time="time", treatment="D")