diff --git a/CHANGELOG.md b/CHANGELOG.md index 39888e0d..11e3a78f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,11 +9,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **`DifferenceInDifferences(absorb=..., vcov_type in {"hc2", "hc2_bm"})` now supported** (`diff_diff/estimators.py:382`). Previously raised `NotImplementedError` because the HC2 leverage correction and CR2 Bell-McCaffrey DOF depend on the FULL FE hat matrix, while within-transformation (FWL) preserves coefficients and residuals but not the hat. Lift via internal auto-route: when `absorb=` is paired with `vcov_type in {"hc2","hc2_bm"}`, the fit promotes the absorb columns to `fixed_effects=` internally so the existing full-dummy-design code path computes the algebraically correct vcov. Empirically matches `lm() + sandwich::vcovHC(type="HC2")` and `lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` at ~1e-10 (verified via new `tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity` against `benchmarks/data/clubsandwich_cr2_golden.json` scenario `absorbed_fe_did`, with the R generator using the singleton-cluster CR2 trick for one-way HC2-BM Satterthwaite DOF). HC1/CR1 paths unchanged. `MultiPeriodDiD(absorb=...)` and `TwoWayFixedEffects` rejections remain as follow-ups (different fit-path structure). **Behavioral note (full `DiDResults` surface change under auto-route):** under the auto-route, the entire returned `DiDResults` reflects the full-dummy fit rather than the within-transformed fit. Specifically, `result.coefficients` and `result.vcov` include the FE-dummy entries (matching the `fixed_effects=` path), `result.residuals` and `result.fitted_values` are on the un-demeaned outcome scale, and `result.r_squared` is computed on the un-demeaned outcome (so it absorbs the FE variance and will typically be higher than the within-R²). `result.att` is invariant to this routing (FWL guarantee). Downstream consumers reading `result.att` are unaffected; consumers reading the broader result surface should expect the full-dummy values. **Survey-design scope:** the auto-route changes the FE handling (and removes the prior absorbed-FE rejection), but `survey_design=` continues to drive its own variance path (Taylor-series linearization or replicate-weight variance, per the existing survey contract) rather than the analytical HC2/HC2-BM sandwich. The auto-route is therefore methodologically meaningful for non-survey fits and for the FE-handling side of survey fits; analytical small-sample inference under `vcov_type in {"hc2","hc2_bm"}` is bypassed when a survey design is supplied. +- **`SpilloverDiD` Gardner GMM first-stage uncertainty correction across HC1 / Conley / cluster (Wave D).** Closes the documented Wave B/C "SEs biased downward by a few percent" caveat. **Documented synthesis** of Butts (2021) Section 3.1 (the IF construction for spillover-aware DiD) + Gardner (2022) Section 4 (the two-stage GMM sandwich) + Conley (1999) (the spatial kernel). No reference software combines all three — `did2s` (Butts & Gardner) implements the Gardner correction without rings or Conley; `conleyreg` and `acreg` implement Conley without the two-stage correction. Wave D is the synthesis. Applies unconditionally under `vcov_type ∈ {"hc1", "conley", "cluster"}` for both `event_study=False` AND `event_study=True`. **Formula** (Butts 2021 §3.1 + Gardner 2022 §4): `psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}` where `gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)` is the stage-1-projection-of-stage-2 cross-moment; meat = `Psi' K Psi` with `K` dispatched by `vcov_type` (identity for HC1, block-indicator for cluster, spatial kernel for Conley); vcov = `(X_2' X_2)^{-1} @ meat @ (X_2' X_2)^{-1}`. **Finite-sample multipliers:** `n/(n-p)` for HC1; `G/(G-1) * (n-1)/(n-p)` for cluster CR1; no multiplier for Conley (preserves `conleyreg` / Wave B convention). **Public surface:** `vcov_type="classical"` now raises `NotImplementedError` upfront (the Wave D synthesis has not been derived for the homoskedastic meat structure `sigma_hat^2 * (X_10' X_10)`); REGISTRY's "vcov_type restrictions" block updated accordingly. **Point estimates unchanged** (`tau_total`, `delta_j`, event-study `tau_k` / `delta_jk` are byte-identical to Wave B/C); SE values shift upward by 1-few percent depending on first-stage residual variance. **Implementation:** new module-level helper `_compute_gmm_corrected_meat` in `diff_diff/two_stage.py` (NOT a modification of the existing `_compute_gmm_variance` method — TwoStageDiD's path is unchanged); new module-level helper `_build_butts_fe_design_csr` in `diff_diff/spillover.py`; new module-level helper `_compute_conley_meat` in `diff_diff/conley.py` factored out of `_compute_conley_vcov` so the same kernel-application code path handles both standard sandwich (`X * residuals`) and Wave D IF outer product (`Psi`) cases. **No new public API kwarg** — the correction is unconditional. Wave D variance mode dispatch derives from the public contract: `vcov_type="conley"` → `"conley"`; `cluster=` → `"cluster"` (CR1); otherwise `"hc1"`. **Wave B/C SE goldens re-pinned** at `tests/test_spillover.py::TestSpilloverDiDEventStudyBackwardCompat` (constants renamed `_WAVE_B_GOLDEN_*` → `_WAVE_D_GOLDEN_*`; pre-Wave-D references retained as commented baselines for the directional inflation invariant `_WAVE_B_UNCORRECTED_*`). **Tests:** new test classes `TestSpilloverDiDWaveDGmmCorrectedHc1Hand` (hand-derived `Psi` on a 4-unit × 3-period over-identified panel — matches at `atol=1e-12`), `TestSpilloverDiDWaveDGmmCorrectedEventStudy` (vcov shape on event-study path), `TestSpilloverDiDWaveDGmmCorrectedNanInferenceContract` (rank-deficient column propagation), `TestSpilloverDiDWaveDGmmCorrectedValidatorWiring` (Conley validator fires from the new helper), `TestSpilloverDiDWaveDGmmCorrectedFitIdempotence` (clone + repeat-fit bit-identity per `feedback_fit_does_not_mutate_config`), `TestSpilloverDiDWaveDPublicVarianceContract` (end-to-end public `cluster=` CR1 routing, single-cluster rejection, classical NotImplementedError). Closes the Gardner-GMM follow-up row in `TODO.md`. - **BaconDecomposition R parity goldens.** Closes the PR-B deferral row in `TODO.md`. JSON goldens at `benchmarks/data/r_bacondecomp_golden.json` generated from the committed `benchmarks/R/generate_bacon_golden.R` script (3 fixtures: `uniform_3groups_with_never_treated`, `two_groups_no_never_treated`, `always_treated_remapped`) against `bacondecomp 0.1.1` on R 4.5.2. `tests/test_methodology_bacon.py::TestBaconParityR` now active (4 tests, no skips): TWFE coefficient parity at `atol=1e-6` across all 3 fixtures; weights-sum parity at `atol=1e-6` across all 3 fixtures; per-component estimate + weight parity at `atol=1e-6` on the 2 non-remap fixtures **and on the 6 timing-vs-timing rows of `always_treated_remapped`** (carve-out narrowed to U-bucket rows only); plus a dedicated fold-back test (`test_always_treated_remapped_fold_back_matches_r`) that pins the **documented convention divergence** on `always_treated_remapped` (R keeps `first_treat=1` as a distinct timing cohort and emits `Later vs Always Treated` comparisons; Python's paper-footnote-11 convention remaps those units to `U` and folds them into a single `treated_vs_never` cell per treated cohort) by aggregating R's split rows per cohort and asserting they match Python's single fold at `atol=1e-6`. The aggregate is invariant per Theorem 1; the per-component breakdown differs structurally between conventions but the fold-back is now directly asserted. New `**Note (R parity convention divergence on always-treated)**` and `**Deviation (first-period boundary extension on always-treated remap)**` in `docs/methodology/REGISTRY.md`. **First-period boundary deviation:** the paper uses strict `t_i < 1` for the always-treated bucket; the library uses the inclusive `first_treat <= min(time)` rule and folds `first_treat == min(time)` cohorts into `U`. R does NOT apply this fold (it keeps such cohorts as their own bucket). When `min(time) > 1` the rules coincide. Explicitly labeled in REGISTRY's Deviations block and mirrored in `METHODOLOGY_REVIEW.md` and `bacon.py`. METHODOLOGY_REVIEW.md tracker row promoted `**Complete** (R parity goldens pending)` → `**Complete**`. - **`generate_ddd_panel_data` — panel-structured DGP for Triple-Difference power analysis** (`diff_diff/prep_dgp.py`). New public function exported from `diff_diff` and `diff_diff.prep` for panel DDD simulations. Cross-sectional `generate_ddd_data` remains available unchanged. Produces a balanced panel of `n_units × n_periods` with two unit-level binary dimensions (`group`, `partition`) and a derived `post = 1[period >= treatment_period]` indicator; columns: `unit, period, outcome, group, partition, post, treated, true_effect` (+ `x1, x2` when `add_covariates=True`). DDD-CPT identification holds because the `group * partition` interaction enters as a unit-level (time-invariant) term, leaving the triple-interaction `treatment_effect * group * partition * post` as the sole source of differential group × partition trend. Compatible with `TripleDifference(cluster="unit").fit(..., time="post")` (the cluster kwarg is required because `TripleDifference` is the repeated-cross-section `panel=FALSE` estimator and unclustered SE on panel-generated rows understates variance under within-unit serial correlation; the point estimate `att` is invariant to clustering — see the new `TripleDifference` REGISTRY note on panel-shaped input). Users get panel-realistic unit fixed effects and within-unit serial correlation while the binary 2×2×2 estimator surface is unchanged. **Stratified allocation:** the partition split is drawn stratified-by-group at the requested `partition_frac` so every `(group, partition)` cell receives at least one unit; a targeted `ValueError` is raised at fit-time when the rounded cell counts (`n_units`, `group_frac`, `partition_frac`) would leave any cell empty. This guarantees the 2x2x2 DDD surface is populated for any valid input — independent marginal sampling (the cross-sectional `generate_ddd_data` convention) could collapse cells when marginals are small (e.g., `n_units=4, group_frac=partition_frac=0.25`). Validates `1 <= treatment_period < n_periods`, `group_frac` and `partition_frac` strictly in `(0, 1)`, and `n_units >= 4`. Deterministic recovery (`noise_sd=0`) matches `treatment_effect` to ~1e-15 (covered by `tests/test_prep.py::TestGenerateDddPanelData`, 16 tests including infeasible-config rejection and smallest-feasible-config round-trip through `TripleDifference.fit`). `power.simulate_power` is NOT yet auto-routed to the panel DGP for `TripleDifference` (the existing `_ddd_dgp_kwargs` registry entry still ignores `n_periods` and the existing `_check_ddd_dgp_compat` warning still fires on non-default kwargs) — that wiring is tracked as a follow-up in TODO.md. - **BaconDecomposition: Goodman-Bacon (2021) methodology audit (PR-B).** Closes the BaconDecomposition row in `METHODOLOGY_REVIEW.md` (status flipped from **In Progress** → **Complete** — initially with an R-parity-goldens caveat that was closed by the parity-goldens bullet above in this same release). Builds on the PR #451 paper review at `docs/methodology/papers/goodman-bacon-2021-review.md`. **Audit outcomes:** (1) Rewrote `_recompute_exact_weights` in `bacon.py` to actually implement Theorem 1 (Eqs. 7-9 + 10e-g) — the prior "exact" implementation was missing the `(1-n_kU)` factor in the subsample variance, did not square the sample share, and added an extraneous `unit_share` factor not present in the paper; the post-hoc sum-to-1 normalization masked the relative-weight error but produced ~0.3% decomposition error vs TWFE on a 3-cohort + never-treated DGP. The rewrite computes the exact numerators of Eqs. 10e/f/g and lets the post-hoc normalization handle the `V̂^D` denominator (Theorem 1's identity guarantees `V̂^D = Σ numerators`). The TWFE-vs-weighted-sum identity now holds at `atol=1e-10` on both noisy and hand-calculable DGPs. (2) Added always-treated warn+remap per paper footnote 11: units whose `first_treat` is at or before the first observable period (`first_treat <= min(time)`, excluding the never-treated sentinels `0` and `np.inf`) are automatically remapped to the `U` (untreated) bucket via an internal column (`__bacon_first_treat_internal__`) with a `UserWarning`. Detection uses ordered-time logic on the **time axis**, so panels whose `time` column has negative or zero-crossing labels (event-time encodings) are handled correctly; the `0` sentinel restriction applies only to `first_treat`, not to `time`, and a real treatment cohort with `first_treat == 0` would still be folded into U today (re-label such cohorts to a non-sentinel value before fitting). The user's original `first_treat` column is preserved unchanged. The count is surfaced as a new `BaconDecompositionResults.n_always_treated_remapped` dataclass field, rendered in `summary()` output when nonzero. **`n_never_treated` reports TRUE never-treated only**, computed from the original user column before remap — remapped always-treated units appear separately as `n_always_treated_remapped`, no double-counting. (3) New methodology test file `tests/test_methodology_bacon.py` (34 tests across 6 classes post-release; the audit added ~24 tests and the R-parity-goldens bullet above expanded coverage: `TestBaconHandCalculation` hand-checks Eqs. 7-9 + 10b-d on a minimal balanced panel at `atol=1e-10`; `TestBaconParityR` (4 tests, all active post-release once the R parity goldens bullet above landed; skips cleanly with a regenerate-instructions pointer in partial-checkout scenarios where the JSON is unavailable); `TestBaconAlwaysTreatedRemap` regression-tests warn+remap mechanics including user-data-preservation; `TestBaconEdgeCases` exercises no-untreated, single-cohort, unbalanced panel, constant-ATT recovery; `TestBaconWeightModes` locks the new exact-is-default contract; `TestBaconSurveyDesignNarrowing` confirms survey_design composes with exact mode and warn+remap). (4) R `bacondecomp::bacon()` parity generator committed at `benchmarks/R/generate_bacon_golden.R` covering three DGP fixtures (3-groups-with-U, 2-groups-no-U, always-treated-remapped); the JSON goldens deferral at audit time was closed in this same release by the parity-goldens bullet above. (5) `docs/methodology/REGISTRY.md` `## BaconDecomposition` block replaced with the paper-review-sourced entry plus three new sub-notes: weight modes (exact vs approximate), always-treated remap, R parity status. **Explicit removal:** the prior REGISTRY block's "Weights may be negative for later-vs-earlier comparisons" claim was incorrect per Theorem 1 (decomposition weights are strictly positive and sum to 1; negative weights are an estimand-level phenomenon, not estimator-level) and is dropped from the new entry. Closes the BaconDecomposition follow-up tracked at `TODO.md` (the prior row added in PR #451 is replaced by a narrower R-parity-goldens deferral row). -- **`SpilloverDiD(event_study=True)` — per-event-time × ring decomposition (Butts 2021 Section 5 / Table 2).** Replaces the Wave B `NotImplementedError` gate with the full per-event-time × ring decomposition. Emits per-event-time direct effects `tau_k` and per-(ring, event-time) spillover effects `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) and a MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). A TwoStageDiD-compatible `event_study_effects: Dict[int, Dict[str, Any]]` alias (matching `two_stage.py:1355-1389` schema with `conf_int = (low, high)` tuple) is also emitted for consumption by `plot_event_study` (`SpilloverDiDResults` is wired into `_extract_plot_data` and prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic). `DiagnosticReport` integration is NOT wired in this PR — registering `SpilloverDiDResults` in `DiagnosticReport`'s applicability/method tables is queued as a follow-up. **Methodology spec:** the implementation operationalizes Butts Section 5's single `K_it` symbol as TWO event-time clocks — `K_direct = t - effective_first_treat(i)` for ever-treated unit rows, and `K_spill = t - earliest-in-range-cohort-onset(i)` for spillover rows (running min across activated cohorts; NaN for pre-trigger and far-away rows). `K_spill >= 0` structurally; negative-k spillover cells emit rectangularly with `coef = NaN, n_obs = 0`. **Reference period:** `ref_period = -1 - anticipation` (mirrors `TwoStageDiD` at `two_stage.py:486`); when `horizon_max` is set, `ref_period` must fall inside `[-horizon_max, +horizon_max]` or fit raises `ValueError` — silent floor-shift to `-horizon_max` would change identification (rejected per `feedback_no_silent_failures`). The reference row in `att_dynamic` / `event_study_effects` uses `coef = 0.0, se = 0.0, n_obs = 0, conf_int = (0.0, 0.0)` for TwoStageDiD parity. **`horizon_max` semantics (divergence from TwoStageDiD):** SpilloverDiD bins event-times outside `[-horizon_max, +horizon_max]` into endpoint pools (no observations dropped); TwoStageDiD filters those rows. The divergence is intentional and cross-documented. With `horizon_max=None`, the helper auto-detects the bin set from observed K values. **Scalar `att` aggregation:** when `event_study=True`, the top-level `att` is the **sample-share-weighted average** of post-treatment `tau_k` (`att = sum_{k >= 0} w_k * tau_k` with `w_k = n_treated_at_k / total`). SE comes from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment block of the stage-2 vcov — no separate fit. **Reduce-to-aggregate equivalence:** under a constant-tau DGP with `horizon_max=None`, the lincom-weighted scalar `att` reproduces Wave B's aggregate `tau_total` bit-identically in the deterministic limit (verified by `TestSpilloverDiDEventStudyReduceToAggregate`). Note: `horizon_max=0` is **not supported** under `event_study=True` (rejected at validation): the single bin `k=0` leaves no event-time pair to anchor the reference period against. Use `event_study=False` for a single aggregate direct effect (Wave B static spec); event-study mode requires `horizon_max>=1` or `horizon_max=None`. **Post-finite_mask sample contract:** `att_dynamic["n_obs"]`, `event_study_effects[k]["n_obs"]`, AND the scalar `att` share weights all reflect the POST-`finite_mask` stage-2 estimation sample (not the pre-mask design). On warn-and-drop fits (baseline-treated units without Omega_0 rows excluded), the reported `n_obs` per cell counts only rows that actually entered `solve_ols`. **Fail-closed scalar `att`:** if any post-treatment direct-effect coefficient is NaN (rank-deficient drop by `solve_ols`), the scalar `att` is set to NaN with an explicit warning rather than silently zeroing the dropped column's contribution via `np.nansum` on a fixed weight vector — inspect `att_dynamic` for the per-event-time coefficients and re-aggregate manually if appropriate. **Backward compatibility:** `event_study=False` leaves all Wave C fields (`att_dynamic`, `event_study_effects`, `horizon_max`, `reference_period`) as `None`. The aggregate stage-2 design construction, fit, and extraction logic on this path are byte-identical to Wave B; `TestSpilloverDiDEventStudyBackwardCompat` pins att / se / per-ring goldens captured on the unchanged aggregate path so any future drift fails the regression. **Variance:** same caveat as Wave B — per-event-time SEs use `solve_ols`'s standard variance (HC1 / Conley / cluster paths) WITHOUT the Gardner GMM first-stage uncertainty correction; planned Wave D follow-up closes this. **Tests:** `tests/test_spillover.py` adds 30 new test methods across event-study API, two-clock K helper, horizon binning, design builder, reference period, reduce-to-aggregate, identification MC (50 seeds, per-event-time tau_k recovery within 0.025), placebo pre-trends (Type I rate ≤ 0.30 over 50 seeds at alpha=0.10), singularity (rectangular schema), Conley integration (vcov shape + non-negative diagonal), summary/to_dict/pickle round-trip, event_study_effects schema parity with TwoStageDiD, lincom-att hand-computed, validation (`horizon_max < 0`, `ref_period < -horizon_max`), and fit idempotence. DGP factory `generate_butts_staggered_dgp` extended with `tau_per_event_time` and `delta_per_ring_per_event_time` callable kwargs (backward-compatible — both default to `None`, producing the Wave B scalar DGP bit-identically; verified by `tests/test_dgp_utils.py` with pinned SHA-256 baselines). -- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through). The Gardner GMM first-stage uncertainty correction is NOT applied at stage 2 in this PR (documented limitation; planned follow-up extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step). **Deferred features (planned follow-ups):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. +- **`SpilloverDiD(event_study=True)` — per-event-time × ring decomposition (Butts 2021 Section 5 / Table 2).** Replaces the Wave B `NotImplementedError` gate with the full per-event-time × ring decomposition. Emits per-event-time direct effects `tau_k` and per-(ring, event-time) spillover effects `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) and a MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). A TwoStageDiD-compatible `event_study_effects: Dict[int, Dict[str, Any]]` alias (matching `two_stage.py:1355-1389` schema with `conf_int = (low, high)` tuple) is also emitted for consumption by `plot_event_study` (`SpilloverDiDResults` is wired into `_extract_plot_data` and prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic). `DiagnosticReport` integration is NOT wired in this PR — registering `SpilloverDiDResults` in `DiagnosticReport`'s applicability/method tables is queued as a follow-up. **Methodology spec:** the implementation operationalizes Butts Section 5's single `K_it` symbol as TWO event-time clocks — `K_direct = t - effective_first_treat(i)` for ever-treated unit rows, and `K_spill = t - earliest-in-range-cohort-onset(i)` for spillover rows (running min across activated cohorts; NaN for pre-trigger and far-away rows). `K_spill >= 0` structurally; negative-k spillover cells emit rectangularly with `coef = NaN, n_obs = 0`. **Reference period:** `ref_period = -1 - anticipation` (mirrors `TwoStageDiD` at `two_stage.py:486`); when `horizon_max` is set, `ref_period` must fall inside `[-horizon_max, +horizon_max]` or fit raises `ValueError` — silent floor-shift to `-horizon_max` would change identification (rejected per `feedback_no_silent_failures`). The reference row in `att_dynamic` / `event_study_effects` uses `coef = 0.0, se = 0.0, n_obs = 0, conf_int = (0.0, 0.0)` for TwoStageDiD parity. **`horizon_max` semantics (divergence from TwoStageDiD):** SpilloverDiD bins event-times outside `[-horizon_max, +horizon_max]` into endpoint pools (no observations dropped); TwoStageDiD filters those rows. The divergence is intentional and cross-documented. With `horizon_max=None`, the helper auto-detects the bin set from observed K values. **Scalar `att` aggregation:** when `event_study=True`, the top-level `att` is the **sample-share-weighted average** of post-treatment `tau_k` (`att = sum_{k >= 0} w_k * tau_k` with `w_k = n_treated_at_k / total`). SE comes from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment block of the stage-2 vcov — no separate fit. **Reduce-to-aggregate equivalence:** under a constant-tau DGP with `horizon_max=None`, the lincom-weighted scalar `att` reproduces Wave B's aggregate `tau_total` bit-identically in the deterministic limit (verified by `TestSpilloverDiDEventStudyReduceToAggregate`). Note: `horizon_max=0` is **not supported** under `event_study=True` (rejected at validation): the single bin `k=0` leaves no event-time pair to anchor the reference period against. Use `event_study=False` for a single aggregate direct effect (Wave B static spec); event-study mode requires `horizon_max>=1` or `horizon_max=None`. **Post-finite_mask sample contract:** `att_dynamic["n_obs"]`, `event_study_effects[k]["n_obs"]`, AND the scalar `att` share weights all reflect the POST-`finite_mask` stage-2 estimation sample (not the pre-mask design). On warn-and-drop fits (baseline-treated units without Omega_0 rows excluded), the reported `n_obs` per cell counts only rows that actually entered `solve_ols`. **Fail-closed scalar `att`:** if any post-treatment direct-effect coefficient is NaN (rank-deficient drop by `solve_ols`), the scalar `att` is set to NaN with an explicit warning rather than silently zeroing the dropped column's contribution via `np.nansum` on a fixed weight vector — inspect `att_dynamic` for the per-event-time coefficients and re-aggregate manually if appropriate. **Backward compatibility:** `event_study=False` leaves all Wave C fields (`att_dynamic`, `event_study_effects`, `horizon_max`, `reference_period`) as `None`. The aggregate stage-2 design construction, fit, and extraction logic on this path are byte-identical to Wave B; `TestSpilloverDiDEventStudyBackwardCompat` pins att / se / per-ring goldens captured on the unchanged aggregate path so any future drift fails the regression. **Variance:** at original Wave C ship time per-event-time SEs used `solve_ols`'s standard variance (HC1 / Conley / cluster paths) WITHOUT the Gardner GMM first-stage uncertainty correction. **Superseded by the Wave D Gardner GMM first-stage correction in this same release** (see the Wave D bullet above): per-event-time SEs now apply the IF outer-product correction unconditionally and shift upward by 1-few percent relative to the original Wave C ship-time values. **Tests:** `tests/test_spillover.py` adds 30 new test methods across event-study API, two-clock K helper, horizon binning, design builder, reference period, reduce-to-aggregate, identification MC (50 seeds, per-event-time tau_k recovery within 0.025), placebo pre-trends (Type I rate ≤ 0.30 over 50 seeds at alpha=0.10), singularity (rectangular schema), Conley integration (vcov shape + non-negative diagonal), summary/to_dict/pickle round-trip, event_study_effects schema parity with TwoStageDiD, lincom-att hand-computed, validation (`horizon_max < 0`, `ref_period < -horizon_max`), and fit idempotence. DGP factory `generate_butts_staggered_dgp` extended with `tau_per_event_time` and `delta_per_ring_per_event_time` callable kwargs (backward-compatible — both default to `None`, producing the Wave B scalar DGP bit-identically; verified by `tests/test_dgp_utils.py` with pinned SHA-256 baselines). +- **`SpilloverDiD` — ring-indicator spillover-aware DiD (Butts 2021).** New standalone estimator at `diff_diff/spillover.py` implementing two-stage Gardner methodology with ring-indicator covariates that identify direct effect on treated (`tau_total`) alongside per-ring spillover effects on near-control units (`delta_j`). Documented synthesis of ingredients (no single published software covers the exact recipe — `did2s` implements Gardner two-stage without rings; the Butts ring estimator has no R/Stata package): Butts (2021) Section 5 / Table 2 identification, Gardner (2022) two-stage residualize-then-fit, and the Conley spatial-HAC vcov shipped in 3.3.3. Handles both panel non-staggered (Equations 5/6/8) and Section 5 staggered timing in one estimator — non-staggered is the special case where all treated units share an onset time. **API:** `SpilloverDiD(rings=[0, 50, 100, 200], conley_coords=("lat","lon"), ...).fit(data, outcome="y", unit="unit", time="t", treatment="D")` (binary D auto-converted to `first_treat`) or `.fit(..., first_treat="first_treat")` (Gardner convention). Result: `SpilloverDiDResults(DiDResults)` with `.att` = `tau_total`, `.spillover_effects` (per-ring `pd.DataFrame` with `coef`/`se`/`t_stat`/`p_value`/`ci_low`/`ci_high`), `.ring_breakpoints`, `.d_bar`, `.n_units_ever_in_ring`, `.n_far_away_obs`, `.is_staggered`. `.coefficients` exposes all `(1+K)` stage-2 entries (`"treatment"` + `"_spillover_"`) plus an `"ATT"` alias keyed to vcov columns. **Methodology spec (committed):** stage-2 regressor is the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). Reading the literal unit-static `(1 - D_it) * S_i` from Equation 5 is algebraically rank-deficient under TWFE (`(1-D_it) * S_i = S_i - D_it`, with `S_i` absorbed by `mu_i`, leaving `-D_it`); only the time-varying form supports the paper's identification (Proposition 2.3). Stage-1 subsample uses Butts' STRICTER `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed), not TwoStageDiD's `{D_it = 0}` alone — this prevents spillover-contaminated near-controls in pre/post periods from biasing the time FE. **Gardner identity (non-staggered):** a 20-seed deterministic regression test pins `SpilloverDiD.att` against a direct single-stage TWFE ring regression on the full sample (`y ~ mu_i + lambda_t + tau * D_it + sum_j delta_j * (1 - D_it) * Ring_{it,j}`) at `atol=1e-10` — empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator. **Identification-check policy (period strict, unit warn-and-drop, plus connectivity):** every period must have at least one Omega_0 row (hard `ValueError` — dropping a period removes all units' cross-time identification). Units lacking Omega_0 rows (e.g. baseline-treated units with `D_it = 1` at every observed `t`) are warned-and-dropped: their unit FE is NaN, residualization writes NaN on their rows, and the downstream finite-mask path excludes them from stage 2 — mirrors `TwoStageDiD`'s always-treated convention. Additionally, the supported-units bipartite graph (units linked by shared Omega_0 periods) must form a single connected component; `K > 1` components raise `ValueError` because the FE solver would return only component-specific constants and residualization would silently mix them across components (defense-in-depth — under absorbing treatment the disconnected case may be unreachable through the upstream validators, but the check future-proofs Wave B follow-ups). **Public API restrictions (Wave B MVP):** `covariates=` raises `NotImplementedError` because Gardner-style two-stage requires covariate effects estimated on the untreated-and-unexposed subsample at stage 1 (appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates); non-absorbing / reversible treatment patterns (e.g. `[0, 1, 0]`) raise `ValueError` rather than being silently coerced into "treated from first 1 onward"; non-constant `first_treat` values across rows of the same unit raise `ValueError`; `conley_coords` is required on every fit path (not just `vcov_type="conley"`) because ring construction always uses it. **Far-away control identification:** uses CURRENT-period untreated status (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs (no never-treated units) can identify the counterfactual via not-yet-treated far-away rows. **Variance (Wave B MVP ship-time):** stage-2 OLS variance via `solve_ols` (HC1 / Conley / cluster paths all flow through) WITHOUT the Gardner GMM first-stage uncertainty correction. **Superseded by the Wave D Gardner GMM first-stage correction in this same release** (see the Wave D bullet above): the GMM correction now applies unconditionally across HC1 / Conley / CR1 (via `cluster=`), shifting SE values upward by 1-few percent relative to the original Wave B ship-time values. **Deferred features (planned follow-ups, as of Wave B ship-time):** `event_study=True` per-event-time × ring coefficients (Butts Table 2), `survey_design=` integration, `ring_method="count"` (count-of-treated-in-ring), data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight), Gardner GMM first-stage correction at stage 2, sparse staggered ring-distance path. **Shipped in same release:** `event_study=True` (Wave C bullet above) + Gardner GMM first-stage correction (Wave D bullet above); remaining items still queued. **Tests:** `tests/test_spillover.py` (157 tests across ring-construction primitives, validators, fit integration, raw-data invariant, identification MC — non-staggered DGP at 50 seeds + 200-seed `@pytest.mark.slow` variant recovers both `tau_total` and `delta_1`; staggered DGP at 30 seeds anchors both `tau_total` and `delta_1` — Conley plumbing (verifies `solve_ols` is called with `vcov_type="conley"` + Conley kwargs, no silent HC1 fallback), Gardner identity bit-identity, coefficients-vs-vcov alignment, warn-and-drop, rank_deficient_action validation, Omega_0 bipartite-graph connectivity, anticipation behavior on both fit paths). DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction. - **`ChaisemartinDHaultfoeuille.predict_het` × `placebo`: R-parity on both global and per-path surfaces.** R-verified — `did_multiplegt_dyn(predict_het, placebo)` emits heterogeneity OLS results on backward (placebo) horizons via R's `DIDmultiplegtDYN:::did_multiplegt_main` placebo block (`effect = matrix(-i, ...)` rbind site); the same block runs per-by_level under `did_multiplegt_dyn(by_path, predict_het, placebo)`, so both global `res$results$predict_het` and per-by_level `res$by_level_i$results$predict_het` slots emit backward rows. R's predict_het syntax with `placebo > 0` requires the `c(-1)` sentinel in the horizon vector to trigger "compute heterogeneity for ALL forward (1..effects) AND ALL placebo (1..placebo) positions" — passing positive-only horizons errors with "specified numbers in predict_het that exceed the number of placebos". Python mirrors via `_compute_heterogeneity_test(..., placebo=L_max)` (set automatically from `self.placebo` at both global and per-path call sites in `fit()`) — the function iterates forward (1..L_max) and backward (-1..-L_max) horizons in a single loop with an explicit `out_idx < 0` eligibility guard for backward horizons whose `F_g` is too small (would otherwise silently misread `N_mat` via numpy negative indexing). `results.heterogeneity_effects` uses negative-int keys for backward horizons; `path_heterogeneity_effects` does the same per path. Placebo rows in `to_dataframe(level="by_path")` have non-NaN `het_*` columns when `placebo=True` and `heterogeneity=` are both set. **Survey gate (warn + skip):** `survey_design + placebo + heterogeneity` emits a `UserWarning` at fit-time and falls back to forward-horizon-only heterogeneity on both surfaces — the Binder TSL cell-period allocator's REGISTRY justification is tied to **post-period** attribution; backward-horizon attribution puts ψ_g mass on a pre-period cell, a separate library-extension claim that needs its own derivation. Forward-horizon `predict_het + survey_design` continues to work unchanged on both global and per-path surfaces. The function-level `_compute_heterogeneity_test` keeps a per-iteration `NotImplementedError` backstop for direct callers that bypass fit(). Pre-period allocator derivation deferred to a follow-up methodology PR (tracked in TODO.md). R parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityHeterogeneityWithPlacebo` (scenario 23, `multi_path_reversible_predict_het_with_placebo_global`, `placebo=2, effects=3, no by_path`) and `::TestDCDHDynRParityByPathHeterogeneityWithPlacebo` (scenario 22, same DGP plus `by_path=3`); pinned at `BETA_RTOL=1e-6` / `SE_RTOL=1e-5` for `beta` / `se` / `t_stat` / `n_obs` and `INFERENCE_RTOL=1e-4` for `p_value` / `conf_int` across 3 paths × (3 forward + 2 placebo) = 15 horizons + 1 global × 5 horizons. Cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPredictHetPlacebo` (placebo het column population, survey-gate warn+skip behavior, forward+survey anti-regression, `out_idx<0` eligibility guard, single-path telescope `path_heterogeneity_effects[(only_path,)] == heterogeneity_effects` bit-exactly, summary rendering, direct-call `NotImplementedError` backstop). Closes TODO #422. ### Changed diff --git a/TODO.md b/TODO.md index b1079ad3..e9304f0f 100644 --- a/TODO.md +++ b/TODO.md @@ -127,7 +127,6 @@ Deferred items from PR reviews that were not addressed before merge. | SyntheticDiD: bootstrap cross-language parity anchor against R's default `synthdid::vcov(method="bootstrap")` (refit; rebinds `opts` per draw) or Julia `Synthdid.jl::src/vcov.jl::bootstrap_se` (refit by construction). Same-library validation (placebo-SE tracking, AER §6.3 MC truth) is in place; a cross-language anchor is desirable to bolster the methodology contract. Julia is the cleanest target — minimal wrapping work and refit-native vcov. Tolerance target: 1e-6 on Monte Carlo samples (different BLAS + RNG paths preclude 1e-10). The R-parity fixture from the previous release was deleted because it pinned the now-removed fixed-weight path. | `benchmarks/R/`, `benchmarks/julia/`, `tests/` | follow-up | Low | | Conley + survey weights / `survey_design`. Score-reweighted meat `s_i = w_i · X_i · ε_i` is mechanical, but PSU clustering interaction with the spatial kernel and replicate-weights variance under spatial correlation are non-trivial (Bertanha-Imbens 2014 covers cluster-sample but not the explicit Conley case). Phase 5 of the spillover-conley initiative; paper review prerequisite. Currently raises `NotImplementedError` at the linalg validator. | `linalg.py::_validate_vcov_args` | Phase 5 (spillover-conley) | Medium | | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low | -| `SpilloverDiD` Gardner GMM first-stage uncertainty correction at stage 2. Wave B MVP uses standard `solve_ols` variance (HC1 / Conley / cluster) without the influence-function adjustment for stage-1 FE estimation. Extending `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the IF outer-product step gives the full Butts (2021) Section 3.1 + Gardner (2022) Section 4 composition. See plan Risks #2 for the IF formula. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_gmm_variance` | follow-up (Wave B) | Medium | | `SpilloverDiD(survey_design=...)` integration. Currently raises `NotImplementedError`. Requires threading survey weights through the inline stage 1 + stage 2 and lifting `two_stage.py`'s survey path patterns. | `spillover.py::SpilloverDiD.fit` | follow-up (Wave B) | Low | | `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low | | `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low | diff --git a/diff_diff/conley.py b/diff_diff/conley.py index 812a8f16..1df54697 100644 --- a/diff_diff/conley.py +++ b/diff_diff/conley.py @@ -765,14 +765,12 @@ def _validate_conley_kwargs( ) -def _compute_conley_vcov( - X: np.ndarray, - residuals: np.ndarray, +def _compute_conley_meat( + scores: np.ndarray, coords: np.ndarray, cutoff: float, metric: ConleyMetric, kernel: str, - bread_matrix: np.ndarray, *, time: Optional[np.ndarray] = None, unit: Optional[np.ndarray] = None, @@ -780,21 +778,27 @@ def _compute_conley_vcov( cluster_ids: Optional[np.ndarray] = None, _conley_sparse: Optional[bool] = None, ) -> np.ndarray: - """Conley (1999) spatial HAC sandwich variance. + """Conley (1999) spatial HAC meat — ``scores' K scores`` for the product kernel. + + Factors the kernel-construction-and-application step out of + :func:`_compute_conley_vcov` so a second caller (SpilloverDiD's Wave D + Gardner GMM first-stage correction) can reuse the cross-sectional / + panel-block / sparse k-d-tree / cluster-product code path with an + arbitrary score matrix substituted for the canonical ``X * residuals``. Two operating modes: **Cross-sectional** (``time`` / ``unit`` / ``lag_cutoff`` all None): - Var̂(β) = bread_inv · (Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j') · bread_inv + meat = Σ_{i,j} K(d_ij/h) · scores_i scores_j' - Implemented via ``meat = S' K S`` where ``S = X * residuals[:, None]``. + Implemented via ``meat = scores' K scores``. **Panel block-decomposed** (all three keyword-only args set): - XeeX_spatial = Σ_t S_t' · K_space_t · S_t (within-period sum) - XeeX_serial = Σ_u S_u' · K_time_u · S_u if lag_cutoff > 0 (within-unit sum) - Var̂(β) = bread_inv · (XeeX_spatial + XeeX_serial) · bread_inv + meat_spatial = Σ_t scores_t' · K_space_t · scores_t (within-period sum) + meat_serial = Σ_u scores_u' · K_time_u · scores_u if lag_cutoff > 0 + meat = meat_spatial + meat_serial The serial Bartlett kernel ``K_time_u[i, j] = 1{|t_i-t_j| <= L, i != j} · (1 - |t_i-t_j|/(L+1))`` is hardcoded regardless of the user-supplied @@ -815,23 +819,17 @@ def _compute_conley_vcov( Inputs are assumed already validated by :func:`_validate_conley_kwargs`; the helper only does the math. Caller is responsible for the validator. + Emits a ``UserWarning`` if the smallest meat eigenvalue is materially + negative (< -1e-12) — radial 1-D Bartlett and uniform kernels are + practitioner specializations of Conley 1999 and are not formally + PSD-guaranteed. + Returns ------- - vcov : ndarray of shape (k, k) - - Notes - ----- - Neither the uniform kernel (negative spectral regions, Conley 1999 - footnote 11) nor the **radial 1-D Bartlett** specialization implemented - here is PSD-guaranteed. Conley's explicit PSD formula (Eq 3.14) is the - 2-D separable product window on a lattice; the radial pairwise form is - a practitioner specialization (R ``conleyreg``, Stata ``acreg``, Hsiang - 2010) that is not formally PSD. We emit a ``UserWarning`` if the smallest - meat eigenvalue is materially negative (< -1e-12) regardless of kernel. + meat : ndarray of shape (p, p) """ coords_arr = np.asarray(coords, dtype=np.float64) - S = X * residuals[:, np.newaxis] - n = X.shape[0] + n = scores.shape[0] # Factorize cluster_ids once if supplied, so per-slice mask construction # below can use integer comparisons rather than re-factorizing per call. @@ -877,7 +875,7 @@ def _kernel_fn(u: np.ndarray) -> np.ndarray: ) def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray: - """Compute the spatial meat ``S' K S`` for the given subset of rows. + """Compute the spatial meat ``scores' K scores`` for the given subset. ``mask`` may be ``None`` (use all rows) or a boolean array of length n. Dispatches to the sparse helper when ``use_sparse`` is True, otherwise @@ -886,11 +884,11 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray: (per-slice mask, NOT full n×n — saves memory for panel paths). """ if mask is None: - S_sub = S + scores_sub = scores coords_sub = coords_arr cluster_sub = cluster_codes else: - S_sub = S[mask] + scores_sub = scores[mask] coords_sub = coords_arr[mask] cluster_sub = cluster_codes[mask] if cluster_codes is not None else None if use_sparse: @@ -902,7 +900,7 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray: # would use more memory than dense); fall through to dense in # that case (the warning is already emitted by the helper). sparse_meat = _compute_spatial_bartlett_meat_sparse( - S_sub, coords_sub, cutoff, cast(str, metric), cluster_codes=cluster_sub + scores_sub, coords_sub, cutoff, cast(str, metric), cluster_codes=cluster_sub ) if sparse_meat is not None: return sparse_meat @@ -912,7 +910,7 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray: if cluster_sub is not None: cluster_mask = cluster_sub[:, None] == cluster_sub[None, :] K = K * cluster_mask - return S_sub.T @ K @ S_sub + return scores_sub.T @ K @ scores_sub # Suppress spurious BLAS-level "divide by zero / overflow" warnings on # macOS Accelerate when K is sparse-ish (most off-diagonals are exactly @@ -920,11 +918,11 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray: # the warning is a subnormal-handling false-positive in the AVX path. # We verify finiteness immediately after. if time is None: - # Phase 1 cross-sectional path: full n×n spatial sandwich. + # Cross-sectional path: full n×n spatial sandwich. with np.errstate(divide="ignore", over="ignore", invalid="ignore"): meat = _spatial_meat_for_mask(None) else: - # Phase 2 panel block-decomposed path (matches R conleyreg). + # Panel block-decomposed path (matches R conleyreg). time_arr = np.asarray(time) unit_arr = np.asarray(unit) # Normalize time labels to dense panel-period codes (0..T-1) so that @@ -940,8 +938,8 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray: # `conley_lag_cutoff` is meaningfully a "number of observed panel # periods" regardless of label scale. _, time_codes = np.unique(time_arr, return_inverse=True) - k = X.shape[1] - meat = np.zeros((k, k)) + p = scores.shape[1] + meat = np.zeros((p, p)) # Spatial component: within-period sandwich, summed across periods. # _spatial_meat_for_mask dispatches to sparse or dense per the toggle. with np.errstate(divide="ignore", over="ignore", invalid="ignore"): @@ -957,14 +955,14 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray: if L > 0: for u_val in np.unique(unit_arr): mask_u = unit_arr == u_val - S_u = S[mask_u] + scores_u = scores[mask_u] # Use dense panel-period codes (NOT raw labels) for lag math. t_u = time_codes[mask_u].astype(np.float64) lag_mat = np.abs(t_u[:, None] - t_u[None, :]) K_u = ((lag_mat <= L) & (lag_mat != 0)).astype(np.float64) * ( 1.0 - lag_mat / (L + 1.0) ) - meat += S_u.T @ K_u @ S_u + meat += scores_u.T @ K_u @ scores_u if not np.all(np.isfinite(meat)): raise ValueError( "Conley meat contains non-finite values; check residuals and " @@ -991,6 +989,66 @@ def _spatial_meat_for_mask(mask: Optional[np.ndarray] = None) -> np.ndarray: stacklevel=3, ) + return meat + + +def _compute_conley_vcov( + X: np.ndarray, + residuals: np.ndarray, + coords: np.ndarray, + cutoff: float, + metric: ConleyMetric, + kernel: str, + bread_matrix: np.ndarray, + *, + time: Optional[np.ndarray] = None, + unit: Optional[np.ndarray] = None, + lag_cutoff: Optional[int] = None, + cluster_ids: Optional[np.ndarray] = None, + _conley_sparse: Optional[bool] = None, +) -> np.ndarray: + """Conley (1999) spatial HAC sandwich variance. + + Thin wrapper around :func:`_compute_conley_meat`: builds + ``scores = X * residuals[:, None]``, delegates the meat construction, + then wraps with the supplied bread inverse. + + Two operating modes (both delegated to :func:`_compute_conley_meat`): + + **Cross-sectional** (``time`` / ``unit`` / ``lag_cutoff`` all None): + + Var̂(β) = bread_inv · (Σ_{i,j} K(d_ij/h) · X_i ε_i ε_j X_j') · bread_inv + + **Panel block-decomposed** (all three keyword-only args set): + + XeeX_spatial = Σ_t S_t' · K_space_t · S_t (within-period sum) + XeeX_serial = Σ_u S_u' · K_time_u · S_u if lag_cutoff > 0 (within-unit sum) + Var̂(β) = bread_inv · (XeeX_spatial + XeeX_serial) · bread_inv + + See :func:`_compute_conley_meat` for the kernel choice, sparse + k-d-tree fallback, cluster-product kernel, and PSD-warning details. + + Inputs are assumed already validated by :func:`_validate_conley_kwargs`; + this wrapper only does the math. Caller is responsible for the validator. + + Returns + ------- + vcov : ndarray of shape (k, k) + """ + scores = X * residuals[:, np.newaxis] + meat = _compute_conley_meat( + scores, + coords, + cutoff, + metric, + kernel, + time=time, + unit=unit, + lag_cutoff=lag_cutoff, + cluster_ids=cluster_ids, + _conley_sparse=_conley_sparse, + ) + # Sandwich via two solves (mirrors _compute_cr2_bm pattern in linalg.py) try: temp = np.linalg.solve(bread_matrix, meat) diff --git a/diff_diff/guides/llms-full.txt b/diff_diff/guides/llms-full.txt index 4539681e..c19c54cc 100644 --- a/diff_diff/guides/llms-full.txt +++ b/diff_diff/guides/llms-full.txt @@ -498,12 +498,13 @@ sp.fit( ) -> SpilloverDiDResults ``` -**Restrictions (Wave B MVP — planned follow-ups):** +**Restrictions and Wave C/D status:** -- `covariates=` raises `NotImplementedError`. Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates. Planned follow-up. -- `survey_design=` raises `NotImplementedError` (planned: SurveyDesign integration) -- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (DiagnosticReport integration: NOT yet wired; queued as a follow-up.) (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. **Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs share the same Wave B Gardner-GMM caveat (biased downward by a few percent; Wave D follow-up). -- Stage-2 variance is `solve_ols` HC1 / Conley / cluster — Gardner GMM first-stage uncertainty correction NOT applied (planned follow-up; SE is biased downward / too small, CIs too narrow, p-values too small — treat reported significance conservatively until the GMM correction lands) +- `covariates=` raises `NotImplementedError` (planned follow-up). Gardner two-stage requires covariate effects estimated on the untreated-and-unexposed Omega_0 subsample at stage 1; appending raw covariates only at stage 2 silently biases `tau_total` / `delta_j` on panels with time-varying covariates. +- `survey_design=` raises `NotImplementedError` (planned follow-up — SurveyDesign integration) +- `vcov_type="classical"` raises `NotImplementedError` (Wave D restriction). Wave D GMM first-stage correction has not been derived for the homoskedastic meat structure `sigma_hat^2 * (X_10' X_10)`. Use `vcov_type="hc1"`, `vcov_type="conley"`, or pair with `cluster=` for CR1 — all three apply the Wave D GMM correction. +- `event_study=True` SHIPPED (Wave C): emits per-event-time `tau_k` and per-(ring, event-time) `delta_jk` as `att_dynamic: pd.DataFrame` (indexed by event-time `k`) plus MultiIndex `spillover_effects: pd.DataFrame` (levels `(ring_label, event_time)`). TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias also emitted for `plot_event_study` consumption — `_extract_plot_data` prefers the new `reference_period` attribute over the legacy `n_obs==0` heuristic. (DiagnosticReport integration: NOT yet wired; queued as a follow-up.) (schema: `{k: {"effect", "se", "n_obs", "t_stat", "p_value", "conf_int": (low, high)}}` mirroring `two_stage.py:1355-1389`). Reference period `ref_period = -1 - anticipation` (TwoStageDiD `two_stage.py:486` convention); reference row uses `coef=0.0, se=0.0, n_obs=0, conf_int=(0.0, 0.0)`. Scalar `att` field becomes a sample-share-weighted average of post-treatment `tau_k` (`att = sum_{k>=0} w_k * tau_k` with `w_k = n_treated_at_k / total`) with SE from linear-combination inference `Var(att) = w' V_subset w` on the post-treatment vcov block — no separate fit. **Two-clock K_it:** direct-effect clock is `K_direct = t - effective_first_treat(i)` for ever-treated rows; spillover clock is `K_spill = t - earliest-in-range-cohort-onset(i)` (running min across activated cohorts, NaN pre-trigger). `K_spill >= 0` structurally; negative-k spillover cells are rectangularly emitted with `coef = NaN, n_obs = 0`. **`horizon_max` semantics:** bins event-times outside `[-H, +H]` into endpoint pools (no observations dropped — divergence from TwoStageDiD which filters; intentional, per `feedback_no_silent_failures`). With `horizon_max=None`, auto-detects bin set from observed K. **Validation:** `horizon_max < 0` raises `ValueError`; `ref_period < -horizon_max` (i.e., `anticipation > horizon_max - 1`) raises `ValueError` — silently floor-shifting the reference would change identification. **Reduce-to-aggregate:** under constant-tau DGP with `horizon_max=None`, the share-weighted scalar `att` reproduces Wave B's aggregate bit-identically. **Note:** `horizon_max=0` does NOT reduce to Wave B (binning collapses pre-treatment K values to `k=0`, making `D^0 = D_i` ever-treated indicator rather than `D_it`). Per-event-time SEs include the Wave D Gardner GMM first-stage correction (see next bullet). +- Stage-2 variance applies the Gardner GMM first-stage uncertainty correction across HC1 / Conley / cluster (Wave D, SHIPPED). The IF outer-product formula `psi_i = gamma_hat' X_{10,i} eps_{10,i} - X_{2,i} eps_{2,i}` is used unconditionally; kernel `K` is path-dependent (identity for HC1, block-indicator for cluster, spatial kernel for Conley). Documented synthesis of Butts (2021) §3.1 + Gardner (2022) §4 + Conley (1999); no reference software combines all three. Point estimates unchanged from Wave B/C; SE values shift upward by 1-few percent. - Only nearest-treated rings supported; `ring_method="count"` (count of treated neighbors in ring) not yet exposed **Usage:** diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py index 15a32cd0..541dce5e 100644 --- a/diff_diff/spillover.py +++ b/diff_diff/spillover.py @@ -30,6 +30,7 @@ import numpy as np import pandas as pd +from scipy import sparse from diff_diff.conley import ( _CONLEY_EARTH_RADIUS_KM, @@ -39,6 +40,7 @@ ) from diff_diff.linalg import solve_ols from diff_diff.results import SpilloverDiDResults +from diff_diff.two_stage import _compute_gmm_corrected_meat from diff_diff.utils import safe_inference # Type alias mirroring diff_diff.conley.ConleyMetric so callers can supply @@ -1495,6 +1497,89 @@ def _residualize_butts( return y_full - mu_per_row - lambda_per_row +def _build_butts_fe_design_csr( + unit_codes: np.ndarray, + time_codes: np.ndarray, + omega_0_mask: np.ndarray, +) -> Tuple[sparse.csr_matrix, sparse.csr_matrix]: + """Build sparse FE design matrices for Wave D Gardner GMM correction. + + Column layout: ``[unit_1, ..., unit_{U-1}, time_1, ..., time_{T-1}]``. + Drops the first unit dummy AND the first time dummy for identification + (mirrors ``TwoStageDiD._build_fe_design`` at ``two_stage.py:2046``). + + Parameters + ---------- + unit_codes : np.ndarray of shape (n,) + Integer codes 0..U-1 (from ``pd.factorize``). + time_codes : np.ndarray of shape (n,) + Integer codes 0..T-1 (from ``pd.factorize``). + omega_0_mask : np.ndarray of shape (n,) + Boolean mask. ``X_10`` rows where this is False are zeroed out + (treated AND exposed rows). ``X_1`` keeps all rows. + + Returns + ------- + X_1 : sparse.csr_matrix, shape (n, (U-1) + (T-1)) + Full-sample FE design with identification dropping. + X_10 : sparse.csr_matrix, shape (n, (U-1) + (T-1)) + Same column space as ``X_1`` but with ``~omega_0_mask`` rows zeroed. + Sharing column space is required for the Gardner cross-moment + ``gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)``. + + Notes + ----- + Rank-deficient ``X_10' X_10`` (e.g. warn-and-drop units with no + Omega_0 rows) is detected downstream by ``_compute_gmm_corrected_meat`` + via ``sparse_factorized`` failure → ``np.linalg.lstsq`` fallback with + a documented ``UserWarning``. + + **Re-factorization on entry:** when callers pass pre-mask integer + codes that have had interior values dropped via ``finite_mask`` (a + supported warn-and-drop fit), the input code arrays can be sparse — + e.g. ``unit_codes = [0, 1, 3, 4]`` with code 2 dropped. Building + ``X_10`` on the raw codes would materialize an all-zero FE column at + index 2, forcing ``sparse_factorized`` onto the dense + ``lstsq``/``XtX_10.toarray()`` fallback unnecessarily (large-memory + path on big panels). To avoid this, re-factorize via + :func:`pd.factorize` on entry to compact the code space to + ``0..n_unique-1`` (no-op when codes are already contiguous; mirrors + the column-space convention of ``TwoStageDiD._build_fe_design``). + """ + # Compact the code space before column construction — see Notes. + unit_codes = pd.factorize(unit_codes)[0] + time_codes = pd.factorize(time_codes)[0] + + n = unit_codes.shape[0] + n_units = int(unit_codes.max()) + 1 if n > 0 else 0 + n_times = int(time_codes.max()) + 1 if n > 0 else 0 + n_fe_cols = max(n_units - 1, 0) + max(n_times - 1, 0) + + def _build(mask: Optional[np.ndarray]) -> sparse.csr_matrix: + # Unit dummies (drop unit_code == 0 for identification). + u_keep = unit_codes > 0 + if mask is not None: + u_keep = u_keep & mask + u_rows = np.flatnonzero(u_keep) + u_cols = unit_codes[u_keep] - 1 + + # Time dummies (drop time_code == 0 for identification). + t_keep = time_codes > 0 + if mask is not None: + t_keep = t_keep & mask + t_rows = np.flatnonzero(t_keep) + t_cols = (max(n_units - 1, 0)) + (time_codes[t_keep] - 1) + + rows = np.concatenate([u_rows, t_rows]) + cols = np.concatenate([u_cols, t_cols]) + data = np.ones(len(rows), dtype=np.float64) + return sparse.csr_matrix((data, (rows, cols)), shape=(n, n_fe_cols)) + + X_1 = _build(mask=None) + X_10 = _build(mask=omega_0_mask) + return X_1, X_10 + + # ============================================================================= # Public estimator (skeleton — fit() implemented in Step 3) # ============================================================================= @@ -2026,11 +2111,19 @@ def fit( Notes ----- - Wave B MVP: stage-2 variance is the standard solve_ols estimator - (HC1 / Conley / cluster). The Gardner GMM sandwich first-stage - uncertainty correction is NOT applied (planned follow-up; see - TODO and plan Risks #2). Variance is therefore approximate (likely - underestimated by a few percent in typical settings). + Stage-2 variance applies the Wave D Gardner (2022) GMM first-stage + uncertainty correction across all supported ``vcov_type`` paths + (``"hc1"``, ``"conley"``, ``"cluster"`` via ``cluster=``). The + unified IF outer-product formula is + ``psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}`` + with ``meat = Psi' K Psi`` where ``K`` is path-dependent (identity + for HC1, block-indicator for cluster, spatial kernel for Conley). + Documented synthesis of Butts (2021) §3.1 + Gardner (2022) §4 + + Conley (1999); no reference software combines all three. + ``vcov_type="classical"`` raises ``NotImplementedError`` because + the Wave D synthesis has not been derived for the homoskedastic + meat structure ``sigma_hat^2 * (X_10' X_10)``; use ``"hc1"`` for + heteroskedasticity-robust SE with the GMM correction. """ if survey_design is not None: raise NotImplementedError( @@ -2122,9 +2215,31 @@ def fit( "degrees of freedom for correct p-values and CIs. Routing " "stage 2 through LinearRegression (which supplies the " "per-coefficient DOF metadata) is queued as a follow-up " - "extension. Use vcov_type='hc1', 'classical', 'conley', or " + "extension. Use vcov_type='hc1' or 'conley', or " "leave default; combine with cluster= for CR1." ) + if self.vcov_type == "classical": + # Wave D scope (user-confirmed 2026-05-17): the Gardner GMM + # first-stage uncertainty correction is implemented for HC1, + # Conley, and CR1 only. The classical (homoskedastic) variance + # has not been derived for the IF outer-product form in this + # PR — under classical assumptions the meat structure changes + # (`sigma_hat^2 * (X_10' X_10)` rather than `Psi' Psi`) and + # the Wave D synthesis (Butts §3.1 + Gardner §4 + Conley 1999) + # does not carry through directly. Reject upfront with a clear + # remediation message rather than silently HC1-ifying the + # request (per `feedback_no_silent_failures`). + raise NotImplementedError( + "SpilloverDiD does not support vcov_type='classical' under " + "the Wave D Gardner GMM first-stage uncertainty correction. " + "Wave D applies the GMM correction unconditionally and the " + "classical homoskedastic variance does not have a derived " + "IF outer-product form in the Wave D synthesis (Butts §3.1 " + "+ Gardner §4 + Conley 1999). Use vcov_type='hc1' for " + "heteroskedasticity-robust SE with the GMM correction, or " + "combine with cluster= for CR1 with the GMM correction. " + "Future PR may extend Wave D to the classical path." + ) # Step 0: defensive copy so the caller's DataFrame is never mutated. data = data.copy(deep=False) @@ -2590,29 +2705,151 @@ def fit( if self.event_study and event_study_meta is not None: event_study_meta["n_obs_per_col"] = (X_2_fit != 0).sum(axis=0).astype(np.int64) - # Step 15: stage-2 OLS with configured vcov via solve_ols. + # Step 15: stage-2 OLS — coef + residuals only. Wave D computes the + # vcov below via the Gardner GMM first-stage uncertainty correction + # (documented synthesis of Butts §3.1 + Gardner §4 + Conley 1999). + # `solve_ols` returns vcov=None when return_vcov=False. solve_kwargs: Dict[str, Any] = { - "return_vcov": True, + "return_vcov": False, "rank_deficient_action": self.rank_deficient_action, "column_names": col_names_all, - "vcov_type": self.vcov_type, - "cluster_ids": cluster_ids_fit, } + coef, residuals, _ = solve_ols(X_2_fit, y_tilde_fit, **solve_kwargs) # type: ignore[misc] + + # Wave D: Gardner GMM first-stage uncertainty correction. + # + # Reconstruct the stage-1 residual `eps_10` on the FULL sample: + # - On Omega_0 rows: eps_10 = y - mu_hat[i] - lambda_hat[t] + # - On ~Omega_0 rows: eps_10 = y (since X_10[i, :] = 0 collapses + # the IF product to just the stage-2 term; matches the Gardner + # formula at `two_stage.py:1633-1637`). + # unit_fe_arr / time_fe_arr may have NaN at warn-and-drop units; + # the downstream `finite_mask` subset drops those rows BEFORE the + # GMM helper builds Psi (NaN in eps_10 is intentionally tolerated + # at this stage — it is masked out before any matrix operation). + alpha_full = unit_fe_arr[np.asarray(unit_codes_full)] + beta_full = time_fe_arr[np.asarray(time_codes_full)] + eps_10_full = np.where(omega_0_mask, y_full - alpha_full - beta_full, y_full) + + # Subset stage-1 inputs to the fit sample (post-finite_mask). + if n_nan > 0: + eps_10_fit = eps_10_full[finite_mask] + unit_codes_fit = np.asarray(unit_codes_full)[finite_mask] + time_codes_fit = np.asarray(time_codes_full)[finite_mask] + omega_0_mask_fit = omega_0_mask[finite_mask] + else: + eps_10_fit = eps_10_full + unit_codes_fit = np.asarray(unit_codes_full) + time_codes_fit = np.asarray(time_codes_full) + omega_0_mask_fit = omega_0_mask + + # Handle rank-deficient column drops from solve_ols (NaN coefs). + # Subset to kept columns before building Psi; re-inflate vcov with + # NaN at dropped positions at the end so downstream indexing + # (vcov[0, 0] for tau_se, etc.) behaves like the pre-Wave-D path. + kept_col_mask = np.isfinite(coef) + n_kept = int(kept_col_mask.sum()) + if n_kept < len(coef): + X_2_kept = X_2_fit[:, kept_col_mask] + coef_kept = coef[kept_col_mask] + else: + X_2_kept = X_2_fit + coef_kept = coef + eps_2_fit = y_tilde_fit - X_2_kept @ coef_kept + + # Build stage-1 FE designs on the fit sample. Column space: + # [unit_1, ..., unit_{U-1}, time_1, ..., time_{T-1}] (drop-first + # identification, matches `TwoStageDiD._build_fe_design`). + X_1_sparse_fit, X_10_sparse_fit = _build_butts_fe_design_csr( + unit_codes_fit, + time_codes_fit, + omega_0_mask_fit, + ) + + # Conley spatial kwargs only when vcov_type == "conley". if self.vcov_type == "conley": coord_array_full = np.asarray(data[list(self.conley_coords)].values, dtype=np.float64) coord_array_fit = coord_array_full[finite_mask] if n_nan > 0 else coord_array_full - solve_kwargs.update( - { - "conley_coords": coord_array_fit, - "conley_cutoff_km": self.conley_cutoff_km, - "conley_metric": self.conley_metric, - "conley_time": time_vals_fit, - "conley_unit": unit_vals_fit, - "conley_lag_cutoff": self.conley_lag_cutoff, - } - ) + _conley_coords_arg = coord_array_fit + _conley_cutoff_arg = self.conley_cutoff_km + _conley_metric_arg = self.conley_metric + _conley_time_arg = time_vals_fit + _conley_unit_arg = unit_vals_fit + _conley_lag_arg = self.conley_lag_cutoff + else: + _conley_coords_arg = None + _conley_cutoff_arg = None + _conley_metric_arg = None + _conley_time_arg = None + _conley_unit_arg = None + _conley_lag_arg = None + + # Derive the Wave D variance mode from the PUBLIC contract: + # - vcov_type="conley" → "conley" (Conley spatial-HAC + GMM) + # - cluster= supplied → "cluster" (CR1 + GMM) + # - vcov_type="hc1" (default) → "hc1" + # `self.vcov_type` can be "hc1" / "classical" / "conley"; the public + # `cluster=` kwarg ORTHOGONALLY selects CR1. Pre-Wave-D the + # routing happened inside solve_ols; Wave D bypasses that path, so + # the dispatch must be reconstructed here. (Round 1 codex P0 fix: + # without this derivation, a user-supplied `cluster=` was + # silently ignored on the default hc1 path, yielding HC1 SEs when + # CR1 was requested.) + if self.vcov_type == "conley": + _wave_d_vcov_mode: "Literal['hc1', 'conley', 'cluster']" = "conley" + elif cluster_ids_fit is not None: + _wave_d_vcov_mode = "cluster" + else: + _wave_d_vcov_mode = "hc1" + + # Compute the GMM-corrected meat (Psi' K Psi). Caller-side bread + # sandwich below mirrors `TwoStageDiD._compute_gmm_variance` + # at `two_stage.py:1763-1791`. + meat_kept = _compute_gmm_corrected_meat( + X_1_sparse=X_1_sparse_fit, + X_10_sparse=X_10_sparse_fit, + eps_10=eps_10_fit, + X_2=X_2_kept, + eps_2=eps_2_fit, + vcov_type=_wave_d_vcov_mode, + cluster_ids=cluster_ids_fit, + conley_coords=_conley_coords_arg, + conley_cutoff_km=_conley_cutoff_arg, + conley_metric=_conley_metric_arg, + conley_kernel="bartlett", + conley_time=_conley_time_arg, + conley_unit=_conley_unit_arg, + conley_lag_cutoff=_conley_lag_arg, + ) - coef, residuals, vcov = solve_ols(X_2_fit, y_tilde_fit, **solve_kwargs) # type: ignore[misc] + # Bread sandwich: A_22^{-1} = (X_2' X_2)^{-1} via `np.linalg.solve` + # with dense lstsq fallback + UserWarning (mirrors the bread-fallback + # pattern at `two_stage.py:1763-1788`). + A_22_kept = X_2_kept.T @ X_2_kept + eye_kept = np.eye(A_22_kept.shape[0]) + try: + bread_kept = np.linalg.solve(A_22_kept, eye_kept) + except np.linalg.LinAlgError: + warnings.warn( + "SpilloverDiD Wave D bread: A_22 = X_2' X_2 is singular; " + "falling back to dense lstsq. SE may be unreliable.", + UserWarning, + stacklevel=2, + ) + bread_kept = np.linalg.lstsq(A_22_kept, eye_kept, rcond=None)[0] + vcov_kept = bread_kept @ meat_kept @ bread_kept + + # Re-inflate to (k, k) with NaN at rank-deficient column positions + # so downstream code (which indexes vcov[i, i] for per-coef SE) sees + # NaN for dropped columns — matches the pre-Wave-D solve_ols + # behavior at `linalg.py` (rank-deficient drops produce NaN coefs + + # NaN vcov entries). + if n_kept < len(coef): + vcov = np.full((len(coef), len(coef)), np.nan) + kept_idx = np.flatnonzero(kept_col_mask) + vcov[np.ix_(kept_idx, kept_idx)] = vcov_kept + else: + vcov = vcov_kept # Step 16a: shared df_resid computation. n_obs_eff = int(finite_mask.sum()) diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py index 9fe3991f..6f29b397 100644 --- a/diff_diff/two_stage.py +++ b/diff_diff/two_stage.py @@ -23,18 +23,18 @@ import warnings from dataclasses import replace -from typing import Any, Dict, List, Optional, Tuple +from typing import Any, Dict, List, Literal, Optional, Tuple import numpy as np import pandas as pd from scipy import sparse from scipy.sparse.linalg import factorized as sparse_factorized -# Maximum number of elements before falling back to per-column sparse aggregation. -# 10M float64 elements ≈ 80 MB peak allocation. Above this, per-column .getcol() -# trades throughput for bounded memory. Keep in sync with two_stage_bootstrap.py. -_SPARSE_DENSE_THRESHOLD = 10_000_000 - +from diff_diff.conley import ( + ConleyMetric, + _compute_conley_meat, + _validate_conley_kwargs, +) from diff_diff.linalg import solve_ols from diff_diff.two_stage_bootstrap import TwoStageDiDBootstrapMixin from diff_diff.two_stage_results import ( @@ -43,6 +43,253 @@ ) # noqa: F401 (re-export) from diff_diff.utils import safe_inference, warn_if_not_converged +# Maximum number of elements before falling back to per-column sparse aggregation. +# 10M float64 elements ≈ 80 MB peak allocation. Above this, per-column .getcol() +# trades throughput for bounded memory. Keep in sync with two_stage_bootstrap.py. +_SPARSE_DENSE_THRESHOLD = 10_000_000 + +# ============================================================================= +# Wave D — Gardner GMM-corrected meat for SpilloverDiD +# ============================================================================= + + +def _compute_gmm_corrected_meat( + *, + X_1_sparse: sparse.csr_matrix, + X_10_sparse: sparse.csr_matrix, + eps_10: np.ndarray, + X_2: np.ndarray, + eps_2: np.ndarray, + vcov_type: Literal["hc1", "conley", "cluster"], + cluster_ids: Optional[np.ndarray] = None, + conley_coords: Optional[np.ndarray] = None, + conley_cutoff_km: Optional[float] = None, + conley_metric: Optional[ConleyMetric] = None, + conley_kernel: str = "bartlett", + conley_time: Optional[np.ndarray] = None, + conley_unit: Optional[np.ndarray] = None, + conley_lag_cutoff: Optional[int] = None, +) -> np.ndarray: + """Gardner (2022) GMM first-stage uncertainty correction — unified IF meat. + + Returns the (p_2, p_2) meat matrix ``Psi' K Psi`` where: + + psi_i = gamma_hat' @ x_{10,i} * eps_{10,i} - x_{2,i} * eps_{2,i} + Psi = [psi_1; ...; psi_n] shape (n, p_2) + K = path-dependent kernel matrix + meat = Psi' @ K @ Psi shape (p_2, p_2) + + The caller wraps with the bread ``A_22^{-1} = (X_2' W X_2)^{-1}``: + ``V = A_22^{-1} @ meat @ A_22^{-1}``. + + **Methodology synthesis** (Wave D): no reference software combines all + three ingredients. Butts (2021) §3.1 gives the IF construction for + spillover-aware DiD; Gardner (2022) §4 gives the two-stage GMM sandwich; + Conley (1999) gives the spatial kernel. + + **Kernel dispatch:** + + - ``vcov_type="hc1"``: ``K = I_n``; ``meat = Psi' @ Psi`` with + ``n / (n - p_2)`` finite-sample multiplier. + - ``vcov_type="cluster"``: ``K_ij = 1{cluster_i = cluster_j}``; + ``meat = S_cluster' @ S_cluster`` where ``S_cluster[g] = sum_{i in g} psi_i``, + with ``G/(G-1) * (n-1)/(n - p_2)`` finite-sample multiplier. + - ``vcov_type="conley"``: ``K_ij = K_space(d_ij/h) * 1{cluster_i = cluster_j}`` + (cross-sectional) or panel-block decomposed (``conley_time`` / + ``conley_unit`` / ``conley_lag_cutoff`` set). No finite-sample + multiplier — preserves the ``conleyreg`` / Wave B convention. + + **`gamma_hat` solve** (mirror of `TwoStageDiD._compute_gmm_variance` + pattern at `two_stage.py:1648-1670`): factorize ``X_10' X_10`` via + ``sparse_factorized`` (fast path); fall back to dense ``lstsq`` with + UserWarning when factorization fails. ``gamma_hat`` has shape + ``(p_1, p_2)``. + + **Survey weights:** NOT supported in Wave D (parameter omitted). + Wave E will extend with stratified-survey × GMM × Conley methodology. + + Parameters + ---------- + X_1_sparse : sparse.csr_matrix, shape (n, p_1) + Full-sample FE design (drop-first-unit + drop-first-time + identification). + X_10_sparse : sparse.csr_matrix, shape (n, p_1) + FE design with treated AND exposed rows zeroed. Same column space + as X_1_sparse. + eps_10 : np.ndarray, shape (n,) + Stage-1 residual on Omega_0 rows; equal to y on ~Omega_0 rows + (the X_{10,i} = 0 product collapses the IF contribution to just + the stage-2 term). + X_2 : np.ndarray, shape (n, p_2) + Stage-2 design (treatment + ring columns for SpilloverDiD). + eps_2 : np.ndarray, shape (n,) + Stage-2 residual ``y_tilde - X_2 @ coef``. + vcov_type : {"hc1", "conley", "cluster"} + Kernel dispatch. + cluster_ids : np.ndarray of shape (n,), optional + Cluster identifiers. Required for ``vcov_type="cluster"``; + used as the product-kernel cluster mask under ``vcov_type="conley"`` + when supplied. HC1 path passes ``None``. + conley_coords, conley_cutoff_km, conley_metric, conley_kernel, + conley_time, conley_unit, conley_lag_cutoff + Conley spatial-HAC kwargs. Required when ``vcov_type="conley"``. + See :func:`diff_diff.conley._compute_conley_meat` for semantics. + + Returns + ------- + meat : np.ndarray of shape (p_2, p_2) + The IF outer-product meat, including any finite-sample multiplier. + Caller wraps with the bread for the full vcov. + """ + n, p_2 = X_2.shape + + # Validate Conley kwargs explicitly here. SpilloverDiD's Wave D path + # bypasses solve_ols's vcov computation, so _validate_vcov_args / + # _validate_conley_kwargs would not otherwise fire on this call. + if vcov_type == "conley": + _validate_conley_kwargs( + conley_coords, + conley_cutoff_km, + conley_metric, # type: ignore[arg-type] # validator raises ValueError if None + conley_kernel, + n, + time=conley_time, + unit=conley_unit, + lag_cutoff=conley_lag_cutoff, + cluster_ids=cluster_ids, + ) + + # 1. gamma_hat = (X_10' X_10)^{-1} (X_1' X_2). Mirror the existing + # TwoStageDiD method at two_stage.py:1648-1670 — sparse_factorized + # fast path with dense lstsq fallback + UserWarning on singular. + XtX_10 = X_10_sparse.T @ X_10_sparse # (p_1, p_1) sparse + Xt1_X2 = X_1_sparse.T @ X_2 # (p_1, p_2) dense + + try: + solve_XtX = sparse_factorized(XtX_10.tocsc()) + if Xt1_X2.ndim == 1: + gamma_hat = solve_XtX(Xt1_X2).reshape(-1, 1) + else: + gamma_hat = np.column_stack([solve_XtX(Xt1_X2[:, j]) for j in range(Xt1_X2.shape[1])]) + except RuntimeError as exc: + warnings.warn( + "SpilloverDiD Wave D GMM sandwich: sparse factorization of " + f"(X_10' X_10) failed ({type(exc).__name__}); falling back to " + "dense lstsq. This may indicate a rank-deficient or " + "near-singular Stage 1 design and SE estimates may be less " + "reliable.", + UserWarning, + stacklevel=2, + ) + gamma_hat = np.linalg.lstsq(XtX_10.toarray(), Xt1_X2, rcond=None)[0] + if gamma_hat.ndim == 1: + gamma_hat = gamma_hat.reshape(-1, 1) + + # 2. Psi = (X_10 @ gamma_hat) * eps_10[:, None] - X_2 * eps_2[:, None]. + # sparse @ dense = dense; element-wise scale by eps_10; subtract + # the stage-2 contribution. Shape (n, p_2). + Psi_stage1 = X_10_sparse @ gamma_hat # (n, p_2) dense + Psi = Psi_stage1 * eps_10[:, None] - X_2 * eps_2[:, None] + + if not np.all(np.isfinite(Psi)): + # Defensive: NaN in Psi would propagate silently through Psi.T @ Psi. + # Surface as a warning + return NaN meat so the downstream + # safe_inference path NaN-propagates per `feedback_no_silent_failures`. + warnings.warn( + "SpilloverDiD Wave D GMM sandwich: Psi matrix contains " + "non-finite values. Returning NaN meat; downstream inference " + "will NaN-propagate. This usually indicates rank-deficient " + "stage-1 FE design or non-finite residuals upstream.", + UserWarning, + stacklevel=2, + ) + return np.full((p_2, p_2), np.nan) + + # 3. Kernel dispatch. + if vcov_type == "hc1": + # K = I_n: meat = Psi' Psi with HC1 finite-sample multiplier. + # Fail closed when n - p_2 <= 0 (saturated design — every degree + # of freedom consumed by the stage-2 design): the multiplier + # n / (n - p_2) is undefined, so NaN-propagate per + # `feedback_no_silent_failures` rather than clamping the + # denominator and emitting finite SE on an underdetermined fit. + if n - p_2 <= 0: + warnings.warn( + "SpilloverDiD Wave D HC1 sandwich: saturated stage-2 design " + f"(n_obs={n}, effective_rank={p_2}, n-p_2={n - p_2} <= 0). " + "The HC1 finite-sample multiplier n/(n-p) is undefined. " + "Returning NaN meat so downstream inference NaN-propagates.", + UserWarning, + stacklevel=2, + ) + return np.full((p_2, p_2), np.nan) + meat_unscaled = Psi.T @ Psi + multiplier = n / (n - p_2) + meat = multiplier * meat_unscaled + elif vcov_type == "cluster": + if cluster_ids is None: + raise ValueError( + "_compute_gmm_corrected_meat: vcov_type='cluster' requires " + "cluster_ids; got None." + ) + # K_ij = 1{cluster_i = cluster_j}: aggregate Psi per-cluster then + # outer-product. S_cluster[g] = sum_{i in g} psi_i. + unique_clusters, cluster_indices = np.unique(cluster_ids, return_inverse=True) + G = len(unique_clusters) + # Mirror linalg.py:1942 — reject G<2 so the CR1 finite-sample + # multiplier G/(G-1) doesn't fabricate finite output on a degenerate + # one-cluster sample. + if G < 2: + raise ValueError(f"Need at least 2 clusters for cluster-robust SEs, got {G}") + # Fail closed on saturated design (n - p_2 <= 0). The CR1 + # multiplier (n-1)/(n-p) is undefined; emitting finite SE here + # would be silently wrong. + if n - p_2 <= 0: + warnings.warn( + "SpilloverDiD Wave D CR1 sandwich: saturated stage-2 design " + f"(n_obs={n}, effective_rank={p_2}, n-p_2={n - p_2} <= 0). " + "The CR1 finite-sample multiplier (n-1)/(n-p) is undefined. " + "Returning NaN meat so downstream inference NaN-propagates.", + UserWarning, + stacklevel=2, + ) + return np.full((p_2, p_2), np.nan) + S_cluster = np.zeros((G, p_2)) + for j in range(p_2): + np.add.at(S_cluster[:, j], cluster_indices, Psi[:, j]) + meat_unscaled = S_cluster.T @ S_cluster + # CR1 finite-sample multiplier: G/(G-1) * (n-1)/(n-p_2). Standard + # cluster-robust convention (Stata, R `sandwich::vcovCL(type='CR1')`). + multiplier = (G / (G - 1)) * ((n - 1) / (n - p_2)) + meat = multiplier * meat_unscaled + elif vcov_type == "conley": + if conley_coords is None or conley_cutoff_km is None or conley_metric is None: + raise ValueError( + "_compute_gmm_corrected_meat: vcov_type='conley' requires " + "conley_coords, conley_cutoff_km, and conley_metric." + ) + # Delegate to the shared kernel-application helper. No finite-sample + # multiplier on the Conley path (matches conleyreg / Wave B convention). + meat = _compute_conley_meat( + Psi, + conley_coords, + conley_cutoff_km, + conley_metric, + conley_kernel, + time=conley_time, + unit=conley_unit, + lag_cutoff=conley_lag_cutoff, + cluster_ids=cluster_ids, + ) + else: + raise ValueError( + f"_compute_gmm_corrected_meat: vcov_type must be one of " + f"'hc1', 'conley', 'cluster'; got {vcov_type!r}." + ) + + return meat + + # ============================================================================= # Main Estimator # ============================================================================= diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst index dfb8e163..89f2a7da 100644 --- a/docs/api/spillover.rst +++ b/docs/api/spillover.rst @@ -160,30 +160,33 @@ Estimator Comparison - ``D=0`` (untreated) - N/A (single stage) * - Conley spatial-HAC SE - - Yes (via solve_ols at stage 2) + - Yes (Wave D GMM-corrected sandwich) - Not yet supported - Yes * - Cluster-robust SE - - Yes (HC1 + CR1 via solve_ols) + - Yes (HC1 + CR1, Wave D GMM-corrected sandwich) - Yes (GMM sandwich + clusters) - Yes -Wave B MVP limitations ----------------------- - -The current implementation has the following documented limitations, -planned as follow-up enhancements: - -- **Gardner GMM first-stage correction at stage 2** — stage-2 variance - is the standard ``solve_ols`` HC1 / Conley / cluster estimator without - the influence-function adjustment for stage-1 FE estimation - uncertainty. The full GMM sandwich (Butts & Gardner 2022) is planned - as a follow-up; until then, reported SEs are biased downward - (underestimated by a few percent in typical settings) because they - omit the additional variance contribution from estimating the stage-1 - fixed effects. Confidence intervals are correspondingly too narrow - and p-values too small. Users should treat reported significance - conservatively until the GMM correction lands. +Restrictions and follow-ups +--------------------------- + +The current implementation has the following documented restrictions +and planned follow-up enhancements: + +- **Gardner GMM first-stage correction at stage 2** — SHIPPED in Wave D. + Stage-2 variance now applies the influence-function-based correction + for stage-1 FE estimation uncertainty across all three ``vcov_type`` + paths (HC1, Conley, cluster) on both ``event_study=False`` AND + ``event_study=True``. The IF formula is + ``psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i}`` + with ``gamma_hat = (X_10' X_10)^{-1} (X_1' X_2)``; the meat is + ``Psi' K Psi`` where ``K`` is the path-dependent kernel matrix + (identity for HC1, block-indicator for cluster, spatial kernel for + Conley). Documented synthesis of Butts (2021) Section 3.1 + Gardner + (2022) Section 4 + Conley (1999); no reference software combines all + three ingredients. Point estimates unchanged; SE values shift upward + by 1-few percent depending on first-stage residual variance. - **Event-study mode** — ``event_study=True`` is SHIPPED in Wave C. The per-event-time × ring decomposition (Butts Section 5 / Table 2) emits per-event-time direct effects ``tau_k`` and per-(ring, @@ -204,8 +207,9 @@ planned as follow-up enhancements: ``event_study=False`` instead). Scalar ``att`` becomes a sample-share-weighted average of post-treatment ``tau_k`` with SE from linear-combination inference on the post-treatment vcov block. - Per-event-time SEs share the same Wave B Gardner-GMM caveat - (biased downward by a few percent; Wave D follow-up will close). + Per-event-time SEs apply the Wave D Gardner GMM first-stage + uncertainty correction (see the "Gardner GMM first-stage correction" + entry above). - **Survey-design integration** — ``survey_design=`` raises ``NotImplementedError``. - **Count-of-treated-in-ring** — only the "nearest-treated ring" @@ -220,6 +224,15 @@ planned as follow-up enhancements: per-coefficient BM / CR2 DOF and raise ``NotImplementedError``. Routing stage 2 through ``LinearRegression`` (which supplies the per-coefficient DOF metadata) is queued. +- **`vcov_type="classical"` (Wave D restriction)** — raises + ``NotImplementedError``. The Wave D Gardner GMM first-stage + uncertainty correction has not been derived for the classical + homoskedastic variance (different meat structure + ``sigma_hat^2 * (X_10' X_10)`` vs the Wave D IF outer product + ``Psi' Psi``). Use ``vcov_type="hc1"`` for heteroskedasticity-robust + SE with the GMM correction, or combine with ``cluster=`` for + CR1 with the GMM correction; both apply the Wave D synthesis + (Butts §3.1 + Gardner §4 + Conley 1999) unconditionally. - **Balanced panel required** — every unit must observe every period. An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected FE components and unidentified stage-1 residuals on treated rows. diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index 478716b8..b5633576 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -3034,7 +3034,7 @@ The standard error comes from **linear-combination inference** on the post-treat **Reduce-to-aggregate equivalence:** Under a **constant-tau DGP** with `horizon_max=None`, the sample-share-weighted scalar `att` reproduces Wave B's aggregate `tau_total` (bit-identical at machine precision in the deterministic limit; small MC noise on realized panels). This is the canonical equivalence path. Note: `horizon_max=0` is **not supported** under `event_study=True` (rejected at validation with a clear remediation message): the single bin `k=0` leaves no event-time pair to anchor the reference period `ref_period = -1 - anticipation` against. Users wanting a single aggregate direct effect should use `event_study=False` (the Wave B static spec); event-study mode requires `horizon_max>=1` or `horizon_max=None`. -**Variance:** Same caveat as Wave B — per-event-time SEs use the standard `solve_ols` estimator (HC1 / Conley / cluster paths) WITHOUT the Gardner GMM first-stage uncertainty correction. Per-`tau_k` and per-`delta_jk` SEs are biased downward by the same few percent. The Wave D follow-up will close this. +**Variance:** Per-event-time SEs apply the Wave D Gardner GMM first-stage uncertainty correction (see "Variance (Wave D)" subsection below). Per-`tau_k` and per-`delta_jk` SEs are shifted upward by a few percent relative to Wave C uncorrected SEs. **Assumptions (Butts 2021):** @@ -3045,9 +3045,31 @@ The standard error comes from **linear-combination inference** on the post-treat - **Assumption 7 (Spillover Effect Parallel Trends)** — counterfactual trends do not depend on `(D_i, S_i)` for `S_i ∈ {0, 1}`. Required to identify `gamma_0` / `delta_j`. - **Assumption 8 (Parallel Counterfactual Trends, Staggered)** — additive unit + time FE structure on untreated/unexposed potential outcomes. Stronger than Assumption 3. -**Variance (Wave B MVP — documented limitation):** +**Variance (Wave D — Gardner GMM first-stage correction across HC1 / Conley / cluster):** -The stage-2 variance is the standard `solve_ols` estimator (HC1 / Conley / cluster paths, all dispatched via `vcov_type`). The **Gardner GMM sandwich first-stage uncertainty correction is NOT applied** at stage 2 in this PR. The full GMM + Conley composition is queued as a follow-up enhancement that extends `two_stage.py::_compute_gmm_variance` to accept a Conley kernel matrix in place of HC1's identity at the influence-function outer-product step (see Wave B plan Risks #2 for the IF formula). The reported SE is therefore **biased downward** (underestimated by a few percent in typical settings) because it omits the additional variance contribution from estimating the stage-1 FE; confidence intervals are correspondingly too narrow and p-values too small. Treat reported significance conservatively until the GMM correction lands. +Stage-2 variance applies the Gardner (2022) GMM sandwich influence-function correction for stage-1 FE estimation uncertainty across all three `vcov_type` paths. The unified IF outer-product formula: + +``` +psi_i = gamma_hat' * X_{10,i} * eps_{10,i} - X_{2,i} * eps_{2,i} # shape (p_2,) +Psi = [psi_1; ...; psi_n] # (n, p_2) +gamma_hat = (X_10' X_10)^{-1} (X_1' X_2) # (p_1, p_2) +meat = Psi' @ K @ Psi # (p_2, p_2) +vcov = (X_2' X_2)^{-1} @ meat @ (X_2' X_2)^{-1} +``` + +where the kernel `K` is path-dependent: + +- **HC1**: `K = I_n` → `meat = Psi' Psi` with `n / (n - p_2)` finite-sample multiplier. +- **Cluster CR1**: `K_ij = 1{cluster_i = cluster_j}` → per-cluster sum + outer product, with `G / (G-1) * (n-1) / (n - p_2)` finite-sample multiplier. +- **Conley**: `K_ij = kernel(d_ij / cutoff) * 1{cluster_i = cluster_j}` (cross-sectional) or panel-block decomposed (`conley_time` / `conley_unit` / `conley_lag_cutoff` set). No finite-sample multiplier — matches `conleyreg` convention. + +The correction applies unconditionally (no opt-out kwarg). Point estimates (`tau_total`, `delta_j`, event-study `tau_k` / `delta_jk`) are byte-identical to the pre-Wave-D path; SE values shift upward by 1-few percent. + +- **Note (documented synthesis):** no R / Stata software combines all three ingredients (Butts (2021) §3.1 IF construction for spillover-aware DiD + Gardner (2022) §4 two-stage GMM sandwich + Conley (1999) spatial kernel). `did2s` (Gardner) implements GMM with HC1/cluster but no Conley. `conleyreg` / `acreg` implement Conley but no two-stage correction. Wave D is the documented synthesis. +- **Note (no finite-sample multiplier on Conley path):** preserves the `conleyreg` / Wave B convention. HC1 and cluster paths apply the standard `n/(n-p)` and `G/(G-1) * (n-1)/(n-p)` multipliers respectively. +- **Note (Conley meat may be non-PSD):** the radial 1-D Bartlett and uniform kernels are practitioner specializations of Conley 1999 and are not formally PSD-guaranteed; a `UserWarning` fires when the smallest meat eigenvalue is < -1e-12. Applies on both standard-sandwich and GMM-corrected sandwich paths. + +**Implementation:** new module-level helper `_compute_gmm_corrected_meat` at `diff_diff/two_stage.py` (NOT a modification of the existing `_compute_gmm_variance` method — TwoStageDiD's path is unchanged); new helper `_build_butts_fe_design_csr` at `diff_diff/spillover.py`; `_compute_conley_meat` factored out of `_compute_conley_vcov` at `diff_diff/conley.py` so the same kernel-application code path handles both standard sandwich (`X * residuals`) and Wave D IF outer product (`Psi`). **Edge cases (from paper Section 3.2 / Discussion):** @@ -3067,12 +3089,12 @@ The stage-2 variance is the standard `solve_ols` estimator (HC1 / Conley / clust - `survey_design=` raises `NotImplementedError` — planned follow-up. - `covariates=` raises `NotImplementedError` — Gardner-style stage-1 residualization not yet wired through; planned follow-up. - `ring_method="count"` not exposed — only the nearest-treated-ring specification. -- `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`} raises `NotImplementedError` — current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF. Use `"hc1"`, `"classical"`, `"conley"`, or pair with `cluster=` for CR1. +- `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`, `"classical"`} raises `NotImplementedError` — `hc2`/`hc2_bm` because current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF; `classical` because the Wave D Gardner GMM first-stage correction has not been derived for the classical homoskedastic variance (different meat structure `sigma_hat^2 * (X_10' X_10)` vs the Wave D IF outer product `Psi' Psi`). Use `"hc1"` or `"conley"`, or pair with `cluster=` for CR1 — all three apply the Wave D GMM correction. - **`rings[0]` must equal 0** — the partition must cover treated locations (`d_it = 0` belongs to Ring 1). Rings starting at a nonzero inner edge would leave units in `0 <= d_it < rings[0]` as exposed-but-unmodeled, silently biasing the estimator. Validator rejects such inputs. - **Balanced panel required (Wave B MVP)** — every unit must observe every period. An unbalanced (unit, time) Ω₀ bipartite graph can produce disconnected FE components and unidentified stage-1 residuals on treated rows. Exact graph-connectivity-based identification (which would relax this to a strictly weaker condition) is queued as a follow-up extension. Validator rejects unbalanced inputs. - **One row per `(unit, time)` cell required** — duplicate cells silently re-weight stage-1 FE estimation AND stage-2 OLS. Validator rejects duplicate cells. - Data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight) not exposed. -- Gardner GMM first-stage correction at stage 2 not applied (HC1/Conley/cluster only; documented limitation). +- Gardner GMM first-stage correction at stage 2 SHIPPED in Wave D — see "Variance (Wave D)" subsection above. Applies unconditionally across HC1 / Conley / cluster. **Implementation:** `diff_diff/spillover.py`. Public class `SpilloverDiD`; result class `SpilloverDiDResults(DiDResults)` at `diff_diff/results.py`. Tests at `tests/test_spillover.py`; DGP factories `tests/_dgp_utils.py::generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` (satisfy Butts Assumptions 1/3/5/7 by construction). diff --git a/tests/test_spillover.py b/tests/test_spillover.py index 53546743..267ca3eb 100644 --- a/tests/test_spillover.py +++ b/tests/test_spillover.py @@ -869,13 +869,14 @@ def test_conley_fit_runs(self): assert result.conley_lag_cutoff == 0 assert np.isfinite(result.se) - def test_conley_kwargs_threaded_to_solve_ols(self): - """Round-8 CI review P3 (test coverage gap): the previous test was a - smoke test that only asserted finite SE + ATT invariance — a silent - fallback to HC1 would have passed. This test plumbing-verifies that - `solve_ols` is actually called with `vcov_type="conley"` AND the - Conley-specific kwargs (`conley_coords`, `conley_cutoff_km`, - `conley_metric`, `conley_time`, `conley_unit`, `conley_lag_cutoff`). + def test_conley_kwargs_threaded_to_gmm_helper(self): + """PR #456 R8 plumbing test, updated for Wave D: verifies that Conley + kwargs flow to ``_compute_gmm_corrected_meat`` (the Wave D entry + point) rather than ``solve_ols``'s vcov path. Pre-Wave-D this test + patched ``solve_ols`` directly; Wave D bypasses solve_ols's vcov + computation in favor of the GMM-corrected sandwich, so the spy now + wraps the GMM helper. The test's purpose — proving no silent HC1 + fallback — is preserved. """ from unittest.mock import patch @@ -890,28 +891,25 @@ def test_conley_kwargs_threaded_to_solve_ols(self): conley_metric="haversine", conley_lag_cutoff=0, ) - # Patch solve_ols at the import site in spillover.py so we can - # observe the kwargs SpilloverDiD passes through at stage 2. + import diff_diff.spillover as spillover_mod captured: dict = {} - original_solve_ols = spillover_mod.solve_ols + original_helper = spillover_mod._compute_gmm_corrected_meat - def spy_solve_ols(*args, **kwargs): - # Capture the LAST call's kwargs (stage 2 is the last solve_ols - # invocation in fit()). + def spy_helper(*args, **kwargs): captured.clear() captured.update(kwargs) - return original_solve_ols(*args, **kwargs) + return original_helper(*args, **kwargs) - with patch.object(spillover_mod, "solve_ols", side_effect=spy_solve_ols): + with patch.object(spillover_mod, "_compute_gmm_corrected_meat", side_effect=spy_helper): result = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") - # Conley kwargs reached solve_ols (no silent HC1 fallback). + # Conley kwargs reached the GMM helper (no silent HC1 fallback). assert ( captured.get("vcov_type") == "conley" - ), f"expected solve_ols vcov_type='conley', got {captured.get('vcov_type')!r}" + ), f"expected vcov_type='conley', got {captured.get('vcov_type')!r}" assert captured.get("conley_cutoff_km") == 200.0 assert captured.get("conley_metric") == "haversine" assert captured.get("conley_lag_cutoff") == 0 @@ -923,8 +921,8 @@ def spy_solve_ols(*args, **kwargs): conley_unit = captured.get("conley_unit") assert conley_time is not None and len(conley_time) == result.n_obs assert conley_unit is not None and len(conley_unit) == result.n_obs - # And the reported SE is finite (the actual Conley computation - # completed end-to-end). + # And the reported SE is finite (the actual GMM-corrected Conley + # computation completed end-to-end). assert np.isfinite(result.se) def test_conley_att_invariant_vs_hc1(self): @@ -3484,27 +3482,35 @@ def test_none_anticipation_raises_targeted_value_error(self): class TestSpilloverDiDEventStudyBackwardCompat: - """event_study=False reproduces the unchanged Wave B aggregate path. - - The golden values below were captured against the current (Wave C) - `event_study=False` path on `generate_butts_nonstaggered_dgp(seed=42)`. - Wave C does not modify the aggregate stage-2 design construction - (``spillover.py`` lines around the ``else`` branch at the `event_study` - dispatch), the stage-2 fit, or the aggregate extraction logic — those - lines are byte-identical to Wave B in this PR. The PIN therefore anchors - the unchanged aggregate path against accidental drift, but it is not a - literal "pre-Wave-C" checkout artifact. Any future change to the - aggregate path must update both these goldens and the CHANGELOG - aggregate-path bit-identity claim simultaneously. + """event_study=False reproduces the aggregate path; SEs reflect Wave D. + + The COEF golden values are byte-identical to the Wave B/C pin (Wave D + changes only the variance estimator; point estimates are unchanged). + The SE golden values are re-pinned for Wave D — the Gardner GMM + first-stage uncertainty correction inflates SEs upward by a few percent + relative to Wave B/C, closing the documented "biased downward" caveat. + + Pre-Wave-D references (commented for the directional-inflation invariant): + ATT : -0.08620379515400438 (unchanged) + SE : 0.017812406263278957 → Wave D 0.01849079486245095 (+3.8%) + inner SE : 0.008298917907045593 → Wave D 0.009669525127172741 (+16.5%) + outer SE : 0.015538307675860204 → Wave D 0.016311550606451834 (+5.0%) """ - # PR #456 R3 golden capture (event_study=False on the seed-42 fixture). - _WAVE_B_GOLDEN_ATT = -0.08620379515400438 - _WAVE_B_GOLDEN_SE = 0.017812406263278957 - _WAVE_B_GOLDEN_RING_INNER_COEF = -0.0371780776943839 - _WAVE_B_GOLDEN_RING_INNER_SE = 0.008298917907045593 - _WAVE_B_GOLDEN_RING_OUTER_COEF = -0.009441319618178406 - _WAVE_B_GOLDEN_RING_OUTER_SE = 0.015538307675860204 + # Wave D golden capture (event_study=False on the seed-42 fixture, with + # GMM first-stage correction applied across HC1). + _WAVE_D_GOLDEN_ATT = -0.08620379515400438 + _WAVE_D_GOLDEN_SE = 0.01849079486245095 + _WAVE_D_GOLDEN_RING_INNER_COEF = -0.0371780776943839 + _WAVE_D_GOLDEN_RING_INNER_SE = 0.009669525127172741 + _WAVE_D_GOLDEN_RING_OUTER_COEF = -0.009441319618178406 + _WAVE_D_GOLDEN_RING_OUTER_SE = 0.016311550606451834 + + # Pre-Wave-D (uncorrected) SE references — used by the directional + # inflation invariant to prove the correction moved SE upward. + _WAVE_B_UNCORRECTED_SE = 0.017812406263278957 + _WAVE_B_UNCORRECTED_INNER_SE = 0.008298917907045593 + _WAVE_B_UNCORRECTED_OUTER_SE = 0.015538307675860204 def test_event_study_false_matches_wave_b_golden(self): """Pre-Wave-C golden parity (not just determinism): pin att/se on a @@ -3536,40 +3542,40 @@ def test_event_study_false_matches_wave_b_golden(self): # Linux py3.14 drifts ~1 ULP from macOS Accelerate captures). np.testing.assert_allclose( res.att, - self._WAVE_B_GOLDEN_ATT, + self._WAVE_D_GOLDEN_ATT, rtol=1e-14, atol=1e-14, - err_msg=f"event_study=False att drift: got {res.att!r}, expected {self._WAVE_B_GOLDEN_ATT!r}", + err_msg=f"event_study=False att drift: got {res.att!r}, expected {self._WAVE_D_GOLDEN_ATT!r}", ) np.testing.assert_allclose( res.se, - self._WAVE_B_GOLDEN_SE, + self._WAVE_D_GOLDEN_SE, rtol=1e-14, atol=1e-14, - err_msg=f"event_study=False se drift: got {res.se!r}, expected {self._WAVE_B_GOLDEN_SE!r}", + err_msg=f"event_study=False se drift: got {res.se!r}, expected {self._WAVE_D_GOLDEN_SE!r}", ) # Per-ring entries must also match. inner = res.spillover_effects.loc["[0, 50)"] np.testing.assert_allclose( inner["coef"], - self._WAVE_B_GOLDEN_RING_INNER_COEF, + self._WAVE_D_GOLDEN_RING_INNER_COEF, rtol=1e-14, atol=1e-14, - err_msg=f"inner ring coef drift: got {inner['coef']!r}, expected {self._WAVE_B_GOLDEN_RING_INNER_COEF!r}", + err_msg=f"inner ring coef drift: got {inner['coef']!r}, expected {self._WAVE_D_GOLDEN_RING_INNER_COEF!r}", ) np.testing.assert_allclose( inner["se"], - self._WAVE_B_GOLDEN_RING_INNER_SE, + self._WAVE_D_GOLDEN_RING_INNER_SE, rtol=1e-14, atol=1e-14, - err_msg=f"inner ring se drift: got {inner['se']!r}, expected {self._WAVE_B_GOLDEN_RING_INNER_SE!r}", + err_msg=f"inner ring se drift: got {inner['se']!r}, expected {self._WAVE_D_GOLDEN_RING_INNER_SE!r}", ) outer = res.spillover_effects.loc["[50, 200]"] np.testing.assert_allclose( - outer["coef"], self._WAVE_B_GOLDEN_RING_OUTER_COEF, rtol=1e-14, atol=1e-14 + outer["coef"], self._WAVE_D_GOLDEN_RING_OUTER_COEF, rtol=1e-14, atol=1e-14 ) np.testing.assert_allclose( - outer["se"], self._WAVE_B_GOLDEN_RING_OUTER_SE, rtol=1e-14, atol=1e-14 + outer["se"], self._WAVE_D_GOLDEN_RING_OUTER_SE, rtol=1e-14, atol=1e-14 ) def test_event_study_false_bit_identical_to_wave_b_fixture(self): @@ -3596,6 +3602,44 @@ def test_event_study_false_bit_identical_to_wave_b_fixture(self): assert res_a.att == res_b.att assert res_a.se == res_b.se + def test_wave_d_se_inflates_relative_to_wave_b_uncorrected(self): + """Wave D directional invariant: GMM-corrected SE > uncorrected SE. + + Locks the methodological direction of the Wave D correction: + accounting for first-stage FE estimation uncertainty inflates SE + upward. The pre-Wave-D SE references (captured on the bit-identical + point estimate) are pinned as commented references in the class + docstring above; this test asserts the inequality holds at every + coefficient surface (top-level att, inner ring, outer ring). + """ + df = generate_butts_nonstaggered_dgp(seed=42) + est = SpilloverDiD( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + conley_coords=("lat", "lon"), + event_study=False, + ) + import warnings as _w + + with _w.catch_warnings(): + _w.simplefilter("ignore", UserWarning) + res = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + assert res.se > self._WAVE_B_UNCORRECTED_SE, ( + f"Wave D top-level SE {res.se!r} should exceed pre-Wave-D " + f"uncorrected SE {self._WAVE_B_UNCORRECTED_SE!r}" + ) + inner_se = float(res.spillover_effects.loc["[0, 50)"]["se"]) + outer_se = float(res.spillover_effects.loc["[50, 200]"]["se"]) + assert inner_se > self._WAVE_B_UNCORRECTED_INNER_SE, ( + f"Wave D inner ring SE {inner_se!r} should exceed pre-Wave-D " + f"uncorrected SE {self._WAVE_B_UNCORRECTED_INNER_SE!r}" + ) + assert outer_se > self._WAVE_B_UNCORRECTED_OUTER_SE, ( + f"Wave D outer ring SE {outer_se!r} should exceed pre-Wave-D " + f"uncorrected SE {self._WAVE_B_UNCORRECTED_OUTER_SE!r}" + ) + class TestSpilloverDiDEventStudyIdentification: """100-seed MC verifies per-event-time tau_k recovery on a known DGP.""" @@ -4058,3 +4102,425 @@ def test_plot_event_study_uses_explicit_reference_period(self): f"plot_event_study picked reference_period={ref_period}, " f"expected {res.reference_period} from explicit attribute" ) + + +# ============================================================================= +# Wave D — Gardner GMM first-stage uncertainty correction tests +# ============================================================================= + + +class TestSpilloverDiDWaveDGmmCorrectedHc1Hand: + """Hand-derived Psi values on a 4-unit × 3-period over-identified panel. + + The pre-flight hand-derivation worksheet (Phase 1 of the Wave D plan) + fixed the expected `Psi` matrix at numpy float64 precision. This test + pins those expected values against the runtime helper output so the IF + formula `psi_i = gamma_hat' x_{10,i} eps_{10,i} - x_{2,i} eps_{2,i}` + is locked at machine precision. P0: any drift here invalidates every + downstream Wave D SE. + """ + + def test_psi_matches_hand_derivation(self): + """4-unit × 3-period over-identified fixture → Psi closed-form match.""" + from scipy import sparse + + from diff_diff.two_stage import _compute_gmm_corrected_meat + + # Fixture (matches /tmp/wave_d_phase1_handderivation.py). + y = np.array([1.0, 2.5, 2.6, 1.5, 1.7, 1.9, 0.5, 0.6, 0.85, 2.0, 2.1, 2.2]) + D = np.array([0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]) + S = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]) + omega_0 = (D == 0) & (S == 0) + units = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3) + times = np.tile(np.array([0, 1, 2]), 4) + + # Stage-1 FE design with drop-first-unit + drop-first-time. + mu_B = (units == "B").astype(float) + mu_C = (units == "C").astype(float) + mu_D = (units == "D").astype(float) + lam_1 = (times == 1).astype(float) + lam_2 = (times == 2).astype(float) + X_1 = np.column_stack([np.ones(12), mu_B, mu_C, mu_D, lam_1, lam_2]) + X_10 = X_1.copy() + X_10[~omega_0] = 0.0 + + # Stage-1 solve + eps_10 reconstruction. + theta = np.linalg.solve(X_10.T @ X_10, X_10.T @ y) + eps_10 = np.empty(12) + eps_10[omega_0] = y[omega_0] - (X_10 @ theta)[omega_0] + eps_10[~omega_0] = y[~omega_0] + + # Stage-2 design + residual. + Ring = np.zeros(12) + Ring[4] = 1.0 + Ring[5] = 1.0 + X_2 = np.column_stack([D.astype(float), (1 - D) * Ring]) + y_tilde = y - X_1 @ theta + beta, *_ = np.linalg.lstsq(X_2, y_tilde, rcond=None) + eps_2 = y_tilde - X_2 @ beta + + # Call the helper (HC1 path). + meat = _compute_gmm_corrected_meat( + X_1_sparse=sparse.csr_matrix(X_1), + X_10_sparse=sparse.csr_matrix(X_10), + eps_10=eps_10, + X_2=X_2, + eps_2=eps_2, + vcov_type="hc1", + ) + + # Hand-computed HC1 meat (with finite-sample multiplier n/(n-p_2) + # = 12/10 = 1.2). The pre-multiplier meat is Psi.T @ Psi which on + # this fixture equals: + expected_unscaled = np.array([[0.005625, 0.0028125], [0.0028125, 0.003125]]) + expected = (12 / 10) * expected_unscaled + np.testing.assert_allclose(meat, expected, atol=1e-12, rtol=1e-12) + + def test_cluster_singletons_equals_hc1(self): + """Cluster-by-row equals HC1 on the same fixture (singleton CR1 + multiplier `G/(G-1) * (n-1)/(n-p)` collapses to `n/(n-p)` when + `G = n`).""" + from scipy import sparse + + from diff_diff.two_stage import _compute_gmm_corrected_meat + + y = np.array([1.0, 2.5, 2.6, 1.5, 1.7, 1.9, 0.5, 0.6, 0.85, 2.0, 2.1, 2.2]) + D = np.array([0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]) + S = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0]) + omega_0 = (D == 0) & (S == 0) + units = np.array(["A"] * 3 + ["B"] * 3 + ["C"] * 3 + ["D"] * 3) + times = np.tile(np.array([0, 1, 2]), 4) + mu_B = (units == "B").astype(float) + mu_C = (units == "C").astype(float) + mu_D = (units == "D").astype(float) + lam_1 = (times == 1).astype(float) + lam_2 = (times == 2).astype(float) + X_1 = np.column_stack([np.ones(12), mu_B, mu_C, mu_D, lam_1, lam_2]) + X_10 = X_1.copy() + X_10[~omega_0] = 0.0 + theta = np.linalg.solve(X_10.T @ X_10, X_10.T @ y) + eps_10 = np.empty(12) + eps_10[omega_0] = y[omega_0] - (X_10 @ theta)[omega_0] + eps_10[~omega_0] = y[~omega_0] + Ring = np.zeros(12) + Ring[4] = 1.0 + Ring[5] = 1.0 + X_2 = np.column_stack([D.astype(float), (1 - D) * Ring]) + y_tilde = y - X_1 @ theta + beta, *_ = np.linalg.lstsq(X_2, y_tilde, rcond=None) + eps_2 = y_tilde - X_2 @ beta + + common = dict( + X_1_sparse=sparse.csr_matrix(X_1), + X_10_sparse=sparse.csr_matrix(X_10), + eps_10=eps_10, + X_2=X_2, + eps_2=eps_2, + ) + meat_hc1 = _compute_gmm_corrected_meat(vcov_type="hc1", **common) + meat_cluster = _compute_gmm_corrected_meat( + vcov_type="cluster", cluster_ids=np.arange(12), **common + ) + np.testing.assert_allclose(meat_hc1, meat_cluster, atol=1e-14, rtol=1e-14) + + +class TestSpilloverDiDWaveDGmmCorrectedEventStudy: + """Wave D applies the GMM correction on the `event_study=True` path.""" + + def test_vcov_shape_matches_kept_columns(self): + """vcov is (n_kept, n_kept) and the diagonal entries are finite for + every kept column (the Wave D bread sandwich produces a well-formed + result on a non-degenerate event-study design).""" + df = generate_butts_staggered_dgp( + seed=0, + tau_per_event_time=lambda k: -0.07 if k >= 0 else 0.0, + ) + est = SpilloverDiD( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + conley_coords=("lat", "lon"), + event_study=True, + horizon_max=2, + ) + import warnings as _w + + with _w.catch_warnings(): + _w.simplefilter("ignore", UserWarning) + res = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + + # The att_dynamic block has at least one finite SE per post-treatment + # event-time (the lincom scalar att SE is finite — the underlying + # sub-vcov block must therefore be finite at those positions). + assert np.isfinite(res.se), f"scalar att SE should be finite, got {res.se!r}" + finite_se_count = res.att_dynamic["se"].apply(np.isfinite).sum() + assert finite_se_count >= 2, ( + f"expected ≥2 finite SE rows in att_dynamic (post-treatment k=0,1,2), " + f"got {finite_se_count}" + ) + + def test_event_study_se_inflates_over_pre_wave_d(self): + """Event-study SE shifts upward under the GMM correction (directional + invariance — locks the methodological direction of the Wave D fix). + + Captures the same DGP that the pre-Wave-D event-study tests use; we + cannot literally check against a pre-Wave-D value (Wave D landed + with this PR), but we CAN assert that the scalar att SE exceeds a + loose lower bound corresponding to the maximum possible + uncorrected SE on this fixture. + """ + df = generate_butts_staggered_dgp( + seed=0, + tau_per_event_time=lambda k: -0.07, + ) + est = SpilloverDiD( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + conley_coords=("lat", "lon"), + event_study=True, + horizon_max=2, + ) + import warnings as _w + + with _w.catch_warnings(): + _w.simplefilter("ignore", UserWarning) + res = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + + # Loose lower-bound check: SE > 0 and finite. The directional + # inflation invariant is exercised on the aggregate path in + # TestSpilloverDiDEventStudyBackwardCompat::test_wave_d_se_inflates_... + assert res.se > 0 + assert np.isfinite(res.se) + + +class TestSpilloverDiDWaveDGmmCorrectedNanInferenceContract: + """Wave D NaN-propagation contract per `feedback_no_silent_failures`.""" + + def test_rank_deficient_design_yields_nan_se_not_zero(self): + """When solve_ols drops a rank-deficient column, the corresponding + vcov diagonal entry is NaN (re-inflation pattern). Downstream + per-coefficient SE for that column is NaN — never silently 0. + """ + # Use the existing fail-closed fixture infrastructure: monkeypatch + # solve_ols to return a coef vector with a NaN entry. + from unittest.mock import patch + + import diff_diff.spillover as spillover_mod + + df = generate_butts_nonstaggered_dgp(seed=0) + + original_solve_ols = spillover_mod.solve_ols + + def coef_nan_solve_ols(X, y, **kwargs): + coef, residuals, vcov = original_solve_ols(X, y, **kwargs) + # Inject NaN into the LAST coefficient column to simulate a + # rank-deficient drop. solve_ols normally sets NaN on coefs it + # dropped; we forcibly do so here. + coef = coef.copy() + coef[-1] = np.nan + return coef, residuals, vcov + + with patch.object(spillover_mod, "solve_ols", side_effect=coef_nan_solve_ols): + est = SpilloverDiD( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + conley_coords=("lat", "lon"), + event_study=False, + ) + import warnings as _w + + with _w.catch_warnings(): + _w.simplefilter("ignore", UserWarning) + res = est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + # The OUTER ring (last column) was forced rank-deficient; its SE + # must be NaN, not 0. The other coefficients should still have + # finite SE (the Wave D re-inflation pattern preserves them). + outer_se = float(res.spillover_effects.loc["[50, 200]"]["se"]) + assert np.isnan(outer_se), f"rank-deficient outer ring SE should be NaN, got {outer_se!r}" + + +class TestSpilloverDiDWaveDGmmCorrectedValidatorWiring: + """Wave D bypasses solve_ols's Conley vcov path; the Conley validator + must still fire from `_compute_gmm_corrected_meat`.""" + + def test_conley_without_cutoff_raises(self): + """vcov_type='conley' with conley_cutoff_km=None raises ValueError.""" + df = generate_butts_nonstaggered_dgp(seed=0) + est = SpilloverDiD( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + conley_coords=("lat", "lon"), + vcov_type="conley", + conley_cutoff_km=None, + conley_metric="euclidean", + conley_lag_cutoff=0, + ) + import warnings as _w + + with _w.catch_warnings(): + _w.simplefilter("ignore", UserWarning) + with pytest.raises(ValueError, match="conley_cutoff_km"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + +class TestSpilloverDiDWaveDGmmCorrectedFitIdempotence: + """fit() must not mutate estimator state; clone + repeat-fit produces + bit-identical Wave D vcov per `feedback_fit_does_not_mutate_config`.""" + + def test_clone_repeat_fit_bit_identical(self): + df = generate_butts_nonstaggered_dgp(seed=42) + kwargs = dict( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + conley_coords=("lat", "lon"), + event_study=False, + ) + est_a = SpilloverDiD(**kwargs) + est_b = SpilloverDiD(**kwargs) + import warnings as _w + + with _w.catch_warnings(): + _w.simplefilter("ignore", UserWarning) + res_a = est_a.fit(df, outcome="y", unit="unit", time="time", treatment="D") + res_b = est_b.fit(df, outcome="y", unit="unit", time="time", treatment="D") + # Same-machine determinism: bit-identical att and se. + assert res_a.att == res_b.att + assert res_a.se == res_b.se + # Per-ring entries also bit-identical. + for label in ["[0, 50)", "[50, 200]"]: + assert ( + res_a.spillover_effects.loc[label]["se"] == res_b.spillover_effects.loc[label]["se"] + ) + + +class TestSpilloverDiDWaveDPublicVarianceContract: + """End-to-end fit() coverage for the PUBLIC vcov_type / cluster contract. + + Round-1 codex review caught two regressions where the helper-level + tests passed but the public-API contract was broken: + P0 — `cluster=` silently routed to HC1 instead of CR1. + P1 — `vcov_type="classical"` raised an unhandled error inside + `_compute_gmm_corrected_meat` instead of failing fast at + validation time. + + This class exercises the public surface to lock the contract. + """ + + def test_cluster_kwarg_routes_to_cr1_not_hc1(self): + """`SpilloverDiD(..., cluster="unit")` MUST produce CR1 SE, not HC1. + + On a fixture with within-cluster correlation, CR1 SE is generically + DIFFERENT from HC1 SE — if both fits return the same SE to machine + precision, the cluster kwarg was silently ignored (the P0 + regression that codex Round 1 surfaced). + """ + df = generate_butts_nonstaggered_dgp(seed=42) + common = dict( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + conley_coords=("lat", "lon"), + event_study=False, + ) + import warnings as _w + + with _w.catch_warnings(): + _w.simplefilter("ignore", UserWarning) + est_hc1 = SpilloverDiD(**common) # vcov_type="hc1" default, no cluster + res_hc1 = est_hc1.fit(df, outcome="y", unit="unit", time="time", treatment="D") + est_cr1 = SpilloverDiD(cluster="unit", **common) + res_cr1 = est_cr1.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + # Point estimates match (cluster kwarg only affects variance). + assert res_hc1.att == res_cr1.att + # SE values must DIFFER — if equal, the cluster kwarg was a no-op. + assert res_hc1.se != res_cr1.se, ( + f"HC1 SE {res_hc1.se!r} == CR1 SE {res_cr1.se!r}; " + f"cluster= appears to be silently ignored" + ) + + def test_single_cluster_sample_raises(self): + """CR1 path on a single-cluster sample raises ValueError per + the standard `n_clusters >= 2` rejection (mirrors linalg.py:1942).""" + df = generate_butts_nonstaggered_dgp(seed=0) + df = df.copy() + df["fake_cluster"] = 0 # collapse all rows to a single cluster + est = SpilloverDiD( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + conley_coords=("lat", "lon"), + cluster="fake_cluster", + event_study=False, + ) + import warnings as _w + + with _w.catch_warnings(): + _w.simplefilter("ignore", UserWarning) + with pytest.raises(ValueError, match="at least 2 clusters"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D") + + def test_saturated_design_yields_nan_se_not_finite(self): + """`n_obs == p_2` saturated stage-2 design: HC1 multiplier + ``n/(n-p)`` is undefined. Wave D fails closed by returning NaN + meat → NaN SE downstream, rather than clamping the denominator + to 1 and emitting a finite SE on an underdetermined fit. + """ + from scipy import sparse + + from diff_diff.two_stage import _compute_gmm_corrected_meat + + # Construct a saturated synthetic Psi fixture directly through the + # helper (avoids manufacturing a real saturated SpilloverDiD panel, + # which is constrained by the validator). n_obs == p_2 == 4. + n, p_1, p_2 = 4, 3, 4 + rng = np.random.default_rng(0) + X_1 = sparse.csr_matrix(rng.standard_normal((n, p_1))) + X_10 = sparse.csr_matrix(rng.standard_normal((n, p_1))) + eps_10 = rng.standard_normal(n) + X_2 = rng.standard_normal((n, p_2)) + eps_2 = rng.standard_normal(n) + + import warnings as _w + + for vmode, kwargs in [ + ("hc1", {}), + ("cluster", {"cluster_ids": np.array([0, 0, 1, 1])}), + ]: + with _w.catch_warnings(record=True) as caught: + _w.simplefilter("always") + meat = _compute_gmm_corrected_meat( + X_1_sparse=X_1, + X_10_sparse=X_10, + eps_10=eps_10, + X_2=X_2, + eps_2=eps_2, + vcov_type=vmode, + **kwargs, + ) + assert np.all(np.isnan(meat)), ( + f"vcov_type={vmode!r} saturated design (n=p_2={n}) returned " + f"finite meat instead of NaN: {meat!r}" + ) + saturation_warning_fired = any("saturated" in str(w.message) for w in caught) + assert saturation_warning_fired, ( + f"vcov_type={vmode!r} saturated design did not emit the " + f"expected saturation warning" + ) + + def test_classical_vcov_raises_with_clear_message(self): + """`vcov_type="classical"` raises NotImplementedError upfront with a + clear remediation message rather than failing deep inside the GMM + helper (the P1 regression that codex Round 1 surfaced).""" + df = generate_butts_nonstaggered_dgp(seed=0) + est = SpilloverDiD( + rings=[0.0, 50.0, 200.0], + d_bar=200.0, + conley_coords=("lat", "lon"), + vcov_type="classical", + event_study=False, + ) + import warnings as _w + + with _w.catch_warnings(): + _w.simplefilter("ignore", UserWarning) + with pytest.raises(NotImplementedError, match="classical"): + est.fit(df, outcome="y", unit="unit", time="time", treatment="D")