diff --git a/CHANGELOG.md b/CHANGELOG.md
index 6bac2cce..efa987f5 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Added
- **ContinuousDiD methodology-review-tracker promotion.** Tracker row flipped **In Progress** → **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR #473). REGISTRY `## ContinuousDiD` gains a formal Deviations block consolidating the boundary-knots deviation from R `contdid` v0.1.0 (`range(dose)` vs `range(dvals)` — library avoids extrapolation), the `bspline_derivative` derivative-failure `UserWarning` (Phase 2 axis-C #12), the `+inf` → `0` never-treated recoding warning, and the zero-`first_treat`+nonzero-`dose` force-zeroing warning (both axis-E silent-coercion fixes) into a single AI-review-recognized labeled surface. R cross-language coverage for ContinuousDiD runs at relative tolerance across two surfaces: (a) **scalar parity with raw R `cont_did` / `pte_default`** at 1% on overall ATT for all 6 benchmarks and on overall ACRT for benchmarks 4-5 (benchmark 6 is event-study, scalar `overall_att` only); (b) **harmonized boundary-knot-normalized curve parity** with R-side ATT(d) / ACRT(d) reconstructed under `Boundary.knots = range(treated_doses)` (matching the library) on benchmarks 1-3 via the benchmark harness — `_run_r_contdid` does the R-side rebuild at `tests/test_methodology_continuous_did.py:333-367`, and `_compare_with_r` orchestrates the Python-vs-R comparison at `:395-459` — max ATT(d) at 1% and max ACRT(d) at 2%. NOT bit-exact (`atol=1e-8`) like HAD — the boundary-knots deviation precludes algorithmic bit-equality on aggregated dose-response curves. Surface (a) is direct raw-package parity; surface (b) is reconstructed-basis parity because raw `contdid` curves use `range(dvals)`. No source code changes, no new tests, no new docstrings — consolidation only against the existing 15 methodology tests (`tests/test_methodology_continuous_did.py`), 80 unit tests (`tests/test_continuous_did.py`), and `docs/methodology/continuous-did.md` theory note. `METHODOLOGY_REVIEW.md` ContinuousDiD row promoted **In Progress** → **Complete**.
- **`SpilloverDiD(vcov_type="conley", survey_design=...)` integration via stratified-Conley sandwich on PSU totals (Wave E.2).** Lifts the Wave E.1 `NotImplementedError` (`spillover.py:2201` upfront, `two_stage.py:217` helper-level) and adds spatial-HAC + design-based variance for the previously deferred composition. **Documented synthesis** of Conley (1999) spatial-HAC × Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) × Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all three ingredients on a two-stage influence function. **Mechanical composition (panel-aware):** preserves the library's existing `conley_lag_cutoff = 0` semantic at `diff_diff.conley._compute_conley_meat` ("within-period spatial only — exclude cross-period spatial pairs") by looping over periods. For each period `t`, SpilloverDiD's per-obs Hájek-weighted Wave D IF `psi_i` is aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i` (via `np.add.at`); per-PSU spatial centroids are panel-constant (mean of per-observation `conley_coords` within each PSU, vectorized `np.add.at` sums / `np.bincount` counts); for each stratum the within-stratum sandwich is `M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} K(d(centroid_j, centroid_k) / conley_cutoff_km) * (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'`, where K is the Bartlett kernel (SpilloverDiD currently exposes Bartlett only and hardcodes it; the survey helper accepts `"uniform"` too but exposing that on the SpilloverDiD constructor is a separate follow-up) and `d` is haversine / euclidean / callable per `ConleyMetric`. Cross-stratum kernel weights are exactly zero by sampling design (strata are independence partitions). Total meat is `sum_t sum_h M_h_t`. Cross-period spatial pairs are excluded by construction — the per-period loop matches the library's panel Conley contract exactly. **Reduction semantics (load-bearing for tests):** the orchestrator's panel-aware meat equals `sum_t` of per-period within-stratum stratified-Conley sandwiches on per-period PSU totals (pinned at `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`); single stratum (H = 1, FPC = inf) reduces to `sum_t` plain Conley sandwich on per-period PSU totals (NOT on time-collapsed totals). **Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (parallel to existing `_compute_stratified_meat_from_psu_scores` 3-tuple `(meat, variance_computed, legitimate_zero_count)` contract; per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(scores=centered, coords=psu_coords_h, ...)` in cross-sectional mode); new dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to existing `_compute_binder_tsl_meat`, performs per-obs Psi → PSU aggregation + centroid derivation + dispatch to survey helper, intentionally drops `cluster_ids` at the dispatch boundary — see Restrictions). `_compute_gmm_corrected_meat` conley branch extended with `if resolved_survey is not None` routing to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D. **Singleton-stratum `lonely_psu="adjust"` parity:** the survey helper mirrors the Binder helper's `continue` to skip the FPC scale on singleton strata (with `n_h = 1` the scale `n_h / (n_h - 1)` would divide by zero); the degenerate one-PSU kernel `K = [[K(0)]] = [[1.0]]` reduces to `centered.T @ centered`, matching Binder's singleton-adjust output. **Saturated `df_survey = 0` NaN-fail:** mirrors Wave E.1 (`_compute_stratified_conley_meat` returns NaN meat with `UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..." so callers can `pytest.warns(UserWarning, match="Wave E.2 stratified-Conley")`). **Public surface restrictions:** replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits Wave E.1 gate; per-replicate full refit is separate follow-up scope); `cluster=
+ survey_design.psu + vcov_type="conley"` coerces `cluster=` to PSU per Wave E.1's warn-and-use-PSU pattern (the Conley cluster product kernel becomes a no-op after PSU aggregation, so `cluster_ids` is intentionally not threaded into the inner Conley kernel call — every PSU is its own cluster post-aggregation, which would zero all cross-PSU pairs); LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley "Phase 5" roadmap, not Wave E); DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", survey_design=)` requires `_APPLICABILITY` / `_PT_METHOD` registration (separate Wave F PR). **Tests:** new `TestSpilloverDiDWaveE2ConleySurveyDesign` and `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` classes in `tests/test_spillover.py` (bit-identical no-survey fallback; panel-aware per-period sum invariant on the orchestrator + helper composition; hand-computation methodology anchor; single-stratum ≡ plain Conley on PSU totals; cross-stratum independence as a unit test on the survey helper with interleaved cross-stratum centroids; Binder vs Conley singleton-adjust FPC skip parity; lonely-PSU sensitivity across three modes; FPC large ≡ no-FPC and FPC = n_h zeros stratum; saturated NaN-fail with `pytest.warns(match="Wave E.2 stratified-Conley")`; replicate-weight + non-pweight rejections; cluster warn-and-use-PSU; fit idempotency; `finite_mask` survey-array subsetting; no-PSU coverage — weights-only `SurveyDesign(weights=...)`, strata-only `SurveyDesign(weights=..., strata=...)`, and a per-period re-index unit invariant pinning that no cross-period spatial pairs leak into the meat on implicit-PSU layouts; event-study path on both `is_staggered=True`/`False` branches per `feedback_cohort_loop_trigger_cache_both_branches`; drift goldens at `rtol=1e-12 / atol=1e-14`). The pre-existing `tests/test_spillover.py::test_fit_conley_plus_survey_design_not_implemented` Wave E.1-era gate-assertion test is removed (replaced by the positive-path tests above). Wave E.1 entry's "Public surface restrictions" bullet updated to past-tense the conley+survey gate reference.
+- **`SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` panel-block composition via spatial + serial Bartlett HAC (Wave E.2 follow-up).** Lifts the Wave E.2 upfront `NotImplementedError` at `spillover.py:2210` and extends the panel-aware stratified-Conley sandwich (cross-sectional `lag=0` shipped in Wave E.2) with a within-PSU serial Bartlett HAC over time (Newey-West 1987 form). **Documented synthesis** of Wave E.2's panel-aware Conley × Binder TSL × Wave D Gardner GMM composition with Newey-West (1987) serial Bartlett HAC, matching the no-survey panel-block decomposition at `diff_diff.conley._compute_conley_meat` (Conley 1999 + Newey-West 1987 separable form, NOT Driscoll-Kraay 2D-HAC). The composition is `meat = meat_spatial + meat_serial` with disjoint index sets: spatial is the shipped Wave E.2 per-period within-stratum sandwich on PSU totals; serial is the new within-PSU sum `meat_serial_h = FPC_h_panel * sum_{g in stratum h} sum_{|t-s| <= L, t != s, both periods present for PSU g} (1 - |t-s|/(L+1)) * S_centered_t[g] @ S_centered_s[g]'` where `S_centered_t[g] = S_psu_t[g] - S_bar_h(g)_t` is per-period within-stratum centered (Binder TSL form — matches the spatial helper's centering exactly), and `|t-s|` uses panel-wide dense time codes (mirrors `conley.py:940` R deviation that matches R `conleyreg::time_dist`). Serial Bartlett kernel is hardcoded regardless of `conley_kernel` (the user-selected kernel governs the spatial term only). **FPC convention (panel-wide per-stratum):** standalone Newey-West composition on stratified clusters — the serial sum is a PANEL-level construct, so the cluster set is the panel-wide PSU set in stratum h; FPC denominator uses `n_h_panel = |unique PSUs in stratum h across active sample|`, NOT per-period `n_h_t`. The spatial term keeps its existing per-period FPC unchanged. For balanced panels the two FPC denominators converge; the difference surfaces under unbalanced panels. Citation chain: Binder (1983) for the FPC factor form, Gerber (2026) Prop 1 for the Binder TSL composition with two-stage IF, Newey-West (1987) for the serial Bartlett kernel weights, Conley (1999) for the spatial kernel and panel-block decomposition (deliberately NOT by analogy to the Binder helper's cross-sectional per-stratum FPC convention). **Centering asymmetry vs no-survey reference:** the no-survey panel-block path at `conley.py:949-965` uses RAW scores for the serial term because it assumes `E[scores] = 0` under correct specification; the survey-weighted Binder TSL form centers explicitly (textbook stratified-cluster sandwich). Using raw scores in the survey case would inflate variance by twice the squared per-period stratum mean and would NOT reduce to the cross-sectional Wave E.2 form at lag=0. **Reduction semantics (load-bearing for tests):** `conley_lag_cutoff = 0` or `None` produces bit-identical ATT and scalar SE to shipped Wave E.2 (orchestrator skips the serial helper invocation; the spatial loop + saturation guard + new PSD/finite guard still run on the spatial-only meat). `assert_array_equal` regression pin at `test_a` covers user-visible ATT + scalar SE; `test_a2` mock-spy independently asserts the serial helper is NOT invoked at `lag=0`. The meat matrix itself is not exposed on `SpilloverDiDResults`, so full meat-matrix equality is implied (not asserted); `conley_time is None` or `T = 1` short-circuits the serial helper to zero meat (degenerate panel-block, not a saturation diagnostic); single-stratum H=1 with FPC=inf reduces to Newey-West Bartlett HAC on per-period within-stratum-CENTERED per-PSU score sequences (NOT raw scores — Binder TSL centering is retained at H=1; the panel-wide `G/(G-1)` survey factor replaces FPC); bandwidth → 0 with L > 0 reduces the spatial term to per-period within-stratum HC sandwich while leaving the serial term unchanged (separable form). **Singleton-stratum `lonely_psu="adjust"` panel-wide mean asymmetry:** for the serial helper, `_global_psu_mean` is the panel-wide mean of per-period PSU totals (averaged over all `(g, t)` with `present[g, t]`), NOT the per-period within-stratum mean used by the spatial helper. The `continue`-skip-FPC pattern matches the spatial helper at `survey.py:2007-2017` to avoid divide-by-zero on `n_h_panel = 1`. **Restrictions inherited:** replicate-weight variance still raises `NotImplementedError`; DiagnosticReport routing for the panel-block case is queued for the same Wave F follow-up as the cross-sectional case. **Implementation:** new sibling helper `_compute_stratified_serial_bartlett_meat` in `diff_diff/two_stage.py` (parallel to the Wave E.2 spatial orchestrator; ~200 LoC). Orchestrator `_compute_stratified_conley_meat` signature extended with `conley_lag_cutoff: Optional[int] = None`; spatial loop unchanged; serial helper called after spatial loop when `conley_lag_cutoff > 0`; saturation NaN-fail accounting merges both terms' `(variance_computed, legitimate_zero)` flags into the same template (`UserWarning` "Wave E.2 stratified-Conley sandwich: df_survey = 0..." covers both spatial-only and panel-block cases). Dispatch in `_compute_gmm_corrected_meat` conley branch threads `conley_lag_cutoff` through; the `cluster_ids` non-threading rationale (post-PSU-aggregation every PSU is its own cluster) still applies to the new serial branch. Spillover-side gate at `spillover.py:2210-2230` (Wave E.2-era `NotImplementedError` for `lag > 0 + survey`) deleted; gate rationale comment replaced with shipped-behaviour note. Stage-1 weighted FE solver, `finite_mask` survey-array subsetting, `df_survey` threading, bread weighting, and `SpilloverDiDResults` survey metadata are all inherited UNCHANGED — Psi construction is bit-identical regardless of vcov_type or lag. **Tests:** new `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff` and `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoffEventStudy` classes in `tests/test_spillover.py`; existing `test_j0_panel_conley_lag_cutoff_rejected_under_survey` (Wave E.2-era gate-assertion) DELETED.
- **HeterogeneousAdoptionDiD methodology-review-tracker promotion.** New `tests/test_methodology_had.py` (6 classes, 36 tests) with paper-equation-numbered Verified Components walk-through against de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) arXiv:2405.04465v6 (Equations 3 / 7 / 11 / 18 / 29 and Theorems 1 / 3 / 4 / 7): Design 1' MC recovery on both the zero-boundary DGP AND a nonzero-boundary-intercept DGP (`ΔY = c + β·D + ε` with `c != 0`) so the `att = (mean(ΔY) − τ_bc) / mean(D)` subtraction term is verified explicitly, N(0,1) coverage at `n_replicates=200`, mass-point Wald-IV closed-form equivalence at `atol=1e-9`, QUG limit-law distributional match at KS-stat ≤ 0.05 (n_draws=5000), Yatchew-HR paper-literal `σ²_diff = 1/(2G)` normalization lock, joint Stute pre-trends + homogeneity H0 fail-to-reject on both surfaces and H1 reject for joint homogeneity under a nonlinear DGP, and library-deviation locks (equal-weighting via selective low-dose-region replication, sup-t bootstrap gating, staggered-timing fail-closed `ValueError`). Added "Non-testable assumptions (paper Section 3.1.2)" Notes block to `HeterogeneousAdoptionDiD` class docstring + "Scope (what this test does NOT cover)" clauses to `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections explicitly stating that the pre-tests verify ADJACENT assumptions (Assumption 4 / 7 / 8) and CANNOT test Assumptions 5 or 6. Phase-4 validation-harness items (Pierce-Schott 2016 Figure 2 replication, Table 1 coverage-rate reproduction across 3 DGPs × G ∈ {100, 500, 2500}) waived with documented rationale: R parity at `atol=1e-8` in `tests/test_did_had_parity.py` (3 DGPs × 5 method combos, bit-exact via `rtol=0`) is a strictly stronger anchor than coverage-rate Monte Carlo, and the paper itself self-acknowledges (Section 5.2) that NP estimators are too noisy to be informative on the LBD-restricted PNTR panel. REGISTRY HAD section gains a consolidated Deviations block (5 entries with framing header) and closes 2 of 3 unchecked Implementation Checklist items — the staggered-timing fail-closed `ValueError` and the Assumption 5/6 non-testability documentation; the `covariates=` Theorem 6 follow-up and the extensive-margin / "consider running standard DiD" warning both remain explicitly tracked in `TODO.md` as Low-priority follow-ups rather than claimed-closed. `dechaisemartin-2026-review.md:182-194` requirements checklist boxes the Phase 1a/1b/1c implementation-status closures + the Assumption 5/6 documentation + the staggered-timing closures; the extensive-margin item is acknowledged as partial (zero-dose `UserWarning` exists in `qug_test`; main-`fit()` "consider standard DiD" recommendation is the TODO follow-up). `METHODOLOGY_REVIEW.md` HAD row promoted **In Progress** → **Complete**.
- **SunAbraham `vcov_type` parameter (Phase 1b PR 1/8).** `SunAbraham(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` (defaults to `"hc1"`, which preserves prior behavior bit-equally - SA historically hard-coded HC1). Auto-cluster-at-unit dropped when the user opts into explicit `vcov_type="hc2"` or `vcov_type="classical"` (one-way only); preserved for `"hc1"` and `"hc2_bm"`. When `vcov_type in {"classical","hc2","hc2_bm"}`, `_fit_saturated_regression` auto-routes to a full-dummy saturated design (mirrors TWFE Gate 1 from PR #469): FWL preserves cohort coefficients but not the hat matrix, so HC2 leverage and Bell-McCaffrey Satterthwaite DOF must be computed on the full FE projection. Empirically matches R `lm()` summary classical SE, `sandwich::vcovHC(type="HC2")`, and `clubSandwich::vcovCR(..., type="CR2")` + `coef_test()$df_Satt` at atol=1e-10 (cohort SE and BM DOF pinned in `tests/test_methodology_sun_abraham.py`). For `vcov_type="hc2_bm"`, the user-facing aggregated inference (`event_study_effects[e]['p_value']`/`['conf_int']`, `overall_p_value`/`overall_conf_int`) uses CR2 Bell-McCaffrey contrast DOF — matches `clubSandwich::Wald_test(test="HTZ")$df_denom` at atol=1e-10 (mirrors PR #465's `_compute_cr2_bm_contrast_dof` pattern for MultiPeriodDiD's post-period-average ATT). `vcov_type` is now propagated to `SunAbrahamResults.vcov_type` for downstream introspection. `SurveyDesign` (any kind — analytical weights, stratified, PSU, or replicate-weight) combined with `vcov_type in {"classical","hc2","hc2_bm"}` raises `NotImplementedError`: the survey-design TSL (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use `vcov_type="hc1"` (default) for survey designs. `conley` rejected at `__init__` with a deferral message (would require threading 6+ `conley_*` params through the saturated regression call). **Deviation from R:** SA's within-transform HC1 SE differs from `fixest::sunab()` by ~1-2% (~2e-3 absolute) on typical panel sizes due to a different `(n-k)` finite-sample correction (fixest counts absorbed FE in k_total; SA's `solve_ols` counts only within-transformed columns); the IW aggregation step is otherwise identical (pinned at atol=5e-3, tracked in TODO.md). First PR of the Phase 1b standalone-estimator threading initiative (7 PRs to follow: StackedDiD, WooldridgeDiD-OLS, CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD).
- **PreTrendsPower R `pretrends` parity goldens (PR-C closes PR-B's deferred R-parity row).** JSON goldens at `benchmarks/data/r_pretrends_golden.json` generated from the committed `benchmarks/R/generate_pretrends_golden.R` script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2). 4 fixtures cover regular K=3 grid (`uniform_3_pre_periods_no_anticipation`), irregular K=3 grid `[-5,-3,-1]` (`irregular_pre_periods` — locks the PR-B Step 4 γ-unit linear-weight fix), anticipation-shifted K=4 grid (`anticipation_shifted`), and K=1 closed form (`single_pre_period_closed_form` — Roth Proposition 2 univariate truncated-normal). `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` now active (4 tests): NIS power vs R `pretrends::pretrends()` at `atol=1e-4` across all 4 fixtures × 4 γ values; γ_p MDV vs R `slope_for_power()` at `atol=1e-4` across all 4 fixtures × 2 target_power values; end-to-end `fit()` on irregular grid vs R γ_p at `atol=1e-4` (locks the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API); K=1 three-way cross-check (Python ≡ analytical truncated-normal closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`; both within `atol=1e-4` of R). Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (`dz ≈ 3.6e-5`); R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` (R) vs `scipy.stats.multivariate_normal.cdf` (Python) Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5. `METHODOLOGY_REVIEW.md` PreTrendsPower row promoted `**Complete** (R parity pending)` → `**Complete**`. Roth (2022) paper review's `R \`pretrends\` package version pin (provisional)` Gaps bullet struck. Closes the PR-C TODO row.
diff --git a/README.md b/README.md
index a9b7e238..3587076a 100644
--- a/README.md
+++ b/README.md
@@ -106,7 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`.
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
- [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects
- [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance
-- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up)
+- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC — `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=`)
- [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units
- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
diff --git a/TODO.md b/TODO.md
index ec244fec..193711de 100644
--- a/TODO.md
+++ b/TODO.md
@@ -139,7 +139,8 @@ Deferred items from PR reviews that were not addressed before merge.
| `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low |
| `SpilloverDiD(survey_design=...)` replicate-weight variance (BRR / Fay / JK1 / JKn / SDR). Wave E.1 ships Taylor-linearization only. Per Gerber (2026) Appendix A, the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit of stage 1 and stage 2 (200+ LoC of test surface beyond E.1). | `spillover.py::SpilloverDiD.fit`, `survey.py::compute_replicate_refit_variance` | follow-up | Low |
| `SpilloverDiD(survey_design=...)` subpopulation preservation (Wave E.3). Wave E.1's `finite_mask` block physically removes zero-weight rows that lose stage-1 FE support, so `SurveyDesign.subpopulation()`-derived designs see `n_psu` / `df_survey` / Binder centering recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (R `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts and treats out-of-domain rows as zero-score padding. Fix requires separating fit-sample alignment (Psi array) from design-level bookkeeping: preserve a full-design `resolved_survey` for inference metadata + zero-pad dropped zero-weight rows' IF contribution. Add `SurveyDesign.subpopulation()` regression test to lock the contract. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_binder_tsl_meat` | follow-up (Wave E.3) | Medium |
-| `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` serial Bartlett HAC composition. Wave E.2 ships the panel-aware `conley_lag_cutoff = 0` case ("within-period spatial only" — `sum_t sum_h M_h_t` per `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`) and raises `NotImplementedError` upfront at `spillover.py:fit` on `conley_lag_cutoff > 0`. The serial Bartlett component (within-unit / within-PSU temporal HAC at lag ≤ L) needs to compose with the panel-aware stratified-Conley spatial sandwich — the natural addition is `meat_serial = sum_g sum_{|t-s|<=L, t!=s} (1 - |t-s|/(L+1)) * (S_psu_t[g] - S_bar_h_t)(S_psu_s[g] - S_bar_h_s)'` per PSU, summed across all PSUs in each stratum, with appropriate Binder FPC scaling — plus a methodology call on whether to include cross-period spatial pairs in the serial term. Regression goldens vs the cross-sectional limit (lag=0, which is now the shipped path). | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_conley_meat` | follow-up (Wave E.2 follow-up) | Medium |
+| Serial Bartlett kernel logic duplicated between `diff_diff/two_stage.py::_compute_stratified_serial_bartlett_meat` (survey path) and `diff_diff/conley.py::_compute_conley_meat` panel-block branch (no-survey path). Both compute `K[t,s] = (1 - |t-s|/(L+1)) * 1{|t-s| <= L, t != s}` over dense panel-period codes. Factor out a shared `_serial_bartlett_kernel_matrix(t_codes, L)` helper and a shared post-meat finite + PSD-warning guard so the survey and no-survey paths can't drift on diagnostics or kernel weights. Cosmetic; refactor doesn't change behavior. | `two_stage.py::_compute_stratified_serial_bartlett_meat`, `conley.py::_compute_conley_meat` | follow-up | Low |
+| `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` no-effective-PSU serial Bartlett HAC. Wave E.2 follow-up ships the panel-block composition when an effective PSU exists (explicit `survey_design.psu` OR injected via `cluster=` per `_inject_cluster_as_psu`). Weights-only / strata-only survey designs WITHOUT a cluster fallback raise `NotImplementedError` at `SpilloverDiD.fit` post-resolution because under the pseudo-PSU = obs-index fallback each pseudo-PSU appears in exactly one period — the per-PSU serial cross-period loop would silently contribute zero. Fix would either derive a unit-level serial fallback for no-PSU designs (mixes IF allocators with the pseudo-PSU spatial term — needs methodology work) or route the serial loop through `conley_unit` with explicit documentation of the IF-allocator asymmetry. Regression goldens vs the effective-PSU shipped path. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_serial_bartlett_meat` | follow-up (Wave E.2 follow-up tail) | Low |
| `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
| `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
| `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |
diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt
index 6ae688fe..a9b26a6c 100644
--- a/diff_diff/guides/llms.txt
+++ b/diff_diff/guides/llms.txt
@@ -58,7 +58,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")`
- [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies
- [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects
- [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance
-- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up), both composed with the Wave D Gardner GMM correction (replicate weights queued as follow-up)
+- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2 cross-sectional `conley_lag_cutoff=0`) extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC (Newey-West 1987 separable form; `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=`), both composed with the Wave D Gardner GMM correction (replicate weights queued as follow-up)
- [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units
- [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility
- [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves
diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py
index a1f0db11..daa287f4 100644
--- a/diff_diff/spillover.py
+++ b/diff_diff/spillover.py
@@ -2197,37 +2197,23 @@ def fit(
# check, cluster-vs-PSU warn) runs AFTER `_validate_spillover_inputs`
# below so it sees the panel columns the validator guarantees.
#
- # Wave E.2 scope-limit (upfront, before resolution / panel work):
- # the panel-block Conley HAC (`conley_lag_cutoff > 0`) is NOT
- # composed with the survey path in this PR. The stratified-Conley
- # helper applies a cross-sectional kernel on PSU-aggregated totals;
- # composing the within-unit serial Bartlett HAC with the within-
- # stratum cross-PSU spatial kernel requires carrying PSU-by-time
- # scores into the meat construction, which is a separate Wave E.x
- # follow-up tracked in TODO.md. Reject upfront with a clear pointer
- # so users running `survey_design=` + `conley_lag_cutoff > 0` get
- # the error before stage-1 / 2 work (per `feedback_no_silent_failures`).
- if (
- survey_design is not None
- and self.vcov_type == "conley"
- and self.conley_lag_cutoff is not None
- and self.conley_lag_cutoff > 0
- ):
- raise NotImplementedError(
- "SpilloverDiD(vcov_type='conley', conley_lag_cutoff > 0) "
- "combined with survey_design= is not supported in Wave E.2. "
- "The Wave E.2 stratified-Conley sandwich aggregates Psi to "
- "PSU totals before applying the cross-sectional Conley "
- "kernel; the panel-block decomposition (within-unit serial "
- "Bartlett HAC over time) would require carrying PSU-by-time "
- "scores and composing the serial kernel with the within-"
- "stratum cross-PSU spatial kernel. This composition is "
- "queued as a follow-up (see TODO.md). For Wave E.2, use "
- "conley_lag_cutoff=0 (cross-sectional Conley) with "
- "survey_design=, or use survey_design= with "
- "vcov_type='hc1' (+ cluster= for CR1) for the full "
- "Wave E.1 path."
- )
+ # Wave E.2 follow-up (shipped): `vcov_type='conley' + conley_lag_cutoff > 0
+ # + survey_design=` is supported via panel-block stratified-Conley
+ # sandwich (spatial Wave E.2 term + within-PSU serial Bartlett HAC)
+ # WHEN there is an effective PSU (explicit `survey_design.psu` OR
+ # injected via `cluster=` per Wave E.1's `_inject_cluster_as_psu`
+ # routing). The orchestrator at
+ # `two_stage.py::_compute_stratified_conley_meat` sums the two terms
+ # with disjoint index sets — matches the no-survey panel-block
+ # decomposition at `conley.py::_compute_conley_meat` (Conley 1999
+ # spatial + Newey-West 1987 serial Bartlett; separable form, NOT
+ # Driscoll-Kraay 2D-HAC). FPC convention: per-period FPC on spatial,
+ # panel-wide stratum-level FPC on serial. The no-effective-PSU
+ # fail-closed gate is downstream at the post-resolution check (see
+ # the `resolved_survey_fit.psu is None` block below the cluster
+ # injection); the gate cannot live up here because at this point
+ # the user-supplied `cluster=` has not yet been injected into
+ # the survey design as the effective PSU.
# Validate `anticipation` up front: must be a non-negative integer.
# Accepting fractional or negative values would silently shift
# treatment timing and ring exposure beyond what the estimator's
@@ -3100,6 +3086,50 @@ def fit(
_conley_unit_arg = None
_conley_lag_arg = None
+ # Wave E.2 follow-up gate (post-resolution, post-injection):
+ # fail-closed for `vcov_type="conley" + conley_lag_cutoff > 0` when
+ # the EFFECTIVE PSU is still absent after `_inject_cluster_as_psu`.
+ # Under no-effective-PSU survey designs (weights-only / strata-only
+ # WITHOUT a cluster fallback) the orchestrator falls back to
+ # pseudo-PSU = obs-index in `_compute_stratified_conley_meat`, but
+ # each pseudo-PSU appears in exactly one period, so the per-PSU
+ # serial cross-period loop never contributes anything (silent zero
+ # serial term). Routing the serial loop to `conley_unit` (the panel
+ # unit) instead of pseudo-PSU would mix IF allocators (PSU spatial
+ # vs unit serial), which violates the single-IF-allocator design
+ # pinned by the user-confirmed methodology in the Wave E.2 follow-up
+ # plan. Fail-closed per `feedback_no_silent_failures` until a
+ # no-effective-PSU-specific derivation is queued. Note: this fires
+ # AFTER `_inject_cluster_as_psu` (which runs upstream) so the
+ # documented `cluster= + survey_design(without psu)` surface
+ # — which becomes an effective-PSU layout via injection — passes
+ # through unscathed. R2 P1 fix: original front-door gate at
+ # `spillover.py:2210-2242` (now removed) fired before injection
+ # and broke the cluster-as-PSU survey-Conley surface.
+ if (
+ resolved_survey_fit is not None
+ and resolved_survey_fit.psu is None
+ and self.vcov_type == "conley"
+ and self.conley_lag_cutoff is not None
+ and self.conley_lag_cutoff > 0
+ ):
+ raise NotImplementedError(
+ "SpilloverDiD(vcov_type='conley', conley_lag_cutoff > 0) "
+ "combined with a no-effective-PSU survey_design "
+ "(weights-only / strata-only WITHOUT a cluster fallback) "
+ "is not supported in Wave E.2 follow-up. Under no-effective-"
+ "PSU survey designs the panel-block serial Bartlett HAC "
+ "would silently contribute zero (each pseudo-PSU = "
+ "obs-index appears in exactly one period, so the within-PSU "
+ "temporal sum has no cross-period pairs to accumulate). "
+ "Routing the serial loop to `conley_unit` would mix IF "
+ "allocators with the spatial term and is not derived in "
+ "this PR. Supply either an explicit `survey_design.psu`, "
+ "or `cluster=` (which is injected as the effective "
+ "PSU per Wave E.1's `_inject_cluster_as_psu` routing), "
+ "or use `conley_lag_cutoff=0` (cross-sectional Wave E.2)."
+ )
+
# Derive the Wave D variance mode from the PUBLIC contract:
# - vcov_type="conley" → "conley" (Conley spatial-HAC + GMM)
# - cluster= supplied → "cluster" (CR1 + GMM)
diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py
index 51c15bed..8427d44a 100644
--- a/diff_diff/two_stage.py
+++ b/diff_diff/two_stage.py
@@ -125,20 +125,29 @@ def _compute_gmm_corrected_meat(
TSL has its own ``(1-f_h) * n_h/(n_h-1)`` correction).
- ``vcov_type="cluster"``: ``cluster_ids`` IS the PSU (via upstream
``_inject_cluster_as_psu``); identical to the HC1+survey branch.
- - ``vcov_type="conley"`` (cross-sectional only — ``conley_lag_cutoff = 0``):
+ - ``vcov_type="conley"`` with ``conley_lag_cutoff = 0`` (cross-sectional):
Wave E.2 stratified-Conley sandwich on PSU totals via
:func:`_compute_stratified_conley_meat`. Aggregates Psi to PSU
totals + derives per-PSU centroids as the mean of per-obs
``conley_coords``; for each stratum applies the Conley kernel
between PSU centroids scaled by ``(1 - f_h) * n_h/(n_h-1)``.
Cross-stratum kernel weights are zero by sampling design.
- - ``vcov_type="conley"`` with ``conley_lag_cutoff > 0`` (panel-block
- Conley): raises ``NotImplementedError`` upstream at
- ``SpilloverDiD.fit``. The panel-block decomposition would need to
- compose the within-unit serial Bartlett HAC with the within-stratum
- cross-PSU spatial kernel on PSU-by-time scores rather than the
- collapsed PSU totals; out of Wave E.2 scope and tracked as a
- follow-up in ``TODO.md``.
+ - ``vcov_type="conley"`` with ``conley_lag_cutoff > 0`` (panel-block,
+ Wave E.2 follow-up): the orchestrator at
+ :func:`_compute_stratified_conley_meat` adds a within-PSU serial
+ Bartlett HAC term (Newey-West 1987 form) onto the Wave E.2 spatial
+ sandwich — ``meat = meat_spatial + meat_serial`` with disjoint
+ index sets (matches the no-survey panel-block decomposition at
+ :func:`diff_diff.conley._compute_conley_meat`). Serial term uses
+ per-period within-stratum centering (Binder TSL form) and
+ panel-wide per-stratum FPC. Requires an **effective PSU** —
+ either explicit ``survey_design.psu`` OR ``cluster=``
+ injected as the effective PSU per Wave E.1's
+ ``_inject_cluster_as_psu`` routing. No-effective-PSU survey
+ designs (weights-only / strata-only WITHOUT a cluster fallback)
+ raise ``NotImplementedError`` upstream at ``SpilloverDiD.fit``
+ post-resolution because the pseudo-PSU fallback would silently
+ zero the serial sum.
**`gamma_hat` solve** (mirror of `TwoStageDiD._compute_gmm_variance`
pattern at `two_stage.py:1886-1917`): factorize ``X_10' W X_10`` via
@@ -364,10 +373,13 @@ def _compute_gmm_corrected_meat(
"conley_coords, conley_cutoff_km, and conley_metric."
)
if resolved_survey is not None:
- # Wave E.2: stratified-Conley sandwich on PSU totals. cluster_ids
- # is intentionally NOT threaded through — after PSU aggregation
- # every PSU is its own cluster, so a cluster product kernel
- # would zero all cross-PSU pairs. Wave E.1's
+ # Wave E.2: stratified-Conley sandwich on PSU totals.
+ # Wave E.2 follow-up: with conley_lag_cutoff > 0 the orchestrator
+ # adds the within-PSU serial Bartlett HAC term onto the spatial
+ # sandwich (separable form; meat = meat_spatial + meat_serial).
+ # cluster_ids is intentionally NOT threaded through — after PSU
+ # aggregation every PSU is its own cluster, so a cluster product
+ # kernel would zero all cross-PSU pairs. Wave E.1's
# _resolve_effective_cluster path already coerced any
# user-supplied cluster= into PSU upstream.
meat = _compute_stratified_conley_meat(
@@ -378,6 +390,7 @@ def _compute_gmm_corrected_meat(
conley_kernel=conley_kernel,
resolved_survey=resolved_survey,
conley_time=conley_time, # panel-aware per-period sandwich
+ conley_lag_cutoff=conley_lag_cutoff, # Wave E.2 follow-up serial term
)
else:
# Wave D no-survey Conley path UNCHANGED — bit-identical fallback.
@@ -546,6 +559,7 @@ def _compute_stratified_conley_meat(
conley_kernel: str,
resolved_survey: "ResolvedSurveyDesign",
conley_time: Optional[np.ndarray] = None,
+ conley_lag_cutoff: Optional[int] = None,
) -> np.ndarray:
"""Wave E.2 panel-aware stratified-Conley meat on PSU-by-time scores.
@@ -553,8 +567,13 @@ def _compute_stratified_conley_meat(
Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D
Gardner GMM first-stage uncertainty correction (Butts 2021 ss3.1 +
Gardner 2022 ss4) applied to SpilloverDiD's ring-indicator stage-2
- design. No reference software combines all three ingredients on a
- two-stage influence function.
+ design. Wave E.2 follow-up extends to ``conley_lag_cutoff > 0`` by
+ summing the within-PSU serial Bartlett HAC term (Newey-West 1987)
+ onto the spatial sandwich: ``meat = meat_spatial + meat_serial`` with
+ disjoint index sets, exactly matching the no-survey panel-block
+ decomposition at :func:`diff_diff.conley._compute_conley_meat`. No
+ reference software combines panel-block Conley + Binder TSL + Gardner
+ GMM correction on a two-stage influence function.
**Panel-aware composition (preserves the library's panel Conley
contract):** for each period ``t``, aggregate per-obs Psi to PSU
@@ -599,6 +618,14 @@ def _compute_stratified_conley_meat(
one iteration on the full Psi, which is the cross-sectional
Wave E.2 design). When provided (the standard SpilloverDiD case),
the per-period loop preserves the within-period spatial semantic.
+ conley_lag_cutoff : int, optional
+ Bartlett serial-HAC bandwidth ``L`` in panel periods (Wave E.2
+ follow-up). When None or 0, the spatial term is the entire meat
+ (shipped Wave E.2 behaviour); when ``> 0``, the within-PSU serial
+ Bartlett HAC :func:`_compute_stratified_serial_bartlett_meat` is
+ added to the spatial term. ``L > 0`` requires ``conley_time`` set
+ (with ``conley_time is None`` the panel reduces to T=1 and the
+ serial helper short-circuits to zero meat).
Returns
-------
@@ -627,18 +654,19 @@ def _compute_stratified_conley_meat(
- ``T = 1`` (single period or ``conley_time is None``): single-pass
stratified-Conley sandwich on the full PSU totals (the original
- cross-sectional Wave E.2 design).
- - ``H = 1`` stratum, ``FPC = inf``: reduces to ``sum_t`` plain
- Conley sandwich on per-period PSU totals.
- - Bandwidth -> 0 (``K = I``): reduces to ``sum_t`` per-period
+ cross-sectional Wave E.2 design). With ``conley_lag_cutoff > 0``
+ the serial helper short-circuits to zero meat (no cross-period
+ pairs possible).
+ - ``H = 1`` stratum, ``FPC = inf``: spatial term reduces to ``sum_t``
+ plain Conley sandwich on per-period PSU totals; serial term (if
+ ``conley_lag_cutoff > 0``) reduces to plain Newey-West Bartlett HAC
+ on PSU totals.
+ - Bandwidth -> 0 (``K = I``): spatial reduces to ``sum_t`` per-period
within-stratum HC sandwich on PSU totals (NOT Wave E.1 Binder,
- which is over time-collapsed PSU totals).
-
- Out of scope (deferred follow-up, tracked in TODO.md):
-
- - ``conley_lag_cutoff > 0`` panel-block: the within-PSU serial
- Bartlett HAC over time would compose with the spatial sandwich
- here. Rejected upfront at ``SpilloverDiD.fit``.
+ which is over time-collapsed PSU totals); serial term unchanged
+ (separable form).
+ - ``conley_lag_cutoff = 0`` or ``None``: bit-identical to shipped
+ Wave E.2 (no serial helper call; spatial-only meat).
"""
from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores
@@ -779,6 +807,29 @@ def _compute_stratified_conley_meat(
_variance_computed = _variance_computed or var_t
_legit_zero += legit_zero_t
+ # Wave E.2 follow-up: serial Bartlett HAC term for conley_lag_cutoff > 0.
+ # Sums onto the spatial meat with disjoint index sets (separable form,
+ # NOT Driscoll-Kraay 2D-HAC). Lag=0 / None short-circuits — the helper
+ # itself returns zero-meat for L<=0 or T<=1 so this branch never erodes
+ # the lag=0 bit-identity guarantee with shipped Wave E.2 (test (a)).
+ # `cluster_ids` intentionally not threaded (same rationale as the spatial
+ # term: post-PSU-aggregation each PSU is its own cluster, the within-PSU
+ # serial loop already iterates exactly the right scope; threading a
+ # cluster product kernel would be a no-op).
+ if conley_lag_cutoff is not None and conley_lag_cutoff > 0:
+ meat_serial, var_serial, legit_zero_serial = _compute_stratified_serial_bartlett_meat(
+ Psi,
+ psu_arr=psu_arr,
+ time_arr=time_arr,
+ strata_arr_full=strata_arr_full,
+ fpc_arr_full=fpc_arr_full,
+ conley_lag_cutoff=int(conley_lag_cutoff),
+ lonely_psu=resolved_survey.lonely_psu,
+ )
+ meat = meat + meat_serial
+ _variance_computed = _variance_computed or var_serial
+ _legit_zero += legit_zero_serial
+
# Wave E.2 survey-saturated NaN-fail per `feedback_no_silent_failures`.
if not _variance_computed and _legit_zero == 0:
warnings.warn(
@@ -791,9 +842,311 @@ def _compute_stratified_conley_meat(
)
return np.full((p_2, p_2), np.nan)
+ # Finite + PSD guards on the COMBINED survey meat (spatial + serial).
+ # Mirrors :func:`diff_diff.conley._compute_conley_meat` L966-990 so the
+ # survey panel-block path has the same diagnostic surface as the
+ # no-survey path. The radial 1-D Bartlett spatial kernel and the
+ # Newey-West Bartlett serial kernel are both practitioner
+ # specializations that are NOT formally PSD-guaranteed; adding two
+ # non-PSD-guaranteed terms can produce a more indefinite combined
+ # meat, so the check matters most on the panel-block path. CI codex
+ # R1 P2 fix.
+ if not np.all(np.isfinite(meat)):
+ raise ValueError(
+ "SpilloverDiD Wave E.2 stratified-Conley meat contains non-finite "
+ "values; check Psi for NaN/Inf upstream of the sandwich."
+ )
+ eigvals = np.linalg.eigvalsh(meat)
+ if eigvals.size and eigvals.min() < -1e-12:
+ warnings.warn(
+ f"SpilloverDiD Wave E.2 stratified-Conley meat with conley_kernel="
+ f"{conley_kernel!r} has a materially negative eigenvalue "
+ f"({eigvals.min():.2e}); the variance estimator is not guaranteed "
+ "PSD on this design. Both supported kernels (radial bartlett and "
+ "uniform spatial) plus the hardcoded serial Bartlett term are "
+ "practitioner specializations of Conley 1999 / Newey-West 1987 "
+ "and are not formally PSD-guaranteed; consider varying "
+ "conley_cutoff_km / conley_lag_cutoff, or reviewing the design "
+ "for collinearity / degenerate residual structure.",
+ UserWarning,
+ stacklevel=2,
+ )
+
return meat
+def _compute_stratified_serial_bartlett_meat(
+ Psi: np.ndarray,
+ *,
+ psu_arr: np.ndarray,
+ time_arr: np.ndarray,
+ strata_arr_full: Optional[np.ndarray],
+ fpc_arr_full: Optional[np.ndarray],
+ conley_lag_cutoff: int,
+ lonely_psu: str,
+) -> Tuple[np.ndarray, bool, int]:
+ """Wave E.2 follow-up: within-PSU serial Bartlett HAC meat over time on
+ PSU-aggregated per-period scores.
+
+ Composes Newey-West (1987) serial Bartlett HAC with Conley (1999)'s panel
+ block-decomposition convention, Binder (1983) FPC, and Gerber (2026,
+ arXiv:2605.04124) Proposition 1 Binder TSL on Wave D Gardner GMM IFs
+ (Butts 2021 ss3.1 + Gardner 2022 ss4). Sibling helper to the Wave E.2
+ spatial orchestrator :func:`_compute_stratified_conley_meat`; the two
+ terms sum with disjoint index sets to form the panel-block meat
+ ``meat = meat_spatial + meat_serial`` exactly matching the no-survey
+ panel-block decomposition at :func:`diff_diff.conley._compute_conley_meat`.
+
+ **Composition (separable form, NOT Driscoll-Kraay 2D-HAC):**
+
+ meat_serial = sum_h FPC_h * sum_{g in stratum h}
+ sum_{|t-s| <= L, t != s, present[g, t] & present[g, s]}
+ (1 - |t-s|/(L+1)) * S_centered_t[g] @ S_centered_s[g].T
+
+ where ``S_psu_t[g] = sum_{i in PSU g, time t} Psi[i]`` is the per-period
+ PSU total, ``S_centered_t[g] = S_psu_t[g] - S_bar_h(g)_t`` is the
+ per-period within-stratum centered score (Binder TSL form — matches the
+ spatial helper's centering exactly), and ``|t-s|`` is computed on PANEL-
+ WIDE dense time codes ``np.unique(time_arr, return_inverse=True)``
+ (matches :func:`diff_diff.conley._compute_conley_meat` panel-block
+ convention at conley.py:934-939; mirrors R ``conleyreg::time_dist``).
+ Serial Bartlett kernel weights are hardcoded regardless of the spatial
+ ``conley_kernel`` argument (also matches the conley.py reference).
+
+ **FPC convention (panel-wide per-stratum)** — STANDALONE Newey-West
+ composition on stratified clusters, NOT by analogy to the Binder spatial
+ helper at :func:`_compute_binder_tsl_meat`. The serial sum aggregates
+ within-PSU temporal correlation across all observed periods — it is a
+ PANEL-level construct, not a period-level construct. The cluster set for
+ the panel-level sum is the panel-wide set of PSUs in stratum h, so the
+ FPC denominator uses ``n_h_panel = |unique PSUs in stratum h across the
+ active sample|``, not the per-period ``n_h_t``. The spatial term keeps
+ its per-period FPC unchanged (the period-t spatial sum IS a within-period
+ stratified-cluster sandwich at one time index). For balanced panels with
+ PSU present in every period, ``n_h_panel = n_h_t`` for all t so the two
+ converge; the difference surfaces under unbalanced panels.
+
+ **Centering asymmetry vs no-survey reference** — `conley.py:949-965`
+ uses RAW scores for the serial term (no centering) because the no-survey
+ path assumes ``E[scores] = 0`` under correct specification (X*eps
+ centered around zero), so centering is a no-op. The survey-weighted
+ Binder TSL form estimates the within-stratum mean and centers explicitly
+ (textbook stratified-cluster sandwich; the per-period stratum mean enters
+ via Binder's finite-population variance derivation). Using raw scores in
+ the survey case would inflate variance by twice the squared per-period
+ stratum mean and would NOT reduce to the cross-sectional Wave E.2 form
+ at lag=0.
+
+ **Singleton-adjust panel-wide mean asymmetry** — for ``lonely_psu="adjust"``
+ on a singleton stratum, the serial helper centers against the panel-wide
+ mean of per-period PSU totals (averaged over all (g, t) with
+ ``present[g, t]``), NOT the per-period within-stratum mean used by the
+ spatial helper. The scope difference reflects the serial term's panel-
+ level nature: a singleton stratum at the panel level has no within-
+ stratum cross-PSU variation to demean against, so the only meaningful
+ centering target is the panel-wide PSU mean. The ``continue``-skip-FPC
+ pattern matches the spatial helper at :func:`_compute_stratified_conley_meat_from_psu_scores`
+ L2007-2017 to avoid divide-by-zero on ``n_h_panel = 1``.
+
+ Parameters
+ ----------
+ Psi : np.ndarray of shape (n, p_2)
+ Per-obs Wave D Gardner GMM IF scores (Hajek-weighted upstream via
+ Wave E.1 eps multiplication). Bit-identical to the spatial path Psi.
+ psu_arr : np.ndarray of shape (n,)
+ Per-obs PSU identifier. ``resolved_survey.psu`` or pseudo-PSU =
+ obs-index per the orchestrator's no-PSU fallback.
+ time_arr : np.ndarray of shape (n,)
+ Per-obs period label; normalized to dense codes 0..T-1 internally.
+ strata_arr_full : np.ndarray of shape (n,) or None
+ Per-obs stratum. None synthesizes a single stratum.
+ fpc_arr_full : np.ndarray of shape (n,) or None
+ Per-obs FPC. None disables FPC scaling on the serial term
+ (`(1-f_h) = 1`); the n_h/(n_h-1) factor is still applied.
+ conley_lag_cutoff : int
+ Bartlett serial-HAC bandwidth `L` in panel periods. Must be >= 1
+ (T=1 or L=0 returns zeros via short-circuit).
+ lonely_psu : str
+ ``"remove"`` / ``"certainty"`` / ``"adjust"`` — singleton-stratum
+ handling, matches the Wave E.2 spatial helper exactly.
+
+ Returns
+ -------
+ meat : np.ndarray of shape (p_2, p_2)
+ Serial Bartlett HAC meat.
+ variance_computed : bool
+ Whether any actual variance was contributed.
+ legitimate_zero_count : int
+ Strata that legitimately contribute zero (lonely_psu="certainty").
+
+ Notes
+ -----
+ Does NOT thread ``cluster_ids``: after PSU aggregation every PSU is its
+ own cluster, so a cluster product kernel would zero all cross-PSU pairs
+ (Wave E.2 dispatch-boundary rationale). Inherits Wave E.1
+ `_resolve_effective_cluster` warn-and-coerce-to-PSU upstream.
+
+ Does NOT receive ``psu_value_to_centroid``: the serial kernel operates
+ on temporal lag only (no spatial component), so PSU centroids are
+ irrelevant for this term. Asymmetric vs the spatial helper which needs
+ centroids.
+
+ T = 1 (single observed period) or ``conley_lag_cutoff <= 0`` short-
+ circuits to zero meat with no variance reported — the degenerate panel-
+ block path, NOT a saturation diagnostic.
+ """
+ p_2 = Psi.shape[1]
+ L = int(conley_lag_cutoff)
+
+ # T=1 short-circuit: no cross-period pairs are possible.
+ unique_times, time_indices = np.unique(time_arr, return_inverse=True)
+ T = len(unique_times)
+ if T <= 1 or L <= 0:
+ return np.zeros((p_2, p_2)), False, 0
+
+ # PSU-stratum constancy is enforced upstream by `_validate_unit_constant_survey`
+ # at survey.py:1008-1048 (PSU is a unit-level survey column; a PSU's
+ # stratum assignment is constant across all observations by sampling-
+ # design invariant). No defensive re-check here; the validator would have
+ # raised before we reach the meat construction.
+
+ # Build per-PSU per-period score tensor S_psu_panel[g, t, :].
+ unique_psus, first_idx_panel, psu_indices_full = np.unique(
+ psu_arr, return_index=True, return_inverse=True
+ )
+ G_panel = len(unique_psus)
+ S_psu_panel = np.zeros((G_panel, T, p_2))
+ for j in range(p_2):
+ np.add.at(S_psu_panel[:, :, j], (psu_indices_full, time_indices), Psi[:, j])
+
+ # Presence mask: True iff PSU g has at least one obs at period t.
+ counts = np.zeros((G_panel, T), dtype=np.int64)
+ np.add.at(counts, (psu_indices_full, time_indices), 1)
+ present = counts > 0
+
+ # Per-PSU panel-wide attributes (stratum + FPC).
+ if strata_arr_full is not None:
+ psu_strata_panel = np.asarray(strata_arr_full)[first_idx_panel]
+ else:
+ psu_strata_panel = np.zeros(G_panel, dtype=int)
+ if fpc_arr_full is not None:
+ psu_fpc_panel: Optional[np.ndarray] = np.asarray(fpc_arr_full, dtype=np.float64)[
+ first_idx_panel
+ ]
+ else:
+ psu_fpc_panel = None
+
+ # Per-period within-stratum centering on the (G_panel, T, p_2) tensor.
+ # Match the spatial helper's per-period stratum-mean centering exactly.
+ # Initialize to ZEROS (NOT raw S_psu_panel) so any (g, t) cell whose
+ # stratum-period has < 2 active PSUs contributes zero to downstream
+ # serial cross-products. Leaving raw scores in singleton-active-period
+ # cells would feed uncentered values into the serial Bartlett sum and
+ # contaminate the covariance — codex R1 P1 fix. With < 2 active PSUs in
+ # stratum h at period t, within-stratum variance is undefined; the
+ # methodologically-correct behavior is zero contribution from that
+ # (h, t) leg, matching the spatial helper's lonely_psu="remove"
+ # convention applied at the per-period level.
+ S_centered = np.zeros_like(S_psu_panel)
+ unique_strata_panel = np.unique(psu_strata_panel)
+ for t in range(T):
+ for h in unique_strata_panel:
+ active_mask = (psu_strata_panel == h) & present[:, t]
+ if int(active_mask.sum()) < 2:
+ # Singleton/empty active PSUs in stratum h at period t:
+ # leave S_centered as zero (no contribution to serial sum
+ # from this leg). The per-stratum singleton branch below
+ # still handles panel-wide n_h_panel < 2.
+ continue
+ stratum_mean_t = S_psu_panel[active_mask, t, :].mean(axis=0)
+ S_centered[active_mask, t, :] = S_psu_panel[active_mask, t, :] - stratum_mean_t
+
+ # Panel-wide PSU mean for the singleton-adjust branch (compute lazily —
+ # only needed if lonely_psu == "adjust" AND any singleton stratum exists).
+ _global_psu_mean: Optional[np.ndarray] = None
+ if lonely_psu == "adjust":
+ present_count = int(present.sum())
+ if present_count > 0:
+ _global_psu_mean = (S_psu_panel * present[:, :, None]).sum(axis=(0, 1)) / present_count
+
+ # Per-stratum serial accumulation.
+ meat = np.zeros((p_2, p_2))
+ _variance_computed = False
+ legitimate_zero_count = 0
+ t_codes_full = np.arange(T, dtype=np.float64)
+
+ for h in unique_strata_panel:
+ stratum_psus = np.where(psu_strata_panel == h)[0]
+ n_h_panel = len(stratum_psus)
+
+ # Singleton-stratum branch (mirror spatial helper at survey.py:2001-2017
+ # — FPC `n_h/(n_h-1)` divides by zero when n_h_panel = 1 so MUST continue
+ # to skip the multi-PSU FPC block below).
+ if n_h_panel < 2:
+ if lonely_psu == "remove":
+ continue
+ elif lonely_psu == "certainty":
+ legitimate_zero_count += 1
+ continue
+ elif lonely_psu == "adjust":
+ # Center against panel-wide PSU mean (different scope from
+ # spatial helper's per-period stratum mean; see docstring
+ # "singleton-adjust panel-wide mean asymmetry"). The guard
+ # below covers the all-empty-presence edge (present_count = 0
+ # at the global mean computation above leaves _global_psu_mean
+ # None); in that pathological case every PSU's present mask
+ # is all-False so the inner loop continues without subtraction.
+ if _global_psu_mean is None:
+ continue
+ for g in stratum_psus:
+ present_g = present[g]
+ if int(present_g.sum()) < 2:
+ continue
+ t_g = t_codes_full[present_g]
+ lag_mat = np.abs(t_g[:, None] - t_g[None, :])
+ K_g = ((lag_mat <= L) & (lag_mat != 0)).astype(np.float64) * (
+ 1.0 - lag_mat / (L + 1.0)
+ )
+ S_g_centered = S_psu_panel[g, present_g] - _global_psu_mean
+ with np.errstate(invalid="ignore", over="ignore"):
+ meat += S_g_centered.T @ K_g @ S_g_centered
+ _variance_computed = True
+ continue
+
+ # Multi-PSU branch (n_h_panel >= 2): standard FPC + per-PSU serial.
+ f_h_panel = 0.0
+ if psu_fpc_panel is not None:
+ N_h = psu_fpc_panel[stratum_psus[0]]
+ if N_h < n_h_panel:
+ raise ValueError(
+ f"FPC ({N_h}) is less than the number of PSUs "
+ f"({n_h_panel}) in stratum (Wave E.2 follow-up serial helper). "
+ "FPC must be >= n_PSU_panel."
+ )
+ f_h_panel = n_h_panel / N_h
+
+ M_h_serial = np.zeros((p_2, p_2))
+ for g in stratum_psus:
+ present_g = present[g]
+ if int(present_g.sum()) < 2:
+ continue
+ t_g = t_codes_full[present_g]
+ # PANEL-WIDE dense time codes for the serial kernel (NOT per-PSU
+ # positional encoding). See test (g) in TestSpilloverDiDWaveE2Followup
+ # for the methodology lock; matches conley.py:940 R-deviation.
+ lag_mat = np.abs(t_g[:, None] - t_g[None, :])
+ K_g = ((lag_mat <= L) & (lag_mat != 0)).astype(np.float64) * (1.0 - lag_mat / (L + 1.0))
+ S_g_centered = S_centered[g, present_g]
+ with np.errstate(invalid="ignore", over="ignore"):
+ M_h_serial += S_g_centered.T @ K_g @ S_g_centered
+
+ fpc_scale = (1.0 - f_h_panel) * n_h_panel / (n_h_panel - 1)
+ meat += fpc_scale * M_h_serial
+ _variance_computed = True
+
+ return meat, _variance_computed, legitimate_zero_count
+
+
# =============================================================================
# Main Estimator
# =============================================================================
diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst
index 5f883619..55342eee 100644
--- a/docs/api/spillover.rst
+++ b/docs/api/spillover.rst
@@ -326,6 +326,93 @@ and planned follow-up enhancements:
survey_design=)`` is queued for a follow-up (the
``_APPLICABILITY`` / ``_PT_METHOD`` wiring must register the new
combination first).
+- **Survey-design integration (Wave E.2 follow-up —
+ ``conley_lag_cutoff > 0`` panel-block composition).** SHIPPED in
+ Wave E.2 follow-up. ``vcov_type="conley" + conley_lag_cutoff > 0 +
+ survey_design=`` is now supported by adding a within-PSU serial
+ Bartlett HAC term to the Wave E.2 spatial sandwich, **provided the
+ survey design has an effective PSU** (either explicit
+ ``survey_design.psu`` or a ``cluster=`` argument that gets
+ injected as the effective PSU per Wave E.1's
+ ``_inject_cluster_as_psu`` routing). No-effective-PSU survey designs
+ (weights-only / strata-only WITHOUT a cluster fallback) raise
+ ``NotImplementedError`` at ``SpilloverDiD.fit`` post-resolution —
+ see the Restrictions list below.
+
+ .. note::
+
+ Wave E.2 follow-up composes Wave E.2's panel-aware stratified-Conley
+ spatial sandwich with within-PSU serial Bartlett HAC over time
+ (Newey-West 1987 form, kernel weights ``1 - |t-s|/(L+1)`` for
+ ``|t-s| <= L, t != s``). The composition is
+ ``meat = meat_spatial + meat_serial`` with disjoint index sets,
+ exactly matching the no-survey panel-block decomposition at
+ ``diff_diff.conley._compute_conley_meat`` (Conley 1999 + Newey-West
+ 1987 separable form, NOT Driscoll-Kraay 2D-HAC). For each stratum
+ ``h``: ``meat_serial_h = FPC_h_panel * sum_{g in stratum h}
+ sum_{|t-s| <= L, t != s, both periods present for PSU g}
+ K_serial(|t-s|/(L+1)) * S_centered_t[g] @ S_centered_s[g]'`` where
+ ``S_centered_t[g] = S_psu_t[g] - S_bar_h(g)_t`` is per-period
+ within-stratum centered (Binder TSL form — matches the spatial
+ helper's centering exactly), and ``|t-s|`` uses panel-wide dense
+ time codes (matches ``conley.py:940`` documented R deviation that
+ mirrors R ``conleyreg``). Serial Bartlett kernel is hardcoded
+ regardless of ``conley_kernel`` (mirrors ``conley.py:951-965``;
+ ``conley_kernel`` governs spatial kernel only). FPC for serial uses
+ panel-wide ``n_h_panel = |unique PSUs in stratum h across active
+ sample|``, NOT per-period ``n_h_t`` (the serial sum is a PANEL-level
+ construct; standalone Newey-West composition on stratified clusters,
+ deliberately NOT by analogy to the cross-sectional Wave E.2 spatial
+ FPC convention). No reference software combines panel-block Conley +
+ Binder TSL + Gardner GMM correction on a two-stage influence
+ function.
+
+ Reduction semantics:
+
+ - ``conley_lag_cutoff = 0`` or ``None``: **bit-identical** to shipped
+ Wave E.2 ATT and scalar SE (orchestrator skips the serial helper
+ invocation when ``L = 0`` so meat_serial does not contribute; the
+ test_a2 mock-spy independently asserts the helper isn't invoked).
+ - ``conley_time is None`` or ``T = 1``: serial helper short-circuits to
+ zero meat (no cross-period pairs possible).
+ - Single stratum (H = 1, FPC = inf) with ``L > 0``: serial reduces to
+ Newey-West Bartlett HAC on per-period within-stratum-CENTERED per-PSU
+ score sequences (NOT raw scores — Binder TSL centering is retained at
+ H=1 because the survey path always subtracts the per-period stratum
+ mean before the kernel application; the panel-wide ``G/(G-1)``
+ survey factor replaces FPC).
+ - Bandwidth → 0 with ``L > 0``: spatial reduces to per-period
+ within-stratum HC sandwich; serial term unchanged (separable form).
+ - All PSUs singleton + ``lonely_psu="remove"`` with ``L > 0``: meat
+ NaN-fails on the saturation diagnostic (template "Wave E.2
+ stratified-Conley sandwich" covers both spatial-only and panel-block
+ cases).
+
+ Centering asymmetry vs no-survey reference:
+
+ - The no-survey panel-block path at ``conley.py:949-965`` uses RAW
+ scores for the serial term (no centering) because it assumes
+ ``E[scores] = 0`` under correct specification. The survey-weighted
+ Binder TSL form estimates the within-stratum mean and centers
+ explicitly (textbook stratified-cluster sandwich); raw scores in the
+ survey case would inflate variance by twice the squared per-period
+ stratum mean and would NOT reduce to the cross-sectional Wave E.2
+ form at ``lag = 0``.
+
+ Restrictions:
+
+ - **Requires an effective PSU** — either explicit ``survey_design.psu``
+ OR ``cluster=`` injected as the effective PSU per Wave E.1's
+ ``_inject_cluster_as_psu``. No-effective-PSU survey designs
+ (weights-only / strata-only WITHOUT a cluster fallback) raise
+ ``NotImplementedError`` post-resolution at ``SpilloverDiD.fit``
+ because the pseudo-PSU = obs-index fallback would silently zero
+ the serial sum (each pseudo-PSU appears in exactly one period).
+ Tracked as a follow-up in ``TODO.md``.
+ - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises
+ ``NotImplementedError`` (inherits Wave E.1 gate).
+ - DiagnosticReport routing for the panel-block case is queued for the
+ same Wave F follow-up as the Wave E.2 cross-sectional case.
- **Count-of-treated-in-ring** — only the "nearest-treated ring"
specification is implemented. The "count" form re-introduces
functional-form dependence (paper Section 3.2 end) and is queued.
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index cf62fa27..96027831 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -3244,7 +3244,7 @@ The `_compute_stratified_meat_from_psu_scores` helper at `diff_diff/survey.py` i
Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_survey` (the standard survey 4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`). Threaded through all four `safe_inference` call sites: aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, and the scalar `att` lincom in event-study mode.
-- **Note (documented synthesis):** Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD's Wave D Psi is aggregated to PSU level and passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. No reference software combines all ingredients; Wave E.2 extends with the Conley × survey product-kernel composition — see "Variance (Wave E.2)" subsection below.
+- **Note (documented synthesis):** Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD's Wave D Psi is aggregated to PSU level and passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. No reference software combines all ingredients; Wave E.2 extends with the Conley × survey product-kernel composition (cross-sectional `conley_lag_cutoff = 0`) — see "Variance (Wave E.2)" subsection below; Wave E.2 follow-up further extends with the within-PSU serial Bartlett HAC for `conley_lag_cutoff > 0` — see "Variance (Wave E.2 follow-up)" subsection below.
- **Note (warn-and-use-PSU for cluster + survey):** when both `cluster=` and `survey_design.psu` are supplied with **different groupings**, the cluster argument emits a `UserWarning` and is overridden by PSU (mirrors `TwoStageDiD._resolve_effective_cluster`). PSU is the design-relevant cluster on survey panels; `cluster=` on SpilloverDiD is more often a spatial / unit-level label, so the design constraint wins. When both knobs are supplied with the **same** groupings, no warning fires and PSU still takes precedence (the inference is unchanged either way).
- **Note (limitation — `SurveyDesign.subpopulation()` with FE-undefined zero-weight rows):** when `survey_design` is built via `SurveyDesign.subpopulation()` (or otherwise carries zero-weight padding rows) AND those zero-weight rows lose stage-1 FE support (warn-and-drop unit path), Wave E.1's `finite_mask` block physically removes them from the survey design rather than retaining them as zero-score padding. Consequently `n_psu`, `df_survey`, and the Binder TSL centering are recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (e.g. R's `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts. Practitioners using subpopulation-derived designs should expect SEs that may differ slightly from textbook domain expectations on warn-and-drop fits. Tracked as Wave E.3 follow-up — see TODO.md.
- **Note (saturated `df_survey = 0` NaN-fail):** when `lonely_psu="remove"` removes all strata (single PSU per stratum), `_compute_stratified_meat_from_psu_scores` returns `(_, var_computed=False, legit_zero=0)`. SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can pin via `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`.
@@ -3276,6 +3276,34 @@ Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_su
**Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (parallel to existing Binder helper; 3-tuple `(meat, variance_computed, legitimate_zero_count)` return contract; per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(centered, coords_h, cutoff, metric, kernel)` cross-sectional mode). New dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to `_compute_binder_tsl_meat`; per-obs Psi → PSU aggregation via `np.add.at` + PSU centroid derivation via vectorized `np.add.at` sums / `np.bincount` counts + dispatch to survey helper; intentionally no `cluster_ids` parameter). `_compute_gmm_corrected_meat` conley branch extended at `diff_diff/two_stage.py` with `if resolved_survey is not None` routing to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D no-survey Conley. Saturation NaN-fail mirrors Wave E.1 (`UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..."). Wave E.1 stage-1 weighted FE solver, `finite_mask` survey-array subsetting, `df_survey` threading to `safe_inference` call sites, bread weighting, and `SpilloverDiDResults` survey metadata are all inherited UNCHANGED — Psi construction is bit-identical regardless of `vcov_type`. Tests: `TestSpilloverDiDWaveE2ConleySurveyDesign` + `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` at `tests/test_spillover.py`.
+### Variance (Wave E.2 follow-up — `conley_lag_cutoff > 0` panel-block composition via spatial + serial Bartlett HAC)
+
+`vcov_type="conley" + conley_lag_cutoff > 0 + survey_design=` is now supported via panel-block stratified-Conley sandwich. SHIPPED in Wave E.2 follow-up.
+
+- **Note (documented synthesis):** Wave E.2 follow-up composes Wave E.2's panel-aware stratified-Conley spatial sandwich (Conley 1999 spatial-HAC × Binder/Gerber 2026 stratified TSL × Wave D Gardner GMM correction) with within-PSU serial Bartlett HAC over time (Newey-West 1987 form, kernel weights `1 - |t-s|/(L+1)` for `|t-s| ≤ L, t ≠ s`). The composition is `meat = meat_spatial + meat_serial` with disjoint index sets, exactly matching the no-survey panel-block decomposition at `diff_diff.conley._compute_conley_meat` (Conley 1999 + Newey-West 1987 separable form, NOT Driscoll-Kraay 2D-HAC). The serial term aggregates per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i` along the time axis: for each stratum `h`, `meat_serial_h = FPC_h_panel * sum_{g in stratum h} sum_{|t-s| ≤ L, t ≠ s, both periods present for PSU g} K_serial(|t-s|/(L+1)) * S_centered_t[g] @ S_centered_s[g]'` where `S_centered_t[g] = S_psu_t[g] - S_bar_h(g)_t` is per-period within-stratum centered (Binder TSL form — matches the spatial helper's centering exactly), and `|t-s|` is computed on **panel-wide dense time codes** (matches `conley.py:940` documented R deviation that mirrors R `conleyreg`). Serial Bartlett kernel is hardcoded regardless of `conley_kernel` (mirrors conley.py:951-965; the user-selected `conley_kernel` governs the spatial kernel only). Total meat is `sum_t sum_h M_h_t + sum_h meat_serial_h`. No reference software combines panel-block Conley + Binder TSL + Gardner GMM correction on a two-stage influence function.
+
+- **FPC convention (panel-wide per-stratum) — standalone Newey-West composition on stratified clusters:** the serial sum aggregates within-PSU temporal correlation across all observed periods — it is a PANEL-level construct, not a period-level construct. The cluster set for the panel-level sum is the panel-wide set of PSUs in stratum `h`, so the FPC denominator uses `n_h_panel = |unique PSUs in stratum h across the active sample|` and `N_h = fpc_per_psu[first PSU in h]` (panel-constant by sampling design); serial term in stratum h scales by `(1 - n_h_panel/N_h) * n_h_panel/(n_h_panel - 1)`. The spatial term keeps its existing per-period FPC unchanged (the period-`t` spatial sum IS a within-period stratified-cluster sandwich at one time index). For balanced panels with PSU present in every period, `n_h_panel = n_h_t` for all `t` so the two FPC denominators converge; the difference surfaces under unbalanced panels. **Standalone citation chain** (Binder 1983 for FPC factor form, Gerber 2026 Prop 1 for Binder TSL composition with two-stage IF, Newey-West 1987 for serial Bartlett kernel weights, Conley 1999 for spatial kernel and panel-block decomposition) — deliberately NOT by analogy to the Binder helper's per-stratum first-occurrence FPC convention at `_compute_binder_tsl_meat`, which is a cross-sectional application of the same principle.
+
+- **Centering asymmetry vs no-survey reference:** the no-survey panel-block path at `conley.py:949-965` uses RAW scores for the serial term (no centering) because it assumes `E[scores] = 0` under correct specification — centering is a no-op under that assumption. The survey-weighted Binder TSL form estimates the within-stratum mean and centers explicitly (textbook stratified-cluster sandwich; the per-period stratum mean enters via Binder's finite-population variance derivation). Using raw scores in the survey case would inflate variance by twice the squared per-period stratum mean and would NOT reduce to the cross-sectional Wave E.2 form at `lag = 0`. The centering applied to each leg (`t` and `s`) of the cross-period pair matches the spatial helper's per-period centering exactly. Hand-check at `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff::test_c0_serial_centering_hand_check_raw_vs_centered` pins this contract.
+
+- **Reduction semantics (load-bearing for tests):**
+ - `conley_lag_cutoff = 0` or `None`: bit-identical to shipped Wave E.2 ATT and scalar SE (orchestrator skips the serial helper invocation; the spatial loop + saturation guard + new PSD/finite guard still run on the spatial-only meat). `assert_array_equal` regression pin at test_a covers ATT + scalar SE; test_a2 mock-spy independently pins that the serial helper is NOT invoked at lag=0.
+ - `conley_time is None` or `T = 1`: serial helper short-circuits to zero meat (no cross-period pairs possible) — the degenerate panel-block path, NOT a saturation diagnostic.
+ - Single stratum (`H = 1`, `FPC = inf`): spatial reduces to `sum_t` Conley sandwich on per-period **within-stratum-CENTERED** PSU totals (NOT raw — at H=1 the centering still subtracts the per-period mean over all G PSUs); serial reduces to Newey-West Bartlett HAC on per-period within-stratum-CENTERED PSU totals (NOT raw scores; the survey-weighted form retains Binder TSL centering even at H=1). Both reductions carry the panel-wide `G/(G-1)` survey factor in lieu of FPC.
+ - Bandwidth → 0, `L > 0`: spatial reduces to `sum_t` per-period within-stratum HC sandwich on PSU totals; serial term unchanged (separable form).
+ - All PSUs singleton in stratum + `lonely_psu="remove"`: `df_survey = 0` and the meat NaN-fails (saturation warning template "Wave E.2 stratified-Conley sandwich" covers both cross-sectional and panel-block cases).
+ - Cross-stratum kernel weight is exactly zero on the serial term too (the within-PSU loop is necessarily within-stratum).
+
+- **Singleton-stratum `lonely_psu="adjust"` panel-wide mean asymmetry:** for the serial helper, `_global_psu_mean` is the panel-wide mean of per-period PSU totals (averaged over all `(g, t)` with `present[g, t]`), NOT the per-period within-stratum mean used by the spatial helper. The scope difference reflects the serial term's panel-level nature: a singleton stratum at the panel level has no within-stratum cross-PSU variation to demean against, so the only meaningful centering target is the panel-wide PSU mean. The `continue`-skip-FPC pattern matches the spatial helper at `survey.py:2007-2017` exactly to avoid divide-by-zero on `n_h_panel = 1`.
+
+- **Restrictions (Wave E.2 follow-up):**
+ - Requires an **effective PSU**: either an explicit `survey_design.psu` OR a `cluster=` argument that gets injected as the effective PSU per Wave E.1's `_inject_cluster_as_psu` routing. No-effective-PSU survey designs (weights-only / strata-only WITHOUT a cluster fallback) raise `NotImplementedError` at `SpilloverDiD.fit` post-resolution because under the pseudo-PSU = obs-index fallback each pseudo-PSU appears in exactly one period — the per-PSU serial cross-period loop would silently contribute zero. Routing the serial loop to `conley_unit` would mix IF allocators with the spatial term's pseudo-PSU aggregation (the user-confirmed methodology pins a single IF allocator); fail-closed per `feedback_no_silent_failures` until a no-effective-PSU derivation is queued (tracked in TODO.md).
+ - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits Wave E.1 gate).
+ - LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley roadmap, not Wave E).
+ - DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", conley_lag_cutoff > 0, survey_design=)` is queued for Wave F follow-up (per `feedback_audit_diagnostic_report_wiring_before_claim` — `_APPLICABILITY` / `_PT_METHOD` registration required before claiming consumability).
+
+**Implementation:** new sibling helper `_compute_stratified_serial_bartlett_meat` in `diff_diff/two_stage.py` (parallel to the Wave E.2 spatial orchestrator; ~200 LoC; three-mode singleton-stratum branching with FPC scaling inside the multi-PSU branch to avoid divide-by-zero; panel-wide dense time codes for the lag math). Orchestrator `_compute_stratified_conley_meat` signature extended with `conley_lag_cutoff: Optional[int] = None`; spatial loop unchanged; serial helper called after spatial loop when `conley_lag_cutoff > 0`; saturation NaN-fail accounting merges both terms' `(variance_computed, legitimate_zero)` flags. Dispatch in `_compute_gmm_corrected_meat` conley branch threads `conley_lag_cutoff` through to the orchestrator. Spillover-side gate at `spillover.py:2210` deleted (Wave E.2 era `NotImplementedError` for lag>0 + survey). Stage-1 weighted FE solver, `finite_mask` survey-array subsetting, `df_survey` threading, bread weighting, and `SpilloverDiDResults` survey metadata are all inherited UNCHANGED — Psi construction is bit-identical regardless of vcov_type or lag. Tests: `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff` + `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoffEventStudy` at `tests/test_spillover.py`.
+
**Edge cases (from paper Section 3.2 / Discussion):**
| # | Edge case | Handling |
@@ -3291,7 +3319,7 @@ Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_su
**Restrictions / deferred features:**
- `event_study=True` SHIPPED in Wave C — see Event-study mode subsection above. Emits `att_dynamic`, MultiIndex `spillover_effects`, and a TwoStageDiD-compatible `event_study_effects` dict alias.
-- `survey_design=` for `vcov_type ∈ {"hc1"}` (plus `cluster=` for CR1) SHIPPED in Wave E.1 — see "Variance (Wave E.1)" subsection below. Threads Hájek-normalized survey weights through stage-1 FE estimation, gamma_hat solve, eps construction, and bread inversion; aggregates the Wave D Psi to PSU totals and routes through the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. `vcov_type="conley"` combined with `survey_design=` SHIPPED in Wave E.2 — see "Variance (Wave E.2)" subsection below (stratified-Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — Gerber (2026) Appendix A notes the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up.
+- `survey_design=` for `vcov_type ∈ {"hc1"}` (plus `cluster=` for CR1) SHIPPED in Wave E.1 — see "Variance (Wave E.1)" subsection below. Threads Hájek-normalized survey weights through stage-1 FE estimation, gamma_hat solve, eps construction, and bread inversion; aggregates the Wave D Psi to PSU totals and routes through the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. `vcov_type="conley"` combined with `survey_design=` SHIPPED in Wave E.2 for cross-sectional Conley (`conley_lag_cutoff = 0`) — see "Variance (Wave E.2)" subsection below (stratified-Conley sandwich on PSU totals). Wave E.2 follow-up adds the panel-block composition (`conley_lag_cutoff > 0`) via spatial + serial Bartlett HAC — see "Variance (Wave E.2 follow-up)" subsection below. Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — Gerber (2026) Appendix A notes the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up.
- `covariates=` raises `NotImplementedError` — Gardner-style stage-1 residualization not yet wired through; planned follow-up.
- `ring_method="count"` not exposed — only the nearest-treated-ring specification.
- `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`, `"classical"`} raises `NotImplementedError` — `hc2`/`hc2_bm` because current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF; `classical` because the Wave D Gardner GMM first-stage correction has not been derived for the classical homoskedastic variance (different meat structure `sigma_hat^2 * (X_10' X_10)` vs the Wave D IF outer product `Psi' Psi`). Use `"hc1"` or `"conley"`, or pair with `cluster=` for CR1 — all three apply the Wave D GMM correction.
diff --git a/docs/references.rst b/docs/references.rst
index d8acabdb..30c0f18e 100644
--- a/docs/references.rst
+++ b/docs/references.rst
@@ -202,6 +202,10 @@ Multi-Period and Staggered Adoption
- **Goodman-Bacon, A. (2021).** "Difference-in-Differences with Variation in Treatment Timing." *Journal of Econometrics*, 225(2), 254-277. https://doi.org/10.1016/j.jeconom.2021.03.014
+- **Newey, W. K., & West, K. D. (1987).** "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." *Econometrica*, 55(3), 703-708. https://doi.org/10.2307/1913610
+
+ Primary source for the Bartlett serial-HAC kernel weights `1 - |t-s|/(L+1)` used in time-series and panel HAC variance estimators. Our ``diff_diff/conley.py`` panel-block path at L949-965 hardcodes this kernel for the within-unit serial component (mirrors R ``conleyreg::time_dist``); SpilloverDiD's Wave E.2 follow-up composes the same kernel weights with Binder/Gerber TSL + Wave D Gardner GMM correction for the panel-block survey case (``_compute_stratified_serial_bartlett_meat`` at ``diff_diff/two_stage.py``; see REGISTRY section "Variance (Wave E.2 follow-up)").
+
- **Wing, C., Freedman, S. M., & Hollingsworth, A. (2024).** "Stacked Difference-in-Differences." *NBER Working Paper* 32054. https://www.nber.org/papers/w32054
- **Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025).** "Efficient Difference-in-Differences and Event Study Estimators." *Working Paper*.
diff --git a/tests/test_spillover.py b/tests/test_spillover.py
index 81e4f844..c5b2ef7c 100644
--- a/tests/test_spillover.py
+++ b/tests/test_spillover.py
@@ -11,6 +11,7 @@
import pandas as pd
import pytest
+from diff_diff import SurveyDesign
from diff_diff.spillover import (
SpilloverDiD,
_apply_callable_metric_pairwise,
@@ -6100,36 +6101,6 @@ def test_i_saturated_design_nan_fails(self):
assert np.isnan(res.t_stat)
assert np.isnan(res.p_value)
- def test_j0_panel_conley_lag_cutoff_rejected_under_survey(self):
- """vcov_type='conley' + conley_lag_cutoff > 0 + survey_design raises
- NotImplementedError upfront. Wave E.2 ships cross-sectional only;
- the panel-block decomposition (within-unit serial Bartlett HAC over
- time) would need PSU-by-time scores rather than the collapsed PSU
- totals. Tracked as a Wave E.2 follow-up in TODO.md.
- """
- from diff_diff import SurveyDesign
-
- df = generate_butts_nonstaggered_dgp(seed=180)
- df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
- design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
- est = SpilloverDiD(
- rings=[0.0, 100.0],
- conley_coords=("lat", "lon"),
- conley_metric="haversine",
- conley_cutoff_km=self._CUTOFF_KM,
- conley_lag_cutoff=2, # panel-block path
- vcov_type="conley",
- )
- with pytest.raises(NotImplementedError, match="conley_lag_cutoff > 0"):
- est.fit(
- df_s,
- outcome="y",
- unit="unit",
- time="time",
- first_treat="first_treat",
- survey_design=design,
- )
-
def test_j_replicate_weights_rejection_inherits_wave_e1(self):
"""Replicate-weight variance still raises NotImplementedError under
conley+survey (inherits Wave E.1 gate). SurveyDesign requires
@@ -6470,6 +6441,1081 @@ def test_n_finite_mask_survey_array_subsetting(self):
assert res.n_obs <= len(df_s) # at least the always-treated unit's rows dropped
+class TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff:
+ """Wave E.2 follow-up: conley + survey + conley_lag_cutoff > 0 via
+ panel-block composition (spatial + serial Bartlett HAC).
+
+ Methodology anchor: Wave E.2's panel-aware stratified-Conley spatial
+ sandwich (Conley 1999 × Binder/Gerber 2026 × Wave D Gardner GMM)
+ composed with within-PSU serial Bartlett HAC (Newey-West 1987 separable
+ form). Verifies:
+ - lag=0 STRICT bit-identity to shipped Wave E.2 ATT and scalar SE
+ (test_a) plus mock-spy that the serial helper isn't invoked (test_a2)
+ - serial centering = Binder TSL form (per-period within-stratum), NOT raw
+ - AR(1) DGP serial-term behavioral inflation
+ - panel-wide per-stratum FPC for the serial term
+ - panel-wide dense time codes for the lag math
+ - singleton-adjust panel-wide mean asymmetry (vs spatial's per-period mean)
+ - saturation NaN-fail still fires
+ """
+
+ _CUTOFF_KM = 1000.0
+
+ def _fit(self, df, lag_cutoff=1, design=None, **kwargs):
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=lag_cutoff,
+ vcov_type="conley",
+ event_study=False,
+ **kwargs,
+ )
+ return est.fit(
+ df,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+
+ def test_a_lag0_strict_bit_identical_to_wave_e2_meat(self):
+ """`conley_lag_cutoff = 0` MUST produce bit-identical ATT AND scalar SE
+ as a fresh Wave E.2 baseline fit (`assert_array_equal`). The
+ orchestrator does NOT truly early-return at lag=0 — the spatial
+ loop, saturation guard, and new PSD/finite guard all still run; the
+ guarantee is that the serial helper is NOT invoked (so meat_serial
+ contributes nothing). test_a2 mock-spy verifies the helper isn't
+ called.
+
+ Methodology lock: skipping the serial helper at the orchestrator
+ level is the backwards-compatibility guarantee that the shipped
+ Wave E.2 surface is unaffected by the follow-up. Without that skip,
+ a numerical zero from the serial helper would still inject
+ floating-point noise into the spatial-only meat (which would
+ surface as SE drift).
+
+ Note: full meat-matrix equality is NOT asserted — only ATT + scalar
+ SE are pinned (the meat matrix is not directly exposed on
+ `SpilloverDiDResults`).
+ """
+ df = generate_butts_nonstaggered_dgp(seed=0)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res_lag0 = self._fit(df_s, lag_cutoff=0, design=design)
+ # `lag=0` fit must match a fresh Wave E.2 baseline estimator fit
+ # (which has the same config, same lag, same data) — bit-identical.
+ # This catches any case where the orchestrator fails to short-circuit
+ # and the serial helper injects floating-point noise into the
+ # spatial-only meat. The Wave E.2 baseline is constructed inline
+ # (rather than relying on captured goldens) so this test is robust
+ # to BLAS-runner drift on absolute values; the assertion is on
+ # cross-fit bit-identity at one runner.
+ est_e2_baseline = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=0,
+ vcov_type="conley",
+ event_study=False,
+ )
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res_e2_baseline = est_e2_baseline.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ np.testing.assert_array_equal(res_lag0.att, res_e2_baseline.att)
+ np.testing.assert_array_equal(res_lag0.se, res_e2_baseline.se)
+
+ def test_a2_lag0_does_not_call_serial_helper(self):
+ """Structural anchor: orchestrator skips the serial helper at `lag_cutoff = 0`
+ BEFORE invoking `_compute_stratified_serial_bartlett_meat`. Mirrors
+ the Wave E.2 test_a2 mock-spy pattern. Without this, a future
+ refactor that always-invokes the serial helper would silently degrade
+ the lag=0 backwards-compat guarantee.
+ """
+ from unittest.mock import patch
+
+ df = generate_butts_nonstaggered_dgp(seed=2)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ with patch("diff_diff.two_stage._compute_stratified_serial_bartlett_meat") as mock_serial:
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ self._fit(df_s, lag_cutoff=0, design=design)
+ assert not mock_serial.called, (
+ "lag=0 conley + survey fit must NOT call the serial helper "
+ "— orchestrator short-circuit broken."
+ )
+
+ def test_b_lag1_invokes_serial_helper(self):
+ """lag=1 MUST invoke the new serial helper exactly once."""
+ from unittest.mock import patch
+
+ df = generate_butts_nonstaggered_dgp(seed=3)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat as _orig
+
+ with patch(
+ "diff_diff.two_stage._compute_stratified_serial_bartlett_meat", wraps=_orig
+ ) as spy:
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ self._fit(df_s, lag_cutoff=1, design=design)
+ assert spy.called, "lag=1 fit must invoke the serial helper"
+ assert spy.call_count == 1
+
+ def test_c0_serial_centering_hand_check_raw_vs_centered(self):
+ """Pure unit test: the serial helper applies per-period within-stratum
+ centering (Binder TSL form), NOT raw scores like the no-survey
+ reference at conley.py:949-965. Construct a 2-PSU × 2-period synthetic
+ with materially nonzero per-period stratum mean; assert the centered
+ helper output equals the hand-computed centered form (not the raw form).
+
+ Pins MEDIUM #2 from plan-review: codex will push hard on the asymmetry
+ vs the no-survey panel-block reference; the load-bearing claim is
+ that the centered form is the methodologically-correct choice under
+ Binder TSL and that raw scores would inflate the variance.
+ """
+ from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat
+
+ # 2 PSUs (one stratum, both PSUs in stratum 0), 2 periods, score dim 2.
+ # Per-period stratum mean is NOT zero (the two PSUs' scores have
+ # nonzero average): PSU 0 has [+2, +1] at t=0, [+3, +2] at t=1;
+ # PSU 1 has [-1, 0] at t=0, [-2, +1] at t=1.
+ # Stratum mean at t=0: [(2-1)/2, (1+0)/2] = [0.5, 0.5]
+ # Stratum mean at t=1: [(3-2)/2, (2+1)/2] = [0.5, 1.5]
+ # Centered: PSU 0 at t=0: [1.5, 0.5]; PSU 1 at t=0: [-1.5, -0.5]
+ # PSU 0 at t=1: [2.5, 0.5]; PSU 1 at t=1: [-2.5, -0.5]
+ # Serial sum per PSU at L=1: K_serial(|0-1|=1) = (1 - 1/2) = 0.5
+ # PSU 0 contribution: 0.5 * (centered_t0[0] @ centered_t1[0].T)
+ # = 0.5 * outer([1.5, 0.5], [2.5, 0.5])
+ # PSU 1 contribution: 0.5 * outer([-1.5, -0.5], [-2.5, -0.5])
+ # NOTE: K_serial is symmetric so we get the (t=0, t=1) and (t=1, t=0)
+ # contributions both as the same scalar K=0.5 times each PSU's
+ # cross-period outer product (with appropriate transposition).
+ # Total per-PSU meat = sum over (t, s) with t != s of K * S_t @ S_s.T
+ # = K * (S_0 @ S_1.T + S_1 @ S_0.T)
+ # FPC: n_h_panel = 2, fpc_per_psu = inf -> f_h = 0, scale = 2/(2-1) = 2
+ psu_arr = np.array([0, 0, 1, 1])
+ time_arr = np.array([0, 1, 0, 1])
+ strata_arr = np.array([0, 0, 0, 0])
+ fpc_arr = np.full(4, np.inf)
+ Psi = np.array(
+ [
+ [2.0, 1.0], # PSU 0, t=0
+ [3.0, 2.0], # PSU 0, t=1
+ [-1.0, 0.0], # PSU 1, t=0
+ [-2.0, 1.0], # PSU 1, t=1
+ ]
+ )
+
+ meat_centered, var_computed, _ = _compute_stratified_serial_bartlett_meat(
+ Psi,
+ psu_arr=psu_arr,
+ time_arr=time_arr,
+ strata_arr_full=strata_arr,
+ fpc_arr_full=fpc_arr,
+ conley_lag_cutoff=1,
+ lonely_psu="remove",
+ )
+ assert var_computed
+
+ # Hand-compute expected (centered form)
+ S_psu_0 = np.array([[2.0, 1.0], [3.0, 2.0]]) # (T, k)
+ S_psu_1 = np.array([[-1.0, 0.0], [-2.0, 1.0]])
+ # Per-period stratum means
+ mean_t0 = (S_psu_0[0] + S_psu_1[0]) / 2 # [0.5, 0.5]
+ mean_t1 = (S_psu_0[1] + S_psu_1[1]) / 2 # [0.5, 1.5]
+ S0_centered = np.array([S_psu_0[0] - mean_t0, S_psu_0[1] - mean_t1])
+ S1_centered = np.array([S_psu_1[0] - mean_t0, S_psu_1[1] - mean_t1])
+ # K_serial at L=1: K[t,s] = (1 - |t-s|/2) for |t-s| in {1}; K[t,t] = 0
+ K_serial = np.array([[0.0, 0.5], [0.5, 0.0]])
+ meat_0 = S0_centered.T @ K_serial @ S0_centered
+ meat_1 = S1_centered.T @ K_serial @ S1_centered
+ # f_h = 0 (FPC=inf), n_h=2, scale = (1-0) * 2/(2-1) = 2
+ meat_expected_centered = 2.0 * (meat_0 + meat_1)
+ np.testing.assert_allclose(meat_centered, meat_expected_centered, rtol=1e-12, atol=1e-14)
+
+ # Verify the RAW form would be DIFFERENT (sanity check that centering matters)
+ meat_0_raw = S_psu_0.T @ K_serial @ S_psu_0
+ meat_1_raw = S_psu_1.T @ K_serial @ S_psu_1
+ meat_expected_raw = 2.0 * (meat_0_raw + meat_1_raw)
+ # Raw and centered should differ MATERIALLY on this fixture
+ assert not np.allclose(
+ meat_expected_centered, meat_expected_raw, rtol=1e-2
+ ), "Test fixture must have nonzero centering effect to anchor the asymmetry"
+
+ def test_c1_hand_computation_methodology_anchor_lag1(self):
+ """Hand-compute the serial Bartlett HAC at L=1 on a 4-PSU x 3-period
+ synthetic and assert implementation parity. Methodology anchor that
+ gets codified from the _scratch/wave_e2_followup_smoke.py.
+ """
+ from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat
+
+ rng = np.random.default_rng(0)
+ G, T, k = 4, 3, 2
+ psu_arr = np.repeat(np.arange(G), T)
+ time_arr = np.tile(np.arange(T), G)
+ strata_arr = np.repeat([0, 0, 1, 1], T)
+ fpc_arr = np.full(G * T, 20.0)
+ Psi = rng.standard_normal((G * T, k))
+
+ meat, var_computed, _ = _compute_stratified_serial_bartlett_meat(
+ Psi,
+ psu_arr=psu_arr,
+ time_arr=time_arr,
+ strata_arr_full=strata_arr,
+ fpc_arr_full=fpc_arr,
+ conley_lag_cutoff=1,
+ lonely_psu="remove",
+ )
+
+ # Hand-compute expected
+ S_psu = np.zeros((G, T, k))
+ for i in range(G * T):
+ S_psu[psu_arr[i], time_arr[i]] += Psi[i]
+ meat_expected = np.zeros((k, k))
+ for h in [0, 1]:
+ stratum_psus = np.where(strata_arr[::T] == h)[0]
+ n_h_panel = len(stratum_psus)
+ # Per-period within-stratum mean
+ S_h = S_psu[stratum_psus] # (n_h, T, k)
+ S_bar_h = S_h.mean(axis=0) # (T, k)
+ S_centered = S_h - S_bar_h[None, :, :]
+ # K_serial at L=1
+ K = np.array([[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]])
+ meat_h = np.zeros((k, k))
+ for g_local in range(n_h_panel):
+ S_g = S_centered[g_local]
+ meat_h += S_g.T @ K @ S_g
+ N_h = 20.0
+ fpc_scale = (1.0 - n_h_panel / N_h) * n_h_panel / (n_h_panel - 1)
+ meat_expected += fpc_scale * meat_h
+
+ np.testing.assert_allclose(meat, meat_expected, rtol=1e-12, atol=1e-14)
+ assert var_computed
+
+ def test_c2_hand_computation_methodology_anchor_lag2(self):
+ """L=2 exercises multiple kernel weights: K(1) = 2/3, K(2) = 1/3."""
+ from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat
+
+ rng = np.random.default_rng(1)
+ G, T, k = 4, 4, 2
+ psu_arr = np.repeat(np.arange(G), T)
+ time_arr = np.tile(np.arange(T), G)
+ strata_arr = np.repeat([0, 0, 1, 1], T)
+ fpc_arr = np.full(G * T, 30.0)
+ Psi = rng.standard_normal((G * T, k))
+
+ meat, _, _ = _compute_stratified_serial_bartlett_meat(
+ Psi,
+ psu_arr=psu_arr,
+ time_arr=time_arr,
+ strata_arr_full=strata_arr,
+ fpc_arr_full=fpc_arr,
+ conley_lag_cutoff=2,
+ lonely_psu="remove",
+ )
+
+ # Hand-compute
+ S_psu = np.zeros((G, T, k))
+ for i in range(G * T):
+ S_psu[psu_arr[i], time_arr[i]] += Psi[i]
+ meat_expected = np.zeros((k, k))
+ # K[t,s] at L=2: (1 - |t-s|/3) for |t-s| in {1, 2}
+ K = np.array(
+ [
+ [0.0, 2 / 3, 1 / 3, 0.0],
+ [2 / 3, 0.0, 2 / 3, 1 / 3],
+ [1 / 3, 2 / 3, 0.0, 2 / 3],
+ [0.0, 1 / 3, 2 / 3, 0.0],
+ ]
+ )
+ for h in [0, 1]:
+ stratum_psus = np.where(strata_arr[::T] == h)[0]
+ n_h_panel = len(stratum_psus)
+ S_h = S_psu[stratum_psus]
+ S_bar_h = S_h.mean(axis=0)
+ S_centered = S_h - S_bar_h[None, :, :]
+ meat_h = np.zeros((k, k))
+ for g_local in range(n_h_panel):
+ S_g = S_centered[g_local]
+ meat_h += S_g.T @ K @ S_g
+ N_h = 30.0
+ fpc_scale = (1.0 - n_h_panel / N_h) * n_h_panel / (n_h_panel - 1)
+ meat_expected += fpc_scale * meat_h
+
+ np.testing.assert_allclose(meat, meat_expected, rtol=1e-12, atol=1e-14)
+
+ def test_c3_serial_term_inflates_se_on_ar1_dgp(self):
+ """Behavioral (not invariant): on a panel with within-unit AR(1)
+ residuals (positive autocorrelation), SE at lag=1 should be at least
+ 5% larger than SE at lag=0. Without this test, a bug where the
+ serial term computes to near-zero (e.g. centered scores cancel)
+ would pass all reduction/parity tests.
+
+ Pins MEDIUM #7 from plan-review: invariant-only coverage is
+ susceptible to silent zero bugs.
+ """
+ # Generate AR(1) panel via deterministic seed
+ rng = np.random.default_rng(101)
+ n_units, T = 16, 8
+ rho = 0.7
+ unit_ids = np.repeat(np.arange(n_units), T)
+ times = np.tile(np.arange(T), n_units)
+ # AR(1) within-unit residuals
+ residuals = np.zeros(n_units * T)
+ for u in range(n_units):
+ e = rng.standard_normal(T)
+ r = np.zeros(T)
+ r[0] = e[0]
+ for t in range(1, T):
+ r[t] = rho * r[t - 1] + np.sqrt(1 - rho**2) * e[t]
+ residuals[u * T : (u + 1) * T] = r
+
+ # PSU/stratum assignment
+ psu_per_unit = unit_ids // 4 # 4 PSUs, 4 units each
+ strata_per_unit = psu_per_unit // 2 # 2 strata, 2 PSUs each
+ # Spatial coords: UNIT-CONSTANT (SpilloverDiD requires constant coords
+ # within each unit across periods). PSU 0/1 (treated) clustered near
+ # lat 40; PSU 2/3 (never-treated) FAR (~1000+ km away near lat 50)
+ # so they stay in Omega_0 (untreated AND unexposed) at every period.
+ # Without the geographic separation, near-control units within
+ # ring_max would be flagged as S_it=1 and the time-FE identification
+ # would fail at treated periods.
+ psu_lat_centers = np.array([40.0, 40.1, 50.0, 50.1])
+ psu_lon_centers = np.array([-120.0, -120.1, -120.0, -120.1])
+ unit_lat_offset = rng.normal(0, 0.005, size=n_units)
+ unit_lon_offset = rng.normal(0, 0.005, size=n_units)
+ # unit_to_coords map: unit u gets PSU centroid + unit-level offset
+ lat = psu_lat_centers[psu_per_unit] + unit_lat_offset[unit_ids]
+ lon = psu_lon_centers[psu_per_unit] + unit_lon_offset[unit_ids]
+ # PSU 0/1 treated at t=3; PSU 2/3 never treated
+ first_treat = np.where(psu_per_unit < 2, 3, 2**31 - 1)
+ # Outcome: pure noise (treatment effect = 0)
+ y = residuals + np.where((psu_per_unit < 2) & (times >= 3), 0.0, 0.0)
+
+ df = pd.DataFrame(
+ {
+ "unit": unit_ids,
+ "time": times,
+ "first_treat": first_treat,
+ "y": y,
+ "psu": psu_per_unit,
+ "stratum": strata_per_unit,
+ "lat": lat,
+ "lon": lon,
+ "w": 1.0,
+ "N_h": 100.0,
+ }
+ )
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res_lag0 = self._fit(df, lag_cutoff=0, design=design)
+ res_lag1 = self._fit(df, lag_cutoff=1, design=design)
+ # AR(1) with rho=0.7 should produce material serial inflation. The 5%
+ # threshold is loose enough to tolerate bootstrap/numerical jitter
+ # but tight enough to catch a near-zero serial term.
+ inflation = (res_lag1.se / res_lag0.se) - 1.0
+ assert inflation > 0.05, (
+ f"AR(1) DGP serial inflation {inflation:.4f} should exceed 5%; "
+ f"lag0_se={res_lag0.se:.6f}, lag1_se={res_lag1.se:.6f}"
+ )
+
+ def test_d_single_stratum_lag1_finite_and_finite_se(self):
+ """Single stratum (H=1) + lag=1 fit produces finite output. (Strict
+ bit-equivalence to no-survey panel-block conley requires careful
+ Hajek-weight + bread alignment; the simpler invariant tested here is
+ that the new path produces well-defined output on a single-stratum
+ survey design.)
+ """
+ df = generate_butts_nonstaggered_dgp(seed=5)
+ df_s = _augment_with_survey(df, n_strata=1, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res = self._fit(df_s, lag_cutoff=1, design=design)
+ assert np.isfinite(res.att)
+ assert np.isfinite(res.se) and res.se > 0
+
+ def test_e_cross_stratum_independence_with_serial(self):
+ """Pure unit test on the serial helper: full meat must equal
+ partition-by-stratum-then-sum-of-each-stratum's-meat. Pins that
+ cross-stratum contributions are exactly zero in the serial term
+ (each stratum's serial sum is independent because it only iterates
+ within-stratum PSUs).
+ """
+ from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat
+
+ rng = np.random.default_rng(7)
+ G, T, k = 6, 3, 2
+ psu_arr = np.repeat(np.arange(G), T)
+ time_arr = np.tile(np.arange(T), G)
+ strata_arr = np.repeat([0, 0, 0, 1, 1, 1], T)
+ fpc_arr = np.full(G * T, 20.0)
+ Psi = rng.standard_normal((G * T, k))
+
+ meat_full, _, _ = _compute_stratified_serial_bartlett_meat(
+ Psi,
+ psu_arr=psu_arr,
+ time_arr=time_arr,
+ strata_arr_full=strata_arr,
+ fpc_arr_full=fpc_arr,
+ conley_lag_cutoff=1,
+ lonely_psu="remove",
+ )
+ # Partition + sum
+ meat_partitioned = np.zeros((k, k))
+ for h in [0, 1]:
+ mask = strata_arr == h
+ psu_arr_h = psu_arr[mask]
+ sub_strata = np.zeros(mask.sum(), dtype=int)
+ meat_h, _, _ = _compute_stratified_serial_bartlett_meat(
+ Psi[mask],
+ psu_arr=psu_arr_h,
+ time_arr=time_arr[mask],
+ strata_arr_full=sub_strata,
+ fpc_arr_full=fpc_arr[mask],
+ conley_lag_cutoff=1,
+ lonely_psu="remove",
+ )
+ meat_partitioned += meat_h
+ np.testing.assert_allclose(meat_full, meat_partitioned, rtol=1e-12, atol=1e-14)
+
+ def test_f_singleton_adjust_lag1_no_divide_by_zero(self):
+ """Pure unit test: singleton stratum + lonely_psu="adjust" + lag=1
+ produces finite output (the panel-wide n_h_panel = 1 FPC must be
+ SKIPPED via the `continue` mirror; otherwise the helper would
+ divide-by-zero). Pins the singleton-adjust panel-wide mean asymmetry.
+ """
+ from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat
+
+ rng = np.random.default_rng(11)
+ # 1 PSU in stratum 0, 2 PSUs in stratum 1; lonely_psu="adjust" exercises
+ # the singleton branch for stratum 0.
+ G, T, k = 3, 3, 2
+ psu_arr = np.repeat([0, 1, 2], T)
+ time_arr = np.tile(np.arange(T), G)
+ strata_arr = np.repeat([0, 1, 1], T)
+ fpc_arr = np.full(G * T, 10.0)
+ Psi = rng.standard_normal((G * T, k))
+
+ meat, var_computed, _ = _compute_stratified_serial_bartlett_meat(
+ Psi,
+ psu_arr=psu_arr,
+ time_arr=time_arr,
+ strata_arr_full=strata_arr,
+ fpc_arr_full=fpc_arr,
+ conley_lag_cutoff=1,
+ lonely_psu="adjust",
+ )
+ assert np.all(np.isfinite(meat)), "singleton-adjust path divided by zero"
+ assert var_computed
+
+ def test_f2_all_singleton_remove_lag1_returns_zero(self):
+ """Pure unit test: all strata singleton + lonely_psu="remove" + lag=1
+ returns zero meat and variance_computed=False. (At the orchestrator
+ level the spatial term would also fail; the orchestrator combines
+ both terms' var_computed flags before NaN-failing.)
+ """
+ from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat
+
+ rng = np.random.default_rng(13)
+ G, T, k = 3, 3, 2
+ psu_arr = np.repeat([0, 1, 2], T)
+ time_arr = np.tile(np.arange(T), G)
+ strata_arr = np.repeat([0, 1, 2], T) # 1 PSU each — all singleton
+ fpc_arr = np.full(G * T, 10.0)
+ Psi = rng.standard_normal((G * T, k))
+
+ meat, var_computed, _ = _compute_stratified_serial_bartlett_meat(
+ Psi,
+ psu_arr=psu_arr,
+ time_arr=time_arr,
+ strata_arr_full=strata_arr,
+ fpc_arr_full=fpc_arr,
+ conley_lag_cutoff=1,
+ lonely_psu="remove",
+ )
+ np.testing.assert_array_equal(meat, np.zeros((k, k)))
+ assert not var_computed
+
+ def test_g_unbalanced_panel_panel_wide_dense_codes(self):
+ """Unit test: panel-wide dense time codes for lag math (matches
+ `conley.py:940` R deviation). Drop PSU 0 from period t=2 in a
+ 4-PSU × 4-period synthetic. PSU 0's observed periods become {0, 1, 3};
+ with L=1, only the (t=0, t=1) pair contributes (pair (t=1, t=3) at
+ panel-wide lag |1-3|=2 > 1; pair (t=0, t=3) at lag 3 > 1). Hand-compute
+ the expected serial contribution from PSU 0.
+
+ Pins MEDIUM #1 from re-review: the lag convention is PANEL-WIDE dense
+ codes (NOT per-PSU positional encoding).
+ """
+ from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat
+
+ rng = np.random.default_rng(17)
+ G, T, k = 4, 4, 2
+ # Build obs list with PSU 0 missing period 2
+ psu_obs_list, time_obs_list, strata_obs_list, fpc_obs_list = [], [], [], []
+ for g in range(G):
+ for t in range(T):
+ if g == 0 and t == 2:
+ continue # PSU 0 missing period 2
+ psu_obs_list.append(g)
+ time_obs_list.append(t)
+ strata_obs_list.append(g // 2)
+ fpc_obs_list.append(20.0)
+ psu_arr = np.array(psu_obs_list)
+ time_arr = np.array(time_obs_list)
+ strata_arr = np.array(strata_obs_list)
+ fpc_arr = np.array(fpc_obs_list, dtype=np.float64)
+ Psi = rng.standard_normal((len(psu_arr), k))
+
+ meat, _, _ = _compute_stratified_serial_bartlett_meat(
+ Psi,
+ psu_arr=psu_arr,
+ time_arr=time_arr,
+ strata_arr_full=strata_arr,
+ fpc_arr_full=fpc_arr,
+ conley_lag_cutoff=1,
+ lonely_psu="remove",
+ )
+
+ # Verify PSU 0's serial pairs use panel-wide lag codes {0,1,3} -> lag={1,2,3}
+ # not per-PSU positional {0,1,2} -> lag={1,1,2}.
+ # If the implementation incorrectly used per-PSU positional, PSU 0
+ # would contribute the (t=1, t=3) pair at lag=1 (positional), not lag=2.
+ # We verify by computing expected meat under both conventions and
+ # asserting which the implementation matches.
+
+ # Panel-wide convention (correct): PSU 0 only contributes (t=0, t=1) at lag=1.
+ # Per-PSU positional convention (incorrect): would also contribute
+ # (t=1, t=3) at positional lag=1 (positions 1 and 2 in {0, 1, 3}).
+ # The contributions differ; verify implementation matches panel-wide.
+
+ # Build S_psu per (g, t) with NaN at missing cells, then sum into meat
+ # using panel-wide t codes.
+ S_psu = np.full((G, T, k), np.nan)
+ for i, (g, t) in enumerate(zip(psu_arr, time_arr)):
+ if np.isnan(S_psu[g, t, 0]):
+ S_psu[g, t] = Psi[i]
+ else:
+ S_psu[g, t] += Psi[i]
+ # Per-period within-stratum centering. Initialize to ZEROS (NOT raw
+ # S_psu) so any (g, t) cell with < 2 active PSUs in stratum at period
+ # contributes zero to the serial sum — matches the helper's
+ # codex-R1-P1 fix preventing raw-score leakage into the serial
+ # Bartlett covariance.
+ S_centered = np.zeros_like(S_psu)
+ # Replace NaN with 0 in the base (won't matter — zero-init prevents
+ # NaN propagation from the helper's initial copy of S_psu).
+ for t in range(T):
+ for h in [0, 1]:
+ stratum_psus = [g for g in range(G) if g // 2 == h]
+ active_psus = [g for g in stratum_psus if not np.isnan(S_psu[g, t, 0])]
+ if len(active_psus) < 2:
+ continue # singleton-active-period: leave S_centered as zero
+ stratum_mean = np.mean([S_psu[g, t] for g in active_psus], axis=0)
+ for g in active_psus:
+ S_centered[g, t] = S_psu[g, t] - stratum_mean
+ # Per-PSU serial accumulation
+ meat_expected = np.zeros((k, k))
+ for h in [0, 1]:
+ stratum_psus = [g for g in range(G) if g // 2 == h]
+ n_h_panel = len(stratum_psus)
+ if n_h_panel < 2:
+ continue
+ meat_h = np.zeros((k, k))
+ for g in stratum_psus:
+ present_g = ~np.isnan(S_centered[g, :, 0])
+ t_g = np.arange(T)[present_g].astype(np.float64)
+ if len(t_g) < 2:
+ continue
+ lag_mat = np.abs(t_g[:, None] - t_g[None, :])
+ K_g = ((lag_mat <= 1) & (lag_mat != 0)).astype(np.float64) * (1.0 - lag_mat / 2.0)
+ S_g = S_centered[g, present_g]
+ meat_h += S_g.T @ K_g @ S_g
+ N_h = 20.0
+ fpc_scale = (1.0 - n_h_panel / N_h) * n_h_panel / (n_h_panel - 1)
+ meat_expected += fpc_scale * meat_h
+
+ np.testing.assert_allclose(meat, meat_expected, rtol=1e-12, atol=1e-14)
+
+ def test_g2_lag_greater_than_T_minus_1_finite(self):
+ """L > T-1 is well-defined (no kernel overflow); fit succeeds with
+ finite output. Pinned per plan: lag=T and lag=T+5 should not crash.
+ """
+ df = generate_butts_nonstaggered_dgp(seed=19) # T=2 panel
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ T_max = df["time"].nunique()
+ for L in [T_max, T_max + 5]:
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res = self._fit(df_s, lag_cutoff=L, design=design)
+ assert np.isfinite(res.att), f"ATT NaN at L={L}"
+ assert np.isfinite(res.se), f"SE NaN at L={L}"
+
+ def test_h_singleton_active_period_centering_zeros(self):
+ """Pure unit test: when a stratum-period has < 2 active PSUs, the
+ per-period centering must zero out S_centered (not leave raw scores).
+ Pins codex R1 P1 fix — leaving raw scores in singleton-active-period
+ cells would feed uncentered values into the serial Bartlett sum and
+ contaminate the covariance.
+
+ Construct a 4-PSU x 3-period panel where one stratum has both PSUs
+ present at t=0, t=2 but only ONE PSU present at t=1 (simulating
+ finite_mask-style sparsity). With the bug, the singleton-active-period
+ cell at t=1 leaks RAW scores into the serial sum across (t=0, t=1)
+ and (t=1, t=2) pairs; with the fix, those legs contribute zero.
+ """
+ from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat
+
+ rng = np.random.default_rng(37)
+ # 4 PSUs: stratum 0 has PSUs (0, 1), stratum 1 has PSUs (2, 3).
+ # PSU 1 missing at t=1 -> stratum 0 has only PSU 0 active at t=1.
+ # Stratum 1 has both PSUs at all periods.
+ obs = []
+ for g in range(4):
+ for t in range(3):
+ if g == 1 and t == 1:
+ continue # PSU 1 absent at t=1 (singleton-active-period for stratum 0)
+ obs.append((g, t, g // 2))
+ psu_arr = np.array([o[0] for o in obs])
+ time_arr = np.array([o[1] for o in obs])
+ strata_arr = np.array([o[2] for o in obs])
+ fpc_arr = np.full(len(obs), 20.0)
+ # Choose Psi values so PSU 0 has materially nonzero score at t=1
+ # (the singleton-active-period cell). Without the fix, this raw
+ # value would leak into the serial cross-product with (PSU 0, t=0)
+ # and (PSU 0, t=2); with the fix, the t=1 leg of PSU 0's serial sum
+ # contributes zero.
+ Psi = rng.standard_normal((len(obs), 2))
+ # Override PSU 0's t=1 score to a known large value
+ for i, (g, t, _) in enumerate(obs):
+ if g == 0 and t == 1:
+ Psi[i] = np.array([10.0, 10.0])
+
+ meat, _, _ = _compute_stratified_serial_bartlett_meat(
+ Psi,
+ psu_arr=psu_arr,
+ time_arr=time_arr,
+ strata_arr_full=strata_arr,
+ fpc_arr_full=fpc_arr,
+ conley_lag_cutoff=1,
+ lonely_psu="remove",
+ )
+ # Hand-compute expected: PSU 0's t=1 leg has S_centered = 0
+ # (singleton-active in stratum 0); so PSU 0's serial sum only
+ # contributes from valid cross-period pairs where both legs have
+ # >= 2 active PSUs in stratum 0.
+ # At t=0, t=2: stratum 0 has both PSUs (0, 1) active. So
+ # S_centered_0_t0 = Psi_0_t0 - (Psi_0_t0 + Psi_1_t0)/2
+ # S_centered_0_t2 = Psi_0_t2 - (Psi_0_t2 + Psi_1_t2)/2
+ # Cross-period pair (t=0, t=2) at lag=2 contributes 0 to L=1 sum.
+ # Cross-period pairs (t=0, t=1) and (t=1, t=2) at lag=1 should
+ # contribute zero from PSU 0's leg (since S_centered at t=1 is zero).
+ # So PSU 0 contributes nothing to the serial sum at L=1.
+ # PSU 1: only present at t=0, t=2; lag=2 > L=1 so no contribution.
+ # Total contribution from stratum 0 = 0.
+ # PSU 2, PSU 3 (stratum 1) contribute normally at L=1.
+ # Compute stratum 1's expected contribution
+ psi_2_t0 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 2 and t == 0][0]]
+ psi_2_t1 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 2 and t == 1][0]]
+ psi_2_t2 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 2 and t == 2][0]]
+ psi_3_t0 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 3 and t == 0][0]]
+ psi_3_t1 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 3 and t == 1][0]]
+ psi_3_t2 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 3 and t == 2][0]]
+ # Stratum 1 mean at each t
+ mean_t0 = (psi_2_t0 + psi_3_t0) / 2
+ mean_t1 = (psi_2_t1 + psi_3_t1) / 2
+ mean_t2 = (psi_2_t2 + psi_3_t2) / 2
+ # Centered PSU 2
+ s2 = np.array([psi_2_t0 - mean_t0, psi_2_t1 - mean_t1, psi_2_t2 - mean_t2])
+ s3 = np.array([psi_3_t0 - mean_t0, psi_3_t1 - mean_t1, psi_3_t2 - mean_t2])
+ K_serial = np.array([[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]])
+ meat_s1 = s2.T @ K_serial @ s2 + s3.T @ K_serial @ s3
+ # FPC scale: panel-wide n_h_panel = 2, N_h = 20, f = 0.1, scale = 0.9 * 2/1 = 1.8
+ meat_expected = 1.8 * meat_s1
+
+ np.testing.assert_allclose(meat, meat_expected, rtol=1e-12, atol=1e-14)
+
+ def test_n_no_psu_survey_lag_positive_raises(self):
+ """No-PSU survey design + conley_lag_cutoff > 0 raises
+ NotImplementedError upfront (codex R1 P0 fix). Pseudo-PSU = obs-index
+ fallback would silently zero the serial sum; the gate prevents the
+ silent zero by failing closed at SpilloverDiD.fit.
+ """
+ df = generate_butts_nonstaggered_dgp(seed=41)
+ # No PSU column; just weights + strata.
+ df_s = df.copy()
+ df_s["w"] = 1.0
+ df_s["stratum"] = 0 # single stratum
+ design = SurveyDesign(weights="w", strata="stratum") # no psu, no fpc
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=1, # the problematic combination
+ vcov_type="conley",
+ )
+ with pytest.raises(NotImplementedError, match="no-effective-PSU survey_design"):
+ est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+
+ def test_n3_cluster_injects_psu_under_no_psu_survey_lag_positive(self):
+ """Positive test (R2 P1 fix): `cluster=` injected as PSU under
+ a no-PSU survey design (weights + strata + fpc) must allow lag>0
+ survey-Conley fitting via `_inject_cluster_as_psu`. ATT AND SE must
+ equal the equivalent fit with explicit `survey_design.psu="psu"`
+ — the FPC scaling is the SAME on both paths (Wave E.1 `_inject_cluster_as_psu`
+ carries the FPC through), so SE parity is the load-bearing pin.
+
+ Pins the documented Wave E.1 surface that `cluster=` becomes
+ the effective PSU when survey PSU is absent; the codex-R2 P1 fix
+ moved the no-effective-PSU gate to AFTER injection so this
+ documented surface continues to work for lag>0. R3 P3 fix: original
+ test omitted fpc on the no-PSU design and only asserted ATT
+ equality + finite SE; this version includes fpc on both paths and
+ asserts ATT AND scalar SE tight numerical parity (`assert_allclose`
+ at rtol=1e-12, atol=1e-14) so a future variance regression on the
+ cluster-injected surface fails. The full meat matrix is not
+ asserted (SE is a projection of meat that could in principle
+ coincide while off-diagonals differ); scalar-SE parity is the
+ load-bearing user-visible pin.
+ """
+ df = generate_butts_nonstaggered_dgp(seed=53)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ # Reference fit: explicit survey_design.psu="psu"
+ design_explicit = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res_explicit = self._fit(df_s, lag_cutoff=1, design=design_explicit)
+ # Cluster-injected fit: SurveyDesign(weights, strata, fpc) + cluster="psu"
+ # (fpc included so SE comparison is apples-to-apples)
+ design_no_psu = SurveyDesign(weights="w", strata="stratum", fpc="N_h")
+ est_cluster = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=1,
+ vcov_type="conley",
+ cluster="psu", # injected as effective PSU
+ )
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res_cluster = est_cluster.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design_no_psu,
+ )
+ # ATT bit-identical (variance-method-invariant) AND SE bit-identical
+ # (cluster injection carries fpc through; both paths see identical
+ # effective PSU labels + FPC scaling for the panel-block sandwich).
+ np.testing.assert_allclose(res_cluster.att, res_explicit.att, rtol=1e-12, atol=1e-14)
+ np.testing.assert_allclose(res_cluster.se, res_explicit.se, rtol=1e-12, atol=1e-14)
+ assert np.isfinite(res_cluster.se) and res_cluster.se > 0
+
+ def test_n2_weights_only_survey_lag_positive_raises(self):
+ """Weights-only SurveyDesign + conley_lag_cutoff > 0 also raises
+ (covers the weights-only no-PSU sub-case)."""
+ df = generate_butts_nonstaggered_dgp(seed=43)
+ df_s = df.copy()
+ df_s["w"] = 1.0
+ design = SurveyDesign(weights="w") # weights only, no psu/strata/fpc
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=2,
+ vcov_type="conley",
+ )
+ with pytest.raises(NotImplementedError, match="no-effective-PSU survey_design"):
+ est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+
+ def test_j_no_survey_panel_block_unchanged_with_lag(self):
+ """The no-survey Wave D panel-block conley path at `conley.py:920-965`
+ must still work after the orchestrator gate-relaxation. The dispatch
+ in `_compute_gmm_corrected_meat` only ROUTES the survey branch
+ through the new orchestrator; the `resolved_survey is None` branch is
+ bit-identical to Wave D pre-PR.
+ """
+ df = generate_butts_nonstaggered_dgp(seed=23)
+ # No survey design; just conley + lag=1 (already supported pre-PR via Wave A/D)
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=1000.0,
+ conley_lag_cutoff=1,
+ vcov_type="conley",
+ )
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat")
+ assert np.isfinite(res.att)
+ assert np.isfinite(res.se) and res.se > 0
+
+ def test_k_replicate_weights_rejection_inherits_wave_e1(self):
+ """Replicate-weight + conley + lag>0 + survey raises NotImplementedError.
+ Replicate-weight SurveyDesigns are by construction no-PSU (replicate
+ weights can't combine with strata/psu/fpc), so the codex-R1-P0
+ no-PSU lag-gate at SpilloverDiD.fit fires before the Wave E.1
+ replicate-weight gate — either error is informative. We assert the
+ broader rejection contract: ANY of the gates fires.
+ """
+ df = generate_butts_nonstaggered_dgp(seed=29)
+ df_s = df.copy()
+ df_s["w"] = 1.0
+ # Build replicate-weight columns (unit-constant)
+ units_sorted = sorted(df_s["unit"].unique())
+ rep_cols = []
+ rng = np.random.default_rng(0)
+ for r in range(5):
+ col = f"rep_{r}"
+ w_by_unit = {u: rng.uniform(0.5, 1.5) for u in units_sorted}
+ df_s[col] = df_s["unit"].map(w_by_unit)
+ rep_cols.append(col)
+ design = SurveyDesign(weights="w", replicate_weights=rep_cols, replicate_method="JK1")
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=1000.0,
+ conley_lag_cutoff=1,
+ vcov_type="conley",
+ )
+ with pytest.raises(NotImplementedError, match="(?:replicate|no-effective-PSU)"):
+ est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+
+ def test_m_fit_idempotency_lag1(self):
+ """Repeated fit with lag=1 produces bit-identical results (per
+ `feedback_fit_does_not_mutate_config`).
+ """
+ df = generate_butts_nonstaggered_dgp(seed=31)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=1000.0,
+ conley_lag_cutoff=1,
+ vcov_type="conley",
+ )
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res1 = est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ # Re-fit on a fresh estimator instance with same params
+ est2 = SpilloverDiD(**est.get_params())
+ res2 = est2.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ np.testing.assert_array_equal(res1.att, res2.att)
+ np.testing.assert_array_equal(res1.se, res2.se)
+
+ def test_r_drift_goldens_lag1(self):
+ """ATT is variance-method-invariant: lag=1 ATT MUST match lag=0 ATT
+ bit-identically on the same data + estimator config (only vcov_type
+ / lag changes the meat, never the point estimate). SE MUST differ
+ (serial Bartlett term contributes nonzero variance under panel data
+ with within-PSU temporal correlation). Catches any case where the
+ serial helper accidentally feeds back into the point estimate path.
+ """
+ df = generate_butts_nonstaggered_dgp(seed=0)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res_lag0 = self._fit(df_s, lag_cutoff=0, design=design)
+ res_lag1 = self._fit(df_s, lag_cutoff=1, design=design)
+ # Both finite + positive
+ assert np.isfinite(res_lag0.att) and np.isfinite(res_lag0.se) and res_lag0.se > 0
+ assert np.isfinite(res_lag1.att) and np.isfinite(res_lag1.se) and res_lag1.se > 0
+ # ATT bit-identical between lag=0 and lag=1 (variance-method-invariant)
+ np.testing.assert_array_equal(
+ res_lag1.att,
+ res_lag0.att,
+ err_msg="ATT must be variance-method-invariant; lag=1 ATT must match lag=0 ATT exactly",
+ )
+ # SE differs (serial Bartlett term contributes nonzero variance)
+ assert not np.allclose(res_lag1.se, res_lag0.se, rtol=1e-10), (
+ f"lag=1 SE ({res_lag1.se}) must differ from lag=0 SE ({res_lag0.se}); "
+ "serial Bartlett term should add nonzero variance contribution. "
+ "Note: on a 2-period Butts DGP only L=1 pairs exist (only adjacent "
+ "periods), so the inflation may be small but should be detectable."
+ )
+
+
+class TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoffEventStudy:
+ """Wave E.2 follow-up event-study mirror: conley + survey + lag>0 on both
+ `is_staggered` branches (per `feedback_cohort_loop_trigger_cache_both_branches`).
+ """
+
+ _CUTOFF_KM = 1000.0
+
+ def test_o_event_study_conley_lag1_survey_is_staggered_true(self):
+ """Full plumbing end-to-end on the staggered event-study path with
+ `conley_lag_cutoff=1` survey."""
+ df = generate_butts_staggered_dgp(seed=24)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=1, # NEW: panel-block composition
+ vcov_type="conley",
+ event_study=True,
+ horizon_max=2,
+ )
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res = est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ assert np.isfinite(res.att) and np.isfinite(res.se) and res.se > 0
+ assert res.spillover_effects is not None
+ assert res.survey_metadata is not None
+ # df_survey on this fixture matches Wave E.2 event-study test_o
+ assert res.survey_metadata.df_survey == 6
+
+ def test_p_event_study_conley_lag1_survey_is_staggered_false(self):
+ """The non-staggered branch of the event-study path also works with
+ lag>0 (per `feedback_cohort_loop_trigger_cache_both_branches`)."""
+ df = generate_butts_nonstaggered_dgp(seed=25)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ est = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=1, # NEW: panel-block composition
+ vcov_type="conley",
+ event_study=True,
+ horizon_max=1,
+ )
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res = est.fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ assert np.isfinite(res.att) and np.isfinite(res.se) and res.se > 0
+ assert res.survey_metadata is not None
+
+ def test_r_event_study_drift_goldens_lag1_vs_lag0(self):
+ """Event-study + survey + lag=1: ATT bit-identical to lag=0 (variance-
+ method-invariant); SE differs (serial Bartlett contributes).
+ """
+ df = generate_butts_staggered_dgp(seed=24)
+ df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0)
+ design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h")
+ with warnings.catch_warnings():
+ warnings.simplefilter("ignore", UserWarning)
+ res_lag0 = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=0,
+ vcov_type="conley",
+ event_study=True,
+ horizon_max=2,
+ ).fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ res_lag1 = SpilloverDiD(
+ rings=[0.0, 100.0],
+ conley_coords=("lat", "lon"),
+ conley_metric="haversine",
+ conley_cutoff_km=self._CUTOFF_KM,
+ conley_lag_cutoff=1,
+ vcov_type="conley",
+ event_study=True,
+ horizon_max=2,
+ ).fit(
+ df_s,
+ outcome="y",
+ unit="unit",
+ time="time",
+ first_treat="first_treat",
+ survey_design=design,
+ )
+ # ATT bit-identical (point estimate variance-method-invariant)
+ np.testing.assert_array_equal(res_lag1.att, res_lag0.att)
+ # SE differs (serial term contributes nonzero variance under panel
+ # data with multiple periods of within-PSU score correlation)
+ assert not np.allclose(res_lag1.se, res_lag0.se, rtol=1e-10)
+
+
class TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy:
"""Event-study branch + conley + survey, both is_staggered branches."""