diff --git a/CHANGELOG.md b/CHANGELOG.md index 6bac2cce..efa987f5 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **ContinuousDiD methodology-review-tracker promotion.** Tracker row flipped **In Progress** → **Complete** with full Verified Components / Test Coverage / Corrections Made / Deviations / Outstanding Concerns structure mirroring the HAD precedent (PR #473). REGISTRY `## ContinuousDiD` gains a formal Deviations block consolidating the boundary-knots deviation from R `contdid` v0.1.0 (`range(dose)` vs `range(dvals)` — library avoids extrapolation), the `bspline_derivative` derivative-failure `UserWarning` (Phase 2 axis-C #12), the `+inf` → `0` never-treated recoding warning, and the zero-`first_treat`+nonzero-`dose` force-zeroing warning (both axis-E silent-coercion fixes) into a single AI-review-recognized labeled surface. R cross-language coverage for ContinuousDiD runs at relative tolerance across two surfaces: (a) **scalar parity with raw R `cont_did` / `pte_default`** at 1% on overall ATT for all 6 benchmarks and on overall ACRT for benchmarks 4-5 (benchmark 6 is event-study, scalar `overall_att` only); (b) **harmonized boundary-knot-normalized curve parity** with R-side ATT(d) / ACRT(d) reconstructed under `Boundary.knots = range(treated_doses)` (matching the library) on benchmarks 1-3 via the benchmark harness — `_run_r_contdid` does the R-side rebuild at `tests/test_methodology_continuous_did.py:333-367`, and `_compare_with_r` orchestrates the Python-vs-R comparison at `:395-459` — max ATT(d) at 1% and max ACRT(d) at 2%. NOT bit-exact (`atol=1e-8`) like HAD — the boundary-knots deviation precludes algorithmic bit-equality on aggregated dose-response curves. Surface (a) is direct raw-package parity; surface (b) is reconstructed-basis parity because raw `contdid` curves use `range(dvals)`. No source code changes, no new tests, no new docstrings — consolidation only against the existing 15 methodology tests (`tests/test_methodology_continuous_did.py`), 80 unit tests (`tests/test_continuous_did.py`), and `docs/methodology/continuous-did.md` theory note. `METHODOLOGY_REVIEW.md` ContinuousDiD row promoted **In Progress** → **Complete**. - **`SpilloverDiD(vcov_type="conley", survey_design=...)` integration via stratified-Conley sandwich on PSU totals (Wave E.2).** Lifts the Wave E.1 `NotImplementedError` (`spillover.py:2201` upfront, `two_stage.py:217` helper-level) and adds spatial-HAC + design-based variance for the previously deferred composition. **Documented synthesis** of Conley (1999) spatial-HAC × Gerber (2026, arXiv:2605.04124) Proposition 1 Binder TSL (the Wave E.1 foundation) × Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. No reference software combines all three ingredients on a two-stage influence function. **Mechanical composition (panel-aware):** preserves the library's existing `conley_lag_cutoff = 0` semantic at `diff_diff.conley._compute_conley_meat` ("within-period spatial only — exclude cross-period spatial pairs") by looping over periods. For each period `t`, SpilloverDiD's per-obs Hájek-weighted Wave D IF `psi_i` is aggregated to per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i` (via `np.add.at`); per-PSU spatial centroids are panel-constant (mean of per-observation `conley_coords` within each PSU, vectorized `np.add.at` sums / `np.bincount` counts); for each stratum the within-stratum sandwich is `M_h_t = (1 - f_h) * n_h/(n_h-1) * sum_{j,k in PSUs_h} K(d(centroid_j, centroid_k) / conley_cutoff_km) * (S_psu_t[j] - S_bar_h_t)(S_psu_t[k] - S_bar_h_t)'`, where K is the Bartlett kernel (SpilloverDiD currently exposes Bartlett only and hardcodes it; the survey helper accepts `"uniform"` too but exposing that on the SpilloverDiD constructor is a separate follow-up) and `d` is haversine / euclidean / callable per `ConleyMetric`. Cross-stratum kernel weights are exactly zero by sampling design (strata are independence partitions). Total meat is `sum_t sum_h M_h_t`. Cross-period spatial pairs are excluded by construction — the per-period loop matches the library's panel Conley contract exactly. **Reduction semantics (load-bearing for tests):** the orchestrator's panel-aware meat equals `sum_t` of per-period within-stratum stratified-Conley sandwiches on per-period PSU totals (pinned at `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`); single stratum (H = 1, FPC = inf) reduces to `sum_t` plain Conley sandwich on per-period PSU totals (NOT on time-collapsed totals). **Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (parallel to existing `_compute_stratified_meat_from_psu_scores` 3-tuple `(meat, variance_computed, legitimate_zero_count)` contract; per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(scores=centered, coords=psu_coords_h, ...)` in cross-sectional mode); new dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to existing `_compute_binder_tsl_meat`, performs per-obs Psi → PSU aggregation + centroid derivation + dispatch to survey helper, intentionally drops `cluster_ids` at the dispatch boundary — see Restrictions). `_compute_gmm_corrected_meat` conley branch extended with `if resolved_survey is not None` routing to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D. **Singleton-stratum `lonely_psu="adjust"` parity:** the survey helper mirrors the Binder helper's `continue` to skip the FPC scale on singleton strata (with `n_h = 1` the scale `n_h / (n_h - 1)` would divide by zero); the degenerate one-PSU kernel `K = [[K(0)]] = [[1.0]]` reduces to `centered.T @ centered`, matching Binder's singleton-adjust output. **Saturated `df_survey = 0` NaN-fail:** mirrors Wave E.1 (`_compute_stratified_conley_meat` returns NaN meat with `UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..." so callers can `pytest.warns(UserWarning, match="Wave E.2 stratified-Conley")`). **Public surface restrictions:** replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits Wave E.1 gate; per-replicate full refit is separate follow-up scope); `cluster= + survey_design.psu + vcov_type="conley"` coerces `cluster=` to PSU per Wave E.1's warn-and-use-PSU pattern (the Conley cluster product kernel becomes a no-op after PSU aggregation, so `cluster_ids` is intentionally not threaded into the inner Conley kernel call — every PSU is its own cluster post-aggregation, which would zero all cross-PSU pairs); LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley "Phase 5" roadmap, not Wave E); DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", survey_design=)` requires `_APPLICABILITY` / `_PT_METHOD` registration (separate Wave F PR). **Tests:** new `TestSpilloverDiDWaveE2ConleySurveyDesign` and `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` classes in `tests/test_spillover.py` (bit-identical no-survey fallback; panel-aware per-period sum invariant on the orchestrator + helper composition; hand-computation methodology anchor; single-stratum ≡ plain Conley on PSU totals; cross-stratum independence as a unit test on the survey helper with interleaved cross-stratum centroids; Binder vs Conley singleton-adjust FPC skip parity; lonely-PSU sensitivity across three modes; FPC large ≡ no-FPC and FPC = n_h zeros stratum; saturated NaN-fail with `pytest.warns(match="Wave E.2 stratified-Conley")`; replicate-weight + non-pweight rejections; cluster warn-and-use-PSU; fit idempotency; `finite_mask` survey-array subsetting; no-PSU coverage — weights-only `SurveyDesign(weights=...)`, strata-only `SurveyDesign(weights=..., strata=...)`, and a per-period re-index unit invariant pinning that no cross-period spatial pairs leak into the meat on implicit-PSU layouts; event-study path on both `is_staggered=True`/`False` branches per `feedback_cohort_loop_trigger_cache_both_branches`; drift goldens at `rtol=1e-12 / atol=1e-14`). The pre-existing `tests/test_spillover.py::test_fit_conley_plus_survey_design_not_implemented` Wave E.1-era gate-assertion test is removed (replaced by the positive-path tests above). Wave E.1 entry's "Public surface restrictions" bullet updated to past-tense the conley+survey gate reference. +- **`SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` panel-block composition via spatial + serial Bartlett HAC (Wave E.2 follow-up).** Lifts the Wave E.2 upfront `NotImplementedError` at `spillover.py:2210` and extends the panel-aware stratified-Conley sandwich (cross-sectional `lag=0` shipped in Wave E.2) with a within-PSU serial Bartlett HAC over time (Newey-West 1987 form). **Documented synthesis** of Wave E.2's panel-aware Conley × Binder TSL × Wave D Gardner GMM composition with Newey-West (1987) serial Bartlett HAC, matching the no-survey panel-block decomposition at `diff_diff.conley._compute_conley_meat` (Conley 1999 + Newey-West 1987 separable form, NOT Driscoll-Kraay 2D-HAC). The composition is `meat = meat_spatial + meat_serial` with disjoint index sets: spatial is the shipped Wave E.2 per-period within-stratum sandwich on PSU totals; serial is the new within-PSU sum `meat_serial_h = FPC_h_panel * sum_{g in stratum h} sum_{|t-s| <= L, t != s, both periods present for PSU g} (1 - |t-s|/(L+1)) * S_centered_t[g] @ S_centered_s[g]'` where `S_centered_t[g] = S_psu_t[g] - S_bar_h(g)_t` is per-period within-stratum centered (Binder TSL form — matches the spatial helper's centering exactly), and `|t-s|` uses panel-wide dense time codes (mirrors `conley.py:940` R deviation that matches R `conleyreg::time_dist`). Serial Bartlett kernel is hardcoded regardless of `conley_kernel` (the user-selected kernel governs the spatial term only). **FPC convention (panel-wide per-stratum):** standalone Newey-West composition on stratified clusters — the serial sum is a PANEL-level construct, so the cluster set is the panel-wide PSU set in stratum h; FPC denominator uses `n_h_panel = |unique PSUs in stratum h across active sample|`, NOT per-period `n_h_t`. The spatial term keeps its existing per-period FPC unchanged. For balanced panels the two FPC denominators converge; the difference surfaces under unbalanced panels. Citation chain: Binder (1983) for the FPC factor form, Gerber (2026) Prop 1 for the Binder TSL composition with two-stage IF, Newey-West (1987) for the serial Bartlett kernel weights, Conley (1999) for the spatial kernel and panel-block decomposition (deliberately NOT by analogy to the Binder helper's cross-sectional per-stratum FPC convention). **Centering asymmetry vs no-survey reference:** the no-survey panel-block path at `conley.py:949-965` uses RAW scores for the serial term because it assumes `E[scores] = 0` under correct specification; the survey-weighted Binder TSL form centers explicitly (textbook stratified-cluster sandwich). Using raw scores in the survey case would inflate variance by twice the squared per-period stratum mean and would NOT reduce to the cross-sectional Wave E.2 form at lag=0. **Reduction semantics (load-bearing for tests):** `conley_lag_cutoff = 0` or `None` produces bit-identical ATT and scalar SE to shipped Wave E.2 (orchestrator skips the serial helper invocation; the spatial loop + saturation guard + new PSD/finite guard still run on the spatial-only meat). `assert_array_equal` regression pin at `test_a` covers user-visible ATT + scalar SE; `test_a2` mock-spy independently asserts the serial helper is NOT invoked at `lag=0`. The meat matrix itself is not exposed on `SpilloverDiDResults`, so full meat-matrix equality is implied (not asserted); `conley_time is None` or `T = 1` short-circuits the serial helper to zero meat (degenerate panel-block, not a saturation diagnostic); single-stratum H=1 with FPC=inf reduces to Newey-West Bartlett HAC on per-period within-stratum-CENTERED per-PSU score sequences (NOT raw scores — Binder TSL centering is retained at H=1; the panel-wide `G/(G-1)` survey factor replaces FPC); bandwidth → 0 with L > 0 reduces the spatial term to per-period within-stratum HC sandwich while leaving the serial term unchanged (separable form). **Singleton-stratum `lonely_psu="adjust"` panel-wide mean asymmetry:** for the serial helper, `_global_psu_mean` is the panel-wide mean of per-period PSU totals (averaged over all `(g, t)` with `present[g, t]`), NOT the per-period within-stratum mean used by the spatial helper. The `continue`-skip-FPC pattern matches the spatial helper at `survey.py:2007-2017` to avoid divide-by-zero on `n_h_panel = 1`. **Restrictions inherited:** replicate-weight variance still raises `NotImplementedError`; DiagnosticReport routing for the panel-block case is queued for the same Wave F follow-up as the cross-sectional case. **Implementation:** new sibling helper `_compute_stratified_serial_bartlett_meat` in `diff_diff/two_stage.py` (parallel to the Wave E.2 spatial orchestrator; ~200 LoC). Orchestrator `_compute_stratified_conley_meat` signature extended with `conley_lag_cutoff: Optional[int] = None`; spatial loop unchanged; serial helper called after spatial loop when `conley_lag_cutoff > 0`; saturation NaN-fail accounting merges both terms' `(variance_computed, legitimate_zero)` flags into the same template (`UserWarning` "Wave E.2 stratified-Conley sandwich: df_survey = 0..." covers both spatial-only and panel-block cases). Dispatch in `_compute_gmm_corrected_meat` conley branch threads `conley_lag_cutoff` through; the `cluster_ids` non-threading rationale (post-PSU-aggregation every PSU is its own cluster) still applies to the new serial branch. Spillover-side gate at `spillover.py:2210-2230` (Wave E.2-era `NotImplementedError` for `lag > 0 + survey`) deleted; gate rationale comment replaced with shipped-behaviour note. Stage-1 weighted FE solver, `finite_mask` survey-array subsetting, `df_survey` threading, bread weighting, and `SpilloverDiDResults` survey metadata are all inherited UNCHANGED — Psi construction is bit-identical regardless of vcov_type or lag. **Tests:** new `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff` and `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoffEventStudy` classes in `tests/test_spillover.py`; existing `test_j0_panel_conley_lag_cutoff_rejected_under_survey` (Wave E.2-era gate-assertion) DELETED. - **HeterogeneousAdoptionDiD methodology-review-tracker promotion.** New `tests/test_methodology_had.py` (6 classes, 36 tests) with paper-equation-numbered Verified Components walk-through against de Chaisemartin, Ciccia, D'Haultfœuille & Knau (2026) arXiv:2405.04465v6 (Equations 3 / 7 / 11 / 18 / 29 and Theorems 1 / 3 / 4 / 7): Design 1' MC recovery on both the zero-boundary DGP AND a nonzero-boundary-intercept DGP (`ΔY = c + β·D + ε` with `c != 0`) so the `att = (mean(ΔY) − τ_bc) / mean(D)` subtraction term is verified explicitly, N(0,1) coverage at `n_replicates=200`, mass-point Wald-IV closed-form equivalence at `atol=1e-9`, QUG limit-law distributional match at KS-stat ≤ 0.05 (n_draws=5000), Yatchew-HR paper-literal `σ²_diff = 1/(2G)` normalization lock, joint Stute pre-trends + homogeneity H0 fail-to-reject on both surfaces and H1 reject for joint homogeneity under a nonlinear DGP, and library-deviation locks (equal-weighting via selective low-dose-region replication, sup-t bootstrap gating, staggered-timing fail-closed `ValueError`). Added "Non-testable assumptions (paper Section 3.1.2)" Notes block to `HeterogeneousAdoptionDiD` class docstring + "Scope (what this test does NOT cover)" clauses to `qug_test` / `stute_test` / `yatchew_hr_test` / `did_had_pretest_workflow` Notes sections explicitly stating that the pre-tests verify ADJACENT assumptions (Assumption 4 / 7 / 8) and CANNOT test Assumptions 5 or 6. Phase-4 validation-harness items (Pierce-Schott 2016 Figure 2 replication, Table 1 coverage-rate reproduction across 3 DGPs × G ∈ {100, 500, 2500}) waived with documented rationale: R parity at `atol=1e-8` in `tests/test_did_had_parity.py` (3 DGPs × 5 method combos, bit-exact via `rtol=0`) is a strictly stronger anchor than coverage-rate Monte Carlo, and the paper itself self-acknowledges (Section 5.2) that NP estimators are too noisy to be informative on the LBD-restricted PNTR panel. REGISTRY HAD section gains a consolidated Deviations block (5 entries with framing header) and closes 2 of 3 unchecked Implementation Checklist items — the staggered-timing fail-closed `ValueError` and the Assumption 5/6 non-testability documentation; the `covariates=` Theorem 6 follow-up and the extensive-margin / "consider running standard DiD" warning both remain explicitly tracked in `TODO.md` as Low-priority follow-ups rather than claimed-closed. `dechaisemartin-2026-review.md:182-194` requirements checklist boxes the Phase 1a/1b/1c implementation-status closures + the Assumption 5/6 documentation + the staggered-timing closures; the extensive-margin item is acknowledged as partial (zero-dose `UserWarning` exists in `qug_test`; main-`fit()` "consider standard DiD" recommendation is the TODO follow-up). `METHODOLOGY_REVIEW.md` HAD row promoted **In Progress** → **Complete**. - **SunAbraham `vcov_type` parameter (Phase 1b PR 1/8).** `SunAbraham(vcov_type=...)` now accepts `{"classical","hc1","hc2","hc2_bm"}` (defaults to `"hc1"`, which preserves prior behavior bit-equally - SA historically hard-coded HC1). Auto-cluster-at-unit dropped when the user opts into explicit `vcov_type="hc2"` or `vcov_type="classical"` (one-way only); preserved for `"hc1"` and `"hc2_bm"`. When `vcov_type in {"classical","hc2","hc2_bm"}`, `_fit_saturated_regression` auto-routes to a full-dummy saturated design (mirrors TWFE Gate 1 from PR #469): FWL preserves cohort coefficients but not the hat matrix, so HC2 leverage and Bell-McCaffrey Satterthwaite DOF must be computed on the full FE projection. Empirically matches R `lm()` summary classical SE, `sandwich::vcovHC(type="HC2")`, and `clubSandwich::vcovCR(..., type="CR2")` + `coef_test()$df_Satt` at atol=1e-10 (cohort SE and BM DOF pinned in `tests/test_methodology_sun_abraham.py`). For `vcov_type="hc2_bm"`, the user-facing aggregated inference (`event_study_effects[e]['p_value']`/`['conf_int']`, `overall_p_value`/`overall_conf_int`) uses CR2 Bell-McCaffrey contrast DOF — matches `clubSandwich::Wald_test(test="HTZ")$df_denom` at atol=1e-10 (mirrors PR #465's `_compute_cr2_bm_contrast_dof` pattern for MultiPeriodDiD's post-period-average ATT). `vcov_type` is now propagated to `SunAbrahamResults.vcov_type` for downstream introspection. `SurveyDesign` (any kind — analytical weights, stratified, PSU, or replicate-weight) combined with `vcov_type in {"classical","hc2","hc2_bm"}` raises `NotImplementedError`: the survey-design TSL (or replicate-weight refit) variance overrides the analytical sandwich family, and the auto-cluster guard for one-way families would silently downgrade unit-level PSUs to per-observation PSUs. Use `vcov_type="hc1"` (default) for survey designs. `conley` rejected at `__init__` with a deferral message (would require threading 6+ `conley_*` params through the saturated regression call). **Deviation from R:** SA's within-transform HC1 SE differs from `fixest::sunab()` by ~1-2% (~2e-3 absolute) on typical panel sizes due to a different `(n-k)` finite-sample correction (fixest counts absorbed FE in k_total; SA's `solve_ols` counts only within-transformed columns); the IW aggregation step is otherwise identical (pinned at atol=5e-3, tracked in TODO.md). First PR of the Phase 1b standalone-estimator threading initiative (7 PRs to follow: StackedDiD, WooldridgeDiD-OLS, CallawaySantAnna, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD). - **PreTrendsPower R `pretrends` parity goldens (PR-C closes PR-B's deferred R-parity row).** JSON goldens at `benchmarks/data/r_pretrends_golden.json` generated from the committed `benchmarks/R/generate_pretrends_golden.R` script against `jonathandroth/pretrends` commit `122731d082` (package version 0.1.0, R 4.5.2). 4 fixtures cover regular K=3 grid (`uniform_3_pre_periods_no_anticipation`), irregular K=3 grid `[-5,-3,-1]` (`irregular_pre_periods` — locks the PR-B Step 4 γ-unit linear-weight fix), anticipation-shifted K=4 grid (`anticipation_shifted`), and K=1 closed form (`single_pre_period_closed_form` — Roth Proposition 2 univariate truncated-normal). `TestPretrendsParityR` in `tests/test_methodology_pretrends.py` now active (4 tests): NIS power vs R `pretrends::pretrends()` at `atol=1e-4` across all 4 fixtures × 4 γ values; γ_p MDV vs R `slope_for_power()` at `atol=1e-4` across all 4 fixtures × 2 target_power values; end-to-end `fit()` on irregular grid vs R γ_p at `atol=1e-4` (locks the full `fit() → _extract_pre_period_params → _get_violation_weights → _compute_mdv_nis` chain through the public API); K=1 three-way cross-check (Python ≡ analytical truncated-normal closed form `1 - Φ(z - γ/σ) + Φ(-z - γ/σ)` at `atol=1e-7`; both within `atol=1e-4` of R). Tolerance rationale: R hardcodes `thresholdTstat.Pretest=1.96` while Python uses `scipy.stats.norm.ppf(0.975) = 1.959963984540054` (`dz ≈ 3.6e-5`); R `slope_for_power` uses `uniroot(tol = .Machine$double.eps^0.25 ≈ 1.22e-4)` versus Python `brentq(xtol=2e-12)`; the inverse-solver tolerance gap dominates γ_p, and `mvtnorm::pmvnorm` (R) vs `scipy.stats.multivariate_normal.cdf` (Python) Genz-Bretz randomized-lattice differences bound the K=4 NIS power gap at ~5e-5. `METHODOLOGY_REVIEW.md` PreTrendsPower row promoted `**Complete** (R parity pending)` → `**Complete**`. Roth (2022) paper review's `R \`pretrends\` package version pin (provisional)` Gaps bullet struck. Closes the PR-C TODO row. diff --git a/README.md b/README.md index a9b7e238..3587076a 100644 --- a/README.md +++ b/README.md @@ -106,7 +106,7 @@ Full guide: `diff_diff.get_llm_guide("practitioner")`. - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html) - Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html) - Borusyak, Jaravel & Spiess (2024) imputation estimator, most efficient under homogeneous effects - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html) - Gardner (2022) two-stage estimator with GMM sandwich variance -- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up) +- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html) - Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover on near-control units; handles non-staggered and staggered timing; supports survey-design variance under `survey_design=` for HC1 / CR1 (Wave E.1 Binder TSL) and Conley (Wave E.2 panel-aware stratified-Conley sandwich on per-period PSU totals; extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC — `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=`) - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html) - Synthetic DiD combining standard DiD and synthetic control for few treated units - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html) - triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html) - Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves diff --git a/TODO.md b/TODO.md index ec244fec..193711de 100644 --- a/TODO.md +++ b/TODO.md @@ -139,7 +139,8 @@ Deferred items from PR reviews that were not addressed before merge. | `SyntheticDiD(vcov_type="conley")` support. Currently raises `TypeError` at `__init__` because SyntheticDiD uses `variance_method ∈ {bootstrap, jackknife, placebo}` rather than the analytical sandwich that Conley plugs into. Wiring would require either reimplementing an analytical sandwich path for SyntheticDiD or designing a spatial-block bootstrap (new methodology, Politis-Romano 1994 territory). | `synthetic_did.py::SyntheticDiD` | follow-up (spillover-conley) | Low | | `SpilloverDiD(survey_design=...)` replicate-weight variance (BRR / Fay / JK1 / JKn / SDR). Wave E.1 ships Taylor-linearization only. Per Gerber (2026) Appendix A, the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit of stage 1 and stage 2 (200+ LoC of test surface beyond E.1). | `spillover.py::SpilloverDiD.fit`, `survey.py::compute_replicate_refit_variance` | follow-up | Low | | `SpilloverDiD(survey_design=...)` subpopulation preservation (Wave E.3). Wave E.1's `finite_mask` block physically removes zero-weight rows that lose stage-1 FE support, so `SurveyDesign.subpopulation()`-derived designs see `n_psu` / `df_survey` / Binder centering recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (R `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts and treats out-of-domain rows as zero-score padding. Fix requires separating fit-sample alignment (Psi array) from design-level bookkeeping: preserve a full-design `resolved_survey` for inference metadata + zero-pad dropped zero-weight rows' IF contribution. Add `SurveyDesign.subpopulation()` regression test to lock the contract. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_binder_tsl_meat` | follow-up (Wave E.3) | Medium | -| `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` serial Bartlett HAC composition. Wave E.2 ships the panel-aware `conley_lag_cutoff = 0` case ("within-period spatial only" — `sum_t sum_h M_h_t` per `tests/test_spillover.py::TestSpilloverDiDWaveE2ConleySurveyDesign::test_b_panel_aware_per_period_sum_invariant`) and raises `NotImplementedError` upfront at `spillover.py:fit` on `conley_lag_cutoff > 0`. The serial Bartlett component (within-unit / within-PSU temporal HAC at lag ≤ L) needs to compose with the panel-aware stratified-Conley spatial sandwich — the natural addition is `meat_serial = sum_g sum_{|t-s|<=L, t!=s} (1 - |t-s|/(L+1)) * (S_psu_t[g] - S_bar_h_t)(S_psu_s[g] - S_bar_h_s)'` per PSU, summed across all PSUs in each stratum, with appropriate Binder FPC scaling — plus a methodology call on whether to include cross-period spatial pairs in the serial term. Regression goldens vs the cross-sectional limit (lag=0, which is now the shipped path). | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_conley_meat` | follow-up (Wave E.2 follow-up) | Medium | +| Serial Bartlett kernel logic duplicated between `diff_diff/two_stage.py::_compute_stratified_serial_bartlett_meat` (survey path) and `diff_diff/conley.py::_compute_conley_meat` panel-block branch (no-survey path). Both compute `K[t,s] = (1 - |t-s|/(L+1)) * 1{|t-s| <= L, t != s}` over dense panel-period codes. Factor out a shared `_serial_bartlett_kernel_matrix(t_codes, L)` helper and a shared post-meat finite + PSD-warning guard so the survey and no-survey paths can't drift on diagnostics or kernel weights. Cosmetic; refactor doesn't change behavior. | `two_stage.py::_compute_stratified_serial_bartlett_meat`, `conley.py::_compute_conley_meat` | follow-up | Low | +| `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` no-effective-PSU serial Bartlett HAC. Wave E.2 follow-up ships the panel-block composition when an effective PSU exists (explicit `survey_design.psu` OR injected via `cluster=` per `_inject_cluster_as_psu`). Weights-only / strata-only survey designs WITHOUT a cluster fallback raise `NotImplementedError` at `SpilloverDiD.fit` post-resolution because under the pseudo-PSU = obs-index fallback each pseudo-PSU appears in exactly one period — the per-PSU serial cross-period loop would silently contribute zero. Fix would either derive a unit-level serial fallback for no-PSU designs (mixes IF allocators with the pseudo-PSU spatial term — needs methodology work) or route the serial loop through `conley_unit` with explicit documentation of the IF-allocator asymmetry. Regression goldens vs the effective-PSU shipped path. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_serial_bartlett_meat` | follow-up (Wave E.2 follow-up tail) | Low | | `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low | | `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low | | `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium | diff --git a/diff_diff/guides/llms.txt b/diff_diff/guides/llms.txt index 6ae688fe..a9b26a6c 100644 --- a/diff_diff/guides/llms.txt +++ b/diff_diff/guides/llms.txt @@ -58,7 +58,7 @@ Full practitioner guide: call `diff_diff.get_llm_guide("practitioner")` - [SunAbraham](https://diff-diff.readthedocs.io/en/stable/api/staggered.html): Sun & Abraham (2021) interaction-weighted estimator for heterogeneity-robust event studies - [ImputationDiD](https://diff-diff.readthedocs.io/en/stable/api/imputation.html): Borusyak, Jaravel & Spiess (2024) imputation estimator — most efficient under homogeneous effects - [TwoStageDiD](https://diff-diff.readthedocs.io/en/stable/api/two_stage.html): Gardner (2022) two-stage estimator with GMM sandwich variance -- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2; `conley_lag_cutoff=0` only — serial Bartlett HAC composition queued as follow-up), both composed with the Wave D Gardner GMM correction (replicate weights queued as follow-up) +- [SpilloverDiD](https://diff-diff.readthedocs.io/en/stable/api/spillover.html): Butts (2021) ring-indicator spillover-aware DiD identifying direct effect on treated + per-ring spillover-on-control; reuses `conley_coords` for ring construction; handles non-staggered and staggered timing; supports `SurveyDesign(weights, strata, psu, fpc)` under `vcov_type="hc1"` with optional `cluster=` for CR1 via Gerber (2026) Binder TSL (Wave E.1) and under `vcov_type="conley"` via a panel-aware stratified-Conley sandwich on per-period PSU totals (Wave E.2 cross-sectional `conley_lag_cutoff=0`) extended in Wave E.2 follow-up to `conley_lag_cutoff > 0` via panel-block composition with within-PSU serial Bartlett HAC (Newey-West 1987 separable form; `lag>0` requires an effective PSU via explicit `survey_design.psu` or injected `cluster=`), both composed with the Wave D Gardner GMM correction (replicate weights queued as follow-up) - [SyntheticDiD](https://diff-diff.readthedocs.io/en/stable/api/estimators.html): Synthetic DiD combining standard DiD and synthetic control methods for few treated units - [TripleDifference](https://diff-diff.readthedocs.io/en/stable/api/triple_diff.html): Triple difference (DDD) estimator for designs requiring two criteria for treatment eligibility - [ContinuousDiD](https://diff-diff.readthedocs.io/en/stable/api/continuous_did.html): Callaway, Goodman-Bacon & Sant'Anna (2024) continuous treatment DiD with dose-response curves diff --git a/diff_diff/spillover.py b/diff_diff/spillover.py index a1f0db11..daa287f4 100644 --- a/diff_diff/spillover.py +++ b/diff_diff/spillover.py @@ -2197,37 +2197,23 @@ def fit( # check, cluster-vs-PSU warn) runs AFTER `_validate_spillover_inputs` # below so it sees the panel columns the validator guarantees. # - # Wave E.2 scope-limit (upfront, before resolution / panel work): - # the panel-block Conley HAC (`conley_lag_cutoff > 0`) is NOT - # composed with the survey path in this PR. The stratified-Conley - # helper applies a cross-sectional kernel on PSU-aggregated totals; - # composing the within-unit serial Bartlett HAC with the within- - # stratum cross-PSU spatial kernel requires carrying PSU-by-time - # scores into the meat construction, which is a separate Wave E.x - # follow-up tracked in TODO.md. Reject upfront with a clear pointer - # so users running `survey_design=` + `conley_lag_cutoff > 0` get - # the error before stage-1 / 2 work (per `feedback_no_silent_failures`). - if ( - survey_design is not None - and self.vcov_type == "conley" - and self.conley_lag_cutoff is not None - and self.conley_lag_cutoff > 0 - ): - raise NotImplementedError( - "SpilloverDiD(vcov_type='conley', conley_lag_cutoff > 0) " - "combined with survey_design= is not supported in Wave E.2. " - "The Wave E.2 stratified-Conley sandwich aggregates Psi to " - "PSU totals before applying the cross-sectional Conley " - "kernel; the panel-block decomposition (within-unit serial " - "Bartlett HAC over time) would require carrying PSU-by-time " - "scores and composing the serial kernel with the within-" - "stratum cross-PSU spatial kernel. This composition is " - "queued as a follow-up (see TODO.md). For Wave E.2, use " - "conley_lag_cutoff=0 (cross-sectional Conley) with " - "survey_design=, or use survey_design= with " - "vcov_type='hc1' (+ cluster= for CR1) for the full " - "Wave E.1 path." - ) + # Wave E.2 follow-up (shipped): `vcov_type='conley' + conley_lag_cutoff > 0 + # + survey_design=` is supported via panel-block stratified-Conley + # sandwich (spatial Wave E.2 term + within-PSU serial Bartlett HAC) + # WHEN there is an effective PSU (explicit `survey_design.psu` OR + # injected via `cluster=` per Wave E.1's `_inject_cluster_as_psu` + # routing). The orchestrator at + # `two_stage.py::_compute_stratified_conley_meat` sums the two terms + # with disjoint index sets — matches the no-survey panel-block + # decomposition at `conley.py::_compute_conley_meat` (Conley 1999 + # spatial + Newey-West 1987 serial Bartlett; separable form, NOT + # Driscoll-Kraay 2D-HAC). FPC convention: per-period FPC on spatial, + # panel-wide stratum-level FPC on serial. The no-effective-PSU + # fail-closed gate is downstream at the post-resolution check (see + # the `resolved_survey_fit.psu is None` block below the cluster + # injection); the gate cannot live up here because at this point + # the user-supplied `cluster=` has not yet been injected into + # the survey design as the effective PSU. # Validate `anticipation` up front: must be a non-negative integer. # Accepting fractional or negative values would silently shift # treatment timing and ring exposure beyond what the estimator's @@ -3100,6 +3086,50 @@ def fit( _conley_unit_arg = None _conley_lag_arg = None + # Wave E.2 follow-up gate (post-resolution, post-injection): + # fail-closed for `vcov_type="conley" + conley_lag_cutoff > 0` when + # the EFFECTIVE PSU is still absent after `_inject_cluster_as_psu`. + # Under no-effective-PSU survey designs (weights-only / strata-only + # WITHOUT a cluster fallback) the orchestrator falls back to + # pseudo-PSU = obs-index in `_compute_stratified_conley_meat`, but + # each pseudo-PSU appears in exactly one period, so the per-PSU + # serial cross-period loop never contributes anything (silent zero + # serial term). Routing the serial loop to `conley_unit` (the panel + # unit) instead of pseudo-PSU would mix IF allocators (PSU spatial + # vs unit serial), which violates the single-IF-allocator design + # pinned by the user-confirmed methodology in the Wave E.2 follow-up + # plan. Fail-closed per `feedback_no_silent_failures` until a + # no-effective-PSU-specific derivation is queued. Note: this fires + # AFTER `_inject_cluster_as_psu` (which runs upstream) so the + # documented `cluster= + survey_design(without psu)` surface + # — which becomes an effective-PSU layout via injection — passes + # through unscathed. R2 P1 fix: original front-door gate at + # `spillover.py:2210-2242` (now removed) fired before injection + # and broke the cluster-as-PSU survey-Conley surface. + if ( + resolved_survey_fit is not None + and resolved_survey_fit.psu is None + and self.vcov_type == "conley" + and self.conley_lag_cutoff is not None + and self.conley_lag_cutoff > 0 + ): + raise NotImplementedError( + "SpilloverDiD(vcov_type='conley', conley_lag_cutoff > 0) " + "combined with a no-effective-PSU survey_design " + "(weights-only / strata-only WITHOUT a cluster fallback) " + "is not supported in Wave E.2 follow-up. Under no-effective-" + "PSU survey designs the panel-block serial Bartlett HAC " + "would silently contribute zero (each pseudo-PSU = " + "obs-index appears in exactly one period, so the within-PSU " + "temporal sum has no cross-period pairs to accumulate). " + "Routing the serial loop to `conley_unit` would mix IF " + "allocators with the spatial term and is not derived in " + "this PR. Supply either an explicit `survey_design.psu`, " + "or `cluster=` (which is injected as the effective " + "PSU per Wave E.1's `_inject_cluster_as_psu` routing), " + "or use `conley_lag_cutoff=0` (cross-sectional Wave E.2)." + ) + # Derive the Wave D variance mode from the PUBLIC contract: # - vcov_type="conley" → "conley" (Conley spatial-HAC + GMM) # - cluster= supplied → "cluster" (CR1 + GMM) diff --git a/diff_diff/two_stage.py b/diff_diff/two_stage.py index 51c15bed..8427d44a 100644 --- a/diff_diff/two_stage.py +++ b/diff_diff/two_stage.py @@ -125,20 +125,29 @@ def _compute_gmm_corrected_meat( TSL has its own ``(1-f_h) * n_h/(n_h-1)`` correction). - ``vcov_type="cluster"``: ``cluster_ids`` IS the PSU (via upstream ``_inject_cluster_as_psu``); identical to the HC1+survey branch. - - ``vcov_type="conley"`` (cross-sectional only — ``conley_lag_cutoff = 0``): + - ``vcov_type="conley"`` with ``conley_lag_cutoff = 0`` (cross-sectional): Wave E.2 stratified-Conley sandwich on PSU totals via :func:`_compute_stratified_conley_meat`. Aggregates Psi to PSU totals + derives per-PSU centroids as the mean of per-obs ``conley_coords``; for each stratum applies the Conley kernel between PSU centroids scaled by ``(1 - f_h) * n_h/(n_h-1)``. Cross-stratum kernel weights are zero by sampling design. - - ``vcov_type="conley"`` with ``conley_lag_cutoff > 0`` (panel-block - Conley): raises ``NotImplementedError`` upstream at - ``SpilloverDiD.fit``. The panel-block decomposition would need to - compose the within-unit serial Bartlett HAC with the within-stratum - cross-PSU spatial kernel on PSU-by-time scores rather than the - collapsed PSU totals; out of Wave E.2 scope and tracked as a - follow-up in ``TODO.md``. + - ``vcov_type="conley"`` with ``conley_lag_cutoff > 0`` (panel-block, + Wave E.2 follow-up): the orchestrator at + :func:`_compute_stratified_conley_meat` adds a within-PSU serial + Bartlett HAC term (Newey-West 1987 form) onto the Wave E.2 spatial + sandwich — ``meat = meat_spatial + meat_serial`` with disjoint + index sets (matches the no-survey panel-block decomposition at + :func:`diff_diff.conley._compute_conley_meat`). Serial term uses + per-period within-stratum centering (Binder TSL form) and + panel-wide per-stratum FPC. Requires an **effective PSU** — + either explicit ``survey_design.psu`` OR ``cluster=`` + injected as the effective PSU per Wave E.1's + ``_inject_cluster_as_psu`` routing. No-effective-PSU survey + designs (weights-only / strata-only WITHOUT a cluster fallback) + raise ``NotImplementedError`` upstream at ``SpilloverDiD.fit`` + post-resolution because the pseudo-PSU fallback would silently + zero the serial sum. **`gamma_hat` solve** (mirror of `TwoStageDiD._compute_gmm_variance` pattern at `two_stage.py:1886-1917`): factorize ``X_10' W X_10`` via @@ -364,10 +373,13 @@ def _compute_gmm_corrected_meat( "conley_coords, conley_cutoff_km, and conley_metric." ) if resolved_survey is not None: - # Wave E.2: stratified-Conley sandwich on PSU totals. cluster_ids - # is intentionally NOT threaded through — after PSU aggregation - # every PSU is its own cluster, so a cluster product kernel - # would zero all cross-PSU pairs. Wave E.1's + # Wave E.2: stratified-Conley sandwich on PSU totals. + # Wave E.2 follow-up: with conley_lag_cutoff > 0 the orchestrator + # adds the within-PSU serial Bartlett HAC term onto the spatial + # sandwich (separable form; meat = meat_spatial + meat_serial). + # cluster_ids is intentionally NOT threaded through — after PSU + # aggregation every PSU is its own cluster, so a cluster product + # kernel would zero all cross-PSU pairs. Wave E.1's # _resolve_effective_cluster path already coerced any # user-supplied cluster= into PSU upstream. meat = _compute_stratified_conley_meat( @@ -378,6 +390,7 @@ def _compute_gmm_corrected_meat( conley_kernel=conley_kernel, resolved_survey=resolved_survey, conley_time=conley_time, # panel-aware per-period sandwich + conley_lag_cutoff=conley_lag_cutoff, # Wave E.2 follow-up serial term ) else: # Wave D no-survey Conley path UNCHANGED — bit-identical fallback. @@ -546,6 +559,7 @@ def _compute_stratified_conley_meat( conley_kernel: str, resolved_survey: "ResolvedSurveyDesign", conley_time: Optional[np.ndarray] = None, + conley_lag_cutoff: Optional[int] = None, ) -> np.ndarray: """Wave E.2 panel-aware stratified-Conley meat on PSU-by-time scores. @@ -553,8 +567,13 @@ def _compute_stratified_conley_meat( Proposition 1 Binder TSL (the Wave E.1 foundation) and the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 ss3.1 + Gardner 2022 ss4) applied to SpilloverDiD's ring-indicator stage-2 - design. No reference software combines all three ingredients on a - two-stage influence function. + design. Wave E.2 follow-up extends to ``conley_lag_cutoff > 0`` by + summing the within-PSU serial Bartlett HAC term (Newey-West 1987) + onto the spatial sandwich: ``meat = meat_spatial + meat_serial`` with + disjoint index sets, exactly matching the no-survey panel-block + decomposition at :func:`diff_diff.conley._compute_conley_meat`. No + reference software combines panel-block Conley + Binder TSL + Gardner + GMM correction on a two-stage influence function. **Panel-aware composition (preserves the library's panel Conley contract):** for each period ``t``, aggregate per-obs Psi to PSU @@ -599,6 +618,14 @@ def _compute_stratified_conley_meat( one iteration on the full Psi, which is the cross-sectional Wave E.2 design). When provided (the standard SpilloverDiD case), the per-period loop preserves the within-period spatial semantic. + conley_lag_cutoff : int, optional + Bartlett serial-HAC bandwidth ``L`` in panel periods (Wave E.2 + follow-up). When None or 0, the spatial term is the entire meat + (shipped Wave E.2 behaviour); when ``> 0``, the within-PSU serial + Bartlett HAC :func:`_compute_stratified_serial_bartlett_meat` is + added to the spatial term. ``L > 0`` requires ``conley_time`` set + (with ``conley_time is None`` the panel reduces to T=1 and the + serial helper short-circuits to zero meat). Returns ------- @@ -627,18 +654,19 @@ def _compute_stratified_conley_meat( - ``T = 1`` (single period or ``conley_time is None``): single-pass stratified-Conley sandwich on the full PSU totals (the original - cross-sectional Wave E.2 design). - - ``H = 1`` stratum, ``FPC = inf``: reduces to ``sum_t`` plain - Conley sandwich on per-period PSU totals. - - Bandwidth -> 0 (``K = I``): reduces to ``sum_t`` per-period + cross-sectional Wave E.2 design). With ``conley_lag_cutoff > 0`` + the serial helper short-circuits to zero meat (no cross-period + pairs possible). + - ``H = 1`` stratum, ``FPC = inf``: spatial term reduces to ``sum_t`` + plain Conley sandwich on per-period PSU totals; serial term (if + ``conley_lag_cutoff > 0``) reduces to plain Newey-West Bartlett HAC + on PSU totals. + - Bandwidth -> 0 (``K = I``): spatial reduces to ``sum_t`` per-period within-stratum HC sandwich on PSU totals (NOT Wave E.1 Binder, - which is over time-collapsed PSU totals). - - Out of scope (deferred follow-up, tracked in TODO.md): - - - ``conley_lag_cutoff > 0`` panel-block: the within-PSU serial - Bartlett HAC over time would compose with the spatial sandwich - here. Rejected upfront at ``SpilloverDiD.fit``. + which is over time-collapsed PSU totals); serial term unchanged + (separable form). + - ``conley_lag_cutoff = 0`` or ``None``: bit-identical to shipped + Wave E.2 (no serial helper call; spatial-only meat). """ from diff_diff.survey import _compute_stratified_conley_meat_from_psu_scores @@ -779,6 +807,29 @@ def _compute_stratified_conley_meat( _variance_computed = _variance_computed or var_t _legit_zero += legit_zero_t + # Wave E.2 follow-up: serial Bartlett HAC term for conley_lag_cutoff > 0. + # Sums onto the spatial meat with disjoint index sets (separable form, + # NOT Driscoll-Kraay 2D-HAC). Lag=0 / None short-circuits — the helper + # itself returns zero-meat for L<=0 or T<=1 so this branch never erodes + # the lag=0 bit-identity guarantee with shipped Wave E.2 (test (a)). + # `cluster_ids` intentionally not threaded (same rationale as the spatial + # term: post-PSU-aggregation each PSU is its own cluster, the within-PSU + # serial loop already iterates exactly the right scope; threading a + # cluster product kernel would be a no-op). + if conley_lag_cutoff is not None and conley_lag_cutoff > 0: + meat_serial, var_serial, legit_zero_serial = _compute_stratified_serial_bartlett_meat( + Psi, + psu_arr=psu_arr, + time_arr=time_arr, + strata_arr_full=strata_arr_full, + fpc_arr_full=fpc_arr_full, + conley_lag_cutoff=int(conley_lag_cutoff), + lonely_psu=resolved_survey.lonely_psu, + ) + meat = meat + meat_serial + _variance_computed = _variance_computed or var_serial + _legit_zero += legit_zero_serial + # Wave E.2 survey-saturated NaN-fail per `feedback_no_silent_failures`. if not _variance_computed and _legit_zero == 0: warnings.warn( @@ -791,9 +842,311 @@ def _compute_stratified_conley_meat( ) return np.full((p_2, p_2), np.nan) + # Finite + PSD guards on the COMBINED survey meat (spatial + serial). + # Mirrors :func:`diff_diff.conley._compute_conley_meat` L966-990 so the + # survey panel-block path has the same diagnostic surface as the + # no-survey path. The radial 1-D Bartlett spatial kernel and the + # Newey-West Bartlett serial kernel are both practitioner + # specializations that are NOT formally PSD-guaranteed; adding two + # non-PSD-guaranteed terms can produce a more indefinite combined + # meat, so the check matters most on the panel-block path. CI codex + # R1 P2 fix. + if not np.all(np.isfinite(meat)): + raise ValueError( + "SpilloverDiD Wave E.2 stratified-Conley meat contains non-finite " + "values; check Psi for NaN/Inf upstream of the sandwich." + ) + eigvals = np.linalg.eigvalsh(meat) + if eigvals.size and eigvals.min() < -1e-12: + warnings.warn( + f"SpilloverDiD Wave E.2 stratified-Conley meat with conley_kernel=" + f"{conley_kernel!r} has a materially negative eigenvalue " + f"({eigvals.min():.2e}); the variance estimator is not guaranteed " + "PSD on this design. Both supported kernels (radial bartlett and " + "uniform spatial) plus the hardcoded serial Bartlett term are " + "practitioner specializations of Conley 1999 / Newey-West 1987 " + "and are not formally PSD-guaranteed; consider varying " + "conley_cutoff_km / conley_lag_cutoff, or reviewing the design " + "for collinearity / degenerate residual structure.", + UserWarning, + stacklevel=2, + ) + return meat +def _compute_stratified_serial_bartlett_meat( + Psi: np.ndarray, + *, + psu_arr: np.ndarray, + time_arr: np.ndarray, + strata_arr_full: Optional[np.ndarray], + fpc_arr_full: Optional[np.ndarray], + conley_lag_cutoff: int, + lonely_psu: str, +) -> Tuple[np.ndarray, bool, int]: + """Wave E.2 follow-up: within-PSU serial Bartlett HAC meat over time on + PSU-aggregated per-period scores. + + Composes Newey-West (1987) serial Bartlett HAC with Conley (1999)'s panel + block-decomposition convention, Binder (1983) FPC, and Gerber (2026, + arXiv:2605.04124) Proposition 1 Binder TSL on Wave D Gardner GMM IFs + (Butts 2021 ss3.1 + Gardner 2022 ss4). Sibling helper to the Wave E.2 + spatial orchestrator :func:`_compute_stratified_conley_meat`; the two + terms sum with disjoint index sets to form the panel-block meat + ``meat = meat_spatial + meat_serial`` exactly matching the no-survey + panel-block decomposition at :func:`diff_diff.conley._compute_conley_meat`. + + **Composition (separable form, NOT Driscoll-Kraay 2D-HAC):** + + meat_serial = sum_h FPC_h * sum_{g in stratum h} + sum_{|t-s| <= L, t != s, present[g, t] & present[g, s]} + (1 - |t-s|/(L+1)) * S_centered_t[g] @ S_centered_s[g].T + + where ``S_psu_t[g] = sum_{i in PSU g, time t} Psi[i]`` is the per-period + PSU total, ``S_centered_t[g] = S_psu_t[g] - S_bar_h(g)_t`` is the + per-period within-stratum centered score (Binder TSL form — matches the + spatial helper's centering exactly), and ``|t-s|`` is computed on PANEL- + WIDE dense time codes ``np.unique(time_arr, return_inverse=True)`` + (matches :func:`diff_diff.conley._compute_conley_meat` panel-block + convention at conley.py:934-939; mirrors R ``conleyreg::time_dist``). + Serial Bartlett kernel weights are hardcoded regardless of the spatial + ``conley_kernel`` argument (also matches the conley.py reference). + + **FPC convention (panel-wide per-stratum)** — STANDALONE Newey-West + composition on stratified clusters, NOT by analogy to the Binder spatial + helper at :func:`_compute_binder_tsl_meat`. The serial sum aggregates + within-PSU temporal correlation across all observed periods — it is a + PANEL-level construct, not a period-level construct. The cluster set for + the panel-level sum is the panel-wide set of PSUs in stratum h, so the + FPC denominator uses ``n_h_panel = |unique PSUs in stratum h across the + active sample|``, not the per-period ``n_h_t``. The spatial term keeps + its per-period FPC unchanged (the period-t spatial sum IS a within-period + stratified-cluster sandwich at one time index). For balanced panels with + PSU present in every period, ``n_h_panel = n_h_t`` for all t so the two + converge; the difference surfaces under unbalanced panels. + + **Centering asymmetry vs no-survey reference** — `conley.py:949-965` + uses RAW scores for the serial term (no centering) because the no-survey + path assumes ``E[scores] = 0`` under correct specification (X*eps + centered around zero), so centering is a no-op. The survey-weighted + Binder TSL form estimates the within-stratum mean and centers explicitly + (textbook stratified-cluster sandwich; the per-period stratum mean enters + via Binder's finite-population variance derivation). Using raw scores in + the survey case would inflate variance by twice the squared per-period + stratum mean and would NOT reduce to the cross-sectional Wave E.2 form + at lag=0. + + **Singleton-adjust panel-wide mean asymmetry** — for ``lonely_psu="adjust"`` + on a singleton stratum, the serial helper centers against the panel-wide + mean of per-period PSU totals (averaged over all (g, t) with + ``present[g, t]``), NOT the per-period within-stratum mean used by the + spatial helper. The scope difference reflects the serial term's panel- + level nature: a singleton stratum at the panel level has no within- + stratum cross-PSU variation to demean against, so the only meaningful + centering target is the panel-wide PSU mean. The ``continue``-skip-FPC + pattern matches the spatial helper at :func:`_compute_stratified_conley_meat_from_psu_scores` + L2007-2017 to avoid divide-by-zero on ``n_h_panel = 1``. + + Parameters + ---------- + Psi : np.ndarray of shape (n, p_2) + Per-obs Wave D Gardner GMM IF scores (Hajek-weighted upstream via + Wave E.1 eps multiplication). Bit-identical to the spatial path Psi. + psu_arr : np.ndarray of shape (n,) + Per-obs PSU identifier. ``resolved_survey.psu`` or pseudo-PSU = + obs-index per the orchestrator's no-PSU fallback. + time_arr : np.ndarray of shape (n,) + Per-obs period label; normalized to dense codes 0..T-1 internally. + strata_arr_full : np.ndarray of shape (n,) or None + Per-obs stratum. None synthesizes a single stratum. + fpc_arr_full : np.ndarray of shape (n,) or None + Per-obs FPC. None disables FPC scaling on the serial term + (`(1-f_h) = 1`); the n_h/(n_h-1) factor is still applied. + conley_lag_cutoff : int + Bartlett serial-HAC bandwidth `L` in panel periods. Must be >= 1 + (T=1 or L=0 returns zeros via short-circuit). + lonely_psu : str + ``"remove"`` / ``"certainty"`` / ``"adjust"`` — singleton-stratum + handling, matches the Wave E.2 spatial helper exactly. + + Returns + ------- + meat : np.ndarray of shape (p_2, p_2) + Serial Bartlett HAC meat. + variance_computed : bool + Whether any actual variance was contributed. + legitimate_zero_count : int + Strata that legitimately contribute zero (lonely_psu="certainty"). + + Notes + ----- + Does NOT thread ``cluster_ids``: after PSU aggregation every PSU is its + own cluster, so a cluster product kernel would zero all cross-PSU pairs + (Wave E.2 dispatch-boundary rationale). Inherits Wave E.1 + `_resolve_effective_cluster` warn-and-coerce-to-PSU upstream. + + Does NOT receive ``psu_value_to_centroid``: the serial kernel operates + on temporal lag only (no spatial component), so PSU centroids are + irrelevant for this term. Asymmetric vs the spatial helper which needs + centroids. + + T = 1 (single observed period) or ``conley_lag_cutoff <= 0`` short- + circuits to zero meat with no variance reported — the degenerate panel- + block path, NOT a saturation diagnostic. + """ + p_2 = Psi.shape[1] + L = int(conley_lag_cutoff) + + # T=1 short-circuit: no cross-period pairs are possible. + unique_times, time_indices = np.unique(time_arr, return_inverse=True) + T = len(unique_times) + if T <= 1 or L <= 0: + return np.zeros((p_2, p_2)), False, 0 + + # PSU-stratum constancy is enforced upstream by `_validate_unit_constant_survey` + # at survey.py:1008-1048 (PSU is a unit-level survey column; a PSU's + # stratum assignment is constant across all observations by sampling- + # design invariant). No defensive re-check here; the validator would have + # raised before we reach the meat construction. + + # Build per-PSU per-period score tensor S_psu_panel[g, t, :]. + unique_psus, first_idx_panel, psu_indices_full = np.unique( + psu_arr, return_index=True, return_inverse=True + ) + G_panel = len(unique_psus) + S_psu_panel = np.zeros((G_panel, T, p_2)) + for j in range(p_2): + np.add.at(S_psu_panel[:, :, j], (psu_indices_full, time_indices), Psi[:, j]) + + # Presence mask: True iff PSU g has at least one obs at period t. + counts = np.zeros((G_panel, T), dtype=np.int64) + np.add.at(counts, (psu_indices_full, time_indices), 1) + present = counts > 0 + + # Per-PSU panel-wide attributes (stratum + FPC). + if strata_arr_full is not None: + psu_strata_panel = np.asarray(strata_arr_full)[first_idx_panel] + else: + psu_strata_panel = np.zeros(G_panel, dtype=int) + if fpc_arr_full is not None: + psu_fpc_panel: Optional[np.ndarray] = np.asarray(fpc_arr_full, dtype=np.float64)[ + first_idx_panel + ] + else: + psu_fpc_panel = None + + # Per-period within-stratum centering on the (G_panel, T, p_2) tensor. + # Match the spatial helper's per-period stratum-mean centering exactly. + # Initialize to ZEROS (NOT raw S_psu_panel) so any (g, t) cell whose + # stratum-period has < 2 active PSUs contributes zero to downstream + # serial cross-products. Leaving raw scores in singleton-active-period + # cells would feed uncentered values into the serial Bartlett sum and + # contaminate the covariance — codex R1 P1 fix. With < 2 active PSUs in + # stratum h at period t, within-stratum variance is undefined; the + # methodologically-correct behavior is zero contribution from that + # (h, t) leg, matching the spatial helper's lonely_psu="remove" + # convention applied at the per-period level. + S_centered = np.zeros_like(S_psu_panel) + unique_strata_panel = np.unique(psu_strata_panel) + for t in range(T): + for h in unique_strata_panel: + active_mask = (psu_strata_panel == h) & present[:, t] + if int(active_mask.sum()) < 2: + # Singleton/empty active PSUs in stratum h at period t: + # leave S_centered as zero (no contribution to serial sum + # from this leg). The per-stratum singleton branch below + # still handles panel-wide n_h_panel < 2. + continue + stratum_mean_t = S_psu_panel[active_mask, t, :].mean(axis=0) + S_centered[active_mask, t, :] = S_psu_panel[active_mask, t, :] - stratum_mean_t + + # Panel-wide PSU mean for the singleton-adjust branch (compute lazily — + # only needed if lonely_psu == "adjust" AND any singleton stratum exists). + _global_psu_mean: Optional[np.ndarray] = None + if lonely_psu == "adjust": + present_count = int(present.sum()) + if present_count > 0: + _global_psu_mean = (S_psu_panel * present[:, :, None]).sum(axis=(0, 1)) / present_count + + # Per-stratum serial accumulation. + meat = np.zeros((p_2, p_2)) + _variance_computed = False + legitimate_zero_count = 0 + t_codes_full = np.arange(T, dtype=np.float64) + + for h in unique_strata_panel: + stratum_psus = np.where(psu_strata_panel == h)[0] + n_h_panel = len(stratum_psus) + + # Singleton-stratum branch (mirror spatial helper at survey.py:2001-2017 + # — FPC `n_h/(n_h-1)` divides by zero when n_h_panel = 1 so MUST continue + # to skip the multi-PSU FPC block below). + if n_h_panel < 2: + if lonely_psu == "remove": + continue + elif lonely_psu == "certainty": + legitimate_zero_count += 1 + continue + elif lonely_psu == "adjust": + # Center against panel-wide PSU mean (different scope from + # spatial helper's per-period stratum mean; see docstring + # "singleton-adjust panel-wide mean asymmetry"). The guard + # below covers the all-empty-presence edge (present_count = 0 + # at the global mean computation above leaves _global_psu_mean + # None); in that pathological case every PSU's present mask + # is all-False so the inner loop continues without subtraction. + if _global_psu_mean is None: + continue + for g in stratum_psus: + present_g = present[g] + if int(present_g.sum()) < 2: + continue + t_g = t_codes_full[present_g] + lag_mat = np.abs(t_g[:, None] - t_g[None, :]) + K_g = ((lag_mat <= L) & (lag_mat != 0)).astype(np.float64) * ( + 1.0 - lag_mat / (L + 1.0) + ) + S_g_centered = S_psu_panel[g, present_g] - _global_psu_mean + with np.errstate(invalid="ignore", over="ignore"): + meat += S_g_centered.T @ K_g @ S_g_centered + _variance_computed = True + continue + + # Multi-PSU branch (n_h_panel >= 2): standard FPC + per-PSU serial. + f_h_panel = 0.0 + if psu_fpc_panel is not None: + N_h = psu_fpc_panel[stratum_psus[0]] + if N_h < n_h_panel: + raise ValueError( + f"FPC ({N_h}) is less than the number of PSUs " + f"({n_h_panel}) in stratum (Wave E.2 follow-up serial helper). " + "FPC must be >= n_PSU_panel." + ) + f_h_panel = n_h_panel / N_h + + M_h_serial = np.zeros((p_2, p_2)) + for g in stratum_psus: + present_g = present[g] + if int(present_g.sum()) < 2: + continue + t_g = t_codes_full[present_g] + # PANEL-WIDE dense time codes for the serial kernel (NOT per-PSU + # positional encoding). See test (g) in TestSpilloverDiDWaveE2Followup + # for the methodology lock; matches conley.py:940 R-deviation. + lag_mat = np.abs(t_g[:, None] - t_g[None, :]) + K_g = ((lag_mat <= L) & (lag_mat != 0)).astype(np.float64) * (1.0 - lag_mat / (L + 1.0)) + S_g_centered = S_centered[g, present_g] + with np.errstate(invalid="ignore", over="ignore"): + M_h_serial += S_g_centered.T @ K_g @ S_g_centered + + fpc_scale = (1.0 - f_h_panel) * n_h_panel / (n_h_panel - 1) + meat += fpc_scale * M_h_serial + _variance_computed = True + + return meat, _variance_computed, legitimate_zero_count + + # ============================================================================= # Main Estimator # ============================================================================= diff --git a/docs/api/spillover.rst b/docs/api/spillover.rst index 5f883619..55342eee 100644 --- a/docs/api/spillover.rst +++ b/docs/api/spillover.rst @@ -326,6 +326,93 @@ and planned follow-up enhancements: survey_design=)`` is queued for a follow-up (the ``_APPLICABILITY`` / ``_PT_METHOD`` wiring must register the new combination first). +- **Survey-design integration (Wave E.2 follow-up — + ``conley_lag_cutoff > 0`` panel-block composition).** SHIPPED in + Wave E.2 follow-up. ``vcov_type="conley" + conley_lag_cutoff > 0 + + survey_design=`` is now supported by adding a within-PSU serial + Bartlett HAC term to the Wave E.2 spatial sandwich, **provided the + survey design has an effective PSU** (either explicit + ``survey_design.psu`` or a ``cluster=`` argument that gets + injected as the effective PSU per Wave E.1's + ``_inject_cluster_as_psu`` routing). No-effective-PSU survey designs + (weights-only / strata-only WITHOUT a cluster fallback) raise + ``NotImplementedError`` at ``SpilloverDiD.fit`` post-resolution — + see the Restrictions list below. + + .. note:: + + Wave E.2 follow-up composes Wave E.2's panel-aware stratified-Conley + spatial sandwich with within-PSU serial Bartlett HAC over time + (Newey-West 1987 form, kernel weights ``1 - |t-s|/(L+1)`` for + ``|t-s| <= L, t != s``). The composition is + ``meat = meat_spatial + meat_serial`` with disjoint index sets, + exactly matching the no-survey panel-block decomposition at + ``diff_diff.conley._compute_conley_meat`` (Conley 1999 + Newey-West + 1987 separable form, NOT Driscoll-Kraay 2D-HAC). For each stratum + ``h``: ``meat_serial_h = FPC_h_panel * sum_{g in stratum h} + sum_{|t-s| <= L, t != s, both periods present for PSU g} + K_serial(|t-s|/(L+1)) * S_centered_t[g] @ S_centered_s[g]'`` where + ``S_centered_t[g] = S_psu_t[g] - S_bar_h(g)_t`` is per-period + within-stratum centered (Binder TSL form — matches the spatial + helper's centering exactly), and ``|t-s|`` uses panel-wide dense + time codes (matches ``conley.py:940`` documented R deviation that + mirrors R ``conleyreg``). Serial Bartlett kernel is hardcoded + regardless of ``conley_kernel`` (mirrors ``conley.py:951-965``; + ``conley_kernel`` governs spatial kernel only). FPC for serial uses + panel-wide ``n_h_panel = |unique PSUs in stratum h across active + sample|``, NOT per-period ``n_h_t`` (the serial sum is a PANEL-level + construct; standalone Newey-West composition on stratified clusters, + deliberately NOT by analogy to the cross-sectional Wave E.2 spatial + FPC convention). No reference software combines panel-block Conley + + Binder TSL + Gardner GMM correction on a two-stage influence + function. + + Reduction semantics: + + - ``conley_lag_cutoff = 0`` or ``None``: **bit-identical** to shipped + Wave E.2 ATT and scalar SE (orchestrator skips the serial helper + invocation when ``L = 0`` so meat_serial does not contribute; the + test_a2 mock-spy independently asserts the helper isn't invoked). + - ``conley_time is None`` or ``T = 1``: serial helper short-circuits to + zero meat (no cross-period pairs possible). + - Single stratum (H = 1, FPC = inf) with ``L > 0``: serial reduces to + Newey-West Bartlett HAC on per-period within-stratum-CENTERED per-PSU + score sequences (NOT raw scores — Binder TSL centering is retained at + H=1 because the survey path always subtracts the per-period stratum + mean before the kernel application; the panel-wide ``G/(G-1)`` + survey factor replaces FPC). + - Bandwidth → 0 with ``L > 0``: spatial reduces to per-period + within-stratum HC sandwich; serial term unchanged (separable form). + - All PSUs singleton + ``lonely_psu="remove"`` with ``L > 0``: meat + NaN-fails on the saturation diagnostic (template "Wave E.2 + stratified-Conley sandwich" covers both spatial-only and panel-block + cases). + + Centering asymmetry vs no-survey reference: + + - The no-survey panel-block path at ``conley.py:949-965`` uses RAW + scores for the serial term (no centering) because it assumes + ``E[scores] = 0`` under correct specification. The survey-weighted + Binder TSL form estimates the within-stratum mean and centers + explicitly (textbook stratified-cluster sandwich); raw scores in the + survey case would inflate variance by twice the squared per-period + stratum mean and would NOT reduce to the cross-sectional Wave E.2 + form at ``lag = 0``. + + Restrictions: + + - **Requires an effective PSU** — either explicit ``survey_design.psu`` + OR ``cluster=`` injected as the effective PSU per Wave E.1's + ``_inject_cluster_as_psu``. No-effective-PSU survey designs + (weights-only / strata-only WITHOUT a cluster fallback) raise + ``NotImplementedError`` post-resolution at ``SpilloverDiD.fit`` + because the pseudo-PSU = obs-index fallback would silently zero + the serial sum (each pseudo-PSU appears in exactly one period). + Tracked as a follow-up in ``TODO.md``. + - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises + ``NotImplementedError`` (inherits Wave E.1 gate). + - DiagnosticReport routing for the panel-block case is queued for the + same Wave F follow-up as the Wave E.2 cross-sectional case. - **Count-of-treated-in-ring** — only the "nearest-treated ring" specification is implemented. The "count" form re-introduces functional-form dependence (paper Section 3.2 end) and is queued. diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index cf62fa27..96027831 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -3244,7 +3244,7 @@ The `_compute_stratified_meat_from_psu_scores` helper at `diff_diff/survey.py` i Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_survey` (the standard survey 4-way branch: PSU+strata → `n_PSU - n_strata`; PSU only → `n_PSU - 1`; strata only → `n_obs - n_strata`; neither → `n_obs - 1`). Threaded through all four `safe_inference` call sites: aggregate `tau_total`, per-ring `delta_j`, event-study per-event-time `tau_k` / `delta_jk`, and the scalar `att` lincom in event-study mode. -- **Note (documented synthesis):** Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD's Wave D Psi is aggregated to PSU level and passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. No reference software combines all ingredients; Wave E.2 extends with the Conley × survey product-kernel composition — see "Variance (Wave E.2)" subsection below. +- **Note (documented synthesis):** Wave E.1 composes Gerber (2026, arXiv:2605.04124) Proposition 1 — Binder Taylor Series Linearization for IF representations of smooth functionals; explicitly derived for TwoStageDiD in the paper's Appendix — with the Wave D Gardner GMM first-stage uncertainty correction (Butts 2021 §3.1 + Gardner 2022 §4) applied to SpilloverDiD's ring-indicator stage-2 design. The composition is mechanical: SpilloverDiD's Wave D Psi is aggregated to PSU level and passed to the audited Binder TSL meat helper. Survey weights enter via Hájek normalization at the gamma_hat solve, eps construction, and bread inversion. No reference software combines all ingredients; Wave E.2 extends with the Conley × survey product-kernel composition (cross-sectional `conley_lag_cutoff = 0`) — see "Variance (Wave E.2)" subsection below; Wave E.2 follow-up further extends with the within-PSU serial Bartlett HAC for `conley_lag_cutoff > 0` — see "Variance (Wave E.2 follow-up)" subsection below. - **Note (warn-and-use-PSU for cluster + survey):** when both `cluster=` and `survey_design.psu` are supplied with **different groupings**, the cluster argument emits a `UserWarning` and is overridden by PSU (mirrors `TwoStageDiD._resolve_effective_cluster`). PSU is the design-relevant cluster on survey panels; `cluster=` on SpilloverDiD is more often a spatial / unit-level label, so the design constraint wins. When both knobs are supplied with the **same** groupings, no warning fires and PSU still takes precedence (the inference is unchanged either way). - **Note (limitation — `SurveyDesign.subpopulation()` with FE-undefined zero-weight rows):** when `survey_design` is built via `SurveyDesign.subpopulation()` (or otherwise carries zero-weight padding rows) AND those zero-weight rows lose stage-1 FE support (warn-and-drop unit path), Wave E.1's `finite_mask` block physically removes them from the survey design rather than retaining them as zero-score padding. Consequently `n_psu`, `df_survey`, and the Binder TSL centering are recomputed on the reduced fit sample rather than the full domain design. Standard domain-estimation practice (e.g. R's `survey::svyrecvar` on a `subset()` design) preserves the original PSU/strata counts. Practitioners using subpopulation-derived designs should expect SEs that may differ slightly from textbook domain expectations on warn-and-drop fits. Tracked as Wave E.3 follow-up — see TODO.md. - **Note (saturated `df_survey = 0` NaN-fail):** when `lonely_psu="remove"` removes all strata (single PSU per stratum), `_compute_stratified_meat_from_psu_scores` returns `(_, var_computed=False, legit_zero=0)`. SpilloverDiD's Wave E.1 path returns NaN meat with a `UserWarning` matching `"df_survey"` so callers can pin via `pytest.warns(UserWarning, match="df_survey")`. This is a **departure from TwoStageDiD** (`two_stage.py:2003-2005`) which currently NaN-fails SILENTLY; Wave E.1 surfaces the diagnostic per `feedback_no_silent_failures`. @@ -3276,6 +3276,34 @@ Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_su **Implementation:** new `_compute_stratified_conley_meat_from_psu_scores` helper in `diff_diff/survey.py` (parallel to existing Binder helper; 3-tuple `(meat, variance_computed, legitimate_zero_count)` return contract; per-stratum loop replaces the inner `centered.T @ centered` with `_compute_conley_meat(centered, coords_h, cutoff, metric, kernel)` cross-sectional mode). New dispatch wrapper `_compute_stratified_conley_meat` in `diff_diff/two_stage.py` (parallel to `_compute_binder_tsl_meat`; per-obs Psi → PSU aggregation via `np.add.at` + PSU centroid derivation via vectorized `np.add.at` sums / `np.bincount` counts + dispatch to survey helper; intentionally no `cluster_ids` parameter). `_compute_gmm_corrected_meat` conley branch extended at `diff_diff/two_stage.py` with `if resolved_survey is not None` routing to the new wrapper; the `resolved_survey is None` branch is bit-identical to Wave D no-survey Conley. Saturation NaN-fail mirrors Wave E.1 (`UserWarning` template "Wave E.2 stratified-Conley sandwich: df_survey = 0..."). Wave E.1 stage-1 weighted FE solver, `finite_mask` survey-array subsetting, `df_survey` threading to `safe_inference` call sites, bread weighting, and `SpilloverDiDResults` survey metadata are all inherited UNCHANGED — Psi construction is bit-identical regardless of `vcov_type`. Tests: `TestSpilloverDiDWaveE2ConleySurveyDesign` + `TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy` at `tests/test_spillover.py`. +### Variance (Wave E.2 follow-up — `conley_lag_cutoff > 0` panel-block composition via spatial + serial Bartlett HAC) + +`vcov_type="conley" + conley_lag_cutoff > 0 + survey_design=` is now supported via panel-block stratified-Conley sandwich. SHIPPED in Wave E.2 follow-up. + +- **Note (documented synthesis):** Wave E.2 follow-up composes Wave E.2's panel-aware stratified-Conley spatial sandwich (Conley 1999 spatial-HAC × Binder/Gerber 2026 stratified TSL × Wave D Gardner GMM correction) with within-PSU serial Bartlett HAC over time (Newey-West 1987 form, kernel weights `1 - |t-s|/(L+1)` for `|t-s| ≤ L, t ≠ s`). The composition is `meat = meat_spatial + meat_serial` with disjoint index sets, exactly matching the no-survey panel-block decomposition at `diff_diff.conley._compute_conley_meat` (Conley 1999 + Newey-West 1987 separable form, NOT Driscoll-Kraay 2D-HAC). The serial term aggregates per-period PSU totals `S_psu_t[g] = sum_{i in PSU g, time t} psi_i` along the time axis: for each stratum `h`, `meat_serial_h = FPC_h_panel * sum_{g in stratum h} sum_{|t-s| ≤ L, t ≠ s, both periods present for PSU g} K_serial(|t-s|/(L+1)) * S_centered_t[g] @ S_centered_s[g]'` where `S_centered_t[g] = S_psu_t[g] - S_bar_h(g)_t` is per-period within-stratum centered (Binder TSL form — matches the spatial helper's centering exactly), and `|t-s|` is computed on **panel-wide dense time codes** (matches `conley.py:940` documented R deviation that mirrors R `conleyreg`). Serial Bartlett kernel is hardcoded regardless of `conley_kernel` (mirrors conley.py:951-965; the user-selected `conley_kernel` governs the spatial kernel only). Total meat is `sum_t sum_h M_h_t + sum_h meat_serial_h`. No reference software combines panel-block Conley + Binder TSL + Gardner GMM correction on a two-stage influence function. + +- **FPC convention (panel-wide per-stratum) — standalone Newey-West composition on stratified clusters:** the serial sum aggregates within-PSU temporal correlation across all observed periods — it is a PANEL-level construct, not a period-level construct. The cluster set for the panel-level sum is the panel-wide set of PSUs in stratum `h`, so the FPC denominator uses `n_h_panel = |unique PSUs in stratum h across the active sample|` and `N_h = fpc_per_psu[first PSU in h]` (panel-constant by sampling design); serial term in stratum h scales by `(1 - n_h_panel/N_h) * n_h_panel/(n_h_panel - 1)`. The spatial term keeps its existing per-period FPC unchanged (the period-`t` spatial sum IS a within-period stratified-cluster sandwich at one time index). For balanced panels with PSU present in every period, `n_h_panel = n_h_t` for all `t` so the two FPC denominators converge; the difference surfaces under unbalanced panels. **Standalone citation chain** (Binder 1983 for FPC factor form, Gerber 2026 Prop 1 for Binder TSL composition with two-stage IF, Newey-West 1987 for serial Bartlett kernel weights, Conley 1999 for spatial kernel and panel-block decomposition) — deliberately NOT by analogy to the Binder helper's per-stratum first-occurrence FPC convention at `_compute_binder_tsl_meat`, which is a cross-sectional application of the same principle. + +- **Centering asymmetry vs no-survey reference:** the no-survey panel-block path at `conley.py:949-965` uses RAW scores for the serial term (no centering) because it assumes `E[scores] = 0` under correct specification — centering is a no-op under that assumption. The survey-weighted Binder TSL form estimates the within-stratum mean and centers explicitly (textbook stratified-cluster sandwich; the per-period stratum mean enters via Binder's finite-population variance derivation). Using raw scores in the survey case would inflate variance by twice the squared per-period stratum mean and would NOT reduce to the cross-sectional Wave E.2 form at `lag = 0`. The centering applied to each leg (`t` and `s`) of the cross-period pair matches the spatial helper's per-period centering exactly. Hand-check at `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff::test_c0_serial_centering_hand_check_raw_vs_centered` pins this contract. + +- **Reduction semantics (load-bearing for tests):** + - `conley_lag_cutoff = 0` or `None`: bit-identical to shipped Wave E.2 ATT and scalar SE (orchestrator skips the serial helper invocation; the spatial loop + saturation guard + new PSD/finite guard still run on the spatial-only meat). `assert_array_equal` regression pin at test_a covers ATT + scalar SE; test_a2 mock-spy independently pins that the serial helper is NOT invoked at lag=0. + - `conley_time is None` or `T = 1`: serial helper short-circuits to zero meat (no cross-period pairs possible) — the degenerate panel-block path, NOT a saturation diagnostic. + - Single stratum (`H = 1`, `FPC = inf`): spatial reduces to `sum_t` Conley sandwich on per-period **within-stratum-CENTERED** PSU totals (NOT raw — at H=1 the centering still subtracts the per-period mean over all G PSUs); serial reduces to Newey-West Bartlett HAC on per-period within-stratum-CENTERED PSU totals (NOT raw scores; the survey-weighted form retains Binder TSL centering even at H=1). Both reductions carry the panel-wide `G/(G-1)` survey factor in lieu of FPC. + - Bandwidth → 0, `L > 0`: spatial reduces to `sum_t` per-period within-stratum HC sandwich on PSU totals; serial term unchanged (separable form). + - All PSUs singleton in stratum + `lonely_psu="remove"`: `df_survey = 0` and the meat NaN-fails (saturation warning template "Wave E.2 stratified-Conley sandwich" covers both cross-sectional and panel-block cases). + - Cross-stratum kernel weight is exactly zero on the serial term too (the within-PSU loop is necessarily within-stratum). + +- **Singleton-stratum `lonely_psu="adjust"` panel-wide mean asymmetry:** for the serial helper, `_global_psu_mean` is the panel-wide mean of per-period PSU totals (averaged over all `(g, t)` with `present[g, t]`), NOT the per-period within-stratum mean used by the spatial helper. The scope difference reflects the serial term's panel-level nature: a singleton stratum at the panel level has no within-stratum cross-PSU variation to demean against, so the only meaningful centering target is the panel-wide PSU mean. The `continue`-skip-FPC pattern matches the spatial helper at `survey.py:2007-2017` exactly to avoid divide-by-zero on `n_h_panel = 1`. + +- **Restrictions (Wave E.2 follow-up):** + - Requires an **effective PSU**: either an explicit `survey_design.psu` OR a `cluster=` argument that gets injected as the effective PSU per Wave E.1's `_inject_cluster_as_psu` routing. No-effective-PSU survey designs (weights-only / strata-only WITHOUT a cluster fallback) raise `NotImplementedError` at `SpilloverDiD.fit` post-resolution because under the pseudo-PSU = obs-index fallback each pseudo-PSU appears in exactly one period — the per-PSU serial cross-period loop would silently contribute zero. Routing the serial loop to `conley_unit` would mix IF allocators with the spatial term's pseudo-PSU aggregation (the user-confirmed methodology pins a single IF allocator); fail-closed per `feedback_no_silent_failures` until a no-effective-PSU derivation is queued (tracked in TODO.md). + - Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` (inherits Wave E.1 gate). + - LinearRegression-side `vcov_type="conley" + survey_design=` gate at `diff_diff/linalg.py:2853` remains (separate Bertanha-Imbens 2014 weighted-Conley roadmap, not Wave E). + - DiagnosticReport routing for `SpilloverDiDResults(vcov_type="conley", conley_lag_cutoff > 0, survey_design=)` is queued for Wave F follow-up (per `feedback_audit_diagnostic_report_wiring_before_claim` — `_APPLICABILITY` / `_PT_METHOD` registration required before claiming consumability). + +**Implementation:** new sibling helper `_compute_stratified_serial_bartlett_meat` in `diff_diff/two_stage.py` (parallel to the Wave E.2 spatial orchestrator; ~200 LoC; three-mode singleton-stratum branching with FPC scaling inside the multi-PSU branch to avoid divide-by-zero; panel-wide dense time codes for the lag math). Orchestrator `_compute_stratified_conley_meat` signature extended with `conley_lag_cutoff: Optional[int] = None`; spatial loop unchanged; serial helper called after spatial loop when `conley_lag_cutoff > 0`; saturation NaN-fail accounting merges both terms' `(variance_computed, legitimate_zero)` flags. Dispatch in `_compute_gmm_corrected_meat` conley branch threads `conley_lag_cutoff` through to the orchestrator. Spillover-side gate at `spillover.py:2210` deleted (Wave E.2 era `NotImplementedError` for lag>0 + survey). Stage-1 weighted FE solver, `finite_mask` survey-array subsetting, `df_survey` threading, bread weighting, and `SpilloverDiDResults` survey metadata are all inherited UNCHANGED — Psi construction is bit-identical regardless of vcov_type or lag. Tests: `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff` + `TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoffEventStudy` at `tests/test_spillover.py`. + **Edge cases (from paper Section 3.2 / Discussion):** | # | Edge case | Handling | @@ -3291,7 +3319,7 @@ Degrees of freedom for the t-distribution lookup use `ResolvedSurveyDesign.df_su **Restrictions / deferred features:** - `event_study=True` SHIPPED in Wave C — see Event-study mode subsection above. Emits `att_dynamic`, MultiIndex `spillover_effects`, and a TwoStageDiD-compatible `event_study_effects` dict alias. -- `survey_design=` for `vcov_type ∈ {"hc1"}` (plus `cluster=` for CR1) SHIPPED in Wave E.1 — see "Variance (Wave E.1)" subsection below. Threads Hájek-normalized survey weights through stage-1 FE estimation, gamma_hat solve, eps construction, and bread inversion; aggregates the Wave D Psi to PSU totals and routes through the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. `vcov_type="conley"` combined with `survey_design=` SHIPPED in Wave E.2 — see "Variance (Wave E.2)" subsection below (stratified-Conley sandwich on PSU totals). Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — Gerber (2026) Appendix A notes the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up. +- `survey_design=` for `vcov_type ∈ {"hc1"}` (plus `cluster=` for CR1) SHIPPED in Wave E.1 — see "Variance (Wave E.1)" subsection below. Threads Hájek-normalized survey weights through stage-1 FE estimation, gamma_hat solve, eps construction, and bread inversion; aggregates the Wave D Psi to PSU totals and routes through the audited `_compute_stratified_meat_from_psu_scores` Binder TSL meat helper. `vcov_type="conley"` combined with `survey_design=` SHIPPED in Wave E.2 for cross-sectional Conley (`conley_lag_cutoff = 0`) — see "Variance (Wave E.2)" subsection below (stratified-Conley sandwich on PSU totals). Wave E.2 follow-up adds the panel-block composition (`conley_lag_cutoff > 0`) via spatial + serial Bartlett HAC — see "Variance (Wave E.2 follow-up)" subsection below. Replicate-weight variance (BRR / Fay / JK1 / JKn / SDR) raises `NotImplementedError` — Gerber (2026) Appendix A notes the IF-reweighting shortcut does NOT apply to TwoStageDiD-class estimators because `gamma_hat` is weight-sensitive; correct support requires per-replicate full re-fit and is queued as a follow-up. - `covariates=` raises `NotImplementedError` — Gardner-style stage-1 residualization not yet wired through; planned follow-up. - `ring_method="count"` not exposed — only the nearest-treated-ring specification. - `vcov_type` ∈ {`"hc2"`, `"hc2_bm"`, `"classical"`} raises `NotImplementedError` — `hc2`/`hc2_bm` because current stage-2 inference uses generic residual df rather than per-coefficient Bell-McCaffrey / CR2 DOF; `classical` because the Wave D Gardner GMM first-stage correction has not been derived for the classical homoskedastic variance (different meat structure `sigma_hat^2 * (X_10' X_10)` vs the Wave D IF outer product `Psi' Psi`). Use `"hc1"` or `"conley"`, or pair with `cluster=` for CR1 — all three apply the Wave D GMM correction. diff --git a/docs/references.rst b/docs/references.rst index d8acabdb..30c0f18e 100644 --- a/docs/references.rst +++ b/docs/references.rst @@ -202,6 +202,10 @@ Multi-Period and Staggered Adoption - **Goodman-Bacon, A. (2021).** "Difference-in-Differences with Variation in Treatment Timing." *Journal of Econometrics*, 225(2), 254-277. https://doi.org/10.1016/j.jeconom.2021.03.014 +- **Newey, W. K., & West, K. D. (1987).** "A Simple, Positive Semi-Definite, Heteroskedasticity and Autocorrelation Consistent Covariance Matrix." *Econometrica*, 55(3), 703-708. https://doi.org/10.2307/1913610 + + Primary source for the Bartlett serial-HAC kernel weights `1 - |t-s|/(L+1)` used in time-series and panel HAC variance estimators. Our ``diff_diff/conley.py`` panel-block path at L949-965 hardcodes this kernel for the within-unit serial component (mirrors R ``conleyreg::time_dist``); SpilloverDiD's Wave E.2 follow-up composes the same kernel weights with Binder/Gerber TSL + Wave D Gardner GMM correction for the panel-block survey case (``_compute_stratified_serial_bartlett_meat`` at ``diff_diff/two_stage.py``; see REGISTRY section "Variance (Wave E.2 follow-up)"). + - **Wing, C., Freedman, S. M., & Hollingsworth, A. (2024).** "Stacked Difference-in-Differences." *NBER Working Paper* 32054. https://www.nber.org/papers/w32054 - **Chen, X., Sant'Anna, P. H. C., & Xie, H. (2025).** "Efficient Difference-in-Differences and Event Study Estimators." *Working Paper*. diff --git a/tests/test_spillover.py b/tests/test_spillover.py index 81e4f844..c5b2ef7c 100644 --- a/tests/test_spillover.py +++ b/tests/test_spillover.py @@ -11,6 +11,7 @@ import pandas as pd import pytest +from diff_diff import SurveyDesign from diff_diff.spillover import ( SpilloverDiD, _apply_callable_metric_pairwise, @@ -6100,36 +6101,6 @@ def test_i_saturated_design_nan_fails(self): assert np.isnan(res.t_stat) assert np.isnan(res.p_value) - def test_j0_panel_conley_lag_cutoff_rejected_under_survey(self): - """vcov_type='conley' + conley_lag_cutoff > 0 + survey_design raises - NotImplementedError upfront. Wave E.2 ships cross-sectional only; - the panel-block decomposition (within-unit serial Bartlett HAC over - time) would need PSU-by-time scores rather than the collapsed PSU - totals. Tracked as a Wave E.2 follow-up in TODO.md. - """ - from diff_diff import SurveyDesign - - df = generate_butts_nonstaggered_dgp(seed=180) - df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) - design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") - est = SpilloverDiD( - rings=[0.0, 100.0], - conley_coords=("lat", "lon"), - conley_metric="haversine", - conley_cutoff_km=self._CUTOFF_KM, - conley_lag_cutoff=2, # panel-block path - vcov_type="conley", - ) - with pytest.raises(NotImplementedError, match="conley_lag_cutoff > 0"): - est.fit( - df_s, - outcome="y", - unit="unit", - time="time", - first_treat="first_treat", - survey_design=design, - ) - def test_j_replicate_weights_rejection_inherits_wave_e1(self): """Replicate-weight variance still raises NotImplementedError under conley+survey (inherits Wave E.1 gate). SurveyDesign requires @@ -6470,6 +6441,1081 @@ def test_n_finite_mask_survey_array_subsetting(self): assert res.n_obs <= len(df_s) # at least the always-treated unit's rows dropped +class TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoff: + """Wave E.2 follow-up: conley + survey + conley_lag_cutoff > 0 via + panel-block composition (spatial + serial Bartlett HAC). + + Methodology anchor: Wave E.2's panel-aware stratified-Conley spatial + sandwich (Conley 1999 × Binder/Gerber 2026 × Wave D Gardner GMM) + composed with within-PSU serial Bartlett HAC (Newey-West 1987 separable + form). Verifies: + - lag=0 STRICT bit-identity to shipped Wave E.2 ATT and scalar SE + (test_a) plus mock-spy that the serial helper isn't invoked (test_a2) + - serial centering = Binder TSL form (per-period within-stratum), NOT raw + - AR(1) DGP serial-term behavioral inflation + - panel-wide per-stratum FPC for the serial term + - panel-wide dense time codes for the lag math + - singleton-adjust panel-wide mean asymmetry (vs spatial's per-period mean) + - saturation NaN-fail still fires + """ + + _CUTOFF_KM = 1000.0 + + def _fit(self, df, lag_cutoff=1, design=None, **kwargs): + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=lag_cutoff, + vcov_type="conley", + event_study=False, + **kwargs, + ) + return est.fit( + df, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + + def test_a_lag0_strict_bit_identical_to_wave_e2_meat(self): + """`conley_lag_cutoff = 0` MUST produce bit-identical ATT AND scalar SE + as a fresh Wave E.2 baseline fit (`assert_array_equal`). The + orchestrator does NOT truly early-return at lag=0 — the spatial + loop, saturation guard, and new PSD/finite guard all still run; the + guarantee is that the serial helper is NOT invoked (so meat_serial + contributes nothing). test_a2 mock-spy verifies the helper isn't + called. + + Methodology lock: skipping the serial helper at the orchestrator + level is the backwards-compatibility guarantee that the shipped + Wave E.2 surface is unaffected by the follow-up. Without that skip, + a numerical zero from the serial helper would still inject + floating-point noise into the spatial-only meat (which would + surface as SE drift). + + Note: full meat-matrix equality is NOT asserted — only ATT + scalar + SE are pinned (the meat matrix is not directly exposed on + `SpilloverDiDResults`). + """ + df = generate_butts_nonstaggered_dgp(seed=0) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res_lag0 = self._fit(df_s, lag_cutoff=0, design=design) + # `lag=0` fit must match a fresh Wave E.2 baseline estimator fit + # (which has the same config, same lag, same data) — bit-identical. + # This catches any case where the orchestrator fails to short-circuit + # and the serial helper injects floating-point noise into the + # spatial-only meat. The Wave E.2 baseline is constructed inline + # (rather than relying on captured goldens) so this test is robust + # to BLAS-runner drift on absolute values; the assertion is on + # cross-fit bit-identity at one runner. + est_e2_baseline = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=0, + vcov_type="conley", + event_study=False, + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res_e2_baseline = est_e2_baseline.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + np.testing.assert_array_equal(res_lag0.att, res_e2_baseline.att) + np.testing.assert_array_equal(res_lag0.se, res_e2_baseline.se) + + def test_a2_lag0_does_not_call_serial_helper(self): + """Structural anchor: orchestrator skips the serial helper at `lag_cutoff = 0` + BEFORE invoking `_compute_stratified_serial_bartlett_meat`. Mirrors + the Wave E.2 test_a2 mock-spy pattern. Without this, a future + refactor that always-invokes the serial helper would silently degrade + the lag=0 backwards-compat guarantee. + """ + from unittest.mock import patch + + df = generate_butts_nonstaggered_dgp(seed=2) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + with patch("diff_diff.two_stage._compute_stratified_serial_bartlett_meat") as mock_serial: + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + self._fit(df_s, lag_cutoff=0, design=design) + assert not mock_serial.called, ( + "lag=0 conley + survey fit must NOT call the serial helper " + "— orchestrator short-circuit broken." + ) + + def test_b_lag1_invokes_serial_helper(self): + """lag=1 MUST invoke the new serial helper exactly once.""" + from unittest.mock import patch + + df = generate_butts_nonstaggered_dgp(seed=3) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat as _orig + + with patch( + "diff_diff.two_stage._compute_stratified_serial_bartlett_meat", wraps=_orig + ) as spy: + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + self._fit(df_s, lag_cutoff=1, design=design) + assert spy.called, "lag=1 fit must invoke the serial helper" + assert spy.call_count == 1 + + def test_c0_serial_centering_hand_check_raw_vs_centered(self): + """Pure unit test: the serial helper applies per-period within-stratum + centering (Binder TSL form), NOT raw scores like the no-survey + reference at conley.py:949-965. Construct a 2-PSU × 2-period synthetic + with materially nonzero per-period stratum mean; assert the centered + helper output equals the hand-computed centered form (not the raw form). + + Pins MEDIUM #2 from plan-review: codex will push hard on the asymmetry + vs the no-survey panel-block reference; the load-bearing claim is + that the centered form is the methodologically-correct choice under + Binder TSL and that raw scores would inflate the variance. + """ + from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat + + # 2 PSUs (one stratum, both PSUs in stratum 0), 2 periods, score dim 2. + # Per-period stratum mean is NOT zero (the two PSUs' scores have + # nonzero average): PSU 0 has [+2, +1] at t=0, [+3, +2] at t=1; + # PSU 1 has [-1, 0] at t=0, [-2, +1] at t=1. + # Stratum mean at t=0: [(2-1)/2, (1+0)/2] = [0.5, 0.5] + # Stratum mean at t=1: [(3-2)/2, (2+1)/2] = [0.5, 1.5] + # Centered: PSU 0 at t=0: [1.5, 0.5]; PSU 1 at t=0: [-1.5, -0.5] + # PSU 0 at t=1: [2.5, 0.5]; PSU 1 at t=1: [-2.5, -0.5] + # Serial sum per PSU at L=1: K_serial(|0-1|=1) = (1 - 1/2) = 0.5 + # PSU 0 contribution: 0.5 * (centered_t0[0] @ centered_t1[0].T) + # = 0.5 * outer([1.5, 0.5], [2.5, 0.5]) + # PSU 1 contribution: 0.5 * outer([-1.5, -0.5], [-2.5, -0.5]) + # NOTE: K_serial is symmetric so we get the (t=0, t=1) and (t=1, t=0) + # contributions both as the same scalar K=0.5 times each PSU's + # cross-period outer product (with appropriate transposition). + # Total per-PSU meat = sum over (t, s) with t != s of K * S_t @ S_s.T + # = K * (S_0 @ S_1.T + S_1 @ S_0.T) + # FPC: n_h_panel = 2, fpc_per_psu = inf -> f_h = 0, scale = 2/(2-1) = 2 + psu_arr = np.array([0, 0, 1, 1]) + time_arr = np.array([0, 1, 0, 1]) + strata_arr = np.array([0, 0, 0, 0]) + fpc_arr = np.full(4, np.inf) + Psi = np.array( + [ + [2.0, 1.0], # PSU 0, t=0 + [3.0, 2.0], # PSU 0, t=1 + [-1.0, 0.0], # PSU 1, t=0 + [-2.0, 1.0], # PSU 1, t=1 + ] + ) + + meat_centered, var_computed, _ = _compute_stratified_serial_bartlett_meat( + Psi, + psu_arr=psu_arr, + time_arr=time_arr, + strata_arr_full=strata_arr, + fpc_arr_full=fpc_arr, + conley_lag_cutoff=1, + lonely_psu="remove", + ) + assert var_computed + + # Hand-compute expected (centered form) + S_psu_0 = np.array([[2.0, 1.0], [3.0, 2.0]]) # (T, k) + S_psu_1 = np.array([[-1.0, 0.0], [-2.0, 1.0]]) + # Per-period stratum means + mean_t0 = (S_psu_0[0] + S_psu_1[0]) / 2 # [0.5, 0.5] + mean_t1 = (S_psu_0[1] + S_psu_1[1]) / 2 # [0.5, 1.5] + S0_centered = np.array([S_psu_0[0] - mean_t0, S_psu_0[1] - mean_t1]) + S1_centered = np.array([S_psu_1[0] - mean_t0, S_psu_1[1] - mean_t1]) + # K_serial at L=1: K[t,s] = (1 - |t-s|/2) for |t-s| in {1}; K[t,t] = 0 + K_serial = np.array([[0.0, 0.5], [0.5, 0.0]]) + meat_0 = S0_centered.T @ K_serial @ S0_centered + meat_1 = S1_centered.T @ K_serial @ S1_centered + # f_h = 0 (FPC=inf), n_h=2, scale = (1-0) * 2/(2-1) = 2 + meat_expected_centered = 2.0 * (meat_0 + meat_1) + np.testing.assert_allclose(meat_centered, meat_expected_centered, rtol=1e-12, atol=1e-14) + + # Verify the RAW form would be DIFFERENT (sanity check that centering matters) + meat_0_raw = S_psu_0.T @ K_serial @ S_psu_0 + meat_1_raw = S_psu_1.T @ K_serial @ S_psu_1 + meat_expected_raw = 2.0 * (meat_0_raw + meat_1_raw) + # Raw and centered should differ MATERIALLY on this fixture + assert not np.allclose( + meat_expected_centered, meat_expected_raw, rtol=1e-2 + ), "Test fixture must have nonzero centering effect to anchor the asymmetry" + + def test_c1_hand_computation_methodology_anchor_lag1(self): + """Hand-compute the serial Bartlett HAC at L=1 on a 4-PSU x 3-period + synthetic and assert implementation parity. Methodology anchor that + gets codified from the _scratch/wave_e2_followup_smoke.py. + """ + from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat + + rng = np.random.default_rng(0) + G, T, k = 4, 3, 2 + psu_arr = np.repeat(np.arange(G), T) + time_arr = np.tile(np.arange(T), G) + strata_arr = np.repeat([0, 0, 1, 1], T) + fpc_arr = np.full(G * T, 20.0) + Psi = rng.standard_normal((G * T, k)) + + meat, var_computed, _ = _compute_stratified_serial_bartlett_meat( + Psi, + psu_arr=psu_arr, + time_arr=time_arr, + strata_arr_full=strata_arr, + fpc_arr_full=fpc_arr, + conley_lag_cutoff=1, + lonely_psu="remove", + ) + + # Hand-compute expected + S_psu = np.zeros((G, T, k)) + for i in range(G * T): + S_psu[psu_arr[i], time_arr[i]] += Psi[i] + meat_expected = np.zeros((k, k)) + for h in [0, 1]: + stratum_psus = np.where(strata_arr[::T] == h)[0] + n_h_panel = len(stratum_psus) + # Per-period within-stratum mean + S_h = S_psu[stratum_psus] # (n_h, T, k) + S_bar_h = S_h.mean(axis=0) # (T, k) + S_centered = S_h - S_bar_h[None, :, :] + # K_serial at L=1 + K = np.array([[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]]) + meat_h = np.zeros((k, k)) + for g_local in range(n_h_panel): + S_g = S_centered[g_local] + meat_h += S_g.T @ K @ S_g + N_h = 20.0 + fpc_scale = (1.0 - n_h_panel / N_h) * n_h_panel / (n_h_panel - 1) + meat_expected += fpc_scale * meat_h + + np.testing.assert_allclose(meat, meat_expected, rtol=1e-12, atol=1e-14) + assert var_computed + + def test_c2_hand_computation_methodology_anchor_lag2(self): + """L=2 exercises multiple kernel weights: K(1) = 2/3, K(2) = 1/3.""" + from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat + + rng = np.random.default_rng(1) + G, T, k = 4, 4, 2 + psu_arr = np.repeat(np.arange(G), T) + time_arr = np.tile(np.arange(T), G) + strata_arr = np.repeat([0, 0, 1, 1], T) + fpc_arr = np.full(G * T, 30.0) + Psi = rng.standard_normal((G * T, k)) + + meat, _, _ = _compute_stratified_serial_bartlett_meat( + Psi, + psu_arr=psu_arr, + time_arr=time_arr, + strata_arr_full=strata_arr, + fpc_arr_full=fpc_arr, + conley_lag_cutoff=2, + lonely_psu="remove", + ) + + # Hand-compute + S_psu = np.zeros((G, T, k)) + for i in range(G * T): + S_psu[psu_arr[i], time_arr[i]] += Psi[i] + meat_expected = np.zeros((k, k)) + # K[t,s] at L=2: (1 - |t-s|/3) for |t-s| in {1, 2} + K = np.array( + [ + [0.0, 2 / 3, 1 / 3, 0.0], + [2 / 3, 0.0, 2 / 3, 1 / 3], + [1 / 3, 2 / 3, 0.0, 2 / 3], + [0.0, 1 / 3, 2 / 3, 0.0], + ] + ) + for h in [0, 1]: + stratum_psus = np.where(strata_arr[::T] == h)[0] + n_h_panel = len(stratum_psus) + S_h = S_psu[stratum_psus] + S_bar_h = S_h.mean(axis=0) + S_centered = S_h - S_bar_h[None, :, :] + meat_h = np.zeros((k, k)) + for g_local in range(n_h_panel): + S_g = S_centered[g_local] + meat_h += S_g.T @ K @ S_g + N_h = 30.0 + fpc_scale = (1.0 - n_h_panel / N_h) * n_h_panel / (n_h_panel - 1) + meat_expected += fpc_scale * meat_h + + np.testing.assert_allclose(meat, meat_expected, rtol=1e-12, atol=1e-14) + + def test_c3_serial_term_inflates_se_on_ar1_dgp(self): + """Behavioral (not invariant): on a panel with within-unit AR(1) + residuals (positive autocorrelation), SE at lag=1 should be at least + 5% larger than SE at lag=0. Without this test, a bug where the + serial term computes to near-zero (e.g. centered scores cancel) + would pass all reduction/parity tests. + + Pins MEDIUM #7 from plan-review: invariant-only coverage is + susceptible to silent zero bugs. + """ + # Generate AR(1) panel via deterministic seed + rng = np.random.default_rng(101) + n_units, T = 16, 8 + rho = 0.7 + unit_ids = np.repeat(np.arange(n_units), T) + times = np.tile(np.arange(T), n_units) + # AR(1) within-unit residuals + residuals = np.zeros(n_units * T) + for u in range(n_units): + e = rng.standard_normal(T) + r = np.zeros(T) + r[0] = e[0] + for t in range(1, T): + r[t] = rho * r[t - 1] + np.sqrt(1 - rho**2) * e[t] + residuals[u * T : (u + 1) * T] = r + + # PSU/stratum assignment + psu_per_unit = unit_ids // 4 # 4 PSUs, 4 units each + strata_per_unit = psu_per_unit // 2 # 2 strata, 2 PSUs each + # Spatial coords: UNIT-CONSTANT (SpilloverDiD requires constant coords + # within each unit across periods). PSU 0/1 (treated) clustered near + # lat 40; PSU 2/3 (never-treated) FAR (~1000+ km away near lat 50) + # so they stay in Omega_0 (untreated AND unexposed) at every period. + # Without the geographic separation, near-control units within + # ring_max would be flagged as S_it=1 and the time-FE identification + # would fail at treated periods. + psu_lat_centers = np.array([40.0, 40.1, 50.0, 50.1]) + psu_lon_centers = np.array([-120.0, -120.1, -120.0, -120.1]) + unit_lat_offset = rng.normal(0, 0.005, size=n_units) + unit_lon_offset = rng.normal(0, 0.005, size=n_units) + # unit_to_coords map: unit u gets PSU centroid + unit-level offset + lat = psu_lat_centers[psu_per_unit] + unit_lat_offset[unit_ids] + lon = psu_lon_centers[psu_per_unit] + unit_lon_offset[unit_ids] + # PSU 0/1 treated at t=3; PSU 2/3 never treated + first_treat = np.where(psu_per_unit < 2, 3, 2**31 - 1) + # Outcome: pure noise (treatment effect = 0) + y = residuals + np.where((psu_per_unit < 2) & (times >= 3), 0.0, 0.0) + + df = pd.DataFrame( + { + "unit": unit_ids, + "time": times, + "first_treat": first_treat, + "y": y, + "psu": psu_per_unit, + "stratum": strata_per_unit, + "lat": lat, + "lon": lon, + "w": 1.0, + "N_h": 100.0, + } + ) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res_lag0 = self._fit(df, lag_cutoff=0, design=design) + res_lag1 = self._fit(df, lag_cutoff=1, design=design) + # AR(1) with rho=0.7 should produce material serial inflation. The 5% + # threshold is loose enough to tolerate bootstrap/numerical jitter + # but tight enough to catch a near-zero serial term. + inflation = (res_lag1.se / res_lag0.se) - 1.0 + assert inflation > 0.05, ( + f"AR(1) DGP serial inflation {inflation:.4f} should exceed 5%; " + f"lag0_se={res_lag0.se:.6f}, lag1_se={res_lag1.se:.6f}" + ) + + def test_d_single_stratum_lag1_finite_and_finite_se(self): + """Single stratum (H=1) + lag=1 fit produces finite output. (Strict + bit-equivalence to no-survey panel-block conley requires careful + Hajek-weight + bread alignment; the simpler invariant tested here is + that the new path produces well-defined output on a single-stratum + survey design.) + """ + df = generate_butts_nonstaggered_dgp(seed=5) + df_s = _augment_with_survey(df, n_strata=1, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res = self._fit(df_s, lag_cutoff=1, design=design) + assert np.isfinite(res.att) + assert np.isfinite(res.se) and res.se > 0 + + def test_e_cross_stratum_independence_with_serial(self): + """Pure unit test on the serial helper: full meat must equal + partition-by-stratum-then-sum-of-each-stratum's-meat. Pins that + cross-stratum contributions are exactly zero in the serial term + (each stratum's serial sum is independent because it only iterates + within-stratum PSUs). + """ + from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat + + rng = np.random.default_rng(7) + G, T, k = 6, 3, 2 + psu_arr = np.repeat(np.arange(G), T) + time_arr = np.tile(np.arange(T), G) + strata_arr = np.repeat([0, 0, 0, 1, 1, 1], T) + fpc_arr = np.full(G * T, 20.0) + Psi = rng.standard_normal((G * T, k)) + + meat_full, _, _ = _compute_stratified_serial_bartlett_meat( + Psi, + psu_arr=psu_arr, + time_arr=time_arr, + strata_arr_full=strata_arr, + fpc_arr_full=fpc_arr, + conley_lag_cutoff=1, + lonely_psu="remove", + ) + # Partition + sum + meat_partitioned = np.zeros((k, k)) + for h in [0, 1]: + mask = strata_arr == h + psu_arr_h = psu_arr[mask] + sub_strata = np.zeros(mask.sum(), dtype=int) + meat_h, _, _ = _compute_stratified_serial_bartlett_meat( + Psi[mask], + psu_arr=psu_arr_h, + time_arr=time_arr[mask], + strata_arr_full=sub_strata, + fpc_arr_full=fpc_arr[mask], + conley_lag_cutoff=1, + lonely_psu="remove", + ) + meat_partitioned += meat_h + np.testing.assert_allclose(meat_full, meat_partitioned, rtol=1e-12, atol=1e-14) + + def test_f_singleton_adjust_lag1_no_divide_by_zero(self): + """Pure unit test: singleton stratum + lonely_psu="adjust" + lag=1 + produces finite output (the panel-wide n_h_panel = 1 FPC must be + SKIPPED via the `continue` mirror; otherwise the helper would + divide-by-zero). Pins the singleton-adjust panel-wide mean asymmetry. + """ + from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat + + rng = np.random.default_rng(11) + # 1 PSU in stratum 0, 2 PSUs in stratum 1; lonely_psu="adjust" exercises + # the singleton branch for stratum 0. + G, T, k = 3, 3, 2 + psu_arr = np.repeat([0, 1, 2], T) + time_arr = np.tile(np.arange(T), G) + strata_arr = np.repeat([0, 1, 1], T) + fpc_arr = np.full(G * T, 10.0) + Psi = rng.standard_normal((G * T, k)) + + meat, var_computed, _ = _compute_stratified_serial_bartlett_meat( + Psi, + psu_arr=psu_arr, + time_arr=time_arr, + strata_arr_full=strata_arr, + fpc_arr_full=fpc_arr, + conley_lag_cutoff=1, + lonely_psu="adjust", + ) + assert np.all(np.isfinite(meat)), "singleton-adjust path divided by zero" + assert var_computed + + def test_f2_all_singleton_remove_lag1_returns_zero(self): + """Pure unit test: all strata singleton + lonely_psu="remove" + lag=1 + returns zero meat and variance_computed=False. (At the orchestrator + level the spatial term would also fail; the orchestrator combines + both terms' var_computed flags before NaN-failing.) + """ + from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat + + rng = np.random.default_rng(13) + G, T, k = 3, 3, 2 + psu_arr = np.repeat([0, 1, 2], T) + time_arr = np.tile(np.arange(T), G) + strata_arr = np.repeat([0, 1, 2], T) # 1 PSU each — all singleton + fpc_arr = np.full(G * T, 10.0) + Psi = rng.standard_normal((G * T, k)) + + meat, var_computed, _ = _compute_stratified_serial_bartlett_meat( + Psi, + psu_arr=psu_arr, + time_arr=time_arr, + strata_arr_full=strata_arr, + fpc_arr_full=fpc_arr, + conley_lag_cutoff=1, + lonely_psu="remove", + ) + np.testing.assert_array_equal(meat, np.zeros((k, k))) + assert not var_computed + + def test_g_unbalanced_panel_panel_wide_dense_codes(self): + """Unit test: panel-wide dense time codes for lag math (matches + `conley.py:940` R deviation). Drop PSU 0 from period t=2 in a + 4-PSU × 4-period synthetic. PSU 0's observed periods become {0, 1, 3}; + with L=1, only the (t=0, t=1) pair contributes (pair (t=1, t=3) at + panel-wide lag |1-3|=2 > 1; pair (t=0, t=3) at lag 3 > 1). Hand-compute + the expected serial contribution from PSU 0. + + Pins MEDIUM #1 from re-review: the lag convention is PANEL-WIDE dense + codes (NOT per-PSU positional encoding). + """ + from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat + + rng = np.random.default_rng(17) + G, T, k = 4, 4, 2 + # Build obs list with PSU 0 missing period 2 + psu_obs_list, time_obs_list, strata_obs_list, fpc_obs_list = [], [], [], [] + for g in range(G): + for t in range(T): + if g == 0 and t == 2: + continue # PSU 0 missing period 2 + psu_obs_list.append(g) + time_obs_list.append(t) + strata_obs_list.append(g // 2) + fpc_obs_list.append(20.0) + psu_arr = np.array(psu_obs_list) + time_arr = np.array(time_obs_list) + strata_arr = np.array(strata_obs_list) + fpc_arr = np.array(fpc_obs_list, dtype=np.float64) + Psi = rng.standard_normal((len(psu_arr), k)) + + meat, _, _ = _compute_stratified_serial_bartlett_meat( + Psi, + psu_arr=psu_arr, + time_arr=time_arr, + strata_arr_full=strata_arr, + fpc_arr_full=fpc_arr, + conley_lag_cutoff=1, + lonely_psu="remove", + ) + + # Verify PSU 0's serial pairs use panel-wide lag codes {0,1,3} -> lag={1,2,3} + # not per-PSU positional {0,1,2} -> lag={1,1,2}. + # If the implementation incorrectly used per-PSU positional, PSU 0 + # would contribute the (t=1, t=3) pair at lag=1 (positional), not lag=2. + # We verify by computing expected meat under both conventions and + # asserting which the implementation matches. + + # Panel-wide convention (correct): PSU 0 only contributes (t=0, t=1) at lag=1. + # Per-PSU positional convention (incorrect): would also contribute + # (t=1, t=3) at positional lag=1 (positions 1 and 2 in {0, 1, 3}). + # The contributions differ; verify implementation matches panel-wide. + + # Build S_psu per (g, t) with NaN at missing cells, then sum into meat + # using panel-wide t codes. + S_psu = np.full((G, T, k), np.nan) + for i, (g, t) in enumerate(zip(psu_arr, time_arr)): + if np.isnan(S_psu[g, t, 0]): + S_psu[g, t] = Psi[i] + else: + S_psu[g, t] += Psi[i] + # Per-period within-stratum centering. Initialize to ZEROS (NOT raw + # S_psu) so any (g, t) cell with < 2 active PSUs in stratum at period + # contributes zero to the serial sum — matches the helper's + # codex-R1-P1 fix preventing raw-score leakage into the serial + # Bartlett covariance. + S_centered = np.zeros_like(S_psu) + # Replace NaN with 0 in the base (won't matter — zero-init prevents + # NaN propagation from the helper's initial copy of S_psu). + for t in range(T): + for h in [0, 1]: + stratum_psus = [g for g in range(G) if g // 2 == h] + active_psus = [g for g in stratum_psus if not np.isnan(S_psu[g, t, 0])] + if len(active_psus) < 2: + continue # singleton-active-period: leave S_centered as zero + stratum_mean = np.mean([S_psu[g, t] for g in active_psus], axis=0) + for g in active_psus: + S_centered[g, t] = S_psu[g, t] - stratum_mean + # Per-PSU serial accumulation + meat_expected = np.zeros((k, k)) + for h in [0, 1]: + stratum_psus = [g for g in range(G) if g // 2 == h] + n_h_panel = len(stratum_psus) + if n_h_panel < 2: + continue + meat_h = np.zeros((k, k)) + for g in stratum_psus: + present_g = ~np.isnan(S_centered[g, :, 0]) + t_g = np.arange(T)[present_g].astype(np.float64) + if len(t_g) < 2: + continue + lag_mat = np.abs(t_g[:, None] - t_g[None, :]) + K_g = ((lag_mat <= 1) & (lag_mat != 0)).astype(np.float64) * (1.0 - lag_mat / 2.0) + S_g = S_centered[g, present_g] + meat_h += S_g.T @ K_g @ S_g + N_h = 20.0 + fpc_scale = (1.0 - n_h_panel / N_h) * n_h_panel / (n_h_panel - 1) + meat_expected += fpc_scale * meat_h + + np.testing.assert_allclose(meat, meat_expected, rtol=1e-12, atol=1e-14) + + def test_g2_lag_greater_than_T_minus_1_finite(self): + """L > T-1 is well-defined (no kernel overflow); fit succeeds with + finite output. Pinned per plan: lag=T and lag=T+5 should not crash. + """ + df = generate_butts_nonstaggered_dgp(seed=19) # T=2 panel + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + T_max = df["time"].nunique() + for L in [T_max, T_max + 5]: + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res = self._fit(df_s, lag_cutoff=L, design=design) + assert np.isfinite(res.att), f"ATT NaN at L={L}" + assert np.isfinite(res.se), f"SE NaN at L={L}" + + def test_h_singleton_active_period_centering_zeros(self): + """Pure unit test: when a stratum-period has < 2 active PSUs, the + per-period centering must zero out S_centered (not leave raw scores). + Pins codex R1 P1 fix — leaving raw scores in singleton-active-period + cells would feed uncentered values into the serial Bartlett sum and + contaminate the covariance. + + Construct a 4-PSU x 3-period panel where one stratum has both PSUs + present at t=0, t=2 but only ONE PSU present at t=1 (simulating + finite_mask-style sparsity). With the bug, the singleton-active-period + cell at t=1 leaks RAW scores into the serial sum across (t=0, t=1) + and (t=1, t=2) pairs; with the fix, those legs contribute zero. + """ + from diff_diff.two_stage import _compute_stratified_serial_bartlett_meat + + rng = np.random.default_rng(37) + # 4 PSUs: stratum 0 has PSUs (0, 1), stratum 1 has PSUs (2, 3). + # PSU 1 missing at t=1 -> stratum 0 has only PSU 0 active at t=1. + # Stratum 1 has both PSUs at all periods. + obs = [] + for g in range(4): + for t in range(3): + if g == 1 and t == 1: + continue # PSU 1 absent at t=1 (singleton-active-period for stratum 0) + obs.append((g, t, g // 2)) + psu_arr = np.array([o[0] for o in obs]) + time_arr = np.array([o[1] for o in obs]) + strata_arr = np.array([o[2] for o in obs]) + fpc_arr = np.full(len(obs), 20.0) + # Choose Psi values so PSU 0 has materially nonzero score at t=1 + # (the singleton-active-period cell). Without the fix, this raw + # value would leak into the serial cross-product with (PSU 0, t=0) + # and (PSU 0, t=2); with the fix, the t=1 leg of PSU 0's serial sum + # contributes zero. + Psi = rng.standard_normal((len(obs), 2)) + # Override PSU 0's t=1 score to a known large value + for i, (g, t, _) in enumerate(obs): + if g == 0 and t == 1: + Psi[i] = np.array([10.0, 10.0]) + + meat, _, _ = _compute_stratified_serial_bartlett_meat( + Psi, + psu_arr=psu_arr, + time_arr=time_arr, + strata_arr_full=strata_arr, + fpc_arr_full=fpc_arr, + conley_lag_cutoff=1, + lonely_psu="remove", + ) + # Hand-compute expected: PSU 0's t=1 leg has S_centered = 0 + # (singleton-active in stratum 0); so PSU 0's serial sum only + # contributes from valid cross-period pairs where both legs have + # >= 2 active PSUs in stratum 0. + # At t=0, t=2: stratum 0 has both PSUs (0, 1) active. So + # S_centered_0_t0 = Psi_0_t0 - (Psi_0_t0 + Psi_1_t0)/2 + # S_centered_0_t2 = Psi_0_t2 - (Psi_0_t2 + Psi_1_t2)/2 + # Cross-period pair (t=0, t=2) at lag=2 contributes 0 to L=1 sum. + # Cross-period pairs (t=0, t=1) and (t=1, t=2) at lag=1 should + # contribute zero from PSU 0's leg (since S_centered at t=1 is zero). + # So PSU 0 contributes nothing to the serial sum at L=1. + # PSU 1: only present at t=0, t=2; lag=2 > L=1 so no contribution. + # Total contribution from stratum 0 = 0. + # PSU 2, PSU 3 (stratum 1) contribute normally at L=1. + # Compute stratum 1's expected contribution + psi_2_t0 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 2 and t == 0][0]] + psi_2_t1 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 2 and t == 1][0]] + psi_2_t2 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 2 and t == 2][0]] + psi_3_t0 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 3 and t == 0][0]] + psi_3_t1 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 3 and t == 1][0]] + psi_3_t2 = Psi[[i for i, (g, t, _) in enumerate(obs) if g == 3 and t == 2][0]] + # Stratum 1 mean at each t + mean_t0 = (psi_2_t0 + psi_3_t0) / 2 + mean_t1 = (psi_2_t1 + psi_3_t1) / 2 + mean_t2 = (psi_2_t2 + psi_3_t2) / 2 + # Centered PSU 2 + s2 = np.array([psi_2_t0 - mean_t0, psi_2_t1 - mean_t1, psi_2_t2 - mean_t2]) + s3 = np.array([psi_3_t0 - mean_t0, psi_3_t1 - mean_t1, psi_3_t2 - mean_t2]) + K_serial = np.array([[0.0, 0.5, 0.0], [0.5, 0.0, 0.5], [0.0, 0.5, 0.0]]) + meat_s1 = s2.T @ K_serial @ s2 + s3.T @ K_serial @ s3 + # FPC scale: panel-wide n_h_panel = 2, N_h = 20, f = 0.1, scale = 0.9 * 2/1 = 1.8 + meat_expected = 1.8 * meat_s1 + + np.testing.assert_allclose(meat, meat_expected, rtol=1e-12, atol=1e-14) + + def test_n_no_psu_survey_lag_positive_raises(self): + """No-PSU survey design + conley_lag_cutoff > 0 raises + NotImplementedError upfront (codex R1 P0 fix). Pseudo-PSU = obs-index + fallback would silently zero the serial sum; the gate prevents the + silent zero by failing closed at SpilloverDiD.fit. + """ + df = generate_butts_nonstaggered_dgp(seed=41) + # No PSU column; just weights + strata. + df_s = df.copy() + df_s["w"] = 1.0 + df_s["stratum"] = 0 # single stratum + design = SurveyDesign(weights="w", strata="stratum") # no psu, no fpc + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=1, # the problematic combination + vcov_type="conley", + ) + with pytest.raises(NotImplementedError, match="no-effective-PSU survey_design"): + est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + + def test_n3_cluster_injects_psu_under_no_psu_survey_lag_positive(self): + """Positive test (R2 P1 fix): `cluster=` injected as PSU under + a no-PSU survey design (weights + strata + fpc) must allow lag>0 + survey-Conley fitting via `_inject_cluster_as_psu`. ATT AND SE must + equal the equivalent fit with explicit `survey_design.psu="psu"` + — the FPC scaling is the SAME on both paths (Wave E.1 `_inject_cluster_as_psu` + carries the FPC through), so SE parity is the load-bearing pin. + + Pins the documented Wave E.1 surface that `cluster=` becomes + the effective PSU when survey PSU is absent; the codex-R2 P1 fix + moved the no-effective-PSU gate to AFTER injection so this + documented surface continues to work for lag>0. R3 P3 fix: original + test omitted fpc on the no-PSU design and only asserted ATT + equality + finite SE; this version includes fpc on both paths and + asserts ATT AND scalar SE tight numerical parity (`assert_allclose` + at rtol=1e-12, atol=1e-14) so a future variance regression on the + cluster-injected surface fails. The full meat matrix is not + asserted (SE is a projection of meat that could in principle + coincide while off-diagonals differ); scalar-SE parity is the + load-bearing user-visible pin. + """ + df = generate_butts_nonstaggered_dgp(seed=53) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + # Reference fit: explicit survey_design.psu="psu" + design_explicit = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res_explicit = self._fit(df_s, lag_cutoff=1, design=design_explicit) + # Cluster-injected fit: SurveyDesign(weights, strata, fpc) + cluster="psu" + # (fpc included so SE comparison is apples-to-apples) + design_no_psu = SurveyDesign(weights="w", strata="stratum", fpc="N_h") + est_cluster = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=1, + vcov_type="conley", + cluster="psu", # injected as effective PSU + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res_cluster = est_cluster.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design_no_psu, + ) + # ATT bit-identical (variance-method-invariant) AND SE bit-identical + # (cluster injection carries fpc through; both paths see identical + # effective PSU labels + FPC scaling for the panel-block sandwich). + np.testing.assert_allclose(res_cluster.att, res_explicit.att, rtol=1e-12, atol=1e-14) + np.testing.assert_allclose(res_cluster.se, res_explicit.se, rtol=1e-12, atol=1e-14) + assert np.isfinite(res_cluster.se) and res_cluster.se > 0 + + def test_n2_weights_only_survey_lag_positive_raises(self): + """Weights-only SurveyDesign + conley_lag_cutoff > 0 also raises + (covers the weights-only no-PSU sub-case).""" + df = generate_butts_nonstaggered_dgp(seed=43) + df_s = df.copy() + df_s["w"] = 1.0 + design = SurveyDesign(weights="w") # weights only, no psu/strata/fpc + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=2, + vcov_type="conley", + ) + with pytest.raises(NotImplementedError, match="no-effective-PSU survey_design"): + est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + + def test_j_no_survey_panel_block_unchanged_with_lag(self): + """The no-survey Wave D panel-block conley path at `conley.py:920-965` + must still work after the orchestrator gate-relaxation. The dispatch + in `_compute_gmm_corrected_meat` only ROUTES the survey branch + through the new orchestrator; the `resolved_survey is None` branch is + bit-identical to Wave D pre-PR. + """ + df = generate_butts_nonstaggered_dgp(seed=23) + # No survey design; just conley + lag=1 (already supported pre-PR via Wave A/D) + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=1000.0, + conley_lag_cutoff=1, + vcov_type="conley", + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res = est.fit(df, outcome="y", unit="unit", time="time", first_treat="first_treat") + assert np.isfinite(res.att) + assert np.isfinite(res.se) and res.se > 0 + + def test_k_replicate_weights_rejection_inherits_wave_e1(self): + """Replicate-weight + conley + lag>0 + survey raises NotImplementedError. + Replicate-weight SurveyDesigns are by construction no-PSU (replicate + weights can't combine with strata/psu/fpc), so the codex-R1-P0 + no-PSU lag-gate at SpilloverDiD.fit fires before the Wave E.1 + replicate-weight gate — either error is informative. We assert the + broader rejection contract: ANY of the gates fires. + """ + df = generate_butts_nonstaggered_dgp(seed=29) + df_s = df.copy() + df_s["w"] = 1.0 + # Build replicate-weight columns (unit-constant) + units_sorted = sorted(df_s["unit"].unique()) + rep_cols = [] + rng = np.random.default_rng(0) + for r in range(5): + col = f"rep_{r}" + w_by_unit = {u: rng.uniform(0.5, 1.5) for u in units_sorted} + df_s[col] = df_s["unit"].map(w_by_unit) + rep_cols.append(col) + design = SurveyDesign(weights="w", replicate_weights=rep_cols, replicate_method="JK1") + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=1000.0, + conley_lag_cutoff=1, + vcov_type="conley", + ) + with pytest.raises(NotImplementedError, match="(?:replicate|no-effective-PSU)"): + est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + + def test_m_fit_idempotency_lag1(self): + """Repeated fit with lag=1 produces bit-identical results (per + `feedback_fit_does_not_mutate_config`). + """ + df = generate_butts_nonstaggered_dgp(seed=31) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=1000.0, + conley_lag_cutoff=1, + vcov_type="conley", + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res1 = est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + # Re-fit on a fresh estimator instance with same params + est2 = SpilloverDiD(**est.get_params()) + res2 = est2.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + np.testing.assert_array_equal(res1.att, res2.att) + np.testing.assert_array_equal(res1.se, res2.se) + + def test_r_drift_goldens_lag1(self): + """ATT is variance-method-invariant: lag=1 ATT MUST match lag=0 ATT + bit-identically on the same data + estimator config (only vcov_type + / lag changes the meat, never the point estimate). SE MUST differ + (serial Bartlett term contributes nonzero variance under panel data + with within-PSU temporal correlation). Catches any case where the + serial helper accidentally feeds back into the point estimate path. + """ + df = generate_butts_nonstaggered_dgp(seed=0) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res_lag0 = self._fit(df_s, lag_cutoff=0, design=design) + res_lag1 = self._fit(df_s, lag_cutoff=1, design=design) + # Both finite + positive + assert np.isfinite(res_lag0.att) and np.isfinite(res_lag0.se) and res_lag0.se > 0 + assert np.isfinite(res_lag1.att) and np.isfinite(res_lag1.se) and res_lag1.se > 0 + # ATT bit-identical between lag=0 and lag=1 (variance-method-invariant) + np.testing.assert_array_equal( + res_lag1.att, + res_lag0.att, + err_msg="ATT must be variance-method-invariant; lag=1 ATT must match lag=0 ATT exactly", + ) + # SE differs (serial Bartlett term contributes nonzero variance) + assert not np.allclose(res_lag1.se, res_lag0.se, rtol=1e-10), ( + f"lag=1 SE ({res_lag1.se}) must differ from lag=0 SE ({res_lag0.se}); " + "serial Bartlett term should add nonzero variance contribution. " + "Note: on a 2-period Butts DGP only L=1 pairs exist (only adjacent " + "periods), so the inflation may be small but should be detectable." + ) + + +class TestSpilloverDiDWaveE2FollowupConleySurveyLagCutoffEventStudy: + """Wave E.2 follow-up event-study mirror: conley + survey + lag>0 on both + `is_staggered` branches (per `feedback_cohort_loop_trigger_cache_both_branches`). + """ + + _CUTOFF_KM = 1000.0 + + def test_o_event_study_conley_lag1_survey_is_staggered_true(self): + """Full plumbing end-to-end on the staggered event-study path with + `conley_lag_cutoff=1` survey.""" + df = generate_butts_staggered_dgp(seed=24) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=1, # NEW: panel-block composition + vcov_type="conley", + event_study=True, + horizon_max=2, + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res = est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + assert np.isfinite(res.att) and np.isfinite(res.se) and res.se > 0 + assert res.spillover_effects is not None + assert res.survey_metadata is not None + # df_survey on this fixture matches Wave E.2 event-study test_o + assert res.survey_metadata.df_survey == 6 + + def test_p_event_study_conley_lag1_survey_is_staggered_false(self): + """The non-staggered branch of the event-study path also works with + lag>0 (per `feedback_cohort_loop_trigger_cache_both_branches`).""" + df = generate_butts_nonstaggered_dgp(seed=25) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + est = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=1, # NEW: panel-block composition + vcov_type="conley", + event_study=True, + horizon_max=1, + ) + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res = est.fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + assert np.isfinite(res.att) and np.isfinite(res.se) and res.se > 0 + assert res.survey_metadata is not None + + def test_r_event_study_drift_goldens_lag1_vs_lag0(self): + """Event-study + survey + lag=1: ATT bit-identical to lag=0 (variance- + method-invariant); SE differs (serial Bartlett contributes). + """ + df = generate_butts_staggered_dgp(seed=24) + df_s = _augment_with_survey(df, n_strata=2, psus_per_stratum=4, fpc=200.0) + design = SurveyDesign(weights="w", strata="stratum", psu="psu", fpc="N_h") + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + res_lag0 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=0, + vcov_type="conley", + event_study=True, + horizon_max=2, + ).fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + res_lag1 = SpilloverDiD( + rings=[0.0, 100.0], + conley_coords=("lat", "lon"), + conley_metric="haversine", + conley_cutoff_km=self._CUTOFF_KM, + conley_lag_cutoff=1, + vcov_type="conley", + event_study=True, + horizon_max=2, + ).fit( + df_s, + outcome="y", + unit="unit", + time="time", + first_treat="first_treat", + survey_design=design, + ) + # ATT bit-identical (point estimate variance-method-invariant) + np.testing.assert_array_equal(res_lag1.att, res_lag0.att) + # SE differs (serial term contributes nonzero variance under panel + # data with multiple periods of within-PSU score correlation) + assert not np.allclose(res_lag1.se, res_lag0.se, rtol=1e-10) + + class TestSpilloverDiDWaveE2ConleySurveyDesignEventStudy: """Event-study branch + conley + survey, both is_staggered branches."""