diff --git a/CHANGELOG.md b/CHANGELOG.md
index 7763c178..6c909c83 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -11,7 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`CallawaySantAnna.cluster=` silent no-op (Phase 1b interstitial).** `CallawaySantAnna(cluster="state").fit(...)` previously accepted the argument, stored it, returned it from `get_params()`, but never consumed it anywhere in the fit / aggregator / bootstrap pipeline (`staggered.py:154-156` docstring claimed "Defaults to unit-level clustering" — but for bare `cluster=X`, the aggregator at `staggered_aggregation.py:193-213` computed per-unit IF variance regardless, and the bootstrap at `staggered_bootstrap.py:323-347` drew per-unit multiplier weights regardless). Users who explicitly set `cluster="state"` got per-unit inference with no warning — typically SE too small under intra-cluster correlation. **Survey-PSU clustering via `survey_design=SurveyDesign(psu="state")` was NOT affected** and continued to cluster correctly via `_compute_stratified_psu_meat`. The fix synthesizes a minimal `SurveyDesign(psu=self.cluster, weight_type="pweight")` when bare `cluster=` is set without an explicit survey design, threading the synthesized PSU through the existing survey-PSU machinery (aggregator + bootstrap). A new dedicated `df_inference` field on `CallawaySantAnnaResults` carries the cluster-level df for the bare-cluster-synthesize path ONLY (where `survey_metadata` is intentionally `None` to preserve the `DiagnosticReport.survey_metadata is not None` skip at `diagnostic_report.py:848-856` + `:1150-1158` for "Original fit used a survey design" reasoning, and the `summary()` survey block render at `staggered_results.py:235-238`). `HonestDiD` at `honest_did.py` prefers `survey_metadata.df_survey` first (the actual CS-internal df, which may be tightened post-resolve for replicate designs) and falls back to `df_inference` for bare-cluster fits — so downstream consumers always see the cluster df without overriding the post-recompute survey df. When `survey_design=SurveyDesign(weights=Y)` without PSU is provided AND `cluster=X` is also set, `_inject_cluster_as_psu` injects the bare cluster as the effective PSU AND an `effective_survey_design = replace(survey_design, psu=self.cluster)` is constructed so the downstream `_validate_unit_constant_survey` catches movers (units crossing clusters across periods) on panel data via the now-PSU-bearing design; `survey_metadata` is recomputed to reflect the injected PSU. When both `cluster=X` AND `survey_design.psu=Y` are set, the explicit PSU wins via `_resolve_effective_cluster` (emits `UserWarning` if partitions differ). **`cluster= + SurveyDesign(replicate_weights=[...])` raises `NotImplementedError`**: replicate-weight variance is computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores PSU/cluster entirely (`survey.py:104-109` enforces replicate_weights are mutually exclusive with strata/psu/fpc); honoring bare `cluster=` would silently have no effect while populating `cluster_name`/`n_clusters` on Results dishonestly. Assertive regression tests pin the fix on both panel and repeated-cross-section paths plus the survey/non-survey contract boundaries: `test_cluster_robust_ses_differ_from_unit_level`, `test_bare_cluster_works_with_panel_false_rcs`, `test_bare_cluster_synthesizes_survey_design`, `test_inject_branch_panel_mover_raises`, `test_replicate_weight_plus_cluster_rejected`, `test_bare_cluster_populates_df_inference` (asserts the dedicated cluster-df carrier is set), `test_bare_cluster_does_not_set_survey_metadata` (asserts the survey/non-survey contract is preserved — DiagnosticReport / summary() must not treat a bare-cluster fit as survey-backed), `test_explicit_survey_design_does_populate_survey_metadata` (asserts the inject-branch path still populates survey_metadata for legitimate user-provided SurveyDesign), and `test_bare_cluster_honest_did_uses_df_inference` (end-to-end: HonestDiD threads df_inference into HonestDiDResults.df_survey, preventing silent normal-theory regression on a future refactor). When `cluster=None` (default), behavior is bit-equal to pre-PR (wiring guarded by `if self.cluster is not None:`). Audit verified the no-op was CS-specific — the other 7 Phase 1b estimators (SunAbraham, StackedDiD, WooldridgeDiD, ImputationDiD, TripleDifference, TwoStageDiD, EfficientDiD) handle bare `cluster=` correctly.
 
 ### Added
-- **TROP methodology-review-tracker promotion: In Progress → Complete.** Closes the Athey, Imbens, Qu & Viviano (2025) *Triply Robust Panel Estimators* (arXiv:2508.21536) primary-source review on the methodology-review tracker. PR-A (the paper review on file at `docs/methodology/papers/athey-2025-review.md`) was previously merged as #443; this PR is the F.L.I.P. consolidation — new `tests/test_methodology_trop.py` with paper-equation-numbered Verified Components walk-through (10 classes, 36 tests covering Eq. 2 soft-threshold SVD prox / plain prox-gradient monotonicity on a toy setup / weighted-prox solver (the shipped accelerated FISTA outer loop is NOT directly tested for per-step monotonicity because Nesterov momentum does not guarantee it), Eq. 3 unit + time weights, Eqs. 4-5 + Algorithm 1 LOOCV with two-stage cycling, Corollary 1 three-condition unbiasedness, Theorem 5.1 MC-ranking realisation of the triply-robust bias bound, Section 2.2 DID + MC reductions, Eq. 13 + Algorithm 2 per-(i, t) estimation, Algorithm 3 stratified pairs bootstrap, Section 3 / Eq. 6 factor-DGP recovery, plus a `TestTROPDeviations` class locking 11 documented library deviations). Migrated from `tests/test_trop.py`: `TestMethodologyVerification` (5 tests → `TestTROPEquation6FactorDGPRecovery`), four paper-conformance tests + one weighted-solver convergence test from `TestPaperConformanceFixes` (→ `TestTROPEquation3Weights` / `TestTROPAlgorithm1LOOCV` / `TestTROPNuclearNormProx` / `TestTROPAlgorithm3Bootstrap`), three prox / plain prox-gradient monotonicity / weighted-objective tests from `TestTROPNuclearNormSolver` (→ `TestTROPNuclearNormProx`), plus a cycling-convergence test from `TestCyclingSearch` and the factor-DGP smoke from `TestTROPvsSDID`; the `TestPaperConformanceFixes` and `TestTROPvsSDID` shells are deleted. `TestTROPNuclearNormSolver` retains its single defensive `test_zero_weights_no_division_error`. `METHODOLOGY_REVIEW.md` TROP row promoted with merge date 2026-05-24, full Verified Components / Test Coverage / Deviations / Outstanding Concerns / R Parity structure mirroring HAD (PR #473) / ContinuousDiD (PR #476) / DCDH (PR #481) / WooldridgeDiD (PR #486) precedents. **Documented deviations:** Gap #5 (unnormalised weights match Eq. 2, not Section 5 sum-to-one), Gap #9 (unbalanced panels supported beyond paper's balanced-panel assumption), rank selection is implicit via nuclear-norm soft-thresholding with no discrete `rank_selection` constructor parameter (matches paper Section 5.3 + Appendix; corrects an earlier REGISTRY overclaim that listed cv / ic / elbow methods), `λ_nn = ∞` → 1e10 internal sentinel with original-value storage on results. **Outstanding Concerns (deferred):** Equation 14 covariate extension (`TROP.fit()` does not accept a `covariates` kwarg; non-support locked by `TestTROPDeviations::test_covariates_not_supported` via `inspect.signature` to guard against future `**kwargs`) and Theorem 8.1 (covariate triple robustness) deferred until use cases motivate; SC / SDID reductions paper-claimed under "specific (omega, theta) weight choices" not provided in the paper text — cross-language anchor deferred. **R parity:** deferred until paper-author reference implementation is released ("forthcoming" per the paper). REGISTRY.md `## TROP` section gains a "Verified Components" expansion: 10 ticked requirements + four `**Note:**` / `**Note (library-side choice):**` / `**Note (deferral):**` annotations consolidating the deviation surface (Eq. 10 balancing-decomposition pointer, Gap #5 weight-normalisation library-side choice, Eq. 14 covariate deferral). No source-code changes to `diff_diff/trop*.py`. Methodology sign-off scope: paper-aligned identification ingredients (Eq. 2 prox, Eq. 3 weights, Eqs. 4-5 LOOCV, Algorithms 1-3, Corollary 1 unbiasedness, Eq. 6 simulation recovery, DID reduction, documented deviations) are directly locked by the new tests. Theorem 5.1 is verified as a simulation sanity check (TROP RMSE < DID RMSE under LOOCV-tuned weights), NOT as a direct fixed-weight conditional-bias-bound lock; the Matrix Completion reduction is verified as code-path activation (effective_rank > 0 + beats DID baseline), NOT as equivalence against an independent MC reference. The Eq. 14 covariate extension is documented as deferred (TROP.fit() has no `covariates` kwarg).
+- **New tutorial: SpilloverDiD on synthetic TVA-style spillover panel (Butts 2021 §4 analogue) with spillover-bandwidth sensitivity and Conley spatial-HAC inference.** `docs/tutorials/23_spillover_tva.ipynb` walks a practitioner through the SpilloverDiD workflow on a 4-period 200-unit panel laid out as treated cluster + near-control band + far-control band. The DGP is tuned at locked seed 23 (`n_treated=25, n_near=120, n_far=55, tau_total=-7.4, delta_1=-4.5, d_bar=100 km`) so naive multi-period TWFE on the full sample understates the direct effect by ~42% — matching the Butts (2021) §4 Table 1 Panel A bias-correction direction documented at `docs/methodology/papers/butts-2021-review.md:257`. SpilloverDiD with `rings=[0.0, 100.0]` cleanly recovers both `tau_total = -7.34` and `delta_1 = -4.53`. The tutorial covers (1) problem framing with TVA / Kline-Moretti (2014) citation, (2) panel construction with the DGP equation inline, (3) naive headline and the bias mechanism, (4) `rings` sensitivity grid at outer edges 50/100/150/200 km (estimates stabilize once `d_bar` covers the true spillover horizon), (5) the headline `SpilloverDiD` fit, (6) Conley spatial-HAC variance via `vcov_type="conley", conley_cutoff_km=100, conley_lag_cutoff∈{0,1}` — the cutoff = `d_bar` choice follows Butts §3.1, while the `conley_lag_cutoff` serial term is the library's documented Wave E.2 follow-up synthesis with Newey-West-style serial Bartlett HAC (per REGISTRY "Variance (Wave E.2 follow-up)") — showcasing the Spillover-Conley work shipped through PRs #468 / #474 / #477 / #482 / #485 / #489. Drift detection in `tests/test_t23_spillover_tva_drift.py` (21 function-level tests, T19 pattern) pins panel composition, geographic bands, true parameters, naive coefficient endpoint (round-to-2 — well-conditioned MultiPeriodDiD fit, BLAS-stable to better than 0.005), recovery endpoints (round-to-2 on `tau_total = -7.34` and `delta_1 = -4.53`), seed-specific geometry numbers (max distance from origin, band diameters, cross-band max pair, within-band median + within-100km pair fractions), sensitivity grid `rings=[0, 50]` endpoint at round-to-1 (per reviewer guidance for BLAS safety on the borderline-rank-deficient point), notebook §2 constant-sync + AST-body sync against the test fixture, exact `tau_total` AND `delta_1` identity across the d_bar ∈ {100, 150, 200} grid (on THIS synthetic DGP only, because no units lie in the 80-200 km band — once `d_bar` covers the true 100 km spillover horizon, the additional ring bins are empty and contribute zero observations; this is NOT a generic Butts §4 result and the registry frames `d_bar` as a real bias/variance tradeoff in the general case), Conley SE divergence from HC1 (direction-pinned: Conley < HC1 on this DGP), and the platform-agnostic post-filter warning surface (T19 pattern: mirrors the notebook's narrow `.*encountered in matmul` `RuntimeWarning` filter inside the capture block and asserts no warnings remain; on Apple Silicon M4 + numpy<2.3 the three known Accelerate BLAS matmul warnings — documented at `TODO.md` "RuntimeWarnings in Linear Algebra Operations" — fire and are filtered, on M3 / Intel / Linux or numpy>=2.3 the filter is a no-op, EITHER WAY any UserWarning / FutureWarning / non-matmul RuntimeWarning surfaces immediately). Per `feedback_t19_drift_guards_test_file_only`, ZERO in-notebook asserts — all numerical guards live in the test file. Per `feedback_notebook_workflow`, the DGP was developed and locked in a temporary `_scratch/` script (gitignored) before being pasted into the notebook §2 cell and duplicated into the drift-test `panel` fixture. `docs/index.rst` toctree updated; `docs/references.rst` gains the Kline-Moretti (2014) entry; `docs/methodology/papers/butts-2021-review.md:257` cross-reference updated from "T22 tutorial" to "T23 tutorial" (slot 22 was occupied by `22_had_survey_design.ipynb`). The `SpilloverDiD T22 TVA tutorial` row in `TODO.md` (renumbered to T23 at delivery) is dropped.
 - **TripleDifference `vcov_type` input contract (Phase 1b interstitial #2, permanently narrow).** `TripleDifference(vcov_type=...)` now accepts `{"hc1"}` only (default). The analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages mirroring the CS interstitial. The rejection is **library-architectural, not paper-prescribed**: TripleDifference uses influence-function-based variance per Ortiz-Villavicencio & Sant'Anna (2025) arXiv:2505.09942 — the 3-pairwise-DiD decomposition `inf = w3·IF_3 + w2·IF_2 - w1·IF_1` has no single design matrix to compute hat-matrix leverage `1/(1-h_ii)` or Bell-McCaffrey Satterthwaite DOF on. The narrow contract is permanent and applies to the remaining IF-based estimators (`ImputationDiD`, `EfficientDiD`) when their `vcov_type` threading PRs land. `hc1` with `cluster=None` ≡ per-unit IF variance (`std(inf)/sqrt(n)`); `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the combined IF (`(G/(G-1)) · Σ_c (Σ_{i∈c} ψ_i)² / n²`, plain CR1 — no Stata-style `(n-1)/(n-p)` finite-sample factor because the IF has no design-matrix `p` in the OLS sense); `hc1` with `survey_design=` ≡ TSL on the combined IF (analytical or replicate). All three paths are unchanged at machine precision (default behavior bit-equal across all 3 estimation methods `{dr, reg, ipw}`). `vcov_type` and `cluster_name` fields added to `TripleDifferenceResults`, threaded through `to_dict()`. `summary()` routes the variance-family label through the shared `_format_vcov_label` (`results.py:49-89`): bare fits render `"HC1 heteroskedasticity-robust"`, clustered fits render `"CR1 cluster-robust at <cluster_name>, G=<n>"` (since the actual algebra is Liang-Zeger CR1 on the combined IF), and survey-backed fits suppress the variance-estimator line entirely (the Survey Design block already names design + n_psu + df, and the analytical SE is TSL on the combined IF — a raw "hc1" label would misstate the inference path). **`cluster= + SurveyDesign(replicate_weights=[...])` raises `NotImplementedError`** at `fit()`: replicate-weight variance is computed by replicate reweighting (BRR / Fay / JK1 / JKn / SDR) and ignores PSU/cluster entirely; honoring bare `cluster=` would silently have no effect on the variance estimate while populating `cluster_name`/`n_clusters` on Results dishonestly. Mirrors the `CallawaySantAnna` guard from PR #487. Under `survey_design.psu` (non-replicate path) `cluster_name`/`n_clusters` on Results are suppressed (set to None) so they can't misreport the raw cluster argument when the resolver picks the survey PSU instead. `set_params(vcov_type=...)` mirrors CS pattern (mutate-then-validate-at-use, no atomic validation); `fit()` re-validates `vcov_type` at use time so a `set_params(vcov_type="hc4")` mutation surfaces a clear error at fit-time rather than silently propagating to Results metadata. **Interstitial PR #2** (after CS PR #487) rather than full Phase 1b PR 4/8 vcov_type threading — the narrow surface is methodologically dictated by TripleDifference's IF-based variance, not a deferral. New `TestTripleDifferenceVcovType` class in `tests/test_triple_diff.py` covers the 5-surface contract (default/cluster/survey bit-equal, `__init__` rejection per family, `fit()`-time revalidation) plus 8 introspection / convenience-function tests. REGISTRY.md "IF-based variance estimators vs analytical-sandwich estimators" cross-reference section updated to list `TripleDifference` alongside `CallawaySantAnna` in the "Enforced today" tier. Phase 1b PR 4/8 (full `{classical, hc1, hc2, hc2_bm}` threading) resumes on a different estimator (TwoStageDiD) post-merge; the two remaining IF-based estimators (`ImputationDiD`, `EfficientDiD`) follow the same narrow-contract template.
 - **CallawaySantAnna `vcov_type` input contract (Phase 1b interstitial, permanently narrow).** `CallawaySantAnna(vcov_type=...)` now accepts `{"hc1"}` only (default). The analytical-sandwich families `{classical, hc2, hc2_bm}` and `conley` spatial-HAC are REJECTED at `__init__` with methodology-rooted messages. The rejection is **library-architectural, not paper-prescribed**: CS uses influence-function-based variance per Callaway & Sant'Anna (2021) — per-(g,t) doubly-robust / IPW / outcome-regression structure — and has no single design matrix to compute hat-matrix leverage `1/(1-h_ii)` or Bell-McCaffrey Satterthwaite DOF on. The narrow contract is permanent and applies to other IF-based estimators (ImputationDiD, EfficientDiD) when their `vcov_type` threading PRs land. `hc1` with `cluster=None` ≡ per-unit IF variance (Williams 2000 form); `hc1` with `cluster=X` ≡ CR1 Liang-Zeger on the IF activated via the cluster= wiring fix above. Documentation in `docs/methodology/REGISTRY.md` "IF-based variance estimators vs analytical-sandwich estimators" subsection. `vcov_type`, `cluster_name`, `n_clusters`, `df_inference` added to `CallawaySantAnnaResults` (the canonical PSU column wins for `cluster_name` reporting — `survey_design.psu` when explicit PSU is provided, `self.cluster` when bare cluster synthesizes/injects). `set_params(vcov_type=...)` mirrors SA pattern (mutate-then-refresh `_vcov_type_explicit`, no atomic validation); `fit()` re-validates `vcov_type` at use time so a `set_params(vcov_type="hc4")` mutation surfaces a clear error at fit-time rather than silently propagating to Results metadata. **Interstitial PR** rather than full Phase 1b PR 4/8 vcov_type threading — the narrow surface is methodologically dictated by CS's IF-based variance, not a deferral. Phase 1b PR 4/8 (full {classical, hc1, hc2, hc2_bm} threading) resumes on a different estimator post-merge.
 - **TripleDifference cluster-changes-SE defensive regression test.** Added `tests/test_triple_diff.py::TestTripleDifferenceClusterDefensive::test_cluster_changes_ses` asserting that `TripleDifference(cluster="state")` produces SE differing from `cluster=None` SE by `>1e-6` on a fixed-seed panel with state-level random effects. Defensive coverage closes a test gap identified during the Phase 1b cluster-wiring audit; TripleDifference's bare-cluster code path (`triple_diff.py:1245-1259`) was already correct but lacked a positive regression test. Mirrors `tests/test_two_stage.py::test_cluster_changes_ses`.
diff --git a/TODO.md b/TODO.md
index 169968a5..54cd3d69 100644
--- a/TODO.md
+++ b/TODO.md
@@ -159,7 +159,6 @@ Deferred items from PR reviews that were not addressed before merge.
 | `SpilloverDiD(vcov_type="conley", conley_lag_cutoff > 0, survey_design=...)` no-effective-PSU serial Bartlett HAC. Wave E.2 follow-up ships the panel-block composition when an effective PSU exists (explicit `survey_design.psu` OR injected via `cluster=<col>` per `_inject_cluster_as_psu`). Weights-only / strata-only survey designs WITHOUT a cluster fallback raise `NotImplementedError` at `SpilloverDiD.fit` post-resolution because under the pseudo-PSU = obs-index fallback each pseudo-PSU appears in exactly one period — the per-PSU serial cross-period loop would silently contribute zero. Fix would either derive a unit-level serial fallback for no-PSU designs (mixes IF allocators with the pseudo-PSU spatial term — needs methodology work) or route the serial loop through `conley_unit` with explicit documentation of the IF-allocator asymmetry. Regression goldens vs the effective-PSU shipped path. | `spillover.py::SpilloverDiD.fit`, `two_stage.py::_compute_stratified_serial_bartlett_meat` | follow-up (Wave E.2 follow-up tail) | Low |
 | `SpilloverDiD(ring_method="count")` extension. Currently only the nearest-treated-ring specification is exposed. Count-of-treated-in-ring (paper Section 3.2 end) is methodologically supported by Butts but re-introduces functional-form dependence; expose with an explicit kwarg gate and documentation warning. | `spillover.py::SpilloverDiD.fit` | follow-up | Low |
 | `SpilloverDiD` data-driven `d_bar` selection (Butts 2021b / Butts 2023 JUE Insight cross-validation). | `spillover.py::SpilloverDiD` | follow-up | Low |
-| `SpilloverDiD` T22 TVA tutorial (`docs/tutorials/22_spillover_did.ipynb`): synthetic TVA-style DGP reproducing Butts (2021) Section 4 Table 1 Panel A bias-correction direction (~40% understatement). Split from the methodology PR per user-confirmed scope split (2026-05-15). | `docs/tutorials/`, `tests/test_t22_*_drift.py` | follow-up (Wave B) | Medium |
 | Extend `TwoStageDiD` with Conley vcov as a first-class feature (mirrors Wave A's TWFE/MPD/DiD extension). Currently `TwoStageDiD.__init__` lacks `vcov_type` / `conley_*` kwargs; `SpilloverDiD` works around this by threading Conley directly via `solve_ols` at stage 2. Promoting Conley to TwoStageDiD's API removes the workaround and lets non-spillover users access Conley + Gardner two-stage. | `diff_diff/two_stage.py` | follow-up | Medium |
 | `SpilloverDiD` sparse cKDTree path for the staggered nearest-treated-distance helper (mirrors the static helper's sparse branch). Currently `_compute_nearest_treated_distance_staggered` always builds dense `(n_units, n_treated_by_onset)` pairwise distance matrices per cohort; on large staggered panels with many cohorts this is avoidable memory/runtime. Add a sparse k-d-tree branch analogous to `_compute_nearest_treated_distance_sparse`, gated on `n > _CONLEY_SPARSE_N_THRESHOLD`. | `spillover.py::_compute_nearest_treated_distance_staggered` | follow-up (Wave B) | Low |
 | `SpilloverDiDResults` in `DiagnosticReport` dispatch tables. Wave C event-study emits a TwoStageDiD-compatible `event_study_effects: Dict[int, Dict]` alias that `plot_event_study` consumes via the new `reference_period` attribute fallback in `_extract_plot_data`, but `SpilloverDiDResults` is NOT registered in `DiagnosticReport`'s `_APPLICABILITY` / `_PT_METHOD` tables — so `DiagnosticReport(spillover_result)` doesn't currently route to event-study diagnostics. Registering requires (a) deciding which diagnostics apply (parallel trends, pre-trends power, heterogeneity, design-effect) AND (b) adding an end-to-end test. | `diff_diff/diagnostic_report.py::_APPLICABILITY`, `_PT_METHOD` | follow-up (Wave C) | Low |
diff --git a/docs/index.rst b/docs/index.rst
index 5c500530..f4994db4 100644
--- a/docs/index.rst
+++ b/docs/index.rst
@@ -83,6 +83,7 @@ Quick Links
    tutorials/20_had_brand_campaign
    tutorials/21_had_pretest_workflow
    tutorials/22_had_survey_design
+   tutorials/23_spillover_tva
 
 .. toctree::
    :maxdepth: 1
diff --git a/docs/methodology/papers/butts-2021-review.md b/docs/methodology/papers/butts-2021-review.md
index 2850fff9..290531fd 100644
--- a/docs/methodology/papers/butts-2021-review.md
+++ b/docs/methodology/papers/butts-2021-review.md
@@ -254,7 +254,7 @@ where `D^k_{it} := D_i * 1{K_it = k}` and `K_it` is years since treatment turned
   - Agriculture employment: standard DiD = -5.1% per decade. With spillovers controlled, total effect = -7.4% per decade. **Spillover bias = +2.3 pp; canonical DiD UNDERESTIMATED the agricultural decline by ~40%.** Spillover-on-control coefficients are negative (-3.7%, -1.6%, -3.0%, -1.6% per decade across the 4 distance bins) — consistent with farm-worker out-migration to higher-paying TVA manufacturing jobs.
   - Manufacturing employment: standard DiD = +5.6% per decade. With spillovers controlled, total effect = +3.5% per decade. **Spillover bias = +2.1 pp; canonical DiD OVERESTIMATED the manufacturing gain by ~40%.** Spillover coefficients are negative (-2.0%, -2.5%, -3.3%, -3.0%), consistent with "urban shadow" effects whereby firms relocate INTO the TVA from neighboring areas.
 - Interpretation (page 21): "the long-run spillovers cause the original estimates to be about 40 percent too small for agriculture employment and 40 percent too large for manufacturing employment."
-- Useful design choice for diff-diff T22 tutorial DGP: a TVA-style bias-correction percentage of ~40% is large enough to be visible without being implausible.
+- Useful design choice for diff-diff T23 tutorial DGP: a TVA-style bias-correction percentage of ~40% is large enough to be visible without being implausible. (Tutorial slot 22 went to `22_had_survey_design.ipynb`; SpilloverDiD landed in slot 23 — `docs/tutorials/23_spillover_tva.ipynb`.)
 
 ### Empirical illustration — Opportunity Zones (Appendix B)
 - Application: revisits Chen, Glaeser, and Wessel (2021) on the 2017 federal Opportunity Zone program's effect on home prices.
diff --git a/docs/references.rst b/docs/references.rst
index 423587b8..fa54071d 100644
--- a/docs/references.rst
+++ b/docs/references.rst
@@ -188,7 +188,11 @@ Multi-Period and Staggered Adoption
 
 - **Butts, K. (2023).** "Difference-in-Differences with Spatial Spillovers." *arXiv:2105.03737v3* (originally posted 2021). https://arxiv.org/abs/2105.03737
 
-  Identifies the ring-indicator estimator implemented in our ``SpilloverDiD`` class. Section 2-3 covers non-staggered timing (Equations 5/6/8); Section 5 covers staggered timing via two-stage Gardner (Table 2). Section 3.1 (page 13) recommends Conley spatial-HAC for inference with cutoff = ``d_bar``.
+  Identifies the ring-indicator estimator implemented in our ``SpilloverDiD`` class. Section 2-3 covers non-staggered timing (Equations 5/6/8); Section 5 covers staggered timing via two-stage Gardner (Table 2). Section 3.1 (page 13) recommends Conley spatial-HAC for inference with cutoff = ``d_bar``. Section 4 Table 1 Panel A (TVA agriculture) is the empirical anchor for tutorial ``docs/tutorials/23_spillover_tva.ipynb`` (~40% understatement direction).
+
+- **Kline, P., & Moretti, E. (2014).** "Local Economic Development, Agglomeration Economies, and the Big Push: 100 Years of Evidence from the Tennessee Valley Authority." *The Quarterly Journal of Economics*, 129(1), 275-331. https://doi.org/10.1093/qje/qjt034
+
+  Original Tennessee Valley Authority application that Butts (2021) §4 revisits to illustrate spillover bias in the canonical place-based-intervention DiD. Cited as the motivating real-world example in ``docs/tutorials/23_spillover_tva.ipynb``.
 
 - **Conley, T. G. (1999).** "GMM Estimation with Cross Sectional Dependence." *Journal of Econometrics*, 92(1), 1-45. https://doi.org/10.1016/S0304-4076(98)00084-0
 
diff --git a/docs/tutorials/23_spillover_tva.ipynb b/docs/tutorials/23_spillover_tva.ipynb
new file mode 100644
index 00000000..bda2cbb5
--- /dev/null
+++ b/docs/tutorials/23_spillover_tva.ipynb
@@ -0,0 +1,583 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "a5f3d36b",
+   "metadata": {},
+   "source": [
+    "# Spillover-aware DiD with `SpilloverDiD` \u2014 a TVA-style worked example\n",
+    "\n",
+    "**Use this notebook when:** your treatment plausibly affects nearby\n",
+    "*untreated* units through geographic externalities, and your standard\n",
+    "DiD specification treats those neighbors as part of the control group.\n",
+    "\n",
+    "Place-based interventions (industrial policy, public-works investment,\n",
+    "vaccination campaigns, retail rollouts, technology subsidies) routinely\n",
+    "fit this pattern. The canonical reference is Kline & Moretti (2014)'s\n",
+    "study of the Tennessee Valley Authority (TVA): treated counties received\n",
+    "federal investment, but adjacent untreated counties absorbed positive\n",
+    "spillovers via labor flows, supply chains, and population movement.\n",
+    "Naive DiD that lumps adjacent counties into the control group then\n",
+    "*underestimates* the direct effect on treated counties, because the\n",
+    "controls are contaminated by spillover.\n",
+    "\n",
+    "Butts (2021) formalizes this with a single-stage regression that\n",
+    "explicitly models per-distance-band spillover effects on near-controls\n",
+    "while keeping far-controls as the clean comparison group:\n",
+    "\n",
+    "$$ y_{it} = \\mu_i + \\lambda_t + \\tau\\,D_{it} + \\sum_j \\delta_j\\,(1 - D_{it})\\,\\text{Ring}_{it,j} + \\varepsilon_{it} $$\n",
+    "\n",
+    "The diff-diff library's `SpilloverDiD` estimator (Butts 2021 + Gardner\n",
+    "2022 two-stage) implements this on panels with one or more spatial\n",
+    "rings, recovers the direct effect $\\tau$ and the spillover coefficients\n",
+    "$\\delta_j$, and supports Conley (1999) spatial-HAC standard errors out\n",
+    "of the box.\n",
+    "\n",
+    "This tutorial walks through the practitioner workflow:\n",
+    "\n",
+    "1. Build a synthetic TVA-style panel with known $\\tau$ and $\\delta_1$.\n",
+    "2. Fit a naive multi-period TWFE and observe the bias.\n",
+    "3. Choose a spillover bandwidth via theory + a sensitivity grid.\n",
+    "4. Fit `SpilloverDiD`; verify recovery of $\\tau$ and $\\delta_1$.\n",
+    "5. Add Conley spatial-HAC variance for robust inference.\n",
+    "\n",
+    "**References:** Butts (2021) \u2014 *arXiv:2105.03737*; Kline & Moretti\n",
+    "(2014) \u2014 *QJE* 129(1); Conley (1999) \u2014 *Journal of Econometrics* 92(1)."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a16e76a4",
+   "metadata": {},
+   "source": [
+    "## 2. The synthetic panel\n",
+    "\n",
+    "We build a 4-period, 200-unit panel laid out as three geographic bands:\n",
+    "\n",
+    "| Band         | Count | Location                                            | Treatment |\n",
+    "|--------------|-------|-----------------------------------------------------|-----------|\n",
+    "| Treated      | 25    | clustered around (0,0); max ~12 km from origin, cluster diameter ~22 km at seed 23 | Yes, from period 3 |\n",
+    "| Near-control | 120   | ~12-82 km north (0.1\u00b0-0.7\u00b0 latitude)                | No (but absorbs spillover) |\n",
+    "| Far-control  | 55    | ~224-331 km north (2.0\u00b0-3.0\u00b0 latitude)              | No (clean control) |\n",
+    "\n",
+    "True parameters: $\\tau_{\\text{total}} = -7.4$ (the direct effect on\n",
+    "treated, matching the Butts \u00a74 agriculture-employment magnitude),\n",
+    "$\\delta_1 = -4.5$ (the per-period spillover on near-controls), and a\n",
+    "locked random seed of `23`. The data-generating equation is\n",
+    "\n",
+    "$$ y_{it} = \\alpha_i + \\lambda_t + \\tau_{\\text{total}}\\,D_{it} + \\delta_1\\,(1 - D_{it})\\,\\text{Ring}_{it,1} + \\mathcal{N}(0, 0.5) $$\n",
+    "\n",
+    "We tune the band sizes and $\\delta_1$ so the naive TWFE coefficient\n",
+    "lands at ~58% of $\\tau_{\\text{total}}$ \u2014 i.e., the ~40% understatement\n",
+    "direction documented in Butts (2021) \u00a74 Table 1 Panel A."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1712f2bb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import warnings\n",
+    "\n",
+    "import numpy as np\n",
+    "import pandas as pd\n",
+    "\n",
+    "from diff_diff import MultiPeriodDiD, SpilloverDiD\n",
+    "\n",
+    "MAIN_SEED = 23\n",
+    "N_TREATED = 25\n",
+    "N_NEAR = 120\n",
+    "N_FAR = 55\n",
+    "T_PERIODS = 4\n",
+    "FIRST_TREAT = 3\n",
+    "TAU_TOTAL = -7.4\n",
+    "DELTA_1 = -4.5\n",
+    "D_BAR_KM = 100.0\n",
+    "NOISE_SD = 0.5\n",
+    "\n",
+    "\n",
+    "def build_t23_panel(seed: int = MAIN_SEED) -> pd.DataFrame:\n",
+    "    \"\"\"4-period TVA-style synthetic panel with known direct + spillover.\n",
+    "\n",
+    "    Layout: treated cluster + near-control band (within D_BAR_KM) + far-control\n",
+    "    band (clean). Outcomes: y_it = alpha_i + lambda_t + TAU_TOTAL * D_it +\n",
+    "    DELTA_1 * Ring1_it * (1 - D_it) + N(0, NOISE_SD).\n",
+    "    \"\"\"\n",
+    "    rng = np.random.default_rng(seed)\n",
+    "    n_units = N_TREATED + N_NEAR + N_FAR\n",
+    "    units = [f\"u{i:04d}\" for i in range(n_units)]\n",
+    "    alpha = rng.normal(0.0, 1.0, size=n_units)\n",
+    "    lambda_t = np.array([0.0, 0.5, 1.0, 1.5])[:T_PERIODS]\n",
+    "\n",
+    "    coords = np.empty((n_units, 2))\n",
+    "    is_treated_unit = np.zeros(n_units, dtype=bool)\n",
+    "    is_near_unit = np.zeros(n_units, dtype=bool)\n",
+    "    for i in range(N_TREATED):\n",
+    "        coords[i] = (rng.normal(0, 0.05), rng.normal(0, 0.05))\n",
+    "        is_treated_unit[i] = True\n",
+    "    for i in range(N_TREATED, N_TREATED + N_NEAR):\n",
+    "        coords[i] = (rng.uniform(0.1, 0.7), rng.uniform(-0.3, 0.3))\n",
+    "        is_near_unit[i] = True\n",
+    "    for i in range(N_TREATED + N_NEAR, n_units):\n",
+    "        coords[i] = (rng.uniform(2.0, 3.0), rng.uniform(-0.5, 0.5))\n",
+    "\n",
+    "    rows = []\n",
+    "    for i, u in enumerate(units):\n",
+    "        for t in range(1, T_PERIODS + 1):\n",
+    "            D_it = int(is_treated_unit[i] and t >= FIRST_TREAT)\n",
+    "            Ring1_it = int(is_near_unit[i] and t >= FIRST_TREAT)\n",
+    "            y = (\n",
+    "                alpha[i]\n",
+    "                + lambda_t[t - 1]\n",
+    "                + TAU_TOTAL * D_it\n",
+    "                + DELTA_1 * Ring1_it * (1 - D_it)\n",
+    "                + rng.normal(0, NOISE_SD)\n",
+    "            )\n",
+    "            rows.append(\n",
+    "                {\n",
+    "                    \"unit\": u,\n",
+    "                    \"time\": t,\n",
+    "                    \"lat\": coords[i, 0],\n",
+    "                    \"lon\": coords[i, 1],\n",
+    "                    \"ever_treated\": int(is_treated_unit[i]),\n",
+    "                    \"D\": D_it,\n",
+    "                    \"y\": y,\n",
+    "                }\n",
+    "            )\n",
+    "    return pd.DataFrame(rows)\n",
+    "\n",
+    "\n",
+    "df = build_t23_panel()\n",
+    "print(f\"Panel: {len(df)} rows, {df['unit'].nunique()} units, T={df['time'].nunique()}\")\n",
+    "df.head()\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "21aa12c0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Quick geographic check: treated cluster, near-control band, far-control band\n",
+    "band_counts = (\n",
+    "    df.drop_duplicates(\"unit\")\n",
+    "    .assign(\n",
+    "        band=lambda d: np.select(\n",
+    "            [d[\"ever_treated\"] == 1, d[\"lat\"] <= 1.0],\n",
+    "            [\"treated\", \"near\"],\n",
+    "            default=\"far\",\n",
+    "        )\n",
+    "    )\n",
+    "    .groupby(\"band\")\n",
+    "    .size()\n",
+    "    .reindex([\"treated\", \"near\", \"far\"])\n",
+    ")\n",
+    "print(band_counts)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8be4e7c9",
+   "metadata": {},
+   "source": [
+    "## 3. The naive headline \u2014 multi-period TWFE on the full sample\n",
+    "\n",
+    "A practitioner reaching for a default DiD specification fits a\n",
+    "multi-period TWFE on the full panel, treating ALL non-treated-cluster\n",
+    "units as controls \u2014 both the near-controls (which absorb spillover)\n",
+    "and the far-controls (which don't).\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0c563f94",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "naive = MultiPeriodDiD()\n",
+    "with warnings.catch_warnings():\n",
+    "    # absorb=['unit'] makes the unit-invariant 'ever_treated' indicator\n",
+    "    # perfectly collinear with unit FE; MultiPeriodDiD drops it (with a\n",
+    "    # UserWarning) and identifies the ATT through the ever_treated \u00d7 post\n",
+    "    # interaction columns. This is the expected TWFE specification.\n",
+    "    warnings.filterwarnings(\"ignore\", category=UserWarning, message=\"Rank-deficient design matrix\")\n",
+    "    naive_res = naive.fit(\n",
+    "        df,\n",
+    "        outcome=\"y\",\n",
+    "        treatment=\"ever_treated\",\n",
+    "        time=\"time\",\n",
+    "        post_periods=[3, 4],\n",
+    "        unit=\"unit\",\n",
+    "        absorb=[\"unit\"],\n",
+    "        reference_period=2,  # explicit pre-period; matches the current MPD default\n",
+    "    )\n",
+    "\n",
+    "print(f\"Naive TWFE ATT: {naive_res.att:.4f}  (true tau_total = {TAU_TOTAL})\")\n",
+    "print(f\"  SE:           {naive_res.se:.4f}\")\n",
+    "print(f\"  Ratio to true: {naive_res.att / TAU_TOTAL:.3f}\")\n",
+    "print(f\"  Understatement: {(1 - naive_res.att / TAU_TOTAL) * 100:.1f}%\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "7253fe4c",
+   "metadata": {},
+   "source": [
+    "The naive estimate is roughly **-4.29**, about 58% of the true\n",
+    "$\\tau_{\\text{total}} = -7.4$ \u2014 a **~42% understatement** that matches\n",
+    "the Butts (2021) \u00a74 agriculture direction. The mechanism is\n",
+    "straightforward: with $D_{i,t} = 1$ only for the treated cluster, the\n",
+    "near-control band contributes negative outcomes in the post-period (via\n",
+    "$\\delta_1 = -4.5$) into the *control* arm. The TWFE within-comparison\n",
+    "then sees a smaller pre/post gap on the treatment side relative to the\n",
+    "contaminated control side, and reports a deflated direct effect.\n",
+    "\n",
+    "This is exactly the bias `SpilloverDiD` is designed to remove."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "cb77d87a",
+   "metadata": {},
+   "source": [
+    "## 4. Choosing the spillover bandwidth\n",
+    "\n",
+    "`SpilloverDiD` is parameterized by `rings=[r_0, r_1, ..., r_K]` where\n",
+    "the $j$-th ring covers distance band $[r_{j-1}, r_j]$ km. The\n",
+    "outermost ring edge `max(rings)` defines the spillover horizon\n",
+    "(`d_bar`); units outside that horizon are treated as the clean control\n",
+    "group. For a single near-control band, the simplest spec is\n",
+    "`rings=[0.0, d_bar]` \u2014 one ring, one $\\delta_1$ coefficient.\n",
+    "\n",
+    "Choosing `d_bar` is a design decision driven by (i) theory about the\n",
+    "plausible spillover range and (ii) a sensitivity check that you have\n",
+    "not under- or over-stated it. Below we vary the outer edge across 50,\n",
+    "100, 150, 200 km and report the recovered $\\tau$ and $\\delta_1$ at\n",
+    "each.\n",
+    "\n",
+    "> **API note:** `SpilloverDiD` accepts an optional explicit `d_bar`\n",
+    "> kwarg, but if set it must equal `max(rings)`. We omit it and let the\n",
+    "> default (`d_bar = max(rings)`) apply."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4804334d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "sensitivity = []\n",
+    "for outer in (50.0, 100.0, 150.0, 200.0):\n",
+    "    est = SpilloverDiD(rings=[0.0, outer], conley_coords=(\"lat\", \"lon\"))\n",
+    "    with warnings.catch_warnings():\n",
+    "        # Narrow filter for the Apple Silicon M4 + numpy<2.3 Accelerate\n",
+    "        # BLAS RuntimeWarnings (\"divide by zero\" / \"overflow\" / \"invalid\"\n",
+    "        # in matmul) documented at TODO.md \"RuntimeWarnings in Linear\n",
+    "        # Algebra Operations\". These do NOT fire on M3 / Intel / Linux\n",
+    "        # or numpy>=2.3, so this filter is a no-op there. Values are\n",
+    "        # recovered correctly on every platform.\n",
+    "        warnings.filterwarnings(\n",
+    "            \"ignore\", category=RuntimeWarning, message=\".*encountered in matmul\"\n",
+    "        )\n",
+    "        res = est.fit(df, outcome=\"y\", unit=\"unit\", time=\"time\", treatment=\"D\")\n",
+    "    delta1 = float(res.spillover_effects.iloc[0][\"coef\"])\n",
+    "    sensitivity.append({\"d_bar_km\": outer, \"tau_total\": res.att, \"delta_1\": delta1})\n",
+    "\n",
+    "sens_df = pd.DataFrame(sensitivity)\n",
+    "print(sens_df.to_string(index=False, float_format=lambda x: f\"{x:>8.4f}\"))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "787428d1",
+   "metadata": {},
+   "source": [
+    "At `d_bar = 50` km the ring is too narrow: near-controls in the\n",
+    "50-78 km band ARE exposed in the DGP (they're within the true 100 km\n",
+    "spillover horizon and carry $\\delta_1 = -4.5$), but the ring spec\n",
+    "misclassifies them as far-away clean controls. Both estimates suffer\n",
+    "from the misspecification \u2014 $\\tau$ deflates to ~-5.4 because the\n",
+    "\"clean control\" arm now contains genuinely-affected units, and the\n",
+    "spillover coefficient $\\delta_1$ attenuates to ~-2.6 because the\n",
+    "$S = 1$ ring averages 50-78 km exposure into the cleaner\n",
+    "$S = 0$ comparison. This is the registry-documented failure mode for\n",
+    "undershooting `d_bar`. (The exact values are quoted to one decimal\n",
+    "here rather than two because the `d_bar = 50` design is borderline-\n",
+    "rank-deficient \u2014 the two-decimal value can shift slightly across BLAS\n",
+    "paths even at the same locked seed.)\n",
+    "\n",
+    "At `d_bar = 100` km the ring covers the entire DGP near-control band\n",
+    "(0.1\u00b0-0.7\u00b0 latitude \u2248 11-78 km). $\\tau$ recovers to -7.34 and\n",
+    "$\\delta_1$ to -4.53 \u2014 both within 1% of the truth.\n",
+    "\n",
+    "At `d_bar = 150` km and `d_bar = 200` km, the estimates are *identical*\n",
+    "to `d_bar = 100`. This is by DGP design: there are no units in the\n",
+    "80-200 km band, so widening the ring adds zero observations and\n",
+    "zero new information. Once `d_bar` covers the true spillover horizon,\n",
+    "further widening is benign on this panel.\n",
+    "\n",
+    "**Verdict:** `d_bar = 100` km is the right choice here, balancing\n",
+    "\"capture all spillover\" against \"don't sweep clean controls into the\n",
+    "spillover bin\". On real data, a similar sensitivity grid plus paper /\n",
+    "theory guidance is the practitioner workflow Butts (2021) \u00a73.1\n",
+    "recommends."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "ce1cb17e",
+   "metadata": {},
+   "source": [
+    "## 5. Fit `SpilloverDiD` and interpret\n",
+    "\n",
+    "With `d_bar = 100`, the headline fit recovers both the direct effect\n",
+    "and the spillover. We compare side-by-side with the naive TWFE from \u00a73.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "4558bcfe",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "est = SpilloverDiD(rings=[0.0, D_BAR_KM], conley_coords=(\"lat\", \"lon\"))\n",
+    "with warnings.catch_warnings():\n",
+    "    # See \u00a74 for the rationale on this narrow matmul-RuntimeWarning filter.\n",
+    "    warnings.filterwarnings(\n",
+    "        \"ignore\", category=RuntimeWarning, message=\".*encountered in matmul\"\n",
+    "    )\n",
+    "    spill = est.fit(df, outcome=\"y\", unit=\"unit\", time=\"time\", treatment=\"D\")\n",
+    "\n",
+    "delta_1_row = spill.spillover_effects.iloc[0]\n",
+    "\n",
+    "comparison = pd.DataFrame(\n",
+    "    {\n",
+    "        \"estimate\": [naive_res.att, spill.att, delta_1_row[\"coef\"]],\n",
+    "        \"se\": [naive_res.se, spill.se, delta_1_row[\"se\"]],\n",
+    "        \"true value\": [TAU_TOTAL, TAU_TOTAL, DELTA_1],\n",
+    "    },\n",
+    "    index=[\"naive TWFE (direct, biased)\", \"SpilloverDiD tau_total\", \"SpilloverDiD delta_1\"],\n",
+    ")\n",
+    "print(comparison.to_string(float_format=lambda x: f\"{x:>8.4f}\"))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e28652ea",
+   "metadata": {},
+   "source": [
+    "With the spillover term in the regression, `SpilloverDiD` cleanly\n",
+    "separates the direct effect (-7.34, very close to true -7.4) from the\n",
+    "spillover on near-controls (-4.53, very close to true -4.5). The\n",
+    "spillover-aware estimator removes the bias the naive TWFE suffered.\n",
+    "\n",
+    "The interpretation is straightforward: every period in the post window,\n",
+    "treated units lose 7.4 units of $y$ on average from the policy, AND\n",
+    "every near-control unit (within 100 km of any treated unit) loses 4.5\n",
+    "units of $y$ from the spillover. A standard DiD that ignored the\n",
+    "spillover would report only the contaminated -4.3 and miss the\n",
+    "spillover-on-controls entirely."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2e005043",
+   "metadata": {},
+   "source": [
+    "## 6. Robust inference with Conley spatial-HAC\n",
+    "\n",
+    "The HC1 standard error in \u00a75 treats each observation's score as\n",
+    "independent of every other. On spatially-correlated panels \u2014 and this\n",
+    "is the canonical case for place-based interventions \u2014 that\n",
+    "independence assumption is violated. Conley (1999) provides a\n",
+    "spatial-HAC variance estimator that downweights pairs by their\n",
+    "geographic distance through a Bartlett kernel out to a `conley_cutoff_km`\n",
+    "bandwidth. Newey-West (1987) provides the analogous serial term via\n",
+    "`conley_lag_cutoff`.\n",
+    "\n",
+    "Per Butts (2021) \u00a73.1, the Conley bandwidth is typically chosen to\n",
+    "match the spillover horizon `d_bar`. Below we compare the \u00a75 HC1 SE\n",
+    "against Conley with `conley_cutoff_km = d_bar = 100`, both with\n",
+    "`conley_lag_cutoff = 0` (spatial only) and `conley_lag_cutoff = 1` (add\n",
+    "within-unit serial correlation across consecutive periods).\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "25d20f8c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def fit_with(vcov_type, lag=None):\n",
+    "    kwargs = dict(rings=[0.0, D_BAR_KM], conley_coords=(\"lat\", \"lon\"), vcov_type=vcov_type)\n",
+    "    if vcov_type == \"conley\":\n",
+    "        kwargs[\"conley_cutoff_km\"] = D_BAR_KM\n",
+    "        kwargs[\"conley_lag_cutoff\"] = lag\n",
+    "    est = SpilloverDiD(**kwargs)\n",
+    "    with warnings.catch_warnings():\n",
+    "        # See \u00a74 for the rationale on this narrow matmul-RuntimeWarning filter.\n",
+    "        warnings.filterwarnings(\n",
+    "            \"ignore\", category=RuntimeWarning, message=\".*encountered in matmul\"\n",
+    "        )\n",
+    "        return est.fit(df, outcome=\"y\", unit=\"unit\", time=\"time\", treatment=\"D\")\n",
+    "\n",
+    "\n",
+    "res_hc1 = fit_with(\"hc1\")\n",
+    "res_c0 = fit_with(\"conley\", lag=0)\n",
+    "res_c1 = fit_with(\"conley\", lag=1)\n",
+    "\n",
+    "se_table = pd.DataFrame(\n",
+    "    {\n",
+    "        \"tau_total\": [res_hc1.att, res_c0.att, res_c1.att],\n",
+    "        \"SE\": [res_hc1.se, res_c0.se, res_c1.se],\n",
+    "    },\n",
+    "    index=[\"HC1\", \"Conley (lag=0, spatial only)\", \"Conley (lag=1, spatial + serial)\"],\n",
+    ")\n",
+    "print(se_table.to_string(float_format=lambda x: f\"{x:>8.4f}\"))\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e6d5ee1b",
+   "metadata": {},
+   "source": [
+    "Point estimates are identical across all three rows \u2014 the variance\n",
+    "choice doesn't move the coefficients. But the standard errors differ:\n",
+    "on this DGP, the Conley spatial-HAC SE comes in *lower* than HC1.\n",
+    "\n",
+    "The mechanism is the pairwise covariance structure of the\n",
+    "influence-function score matrix. With `conley_cutoff_km = 100`, the\n",
+    "spatial term in the no-survey Conley meat sums Bartlett-weighted\n",
+    "score cross-products over every within-period observation pair whose\n",
+    "units lie within 100 km **of each other** (support is pairwise, not\n",
+    "relative to the treated cluster \u2014 no survey-design PSU aggregation\n",
+    "here, the no-survey path operates on the observation-level Wave D\n",
+    "Gardner GMM-corrected influence rows $\\psi_i$). On this DGP that\n",
+    "means non-zero contribution from:\n",
+    "\n",
+    "- treated \u00d7 treated pairs (cluster diameter ~22 km at seed 23),\n",
+    "- treated \u00d7 near-control pairs (max pairwise separation ~90 km),\n",
+    "- near \u00d7 near pairs (100% of within-band pairs are within 100 km), and\n",
+    "- far \u00d7 far pairs **within** the far-control band (max within-band\n",
+    "  pairwise distance is ~131 km, but ~95% of within-band pair distances\n",
+    "  are within 100 km of each other \u2014 median far/far pairwise distance\n",
+    "  is ~56 km).\n",
+    "\n",
+    "Cross-band pairs at long distance \u2014 treated \u00d7 far and near \u00d7 far \u2014\n",
+    "sit outside the 100 km support and contribute zero kernel weight.\n",
+    "The Conley meat is the sum of the observation-level diagonal\n",
+    "self-product terms $\\psi_i \\psi_i'$ PLUS the Bartlett-weighted\n",
+    "off-diagonal score cross-products $K(d_{ij}/h)\\,\\psi_i \\psi_j'$\n",
+    "from all the within-support pair classes enumerated above; the ATT\n",
+    "variance is then the corresponding sandwich entry. (Note: the\n",
+    "no-survey Conley path does NOT apply an HC1 `n/(n-p)` finite-sample\n",
+    "multiplier \u2014 that lives on the HC1 path only. Whether the resulting\n",
+    "sandwich ends up above or below HC1 depends on both the missing\n",
+    "multiplier AND the net sign and magnitude of the weighted\n",
+    "off-diagonal cross-products under the specific DGP and design.) On\n",
+    "this seed they net out to a smaller ATT variance, so Conley < HC1,\n",
+    "but the direction is a feature of the data and the sign mechanism\n",
+    "is not easily attributable to any one pair class. Adding the serial\n",
+    "term (`lag = 1`) further reshapes the SE through a separate\n",
+    "within-unit cross-period score covariance sum.\n",
+    "\n",
+    "The takeaway: Conley vcov is a *correction*. It can move the SE in\n",
+    "either direction relative to HC1 depending on the residual covariance\n",
+    "structure within the kernel's support, but it's the methodologically\n",
+    "correct choice when scores are spatially or serially correlated. On\n",
+    "real applications the correction often inflates the SE (and the\n",
+    "t-stat shrinks), but the direction is a feature of the data, not a\n",
+    "guarantee. See the `ConleySpatialHAC` and `SpilloverDiD` sections of\n",
+    "`docs/methodology/REGISTRY.md` for the full variance formula."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "5cc89608",
+   "metadata": {},
+   "source": [
+    "## 7. Practitioner takeaways and where to go next\n",
+    "\n",
+    "**When to reach for `SpilloverDiD`.** Whenever your treatment has a\n",
+    "plausible geographic externality \u2014 labor-market spillovers,\n",
+    "agglomeration effects, disease transmission, supply-chain ripples,\n",
+    "peer effects in retail or technology adoption \u2014 a naive DiD that\n",
+    "treats nearby units as controls will give you a biased estimate of the\n",
+    "direct effect. The magnitude of the bias is roughly\n",
+    "$(\\text{near share}) \\times \\delta_1$ in absolute value (here:\n",
+    "$(120/175) \\times 4.5 \\approx 3.1$, matching the observed\n",
+    "-4.3 vs -7.4 gap).\n",
+    "\n",
+    "**How to choose `d_bar`.** Start from theory: what's the plausible\n",
+    "spatial range of the spillover mechanism? Triangulate with a\n",
+    "sensitivity grid (\u00a74 above). The estimate stabilizes once `d_bar`\n",
+    "covers the true horizon; further widening is benign unless it sweeps\n",
+    "clean controls into the spillover bin.\n",
+    "\n",
+    "**When Conley matters.** Whenever your residuals have spatial or\n",
+    "serial correlation. On clustered-treatment designs (treatment radius\n",
+    "$\\ll$ panel extent) the spatial-HAC adjustment can move the SE in\n",
+    "either direction relative to HC1; the direction is a feature of the\n",
+    "DGP. Per Butts (2021) \u00a73.1, set `conley_cutoff_km = d_bar` as the\n",
+    "default and run a sensitivity grid on the cutoff if the SE matters\n",
+    "for your inference.\n",
+    "\n",
+    "**Extensions not covered here.**\n",
+    "- *Multiple rings* \u2014 pass `rings=[0, 50, 100]` for separate near-vs-mid\n",
+    "  spillover coefficients $\\delta_1, \\delta_2$ when theory or\n",
+    "  diagnostics suggest distance-graded externalities.\n",
+    "- *Staggered treatment* \u2014 `SpilloverDiD` automatically handles\n",
+    "  unit-specific treatment onsets via the `first_treat` column.\n",
+    "- *Event study* \u2014 pass `event_study=True` and `horizon_max` for\n",
+    "  per-event-time direct and spillover effects (Wave C; see\n",
+    "  `tutorials/12_two_stage_did.ipynb` for the TwoStageDiD event-study\n",
+    "  pattern this mirrors).\n",
+    "- *Survey weights* \u2014 `survey_design=SurveyDesign(...)` supports\n",
+    "  H\u00e1jek-weighted point estimates with Binder TSL variance. Combined\n",
+    "  with `vcov_type=\"conley\"` and `conley_lag_cutoff=0` (cross-\n",
+    "  sectional spatial only) this is the Wave E.2 stratified-Conley\n",
+    "  sandwich on PSU totals. The panel-block extension\n",
+    "  `conley_lag_cutoff > 0` (spatial + within-PSU serial Bartlett HAC)\n",
+    "  is the Wave E.2 follow-up library synthesis, and it requires an\n",
+    "  effective PSU on the survey design (either explicit\n",
+    "  `survey_design.psu=` or `cluster=<col>` injected per Wave E.1's\n",
+    "  warn-and-use-PSU pattern). A dedicated `SpilloverDiD`\n",
+    "  survey-design tutorial is on the roadmap.\n",
+    "\n",
+    "**References:**\n",
+    "- Butts, K. (2021). *Difference-in-Differences with Spatial\n",
+    "  Spillovers*. arXiv:2105.03737.\n",
+    "- Kline, P., & Moretti, E. (2014). *Local Economic Development,\n",
+    "  Agglomeration Economies, and the Big Push: 100 Years of Evidence\n",
+    "  from the Tennessee Valley Authority*. Quarterly Journal of\n",
+    "  Economics 129(1), 275-331.\n",
+    "- Conley, T. G. (1999). *GMM Estimation with Cross-Sectional\n",
+    "  Dependence*. Journal of Econometrics 92(1), 1-45.\n",
+    "- Newey, W. K., & West, K. D. (1987). *A Simple, Positive\n",
+    "  Semi-definite, Heteroskedasticity and Autocorrelation Consistent\n",
+    "  Covariance Matrix*. Econometrica 55(3), 703-708.\n",
+    "- Gerber, I. (2026). *Design-Based Standard Errors for Two-Stage\n",
+    "  Difference-in-Differences*. arXiv:2605.04124."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/tutorials/README.md b/docs/tutorials/README.md
index 73f965c9..5c900b7a 100644
--- a/docs/tutorials/README.md
+++ b/docs/tutorials/README.md
@@ -119,6 +119,14 @@ End-to-end HAD walkthrough on a BRFSS-shape stratified survey design (5 strata x
 - `did_had_pretest_workflow` on both overall and event-study paths under `survey_design=`, walking the Phase 4.5 C0 QUG-deferred verdict suffix and the stratified-clustered Stute multiplier bootstrap
 - Companion drift-test file (`tests/test_t22_had_survey_design_drift.py`)
 
+### 23. SpilloverDiD on a TVA-style Spillover Panel (`23_spillover_tva.ipynb`)
+Practitioner workflow for `SpilloverDiD` (Butts 2021 ring-indicator estimator + Gardner 2022 two-stage residualize-then-fit) on a synthetic TVA-style panel (4 periods, 200 units = 25 treated + 120 near-control + 55 far-control) reproducing the Butts §4 Table 1 Panel A ~40% understatement direction:
+- When place-based interventions cause geographic spillovers, naive multi-period TWFE on the full sample understates the direct effect because near-controls absorb the spillover (here: ATT recovers as -4.29 vs true `tau_total = -7.4`, a 42% understatement)
+- `SpilloverDiD(rings=[0.0, 100.0], conley_coords=("lat", "lon"))` cleanly recovers both `tau_total` (-7.34) and the near-band spillover coefficient `delta_1` (-4.53)
+- Choosing the spillover bandwidth via a `rings` sensitivity grid at outer edges 50 / 100 / 150 / 200 km, with the documented "undershooting `d_bar`" failure mode at 50 km
+- Conley spatial-HAC variance under `vcov_type="conley", conley_cutoff_km=100, conley_lag_cutoff in {0, 1}` — the cutoff = `d_bar` choice follows Butts §3.1, while the `conley_lag_cutoff` serial extension is the library's documented Wave E.2 follow-up synthesis with Newey-West-style serial Bartlett HAC (per REGISTRY "Variance (Wave E.2 follow-up)")
+- Companion drift-test file (`tests/test_t23_spillover_tva_drift.py`)
+
 ## Running the Notebooks
 
 1. Install diff-diff with dependencies:
diff --git a/tests/test_t23_spillover_tva_drift.py b/tests/test_t23_spillover_tva_drift.py
new file mode 100644
index 00000000..153df6ba
--- /dev/null
+++ b/tests/test_t23_spillover_tva_drift.py
@@ -0,0 +1,684 @@
+"""Drift detection for Tutorial 23 (``docs/tutorials/23_spillover_tva.ipynb``).
+
+The tutorial narrative quotes seed-specific numbers (naive vs.
+SpilloverDiD comparison, sensitivity-grid endpoints, HC1 vs. Conley
+SE). If library numerics drift (estimator changes, RNG path changes,
+BLAS path changes), the prose can go stale silently while
+``pytest --nbmake`` still passes — it only checks that the cells
+execute without error.
+
+These asserts re-derive the same numbers using the locked T23 DGP
+duplicated below (verbatim from the notebook §2 code cell), then check
+them against the values quoted in the tutorial markdown. If a future
+change moves any number outside its tolerance band, this test fails
+and a maintainer is forced to either update the prose or investigate
+the methodology shift before merge.
+
+T23 is the first SpilloverDiD tutorial. It demonstrates the
+``SpilloverDiD`` estimator on a TVA-style synthetic panel reproducing
+the Butts (2021) §4 Table 1 Panel A ~40% understatement direction
+(naive multi-period TWFE significantly understates the direct effect
+when near-controls absorb spillover). The DGP-builder constants below
+MUST stay in sync with the corresponding constants in the notebook §2
+code cell; the ``test_dgp_true_parameters_match_quoted`` test catches
+silent drift on those values.
+"""
+
+from __future__ import annotations
+
+import warnings
+
+import numpy as np
+import pandas as pd
+import pytest
+
+from diff_diff import MultiPeriodDiD, SpilloverDiD
+
+# Locked DGP parameters — must stay in sync with the notebook §2 cell.
+MAIN_SEED = 23
+N_TREATED = 25
+N_NEAR = 120
+N_FAR = 55
+T_PERIODS = 4
+FIRST_TREAT = 3
+TAU_TOTAL = -7.4
+DELTA_1 = -4.5
+D_BAR_KM = 100.0
+NOISE_SD = 0.5
+
+
+def _build_t23_panel(seed: int = MAIN_SEED) -> pd.DataFrame:
+    """Duplicated verbatim from the notebook §2 code cell. Keep in sync."""
+    rng = np.random.default_rng(seed)
+    n_units = N_TREATED + N_NEAR + N_FAR
+    units = [f"u{i:04d}" for i in range(n_units)]
+    alpha = rng.normal(0.0, 1.0, size=n_units)
+    lambda_t = np.array([0.0, 0.5, 1.0, 1.5])[:T_PERIODS]
+
+    coords = np.empty((n_units, 2))
+    is_treated_unit = np.zeros(n_units, dtype=bool)
+    is_near_unit = np.zeros(n_units, dtype=bool)
+    for i in range(N_TREATED):
+        coords[i] = (rng.normal(0, 0.05), rng.normal(0, 0.05))
+        is_treated_unit[i] = True
+    for i in range(N_TREATED, N_TREATED + N_NEAR):
+        coords[i] = (rng.uniform(0.1, 0.7), rng.uniform(-0.3, 0.3))
+        is_near_unit[i] = True
+    for i in range(N_TREATED + N_NEAR, n_units):
+        coords[i] = (rng.uniform(2.0, 3.0), rng.uniform(-0.5, 0.5))
+
+    rows = []
+    for i, u in enumerate(units):
+        for t in range(1, T_PERIODS + 1):
+            D_it = int(is_treated_unit[i] and t >= FIRST_TREAT)
+            Ring1_it = int(is_near_unit[i] and t >= FIRST_TREAT)
+            y = (
+                alpha[i]
+                + lambda_t[t - 1]
+                + TAU_TOTAL * D_it
+                + DELTA_1 * Ring1_it * (1 - D_it)
+                + rng.normal(0, NOISE_SD)
+            )
+            rows.append(
+                {
+                    "unit": u,
+                    "time": t,
+                    "lat": coords[i, 0],
+                    "lon": coords[i, 1],
+                    "ever_treated": int(is_treated_unit[i]),
+                    "D": D_it,
+                    "y": y,
+                }
+            )
+    return pd.DataFrame(rows)
+
+
+@pytest.fixture(scope="module")
+def panel() -> pd.DataFrame:
+    return _build_t23_panel()
+
+
+@pytest.fixture(scope="module")
+def naive_fit(panel):
+    est = MultiPeriodDiD()
+    with warnings.catch_warnings():
+        # absorb=['unit'] makes the unit-invariant 'ever_treated' indicator
+        # perfectly collinear with the unit FE; MultiPeriodDiD drops it
+        # (with a UserWarning) and identifies the ATT through the
+        # ever_treated x post interaction columns. This is the expected
+        # TWFE specification; the rank-deficient drop is benign.
+        warnings.filterwarnings(
+            "ignore", category=UserWarning, message="Rank-deficient design matrix"
+        )
+        return est.fit(
+            panel,
+            outcome="y",
+            treatment="ever_treated",
+            time="time",
+            post_periods=[3, 4],
+            unit="unit",
+            absorb=["unit"],
+            reference_period=2,  # explicit pre-period; matches the current MPD default
+        )
+
+
+def _silence_spillover_matmul_warnings():
+    """Apply the notebook's narrow ``.*encountered in matmul``
+    ``RuntimeWarning`` filter. The three matmul warnings ("divide by
+    zero" / "overflow" / "invalid value") are an Apple Silicon M4 +
+    macOS Sequoia + numpy<2.3 Accelerate BLAS artifact documented at
+    ``TODO.md`` under "RuntimeWarnings in Linear Algebra Operations"
+    (root cause: Apple BLAS SME kernels corrupt the FP status register;
+    tracked as numpy#28687, fixed in numpy>=2.3). They DO NOT fire on
+    M3 / Intel / Linux or numpy>=2.3 — so this filter is a no-op there,
+    and any platform-specific noise it does silence does not affect
+    result correctness.
+
+    The post-filter warning surface (zero remaining warnings on the
+    T23 DGP) is pinned by ``test_spillover_fit_warning_policy_post_filter_clean``
+    and ``test_spillover_conley_fit_warning_policy_post_filter_clean``.
+    A new RuntimeWarning with a different message, or any UserWarning /
+    FutureWarning, fails those tests."""
+    warnings.filterwarnings("ignore", category=RuntimeWarning, message=r".*encountered in matmul")
+
+
+@pytest.fixture(scope="module")
+def spillover_fit(panel):
+    est = SpilloverDiD(rings=[0.0, D_BAR_KM], conley_coords=("lat", "lon"))
+    with warnings.catch_warnings():
+        _silence_spillover_matmul_warnings()
+        return est.fit(panel, outcome="y", unit="unit", time="time", treatment="D")
+
+
+@pytest.fixture(scope="module")
+def spillover_conley_lag0_fit(panel):
+    est = SpilloverDiD(
+        rings=[0.0, D_BAR_KM],
+        conley_coords=("lat", "lon"),
+        vcov_type="conley",
+        conley_cutoff_km=D_BAR_KM,
+        conley_lag_cutoff=0,
+    )
+    with warnings.catch_warnings():
+        _silence_spillover_matmul_warnings()
+        return est.fit(panel, outcome="y", unit="unit", time="time", treatment="D")
+
+
+@pytest.fixture(scope="module")
+def spillover_conley_lag1_fit(panel):
+    est = SpilloverDiD(
+        rings=[0.0, D_BAR_KM],
+        conley_coords=("lat", "lon"),
+        vcov_type="conley",
+        conley_cutoff_km=D_BAR_KM,
+        conley_lag_cutoff=1,
+    )
+    with warnings.catch_warnings():
+        _silence_spillover_matmul_warnings()
+        return est.fit(panel, outcome="y", unit="unit", time="time", treatment="D")
+
+
+# ----------------------------------------------------------------------------
+# Tests
+# ----------------------------------------------------------------------------
+
+
+def test_panel_composition(panel):
+    """800 rows = 200 units × 4 periods; 25 treated, 120 near, 55 far."""
+    assert len(panel) == (N_TREATED + N_NEAR + N_FAR) * T_PERIODS == 800
+    assert panel["unit"].nunique() == N_TREATED + N_NEAR + N_FAR == 200
+    assert panel["time"].nunique() == T_PERIODS == 4
+    treated_units = panel.loc[panel["ever_treated"] == 1, "unit"].nunique()
+    assert treated_units == N_TREATED == 25
+
+
+def test_panel_geographic_bands(panel):
+    """Treated cluster within ~10 km of origin (5 sigma * 2 lat/lon),
+    near-controls in [0.1, 0.7] lat degrees, far-controls in [2.0, 3.0]."""
+    units = panel.drop_duplicates("unit")
+    treated = units[units["ever_treated"] == 1]
+    untreated = units[units["ever_treated"] == 0]
+    near = untreated[untreated["lat"] <= 1.0]
+    far = untreated[untreated["lat"] > 1.0]
+
+    assert len(near) == N_NEAR == 120
+    assert len(far) == N_FAR == 55
+
+    # Treated cluster geometry
+    assert treated["lat"].abs().max() < 0.3
+    assert treated["lon"].abs().max() < 0.3
+    # Near-control band geometry
+    assert near["lat"].min() >= 0.1
+    assert near["lat"].max() <= 0.7
+    # Far-control band geometry
+    assert far["lat"].min() >= 2.0
+    assert far["lat"].max() <= 3.0
+
+
+def test_dgp_true_parameters_match_quoted():
+    """True parameters quoted in the tutorial narrative (§2)."""
+    assert TAU_TOTAL == -7.4
+    assert DELTA_1 == -4.5
+    assert D_BAR_KM == 100.0
+    assert NOISE_SD == 0.5
+    assert MAIN_SEED == 23
+
+
+def test_estimator_construction_matches_quoted():
+    """The §5 fit instantiation parameters must match the docstring narrative."""
+    est = SpilloverDiD(rings=[0.0, D_BAR_KM], conley_coords=("lat", "lon"))
+    params = est.get_params()
+    assert params["rings"] == [0.0, 100.0]
+    assert params["d_bar"] is None  # auto-default to max(rings)
+    assert params["conley_coords"] == ("lat", "lon")
+    assert params["vcov_type"] == "hc1"
+
+
+def test_naive_twfe_understates_tau_total(naive_fit):
+    """§3 quoted: naive ATT ≈ -4.29, ~58% of true tau_total (~42% understatement)."""
+    ratio = naive_fit.att / TAU_TOTAL
+    assert 0.55 <= ratio <= 0.62, f"naive ratio={ratio:.3f} outside [0.55, 0.62] band"
+
+
+def test_naive_att_endpoint_matches_quoted(naive_fit):
+    """§3 quoted endpoint: 2-decimal pin matching the published `-4.29`
+    in the notebook, README, and CHANGELOG. The well-conditioned naive
+    MultiPeriodDiD fit is stable across BLAS paths to better than 0.005,
+    so 2-decimal pinning is safe (in contrast to the borderline-rank-
+    deficient `rings=[0,50]` sensitivity point, which we keep at
+    round-to-1)."""
+    assert round(naive_fit.att, 2) == -4.29
+
+
+def test_spillover_did_recovers_tau_total(spillover_fit):
+    """§5 quoted: SpilloverDiD tau_total = -7.34, recovers true -7.4
+    (within 0.5 tolerance bound). Endpoint pinned to 2 decimals
+    matching the published `-7.34` in the notebook, README, and
+    CHANGELOG — well-conditioned fit, BLAS-stable at 2 decimals."""
+    assert abs(spillover_fit.att - TAU_TOTAL) < 0.5
+    assert round(spillover_fit.att, 2) == -7.34
+
+
+def test_spillover_did_recovers_delta_1(spillover_fit):
+    """§5 quoted: SpilloverDiD delta_1 = -4.53, recovers true -4.5
+    (within 0.5 tolerance bound). Endpoint pinned to 2 decimals
+    matching the published `-4.53` in the notebook, README, and
+    CHANGELOG — well-conditioned fit, BLAS-stable at 2 decimals."""
+    delta_1 = float(spillover_fit.spillover_effects.iloc[0]["coef"])
+    assert abs(delta_1 - DELTA_1) < 0.5
+    assert round(delta_1, 2) == -4.53
+
+
+def test_rings_sensitivity_grid_endpoints(panel):
+    """§4 quoted: d_bar=50 → tau=-5.4, others (100/150/200) → tau=-7.3.
+
+    Per the plan and reviewer guidance, round-to-1 tolerance is safer
+    than round-to-2 against BLAS divergence on the borderline-rank-deficient
+    smallest grid point.
+    """
+    expected_tau = {50.0: -5.4, 100.0: -7.3, 150.0: -7.3, 200.0: -7.3}
+    expected_delta = {50.0: -2.6, 100.0: -4.5, 150.0: -4.5, 200.0: -4.5}
+    for outer in (50.0, 100.0, 150.0, 200.0):
+        est = SpilloverDiD(rings=[0.0, outer], conley_coords=("lat", "lon"))
+        with warnings.catch_warnings():
+            _silence_spillover_matmul_warnings()
+            res = est.fit(panel, outcome="y", unit="unit", time="time", treatment="D")
+        assert res.spillover_effects is not None
+        delta_1 = float(res.spillover_effects.iloc[0]["coef"])
+        assert round(res.att, 1) == expected_tau[outer], (
+            f"d_bar={outer}: tau={res.att:.4f} (rounded {round(res.att, 1)}) "
+            f"vs expected {expected_tau[outer]}"
+        )
+        assert round(delta_1, 1) == expected_delta[outer], (
+            f"d_bar={outer}: delta_1={delta_1:.4f} (rounded {round(delta_1, 1)}) "
+            f"vs expected {expected_delta[outer]}"
+        )
+
+
+def test_rings_grid_d_bar_100_to_200_identical_delta_1(panel):
+    """§4 narrative claim covers BOTH coefficients: `d_bar in {100, 150,
+    200}` produces identical tau_total AND delta_1 (the test of
+    tau_total identity lives in `test_rings_grid_d_bar_100_to_200_identical`;
+    this companion test pins the delta_1 identity so a future drift
+    that affects only the spillover coefficient can't leave the
+    'identical' notebook claim stale)."""
+    deltas = []
+    for outer in (100.0, 150.0, 200.0):
+        est = SpilloverDiD(rings=[0.0, outer], conley_coords=("lat", "lon"))
+        with warnings.catch_warnings():
+            _silence_spillover_matmul_warnings()
+            res = est.fit(panel, outcome="y", unit="unit", time="time", treatment="D")
+        assert res.spillover_effects is not None
+        deltas.append(float(res.spillover_effects.iloc[0]["coef"]))
+    np.testing.assert_allclose(deltas, deltas[0] * np.ones(3), atol=1e-10)
+
+
+def test_rings_grid_d_bar_100_to_200_identical(panel):
+    """§4 narrative claim: once d_bar covers the true spillover horizon
+    (which here ends at ~78 km), widening past 100 km adds zero
+    observations to the ring and the estimates are identical."""
+    results = []
+    for outer in (100.0, 150.0, 200.0):
+        est = SpilloverDiD(rings=[0.0, outer], conley_coords=("lat", "lon"))
+        with warnings.catch_warnings():
+            _silence_spillover_matmul_warnings()
+            res = est.fit(panel, outcome="y", unit="unit", time="time", treatment="D")
+        results.append(res.att)
+    np.testing.assert_allclose(results, results[0] * np.ones(3), atol=1e-10)
+
+
+def test_conley_se_differs_from_hc1(spillover_fit, spillover_conley_lag0_fit):
+    """§6 sanity: Conley vcov produces a different SE than HC1 by more than
+    floating-point noise. Pairs with `test_conley_se_less_than_hc1` which
+    pins the direction of the difference for this specific DGP."""
+    assert abs(spillover_conley_lag0_fit.se - spillover_fit.se) > 1e-6
+
+
+def test_conley_se_less_than_hc1(spillover_fit, spillover_conley_lag0_fit):
+    """§6 prose claim: 'on this DGP, the Conley spatial-HAC SE comes in
+    *lower* than HC1'. Pin the direction so the narrative doesn't go
+    stale if a future library change flips the sign of the per-pair
+    score covariance and reverses the inequality."""
+    assert spillover_conley_lag0_fit.se < spillover_fit.se
+
+
+def test_conley_se_point_estimates_invariant(
+    spillover_fit, spillover_conley_lag0_fit, spillover_conley_lag1_fit
+):
+    """§6 narrative claim: variance-type choice doesn't move the point
+    estimates. tau_total is bit-identical across HC1 / Conley lag=0 /
+    Conley lag=1 (all paths use the same OLS solve; only the meat
+    differs)."""
+    np.testing.assert_allclose(
+        [spillover_conley_lag0_fit.att, spillover_conley_lag1_fit.att],
+        spillover_fit.att,
+        atol=1e-10,
+    )
+
+
+def test_conley_lag_cutoff_changes_se_vs_lag_zero(
+    spillover_conley_lag0_fit, spillover_conley_lag1_fit
+):
+    """§6 sanity: adding the serial term (lag=1) changes the SE relative
+    to spatial-only (lag=0). Direction-agnostic — on this DGP it
+    shrinks, on others it can grow."""
+    assert abs(spillover_conley_lag1_fit.se - spillover_conley_lag0_fit.se) > 1e-6
+
+
+def test_summary_renders_without_warning(spillover_fit):
+    """§5 smoke: the summary() call runs clean on the headline fit."""
+    with warnings.catch_warnings():
+        warnings.simplefilter("error")
+        out = spillover_fit.summary()
+    assert isinstance(out, str)
+    assert len(out) > 0
+
+
+def test_notebook_dgp_constants_match_test_module_constants():
+    """P2 sync guard: codex R6 caught that `test_notebook_dgp_ast_matches_test_fixture`
+    only compares the function body, and `test_dgp_true_parameters_match_quoted`
+    only reasserts the test module's own constants. A notebook-only edit
+    to `MAIN_SEED`, `N_TREATED`, `N_NEAR`, `N_FAR`, `T_PERIODS`,
+    `FIRST_TREAT`, `TAU_TOTAL`, `DELTA_1`, `D_BAR_KM`, or `NOISE_SD`
+    would change tutorial behavior without failing either of those
+    tests.
+
+    Parses the notebook §2 cell, walks the top-level
+    ``Assign`` nodes, and asserts the value of each expected constant
+    matches the test module's value. Any notebook-only constant edit
+    now fails this guard.
+
+    **CI isolation note:** the project's pure-Python and Rust CI jobs
+    copy ``tests/`` to an isolated location WITHOUT the ``docs/``
+    tree (to verify the installed package doesn't depend on the
+    source tree). When the notebook source isn't reachable, this
+    test skips gracefully — the sync-guard contract is meaningful
+    in local dev (where edits happen) and the nbmake job separately
+    verifies the notebook executes end-to-end."""
+    import ast
+    import json
+    from pathlib import Path
+
+    nb_path = Path(__file__).resolve().parents[1] / "docs" / "tutorials" / "23_spillover_tva.ipynb"
+    if not nb_path.exists():
+        pytest.skip(
+            f"Notebook source not found at {nb_path}; this drift guard "
+            f"requires source-tree access and is meaningful only in local "
+            f"dev. The nbmake CI job separately verifies the notebook "
+            f"executes end-to-end."
+        )
+    with nb_path.open() as f:
+        nb = json.load(f)
+
+    matches = [
+        c
+        for c in nb["cells"]
+        if c["cell_type"] == "code" and any("def build_t23_panel" in s for s in c["source"])
+    ]
+    assert len(matches) == 1, (
+        f"Expected exactly one notebook code cell defining `build_t23_panel`; "
+        f"found {len(matches)}."
+    )
+    nb_cell_src = "".join(matches[0]["source"])
+
+    # Walk top-level Assigns in the cell and collect a constant -> value dict.
+    nb_consts: dict = {}
+    tree = ast.parse(nb_cell_src)
+    for node in tree.body:
+        if isinstance(node, ast.Assign) and len(node.targets) == 1:
+            target = node.targets[0]
+            if isinstance(target, ast.Name):
+                try:
+                    nb_consts[target.id] = ast.literal_eval(node.value)
+                except (ValueError, SyntaxError):
+                    pass  # non-literal RHS; skip
+
+    expected = {
+        "MAIN_SEED": MAIN_SEED,
+        "N_TREATED": N_TREATED,
+        "N_NEAR": N_NEAR,
+        "N_FAR": N_FAR,
+        "T_PERIODS": T_PERIODS,
+        "FIRST_TREAT": FIRST_TREAT,
+        "TAU_TOTAL": TAU_TOTAL,
+        "DELTA_1": DELTA_1,
+        "D_BAR_KM": D_BAR_KM,
+        "NOISE_SD": NOISE_SD,
+    }
+    missing = [k for k in expected if k not in nb_consts]
+    assert not missing, (
+        f"Notebook §2 cell missing expected constant assignments: {missing}. "
+        f"If a constant was renamed or moved, update both notebook and test."
+    )
+    mismatched = {k: (nb_consts[k], expected[k]) for k in expected if nb_consts[k] != expected[k]}
+    assert not mismatched, (
+        f"Notebook §2 constants drifted from test module constants: {mismatched}. "
+        f"Each entry is (notebook_value, test_value). Update both to match."
+    )
+
+
+def test_notebook_dgp_ast_matches_test_fixture():
+    """P2 sync guard: enforces the "verbatim" duplication claim by
+    parsing the notebook's §2 ``build_t23_panel`` definition and
+    asserting that, modulo function name (notebook: ``build_t23_panel``;
+    test: ``_build_t23_panel``) and docstring, its AST matches the test
+    fixture's. Catches silent drift in non-constant DGP logic (coordinate
+    ranges, lambda_t, row construction) that the numerical-value pins
+    don't see — codex R4 P2 flagged this gap.
+
+    Uses ``ast.dump`` for a whitespace-/comment-agnostic comparison:
+    semantically identical code matches, cosmetic edits don't trigger
+    spurious failures.
+
+    **CI isolation note:** like
+    ``test_notebook_dgp_constants_match_test_module_constants``, this
+    test skips gracefully when CI's isolated-tests-copy step strips
+    the ``docs/`` tree. The sync-guard contract is meaningful in
+    local dev where edits happen."""
+    import ast
+    import inspect
+    import json
+    from pathlib import Path
+
+    nb_path = Path(__file__).resolve().parents[1] / "docs" / "tutorials" / "23_spillover_tva.ipynb"
+    if not nb_path.exists():
+        pytest.skip(
+            f"Notebook source not found at {nb_path}; this drift guard "
+            f"requires source-tree access and is meaningful only in local "
+            f"dev. The nbmake CI job separately verifies the notebook "
+            f"executes end-to-end."
+        )
+    with nb_path.open() as f:
+        nb = json.load(f)
+
+    matches = [
+        c
+        for c in nb["cells"]
+        if c["cell_type"] == "code" and any("def build_t23_panel" in s for s in c["source"])
+    ]
+    assert len(matches) == 1, (
+        f"Expected exactly one notebook code cell defining `build_t23_panel`; "
+        f"found {len(matches)}. If you renamed or split the §2 DGP cell, "
+        f"update this test's cell-locator."
+    )
+    nb_cell_src = "".join(matches[0]["source"])
+
+    def _extract_normalized_fn(src: str, fn_name: str) -> str:
+        """Parse `src`, find FunctionDef `fn_name`, strip its docstring,
+        rename it to the canonical `build_t23_panel`, and return the
+        normalized AST dump."""
+        tree = ast.parse(src)
+        fn = next(
+            (n for n in ast.walk(tree) if isinstance(n, ast.FunctionDef) and n.name == fn_name),
+            None,
+        )
+        assert fn is not None, f"Could not find FunctionDef `{fn_name}` in source"
+        if (
+            fn.body
+            and isinstance(fn.body[0], ast.Expr)
+            and isinstance(fn.body[0].value, ast.Constant)
+            and isinstance(fn.body[0].value.value, str)
+        ):
+            fn.body = fn.body[1:]
+        fn.name = "build_t23_panel"
+        return ast.dump(fn, annotate_fields=True, include_attributes=False)
+
+    nb_norm = _extract_normalized_fn(nb_cell_src, "build_t23_panel")
+    test_norm = _extract_normalized_fn(inspect.getsource(_build_t23_panel), "_build_t23_panel")
+
+    assert nb_norm == test_norm, (
+        f"Notebook §2 DGP cell drifted from test fixture `_build_t23_panel`.\n"
+        f"--- notebook AST ---\n{nb_norm[:400]}...\n"
+        f"--- test AST ---\n{test_norm[:400]}...\n"
+        f"Update one or both so the function bodies match modulo name + docstring."
+    )
+
+
+def test_seed_specific_geometry_pins_match_quoted(panel):
+    """P3 sync guard: the §2 panel-layout table and §6 within-cutoff
+    enumeration quote seed-specific geometry numbers (max distance from
+    origin, cluster diameter, band extents, far×far / near×near
+    pair-within-100km percentages). The drift test pins all the values
+    quoted in the notebook so prose can't go stale even if the headline
+    estimates remain within tolerance — codex R4 P3 flagged this gap."""
+    treated = panel[panel["ever_treated"] == 1].drop_duplicates("unit")
+    near = panel[(panel["ever_treated"] == 0) & (panel["lat"] <= 1.0)].drop_duplicates("unit")
+    far = panel[(panel["ever_treated"] == 0) & (panel["lat"] > 1.0)].drop_duplicates("unit")
+    deg_to_km = 111.0
+
+    def _max_dist_from_origin_km(d):
+        return float(np.sqrt(d["lat"] ** 2 + d["lon"] ** 2).max() * deg_to_km)
+
+    def _min_dist_from_origin_km(d):
+        return float(np.sqrt(d["lat"] ** 2 + d["lon"] ** 2).min() * deg_to_km)
+
+    def _band_diameter_km(d):
+        lats = d["lat"].values
+        lons = d["lon"].values
+        diffs = np.sqrt((lats[:, None] - lats[None, :]) ** 2 + (lons[:, None] - lons[None, :]) ** 2)
+        return float(diffs.max() * deg_to_km)
+
+    def _pct_pairs_within_100km(d):
+        lats = d["lat"].values
+        lons = d["lon"].values
+        n = len(lats)
+        dist = (
+            np.sqrt((lats[:, None] - lats[None, :]) ** 2 + (lons[:, None] - lons[None, :]) ** 2)
+            * deg_to_km
+        )
+        triu = np.triu(np.ones((n, n), dtype=bool), k=1)
+        pair_d = dist[triu]
+        return float((pair_d <= 100.0).sum() / len(pair_d) * 100.0)
+
+    def _cross_band_max_pair_km(d1, d2):
+        lats1, lons1 = d1["lat"].values, d1["lon"].values
+        lats2, lons2 = d2["lat"].values, d2["lon"].values
+        cross = (
+            np.sqrt((lats1[:, None] - lats2[None, :]) ** 2 + (lons1[:, None] - lons2[None, :]) ** 2)
+            * deg_to_km
+        )
+        return float(cross.max())
+
+    def _within_band_median_pair_km(d):
+        lats = d["lat"].values
+        lons = d["lon"].values
+        n = len(lats)
+        dist = (
+            np.sqrt((lats[:, None] - lats[None, :]) ** 2 + (lons[:, None] - lons[None, :]) ** 2)
+            * deg_to_km
+        )
+        triu = np.triu(np.ones((n, n), dtype=bool), k=1)
+        return float(np.median(dist[triu]))
+
+    # §2 quoted: "clustered around (0,0); max ~12 km from origin, cluster diameter ~22 km at seed 23"
+    assert round(_max_dist_from_origin_km(treated)) == 12, _max_dist_from_origin_km(treated)
+    assert round(_band_diameter_km(treated)) == 22, _band_diameter_km(treated)
+    # §2 quoted: "~12-82 km north"
+    assert round(_min_dist_from_origin_km(near)) == 12, _min_dist_from_origin_km(near)
+    assert round(_max_dist_from_origin_km(near)) == 82, _max_dist_from_origin_km(near)
+    # §2 quoted: "~224-331 km north"
+    assert round(_min_dist_from_origin_km(far)) == 224, _min_dist_from_origin_km(far)
+    assert round(_max_dist_from_origin_km(far)) == 331, _max_dist_from_origin_km(far)
+    # §6 quoted: "max within-band pairwise distance is ~131 km" for far band
+    # (NOT "lat extent" — geometrically the lat extent is ~111 km; 131 km is
+    # the pairwise band diameter accounting for lon dispersion too)
+    assert round(_band_diameter_km(far)) == 131, _band_diameter_km(far)
+    # §6 quoted: "100% of within-band pairs are within 100 km" for near band
+    assert round(_pct_pairs_within_100km(near)) == 100, _pct_pairs_within_100km(near)
+    # §6 quoted: "~95% of within-band pair distances are within 100 km" for far band
+    assert round(_pct_pairs_within_100km(far)) == 95, _pct_pairs_within_100km(far)
+    # §6 quoted: "treated × near-control pairs (max pairwise separation ~90 km)"
+    assert round(_cross_band_max_pair_km(treated, near)) == 90, _cross_band_max_pair_km(
+        treated, near
+    )
+    # §6 quoted: "median far/far pairwise distance is ~56 km"
+    assert round(_within_band_median_pair_km(far)) == 56, _within_band_median_pair_km(far)
+
+
+def _assert_post_filter_warning_surface_is_clean(captured) -> None:
+    """Shared T19-style platform-agnostic warning-policy assertion.
+
+    The notebook's narrow ``.*encountered in matmul`` filter (see
+    `_silence_spillover_matmul_warnings`) silences three Apple Silicon
+    M4 + numpy<2.3 Accelerate BLAS warnings that are emitted on the
+    affected platform but DO NOT fire on M3 / Intel / Linux or
+    numpy>=2.3 (per ``TODO.md`` "RuntimeWarnings in Linear Algebra
+    Operations"). The drift contract this assertion locks is
+    platform-agnostic:
+
+    - on platforms where the matmul warnings fire, they get filtered
+      and never reach the captured list;
+    - on platforms where they don't fire, the filter is a no-op;
+
+    EITHER WAY the post-filter captured list must be empty. Any
+    UserWarning, FutureWarning, DeprecationWarning, or RuntimeWarning
+    with a non-matmul message will fail this assertion and force the
+    maintainer to either update the notebook narrative or fix the
+    underlying cause."""
+    if not captured:
+        return
+    details = [(msg.category.__name__, str(msg.message)) for msg in captured]
+    assert False, (
+        f"Unexpected post-filter warnings on the T23 DGP: {details}. "
+        f"If a new warning is genuinely expected, broaden "
+        f"`_silence_spillover_matmul_warnings()` and update the §5/§6 "
+        f"notebook narrative accordingly."
+    )
+
+
+def test_spillover_fit_warning_policy_post_filter_clean(panel):
+    """§5 warning-policy guard (T19-pattern, platform-agnostic).
+
+    Mirrors the notebook's narrow ``.*encountered in matmul`` filter
+    inside the capture block, then asserts the post-filter warning
+    surface is empty on the T23 DGP. On Apple Silicon M4 + numpy<2.3
+    the three known BLAS matmul warnings fire and are filtered; on
+    M3 / Intel / Linux or numpy>=2.3 the filter is a no-op. EITHER
+    WAY a fresh ``UserWarning`` / ``FutureWarning`` or any non-matmul
+    ``RuntimeWarning`` will fail this guard."""
+    est = SpilloverDiD(rings=[0.0, D_BAR_KM], conley_coords=("lat", "lon"))
+    with warnings.catch_warnings(record=True) as w:
+        warnings.simplefilter("always")
+        _silence_spillover_matmul_warnings()  # mirror notebook §5 filter
+        est.fit(panel, outcome="y", unit="unit", time="time", treatment="D")
+    _assert_post_filter_warning_surface_is_clean(w)
+
+
+def test_spillover_conley_fit_warning_policy_post_filter_clean(panel):
+    """§6 warning-policy guard, parallel to §5 but on the Conley path
+    (vcov_type="conley", conley_cutoff_km=d_bar, conley_lag_cutoff in {0, 1}).
+    Same T19-style platform-agnostic contract: mirror the notebook
+    filter inside the capture, assert no remaining warning escaped."""
+    for lag in (0, 1):
+        est = SpilloverDiD(
+            rings=[0.0, D_BAR_KM],
+            conley_coords=("lat", "lon"),
+            vcov_type="conley",
+            conley_cutoff_km=D_BAR_KM,
+            conley_lag_cutoff=lag,
+        )
+        with warnings.catch_warnings(record=True) as w:
+            warnings.simplefilter("always")
+            _silence_spillover_matmul_warnings()  # mirror notebook §6 filter
+            est.fit(panel, outcome="y", unit="unit", time="time", treatment="D")
+        _assert_post_filter_warning_surface_is_clean(w)