SpilloverDiD: ring-indicator spillover-aware DiD (Butts 2021) #446
/ai-review
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
No unmitigated source-material mismatch found. The time-varying ring regressor, stricter
Code Quality
Performance
No findings. The remaining spillover-specific hotspot, the dense staggered nearest-treated-distance path, is now properly tracked in TODO.md:L128-L128.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
No findings.
Performance
No findings.
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Executive Summary
Methodology
Code Quality
No additional findings.
Performance
No additional findings.
Maintainability
No additional findings.
Tech Debt
Security
No findings.
Documentation/Tests
Path to Approval
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — the prior Executive Summary
Methodology
Code Quality
No findings.
Performance
Maintainability
No findings.
Tech Debt
Security
No findings.
Documentation/Tests
No unmitigated findings. The public docs now describe the connectivity requirement and the period-strict/unit-warn-drop behavior, and the new tests cover the connectivity helper directly (
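The connectivity requirement mentioned in this review can be illustrated with a small sketch. It assumes the check treats units and periods as the two node sets of a bipartite graph, with one edge per stage-1 (`Omega_0`) row; the function name `count_components` and the toy edge lists are hypothetical and are not taken from `diff_diff`. More than one component would mean the unit and time fixed effects are not jointly identified on the stage-1 sample.

```python
from typing import Hashable, Iterable, Tuple


def count_components(edges: Iterable[Tuple[Hashable, Hashable]]) -> int:
    """Count connected components of a unit/period bipartite graph.

    Each (unit, period) pair is one stage-1 row; units and periods are nodes.
    Hypothetical helper for illustration only.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for unit, period in edges:
        # Tag nodes so a unit label can never collide with a period label.
        u, p = ("unit", unit), ("period", period)
        parent.setdefault(u, u)
        parent.setdefault(p, p)
        root_u, root_p = find(u), find(p)
        if root_u != root_p:
            parent[root_u] = root_p

    return len({find(node) for node in list(parent)})


# Units "a" and "b" share period 1, so the graph is one component.
connected = [("a", 0), ("a", 1), ("b", 1), ("b", 2)]
# No shared period: two components.
split = [("a", 0), ("a", 1), ("b", 2), ("b", 3)]

n_connected = count_components(connected)
n_split = count_components(split)
print(n_connected, n_split)
```

A union-find pass is used here because it needs only one sweep over the rows; whether the actual helper uses this or a graph search is not stated in the thread.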
New standalone estimator at `diff_diff/spillover.py` implementing
two-stage Gardner (2022) DiD with ring-indicator covariates that
identify, alongside the direct effect on treated (`tau_total`), per-ring
spillover effects on near-control units (`delta_j`). Reference:
Butts, K. (2023, originally 2021) "Difference-in-Differences with
Spatial Spillovers" arXiv:2105.03737v3; Gardner, J. (2022) "Two-stage
differences in differences" arXiv:2207.05943.

Handles panel non-staggered (paper Eqs 5/6/8) and Section 5 staggered
timing in one estimator. Stage-2 regressor uses the time-varying
`(1 - D_it) * Ring_{it,j}` form. Stage-1 subsample is Butts' STRICTER
`Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed).

Identification-check policy: period strict, unit warn-and-drop, plus
connected-component check on the Omega_0 bipartite graph. SE clamps
negative vcov diagonals before sqrt (sibling-estimator convention).
`coefficients` exposes all (1+K) stage-2 entries keyed to vcov columns
plus an "ATT" alias. `rank_deficient_action` validated at `__init__`.

Variance: stage-2 OLS via `solve_ols` (HC1 / Conley / cluster). Gardner
GMM first-stage uncertainty correction NOT applied in this PR (documented
limitation; tracked in TODO).

Deferred features (planned follow-ups, all in TODO): `event_study=True`
per-event-time × ring coefficients, `survey_design=` integration,
`ring_method="count"`, data-driven `d_bar`, GMM correction at stage 2,
sparse staggered ring-distance path, TwoStageDiD-Conley first-class.

Tests: `tests/test_spillover.py` (153 tests) + DGP factories at
`tests/_dgp_utils.py`. Includes 20-seed Gardner identity bit-identity
test (`SpilloverDiD.att` matches single-stage TWFE ring regression at
`atol=1e-10` on non-staggered DGPs; the reported non-staggered
`tau_total` IS the Butts Eqs. 4-6 estimator). Non-staggered MC at 50
seeds + 200-seed slow variant recovers both `tau_total` and `delta_1`;
staggered MC at 30 seeds anchors `tau_total` only.

Docs: REGISTRY section, API rst, `llms.txt` + `llms-full.txt`, README
catalog entry, references, `doc-deps.yaml`, TODO follow-up rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
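To make the stage-2 regressor and the stage-1 subsample described in this commit message concrete, here is a minimal NumPy sketch for the non-staggered case. The toy distances, the ring edges, the half-open ring convention `(lo, hi]`, and all variable names are assumptions for illustration; this is not the code in `diff_diff/spillover.py`.

```python
import numpy as np

# Hypothetical toy panel: 4 units x 3 periods, treatment switches on at t = 1.
dist = np.array([0.0, 30.0, 80.0, 500.0])  # distance to nearest treated unit
treated = dist == 0.0                      # unit 0 is the treated unit
periods = np.arange(3)
t_treat = 1
ring_edges = [0.0, 50.0, 100.0]            # assumed rings: (0, 50], (50, 100]

post = periods >= t_treat                              # shape (T,)
D = (treated[:, None] & post[None, :]).astype(int)     # D_it, shape (N, T)

ring_columns = []
for lo, hi in zip(ring_edges[:-1], ring_edges[1:]):
    in_ring = (dist > lo) & (dist <= hi)               # unit-level ring membership
    ring_it = (in_ring[:, None] & post[None, :]).astype(int)  # time-varying Ring_{it,j}
    ring_columns.append((1 - D) * ring_it)             # (1 - D_it) * Ring_{it,j}
rings = np.stack(ring_columns, axis=-1)                # shape (N, T, K)

S = rings.sum(axis=-1) > 0                             # exposed to any ring
omega0 = (D == 0) & ~S                                 # untreated AND unexposed

print(rings[1, :, 0], rings[2, :, 1], int(omega0.sum()))
```

In this toy panel the near-control units contribute only their pre-period row to `Omega_0`, while the far-away unit contributes every row, which is the sense in which the subsample is stricter than `{D_it = 0}`.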
4fda83c to ced54b3
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
…overy

Two test-coverage gaps surfaced on the rebased SHA:

1. `anticipation` was previously only round-tripped on results and rejected for invalid values, with no behavior-level test asserting that it actually shifts D_it / Omega_0 membership / the ring-exposure clock. A regression there could silently move rows in/out of stage 1 and change tau_total / delta_j without failing CI.
2. Staggered MC anchored only `tau_total`, not `delta_1`. A staggered-only ring-assignment or spillover-coefficient regression with roughly-unchanged tau_total wouldn't be caught.

Fixes:

- tests/test_spillover.py: new `TestSpilloverDiDAnticipationBehavior` class (3 tests) with a hand-built 4-period panel (1 treated @ t=2, 1 near-control, 2 far-controls). Asserts on both `treatment=` and `first_treat=` paths that anticipation=1 (vs 0) yields n_treated+1, smaller stage1_n_obs, and a different att. Plus a parity check that the two fit paths produce identical results under the same anticipation setting (entry points are internally unified).
- tests/test_spillover.py: `TestSpilloverDiDIdentification::test_staggered_recovers_tau_total` renamed to `..._tau_total_and_delta_1` and extended to also assert staggered delta_1 recovery within 0.03 absolute tolerance (mean over 30 seeds: -0.0398, std 0.0083). Per-event-time `delta_jk` decomposition on staggered DGPs is still queued alongside event_study=True support.
- docs/methodology/REGISTRY.md, CHANGELOG.md: narrow the staggered MC anchor language to match (tau_total + delta_1, not tau_total only); bump test count 153 -> 156.

All 156 tests pass; black + ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
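The behavior these anticipation tests pin down can be sketched in a few lines. The sketch assumes that `anticipation=k` moves the treatment clock `k` periods earlier, that periods are labelled 0 to 3, and that stage 1 uses the untreated rows; the helper `treatment_indicator` is hypothetical and ring exposure is left out for brevity.

```python
def treatment_indicator(periods, first_treat, anticipation=0):
    """D_it for one unit, with the onset moved `anticipation` periods earlier.

    Hypothetical helper; `first_treat=None` marks a never-treated unit.
    """
    if first_treat is None:
        return [0 for _ in periods]
    onset = first_treat - anticipation
    return [int(t >= onset) for t in periods]


periods = [0, 1, 2, 3]
# One unit treated at t=2 and three controls, as in the hand-built panel.
first_treat_by_unit = {"treated": 2, "near": None, "far_1": None, "far_2": None}


def count_rows(anticipation):
    """Return (treated rows, untreated rows) across the whole panel."""
    n_treated_rows = 0
    for first_treat in first_treat_by_unit.values():
        n_treated_rows += sum(treatment_indicator(periods, first_treat, anticipation))
    n_total = len(periods) * len(first_treat_by_unit)
    return n_treated_rows, n_total - n_treated_rows


treated_0, untreated_0 = count_rows(anticipation=0)
treated_1, untreated_1 = count_rows(anticipation=1)
print(treated_0, untreated_0, treated_1, untreated_1)
```

Under these assumptions, one extra row becomes treated and the pool of untreated rows shrinks by one, which is the direction of change the new test class asserts.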
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — the prior P1 on Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Round-8 CI review flagged `test_conley_se_differs_from_hc1` as a test-coverage gap: the name promises a Conley-specific assertion but the body only checked finite SE + ATT invariance. A silent fallback to HC1 (e.g., if SpilloverDiD ever stopped threading the Conley kwargs through to `solve_ols`) would have passed.

Replace with:

- `test_conley_kwargs_threaded_to_solve_ols`: patches `solve_ols` at the import site, captures the kwargs of the stage-2 invocation, and asserts they include `vcov_type="conley"`, `conley_cutoff_km=200.0`, `conley_metric="haversine"`, `conley_lag_cutoff=0`, plus fit-time-derived `conley_coords` / `conley_time` / `conley_unit` arrays of the right shape. Any silent fallback to HC1 fails this.
- `test_conley_att_invariant_vs_hc1`: extracted from the old test. The vcov choice does not change ATT (residualization + OLS fit are independent of variance). Now stands as a clean invariant rather than pretending to verify Conley-specific output.

Also bumps CHANGELOG test count 156 -> 157 and updates the Conley description to "plumbing (verifies solve_ols is called with vcov_type='conley' + Conley kwargs, no silent HC1 fallback)".

All 157 tests pass; black + ruff clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good: the prior re-review gaps are addressed, and I did not find any new unmitigated P0/P1 issues in the changed spillover estimator. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Review limitation: I could not execute the test suite in this environment because Python runtime dependencies such as
Summary
New standalone estimator `SpilloverDiD` (`diff_diff/spillover.py`) implementing two-stage Gardner DiD with ring-indicator covariates that identify both the direct effect on treated units (`tau_total`) and per-ring spillover effects on near-control units (`delta_j`). Handles non-staggered and Section 5 staggered timing in a single estimator.

Methodology references
- Stage-2 regressor uses the time-varying `(1 - D_it) * Ring_{it,j}` form (paper page 12's `S_it = S_i * 1{t >= t_treat}` notation; Section 5 Table 2's `S^k_{it}` / `Ring^k_{it,j}`). The literal unit-static reading of Equation 5 is algebraically rank-deficient under TWFE; only the time-varying form supports the paper's identification (Proposition 2.3). Documented in `docs/methodology/REGISTRY.md` § SpilloverDiD.
- Stage-1 subsample is Butts' stricter `Omega_0 = {D_it = 0 AND S_it = 0}` (untreated AND unexposed) rather than `TwoStageDiD`'s `{D_it = 0}` (untreated only). Prevents spillover-contaminated near-controls from biasing the time FE.
- Identification-check policy: period strict (`ValueError`; dropping a period removes all units' cross-time identification), unit warn-and-drop (mirrors `TwoStageDiD`'s always-treated convention; the downstream finite-mask path excludes the affected rows from stage 2).
- Gardner GMM first-stage uncertainty correction is not applied in this PR (`two_stage.py::_compute_gmm_variance`). Documented in REGISTRY + `TODO.md`.
- Comparison rows are the untreated rows (`D_it = 0`) rather than never-treated-only, so all-eventually-treated staggered designs can identify the counterfactual via not-yet-treated far-away rows.
- `did2s` implements Gardner two-stage without rings; no published R/Stata software implements the Butts ring estimator. Correctness anchored on (a) 20-seed deterministic regression test pinning `SpilloverDiD.att` against direct single-stage TWFE ring regression at `atol=1e-10` (the Gardner identity equivalence for non-staggered timing; empirically bit-identical, so the reported non-staggered `tau_total` IS the Butts Eqs. 4-6 estimator), (b) 50-seed Monte Carlo identification recovery on synthetic Butts-Assumption-satisfying DGPs (+ 200-seed `@pytest.mark.slow` variant), and (c) Conley sparse-vs-dense parity inherited from the 3.3.3 release.

Validation
- Tests (`tests/test_spillover.py`):
  - Input validation (`{0,1}` treatment, NaN rejection on cluster/unit/time/first_treat/treatment, balanced panel, duplicate cells, non-absorbing treatment, conley_coords within-unit-constant, callable metric self-distance contract, `hc2`/`hc2_bm` rejected, NaN in outcome rejected, mixed-encoding time collapse caught)
  - Monte Carlo identification recovery (`tau_total` and `delta_j`; 200-seed slow variant)
  - Gardner identity regression test (`atol=1e-10`)
- DGP factories (`tests/_dgp_utils.py`): `generate_butts_nonstaggered_dgp` / `generate_butts_staggered_dgp` satisfy Butts Assumptions 1/3/5/7 by construction.
- Docs: `docs/api/spillover.rst`, `diff_diff/guides/llms.txt` + `llms-full.txt`, `docs/references.rst`, `docs/doc-deps.yaml`, README catalog entry, `TODO.md` rows for deferred follow-ups.

Security / privacy
Generated with Claude Code
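The rank-deficiency point in the methodology notes above (a unit-static ring indicator under TWFE) can be checked numerically. This is a minimal sketch on an assumed 4-unit, 3-period balanced panel with hypothetical ring membership; it only compares design-matrix ranks and does not reproduce the estimator.

```python
import numpy as np

n_units, n_periods, t_treat = 4, 3, 1
unit = np.repeat(np.arange(n_units), n_periods)   # unit id of each row
time = np.tile(np.arange(n_periods), n_units)     # period of each row

# Two-way fixed-effects dummies (deliberately not dropping a reference level).
unit_fe = (unit[:, None] == np.arange(n_units)).astype(float)
time_fe = (time[:, None] == np.arange(n_periods)).astype(float)
twfe = np.hstack([unit_fe, time_fe])

# Hypothetical ring membership: units 1 and 2 are near-controls.
in_ring = np.isin(unit, [1, 2]).astype(float)
ring_static = in_ring                             # unit-static indicator
ring_tv = in_ring * (time >= t_treat)             # switches on at treatment

rank_twfe = np.linalg.matrix_rank(twfe)
rank_static = np.linalg.matrix_rank(np.column_stack([twfe, ring_static]))
rank_tv = np.linalg.matrix_rank(np.column_stack([twfe, ring_tv]))
print(rank_twfe, rank_static, rank_tv)
```

The static column is a sum of unit dummies, so it adds no rank, whereas the time-varying column is not an additive function of unit and period and therefore adds one.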