Add cell-period IF allocator for dCDH survey variance by igerber · Pull Request #323 · igerber/diff-diff

igerber · 2026-04-18T21:49:48Z

Summary

Extends dCDH analytical survey variance with a cell-period IF allocator that decomposes the per-group influence function U[g] into per-(g, t)-cell attributions and expands to observation level as psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i}).
Relaxes the within-group constancy validator to within-cell constancy (trivially satisfied in one-obs-per-cell panels). Designs with strata or PSU that vary across cells of a group are now accepted by the analytical TSL path.
Adds two NotImplementedError gates for out-of-scope combinations: heterogeneity= with within-group-varying PSU/strata (follow-up PR will extend the WLS psi_obs path), and n_bootstrap > 0 with within-group-varying PSU (follow-up PR will extend the PSU-level wild bootstrap map).
Adds a Monte Carlo coverage simulation (tests/test_dcdh_cell_period_coverage.py, @pytest.mark.slow) on a DGP with PSU varying across the cells of each group. Empirical 95% coverage hits 0.956 (±2.5pp band).
Adds a deterministic row-sum invariant test on a hand-computed 4-group × 3-period panel verifying: closed-form joiner/stable_0 attribution, U_per_period.sum(axis=1) == U, post-period attribution placement, and cohort-centering preservation.
Inverts TestSurveyWithinGroupValidation rejection tests into acceptance tests under the cell allocator; adds positive-rule tests for within-cell strata / PSU variation rejection and gate tests for the two NotImplementedError combos.
Documents the new contract in docs/methodology/REGISTRY.md: post-period attribution is framed as a library convention (adopted because it preserves the group/PSU-sum identities of the prior group-level expansion and produces approximately nominal MC coverage on the test DGP) rather than a theorem derived from observation-level survey linearization. A TODO.md Methodology/Correctness entry tracks the open validation question.
Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to U_centered[g], so PSU-level Binder aggregation is byte-identical to the prior group-level expansion up to single-ULP floating-point noise. All existing survey, core, methodology, parity, and HonestDiD tests pass without modification.

Methodology references (required if estimator / math changes)

Method name(s): ChaisemartinDHaultfoeuille (dCDH, reversible-treatment DID)
Paper / source link(s):
- de Chaisemartin & D'Haultfœuille (2020), AER 110(9), https://doi.org/10.1257/aer.20181169
- de Chaisemartin & D'Haultfœuille (2022, rev. 2024), NBER WP 29873 — Web Appendix Section 3.7.3 (cohort-recentered plug-in variance)
Any intentional deviations from the source (and why): The paper's plug-in variance assumes iid sampling; the library's survey IF expansion (proportional per-group / per-cell weight allocation) is not in the papers. The cell-period allocator extends the pre-existing library convention. Post-period attribution is documented as a convention rather than a derived result — see docs/methodology/REGISTRY.md Note "Survey IF expansion — library convention" and the TODO.md entry tracking the open derivation question.

Validation

Tests added/updated:
- tests/test_survey_dcdh.py — 7 new/inverted tests in TestSurveyWithinGroupValidation (acceptance + within-cell rejection + two NotImplementedError gates + row-sum invariant on a hand-computed panel)
- tests/test_dcdh_cell_period_coverage.py — new file, 1 test, @pytest.mark.slow, 500 MC replications at 95.6% empirical coverage
- tests/test_methodology_chaisemartin_dhaultfoeuille.py — unpack the new tuple return in test_cohort_recentering_not_grand_mean
Backtest / simulation / notebook evidence (if applicable): MC coverage simulation is the primary empirical validation (tests/test_dcdh_cell_period_coverage.py); all 333 non-slow tests and the slow coverage test pass.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Plumbs per-(g,t)-cell contributions through the dCDH influence-function functions without changing behavior. Extends _obs_survey_info with time_ids and periods, modifies _compute_full_per_group_contributions, _compute_per_group_if_multi_horizon, and _compute_per_group_if_placebo_horizon to return per-period attribution matrices alongside the per-group IFs, adds _cohort_recenter_per_period for column-wise centering, and wires the 2D attributions through _compute_cohort_recentered_inputs into _compute_se / _survey_se_from_group_if. The survey expansion now uses the cell-period allocator psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i}) when per-period attribution is supplied. Under within-group-constant PSU (the current validator's accepted input), per-cell sums telescope to U_centered[g], so PSU-level Binder aggregation produces byte-identical variance (differences only at FP single-ULP level from summation order). The within-group-constant validator remains in place; it lifts in Stage 2 of PR 2. Tests in tests/test_survey_dcdh.py, tests/test_survey_dcdh_replicate_psu.py, tests/test_chaisemartin_dhaultfoeuille.py, and tests/test_honest_did.py pass unchanged (318/318). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces _validate_group_constant_strata_psu with a relaxed _validate_cell_constant_strata_psu (within-cell constancy, trivially satisfied in one-obs-per-cell panels). Adds a diagnostic helper _strata_psu_vary_within_group and two NotImplementedError gates at the validator call site: heterogeneity= and n_bootstrap > 0 both still use the legacy group-level IF expansion / PSU map and therefore reject designs whose PSU varies within group (PR 3 and PR 4 extend them). Updates REGISTRY.md cluster, heterogeneity, survey IF expansion, and survey+bootstrap notes to describe the cell-period allocator, the within-cell constancy contract, and the two scope gates. Updates the class fit() docstring survey_design section to match. Updates stale "within-group-constant" comments in the bootstrap path since the constancy is now enforced by the gate rather than the upstream validator. Inverts the TestSurveyWithinGroupValidation rejection tests to acceptance tests under the cell allocator. Adds positive-rule tests for within-cell strata and PSU variation (rejected) and gate tests for heterogeneity and bootstrap (raise NotImplementedError). Adds tests/test_dcdh_cell_period_coverage.py: Monte Carlo coverage simulation at 500 reps on a panel with PSU varying across cells of each group (two PSUs per group on even/odd periods). Empirical 95% coverage hits 0.956 on the seeded draw — inside the ±2.5pp tolerance (~2 MC-sigma). Marked @pytest.mark.slow and excluded by default. All 322 tests pass on the affected surface (test_survey_dcdh, test_survey_dcdh_replicate_psu, test_chaisemartin_dhaultfoeuille, test_honest_did). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The helper now returns (U, U_per_period). The cohort-recentering methodology test at test_cohort_recentering_not_grand_mean only needs the per-group vector U, so destructure and discard the per-period matrix. Matches the return-shape change landed in Stage 1 of the cell-period IF allocator work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds a row-sum identity test for the cell-period allocator on a hand-computed 4-group x 3-period panel, verifying: (1) per-group IF matches the closed-form joiner/stable_0 decomposition; (2) per-period attribution sums across time to the per-group IF exactly; (3) post- period attribution places all mass at t=2 with zeros at earlier columns; (4) column-wise cohort centering preserves the row-sum identity. REGISTRY.md frames the post-period attribution as a library convention — adopted because it preserves the group/PSU-sum identities of the prior group-level expansion and produces approximately nominal MC coverage on the test DGP — rather than a theorem derived from the observation-level survey linearization. Inline comments in _compute_full_per_group_contributions, _compute_per_group_if_multi_horizon, and _compute_per_group_if_placebo_horizon mirror that framing. Updates stale local type annotations for multi_horizon_if / placebo_horizon_if to the Tuple[np.ndarray, np.ndarray] returns. TODO.md gains a Methodology/Correctness entry tracking the open validation question (formal derivation against observation-level survey linearization, or replacement with a covariance-aware alternative). All 348 tests pass (slow MC coverage sim included). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-18T21:57:09Z

Overall Assessment

⚠️ Needs changes

Executive Summary

The PR affects ChaisemartinDHaultfoeuille survey inference for DID_M, DID_+, DID_-, event-study horizons, and placebo horizons by introducing a cell-period IF allocator.
The core methodology deviation from the papers is documented in the registry and tracked in TODO.md, so I am treating that part as informational, not a blocker.
There is one unmitigated P1: the newly advertised support for within-group-varying strata is incomplete when survey_design.psu is omitted, because the auto-injected PSU path still fails before the cell allocator runs.
There is also a non-blocking performance regression: dense per-period IF tensors are now built on all paths, including non-survey fits that never consume them.
Tests add useful invariant and coverage checks, but they do not cover the omitted-psu auto-inject path where the P1 issue appears.

Methodology

Severity P1. Impact: the new public contract says that, with no explicit psu, fit() auto-injects psu=<group_col> as a safe default while also supporting strata that vary across cells of a group, but the auto-injected path preserves nest=False, so a group that spans multiple strata still trips the survey resolver’s cross-stratum PSU check before the new allocator can run. See diff_diff/chaisemartin_dhaultfoeuille.py:648, diff_diff/chaisemartin_dhaultfoeuille.py:753, and diff_diff/survey.py:313. Concrete fix: either synthesize a stratum-nested injected PSU when auto-injecting under within-group-varying strata, or narrow the documented contract and raise a targeted pre-resolution error telling users that this case requires explicit psu plus nest=True.
Severity P3. Impact: the cell-period/post-period allocation itself is a documented library convention, not a paper-derived survey linearization, and the open derivation question is already tracked. I am not counting that against approval. See docs/methodology/REGISTRY.md:652 and TODO.md:60. Concrete fix: none required for approval.

Code Quality

No material findings beyond the contract issue above.

Performance

Severity P2. Impact: the PR now allocates dense U_per_period tensors unconditionally, even on non-survey/plugin paths that never use them. That adds O(G*T) memory to the main ATT path and O(G*T*L_max) to event-study/placebo paths for all fits, not just the new survey case. See diff_diff/chaisemartin_dhaultfoeuille.py:1643, diff_diff/chaisemartin_dhaultfoeuille.py:4374, and diff_diff/chaisemartin_dhaultfoeuille.py:4942. Concrete fix: only build/store the 2D IF tensors when a survey design actually needs the cell allocator, or compute them lazily on demand.

Maintainability

Severity P3. Impact: new internal comments describe the allocator as “methodologically correct” / “valid” even though the public methodology note and TODO.md explicitly say the derivation is still open. That creates an avoidable maintenance trap. See diff_diff/chaisemartin_dhaultfoeuille.py:5044, diff_diff/chaisemartin_dhaultfoeuille.py:5233, docs/methodology/REGISTRY.md:652, and TODO.md:60. Concrete fix: soften those comments to match the registry wording (“library convention”, “empirically validated”, “formal derivation deferred”).

Tech Debt

Severity P3. Impact: the main methodological limitation introduced here is already tracked under the repo’s deferred-work policy, so it is not a blocker. See TODO.md:60. Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

Severity P2. Impact: the new varying-strata acceptance coverage only exercises the explicit psu="group", nest=True path; it does not cover the omitted-psu auto-inject path that the docs now describe as a safe default, which is why the P1 issue is currently unguarded. See tests/test_survey_dcdh.py:1120 and diff_diff/chaisemartin_dhaultfoeuille.py:648. Concrete fix: add a regression test with SurveyDesign(weights="pw", strata="stratum") on a panel where a group spans multiple strata, and assert the intended supported or documented behavior.
I could not execute the tests locally in this review environment because the available python is missing numpy.

Path to Approval

Resolve the omitted-psu / varying-strata interaction: either make auto-injection synthesize a stratum-nested PSU when groups span multiple strata, or explicitly narrow the public contract and raise a targeted error before survey resolution.
Add a regression test for SurveyDesign(weights="pw", strata="stratum") with within-group-varying strata and no explicit psu, so the intended behavior is enforced.

Public contract fix: when `survey_design.psu` is omitted, fit() auto-injects psu=<group_col> with nest=False. If strata vary within group, the synthesized PSU column reuses group labels across strata and downstream survey resolution fails on the cross-stratum PSU uniqueness check. fit() now detects that combination before auto- inject runs (via the existing _strata_psu_vary_within_group helper) and raises a targeted ValueError pointing users at the explicit `psu=<col>, nest=True` path. Performance fix: the cell-period allocator's per-(g,t) tensor (U_per_period) was being allocated unconditionally in _compute_full_per_group_contributions, _compute_per_group_if_multi_horizon, _compute_per_group_if_placebo_horizon, and their _compute_cohort_recentered_inputs aggregator. That added O(G*T) memory to the main ATT path and O(G*T*L_max) to the event-study / placebo paths on every fit — including non-survey fits that never consume the tensor. Adds a `compute_per_period: bool = True` param through each helper, gates allocation + accumulation behind the flag, and threads `compute_per_period=(_obs_survey_info is not None)` from fit(). Downstream consumers now None-guard the per-period recentering / _compute_se calls so the plug-in SE path runs with no cell-level work. Adds `test_auto_inject_with_varying_strata_raises` under `TestSurveyWithinGroupValidation` covering the targeted error path. Narrows the REGISTRY.md survey IF expansion Note to state the nest=True requirement under varying strata. Softens two stale comments (`_cohort_recenter_per_period` docstring, `_compute_cohort_recentered_inputs` inline comment) so they match the convention framing already in REGISTRY.md and TODO.md rather than claiming methodological correctness. All 352 tests pass (slow MC coverage sim included). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-18T22:34:48Z

/ai-review

github-actions · 2026-04-18T22:40:31Z

🔁 AI review rerun (requested by @igerber)

Head SHA: dedc2a6e38ca50aefeaf2411fdddf7d91c7c8938

Overall Assessment

⚠️ Needs changes

Executive Summary

The prior review’s main issue is only partially resolved: the PR now turns the default no-psu + varying-strata failure into a targeted error, but the fix overcorrects.
Severity P1 [Newly identified]: the new pre-auto-inject guard rejects SurveyDesign(..., strata=..., nest=True) with no explicit psu, even though the auto-inject path already preserves nest=True and SurveyDesign.resolve() can make the synthesized psu=group unique within strata.
The cell-period allocator itself remains a documented library convention rather than a paper-derived survey linearization; that is properly noted in the Methodology Registry and tracked in TODO.md, so it is informational only.
The earlier non-blocking performance concern looks resolved: dense per-period IF tensors are now only built when survey inference is active.
I could not run the tests locally in this environment because numpy is not installed, so this review is diff-based.

Methodology

Severity P1 [Newly identified]. Impact: the new auto-inject guard hard-errors on a valid nested-strata design. The comment says the failure mode is specifically the nest=False resolver path, but the implementation raises whenever strata vary within group, without checking nest. That blocks SurveyDesign(weights="pw", strata="stratum", nest=True) even though the auto-injected design preserves nest=True and the resolver explicitly supports making (stratum, psu) unique within strata. Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L729-L759, diff_diff/chaisemartin_dhaultfoeuille.py:L785-L795, diff_diff/survey.py:L299-L303, docs/methodology/REGISTRY.md:L652-L652. Concrete fix: only raise on the no-psu varying-strata path when the synthesized psu=<group> would be resolved with nest=False; allow the nest=True case to proceed and update the docs accordingly.

Code Quality

No new findings.

Performance

No new findings. The previous dense-allocation concern appears addressed by the new compute_per_period=_obs_survey_info is not None gating.

Maintainability

No new findings. The new comments are now aligned with the Registry/TODO wording and no longer overclaim a formal derivation.

Tech Debt

Severity P3. Impact: the open derivation question for the cell-period/post-period survey IF allocator is explicitly documented and tracked, so it is not a blocker under the project’s deferred-work policy. Locations: docs/methodology/REGISTRY.md:L652-L653, TODO.md:L59-L60. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

Severity P2. Impact: the new tests only cover the unsupported default nest=False auto-inject path, so the over-broad rejection of the still-valid nest=True path is unguarded. Location: tests/test_survey_dcdh.py:L1175-L1192. Concrete fix: add a regression test using SurveyDesign(weights="pw", strata="stratum", nest=True) with no explicit psu, and assert it succeeds and matches the explicit psu="group", nest=True path on SE / df_survey.

Local execution note: I could not run the targeted pytest files here because the environment is missing numpy.

Path to Approval

Narrow the new pre-auto-inject ValueError so it only fires for the actually unsupported case: varying strata with no explicit psu and nest=False.
Add a regression test for no-psu, varying-strata, nest=True, and compare it against explicit SurveyDesign(weights="pw", strata="stratum", psu="group", nest=True).
Update the fit() docstring and Methodology Registry note so they distinguish the unsupported nest=False auto-inject path from the valid nest=True auto-inject path.

The previous round's guard fired on any varying-strata + omitted-psu combination, which rejected `SurveyDesign(weights, strata, nest=True)` unnecessarily. `SurveyDesign.resolve()` at `diff_diff/survey.py:299-302` combines `(stratum, psu)` into globally-unique labels under nest=True, so the auto-injected `psu=<group>` is re-labeled per stratum and the cross-stratum uniqueness check passes. Only the `nest=False` default path actually needs the up-front guard. Narrows the guard to `not getattr(survey_design, "nest", False)` and updates the error message to enumerate three actionable remediations (constant-within-group strata, or `nest=True`, or explicit `psu`). Adds `test_auto_inject_with_varying_strata_nest_true_succeeds` under `TestSurveyWithinGroupValidation` covering the newly-accepted path: byte-for-byte match against explicit `SurveyDesign(weights, strata, psu="group", nest=True)` on `overall_se` and `survey_metadata.df_survey`. The default `nest=False` still-raises regression (`test_auto_inject_with_varying_strata_raises`) remains unchanged. Updates fit() docstring and the REGISTRY.md survey IF expansion Note to enumerate the three supported auto-inject paths: (1) strata constant within group, (2) strata vary + nest=True, (3) strata vary + nest=False (rejected with targeted ValueError). All 338 tests pass (affected surface + slow MC coverage sim). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-18T22:55:47Z

/ai-review

github-actions · 2026-04-18T23:03:40Z

🔁 AI review rerun (requested by @igerber)

Head SHA: ca7b6bfa5d0e321f3a2fb60fb3340148a44b3d7f

Overall Assessment

✅ Looks good

Executive Summary

The prior re-review blocker on omitted psu + varying strata + nest=True appears resolved in code and tests: the pre-auto-inject guard now only rejects the nest=False path, and the new regression covers the valid nest=True path at diff_diff/chaisemartin_dhaultfoeuille.py:L750-L810 and tests/test_survey_dcdh.py:L1175-L1213.
I did not find new unmitigated P0/P1 issues in the changed estimator/variance paths. The new per-period IF tensor is threaded through overall, joiner/leaver, event-study, and placebo SE computation, and inference still routes through safe_inference().
The cell-period/post-period survey IF allocator remains a library convention rather than a paper-derived observation-level linearization, but that deviation is explicitly documented in the Methodology Registry and tracked in TODO.md, so it is informational only under the project policy at docs/methodology/REGISTRY.md:L652-L653 and TODO.md:L60.
Local execution note: this is a diff-based re-review. I could not run pytest here because the environment is missing numpy and pytest.

Methodology

No unmitigated P0/P1 findings. The changed behavior matches the documented contract in the Methodology Registry for the nest=True auto-inject case, within-cell constancy checks, and the documented survey-allocator scope at diff_diff/chaisemartin_dhaultfoeuille.py:L750-L865 and docs/methodology/REGISTRY.md:L652-L653.

Code Quality

No findings.

Performance

No findings. The dense per-period attribution tensors are now only requested when survey inference is active, which resolves the prior always-allocate concern visible in the changed call sites.

Maintainability

No findings. The fit docstring, in-code comments, tests, and Registry note now describe the same allocator contract and follow-up PR boundaries.

Tech Debt

Severity P3. Impact: the survey cell-period allocator’s post-period attribution is still a library convention, not a paper-derived observation-level linearization, but that limitation is explicitly tracked and therefore non-blocking under the project’s deferred-work policy at docs/methodology/REGISTRY.md:L652-L653 and TODO.md:L60. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

No new findings in the diff. The updated tests cover the resolved nest=True auto-inject path, within-cell rejection rules, the row-sum invariant, and the new slow coverage simulation at tests/test_survey_dcdh.py:L1175-L1543 and tests/test_dcdh_cell_period_coverage.py:L1-L134. Local execution note: I could not run them in this environment.

Lifts the NotImplementedError gate for heterogeneity= + within-group- varying PSU/strata under survey designs. The heterogeneity WLS coefficient IF psi_g is now attributed in full to the (g, out_idx) post-period cell and expanded to observation level as psi_i = psi_g * (w_i / W_{g, out_idx}) — the DID_l single-cell convention shipped in PR #323. Under PSU=group the per-obs distribution differs from the legacy psi_i = psi_g * (w_i / W_g) expansion, but the PSU-level aggregate telescopes to psi_g in both paths, so Binder TSL variance and Rao-Wu replicate variance are byte-identical under PSU=group; under within-group-varying PSU, mass lands in the post-period PSU of the transition. Tests: flipped the gating test to assert all five inference fields finite; added PSU-level byte-identity unit test constructing both psi_obs arrays and asserting compute_survey_if_variance agreement within ULP; added nest=True + varying-strata + heterogeneity smoke test (newly-unblocked regime); added multi-horizon smoke test; added slow-tier MC null-coverage test (500 reps, within-group- varying PSU, empirical 95% coverage inside [0.925, 0.975]). n_bootstrap > 0 + within-group-varying PSU remains gated (follow-up PR). Updated REGISTRY.md heterogeneity Note + Survey IF expansion Note scope-limitations paragraph; updated _compute_heterogeneity_test docstring + the stale legacy-allocator comment in _survey_se_from_group_if; added CHANGELOG Changed entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…entinel cells **P1 (methodology):** Under terminal missingness, `_cohort_recenter_per_period` subtracts cohort column means across the full period grid, so a group with no observation at period t acquires non-zero centered mass at that cell. The PR-4 cell-level bootstrap builds `psu_codes_per_cell` with -1 sentinels for such cells and `_unroll_target_to_cells` drops them — silently losing that centered mass. Under within-group-varying PSU + terminal missingness, this would under-cluster the bootstrap SE/CI/p-values. Conservative guard: `_unroll_target_to_cells` now raises `ValueError` when any sentinel cell (-1 PSU) carries non-zero cohort-recentered IF mass (|u| > 1e-12). The error message points users to `n_bootstrap=0` for analytical TSL on such panels. The analytical path has the same mass-leakage behavior under this regime but was shipped in PR #323; documenting the bootstrap- specific guard here avoids advertising a broken combination. Regression test: `test_bootstrap_cell_level_raises_on_sentinel_mass_leak` constructs a per-cell IF tensor with non-zero mass at a -1-PSU cell and asserts `_compute_dcdh_bootstrap` raises with the documented error message. **P2 (tests):** The slow MC bootstrap coverage test previously ran at `L_max=1`, which collapsed the multi-horizon block to a single target and never exercised the cross-horizon shared-weight path described in its own docstring. Bumped to `L_max=2` so the shared (n_bootstrap, n_psu) PSU-level weight matrix is drawn once and broadcast across horizons via each horizon's cell-to-PSU map. Added three assertions: - Horizon-1 bootstrap CI coverage in [0.925, 0.975]. - Horizon-2 bootstrap CI coverage in [0.910, 0.975]. Tolerance is wider than h-1 because finite-sample analytical TSL coverage on this DGP is itself ~0.93 at l=2 (measured offline: analytical h-1 = 0.94, h-2 = 0.926 at n_groups=40). An observed bootstrap coverage within 1pp of the analytical baseline is consistent with correct clustering; a drop to ≤ 0.90 would indicate a real shared-weight broadcast regression. - `cband_crit_value` finite in ≥ 90% of reps — validates that the shared (n_bootstrap, n_psu) weight matrix produces a coherent joint distribution across horizons (required for a valid sup-t simultaneous band). Bumped n_bootstrap to 1000 (from 500) to keep internal bootstrap MC noise below ~0.3pp per CI endpoint at horizon-2's slightly wider percentile-CI spread. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Addresses the P0 escalation: recommending `n_bootstrap=0` as a workaround for terminal missingness + within-group-varying PSU was incorrect — the analytical TSL path uses the SAME cell-period allocator and has the same silent mass-drop bug. **Analytical guard parity.** `_survey_se_from_group_if` now computes `W_cell` (per-(g,t) weight totals) and, before the cell-to-obs expansion, checks whether any cell has `W_cell == 0` while the corresponding cohort-recentered IF mass is non-zero. If so, raises a targeted `ValueError` mirroring the bootstrap-side `_unroll_target_to_cells` guard. The message text is aligned across the two paths (same "no positive-weight observations" phrasing) so the regression test matches both. **Docs cleanup.** Removed the "use n_bootstrap=0 as workaround" language from REGISTRY, CHANGELOG, and the fit() docstring. Replaced with the correct workaround: pre-process the panel (drop late-exit groups / trim to a balanced sub-panel), or use an explicit `psu=<group_col>` so the dispatcher routes through the legacy group-level path (which does not use the cell-period allocator and is not affected by the mass-leak). **Regression test update.** The end-to-end fit() regression now asserts `ValueError` on BOTH `n_bootstrap=0` and `n_bootstrap > 0` under the terminally-missing + within-group-varying PSU fixture. This is technically a behavior change for panels previously covered silently by PR #323's cell-period analytical allocator — those panels used to produce finite (but silently mass-dropped) SEs and now raise. The change closes a real silent-correctness bug; the analytical path never had a principled treatment for the leaked mass in the first place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The round-5 analytical guard in _survey_se_from_group_if was over- scoped: it fired on any panel with leaked centered mass, including PSU-within-group-constant regimes (PSU=group auto-inject, strictly- coarser-PSU-within-group-constant). But the docs said those regimes are the intended workaround. Panels with terminal missingness and PSU=group are the documented supported path and must not raise. Fix: add an analytical dispatcher inside `_survey_se_from_group_if` mirroring the bootstrap's `_psu_varies_within_group` routing. When PSU is within-group-constant on the eligible subset, fall back to the legacy group-level allocator (which uses U_centered[g] directly via the row-sum identity and does not trigger the sentinel-mass guard). Only when PSU actually varies within group does the cell allocator run — and only then can the sentinel-mass guard fire. Byte-identity: under PSU=group the row-sum identity makes the cell and group allocators statistically equivalent, but the legacy branch was the one in use before PR #323 introduced the cell allocator. Rerouting to it under within-group-constant regimes is a regression-free fallback (it's what PR #323 aimed to preserve via byte-identity in the first place). Test coverage: added `test_fit_succeeds_on_terminal_missingness_with_psu_group` — same fixture as the varying-PSU failure regression but with auto-inject `psu=<group>`, asserts both `n_bootstrap=0` and `n_bootstrap > 0` succeed with finite SE. Also updated the cell-level bootstrap `ValueError` text (and the _unroll_target_to_cells docstring) to no longer advertise `n_bootstrap=0` as a workaround — both paths now fail consistently on the varying-PSU carve-out, and the correct workaround is pre-processing or using an explicit `psu=<group_col>`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Round-6's constant-PSU fallback in `_survey_se_from_group_if` silently disabled the cell-period allocator for replicate-weight designs, because replicate designs have `resolved.psu is None` and the fallback routes `psu_arr is None` to the legacy group-level path. That's an allocator change on a Class A surface (overall DID_M, joiners, leavers, dynamic placebos) under per-row-varying replicate ratios, where cell and legacy allocators produce materially different Rao-Wu variances (same non-invariance PR #329 established for heterogeneity). Fix: restrict the round-6 fallback to the TSL branch by gating on `not resolved.uses_replicate_variance`. Replicate designs retain the cell-period allocator (the PR #323 Class A contract), and the sentinel-mass guard still fires on mass leakage when it applies. Regression: `TestReplicateClassA::test_att_cell_allocator_with_varying_replicate_ratios` constructs legacy and cell-level psi_obs on the fitted replicate design, feeds both through `compute_replicate_if_variance`, and asserts they produce materially different variances — locking the allocator contract so a future refactor that switches Class A to the legacy allocator would be detectable. Mirrors the heterogeneity non-invariance guard from PR #329's CI review round. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ss A regression **P1 (docs scope carve-out missing for replicate ATT):** the sentinel-mass `ValueError` fires BEFORE the replicate-vs-TSL dispatch inside `_survey_se_from_group_if`, so replicate-weight ATT fits (which always use the cell-period allocator per the PR #323 Class A contract) also raise on terminal missingness. The docs previously scoped the carve-out to "within-group-varying PSU / analytical TSL / bootstrap" and silently omitted the replicate path. Rewrite the scope note in REGISTRY.md (both the balanced- baseline Note and the Survey + bootstrap contract Note), CHANGELOG, and the fit() docstring to list all three affected paths explicitly: Binder TSL with within-group-varying PSU, Rao-Wu replicate ATT, and the cell-level wild PSU bootstrap. Clarify that the `psu=<group_col>` workaround applies ONLY to Binder TSL — replicate ATT and the varying-PSU bootstrap have no allocator fallback; the only workaround for those paths is pre-processing the panel to remove terminal missingness. **P2 (synthetic allocator test didn't pin fitted SE):** the previous `test_att_cell_allocator_with_varying_replicate_ratios` showed that synthetic cell and legacy allocators differ, but never tied the difference back to the fit's `overall_se`. A refactor switching Class A replicate to the legacy allocator could have slipped through unnoticed. Rewrite using a pytest monkeypatch spy on `compute_replicate_if_variance` to capture the psi_obs the fit actually passes, then reconstruct the legacy-equivalent psi_obs via the row-sum identity on the SAME captured `U_centered` and verify: - fit's `overall_se**2` equals the recomputed variance on the captured cell-allocator psi_obs (sanity / dispatch alignment); - the legacy-reconstructed psi_obs produces a materially different Rao-Wu variance on the same fixture — so a silent Class A allocator regression would flip overall_se and fail this guard. No code behavior changed in round-8 beyond the doc updates — only the test and documentation now lock the contract explicitly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-failures audit Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per project SemVer convention, minor bumps are reserved for new estimators or new module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults (PR #318) add a new public API surface and drive this bump. Headline work: - PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner- ready output layer. Plain-English narrative summaries across all 16 result types, with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md. - PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the upcoming HeterogeneousAdoptionDiD estimator. - PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A contract), heterogeneity + within-group-varying PSU under Binder TSL, and PSU-level Hall-Mammen wild bootstrap at cell granularity. - PR #333 performance review - docs/performance-scenarios.md documents 5-7 realistic practitioner workflows; benchmark harness extended. Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339) continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J. CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default CI after false-positive flakes; perf-review harness is the principled replacement. Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released: 2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers; the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber and others added 4 commits April 18, 2026 17:48

igerber added the ready-for-ci Triggers CI test workflows label Apr 18, 2026

igerber merged commit 5798d09 into main Apr 19, 2026
23 of 24 checks passed

igerber deleted the feature/dcdh-cell-period-if branch April 19, 2026 00:34

igerber mentioned this pull request Apr 19, 2026

Extend dCDH PSU-level wild bootstrap to cell granularity #332

Merged

igerber mentioned this pull request Apr 20, 2026

Release 3.2.0: BusinessReport preview, dCDH survey completion, silent-failures audit #342

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add cell-period IF allocator for dCDH survey variance#323

Add cell-period IF allocator for dCDH survey variance#323
igerber merged 6 commits intomainfrom
feature/dcdh-cell-period-if

igerber commented Apr 18, 2026

Uh oh!

github-actions Bot commented Apr 18, 2026

Uh oh!

igerber commented Apr 18, 2026

Uh oh!

github-actions Bot commented Apr 18, 2026

Uh oh!

igerber commented Apr 18, 2026

Uh oh!

github-actions Bot commented Apr 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Apr 18, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Apr 18, 2026

Uh oh!

igerber commented Apr 18, 2026

Uh oh!

github-actions Bot commented Apr 18, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 18, 2026

Uh oh!

github-actions Bot commented Apr 18, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant