Add cell-period IF allocator for dCDH survey variance#323
Conversation
Plumbs per-(g,t)-cell contributions through the dCDH influence-function
functions without changing behavior. Extends _obs_survey_info with
time_ids and periods, modifies _compute_full_per_group_contributions,
_compute_per_group_if_multi_horizon, and _compute_per_group_if_placebo_horizon
to return per-period attribution matrices alongside the per-group IFs,
adds _cohort_recenter_per_period for column-wise centering, and wires the
2D attributions through _compute_cohort_recentered_inputs into
_compute_se / _survey_se_from_group_if.
The survey expansion now uses the cell-period allocator
psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i}) when
per-period attribution is supplied. Under within-group-constant PSU
(the current validator's accepted input), per-cell sums telescope to
U_centered[g], so PSU-level Binder aggregation produces byte-identical
variance (differences only at FP single-ULP level from summation order).
The within-group-constant validator remains in place; it lifts in
Stage 2 of PR 2. Tests in tests/test_survey_dcdh.py,
tests/test_survey_dcdh_replicate_psu.py,
tests/test_chaisemartin_dhaultfoeuille.py, and tests/test_honest_did.py
pass unchanged (318/318).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces _validate_group_constant_strata_psu with a relaxed _validate_cell_constant_strata_psu (within-cell constancy, trivially satisfied in one-obs-per-cell panels). Adds a diagnostic helper _strata_psu_vary_within_group and two NotImplementedError gates at the validator call site: heterogeneity= and n_bootstrap > 0 both still use the legacy group-level IF expansion / PSU map and therefore reject designs whose PSU varies within group (PR 3 and PR 4 extend them). Updates REGISTRY.md cluster, heterogeneity, survey IF expansion, and survey+bootstrap notes to describe the cell-period allocator, the within-cell constancy contract, and the two scope gates. Updates the class fit() docstring survey_design section to match. Updates stale "within-group-constant" comments in the bootstrap path since the constancy is now enforced by the gate rather than the upstream validator. Inverts the TestSurveyWithinGroupValidation rejection tests to acceptance tests under the cell allocator. Adds positive-rule tests for within-cell strata and PSU variation (rejected) and gate tests for heterogeneity and bootstrap (raise NotImplementedError). Adds tests/test_dcdh_cell_period_coverage.py: Monte Carlo coverage simulation at 500 reps on a panel with PSU varying across cells of each group (two PSUs per group on even/odd periods). Empirical 95% coverage hits 0.956 on the seeded draw — inside the ±2.5pp tolerance (~2 MC-sigma). Marked @pytest.mark.slow and excluded by default. All 322 tests pass on the affected surface (test_survey_dcdh, test_survey_dcdh_replicate_psu, test_chaisemartin_dhaultfoeuille, test_honest_did). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The helper now returns (U, U_per_period). The cohort-recentering methodology test at test_cohort_recentering_not_grand_mean only needs the per-group vector U, so destructure and discard the per-period matrix. Matches the return-shape change landed in Stage 1 of the cell-period IF allocator work. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a row-sum identity test for the cell-period allocator on a hand-computed 4-group x 3-period panel, verifying: (1) per-group IF matches the closed-form joiner/stable_0 decomposition; (2) per-period attribution sums across time to the per-group IF exactly; (3) post- period attribution places all mass at t=2 with zeros at earlier columns; (4) column-wise cohort centering preserves the row-sum identity. REGISTRY.md frames the post-period attribution as a library convention — adopted because it preserves the group/PSU-sum identities of the prior group-level expansion and produces approximately nominal MC coverage on the test DGP — rather than a theorem derived from the observation-level survey linearization. Inline comments in _compute_full_per_group_contributions, _compute_per_group_if_multi_horizon, and _compute_per_group_if_placebo_horizon mirror that framing. Updates stale local type annotations for multi_horizon_if / placebo_horizon_if to the Tuple[np.ndarray, np.ndarray] returns. TODO.md gains a Methodology/Correctness entry tracking the open validation question (formal derivation against observation-level survey linearization, or replacement with a covariance-aware alternative). All 348 tests pass (slow MC coverage sim included). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
Public contract fix: when `survey_design.psu` is omitted, fit() auto-injects psu=<group_col> with nest=False. If strata vary within group, the synthesized PSU column reuses group labels across strata and downstream survey resolution fails on the cross-stratum PSU uniqueness check. fit() now detects that combination before auto- inject runs (via the existing _strata_psu_vary_within_group helper) and raises a targeted ValueError pointing users at the explicit `psu=<col>, nest=True` path. Performance fix: the cell-period allocator's per-(g,t) tensor (U_per_period) was being allocated unconditionally in _compute_full_per_group_contributions, _compute_per_group_if_multi_horizon, _compute_per_group_if_placebo_horizon, and their _compute_cohort_recentered_inputs aggregator. That added O(G*T) memory to the main ATT path and O(G*T*L_max) to the event-study / placebo paths on every fit — including non-survey fits that never consume the tensor. Adds a `compute_per_period: bool = True` param through each helper, gates allocation + accumulation behind the flag, and threads `compute_per_period=(_obs_survey_info is not None)` from fit(). Downstream consumers now None-guard the per-period recentering / _compute_se calls so the plug-in SE path runs with no cell-level work. Adds `test_auto_inject_with_varying_strata_raises` under `TestSurveyWithinGroupValidation` covering the targeted error path. Narrows the REGISTRY.md survey IF expansion Note to state the nest=True requirement under varying strata. Softens two stale comments (`_cohort_recenter_per_period` docstring, `_compute_cohort_recentered_inputs` inline comment) so they match the convention framing already in REGISTRY.md and TODO.md rather than claiming methodological correctness. All 352 tests pass (slow MC coverage sim included). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall AssessmentExecutive Summary
Methodology
Code QualityNo new findings. PerformanceNo new findings. The previous dense-allocation concern appears addressed by the new MaintainabilityNo new findings. The new comments are now aligned with the Registry/TODO wording and no longer overclaim a formal derivation. Tech Debt
SecurityNo findings. Documentation/Tests
Local execution note: I could not run the targeted pytest files here because the environment is missing Path to Approval
|
The previous round's guard fired on any varying-strata + omitted-psu combination, which rejected `SurveyDesign(weights, strata, nest=True)` unnecessarily. `SurveyDesign.resolve()` at `diff_diff/survey.py:299-302` combines `(stratum, psu)` into globally-unique labels under nest=True, so the auto-injected `psu=<group>` is re-labeled per stratum and the cross-stratum uniqueness check passes. Only the `nest=False` default path actually needs the up-front guard. Narrows the guard to `not getattr(survey_design, "nest", False)` and updates the error message to enumerate three actionable remediations (constant-within-group strata, or `nest=True`, or explicit `psu`). Adds `test_auto_inject_with_varying_strata_nest_true_succeeds` under `TestSurveyWithinGroupValidation` covering the newly-accepted path: byte-for-byte match against explicit `SurveyDesign(weights, strata, psu="group", nest=True)` on `overall_se` and `survey_metadata.df_survey`. The default `nest=False` still-raises regression (`test_auto_inject_with_varying_strata_raises`) remains unchanged. Updates fit() docstring and the REGISTRY.md survey IF expansion Note to enumerate the three supported auto-inject paths: (1) strata constant within group, (2) strata vary + nest=True, (3) strata vary + nest=False (rejected with targeted ValueError). All 338 tests pass (affected surface + slow MC coverage sim). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Lifts the NotImplementedError gate for heterogeneity= + within-group-
varying PSU/strata under survey designs. The heterogeneity WLS
coefficient IF psi_g is now attributed in full to the (g, out_idx)
post-period cell and expanded to observation level as
psi_i = psi_g * (w_i / W_{g, out_idx}) — the DID_l single-cell
convention shipped in PR #323. Under PSU=group the per-obs
distribution differs from the legacy psi_i = psi_g * (w_i / W_g)
expansion, but the PSU-level aggregate telescopes to psi_g in both
paths, so Binder TSL variance and Rao-Wu replicate variance are
byte-identical under PSU=group; under within-group-varying PSU,
mass lands in the post-period PSU of the transition.
Tests: flipped the gating test to assert all five inference fields
finite; added PSU-level byte-identity unit test constructing both
psi_obs arrays and asserting compute_survey_if_variance agreement
within ULP; added nest=True + varying-strata + heterogeneity
smoke test (newly-unblocked regime); added multi-horizon smoke test;
added slow-tier MC null-coverage test (500 reps, within-group-
varying PSU, empirical 95% coverage inside [0.925, 0.975]).
n_bootstrap > 0 + within-group-varying PSU remains gated (follow-up
PR). Updated REGISTRY.md heterogeneity Note + Survey IF expansion
Note scope-limitations paragraph; updated _compute_heterogeneity_test
docstring + the stale legacy-allocator comment in
_survey_se_from_group_if; added CHANGELOG Changed entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…entinel cells **P1 (methodology):** Under terminal missingness, `_cohort_recenter_per_period` subtracts cohort column means across the full period grid, so a group with no observation at period t acquires non-zero centered mass at that cell. The PR-4 cell-level bootstrap builds `psu_codes_per_cell` with -1 sentinels for such cells and `_unroll_target_to_cells` drops them — silently losing that centered mass. Under within-group-varying PSU + terminal missingness, this would under-cluster the bootstrap SE/CI/p-values. Conservative guard: `_unroll_target_to_cells` now raises `ValueError` when any sentinel cell (-1 PSU) carries non-zero cohort-recentered IF mass (|u| > 1e-12). The error message points users to `n_bootstrap=0` for analytical TSL on such panels. The analytical path has the same mass-leakage behavior under this regime but was shipped in PR #323; documenting the bootstrap- specific guard here avoids advertising a broken combination. Regression test: `test_bootstrap_cell_level_raises_on_sentinel_mass_leak` constructs a per-cell IF tensor with non-zero mass at a -1-PSU cell and asserts `_compute_dcdh_bootstrap` raises with the documented error message. **P2 (tests):** The slow MC bootstrap coverage test previously ran at `L_max=1`, which collapsed the multi-horizon block to a single target and never exercised the cross-horizon shared-weight path described in its own docstring. Bumped to `L_max=2` so the shared (n_bootstrap, n_psu) PSU-level weight matrix is drawn once and broadcast across horizons via each horizon's cell-to-PSU map. Added three assertions: - Horizon-1 bootstrap CI coverage in [0.925, 0.975]. - Horizon-2 bootstrap CI coverage in [0.910, 0.975]. Tolerance is wider than h-1 because finite-sample analytical TSL coverage on this DGP is itself ~0.93 at l=2 (measured offline: analytical h-1 = 0.94, h-2 = 0.926 at n_groups=40). An observed bootstrap coverage within 1pp of the analytical baseline is consistent with correct clustering; a drop to ≤ 0.90 would indicate a real shared-weight broadcast regression. - `cband_crit_value` finite in ≥ 90% of reps — validates that the shared (n_bootstrap, n_psu) weight matrix produces a coherent joint distribution across horizons (required for a valid sup-t simultaneous band). Bumped n_bootstrap to 1000 (from 500) to keep internal bootstrap MC noise below ~0.3pp per CI endpoint at horizon-2's slightly wider percentile-CI spread. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Addresses the P0 escalation: recommending `n_bootstrap=0` as a workaround for terminal missingness + within-group-varying PSU was incorrect — the analytical TSL path uses the SAME cell-period allocator and has the same silent mass-drop bug. **Analytical guard parity.** `_survey_se_from_group_if` now computes `W_cell` (per-(g,t) weight totals) and, before the cell-to-obs expansion, checks whether any cell has `W_cell == 0` while the corresponding cohort-recentered IF mass is non-zero. If so, raises a targeted `ValueError` mirroring the bootstrap-side `_unroll_target_to_cells` guard. The message text is aligned across the two paths (same "no positive-weight observations" phrasing) so the regression test matches both. **Docs cleanup.** Removed the "use n_bootstrap=0 as workaround" language from REGISTRY, CHANGELOG, and the fit() docstring. Replaced with the correct workaround: pre-process the panel (drop late-exit groups / trim to a balanced sub-panel), or use an explicit `psu=<group_col>` so the dispatcher routes through the legacy group-level path (which does not use the cell-period allocator and is not affected by the mass-leak). **Regression test update.** The end-to-end fit() regression now asserts `ValueError` on BOTH `n_bootstrap=0` and `n_bootstrap > 0` under the terminally-missing + within-group-varying PSU fixture. This is technically a behavior change for panels previously covered silently by PR #323's cell-period analytical allocator — those panels used to produce finite (but silently mass-dropped) SEs and now raise. The change closes a real silent-correctness bug; the analytical path never had a principled treatment for the leaked mass in the first place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The round-5 analytical guard in _survey_se_from_group_if was over- scoped: it fired on any panel with leaked centered mass, including PSU-within-group-constant regimes (PSU=group auto-inject, strictly- coarser-PSU-within-group-constant). But the docs said those regimes are the intended workaround. Panels with terminal missingness and PSU=group are the documented supported path and must not raise. Fix: add an analytical dispatcher inside `_survey_se_from_group_if` mirroring the bootstrap's `_psu_varies_within_group` routing. When PSU is within-group-constant on the eligible subset, fall back to the legacy group-level allocator (which uses U_centered[g] directly via the row-sum identity and does not trigger the sentinel-mass guard). Only when PSU actually varies within group does the cell allocator run — and only then can the sentinel-mass guard fire. Byte-identity: under PSU=group the row-sum identity makes the cell and group allocators statistically equivalent, but the legacy branch was the one in use before PR #323 introduced the cell allocator. Rerouting to it under within-group-constant regimes is a regression-free fallback (it's what PR #323 aimed to preserve via byte-identity in the first place). Test coverage: added `test_fit_succeeds_on_terminal_missingness_with_psu_group` — same fixture as the varying-PSU failure regression but with auto-inject `psu=<group>`, asserts both `n_bootstrap=0` and `n_bootstrap > 0` succeed with finite SE. Also updated the cell-level bootstrap `ValueError` text (and the _unroll_target_to_cells docstring) to no longer advertise `n_bootstrap=0` as a workaround — both paths now fail consistently on the varying-PSU carve-out, and the correct workaround is pre-processing or using an explicit `psu=<group_col>`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Round-6's constant-PSU fallback in `_survey_se_from_group_if` silently disabled the cell-period allocator for replicate-weight designs, because replicate designs have `resolved.psu is None` and the fallback routes `psu_arr is None` to the legacy group-level path. That's an allocator change on a Class A surface (overall DID_M, joiners, leavers, dynamic placebos) under per-row-varying replicate ratios, where cell and legacy allocators produce materially different Rao-Wu variances (same non-invariance PR #329 established for heterogeneity). Fix: restrict the round-6 fallback to the TSL branch by gating on `not resolved.uses_replicate_variance`. Replicate designs retain the cell-period allocator (the PR #323 Class A contract), and the sentinel-mass guard still fires on mass leakage when it applies. Regression: `TestReplicateClassA::test_att_cell_allocator_with_varying_replicate_ratios` constructs legacy and cell-level psi_obs on the fitted replicate design, feeds both through `compute_replicate_if_variance`, and asserts they produce materially different variances — locking the allocator contract so a future refactor that switches Class A to the legacy allocator would be detectable. Mirrors the heterogeneity non-invariance guard from PR #329's CI review round. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ss A regression **P1 (docs scope carve-out missing for replicate ATT):** the sentinel-mass `ValueError` fires BEFORE the replicate-vs-TSL dispatch inside `_survey_se_from_group_if`, so replicate-weight ATT fits (which always use the cell-period allocator per the PR #323 Class A contract) also raise on terminal missingness. The docs previously scoped the carve-out to "within-group-varying PSU / analytical TSL / bootstrap" and silently omitted the replicate path. Rewrite the scope note in REGISTRY.md (both the balanced- baseline Note and the Survey + bootstrap contract Note), CHANGELOG, and the fit() docstring to list all three affected paths explicitly: Binder TSL with within-group-varying PSU, Rao-Wu replicate ATT, and the cell-level wild PSU bootstrap. Clarify that the `psu=<group_col>` workaround applies ONLY to Binder TSL — replicate ATT and the varying-PSU bootstrap have no allocator fallback; the only workaround for those paths is pre-processing the panel to remove terminal missingness. **P2 (synthetic allocator test didn't pin fitted SE):** the previous `test_att_cell_allocator_with_varying_replicate_ratios` showed that synthetic cell and legacy allocators differ, but never tied the difference back to the fit's `overall_se`. A refactor switching Class A replicate to the legacy allocator could have slipped through unnoticed. Rewrite using a pytest monkeypatch spy on `compute_replicate_if_variance` to capture the psi_obs the fit actually passes, then reconstruct the legacy-equivalent psi_obs via the row-sum identity on the SAME captured `U_centered` and verify: - fit's `overall_se**2` equals the recomputed variance on the captured cell-allocator psi_obs (sanity / dispatch alignment); - the legacy-reconstructed psi_obs produces a materially different Rao-Wu variance on the same fixture — so a silent Class A allocator regression would flip overall_se and fail this guard. No code behavior changed in round-8 beyond the doc updates — only the test and documentation now lock the contract explicitly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-failures audit Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per project SemVer convention, minor bumps are reserved for new estimators or new module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults (PR #318) add a new public API surface and drive this bump. Headline work: - PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner- ready output layer. Plain-English narrative summaries across all 16 result types, with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md. - PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the upcoming HeterogeneousAdoptionDiD estimator. - PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A contract), heterogeneity + within-group-varying PSU under Binder TSL, and PSU-level Hall-Mammen wild bootstrap at cell granularity. - PR #333 performance review - docs/performance-scenarios.md documents 5-7 realistic practitioner workflows; benchmark harness extended. Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339) continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J. CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default CI after false-positive flakes; perf-review harness is the principled replacement. Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released: 2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers; the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
U[g]into per-(g, t)-cell attributions and expands to observation level aspsi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i}).NotImplementedErrorgates for out-of-scope combinations:heterogeneity=with within-group-varying PSU/strata (follow-up PR will extend the WLSpsi_obspath), andn_bootstrap > 0with within-group-varying PSU (follow-up PR will extend the PSU-level wild bootstrap map).tests/test_dcdh_cell_period_coverage.py,@pytest.mark.slow) on a DGP with PSU varying across the cells of each group. Empirical 95% coverage hits 0.956 (±2.5pp band).U_per_period.sum(axis=1) == U, post-period attribution placement, and cohort-centering preservation.TestSurveyWithinGroupValidationrejection tests into acceptance tests under the cell allocator; adds positive-rule tests for within-cell strata / PSU variation rejection and gate tests for the two NotImplementedError combos.docs/methodology/REGISTRY.md: post-period attribution is framed as a library convention (adopted because it preserves the group/PSU-sum identities of the prior group-level expansion and produces approximately nominal MC coverage on the test DGP) rather than a theorem derived from observation-level survey linearization. A TODO.md Methodology/Correctness entry tracks the open validation question.U_centered[g], so PSU-level Binder aggregation is byte-identical to the prior group-level expansion up to single-ULP floating-point noise. All existing survey, core, methodology, parity, and HonestDiD tests pass without modification.Methodology references (required if estimator / math changes)
docs/methodology/REGISTRY.mdNote "Survey IF expansion — library convention" and theTODO.mdentry tracking the open derivation question.Validation
tests/test_survey_dcdh.py— 7 new/inverted tests inTestSurveyWithinGroupValidation(acceptance + within-cell rejection + two NotImplementedError gates + row-sum invariant on a hand-computed panel)tests/test_dcdh_cell_period_coverage.py— new file, 1 test,@pytest.mark.slow, 500 MC replications at 95.6% empirical coveragetests/test_methodology_chaisemartin_dhaultfoeuille.py— unpack the new tuple return intest_cohort_recentering_not_grand_meantests/test_dcdh_cell_period_coverage.py); all 333 non-slow tests and the slow coverage test pass.Security / privacy
Generated with Claude Code