Extend dCDH heterogeneity SE to cell-period allocator by igerber · Pull Request #329 · igerber/diff-diff

igerber · 2026-04-19T10:40:31Z

Summary

Lift the NotImplementedError gate for heterogeneity= combined with within-group-varying PSU/strata under survey_design.
Replace the group-level IF expansion ψ_i = ψ_g * (w_i / W_g) with the cell-period post-period single-cell expansion ψ_i = ψ_g * (w_i / W_{g, out_idx}), mirroring the DID_l convention shipped in v3.1.x.
Under PSU=group the per-obs distribution differs but the PSU-level aggregate telescopes to ψ_g, so Binder TSL and Rao-Wu replicate variance are byte-identical; under within-group-varying PSU, mass lands in the post-period PSU.
n_bootstrap > 0 + within-group-varying PSU remains gated (follow-up PR).

Methodology references (required if estimator / math changes)

Method name(s): ChaisemartinDHaultfoeuille._compute_heterogeneity_test (Web Appendix Section 1.5, Lemma 7); cell-period survey IF allocator.
Paper / source link(s): de Chaisemartin & D'Haultfœuille (2020, AER); (2022, NBER WP 29873, Web Appendix §§1.5, 3.7.3). Binder (1983) stratified-PSU variance. Survey IF expansion is a library extension not derived in the source papers (see REGISTRY.md Survey IF expansion Note).
Any intentional deviations from the source (and why): Post-period single-cell attribution for ψ_g is a library convention (documented in REGISTRY.md heterogeneity Note + Survey IF expansion Note). Formal observation-level derivation remains open (tracked in TODO.md). Empty post-period cells with zero weight drop the group's contribution (matches ATT cell allocator's empty-cell convention).

Validation

Tests added/updated:
- tests/test_survey_dcdh.py: flipped gating test (test_heterogeneity_with_varying_psu_succeeds — asserts all 5 inference fields finite); added TestHeterogeneityCellPeriod::test_psu_level_byte_identity_under_psu_equals_group (math-aligned PSU-sum unit test); added test_heterogeneity_auto_inject_with_varying_strata_nest_true_succeeds (newly-unblocked nest=True regime); added test_heterogeneity_multi_horizon_varying_psu_succeeds.
- tests/test_dcdh_heterogeneity_cell_period_coverage.py: slow-tier 500-rep MC null-coverage test on a DGP with within-group-varying PSU; empirical 95% coverage inside [0.925, 0.975].
Existing regression guard: tests/test_survey_dcdh.py::TestSurveyHeterogeneity (PSU=group numerics) passes byte-for-byte.
Backtest / simulation / notebook evidence (if applicable): MC coverage test validates the allocator empirically on the newly-unblocked regime. No tutorial updates.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Lifts the NotImplementedError gate for heterogeneity= + within-group- varying PSU/strata under survey designs. The heterogeneity WLS coefficient IF psi_g is now attributed in full to the (g, out_idx) post-period cell and expanded to observation level as psi_i = psi_g * (w_i / W_{g, out_idx}) — the DID_l single-cell convention shipped in PR #323. Under PSU=group the per-obs distribution differs from the legacy psi_i = psi_g * (w_i / W_g) expansion, but the PSU-level aggregate telescopes to psi_g in both paths, so Binder TSL variance and Rao-Wu replicate variance are byte-identical under PSU=group; under within-group-varying PSU, mass lands in the post-period PSU of the transition. Tests: flipped the gating test to assert all five inference fields finite; added PSU-level byte-identity unit test constructing both psi_obs arrays and asserting compute_survey_if_variance agreement within ULP; added nest=True + varying-strata + heterogeneity smoke test (newly-unblocked regime); added multi-horizon smoke test; added slow-tier MC null-coverage test (500 reps, within-group- varying PSU, empirical 95% coverage inside [0.925, 0.975]). n_bootstrap > 0 + within-group-varying PSU remains gated (follow-up PR). Updated REGISTRY.md heterogeneity Note + Survey IF expansion Note scope-limitations paragraph; updated _compute_heterogeneity_test docstring + the stale legacy-allocator comment in _survey_se_from_group_if; added CHANGELOG Changed entry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-19T10:47:24Z

Overall Assessment

⚠️ Needs changes — one unmitigated P1 in the replicate-weight heterogeneity SE path.

Executive Summary

The affected method is ChaisemartinDHaultfoeuille._compute_heterogeneity_test (Web Appendix §1.5 / Lemma 7), specifically its survey-design variance path.
The TSL/Binder cell-period allocator itself is documented in REGISTRY.md, and the lack of a formal observation-level derivation is already tracked in TODO.md; I am not counting that documented library extension as a defect.
P1: the PR also applies the new post-period psi_obs allocator to replicate-weight variance, but compute_replicate_if_variance() is observation-level, not PSU-level. That breaks the PR’s claimed “byte-identical under PSU=group” property and makes the replicate branch an undocumented methodology change.
The changed tests validate analytical TSL and Binder byte-identity, but they do not exercise the newly-unblocked replicate_weights= + within-group-varying PSU heterogeneity path.
Minor internal docstrings/comments still describe heterogeneity + within-group-varying PSU as gated.

Methodology

P1 Affected method: ChaisemartinDHaultfoeuille._compute_heterogeneity_test survey heterogeneity SE path in diff_diff/chaisemartin_dhaultfoeuille.py:L3931-L3972, with the replicate helper in diff_diff/survey.py:L1707-L1742. Impact: the new code assumes that the legacy group allocator and new post-period cell allocator are identical for Rao-Wu replicate variance under psu=group, and the same claim appears in docs/methodology/REGISTRY.md:L618, docs/methodology/REGISTRY.md:L652, and CHANGELOG.md:L12. That is not true for the shipped helper: compute_replicate_if_variance() forms each replicate as theta_r = sum_i ratio_ir * psi_i, so moving psi_g from all rows in a group to only the post-period cell changes theta_r whenever replicate ratios vary within group. A minimal counterexample from the implemented formula is old psi=[0.5,0.5], new psi=[0,1], replicate ratios [0.5,1.5] / [1.5,0.5], which gives replicate totals [1,1] vs [1.5,0.5]. This silently changes heterogeneity SE/p-value/CI for previously supported replicate-weight fits and is an undocumented methodology deviation, not a mere implementation choice. Concrete fix: either keep the new allocator out of the replicate branch until a replicate-specific derivation/validation exists, or explicitly document the behavioral change and change the implementation/test suite to match that contract; if byte-identity is the intended contract, the replicate path must aggregate at the group/PSU level before applying Rao-Wu reweighting.

Code Quality

No findings.

Performance

No findings.

Maintainability

P3 diff_diff/chaisemartin_dhaultfoeuille.py:L5431-L5438 still says “heterogeneity and n_bootstrap > 0 currently require within-group constancy,” and tests/test_survey_dcdh.py:L1074-L1080 still says heterogeneity + within-group-varying PSU raises NotImplementedError. Impact: the internal contract description is now stale and will mislead the next maintainer who touches the gating logic. Concrete fix: update both docstrings/comments so they say only n_bootstrap > 0 + within-group-varying PSU remains gated.

Tech Debt

P3 Informational only: the lack of a formal observation-level derivation for the post-period cell allocator is already tracked in TODO.md:L60 and documented in docs/methodology/REGISTRY.md:L652. No action required for this PR unless you decide to widen the TODO to cover the replicate-weight branch explicitly.

Security

No findings.

Documentation/Tests

P2 The changed tests in tests/test_survey_dcdh.py:L1141-L1278 and tests/test_dcdh_heterogeneity_cell_period_coverage.py:L91-L150 cover only the analytical TSL path and Binder byte-identity. They do not cover the newly-unblocked replicate-weight heterogeneity path where the P1 lives. Impact: the current suite would not catch the replicate non-invariance described above. Concrete fix: add a targeted regression with replicate_weights= that vary within group and heterogeneity= under psu=group or within-group-varying PSU, and assert the intended documented behavior.
Validation note: this review was source-based; I could not run pytest here because the sandbox Python lacks the project’s NumPy/Pandas dependencies.

Path to Approval

Decide the contract for replicate-weight heterogeneity under the new allocator. Either re-gate/retain the legacy group-level expansion for replicate designs for now, or explicitly register the new replicate behavior as a methodology deviation instead of claiming byte-identity under psu=group.
Add a regression test for heterogeneity= + replicate_weights= where replicate ratios vary within group. A helper-level test against compute_replicate_if_variance() is acceptable; an end-to-end fit() test is better.
Update the stale internal docstrings/comments so the codebase has one consistent statement of what is and is not still gated.

The first revision routed BOTH Binder TSL and Rao-Wu replicate variance through the new cell-period psi_obs allocator and claimed byte-identity under PSU=group. That claim only holds for Binder TSL, which aggregates psi to PSU before forming the variance quadratic. Rao-Wu (via compute_replicate_if_variance) computes theta_r = sum_i ratio_ir * psi_i at the observation level, so when a replicate column's ratios vary within a group (common — the library accepts per-row replicate matrices, not just PSU-aligned ones), moving psi_g mass from the full group onto the single post-period cell silently changes theta_r. Counterexample: psi_legacy=[0.5, 0.5], psi_new=[0, 1], ratios=[0.5, 1.5] gives theta_legacy=1, theta_new=1.5. Fix: split the observation-level allocator inside _compute_heterogeneity_test by variance helper. Binder TSL keeps the cell-period single-cell allocator (the DID_l post-period convention, needed for within-group-varying PSU). Rao-Wu replicate reverts to the legacy group-level allocator psi_i = psi_g * (w_i / W_g), preserving byte-identity of the replicate SE for every previously-supported fit. Replicate + within-group-varying PSU is unreachable by construction (SurveyDesign rejects replicate_weights combined with explicit strata/psu/fpc). Add a helper-level regression test (test_replicate_variance_non_invariance_under_varying_ratios) that reproduces the reviewer's counterexample: constructs legacy and cell-period psi_obs on a fixture with within-group-varying replicate ratios, pushes both through compute_replicate_if_variance, and asserts the two variances differ materially — so any future refactor that silently reroutes the replicate branch through the cell-period allocator will fail this test. Sync the updated contract across surfaces: _compute_heterogeneity_test docstring now enumerates both allocators and the justification; the REGISTRY.md heterogeneity Note and Survey IF expansion Note describe the split; CHANGELOG frames the new support as "under Binder TSL" rather than universal. Also fix P3 stale docstrings/comments flagged by the CI reviewer: _strata_psu_vary_within_group and TestSurveyWithinGroupValidation no longer claim heterogeneity still requires within-group constancy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-19T10:55:42Z

/ai-review

github-actions · 2026-04-19T11:02:50Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6205b15c7aa941c143a26e0cf57d61a59f5e8b31

Overall Assessment

Looks good — no unmitigated P0 or P1 findings in the reviewed diff.

Executive Summary

Affected method: ChaisemartinDHaultfoeuille._compute_heterogeneity_test survey heterogeneity SE path, plus the fit-time gate for within-group-varying survey structure in diff_diff/chaisemartin_dhaultfoeuille.py:3683 and diff_diff/chaisemartin_dhaultfoeuille.py:826.
Cross-checking against Web Appendix §1.5 / Lemma 7, the paper-level heterogeneity estimand is still the OLS regression of S_g(Y_{g,F_g-1+\ell} - Y_{g,F_g-1}) on X_g plus F_g × D_{g,1} × S_g indicators; the PR changes only the library’s survey-design variance plumbing around that regression, not the core paper estimand. (nber.org)
The prior replicate-variance P1 is resolved: Binder TSL now uses the documented post-period cell allocator, while the Rao-Wu replicate branch intentionally keeps the legacy group-level allocator in diff_diff/chaisemartin_dhaultfoeuille.py:3947, diff_diff/chaisemartin_dhaultfoeuille.py:3977, and diff_diff/chaisemartin_dhaultfoeuille.py:4004.
That Binder-vs-replicate split is now explicitly documented in docs/methodology/REGISTRY.md:618, docs/methodology/REGISTRY.md:652, and CHANGELOG.md:12, so I do not see an undocumented methodology deviation in the changed path.
The changed heterogeneity inference still goes through safe_inference() after converting nonpositive/invalid variance to NaN, so I do not see a new partial-NaN or inline-inference anti-pattern in the modified code path at diff_diff/chaisemartin_dhaultfoeuille.py:4024.
Test coverage now includes end-to-end varying-PSU heterogeneity fits, a helper regression for replicate non-invariance, and a slow MC coverage check in tests/test_survey_dcdh.py:1142, tests/test_survey_dcdh.py:1705, and tests/test_dcdh_heterogeneity_cell_period_coverage.py:93. I could not execute them here because pytest is not installed in this environment.

Methodology

Severity: P3 informational. Impact: Web Appendix §1.5 defines the heterogeneity target and estimator as the coefficient on X_g in an OLS regression of S_g(Y_{g,F_g-1+\ell} - Y_{g,F_g-1}) on X_g and saturated F_g × D_{g,1} × S_g indicators, with Lemma 7 proving unbiasedness under Assumptions 1, 3, and 15. The PR preserves that paper-level object and limits its change to the survey-design IF allocation/variance layer described in diff_diff/chaisemartin_dhaultfoeuille.py:3683 and docs/methodology/REGISTRY.md:618. Concrete fix: None. (nber.org)
Severity: P3 informational. Impact: The previous undocumented replicate-variance mismatch is addressed: the code now explicitly keeps the legacy group allocator on the replicate branch, matching the updated registry note and eliminating the prior silent Rao-Wu behavior change. Concrete fix: None.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 informational. Impact: The open question of a formal observation-level derivation for the survey cell-period allocator remains tracked in TODO.md:60 and documented in docs/methodology/REGISTRY.md:652. Per the review policy, that tracked limitation is mitigated and not a blocker. Concrete fix: None.

Security

No findings.

Documentation/Tests

Severity: P3 informational. Impact: The PR adds targeted tests for the newly-unblocked Binder path and for the replicate-path counterexample that motivated the prior re-review, including tests/test_survey_dcdh.py:1228, tests/test_survey_dcdh.py:1614, and tests/test_dcdh_heterogeneity_cell_period_coverage.py:93. I could not run them here because pytest is unavailable. Concrete fix: None.

Round-6's constant-PSU fallback in `_survey_se_from_group_if` silently disabled the cell-period allocator for replicate-weight designs, because replicate designs have `resolved.psu is None` and the fallback routes `psu_arr is None` to the legacy group-level path. That's an allocator change on a Class A surface (overall DID_M, joiners, leavers, dynamic placebos) under per-row-varying replicate ratios, where cell and legacy allocators produce materially different Rao-Wu variances (same non-invariance PR #329 established for heterogeneity). Fix: restrict the round-6 fallback to the TSL branch by gating on `not resolved.uses_replicate_variance`. Replicate designs retain the cell-period allocator (the PR #323 Class A contract), and the sentinel-mass guard still fires on mass leakage when it applies. Regression: `TestReplicateClassA::test_att_cell_allocator_with_varying_replicate_ratios` constructs legacy and cell-level psi_obs on the fitted replicate design, feeds both through `compute_replicate_if_variance`, and asserts they produce materially different variances — locking the allocator contract so a future refactor that switches Class A to the legacy allocator would be detectable. Mirrors the heterogeneity non-invariance guard from PR #329's CI review round. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-failures audit Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per project SemVer convention, minor bumps are reserved for new estimators or new module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults (PR #318) add a new public API surface and drive this bump. Headline work: - PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner- ready output layer. Plain-English narrative summaries across all 16 result types, with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md. - PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the upcoming HeterogeneousAdoptionDiD estimator. - PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A contract), heterogeneity + within-group-varying PSU under Binder TSL, and PSU-level Hall-Mammen wild bootstrap at cell granularity. - PR #333 performance review - docs/performance-scenarios.md documents 5-7 realistic practitioner workflows; benchmark harness extended. Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339) continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J. CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default CI after false-positive flakes; perf-review harness is the principled replacement. Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released: 2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers; the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber added the ready-for-ci Triggers CI test workflows label Apr 19, 2026

igerber merged commit ac181b7 into main Apr 19, 2026
23 of 24 checks passed

igerber deleted the feature/dcdh-heterogeneity-cell-period branch April 19, 2026 12:37

igerber mentioned this pull request Apr 19, 2026

Extend dCDH PSU-level wild bootstrap to cell granularity #332

Merged

igerber mentioned this pull request Apr 20, 2026

Release 3.2.0: BusinessReport preview, dCDH survey completion, silent-failures audit #342

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extend dCDH heterogeneity SE to cell-period allocator#329

Extend dCDH heterogeneity SE to cell-period allocator#329
igerber merged 2 commits intomainfrom
feature/dcdh-heterogeneity-cell-period

igerber commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

igerber commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Apr 19, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

igerber commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant