Skip to content

Extend dCDH heterogeneity SE to cell-period allocator#329

Merged
igerber merged 2 commits intomainfrom
feature/dcdh-heterogeneity-cell-period
Apr 19, 2026
Merged

Extend dCDH heterogeneity SE to cell-period allocator#329
igerber merged 2 commits intomainfrom
feature/dcdh-heterogeneity-cell-period

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 19, 2026

Summary

  • Lift the NotImplementedError gate for heterogeneity= combined with within-group-varying PSU/strata under survey_design.
  • Replace the group-level IF expansion ψ_i = ψ_g * (w_i / W_g) with the cell-period post-period single-cell expansion ψ_i = ψ_g * (w_i / W_{g, out_idx}), mirroring the DID_l convention shipped in v3.1.x.
  • Under PSU=group the per-obs distribution differs but the PSU-level aggregate telescopes to ψ_g, so Binder TSL and Rao-Wu replicate variance are byte-identical; under within-group-varying PSU, mass lands in the post-period PSU.
  • n_bootstrap > 0 + within-group-varying PSU remains gated (follow-up PR).

Methodology references (required if estimator / math changes)

  • Method name(s): ChaisemartinDHaultfoeuille._compute_heterogeneity_test (Web Appendix Section 1.5, Lemma 7); cell-period survey IF allocator.
  • Paper / source link(s): de Chaisemartin & D'Haultfœuille (2020, AER); (2022, NBER WP 29873, Web Appendix §§1.5, 3.7.3). Binder (1983) stratified-PSU variance. Survey IF expansion is a library extension not derived in the source papers (see REGISTRY.md Survey IF expansion Note).
  • Any intentional deviations from the source (and why): Post-period single-cell attribution for ψ_g is a library convention (documented in REGISTRY.md heterogeneity Note + Survey IF expansion Note). Formal observation-level derivation remains open (tracked in TODO.md). Empty post-period cells with zero weight drop the group's contribution (matches ATT cell allocator's empty-cell convention).

Validation

  • Tests added/updated:
    • tests/test_survey_dcdh.py: flipped gating test (test_heterogeneity_with_varying_psu_succeeds — asserts all 5 inference fields finite); added TestHeterogeneityCellPeriod::test_psu_level_byte_identity_under_psu_equals_group (math-aligned PSU-sum unit test); added test_heterogeneity_auto_inject_with_varying_strata_nest_true_succeeds (newly-unblocked nest=True regime); added test_heterogeneity_multi_horizon_varying_psu_succeeds.
    • tests/test_dcdh_heterogeneity_cell_period_coverage.py: slow-tier 500-rep MC null-coverage test on a DGP with within-group-varying PSU; empirical 95% coverage inside [0.925, 0.975].
  • Existing regression guard: tests/test_survey_dcdh.py::TestSurveyHeterogeneity (PSU=group numerics) passes byte-for-byte.
  • Backtest / simulation / notebook evidence (if applicable): MC coverage test validates the allocator empirically on the newly-unblocked regime. No tutorial updates.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Lifts the NotImplementedError gate for heterogeneity= + within-group-
varying PSU/strata under survey designs. The heterogeneity WLS
coefficient IF psi_g is now attributed in full to the (g, out_idx)
post-period cell and expanded to observation level as
psi_i = psi_g * (w_i / W_{g, out_idx}) — the DID_l single-cell
convention shipped in PR #323. Under PSU=group the per-obs
distribution differs from the legacy psi_i = psi_g * (w_i / W_g)
expansion, but the PSU-level aggregate telescopes to psi_g in both
paths, so Binder TSL variance and Rao-Wu replicate variance are
byte-identical under PSU=group; under within-group-varying PSU,
mass lands in the post-period PSU of the transition.

Tests: flipped the gating test to assert all five inference fields
finite; added PSU-level byte-identity unit test constructing both
psi_obs arrays and asserting compute_survey_if_variance agreement
within ULP; added nest=True + varying-strata + heterogeneity
smoke test (newly-unblocked regime); added multi-horizon smoke test;
added slow-tier MC null-coverage test (500 reps, within-group-
varying PSU, empirical 95% coverage inside [0.925, 0.975]).

n_bootstrap > 0 + within-group-varying PSU remains gated (follow-up
PR). Updated REGISTRY.md heterogeneity Note + Survey IF expansion
Note scope-limitations paragraph; updated _compute_heterogeneity_test
docstring + the stale legacy-allocator comment in
_survey_se_from_group_if; added CHANGELOG Changed entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

⚠️ Needs changes — one unmitigated P1 in the replicate-weight heterogeneity SE path.

Executive Summary

  • The affected method is ChaisemartinDHaultfoeuille._compute_heterogeneity_test (Web Appendix §1.5 / Lemma 7), specifically its survey-design variance path.
  • The TSL/Binder cell-period allocator itself is documented in REGISTRY.md, and the lack of a formal observation-level derivation is already tracked in TODO.md; I am not counting that documented library extension as a defect.
  • P1: the PR also applies the new post-period psi_obs allocator to replicate-weight variance, but compute_replicate_if_variance() is observation-level, not PSU-level. That breaks the PR’s claimed “byte-identical under PSU=group” property and makes the replicate branch an undocumented methodology change.
  • The changed tests validate analytical TSL and Binder byte-identity, but they do not exercise the newly-unblocked replicate_weights= + within-group-varying PSU heterogeneity path.
  • Minor internal docstrings/comments still describe heterogeneity + within-group-varying PSU as gated.

Methodology

  • P1 Affected method: ChaisemartinDHaultfoeuille._compute_heterogeneity_test survey heterogeneity SE path in diff_diff/chaisemartin_dhaultfoeuille.py:L3931-L3972, with the replicate helper in diff_diff/survey.py:L1707-L1742. Impact: the new code assumes that the legacy group allocator and new post-period cell allocator are identical for Rao-Wu replicate variance under psu=group, and the same claim appears in docs/methodology/REGISTRY.md:L618, docs/methodology/REGISTRY.md:L652, and CHANGELOG.md:L12. That is not true for the shipped helper: compute_replicate_if_variance() forms each replicate as theta_r = sum_i ratio_ir * psi_i, so moving psi_g from all rows in a group to only the post-period cell changes theta_r whenever replicate ratios vary within group. A minimal counterexample from the implemented formula is old psi=[0.5,0.5], new psi=[0,1], replicate ratios [0.5,1.5] / [1.5,0.5], which gives replicate totals [1,1] vs [1.5,0.5]. This silently changes heterogeneity SE/p-value/CI for previously supported replicate-weight fits and is an undocumented methodology deviation, not a mere implementation choice. Concrete fix: either keep the new allocator out of the replicate branch until a replicate-specific derivation/validation exists, or explicitly document the behavioral change and change the implementation/test suite to match that contract; if byte-identity is the intended contract, the replicate path must aggregate at the group/PSU level before applying Rao-Wu reweighting.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • P3 diff_diff/chaisemartin_dhaultfoeuille.py:L5431-L5438 still says “heterogeneity and n_bootstrap > 0 currently require within-group constancy,” and tests/test_survey_dcdh.py:L1074-L1080 still says heterogeneity + within-group-varying PSU raises NotImplementedError. Impact: the internal contract description is now stale and will mislead the next maintainer who touches the gating logic. Concrete fix: update both docstrings/comments so they say only n_bootstrap > 0 + within-group-varying PSU remains gated.

Tech Debt

  • P3 Informational only: the lack of a formal observation-level derivation for the post-period cell allocator is already tracked in TODO.md:L60 and documented in docs/methodology/REGISTRY.md:L652. No action required for this PR unless you decide to widen the TODO to cover the replicate-weight branch explicitly.

Security

  • No findings.

Documentation/Tests

  • P2 The changed tests in tests/test_survey_dcdh.py:L1141-L1278 and tests/test_dcdh_heterogeneity_cell_period_coverage.py:L91-L150 cover only the analytical TSL path and Binder byte-identity. They do not cover the newly-unblocked replicate-weight heterogeneity path where the P1 lives. Impact: the current suite would not catch the replicate non-invariance described above. Concrete fix: add a targeted regression with replicate_weights= that vary within group and heterogeneity= under psu=group or within-group-varying PSU, and assert the intended documented behavior.
  • Validation note: this review was source-based; I could not run pytest here because the sandbox Python lacks the project’s NumPy/Pandas dependencies.

Path to Approval

  1. Decide the contract for replicate-weight heterogeneity under the new allocator. Either re-gate/retain the legacy group-level expansion for replicate designs for now, or explicitly register the new replicate behavior as a methodology deviation instead of claiming byte-identity under psu=group.
  2. Add a regression test for heterogeneity= + replicate_weights= where replicate ratios vary within group. A helper-level test against compute_replicate_if_variance() is acceptable; an end-to-end fit() test is better.
  3. Update the stale internal docstrings/comments so the codebase has one consistent statement of what is and is not still gated.

The first revision routed BOTH Binder TSL and Rao-Wu replicate variance
through the new cell-period psi_obs allocator and claimed byte-identity
under PSU=group. That claim only holds for Binder TSL, which aggregates
psi to PSU before forming the variance quadratic. Rao-Wu (via
compute_replicate_if_variance) computes

    theta_r = sum_i ratio_ir * psi_i

at the observation level, so when a replicate column's ratios vary
within a group (common — the library accepts per-row replicate matrices,
not just PSU-aligned ones), moving psi_g mass from the full group onto
the single post-period cell silently changes theta_r. Counterexample:
psi_legacy=[0.5, 0.5], psi_new=[0, 1], ratios=[0.5, 1.5] gives
theta_legacy=1, theta_new=1.5.

Fix: split the observation-level allocator inside
_compute_heterogeneity_test by variance helper. Binder TSL keeps the
cell-period single-cell allocator (the DID_l post-period convention,
needed for within-group-varying PSU). Rao-Wu replicate reverts to the
legacy group-level allocator psi_i = psi_g * (w_i / W_g), preserving
byte-identity of the replicate SE for every previously-supported fit.
Replicate + within-group-varying PSU is unreachable by construction
(SurveyDesign rejects replicate_weights combined with explicit
strata/psu/fpc).

Add a helper-level regression test
(test_replicate_variance_non_invariance_under_varying_ratios) that
reproduces the reviewer's counterexample: constructs legacy and
cell-period psi_obs on a fixture with within-group-varying replicate
ratios, pushes both through compute_replicate_if_variance, and asserts
the two variances differ materially — so any future refactor that
silently reroutes the replicate branch through the cell-period allocator
will fail this test.

Sync the updated contract across surfaces: _compute_heterogeneity_test
docstring now enumerates both allocators and the justification; the
REGISTRY.md heterogeneity Note and Survey IF expansion Note describe
the split; CHANGELOG frames the new support as "under Binder TSL"
rather than universal.

Also fix P3 stale docstrings/comments flagged by the CI reviewer:
_strata_psu_vary_within_group and TestSurveyWithinGroupValidation no
longer claim heterogeneity still requires within-group constancy.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 19, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 6205b15c7aa941c143a26e0cf57d61a59f5e8b31


Overall Assessment

Looks good — no unmitigated P0 or P1 findings in the reviewed diff.

Executive Summary

Methodology

  • Severity: P3 informational. Impact: Web Appendix §1.5 defines the heterogeneity target and estimator as the coefficient on X_g in an OLS regression of S_g(Y_{g,F_g-1+\ell} - Y_{g,F_g-1}) on X_g and saturated F_g × D_{g,1} × S_g indicators, with Lemma 7 proving unbiasedness under Assumptions 1, 3, and 15. The PR preserves that paper-level object and limits its change to the survey-design IF allocation/variance layer described in diff_diff/chaisemartin_dhaultfoeuille.py:3683 and docs/methodology/REGISTRY.md:618. Concrete fix: None. (nber.org)
  • Severity: P3 informational. Impact: The previous undocumented replicate-variance mismatch is addressed: the code now explicitly keeps the legacy group allocator on the replicate branch, matching the updated registry note and eliminating the prior silent Rao-Wu behavior change. Concrete fix: None.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3 informational. Impact: The open question of a formal observation-level derivation for the survey cell-period allocator remains tracked in TODO.md:60 and documented in docs/methodology/REGISTRY.md:652. Per the review policy, that tracked limitation is mitigated and not a blocker. Concrete fix: None.

Security

  • No findings.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 19, 2026
@igerber igerber merged commit ac181b7 into main Apr 19, 2026
23 of 24 checks passed
@igerber igerber deleted the feature/dcdh-heterogeneity-cell-period branch April 19, 2026 12:37
igerber added a commit that referenced this pull request Apr 19, 2026
Round-6's constant-PSU fallback in `_survey_se_from_group_if`
silently disabled the cell-period allocator for replicate-weight
designs, because replicate designs have `resolved.psu is None` and
the fallback routes `psu_arr is None` to the legacy group-level
path. That's an allocator change on a Class A surface (overall
DID_M, joiners, leavers, dynamic placebos) under per-row-varying
replicate ratios, where cell and legacy allocators produce
materially different Rao-Wu variances (same non-invariance PR #329
established for heterogeneity).

Fix: restrict the round-6 fallback to the TSL branch by gating on
`not resolved.uses_replicate_variance`. Replicate designs retain
the cell-period allocator (the PR #323 Class A contract), and the
sentinel-mass guard still fires on mass leakage when it applies.

Regression: `TestReplicateClassA::test_att_cell_allocator_with_varying_replicate_ratios`
constructs legacy and cell-level psi_obs on the fitted replicate
design, feeds both through `compute_replicate_if_variance`, and
asserts they produce materially different variances — locking the
allocator contract so a future refactor that switches Class A to
the legacy allocator would be detectable. Mirrors the heterogeneity
non-invariance guard from PR #329's CI review round.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 20, 2026
…-failures audit

Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per
project SemVer convention, minor bumps are reserved for new estimators or new
module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults
(PR #318) add a new public API surface and drive this bump.

Headline work:
- PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner-
  ready output layer. Plain-English narrative summaries across all 16 result types,
  with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md.
- PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear
  regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the
  upcoming HeterogeneousAdoptionDiD estimator.
- PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A
  contract), heterogeneity + within-group-varying PSU under Binder TSL, and
  PSU-level Hall-Mammen wild bootstrap at cell granularity.
- PR #333 performance review - docs/performance-scenarios.md documents 5-7
  realistic practitioner workflows; benchmark harness extended.

Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339)
continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J.

CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default
CI after false-positive flakes; perf-review harness is the principled replacement.

Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml,
diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released:
2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the
comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers;
the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in
a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant