Extend dCDH heterogeneity SE to cell-period allocator #329
Merged
Lifts the NotImplementedError gate for heterogeneity= + within-group-
varying PSU/strata under survey designs. The heterogeneity WLS
coefficient IF psi_g is now attributed in full to the (g, out_idx)
post-period cell and expanded to observation level as
psi_i = psi_g * (w_i / W_{g, out_idx}) — the DID_l single-cell
convention shipped in PR #323. Under PSU=group the per-obs
distribution differs from the legacy psi_i = psi_g * (w_i / W_g)
expansion, but the PSU-level aggregate telescopes to psi_g in both
paths, so Binder TSL variance and Rao-Wu replicate variance are
byte-identical under PSU=group; under within-group-varying PSU,
mass lands in the post-period PSU of the transition.
Tests: flipped the gating test to assert all five inference fields
finite; added PSU-level byte-identity unit test constructing both
psi_obs arrays and asserting compute_survey_if_variance agreement
within ULP; added nest=True + varying-strata + heterogeneity
smoke test (newly-unblocked regime); added multi-horizon smoke test;
added slow-tier MC null-coverage test (500 reps, within-group-
varying PSU, empirical 95% coverage inside [0.925, 0.975]).
n_bootstrap > 0 + within-group-varying PSU remains gated (follow-up
PR). Updated REGISTRY.md heterogeneity Note + Survey IF expansion
Note scope-limitations paragraph; updated _compute_heterogeneity_test
docstring + the stale legacy-allocator comment in
_survey_se_from_group_if; added CHANGELOG Changed entry.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
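The two expansions described in this commit message can be compared on a toy group. This is a minimal sketch, not the library's code: the arrays, the `psu_totals` helper, and the two-period layout are all hypothetical, and only the two formulas `psi_g * (w_i / W_g)` and `psi_g * (w_i / W_{g, out_idx})` come from the text above.

```python
import numpy as np

# Hypothetical toy data: one group g observed in two periods (cells).
w = np.array([1.0, 3.0, 2.0, 2.0])   # observation weights w_i
period = np.array([0, 0, 1, 1])      # cell index of each row within the group
psi_g = 0.8                          # group-level influence-function value
out_idx = 1                          # post-period cell of the transition

# Legacy allocator: spread psi_g over every row of the group.
psi_legacy = psi_g * w / w.sum()

# Cell-period allocator: put all of psi_g on the (g, out_idx) cell.
in_cell = period == out_idx
psi_cell = np.where(in_cell, psi_g * w / w[in_cell].sum(), 0.0)


def psu_totals(psi, psu_ids):
    """Sum observation-level psi within each PSU (illustrative helper)."""
    _, inverse = np.unique(psu_ids, return_inverse=True)
    return np.bincount(inverse, weights=psi)


# PSU = group: both allocators give the same single PSU total, psi_g.
psu_group = np.array([7, 7, 7, 7])
totals_legacy_group = psu_totals(psi_legacy, psu_group)
totals_cell_group = psu_totals(psi_cell, psu_group)

# Within-group-varying PSU: the cell-period allocator puts the whole
# mass in the post-period PSU, the legacy allocator splits it.
psu_varying = np.array([7, 7, 9, 9])
totals_legacy_varying = psu_totals(psi_legacy, psu_varying)
totals_cell_varying = psu_totals(psi_cell, psu_varying)
```

Because a variance estimator that first aggregates psi to the PSU level only sees these totals, the PSU=group case cannot distinguish the two allocators, while the varying-PSU case can.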
The first revision routed BOTH Binder TSL and Rao-Wu replicate variance
through the new cell-period psi_obs allocator and claimed byte-identity
under PSU=group. That claim only holds for Binder TSL, which aggregates
psi to PSU before forming the variance quadratic. Rao-Wu (via
compute_replicate_if_variance) computes
theta_r = sum_i ratio_ir * psi_i
at the observation level, so when a replicate column's ratios vary
within a group (common — the library accepts per-row replicate matrices,
not just PSU-aligned ones), moving psi_g mass from the full group onto
the single post-period cell silently changes theta_r. Counterexample:
psi_legacy=[0.5, 0.5], psi_new=[0, 1], ratios=[0.5, 1.5] gives
theta_legacy=1, theta_new=1.5.
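The counterexample can be checked directly. The numbers below are the ones quoted in the commit message; the variable names are illustrative only.

```python
import numpy as np

psi_legacy = np.array([0.5, 0.5])   # psi_g = 1 spread over the whole group
psi_new = np.array([0.0, 1.0])      # psi_g = 1 placed on the post-period cell
ratios = np.array([0.5, 1.5])       # one replicate column, varying within the group

# theta_r = sum_i ratio_ir * psi_i, evaluated at the observation level.
theta_legacy = float(ratios @ psi_legacy)   # 0.5*0.5 + 1.5*0.5
theta_new = float(ratios @ psi_new)         # 0.5*0.0 + 1.5*1.0

# The group totals agree, so a PSU-aggregating estimator cannot see the change.
same_group_total = bool(np.isclose(psi_legacy.sum(), psi_new.sum()))
```

The group total is 1 under both allocators, yet `theta_legacy` is 1.0 and `theta_new` is 1.5, which is the non-invariance the replicate path has to avoid.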
Fix: split the observation-level allocator inside
_compute_heterogeneity_test by variance helper. Binder TSL keeps the
cell-period single-cell allocator (the DID_l post-period convention,
needed for within-group-varying PSU). Rao-Wu replicate reverts to the
legacy group-level allocator psi_i = psi_g * (w_i / W_g), preserving
byte-identity of the replicate SE for every previously-supported fit.
Replicate + within-group-varying PSU is unreachable by construction
(SurveyDesign rejects replicate_weights combined with explicit
strata/psu/fpc).
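A rough sketch of the split, for one group at a time. The function name `expand_psi_to_obs` and its arguments are hypothetical and do not reflect the actual structure of `_compute_heterogeneity_test`; the sketch assumes strictly positive total group weight and borrows the empty-cell convention (zero-weight post-period cell drops the group's contribution) stated in the PR description.

```python
import numpy as np


def expand_psi_to_obs(psi_g, w, in_post_cell, uses_replicate_variance):
    """Expand one group's psi_g to observation level (illustrative only)."""
    w = np.asarray(w, dtype=float)
    in_post_cell = np.asarray(in_post_cell, dtype=bool)

    if uses_replicate_variance:
        # Rao-Wu replicate branch: legacy group-level allocator,
        # psi_i = psi_g * (w_i / W_g).
        return psi_g * w / w.sum()

    # Binder TSL branch: cell-period single-cell allocator,
    # psi_i = psi_g * (w_i / W_{g, out_idx}) on the post-period cell only.
    cell_weight = w[in_post_cell].sum()
    if cell_weight == 0.0:
        # Empty post-period cell: the group's contribution is dropped.
        return np.zeros_like(w)
    return np.where(in_post_cell, psi_g * w / cell_weight, 0.0)


w = [1.0, 3.0, 2.0, 2.0]
post = [False, False, True, True]
psi_tsl = expand_psi_to_obs(0.8, w, post, uses_replicate_variance=False)
psi_rep = expand_psi_to_obs(0.8, w, post, uses_replicate_variance=True)
```

Keying the choice on the variance method rather than on the PSU layout is what keeps the replicate SE unchanged for fits that were already supported.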
Add a helper-level regression test
(test_replicate_variance_non_invariance_under_varying_ratios) that
reproduces the reviewer's counterexample: constructs legacy and
cell-period psi_obs on a fixture with within-group-varying replicate
ratios, pushes both through compute_replicate_if_variance, and asserts
the two variances differ materially — so any future refactor that
silently reroutes the replicate branch through the cell-period allocator
will fail this test.
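The shape of such a guard might look like the sketch below. It does not call the library: `toy_replicate_variance` is a hypothetical stand-in for `compute_replicate_if_variance` (whose real signature and scaling are not shown in this thread), and the fixture is invented for illustration.

```python
import numpy as np


def toy_replicate_variance(psi_obs, ratios):
    """Stand-in replicate variance: mean squared deviation of theta_r.

    theta_r = sum_i ratios[i, r] * psi_obs[i]; this is NOT the library's
    formula, only enough structure to show the allocator dependence.
    """
    theta_r = ratios.T @ psi_obs
    theta_0 = psi_obs.sum()
    return float(np.mean((theta_r - theta_0) ** 2))


# One group, four rows, equal weights, psi_g = 1, post-period cell = rows 2-3.
psi_legacy = np.full(4, 0.25)
psi_cell = np.array([0.0, 0.0, 0.5, 0.5])

# Replicate ratios that vary within the group (rows are observations).
ratios_varying = np.array([
    [0.5, 1.2, 1.0],
    [0.5, 0.8, 1.0],
    [1.5, 1.0, 0.6],
    [1.5, 1.0, 1.4],
])
# Ratios constant within the group (PSU-aligned replicate columns).
ratios_aligned = np.tile([0.5, 1.2, 1.3], (4, 1))

var_legacy = toy_replicate_variance(psi_legacy, ratios_varying)
var_cell = toy_replicate_variance(psi_cell, ratios_varying)
var_legacy_aligned = toy_replicate_variance(psi_legacy, ratios_aligned)
var_cell_aligned = toy_replicate_variance(psi_cell, ratios_aligned)

# The guard: the two allocators must NOT agree under varying ratios.
assert not np.isclose(var_legacy, var_cell)
```

With PSU-aligned ratios the two variances coincide, so the fixture needs within-group variation in the replicate columns for the assertion to have any power.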
Sync the updated contract across surfaces: _compute_heterogeneity_test
docstring now enumerates both allocators and the justification; the
REGISTRY.md heterogeneity Note and Survey IF expansion Note describe
the split; CHANGELOG frames the new support as "under Binder TSL"
rather than universal.
Also fix P3 stale docstrings/comments flagged by the CI reviewer:
_strata_psu_vary_within_group and TestSurveyWithinGroupValidation no
longer claim heterogeneity still requires within-group constancy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
Looks good: no unmitigated P0 or P1 findings in the reviewed diff.
igerber added a commit that referenced this pull request Apr 19, 2026
Round-6's constant-PSU fallback in `_survey_se_from_group_if` silently disabled the cell-period allocator for replicate-weight designs, because replicate designs have `resolved.psu is None` and the fallback routes `psu_arr is None` to the legacy group-level path. That's an allocator change on a Class A surface (overall DID_M, joiners, leavers, dynamic placebos) under per-row-varying replicate ratios, where cell and legacy allocators produce materially different Rao-Wu variances (same non-invariance PR #329 established for heterogeneity).
Fix: restrict the round-6 fallback to the TSL branch by gating on `not resolved.uses_replicate_variance`. Replicate designs retain the cell-period allocator (the PR #323 Class A contract), and the sentinel-mass guard still fires on mass leakage when it applies.
Regression: `TestReplicateClassA::test_att_cell_allocator_with_varying_replicate_ratios` constructs legacy and cell-level psi_obs on the fitted replicate design, feeds both through `compute_replicate_if_variance`, and asserts they produce materially different variances, locking the allocator contract so a future refactor that switches Class A to the legacy allocator would be detectable. Mirrors the heterogeneity non-invariance guard from PR #329's CI review round.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 20, 2026
…-failures audit
Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per project SemVer convention, minor bumps are reserved for new estimators or new module-level public API; BusinessReport / DiagnosticReport / DiagnosticReportResults (PR #318) add a new public API surface and drive this bump.
Headline work:
- PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner-ready output layer. Plain-English narrative summaries across all 16 result types, with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md.
- PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the upcoming HeterogeneousAdoptionDiD estimator.
- PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A contract), heterogeneity + within-group-varying PSU under Binder TSL, and PSU-level Hall-Mammen wild bootstrap at cell granularity.
- PR #333 performance review - docs/performance-scenarios.md documents 5-7 realistic practitioner workflows; benchmark harness extended.
Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339) continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J.
CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default CI after false-positive flakes; perf-review harness is the principled replacement.
Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released: 2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers; the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in a follow-up.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- Lifts the `NotImplementedError` gate for `heterogeneity=` combined with within-group-varying PSU/strata under `survey_design`.
- Replaces the legacy expansion `ψ_i = ψ_g * (w_i / W_g)` with the cell-period post-period single-cell expansion `ψ_i = ψ_g * (w_i / W_{g, out_idx})`, mirroring the DID_l convention shipped in v3.1.x.
- Under PSU=group, the PSU-level aggregate telescopes to `ψ_g`, so Binder TSL and Rao-Wu replicate variance are byte-identical; under within-group-varying PSU, mass lands in the post-period PSU.
- `n_bootstrap > 0` + within-group-varying PSU remains gated (follow-up PR).
Methodology references (required if estimator / math changes)
- `ChaisemartinDHaultfoeuille._compute_heterogeneity_test` (Web Appendix Section 1.5, Lemma 7); cell-period survey IF allocator.
- … `REGISTRY.md` Survey IF expansion Note).
- … `ψ_g` is a library convention (documented in `REGISTRY.md` heterogeneity Note + Survey IF expansion Note). Formal observation-level derivation remains open (tracked in `TODO.md`). Empty post-period cells with zero weight drop the group's contribution (matches ATT cell allocator's empty-cell convention).
Validation
- `tests/test_survey_dcdh.py`: flipped gating test (`test_heterogeneity_with_varying_psu_succeeds`, which asserts all 5 inference fields finite); added `TestHeterogeneityCellPeriod::test_psu_level_byte_identity_under_psu_equals_group` (math-aligned PSU-sum unit test); added `test_heterogeneity_auto_inject_with_varying_strata_nest_true_succeeds` (newly-unblocked `nest=True` regime); added `test_heterogeneity_multi_horizon_varying_psu_succeeds`.
- `tests/test_dcdh_heterogeneity_cell_period_coverage.py`: slow-tier 500-rep MC null-coverage test on a DGP with within-group-varying PSU; empirical 95% coverage inside [0.925, 0.975].
- `tests/test_survey_dcdh.py::TestSurveyHeterogeneity` (PSU=group numerics) passes byte-for-byte.
Security / privacy