Skip to content

Add cell-period IF allocator for dCDH survey variance#323

Merged
igerber merged 6 commits intomainfrom
feature/dcdh-cell-period-if
Apr 19, 2026
Merged

Add cell-period IF allocator for dCDH survey variance#323
igerber merged 6 commits intomainfrom
feature/dcdh-cell-period-if

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 18, 2026

Summary

  • Extends dCDH analytical survey variance with a cell-period IF allocator that decomposes the per-group influence function U[g] into per-(g, t)-cell attributions and expands to observation level as psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i}).
  • Relaxes the within-group constancy validator to within-cell constancy (trivially satisfied in one-obs-per-cell panels). Designs with strata or PSU that vary across cells of a group are now accepted by the analytical TSL path.
  • Adds two NotImplementedError gates for out-of-scope combinations: heterogeneity= with within-group-varying PSU/strata (follow-up PR will extend the WLS psi_obs path), and n_bootstrap > 0 with within-group-varying PSU (follow-up PR will extend the PSU-level wild bootstrap map).
  • Adds a Monte Carlo coverage simulation (tests/test_dcdh_cell_period_coverage.py, @pytest.mark.slow) on a DGP with PSU varying across the cells of each group. Empirical 95% coverage hits 0.956 (±2.5pp band).
  • Adds a deterministic row-sum invariant test on a hand-computed 4-group × 3-period panel verifying: closed-form joiner/stable_0 attribution, U_per_period.sum(axis=1) == U, post-period attribution placement, and cohort-centering preservation.
  • Inverts TestSurveyWithinGroupValidation rejection tests into acceptance tests under the cell allocator; adds positive-rule tests for within-cell strata / PSU variation rejection and gate tests for the two NotImplementedError combos.
  • Documents the new contract in docs/methodology/REGISTRY.md: post-period attribution is framed as a library convention (adopted because it preserves the group/PSU-sum identities of the prior group-level expansion and produces approximately nominal MC coverage on the test DGP) rather than a theorem derived from observation-level survey linearization. A TODO.md Methodology/Correctness entry tracks the open validation question.
  • Under within-group-constant PSU (the pre-allocator accepted input), per-cell sums telescope to U_centered[g], so PSU-level Binder aggregation is byte-identical to the prior group-level expansion up to single-ULP floating-point noise. All existing survey, core, methodology, parity, and HonestDiD tests pass without modification.

Methodology references (required if estimator / math changes)

  • Method name(s): ChaisemartinDHaultfoeuille (dCDH, reversible-treatment DID)
  • Paper / source link(s):
    • de Chaisemartin & D'Haultfœuille (2020), AER 110(9), https://doi.org/10.1257/aer.20181169
    • de Chaisemartin & D'Haultfœuille (2022, rev. 2024), NBER WP 29873 — Web Appendix Section 3.7.3 (cohort-recentered plug-in variance)
  • Any intentional deviations from the source (and why): The paper's plug-in variance assumes iid sampling; the library's survey IF expansion (proportional per-group / per-cell weight allocation) is not in the papers. The cell-period allocator extends the pre-existing library convention. Post-period attribution is documented as a convention rather than a derived result — see docs/methodology/REGISTRY.md Note "Survey IF expansion — library convention" and the TODO.md entry tracking the open derivation question.

Validation

  • Tests added/updated:
    • tests/test_survey_dcdh.py — 7 new/inverted tests in TestSurveyWithinGroupValidation (acceptance + within-cell rejection + two NotImplementedError gates + row-sum invariant on a hand-computed panel)
    • tests/test_dcdh_cell_period_coverage.py — new file, 1 test, @pytest.mark.slow, 500 MC replications at 95.6% empirical coverage
    • tests/test_methodology_chaisemartin_dhaultfoeuille.py — unpack the new tuple return in test_cohort_recentering_not_grand_mean
  • Backtest / simulation / notebook evidence (if applicable): MC coverage simulation is the primary empirical validation (tests/test_dcdh_cell_period_coverage.py); all 333 non-slow tests and the slow coverage test pass.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 4 commits April 18, 2026 17:48
Plumbs per-(g,t)-cell contributions through the dCDH influence-function
functions without changing behavior. Extends _obs_survey_info with
time_ids and periods, modifies _compute_full_per_group_contributions,
_compute_per_group_if_multi_horizon, and _compute_per_group_if_placebo_horizon
to return per-period attribution matrices alongside the per-group IFs,
adds _cohort_recenter_per_period for column-wise centering, and wires the
2D attributions through _compute_cohort_recentered_inputs into
_compute_se / _survey_se_from_group_if.

The survey expansion now uses the cell-period allocator
psi_i = U_centered_per_period[g_i, t_i] * (w_i / W_{g_i, t_i}) when
per-period attribution is supplied. Under within-group-constant PSU
(the current validator's accepted input), per-cell sums telescope to
U_centered[g], so PSU-level Binder aggregation produces byte-identical
variance (differences only at FP single-ULP level from summation order).

The within-group-constant validator remains in place; it lifts in
Stage 2 of PR 2. Tests in tests/test_survey_dcdh.py,
tests/test_survey_dcdh_replicate_psu.py,
tests/test_chaisemartin_dhaultfoeuille.py, and tests/test_honest_did.py
pass unchanged (318/318).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces _validate_group_constant_strata_psu with a relaxed
_validate_cell_constant_strata_psu (within-cell constancy, trivially
satisfied in one-obs-per-cell panels). Adds a diagnostic helper
_strata_psu_vary_within_group and two NotImplementedError gates at the
validator call site: heterogeneity= and n_bootstrap > 0 both still use
the legacy group-level IF expansion / PSU map and therefore reject
designs whose PSU varies within group (PR 3 and PR 4 extend them).

Updates REGISTRY.md cluster, heterogeneity, survey IF expansion, and
survey+bootstrap notes to describe the cell-period allocator, the
within-cell constancy contract, and the two scope gates. Updates the
class fit() docstring survey_design section to match. Updates stale
"within-group-constant" comments in the bootstrap path since the
constancy is now enforced by the gate rather than the upstream
validator.

Inverts the TestSurveyWithinGroupValidation rejection tests to
acceptance tests under the cell allocator. Adds positive-rule tests
for within-cell strata and PSU variation (rejected) and gate tests for
heterogeneity and bootstrap (raise NotImplementedError).

Adds tests/test_dcdh_cell_period_coverage.py: Monte Carlo coverage
simulation at 500 reps on a panel with PSU varying across cells of
each group (two PSUs per group on even/odd periods). Empirical 95%
coverage hits 0.956 on the seeded draw — inside the ±2.5pp tolerance
(~2 MC-sigma). Marked @pytest.mark.slow and excluded by default.

All 322 tests pass on the affected surface (test_survey_dcdh,
test_survey_dcdh_replicate_psu, test_chaisemartin_dhaultfoeuille,
test_honest_did).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The helper now returns (U, U_per_period). The cohort-recentering
methodology test at test_cohort_recentering_not_grand_mean only needs
the per-group vector U, so destructure and discard the per-period
matrix. Matches the return-shape change landed in Stage 1 of the
cell-period IF allocator work.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a row-sum identity test for the cell-period allocator on a
hand-computed 4-group x 3-period panel, verifying: (1) per-group IF
matches the closed-form joiner/stable_0 decomposition; (2) per-period
attribution sums across time to the per-group IF exactly; (3) post-
period attribution places all mass at t=2 with zeros at earlier
columns; (4) column-wise cohort centering preserves the row-sum
identity.

REGISTRY.md frames the post-period attribution as a library
convention — adopted because it preserves the group/PSU-sum
identities of the prior group-level expansion and produces
approximately nominal MC coverage on the test DGP — rather than a
theorem derived from the observation-level survey linearization.
Inline comments in _compute_full_per_group_contributions,
_compute_per_group_if_multi_horizon, and
_compute_per_group_if_placebo_horizon mirror that framing.

Updates stale local type annotations for multi_horizon_if /
placebo_horizon_if to the Tuple[np.ndarray, np.ndarray] returns.

TODO.md gains a Methodology/Correctness entry tracking the open
validation question (formal derivation against observation-level
survey linearization, or replacement with a covariance-aware
alternative).

All 348 tests pass (slow MC coverage sim included).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

⚠️ Needs changes

Executive Summary

  • The PR affects ChaisemartinDHaultfoeuille survey inference for DID_M, DID_+, DID_-, event-study horizons, and placebo horizons by introducing a cell-period IF allocator.
  • The core methodology deviation from the papers is documented in the registry and tracked in TODO.md, so I am treating that part as informational, not a blocker.
  • There is one unmitigated P1: the newly advertised support for within-group-varying strata is incomplete when survey_design.psu is omitted, because the auto-injected PSU path still fails before the cell allocator runs.
  • There is also a non-blocking performance regression: dense per-period IF tensors are now built on all paths, including non-survey fits that never consume them.
  • Tests add useful invariant and coverage checks, but they do not cover the omitted-psu auto-inject path where the P1 issue appears.

Methodology

  • Severity P1. Impact: the new public contract says that, with no explicit psu, fit() auto-injects psu=<group_col> as a safe default while also supporting strata that vary across cells of a group, but the auto-injected path preserves nest=False, so a group that spans multiple strata still trips the survey resolver’s cross-stratum PSU check before the new allocator can run. See diff_diff/chaisemartin_dhaultfoeuille.py:648, diff_diff/chaisemartin_dhaultfoeuille.py:753, and diff_diff/survey.py:313. Concrete fix: either synthesize a stratum-nested injected PSU when auto-injecting under within-group-varying strata, or narrow the documented contract and raise a targeted pre-resolution error telling users that this case requires explicit psu plus nest=True.
  • Severity P3. Impact: the cell-period/post-period allocation itself is a documented library convention, not a paper-derived survey linearization, and the open derivation question is already tracked. I am not counting that against approval. See docs/methodology/REGISTRY.md:652 and TODO.md:60. Concrete fix: none required for approval.

Code Quality

  • No material findings beyond the contract issue above.

Performance

Maintainability

Tech Debt

  • Severity P3. Impact: the main methodological limitation introduced here is already tracked under the repo’s deferred-work policy, so it is not a blocker. See TODO.md:60. Concrete fix: none required in this PR.

Security

  • No findings.

Documentation/Tests

  • Severity P2. Impact: the new varying-strata acceptance coverage only exercises the explicit psu="group", nest=True path; it does not cover the omitted-psu auto-inject path that the docs now describe as a safe default, which is why the P1 issue is currently unguarded. See tests/test_survey_dcdh.py:1120 and diff_diff/chaisemartin_dhaultfoeuille.py:648. Concrete fix: add a regression test with SurveyDesign(weights="pw", strata="stratum") on a panel where a group spans multiple strata, and assert the intended supported or documented behavior.
  • I could not execute the tests locally in this review environment because the available python is missing numpy.

Path to Approval

  1. Resolve the omitted-psu / varying-strata interaction: either make auto-injection synthesize a stratum-nested PSU when groups span multiple strata, or explicitly narrow the public contract and raise a targeted error before survey resolution.
  2. Add a regression test for SurveyDesign(weights="pw", strata="stratum") with within-group-varying strata and no explicit psu, so the intended behavior is enforced.

Public contract fix: when `survey_design.psu` is omitted, fit()
auto-injects psu=<group_col> with nest=False. If strata vary within
group, the synthesized PSU column reuses group labels across strata
and downstream survey resolution fails on the cross-stratum PSU
uniqueness check. fit() now detects that combination before auto-
inject runs (via the existing _strata_psu_vary_within_group helper)
and raises a targeted ValueError pointing users at the explicit
`psu=<col>, nest=True` path.

Performance fix: the cell-period allocator's per-(g,t) tensor
(U_per_period) was being allocated unconditionally in
_compute_full_per_group_contributions,
_compute_per_group_if_multi_horizon,
_compute_per_group_if_placebo_horizon, and their
_compute_cohort_recentered_inputs aggregator. That added O(G*T)
memory to the main ATT path and O(G*T*L_max) to the event-study /
placebo paths on every fit — including non-survey fits that never
consume the tensor. Adds a `compute_per_period: bool = True` param
through each helper, gates allocation + accumulation behind the
flag, and threads `compute_per_period=(_obs_survey_info is not None)`
from fit(). Downstream consumers now None-guard the per-period
recentering / _compute_se calls so the plug-in SE path runs with no
cell-level work.

Adds `test_auto_inject_with_varying_strata_raises` under
`TestSurveyWithinGroupValidation` covering the targeted error path.
Narrows the REGISTRY.md survey IF expansion Note to state the
nest=True requirement under varying strata. Softens two stale
comments (`_cohort_recenter_per_period` docstring,
`_compute_cohort_recentered_inputs` inline comment) so they match
the convention framing already in REGISTRY.md and TODO.md rather
than claiming methodological correctness.

All 352 tests pass (slow MC coverage sim included).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 18, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: dedc2a6e38ca50aefeaf2411fdddf7d91c7c8938


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The prior review’s main issue is only partially resolved: the PR now turns the default no-psu + varying-strata failure into a targeted error, but the fix overcorrects.
  • Severity P1 [Newly identified]: the new pre-auto-inject guard rejects SurveyDesign(..., strata=..., nest=True) with no explicit psu, even though the auto-inject path already preserves nest=True and SurveyDesign.resolve() can make the synthesized psu=group unique within strata.
  • The cell-period allocator itself remains a documented library convention rather than a paper-derived survey linearization; that is properly noted in the Methodology Registry and tracked in TODO.md, so it is informational only.
  • The earlier non-blocking performance concern looks resolved: dense per-period IF tensors are now only built when survey inference is active.
  • I could not run the tests locally in this environment because numpy is not installed, so this review is diff-based.

Methodology

  • Severity P1 [Newly identified]. Impact: the new auto-inject guard hard-errors on a valid nested-strata design. The comment says the failure mode is specifically the nest=False resolver path, but the implementation raises whenever strata vary within group, without checking nest. That blocks SurveyDesign(weights="pw", strata="stratum", nest=True) even though the auto-injected design preserves nest=True and the resolver explicitly supports making (stratum, psu) unique within strata. Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L729-L759, diff_diff/chaisemartin_dhaultfoeuille.py:L785-L795, diff_diff/survey.py:L299-L303, docs/methodology/REGISTRY.md:L652-L652. Concrete fix: only raise on the no-psu varying-strata path when the synthesized psu=<group> would be resolved with nest=False; allow the nest=True case to proceed and update the docs accordingly.

Code Quality

No new findings.

Performance

No new findings. The previous dense-allocation concern appears addressed by the new compute_per_period=_obs_survey_info is not None gating.

Maintainability

No new findings. The new comments are now aligned with the Registry/TODO wording and no longer overclaim a formal derivation.

Tech Debt

  • Severity P3. Impact: the open derivation question for the cell-period/post-period survey IF allocator is explicitly documented and tracked, so it is not a blocker under the project’s deferred-work policy. Locations: docs/methodology/REGISTRY.md:L652-L653, TODO.md:L59-L60. Concrete fix: none required for approval.

Security

No findings.

Documentation/Tests

  • Severity P2. Impact: the new tests only cover the unsupported default nest=False auto-inject path, so the over-broad rejection of the still-valid nest=True path is unguarded. Location: tests/test_survey_dcdh.py:L1175-L1192. Concrete fix: add a regression test using SurveyDesign(weights="pw", strata="stratum", nest=True) with no explicit psu, and assert it succeeds and matches the explicit psu="group", nest=True path on SE / df_survey.

Local execution note: I could not run the targeted pytest files here because the environment is missing numpy.

Path to Approval

  1. Narrow the new pre-auto-inject ValueError so it only fires for the actually unsupported case: varying strata with no explicit psu and nest=False.
  2. Add a regression test for no-psu, varying-strata, nest=True, and compare it against explicit SurveyDesign(weights="pw", strata="stratum", psu="group", nest=True).
  3. Update the fit() docstring and Methodology Registry note so they distinguish the unsupported nest=False auto-inject path from the valid nest=True auto-inject path.

The previous round's guard fired on any varying-strata + omitted-psu
combination, which rejected `SurveyDesign(weights, strata, nest=True)`
unnecessarily. `SurveyDesign.resolve()` at `diff_diff/survey.py:299-302`
combines `(stratum, psu)` into globally-unique labels under nest=True,
so the auto-injected `psu=<group>` is re-labeled per stratum and the
cross-stratum uniqueness check passes. Only the `nest=False` default
path actually needs the up-front guard.

Narrows the guard to `not getattr(survey_design, "nest", False)` and
updates the error message to enumerate three actionable remediations
(constant-within-group strata, or `nest=True`, or explicit `psu`).

Adds `test_auto_inject_with_varying_strata_nest_true_succeeds` under
`TestSurveyWithinGroupValidation` covering the newly-accepted path:
byte-for-byte match against explicit `SurveyDesign(weights, strata,
psu="group", nest=True)` on `overall_se` and `survey_metadata.df_survey`.
The default `nest=False` still-raises regression
(`test_auto_inject_with_varying_strata_raises`) remains unchanged.

Updates fit() docstring and the REGISTRY.md survey IF expansion Note
to enumerate the three supported auto-inject paths: (1) strata constant
within group, (2) strata vary + nest=True, (3) strata vary + nest=False
(rejected with targeted ValueError).

All 338 tests pass (affected surface + slow MC coverage sim).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 18, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: ca7b6bfa5d0e321f3a2fb60fb3340148a44b3d7f


Overall Assessment

✅ Looks good

Executive Summary

  • The prior re-review blocker on omitted psu + varying strata + nest=True appears resolved in code and tests: the pre-auto-inject guard now only rejects the nest=False path, and the new regression covers the valid nest=True path at diff_diff/chaisemartin_dhaultfoeuille.py:L750-L810 and tests/test_survey_dcdh.py:L1175-L1213.
  • I did not find new unmitigated P0/P1 issues in the changed estimator/variance paths. The new per-period IF tensor is threaded through overall, joiner/leaver, event-study, and placebo SE computation, and inference still routes through safe_inference().
  • The cell-period/post-period survey IF allocator remains a library convention rather than a paper-derived observation-level linearization, but that deviation is explicitly documented in the Methodology Registry and tracked in TODO.md, so it is informational only under the project policy at docs/methodology/REGISTRY.md:L652-L653 and TODO.md:L60.
  • Local execution note: this is a diff-based re-review. I could not run pytest here because the environment is missing numpy and pytest.

Methodology

Code Quality

  • No findings.

Performance

  • No findings. The dense per-period attribution tensors are now only requested when survey inference is active, which resolves the prior always-allocate concern visible in the changed call sites.

Maintainability

  • No findings. The fit docstring, in-code comments, tests, and Registry note now describe the same allocator contract and follow-up PR boundaries.

Tech Debt

  • Severity P3. Impact: the survey cell-period allocator’s post-period attribution is still a library convention, not a paper-derived observation-level linearization, but that limitation is explicitly tracked and therefore non-blocking under the project’s deferred-work policy at docs/methodology/REGISTRY.md:L652-L653 and TODO.md:L60. Concrete fix: none required for approval.

Security

  • No findings.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 18, 2026
@igerber igerber merged commit 5798d09 into main Apr 19, 2026
23 of 24 checks passed
@igerber igerber deleted the feature/dcdh-cell-period-if branch April 19, 2026 00:34
igerber added a commit that referenced this pull request Apr 19, 2026
Lifts the NotImplementedError gate for heterogeneity= + within-group-
varying PSU/strata under survey designs. The heterogeneity WLS
coefficient IF psi_g is now attributed in full to the (g, out_idx)
post-period cell and expanded to observation level as
psi_i = psi_g * (w_i / W_{g, out_idx}) — the DID_l single-cell
convention shipped in PR #323. Under PSU=group the per-obs
distribution differs from the legacy psi_i = psi_g * (w_i / W_g)
expansion, but the PSU-level aggregate telescopes to psi_g in both
paths, so Binder TSL variance and Rao-Wu replicate variance are
byte-identical under PSU=group; under within-group-varying PSU,
mass lands in the post-period PSU of the transition.

Tests: flipped the gating test to assert all five inference fields
finite; added PSU-level byte-identity unit test constructing both
psi_obs arrays and asserting compute_survey_if_variance agreement
within ULP; added nest=True + varying-strata + heterogeneity
smoke test (newly-unblocked regime); added multi-horizon smoke test;
added slow-tier MC null-coverage test (500 reps, within-group-
varying PSU, empirical 95% coverage inside [0.925, 0.975]).

n_bootstrap > 0 + within-group-varying PSU remains gated (follow-up
PR). Updated REGISTRY.md heterogeneity Note + Survey IF expansion
Note scope-limitations paragraph; updated _compute_heterogeneity_test
docstring + the stale legacy-allocator comment in
_survey_se_from_group_if; added CHANGELOG Changed entry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
…entinel cells

**P1 (methodology):** Under terminal missingness,
`_cohort_recenter_per_period` subtracts cohort column means across
the full period grid, so a group with no observation at period t
acquires non-zero centered mass at that cell. The PR-4 cell-level
bootstrap builds `psu_codes_per_cell` with -1 sentinels for such
cells and `_unroll_target_to_cells` drops them — silently losing
that centered mass. Under within-group-varying PSU + terminal
missingness, this would under-cluster the bootstrap SE/CI/p-values.

Conservative guard: `_unroll_target_to_cells` now raises
`ValueError` when any sentinel cell (-1 PSU) carries non-zero
cohort-recentered IF mass (|u| > 1e-12). The error message points
users to `n_bootstrap=0` for analytical TSL on such panels. The
analytical path has the same mass-leakage behavior under this
regime but was shipped in PR #323; documenting the bootstrap-
specific guard here avoids advertising a broken combination.

Regression test: `test_bootstrap_cell_level_raises_on_sentinel_mass_leak`
constructs a per-cell IF tensor with non-zero mass at a -1-PSU cell
and asserts `_compute_dcdh_bootstrap` raises with the documented
error message.

**P2 (tests):** The slow MC bootstrap coverage test previously ran
at `L_max=1`, which collapsed the multi-horizon block to a single
target and never exercised the cross-horizon shared-weight path
described in its own docstring. Bumped to `L_max=2` so the shared
(n_bootstrap, n_psu) PSU-level weight matrix is drawn once and
broadcast across horizons via each horizon's cell-to-PSU map.
Added three assertions:

- Horizon-1 bootstrap CI coverage in [0.925, 0.975].
- Horizon-2 bootstrap CI coverage in [0.910, 0.975]. Tolerance is
  wider than h-1 because finite-sample analytical TSL coverage on
  this DGP is itself ~0.93 at l=2 (measured offline: analytical
  h-1 = 0.94, h-2 = 0.926 at n_groups=40). An observed bootstrap
  coverage within 1pp of the analytical baseline is consistent
  with correct clustering; a drop to ≤ 0.90 would indicate a
  real shared-weight broadcast regression.
- `cband_crit_value` finite in ≥ 90% of reps — validates that the
  shared (n_bootstrap, n_psu) weight matrix produces a coherent
  joint distribution across horizons (required for a valid sup-t
  simultaneous band).

Bumped n_bootstrap to 1000 (from 500) to keep internal bootstrap
MC noise below ~0.3pp per CI endpoint at horizon-2's slightly
wider percentile-CI spread.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
Addresses the P0 escalation: recommending `n_bootstrap=0` as a
workaround for terminal missingness + within-group-varying PSU was
incorrect — the analytical TSL path uses the SAME cell-period
allocator and has the same silent mass-drop bug.

**Analytical guard parity.** `_survey_se_from_group_if` now computes
`W_cell` (per-(g,t) weight totals) and, before the cell-to-obs
expansion, checks whether any cell has `W_cell == 0` while the
corresponding cohort-recentered IF mass is non-zero. If so, raises
a targeted `ValueError` mirroring the bootstrap-side
`_unroll_target_to_cells` guard. The message text is aligned
across the two paths (same "no positive-weight observations"
phrasing) so the regression test matches both.

**Docs cleanup.** Removed the "use n_bootstrap=0 as workaround"
language from REGISTRY, CHANGELOG, and the fit() docstring. Replaced
with the correct workaround: pre-process the panel (drop late-exit
groups / trim to a balanced sub-panel), or use an explicit
`psu=<group_col>` so the dispatcher routes through the legacy
group-level path (which does not use the cell-period allocator and
is not affected by the mass-leak).

**Regression test update.** The end-to-end fit() regression now
asserts `ValueError` on BOTH `n_bootstrap=0` and `n_bootstrap > 0`
under the terminally-missing + within-group-varying PSU fixture.

This is technically a behavior change for panels previously covered
silently by PR #323's cell-period analytical allocator — those
panels used to produce finite (but silently mass-dropped) SEs and
now raise. The change closes a real silent-correctness bug; the
analytical path never had a principled treatment for the leaked
mass in the first place.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
The round-5 analytical guard in _survey_se_from_group_if was over-
scoped: it fired on any panel with leaked centered mass, including
PSU-within-group-constant regimes (PSU=group auto-inject, strictly-
coarser-PSU-within-group-constant). But the docs said those regimes
are the intended workaround. Panels with terminal missingness and
PSU=group are the documented supported path and must not raise.

Fix: add an analytical dispatcher inside `_survey_se_from_group_if`
mirroring the bootstrap's `_psu_varies_within_group` routing. When
PSU is within-group-constant on the eligible subset, fall back to
the legacy group-level allocator (which uses U_centered[g] directly
via the row-sum identity and does not trigger the sentinel-mass
guard). Only when PSU actually varies within group does the cell
allocator run — and only then can the sentinel-mass guard fire.

Byte-identity: under PSU=group the row-sum identity makes the cell
and group allocators statistically equivalent, but the legacy
branch was the one in use before PR #323 introduced the cell
allocator. Rerouting to it under within-group-constant regimes is
a regression-free fallback (it's what PR #323 aimed to preserve
via byte-identity in the first place).

Test coverage: added `test_fit_succeeds_on_terminal_missingness_with_psu_group`
— same fixture as the varying-PSU failure regression but with
auto-inject `psu=<group>`, asserts both `n_bootstrap=0` and
`n_bootstrap > 0` succeed with finite SE.

Also updated the cell-level bootstrap `ValueError` text (and the
_unroll_target_to_cells docstring) to no longer advertise
`n_bootstrap=0` as a workaround — both paths now fail consistently
on the varying-PSU carve-out, and the correct workaround is
pre-processing or using an explicit `psu=<group_col>`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
Round-6's constant-PSU fallback in `_survey_se_from_group_if`
silently disabled the cell-period allocator for replicate-weight
designs, because replicate designs have `resolved.psu is None` and
the fallback routes `psu_arr is None` to the legacy group-level
path. That's an allocator change on a Class A surface (overall
DID_M, joiners, leavers, dynamic placebos) under per-row-varying
replicate ratios, where cell and legacy allocators produce
materially different Rao-Wu variances (same non-invariance PR #329
established for heterogeneity).

Fix: restrict the round-6 fallback to the TSL branch by gating on
`not resolved.uses_replicate_variance`. Replicate designs retain
the cell-period allocator (the PR #323 Class A contract), and the
sentinel-mass guard still fires on mass leakage when it applies.

Regression: `TestReplicateClassA::test_att_cell_allocator_with_varying_replicate_ratios`
constructs legacy and cell-level psi_obs on the fitted replicate
design, feeds both through `compute_replicate_if_variance`, and
asserts they produce materially different variances — locking the
allocator contract so a future refactor that switches Class A to
the legacy allocator would be detectable. Mirrors the heterogeneity
non-invariance guard from PR #329's CI review round.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
…ss A regression

**P1 (docs scope carve-out missing for replicate ATT):** the
sentinel-mass `ValueError` fires BEFORE the replicate-vs-TSL
dispatch inside `_survey_se_from_group_if`, so replicate-weight ATT
fits (which always use the cell-period allocator per the PR #323
Class A contract) also raise on terminal missingness. The docs
previously scoped the carve-out to "within-group-varying PSU /
analytical TSL / bootstrap" and silently omitted the replicate
path. Rewrite the scope note in REGISTRY.md (both the balanced-
baseline Note and the Survey + bootstrap contract Note), CHANGELOG,
and the fit() docstring to list all three affected paths explicitly:
Binder TSL with within-group-varying PSU, Rao-Wu replicate ATT,
and the cell-level wild PSU bootstrap. Clarify that the
`psu=<group_col>` workaround applies ONLY to Binder TSL — replicate
ATT and the varying-PSU bootstrap have no allocator fallback; the
only workaround for those paths is pre-processing the panel to
remove terminal missingness.

**P2 (synthetic allocator test didn't pin fitted SE):** the
previous `test_att_cell_allocator_with_varying_replicate_ratios`
showed that synthetic cell and legacy allocators differ, but never
tied the difference back to the fit's `overall_se`. A refactor
switching Class A replicate to the legacy allocator could have
slipped through unnoticed. Rewrite using a pytest monkeypatch spy
on `compute_replicate_if_variance` to capture the psi_obs the fit
actually passes, then reconstruct the legacy-equivalent psi_obs via
the row-sum identity on the SAME captured `U_centered` and verify:

- fit's `overall_se**2` equals the recomputed variance on the
  captured cell-allocator psi_obs (sanity / dispatch alignment);
- the legacy-reconstructed psi_obs produces a materially different
  Rao-Wu variance on the same fixture — so a silent Class A
  allocator regression would flip overall_se and fail this guard.

No code behavior changed in round-8 beyond the doc updates — only
the test and documentation now lock the contract explicitly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 20, 2026
…-failures audit

Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per
project SemVer convention, minor bumps are reserved for new estimators or new
module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults
(PR #318) add a new public API surface and drive this bump.

Headline work:
- PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner-
  ready output layer. Plain-English narrative summaries across all 16 result types,
  with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md.
- PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear
  regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the
  upcoming HeterogeneousAdoptionDiD estimator.
- PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A
  contract), heterogeneity + within-group-varying PSU under Binder TSL, and
  PSU-level Hall-Mammen wild bootstrap at cell granularity.
- PR #333 performance review - docs/performance-scenarios.md documents 5-7
  realistic practitioner workflows; benchmark harness extended.

Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339)
continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J.

CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default
CI after false-positive flakes; perf-review harness is the principled replacement.

Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml,
diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released:
2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the
comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers;
the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in
a follow-up.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant