Add per-path joint sup-t bands to ChaisemartinDHaultfoeuille.by_path by igerber · Pull Request #374 · igerber/diff-diff

igerber · 2026-04-25T17:46:02Z

Summary

Adds per-path joint sup-t simultaneous confidence bands across horizons 1..L_max within each path when n_bootstrap > 0 is set with by_path=k. A single shared (n_bootstrap, n_eligible) Hall-Mammen multiplier weight matrix is drawn per path and broadcast across all valid horizons; the path-specific critical value c_p = quantile(max_l |t_l|, 1-α) is applied per horizon as cband_conf_int and surfaced at top level as results.path_sup_t_bands[path].
Closes Wave 2 Claude/setup pip install k2 m4j #4 of the by_path follow-up sequence (dCDH: add by_path per-path event-study disaggregation #357 foundation, dCDH by_path R-parity fixtures + TestDCDHDynRParityByPath #360 R-parity, dCDH by_path + n_bootstrap support (library-consistent percentile CI) #364 bootstrap, dCDH by_path + placebo: per-path backward-horizon placebos (Wave 2 #3) #371 placebos).
Bundled sibling-surface pre-audit fix: stale "Phase 2 placeholder" docstring on the existing OVERALL sup_t_bands field replaced with the actual contract description.

Methodology

Closely parallels the OVERALL event_study_sup_t_bands pattern at chaisemartin_dhaultfoeuille_bootstrap.py:599-614, with one documented intentional asymmetry: per-path sup-t draws a fresh shared-weights matrix per path AFTER the per-path SE bootstrap block has already populated results.path_ses via independent per-(path, horizon) draws. Numerator uses fresh shared draws; denominator uses bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical — the fresh draw preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable.

Gates (mirror OVERALL :605,612): a path needs >= 2 valid horizons (finite bootstrap SE > 0) AND >= 50% finite sup-t draws to receive a band; otherwise the path is absent from path_sup_t_bands (NaN-on-invalid absent-key contract).

Empty-state: path_sup_t_bands is None when not requested (no bootstrap or by_path is None); {} when requested but no path passes both gates.

Interpretation: bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths (different inference target, deferred).

Deviation from R

did_multiplegt_dyn provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL event_study_sup_t_bands (also Python-only). Inherits the cross-path cohort-sharing SE deviation from R documented for path_effects (the bootstrap SE used in the t-stat denominator carries the same deviation).

Test plan

pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands -v -m slow — 11 new tests pass
pytest tests/test_chaisemartin_dhaultfoeuille.py -k "ByPath" -m "slow or not slow" — 71 passed (regression on existing per-path tests)
pytest tests/test_chaisemartin_dhaultfoeuille.py -k "sup_t or cband" — OVERALL sup-t bit-identical
Full dCDH suite: pytest tests/test_chaisemartin_dhaultfoeuille.py tests/test_methodology_chaisemartin_dhaultfoeuille.py tests/test_chaisemartin_dhaultfoeuille_parity.py -m "slow or not slow" — 259 passed
black + ruff clean on changed source files
Smoke test on R-parity scenario multi_path_reversible_by_path (n_bootstrap=200, seed=42, L_max=3) shows per-path crits in expected range [1.96, 2.394] (between marginal and Bonferroni)

🤖 Generated with Claude Code

github-actions · 2026-04-25T17:51:10Z

Overall Assessment

⚠️ Needs changes

Executive Summary

Affected method: ChaisemartinDHaultfoeuille Phase 3 by_path multiplier-bootstrap joint sup-t bands over the dynamic DID_l event-study path.
I found no unmitigated P0/P1 methodology mismatch in the new sup-t construction itself. The Python-only extension, the fresh shared-draw asymmetry vs the OVERALL sup-t path, and the inherited deviation from R are explicitly documented in docs/methodology/REGISTRY.md:L643-L643.
P1: on panels where by_path is requested but no path has a complete window, the new surface returns results.path_sup_t_bands=None instead of the documented {} because the bootstrap collector is skipped when path_effects is empty and the results layer only materializes path_sup_t_bands from a non-None crit dict.
The pre-existing empty-window edge case is already covered for path_effects in tests/test_chaisemartin_dhaultfoeuille.py:L4015-L4078, but the new sup-t tests only cover non-empty gate failures such as L_max=1 in tests/test_chaisemartin_dhaultfoeuille.py:L5714-L5775, so this regression is currently untested.
No new inline inference / partial-NaN anti-patterns, control-group logic errors, or security issues stood out in the changed code.

Methodology

P3 Informational: no unmitigated methodology defect found. The PR affects a Python-only joint-band extension on top of dCDH’s by_path event-study surface; the lack of an R analogue and the fresh shared-draw asymmetry vs the OVERALL sup-t implementation are explicitly documented in docs/methodology/REGISTRY.md:L643-L643 and diff_diff/chaisemartin_dhaultfoeuille_results.py:L419-L430. Impact: none under the review rubric. Concrete fix: none required.

Code Quality

P1 Empty-result-set contract regression. fit() only builds path_bootstrap_inputs when len(path_effects) > 0 in diff_diff/chaisemartin_dhaultfoeuille.py:L2662-L2669, the sup-t bootstrap block only initializes path_cband_crit_values when results.path_ses is truthy in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L805-L863, and the result layer maps None to path_sup_t_bands=None in diff_diff/chaisemartin_dhaultfoeuille.py:L3663-L3683. That breaks the documented empty-state contract for requested-but-empty panels in docs/methodology/REGISTRY.md:L643-L643 and diff_diff/chaisemartin_dhaultfoeuille_results.py:L426-L430, and it diverges from the existing empty-path sentinel behavior already enforced for path_effects in tests/test_chaisemartin_dhaultfoeuille.py:L4015-L4078. Impact: downstream callers cannot distinguish “feature not requested” from “requested, but no path window survived,” which is a production-facing edge-case regression on the new public surface. Concrete fix: preserve the requested-but-empty sentinel by initializing path_cband_crit_values and path_cband_n_valid_horizons to {} whenever by_path + n_bootstrap is active, even if path_effects == {}, so results.path_sup_t_bands is {} rather than None.

Performance

No findings in the changed diff.

Maintainability

No findings beyond the contract bug above.

Tech Debt

No mitigating TODO.md entry applies to the P1 above, and this is not deferrable under the review rules.

Security

No findings in the changed diff.

Documentation/Tests

P3 Informational: the new prose hard-codes “Hall-Mammen” for the per-path sup-t draw in CHANGELOG.md:L11-L11, docs/methodology/REGISTRY.md:L643-L643, and tests/test_chaisemartin_dhaultfoeuille.py:L5496-L5500, but the implementation still honors the estimator’s configurable bootstrap_weights via weight_type=self.bootstrap_weights in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L835-L840. Impact: the docs imply a fixed weight family even though rademacher and webb remain valid implementation choices. Concrete fix: rephrase this to “shared multiplier weights using bootstrap_weights” or equivalent.

Path to Approval

Make the new surface honor the documented requested-but-empty sentinel: when by_path and n_bootstrap > 0 are active but no path has a complete window, return results.path_sup_t_bands == {} rather than None.
Add a regression test on the existing empty-window fixture with bootstrap enabled, asserting results.path_effects == {}, results.path_sup_t_bands == {}, and that no per-horizon cband_conf_int keys are written.

…wording P1: `path_sup_t_bands` now returns `{}` (not `None`) when `by_path + n_bootstrap > 0` is requested but `path_effects == {}` (no observed path has a complete window). Mirrors the documented empty-state contract used by `path_effects` and `path_placebo_event_study`. The bug: `_collect_path_bootstrap_inputs` is gated on `len(path_effects) > 0`, so when `path_effects == {}` the bootstrap collector is skipped, `bootstrap_results.path_cband_crit_values` stays `None`, and the result-class kwarg builder mapped `None` to `path_sup_t_bands = None` — losing the requested-vs-empty distinction. Fix: the kwarg builder now keys off `self.by_path is not None and self.n_bootstrap > 0`. When that condition holds we always materialize a dict (empty when no path passed both gates), reserving `None` only for "feature not requested." When `path_cband_crit_values is None` we treat it as the empty case for the dict comprehension. P3: replaced "Hall-Mammen multiplier weight matrix" with neutral "multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb)" in CHANGELOG, REGISTRY, helper docstring, and TestByPathSupTBands docstring. The implementation honors `self.bootstrap_weights`, so the prose shouldn't fix the family. Regression test added: `test_path_sup_t_bands_empty_dict_when_no_complete_window` on the same empty-window panel as the analytical sibling at `test_empty_path_surface_when_no_complete_window` (`:4015+`), asserting `path_effects == {}`, `path_sup_t_bands == {}`, and no horizon writes a `cband_conf_int` key. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T18:13:12Z

/ai-review

github-actions · 2026-04-25T18:17:58Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e9bda71ad0f8057dbfeb57881cc1e9fa2cb8bc9b

Overall Assessment
Needs changes

Executive Summary

Re-review status: the prior empty-state path_sup_t_bands=None regression appears fixed. The results layer now materializes {} for requested-but-empty panels in diff_diff/chaisemartin_dhaultfoeuille.py:L3663-L3699, and there is a direct regression test in tests/test_chaisemartin_dhaultfoeuille.py:L5805-L5864.
Affected method: ChaisemartinDHaultfoeuille per-path multiplier-bootstrap joint sup-t bands for by_path event-study output.
I did not find an unmitigated defect in the documented Python-only extension itself, the fresh shared-draw asymmetry vs overall sup-t, or the inherited cross-path SE deviation from R; those are explicitly documented in docs/methodology/REGISTRY.md:L643-L643.
P1: the changed docs/changelog/results comments say the band is emitted when >=50% of sup-t draws are finite, but the implementation still uses a strict >50% gate on both the overall and per-path sup-t blocks.
I did not find new inline-inference anti-patterns, partial-NaN inference bugs, control-group logic defects, or security issues in the changed diff.
I could not execute pytest here because pytest is not installed in this environment.

Methodology

Severity P1. Impact: undocumented gate mismatch on the new inference surface. docs/methodology/REGISTRY.md:L643-L643, CHANGELOG.md:L11-L11, and the changed result docs/comments in diff_diff/chaisemartin_dhaultfoeuille_results.py:L164-L172, L370-L379, L419-L430, L550-L560 all specify a >=50% finite-draw requirement, but the code requires strictly more than half of draws to be finite in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L605-L613 and L851-L855. Exactly-half-finite cases will therefore drop sup_t_bands / path_sup_t_bands despite the documented contract. Concrete fix: align the implementation and documentation. If strict >50% is intended, update all changed prose to say that explicitly; if >=50% is intended, relax both guards accordingly. In either case, add a boundary regression for the exact-50% case.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not find a matching TODO.md mitigation for the P1 above.

Security

No findings.

Documentation/Tests

No separate findings beyond the P1 contract mismatch above. The prior empty-result-set issue is now regression-covered at tests/test_chaisemartin_dhaultfoeuille.py:L5805-L5864, but there is still no targeted test for the exact-50% finite-draw boundary.

Path to Approval

Make the finite-draw gate consistent across code and docs for both overall and per-path sup-t. Update either the code in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L605-L613 and L851-L855 or the changed prose in docs/methodology/REGISTRY.md:L643-L643, CHANGELOG.md:L11-L11, and diff_diff/chaisemartin_dhaultfoeuille_results.py:L164-L172, L370-L379, L419-L430, L550-L560.
Add a regression that forces exactly n_bootstrap / 2 finite sup-t draws and asserts the chosen boundary semantics for both sup_t_bands and path_sup_t_bands.

Code gate is `finite_mask.sum() > 0.5 * n_bootstrap` (strict majority, mirrors OVERALL `:612`). Prose said `>=50%` in 8 places; aligned to "strict majority (more than 50%)" / `>50%`: - `CHANGELOG.md` Unreleased entry - `docs/methodology/REGISTRY.md` per-path joint sup-t Note (added explicit "Exactly half-finite draws are NOT enough" clarification) - `diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py` helper comment - `diff_diff/chaisemartin_dhaultfoeuille_results.py`: - `DCDHBootstrapResults.path_cband_crit_values` field comment - `ChaisemartinDHaultfoeuilleResults.sup_t_bands` Attributes (the OVERALL surface — pre-audit fix updated this in R0; my new prose introduced the >=50% wording so I'm correcting it here too) - `ChaisemartinDHaultfoeuilleResults.path_sup_t_bands` Attributes - `path_sup_t_bands` field comment - `tests/test_chaisemartin_dhaultfoeuille.py`: - `TestByPathSupTBands.test_path_sup_t_absent_path_has_no_cband_keys` docstring Boundary regression added per reviewer ask: `test_path_sup_t_strict_majority_gate_at_exact_50pct` monkeypatches `_generate_psu_or_group_weights` to inject NaN into exactly half of the bootstrap rows. With `n_bootstrap=4` and 2 NaN rows, `finite_mask.sum() == 2 == 0.5 * 4`, so the gate `2 > 2.0` is False. Asserts both `path_sup_t_bands == {}` and `sup_t_bands is None`, plus no horizon writes `cband_conf_int` (per-path or OVERALL). Pins the strict-majority semantics on both sup-t surfaces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T18:41:42Z

/ai-review

github-actions · 2026-04-25T18:45:21Z

🔁 AI review rerun (requested by @igerber)

Head SHA: b1737f3f4815d3b1f5942238e13d3f3903f6c99b

Overall Assessment

⚠️ Needs changes

Executive Summary

Re-review status: the prior requested-but-empty regression for path_sup_t_bands appears fixed in diff_diff/chaisemartin_dhaultfoeuille.py:L3663-L3698, with direct coverage in tests/test_chaisemartin_dhaultfoeuille.py:L5805-L5864.
Affected method: ChaisemartinDHaultfoeuille per-path multiplier-bootstrap joint sup-t bands for by_path, a documented Python-only extension alongside the overall event_study_sup_t_bands.
P1 still open: the Methodology Registry and changelog say the finite-draw gate is >= 50%, but the implementation enforces a strict majority on both the overall and per-path sup-t paths.
I did not find an unmitigated defect in the documented fresh-draw asymmetry vs the overall sup-t implementation, or in the inherited cross-path SE deviation from R; both are documented in docs/methodology/REGISTRY.md.
I did not find new inline-inference anti-patterns, partial-NaN inference bugs, control-group logic defects, or security issues in the changed diff.

Methodology

Severity P1 [Previously identified, still open]. Impact: undocumented methodology mismatch on the new inference surface. docs/methodology/REGISTRY.md:L643-L643 and CHANGELOG.md:L11-L11 state that a band is emitted when >= 50% of sup-t draws are finite, but the code requires a strict majority on both surfaces: overall at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L599-L613 and per-path at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L821-L855. Exactly-half-finite cases are therefore rejected despite the documented contract. Concrete fix: align the source-of-truth docs and code. If strict-majority semantics are intended, update REGISTRY.md and CHANGELOG.md to say “more than 50%” / > 50%; if >= 50% is intended, relax both gates accordingly.

No other unmitigated methodology findings. The Python-only extension itself, the fresh shared-draw asymmetry, and the inherited cross-path SE deviation are documented, so they are not defects under the stated rubric.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No separate finding. I did not find a mitigating TODO.md entry for the P1 above, and this class of methodology mismatch would not be mitigated by TODO tracking anyway.

Security

No findings.

Documentation/Tests

No separate finding beyond the P1 contract mismatch above. The prior empty-state issue appears covered. I reviewed the added tests in the diff but did not execute pytest in this environment.

Path to Approval

Make the finite-draw gate consistent across docs/methodology/REGISTRY.md, CHANGELOG.md, and the bootstrap implementation for both overall and per-path sup-t bands. Choose either strict-majority (> 50%) or inclusive (>= 50%) semantics and keep the exact-50 boundary expectation consistent everywhere.

When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are now computed across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all valid horizons, producing correlated bootstrap distributions. The path-specific critical value `c_p = quantile(max_l |t_l|, 1-α)` is applied per horizon as `cband_conf_int = (eff - c_p·se, eff + c_p·se)` and surfaced at top level as `results.path_sup_t_bands[path]`. Closes Wave 2 #4 of the by_path follow-up sequence (#357 foundation, #360 R-parity, #364 bootstrap, #371 placebos). **Methodology asymmetry vs OVERALL** (intentional, documented): per-path sup-t draws fresh shared weights AFTER the per-path SE bootstrap block has populated `path_ses` via independent per-(path, horizon) draws. Asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Preserves RNG-state isolation for existing per-path SE seed-reproducibility tests. **Gates** mirror OVERALL: `>=2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Otherwise the path is absent from `path_sup_t_bands`. **Empty-state contract**: `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates (covers two cases: `path_effects == {}` upstream OR all paths fail gates downstream). **Deviation from R**: `did_multiplegt_dyn` provides no joint / sup-t bands at any surface — Python-only methodology extension consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Bundled pre-audit fix** (sibling-surface check): the existing OVERALL `sup_t_bands` field's stale "Phase 2 placeholder" docstring updated to the actual contract description. Tests: new `TestByPathSupTBands` class with 13 tests covering: attr None when no bootstrap / no by_path; keys match `path_effects` with finite crit; band wider than pointwise; crit finite and positive; seed reproducibility; single-horizon-path-skip; L_max=1 skip; n_valid_horizons matches; absent-path-no-cband-keys; summary renders; empty-dict-when-no-complete-window; strict-majority-gate-at-exact-50pct (monkeypatches the weight generator to inject NaN into half the bootstrap rows, asserting both `sup_t_bands is None` and `path_sup_t_bands == {}` at the boundary). All `@pytest.mark.slow`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T19:00:24Z

/ai-review

github-actions · 2026-04-25T19:06:33Z

🔁 AI review rerun (requested by @igerber)

Head SHA: c0c0d4e2aecb395f3ab58ccf0d534193965127f1

Overall Assessment

✅ Looks good

Executive Summary

Re-review status: the prior >=50% vs >50% finite-draw gate mismatch is resolved. Code and docs now consistently use strict-majority semantics for both overall and per-path sup-t bands at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L599-L614 and L781-L867, docs/methodology/REGISTRY.md:L643-L643, diff_diff/chaisemartin_dhaultfoeuille_results.py:L371-L440, and CHANGELOG.md:L12-L12.
Affected method: ChaisemartinDHaultfoeuille per-path joint sup-t simultaneous bands under by_path=k with n_bootstrap > 0. The fresh-draw asymmetry vs overall sup-t and the inherited cross-path cohort-sharing SE deviation from R are explicitly documented in the Methodology Registry, so they are mitigated, not defects.
I did not find an unmitigated methodology defect, incorrect variance/SE handling, NaN/inference inconsistency, control-group issue, or inline-inference anti-pattern in the changed implementation at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L781-L867, diff_diff/chaisemartin_dhaultfoeuille.py:L3004-L3028, and L3663-L3700.
The requested-but-empty path_sup_t_bands contract and the exact-50% strict-majority boundary are both covered in the added regressions at tests/test_chaisemartin_dhaultfoeuille.py:L5806-L5961.
I could not execute pytest locally because pytest is not installed in this environment.

Methodology

No findings. The new per-path sup-t surface matches the documented contract in docs/methodology/REGISTRY.md:L643-L643, and the previous P1 gate-contract mismatch is resolved.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not find a new untracked correctness limitation that would need TODO.md mitigation.

Security

No findings.

Documentation/Tests

No findings from the diff. The new tests are well targeted for the main risk areas here: requested-but-empty output, absent-key behavior for invalid bands, seed reproducibility, summary rendering, and the exact-50% boundary in tests/test_chaisemartin_dhaultfoeuille.py:L5550-L5961. I was not able to run them in this environment because pytest is unavailable.

Cross-surface gap caught in self-audit: OVERALL `to_dataframe(level= "event_study")` includes `cband_lower` / `cband_upper` columns (`chaisemartin_dhaultfoeuille_results.py:1495-1496,1531-1532`) but the per-path table at `level="by_path"` does not — even though per-path now produces `cband_conf_int` writes via the new sup-t propagation block. Cross-surface twin asymmetry the CI reviewer didn't flag; caught by my own grep audit on `cband_conf_int` consumers. Fix: extend `to_dataframe(level="by_path")` to emit the same two columns. Populated for positive-horizon rows of paths with a finite sup-t crit (read from `path_effects[path]["horizons"][l] ["cband_conf_int"]`); NaN for placebo rows (no joint band per the positive-only sup-t spec), unbanded paths, and the requested-but-empty fallback DataFrame (which now includes the columns in its canonical schema). Tests added: - `test_path_sup_t_to_dataframe_emits_cband_columns` — column presence + per-row alignment with the dict surface - `test_path_sup_t_to_dataframe_empty_path_fallback_has_cband_columns` — empty-path fallback DataFrame schema parity Docs updated: - REGISTRY.md: `to_dataframe(level="by_path")` integration note added to the new sup-t Note; canonical column list in the existing `Note (Phase 3 by_path ...)` block extended with `cband_lower / cband_upper` - CHANGELOG entry: surface listing now mentions to_dataframe columns - `by_path` parameter docstring: rendering surface listing extended - `path_sup_t_bands` Attributes docstring: rendering surface listing extended Suite: 263 tests pass (was 261, +2 new tests). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T19:18:47Z

/ai-review

github-actions · 2026-04-25T19:23:30Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 2df79a00c36941d78007ea9ac7935a5129741425

Overall Assessment

✅ Looks good

Executive Summary

Re-review status: the prior strict-majority gate mismatch appears resolved. The overall and per-path sup-t collectors now both use > 50% finite-draw semantics, the registry says the same, and there is an exact-50% regression test for the boundary at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L599-L614, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L802-L858, docs/methodology/REGISTRY.md:L643-L643, and tests/test_chaisemartin_dhaultfoeuille.py:L5948-L6035.
Affected method: ChaisemartinDHaultfoeuille by_path joint sup-t bands. The new collector mirrors the existing overall sup-t construction on valid-horizon gating and per-horizon band propagation, and it preserves the project’s NaN/absent-key inference contract at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L781-L867 and diff_diff/chaisemartin_dhaultfoeuille.py:L3007-L3032.
I did not find an unmitigated P0/P1 methodology defect, incorrect SE/variance handling, NaN inconsistency, or control-group logic issue in the changed implementation.
The fresh-draw asymmetry vs overall sup-t and the inherited cross-path cohort-sharing deviation from R are explicitly documented in the Methodology Registry, so they are mitigated rather than blockers at docs/methodology/REGISTRY.md:L643-L643.
Minor follow-up only: the in-code to_dataframe(level="by_path") docstring is stale relative to the new exporter schema, and the new sup-t tests do not directly cover non-default bootstrap_weights.
I could not run pytest locally because pytest is not installed in this environment.

Methodology

Severity: P3 informational. Impact: none. The only substantive methodology deviations I found are documented: the per-path sup-t block uses fresh shared draws while reusing previously bootstrapped path SEs in the denominator, and it inherits the already-documented cross-path cohort-sharing SE deviation from R. That makes them mitigated, not defects, under the review rules. Concrete fix: none. References: docs/methodology/REGISTRY.md:L643-L643, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L781-L867, diff_diff/chaisemartin_dhaultfoeuille.py:L3666-L3703.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not see a new deferred item that needs TODO.md tracking.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: low. The in-code exporter docs for to_dataframe(level="by_path") still describe the pre-change schema and omit cband_lower / cband_upper, even though the implementation now returns those columns. That leaves the local API docstring stale for callers inspecting the results surface. Concrete fix: update the by_path bullet in diff_diff/chaisemartin_dhaultfoeuille_results.py:L1375-L1379 to include the two cband columns and note their NaN-on-absent-band behavior; implementation is at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1632-L1706.
Severity: P3. Impact: low. TestByPathSupTBands exercises only the default Rademacher path on the new per-path shared-draw sup-t branch, even though the feature advertises support for mammen and webb. The code likely reuses safe helper paths, but the new surface lacks direct regression coverage for those multiplier choices. Concrete fix: parameterize one existing invariant test in tests/test_chaisemartin_dhaultfoeuille.py:L5519-L6035 over bootstrap_weights in {"rademacher", "mammen", "webb"}.

Execution note: local test execution was not possible here because python -m pytest fails with No module named pytest.

…erage P3 #1: ``to_dataframe`` method docstring at ``chaisemartin_dhaultfoeuille_results.py:1375-1379`` listed the pre-change ``level="by_path"`` schema (no ``cband_*`` columns) even though the implementation now returns them. Updated the bullet to include ``cband_lower / cband_upper``, document the negative-horizon placebo convention, and document the NaN-on-absent-band behavior. P3 #2: ``TestByPathSupTBands::test_path_sup_t_seed_reproducibility`` only exercised the default ``rademacher`` weight family. Parameterized over ``["rademacher", "mammen", "webb"]`` to pin that the per-path sup-t branch correctly threads ``self.bootstrap_weights`` through ``_generate_psu_or_group_weights`` for all three multiplier families the feature advertises. The existing OVERALL machinery handles all three uniformly, but the per-path surface lacked direct coverage. Each variant must produce a finite, reproducible crit on the standard 3-path fixture. 17 tests pass on TestByPathSupTBands (was 15: +2 new parameterized variants on the existing seed_reproducibility test). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T19:26:09Z

/ai-review

github-actions · 2026-04-25T19:31:30Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 1febbb190836089855af8f9ccfbe825f1adce840

Overall Assessment

✅ Looks good

Executive Summary

Re-review status: the prior P3 follow-ups appear addressed. The exporter docstring now documents cband_lower / cband_upper, and the new per-path sup-t tests now cover non-default bootstrap_weights (rademacher, mammen, webb) at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1375-L1388 and tests/test_chaisemartin_dhaultfoeuille.py:L5664-L6066.
Affected method: ChaisemartinDHaultfoeuille by_path joint sup-t simultaneous bands across horizons 1..L_max. The PR adds a per-path shared-draw bootstrap collector, propagates cband_conf_int onto positive-horizon path results, and exposes path_sup_t_bands plus by_path dataframe columns at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L781-L867, diff_diff/chaisemartin_dhaultfoeuille.py:L3007-L3032, and diff_diff/chaisemartin_dhaultfoeuille.py:L3666-L3703.
I did not find an unmitigated P0/P1 on estimator math, weighting, variance/SE handling, NaN/absent-key inference, or control-group logic in the changed code.
The fresh-draw asymmetry versus the existing overall sup-t path and the inherited cross-path cohort-sharing SE deviation from R are explicitly documented in the Methodology Registry, so they are mitigated rather than defects at docs/methodology/REGISTRY.md:L643-L643.
Edge-case coverage in the changed tests is good: exact-50% strict-majority gating, L_max=1 no-band behavior, requested-but-empty {} state, absent-key cband_conf_int, dataframe schema, and summary rendering are all exercised at tests/test_chaisemartin_dhaultfoeuille.py:L5747-L6066.
Residual risk: this is a source-only review. I could not run pytest locally because the sandbox lacks pytest and numpy.

Methodology

Severity: P3 informational. Impact: none. The changed method is the Python-only by_path joint sup-t extension. The implementation matches the documented contract: shared multiplier weights within path, >=2 valid-horizon gate, strict >50% finite-draw gate, path-level critical-value storage, and positive-horizon band propagation. The only substantive deviations I found are documented in the registry, so they are mitigated. Concrete fix: none. References: docs/methodology/REGISTRY.md:L643-L643, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L781-L867, diff_diff/chaisemartin_dhaultfoeuille.py:L3007-L3032.

Code Quality

Severity: none. Impact: no defect identified in the changed code. Concrete fix: none.

Performance

Severity: none. Impact: no material performance issue identified beyond the documented extra per-path bootstrap pass required by the feature. Concrete fix: none.

Maintainability

Severity: none. Impact: no issue identified. The new surface is threaded consistently through fit-time result construction, summary rendering, and to_dataframe(level="by_path"). Concrete fix: none. References: diff_diff/chaisemartin_dhaultfoeuille.py:L3666-L3703, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1291-L1715.

Tech Debt

Severity: none. Impact: no new untracked deferrable work identified in the diff. Concrete fix: none.

Security

Severity: none. Impact: no security issue identified in the changed code. Concrete fix: none.

Documentation/Tests

Severity: none. Impact: no open doc/test defect identified in the changed files; the previous re-review P3s appear resolved. Concrete fix: none. References: diff_diff/chaisemartin_dhaultfoeuille_results.py:L371-L443, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1375-L1388, tests/test_chaisemartin_dhaultfoeuille.py:L5664-L6066.

PR #357 shipped by_path foundation; PRs #364/#371/#374 completed the inference surface (bootstrap, placebos, sup-t bands). Wave 3 begins design-variant extensions; this PR is item #5: combine by_path=k with controls=[...] (DID^X). Architecture: the per-baseline OLS residualization at chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel BEFORE path enumeration, so all four downstream surfaces (analytical SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized Y_mat automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing controls + per-period DID contract. Canonical R behavior: `did_multiplegt_dyn(..., by_path=k, controls=...)` re-runs the per-baseline residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls). On the multi_path_reversible DGP all switchers share baseline D_{g,1}=0, so R's per-path control pool equals our global control pool and the residualization coefficients coincide. Per-path point estimates match R exactly (rtol ~1e-11); per-path SE within ~6.5% (Phase 2 envelope, inheriting the documented cross-path cohort- sharing deviation). Changes: - Delete the gate at chaisemartin_dhaultfoeuille.py:988-992 - Update by_path docstring (remove `controls` from incompatible list, add inheritance paragraph) - New R parity scenario `multi_path_reversible_by_path_controls` in benchmarks/R/generate_dcdh_dynr_test_values.R + regenerated golden values - New TestDCDHDynRParityByPathControls in tests/test_chaisemartin_dhaultfoeuille_parity.py - New TestByPathControls in tests/test_chaisemartin_dhaultfoeuille.py (12 tests covering analytical / bootstrap / placebo / sup-t / cband to_dataframe / per-period unadjusted / covariate_residuals round- trip / multi-covariate) - Remove the `controls` parametrize entry from TestByPathGates::test_forbids_phase3_fit_kwargs - Update REGISTRY.md (remove `controls` from gated-combos list, add inheritance sub-paragraph documenting the four-surface auto- inheritance) - CHANGELOG: Unreleased > Added entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber force-pushed the dcdh-by-path-sup-t branch from b1737f3 to c0c0d4e Compare April 25, 2026 19:00

igerber added the ready-for-ci Triggers CI test workflows label Apr 25, 2026

igerber merged commit 94083be into main Apr 25, 2026
23 of 24 checks passed

igerber deleted the dcdh-by-path-sup-t branch April 25, 2026 20:59

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add per-path joint sup-t bands to ChaisemartinDHaultfoeuille.by_path#374

Add per-path joint sup-t bands to ChaisemartinDHaultfoeuille.by_path#374
igerber merged 3 commits intomainfrom
dcdh-by-path-sup-t

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Apr 25, 2026

Summary

Methodology

Deviation from R

Test plan

Uh oh!

github-actions Bot commented Apr 25, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

igerber commented Apr 25, 2026

Uh oh!

github-actions Bot commented Apr 25, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant