Skip to content

Add per-path joint sup-t bands to ChaisemartinDHaultfoeuille.by_path#374

Merged
igerber merged 3 commits intomainfrom
dcdh-by-path-sup-t
Apr 25, 2026
Merged

Add per-path joint sup-t bands to ChaisemartinDHaultfoeuille.by_path#374
igerber merged 3 commits intomainfrom
dcdh-by-path-sup-t

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 25, 2026

Summary

Methodology

Closely parallels the OVERALL event_study_sup_t_bands pattern at chaisemartin_dhaultfoeuille_bootstrap.py:599-614, with one documented intentional asymmetry: per-path sup-t draws a fresh shared-weights matrix per path AFTER the per-path SE bootstrap block has already populated results.path_ses via independent per-(path, horizon) draws. Numerator uses fresh shared draws; denominator uses bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical — the fresh draw preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable.

Gates (mirror OVERALL :605,612): a path needs >= 2 valid horizons (finite bootstrap SE > 0) AND >= 50% finite sup-t draws to receive a band; otherwise the path is absent from path_sup_t_bands (NaN-on-invalid absent-key contract).

Empty-state: path_sup_t_bands is None when not requested (no bootstrap or by_path is None); {} when requested but no path passes both gates.

Interpretation: bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths (different inference target, deferred).

Deviation from R

did_multiplegt_dyn provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL event_study_sup_t_bands (also Python-only). Inherits the cross-path cohort-sharing SE deviation from R documented for path_effects (the bootstrap SE used in the t-stat denominator carries the same deviation).

Test plan

  • pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands -v -m slow — 11 new tests pass
  • pytest tests/test_chaisemartin_dhaultfoeuille.py -k "ByPath" -m "slow or not slow" — 71 passed (regression on existing per-path tests)
  • pytest tests/test_chaisemartin_dhaultfoeuille.py -k "sup_t or cband" — OVERALL sup-t bit-identical
  • Full dCDH suite: pytest tests/test_chaisemartin_dhaultfoeuille.py tests/test_methodology_chaisemartin_dhaultfoeuille.py tests/test_chaisemartin_dhaultfoeuille_parity.py -m "slow or not slow" — 259 passed
  • black + ruff clean on changed source files
  • Smoke test on R-parity scenario multi_path_reversible_by_path (n_bootstrap=200, seed=42, L_max=3) shows per-path crits in expected range [1.96, 2.394] (between marginal and Bonferroni)

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

Overall Assessment

⚠️ Needs changes

Executive Summary

  • Affected method: ChaisemartinDHaultfoeuille Phase 3 by_path multiplier-bootstrap joint sup-t bands over the dynamic DID_l event-study path.
  • I found no unmitigated P0/P1 methodology mismatch in the new sup-t construction itself. The Python-only extension, the fresh shared-draw asymmetry vs the OVERALL sup-t path, and the inherited deviation from R are explicitly documented in docs/methodology/REGISTRY.md:L643-L643.
  • P1: on panels where by_path is requested but no path has a complete window, the new surface returns results.path_sup_t_bands=None instead of the documented {} because the bootstrap collector is skipped when path_effects is empty and the results layer only materializes path_sup_t_bands from a non-None crit dict.
  • The pre-existing empty-window edge case is already covered for path_effects in tests/test_chaisemartin_dhaultfoeuille.py:L4015-L4078, but the new sup-t tests only cover non-empty gate failures such as L_max=1 in tests/test_chaisemartin_dhaultfoeuille.py:L5714-L5775, so this regression is currently untested.
  • No new inline inference / partial-NaN anti-patterns, control-group logic errors, or security issues stood out in the changed code.

Methodology

  • P3 Informational: no unmitigated methodology defect found. The PR affects a Python-only joint-band extension on top of dCDH’s by_path event-study surface; the lack of an R analogue and the fresh shared-draw asymmetry vs the OVERALL sup-t implementation are explicitly documented in docs/methodology/REGISTRY.md:L643-L643 and diff_diff/chaisemartin_dhaultfoeuille_results.py:L419-L430. Impact: none under the review rubric. Concrete fix: none required.

Code Quality

  • P1 Empty-result-set contract regression. fit() only builds path_bootstrap_inputs when len(path_effects) > 0 in diff_diff/chaisemartin_dhaultfoeuille.py:L2662-L2669, the sup-t bootstrap block only initializes path_cband_crit_values when results.path_ses is truthy in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L805-L863, and the result layer maps None to path_sup_t_bands=None in diff_diff/chaisemartin_dhaultfoeuille.py:L3663-L3683. That breaks the documented empty-state contract for requested-but-empty panels in docs/methodology/REGISTRY.md:L643-L643 and diff_diff/chaisemartin_dhaultfoeuille_results.py:L426-L430, and it diverges from the existing empty-path sentinel behavior already enforced for path_effects in tests/test_chaisemartin_dhaultfoeuille.py:L4015-L4078. Impact: downstream callers cannot distinguish “feature not requested” from “requested, but no path window survived,” which is a production-facing edge-case regression on the new public surface. Concrete fix: preserve the requested-but-empty sentinel by initializing path_cband_crit_values and path_cband_n_valid_horizons to {} whenever by_path + n_bootstrap is active, even if path_effects == {}, so results.path_sup_t_bands is {} rather than None.

Performance

No findings in the changed diff.

Maintainability

No findings beyond the contract bug above.

Tech Debt

No mitigating TODO.md entry applies to the P1 above, and this is not deferrable under the review rules.

Security

No findings in the changed diff.

Documentation/Tests

  • P3 Informational: the new prose hard-codes “Hall-Mammen” for the per-path sup-t draw in CHANGELOG.md:L11-L11, docs/methodology/REGISTRY.md:L643-L643, and tests/test_chaisemartin_dhaultfoeuille.py:L5496-L5500, but the implementation still honors the estimator’s configurable bootstrap_weights via weight_type=self.bootstrap_weights in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L835-L840. Impact: the docs imply a fixed weight family even though rademacher and webb remain valid implementation choices. Concrete fix: rephrase this to “shared multiplier weights using bootstrap_weights” or equivalent.

Path to Approval

  1. Make the new surface honor the documented requested-but-empty sentinel: when by_path and n_bootstrap > 0 are active but no path has a complete window, return results.path_sup_t_bands == {} rather than None.
  2. Add a regression test on the existing empty-window fixture with bootstrap enabled, asserting results.path_effects == {}, results.path_sup_t_bands == {}, and that no per-horizon cband_conf_int keys are written.

igerber added a commit that referenced this pull request Apr 25, 2026
…wording

P1: `path_sup_t_bands` now returns `{}` (not `None`) when `by_path +
n_bootstrap > 0` is requested but `path_effects == {}` (no observed
path has a complete window). Mirrors the documented empty-state
contract used by `path_effects` and `path_placebo_event_study`.

The bug: `_collect_path_bootstrap_inputs` is gated on
`len(path_effects) > 0`, so when `path_effects == {}` the bootstrap
collector is skipped, `bootstrap_results.path_cband_crit_values` stays
`None`, and the result-class kwarg builder mapped `None` to
`path_sup_t_bands = None` — losing the requested-vs-empty distinction.

Fix: the kwarg builder now keys off `self.by_path is not None and
self.n_bootstrap > 0`. When that condition holds we always materialize
a dict (empty when no path passed both gates), reserving `None` only
for "feature not requested." When `path_cband_crit_values is None` we
treat it as the empty case for the dict comprehension.

P3: replaced "Hall-Mammen multiplier weight matrix" with neutral
"multiplier weight matrix (using the estimator's configured
`bootstrap_weights` — Rademacher / Mammen / Webb)" in CHANGELOG,
REGISTRY, helper docstring, and TestByPathSupTBands docstring. The
implementation honors `self.bootstrap_weights`, so the prose
shouldn't fix the family.

Regression test added: `test_path_sup_t_bands_empty_dict_when_no_complete_window`
on the same empty-window panel as the analytical sibling at
`test_empty_path_surface_when_no_complete_window` (`:4015+`),
asserting `path_effects == {}`, `path_sup_t_bands == {}`, and no
horizon writes a `cband_conf_int` key.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: e9bda71ad0f8057dbfeb57881cc1e9fa2cb8bc9b


Overall Assessment
Needs changes

Executive Summary

  • Re-review status: the prior empty-state path_sup_t_bands=None regression appears fixed. The results layer now materializes {} for requested-but-empty panels in diff_diff/chaisemartin_dhaultfoeuille.py:L3663-L3699, and there is a direct regression test in tests/test_chaisemartin_dhaultfoeuille.py:L5805-L5864.
  • Affected method: ChaisemartinDHaultfoeuille per-path multiplier-bootstrap joint sup-t bands for by_path event-study output.
  • I did not find an unmitigated defect in the documented Python-only extension itself, the fresh shared-draw asymmetry vs overall sup-t, or the inherited cross-path SE deviation from R; those are explicitly documented in docs/methodology/REGISTRY.md:L643-L643.
  • P1: the changed docs/changelog/results comments say the band is emitted when >=50% of sup-t draws are finite, but the implementation still uses a strict >50% gate on both the overall and per-path sup-t blocks.
  • I did not find new inline-inference anti-patterns, partial-NaN inference bugs, control-group logic defects, or security issues in the changed diff.
  • I could not execute pytest here because pytest is not installed in this environment.

Methodology

  • Severity P1. Impact: undocumented gate mismatch on the new inference surface. docs/methodology/REGISTRY.md:L643-L643, CHANGELOG.md:L11-L11, and the changed result docs/comments in diff_diff/chaisemartin_dhaultfoeuille_results.py:L164-L172, L370-L379, L419-L430, L550-L560 all specify a >=50% finite-draw requirement, but the code requires strictly more than half of draws to be finite in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L605-L613 and L851-L855. Exactly-half-finite cases will therefore drop sup_t_bands / path_sup_t_bands despite the documented contract. Concrete fix: align the implementation and documentation. If strict >50% is intended, update all changed prose to say that explicitly; if >=50% is intended, relax both guards accordingly. In either case, add a boundary regression for the exact-50% case.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not find a matching TODO.md mitigation for the P1 above.

Security

  • No findings.

Documentation/Tests

  • No separate findings beyond the P1 contract mismatch above. The prior empty-result-set issue is now regression-covered at tests/test_chaisemartin_dhaultfoeuille.py:L5805-L5864, but there is still no targeted test for the exact-50% finite-draw boundary.

Path to Approval

  1. Make the finite-draw gate consistent across code and docs for both overall and per-path sup-t. Update either the code in diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L605-L613 and L851-L855 or the changed prose in docs/methodology/REGISTRY.md:L643-L643, CHANGELOG.md:L11-L11, and diff_diff/chaisemartin_dhaultfoeuille_results.py:L164-L172, L370-L379, L419-L430, L550-L560.
  2. Add a regression that forces exactly n_bootstrap / 2 finite sup-t draws and asserts the chosen boundary semantics for both sup_t_bands and path_sup_t_bands.

igerber added a commit that referenced this pull request Apr 25, 2026
Code gate is `finite_mask.sum() > 0.5 * n_bootstrap` (strict majority,
mirrors OVERALL `:612`). Prose said `>=50%` in 8 places; aligned to
"strict majority (more than 50%)" / `>50%`:

- `CHANGELOG.md` Unreleased entry
- `docs/methodology/REGISTRY.md` per-path joint sup-t Note (added
  explicit "Exactly half-finite draws are NOT enough" clarification)
- `diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py` helper comment
- `diff_diff/chaisemartin_dhaultfoeuille_results.py`:
  - `DCDHBootstrapResults.path_cband_crit_values` field comment
  - `ChaisemartinDHaultfoeuilleResults.sup_t_bands` Attributes (the
    OVERALL surface — pre-audit fix updated this in R0; my new prose
    introduced the >=50% wording so I'm correcting it here too)
  - `ChaisemartinDHaultfoeuilleResults.path_sup_t_bands` Attributes
  - `path_sup_t_bands` field comment
- `tests/test_chaisemartin_dhaultfoeuille.py`:
  - `TestByPathSupTBands.test_path_sup_t_absent_path_has_no_cband_keys`
    docstring

Boundary regression added per reviewer ask:
`test_path_sup_t_strict_majority_gate_at_exact_50pct` monkeypatches
`_generate_psu_or_group_weights` to inject NaN into exactly half of
the bootstrap rows. With `n_bootstrap=4` and 2 NaN rows,
`finite_mask.sum() == 2 == 0.5 * 4`, so the gate `2 > 2.0` is False.
Asserts both `path_sup_t_bands == {}` and `sup_t_bands is None`, plus
no horizon writes `cband_conf_int` (per-path or OVERALL). Pins the
strict-majority semantics on both sup-t surfaces.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: b1737f3f4815d3b1f5942238e13d3f3903f6c99b


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Re-review status: the prior requested-but-empty regression for path_sup_t_bands appears fixed in diff_diff/chaisemartin_dhaultfoeuille.py:L3663-L3698, with direct coverage in tests/test_chaisemartin_dhaultfoeuille.py:L5805-L5864.
  • Affected method: ChaisemartinDHaultfoeuille per-path multiplier-bootstrap joint sup-t bands for by_path, a documented Python-only extension alongside the overall event_study_sup_t_bands.
  • P1 still open: the Methodology Registry and changelog say the finite-draw gate is >= 50%, but the implementation enforces a strict majority on both the overall and per-path sup-t paths.
  • I did not find an unmitigated defect in the documented fresh-draw asymmetry vs the overall sup-t implementation, or in the inherited cross-path SE deviation from R; both are documented in docs/methodology/REGISTRY.md.
  • I did not find new inline-inference anti-patterns, partial-NaN inference bugs, control-group logic defects, or security issues in the changed diff.

Methodology

  • Severity P1 [Previously identified, still open]. Impact: undocumented methodology mismatch on the new inference surface. docs/methodology/REGISTRY.md:L643-L643 and CHANGELOG.md:L11-L11 state that a band is emitted when >= 50% of sup-t draws are finite, but the code requires a strict majority on both surfaces: overall at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L599-L613 and per-path at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L821-L855. Exactly-half-finite cases are therefore rejected despite the documented contract. Concrete fix: align the source-of-truth docs and code. If strict-majority semantics are intended, update REGISTRY.md and CHANGELOG.md to say “more than 50%” / > 50%; if >= 50% is intended, relax both gates accordingly.

No other unmitigated methodology findings. The Python-only extension itself, the fresh shared-draw asymmetry, and the inherited cross-path SE deviation are documented, so they are not defects under the stated rubric.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No separate finding. I did not find a mitigating TODO.md entry for the P1 above, and this class of methodology mismatch would not be mitigated by TODO tracking anyway.

Security

  • No findings.

Documentation/Tests

  • No separate finding beyond the P1 contract mismatch above. The prior empty-state issue appears covered. I reviewed the added tests in the diff but did not execute pytest in this environment.

Path to Approval

  1. Make the finite-draw gate consistent across docs/methodology/REGISTRY.md, CHANGELOG.md, and the bootstrap implementation for both overall and per-path sup-t bands. Choose either strict-majority (> 50%) or inclusive (>= 50%) semantics and keep the exact-50 boundary expectation consistent everywhere.

When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t
simultaneous confidence bands are now computed across horizons
`1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)`
multiplier weight matrix (using the estimator's configured
`bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path
and broadcast across all valid horizons, producing correlated
bootstrap distributions. The path-specific critical value
`c_p = quantile(max_l |t_l|, 1-α)` is applied per horizon as
`cband_conf_int = (eff - c_p·se, eff + c_p·se)` and surfaced at top
level as `results.path_sup_t_bands[path]`.

Closes Wave 2 #4 of the by_path follow-up sequence (#357 foundation,
#360 R-parity, #364 bootstrap, #371 placebos).

**Methodology asymmetry vs OVERALL** (intentional, documented):
per-path sup-t draws fresh shared weights AFTER the per-path SE
bootstrap block has populated `path_ses` via independent per-(path,
horizon) draws. Asymptotically equivalent to OVERALL's self-consistent
reuse but NOT bit-identical. Preserves RNG-state isolation for
existing per-path SE seed-reproducibility tests.

**Gates** mirror OVERALL: `>=2` valid horizons (finite bootstrap SE > 0)
AND a strict majority (more than 50%) of finite sup-t draws to receive
a band. Otherwise the path is absent from `path_sup_t_bands`.

**Empty-state contract**: `path_sup_t_bands is None` when not requested
(no bootstrap or `by_path is None`); `{}` when requested but no path
passes both gates (covers two cases: `path_effects == {}` upstream OR
all paths fail gates downstream).

**Deviation from R**: `did_multiplegt_dyn` provides no joint / sup-t
bands at any surface — Python-only methodology extension consistent
with the existing OVERALL `event_study_sup_t_bands` (also Python-only).
Inherits the cross-path cohort-sharing SE deviation from R documented
for `path_effects`.

**Bundled pre-audit fix** (sibling-surface check): the existing OVERALL
`sup_t_bands` field's stale "Phase 2 placeholder" docstring updated to
the actual contract description.

Tests: new `TestByPathSupTBands` class with 13 tests covering: attr
None when no bootstrap / no by_path; keys match `path_effects` with
finite crit; band wider than pointwise; crit finite and positive;
seed reproducibility; single-horizon-path-skip; L_max=1 skip;
n_valid_horizons matches; absent-path-no-cband-keys; summary renders;
empty-dict-when-no-complete-window; strict-majority-gate-at-exact-50pct
(monkeypatches the weight generator to inject NaN into half the
bootstrap rows, asserting both `sup_t_bands is None` and
`path_sup_t_bands == {}` at the boundary). All `@pytest.mark.slow`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the dcdh-by-path-sup-t branch from b1737f3 to c0c0d4e Compare April 25, 2026 19:00
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: c0c0d4e2aecb395f3ab58ccf0d534193965127f1


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review status: the prior >=50% vs >50% finite-draw gate mismatch is resolved. Code and docs now consistently use strict-majority semantics for both overall and per-path sup-t bands at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L599-L614 and L781-L867, docs/methodology/REGISTRY.md:L643-L643, diff_diff/chaisemartin_dhaultfoeuille_results.py:L371-L440, and CHANGELOG.md:L12-L12.
  • Affected method: ChaisemartinDHaultfoeuille per-path joint sup-t simultaneous bands under by_path=k with n_bootstrap > 0. The fresh-draw asymmetry vs overall sup-t and the inherited cross-path cohort-sharing SE deviation from R are explicitly documented in the Methodology Registry, so they are mitigated, not defects.
  • I did not find an unmitigated methodology defect, incorrect variance/SE handling, NaN/inference inconsistency, control-group issue, or inline-inference anti-pattern in the changed implementation at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L781-L867, diff_diff/chaisemartin_dhaultfoeuille.py:L3004-L3028, and L3663-L3700.
  • The requested-but-empty path_sup_t_bands contract and the exact-50% strict-majority boundary are both covered in the added regressions at tests/test_chaisemartin_dhaultfoeuille.py:L5806-L5961.
  • I could not execute pytest locally because pytest is not installed in this environment.

Methodology

No findings. The new per-path sup-t surface matches the documented contract in docs/methodology/REGISTRY.md:L643-L643, and the previous P1 gate-contract mismatch is resolved.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not find a new untracked correctness limitation that would need TODO.md mitigation.

Security

No findings.

Documentation/Tests

No findings from the diff. The new tests are well targeted for the main risk areas here: requested-but-empty output, absent-key behavior for invalid bands, seed reproducibility, summary rendering, and the exact-50% boundary in tests/test_chaisemartin_dhaultfoeuille.py:L5550-L5961. I was not able to run them in this environment because pytest is unavailable.

Cross-surface gap caught in self-audit: OVERALL `to_dataframe(level=
"event_study")` includes `cband_lower` / `cband_upper` columns
(`chaisemartin_dhaultfoeuille_results.py:1495-1496,1531-1532`) but the
per-path table at `level="by_path"` does not — even though per-path
now produces `cband_conf_int` writes via the new sup-t propagation
block. Cross-surface twin asymmetry the CI reviewer didn't flag;
caught by my own grep audit on `cband_conf_int` consumers.

Fix: extend `to_dataframe(level="by_path")` to emit the same two
columns. Populated for positive-horizon rows of paths with a finite
sup-t crit (read from `path_effects[path]["horizons"][l]
["cband_conf_int"]`); NaN for placebo rows (no joint band per the
positive-only sup-t spec), unbanded paths, and the requested-but-empty
fallback DataFrame (which now includes the columns in its canonical
schema).

Tests added:
- `test_path_sup_t_to_dataframe_emits_cband_columns` — column
  presence + per-row alignment with the dict surface
- `test_path_sup_t_to_dataframe_empty_path_fallback_has_cband_columns`
  — empty-path fallback DataFrame schema parity

Docs updated:
- REGISTRY.md: `to_dataframe(level="by_path")` integration note added
  to the new sup-t Note; canonical column list in the existing
  `Note (Phase 3 by_path ...)` block extended with `cband_lower /
  cband_upper`
- CHANGELOG entry: surface listing now mentions to_dataframe columns
- `by_path` parameter docstring: rendering surface listing extended
- `path_sup_t_bands` Attributes docstring: rendering surface listing
  extended

Suite: 263 tests pass (was 261, +2 new tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 2df79a00c36941d78007ea9ac7935a5129741425


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review status: the prior strict-majority gate mismatch appears resolved. The overall and per-path sup-t collectors now both use > 50% finite-draw semantics, the registry says the same, and there is an exact-50% regression test for the boundary at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L599-L614, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L802-L858, docs/methodology/REGISTRY.md:L643-L643, and tests/test_chaisemartin_dhaultfoeuille.py:L5948-L6035.
  • Affected method: ChaisemartinDHaultfoeuille by_path joint sup-t bands. The new collector mirrors the existing overall sup-t construction on valid-horizon gating and per-horizon band propagation, and it preserves the project’s NaN/absent-key inference contract at diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L781-L867 and diff_diff/chaisemartin_dhaultfoeuille.py:L3007-L3032.
  • I did not find an unmitigated P0/P1 methodology defect, incorrect SE/variance handling, NaN inconsistency, or control-group logic issue in the changed implementation.
  • The fresh-draw asymmetry vs overall sup-t and the inherited cross-path cohort-sharing deviation from R are explicitly documented in the Methodology Registry, so they are mitigated rather than blockers at docs/methodology/REGISTRY.md:L643-L643.
  • Minor follow-up only: the in-code to_dataframe(level="by_path") docstring is stale relative to the new exporter schema, and the new sup-t tests do not directly cover non-default bootstrap_weights.
  • I could not run pytest locally because pytest is not installed in this environment.

Methodology

  • Severity: P3 informational. Impact: none. The only substantive methodology deviations I found are documented: the per-path sup-t block uses fresh shared draws while reusing previously bootstrapped path SEs in the denominator, and it inherits the already-documented cross-path cohort-sharing SE deviation from R. That makes them mitigated, not defects, under the review rules. Concrete fix: none. References: docs/methodology/REGISTRY.md:L643-L643, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L781-L867, diff_diff/chaisemartin_dhaultfoeuille.py:L3666-L3703.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not see a new deferred item that needs TODO.md tracking.

Security

No findings.

Documentation/Tests

  • Severity: P3. Impact: low. The in-code exporter docs for to_dataframe(level="by_path") still describe the pre-change schema and omit cband_lower / cband_upper, even though the implementation now returns those columns. That leaves the local API docstring stale for callers inspecting the results surface. Concrete fix: update the by_path bullet in diff_diff/chaisemartin_dhaultfoeuille_results.py:L1375-L1379 to include the two cband columns and note their NaN-on-absent-band behavior; implementation is at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1632-L1706.
  • Severity: P3. Impact: low. TestByPathSupTBands exercises only the default Rademacher path on the new per-path shared-draw sup-t branch, even though the feature advertises support for mammen and webb. The code likely reuses safe helper paths, but the new surface lacks direct regression coverage for those multiplier choices. Concrete fix: parameterize one existing invariant test in tests/test_chaisemartin_dhaultfoeuille.py:L5519-L6035 over bootstrap_weights in {"rademacher", "mammen", "webb"}.

Execution note: local test execution was not possible here because python -m pytest fails with No module named pytest.

…erage

P3 #1: ``to_dataframe`` method docstring at
``chaisemartin_dhaultfoeuille_results.py:1375-1379`` listed the
pre-change ``level="by_path"`` schema (no ``cband_*`` columns) even
though the implementation now returns them. Updated the bullet to
include ``cband_lower / cband_upper``, document the negative-horizon
placebo convention, and document the NaN-on-absent-band behavior.

P3 #2: ``TestByPathSupTBands::test_path_sup_t_seed_reproducibility``
only exercised the default ``rademacher`` weight family. Parameterized
over ``["rademacher", "mammen", "webb"]`` to pin that the per-path
sup-t branch correctly threads ``self.bootstrap_weights`` through
``_generate_psu_or_group_weights`` for all three multiplier families
the feature advertises. The existing OVERALL machinery handles all
three uniformly, but the per-path surface lacked direct coverage.
Each variant must produce a finite, reproducible crit on the standard
3-path fixture.

17 tests pass on TestByPathSupTBands (was 15: +2 new parameterized
variants on the existing seed_reproducibility test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 25, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 1febbb190836089855af8f9ccfbe825f1adce840


Overall Assessment

✅ Looks good

Executive Summary

Methodology

Code Quality

  • Severity: none. Impact: no defect identified in the changed code. Concrete fix: none.

Performance

  • Severity: none. Impact: no material performance issue identified beyond the documented extra per-path bootstrap pass required by the feature. Concrete fix: none.

Maintainability

Tech Debt

  • Severity: none. Impact: no new untracked deferrable work identified in the diff. Concrete fix: none.

Security

  • Severity: none. Impact: no security issue identified in the changed code. Concrete fix: none.

Documentation/Tests

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 25, 2026
@igerber igerber merged commit 94083be into main Apr 25, 2026
23 of 24 checks passed
@igerber igerber deleted the dcdh-by-path-sup-t branch April 25, 2026 20:59
igerber added a commit that referenced this pull request Apr 25, 2026
PR #357 shipped by_path foundation; PRs #364/#371/#374 completed the
inference surface (bootstrap, placebos, sup-t bands). Wave 3 begins
design-variant extensions; this PR is item #5: combine by_path=k with
controls=[...] (DID^X).

Architecture: the per-baseline OLS residualization at
chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel
BEFORE path enumeration, so all four downstream surfaces (analytical
SE, bootstrap SE, per-path placebos, per-path joint sup-t bands)
consume the residualized Y_mat automatically (Frisch-Waugh-Lovell).
Per-period effects remain unadjusted, consistent with the existing
controls + per-period DID contract.

Canonical R behavior: `did_multiplegt_dyn(..., by_path=k, controls=...)`
re-runs the per-baseline residualization on each path's restricted
subsample (path's switchers + same-baseline not-yet-treated controls).
On the multi_path_reversible DGP all switchers share baseline
D_{g,1}=0, so R's per-path control pool equals our global control
pool and the residualization coefficients coincide. Per-path point
estimates match R exactly (rtol ~1e-11); per-path SE within ~6.5%
(Phase 2 envelope, inheriting the documented cross-path cohort-
sharing deviation).

Changes:
- Delete the gate at chaisemartin_dhaultfoeuille.py:988-992
- Update by_path docstring (remove `controls` from incompatible list,
  add inheritance paragraph)
- New R parity scenario `multi_path_reversible_by_path_controls` in
  benchmarks/R/generate_dcdh_dynr_test_values.R + regenerated golden
  values
- New TestDCDHDynRParityByPathControls in
  tests/test_chaisemartin_dhaultfoeuille_parity.py
- New TestByPathControls in tests/test_chaisemartin_dhaultfoeuille.py
  (12 tests covering analytical / bootstrap / placebo / sup-t / cband
  to_dataframe / per-period unadjusted / covariate_residuals round-
  trip / multi-covariate)
- Remove the `controls` parametrize entry from
  TestByPathGates::test_forbids_phase3_fit_kwargs
- Update REGISTRY.md (remove `controls` from gated-combos list, add
  inheritance sub-paragraph documenting the four-surface auto-
  inheritance)
- CHANGELOG: Unreleased > Added entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant