dCDH by_path + placebo: per-path backward-horizon placebos (Wave 2 #3) by igerber · Pull Request #371 · igerber/diff-diff

igerber · 2026-04-25T12:50:24Z

Summary

Adds per-path backward-horizon placebos DID^{pl}_{path, l} for l = 1..L_max under the existing joiners/leavers IF precedent applied backward; surfaced on results.path_placebo_event_study[path][-l].
Bundled analytical + bootstrap + R-parity (closes Wave 2 item #3 of project_dcdh_by_path_next_prs.md).
Library-wide NaN-on-invalid bootstrap contract from PR #364 (dCDH by_path + n_bootstrap support, library-consistent percentile CI) enforced uniformly on the new surface; bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the cross-path cohort-sharing deviation from R already documented for path_effects.

Test plan

pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo -v (8 analytical tests pass)
pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo::TestBootstrap -v -m slow (5 bootstrap tests pass)
pytest tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo -v (R-parity confirmed; point estimates exact, SE within ~5% rtol)
Full dCDH suite regression: pytest tests/test_chaisemartin_dhaultfoeuille.py tests/test_chaisemartin_dhaultfoeuille_parity.py tests/test_methodology_chaisemartin_dhaultfoeuille.py (224 passed)
Slow suite: 21 passed
R fixture regenerated via Rscript benchmarks/R/generate_dcdh_dynr_test_values.R
ruff check + black --check on changed methodology files: clean

🤖 Generated with Claude Code

Wave 2 item 3 of the by_path follow-up. Adds per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` under the existing joiners/leavers IF precedent applied backward, surfaced on the new `results.path_placebo_event_study[path][-l]` attribute (negative-int keys mirroring `placebo_event_study`).

Bundled scope:
- Analytical: extend `_compute_per_group_if_placebo_horizon` with `switcher_subset_mask` (parallel to the PR #357 multi_horizon extension); new `_compute_path_placebos` sibling helper of `_compute_path_effects`; cohort-recentered plug-in SE with path-specific divisor `N^{pl}_{l, path}`.
- Bootstrap: new `_collect_path_placebo_bootstrap_inputs` collector + `_compute_dcdh_bootstrap` per-`(path, lag_l)` dispatch reusing `_bootstrap_one_target`; bootstrap propagation block in fit() enforcing the library-wide NaN-on-invalid contract from PR #364 (canonical pattern: a non-finite bootstrap SE writes NaN to the full inference tuple and never falls back to analytical).
- Reporting: `summary()` renders negative-keyed placebo rows alongside positive event-study rows in each path block; `to_dataframe(level="by_path")` emits negative-horizon rows; footer aggregate predicate covers the new surface.
- R-parity: extend `extract_dcdh_by_path` with `n_placebos` param; new `multi_path_reversible_by_path_placebo` scenario in the R generator and `dcdh_dynr_golden_values.json`. Per-`(path, lag)` point estimates match R exactly; SE within Phase-2 envelope (~5% rtol). New parity class `TestDCDHDynRParityByPathPlacebo`.

SE inherits the cross-path cohort-sharing deviation from R already documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): it tracks R within tolerance on single-path-cohort panels and diverges on cohort-mixed panels. Bootstrap SE is a Monte Carlo analog of the analytical SE (same per-path centered IF input) and inherits the same deviation.

Tests:
- `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (8 analytical invariants + 5 bootstrap subclass tests under `@pytest.mark.slow`).
- `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` (R-parity on `multi_path_reversible_by_path_placebo`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
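The parity envelope described above (point estimates matching R, SE within roughly 5% relative tolerance) can be sketched as a small check. This is an illustration only: `check_parity`, its argument names, and the `est_rtol` default are hypothetical and are not taken from the parity test class, and the numbers in the usage lines are made up.

```python
import math


def check_parity(py_est, r_est, py_se, r_se, est_rtol=1e-9, se_rtol=0.05):
    """Return True when a Python result tracks an R reference value.

    est_rtol is an assumed stand-in for "exact" point-estimate agreement;
    se_rtol mirrors the ~5% Phase-2 envelope quoted in the commit message.
    Note: math.isclose scales rel_tol by max(|a|, |b|), which differs
    slightly from numpy-style rtol (scaled by the reference value only).
    """
    est_ok = math.isclose(py_est, r_est, rel_tol=est_rtol, abs_tol=1e-12)
    se_ok = math.isclose(py_se, r_se, rel_tol=se_rtol)
    return est_ok and se_ok


# Hypothetical values: identical estimates, SE off by 3% (passes) or 10% (fails).
print(check_parity(0.25, 0.25, 0.103, 0.100))
print(check_parity(0.25, 0.25, 0.110, 0.100))
```

Keeping the two tolerances separate reflects the split in the PR text: the estimand is expected to agree with R, while the SE is allowed to drift because of the documented cohort-sharing deviation.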

github-actions · 2026-04-25T12:58:01Z

Overall assessment

✅ Looks good — no unmitigated P0/P1 findings in the changed dCDH estimator, weighting, or bootstrap/SE code paths.

Executive summary

Affected methods: dCDH by_path backward-horizon placebos DID^{pl}_{path,l} and the new per-path bootstrap inference surface.
The implementation follows the existing joiners/leavers IF precedent and the already-documented full-panel cohort-centering deviation from R; I did not find an undocumented estimator or variance mismatch in the new code paths.
The bootstrap propagation on the new surface is NaN-consistent and derives t_stat through safe_inference, so it avoids the project’s known inline-inference and partial-NaN anti-patterns.
Remaining issues are P3 only: one contradiction in REGISTRY.md, and some documentation/tests that do not fully document or pin the new public surface.
Static review only: I could not run pytest here because the sandbox Python is missing the repo runtime dependencies.
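The all-or-nothing NaN contract mentioned in the summary can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `propagate_bootstrap_inference` and the `Inference` tuple are hypothetical names, the normal-approximation p-value and interval are a simplification, and the library's own `safe_inference` helper is not reproduced.

```python
import math
from typing import NamedTuple, Tuple


class Inference(NamedTuple):
    se: float
    t_stat: float
    p_value: float
    conf_int: Tuple[float, float]


def propagate_bootstrap_inference(effect, boot_se, z_crit=1.959963984540054):
    """Return a fully populated tuple, or an all-NaN tuple when the SE is invalid.

    No partial results: if the bootstrap SE is non-finite or non-positive,
    every inference field becomes NaN (there is no fallback to another SE).
    """
    nan = float("nan")
    if not (math.isfinite(effect) and math.isfinite(boot_se) and boot_se > 0):
        return Inference(nan, nan, nan, (nan, nan))
    t_stat = effect / boot_se
    # Two-sided p-value under a standard normal reference (assumed here).
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))
    half_width = z_crit * boot_se
    return Inference(boot_se, t_stat, p_value, (effect - half_width, effect + half_width))


print(propagate_bootstrap_inference(1.0, 0.5))
print(propagate_bootstrap_inference(1.0, float("nan")))
```

The point of the pattern is that a caller never sees a finite t-statistic next to a NaN standard error, which is the partial-NaN state the review refers to.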

Methodology

Affected code paths are the analytical per-path placebo computation and bootstrap propagation in diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5578, diff_diff/chaisemartin_dhaultfoeuille.py:L5705-L5829, diff_diff/chaisemartin_dhaultfoeuille.py:L2941-L2987, and diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779. Against docs/methodology/REGISTRY.md and the in-code docstrings, I did not find a P0/P1 methodology defect.

Severity P3 | Impact: REGISTRY.md now says both that placebos are “not computed per path” and, later in the same note, that per-path placebos are supported. Because the registry is the project’s methodology source of truth, this contradiction can make the new surface look undocumented to future reviewers. Concrete fix: delete or rewrite the stale sentence so only the TWFE diagnostic remains sample-level. Reference: docs/methodology/REGISTRY.md:L641-L641.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not see a new deferrable issue that needed to be tracked in TODO.md.

Security

No findings.

Documentation/Tests

Severity P3 | Impact: the new public surface is not fully documented or tightly pinned by tests. The results docstring documents path_effects but omits path_placebo_event_study, and two new tests overstate what they verify: test_attr_is_none_when_placebo_false changes fixtures between the placebo-off and placebo-on branches, and test_path_placebo_point_estimate_within_path_mean only checks finiteness rather than equality to a within-path mean. Concrete fix: document path_placebo_event_study in the results class and tighten the tests to reuse one fixture and assert an explicit per-path placebo mean identity. References: diff_diff/chaisemartin_dhaultfoeuille_results.py:L370-L390, tests/test_chaisemartin_dhaultfoeuille.py:L5071-L5078, tests/test_chaisemartin_dhaultfoeuille.py:L5091-L5106.

…ning P3 fixes from CI reviewer round 1:
- REGISTRY.md `Note (Phase 3 by_path ...)`: drop "Placebos" from the "remain sample-level summaries" sentence (TWFE diagnostic stays sample-level; placebos are now per-path under the new sub-bullet). Resolves the in-note contradiction the reviewer flagged.
- ChaisemartinDHaultfoeuilleResults docstring: add the `path_placebo_event_study` Attributes block entry alongside `path_effects`, documenting the negative-int inner-key convention and the inherited cross-path cohort-sharing deviation.
- TestByPathPlacebo.test_attr_is_none_when_placebo_false: use the same `_by_path_placebo_data()` fixture for both placebo-off and placebo-on branches, so the difference is attributable solely to the `placebo` flag (not a fixture swap).
- TestByPathPlacebo.test_path_placebo_point_estimate_within_path_mean: replace the finiteness-only check with an explicit recomputation of the within-path-mean DID^pl identity from the raw data, asserting equality at atol=1e-10 / rtol=1e-10. Pins the estimand identity against silent regressions in the per-path IF construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-25T13:04:15Z

/ai-review

github-actions · 2026-04-25T13:17:18Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e405aaf174a57e338e47d4e8ee5170828ffe94c4

Overall Assessment

✅ Looks good: no unmitigated P0/P1 findings in the changed dCDH estimator, variance, or bootstrap/inference paths.

Executive Summary

Affected methods: dCDH by_path + placebo per-path backward-horizon placebos DID^{pl}_{path,l} and the sibling bootstrap propagation on results.path_placebo_event_study[path][-l] in diff_diff/chaisemartin_dhaultfoeuille.py:L2102-L2125, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5829, and diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779.
Methodology check passed: the new path-placebo estimator follows the documented joiners/leavers-style IF construction, path-specific divisor N^{pl}_{l,path}, and the already-documented full-panel cohort-sharing SE deviation from R in docs/methodology/REGISTRY.md:L641-L641.
The new bootstrap surface is NaN-consistent and uses safe_inference rather than inline effect / se, so it avoids the project’s known P0/P1 inference anti-patterns (diff_diff/chaisemartin_dhaultfoeuille.py:L2941-L2987, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779, tests/test_chaisemartin_dhaultfoeuille.py:L5338-L5404).
The prior re-review P3s are addressed: the REGISTRY note no longer contradicts itself about per-path placebos, the results docstring now documents path_placebo_event_study, and the tests now reuse one fixture and assert the explicit within-path placebo identity (docs/methodology/REGISTRY.md:L641-L641, diff_diff/chaisemartin_dhaultfoeuille_results.py:L377-L390, tests/test_chaisemartin_dhaultfoeuille.py:L5071-L5177).
Remaining issue is P3 only: the new golden parity scenario’s params.n_groups metadata is misleading relative to the serialized dataset size (benchmarks/R/generate_dcdh_dynr_test_values.R:L675-L689, benchmarks/data/dcdh_dynr_golden_values.json:L760-L775).
Static review only: pytest is not installed here, and the local Python environment also lacks numpy, so the new tests could not be run in this environment.

Methodology

No findings. The new per-path placebo analytical path and bootstrap propagation match the documented Phase 3 by_path extension, and I did not find an undocumented estimator, control-pool, weighting, or SE mismatch in diff_diff/chaisemartin_dhaultfoeuille.py:L424-L432, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5925, and docs/methodology/REGISTRY.md:L641-L641.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. Nothing new appears to require TODO.md tracking.

Security

No findings.

Documentation/Tests

Severity P3 | Impact: The newly added multi_path_reversible_by_path_placebo benchmark metadata says params.n_groups = 80, but the serialized dataset contains 120 unique groups because gen_reversible() appends stable control cohorts after the initial switcher allocation. This does not affect the current parity test, which reads data rather than params, but it makes the new golden scenario misleading for anyone reusing its metadata. Concrete fix: store the realized post-append group count in params.n_groups, or rename the field to n_switcher_groups consistently across the by-path golden scenarios. References: benchmarks/R/generate_dcdh_dynr_test_values.R:L187-L194, benchmarks/R/generate_dcdh_dynr_test_values.R:L675-L689, benchmarks/data/dcdh_dynr_golden_values.json:L760-L775.

P3 fix: scenario 15's `params.n_groups` was 80 (the switcher cohort allocator input fed into `gen_reversible(n_groups=N_GOLDEN, ...)`) while the realized panel actually contains 120 groups (80 switchers + 20 never-treated + 20 always-treated, appended by `gen_reversible`'s default cohort additions at line 64). Replace with two explicit fields:
- `n_switcher_groups = 80`: the load-bearing DGP allocator input
- `n_realized_groups = 120`: the actual unique-group count in the serialized data

The parity test reads the `data` block directly, not `params`, so it is unaffected by this metadata change. Resolves the misleading-metadata finding the reviewer flagged on R2.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
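A consistency check between the two metadata fields and the serialized data could look like the sketch below. The field names `n_switcher_groups` and `n_realized_groups` come from the commit message; the helper `check_group_metadata` and the flat list of group labels are hypothetical and do not reflect the actual layout of the golden JSON file.

```python
def check_group_metadata(params, data_groups):
    """Return True when the recorded group counts agree with the data.

    params: dict holding the two metadata fields.
    data_groups: iterable of group labels, one per panel row (assumed layout).
    """
    realized = len(set(data_groups))
    return (
        params["n_realized_groups"] == realized
        and params["n_switcher_groups"] <= realized
    )


# Toy panel: 120 groups observed over 4 periods each.
groups = [f"g{i}" for i in range(120) for _ in range(4)]
params = {"n_switcher_groups": 80, "n_realized_groups": 120}
print(check_group_metadata(params, groups))
```

A check of this kind would have caught the original mismatch, where a single `n_groups = 80` field sat next to data containing 120 unique groups.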

igerber · 2026-04-25T13:25:44Z

/ai-review

github-actions · 2026-04-25T13:31:08Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f70c17f721db2f11f7b9b5dce6800acb6e220e78

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings in the changed dCDH by_path + placebo estimator, variance, or bootstrap/inference paths.

Executive Summary

Re-review focus: the prior P3 about misleading golden-fixture metadata appears resolved. Scenario 15 now records both switcher-cohort count and realized panel size in benchmarks/R/generate_dcdh_dynr_test_values.R:L684-L699 and benchmarks/data/dcdh_dynr_golden_values.json:L767-L770.
Affected methodology surface: analytical per-path backward placebos plus bootstrap propagation and results rendering in diff_diff/chaisemartin_dhaultfoeuille.py:L2102-L2125, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5829, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779, and diff_diff/chaisemartin_dhaultfoeuille_results.py:L377-L390.
Methodology cross-check passed: the new code matches the Methodology Registry’s documented joiners/leavers-style IF construction, path-specific divisor N^{pl}_{l,path}, and the documented cross-path cohort-sharing SE deviation from R in docs/methodology/REGISTRY.md:L641.
Edge-case handling looks correct on the changed surface: zero-eligible path/lag cells return NaN/n_obs=0, and bootstrap propagation uses the all-or-nothing NaN contract rather than partial inference fields (diff_diff/chaisemartin_dhaultfoeuille.py:L2941-L2987, tests/test_chaisemartin_dhaultfoeuille.py:L5338-L5404).
One P3 maintainability nit remains: an internal docstring in _collect_path_placebo_bootstrap_inputs() still describes a nonexistent ["horizons"] wrapper (diff_diff/chaisemartin_dhaultfoeuille.py:L5736-L5738).
Static review only: I could not execute pytest here because this environment lacks pytest, numpy, and pandas.

Methodology

No findings. The new per-path placebo path is consistent with the documented estimator/control-pool/SE contract in docs/methodology/REGISTRY.md:L641 and the implementation at diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5578, diff_diff/chaisemartin_dhaultfoeuille.py:L5832-L5959.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity P3 | Impact: _collect_path_placebo_bootstrap_inputs() documents the analytical-results shape as path_placebos[path]["horizons"][-lag_l]["effect"], but the implementation actually uses path_placebos[path][-lag_l]["effect"]. This is harmless at runtime, but it can mislead future maintenance and reviews. Concrete fix: update the docstring at diff_diff/chaisemartin_dhaultfoeuille.py:L5736-L5738 to match the actual access pattern at diff_diff/chaisemartin_dhaultfoeuille.py:L5818-L5824.

Tech Debt

No findings. I did not identify any new deferrable item that needs TODO.md tracking.

Security

No findings.

Documentation/Tests

No findings. The new unit/parity coverage is appropriately targeted for the changed surface in tests/test_chaisemartin_dhaultfoeuille.py:L4995-L5424 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L635-L753.
Review note: I could not re-run the tests in this environment because pytest, numpy, and pandas are not installed.

P3 fix: `_collect_path_placebo_bootstrap_inputs` docstring described the analytical-results shape as `path_placebos[path]["horizons"][-lag_l]["effect"]` but the actual access is `path_placebos[path][-lag_l]["effect"]` (no `["horizons"]` wrapper: `_compute_path_placebos` returns the negative-keyed inner dict directly, intentionally diverging from `_compute_path_effects`'s `["horizons"]` nesting). Update the docstring to match the actual access pattern at the implementation site (`:5818-5824`). Harmless at runtime; the fix is to prevent the comment from misleading future maintenance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
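The difference between the two nesting conventions is easy to see with toy dictionaries. The shapes below follow the access patterns quoted in the commit message; the path tuple, the numeric values, and the helper `placebo_effect` are invented for illustration and are not library output.

```python
# Hypothetical path key and values, shaped like the two internal results.
path = (0, 1, 1)

# _compute_path_effects-style nesting: positive horizons under "horizons".
path_effects = {path: {"horizons": {1: {"effect": 0.40}, 2: {"effect": 0.55}}}}

# _compute_path_placebos-style nesting: negative-int keys directly under the path.
path_placebos = {path: {-1: {"effect": 0.01}, -2: {"effect": -0.02}}}


def placebo_effect(placebos, path_key, lag_l):
    """Look up the placebo effect for a positive lag via its negative key."""
    return placebos[path_key][-lag_l]["effect"]


print(path_effects[path]["horizons"][2]["effect"])  # forward horizon 2
print(placebo_effect(path_placebos, path, 2))       # backward lag 2, key -2
```

Indexing `path_placebos[path]["horizons"]` on this shape would raise a `KeyError`, which is why the stale docstring was worth correcting even though it did not affect runtime behavior.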

igerber · 2026-04-25T13:36:52Z

/ai-review

github-actions · 2026-04-25T13:44:12Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3ba7f6d228cd62177dc36b49ce4c1c8187983497

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings in the changed ChaisemartinDHaultfoeuille by_path + placebo estimator, variance, bootstrap, or results surfaces.

Executive Summary

Re-review scope: the prior P3 docstring mismatch in _collect_path_placebo_bootstrap_inputs() appears resolved at diff_diff/chaisemartin_dhaultfoeuille.py:L5705-L5745.
Affected method(s): the PR extends the dCDH Phase 3 by_path surface to per-path dynamic placebos DID^{pl}_{path,l} and threads that surface through bootstrap inference and results rendering at diff_diff/chaisemartin_dhaultfoeuille.py:L2102-L2125, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5578, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779, and diff_diff/chaisemartin_dhaultfoeuille_results.py:L1141-L1217.
Methodology cross-check passed: the implementation matches the Registry’s documented library extension and documented cross-path cohort-sharing SE deviation from R at docs/methodology/REGISTRY.md:L641.
Edge-case handling on the changed surface looks correct: zero-eligible (path, lag) cells return NaN/n_obs=0, and bootstrap propagation uses the all-or-nothing NaN contract rather than partial inference fields at diff_diff/chaisemartin_dhaultfoeuille.py:L5529-L5537 and diff_diff/chaisemartin_dhaultfoeuille.py:L2941-L2987.
One P3 doc/test gap remains: the new path_placebo_event_study attribute implements the {} empty-state sentinel, but that contract is not documented or regression-tested on the new surface.
Static review only: I could not execute pytest because this environment does not have pytest, numpy, or pandas.

Methodology

No findings. The affected method is the dCDH Phase 3 by_path per-path placebo extension; the backward-horizon IF, unchanged control-pool logic, path-specific divisor N^{pl}_{l,path}, and inherited cross-path cohort-sharing SE deviation align with docs/methodology/REGISTRY.md:L641, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5578, diff_diff/chaisemartin_dhaultfoeuille.py:L5835-L5962, and diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings. The previously flagged misleading ["horizons"] docstring reference is fixed at diff_diff/chaisemartin_dhaultfoeuille.py:L5736-L5742.

Tech Debt

No findings. I did not identify any new deferrable item that needs TODO.md tracking.

Security

No findings.

Documentation/Tests

Severity P3 | Impact: the new path_placebo_event_study surface uses {} when by_path is requested but no path has a complete window, but the results docstring only documents None vs populated, and the new tests cover the analogous path_effects empty-state contract but not the placebo sibling. Downstream callers can mis-handle the attribute by checking only is None, and regressions on that empty-state contract would not be pinned. Concrete fix: update diff_diff/chaisemartin_dhaultfoeuille_results.py:L377-L390 (and ideally docs/methodology/REGISTRY.md:L641) to document None = not requested / {} = requested but empty for path_placebo_event_study, and add a targeted regression next to tests/test_chaisemartin_dhaultfoeuille.py:L4015-L4111 that fits by_path + placebo on the existing no-complete-window panel and asserts results.path_placebo_event_study == {}.

…ent_study P3 fix: the new `path_placebo_event_study` surface implements the None-vs-{} empty-state sentinel (parallel to `path_effects`: `_compute_path_placebos` returns `{}` when no observed path has a complete window) but the contract was not documented or regression-tested on the new sibling surface.
- Result class docstring (`ChaisemartinDHaultfoeuilleResults`): add an explicit empty-state paragraph documenting `None` = not requested vs `{}` = requested but empty (mirrors the same contract on `path_effects`).
- REGISTRY.md `Note (Phase 3 by_path ...)` per-path placebos paragraph: add an "Empty-state contract" sentence.
- Add `TestByPathPlacebo::test_empty_path_placebo_surface_when_no_complete_window` mirroring the existing `path_effects` empty-state regression (`TestByPathEdgeCases::test_empty_path_surface_when_no_complete_window`) on the same no-complete-window panel construction (F_g=3, n_periods=4, L_max=3 → window [2, 5] extends past the panel). Asserts `path_placebo_event_study == {}` (NOT None) and confirms `path_effects == {}` parallel state so both sibling surfaces hit the same empty-state branch consistently.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
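For downstream callers, the three states of the attribute can be told apart as in the sketch below. The helper name `describe_placebo_surface` is hypothetical; only the `None` vs `{}` vs populated contract is taken from the text above.

```python
def describe_placebo_surface(path_placebo_event_study):
    """Classify the attribute according to the None-vs-{} contract."""
    if path_placebo_event_study is None:
        return "not requested"
    if not path_placebo_event_study:
        # An empty dict: by_path + placebo was requested, but no path qualified.
        return "requested but empty"
    return f"{len(path_placebo_event_study)} path(s) with placebos"


print(describe_placebo_surface(None))
print(describe_placebo_surface({}))
print(describe_placebo_surface({(0, 1, 1): {-1: {"effect": 0.01}}}))
```

Checking `is None` first matters: a plain truthiness test (`if not attr`) would lump the "not requested" and "requested but empty" cases together.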

igerber · 2026-04-25T13:50:44Z

/ai-review

github-actions · 2026-04-25T13:55:25Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 97bc2fad4a5bbf0e970a555e464f387ba16f85dd

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings in the changed ChaisemartinDHaultfoeuille by_path + placebo surface.

Executive Summary

Re-review scope: the prior P3 on documenting and testing the {} empty-state sentinel is resolved in diff_diff/chaisemartin_dhaultfoeuille_results.py:L377-L396 and tests/test_chaisemartin_dhaultfoeuille.py:L5287-L5466.
Affected methodology: the PR adds per-path backward-horizon placebos DID^{pl}_{path,l}, plus sibling bootstrap collection/propagation and results rendering at diff_diff/chaisemartin_dhaultfoeuille.py:L2102-L2125, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5832, diff_diff/chaisemartin_dhaultfoeuille.py:L2941-L2987, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1186-L1215, and diff_diff/chaisemartin_dhaultfoeuille_results.py:L1568-L1598.
Methodology cross-check passed against docs/methodology/REGISTRY.md:L641: the new point-estimate/control-pool logic mirrors the existing placebo IF, and the cross-path cohort-sharing SE divergence from R is explicitly documented, so it is not a defect.
Edge-case handling on the changed surface looks correct: zero-eligible (path, lag) cells return NaN/n_obs=0, bootstrap invalid-SE propagation writes the full inference tuple to NaN, and the new switcher_subset_mask=None default preserves legacy placebo IF behavior, covered in tests/test_chaisemartin_dhaultfoeuille.py:L5092-L5268 and tests/test_chaisemartin_dhaultfoeuille.py:L5400-L5466.
R-parity coverage for the new surface is present at tests/test_chaisemartin_dhaultfoeuille_parity.py:L635-L753.
Static review only: I could not execute the targeted tests because pytest is not installed in this environment (/bin/bash: pytest: command not found).

Methodology

Severity P3-informational | Impact: The changed SE convention for per-path placebos inherits the already-documented full-panel cohort-sharing deviation from R; the implementation and Registry are aligned at docs/methodology/REGISTRY.md:L641 and diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5578. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not identify a new deferrable item that needs TODO.md tracking.

Security

No findings.

Documentation/Tests

No findings. The prior re-review P3 is addressed by the updated results docstring and Registry note at diff_diff/chaisemartin_dhaultfoeuille_results.py:L377-L396 and docs/methodology/REGISTRY.md:L641, plus targeted regression/parity coverage at tests/test_chaisemartin_dhaultfoeuille.py:L5287-L5466 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L635-L753.

When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are now computed across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights`: Rademacher / Mammen / Webb) is drawn per path and broadcast across all valid horizons, producing correlated bootstrap distributions. The path-specific critical value `c_p = quantile(max_l |t_l|, 1-α)` is applied per horizon as `cband_conf_int = (eff - c_p·se, eff + c_p·se)` and surfaced at top level as `results.path_sup_t_bands[path]`. Closes Wave 2 #4 of the by_path follow-up sequence (#357 foundation, #360 R-parity, #364 bootstrap, #371 placebos).

**Methodology asymmetry vs OVERALL** (intentional, documented): per-path sup-t draws fresh shared weights AFTER the per-path SE bootstrap block has populated `path_ses` via independent per-(path, horizon) draws. Asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Preserves RNG-state isolation for existing per-path SE seed-reproducibility tests.

**Gates** mirror OVERALL: `>=2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Otherwise the path is absent from `path_sup_t_bands`.

**Empty-state contract**: `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates (covers two cases: `path_effects == {}` upstream OR all paths fail gates downstream).

**Deviation from R**: `did_multiplegt_dyn` provides no joint / sup-t bands at any surface, so this is a Python-only methodology extension consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`.

**Bundled pre-audit fix** (sibling-surface check): the existing OVERALL `sup_t_bands` field's stale "Phase 2 placeholder" docstring updated to the actual contract description.

Tests: new `TestByPathSupTBands` class with 13 tests covering: attr None when no bootstrap / no by_path; keys match `path_effects` with finite crit; band wider than pointwise; crit finite and positive; seed reproducibility; single-horizon-path-skip; L_max=1 skip; n_valid_horizons matches; absent-path-no-cband-keys; summary renders; empty-dict-when-no-complete-window; strict-majority-gate-at-exact-50pct (monkeypatches the weight generator to inject NaN into half the bootstrap rows, asserting both `sup_t_bands is None` and `path_sup_t_bands == {}` at the boundary). All `@pytest.mark.slow`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
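The sup-t construction described above (one shared multiplier draw per bootstrap replicate, a critical value taken from the distribution of the maximum absolute t-statistic, and the two gates) can be sketched with the standard library alone. This is a simplified illustration under stated assumptions, not the library's code: `sup_t_band` and its input layout are hypothetical, only Rademacher weights are shown, the influence-function values are treated as already centered, and a plain order-statistic quantile is used.

```python
import math
import random


def sup_t_band(effects, ses, if_by_horizon, n_bootstrap=999, alpha=0.05, seed=0):
    """effects/ses: {horizon: float}; if_by_horizon: {horizon: per-group IF values}."""
    # Gate 1: at least two horizons with a finite, positive SE.
    valid = [l for l in sorted(effects) if math.isfinite(ses[l]) and ses[l] > 0]
    if len(valid) < 2:
        return None
    n = len(if_by_horizon[valid[0]])
    rng = random.Random(seed)
    sups = []
    for _ in range(n_bootstrap):
        # One Rademacher weight vector shared by every horizon in this draw.
        w = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        t_abs = [
            abs(sum(wi * v for wi, v in zip(w, if_by_horizon[l])) / n / ses[l])
            for l in valid
        ]
        if all(math.isfinite(t) for t in t_abs):
            sups.append(max(t_abs))
    # Gate 2: a strict majority of draws must yield a finite sup-t statistic.
    if 2 * len(sups) <= n_bootstrap:
        return None
    sups.sort()
    k = min(len(sups) - 1, math.ceil((1 - alpha) * len(sups)) - 1)
    crit = sups[k]
    band = {l: (effects[l] - crit * ses[l], effects[l] + crit * ses[l]) for l in valid}
    return {"crit": crit, "n_valid_horizons": len(valid), "cband": band}
```

Sharing the weight vector across horizons is what makes the per-horizon bootstrap t-statistics correlated, so the quantile of their maximum gives a joint rather than pointwise critical value.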

PR #357 shipped by_path foundation; PRs #364/#371/#374 completed the inference surface (bootstrap, placebos, sup-t bands). Wave 3 begins design-variant extensions; this PR is item #5: combine by_path=k with controls=[...] (DID^X).

Architecture: the per-baseline OLS residualization at chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel BEFORE path enumeration, so all four downstream surfaces (analytical SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized Y_mat automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing controls + per-period DID contract.

Canonical R behavior: `did_multiplegt_dyn(..., by_path=k, controls=...)` re-runs the per-baseline residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls). On the multi_path_reversible DGP all switchers share baseline D_{g,1}=0, so R's per-path control pool equals our global control pool and the residualization coefficients coincide. Per-path point estimates match R exactly (rtol ~1e-11); per-path SE within ~6.5% (Phase 2 envelope, inheriting the documented cross-path cohort-sharing deviation).

Changes:
- Delete the gate at chaisemartin_dhaultfoeuille.py:988-992
- Update by_path docstring (remove `controls` from incompatible list, add inheritance paragraph)
- New R parity scenario `multi_path_reversible_by_path_controls` in benchmarks/R/generate_dcdh_dynr_test_values.R + regenerated golden values
- New TestDCDHDynRParityByPathControls in tests/test_chaisemartin_dhaultfoeuille_parity.py
- New TestByPathControls in tests/test_chaisemartin_dhaultfoeuille.py (12 tests covering analytical / bootstrap / placebo / sup-t / cband to_dataframe / per-period unadjusted / covariate_residuals round-trip / multi-covariate)
- Remove the `controls` parametrize entry from TestByPathGates::test_forbids_phase3_fit_kwargs
- Update REGISTRY.md (remove `controls` from gated-combos list, add inheritance sub-paragraph documenting the four-surface auto-inheritance)
- CHANGELOG: Unreleased > Added entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
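The "residualize once, then enumerate paths" ordering can be shown with a toy example. This is a deliberately reduced sketch and not the estimator's per-baseline procedure: it uses a single covariate, a pooled one-variable OLS slope, and plain per-path means, and every name in it (`ols_slope`, `residualize`, `per_path_means`) is hypothetical.

```python
def ols_slope(x, y):
    """Slope of a one-covariate OLS fit with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def residualize(y, x):
    """Remove the fitted covariate contribution from the outcome, once."""
    beta = ols_slope(x, y)
    return [yi - beta * xi for yi, xi in zip(y, x)]


def per_path_means(values, path_labels):
    """Any per-path summary computed afterwards reads the adjusted values."""
    buckets = {}
    for v, p in zip(values, path_labels):
        buckets.setdefault(p, []).append(v)
    return {p: sum(vs) / len(vs) for p, vs in buckets.items()}


# Toy data: outcome = 2 * covariate + a path-specific shift (0 for A, 1 for B).
x = [0, 1, 2, 3, 0, 1, 2, 3]
paths = ["A"] * 4 + ["B"] * 4
y = [2 * xi + (1 if p == "B" else 0) for xi, p in zip(x, paths)]

y_resid = residualize(y, x)            # step 1: full-panel adjustment
print(per_path_means(y_resid, paths))  # step 2: per-path quantities
```

Because the adjustment happens before the data are split by path, every per-path quantity computed afterwards sees the same covariate-adjusted outcome, which is the inheritance property the commit message describes for the four downstream surfaces.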

igerber added the ready-for-ci label (Triggers CI test workflows) Apr 25, 2026

igerber merged commit 2899e6c into main Apr 25, 2026
23 of 24 checks passed

igerber deleted the dcdh-by-path-placebo branch April 25, 2026 15:39

igerber mentioned this pull request Apr 25, 2026

Add per-path joint sup-t bands to ChaisemartinDHaultfoeuille.by_path #374

Merged

6 tasks

igerber merged 5 commits into main from dcdh-by-path-placebo
