dCDH by_path + placebo: per-path backward-horizon placebos (Wave 2 #3) #371

Merged
igerber merged 5 commits into main from dcdh-by-path-placebo
Apr 25, 2026

@igerber igerber commented Apr 25, 2026

Summary

Test plan

  • pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo -v (8 analytical tests pass)
  • pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo::TestBootstrap -v -m slow (5 bootstrap tests pass)
  • pytest tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo -v (R-parity confirmed; point estimates exact, SE within ~5% rtol)
  • Full dCDH suite regression: pytest tests/test_chaisemartin_dhaultfoeuille.py tests/test_chaisemartin_dhaultfoeuille_parity.py tests/test_methodology_chaisemartin_dhaultfoeuille.py (224 passed)
  • Slow suite: 21 passed
  • R fixture regenerated via Rscript benchmarks/R/generate_dcdh_dynr_test_values.R
  • ruff check + black --check on changed methodology files: clean

🤖 Generated with Claude Code

Wave 2 item 3 of the by_path follow-up. Adds per-path backward-horizon
placebos `DID^{pl}_{path, l}` for `l = 1..L_max`, built by applying the
existing joiners/leavers influence-function (IF) precedent backward, and
surfaced on the new `results.path_placebo_event_study[path][-l]`
attribute (negative-int keys mirroring `placebo_event_study`).
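
A minimal sketch of the key convention, assuming a plain nested dict as a stand-in for `results.path_placebo_event_study`. The path tuples, the numeric values, the `n_obs` field name, and the `placebo_at` helper are hypothetical; only the `[path][-l]` indexing and the `"effect"` key are taken from this PR.

```python
# Hypothetical stand-in for results.path_placebo_event_study.
# Outer keys: treatment paths (tuples are an assumption here).
# Inner keys: negative ints, -l for backward horizon l = 1..L_max.
path_placebo_event_study = {
    (0, 1, 1): {-1: {"effect": 0.02, "n_obs": 40}, -2: {"effect": -0.01, "n_obs": 38}},
    (0, 1, 0): {-1: {"effect": 0.00, "n_obs": 25}, -2: {"effect": 0.03, "n_obs": 22}},
}


def placebo_at(surface, path, lag):
    """Return the entry for backward horizon `lag` (a positive int)."""
    if lag < 1:
        raise ValueError("lag must be a positive int; the dict key is its negation")
    return surface[path][-lag]


entry = placebo_at(path_placebo_event_study, (0, 1, 1), 2)
print(entry["effect"])
```

Keeping the lag positive at the call site and negating it only at lookup time avoids mixing up `-l` keys with the positive event-study horizons.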

Bundled scope:
- Analytical: extend `_compute_per_group_if_placebo_horizon` with
  `switcher_subset_mask` (parallel to the PR #357 multi_horizon
  extension); new `_compute_path_placebos` sibling helper of
  `_compute_path_effects`; cohort-recentered plug-in SE with path-
  specific divisor `N^{pl}_{l, path}`.
- Bootstrap: new `_collect_path_placebo_bootstrap_inputs` collector
  + `_compute_dcdh_bootstrap` per-`(path, lag_l)` dispatch reusing
  `_bootstrap_one_target`; bootstrap propagation block in fit()
  enforcing the library-wide NaN-on-invalid contract from PR #364
  (canonical pattern — non-finite bootstrap SE writes NaN to the
  full inference tuple, never falls back to analytical).
- Reporting: `summary()` renders negative-keyed placebo rows
  alongside positive event-study rows in each path block;
  `to_dataframe(level="by_path")` emits negative-horizon rows;
  footer aggregate predicate covers the new surface.
- R-parity: extend `extract_dcdh_by_path` with `n_placebos` param;
  new `multi_path_reversible_by_path_placebo` scenario in the R
  generator and `dcdh_dynr_golden_values.json`. Per-`(path, lag)`
  point estimates match R exactly; SE within Phase-2 envelope (~5%
  rtol). New parity class `TestDCDHDynRParityByPathPlacebo`.
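
The NaN-on-invalid contract in the Bootstrap bullet can be sketched as follows. This is an illustration under assumptions, not the code in `fit()`: the dict layout and the `propagate_bootstrap` name are hypothetical, and the inference fields are taken as already computed rather than derived inline.

```python
import math

NAN = float("nan")


def propagate_bootstrap(analytical, bootstrap):
    """Choose the inference fields to report for one (path, lag) cell.

    Both arguments are dicts with keys se, t_stat, p_value, conf_int
    (a hypothetical layout). If the bootstrap SE is not finite, every
    field becomes NaN. `analytical` is accepted only to make the point
    that it is never used as a fallback.
    """
    if not math.isfinite(bootstrap["se"]):
        return {"se": NAN, "t_stat": NAN, "p_value": NAN, "conf_int": (NAN, NAN)}
    return dict(bootstrap)


analytical = {"se": 0.10, "t_stat": 2.0, "p_value": 0.046, "conf_int": (0.004, 0.396)}
bad_boot = {"se": NAN, "t_stat": 1.9, "p_value": 0.06, "conf_int": (-0.01, 0.41)}
out = propagate_bootstrap(analytical, bad_boot)
```

Note that the stale `t_stat`, `p_value`, and `conf_int` in `bad_boot` are discarded along with the SE, so a caller never sees a partially valid tuple.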

SE inherits the cross-path cohort-sharing deviation from R already
documented for `path_effects` (full-panel cohort-centered plug-in vs
R's per-path re-run): tracks R within tolerance on single-path-cohort
panels, diverges on cohort-mixed panels. Bootstrap SE is a Monte Carlo
analog of the analytical SE — same per-path centered IF input — and
inherits the same deviation.

Tests:
- `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (8
  analytical invariants + 5 bootstrap subclass tests under
  `@pytest.mark.slow`).
- `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo`
  (R-parity on `multi_path_reversible_by_path_placebo`).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Overall assessment

✅ Looks good — no unmitigated P0/P1 findings in the changed dCDH estimator, weighting, or bootstrap/SE code paths.

Executive summary

  • Affected methods: dCDH by_path backward-horizon placebos DID^{pl}_{path,l} and the new per-path bootstrap inference surface.
  • The implementation follows the existing joiners/leavers IF precedent and the already-documented full-panel cohort-centering deviation from R; I did not find an undocumented estimator or variance mismatch in the new code paths.
  • The bootstrap propagation on the new surface is NaN-consistent and derives t_stat through safe_inference, so it avoids the project’s known inline-inference and partial-NaN anti-patterns.
  • Remaining issues are P3 only: one contradiction in REGISTRY.md, and some documentation/tests that do not fully document or pin the new public surface.
  • Static review only: I could not run pytest here because the sandbox Python is missing the repo runtime dependencies.

Methodology

Affected code paths are the analytical per-path placebo computation and bootstrap propagation in diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5578, diff_diff/chaisemartin_dhaultfoeuille.py:L5705-L5829, diff_diff/chaisemartin_dhaultfoeuille.py:L2941-L2987, and diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779. Against docs/methodology/REGISTRY.md and the in-code docstrings, I did not find a P0/P1 methodology defect.

  • Severity P3 | Impact: REGISTRY.md now says both that placebos are “not computed per path” and, later in the same note, that per-path placebos are supported. Because the registry is the project’s methodology source of truth, this contradiction can make the new surface look undocumented to future reviewers. Concrete fix: delete or rewrite the stale sentence so only the TWFE diagnostic remains sample-level. Reference: docs/methodology/REGISTRY.md:L641-L641.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not see a new deferrable issue that needed to be tracked in TODO.md.

Security

  • No findings.

Documentation/Tests

  • Severity P3 | Impact: the new public surface is not fully documented or tightly pinned by tests. The results docstring documents path_effects but omits path_placebo_event_study, and two new tests overstate what they verify: test_attr_is_none_when_placebo_false changes fixtures between the placebo-off and placebo-on branches, and test_path_placebo_point_estimate_within_path_mean only checks finiteness rather than equality to a within-path mean. Concrete fix: document path_placebo_event_study in the results class and tighten the tests to reuse one fixture and assert an explicit per-path placebo mean identity. References: diff_diff/chaisemartin_dhaultfoeuille_results.py:L370-L390, tests/test_chaisemartin_dhaultfoeuille.py:L5071-L5078, tests/test_chaisemartin_dhaultfoeuille.py:L5091-L5106.

…ning

P3 fixes from CI reviewer round 1:

- REGISTRY.md `Note (Phase 3 by_path ...)`: drop "Placebos" from the
  "remain sample-level summaries" sentence (TWFE diagnostic stays
  sample-level; placebos are now per-path under the new sub-bullet).
  Resolves the in-note contradiction the reviewer flagged.
- ChaisemartinDHaultfoeuilleResults docstring: add the
  `path_placebo_event_study` Attributes block entry alongside
  `path_effects`, documenting the negative-int inner-key convention
  and the inherited cross-path cohort-sharing deviation.
- TestByPathPlacebo.test_attr_is_none_when_placebo_false: use the
  same `_by_path_placebo_data()` fixture for both placebo-off and
  placebo-on branches, so the difference is attributable solely to
  the `placebo` flag (not a fixture swap).
- TestByPathPlacebo.test_path_placebo_point_estimate_within_path_mean:
  replace the finiteness-only check with an explicit recomputation of
  the within-path-mean DID^pl identity from the raw data, asserting
  equality at atol=1e-10 / rtol=1e-10. Pins the estimand identity
  against silent regressions in the per-path IF construction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: e405aaf174a57e338e47d4e8ee5170828ffe94c4


Overall Assessment

✅ Looks good: no unmitigated P0/P1 findings in the changed dCDH estimator, variance, or bootstrap/inference paths.

Executive Summary

  • Affected methods: dCDH by_path + placebo per-path backward-horizon placebos DID^{pl}_{path,l} and the sibling bootstrap propagation on results.path_placebo_event_study[path][-l] in diff_diff/chaisemartin_dhaultfoeuille.py:L2102-L2125, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5829, and diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779.
  • Methodology check passed: the new path-placebo estimator follows the documented joiners/leavers-style IF construction, path-specific divisor N^{pl}_{l,path}, and the already-documented full-panel cohort-sharing SE deviation from R in docs/methodology/REGISTRY.md:L641-L641.
  • The new bootstrap surface is NaN-consistent and uses safe_inference rather than inline effect / se, so it avoids the project’s known P0/P1 inference anti-patterns (diff_diff/chaisemartin_dhaultfoeuille.py:L2941-L2987, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779, tests/test_chaisemartin_dhaultfoeuille.py:L5338-L5404).
  • The prior re-review P3s are addressed: the REGISTRY note no longer contradicts itself about per-path placebos, the results docstring now documents path_placebo_event_study, and the tests now reuse one fixture and assert the explicit within-path placebo identity (docs/methodology/REGISTRY.md:L641-L641, diff_diff/chaisemartin_dhaultfoeuille_results.py:L377-L390, tests/test_chaisemartin_dhaultfoeuille.py:L5071-L5177).
  • Remaining issue is P3 only: the new golden parity scenario’s params.n_groups metadata is misleading relative to the serialized dataset size (benchmarks/R/generate_dcdh_dynr_test_values.R:L675-L689, benchmarks/data/dcdh_dynr_golden_values.json:L760-L775).
  • Static review only: pytest is not installed here, and the local Python environment also lacks numpy, so the new tests could not be run in this environment.

Methodology

  • No findings. The new per-path placebo analytical path and bootstrap propagation match the documented Phase 3 by_path extension, and I did not find an undocumented estimator, control-pool, weighting, or SE mismatch in diff_diff/chaisemartin_dhaultfoeuille.py:L424-L432, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5925, and docs/methodology/REGISTRY.md:L641-L641.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. Nothing new appears to require TODO.md tracking.

Security

  • No findings.

Documentation/Tests

  • Severity P3 | Impact: The newly added multi_path_reversible_by_path_placebo benchmark metadata says params.n_groups = 80, but the serialized dataset contains 120 unique groups because gen_reversible() appends stable control cohorts after the initial switcher allocation. This does not affect the current parity test, which reads data rather than params, but it makes the new golden scenario misleading for anyone reusing its metadata. Concrete fix: store the realized post-append group count in params.n_groups, or rename the field to n_switcher_groups consistently across the by-path golden scenarios. References: benchmarks/R/generate_dcdh_dynr_test_values.R:L187-L194, benchmarks/R/generate_dcdh_dynr_test_values.R:L675-L689, benchmarks/data/dcdh_dynr_golden_values.json:L760-L775.

P3 fix: scenario 15's `params.n_groups` was 80 (the switcher cohort
allocator input fed into `gen_reversible(n_groups=N_GOLDEN, ...)`)
while the realized panel actually contains 120 groups (80 switchers +
20 never-treated + 20 always-treated, appended by `gen_reversible`'s
default cohort additions at line 64).

Replace with two explicit fields:
- `n_switcher_groups = 80`: the load-bearing DGP allocator input
- `n_realized_groups = 120`: the actual unique-group count in the
  serialized data

The parity test reads the `data` block directly, not `params`, so it
is unaffected by this metadata change. Resolves the misleading-metadata
finding the reviewer flagged on R2.
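
A small consistency check along these lines could guard the two new fields. The miniature scenario and its long-format `data` layout are assumptions for illustration; only the field names `n_switcher_groups` and `n_realized_groups` come from this commit.

```python
# Hypothetical miniature of one golden scenario (not the real JSON contents).
scenario = {
    "params": {"n_switcher_groups": 2, "n_realized_groups": 4},
    "data": {
        "group": [1, 1, 2, 2, 3, 3, 4, 4],
        "period": [1, 2, 1, 2, 1, 2, 1, 2],
    },
}


def check_group_metadata(scn):
    """True when params agree with the unique-group count in the data block."""
    realized = len(set(scn["data"]["group"]))
    params = scn["params"]
    return (
        realized == params["n_realized_groups"]
        and params["n_switcher_groups"] <= realized
    )


metadata_ok = check_group_metadata(scenario)
```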

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: f70c17f721db2f11f7b9b5dce6800acb6e220e78


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings in the changed dCDH by_path + placebo estimator, variance, or bootstrap/inference paths.

Executive Summary

  • Re-review focus: the prior P3 about misleading golden-fixture metadata appears resolved. Scenario 15 now records both switcher-cohort count and realized panel size in benchmarks/R/generate_dcdh_dynr_test_values.R:L684-L699 and benchmarks/data/dcdh_dynr_golden_values.json:L767-L770.
  • Affected methodology surface: analytical per-path backward placebos plus bootstrap propagation and results rendering in diff_diff/chaisemartin_dhaultfoeuille.py:L2102-L2125, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5829, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779, and diff_diff/chaisemartin_dhaultfoeuille_results.py:L377-L390.
  • Methodology cross-check passed: the new code matches the Methodology Registry’s documented joiners/leavers-style IF construction, path-specific divisor N^{pl}_{l,path}, and the documented cross-path cohort-sharing SE deviation from R in docs/methodology/REGISTRY.md:L641.
  • Edge-case handling looks correct on the changed surface: zero-eligible path/lag cells return NaN/n_obs=0, and bootstrap propagation uses the all-or-nothing NaN contract rather than partial inference fields (diff_diff/chaisemartin_dhaultfoeuille.py:L2941-L2987, tests/test_chaisemartin_dhaultfoeuille.py:L5338-L5404).
  • One P3 maintainability nit remains: an internal docstring in _collect_path_placebo_bootstrap_inputs() still describes a nonexistent ["horizons"] wrapper (diff_diff/chaisemartin_dhaultfoeuille.py:L5736-L5738).
  • Static review only: I could not execute pytest here because this environment lacks pytest, numpy, and pandas.

Methodology

  • No findings. The new per-path placebo path is consistent with the documented estimator/control-pool/SE contract in docs/methodology/REGISTRY.md:L641 and the implementation at diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5578, diff_diff/chaisemartin_dhaultfoeuille.py:L5832-L5959.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity P3 | Impact: _collect_path_placebo_bootstrap_inputs() documents the analytical-results shape as path_placebos[path]["horizons"][-lag_l]["effect"], but the implementation actually uses path_placebos[path][-lag_l]["effect"]. This is harmless at runtime, but it can mislead future maintenance and reviews. Concrete fix: update the docstring at diff_diff/chaisemartin_dhaultfoeuille.py:L5736-L5738 to match the actual access pattern at diff_diff/chaisemartin_dhaultfoeuille.py:L5818-L5824.

Tech Debt

  • No findings. I did not identify any new deferrable item that needs TODO.md tracking.

Security

  • No findings.

Documentation/Tests

  • No findings. The new unit/parity coverage is appropriately targeted for the changed surface in tests/test_chaisemartin_dhaultfoeuille.py:L4995-L5424 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L635-L753.
  • Review note: I could not re-run the tests in this environment because pytest, numpy, and pandas are not installed.

P3 fix: `_collect_path_placebo_bootstrap_inputs` docstring described
the analytical-results shape as
`path_placebos[path]["horizons"][-lag_l]["effect"]` but the actual
access is `path_placebos[path][-lag_l]["effect"]` (no `["horizons"]`
wrapper -- `_compute_path_placebos` returns the negative-keyed inner
dict directly, intentionally diverging from `_compute_path_effects`'s
`["horizons"]` nesting). Update the docstring to match the actual
access pattern at the implementation site (`:5818-5824`).

Harmless at runtime; the fix keeps the docstring from misleading
future maintainers.
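
To make the nesting difference concrete, here is a sketch with hypothetical miniature return values. The path tuple and numbers are made up, and the inner layout under `["horizons"]` is an assumption; only the presence or absence of the `["horizons"]` wrapper is taken from the commit message above.

```python
# Hypothetical miniatures of the two helper return shapes.
path = (0, 1, 1)
path_effects = {path: {"horizons": {1: {"effect": 0.5}, 2: {"effect": 0.7}}}}
path_placebos = {path: {-1: {"effect": 0.01}, -2: {"effect": -0.02}}}

lag_l = 1
# Forward effects sit one level deeper, under "horizons".
effect_fwd = path_effects[path]["horizons"][lag_l]["effect"]
# Placebos are keyed directly by the negative lag.
effect_pl = path_placebos[path][-lag_l]["effect"]
wrapper_present = "horizons" in path_placebos[path]
```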

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 3ba7f6d228cd62177dc36b49ce4c1c8187983497


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings in the changed ChaisemartinDHaultfoeuille by_path + placebo estimator, variance, bootstrap, or results surfaces.

Executive Summary

  • Re-review scope: the prior P3 docstring mismatch in _collect_path_placebo_bootstrap_inputs() appears resolved at diff_diff/chaisemartin_dhaultfoeuille.py:L5705-L5745.
  • Affected method(s): the PR extends the dCDH Phase 3 by_path surface to per-path dynamic placebos DID^{pl}_{path,l} and threads that surface through bootstrap inference and results rendering at diff_diff/chaisemartin_dhaultfoeuille.py:L2102-L2125, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5578, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779, and diff_diff/chaisemartin_dhaultfoeuille_results.py:L1141-L1217.
  • Methodology cross-check passed: the implementation matches the Registry’s documented library extension and documented cross-path cohort-sharing SE deviation from R at docs/methodology/REGISTRY.md:L641.
  • Edge-case handling on the changed surface looks correct: zero-eligible (path, lag) cells return NaN/n_obs=0, and bootstrap propagation uses the all-or-nothing NaN contract rather than partial inference fields at diff_diff/chaisemartin_dhaultfoeuille.py:L5529-L5537 and diff_diff/chaisemartin_dhaultfoeuille.py:L2941-L2987.
  • One P3 doc/test gap remains: the new path_placebo_event_study attribute implements the {} empty-state sentinel, but that contract is not documented or regression-tested on the new surface.
  • Static review only: I could not execute pytest because this environment does not have pytest, numpy, or pandas.

Methodology

  • No findings. The affected method is the dCDH Phase 3 by_path per-path placebo extension; the backward-horizon IF, unchanged control-pool logic, path-specific divisor N^{pl}_{l,path}, and inherited cross-path cohort-sharing SE deviation align with docs/methodology/REGISTRY.md:L641, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5578, diff_diff/chaisemartin_dhaultfoeuille.py:L5835-L5962, and diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings. The previously flagged misleading ["horizons"] docstring reference is fixed at diff_diff/chaisemartin_dhaultfoeuille.py:L5736-L5742.

Tech Debt

  • No findings. I did not identify any new deferrable item that needs TODO.md tracking.

Security

  • No findings.

Documentation/Tests

  • Severity P3 | Impact: the new path_placebo_event_study surface uses {} when by_path is requested but no path has a complete window, but the results docstring only documents None vs populated, and the new tests cover the analogous path_effects empty-state contract but not the placebo sibling. Downstream callers can mis-handle the attribute by checking only is None, and regressions on that empty-state contract would not be pinned. Concrete fix: update diff_diff/chaisemartin_dhaultfoeuille_results.py:L377-L390 (and ideally docs/methodology/REGISTRY.md:L641) to document None = not requested / {} = requested but empty for path_placebo_event_study, and add a targeted regression next to tests/test_chaisemartin_dhaultfoeuille.py:L4015-L4111 that fits by_path + placebo on the existing no-complete-window panel and asserts results.path_placebo_event_study == {}.

…ent_study

P3 fix: the new `path_placebo_event_study` surface implements the
None-vs-{} empty-state sentinel (parallel to `path_effects` —
`_compute_path_placebos` returns `{}` when no observed path has a
complete window) but the contract was not documented or regression-
tested on the new sibling surface.

- Result class docstring (`ChaisemartinDHaultfoeuilleResults`):
  add an explicit empty-state paragraph documenting `None` = not
  requested vs `{}` = requested but empty (mirrors the same
  contract on `path_effects`).
- REGISTRY.md `Note (Phase 3 by_path ...)` per-path placebos
  paragraph: add an "Empty-state contract" sentence.
- Add `TestByPathPlacebo::test_empty_path_placebo_surface_when_no_complete_window`
  mirroring the existing `path_effects` empty-state regression
  (`TestByPathEdgeCases::test_empty_path_surface_when_no_complete_window`)
  on the same no-complete-window panel construction (F_g=3,
  n_periods=4, L_max=3 → window [2, 5] extends past the panel).
  Asserts `path_placebo_event_study == {}` (NOT None) and confirms
  `path_effects == {}` parallel state so both sibling surfaces hit
  the same empty-state branch consistently.
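
For downstream callers, the sentinel can be handled roughly as below. Both helpers are hypothetical, and the window arithmetic is only inferred from the numbers quoted above (F_g=3, L_max=3 giving window [2, 5]), so treat it as an assumption about the rule rather than the library's definition.

```python
def describe_path_placebos(surface):
    """Distinguish the three states of path_placebo_event_study."""
    if surface is None:
        return "not requested"
    if not surface:  # {} means requested, but no path had a complete window
        return "requested but empty"
    return f"{len(surface)} path(s) with placebos"


def has_complete_window(f_g, n_periods, l_max):
    """Assumed rule: the window [f_g - 1, f_g - 1 + l_max] must fit in the panel."""
    return f_g - 1 + l_max <= n_periods


states = [describe_path_placebos(s) for s in (None, {}, {(0, 1): {-1: {"effect": 0.0}}})]
window_ok = has_complete_window(f_g=3, n_periods=4, l_max=3)
```

Checking only `is None` would treat the `{}` case as populated, which is the mis-handling the reviewer warned about.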

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber commented Apr 25, 2026

/ai-review

@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 97bc2fad4a5bbf0e970a555e464f387ba16f85dd


Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings in the changed ChaisemartinDHaultfoeuille by_path + placebo surface.

Executive Summary

  • Re-review scope: the prior P3 on documenting and testing the {} empty-state sentinel is resolved in diff_diff/chaisemartin_dhaultfoeuille_results.py:L377-L396 and tests/test_chaisemartin_dhaultfoeuille.py:L5287-L5466.
  • Affected methodology: the PR adds per-path backward-horizon placebos DID^{pl}_{path,l}, plus sibling bootstrap collection/propagation and results rendering at diff_diff/chaisemartin_dhaultfoeuille.py:L2102-L2125, diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5832, diff_diff/chaisemartin_dhaultfoeuille.py:L2941-L2987, diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py:L729-L779, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1186-L1215, and diff_diff/chaisemartin_dhaultfoeuille_results.py:L1568-L1598.
  • Methodology cross-check passed against docs/methodology/REGISTRY.md:L641: the new point-estimate/control-pool logic mirrors the existing placebo IF, and the cross-path cohort-sharing SE divergence from R is explicitly documented, so it is not a defect.
  • Edge-case handling on the changed surface looks correct: zero-eligible (path, lag) cells return NaN/n_obs=0, bootstrap invalid-SE propagation writes the full inference tuple to NaN, and the new switcher_subset_mask=None default preserves legacy placebo IF behavior, covered in tests/test_chaisemartin_dhaultfoeuille.py:L5092-L5268 and tests/test_chaisemartin_dhaultfoeuille.py:L5400-L5466.
  • R-parity coverage for the new surface is present at tests/test_chaisemartin_dhaultfoeuille_parity.py:L635-L753.
  • Static review only: I could not execute the targeted tests because pytest is not installed in this environment (/bin/bash: pytest: command not found).

Methodology

  • Severity P3-informational | Impact: The changed SE convention for per-path placebos inherits the already-documented full-panel cohort-sharing deviation from R; the implementation and Registry are aligned at docs/methodology/REGISTRY.md:L641 and diff_diff/chaisemartin_dhaultfoeuille.py:L5408-L5578. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not identify a new deferrable item that needs TODO.md tracking.

Security

  • No findings.

Documentation/Tests

  • No findings. The prior re-review P3 is addressed by the updated results docstring and Registry note at diff_diff/chaisemartin_dhaultfoeuille_results.py:L377-L396 and docs/methodology/REGISTRY.md:L641, plus targeted regression/parity coverage at tests/test_chaisemartin_dhaultfoeuille.py:L5287-L5466 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L635-L753.

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 25, 2026
@igerber igerber merged commit 2899e6c into main Apr 25, 2026
23 of 24 checks passed
@igerber igerber deleted the dcdh-by-path-placebo branch April 25, 2026 15:39
igerber added a commit that referenced this pull request Apr 25, 2026
When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t
simultaneous confidence bands are now computed across horizons
`1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)`
multiplier weight matrix (using the estimator's configured
`bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path
and broadcast across all valid horizons, producing correlated
bootstrap distributions. The path-specific critical value
`c_p = quantile(max_l |t_l|, 1-α)` is applied per horizon as
`cband_conf_int = (eff - c_p·se, eff + c_p·se)` and surfaced at top
level as `results.path_sup_t_bands[path]`.
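
The mechanics above can be sketched in plain Python for a single path. This is an illustration under assumptions, not the library's bootstrap: the function name, the per-group influence values, the Rademacher-only weights, the `1/n` scaling, and the order-statistic quantile convention are all choices made for this example.

```python
import math
import random


def sup_t_critical_value(if_by_horizon, se_by_horizon, n_bootstrap=999, alpha=0.05, seed=0):
    """Sup-t critical value across horizons using one shared weight draw per replicate."""
    rng = random.Random(seed)
    horizons = sorted(if_by_horizon)
    n = len(if_by_horizon[horizons[0]])
    sup_stats = []
    for _ in range(n_bootstrap):
        # One Rademacher weight per group, reused for every horizon so the
        # bootstrap t-statistics stay correlated across horizons.
        w = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        t_abs = []
        for h in horizons:
            dev = sum(wi * psi for wi, psi in zip(w, if_by_horizon[h])) / n
            t_abs.append(abs(dev / se_by_horizon[h]))
        sup_stats.append(max(t_abs))
    sup_stats.sort()
    # Order-statistic quantile without interpolation (an assumed convention).
    idx = min(n_bootstrap - 1, math.ceil((1 - alpha) * n_bootstrap) - 1)
    return sup_stats[idx]


# Made-up inputs for one path with two horizons and six eligible groups.
if_by_horizon = {1: [0.4, -0.2, 0.1, -0.3, 0.5, -0.5], 2: [0.3, -0.1, 0.2, -0.4, 0.6, -0.6]}
se_by_horizon = {1: 0.15, 2: 0.17}
effects = {1: 0.20, 2: 0.35}

c_p = sup_t_critical_value(if_by_horizon, se_by_horizon)
bands = {h: (effects[h] - c_p * se_by_horizon[h], effects[h] + c_p * se_by_horizon[h]) for h in effects}
```

Because the same `c_p` multiplies every horizon's SE, the resulting bands are simultaneous across horizons within the path rather than pointwise.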

Closes Wave 2 #4 of the by_path follow-up sequence (#357 foundation,
#360 R-parity, #364 bootstrap, #371 placebos).

**Methodology asymmetry vs OVERALL** (intentional, documented):
per-path sup-t draws fresh shared weights AFTER the per-path SE
bootstrap block has populated `path_ses` via independent per-(path,
horizon) draws. Asymptotically equivalent to OVERALL's self-consistent
reuse but NOT bit-identical. Preserves RNG-state isolation for
existing per-path SE seed-reproducibility tests.

**Gates** mirror OVERALL: a path receives a band only if it has `>=2`
valid horizons (finite bootstrap SE > 0) AND a strict majority (more
than 50%) of its sup-t draws are finite. Otherwise the path is absent
from `path_sup_t_bands`.
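
The two gates can be written as a small predicate. The function name and argument layout are hypothetical; the thresholds follow the description above.

```python
import math


def passes_sup_t_gates(boot_ses, sup_t_draws):
    """True when a path qualifies for a sup-t band under both gates."""
    # Gate 1: at least two horizons with a finite, positive bootstrap SE.
    n_valid = sum(1 for se in boot_ses if math.isfinite(se) and se > 0)
    # Gate 2: strictly more than half of the sup-t draws are finite.
    n_finite = sum(1 for d in sup_t_draws if math.isfinite(d))
    return n_valid >= 2 and 2 * n_finite > len(sup_t_draws)


nan = float("nan")
at_exact_half = passes_sup_t_gates([0.1, 0.2], [1.0, nan])
above_half = passes_sup_t_gates([0.1, 0.2], [1.0, 1.2, nan])
```

Comparing `2 * n_finite > len(sup_t_draws)` in integers keeps the exact-50% boundary on the failing side without any floating-point division.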

**Empty-state contract**: `path_sup_t_bands is None` when not requested
(no bootstrap or `by_path is None`); `{}` when requested but no path
passes both gates (covers two cases: `path_effects == {}` upstream OR
all paths fail gates downstream).

**Deviation from R**: `did_multiplegt_dyn` provides no joint / sup-t
bands at any surface — Python-only methodology extension consistent
with the existing OVERALL `event_study_sup_t_bands` (also Python-only).
Inherits the cross-path cohort-sharing SE deviation from R documented
for `path_effects`.

**Bundled pre-audit fix** (sibling-surface check): the existing OVERALL
`sup_t_bands` field's stale "Phase 2 placeholder" docstring updated to
the actual contract description.

Tests: new `TestByPathSupTBands` class with 13 tests covering: attr
None when no bootstrap / no by_path; keys match `path_effects` with
finite crit; band wider than pointwise; crit finite and positive;
seed reproducibility; single-horizon-path-skip; L_max=1 skip;
n_valid_horizons matches; absent-path-no-cband-keys; summary renders;
empty-dict-when-no-complete-window; strict-majority-gate-at-exact-50pct
(monkeypatches the weight generator to inject NaN into half the
bootstrap rows, asserting both `sup_t_bands is None` and
`path_sup_t_bands == {}` at the boundary). All `@pytest.mark.slow`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 25, 2026
PR #357 shipped by_path foundation; PRs #364/#371/#374 completed the
inference surface (bootstrap, placebos, sup-t bands). Wave 3 begins
design-variant extensions; this PR is item #5: combine by_path=k with
controls=[...] (DID^X).

Architecture: the per-baseline OLS residualization at
chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel
BEFORE path enumeration, so all four downstream surfaces (analytical
SE, bootstrap SE, per-path placebos, per-path joint sup-t bands)
consume the residualized Y_mat automatically (Frisch-Waugh-Lovell).
Per-period effects remain unadjusted, consistent with the existing
controls + per-period DID contract.
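
The order of operations can be illustrated with a deliberately simplified toy: one covariate, one slope per baseline fitted on all rows, then a split by path. The library's actual residualization is not shown in this thread, so every name and the row layout below are hypothetical.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Closed-form simple-regression slope; assumes x varies within the sample."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


def residualize_by_baseline(rows):
    """Residualize y on x once per baseline, before any path split."""
    slopes = {}
    for b in {r["baseline"] for r in rows}:
        sub = [r for r in rows if r["baseline"] == b]
        slopes[b] = ols_slope([r["x"] for r in sub], [r["y"] for r in sub])
    return [dict(r, y_resid=r["y"] - slopes[r["baseline"]] * r["x"]) for r in rows]


# Made-up rows: y = 2*x + 1 for everyone, two paths sharing baseline 0.
rows = [
    {"baseline": 0, "path": "A", "x": 0.0, "y": 1.0},
    {"baseline": 0, "path": "B", "x": 1.0, "y": 3.0},
    {"baseline": 0, "path": "A", "x": 2.0, "y": 5.0},
    {"baseline": 0, "path": "B", "x": 3.0, "y": 7.0},
]
residualized = residualize_by_baseline(rows)

# Path enumeration happens afterwards and simply consumes y_resid.
by_path = defaultdict(list)
for r in residualized:
    by_path[r["path"]].append(r["y_resid"])
```

Since the split happens after residualization, any per-path computation added later picks up the adjusted outcome without further changes, which is the inheritance property the commit relies on.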

Canonical R behavior: `did_multiplegt_dyn(..., by_path=k, controls=...)`
re-runs the per-baseline residualization on each path's restricted
subsample (path's switchers + same-baseline not-yet-treated controls).
On the multi_path_reversible DGP all switchers share baseline
D_{g,1}=0, so R's per-path control pool equals our global control
pool and the residualization coefficients coincide. Per-path point
estimates match R exactly (rtol ~1e-11); per-path SE within ~6.5%
(Phase 2 envelope, inheriting the documented cross-path cohort-
sharing deviation).

Changes:
- Delete the gate at chaisemartin_dhaultfoeuille.py:988-992
- Update by_path docstring (remove `controls` from incompatible list,
  add inheritance paragraph)
- New R parity scenario `multi_path_reversible_by_path_controls` in
  benchmarks/R/generate_dcdh_dynr_test_values.R + regenerated golden
  values
- New TestDCDHDynRParityByPathControls in
  tests/test_chaisemartin_dhaultfoeuille_parity.py
- New TestByPathControls in tests/test_chaisemartin_dhaultfoeuille.py
  (12 tests covering analytical / bootstrap / placebo / sup-t / cband
  to_dataframe / per-period unadjusted / covariate_residuals round-
  trip / multi-covariate)
- Remove the `controls` parametrize entry from
  TestByPathGates::test_forbids_phase3_fit_kwargs
- Update REGISTRY.md (remove `controls` from gated-combos list, add
  inheritance sub-paragraph documenting the four-surface auto-
  inheritance)
- CHANGELOG: Unreleased > Added entry

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>