dCDH by_path + placebo: per-path backward-horizon placebos (Wave 2 #3)#371
dCDH by_path + placebo: per-path backward-horizon placebos (Wave 2 #3)#371
Conversation
Wave 2 item 3 of the by_path follow-up. Adds per-path backward-horizon
placebos `DID^{pl}_{path, l}` for `l = 1..L_max` under the existing
joiners/leavers IF precedent applied backward, surfaced on the new
`results.path_placebo_event_study[path][-l]` attribute (negative-int
keys mirroring `placebo_event_study`).
Bundled scope:
- Analytical: extend `_compute_per_group_if_placebo_horizon` with
`switcher_subset_mask` (parallel to the PR #357 multi_horizon
extension); new `_compute_path_placebos` sibling helper of
`_compute_path_effects`; cohort-recentered plug-in SE with path-
specific divisor `N^{pl}_{l, path}`.
- Bootstrap: new `_collect_path_placebo_bootstrap_inputs` collector
+ `_compute_dcdh_bootstrap` per-`(path, lag_l)` dispatch reusing
`_bootstrap_one_target`; bootstrap propagation block in fit()
enforcing the library-wide NaN-on-invalid contract from PR #364
(canonical pattern — non-finite bootstrap SE writes NaN to the
full inference tuple, never falls back to analytical).
- Reporting: `summary()` renders negative-keyed placebo rows
alongside positive event-study rows in each path block;
`to_dataframe(level="by_path")` emits negative-horizon rows;
footer aggregate predicate covers the new surface.
- R-parity: extend `extract_dcdh_by_path` with `n_placebos` param;
new `multi_path_reversible_by_path_placebo` scenario in the R
generator and `dcdh_dynr_golden_values.json`. Per-`(path, lag)`
point estimates match R exactly; SE within Phase-2 envelope (~5%
rtol). New parity class `TestDCDHDynRParityByPathPlacebo`.
SE inherits the cross-path cohort-sharing deviation from R already
documented for `path_effects` (full-panel cohort-centered plug-in vs
R's per-path re-run): tracks R within tolerance on single-path-cohort
panels, diverges on cohort-mixed panels. Bootstrap SE is a Monte Carlo
analog of the analytical SE — same per-path centered IF input — and
inherits the same deviation.
Tests:
- `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (8
analytical invariants + 5 bootstrap subclass tests under
`@pytest.mark.slow`).
- `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo`
(R-parity on `multi_path_reversible_by_path_placebo`).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Overall assessment✅ Looks good — no unmitigated P0/P1 findings in the changed dCDH estimator, weighting, or bootstrap/SE code paths. Executive summary
MethodologyAffected code paths are the analytical per-path placebo computation and bootstrap propagation in
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…ning P3 fixes from CI reviewer round 1: - REGISTRY.md `Note (Phase 3 by_path ...)`: drop "Placebos" from the "remain sample-level summaries" sentence (TWFE diagnostic stays sample-level; placebos are now per-path under the new sub-bullet). Resolves the in-note contradiction the reviewer flagged. - ChaisemartinDHaultfoeuilleResults docstring: add the `path_placebo_event_study` Attributes block entry alongside `path_effects`, documenting the negative-int inner-key convention and the inherited cross-path cohort-sharing deviation. - TestByPathPlacebo.test_attr_is_none_when_placebo_false: use the same `_by_path_placebo_data()` fixture for both placebo-off and placebo-on branches, so the difference is attributable solely to the `placebo` flag (not a fixture swap). - TestByPathPlacebo.test_path_placebo_point_estimate_within_path_mean: replace the finiteness-only check with an explicit recomputation of the within-path-mean DID^pl identity from the raw data, asserting equality at atol=1e-10 / rtol=1e-10. Pins the estimand identity against silent regressions in the per-path IF construction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment✅ Looks good: no unmitigated P0/P1 findings in the changed dCDH estimator, variance, or bootstrap/inference paths. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
P3 fix: scenario 15's `params.n_groups` was 80 (the switcher cohort allocator input fed into `gen_reversible(n_groups=N_GOLDEN, ...)`) while the realized panel actually contains 120 groups (80 switchers + 20 never-treated + 20 always-treated, appended by `gen_reversible`'s default cohort additions at line 64). Replace with two explicit fields: - `n_switcher_groups = 80`: the load-bearing DGP allocator input - `n_realized_groups = 120`: the actual unique-group count in the serialized data The parity test reads the `data` block directly, not `params`, so it is unaffected by this metadata change. Resolves the misleading-metadata finding the reviewer flagged on R2. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment✅ Looks good — no unmitigated P0/P1 findings in the changed dCDH Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
P3 fix: `_collect_path_placebo_bootstrap_inputs` docstring described the analytical-results shape as `path_placebos[path]["horizons"][-lag_l]["effect"]` but the actual access is `path_placebos[path][-lag_l]["effect"]` (no `["horizons"]` wrapper -- `_compute_path_placebos` returns the negative-keyed inner dict directly, intentionally diverging from `_compute_path_effects`'s `["horizons"]` nesting). Update the docstring to match the actual access pattern at the implementation site (`:5818-5824`). Harmless at runtime; the fix is to prevent the comment from misleading future maintenance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings in the changed Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…ent_study
P3 fix: the new `path_placebo_event_study` surface implements the
None-vs-{} empty-state sentinel (parallel to `path_effects` —
`_compute_path_placebos` returns `{}` when no observed path has a
complete window) but the contract was not documented or regression-
tested on the new sibling surface.
- Result class docstring (`ChaisemartinDHaultfoeuilleResults`):
add an explicit empty-state paragraph documenting `None` = not
requested vs `{}` = requested but empty (mirrors the same
contract on `path_effects`).
- REGISTRY.md `Note (Phase 3 by_path ...)` per-path placebos
paragraph: add an "Empty-state contract" sentence.
- Add `TestByPathPlacebo::test_empty_path_placebo_surface_when_no_complete_window`
mirroring the existing `path_effects` empty-state regression
(`TestByPathEdgeCases::test_empty_path_surface_when_no_complete_window`)
on the same no-complete-window panel construction (F_g=3,
n_periods=4, L_max=3 → window [2, 5] extends past the panel).
Asserts `path_placebo_event_study == {}` (NOT None) and confirms
`path_effects == {}` parallel state so both sibling surfaces hit
the same empty-state branch consistently.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings in the changed Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are now computed across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all valid horizons, producing correlated bootstrap distributions. The path-specific critical value `c_p = quantile(max_l |t_l|, 1-α)` is applied per horizon as `cband_conf_int = (eff - c_p·se, eff + c_p·se)` and surfaced at top level as `results.path_sup_t_bands[path]`. Closes Wave 2 #4 of the by_path follow-up sequence (#357 foundation, #360 R-parity, #364 bootstrap, #371 placebos). **Methodology asymmetry vs OVERALL** (intentional, documented): per-path sup-t draws fresh shared weights AFTER the per-path SE bootstrap block has populated `path_ses` via independent per-(path, horizon) draws. Asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Preserves RNG-state isolation for existing per-path SE seed-reproducibility tests. **Gates** mirror OVERALL: `>=2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Otherwise the path is absent from `path_sup_t_bands`. **Empty-state contract**: `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates (covers two cases: `path_effects == {}` upstream OR all paths fail gates downstream). **Deviation from R**: `did_multiplegt_dyn` provides no joint / sup-t bands at any surface — Python-only methodology extension consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Bundled pre-audit fix** (sibling-surface check): the existing OVERALL `sup_t_bands` field's stale "Phase 2 placeholder" docstring updated to the actual contract description. Tests: new `TestByPathSupTBands` class with 13 tests covering: attr None when no bootstrap / no by_path; keys match `path_effects` with finite crit; band wider than pointwise; crit finite and positive; seed reproducibility; single-horizon-path-skip; L_max=1 skip; n_valid_horizons matches; absent-path-no-cband-keys; summary renders; empty-dict-when-no-complete-window; strict-majority-gate-at-exact-50pct (monkeypatches the weight generator to inject NaN into half the bootstrap rows, asserting both `sup_t_bands is None` and `path_sup_t_bands == {}` at the boundary). All `@pytest.mark.slow`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PR #357 shipped by_path foundation; PRs #364/#371/#374 completed the inference surface (bootstrap, placebos, sup-t bands). Wave 3 begins design-variant extensions; this PR is item #5: combine by_path=k with controls=[...] (DID^X). Architecture: the per-baseline OLS residualization at chaisemartin_dhaultfoeuille.py:1498 runs once on the full panel BEFORE path enumeration, so all four downstream surfaces (analytical SE, bootstrap SE, per-path placebos, per-path joint sup-t bands) consume the residualized Y_mat automatically (Frisch-Waugh-Lovell). Per-period effects remain unadjusted, consistent with the existing controls + per-period DID contract. Canonical R behavior: `did_multiplegt_dyn(..., by_path=k, controls=...)` re-runs the per-baseline residualization on each path's restricted subsample (path's switchers + same-baseline not-yet-treated controls). On the multi_path_reversible DGP all switchers share baseline D_{g,1}=0, so R's per-path control pool equals our global control pool and the residualization coefficients coincide. Per-path point estimates match R exactly (rtol ~1e-11); per-path SE within ~6.5% (Phase 2 envelope, inheriting the documented cross-path cohort- sharing deviation). Changes: - Delete the gate at chaisemartin_dhaultfoeuille.py:988-992 - Update by_path docstring (remove `controls` from incompatible list, add inheritance paragraph) - New R parity scenario `multi_path_reversible_by_path_controls` in benchmarks/R/generate_dcdh_dynr_test_values.R + regenerated golden values - New TestDCDHDynRParityByPathControls in tests/test_chaisemartin_dhaultfoeuille_parity.py - New TestByPathControls in tests/test_chaisemartin_dhaultfoeuille.py (12 tests covering analytical / bootstrap / placebo / sup-t / cband to_dataframe / per-period unadjusted / covariate_residuals round- trip / multi-covariate) - Remove the `controls` parametrize entry from TestByPathGates::test_forbids_phase3_fit_kwargs - Update REGISTRY.md (remove `controls` from gated-combos list, add inheritance sub-paragraph documenting the four-surface auto- inheritance) - CHANGELOG: Unreleased > Added entry Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
DID^{pl}_{path, l}forl = 1..L_maxunder the existing joiners/leavers IF precedent applied backward; surfaced onresults.path_placebo_event_study[path][-l].project_dcdh_by_path_next_prs.md).path_effects.Test plan
pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo -v(8 analytical tests pass)pytest tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo::TestBootstrap -v -m slow(5 bootstrap tests pass)pytest tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo -v(R-parity confirmed; point estimates exact, SE within ~5% rtol)pytest tests/test_chaisemartin_dhaultfoeuille.py tests/test_chaisemartin_dhaultfoeuille_parity.py tests/test_methodology_chaisemartin_dhaultfoeuille.py(224 passed)Rscript benchmarks/R/generate_dcdh_dynr_test_values.Rruff check+black --checkon changed methodology files: clean🤖 Generated with Claude Code