dCDH by_path: lift trends_linear + trends_nonparam gates (Wave 3 #6+#7) by igerber · Pull Request #393 · igerber/diff-diff

igerber · 2026-04-26T13:41:49Z

Summary

Lifts the two NotImplementedError gates blocking ChaisemartinDHaultfoeuille.by_path from combining with trends_linear (DID^{fd} group-specific linear trends) and trends_nonparam (state-set trends), Wave 3 #6+#7 of the dCDH by_path follow-up sequence (PR #378 shipped #5).

path_cumulated_event_study (new Results field): surfaces the cumulated level effect delta_l = sum_{l'=1..l} DID^{fd}_{path, l'} per path under trends_linear=True. This is what R returns under did_multiplegt_dyn(..., by_path, trends_lin). Mirrors the global linear_trends_effects cumulation; raw per-horizon DID^{fd}_l per path still surfaces on path_effects[path][l].
set_ids threading: set_ids parameter added to the four per-path IF helpers (_compute_path_effects, _compute_path_placebos, _collect_path_bootstrap_inputs, _collect_path_placebo_bootstrap_inputs) so trends_nonparam's set-restricted control pool reaches per-path analytical SE, bootstrap, placebos, and sup-t bands automatically.
to_dataframe(level="by_path") gains cumulated_effect / cumulated_se columns (always present, NaN-when-None — mirrors the cband_* convention). summary() renders a per-path "Cumulated Level Effects" sub-section under each path block.

R parity validated via two new golden-value scenarios (single_baseline_multi_path_by_path_trends_lin for trends_linear and multi_path_reversible_by_path_trends_nonparam for trends_nonparam). Per-path point estimates match R bit-exactly (POINT_RTOL=1e-9); cumulated SE within CUM_SE_RTOL=0.20 (widened vs the 0.12 used for non-cumulated by_path parity because the conservative upper-bound SE compounds the cross-path cohort-sharing deviation under summation). Placebo parity for trends_linear is intentionally skipped (documented architectural divergence between R's per-path full-pipeline call and Python's global-then-disaggregate architecture).

Test plan

TestByPathTrendsLinear (10 non-slow + 1 slow): gate removal, path_cumulated_event_study populated under trends_linear, is None without trends_linear, conservative SE upper bound formula, to_dataframe columns, summary() rendering, bootstrap finite SE inheritance, per-period unaffected
TestByPathTrendsNonparam (5 non-slow + 1 slow): gate removal, set restriction changes per-path estimates (sanity check that set_ids threading reached the IF helpers), per-path SE finite, validation passthrough (time-varying / missing column), bootstrap finite SE
TestDCDHDynRParityByPathTrendsLinear: per-path cumulated point estimates match R bit-exactly on event horizons; cumulated SE within ±20%
TestDCDHDynRParityByPathTrendsNonparam: per-path point estimates AND placebos match R bit-exactly; per-path SE within ±15%
Regression: TestLinearTrends (non-by_path), TestStateSetTrends (non-by_path), TestCovariateAdjustment (non-by_path) remain unchanged
Full suite: pytest tests/test_chaisemartin_dhaultfoeuille.py tests/test_chaisemartin_dhaultfoeuille_parity.py tests/test_methodology_chaisemartin_dhaultfoeuille.py (239 + 10 = 249 tests passing locally)

🤖 Generated with Claude Code

github-actions · 2026-04-26T13:46:39Z

Overall Assessment

⚠️ Needs changes — one unmitigated P1 on the new by_path + trends_linear inference surface.

Executive Summary

Affected methods: dCDH by_path, DID^{fd} linear-trends cumulation, state-set trends, and per-path bootstrap inference.
trends_nonparam looks methodologically consistent in the diff: set_ids is threaded through the per-path analytical, placebo, and bootstrap collector paths the registry requires.
The documented R deviations for cohort-mixed per-path SE and by_path + trends_linear placebo parity are explicitly recorded in docs/methodology/REGISTRY.md:L641-L643, so I did not treat them as defects.
P1: the new path_cumulated_event_study surface is computed before bootstrap and never updated after bootstrap overwrites path_effects, so bootstrap runs mix bootstrap raw-path inference with analytical cumulated-path inference.
The added tests do not cover that bootstrap/cumulation path; they only assert bootstrap finiteness on res.path_effects.

Methodology

P1 Impact: path_cumulated_event_study is built from analytical per-horizon SEs at diff_diff/chaisemartin_dhaultfoeuille.py:L2075-L2126, but bootstrap later overwrites only path_effects[path]["horizons"][l] at diff_diff/chaisemartin_dhaultfoeuille.py:L3034-L3096. The global analogue linear_trends_effects is assembled after bootstrap propagation at diff_diff/chaisemartin_dhaultfoeuille.py:L3405-L3454, so the new by-path surface is inconsistent with the existing methodology contract. summary() renders stale analytical cumulated SE/t/p rows at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1308-L1332, and to_dataframe(level="by_path") exports stale analytical cumulated_se at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1713-L1774, even when the raw path horizons are using bootstrap inference. Concrete fix: recompute path_cumulated_event_study after bootstrap propagation, or add a dedicated overwrite block so its se, t_stat, p_value, and conf_int are derived from the final post-bootstrap path_effects state and obey the same NaN-on-invalid bootstrap contract as the rest of dCDH.

Code Quality

No additional findings in the changed code. The diff uses safe_inference() rather than introducing new inline inference anti-patterns.

Performance

No findings in the diff.

Maintainability

No additional findings in the diff.

Tech Debt

No separate tech-debt finding, but the blocking issue above is not tracked in TODO.md:L51-L113, so it remains unmitigated.

Security

No findings in the changed files.

Documentation/Tests

P2 Impact: the new tests do not exercise the failing bootstrap path the registry claims is covered. tests/test_chaisemartin_dhaultfoeuille.py:L7199-L7221 only checks finite bootstrap SEs on res.path_effects; it never asserts anything about res.path_cumulated_event_study after bootstrap. Meanwhile docs/methodology/REGISTRY.md:L641-L643 says the by_path + trends_linear cross-surface invariants, including bootstrap and path_cumulated_event_study, are regression-tested at TestByPathTrendsLinear. That gap is why the P1 above can slip through. Concrete fix: add bootstrap regressions for path_cumulated_event_study, including both a normal bootstrap case and an invalid-bootstrap case (n_bootstrap=1) asserting the cumulated inference tuple goes NaN instead of retaining analytical values.

Path to Approval

Recompute or overwrite results.path_cumulated_event_study after bootstrap_results.path_ses are propagated, so cumulated-path inference uses the final bootstrap horizon state rather than pre-bootstrap analytical SEs.
Add regression coverage for by_path + trends_linear + n_bootstrap > 0, including:
- a positive-bootstrap case asserting cumulated SE follows the documented running-sum rule on the final per-horizon path SEs;
- an n_bootstrap=1 case asserting se, t_stat, p_value, and conf_int on the cumulated surface are NaN-consistent.

…strap CI reviewer flagged that the cumulated layer was built before the bootstrap propagation block, so when n_bootstrap > 0 it surfaced stale analytical SEs / t-stats / p-values / CIs while path_effects[path][l] had been overwritten with bootstrap stats — a public-surface inconsistency with the library-wide bootstrap contract. Fix: move the _compute_path_cumulated_event_study() call to AFTER the bootstrap propagation block at chaisemartin_dhaultfoeuille.py:3034-3081 (mirrors the global linear_trends_effects placement at :3405-3454). The helper reads path_effects[path]["horizons"][l]["se"] which by then is bootstrap-overwritten under n_bootstrap > 0, so cumulated SE becomes a running sum of bootstrap per-horizon SEs. Also addresses the P2 test-coverage gap with two new regressions in TestByPathTrendsLinear: - test_bootstrap_cumulated_uses_post_bootstrap_per_horizon_se: positive case asserting cumulated SE == running sum of post-bootstrap per- horizon SEs (rtol 1e-12). - test_bootstrap_cumulated_nan_consistent_when_n_bootstrap_one: n_bootstrap=1 case asserting cumulated SE / t_stat / p_value / conf_int are all NaN per the library-wide NaN-on-invalid contract. REGISTRY note updated with the post-bootstrap recomputation contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T13:58:40Z

/ai-review

github-actions · 2026-04-26T14:05:41Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 0a2783a7f5b36a61e970a906dc30ded48dd98205

Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains on the newly lifted by_path + trends_linear path.

Executive Summary

Affected methods: dCDH by_path, per-path trends_linear DID^{fd} cumulation, per-path trends_nonparam control-pool restriction, and the by-path bootstrap/placebo/sup-t surfaces.
The prior rereview blocker is fixed: path_cumulated_event_study is now recomputed after bootstrap propagation, and the new tests pin both the post-bootstrap running-sum rule and NaN-on-invalid-bootstrap behavior.
trends_nonparam looks methodologically consistent in the diff: set_ids now reaches the four per-path IF/bootstrap helpers the registry says should consume the set-restricted control pool.
P1: by_path + trends_linear is now ungated on all panels, but unlike the analogous by_path + controls path it still does not warn on multi-baseline switcher panels even though the new docs acknowledge the same R-divergence pattern.
I could not run pytest here because pytest is not installed; verification was static plus a no-write compile() parse check.

Methodology

Severity: P1
Impact: The new by_path + trends_linear path is only parity-validated on single-baseline panels, and the method docstring explicitly says it has the “same multi-baseline divergence pattern as controls”; however the actual trends_linear fit path only warns about F_g < 3 exclusions and has no analogous multi-baseline warning. The control path does implement that warning. This leaves a newly ungated regime where per-path point estimates can materially diverge from did_multiplegt_dyn(..., by_path, trends_lin) with no user-visible signal. diff_diff/chaisemartin_dhaultfoeuille.py:L441-L463, diff_diff/chaisemartin_dhaultfoeuille.py:L1529-L1584, diff_diff/chaisemartin_dhaultfoeuille.py:L1619-L1657, docs/methodology/REGISTRY.md:L641-L643
Concrete fix: Mirror the by_path + controls warning condition for by_path + trends_linear (switcher baselines D_{g,1} take multiple values), and add explicit deviation language to the registry paragraph so the public contract matches the runtime behavior.

Resolved in this rereview: the earlier stale-bootstrap-cumulation issue is fixed by recomputing path_cumulated_event_study after bootstrap propagation. diff_diff/chaisemartin_dhaultfoeuille.py:L3078-L3116, tests/test_chaisemartin_dhaultfoeuille.py:L7223-L7319

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No separate finding. The P1 above is not tracked in TODO.md:L51-L113, so it remains unmitigated.

Security

No findings.

Documentation/Tests

Severity: P2
Impact: The new TestByPathTrendsLinear and parity coverage only exercise single-baseline fixtures, so the known multi-baseline divergence/warning path is not regression-tested. By contrast, by_path + controls already pins both the warning and the no-warning single-baseline case. That gap is why the missing runtime warning above is easy to ship unnoticed. tests/test_chaisemartin_dhaultfoeuille.py:L6620-L6688, tests/test_chaisemartin_dhaultfoeuille.py:L6941-L7319, tests/test_chaisemartin_dhaultfoeuille_parity.py:L900-L1054
Concrete fix: Add a multi-baseline by_path + trends_linear fixture that must emit the warning, plus a single-baseline / heterogeneous-F_g fixture that must not emit it, mirroring the existing TestByPathControls pattern.

Path to Approval

Add a fit-time UserWarning for by_path + trends_linear on multi-baseline switcher panels, using the same D_{g,1} multiplicity check already used for by_path + controls.
Update docs/methodology/REGISTRY.md to spell out that multi-baseline by_path + trends_linear can diverge from R point estimates and that the estimator warns when this regime is detected.
Add regression tests for the warning contract: one multi-baseline fixture that must warn, and one single-baseline / heterogeneous-F_g fixture that must not.

CI reviewer flagged that by_path + trends_linear was ungated on all panels but only warned on F_g<3 exclusions, while the analogous by_path + controls path warns on multi-baseline switcher panels (D_{g,1} multiplicity) where R's per-path full-pipeline call diverges from Python's global-then-disaggregate architecture. The by_path docstring already acknowledged the "same multi-baseline divergence pattern as controls" (line 461-462), but the runtime warning was missing — a docstring-vs-implementation gap. Fix: mirror the controls warning at chaisemartin_dhaultfoeuille.py :1565-1584 inside the trends_linear block (right after the F_g<3 warning). Same predicate (switcher mask on first_switch_idx_arr, unique baseline check). Different message naming the trends_linear mechanism (full-pipeline including first-differencing per path subset). REGISTRY note updated to spell out the warning explicitly. Three regression tests added in TestByPathTrendsLinear, mirroring TestByPathControls: - test_multi_baseline_panel_emits_r_deviation_warning: panel with joiners and leavers must emit the warning - test_single_baseline_panel_does_not_emit_r_deviation_warning: single-baseline panel must not warn - test_single_baseline_heterogeneous_F_g_does_not_warn: pins the precise predicate (baseline multiplicity, NOT F_g multiplicity) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T14:09:27Z

/ai-review

github-actions · 2026-04-26T14:16:31Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 35ab5fe66c236759ec4a1e2e370ec1d10fffbc66

Overall Assessment

⚠️ Needs changes — the prior methodology blocker is fixed, but one unmitigated P1 remains: the new by_path + trends_linear / by_path + trends_nonparam regressions do not actually pin the newly lifted parameter interactions across all existing by-path inference surfaces.

Executive Summary

The prior rereview blocker is resolved: by_path + trends_linear now emits the multi-baseline R-divergence warning at fit time, and the registry note documents it.
The implementation itself looks methodologically aligned with the registry: per-path trends_linear cumulation mirrors global linear_trends_effects, and set_ids is threaded through the four per-path IF/bootstrap helpers.
P1: coverage for the newly lifted trends_* paths is incomplete. The new trends_linear “per-period unaffected” test never inspects per_period_effects; the parity test explicitly skips negative-horizon placebos; and the new trends_nonparam class never pins the modified sup-t / cband_* bootstrap path.
P3: two public docstrings are stale relative to the shipped API: fit() still says path_effects[path][l], and to_dataframe(level="by_path") omits the new cumulated_effect / cumulated_se columns.
pytest was unavailable in this environment; review was static plus an in-memory compile() parse check of the touched Python files.

Methodology

No findings. The prior rereview issue is addressed: the multi-baseline by_path + trends_linear warning is present in diff_diff/chaisemartin_dhaultfoeuille.py:L1651-L1680, and the deviation is documented in docs/methodology/REGISTRY.md:L641.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The P1 below is not tracked in TODO.md, so it remains unmitigated.

Security

No findings.

Documentation/Tests

P1 Impact: the newly lifted parameter interactions are not actually regression-tested across all by-path surfaces. The code explicitly preserves raw per-period outputs under trends_linear in diff_diff/chaisemartin_dhaultfoeuille.py:L1774-L1788, but test_per_period_effects_unaffected_by_trends_linear_by_path never checks per_period_effects and instead re-checks event_study_effects in tests/test_chaisemartin_dhaultfoeuille.py:L7161-L7197. The R parity test for by_path + trends_linear also skips every negative-horizon placebo row in tests/test_chaisemartin_dhaultfoeuille_parity.py:L1015-L1020, so the newly ungated raw-placebo path is only executed, not asserted. On the trends_nonparam side, the new class stops at point-estimate / SE / validation checks in tests/test_chaisemartin_dhaultfoeuille.py:L7486-L7630; unlike the existing controls coverage in tests/test_chaisemartin_dhaultfoeuille.py:L6427-L6560, it adds no parameter-specific regression for the modified path-level sup-t / cband_* bootstrap path. Under the project’s own parameter-interaction rule, that leaves silent regressions possible in newly lifted code paths. Concrete fix: add explicit regressions for per_period_effects, path_placebo_event_study, and path_sup_t_bands / to_dataframe(level="by_path") under the new trends_linear and trends_nonparam combinations.
P3 Impact: the public docstrings still describe the result schema inaccurately. fit() says raw DID^{fd} values surface at path_effects[path][l] in diff_diff/chaisemartin_dhaultfoeuille.py:L441-L448, but the actual API uses path_effects[path]["horizons"][l]. to_dataframe(level="by_path") also omits the new cumulated_effect / cumulated_se columns from its documented schema in diff_diff/chaisemartin_dhaultfoeuille_results.py:L1417-L1430. Concrete fix: update both docstrings to match the shipped shape and column set.

Path to Approval

Replace tests/test_chaisemartin_dhaultfoeuille.py:L7161-L7197 with a real per_period_effects invariance check, and add the same per_period_effects regression for by_path + trends_nonparam.
Add a by_path + trends_linear + placebo=True regression that asserts path_placebo_event_study negative horizons are populated and NaN-consistent, rather than only executing that path and skipping it in parity.
Add parameter-specific bootstrap/sup-t regressions for the newly threaded paths, asserting that path_sup_t_bands and to_dataframe(level="by_path") cband_* columns are populated correctly under by_path + trends_nonparam (and ideally trends_linear) when n_bootstrap > 0.

P1 fix — test coverage gaps for newly lifted parameter interactions: - test_per_period_effects_unaffected_by_trends_linear_by_path now actually inspects per_period_effects (was checking event_study_effects by mistake). Asserts no_bp == bp on did_plus_t / did_minus_t at every horizon since per-period DID uses raw Y_mat per the contract at chaisemartin_dhaultfoeuille.py:1493-1496. - Symmetric test_per_period_effects_unaffected_by_trends_nonparam_by_path added — set_ids doesn't touch the per-period path either. - test_per_path_placebos_with_trends_linear_present asserts path_placebo_event_study is populated under by_path + trends_linear + placebo=True with finite values. The R-parity path skips placebo rows due to documented divergence; this regression locks the Python-side surface population so the path doesn't silently regress. - test_sup_t_bands_with_trends_linear_finite_crit and the nonparam symmetric test pin the bootstrap-collector path: path_sup_t_bands populated, cband_lower/upper columns surface on to_dataframe(level="by_path") under both new combinations with n_bootstrap > 0. P3 fix — docstring schema staleness: - by_path docstring corrected: path_effects[path]["horizons"][l] (not path_effects[path][l]). - to_dataframe(level="by_path") schema now lists cumulated_effect / cumulated_se columns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T14:21:27Z

/ai-review

github-actions · 2026-04-26T14:26:49Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6cf673ca287bc18bd5ebf26328d8e7f3288d3f7c

Overall Assessment

✅ Looks good - the prior re-review P1 is addressed, and I found no unmitigated P0/P1 issues in the changed estimator paths. One P3 documentation inconsistency remains.

Executive Summary

The prior P1 test-surface gap is closed: the PR now explicitly covers per_period_effects, trends_linear per-path placebos, and trends_nonparam bootstrap/sup-t/cband paths in tests/test_chaisemartin_dhaultfoeuille.py:L6941-L7825 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L900-L1162.
Methodology looks aligned with the registry: by_path + trends_linear now warns on the documented multi-baseline R divergence, and the new per-path cumulated layer is computed after bootstrap from the final per-horizon SEs, mirroring the global linear_trends_effects contract in diff_diff/chaisemartin_dhaultfoeuille.py:L1657-L1691, L3119-L3153, L3462-L3511.
trends_nonparam appears fully threaded through the by-path analytical, placebo, bootstrap, and sup-t helper stack via set_ids in diff_diff/chaisemartin_dhaultfoeuille.py:L2134-L2140, L2831-L2876, L5506-L6215.
I did not find new NaN/inference, empty-result, or control-pool correctness defects in the changed code.
P3: the methodology registry still contains stale by-path schema text for this new surface in docs/methodology/REGISTRY.md:L641.
pytest was unavailable here (pytest: command not found), so this review is static plus a source-level compile() parse check of the touched Python files.

Methodology

No findings. The affected methods are Phase 3 by_path disaggregation plus the trends_linear and trends_nonparam extensions. The implementation matches the documented contract: the multi-baseline trends_linear deviation is surfaced with a warning in diff_diff/chaisemartin_dhaultfoeuille.py:L1657-L1691, set_ids is threaded through the four by-path analytical/bootstrap helpers in diff_diff/chaisemartin_dhaultfoeuille.py:L2134-L2140, L2831-L2876, L5506-L6215, and the new cumulated layer is built after bootstrap from final per-horizon SEs in diff_diff/chaisemartin_dhaultfoeuille.py:L3119-L3153, L5690-L5814. The main deviations from R are documented in docs/methodology/REGISTRY.md:L641-L643, so they are mitigated rather than silent defects.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

The previous re-review P1 is resolved: per-period invariance, trends-linear negative-horizon placebo population, and trends_nonparam bootstrap/sup-t/cband coverage are now explicitly tested in tests/test_chaisemartin_dhaultfoeuille.py:L7161-L7805, with new parity coverage in tests/test_chaisemartin_dhaultfoeuille_parity.py:L900-L1162.

Severity: P3. Impact: docs/methodology/REGISTRY.md:L641 still describes the old by-path result shape. The top-level to_dataframe(level="by_path") schema omits cumulated_effect / cumulated_se, and the new linear-trends text still says path_effects[path][l] instead of path_effects[path]["horizons"][l]. The same stale shape also appears in the new test-class docstring at tests/test_chaisemartin_dhaultfoeuille.py:L6944-L6948. Concrete fix: update those docstrings to match the shipped API already described in diff_diff/chaisemartin_dhaultfoeuille_results.py:L1417-L1436.

Final P3 cleanup round, no methodology / behavior changes: - REGISTRY.md by_path Note schema now lists cumulated_effect / cumulated_se in the to_dataframe(level="by_path") column list (was missing). - REGISTRY.md trends_linear paragraph: corrected path_effects[path][l] -> path_effects[path]["horizons"][l] to match the shipped result-class shape. - TestByPathTrendsLinear class docstring: same path_effects shape correction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T14:30:24Z

/ai-review

github-actions · 2026-04-26T14:36:14Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3d45bf34f170019fe924809b9da378a67005f65f

Overall Assessment

⚠️ Needs changes

Executive Summary

The affected methods are Phase 3 by_path disaggregation under trends_linear (per-path DID^{fd} plus cumulated level effects) and trends_nonparam (set-restricted controls). I did not find an undocumented estimator/SE deviation against docs/methodology/REGISTRY.md.
The prior documentation issue looks resolved: the registry now documents the new by_path + trends_linear / by_path + trends_nonparam contracts.
I did not find a new NaN-inference or control-pool bug in the changed estimator code. safe_inference() is used consistently, and set_ids is threaded into the intended per-path helpers.
P1: the newly reachable negative-horizon bootstrap path is still untested for by_path + trends_linear + placebo + n_bootstrap.
P1: the new set_ids threading into the per-path placebo bootstrap collector is still untested for by_path + trends_nonparam + placebo + n_bootstrap.
I could not run pytest here because pytest is not installed; this is a static review plus source compile() checks on the touched Python files.

Methodology

No findings. The implementation now matches the documented registry contract for:

per-path raw DID^{fd} on path_effects[path]["horizons"][l]
per-path cumulated level effects on path_cumulated_event_study
the documented multi-baseline divergence warning for by_path + trends_linear
set-restricted control pools for by_path + trends_nonparam

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not see an existing TODO.md entry that would mitigate the test-surface gaps below.

Security

No findings.

Documentation/Tests

Severity: P1. Impact: diff_diff/chaisemartin_dhaultfoeuille.py:L3155-L3201 and diff_diff/chaisemartin_dhaultfoeuille.py:L6116-L6244 are now reachable under by_path + trends_linear + placebo + n_bootstrap, but the new tests only cover analytical negative horizons at tests/test_chaisemartin_dhaultfoeuille.py:L7251-L7298 and positive-horizon bootstrap at tests/test_chaisemartin_dhaultfoeuille.py:L7228-L7249. No test asserts that negative-horizon rows get bootstrap-derived se / p_value / conf_int on the first-differenced path, so a silent placebo-bootstrap regression would currently pass. Concrete fix: add a TestByPathTrendsLinear case with placebo=True, n_bootstrap>0 that checks negative-horizon path_placebo_event_study entries receive bootstrap-populated inference fields, plus an n_bootstrap=1 check for the full-NaN contract on negative horizons.
Severity: P1. Impact: the PR explicitly threads set_ids into _collect_path_placebo_bootstrap_inputs at diff_diff/chaisemartin_dhaultfoeuille.py:L6116-L6244 and its call site at diff_diff/chaisemartin_dhaultfoeuille.py:L2862-L2875, but the new trends_nonparam tests only cover analytical placebos via parity at tests/test_chaisemartin_dhaultfoeuille_parity.py:L1092-L1185 and positive-horizon bootstrap/sup-t at tests/test_chaisemartin_dhaultfoeuille.py:L7729-L7852. No test exercises by_path + trends_nonparam + placebo + n_bootstrap>0, so a wrong negative-horizon control pool would silently ship. Concrete fix: add a TestByPathTrendsNonparam case with placebo=True, n_bootstrap>0 on a fixture where set restriction changes placebo controls, then assert at least one negative-horizon bootstrap result differs from the no-set fit and remains finite.
Severity: P3. Impact: diff_diff/chaisemartin_dhaultfoeuille_results.py:L384-L440 still omits the new path_cumulated_event_study attribute from the results-class attribute docs even though the field was added at diff_diff/chaisemartin_dhaultfoeuille_results.py:L555-L570. Public API docs are slightly incomplete. Concrete fix: add a short attribute entry mirroring the field docstring and the to_dataframe(level="by_path") schema.

Path to Approval

Add a negative-horizon bootstrap regression for by_path + trends_linear + placebo + n_bootstrap, including the n_bootstrap=1 NaN-inference contract.
Add a negative-horizon bootstrap regression for by_path + trends_nonparam + placebo + n_bootstrap that proves set_ids reaches the placebo bootstrap collector.

…string P1 fixes — coverage gaps for newly reachable bootstrap+placebo paths: - test_per_path_placebos_with_trends_linear_bootstrap_inference: asserts negative-horizon SE differs between analytical and bootstrap fits, proving the placebo bootstrap propagation block runs through the first-differenced path. - test_per_path_placebos_with_trends_linear_bootstrap_nan_consistent: n_bootstrap=1 case asserting NaN-consistent inference on negative horizons (locks the library-wide NaN-on-invalid contract on this newly reachable surface). - test_per_path_placebos_with_trends_nonparam_bootstrap_inference: comparison fit with vs without trends_nonparam under bootstrap + placebo; asserts negative-horizon SE differs, proving set_ids reaches _collect_path_placebo_bootstrap_inputs. P3 fix — Results dataclass attribute documentation: - Added a path_cumulated_event_study attribute entry to ChaisemartinDHaultfoeuilleResults attributes docstring (was added as a field but missing from the public attribute table). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T14:39:23Z

/ai-review

github-actions · 2026-04-26T14:43:58Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e890d604006e7c791fa1907bfd9bdbfb728d8242

Overall Assessment

⚠️ Needs changes

Executive Summary

Affected methods: ChaisemartinDHaultfoeuille.by_path under trends_linear=True (per-path DID^{fd} plus new path_cumulated_event_study) and under trends_nonparam=... (set-restricted controls threaded into per-path analytical/bootstrapped/placebo paths).
The prior re-review P1s look addressed: the diff now adds explicit negative-horizon bootstrap coverage for by_path + trends_linear + placebo and by_path + trends_nonparam + placebo in tests/test_chaisemartin_dhaultfoeuille.py and tests/test_chaisemartin_dhaultfoeuille.py.
The set_ids threading itself is consistent across the four intended per-path helpers, so the new trends_nonparam control-pool restriction looks aligned with the existing registry contract.
P1: by_path + trends_linear still has an undocumented divergence regime on single-baseline panels with F_g=3. This PR’s benchmark/parity comments say only F_g >= 4 avoids the boundary-case divergence, but the new public contract still states single-baseline parity broadly and the runtime warning only covers the multi-baseline case.
I could not run pytest here because pytest is not installed; this is a static review. I did syntax-check the touched Python files with compile() successfully.

Methodology

Severity: P1. Impact: The new public contract for by_path + trends_linear overstates when Python matches R. The user-facing docstring says parity holds on “single-baseline panels” and the new runtime warning only fires for multi-baseline switcher panels, but this same PR’s benchmark generator and parity-test docstring explicitly say there is still a separate F_g=3 boundary-case divergence and that the parity fixture had to be restricted to F_g >= 4 to avoid it. That means a paper-valid, single-baseline panel can still produce R-divergent per-path point estimates/cumulated effects with no warning and no registry note. References: diff_diff/chaisemartin_dhaultfoeuille.py, diff_diff/chaisemartin_dhaultfoeuille.py, benchmarks/R/generate_dcdh_dynr_test_values.R, tests/test_chaisemartin_dhaultfoeuille_parity.py, docs/methodology/REGISTRY.md, CHANGELOG.md.
Concrete fix: Either make the F_g=3 regime match R, or document and warn on it explicitly. At minimum, narrow the contract from “single-baseline panels” to the actually parity-tested regime (for example, selected-path switchers with sufficient pre-window depth, F_g >= 4), add the same limitation to REGISTRY.md/CHANGELOG.md, and emit a targeted UserWarning when any selected path includes F_g=3 switchers.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

No findings on the PR itself. The last review’s two test-surface P1s appear resolved by the new negative-horizon bootstrap tests. Review limitation: pytest is unavailable in this environment, so I could only do static inspection plus compile() on the touched Python files.

Path to Approval

Update the by_path + trends_linear methodology contract to cover the known F_g=3 boundary divergence: either fix the implementation to match R there, or explicitly document and warn on that regime in diff_diff/chaisemartin_dhaultfoeuille.py, docs/methodology/REGISTRY.md, and CHANGELOG.md.
Add a regression test with a single-baseline by_path + trends_linear panel that includes selected-path F_g=3 switchers and asserts the intended behavior: either R-aligned estimates after a code fix, or the new targeted warning plus narrowed contract if the divergence remains documented.

…ontract CI reviewer flagged that the by_path + trends_linear public contract overstated R parity. The shipped contract said "single-baseline panels" broadly, but the parity scenario was restricted to F_g >= 4 because F_g=3 switchers trigger a 30%+ point-estimate divergence between Python's global-then-disaggregate architecture and R's per-path full- pipeline call (only 1 valid pre-window Z value remains after first-differencing + time==1 filter, triggering different control- eligibility treatment). Fix: add a targeted UserWarning fired at fit-time when by_path + trends_linear is set AND any switcher has F_g==3 (predicate: first_switch_idx_arr == 2). Mirrors the existing F_g < 3 exclusion warning but for the boundary case rather than the exclusion case. Documentation: docstring at the by_path paragraph now narrows the parity claim to "single-baseline panels with sufficient pre-window depth (F_g >= 4 for every selected-path switcher)" and explicitly documents the F_g=3 boundary regime + warning. REGISTRY adds the same narrowing + warning text to the per-path linear-trends Note. CHANGELOG adds the divergence note to the Unreleased entry. Regression test test_F_g_three_boundary_case_emits_warning added in TestByPathTrendsLinear: 4 F_g=3 switchers + 4 F_g=4 switchers in a single-baseline panel, asserts the new warning fires (named "by_path + trends_linear" + "F_g=3"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-26T14:51:29Z

/ai-review

github-actions · 2026-04-26T14:59:47Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f38c7034c431e5fa39f212a95005e5d135ef94fc

Overall Assessment
✅ Looks good

Executive Summary

Affected methods: ChaisemartinDHaultfoeuille.by_path for DID^{fd} linear trends (Web Appendix §1.3) and state-set trends (Web Appendix §1.4). I did not find any unmitigated P0/P1 methodology issues in the changed code or its documented contract. docs/methodology/REGISTRY.md:L631-L643
[Previously identified] The earlier F_g=3 boundary-case documentation gap for by_path + trends_linear is resolved: the estimator docstring, fit-time warning, registry note, and changelog now all disclose that regime explicitly. diff_diff/chaisemartin_dhaultfoeuille.py:L441-L489 diff_diff/chaisemartin_dhaultfoeuille.py:L1668-L1740 docs/methodology/REGISTRY.md:L641-L643 CHANGELOG.md:L8-L11
The new path_cumulated_event_study surface matches the library’s existing linear_trends_effects contract: conservative cumulated SEs, NaN-consistent inference via safe_inference(), and construction after bootstrap propagation so it uses final per-horizon SEs. diff_diff/chaisemartin_dhaultfoeuille.py:L3168-L3202 diff_diff/chaisemartin_dhaultfoeuille.py:L3511-L3560 diff_diff/chaisemartin_dhaultfoeuille.py:L5739-L5863
set_ids is threaded consistently into the by-path analytical, placebo, and bootstrap helpers, so trends_nonparam reaches analytical SE, bootstrap SE, placebos, and sup-t bands as intended. diff_diff/chaisemartin_dhaultfoeuille.py:L2166-L2189 diff_diff/chaisemartin_dhaultfoeuille.py:L2333-L2349 diff_diff/chaisemartin_dhaultfoeuille.py:L2880-L2924 diff_diff/chaisemartin_dhaultfoeuille.py:L5647-L5658 diff_diff/chaisemartin_dhaultfoeuille.py:L5960-L5970 diff_diff/chaisemartin_dhaultfoeuille.py:L6125-L6135 diff_diff/chaisemartin_dhaultfoeuille.py:L6248-L6258
Test coverage is materially better: regression tests pin the new warning contracts and trends_nonparam bootstrap/placebo paths, and new parity classes cover both lifted gates against golden values. Static review only: pytest was unavailable here, but in-memory compile() succeeded on the touched Python files. tests/test_chaisemartin_dhaultfoeuille.py:L7603-L7655 tests/test_chaisemartin_dhaultfoeuille.py:L7974-L8037 tests/test_chaisemartin_dhaultfoeuille_parity.py:L900-L1115

Methodology

Severity: P3 informational. Impact: The remaining by-path divergences from R are now documented rather than silent: multi-baseline switcher panels, the F_g=3 boundary case under trends_linear, placebo parity under trends_linear, and the inherited cross-path cohort-sharing SE deviation. That resolves the prior re-review methodology concern, so I do not treat these as blocking defects. diff_diff/chaisemartin_dhaultfoeuille.py:L441-L489 diff_diff/chaisemartin_dhaultfoeuille.py:L1668-L1740 docs/methodology/REGISTRY.md:L641-L643
Concrete fix: No action required.

Code Quality
No findings. The changed by-path effect, cumulated, and placebo paths use safe_inference() rather than introducing new inline inference anti-patterns. diff_diff/chaisemartin_dhaultfoeuille.py:L5719-L5725 diff_diff/chaisemartin_dhaultfoeuille.py:L5846-L5858 diff_diff/chaisemartin_dhaultfoeuille.py:L6022-L6032

Performance
No findings.

Maintainability
No findings. The new path_cumulated_event_study surface is wired consistently through the result docs, dataclass, summary(), and to_dataframe(level="by_path"). diff_diff/chaisemartin_dhaultfoeuille_results.py:L421-L440 diff_diff/chaisemartin_dhaultfoeuille_results.py:L575-L590 diff_diff/chaisemartin_dhaultfoeuille_results.py:L1328-L1353 diff_diff/chaisemartin_dhaultfoeuille_results.py:L1700-L1803

Tech Debt
No findings.

Security
No findings.

Documentation/Tests
No findings. The documentation and tests now cover the new behavior and the known deviation regimes, including the targeted F_g=3 warning and dedicated R-parity fixtures for both lifted gates. Review limitation: I could not run pytest in this environment. CHANGELOG.md:L8-L11 tests/test_chaisemartin_dhaultfoeuille.py:L6941-L8065 tests/test_chaisemartin_dhaultfoeuille_parity.py:L900-L1115

…strap CI reviewer flagged that the cumulated layer was built before the bootstrap propagation block, so when n_bootstrap > 0 it surfaced stale analytical SEs / t-stats / p-values / CIs while path_effects[path][l] had been overwritten with bootstrap stats — a public-surface inconsistency with the library-wide bootstrap contract. Fix: move the _compute_path_cumulated_event_study() call to AFTER the bootstrap propagation block at chaisemartin_dhaultfoeuille.py:3034-3081 (mirrors the global linear_trends_effects placement at :3405-3454). The helper reads path_effects[path]["horizons"][l]["se"] which by then is bootstrap-overwritten under n_bootstrap > 0, so cumulated SE becomes a running sum of bootstrap per-horizon SEs. Also addresses the P2 test-coverage gap with two new regressions in TestByPathTrendsLinear: - test_bootstrap_cumulated_uses_post_bootstrap_per_horizon_se: positive case asserting cumulated SE == running sum of post-bootstrap per- horizon SEs (rtol 1e-12). - test_bootstrap_cumulated_nan_consistent_when_n_bootstrap_one: n_bootstrap=1 case asserting cumulated SE / t_stat / p_value / conf_int are all NaN per the library-wide NaN-on-invalid contract. REGISTRY note updated with the post-bootstrap recomputation contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI reviewer flagged that by_path + trends_linear was ungated on all panels but only warned on F_g<3 exclusions, while the analogous by_path + controls path warns on multi-baseline switcher panels (D_{g,1} multiplicity) where R's per-path full-pipeline call diverges from Python's global-then-disaggregate architecture. The by_path docstring already acknowledged the "same multi-baseline divergence pattern as controls" (line 461-462), but the runtime warning was missing — a docstring-vs-implementation gap. Fix: mirror the controls warning at chaisemartin_dhaultfoeuille.py :1565-1584 inside the trends_linear block (right after the F_g<3 warning). Same predicate (switcher mask on first_switch_idx_arr, unique baseline check). Different message naming the trends_linear mechanism (full-pipeline including first-differencing per path subset). REGISTRY note updated to spell out the warning explicitly. Three regression tests added in TestByPathTrendsLinear, mirroring TestByPathControls: - test_multi_baseline_panel_emits_r_deviation_warning: panel with joiners and leavers must emit the warning - test_single_baseline_panel_does_not_emit_r_deviation_warning: single-baseline panel must not warn - test_single_baseline_heterogeneous_F_g_does_not_warn: pins the precise predicate (baseline multiplicity, NOT F_g multiplicity) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P1 fix — test coverage gaps for newly lifted parameter interactions: - test_per_period_effects_unaffected_by_trends_linear_by_path now actually inspects per_period_effects (was checking event_study_effects by mistake). Asserts no_bp == bp on did_plus_t / did_minus_t at every horizon since per-period DID uses raw Y_mat per the contract at chaisemartin_dhaultfoeuille.py:1493-1496. - Symmetric test_per_period_effects_unaffected_by_trends_nonparam_by_path added — set_ids doesn't touch the per-period path either. - test_per_path_placebos_with_trends_linear_present asserts path_placebo_event_study is populated under by_path + trends_linear + placebo=True with finite values. The R-parity path skips placebo rows due to documented divergence; this regression locks the Python-side surface population so the path doesn't silently regress. - test_sup_t_bands_with_trends_linear_finite_crit and the nonparam symmetric test pin the bootstrap-collector path: path_sup_t_bands populated, cband_lower/upper columns surface on to_dataframe(level="by_path") under both new combinations with n_bootstrap > 0. P3 fix — docstring schema staleness: - by_path docstring corrected: path_effects[path]["horizons"][l] (not path_effects[path][l]). - to_dataframe(level="by_path") schema now lists cumulated_effect / cumulated_se columns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Final P3 cleanup round, no methodology / behavior changes: - REGISTRY.md by_path Note schema now lists cumulated_effect / cumulated_se in the to_dataframe(level="by_path") column list (was missing). - REGISTRY.md trends_linear paragraph: corrected path_effects[path][l] -> path_effects[path]["horizons"][l] to match the shipped result-class shape. - TestByPathTrendsLinear class docstring: same path_effects shape correction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…string P1 fixes — coverage gaps for newly reachable bootstrap+placebo paths: - test_per_path_placebos_with_trends_linear_bootstrap_inference: asserts negative-horizon SE differs between analytical and bootstrap fits, proving the placebo bootstrap propagation block runs through the first-differenced path. - test_per_path_placebos_with_trends_linear_bootstrap_nan_consistent: n_bootstrap=1 case asserting NaN-consistent inference on negative horizons (locks the library-wide NaN-on-invalid contract on this newly reachable surface). - test_per_path_placebos_with_trends_nonparam_bootstrap_inference: comparison fit with vs without trends_nonparam under bootstrap + placebo; asserts negative-horizon SE differs, proving set_ids reaches _collect_path_placebo_bootstrap_inputs. P3 fix — Results dataclass attribute documentation: - Added a path_cumulated_event_study attribute entry to ChaisemartinDHaultfoeuilleResults attributes docstring (was added as a field but missing from the public attribute table). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ontract CI reviewer flagged that the by_path + trends_linear public contract overstated R parity. The shipped contract said "single-baseline panels" broadly, but the parity scenario was restricted to F_g >= 4 because F_g=3 switchers trigger a 30%+ point-estimate divergence between Python's global-then-disaggregate architecture and R's per-path full- pipeline call (only 1 valid pre-window Z value remains after first-differencing + time==1 filter, triggering different control- eligibility treatment). Fix: add a targeted UserWarning fired at fit-time when by_path + trends_linear is set AND any switcher has F_g==3 (predicate: first_switch_idx_arr == 2). Mirrors the existing F_g < 3 exclusion warning but for the boundary case rather than the exclusion case. Documentation: docstring at the by_path paragraph now narrows the parity claim to "single-baseline panels with sufficient pre-window depth (F_g >= 4 for every selected-path switcher)" and explicitly documents the F_g=3 boundary regime + warning. REGISTRY adds the same narrowing + warning text to the per-path linear-trends Note. CHANGELOG adds the divergence note to the Unreleased entry. Regression test test_F_g_three_boundary_case_emits_warning added in TestByPathTrendsLinear: 4 F_g=3 switchers + 4 F_g=4 switchers in a single-baseline panel, asserts the new warning fires (named "by_path + trends_linear" + "F_g=3"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Wave 3 #6+#7 of the dCDH by_path follow-up sequence (after PR #378 shipped #5 by_path + controls). Removes the two NotImplementedError gates at chaisemartin_dhaultfoeuille.py:1014-1023 and adds: - New `path_cumulated_event_study` Results field surfacing the cumulated level effect delta_l per path under trends_linear=True (mirrors the global linear_trends_effects cumulation; this is what R returns under did_multiplegt_dyn(..., by_path, trends_lin)). - `set_ids` parameter threaded through the four per-path IF helpers so trends_nonparam's set-restricted control pool reaches per-path analytical SE, bootstrap, placebos, and sup-t bands automatically. - to_dataframe(level="by_path") gains cumulated_effect / cumulated_se columns (always present, NaN-when-None — mirrors cband_*). - summary() renders a per-path "Cumulated Level Effects" sub-section. Validated against R DIDmultiplegtDYN 2.3.3 via two new golden scenarios: - single_baseline_multi_path_by_path_trends_lin (custom DGP: F_g >= 4, cohort-single-path, n_periods=13) — per-path cumulated point estimates match R bit-exactly (POINT_RTOL=1e-9), cumulated SE within ±20% - multi_path_reversible_by_path_trends_nonparam — per-path point estimates AND placebos match R bit-exactly, SE within ±15% Placebo parity for trends_linear is intentionally skipped: R's per-path placebo re-runs on the path-restricted subsample with different control eligibility than Python's global-then-disaggregate architecture, so the divergence is methodological, not a tolerance issue. Internal regression covers placebo + trends_linear (finite values, bootstrap inheritance). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…strap CI reviewer flagged that the cumulated layer was built before the bootstrap propagation block, so when n_bootstrap > 0 it surfaced stale analytical SEs / t-stats / p-values / CIs while path_effects[path][l] had been overwritten with bootstrap stats — a public-surface inconsistency with the library-wide bootstrap contract. Fix: move the _compute_path_cumulated_event_study() call to AFTER the bootstrap propagation block at chaisemartin_dhaultfoeuille.py:3034-3081 (mirrors the global linear_trends_effects placement at :3405-3454). The helper reads path_effects[path]["horizons"][l]["se"] which by then is bootstrap-overwritten under n_bootstrap > 0, so cumulated SE becomes a running sum of bootstrap per-horizon SEs. Also addresses the P2 test-coverage gap with two new regressions in TestByPathTrendsLinear: - test_bootstrap_cumulated_uses_post_bootstrap_per_horizon_se: positive case asserting cumulated SE == running sum of post-bootstrap per- horizon SEs (rtol 1e-12). - test_bootstrap_cumulated_nan_consistent_when_n_bootstrap_one: n_bootstrap=1 case asserting cumulated SE / t_stat / p_value / conf_int are all NaN per the library-wide NaN-on-invalid contract. REGISTRY note updated with the post-bootstrap recomputation contract. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

CI reviewer flagged that by_path + trends_linear was ungated on all panels but only warned on F_g<3 exclusions, while the analogous by_path + controls path warns on multi-baseline switcher panels (D_{g,1} multiplicity) where R's per-path full-pipeline call diverges from Python's global-then-disaggregate architecture. The by_path docstring already acknowledged the "same multi-baseline divergence pattern as controls" (line 461-462), but the runtime warning was missing — a docstring-vs-implementation gap. Fix: mirror the controls warning at chaisemartin_dhaultfoeuille.py :1565-1584 inside the trends_linear block (right after the F_g<3 warning). Same predicate (switcher mask on first_switch_idx_arr, unique baseline check). Different message naming the trends_linear mechanism (full-pipeline including first-differencing per path subset). REGISTRY note updated to spell out the warning explicitly. Three regression tests added in TestByPathTrendsLinear, mirroring TestByPathControls: - test_multi_baseline_panel_emits_r_deviation_warning: panel with joiners and leavers must emit the warning - test_single_baseline_panel_does_not_emit_r_deviation_warning: single-baseline panel must not warn - test_single_baseline_heterogeneous_F_g_does_not_warn: pins the precise predicate (baseline multiplicity, NOT F_g multiplicity) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P1 fix — test coverage gaps for newly lifted parameter interactions: - test_per_period_effects_unaffected_by_trends_linear_by_path now actually inspects per_period_effects (was checking event_study_effects by mistake). Asserts no_bp == bp on did_plus_t / did_minus_t at every horizon since per-period DID uses raw Y_mat per the contract at chaisemartin_dhaultfoeuille.py:1493-1496. - Symmetric test_per_period_effects_unaffected_by_trends_nonparam_by_path added — set_ids doesn't touch the per-period path either. - test_per_path_placebos_with_trends_linear_present asserts path_placebo_event_study is populated under by_path + trends_linear + placebo=True with finite values. The R-parity path skips placebo rows due to documented divergence; this regression locks the Python-side surface population so the path doesn't silently regress. - test_sup_t_bands_with_trends_linear_finite_crit and the nonparam symmetric test pin the bootstrap-collector path: path_sup_t_bands populated, cband_lower/upper columns surface on to_dataframe(level="by_path") under both new combinations with n_bootstrap > 0. P3 fix — docstring schema staleness: - by_path docstring corrected: path_effects[path]["horizons"][l] (not path_effects[path][l]). - to_dataframe(level="by_path") schema now lists cumulated_effect / cumulated_se columns. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Final P3 cleanup round, no methodology / behavior changes: - REGISTRY.md by_path Note schema now lists cumulated_effect / cumulated_se in the to_dataframe(level="by_path") column list (was missing). - REGISTRY.md trends_linear paragraph: corrected path_effects[path][l] -> path_effects[path]["horizons"][l] to match the shipped result-class shape. - TestByPathTrendsLinear class docstring: same path_effects shape correction. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…string P1 fixes — coverage gaps for newly reachable bootstrap+placebo paths: - test_per_path_placebos_with_trends_linear_bootstrap_inference: asserts negative-horizon SE differs between analytical and bootstrap fits, proving the placebo bootstrap propagation block runs through the first-differenced path. - test_per_path_placebos_with_trends_linear_bootstrap_nan_consistent: n_bootstrap=1 case asserting NaN-consistent inference on negative horizons (locks the library-wide NaN-on-invalid contract on this newly reachable surface). - test_per_path_placebos_with_trends_nonparam_bootstrap_inference: comparison fit with vs without trends_nonparam under bootstrap + placebo; asserts negative-horizon SE differs, proving set_ids reaches _collect_path_placebo_bootstrap_inputs. P3 fix — Results dataclass attribute documentation: - Added a path_cumulated_event_study attribute entry to ChaisemartinDHaultfoeuilleResults attributes docstring (was added as a field but missing from the public attribute table). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ontract CI reviewer flagged that the by_path + trends_linear public contract overstated R parity. The shipped contract said "single-baseline panels" broadly, but the parity scenario was restricted to F_g >= 4 because F_g=3 switchers trigger a 30%+ point-estimate divergence between Python's global-then-disaggregate architecture and R's per-path full- pipeline call (only 1 valid pre-window Z value remains after first-differencing + time==1 filter, triggering different control- eligibility treatment). Fix: add a targeted UserWarning fired at fit-time when by_path + trends_linear is set AND any switcher has F_g==3 (predicate: first_switch_idx_arr == 2). Mirrors the existing F_g < 3 exclusion warning but for the boundary case rather than the exclusion case. Documentation: docstring at the by_path paragraph now narrows the parity claim to "single-baseline panels with sufficient pre-window depth (F_g >= 4 for every selected-path switcher)" and explicitly documents the F_g=3 boundary regime + warning. REGISTRY adds the same narrowing + warning text to the per-path linear-trends Note. CHANGELOG adds the divergence note to the Unreleased entry. Regression test test_F_g_three_boundary_case_emits_warning added in TestByPathTrendsLinear: 4 F_g=3 switchers + 4 F_g=4 switchers in a single-baseline panel, asserts the new warning fires (named "by_path + trends_linear" + "F_g=3"). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber added the ready-for-ci Triggers CI test workflows label Apr 26, 2026

igerber force-pushed the dcdh-by-path-trends branch from f38c703 to 7c7b790 Compare April 26, 2026 15:24

igerber and others added 3 commits April 26, 2026 16:33

igerber and others added 4 commits April 26, 2026 16:33

igerber force-pushed the dcdh-by-path-trends branch from 7c7b790 to 41d54b9 Compare April 26, 2026 20:36

Conversation

igerber commented Apr 26, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Path to Approval

Uh oh!

igerber commented Apr 26, 2026

Uh oh!

github-actions Bot commented Apr 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant