Skip to content

dCDH by_path: lift trends_linear + trends_nonparam gates (Wave 3 #6+#7)#393

Open
igerber wants to merge 7 commits intomainfrom
dcdh-by-path-trends
Open

dCDH by_path: lift trends_linear + trends_nonparam gates (Wave 3 #6+#7)#393
igerber wants to merge 7 commits intomainfrom
dcdh-by-path-trends

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Apr 26, 2026

Summary

Lifts the two NotImplementedError gates blocking ChaisemartinDHaultfoeuille.by_path from combining with trends_linear (DID^{fd} group-specific linear trends) and trends_nonparam (state-set trends), Wave 3 #6+#7 of the dCDH by_path follow-up sequence (PR #378 shipped #5).

  • path_cumulated_event_study (new Results field): surfaces the cumulated level effect delta_l = sum_{l'=1..l} DID^{fd}_{path, l'} per path under trends_linear=True. This is what R returns under did_multiplegt_dyn(..., by_path, trends_lin). Mirrors the global linear_trends_effects cumulation; raw per-horizon DID^{fd}_l per path still surfaces on path_effects[path][l].
  • set_ids threading: set_ids parameter added to the four per-path IF helpers (_compute_path_effects, _compute_path_placebos, _collect_path_bootstrap_inputs, _collect_path_placebo_bootstrap_inputs) so trends_nonparam's set-restricted control pool reaches per-path analytical SE, bootstrap, placebos, and sup-t bands automatically.
  • to_dataframe(level="by_path") gains cumulated_effect / cumulated_se columns (always present, NaN-when-None — mirrors the cband_* convention). summary() renders a per-path "Cumulated Level Effects" sub-section under each path block.

R parity validated via two new golden-value scenarios (single_baseline_multi_path_by_path_trends_lin for trends_linear and multi_path_reversible_by_path_trends_nonparam for trends_nonparam). Per-path point estimates match R bit-exactly (POINT_RTOL=1e-9); cumulated SE within CUM_SE_RTOL=0.20 (widened vs the 0.12 used for non-cumulated by_path parity because the conservative upper-bound SE compounds the cross-path cohort-sharing deviation under summation). Placebo parity for trends_linear is intentionally skipped (documented architectural divergence between R's per-path full-pipeline call and Python's global-then-disaggregate architecture).

Test plan

  • TestByPathTrendsLinear (10 non-slow + 1 slow): gate removal, path_cumulated_event_study populated under trends_linear, is None without trends_linear, conservative SE upper bound formula, to_dataframe columns, summary() rendering, bootstrap finite SE inheritance, per-period unaffected
  • TestByPathTrendsNonparam (5 non-slow + 1 slow): gate removal, set restriction changes per-path estimates (sanity check that set_ids threading reached the IF helpers), per-path SE finite, validation passthrough (time-varying / missing column), bootstrap finite SE
  • TestDCDHDynRParityByPathTrendsLinear: per-path cumulated point estimates match R bit-exactly on event horizons; cumulated SE within ±20%
  • TestDCDHDynRParityByPathTrendsNonparam: per-path point estimates AND placebos match R bit-exactly; per-path SE within ±15%
  • Regression: TestLinearTrends (non-by_path), TestStateSetTrends (non-by_path), TestCovariateAdjustment (non-by_path) remain unchanged
  • Full suite: pytest tests/test_chaisemartin_dhaultfoeuille.py tests/test_chaisemartin_dhaultfoeuille_parity.py tests/test_methodology_chaisemartin_dhaultfoeuille.py (239 + 10 = 249 tests passing locally)

🤖 Generated with Claude Code

@github-actions
Copy link
Copy Markdown

Overall Assessment

⚠️ Needs changes — one unmitigated P1 on the new by_path + trends_linear inference surface.

Executive Summary

  • Affected methods: dCDH by_path, DID^{fd} linear-trends cumulation, state-set trends, and per-path bootstrap inference.
  • trends_nonparam looks methodologically consistent in the diff: set_ids is threaded through the per-path analytical, placebo, and bootstrap collector paths the registry requires.
  • The documented R deviations for cohort-mixed per-path SE and by_path + trends_linear placebo parity are explicitly recorded in docs/methodology/REGISTRY.md:L641-L643, so I did not treat them as defects.
  • P1: the new path_cumulated_event_study surface is computed before bootstrap and never updated after bootstrap overwrites path_effects, so bootstrap runs mix bootstrap raw-path inference with analytical cumulated-path inference.
  • The added tests do not cover that bootstrap/cumulation path; they only assert bootstrap finiteness on res.path_effects.

Methodology

  • P1 Impact: path_cumulated_event_study is built from analytical per-horizon SEs at diff_diff/chaisemartin_dhaultfoeuille.py:L2075-L2126, but bootstrap later overwrites only path_effects[path]["horizons"][l] at diff_diff/chaisemartin_dhaultfoeuille.py:L3034-L3096. The global analogue linear_trends_effects is assembled after bootstrap propagation at diff_diff/chaisemartin_dhaultfoeuille.py:L3405-L3454, so the new by-path surface is inconsistent with the existing methodology contract. summary() renders stale analytical cumulated SE/t/p rows at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1308-L1332, and to_dataframe(level="by_path") exports stale analytical cumulated_se at diff_diff/chaisemartin_dhaultfoeuille_results.py:L1713-L1774, even when the raw path horizons are using bootstrap inference. Concrete fix: recompute path_cumulated_event_study after bootstrap propagation, or add a dedicated overwrite block so its se, t_stat, p_value, and conf_int are derived from the final post-bootstrap path_effects state and obey the same NaN-on-invalid bootstrap contract as the rest of dCDH.

Code Quality

  • No additional findings in the changed code. The diff uses safe_inference() rather than introducing new inline inference anti-patterns.

Performance

  • No findings in the diff.

Maintainability

  • No additional findings in the diff.

Tech Debt

  • No separate tech-debt finding, but the blocking issue above is not tracked in TODO.md:L51-L113, so it remains unmitigated.

Security

  • No findings in the changed files.

Documentation/Tests

  • P2 Impact: the new tests do not exercise the failing bootstrap path the registry claims is covered. tests/test_chaisemartin_dhaultfoeuille.py:L7199-L7221 only checks finite bootstrap SEs on res.path_effects; it never asserts anything about res.path_cumulated_event_study after bootstrap. Meanwhile docs/methodology/REGISTRY.md:L641-L643 says the by_path + trends_linear cross-surface invariants, including bootstrap and path_cumulated_event_study, are regression-tested at TestByPathTrendsLinear. That gap is why the P1 above can slip through. Concrete fix: add bootstrap regressions for path_cumulated_event_study, including both a normal bootstrap case and an invalid-bootstrap case (n_bootstrap=1) asserting the cumulated inference tuple goes NaN instead of retaining analytical values.

Path to Approval

  1. Recompute or overwrite results.path_cumulated_event_study after bootstrap_results.path_ses are propagated, so cumulated-path inference uses the final bootstrap horizon state rather than pre-bootstrap analytical SEs.
  2. Add regression coverage for by_path + trends_linear + n_bootstrap > 0, including:
    • a positive-bootstrap case asserting cumulated SE follows the documented running-sum rule on the final per-horizon path SEs;
    • an n_bootstrap=1 case asserting se, t_stat, p_value, and conf_int on the cumulated surface are NaN-consistent.

igerber added a commit that referenced this pull request Apr 26, 2026
…strap

CI reviewer flagged that the cumulated layer was built before the
bootstrap propagation block, so when n_bootstrap > 0 it surfaced stale
analytical SEs / t-stats / p-values / CIs while path_effects[path][l]
had been overwritten with bootstrap stats — a public-surface
inconsistency with the library-wide bootstrap contract.

Fix: move the _compute_path_cumulated_event_study() call to AFTER the
bootstrap propagation block at chaisemartin_dhaultfoeuille.py:3034-3081
(mirrors the global linear_trends_effects placement at :3405-3454).
The helper reads path_effects[path]["horizons"][l]["se"] which by then
is bootstrap-overwritten under n_bootstrap > 0, so cumulated SE
becomes a running sum of bootstrap per-horizon SEs.

Also addresses the P2 test-coverage gap with two new regressions in
TestByPathTrendsLinear:

- test_bootstrap_cumulated_uses_post_bootstrap_per_horizon_se: positive
  case asserting cumulated SE == running sum of post-bootstrap per-
  horizon SEs (rtol 1e-12).
- test_bootstrap_cumulated_nan_consistent_when_n_bootstrap_one:
  n_bootstrap=1 case asserting cumulated SE / t_stat / p_value /
  conf_int are all NaN per the library-wide NaN-on-invalid contract.

REGISTRY note updated with the post-bootstrap recomputation contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 0a2783a7f5b36a61e970a906dc30ded48dd98205


Overall Assessment

⚠️ Needs changes — one unmitigated P1 remains on the newly lifted by_path + trends_linear path.

Executive Summary

  • Affected methods: dCDH by_path, per-path trends_linear DID^{fd} cumulation, per-path trends_nonparam control-pool restriction, and the by-path bootstrap/placebo/sup-t surfaces.
  • The prior rereview blocker is fixed: path_cumulated_event_study is now recomputed after bootstrap propagation, and the new tests pin both the post-bootstrap running-sum rule and NaN-on-invalid-bootstrap behavior.
  • trends_nonparam looks methodologically consistent in the diff: set_ids now reaches the four per-path IF/bootstrap helpers the registry says should consume the set-restricted control pool.
  • P1: by_path + trends_linear is now ungated on all panels, but unlike the analogous by_path + controls path it still does not warn on multi-baseline switcher panels even though the new docs acknowledge the same R-divergence pattern.
  • I could not run pytest here because pytest is not installed; verification was static plus a no-write compile() parse check.

Methodology

  • Severity: P1
    Impact: The new by_path + trends_linear path is only parity-validated on single-baseline panels, and the method docstring explicitly says it has the “same multi-baseline divergence pattern as controls”; however the actual trends_linear fit path only warns about F_g < 3 exclusions and has no analogous multi-baseline warning. The control path does implement that warning. This leaves a newly ungated regime where per-path point estimates can materially diverge from did_multiplegt_dyn(..., by_path, trends_lin) with no user-visible signal. diff_diff/chaisemartin_dhaultfoeuille.py:L441-L463, diff_diff/chaisemartin_dhaultfoeuille.py:L1529-L1584, diff_diff/chaisemartin_dhaultfoeuille.py:L1619-L1657, docs/methodology/REGISTRY.md:L641-L643
    Concrete fix: Mirror the by_path + controls warning condition for by_path + trends_linear (switcher baselines D_{g,1} take multiple values), and add explicit deviation language to the registry paragraph so the public contract matches the runtime behavior.

Resolved in this rereview: the earlier stale-bootstrap-cumulation issue is fixed by recomputing path_cumulated_event_study after bootstrap propagation. diff_diff/chaisemartin_dhaultfoeuille.py:L3078-L3116, tests/test_chaisemartin_dhaultfoeuille.py:L7223-L7319

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No separate finding. The P1 above is not tracked in TODO.md:L51-L113, so it remains unmitigated.

Security

  • No findings.

Documentation/Tests

  • Severity: P2
    Impact: The new TestByPathTrendsLinear and parity coverage only exercise single-baseline fixtures, so the known multi-baseline divergence/warning path is not regression-tested. By contrast, by_path + controls already pins both the warning and the no-warning single-baseline case. That gap is why the missing runtime warning above is easy to ship unnoticed. tests/test_chaisemartin_dhaultfoeuille.py:L6620-L6688, tests/test_chaisemartin_dhaultfoeuille.py:L6941-L7319, tests/test_chaisemartin_dhaultfoeuille_parity.py:L900-L1054
    Concrete fix: Add a multi-baseline by_path + trends_linear fixture that must emit the warning, plus a single-baseline / heterogeneous-F_g fixture that must not emit it, mirroring the existing TestByPathControls pattern.

Path to Approval

  1. Add a fit-time UserWarning for by_path + trends_linear on multi-baseline switcher panels, using the same D_{g,1} multiplicity check already used for by_path + controls.
  2. Update docs/methodology/REGISTRY.md to spell out that multi-baseline by_path + trends_linear can diverge from R point estimates and that the estimator warns when this regime is detected.
  3. Add regression tests for the warning contract: one multi-baseline fixture that must warn, and one single-baseline / heterogeneous-F_g fixture that must not.

igerber added a commit that referenced this pull request Apr 26, 2026
CI reviewer flagged that by_path + trends_linear was ungated on all
panels but only warned on F_g<3 exclusions, while the analogous
by_path + controls path warns on multi-baseline switcher panels
(D_{g,1} multiplicity) where R's per-path full-pipeline call diverges
from Python's global-then-disaggregate architecture. The by_path
docstring already acknowledged the "same multi-baseline divergence
pattern as controls" (line 461-462), but the runtime warning was
missing — a docstring-vs-implementation gap.

Fix: mirror the controls warning at chaisemartin_dhaultfoeuille.py
:1565-1584 inside the trends_linear block (right after the F_g<3
warning). Same predicate (switcher mask on first_switch_idx_arr,
unique baseline check). Different message naming the trends_linear
mechanism (full-pipeline including first-differencing per path
subset).

REGISTRY note updated to spell out the warning explicitly.

Three regression tests added in TestByPathTrendsLinear, mirroring
TestByPathControls:
- test_multi_baseline_panel_emits_r_deviation_warning: panel with
  joiners and leavers must emit the warning
- test_single_baseline_panel_does_not_emit_r_deviation_warning:
  single-baseline panel must not warn
- test_single_baseline_heterogeneous_F_g_does_not_warn: pins the
  precise predicate (baseline multiplicity, NOT F_g multiplicity)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 35ab5fe66c236759ec4a1e2e370ec1d10fffbc66


Overall Assessment

⚠️ Needs changes — the prior methodology blocker is fixed, but one unmitigated P1 remains: the new by_path + trends_linear / by_path + trends_nonparam regressions do not actually pin the newly lifted parameter interactions across all existing by-path inference surfaces.

Executive Summary

  • The prior rereview blocker is resolved: by_path + trends_linear now emits the multi-baseline R-divergence warning at fit time, and the registry note documents it.
  • The implementation itself looks methodologically aligned with the registry: per-path trends_linear cumulation mirrors global linear_trends_effects, and set_ids is threaded through the four per-path IF/bootstrap helpers.
  • P1: coverage for the newly lifted trends_* paths is incomplete. The new trends_linear “per-period unaffected” test never inspects per_period_effects; the parity test explicitly skips negative-horizon placebos; and the new trends_nonparam class never pins the modified sup-t / cband_* bootstrap path.
  • P3: two public docstrings are stale relative to the shipped API: fit() still says path_effects[path][l], and to_dataframe(level="by_path") omits the new cumulated_effect / cumulated_se columns.
  • pytest was unavailable in this environment; review was static plus an in-memory compile() parse check of the touched Python files.

Methodology

  • No findings. The prior rereview issue is addressed: the multi-baseline by_path + trends_linear warning is present in diff_diff/chaisemartin_dhaultfoeuille.py:L1651-L1680, and the deviation is documented in docs/methodology/REGISTRY.md:L641.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. The P1 below is not tracked in TODO.md, so it remains unmitigated.

Security

  • No findings.

Documentation/Tests

  • P1 Impact: the newly lifted parameter interactions are not actually regression-tested across all by-path surfaces. The code explicitly preserves raw per-period outputs under trends_linear in diff_diff/chaisemartin_dhaultfoeuille.py:L1774-L1788, but test_per_period_effects_unaffected_by_trends_linear_by_path never checks per_period_effects and instead re-checks event_study_effects in tests/test_chaisemartin_dhaultfoeuille.py:L7161-L7197. The R parity test for by_path + trends_linear also skips every negative-horizon placebo row in tests/test_chaisemartin_dhaultfoeuille_parity.py:L1015-L1020, so the newly ungated raw-placebo path is only executed, not asserted. On the trends_nonparam side, the new class stops at point-estimate / SE / validation checks in tests/test_chaisemartin_dhaultfoeuille.py:L7486-L7630; unlike the existing controls coverage in tests/test_chaisemartin_dhaultfoeuille.py:L6427-L6560, it adds no parameter-specific regression for the modified path-level sup-t / cband_* bootstrap path. Under the project’s own parameter-interaction rule, that leaves silent regressions possible in newly lifted code paths. Concrete fix: add explicit regressions for per_period_effects, path_placebo_event_study, and path_sup_t_bands / to_dataframe(level="by_path") under the new trends_linear and trends_nonparam combinations.
  • P3 Impact: the public docstrings still describe the result schema inaccurately. fit() says raw DID^{fd} values surface at path_effects[path][l] in diff_diff/chaisemartin_dhaultfoeuille.py:L441-L448, but the actual API uses path_effects[path]["horizons"][l]. to_dataframe(level="by_path") also omits the new cumulated_effect / cumulated_se columns from its documented schema in diff_diff/chaisemartin_dhaultfoeuille_results.py:L1417-L1430. Concrete fix: update both docstrings to match the shipped shape and column set.

Path to Approval

  1. Replace tests/test_chaisemartin_dhaultfoeuille.py:L7161-L7197 with a real per_period_effects invariance check, and add the same per_period_effects regression for by_path + trends_nonparam.
  2. Add a by_path + trends_linear + placebo=True regression that asserts path_placebo_event_study negative horizons are populated and NaN-consistent, rather than only executing that path and skipping it in parity.
  3. Add parameter-specific bootstrap/sup-t regressions for the newly threaded paths, asserting that path_sup_t_bands and to_dataframe(level="by_path") cband_* columns are populated correctly under by_path + trends_nonparam (and ideally trends_linear) when n_bootstrap > 0.

igerber added a commit that referenced this pull request Apr 26, 2026
P1 fix — test coverage gaps for newly lifted parameter interactions:

- test_per_period_effects_unaffected_by_trends_linear_by_path now
  actually inspects per_period_effects (was checking event_study_effects
  by mistake). Asserts no_bp == bp on did_plus_t / did_minus_t at every
  horizon since per-period DID uses raw Y_mat per the contract at
  chaisemartin_dhaultfoeuille.py:1493-1496.
- Symmetric test_per_period_effects_unaffected_by_trends_nonparam_by_path
  added — set_ids doesn't touch the per-period path either.
- test_per_path_placebos_with_trends_linear_present asserts
  path_placebo_event_study is populated under by_path + trends_linear +
  placebo=True with finite values. The R-parity path skips placebo
  rows due to documented divergence; this regression locks the
  Python-side surface population so the path doesn't silently regress.
- test_sup_t_bands_with_trends_linear_finite_crit and the nonparam
  symmetric test pin the bootstrap-collector path: path_sup_t_bands
  populated, cband_lower/upper columns surface on
  to_dataframe(level="by_path") under both new combinations with
  n_bootstrap > 0.

P3 fix — docstring schema staleness:
- by_path docstring corrected: path_effects[path]["horizons"][l]
  (not path_effects[path][l]).
- to_dataframe(level="by_path") schema now lists cumulated_effect /
  cumulated_se columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 6cf673ca287bc18bd5ebf26328d8e7f3288d3f7c


Overall Assessment

✅ Looks good - the prior re-review P1 is addressed, and I found no unmitigated P0/P1 issues in the changed estimator paths. One P3 documentation inconsistency remains.

Executive Summary

  • The prior P1 test-surface gap is closed: the PR now explicitly covers per_period_effects, trends_linear per-path placebos, and trends_nonparam bootstrap/sup-t/cband paths in tests/test_chaisemartin_dhaultfoeuille.py:L6941-L7825 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L900-L1162.
  • Methodology looks aligned with the registry: by_path + trends_linear now warns on the documented multi-baseline R divergence, and the new per-path cumulated layer is computed after bootstrap from the final per-horizon SEs, mirroring the global linear_trends_effects contract in diff_diff/chaisemartin_dhaultfoeuille.py:L1657-L1691, L3119-L3153, L3462-L3511.
  • trends_nonparam appears fully threaded through the by-path analytical, placebo, bootstrap, and sup-t helper stack via set_ids in diff_diff/chaisemartin_dhaultfoeuille.py:L2134-L2140, L2831-L2876, L5506-L6215.
  • I did not find new NaN/inference, empty-result, or control-pool correctness defects in the changed code.
  • P3: the methodology registry still contains stale by-path schema text for this new surface in docs/methodology/REGISTRY.md:L641.
  • pytest was unavailable here (pytest: command not found), so this review is static plus a source-level compile() parse check of the touched Python files.

Methodology

No findings. The affected methods are Phase 3 by_path disaggregation plus the trends_linear and trends_nonparam extensions. The implementation matches the documented contract: the multi-baseline trends_linear deviation is surfaced with a warning in diff_diff/chaisemartin_dhaultfoeuille.py:L1657-L1691, set_ids is threaded through the four by-path analytical/bootstrap helpers in diff_diff/chaisemartin_dhaultfoeuille.py:L2134-L2140, L2831-L2876, L5506-L6215, and the new cumulated layer is built after bootstrap from final per-horizon SEs in diff_diff/chaisemartin_dhaultfoeuille.py:L3119-L3153, L5690-L5814. The main deviations from R are documented in docs/methodology/REGISTRY.md:L641-L643, so they are mitigated rather than silent defects.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

The previous re-review P1 is resolved: per-period invariance, trends-linear negative-horizon placebo population, and trends_nonparam bootstrap/sup-t/cband coverage are now explicitly tested in tests/test_chaisemartin_dhaultfoeuille.py:L7161-L7805, with new parity coverage in tests/test_chaisemartin_dhaultfoeuille_parity.py:L900-L1162.

  • Severity: P3. Impact: docs/methodology/REGISTRY.md:L641 still describes the old by-path result shape. The top-level to_dataframe(level="by_path") schema omits cumulated_effect / cumulated_se, and the new linear-trends text still says path_effects[path][l] instead of path_effects[path]["horizons"][l]. The same stale shape also appears in the new test-class docstring at tests/test_chaisemartin_dhaultfoeuille.py:L6944-L6948. Concrete fix: update those docstrings to match the shipped API already described in diff_diff/chaisemartin_dhaultfoeuille_results.py:L1417-L1436.

igerber added a commit that referenced this pull request Apr 26, 2026
Final P3 cleanup round, no methodology / behavior changes:

- REGISTRY.md by_path Note schema now lists cumulated_effect /
  cumulated_se in the to_dataframe(level="by_path") column list
  (was missing).
- REGISTRY.md trends_linear paragraph: corrected
  path_effects[path][l] -> path_effects[path]["horizons"][l] to
  match the shipped result-class shape.
- TestByPathTrendsLinear class docstring: same path_effects shape
  correction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 3d45bf34f170019fe924809b9da378a67005f65f


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The affected methods are Phase 3 by_path disaggregation under trends_linear (per-path DID^{fd} plus cumulated level effects) and trends_nonparam (set-restricted controls). I did not find an undocumented estimator/SE deviation against docs/methodology/REGISTRY.md.
  • The prior documentation issue looks resolved: the registry now documents the new by_path + trends_linear / by_path + trends_nonparam contracts.
  • I did not find a new NaN-inference or control-pool bug in the changed estimator code. safe_inference() is used consistently, and set_ids is threaded into the intended per-path helpers.
  • P1: the newly reachable negative-horizon bootstrap path is still untested for by_path + trends_linear + placebo + n_bootstrap.
  • P1: the new set_ids threading into the per-path placebo bootstrap collector is still untested for by_path + trends_nonparam + placebo + n_bootstrap.
  • I could not run pytest here because pytest is not installed; this is a static review plus source compile() checks on the touched Python files.

Methodology

No findings. The implementation now matches the documented registry contract for:

  • per-path raw DID^{fd} on path_effects[path]["horizons"][l]
  • per-path cumulated level effects on path_cumulated_event_study
  • the documented multi-baseline divergence warning for by_path + trends_linear
  • set-restricted control pools for by_path + trends_nonparam

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not see an existing TODO.md entry that would mitigate the test-surface gaps below.

Security

No findings.

Documentation/Tests

  • Severity: P1. Impact: diff_diff/chaisemartin_dhaultfoeuille.py:L3155-L3201 and diff_diff/chaisemartin_dhaultfoeuille.py:L6116-L6244 are now reachable under by_path + trends_linear + placebo + n_bootstrap, but the new tests only cover analytical negative horizons at tests/test_chaisemartin_dhaultfoeuille.py:L7251-L7298 and positive-horizon bootstrap at tests/test_chaisemartin_dhaultfoeuille.py:L7228-L7249. No test asserts that negative-horizon rows get bootstrap-derived se / p_value / conf_int on the first-differenced path, so a silent placebo-bootstrap regression would currently pass. Concrete fix: add a TestByPathTrendsLinear case with placebo=True, n_bootstrap>0 that checks negative-horizon path_placebo_event_study entries receive bootstrap-populated inference fields, plus an n_bootstrap=1 check for the full-NaN contract on negative horizons.

  • Severity: P1. Impact: the PR explicitly threads set_ids into _collect_path_placebo_bootstrap_inputs at diff_diff/chaisemartin_dhaultfoeuille.py:L6116-L6244 and its call site at diff_diff/chaisemartin_dhaultfoeuille.py:L2862-L2875, but the new trends_nonparam tests only cover analytical placebos via parity at tests/test_chaisemartin_dhaultfoeuille_parity.py:L1092-L1185 and positive-horizon bootstrap/sup-t at tests/test_chaisemartin_dhaultfoeuille.py:L7729-L7852. No test exercises by_path + trends_nonparam + placebo + n_bootstrap>0, so a wrong negative-horizon control pool would silently ship. Concrete fix: add a TestByPathTrendsNonparam case with placebo=True, n_bootstrap>0 on a fixture where set restriction changes placebo controls, then assert at least one negative-horizon bootstrap result differs from the no-set fit and remains finite.

  • Severity: P3. Impact: diff_diff/chaisemartin_dhaultfoeuille_results.py:L384-L440 still omits the new path_cumulated_event_study attribute from the results-class attribute docs even though the field was added at diff_diff/chaisemartin_dhaultfoeuille_results.py:L555-L570. Public API docs are slightly incomplete. Concrete fix: add a short attribute entry mirroring the field docstring and the to_dataframe(level="by_path") schema.

Path to Approval

  1. Add a negative-horizon bootstrap regression for by_path + trends_linear + placebo + n_bootstrap, including the n_bootstrap=1 NaN-inference contract.
  2. Add a negative-horizon bootstrap regression for by_path + trends_nonparam + placebo + n_bootstrap that proves set_ids reaches the placebo bootstrap collector.

igerber added a commit that referenced this pull request Apr 26, 2026
…string

P1 fixes — coverage gaps for newly reachable bootstrap+placebo paths:

- test_per_path_placebos_with_trends_linear_bootstrap_inference:
  asserts negative-horizon SE differs between analytical and
  bootstrap fits, proving the placebo bootstrap propagation block
  runs through the first-differenced path.
- test_per_path_placebos_with_trends_linear_bootstrap_nan_consistent:
  n_bootstrap=1 case asserting NaN-consistent inference on negative
  horizons (locks the library-wide NaN-on-invalid contract on this
  newly reachable surface).
- test_per_path_placebos_with_trends_nonparam_bootstrap_inference:
  comparison fit with vs without trends_nonparam under bootstrap +
  placebo; asserts negative-horizon SE differs, proving set_ids
  reaches _collect_path_placebo_bootstrap_inputs.

P3 fix — Results dataclass attribute documentation:
- Added a path_cumulated_event_study attribute entry to
  ChaisemartinDHaultfoeuilleResults attributes docstring (was added
  as a field but missing from the public attribute table).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: e890d604006e7c791fa1907bfd9bdbfb728d8242


Overall Assessment

⚠️ Needs changes

Executive Summary

  • Affected methods: ChaisemartinDHaultfoeuille.by_path under trends_linear=True (per-path DID^{fd} plus new path_cumulated_event_study) and under trends_nonparam=... (set-restricted controls threaded into per-path analytical/bootstrapped/placebo paths).
  • The prior re-review P1s look addressed: the diff now adds explicit negative-horizon bootstrap coverage for by_path + trends_linear + placebo and by_path + trends_nonparam + placebo in tests/test_chaisemartin_dhaultfoeuille.py and tests/test_chaisemartin_dhaultfoeuille.py.
  • The set_ids threading itself is consistent across the four intended per-path helpers, so the new trends_nonparam control-pool restriction looks aligned with the existing registry contract.
  • P1: by_path + trends_linear still has an undocumented divergence regime on single-baseline panels with F_g=3. This PR’s benchmark/parity comments say only F_g >= 4 avoids the boundary-case divergence, but the new public contract still states single-baseline parity broadly and the runtime warning only covers the multi-baseline case.
  • I could not run pytest here because pytest is not installed; this is a static review. I did syntax-check the touched Python files with compile() successfully.

Methodology

  • Severity: P1. Impact: The new public contract for by_path + trends_linear overstates when Python matches R. The user-facing docstring says parity holds on “single-baseline panels” and the new runtime warning only fires for multi-baseline switcher panels, but this same PR’s benchmark generator and parity-test docstring explicitly say there is still a separate F_g=3 boundary-case divergence and that the parity fixture had to be restricted to F_g >= 4 to avoid it. That means a paper-valid, single-baseline panel can still produce R-divergent per-path point estimates/cumulated effects with no warning and no registry note. References: diff_diff/chaisemartin_dhaultfoeuille.py, diff_diff/chaisemartin_dhaultfoeuille.py, benchmarks/R/generate_dcdh_dynr_test_values.R, tests/test_chaisemartin_dhaultfoeuille_parity.py, docs/methodology/REGISTRY.md, CHANGELOG.md.
    Concrete fix: Either make the F_g=3 regime match R, or document and warn on it explicitly. At minimum, narrow the contract from “single-baseline panels” to the actually parity-tested regime (for example, selected-path switchers with sufficient pre-window depth, F_g >= 4), add the same limitation to REGISTRY.md/CHANGELOG.md, and emit a targeted UserWarning when any selected path includes F_g=3 switchers.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

No findings on the PR itself. The last review’s two test-surface P1s appear resolved by the new negative-horizon bootstrap tests. Review limitation: pytest is unavailable in this environment, so I could only do static inspection plus compile() on the touched Python files.

Path to Approval

  1. Update the by_path + trends_linear methodology contract to cover the known F_g=3 boundary divergence: either fix the implementation to match R there, or explicitly document and warn on that regime in diff_diff/chaisemartin_dhaultfoeuille.py, docs/methodology/REGISTRY.md, and CHANGELOG.md.
  2. Add a regression test with a single-baseline by_path + trends_linear panel that includes selected-path F_g=3 switchers and asserts the intended behavior: either R-aligned estimates after a code fix, or the new targeted warning plus narrowed contract if the divergence remains documented.

igerber added a commit that referenced this pull request Apr 26, 2026
…ontract

CI reviewer flagged that the by_path + trends_linear public contract
overstated R parity. The shipped contract said "single-baseline panels"
broadly, but the parity scenario was restricted to F_g >= 4 because
F_g=3 switchers trigger a 30%+ point-estimate divergence between
Python's global-then-disaggregate architecture and R's per-path full-
pipeline call (only 1 valid pre-window Z value remains after
first-differencing + time==1 filter, triggering different control-
eligibility treatment).

Fix: add a targeted UserWarning fired at fit-time when by_path +
trends_linear is set AND any switcher has F_g==3 (predicate:
first_switch_idx_arr == 2). Mirrors the existing F_g < 3 exclusion
warning but for the boundary case rather than the exclusion case.

Documentation: docstring at the by_path paragraph now narrows the
parity claim to "single-baseline panels with sufficient pre-window
depth (F_g >= 4 for every selected-path switcher)" and explicitly
documents the F_g=3 boundary regime + warning. REGISTRY adds the
same narrowing + warning text to the per-path linear-trends Note.
CHANGELOG adds the divergence note to the Unreleased entry.

Regression test test_F_g_three_boundary_case_emits_warning added in
TestByPathTrendsLinear: 4 F_g=3 switchers + 4 F_g=4 switchers in a
single-baseline panel, asserts the new warning fires (named
"by_path + trends_linear" + "F_g=3").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented Apr 26, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: f38c7034c431e5fa39f212a95005e5d135ef94fc


Overall Assessment
✅ Looks good

Executive Summary

Methodology

Code Quality
No findings. The changed by-path effect, cumulated, and placebo paths use safe_inference() rather than introducing new inline inference anti-patterns. diff_diff/chaisemartin_dhaultfoeuille.py:L5719-L5725 diff_diff/chaisemartin_dhaultfoeuille.py:L5846-L5858 diff_diff/chaisemartin_dhaultfoeuille.py:L6022-L6032

Performance
No findings.

Maintainability
No findings. The new path_cumulated_event_study surface is wired consistently through the result docs, dataclass, summary(), and to_dataframe(level="by_path"). diff_diff/chaisemartin_dhaultfoeuille_results.py:L421-L440 diff_diff/chaisemartin_dhaultfoeuille_results.py:L575-L590 diff_diff/chaisemartin_dhaultfoeuille_results.py:L1328-L1353 diff_diff/chaisemartin_dhaultfoeuille_results.py:L1700-L1803

Tech Debt
No findings.

Security
No findings.

Documentation/Tests
No findings. The documentation and tests now cover the new behavior and the known deviation regimes, including the targeted F_g=3 warning and dedicated R-parity fixtures for both lifted gates. Review limitation: I could not run pytest in this environment. CHANGELOG.md:L8-L11 tests/test_chaisemartin_dhaultfoeuille.py:L6941-L8065 tests/test_chaisemartin_dhaultfoeuille_parity.py:L900-L1115

@igerber igerber added the ready-for-ci Triggers CI test workflows label Apr 26, 2026
igerber added a commit that referenced this pull request Apr 26, 2026
…strap

CI reviewer flagged that the cumulated layer was built before the
bootstrap propagation block, so when n_bootstrap > 0 it surfaced stale
analytical SEs / t-stats / p-values / CIs while path_effects[path][l]
had been overwritten with bootstrap stats — a public-surface
inconsistency with the library-wide bootstrap contract.

Fix: move the _compute_path_cumulated_event_study() call to AFTER the
bootstrap propagation block at chaisemartin_dhaultfoeuille.py:3034-3081
(mirrors the global linear_trends_effects placement at :3405-3454).
The helper reads path_effects[path]["horizons"][l]["se"] which by then
is bootstrap-overwritten under n_bootstrap > 0, so cumulated SE
becomes a running sum of bootstrap per-horizon SEs.

Also addresses the P2 test-coverage gap with two new regressions in
TestByPathTrendsLinear:

- test_bootstrap_cumulated_uses_post_bootstrap_per_horizon_se: positive
  case asserting cumulated SE == running sum of post-bootstrap per-
  horizon SEs (rtol 1e-12).
- test_bootstrap_cumulated_nan_consistent_when_n_bootstrap_one:
  n_bootstrap=1 case asserting cumulated SE / t_stat / p_value /
  conf_int are all NaN per the library-wide NaN-on-invalid contract.

REGISTRY note updated with the post-bootstrap recomputation contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the dcdh-by-path-trends branch from f38c703 to 7c7b790 Compare April 26, 2026 15:24
igerber added a commit that referenced this pull request Apr 26, 2026
CI reviewer flagged that by_path + trends_linear was ungated on all
panels but only warned on F_g<3 exclusions, while the analogous
by_path + controls path warns on multi-baseline switcher panels
(D_{g,1} multiplicity) where R's per-path full-pipeline call diverges
from Python's global-then-disaggregate architecture. The by_path
docstring already acknowledged the "same multi-baseline divergence
pattern as controls" (line 461-462), but the runtime warning was
missing — a docstring-vs-implementation gap.

Fix: mirror the controls warning at chaisemartin_dhaultfoeuille.py
:1565-1584 inside the trends_linear block (right after the F_g<3
warning). Same predicate (switcher mask on first_switch_idx_arr,
unique baseline check). Different message naming the trends_linear
mechanism (full-pipeline including first-differencing per path
subset).

REGISTRY note updated to spell out the warning explicitly.

Three regression tests added in TestByPathTrendsLinear, mirroring
TestByPathControls:
- test_multi_baseline_panel_emits_r_deviation_warning: panel with
  joiners and leavers must emit the warning
- test_single_baseline_panel_does_not_emit_r_deviation_warning:
  single-baseline panel must not warn
- test_single_baseline_heterogeneous_F_g_does_not_warn: pins the
  precise predicate (baseline multiplicity, NOT F_g multiplicity)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 26, 2026
P1 fix — test coverage gaps for newly lifted parameter interactions:

- test_per_period_effects_unaffected_by_trends_linear_by_path now
  actually inspects per_period_effects (was checking event_study_effects
  by mistake). Asserts no_bp == bp on did_plus_t / did_minus_t at every
  horizon since per-period DID uses raw Y_mat per the contract at
  chaisemartin_dhaultfoeuille.py:1493-1496.
- Symmetric test_per_period_effects_unaffected_by_trends_nonparam_by_path
  added — set_ids doesn't touch the per-period path either.
- test_per_path_placebos_with_trends_linear_present asserts
  path_placebo_event_study is populated under by_path + trends_linear +
  placebo=True with finite values. The R-parity path skips placebo
  rows due to documented divergence; this regression locks the
  Python-side surface population so the path doesn't silently regress.
- test_sup_t_bands_with_trends_linear_finite_crit and the nonparam
  symmetric test pin the bootstrap-collector path: path_sup_t_bands
  populated, cband_lower/upper columns surface on
  to_dataframe(level="by_path") under both new combinations with
  n_bootstrap > 0.

P3 fix — docstring schema staleness:
- by_path docstring corrected: path_effects[path]["horizons"][l]
  (not path_effects[path][l]).
- to_dataframe(level="by_path") schema now lists cumulated_effect /
  cumulated_se columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 26, 2026
Final P3 cleanup round, no methodology / behavior changes:

- REGISTRY.md by_path Note schema now lists cumulated_effect /
  cumulated_se in the to_dataframe(level="by_path") column list
  (was missing).
- REGISTRY.md trends_linear paragraph: corrected
  path_effects[path][l] -> path_effects[path]["horizons"][l] to
  match the shipped result-class shape.
- TestByPathTrendsLinear class docstring: same path_effects shape
  correction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 26, 2026
…string

P1 fixes — coverage gaps for newly reachable bootstrap+placebo paths:

- test_per_path_placebos_with_trends_linear_bootstrap_inference:
  asserts negative-horizon SE differs between analytical and
  bootstrap fits, proving the placebo bootstrap propagation block
  runs through the first-differenced path.
- test_per_path_placebos_with_trends_linear_bootstrap_nan_consistent:
  n_bootstrap=1 case asserting NaN-consistent inference on negative
  horizons (locks the library-wide NaN-on-invalid contract on this
  newly reachable surface).
- test_per_path_placebos_with_trends_nonparam_bootstrap_inference:
  comparison fit with vs without trends_nonparam under bootstrap +
  placebo; asserts negative-horizon SE differs, proving set_ids
  reaches _collect_path_placebo_bootstrap_inputs.

P3 fix — Results dataclass attribute documentation:
- Added a path_cumulated_event_study attribute entry to
  ChaisemartinDHaultfoeuilleResults attributes docstring (was added
  as a field but missing from the public attribute table).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 26, 2026
…ontract

CI reviewer flagged that the by_path + trends_linear public contract
overstated R parity. The shipped contract said "single-baseline panels"
broadly, but the parity scenario was restricted to F_g >= 4 because
F_g=3 switchers trigger a 30%+ point-estimate divergence between
Python's global-then-disaggregate architecture and R's per-path full-
pipeline call (only 1 valid pre-window Z value remains after
first-differencing + time==1 filter, triggering different control-
eligibility treatment).

Fix: add a targeted UserWarning fired at fit-time when by_path +
trends_linear is set AND any switcher has F_g==3 (predicate:
first_switch_idx_arr == 2). Mirrors the existing F_g < 3 exclusion
warning but for the boundary case rather than the exclusion case.

Documentation: docstring at the by_path paragraph now narrows the
parity claim to "single-baseline panels with sufficient pre-window
depth (F_g >= 4 for every selected-path switcher)" and explicitly
documents the F_g=3 boundary regime + warning. REGISTRY adds the
same narrowing + warning text to the per-path linear-trends Note.
CHANGELOG adds the divergence note to the Unreleased entry.

Regression test test_F_g_three_boundary_case_emits_warning added in
TestByPathTrendsLinear: 4 F_g=3 switchers + 4 F_g=4 switchers in a
single-baseline panel, asserts the new warning fires (named
"by_path + trends_linear" + "F_g=3").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber and others added 3 commits April 26, 2026 16:33
Wave 3 #6+#7 of the dCDH by_path follow-up sequence (after PR #378
shipped #5 by_path + controls). Removes the two NotImplementedError
gates at chaisemartin_dhaultfoeuille.py:1014-1023 and adds:

- New `path_cumulated_event_study` Results field surfacing the cumulated
  level effect delta_l per path under trends_linear=True (mirrors the
  global linear_trends_effects cumulation; this is what R returns under
  did_multiplegt_dyn(..., by_path, trends_lin)).
- `set_ids` parameter threaded through the four per-path IF helpers
  so trends_nonparam's set-restricted control pool reaches per-path
  analytical SE, bootstrap, placebos, and sup-t bands automatically.
- to_dataframe(level="by_path") gains cumulated_effect / cumulated_se
  columns (always present, NaN-when-None — mirrors cband_*).
- summary() renders a per-path "Cumulated Level Effects" sub-section.

Validated against R DIDmultiplegtDYN 2.3.3 via two new golden scenarios:
- single_baseline_multi_path_by_path_trends_lin (custom DGP: F_g >= 4,
  cohort-single-path, n_periods=13) — per-path cumulated point estimates
  match R bit-exactly (POINT_RTOL=1e-9), cumulated SE within ±20%
- multi_path_reversible_by_path_trends_nonparam — per-path point
  estimates AND placebos match R bit-exactly, SE within ±15%

Placebo parity for trends_linear is intentionally skipped: R's per-path
placebo re-runs on the path-restricted subsample with different control
eligibility than Python's global-then-disaggregate architecture, so the
divergence is methodological, not a tolerance issue. Internal regression
covers placebo + trends_linear (finite values, bootstrap inheritance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…strap

CI reviewer flagged that the cumulated layer was built before the
bootstrap propagation block, so when n_bootstrap > 0 it surfaced stale
analytical SEs / t-stats / p-values / CIs while path_effects[path][l]
had been overwritten with bootstrap stats — a public-surface
inconsistency with the library-wide bootstrap contract.

Fix: move the _compute_path_cumulated_event_study() call to AFTER the
bootstrap propagation block at chaisemartin_dhaultfoeuille.py:3034-3081
(mirrors the global linear_trends_effects placement at :3405-3454).
The helper reads path_effects[path]["horizons"][l]["se"] which by then
is bootstrap-overwritten under n_bootstrap > 0, so cumulated SE
becomes a running sum of bootstrap per-horizon SEs.

Also addresses the P2 test-coverage gap with two new regressions in
TestByPathTrendsLinear:

- test_bootstrap_cumulated_uses_post_bootstrap_per_horizon_se: positive
  case asserting cumulated SE == running sum of post-bootstrap per-
  horizon SEs (rtol 1e-12).
- test_bootstrap_cumulated_nan_consistent_when_n_bootstrap_one:
  n_bootstrap=1 case asserting cumulated SE / t_stat / p_value /
  conf_int are all NaN per the library-wide NaN-on-invalid contract.

REGISTRY note updated with the post-bootstrap recomputation contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI reviewer flagged that by_path + trends_linear was ungated on all
panels but only warned on F_g<3 exclusions, while the analogous
by_path + controls path warns on multi-baseline switcher panels
(D_{g,1} multiplicity) where R's per-path full-pipeline call diverges
from Python's global-then-disaggregate architecture. The by_path
docstring already acknowledged the "same multi-baseline divergence
pattern as controls" (line 461-462), but the runtime warning was
missing — a docstring-vs-implementation gap.

Fix: mirror the controls warning at chaisemartin_dhaultfoeuille.py
:1565-1584 inside the trends_linear block (right after the F_g<3
warning). Same predicate (switcher mask on first_switch_idx_arr,
unique baseline check). Different message naming the trends_linear
mechanism (full-pipeline including first-differencing per path
subset).

REGISTRY note updated to spell out the warning explicitly.

Three regression tests added in TestByPathTrendsLinear, mirroring
TestByPathControls:
- test_multi_baseline_panel_emits_r_deviation_warning: panel with
  joiners and leavers must emit the warning
- test_single_baseline_panel_does_not_emit_r_deviation_warning:
  single-baseline panel must not warn
- test_single_baseline_heterogeneous_F_g_does_not_warn: pins the
  precise predicate (baseline multiplicity, NOT F_g multiplicity)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber and others added 4 commits April 26, 2026 16:33
P1 fix — test coverage gaps for newly lifted parameter interactions:

- test_per_period_effects_unaffected_by_trends_linear_by_path now
  actually inspects per_period_effects (was checking event_study_effects
  by mistake). Asserts no_bp == bp on did_plus_t / did_minus_t at every
  horizon since per-period DID uses raw Y_mat per the contract at
  chaisemartin_dhaultfoeuille.py:1493-1496.
- Symmetric test_per_period_effects_unaffected_by_trends_nonparam_by_path
  added — set_ids doesn't touch the per-period path either.
- test_per_path_placebos_with_trends_linear_present asserts
  path_placebo_event_study is populated under by_path + trends_linear +
  placebo=True with finite values. The R-parity path skips placebo
  rows due to documented divergence; this regression locks the
  Python-side surface population so the path doesn't silently regress.
- test_sup_t_bands_with_trends_linear_finite_crit and the nonparam
  symmetric test pin the bootstrap-collector path: path_sup_t_bands
  populated, cband_lower/upper columns surface on
  to_dataframe(level="by_path") under both new combinations with
  n_bootstrap > 0.

P3 fix — docstring schema staleness:
- by_path docstring corrected: path_effects[path]["horizons"][l]
  (not path_effects[path][l]).
- to_dataframe(level="by_path") schema now lists cumulated_effect /
  cumulated_se columns.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final P3 cleanup round, no methodology / behavior changes:

- REGISTRY.md by_path Note schema now lists cumulated_effect /
  cumulated_se in the to_dataframe(level="by_path") column list
  (was missing).
- REGISTRY.md trends_linear paragraph: corrected
  path_effects[path][l] -> path_effects[path]["horizons"][l] to
  match the shipped result-class shape.
- TestByPathTrendsLinear class docstring: same path_effects shape
  correction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…string

P1 fixes — coverage gaps for newly reachable bootstrap+placebo paths:

- test_per_path_placebos_with_trends_linear_bootstrap_inference:
  asserts negative-horizon SE differs between analytical and
  bootstrap fits, proving the placebo bootstrap propagation block
  runs through the first-differenced path.
- test_per_path_placebos_with_trends_linear_bootstrap_nan_consistent:
  n_bootstrap=1 case asserting NaN-consistent inference on negative
  horizons (locks the library-wide NaN-on-invalid contract on this
  newly reachable surface).
- test_per_path_placebos_with_trends_nonparam_bootstrap_inference:
  comparison fit with vs without trends_nonparam under bootstrap +
  placebo; asserts negative-horizon SE differs, proving set_ids
  reaches _collect_path_placebo_bootstrap_inputs.

P3 fix — Results dataclass attribute documentation:
- Added a path_cumulated_event_study attribute entry to
  ChaisemartinDHaultfoeuilleResults attributes docstring (was added
  as a field but missing from the public attribute table).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ontract

CI reviewer flagged that the by_path + trends_linear public contract
overstated R parity. The shipped contract said "single-baseline panels"
broadly, but the parity scenario was restricted to F_g >= 4 because
F_g=3 switchers trigger a 30%+ point-estimate divergence between
Python's global-then-disaggregate architecture and R's per-path full-
pipeline call (only 1 valid pre-window Z value remains after
first-differencing + time==1 filter, triggering different control-
eligibility treatment).

Fix: add a targeted UserWarning fired at fit-time when by_path +
trends_linear is set AND any switcher has F_g==3 (predicate:
first_switch_idx_arr == 2). Mirrors the existing F_g < 3 exclusion
warning but for the boundary case rather than the exclusion case.

Documentation: docstring at the by_path paragraph now narrows the
parity claim to "single-baseline panels with sufficient pre-window
depth (F_g >= 4 for every selected-path switcher)" and explicitly
documents the F_g=3 boundary regime + warning. REGISTRY adds the
same narrowing + warning text to the per-path linear-trends Note.
CHANGELOG adds the divergence note to the Unreleased entry.

Regression test test_F_g_three_boundary_case_emits_warning added in
TestByPathTrendsLinear: 4 F_g=3 switchers + 4 F_g=4 switchers in a
single-baseline panel, asserts the new warning fires (named
"by_path + trends_linear" + "F_g=3").

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the dcdh-by-path-trends branch from 7c7b790 to 41d54b9 Compare April 26, 2026 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant