dCDH: add by_path per-path event-study disaggregation #357
Implements R `did_multiplegt_dyn(..., by_path=k)` semantics: top-k most common observed treatment paths in the window `[F_g-1, F_g-1+L_max]` with per-path, per-horizon DID + SE.
SE follows the joiners/leavers IF precedent: zero out switcher contributions for non-path groups, keep the control pool and cohort structure intact, and use a plug-in SE with path-specific divisor `N_l_path`.
Binary treatment, `drop_larger_lower=False`, and `L_max>=1` are preconditions; all Phase 3 combinations are gated behind `NotImplementedError` for a strict first PR.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Overall Assessment
Executive Summary
Methodology
Code Quality
No additional findings in scope.
Performance
No material performance findings in the changed code.
Maintainability
No additional findings beyond the empty-state/result-contract issue above.
Tech Debt
No new tracked-tech-debt finding beyond the items already noted above.
Security
No findings.
Documentation/Tests
Path to Approval
Static review only. I could not run the Python test suite in this environment because the local runtime is missing project dependencies such as |
…tring
Fixes:
- P1 empty-path: `_compute_path_effects` returned `None` when `by_path`
was requested but no switcher had a complete `[F_g-1, F_g-1+L_max]`
window, collapsing "requested but empty" into "not requested". Now
returns `{}` with a targeted `UserWarning`; result layer distinguishes
`None` (not requested) from `{}` (requested but empty). Summary
renders an explicit "no observed paths" notice; `to_dataframe(
level="by_path")` returns an empty DataFrame with canonical columns
(mirrors the `linear_trends` pattern at `:1329-1348`).
- P2 degenerate-cohort warning: per-(path, horizon) `UserWarning` now
fires when the path-subset centered IF is identically zero (every
variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)`
cohort, or the path has a single contributing group), scoped to the
(path, horizon) pair. Mirrors the overall-path degenerate-cohort
surface.
- P3 docstring: class docstring said the by-path plug-in SE uses the
full-panel divisor `N_l`; implementation and REGISTRY both use
`N_l_path`. Docstring updated to match.
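The `None` versus `{}` contract from the P1 fix can be sketched as below. This is a minimal illustration, not code from the PR: the helper name `describe_path_effects` and its return strings are hypothetical, and only the contract itself (`None` means not requested, `{}` means requested but empty) comes from the fix description above.

```python
from typing import Any, Dict, Optional, Tuple

# Shape taken from the PR description: path tuple -> per-path result dict.
PathEffects = Dict[Tuple[int, ...], Dict[str, Any]]


def describe_path_effects(path_effects: Optional[PathEffects]) -> str:
    """Classify a by_path result (hypothetical helper, for illustration only)."""
    if path_effects is None:
        # by_path was never requested.
        return "not requested"
    if len(path_effects) == 0:
        # by_path was requested, but no switcher had a complete window.
        return "requested but empty"
    return f"{len(path_effects)} path(s)"
```

The explicit `is None` check matters here: a plain truthiness test such as `if not path_effects` would merge the two states again, which is the ambiguity the fix removes.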
Tests: new `TestByPathEdgeCases` class with
`test_empty_path_surface_when_no_complete_window` (late-switcher panel
where every window falls outside) and
`test_degenerate_cohort_path_nan_inference_and_warning` (4-group all-
singleton cohort panel). REGISTRY.md updated with the empty-state
contract and path-scoped degenerate-cohort behavior.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
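Why the all-singleton panel in `test_degenerate_cohort_path_nan_inference_and_warning` is degenerate can be seen in a small sketch. It assumes that centering means subtracting the cohort mean from each group's influence-function value. The function `cohort_center`, the numbers, and the cohort keys are illustrative and are not taken from the library.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def cohort_center(values: Sequence[float], cohorts: Sequence[Hashable]) -> List[float]:
    """Subtract each cohort's mean from the values of its members."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for v, c in zip(values, cohorts):
        sums[c] += v
        counts[c] += 1
    return [v - sums[c] / counts[c] for v, c in zip(values, cohorts)]


# Four groups, each alone in its own (D_{g,1}, F_g, S_g) cohort (made-up values).
if_values = [1.5, -0.5, 2.0, 0.25]
cohorts = [(0, 3, 1), (0, 4, 1), (0, 5, 1), (0, 6, 1)]

centered = cohort_center(if_values, cohorts)
# Every singleton is centered on itself, so nothing is left to estimate a variance from.
is_degenerate = all(x == 0.0 for x in centered)
```

Under this assumption the centered vector is identically zero, which is the condition the per-(path, horizon) warning is described as detecting.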
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good; the prior
Executive Summary
Methodology
No findings. The changed
Code Quality
No findings.
Performance
No findings. The added work reuses the existing multi-horizon IF helper per selected path and does not introduce an obvious asymptotic regression in the changed scope.
Maintainability
No findings. The earlier ambiguous
Tech Debt
No findings. The remaining
Security
No findings.
Documentation/Tests
No findings in the diff scope. The PR now adds targeted coverage for the previously missing empty-path and degenerate-path branches (
Summary
- Adds `by_path: Optional[int] = None` to `ChaisemartinDHaultfoeuille`: top-k most common observed treatment paths in the window `[F_g-1, F_g-1+L_max]` get their own per-horizon DID + SE, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Suggested by Clément de Chaisemartin.
- SE follows the joiners/leavers IF precedent: switcher contributions are zeroed for non-path groups, with the control pool and cohort structure `(D_{g,1}, F_g, S_g)` unchanged; plug-in SE with path-specific divisor `N_l_path` (same pattern as `joiners_se` using `joiner_total`).
- `drop_larger_lower=False` + `L_max >= 1` + binary treatment required; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, `survey_design`, and `n_bootstrap > 0` raise `NotImplementedError` with targeted messages.
What practitioners see
The smoke case in the test fixture (6 switchers × 3 distinct paths, treatment effect = 2.0) yields:
- `(0,1,1,1)` (stay on) → effect ≈ 2.0 at all horizons
- `(0,1,0,0)` (single pulse) → effect ≈ 2.0 at `l=1`, then ≈ 0 at `l=2, 3`
- `(0,1,1,0)` (two on, then off) → effect ≈ 2.0 at `l=1, 2`, then ≈ 0 at `l=3`
Results are exposed on `results.path_effects: Dict[Tuple[int, ...], Dict[str, Any]]` and `results.to_dataframe(level="by_path")`; the summary grows a "Treatment-Path Disaggregation" block. Ties in path frequency are broken lexicographically on the path tuple for deterministic ranking. Overflow (`by_path > n_observed_paths`) returns all observed paths with a `UserWarning`.
Methodology
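The top-k selection rule, the lexicographic tiebreak, and the overflow warning can be sketched as follows. This is an illustration under stated assumptions rather than the library's implementation: `top_k_paths` is a hypothetical name, the warning text is invented, the tiebreak is assumed to be ascending, and the example counts are made up and are not the test fixture.

```python
import warnings
from collections import Counter
from typing import Iterable, List, Tuple

Path = Tuple[int, ...]


def top_k_paths(paths: Iterable[Path], k: int) -> List[Path]:
    """Return the k most frequent paths; ties broken by ascending path tuple."""
    counts = Counter(paths)
    if k > len(counts):
        # Overflow: fewer distinct paths than requested, so return them all.
        warnings.warn(
            f"by_path={k} exceeds the {len(counts)} observed paths; returning all.",
            UserWarning,
        )
    # Sort by descending frequency, then lexicographically on the tuple.
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    return ranked[:k]


# One path per switcher over the window (illustrative counts only).
observed = [(0, 1, 1, 1)] * 3 + [(0, 1, 1, 0)] * 2 + [(0, 1, 0, 0)] * 2
selected = top_k_paths(observed, 2)  # [(0, 1, 1, 1), (0, 1, 0, 0)]
```

In this example `(0,1,0,0)` and `(0,1,1,0)` are tied at two switchers each, and the tuple comparison places `(0,1,0,0)` first, so the ranking does not depend on input order.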
See `docs/methodology/REGISTRY.md` § `ChaisemartinDHaultfoeuille` `Note (Phase 3 by_path per-path event-study disaggregation)` for the full contract (window convention, top-k selection rule + tiebreak, per-path SE derivation, strict-first-PR scope).
Test plan
- `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates`: 13 gate tests (param validation, `drop_larger_lower` precondition, `L_max` precondition, `n_bootstrap`/`controls`/trends/`heterogeneity`/`design2`/`honest_did`/`survey_design`/non-binary gates, `get_params`/`set_params` plumbing)
- `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBehavior`: top-k selection, lexicographic tiebreak, overflow warning, result dict shape, hand-calculable DGP recovery, summary render, `to_dataframe("by_path")`
- `tests/test_chaisemartin_dhaultfoeuille{,_parity,}.py` + `test_methodology_chaisemartin_dhaultfoeuille.py` + `test_survey_dcdh*.py` + `test_dcdh_*_coverage.py` (323 tests pass, no regressions)
- R parity for `by_path`: deferred to a follow-up PR (the SE convention against R's `did_multiplegt_dyn` is to be confirmed in the parity work)
Deferred (follow-up PRs)
- `TestDCDHDynRParityByPath`
- `paths_of_interest: List[Tuple[int, ...]]` for user-specified path selection
- `by_path` + each of `controls` / `trends_linear` / `trends_nonparam` / `heterogeneity` / `design2` / `honest_did` / `survey_design` / non-binary treatment / `n_bootstrap > 0`
- Per-path placebos (`DID^{pl}_l` per path) and per-path sup-t bands
🤖 Generated with Claude Code