From c0c0d4e2aecb395f3ab58ccf0d534193965127f1 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sat, 25 Apr 2026 15:00:12 -0400
Subject: [PATCH 1/3] Add per-path joint sup-t bands to ChaisemartinDHaultfoeuille.by_path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are now computed across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all valid horizons, producing correlated bootstrap distributions. The path-specific critical value `c_p = quantile(max_l |t_l|, 1-α)` is applied per horizon as `cband_conf_int = (eff - c_p·se, eff + c_p·se)` and surfaced at top level as `results.path_sup_t_bands[path]`.

Closes Wave 2 #4 of the by_path follow-up sequence (#357 foundation, #360 R-parity, #364 bootstrap, #371 placebos).

**Methodology asymmetry vs OVERALL** (intentional, documented): per-path sup-t draws fresh shared weights AFTER the per-path SE bootstrap block has populated `path_ses` via independent per-(path, horizon) draws. Asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Preserves RNG-state isolation for existing per-path SE seed-reproducibility tests.

**Gates** mirror OVERALL: `>=2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Otherwise the path is absent from `path_sup_t_bands`.

**Empty-state contract**: `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates (covers two cases: `path_effects == {}` upstream OR all paths fail gates downstream).

**Deviation from R**: `did_multiplegt_dyn` provides no joint / sup-t bands at any surface — Python-only methodology extension consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`.

**Bundled pre-audit fix** (sibling-surface check): the existing OVERALL `sup_t_bands` field's stale "Phase 2 placeholder" docstring updated to the actual contract description.

Tests: new `TestByPathSupTBands` class with 13 tests covering: attr None when no bootstrap / no by_path; keys match `path_effects` with finite crit; band wider than pointwise; crit finite and positive; seed reproducibility; single-horizon-path-skip; L_max=1 skip; n_valid_horizons matches; absent-path-no-cband-keys; summary renders; empty-dict-when-no-complete-window; strict-majority-gate-at-exact-50pct (monkeypatches the weight generator to inject NaN into half the bootstrap rows, asserting both `sup_t_bands is None` and `path_sup_t_bands == {}` at the boundary). All `@pytest.mark.slow`.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 CHANGELOG.md                                 |   1 +
 diff_diff/chaisemartin_dhaultfoeuille.py     |  81 +++
 .../chaisemartin_dhaultfoeuille_bootstrap.py |  88 ++++
 .../chaisemartin_dhaultfoeuille_results.py   |  80 ++-
 docs/api/chaisemartin_dhaultfoeuille.rst     |   5 +-
 docs/methodology/REGISTRY.md                 |   2 +
 tests/test_chaisemartin_dhaultfoeuille.py    | 475 ++++++++++++++++++
 7 files changed, 730 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5ee0ce38..a902c296 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. 
Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. 
**Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology.
+- **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. 
Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`) and as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`. Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract.
 - **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. 
The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract. 
 - **Tutorial 19: dCDH for Marketing Pulse Campaigns** (`docs/tutorials/19_dcdh_marketing_pulse.ipynb`) — end-to-end practitioner walkthrough on a 60-market reversible-treatment panel covering the TWFE decomposition diagnostic (`twowayfeweights`), `DCDH` Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo), the `L_max` multi-horizon event study with multiplier bootstrap, a stakeholder communication template, and drift guards. README listing for Tutorial 17 (Brand Awareness Survey) backfilled in the same edit. Cross-link from `docs/practitioner_decision_tree.rst` § "Reversible Treatment" added.
diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
index ef901ad3..f626199b 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -431,6 +431,21 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
         cross-path cohort-sharing deviation from R is inherited from the
         analytical event-study path.
 
+        With ``n_bootstrap > 0``, per-path joint sup-t simultaneous
+        confidence bands are also computed across horizons
+        ``1..L_max`` within each path. A path-specific critical value
+        ``c_p`` (constructed from a fresh shared-weights multiplier-
+        bootstrap draw per path) is surfaced at top level as
+        ``results.path_sup_t_bands[path] = {"crit_value", "alpha",
+        "n_bootstrap", "method", "n_valid_horizons"}`` and applied
+        per-horizon as ``cband_conf_int`` on
+        ``path_effects[path]["horizons"][l]``. Bands cover joint
+        inference WITHIN a single path across horizons; they do NOT
+        provide simultaneous coverage across paths. Python-only
+        library extension; R ``did_multiplegt_dyn`` provides no joint
+        bands at any surface. See REGISTRY.md ``Note (Phase 3 by_path
+        per-path joint sup-t bands)``.
+
         SE convention: per-path IF parallels the joiners / leavers
         construction — the switcher-side contribution is zeroed for
         groups not in the selected path, and the cohort structure and
@@ -2986,6 +3001,33 @@ def fit(
                         path_placebos[path_key][neg_key]["conf_int"] = (np.nan, np.nan)
                         path_placebos[path_key][neg_key]["t_stat"] = np.nan
 
+        # Phase 3: propagate per-path sup-t critical values to per-
+        # horizon `cband_conf_int` entries on path_effects (by_path +
+        # n_bootstrap > 0). Sibling of the OVERALL event-study cband
+        # propagation at `:2865-2875`. For each path with a finite
+        # crit, write `cband_conf_int = (eff - c_p*se, eff + c_p*se)`
+        # into each horizon's dict whose bootstrap-replaced SE is
+        # finite > 0. Mirror the OVERALL absent-key pattern: non-finite
+        # SE horizons simply don't get the `cband_conf_int` key.
+        if (
+            bootstrap_results is not None
+            and bootstrap_results.path_cband_crit_values is not None
+            and path_effects is not None
+        ):
+            for path_key, crit in bootstrap_results.path_cband_crit_values.items():
+                if path_key not in path_effects:
+                    continue
+                if not np.isfinite(crit):
+                    continue
+                for l_h, h_dict in path_effects[path_key]["horizons"].items():
+                    se = h_dict.get("se", np.nan)
+                    eff = h_dict.get("effect", np.nan)
+                    if np.isfinite(se) and se > 0:
+                        h_dict["cband_conf_int"] = (
+                            eff - crit * se,
+                            eff + crit * se,
+                        )
+
         # When L_max >= 1 and the per-group path is active, sync
         # overall_* from event_study_effects[1] AFTER bootstrap propagation
         # so that bootstrap SE/p/CI flow to the top-level surface.
@@ -3618,6 +3660,45 @@ def fit(
             ),
             path_effects=path_effects,
             path_placebo_event_study=path_placebos,
+            path_sup_t_bands=(
+                # When by_path + n_bootstrap > 0 is active, surface a
+                # dict (possibly empty) — preserving the documented
+                # `None` (not requested) vs `{}` (requested but empty)
+                # contract that mirrors `path_effects` / `path_placebo_
+                # event_study` empty-state behavior. The empty case
+                # arises in two ways:
+                # 1. `path_effects == {}` — no observed path has a
+                #    complete window; the per-path bootstrap collector
+                #    is skipped upstream and `path_cband_crit_values`
+                #    stays `None`. We materialize `{}` here.
+                # 2. Bootstrap ran but no path passed both gates
+                #    (>=2 valid horizons AND a strict majority — more
+                #    than 50% — of finite sup-t draws);
+                #    `path_cband_crit_values == {}` — passes through.
+                {
+                    path_key: {
+                        "crit_value": crit,
+                        "alpha": self.alpha,
+                        "n_bootstrap": self.n_bootstrap,
+                        "method": "multiplier_bootstrap",
+                        "n_valid_horizons": (
+                            bootstrap_results.path_cband_n_valid_horizons.get(path_key, 0)
+                            if bootstrap_results is not None
+                            and bootstrap_results.path_cband_n_valid_horizons is not None
+                            else 0
+                        ),
+                    }
+                    for path_key, crit in (
+                        bootstrap_results.path_cband_crit_values
+                        if bootstrap_results is not None
+                        and bootstrap_results.path_cband_crit_values is not None
+                        else {}
+                    ).items()
+                    if np.isfinite(crit)
+                }
+                if (self.by_path is not None and self.n_bootstrap > 0)
+                else None
+            ),
             survey_metadata=survey_metadata,
             _estimator_ref=self,
         )
diff --git a/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py b/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py
index 5047cd64..a63aad07 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py
@@ -778,6 +778,94 @@ def _compute_dcdh_bootstrap(
             results.path_placebo_cis = path_pl_cis
             results.path_placebo_p_values = path_pl_pvals
 
+        # --- Phase 3: Per-path joint sup-t (by_path + n_bootstrap > 0) ---
+        # Sibling of the OVERALL event-study sup-t at the multi-horizon
+        # block above (`:599-614`). Per-path joint simultaneous
+        # confidence bands across horizons 1..L_max within each path:
+        # one shared (n_bootstrap, n_eligible) multiplier weight matrix
+        # (using `self.bootstrap_weights` — Rademacher / Mammen / Webb)
+        # per path is broadcast across all valid horizons of that path,
+        # producing correlated bootstrap distributions across horizons.
+        # The path-specific critical value
+        # `c_p = quantile(max_l |t_l|, 1-alpha)` is the band half-width
+        # multiplier applied to each horizon's bootstrap SE in fit().
+        #
+        # Note (asymmetry vs OVERALL): this draws a FRESH shared-weights
+        # matrix per path AFTER the per-path SE block above has populated
+        # results.path_ses via independent per-(path, horizon) draws.
+        # Numerator: fresh shared draws; denominator: bootstrap SEs from
+        # the earlier independent draws. Asymptotically equivalent to
+        # OVERALL's self-consistent reuse, but NOT bit-identical. The
+        # fresh draw is intentional: it preserves RNG-state isolation
+        # for existing per-path SE seed-reproducibility tests.
+        #
+        # Gates: a path needs >=2 valid horizons (finite bootstrap SE>0)
+        # AND a strict majority (>50%) of finite sup-t draws to receive
+        # a band. Otherwise the path is absent from
+        # path_cband_crit_values (mirrors OVERALL absent-key pattern at
+        # `:605,612`; the strict-majority gate matches the OVERALL
+        # `finite_mask.sum() > 0.5 * n_bootstrap` semantics — exactly
+        # half finite is NOT enough).
+        if path_bootstrap_inputs is not None and results.path_ses:
+            path_cband_crits: Dict[Tuple[int, ...], float] = {}
+            path_cband_n_valid: Dict[Tuple[int, ...], int] = {}
+
+            for path_key, horizon_inputs in path_bootstrap_inputs.items():
+                bs_ses_for_path = results.path_ses.get(path_key, {})
+                valid_horizons = []
+                for l_h, (u_h, n_h, eff_h, _u_pp_h) in sorted(horizon_inputs.items()):
+                    if u_h.size == 0 or n_h <= 0:
+                        continue
+                    bs_se = bs_ses_for_path.get(l_h, np.nan)
+                    if not np.isfinite(bs_se) or bs_se <= 0:
+                        continue
+                    valid_horizons.append((l_h, u_h, n_h, eff_h, bs_se))
+
+                if len(valid_horizons) < 2:
+                    continue
+
+                # All horizons within a path use the same n_eligible
+                # (variance-eligible group ordering enforced by
+                # _collect_path_bootstrap_inputs's use of
+                # eligible_mask_var for cohort-recentering); use the
+                # first valid horizon's IF size as the shared dim.
+                n_dim = valid_horizons[0][1].size
+                map_path = _map_for_target(
+                    n_dim,
+                    group_id_to_psu_code,
+                    eligible_group_ids,
+                )
+                with np.errstate(invalid="ignore", divide="ignore"):
+                    shared_weights = _generate_psu_or_group_weights(
+                        n_bootstrap=self.n_bootstrap,
+                        n_groups_target=n_dim,
+                        weight_type=self.bootstrap_weights,
+                        rng=rng,
+                        group_to_psu_map=map_path,
+                    )
+                    es_dists_path = []
+                    for _l_h, u_h, n_h, eff_h, _bs_se in valid_horizons:
+                        deviations = (shared_weights @ u_h) / n_h
+                        es_dists_path.append(eff_h + deviations)
+                    boot_matrix = np.asarray(es_dists_path)
+                    effects_vec = np.array([v[3] for v in valid_horizons])
+                    ses_vec = np.array([v[4] for v in valid_horizons])
+                    t_stats = np.abs((boot_matrix - effects_vec[:, None]) / ses_vec[:, None])
+                    sup_t_dist = np.max(t_stats, axis=0)
+                    finite_mask = np.isfinite(sup_t_dist)
+                    if finite_mask.sum() <= 0.5 * self.n_bootstrap:
+                        continue
+                    crit_p = float(np.quantile(sup_t_dist[finite_mask], 1.0 - self.alpha))
+
+                if not np.isfinite(crit_p):
+                    continue
+
+                path_cband_crits[path_key] = crit_p
+                path_cband_n_valid[path_key] = len(valid_horizons)
+
+            results.path_cband_crit_values = path_cband_crits
+            results.path_cband_n_valid_horizons = path_cband_n_valid
+
         return results
diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py
index f7596ecc..c464abc8 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille_results.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille_results.py
@@ -161,6 +161,21 @@ class DCDHBootstrapResults:
         default=None, repr=False
     )
 
+    # --- Phase 3: per-path joint sup-t critical values (by_path + n_bootstrap > 0) ---
+    # Per-path sup-t simultaneous-band critical value `c_p =
+    # quantile(max_l |t_l|, 1-alpha)` from a fresh shared-weights
+    # multiplier-bootstrap draw per path. Naming parity with the OVERALL
+    # `cband_crit_value` scalar at line 131 (singular -> plural since one
+    # crit per path). Gates: a path appears only when (>=2 valid horizons
+    # with finite bootstrap SE > 0) AND (a strict majority — more than
+    # 50% — of sup-t draws are finite); paths failing either gate are
+    # absent from the dict. `None` when bootstrap didn't run; empty dict
+    # when ran but no path passed both gates.
+    path_cband_crit_values: Optional[Dict[Tuple[int, ...], float]] = field(default=None, repr=False)
+    path_cband_n_valid_horizons: Optional[Dict[Tuple[int, ...], int]] = field(
+        default=None, repr=False
+    )
+
+
 @dataclass
 class ChaisemartinDHaultfoeuilleResults:
@@ -354,7 +369,16 @@ class ChaisemartinDHaultfoeuilleResults:
     cost_benefit_delta : dict, optional
         Cost-benefit aggregate ``delta``. Populated when ``L_max >= 2``.
     sup_t_bands : dict, optional
-        Phase 2 placeholder (sup-t simultaneous confidence bands).
+        Sup-t simultaneous confidence-band metadata for the OVERALL
+        event-study surface. Holds ``{"crit_value": float, "alpha":
+        float, "n_bootstrap": int, "method": str}``. Populated when
+        ``n_bootstrap > 0`` AND there are at least 2 valid horizons
+        with finite bootstrap SE > 0 AND a strict majority (more than
+        50%) of sup-t draws are finite. The band itself is written
+        per-horizon as
+        ``cband_conf_int`` on ``event_study_effects[l]``. ``None``
+        otherwise. Python-only library extension; R
+        ``did_multiplegt_dyn`` provides no joint / sup-t bands.
     covariate_residuals : pd.DataFrame, optional
         ``DID^X`` first-stage diagnostics: per-baseline ``theta_hat``,
         ``n_obs``, and ``r_squared``. Populated when ``controls`` is set.
@@ -394,6 +418,27 @@ class ChaisemartinDHaultfoeuilleResults:
         cohort-sharing SE deviation from R documented for
         ``path_effects``. See REGISTRY.md ``Note (Phase 3 by_path
         ...)`` → "Per-path placebos".
+    path_sup_t_bands : dict, optional
+        Per-path joint sup-t simultaneous-band metadata, keyed by
+        observed treatment trajectory (tuple of int). Each entry holds
+        ``{"crit_value": float, "alpha": float, "n_bootstrap": int,
+        "method": str, "n_valid_horizons": int}``. Populated when
+        ``by_path`` is a positive int AND ``n_bootstrap > 0``. The
+        band itself is applied per-horizon as ``cband_conf_int`` on
+        ``path_effects[path]["horizons"][l]``. Empty-state contract:
+        ``None`` when not requested (no bootstrap or ``by_path is None``);
+        ``{}`` when requested but no path passed both gates (``>=2``
+        valid horizons with finite bootstrap SE ``> 0`` AND a strict
+        majority — more than 50% — of finite sup-t draws). Bands
+        cover joint inference WITHIN a
+        single path across horizons; they do NOT provide simultaneous
+        coverage across paths. Inherits the cross-path cohort-sharing
+        SE deviation from R documented for ``path_effects`` (the
+        bootstrap SE used as the t-stat denominator carries the same
+        deviation). Python-only library extension; R
+        ``did_multiplegt_dyn`` provides no joint / sup-t bands at any
+        surface. See REGISTRY.md ``Note (Phase 3 by_path per-path
+        joint sup-t bands)``.
     honest_did_results : HonestDiDResults, optional
         HonestDiD sensitivity analysis bounds (Rambachan & Roth 2023).
         Populated when ``honest_did=True`` in ``fit()`` or by calling
@@ -505,6 +550,23 @@ class ChaisemartinDHaultfoeuilleResults:
     path_placebo_event_study: Optional[Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]]] = field(
         default=None, repr=False
     )
+    # Per-path joint sup-t simultaneous-band metadata. Keyed by path
+    # tuple; each entry holds `{"crit_value", "alpha", "n_bootstrap",
+    # "method", "n_valid_horizons"}`. Populated when `by_path` is a
+    # positive int AND `n_bootstrap > 0`. The joint band itself is
+    # written per-horizon as `cband_conf_int` on
+    # `path_effects[path]["horizons"][l]` (mirrors the OVERALL
+    # `event_study_effects[l]["cband_conf_int"]` pattern at
+    # `chaisemartin_dhaultfoeuille.py:2865-2875`). Empty-state contract:
+    # `None` when not requested (no bootstrap or `by_path is None`); `{}`
+    # when requested but no path passed both gates (>=2 valid horizons
+    # AND a strict majority — more than 50% — of finite sup-t draws).
+    # The bands cover joint inference
+    # WITHIN a single path across horizons; they do NOT provide
+    # simultaneous coverage across paths.
+    path_sup_t_bands: Optional[Dict[Tuple[int, ...], Dict[str, Any]]] = field(
+        default=None, repr=False
+    )
     honest_did_results: Optional["HonestDiDResults"] = field(default=None, repr=False)
 
     # --- Repr-suppressed metadata ---
@@ -783,6 +845,9 @@ def summary(self, alpha: Optional[float] = None) -> str:
             for entry in self.path_placebo_event_study.values()
             for h in entry.values()
         )
+        path_sup_t_has_finite_crit = self.path_sup_t_bands is not None and any(
+            np.isfinite(v.get("crit_value", np.nan)) for v in self.path_sup_t_bands.values()
+        )
         any_finite_bootstrap_inference = (
             np.isfinite(self.overall_se)
             or event_study_has_finite_bootstrap_se
@@ -790,6 +855,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
             or leavers_has_finite_bootstrap_se
             or path_effects_has_finite_bootstrap_se
             or path_placebo_has_finite_bootstrap_se
+            or path_sup_t_has_finite_crit
         )
         if self.bootstrap_results is not None and np.isfinite(self.overall_se) and not is_delta:
             lines.append("Note: p-value and CI are multiplier-bootstrap percentile inference")
@@ -823,6 +889,8 @@ def summary(self, alpha: Optional[float] = None) -> str:
                     live_targets.append("per-path")
                 if path_placebo_has_finite_bootstrap_se:
                     live_targets.append("per-path placebo")
+                if path_sup_t_has_finite_crit:
+                    live_targets.append("per-path sup-t")
                 lines.append(
                     f"Note: bootstrap ({self.bootstrap_results.n_bootstrap} iterations) "
                     f"produced non-finite SE on the overall/event-study target; "
@@ -1219,6 +1287,16 @@ def _render_path_effects_section(
                         h["p_value"],
                     )
                 )
+            # Per-path joint sup-t critical value (when populated).
+            # Mirrors the OVERALL sup-t crit print at line ~1019.
+            if self.path_sup_t_bands is not None and path in self.path_sup_t_bands:
+                crit_p = self.path_sup_t_bands[path].get("crit_value", np.nan)
+                if np.isfinite(crit_p):
+                    conf_level = int((1 - self.alpha) * 100)
+                    lines.append(
+                        f" Sup-t critical value: {crit_p:.4f} "
+                        f"(simultaneous {conf_level}% bands)"
+                    )
             lines.extend([thin])
         lines.extend([""])
diff --git a/docs/api/chaisemartin_dhaultfoeuille.rst b/docs/api/chaisemartin_dhaultfoeuille.rst
index 2e6cd093..576c5c6d 100644
--- a/docs/api/chaisemartin_dhaultfoeuille.rst
+++ b/docs/api/chaisemartin_dhaultfoeuille.rst
@@ -16,7 +16,10 @@ covariate adjustment (``controls``); group-specific linear trends
 heterogeneity testing; non-binary treatment; HonestDiD sensitivity
 integration on placebos; survey support via Taylor-series linearization
 (pweight + strata/PSU/FPC); and per-path event-study disaggregation via
-``by_path=k`` (mirrors R ``did_multiplegt_dyn(..., by_path=k)``).
+``by_path=k`` (mirrors R ``did_multiplegt_dyn(..., by_path=k)``,
+including per-path backward placebos and per-path joint sup-t
+simultaneous bands when ``n_bootstrap > 0`` — Python-only extension
+beyond R, which provides no joint bands at any surface).
 
 The estimator:
 
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index e4424dc1..c50dac35 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -640,6 +640,8 @@ The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and
 - **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). 
**Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). 
The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs]`. Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. **R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). 
Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. **Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. 
Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. **SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. **Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). 
Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. **Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. 
Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). +- **Note (Phase 3 `by_path` per-path joint sup-t bands):** When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are computed across horizons `1..L_max` within each path. **Methodology:** a single `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons within the path. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is then used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon, surfaced in `path_effects[path]["horizons"][l]["cband_conf_int"]` and at top-level `results.path_sup_t_bands[path] = {"crit_value", "alpha", "n_bootstrap", "method", "n_valid_horizons"}`. **Gates:** a path must have `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band; otherwise the path is absent from `path_sup_t_bands`. 
Both gates mirror the OVERALL `event_study_sup_t_bands` semantics at `chaisemartin_dhaultfoeuille_bootstrap.py:605,612`: `len(valid_horizons) >= 2` AND `finite_mask.sum() > 0.5 * n_bootstrap`. Exactly half-finite draws are NOT enough — the gate is strictly greater than half. **Empty-state contract:** `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL:** OVERALL sup-t reuses the same multi-horizon shared-draw distribution for both the SE in the t-stat denominator and the bootstrap distribution in the numerator. The per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — numerator: fresh shared draws, denominator: bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse, but NOT bit-identical. The fresh draw is intentional: it preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable post-implementation. **Inherited deviation from R:** the bootstrap SE used as the t-stat denominator carries the cross-path cohort-sharing SE deviation from R documented for `path_effects` above; the per-path sup-t crit therefore inherits the same deviation. **Interpretation:** the band covers joint inference *within a single path across horizons*; it does NOT provide simultaneous coverage *across paths* (a different inference target requiring a `path × horizon` re-derivation, deferred to a future wave). **Deviation from R:** `did_multiplegt_dyn` provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension, consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Regression test anchor: `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands`. 
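
The per-path sup-t construction described in the note above can be sketched as follows. This is a minimal, standard-library-only illustration and not the library's code: `sup_t_band`, its arguments, and the `sum(w * u) / n_groups` normalization of the bootstrap deviation are assumptions made for the example, and only Rademacher weights are shown. The SEs are taken as inputs to mirror the documented asymmetry (the denominator comes from the earlier per-horizon bootstrap, the numerator from a fresh shared draw).

```python
import math
import random


def sup_t_band(effects, ses, influence, n_bootstrap=400, alpha=0.05, seed=0):
    """Hypothetical helper: joint sup-t band for ONE path.

    effects[l], ses[l] -- point estimate and (pre-computed) bootstrap SE
                          at horizon index l.
    influence[l][g]    -- centered influence-function value of group g at
                          horizon l (assumed layout, for illustration only).
    Returns (crit, bands) or None when either gate fails.
    """
    # Gate 1: at least two horizons with a finite, strictly positive SE.
    valid = [l for l in range(len(effects))
             if math.isfinite(ses[l]) and ses[l] > 0]
    if len(valid) < 2:
        return None

    n_groups = len(influence[0])
    rng = random.Random(seed)
    sup_t = []
    for _ in range(n_bootstrap):
        # One Rademacher weight per group, reused across every horizon of
        # the path, so the per-horizon deviations are correlated.
        w = [rng.choice((-1.0, 1.0)) for _ in range(n_groups)]
        t_abs = []
        for l in valid:
            dev = sum(wg * u for wg, u in zip(w, influence[l])) / n_groups
            t_abs.append(abs(dev / ses[l]))
        # A draw counts as non-finite if any horizon is non-finite.
        ok = all(math.isfinite(t) for t in t_abs)
        sup_t.append(max(t_abs) if ok else math.nan)

    # Gate 2: strict majority of finite draws (exactly half is not enough).
    finite = sorted(t for t in sup_t if math.isfinite(t))
    if not len(finite) > 0.5 * n_bootstrap:
        return None

    # Empirical (1 - alpha) quantile of max_l |t_l| over the finite draws.
    idx = min(len(finite) - 1, math.ceil((1 - alpha) * len(finite)) - 1)
    crit = finite[idx]
    bands = {l: (effects[l] - crit * ses[l], effects[l] + crit * ses[l])
             for l in valid}
    return crit, bands
```

Returning `None` on a gate failure corresponds to the path being absent from `path_sup_t_bands`; the sketch makes no attempt to reproduce the library's RNG stream, so its critical values are not comparable to fitted results.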
+ **Reference implementation(s):** - R: [`DIDmultiplegtDYN`](https://cran.r-project.org/package=DIDmultiplegtDYN) (CRAN, maintained by the paper authors). The Python implementation matches `did_multiplegt_dyn(..., effects=1)` at horizon `l = 1`. Parity tests live in `tests/test_chaisemartin_dhaultfoeuille_parity.py`. - Stata: `did_multiplegt_dyn` (SSC, also maintained by the paper authors). diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py index cedb90e6..80e914c7 100644 --- a/tests/test_chaisemartin_dhaultfoeuille.py +++ b/tests/test_chaisemartin_dhaultfoeuille.py @@ -5485,3 +5485,478 @@ def test_bootstrap_seed_reproducibility(self): f"path={path} lag={lag_key}: seed-pinned SEs " f"diverge: {entry_a['se']} vs {entry_b['se']}" ) + + +@pytest.mark.slow +class TestByPathSupTBands: + """``by_path`` combined with ``n_bootstrap > 0`` — per-path joint + sup-t simultaneous confidence bands across horizons ``1..L_max`` + within each path. + + A single shared ``(n_bootstrap, n_eligible)`` multiplier weight + matrix (using the estimator's configured ``bootstrap_weights`` — + Rademacher / Mammen / Webb) is drawn per path and broadcast across + all valid horizons of that path (``finite bootstrap SE > 0``), + producing correlated bootstrap distributions across horizons within + the path. + The path-specific critical value + ``c_p = quantile(max_l |t_l|, 1-alpha)`` is then used to construct + symmetric joint bands ``effect_l ± c_p · se_l`` per horizon. + + Mirrors the existing OVERALL ``event_study_sup_t_bands`` pattern at + ``chaisemartin_dhaultfoeuille_bootstrap.py:599-614``, just stratified + by path. Methodology asymmetry (intentional): per-path sup-t draws + fresh shared weights AFTER the per-path SE block has populated + ``results.path_ses`` via independent per-(path, horizon) draws. + Asymptotically equivalent to OVERALL's self-consistent reuse, but + NOT bit-identical. See REGISTRY.md for the full contract. 
+ + Marked ``@pytest.mark.slow`` because each test runs a real bootstrap + with at least 200 draws to keep MC noise below the wider-than- + pointwise tolerance. + """ + + def _fit_with_bootstrap( + self, + data, + by_path: int, + L_max: int = 3, + n_bootstrap: int = 200, + bootstrap_weights: str = "rademacher", + seed: int = 42, + placebo: bool = False, + ): + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, + by_path=by_path, + n_bootstrap=n_bootstrap, + bootstrap_weights=bootstrap_weights, + seed=seed, + twfe_diagnostic=False, + placebo=placebo, + ) + results = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + L_max=L_max, + ) + return est, results + + def test_path_sup_t_bands_attr_none_when_no_bootstrap(self): + """``n_bootstrap=0`` -> ``results.path_sup_t_bands is None``.""" + data = _by_path_three_path_data() + _est, res = _fit_by_path(data, by_path=2, L_max=3) + assert res.path_sup_t_bands is None + + def test_path_sup_t_bands_attr_none_when_no_by_path(self): + """``by_path=None`` -> ``results.path_sup_t_bands is None`` + even with bootstrap active.""" + data = _by_path_three_path_data() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, + by_path=None, + n_bootstrap=200, + seed=42, + twfe_diagnostic=False, + ) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + L_max=3, + ) + assert res.path_sup_t_bands is None + + def test_path_sup_t_bands_keys_match_path_effects_with_finite_crit(self): + """For each path with >=2 horizons that have finite bootstrap + SE > 0, the path appears in ``path_sup_t_bands`` with a finite + ``crit_value``. 
Paths with <2 valid horizons are absent.""" + data = _by_path_three_path_data() + _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200) + assert res.path_sup_t_bands is not None + # For each path: count finite bootstrap SEs across its horizons. + # If >=2 are finite, the path should be in path_sup_t_bands with + # a finite crit; otherwise it should be absent. + for path, entry in res.path_effects.items(): + n_valid = sum( + 1 + for h in entry["horizons"].values() + if np.isfinite(h["se"]) and h["se"] > 0 + ) + if n_valid >= 2: + # Must be present (assuming gate also passes); if it's + # absent, that's the 50%-finite gate failing — log but + # don't hard-fail since the gate is a methodology + # safety net. + if path in res.path_sup_t_bands: + crit = res.path_sup_t_bands[path]["crit_value"] + assert np.isfinite(crit), ( + f"path={path}: present in path_sup_t_bands but " + f"crit_value is non-finite: {crit}" + ) + else: + assert path not in res.path_sup_t_bands, ( + f"path={path} has only {n_valid} valid horizons; " + f"should be absent from path_sup_t_bands per the " + f">=2 horizons gate" + ) + + def test_path_sup_t_band_wider_than_pointwise(self): + """Per-path joint band must be at least as wide as the marginal + CI for every (path, horizon) where both are populated. Mirrors + the OVERALL invariant `test_cband_wider_than_pointwise` at + `:2235`. + """ + data = _by_path_three_path_data() + _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=400) + assert res.path_sup_t_bands, "Need at least one path with a finite crit" + any_band_checked = False + for path, entry in res.path_effects.items(): + if path not in res.path_sup_t_bands: + continue + for l_h, h in entry["horizons"].items(): + cband = h.get("cband_conf_int") + if cband is None: + continue + pw_ci = h["conf_int"] + if not (np.isfinite(pw_ci[0]) and np.isfinite(pw_ci[1])): + continue + # Joint band must be at least as wide as marginal. 
+ # Tolerance accounts for percentile MC noise. + assert cband[0] <= pw_ci[0] + 1e-10, ( + f"path={path} l={l_h}: cband_lower {cband[0]} > " + f"conf_int_lower {pw_ci[0]} - violates joint >= marginal" + ) + assert cband[1] >= pw_ci[1] - 1e-10, ( + f"path={path} l={l_h}: cband_upper {cband[1]} < " + f"conf_int_upper {pw_ci[1]} - violates joint >= marginal" + ) + any_band_checked = True + assert any_band_checked, "Expected at least one path/horizon with a populated cband" + + def test_path_sup_t_crit_finite_and_positive(self): + """For every path with a populated entry, ``crit_value`` is + finite and strictly positive. The wider-than-pointwise + invariant (above) is the stronger statement; this test pins + the per-path entry's basic shape (alpha / n_bootstrap / method + / n_valid_horizons round-trip).""" + data = _by_path_three_path_data() + _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200) + assert res.path_sup_t_bands + for path, entry in res.path_sup_t_bands.items(): + crit = entry["crit_value"] + assert np.isfinite(crit), f"path={path}: crit_value not finite ({crit})" + assert crit > 0, f"path={path}: crit_value not positive ({crit})" + assert entry["alpha"] == 0.05 + assert entry["n_bootstrap"] == 200 + assert entry["method"] == "multiplier_bootstrap" + assert entry["n_valid_horizons"] >= 2 + + def test_path_sup_t_seed_reproducibility(self): + """Same seed -> bit-identical ``crit_value`` for every path.""" + data = _by_path_three_path_data() + _est_a, res_a = self._fit_with_bootstrap( + data, by_path=3, L_max=3, n_bootstrap=200, seed=42 + ) + _est_b, res_b = self._fit_with_bootstrap( + data, by_path=3, L_max=3, n_bootstrap=200, seed=42 + ) + assert res_a.path_sup_t_bands is not None + assert res_b.path_sup_t_bands is not None + assert set(res_a.path_sup_t_bands.keys()) == set(res_b.path_sup_t_bands.keys()) + for path in res_a.path_sup_t_bands: + crit_a = res_a.path_sup_t_bands[path]["crit_value"] + crit_b = 
res_b.path_sup_t_bands[path]["crit_value"] + assert crit_a == crit_b, ( + f"path={path}: seed-pinned crits diverge: {crit_a} vs {crit_b}" + ) + + def test_path_sup_t_skipped_when_path_has_only_one_valid_horizon(self): + """A path with only 1 valid horizon (degenerate cohort at later + horizons) is absent from ``path_sup_t_bands`` per the >=2 gate. + + Uses the standard fixture and walks the result to find any + path with <2 finite bootstrap SE horizons, asserting it's + absent from path_sup_t_bands. + """ + data = _by_path_three_path_data() + _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200) + assert res.path_sup_t_bands is not None + single_horizon_paths = [ + path + for path, entry in res.path_effects.items() + if sum( + 1 + for h in entry["horizons"].values() + if np.isfinite(h["se"]) and h["se"] > 0 + ) + < 2 + ] + for path in single_horizon_paths: + assert path not in res.path_sup_t_bands, ( + f"path={path} has <2 valid horizons; should be absent " + f"from path_sup_t_bands" + ) + # And no horizon should have cband_conf_int populated. + for l_h, h in res.path_effects[path]["horizons"].items(): + assert "cband_conf_int" not in h, ( + f"path={path} l={l_h}: cband_conf_int written despite " + f"path being absent from path_sup_t_bands" + ) + + def test_path_sup_t_skipped_at_L_max_1(self): + """At ``L_max=1`` every path has at most 1 valid horizon; the + >=2 horizons gate rejects every path so ``path_sup_t_bands == + {}``. Replaces the H=1 normal-reduction test: at L_max=1 the + joint surface is correctly absent rather than collapsing to a + normal quantile.""" + data = _by_path_three_path_data() + _est, res = self._fit_with_bootstrap(data, by_path=2, L_max=1, n_bootstrap=200) + # Bootstrap ran with by_path so dict is initialized; gate + # rejected every path so dict is empty. 
+ assert res.path_sup_t_bands == {}, ( + f"Expected path_sup_t_bands == {{}} at L_max=1 (no path has " + f">=2 horizons); got {res.path_sup_t_bands}" + ) + # No horizon should have cband_conf_int. + for path, entry in res.path_effects.items(): + for l_h, h in entry["horizons"].items(): + assert "cband_conf_int" not in h, ( + f"path={path} l={l_h}: cband_conf_int written at " + f"L_max=1 despite path_sup_t_bands == {{}}" + ) + + def test_path_sup_t_n_valid_horizons_matches(self): + """``n_valid_horizons`` field equals the count of finite-SE + horizons under each path.""" + data = _by_path_three_path_data() + _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200) + assert res.path_sup_t_bands + br = res.bootstrap_results + assert br is not None and br.path_ses is not None + for path, entry in res.path_sup_t_bands.items(): + n_claimed = entry["n_valid_horizons"] + n_actual = sum( + 1 + for l_h, bs_se in br.path_ses.get(path, {}).items() + if np.isfinite(bs_se) and bs_se > 0 + ) + assert n_claimed == n_actual, ( + f"path={path}: n_valid_horizons claimed {n_claimed} but " + f"counted {n_actual} finite bootstrap SE horizons" + ) + + def test_path_sup_t_absent_path_has_no_cband_keys(self): + """Library-wide NaN-on-invalid contract: when a path is absent + from ``path_sup_t_bands`` (gate failure at >=2 horizons OR + <=50% finite sup-t draws — i.e., strict-majority gate fails), + no horizon under that path receives a ``cband_conf_int`` key. + Mirrors OVERALL absent-key pattern at + ``chaisemartin_dhaultfoeuille.py:2865-2875``. + + Uses ``L_max=1`` to deterministically force ``path_sup_t_bands + == {}`` (every path has only 1 horizon, so the >=2 gate fails + for all paths) and verifies no horizon writes a cband. 
+ """ + data = _by_path_three_path_data() + _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=1, n_bootstrap=200) + assert res.path_sup_t_bands == {} + for path, entry in res.path_effects.items(): + for l_h, h in entry["horizons"].items(): + assert "cband_conf_int" not in h, ( + f"path={path} l={l_h}: cband_conf_int present despite " + f"path being absent from path_sup_t_bands " + f"(violates NaN-on-invalid absent-key contract)" + ) + + def test_path_sup_t_band_renders_in_summary(self): + """``summary()`` text includes 'Sup-t critical value:' once per + path with a finite crit (mirroring the OVERALL crit print).""" + data = _by_path_three_path_data() + _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200) + assert res.path_sup_t_bands + s = res.summary() + n_finite_paths = sum( + 1 + for entry in res.path_sup_t_bands.values() + if np.isfinite(entry.get("crit_value", np.nan)) + ) + # The OVERALL surface also prints "Sup-t critical value:" once; + # so the per-path block contributes n_finite_paths additional + # occurrences. + n_occurrences = s.count("Sup-t critical value:") + # >= because OVERALL may or may not print depending on its own + # finite-horizon count; the per-path block should add at least + # n_finite_paths occurrences. + assert n_occurrences >= n_finite_paths, ( + f"Expected at least {n_finite_paths} 'Sup-t critical value:' " + f"strings in summary (one per path with finite crit), got " + f"{n_occurrences}" + ) + + def test_path_sup_t_bands_empty_dict_when_no_complete_window(self): + """When ``by_path + n_bootstrap > 0`` is requested but every + switcher's window falls outside the panel (so + ``path_effects == {}``), ``path_sup_t_bands`` must be ``{}`` + (not ``None``). Mirrors the documented empty-state contract that + distinguishes "feature not requested" from "requested but + empty" (see ``test_empty_path_surface_when_no_complete_window`` + for the analytical sibling at ``:4015+``). 
+ + This is the regression test for the requested-but-empty + sentinel on the new sup-t surface. + """ + rng = np.random.default_rng(0) + rows = [] + # Switchers switch at t=3 with L_max=3 -> window [2, 5] falls + # past the 4-period panel. Same construction as the analytical + # empty-window test at :4015+. + for g in (1, 2, 3, 4): + for t in range(4): + d = 1 if t >= 3 else 0 + rows.append( + {"group": g, "period": t, "treatment": d, "outcome": rng.normal()} + ) + for g in (5, 6): + for t in range(4): + rows.append( + {"group": g, "period": t, "treatment": 0, "outcome": rng.normal()} + ) + data = pd.DataFrame(rows) + + with warnings.catch_warnings(): + warnings.simplefilter("ignore", UserWarning) + est = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, + by_path=3, + n_bootstrap=200, + seed=42, + twfe_diagnostic=False, + placebo=False, + ) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + L_max=3, + ) + + # Empty-state contract: requested but empty -> {} not None. + assert res.path_effects == {}, ( + f"Expected path_effects == {{}} on no-complete-window panel; " + f"got {res.path_effects}" + ) + assert res.path_sup_t_bands == {}, ( + f"Expected path_sup_t_bands == {{}} (not None) when " + f"by_path + n_bootstrap is active but path_effects == {{}}; " + f"got {res.path_sup_t_bands}. This violates the documented " + f"None-vs-{{}} empty-state contract." + ) + # Sanity: no path_effects entries means no horizons exist, but + # also nothing should write cband_conf_int into anything. + # (Iterating over empty dict is a no-op; this just pins the + # invariant explicitly.) 
+ for path, entry in res.path_effects.items(): # pragma: no cover + for l_h, h in entry["horizons"].items(): + assert "cband_conf_int" not in h + + def test_path_sup_t_strict_majority_gate_at_exact_50pct(self, monkeypatch): + """The 50%-finite-draws gate is **strict majority**, not >=: + the implementation requires ``finite_mask.sum() > 0.5 * + n_bootstrap`` (mirrors OVERALL gate at + ``chaisemartin_dhaultfoeuille_bootstrap.py:612``). At exactly + 50% finite draws the gate fails and the path is absent from + ``path_sup_t_bands``. + + This forces the boundary by monkey-patching + ``_generate_psu_or_group_weights`` (used by both the OVERALL + and per-path sup-t blocks) to return NaN + weights in exactly half the bootstrap draws — those rows + produce non-finite ``boot_dist`` -> non-finite t-stats -> + non-finite ``sup_t_dist`` entries. With ``n_bootstrap=4`` and + 2 NaN rows, ``finite_mask.sum() == 2 == 0.5 * 4``, the + gate ``2 > 2.0`` is False, and the path is skipped. + + Pins the prose contract documented in REGISTRY.md and the + result-class docstring: "strict majority (more than 50%) of + finite sup-t draws". + """ + from diff_diff import chaisemartin_dhaultfoeuille_bootstrap as bs_mod + + original_generator = bs_mod._generate_psu_or_group_weights + + def fake_generator( + n_bootstrap, n_groups_target, weight_type, rng, group_to_psu_map + ): + # Call the original to get a sane base, then inject NaN into + # exactly half of the bootstrap rows. The NaN propagates + # through `weights @ u_centered` -> NaN deviations -> NaN + # boot_dist -> NaN t-stats -> NaN sup_t entries, so + # `finite_mask.sum() == n_bootstrap // 2` exactly. 
+ base = original_generator( + n_bootstrap, n_groups_target, weight_type, rng, group_to_psu_map + ) + n_poison = n_bootstrap // 2 + base[:n_poison, :] = np.nan + return base + + monkeypatch.setattr(bs_mod, "_generate_psu_or_group_weights", fake_generator) + + data = _by_path_three_path_data() + with warnings.catch_warnings(): + warnings.simplefilter("ignore", (UserWarning, RuntimeWarning)) + est = ChaisemartinDHaultfoeuille( + drop_larger_lower=False, + by_path=3, + n_bootstrap=4, + seed=42, + twfe_diagnostic=False, + placebo=False, + ) + res = est.fit( + data, + outcome="outcome", + group="group", + time="period", + treatment="treatment", + L_max=3, + ) + + # At exactly 50% finite draws the strict-majority gate fails — + # no path passes, so the requested-but-empty surface is `{}`. + assert res.path_sup_t_bands == {}, ( + f"Expected path_sup_t_bands == {{}} at exactly-50%-finite " + f"draws (strict-majority gate semantics); got " + f"{res.path_sup_t_bands}. This violates the documented " + f"`finite_mask.sum() > 0.5 * n_bootstrap` contract." + ) + # And the OVERALL `sup_t_bands` is also None since the same + # patched generator drives the multi-horizon block (gate failure + # at exactly 50% finite draws there too). + assert res.sup_t_bands is None, ( + f"Expected sup_t_bands is None at exactly-50%-finite draws " + f"on the OVERALL surface; got {res.sup_t_bands}" + ) + # No horizon (per-path or overall) should have cband_conf_int. 
+ for path, entry in res.path_effects.items(): + for l_h, h in entry["horizons"].items(): + assert "cband_conf_int" not in h, ( + f"path={path} l={l_h}: cband_conf_int written despite " + f"strict-majority gate failure at exactly 50% finite" + ) + for l_h, h in res.event_study_effects.items(): + assert "cband_conf_int" not in h, ( + f"l={l_h}: OVERALL cband_conf_int written despite " + f"strict-majority gate failure at exactly 50% finite" + ) From 2df79a00c36941d78007ea9ac7935a5129741425 Mon Sep 17 00:00:00 2001 From: igerber Date: Sat, 25 Apr 2026 15:18:41 -0400 Subject: [PATCH 2/3] Self-audit: extend to_dataframe(level=by_path) with cband_lower/upper MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Cross-surface gap caught in self-audit: OVERALL `to_dataframe(level= "event_study")` includes `cband_lower` / `cband_upper` columns (`chaisemartin_dhaultfoeuille_results.py:1495-1496,1531-1532`) but the per-path table at `level="by_path"` does not — even though per-path now produces `cband_conf_int` writes via the new sup-t propagation block. Cross-surface twin asymmetry the CI reviewer didn't flag; caught by my own grep audit on `cband_conf_int` consumers. Fix: extend `to_dataframe(level="by_path")` to emit the same two columns. Populated for positive-horizon rows of paths with a finite sup-t crit (read from `path_effects[path]["horizons"][l] ["cband_conf_int"]`); NaN for placebo rows (no joint band per the positive-only sup-t spec), unbanded paths, and the requested-but-empty fallback DataFrame (which now includes the columns in its canonical schema). 
Tests added: - `test_path_sup_t_to_dataframe_emits_cband_columns` — column presence + per-row alignment with the dict surface - `test_path_sup_t_to_dataframe_empty_path_fallback_has_cband_columns` — empty-path fallback DataFrame schema parity Docs updated: - REGISTRY.md: `to_dataframe(level="by_path")` integration note added to the new sup-t Note; canonical column list in the existing `Note (Phase 3 by_path ...)` block extended with `cband_lower / cband_upper` - CHANGELOG entry: surface listing now mentions to_dataframe columns - `by_path` parameter docstring: rendering surface listing extended - `path_sup_t_bands` Attributes docstring: rendering surface listing extended Suite: 263 tests pass (was 261, +2 new tests). Co-Authored-By: Claude Opus 4.7 (1M context) --- CHANGELOG.md | 2 +- diff_diff/chaisemartin_dhaultfoeuille.py | 7 +- .../chaisemartin_dhaultfoeuille_results.py | 21 +++++- docs/methodology/REGISTRY.md | 4 +- tests/test_chaisemartin_dhaultfoeuille.py | 74 +++++++++++++++++++ 5 files changed, 102 insertions(+), 6 deletions(-) diff --git a/CHANGELOG.md b/CHANGELOG.md index a902c296..ff37dbc2 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -9,7 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ### Added - **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. 
Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. **Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. 
Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology. -- **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`) and as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`. Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. 
**Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract. +- **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. 
Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`); as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`; and as `cband_lower` / `cband_upper` columns on `results.to_dataframe(level="by_path")` (mirrors the OVERALL `level="event_study"` schema; positive-horizon rows of banded paths get populated values, placebo / unbanded / empty-window rows get NaN). Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract. 
- **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract. 
- **Tutorial 19: dCDH for Marketing Pulse Campaigns** (`docs/tutorials/19_dcdh_marketing_pulse.ipynb`) — end-to-end practitioner walkthrough on a 60-market reversible-treatment panel covering the TWFE decomposition diagnostic (`twowayfeweights`), `DCDH` Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo), the `L_max` multi-horizon event study with multiplier bootstrap, a stakeholder communication template, and drift guards. README listing for Tutorial 17 (Brand Awareness Survey) backfilled in the same edit. Cross-link from `docs/practitioner_decision_tree.rst` § "Reversible Treatment" added. diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py index f626199b..2d11e252 100644 --- a/diff_diff/chaisemartin_dhaultfoeuille.py +++ b/diff_diff/chaisemartin_dhaultfoeuille.py @@ -437,9 +437,12 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin): ``c_p`` (constructed from a fresh shared-weights multiplier- bootstrap draw per path) is surfaced at top level as ``results.path_sup_t_bands[path] = {"crit_value", "alpha", - "n_bootstrap", "method", "n_valid_horizons"}`` and applied + "n_bootstrap", "method", "n_valid_horizons"}``, applied per-horizon as ``cband_conf_int`` on - ``path_effects[path]["horizons"][l]``. Bands cover joint + ``path_effects[path]["horizons"][l]``, and rendered as + ``cband_lower`` / ``cband_upper`` columns on + ``results.to_dataframe(level="by_path")`` (mirroring the + OVERALL ``level="event_study"`` schema). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. 
Python-only library extension; R ``did_multiplegt_dyn`` provides no joint diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py index c464abc8..e93db4bd 100644 --- a/diff_diff/chaisemartin_dhaultfoeuille_results.py +++ b/diff_diff/chaisemartin_dhaultfoeuille_results.py @@ -425,7 +425,9 @@ class ChaisemartinDHaultfoeuilleResults: "method": str, "n_valid_horizons": int}``. Populated when ``by_path`` is a positive int AND ``n_bootstrap > 0``. The band itself is applied per-horizon as ``cband_conf_int`` on - ``path_effects[path]["horizons"][l]``. Empty-state contract: + ``path_effects[path]["horizons"][l]`` and rendered as + ``cband_lower`` / ``cband_upper`` columns on + ``to_dataframe(level="by_path")``. Empty-state contract: ``None`` when not requested (no bootstrap or ``by_path is None``); ``{}`` when requested but no path passed both gates (``>=2`` valid horizons with finite bootstrap SE ``> 0`` AND a strict @@ -1632,6 +1634,8 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame: "conf_int_lower", "conf_int_upper", "n_obs", + "cband_lower", + "cband_upper", ] ) rows = [] @@ -1655,6 +1659,12 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame: ) for lag_key in sorted(placebo_horizons.keys()): ph_entry = placebo_horizons[lag_key] + # Placebos do not get joint sup-t bands in this + # release (only positive event-study horizons do — + # mirrors OVERALL placebo / event-study sup-t + # convention). Emit NaN cband columns for schema + # parity with the OVERALL level="event_study" table. 
+ ph_cband = ph_entry.get("cband_conf_int", (np.nan, np.nan)) rows.append( { "path": path, @@ -1668,10 +1678,17 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame: "conf_int_lower": ph_entry["conf_int"][0], "conf_int_upper": ph_entry["conf_int"][1], "n_obs": ph_entry["n_obs"], + "cband_lower": ph_cband[0] if ph_cband else np.nan, + "cband_upper": ph_cband[1] if ph_cband else np.nan, } ) for l_h in sorted(horizons.keys()): h_entry = horizons[l_h] + # Per-path joint sup-t band (when populated) mirrors + # OVERALL `level="event_study"` cband emission. Absent + # key / missing path entry -> NaN columns. Pinned at + # `TestByPathSupTBands::test_path_sup_t_to_dataframe_emits_cband_columns`. + h_cband = h_entry.get("cband_conf_int", (np.nan, np.nan)) rows.append( { "path": path, @@ -1685,6 +1702,8 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame: "conf_int_lower": h_entry["conf_int"][0], "conf_int_upper": h_entry["conf_int"][1], "n_obs": h_entry["n_obs"], + "cband_lower": h_cband[0] if h_cband else np.nan, + "cband_upper": h_cband[1] if h_cband else np.nan, } ) return pd.DataFrame(rows) diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md index c50dac35..1f7bc9d3 100644 --- a/docs/methodology/REGISTRY.md +++ b/docs/methodology/REGISTRY.md @@ -638,9 +638,9 @@ The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and - **Note (Phase 3 Design-2 switch-in/switch-out):** Convenience wrapper for Web Appendix Section 1.6 (Assumption 16). Identifies groups with exactly 2 treatment changes (join then leave), reports switch-in and switch-out mean effects. This is a descriptive summary, not a full re-estimation with specialized control pools as described in the paper. **Always uses raw (unadjusted) outcomes** regardless of active `controls`, `trends_linear`, or `trends_nonparam` options - those adjustments apply to the main estimator surface but not to the Design-2 descriptive block. 
For full adjusted Design-2 estimation with proper control pools, the paper recommends "running the command on a restricted subsample and using `trends_nonparam` for the entry-timing grouping." Activated via `design2=True` in `fit()`, requires `drop_larger_lower=False` to retain 2-switch groups. -- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. 
**Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs]`. 
Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. **R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. 
**Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. **SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. 
**Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. **Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. 
Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). +- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. 
If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). 
**Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs, cband_lower, cband_upper]` (the last two are added by the joint sup-t Note below; populated for positive-horizon rows of paths with a finite sup-t crit, NaN otherwise). Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. **R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). 
**Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. **Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. 
**SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. **Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. 
**Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). 
R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass). -- **Note (Phase 3 `by_path` per-path joint sup-t bands):** When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are computed across horizons `1..L_max` within each path. **Methodology:** a single `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons within the path. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is then used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon, surfaced in `path_effects[path]["horizons"][l]["cband_conf_int"]` and at top-level `results.path_sup_t_bands[path] = {"crit_value", "alpha", "n_bootstrap", "method", "n_valid_horizons"}`. **Gates:** a path must have `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band; otherwise the path is absent from `path_sup_t_bands`. Both gates mirror the OVERALL `event_study_sup_t_bands` semantics at `chaisemartin_dhaultfoeuille_bootstrap.py:605,612`: `len(valid_horizons) >= 2` AND `finite_mask.sum() > 0.5 * n_bootstrap`. Exactly half-finite draws are NOT enough — the gate is strictly greater than half. **Empty-state contract:** `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates. 
**Methodology asymmetry vs OVERALL:** OVERALL sup-t reuses the same multi-horizon shared-draw distribution for both the SE in the t-stat denominator and the bootstrap distribution in the numerator. The per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — numerator: fresh shared draws, denominator: bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse, but NOT bit-identical. The fresh draw is intentional: it preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable post-implementation. **Inherited deviation from R:** the bootstrap SE used as the t-stat denominator carries the cross-path cohort-sharing SE deviation from R documented for `path_effects` above; the per-path sup-t crit therefore inherits the same deviation. **Interpretation:** the band covers joint inference *within a single path across horizons*; it does NOT provide simultaneous coverage *across paths* (a different inference target requiring a `path × horizon` re-derivation, deferred to a future wave). **Deviation from R:** `did_multiplegt_dyn` provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension, consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Regression test anchor: `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands`. +- **Note (Phase 3 `by_path` per-path joint sup-t bands):** When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are computed across horizons `1..L_max` within each path. 
**Methodology:** a single `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons within the path. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is then used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon, surfaced in `path_effects[path]["horizons"][l]["cband_conf_int"]` and at top-level `results.path_sup_t_bands[path] = {"crit_value", "alpha", "n_bootstrap", "method", "n_valid_horizons"}`. **Gates:** a path must have `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band; otherwise the path is absent from `path_sup_t_bands`. Both gates mirror the OVERALL `event_study_sup_t_bands` semantics at `chaisemartin_dhaultfoeuille_bootstrap.py:605,612`: `len(valid_horizons) >= 2` AND `finite_mask.sum() > 0.5 * n_bootstrap`. Exactly half-finite draws are NOT enough — the gate is strictly greater than half. **Empty-state contract:** `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates. **`to_dataframe(level="by_path")` integration:** the table now includes `cband_lower` / `cband_upper` columns for parity with OVERALL `level="event_study"`; populated for positive-horizon rows of paths with a finite sup-t crit, NaN for placebo rows / unbanded paths / the requested-but-empty fallback DataFrame. **Methodology asymmetry vs OVERALL:** OVERALL sup-t reuses the same multi-horizon shared-draw distribution for both the SE in the t-stat denominator and the bootstrap distribution in the numerator. 
The per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — numerator: fresh shared draws, denominator: bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse, but NOT bit-identical. The fresh draw is intentional: it preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable post-implementation. **Inherited deviation from R:** the bootstrap SE used as the t-stat denominator carries the cross-path cohort-sharing SE deviation from R documented for `path_effects` above; the per-path sup-t crit therefore inherits the same deviation. **Interpretation:** the band covers joint inference *within a single path across horizons*; it does NOT provide simultaneous coverage *across paths* (a different inference target requiring a `path × horizon` re-derivation, deferred to a future wave). **Deviation from R:** `did_multiplegt_dyn` provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension, consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Regression test anchor: `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands`. **Reference implementation(s):** - R: [`DIDmultiplegtDYN`](https://cran.r-project.org/package=DIDmultiplegtDYN) (CRAN, maintained by the paper authors). The Python implementation matches `did_multiplegt_dyn(..., effects=1)` at horizon `l = 1`. Parity tests live in `tests/test_chaisemartin_dhaultfoeuille_parity.py`. 
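Reviewer sketch (not part of the patch): the per-path sup-t recipe described in the REGISTRY note above can be illustrated as follows. This is a minimal sketch under stated assumptions, not the library's implementation: `sup_t_band`, its arguments, the `1 / n_groups` scaling of the bootstrap draws, and the hard-coded Rademacher weights are all hypothetical stand-ins. Only the two gates and the `quantile(max_l |t_l|, 1 - alpha)` critical value are taken from the note.

```python
import numpy as np


def sup_t_band(effects, ses, influence, n_bootstrap=999, alpha=0.05, seed=0):
    """Hypothetical helper: joint sup-t band for ONE path.

    effects, ses : shape (L,), per-horizon point estimates and bootstrap SEs.
    influence    : shape (n_groups, L), centered influence-function columns.
    Returns None when either gate fails, mirroring "path absent from the dict".
    """
    effects = np.asarray(effects, dtype=float)
    ses = np.asarray(ses, dtype=float)
    influence = np.asarray(influence, dtype=float)
    n_groups = influence.shape[0]

    # Gate 1: at least two horizons with a finite, strictly positive SE.
    valid = np.isfinite(ses) & (ses > 0)
    if valid.sum() < 2:
        return None

    # ONE shared weight matrix, reused for every horizon of the path, so the
    # per-horizon bootstrap draws are correlated (Rademacher assumed here).
    rng = np.random.default_rng(seed)
    weights = rng.choice([-1.0, 1.0], size=(n_bootstrap, n_groups))

    # Assumed scaling: draw_b,l = (1 / n_groups) * sum_g w_b,g * IF_g,l.
    draws = weights @ influence[:, valid] / n_groups
    sup_t = np.max(np.abs(draws / ses[valid]), axis=1)

    # Gate 2: STRICT majority of finite sup-t draws (exactly half fails).
    finite = np.isfinite(sup_t)
    if finite.sum() <= 0.5 * n_bootstrap:
        return None

    crit = float(np.quantile(sup_t[finite], 1.0 - alpha))
    bands = {
        int(l) + 1: (effects[l] - crit * ses[l], effects[l] + crit * ses[l])
        for l in np.flatnonzero(valid)
    }
    return {"crit_value": crit, "n_valid_horizons": int(valid.sum()), "bands": bands}
```

Because the critical value is the quantile of a maximum over horizons, it should be at least as large as the pointwise normal quantile, which is why the joint band is wider than the per-horizon interval. Horizons with an invalid SE are left out of `bands`, loosely analogous to the missing `cband_conf_int` key in the real results.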
diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py
index 80e914c7..1d51b9da 100644
--- a/tests/test_chaisemartin_dhaultfoeuille.py
+++ b/tests/test_chaisemartin_dhaultfoeuille.py
@@ -5871,6 +5871,80 @@ def test_path_sup_t_bands_empty_dict_when_no_complete_window(self):
             for l_h, h in entry["horizons"].items():
                 assert "cband_conf_int" not in h
 
+    def test_path_sup_t_to_dataframe_emits_cband_columns(self):
+        """``to_dataframe(level="by_path")`` includes ``cband_lower`` /
+        ``cband_upper`` columns mirroring the OVERALL
+        ``level="event_study"`` table at ``:1495-1496,1531-1532``.
+
+        For positive-horizon rows of paths with a finite sup-t crit,
+        the columns equal the per-horizon ``cband_conf_int`` tuple. For
+        placebo rows (negative horizons) and rows of paths absent from
+        ``path_sup_t_bands``, the columns are NaN. The empty-window
+        fallback (``path_effects == {}``) also includes the columns in
+        its canonical schema."""
+        data = _by_path_three_path_data()
+        _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200)
+        df = res.to_dataframe(level="by_path")
+        assert "cband_lower" in df.columns
+        assert "cband_upper" in df.columns
+        # Per-row alignment with `path_effects[path]["horizons"][l]
+        # ["cband_conf_int"]`. Only positive horizons can have populated
+        # cband (placebos and unbanded paths get NaN).
+        for _, row in df.iterrows():
+            path = row["path"]
+            horizon = int(row["horizon"])
+            if horizon > 0 and path in res.path_sup_t_bands:
+                # Should match the horizon's cband_conf_int.
+                expected_cband = res.path_effects[path]["horizons"][horizon].get(
+                    "cband_conf_int"
+                )
+                if expected_cband is not None:
+                    np.testing.assert_allclose(row["cband_lower"], expected_cband[0])
+                    np.testing.assert_allclose(row["cband_upper"], expected_cband[1])
+            else:
+                assert np.isnan(row["cband_lower"]), (
+                    f"path={path} horizon={horizon}: cband_lower should be NaN "
+                    f"(placebo / unbanded path), got {row['cband_lower']}"
+                )
+                assert np.isnan(row["cband_upper"])
+
+    def test_path_sup_t_to_dataframe_empty_path_fallback_has_cband_columns(self):
+        """The ``path_effects == {}`` fallback DataFrame schema includes
+        the cband columns for parity with the populated-path schema."""
+        rng = np.random.default_rng(0)
+        rows = []
+        # Empty-window panel: switchers at t=3, L_max=3 -> window past panel.
+        for g in (1, 2, 3, 4):
+            for t in range(4):
+                d = 1 if t >= 3 else 0
+                rows.append(
+                    {"group": g, "period": t, "treatment": d, "outcome": rng.normal()}
+                )
+        for g in (5, 6):
+            for t in range(4):
+                rows.append(
+                    {"group": g, "period": t, "treatment": 0, "outcome": rng.normal()}
+                )
+        data = pd.DataFrame(rows)
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", UserWarning)
+            est = ChaisemartinDHaultfoeuille(
+                drop_larger_lower=False, by_path=3, twfe_diagnostic=False, placebo=False
+            )
+            res = est.fit(
+                data,
+                outcome="outcome",
+                group="group",
+                time="period",
+                treatment="treatment",
+                L_max=3,
+            )
+        assert res.path_effects == {}
+        df = res.to_dataframe(level="by_path")
+        assert df.empty
+        assert "cband_lower" in df.columns
+        assert "cband_upper" in df.columns
+
     def test_path_sup_t_strict_majority_gate_at_exact_50pct(self, monkeypatch):
         """The 50%-finite-draws gate is **strict majority**, not >=:
         the implementation requires ``finite_mask.sum() > 0.5 *

From 1febbb190836089855af8f9ccfbe825f1adce840 Mon Sep 17 00:00:00 2001
From: igerber
Date: Sat, 25 Apr 2026 15:26:04 -0400
Subject: [PATCH 3/3] Address PR #374 R5 P3: stale to_dataframe docstring +
 Mammen/Webb coverage

P3 #1: ``to_dataframe`` method docstring at
``chaisemartin_dhaultfoeuille_results.py:1375-1379`` listed the
pre-change ``level="by_path"`` schema (no ``cband_*`` columns) even
though the implementation now returns them. Updated the bullet to
include ``cband_lower / cband_upper``, document the negative-horizon
placebo convention, and document the NaN-on-absent-band behavior.

P3 #2: ``TestByPathSupTBands::test_path_sup_t_seed_reproducibility``
only exercised the default ``rademacher`` weight family. Parameterized
over ``["rademacher", "mammen", "webb"]`` to pin that the per-path
sup-t branch correctly threads ``self.bootstrap_weights`` through
``_generate_psu_or_group_weights`` for all three multiplier families
the feature advertises. The existing OVERALL machinery handles all
three uniformly, but the per-path surface lacked direct coverage. Each
variant must produce a finite, reproducible crit on the standard
3-path fixture.

17 tests pass on TestByPathSupTBands (was 15: +2 new parameterized
variants on the existing seed_reproducibility test).

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../chaisemartin_dhaultfoeuille_results.py    | 13 +++++-
 tests/test_chaisemartin_dhaultfoeuille.py     | 41 ++++++++++++++++---
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py
index e93db4bd..8f090a58 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille_results.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille_results.py
@@ -1373,10 +1373,19 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
         - ``"design2"``: Design-2 switch-in/switch-out descriptive summary.
           Available when ``design2=True``.
         - ``"by_path"``: one row per (path, horizon) when
-          ``by_path=k`` was passed to the estimator. Columns include
+          ``by_path=k`` was passed to the estimator. Columns:
           ``path``, ``frequency_rank``, ``n_groups``, ``horizon``,
           ``effect``, ``se``, ``t_stat``, ``p_value``,
-          ``conf_int_lower``, ``conf_int_upper``, ``n_obs``.
+          ``conf_int_lower``, ``conf_int_upper``, ``n_obs``,
+          ``cband_lower``, ``cband_upper``. The ``horizon`` column
+          takes negative ints for placebo rows when
+          ``placebo=True``. The ``cband_*`` columns mirror the
+          OVERALL ``level="event_study"`` schema (joint sup-t
+          simultaneous bands); they are populated for positive-
+          horizon rows of paths with a finite per-path sup-t crit
+          (``n_bootstrap > 0``) and NaN otherwise (placebo rows,
+          unbanded paths, or the requested-but-empty fallback
+          DataFrame).
 
         Returns
         -------
diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py
index 1d51b9da..ed80f83c 100644
--- a/tests/test_chaisemartin_dhaultfoeuille.py
+++ b/tests/test_chaisemartin_dhaultfoeuille.py
@@ -5661,23 +5661,54 @@ def test_path_sup_t_crit_finite_and_positive(self):
             assert entry["method"] == "multiplier_bootstrap"
             assert entry["n_valid_horizons"] >= 2
 
-    def test_path_sup_t_seed_reproducibility(self):
-        """Same seed -> bit-identical ``crit_value`` for every path."""
+    @pytest.mark.parametrize("bootstrap_weights", ["rademacher", "mammen", "webb"])
+    def test_path_sup_t_seed_reproducibility(self, bootstrap_weights):
+        """Same seed -> bit-identical ``crit_value`` for every path,
+        across all three multiplier-weight families. Pins that the
+        per-path sup-t branch correctly threads ``bootstrap_weights``
+        through ``_generate_psu_or_group_weights`` and that
+        Rademacher / Mammen / Webb each produce a finite, reproducible
+        crit (the helper handles all three uniformly under the
+        existing OVERALL sup-t machinery; this is a per-path direct
+        regression on that contract)."""
         data = _by_path_three_path_data()
         _est_a, res_a = self._fit_with_bootstrap(
-            data, by_path=3, L_max=3, n_bootstrap=200, seed=42
+            data,
+            by_path=3,
+            L_max=3,
+            n_bootstrap=200,
+            seed=42,
+            bootstrap_weights=bootstrap_weights,
         )
         _est_b, res_b = self._fit_with_bootstrap(
-            data, by_path=3, L_max=3, n_bootstrap=200, seed=42
+            data,
+            by_path=3,
+            L_max=3,
+            n_bootstrap=200,
+            seed=42,
+            bootstrap_weights=bootstrap_weights,
         )
         assert res_a.path_sup_t_bands is not None
         assert res_b.path_sup_t_bands is not None
         assert set(res_a.path_sup_t_bands.keys()) == set(res_b.path_sup_t_bands.keys())
+        # At least one path should produce a finite crit on this fixture
+        # (3 paths each with 3 valid horizons under all three weight
+        # families); pinning that the new dispatch path actually fires
+        # for `mammen` / `webb`, not just `rademacher`.
+        assert len(res_a.path_sup_t_bands) >= 1, (
+            f"bootstrap_weights={bootstrap_weights}: expected at least "
+            f"one path with a finite crit; got empty dict"
+        )
         for path in res_a.path_sup_t_bands:
             crit_a = res_a.path_sup_t_bands[path]["crit_value"]
             crit_b = res_b.path_sup_t_bands[path]["crit_value"]
+            assert np.isfinite(crit_a), (
+                f"bootstrap_weights={bootstrap_weights} path={path}: "
+                f"crit_value not finite ({crit_a})"
+            )
             assert crit_a == crit_b, (
-                f"path={path}: seed-pinned crits diverge: {crit_a} vs {crit_b}"
+                f"bootstrap_weights={bootstrap_weights} path={path}: "
+                f"seed-pinned crits diverge: {crit_a} vs {crit_b}"
             )
 
     def test_path_sup_t_skipped_when_path_has_only_one_valid_horizon(self):