igerber · igerber · Apr 25, 2026 · Apr 25, 2026 · Apr 25, 2026 · Apr 25, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 - **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. **Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology.
+- **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`); as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`; and as `cband_lower` / `cband_upper` columns on `results.to_dataframe(level="by_path")` (mirrors the OVERALL `level="event_study"` schema; positive-horizon rows of banded paths get populated values, placebo / unbanded / empty-window rows get NaN). Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract.
 - **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract.
 - **Tutorial 19: dCDH for Marketing Pulse Campaigns** (`docs/tutorials/19_dcdh_marketing_pulse.ipynb`) — end-to-end practitioner walkthrough on a 60-market reversible-treatment panel covering the TWFE decomposition diagnostic (`twowayfeweights`), `DCDH` Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo), the `L_max` multi-horizon event study with multiplier bootstrap, a stakeholder communication template, and drift guards. README listing for Tutorial 17 (Brand Awareness Survey) backfilled in the same edit. Cross-link from `docs/practitioner_decision_tree.rst` § "Reversible Treatment" added.
 

diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -431,6 +431,24 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
         cross-path cohort-sharing deviation from R is inherited from
         the analytical event-study path.
 
+        With ``n_bootstrap > 0``, per-path joint sup-t simultaneous
+        confidence bands are also computed across horizons
+        ``1..L_max`` within each path. A path-specific critical value
+        ``c_p`` (constructed from a fresh shared-weights multiplier-
+        bootstrap draw per path) is surfaced at top level as
+        ``results.path_sup_t_bands[path] = {"crit_value", "alpha",
+        "n_bootstrap", "method", "n_valid_horizons"}``, applied
+        per-horizon as ``cband_conf_int`` on
+        ``path_effects[path]["horizons"][l]``, and rendered as
+        ``cband_lower`` / ``cband_upper`` columns on
+        ``results.to_dataframe(level="by_path")`` (mirroring the
+        OVERALL ``level="event_study"`` schema). Bands cover joint
+        inference WITHIN a single path across horizons; they do NOT
+        provide simultaneous coverage across paths. Python-only
+        library extension; R ``did_multiplegt_dyn`` provides no joint
+        bands at any surface. See REGISTRY.md ``Note (Phase 3 by_path
+        per-path joint sup-t bands)``.
+
         SE convention: per-path IF parallels the joiners / leavers
         construction — the switcher-side contribution is zeroed for
         groups not in the selected path, and the cohort structure and
@@ -2986,6 +3004,33 @@ def fit(
                         path_placebos[path_key][neg_key]["conf_int"] = (np.nan, np.nan)
                         path_placebos[path_key][neg_key]["t_stat"] = np.nan
 
+        # Phase 3: propagate per-path sup-t critical values to per-
+        # horizon `cband_conf_int` entries on path_effects (by_path +
+        # n_bootstrap > 0). Sibling of the OVERALL event-study cband
+        # propagation at `:2865-2875`. For each path with a finite
+        # crit, write `cband_conf_int = (eff - c_p*se, eff + c_p*se)`
+        # into each horizon's dict whose bootstrap-replaced SE is
+        # finite > 0. Mirror the OVERALL absent-key pattern: non-finite
+        # SE horizons simply don't get the `cband_conf_int` key.
+        if (
+            bootstrap_results is not None
+            and bootstrap_results.path_cband_crit_values is not None
+            and path_effects is not None
+        ):
+            for path_key, crit in bootstrap_results.path_cband_crit_values.items():
+                if path_key not in path_effects:
+                    continue
+                if not np.isfinite(crit):
+                    continue
+                for l_h, h_dict in path_effects[path_key]["horizons"].items():
+                    se = h_dict.get("se", np.nan)
+                    eff = h_dict.get("effect", np.nan)
+                    if np.isfinite(se) and se > 0:
+                        h_dict["cband_conf_int"] = (
+                            eff - crit * se,
+                            eff + crit * se,
+                        )
+
         # When L_max >= 1 and the per-group path is active, sync
         # overall_* from event_study_effects[1] AFTER bootstrap propagation
         # so that bootstrap SE/p/CI flow to the top-level surface.
@@ -3618,6 +3663,45 @@ def fit(
             ),
             path_effects=path_effects,
             path_placebo_event_study=path_placebos,
+            path_sup_t_bands=(
+                # When by_path + n_bootstrap > 0 is active, surface a
+                # dict (possibly empty) — preserving the documented
+                # `None` (not requested) vs `{}` (requested but empty)
+                # contract that mirrors `path_effects` / `path_placebo_
+                # event_study` empty-state behavior. The empty case
+                # arises in two ways:
+                #   1. `path_effects == {}` — no observed path has a
+                #      complete window; the per-path bootstrap collector
+                #      is skipped upstream and `path_cband_crit_values`
+                #      stays `None`. We materialize `{}` here.
+                #   2. Bootstrap ran but no path passed both gates
+                #      (>=2 valid horizons AND a strict majority — more
+                #      than 50% — of finite sup-t draws);
+                #      `path_cband_crit_values == {}` — passes through.
+                {
+                    path_key: {
+                        "crit_value": crit,
+                        "alpha": self.alpha,
+                        "n_bootstrap": self.n_bootstrap,
+                        "method": "multiplier_bootstrap",
+                        "n_valid_horizons": (
+                            bootstrap_results.path_cband_n_valid_horizons.get(path_key, 0)
+                            if bootstrap_results is not None
+                            and bootstrap_results.path_cband_n_valid_horizons is not None
+                            else 0
+                        ),
+                    }
+                    for path_key, crit in (
+                        bootstrap_results.path_cband_crit_values
+                        if bootstrap_results is not None
+                        and bootstrap_results.path_cband_crit_values is not None
+                        else {}
+                    ).items()
+                    if np.isfinite(crit)
+                }
+                if (self.by_path is not None and self.n_bootstrap > 0)
+                else None
+            ),
             survey_metadata=survey_metadata,
             _estimator_ref=self,
         )

diff --git a/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py b/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py
@@ -778,6 +778,94 @@ def _compute_dcdh_bootstrap(
             results.path_placebo_cis = path_pl_cis
             results.path_placebo_p_values = path_pl_pvals
 
+        # --- Phase 3: Per-path joint sup-t (by_path + n_bootstrap > 0) ---
+        # Sibling of the OVERALL event-study sup-t at the multi-horizon
+        # block above (`:599-614`). Per-path joint simultaneous
+        # confidence bands across horizons 1..L_max within each path:
+        # one shared (n_bootstrap, n_eligible) multiplier weight matrix
+        # (using `self.bootstrap_weights` — Rademacher / Mammen / Webb)
+        # per path is broadcast across all valid horizons of that path,
+        # producing correlated bootstrap distributions across horizons.
+        # The path-specific critical value
+        # `c_p = quantile(max_l |t_l|, 1-alpha)` is the band half-width
+        # multiplier applied to each horizon's bootstrap SE in fit().
+        #
+        # Note (asymmetry vs OVERALL): this draws a FRESH shared-weights
+        # matrix per path AFTER the per-path SE block above has populated
+        # results.path_ses via independent per-(path, horizon) draws.
+        # Numerator: fresh shared draws; denominator: bootstrap SEs from
+        # the earlier independent draws. Asymptotically equivalent to
+        # OVERALL's self-consistent reuse, but NOT bit-identical. The
+        # fresh draw is intentional: it preserves RNG-state isolation
+        # for existing per-path SE seed-reproducibility tests.
+        #
+        # Gates: a path needs >=2 valid horizons (finite bootstrap SE>0)
+        # AND a strict majority (>50%) of finite sup-t draws to receive
+        # a band. Otherwise the path is absent from
+        # path_cband_crit_values (mirrors OVERALL absent-key pattern at
+        # `:605,612`; the strict-majority gate matches the OVERALL
+        # `finite_mask.sum() > 0.5 * n_bootstrap` semantics — exactly
+        # half finite is NOT enough).
+        if path_bootstrap_inputs is not None and results.path_ses:
+            path_cband_crits: Dict[Tuple[int, ...], float] = {}
+            path_cband_n_valid: Dict[Tuple[int, ...], int] = {}
+
+            for path_key, horizon_inputs in path_bootstrap_inputs.items():
+                bs_ses_for_path = results.path_ses.get(path_key, {})
+                valid_horizons = []
+                for l_h, (u_h, n_h, eff_h, _u_pp_h) in sorted(horizon_inputs.items()):
+                    if u_h.size == 0 or n_h <= 0:
+                        continue
+                    bs_se = bs_ses_for_path.get(l_h, np.nan)
+                    if not np.isfinite(bs_se) or bs_se <= 0:
+                        continue
+                    valid_horizons.append((l_h, u_h, n_h, eff_h, bs_se))
+
+                if len(valid_horizons) < 2:
+                    continue
+
+                # All horizons within a path use the same n_eligible
+                # (variance-eligible group ordering enforced by
+                # _collect_path_bootstrap_inputs's use of
+                # eligible_mask_var for cohort-recentering); use the
+                # first valid horizon's IF size as the shared dim.
+                n_dim = valid_horizons[0][1].size
+                map_path = _map_for_target(
+                    n_dim,
+                    group_id_to_psu_code,
+                    eligible_group_ids,
+                )
+                with np.errstate(invalid="ignore", divide="ignore"):
+                    shared_weights = _generate_psu_or_group_weights(
+                        n_bootstrap=self.n_bootstrap,
+                        n_groups_target=n_dim,
+                        weight_type=self.bootstrap_weights,
+                        rng=rng,
+                        group_to_psu_map=map_path,
+                    )
+                    es_dists_path = []
+                    for _l_h, u_h, n_h, eff_h, _bs_se in valid_horizons:
+                        deviations = (shared_weights @ u_h) / n_h
+                        es_dists_path.append(eff_h + deviations)
+                    boot_matrix = np.asarray(es_dists_path)
+                    effects_vec = np.array([v[3] for v in valid_horizons])
+                    ses_vec = np.array([v[4] for v in valid_horizons])
+                    t_stats = np.abs((boot_matrix - effects_vec[:, None]) / ses_vec[:, None])
+                    sup_t_dist = np.max(t_stats, axis=0)
+                    finite_mask = np.isfinite(sup_t_dist)
+                    if finite_mask.sum() <= 0.5 * self.n_bootstrap:
+                        continue
+                    crit_p = float(np.quantile(sup_t_dist[finite_mask], 1.0 - self.alpha))
+
+                if not np.isfinite(crit_p):
+                    continue
+
+                path_cband_crits[path_key] = crit_p
+                path_cband_n_valid[path_key] = len(valid_horizons)
+
+            results.path_cband_crit_values = path_cband_crits
+            results.path_cband_n_valid_horizons = path_cband_n_valid
+
         return results