From c0c0d4e2aecb395f3ab58ccf0d534193965127f1 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 25 Apr 2026 15:00:12 -0400
Subject: [PATCH 1/3] Add per-path joint sup-t bands to
 ChaisemartinDHaultfoeuille.by_path
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t
simultaneous confidence bands are now computed across horizons
`1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)`
multiplier weight matrix (using the estimator's configured
`bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path
and broadcast across all valid horizons, producing correlated
bootstrap distributions. The path-specific critical value
`c_p = quantile(max_l |t_l|, 1-α)` is applied per horizon as
`cband_conf_int = (eff - c_p·se, eff + c_p·se)` and surfaced at top
level as `results.path_sup_t_bands[path]`.
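
The computation above can be illustrated with a minimal stdlib-only
sketch. It is not the library's code: `sup_t_crit`, `joint_band`, and
the toy inputs are hypothetical names chosen for illustration, the
weights are Rademacher only, and the quantile uses a simple
order-statistic rule rather than `np.quantile` interpolation.

```python
import math
import random


def sup_t_crit(horizons, n_bootstrap=999, alpha=0.05, seed=0):
    """Toy sup-t critical value for one path.

    horizons: list of (u_h, n_h, se_h) tuples, where u_h is a list of
    per-group influence-function values, n_h the divisor, and se_h the
    SE used as the t-stat denominator (all hypothetical inputs).
    """
    rng = random.Random(seed)
    n_groups = len(horizons[0][0])
    sup_t = []
    for _ in range(n_bootstrap):
        # One weight vector per replicate, shared by every horizon, so
        # the bootstrap deviations are correlated across horizons.
        w = [rng.choice((-1.0, 1.0)) for _ in range(n_groups)]
        t_abs = []
        for u_h, n_h, se_h in horizons:
            deviation = sum(wg * ug for wg, ug in zip(w, u_h)) / n_h
            t_abs.append(abs(deviation) / se_h)
        sup_t.append(max(t_abs))
    sup_t.sort()
    # Empirical (1 - alpha) quantile of the max-|t| distribution.
    idx = min(len(sup_t) - 1, math.ceil((1.0 - alpha) * len(sup_t)) - 1)
    return sup_t[idx]


def joint_band(effect, se, crit):
    """Symmetric joint band: (eff - c_p * se, eff + c_p * se)."""
    return (effect - crit * se, effect + crit * se)


def toy_se(u_h, n_h):
    # SD of sum(w * u) / n under Rademacher weights (toy analogue only).
    return math.sqrt(sum(u * u for u in u_h)) / n_h


u1 = [0.4, -1.2, 0.7, 0.1, -0.5, 0.5]
u2 = [0.9, -0.3, -0.8, 0.6, 0.2, -0.6]
toy_horizons = [(u1, 6, toy_se(u1, 6)), (u2, 6, toy_se(u2, 6))]
crit = sup_t_crit(toy_horizons, n_bootstrap=199, seed=1)
bands = [joint_band(0.5, se, crit) for _u, _n, se in toy_horizons]
```

Because the same weight vector feeds every horizon within a replicate,
the maximum over horizons is never smaller than any single horizon's
|t|, which is why the joint critical value is at least as large as a
single-horizon one computed from the same draws.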

Closes Wave 2 #4 of the by_path follow-up sequence (#357 foundation,
#360 R-parity, #364 bootstrap, #371 placebos).

**Methodology asymmetry vs OVERALL** (intentional, documented):
per-path sup-t draws fresh shared weights AFTER the per-path SE
bootstrap block has populated `path_ses` via independent per-(path,
horizon) draws. This is asymptotically equivalent to OVERALL's
self-consistent reuse but NOT bit-identical. The fresh draw preserves
RNG-state isolation for existing per-path SE seed-reproducibility tests.

**Gates** mirror OVERALL: a path needs `>=2` valid horizons (finite
bootstrap SE > 0) AND a strict majority (more than 50%) of finite
sup-t draws to receive a band. Otherwise the path is absent from
`path_sup_t_bands`.
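
As a rough illustration of the two gates, assuming a hypothetical
helper `path_gets_band` that receives one path's bootstrap SEs and its
sup-t draws (the real check lives inside `_compute_dcdh_bootstrap` and
operates on NumPy arrays):

```python
import math


def path_gets_band(bootstrap_ses, sup_t_draws):
    """Return True when a path would pass both gates (toy version)."""
    # Gate 1: at least two horizons with a finite, strictly positive SE.
    n_valid = sum(1 for se in bootstrap_ses if math.isfinite(se) and se > 0)
    if n_valid < 2:
        return False
    # Gate 2: strictly more than half of the sup-t draws are finite;
    # exactly half is not enough.
    n_finite = sum(1 for d in sup_t_draws if math.isfinite(d))
    return n_finite > 0.5 * len(sup_t_draws)
```

The strict inequality is the point of the boundary test in this patch:
a path with exactly 50% finite draws receives no band.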

**Empty-state contract**: `path_sup_t_bands is None` when not requested
(no bootstrap or `by_path is None`); `{}` when requested but no path
passes both gates (covers two cases: `path_effects == {}` upstream OR
all paths fail gates downstream).
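
A caller could branch on the three states roughly as follows. This is
a sketch only: `describe_path_bands` is a hypothetical helper and not
part of the library; the dict layout follows the entry keys listed in
this patch.

```python
from typing import Any, Dict, Optional, Tuple

PathBands = Optional[Dict[Tuple[int, ...], Dict[str, Any]]]


def describe_path_bands(bands: PathBands) -> str:
    """Summarise the state of a `path_sup_t_bands`-shaped value."""
    if bands is None:
        # No bootstrap, or by_path was not set.
        return "not requested"
    if not bands:
        # Requested, but no path passed both gates.
        return "requested, no path received a band"
    return f"{len(bands)} path(s) with a band"
```

Checking `bands is None` before truthiness matters here, since `{}`
and `None` are both falsy but carry different meanings.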

**Deviation from R**: `did_multiplegt_dyn` provides no joint / sup-t
bands at any surface; this is a Python-only methodology extension,
consistent with the existing OVERALL `event_study_sup_t_bands` (also
Python-only). The bands inherit the cross-path cohort-sharing SE
deviation from R documented for `path_effects`.

**Bundled pre-audit fix** (sibling-surface check): the stale "Phase 2
placeholder" docstring on the existing OVERALL `sup_t_bands` field is
updated to the actual contract description.

Tests: new `TestByPathSupTBands` class with 13 tests covering: attr
None when no bootstrap / no by_path; keys match `path_effects` with
finite crit; band wider than pointwise; crit finite and positive;
seed reproducibility; single-horizon-path-skip; L_max=1 skip;
n_valid_horizons matches; absent-path-no-cband-keys; summary renders;
empty-dict-when-no-complete-window; strict-majority-gate-at-exact-50pct
(monkeypatches the weight generator to inject NaN into half the
bootstrap rows, asserting both `sup_t_bands is None` and
`path_sup_t_bands == {}` at the boundary). All `@pytest.mark.slow`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                                  |   1 +
 diff_diff/chaisemartin_dhaultfoeuille.py      |  81 +++
 .../chaisemartin_dhaultfoeuille_bootstrap.py  |  88 ++++
 .../chaisemartin_dhaultfoeuille_results.py    |  80 ++-
 docs/api/chaisemartin_dhaultfoeuille.rst      |   5 +-
 docs/methodology/REGISTRY.md                  |   2 +
 tests/test_chaisemartin_dhaultfoeuille.py     | 475 ++++++++++++++++++
 7 files changed, 730 insertions(+), 2 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 5ee0ce38..a902c296 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 - **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. 
Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. **Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology.
+- **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`) and as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`. Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. 
Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract.
 - **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract.
 - **Tutorial 19: dCDH for Marketing Pulse Campaigns** (`docs/tutorials/19_dcdh_marketing_pulse.ipynb`) — end-to-end practitioner walkthrough on a 60-market reversible-treatment panel covering the TWFE decomposition diagnostic (`twowayfeweights`), `DCDH` Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo), the `L_max` multi-horizon event study with multiplier bootstrap, a stakeholder communication template, and drift guards. README listing for Tutorial 17 (Brand Awareness Survey) backfilled in the same edit. Cross-link from `docs/practitioner_decision_tree.rst` § "Reversible Treatment" added.
 
diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
index ef901ad3..f626199b 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -431,6 +431,21 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
         cross-path cohort-sharing deviation from R is inherited from
         the analytical event-study path.
 
+        With ``n_bootstrap > 0``, per-path joint sup-t simultaneous
+        confidence bands are also computed across horizons
+        ``1..L_max`` within each path. A path-specific critical value
+        ``c_p`` (constructed from a fresh shared-weights multiplier-
+        bootstrap draw per path) is surfaced at top level as
+        ``results.path_sup_t_bands[path] = {"crit_value", "alpha",
+        "n_bootstrap", "method", "n_valid_horizons"}`` and applied
+        per-horizon as ``cband_conf_int`` on
+        ``path_effects[path]["horizons"][l]``. Bands cover joint
+        inference WITHIN a single path across horizons; they do NOT
+        provide simultaneous coverage across paths. Python-only
+        library extension; R ``did_multiplegt_dyn`` provides no joint
+        bands at any surface. See REGISTRY.md ``Note (Phase 3 by_path
+        per-path joint sup-t bands)``.
+
         SE convention: per-path IF parallels the joiners / leavers
         construction — the switcher-side contribution is zeroed for
         groups not in the selected path, and the cohort structure and
@@ -2986,6 +3001,33 @@ def fit(
                         path_placebos[path_key][neg_key]["conf_int"] = (np.nan, np.nan)
                         path_placebos[path_key][neg_key]["t_stat"] = np.nan
 
+        # Phase 3: propagate per-path sup-t critical values to per-
+        # horizon `cband_conf_int` entries on path_effects (by_path +
+        # n_bootstrap > 0). Sibling of the OVERALL event-study cband
+        # propagation at `:2865-2875`. For each path with a finite
+        # crit, write `cband_conf_int = (eff - c_p*se, eff + c_p*se)`
+        # into each horizon's dict whose bootstrap-replaced SE is
+        # finite > 0. Mirror the OVERALL absent-key pattern: non-finite
+        # SE horizons simply don't get the `cband_conf_int` key.
+        if (
+            bootstrap_results is not None
+            and bootstrap_results.path_cband_crit_values is not None
+            and path_effects is not None
+        ):
+            for path_key, crit in bootstrap_results.path_cband_crit_values.items():
+                if path_key not in path_effects:
+                    continue
+                if not np.isfinite(crit):
+                    continue
+                for l_h, h_dict in path_effects[path_key]["horizons"].items():
+                    se = h_dict.get("se", np.nan)
+                    eff = h_dict.get("effect", np.nan)
+                    if np.isfinite(se) and se > 0:
+                        h_dict["cband_conf_int"] = (
+                            eff - crit * se,
+                            eff + crit * se,
+                        )
+
         # When L_max >= 1 and the per-group path is active, sync
         # overall_* from event_study_effects[1] AFTER bootstrap propagation
         # so that bootstrap SE/p/CI flow to the top-level surface.
@@ -3618,6 +3660,45 @@ def fit(
             ),
             path_effects=path_effects,
             path_placebo_event_study=path_placebos,
+            path_sup_t_bands=(
+                # When by_path + n_bootstrap > 0 is active, surface a
+                # dict (possibly empty) — preserving the documented
+                # `None` (not requested) vs `{}` (requested but empty)
+                # contract that mirrors `path_effects` / `path_placebo_
+                # event_study` empty-state behavior. The empty case
+                # arises in two ways:
+                #   1. `path_effects == {}` — no observed path has a
+                #      complete window; the per-path bootstrap collector
+                #      is skipped upstream and `path_cband_crit_values`
+                #      stays `None`. We materialize `{}` here.
+                #   2. Bootstrap ran but no path passed both gates
+                #      (>=2 valid horizons AND a strict majority — more
+                #      than 50% — of finite sup-t draws);
+                #      `path_cband_crit_values == {}` — passes through.
+                {
+                    path_key: {
+                        "crit_value": crit,
+                        "alpha": self.alpha,
+                        "n_bootstrap": self.n_bootstrap,
+                        "method": "multiplier_bootstrap",
+                        "n_valid_horizons": (
+                            bootstrap_results.path_cband_n_valid_horizons.get(path_key, 0)
+                            if bootstrap_results is not None
+                            and bootstrap_results.path_cband_n_valid_horizons is not None
+                            else 0
+                        ),
+                    }
+                    for path_key, crit in (
+                        bootstrap_results.path_cband_crit_values
+                        if bootstrap_results is not None
+                        and bootstrap_results.path_cband_crit_values is not None
+                        else {}
+                    ).items()
+                    if np.isfinite(crit)
+                }
+                if (self.by_path is not None and self.n_bootstrap > 0)
+                else None
+            ),
             survey_metadata=survey_metadata,
             _estimator_ref=self,
         )
diff --git a/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py b/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py
index 5047cd64..a63aad07 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille_bootstrap.py
@@ -778,6 +778,94 @@ def _compute_dcdh_bootstrap(
             results.path_placebo_cis = path_pl_cis
             results.path_placebo_p_values = path_pl_pvals
 
+        # --- Phase 3: Per-path joint sup-t (by_path + n_bootstrap > 0) ---
+        # Sibling of the OVERALL event-study sup-t at the multi-horizon
+        # block above (`:599-614`). Per-path joint simultaneous
+        # confidence bands across horizons 1..L_max within each path:
+        # one shared (n_bootstrap, n_eligible) multiplier weight matrix
+        # (using `self.bootstrap_weights` — Rademacher / Mammen / Webb)
+        # per path is broadcast across all valid horizons of that path,
+        # producing correlated bootstrap distributions across horizons.
+        # The path-specific critical value
+        # `c_p = quantile(max_l |t_l|, 1-alpha)` is the band half-width
+        # multiplier applied to each horizon's bootstrap SE in fit().
+        #
+        # Note (asymmetry vs OVERALL): this draws a FRESH shared-weights
+        # matrix per path AFTER the per-path SE block above has populated
+        # results.path_ses via independent per-(path, horizon) draws.
+        # Numerator: fresh shared draws; denominator: bootstrap SEs from
+        # the earlier independent draws. Asymptotically equivalent to
+        # OVERALL's self-consistent reuse, but NOT bit-identical. The
+        # fresh draw is intentional: it preserves RNG-state isolation
+        # for existing per-path SE seed-reproducibility tests.
+        #
+        # Gates: a path needs >=2 valid horizons (finite bootstrap SE>0)
+        # AND a strict majority (>50%) of finite sup-t draws to receive
+        # a band. Otherwise the path is absent from
+        # path_cband_crit_values (mirrors OVERALL absent-key pattern at
+        # `:605,612`; the strict-majority gate matches the OVERALL
+        # `finite_mask.sum() > 0.5 * n_bootstrap` semantics — exactly
+        # half finite is NOT enough).
+        if path_bootstrap_inputs is not None and results.path_ses:
+            path_cband_crits: Dict[Tuple[int, ...], float] = {}
+            path_cband_n_valid: Dict[Tuple[int, ...], int] = {}
+
+            for path_key, horizon_inputs in path_bootstrap_inputs.items():
+                bs_ses_for_path = results.path_ses.get(path_key, {})
+                valid_horizons = []
+                for l_h, (u_h, n_h, eff_h, _u_pp_h) in sorted(horizon_inputs.items()):
+                    if u_h.size == 0 or n_h <= 0:
+                        continue
+                    bs_se = bs_ses_for_path.get(l_h, np.nan)
+                    if not np.isfinite(bs_se) or bs_se <= 0:
+                        continue
+                    valid_horizons.append((l_h, u_h, n_h, eff_h, bs_se))
+
+                if len(valid_horizons) < 2:
+                    continue
+
+                # All horizons within a path use the same n_eligible
+                # (variance-eligible group ordering enforced by
+                # _collect_path_bootstrap_inputs's use of
+                # eligible_mask_var for cohort-recentering); use the
+                # first valid horizon's IF size as the shared dim.
+                n_dim = valid_horizons[0][1].size
+                map_path = _map_for_target(
+                    n_dim,
+                    group_id_to_psu_code,
+                    eligible_group_ids,
+                )
+                with np.errstate(invalid="ignore", divide="ignore"):
+                    shared_weights = _generate_psu_or_group_weights(
+                        n_bootstrap=self.n_bootstrap,
+                        n_groups_target=n_dim,
+                        weight_type=self.bootstrap_weights,
+                        rng=rng,
+                        group_to_psu_map=map_path,
+                    )
+                    es_dists_path = []
+                    for _l_h, u_h, n_h, eff_h, _bs_se in valid_horizons:
+                        deviations = (shared_weights @ u_h) / n_h
+                        es_dists_path.append(eff_h + deviations)
+                    boot_matrix = np.asarray(es_dists_path)
+                    effects_vec = np.array([v[3] for v in valid_horizons])
+                    ses_vec = np.array([v[4] for v in valid_horizons])
+                    t_stats = np.abs((boot_matrix - effects_vec[:, None]) / ses_vec[:, None])
+                    sup_t_dist = np.max(t_stats, axis=0)
+                    finite_mask = np.isfinite(sup_t_dist)
+                    if finite_mask.sum() <= 0.5 * self.n_bootstrap:
+                        continue
+                    crit_p = float(np.quantile(sup_t_dist[finite_mask], 1.0 - self.alpha))
+
+                if not np.isfinite(crit_p):
+                    continue
+
+                path_cband_crits[path_key] = crit_p
+                path_cband_n_valid[path_key] = len(valid_horizons)
+
+            results.path_cband_crit_values = path_cband_crits
+            results.path_cband_n_valid_horizons = path_cband_n_valid
+
         return results
 
 
diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py
index f7596ecc..c464abc8 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille_results.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille_results.py
@@ -161,6 +161,21 @@ class DCDHBootstrapResults:
         default=None, repr=False
     )
 
+    # --- Phase 3: per-path joint sup-t critical values (by_path + n_bootstrap > 0) ---
+    # Per-path sup-t simultaneous-band critical value `c_p =
+    # quantile(max_l |t_l|, 1-alpha)` from a fresh shared-weights
+    # multiplier-bootstrap draw per path. Naming parity with the OVERALL
+    # `cband_crit_value` scalar at line 131 (singular -> plural since one
+    # crit per path). Gates: a path appears only when (>=2 valid horizons
+    # with finite bootstrap SE > 0) AND (a strict majority — more than
+    # 50% — of sup-t draws are finite); paths failing either gate are
+    # absent from the dict. `None` when bootstrap didn't run; empty dict
+    # when ran but no path passed both gates.
+    path_cband_crit_values: Optional[Dict[Tuple[int, ...], float]] = field(default=None, repr=False)
+    path_cband_n_valid_horizons: Optional[Dict[Tuple[int, ...], int]] = field(
+        default=None, repr=False
+    )
+
 
 @dataclass
 class ChaisemartinDHaultfoeuilleResults:
@@ -354,7 +369,16 @@ class ChaisemartinDHaultfoeuilleResults:
     cost_benefit_delta : dict, optional
         Cost-benefit aggregate ``delta``. Populated when ``L_max >= 2``.
     sup_t_bands : dict, optional
-        Phase 2 placeholder (sup-t simultaneous confidence bands).
+        Sup-t simultaneous confidence-band metadata for the OVERALL
+        event-study surface. Holds ``{"crit_value": float, "alpha":
+        float, "n_bootstrap": int, "method": str}``. Populated when
+        ``n_bootstrap > 0`` AND there are at least 2 valid horizons
+        with finite bootstrap SE > 0 AND a strict majority (more than
+        50%) of sup-t draws are finite. The band itself is written
+        per-horizon as
+        ``cband_conf_int`` on ``event_study_effects[l]``. ``None``
+        otherwise. Python-only library extension; R
+        ``did_multiplegt_dyn`` provides no joint / sup-t bands.
     covariate_residuals : pd.DataFrame, optional
         ``DID^X`` first-stage diagnostics: per-baseline ``theta_hat``,
         ``n_obs``, and ``r_squared``. Populated when ``controls`` is set.
@@ -394,6 +418,27 @@ class ChaisemartinDHaultfoeuilleResults:
         cohort-sharing SE deviation from R documented for
         ``path_effects``. See REGISTRY.md
         ``Note (Phase 3 by_path ...)`` → "Per-path placebos".
+    path_sup_t_bands : dict, optional
+        Per-path joint sup-t simultaneous-band metadata, keyed by
+        observed treatment trajectory (tuple of int). Each entry holds
+        ``{"crit_value": float, "alpha": float, "n_bootstrap": int,
+        "method": str, "n_valid_horizons": int}``. Populated when
+        ``by_path`` is a positive int AND ``n_bootstrap > 0``. The
+        band itself is applied per-horizon as ``cband_conf_int`` on
+        ``path_effects[path]["horizons"][l]``. Empty-state contract:
+        ``None`` when not requested (no bootstrap or ``by_path is None``);
+        ``{}`` when requested but no path passed both gates (``>=2``
+        valid horizons with finite bootstrap SE ``> 0`` AND a strict
+        majority — more than 50% — of finite sup-t draws). Bands
+        cover joint inference WITHIN a
+        single path across horizons; they do NOT provide simultaneous
+        coverage across paths. Inherits the cross-path cohort-sharing
+        SE deviation from R documented for ``path_effects`` (the
+        bootstrap SE used as the t-stat denominator carries the same
+        deviation). Python-only library extension; R
+        ``did_multiplegt_dyn`` provides no joint / sup-t bands at any
+        surface. See REGISTRY.md ``Note (Phase 3 by_path per-path
+        joint sup-t bands)``.
     honest_did_results : HonestDiDResults, optional
         HonestDiD sensitivity analysis bounds (Rambachan & Roth 2023).
         Populated when ``honest_did=True`` in ``fit()`` or by calling
@@ -505,6 +550,23 @@ class ChaisemartinDHaultfoeuilleResults:
     path_placebo_event_study: Optional[Dict[Tuple[int, ...], Dict[int, Dict[str, Any]]]] = field(
         default=None, repr=False
     )
+    # Per-path joint sup-t simultaneous-band metadata. Keyed by path
+    # tuple; each entry holds `{"crit_value", "alpha", "n_bootstrap",
+    # "method", "n_valid_horizons"}`. Populated when `by_path` is a
+    # positive int AND `n_bootstrap > 0`. The joint band itself is
+    # written per-horizon as `cband_conf_int` on
+    # `path_effects[path]["horizons"][l]` (mirrors the OVERALL
+    # `event_study_effects[l]["cband_conf_int"]` pattern at
+    # `chaisemartin_dhaultfoeuille.py:2865-2875`). Empty-state contract:
+    # `None` when not requested (no bootstrap or `by_path is None`); `{}`
+    # when requested but no path passed both gates (>=2 valid horizons
+    # AND a strict majority — more than 50% — of finite sup-t draws).
+    # The bands cover joint inference
+    # WITHIN a single path across horizons; they do NOT provide
+    # simultaneous coverage across paths.
+    path_sup_t_bands: Optional[Dict[Tuple[int, ...], Dict[str, Any]]] = field(
+        default=None, repr=False
+    )
     honest_did_results: Optional["HonestDiDResults"] = field(default=None, repr=False)
 
     # --- Repr-suppressed metadata ---
@@ -783,6 +845,9 @@ def summary(self, alpha: Optional[float] = None) -> str:
             for entry in self.path_placebo_event_study.values()
             for h in entry.values()
         )
+        path_sup_t_has_finite_crit = self.path_sup_t_bands is not None and any(
+            np.isfinite(v.get("crit_value", np.nan)) for v in self.path_sup_t_bands.values()
+        )
         any_finite_bootstrap_inference = (
             np.isfinite(self.overall_se)
             or event_study_has_finite_bootstrap_se
@@ -790,6 +855,7 @@ def summary(self, alpha: Optional[float] = None) -> str:
             or leavers_has_finite_bootstrap_se
             or path_effects_has_finite_bootstrap_se
             or path_placebo_has_finite_bootstrap_se
+            or path_sup_t_has_finite_crit
         )
         if self.bootstrap_results is not None and np.isfinite(self.overall_se) and not is_delta:
             lines.append("Note: p-value and CI are multiplier-bootstrap percentile inference")
@@ -823,6 +889,8 @@ def summary(self, alpha: Optional[float] = None) -> str:
                 live_targets.append("per-path")
             if path_placebo_has_finite_bootstrap_se:
                 live_targets.append("per-path placebo")
+            if path_sup_t_has_finite_crit:
+                live_targets.append("per-path sup-t")
             lines.append(
                 f"Note: bootstrap ({self.bootstrap_results.n_bootstrap} iterations) "
                 f"produced non-finite SE on the overall/event-study target; "
@@ -1219,6 +1287,16 @@ def _render_path_effects_section(
                         h["p_value"],
                     )
                 )
+            # Per-path joint sup-t critical value (when populated).
+            # Mirrors the OVERALL sup-t crit print at line ~1019.
+            if self.path_sup_t_bands is not None and path in self.path_sup_t_bands:
+                crit_p = self.path_sup_t_bands[path].get("crit_value", np.nan)
+                if np.isfinite(crit_p):
+                    conf_level = int((1 - self.alpha) * 100)
+                    lines.append(
+                        f"  Sup-t critical value: {crit_p:.4f} "
+                        f"(simultaneous {conf_level}% bands)"
+                    )
             lines.extend([thin])
         lines.extend([""])
 
diff --git a/docs/api/chaisemartin_dhaultfoeuille.rst b/docs/api/chaisemartin_dhaultfoeuille.rst
index 2e6cd093..576c5c6d 100644
--- a/docs/api/chaisemartin_dhaultfoeuille.rst
+++ b/docs/api/chaisemartin_dhaultfoeuille.rst
@@ -16,7 +16,10 @@ covariate adjustment (``controls``); group-specific linear trends
 heterogeneity testing; non-binary treatment; HonestDiD sensitivity
 integration on placebos; survey support via Taylor-series linearization
 (pweight + strata/PSU/FPC); and per-path event-study disaggregation via
-``by_path=k`` (mirrors R ``did_multiplegt_dyn(..., by_path=k)``).
+``by_path=k`` (mirrors R ``did_multiplegt_dyn(..., by_path=k)``,
+including per-path backward placebos), plus per-path joint sup-t
+simultaneous bands when ``n_bootstrap > 0`` (a Python-only extension;
+R provides no joint bands at any surface).
 
 The estimator:
 
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index e4424dc1..c50dac35 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -640,6 +640,8 @@ The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and
 
 - **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. 
This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs]`. Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. 
**R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. 
**Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. **SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. 
**Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. **Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. 
Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass).
 
+- **Note (Phase 3 `by_path` per-path joint sup-t bands):** When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are computed across horizons `1..L_max` within each path. **Methodology:** a single `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights`: Rademacher / Mammen / Webb) is drawn per path and broadcast across all valid horizons of that path, producing correlated bootstrap distributions across horizons within the path. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is then used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon, surfaced in `path_effects[path]["horizons"][l]["cband_conf_int"]` and at top-level `results.path_sup_t_bands[path] = {"crit_value", "alpha", "n_bootstrap", "method", "n_valid_horizons"}`. **Gates:** a path must have `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band; otherwise the path is absent from `path_sup_t_bands`. Both gates mirror the OVERALL `event_study_sup_t_bands` semantics at `chaisemartin_dhaultfoeuille_bootstrap.py:605,612`: `len(valid_horizons) >= 2` AND `finite_mask.sum() > 0.5 * n_bootstrap`. Exactly half-finite draws are NOT enough; the gate is strictly greater than half. **Empty-state contract:** `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates (either `path_effects == {}` upstream or every path fails the gates downstream). **Methodology asymmetry vs OVERALL:** OVERALL sup-t reuses the same multi-horizon shared-draw distribution for both the SE in the t-stat denominator and the bootstrap distribution in the numerator. The per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws, so the numerator uses the fresh shared draws while the denominator uses the bootstrap SEs from the earlier independent draws. 
Asymptotically equivalent to OVERALL's self-consistent reuse, but NOT bit-identical. The fresh draw is intentional: it preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable post-implementation. **Inherited deviation from R:** the bootstrap SE used as the t-stat denominator carries the cross-path cohort-sharing SE deviation from R documented for `path_effects` above; the per-path sup-t crit therefore inherits the same deviation. **Interpretation:** the band covers joint inference *within a single path across horizons*; it does NOT provide simultaneous coverage *across paths* (a different inference target requiring a `path × horizon` re-derivation, deferred to a future wave). **Deviation from R:** `did_multiplegt_dyn` provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension, consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Regression test anchor: `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands`.
+
 **Reference implementation(s):**
 - R: [`DIDmultiplegtDYN`](https://cran.r-project.org/package=DIDmultiplegtDYN) (CRAN, maintained by the paper authors). The Python implementation matches `did_multiplegt_dyn(..., effects=1)` at horizon `l = 1`. Parity tests live in `tests/test_chaisemartin_dhaultfoeuille_parity.py`.
 - Stata: `did_multiplegt_dyn` (SSC, also maintained by the paper authors).
diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py
index cedb90e6..80e914c7 100644
--- a/tests/test_chaisemartin_dhaultfoeuille.py
+++ b/tests/test_chaisemartin_dhaultfoeuille.py
@@ -5485,3 +5485,478 @@ def test_bootstrap_seed_reproducibility(self):
                             f"path={path} lag={lag_key}: seed-pinned SEs "
                             f"diverge: {entry_a['se']} vs {entry_b['se']}"
                         )
+
+
+@pytest.mark.slow
+class TestByPathSupTBands:
+    """``by_path`` combined with ``n_bootstrap > 0`` — per-path joint
+    sup-t simultaneous confidence bands across horizons ``1..L_max``
+    within each path.
+
+    A single shared ``(n_bootstrap, n_eligible)`` multiplier weight
+    matrix (using the estimator's configured ``bootstrap_weights`` —
+    Rademacher / Mammen / Webb) is drawn per path and broadcast across
+    all valid horizons of that path (``finite bootstrap SE > 0``),
+    producing correlated bootstrap distributions across horizons within
+    the path.
+    The path-specific critical value
+    ``c_p = quantile(max_l |t_l|, 1-alpha)`` is then used to construct
+    symmetric joint bands ``effect_l ± c_p · se_l`` per horizon.
+
+    Mirrors the existing OVERALL ``event_study_sup_t_bands`` pattern at
+    ``chaisemartin_dhaultfoeuille_bootstrap.py:599-614``, just stratified
+    by path. Methodology asymmetry (intentional): per-path sup-t draws
+    fresh shared weights AFTER the per-path SE block has populated
+    ``results.path_ses`` via independent per-(path, horizon) draws.
+    Asymptotically equivalent to OVERALL's self-consistent reuse, but
+    NOT bit-identical. See REGISTRY.md for the full contract.
+
+    Marked ``@pytest.mark.slow`` because most tests run a real bootstrap
+    with at least 200 draws; the exceptions are the no-bootstrap test
+    and the strict-majority boundary test (``n_bootstrap=4``).
+    """
+
+    def _fit_with_bootstrap(
+        self,
+        data,
+        by_path: int,
+        L_max: int = 3,
+        n_bootstrap: int = 200,
+        bootstrap_weights: str = "rademacher",
+        seed: int = 42,
+        placebo: bool = False,
+    ):
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", UserWarning)
+            est = ChaisemartinDHaultfoeuille(
+                drop_larger_lower=False,
+                by_path=by_path,
+                n_bootstrap=n_bootstrap,
+                bootstrap_weights=bootstrap_weights,
+                seed=seed,
+                twfe_diagnostic=False,
+                placebo=placebo,
+            )
+            results = est.fit(
+                data,
+                outcome="outcome",
+                group="group",
+                time="period",
+                treatment="treatment",
+                L_max=L_max,
+            )
+        return est, results
+
+    def test_path_sup_t_bands_attr_none_when_no_bootstrap(self):
+        """``n_bootstrap=0`` -> ``results.path_sup_t_bands is None``."""
+        data = _by_path_three_path_data()
+        _est, res = _fit_by_path(data, by_path=2, L_max=3)
+        assert res.path_sup_t_bands is None
+
+    def test_path_sup_t_bands_attr_none_when_no_by_path(self):
+        """``by_path=None`` -> ``results.path_sup_t_bands is None``
+        even with bootstrap active."""
+        data = _by_path_three_path_data()
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", UserWarning)
+            est = ChaisemartinDHaultfoeuille(
+                drop_larger_lower=False,
+                by_path=None,
+                n_bootstrap=200,
+                seed=42,
+                twfe_diagnostic=False,
+            )
+            res = est.fit(
+                data,
+                outcome="outcome",
+                group="group",
+                time="period",
+                treatment="treatment",
+                L_max=3,
+            )
+        assert res.path_sup_t_bands is None
+
+    def test_path_sup_t_bands_keys_match_path_effects_with_finite_crit(self):
+        """For each path with >=2 horizons that have finite bootstrap
+        SE > 0, any entry in ``path_sup_t_bands`` has a finite
+        ``crit_value``. Paths with <2 valid horizons are absent."""
+        data = _by_path_three_path_data()
+        _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200)
+        assert res.path_sup_t_bands is not None
+        # For each path: count finite bootstrap SEs across its horizons.
+        # If >=2 are finite, the path should be in path_sup_t_bands with
+        # a finite crit; otherwise it should be absent.
+        for path, entry in res.path_effects.items():
+            n_valid = sum(
+                1
+                for h in entry["horizons"].values()
+                if np.isfinite(h["se"]) and h["se"] > 0
+            )
+            if n_valid >= 2:
+                # Expected to be present. If it is absent, the
+                # strict-majority finite-draws gate rejected it; that
+                # case is tolerated (not asserted) because the gate is
+                # a methodology safety net.
+                if path in res.path_sup_t_bands:
+                    crit = res.path_sup_t_bands[path]["crit_value"]
+                    assert np.isfinite(crit), (
+                        f"path={path}: present in path_sup_t_bands but "
+                        f"crit_value is non-finite: {crit}"
+                    )
+            else:
+                assert path not in res.path_sup_t_bands, (
+                    f"path={path} has only {n_valid} valid horizons; "
+                    f"should be absent from path_sup_t_bands per the "
+                    f">=2 horizons gate"
+                )
+
+    def test_path_sup_t_band_wider_than_pointwise(self):
+        """Per-path joint band must be at least as wide as the marginal
+        CI for every (path, horizon) where both are populated. Mirrors
+        the OVERALL invariant ``test_cband_wider_than_pointwise`` at
+        ``:2235``.
+        """
+        data = _by_path_three_path_data()
+        _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=400)
+        assert res.path_sup_t_bands, "Need at least one path with a finite crit"
+        any_band_checked = False
+        for path, entry in res.path_effects.items():
+            if path not in res.path_sup_t_bands:
+                continue
+            for l_h, h in entry["horizons"].items():
+                cband = h.get("cband_conf_int")
+                if cband is None:
+                    continue
+                pw_ci = h["conf_int"]
+                if not (np.isfinite(pw_ci[0]) and np.isfinite(pw_ci[1])):
+                    continue
+                # Joint band must be at least as wide as marginal.
+                # The 1e-10 slack absorbs floating-point rounding only.
+                assert cband[0] <= pw_ci[0] + 1e-10, (
+                    f"path={path} l={l_h}: cband_lower {cband[0]} > "
+                    f"conf_int_lower {pw_ci[0]} - violates joint >= marginal"
+                )
+                assert cband[1] >= pw_ci[1] - 1e-10, (
+                    f"path={path} l={l_h}: cband_upper {cband[1]} < "
+                    f"conf_int_upper {pw_ci[1]} - violates joint >= marginal"
+                )
+                any_band_checked = True
+        assert any_band_checked, "Expected at least one path/horizon with a populated cband"
+
+    def test_path_sup_t_crit_finite_and_positive(self):
+        """For every path with a populated entry, ``crit_value`` is
+        finite and strictly positive. The wider-than-pointwise
+        invariant (above) is the stronger statement; this test pins
+        the per-path entry's basic shape (alpha / n_bootstrap / method
+        / n_valid_horizons round-trip)."""
+        data = _by_path_three_path_data()
+        _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200)
+        assert res.path_sup_t_bands
+        for path, entry in res.path_sup_t_bands.items():
+            crit = entry["crit_value"]
+            assert np.isfinite(crit), f"path={path}: crit_value not finite ({crit})"
+            assert crit > 0, f"path={path}: crit_value not positive ({crit})"
+            assert entry["alpha"] == 0.05
+            assert entry["n_bootstrap"] == 200
+            assert entry["method"] == "multiplier_bootstrap"
+            assert entry["n_valid_horizons"] >= 2
+
+    def test_path_sup_t_seed_reproducibility(self):
+        """Same seed -> bit-identical ``crit_value`` for every path."""
+        data = _by_path_three_path_data()
+        _est_a, res_a = self._fit_with_bootstrap(
+            data, by_path=3, L_max=3, n_bootstrap=200, seed=42
+        )
+        _est_b, res_b = self._fit_with_bootstrap(
+            data, by_path=3, L_max=3, n_bootstrap=200, seed=42
+        )
+        assert res_a.path_sup_t_bands is not None
+        assert res_b.path_sup_t_bands is not None
+        assert set(res_a.path_sup_t_bands.keys()) == set(res_b.path_sup_t_bands.keys())
+        for path in res_a.path_sup_t_bands:
+            crit_a = res_a.path_sup_t_bands[path]["crit_value"]
+            crit_b = res_b.path_sup_t_bands[path]["crit_value"]
+            assert crit_a == crit_b, (
+                f"path={path}: seed-pinned crits diverge: {crit_a} vs {crit_b}"
+            )
+
+    def test_path_sup_t_skipped_when_path_has_only_one_valid_horizon(self):
+        """A path with only 1 valid horizon (degenerate cohort at later
+        horizons) is absent from ``path_sup_t_bands`` per the >=2 gate.
+
+        Uses the standard fixture and walks the result to find any
+        path with <2 finite bootstrap SE horizons, asserting it's
+        absent from path_sup_t_bands.
+        """
+        data = _by_path_three_path_data()
+        _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200)
+        assert res.path_sup_t_bands is not None
+        single_horizon_paths = [
+            path
+            for path, entry in res.path_effects.items()
+            if sum(
+                1
+                for h in entry["horizons"].values()
+                if np.isfinite(h["se"]) and h["se"] > 0
+            )
+            < 2
+        ]
+        for path in single_horizon_paths:
+            assert path not in res.path_sup_t_bands, (
+                f"path={path} has <2 valid horizons; should be absent "
+                f"from path_sup_t_bands"
+            )
+            # And no horizon should have cband_conf_int populated.
+            for l_h, h in res.path_effects[path]["horizons"].items():
+                assert "cband_conf_int" not in h, (
+                    f"path={path} l={l_h}: cband_conf_int written despite "
+                    f"path being absent from path_sup_t_bands"
+                )
+
+    def test_path_sup_t_skipped_at_L_max_1(self):
+        """At ``L_max=1`` every path has at most 1 valid horizon; the
+        >=2 horizons gate rejects every path so ``path_sup_t_bands ==
+        {}``. Replaces the H=1 normal-reduction test: at L_max=1 the
+        joint surface is correctly absent rather than collapsing to a
+        normal quantile."""
+        data = _by_path_three_path_data()
+        _est, res = self._fit_with_bootstrap(data, by_path=2, L_max=1, n_bootstrap=200)
+        # Bootstrap ran with by_path so dict is initialized; gate
+        # rejected every path so dict is empty.
+        assert res.path_sup_t_bands == {}, (
+            f"Expected path_sup_t_bands == {{}} at L_max=1 (no path has "
+            f">=2 horizons); got {res.path_sup_t_bands}"
+        )
+        # No horizon should have cband_conf_int.
+        for path, entry in res.path_effects.items():
+            for l_h, h in entry["horizons"].items():
+                assert "cband_conf_int" not in h, (
+                    f"path={path} l={l_h}: cband_conf_int written at "
+                    f"L_max=1 despite path_sup_t_bands == {{}}"
+                )
+
+    def test_path_sup_t_n_valid_horizons_matches(self):
+        """``n_valid_horizons`` field equals the count of finite-SE
+        horizons (bootstrap SE > 0) under each path."""
+        data = _by_path_three_path_data()
+        _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200)
+        assert res.path_sup_t_bands
+        br = res.bootstrap_results
+        assert br is not None and br.path_ses is not None
+        for path, entry in res.path_sup_t_bands.items():
+            n_claimed = entry["n_valid_horizons"]
+            n_actual = sum(
+                1
+                for l_h, bs_se in br.path_ses.get(path, {}).items()
+                if np.isfinite(bs_se) and bs_se > 0
+            )
+            assert n_claimed == n_actual, (
+                f"path={path}: n_valid_horizons claimed {n_claimed} but "
+                f"counted {n_actual} finite bootstrap SE horizons"
+            )
+
+    def test_path_sup_t_absent_path_has_no_cband_keys(self):
+        """Library-wide NaN-on-invalid contract: when a path is absent
+        from ``path_sup_t_bands`` (fewer than 2 valid horizons OR
+        <=50% finite sup-t draws, i.e., the strict-majority gate fails),
+        no horizon under that path receives a ``cband_conf_int`` key.
+        Mirrors OVERALL absent-key pattern at
+        ``chaisemartin_dhaultfoeuille.py:2865-2875``.
+
+        Uses ``L_max=1`` to deterministically force ``path_sup_t_bands
+        == {}`` (every path has only 1 horizon, so the >=2 gate fails
+        for all paths) and verifies no horizon writes a cband.
+        """
+        data = _by_path_three_path_data()
+        _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=1, n_bootstrap=200)
+        assert res.path_sup_t_bands == {}
+        for path, entry in res.path_effects.items():
+            for l_h, h in entry["horizons"].items():
+                assert "cband_conf_int" not in h, (
+                    f"path={path} l={l_h}: cband_conf_int present despite "
+                    f"path being absent from path_sup_t_bands "
+                    f"(violates NaN-on-invalid absent-key contract)"
+                )
+
+    def test_path_sup_t_band_renders_in_summary(self):
+        """``summary()`` text includes 'Sup-t critical value:' once per
+        path with a finite crit (mirroring the OVERALL crit print)."""
+        data = _by_path_three_path_data()
+        _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200)
+        assert res.path_sup_t_bands
+        s = res.summary()
+        n_finite_paths = sum(
+            1
+            for entry in res.path_sup_t_bands.values()
+            if np.isfinite(entry.get("crit_value", np.nan))
+        )
+        # The OVERALL surface may print its own "Sup-t critical value:"
+        # line, so the per-path block contributes n_finite_paths
+        # occurrences on top of that.
+        n_occurrences = s.count("Sup-t critical value:")
+        # >= because OVERALL may or may not print depending on its own
+        # finite-horizon count; the per-path block should add at least
+        # n_finite_paths occurrences.
+        assert n_occurrences >= n_finite_paths, (
+            f"Expected at least {n_finite_paths} 'Sup-t critical value:' "
+            f"strings in summary (one per path with finite crit), got "
+            f"{n_occurrences}"
+        )
+
+    def test_path_sup_t_bands_empty_dict_when_no_complete_window(self):
+        """When ``by_path + n_bootstrap > 0`` is requested but every
+        switcher's window falls outside the panel (so
+        ``path_effects == {}``), ``path_sup_t_bands`` must be ``{}``
+        (not ``None``). Mirrors the documented empty-state contract that
+        distinguishes "feature not requested" from "requested but
+        empty" (see ``test_empty_path_surface_when_no_complete_window``
+        for the analytical sibling at ``:4015+``).
+
+        This is the regression test for the requested-but-empty
+        sentinel on the new sup-t surface.
+        """
+        rng = np.random.default_rng(0)
+        rows = []
+        # Switchers switch at t=3 with L_max=3 -> window [2, 5] falls
+        # past the 4-period panel. Same construction as the analytical
+        # empty-window test at :4015+.
+        for g in (1, 2, 3, 4):
+            for t in range(4):
+                d = 1 if t >= 3 else 0
+                rows.append(
+                    {"group": g, "period": t, "treatment": d, "outcome": rng.normal()}
+                )
+        for g in (5, 6):
+            for t in range(4):
+                rows.append(
+                    {"group": g, "period": t, "treatment": 0, "outcome": rng.normal()}
+                )
+        data = pd.DataFrame(rows)
+
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", UserWarning)
+            est = ChaisemartinDHaultfoeuille(
+                drop_larger_lower=False,
+                by_path=3,
+                n_bootstrap=200,
+                seed=42,
+                twfe_diagnostic=False,
+                placebo=False,
+            )
+            res = est.fit(
+                data,
+                outcome="outcome",
+                group="group",
+                time="period",
+                treatment="treatment",
+                L_max=3,
+            )
+
+        # Empty-state contract: requested but empty -> {} not None.
+        assert res.path_effects == {}, (
+            f"Expected path_effects == {{}} on no-complete-window panel; "
+            f"got {res.path_effects}"
+        )
+        assert res.path_sup_t_bands == {}, (
+            f"Expected path_sup_t_bands == {{}} (not None) when "
+            f"by_path + n_bootstrap is active but path_effects == {{}}; "
+            f"got {res.path_sup_t_bands}. This violates the documented "
+            f"None-vs-{{}} empty-state contract."
+        )
+        # Sanity: no path_effects entries means no horizons exist, but
+        # also nothing should write cband_conf_int into anything.
+        # (Iterating over empty dict is a no-op; this just pins the
+        # invariant explicitly.)
+        for path, entry in res.path_effects.items():  # pragma: no cover
+            for l_h, h in entry["horizons"].items():
+                assert "cband_conf_int" not in h
+
+    def test_path_sup_t_strict_majority_gate_at_exact_50pct(self, monkeypatch):
+        """The 50%-finite-draws gate is **strict majority**, not >=:
+        the implementation requires ``finite_mask.sum() > 0.5 *
+        n_bootstrap`` (mirrors OVERALL gate at
+        ``chaisemartin_dhaultfoeuille_bootstrap.py:612``). At exactly
+        50% finite draws the gate fails and the path is absent from
+        ``path_sup_t_bands``.
+
+        This forces the boundary by monkey-patching
+        ``_generate_psu_or_group_weights`` (used by both the OVERALL
+        and per-path sup-t blocks) to return NaN weights in exactly
+        half the bootstrap draws; those rows produce non-finite
+        ``boot_dist`` -> non-finite t-stats -> non-finite
+        ``sup_t_dist`` entries. With ``n_bootstrap=4`` and 2 NaN
+        rows, ``finite_mask.sum() == 2 == 0.5 * 4``, the gate
+        ``2 > 2.0`` is False, and the path is skipped.
+
+        Pins the prose contract documented in REGISTRY.md and the
+        result-class docstring: "strict majority (more than 50%) of
+        finite sup-t draws".
+        """
+        from diff_diff import chaisemartin_dhaultfoeuille_bootstrap as bs_mod
+
+        original_generator = bs_mod._generate_psu_or_group_weights
+
+        def fake_generator(
+            n_bootstrap, n_groups_target, weight_type, rng, group_to_psu_map
+        ):
+            # Call the original to get a sane base, then inject NaN into
+            # exactly half of the bootstrap rows. The NaN propagates
+            # through `weights @ u_centered` -> NaN deviations -> NaN
+            # boot_dist -> NaN t-stats -> NaN sup_t entries, so
+            # `finite_mask.sum() == n_bootstrap - n_bootstrap // 2` (half if even).
+            base = original_generator(
+                n_bootstrap, n_groups_target, weight_type, rng, group_to_psu_map
+            )
+            n_poison = n_bootstrap // 2
+            base[:n_poison, :] = np.nan
+            return base
+
+        monkeypatch.setattr(bs_mod, "_generate_psu_or_group_weights", fake_generator)
+
+        data = _by_path_three_path_data()
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", (UserWarning, RuntimeWarning))
+            est = ChaisemartinDHaultfoeuille(
+                drop_larger_lower=False,
+                by_path=3,
+                n_bootstrap=4,
+                seed=42,
+                twfe_diagnostic=False,
+                placebo=False,
+            )
+            res = est.fit(
+                data,
+                outcome="outcome",
+                group="group",
+                time="period",
+                treatment="treatment",
+                L_max=3,
+            )
+
+        # At exactly 50% finite draws the strict-majority gate fails —
+        # no path passes, so the requested-but-empty surface is `{}`.
+        assert res.path_sup_t_bands == {}, (
+            f"Expected path_sup_t_bands == {{}} at exactly-50%-finite "
+            f"draws (strict-majority gate semantics); got "
+            f"{res.path_sup_t_bands}. This violates the documented "
+            f"`finite_mask.sum() > 0.5 * n_bootstrap` contract."
+        )
+        # And the OVERALL `sup_t_bands` is also None since the same
+        # patched generator drives the multi-horizon block (gate failure
+        # at exactly 50% finite draws there too).
+        assert res.sup_t_bands is None, (
+            f"Expected sup_t_bands is None at exactly-50%-finite draws "
+            f"on the OVERALL surface; got {res.sup_t_bands}"
+        )
+        # No horizon (per-path or overall) should have cband_conf_int.
+        for path, entry in res.path_effects.items():
+            for l_h, h in entry["horizons"].items():
+                assert "cband_conf_int" not in h, (
+                    f"path={path} l={l_h}: cband_conf_int written despite "
+                    f"strict-majority gate failure at exactly 50% finite"
+                )
+        for l_h, h in res.event_study_effects.items():
+            assert "cband_conf_int" not in h, (
+                f"l={l_h}: OVERALL cband_conf_int written despite "
+                f"strict-majority gate failure at exactly 50% finite"
+            )

From 2df79a00c36941d78007ea9ac7935a5129741425 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 25 Apr 2026 15:18:41 -0400
Subject: [PATCH 2/3] Self-audit: extend to_dataframe(level=by_path) with
 cband_lower/upper
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Cross-surface gap caught in self-audit: OVERALL `to_dataframe(level=
"event_study")` includes `cband_lower` / `cband_upper` columns
(`chaisemartin_dhaultfoeuille_results.py:1495-1496,1531-1532`) but the
per-path table at `level="by_path"` does not — even though per-path
now produces `cband_conf_int` writes via the new sup-t propagation
block. The CI reviewer did not flag this cross-surface asymmetry; a grep
audit of `cband_conf_int` consumers caught it.

Fix: extend `to_dataframe(level="by_path")` to emit the same two
columns. Populated for positive-horizon rows of paths with a finite
sup-t crit (read from `path_effects[path]["horizons"][l]
["cband_conf_int"]`); NaN for placebo rows (no joint band per the
positive-only sup-t spec), unbanded paths, and the requested-but-empty
fallback DataFrame (which now includes the columns in its canonical
schema).
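
Illustration only, not the shipped code: a minimal sketch of the NaN
fallback described above, using plain dicts in place of the results
object. The helper name `cband_columns` and the toy `path_effects`
payload are hypothetical; only the `cband_conf_int` key and the
`cband_lower` / `cband_upper` column names come from this patch.

```python
import math

# Fallback band for rows that carry no joint sup-t band.
NAN_BAND = (math.nan, math.nan)


def cband_columns(horizon_entry):
    """Map one horizon entry to its cband_lower / cband_upper cells."""
    band = horizon_entry.get("cband_conf_int", NAN_BAND)
    if not band:  # tolerate an explicit None or empty value
        band = NAN_BAND
    return {"cband_lower": band[0], "cband_upper": band[1]}


# Toy stand-in for results.path_effects: horizon 1 is banded,
# horizon 2 has no `cband_conf_int` key (for example, a gate failed).
path_effects = {
    (0, 1, 1): {
        "horizons": {
            1: {"effect": 0.5, "cband_conf_int": (0.1, 0.9)},
            2: {"effect": 0.7},
        }
    }
}

# Long-format rows, one per (path, horizon), as in level="by_path".
rows = [
    {"path": path, "horizon": l_h, "effect": h["effect"], **cband_columns(h)}
    for path, entry in path_effects.items()
    for l_h, h in sorted(entry["horizons"].items())
]
```

Every row carries both columns, so the table schema does not depend on
whether any path received a band.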

Tests added:
- `test_path_sup_t_to_dataframe_emits_cband_columns` — column
  presence + per-row alignment with the dict surface
- `test_path_sup_t_to_dataframe_empty_path_fallback_has_cband_columns`
  — empty-path fallback DataFrame schema parity

Docs updated:
- REGISTRY.md: `to_dataframe(level="by_path")` integration note added
  to the new sup-t Note; canonical column list in the existing
  `Note (Phase 3 by_path ...)` block extended with `cband_lower /
  cband_upper`
- CHANGELOG entry: surface listing now mentions to_dataframe columns
- `by_path` parameter docstring: rendering surface listing extended
- `path_sup_t_bands` Attributes docstring: rendering surface listing
  extended

Suite: 263 tests pass (was 261, +2 new tests).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 CHANGELOG.md                                  |  2 +-
 diff_diff/chaisemartin_dhaultfoeuille.py      |  7 +-
 .../chaisemartin_dhaultfoeuille_results.py    | 21 +++++-
 docs/methodology/REGISTRY.md                  |  4 +-
 tests/test_chaisemartin_dhaultfoeuille.py     | 74 +++++++++++++++++++
 5 files changed, 102 insertions(+), 6 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index a902c296..ff37dbc2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -9,7 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Added
 - **HAD linearity-family pretests under survey (Phase 4.5 C).** `stute_test`, `yatchew_hr_test`, `stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test`, and `did_had_pretest_workflow` now accept `weights=` / `survey=` keyword-only kwargs. Stute family uses **PSU-level Mammen multiplier bootstrap** via `bootstrap_utils.generate_survey_multiplier_weights_batch` (the same kernel as PR #363's HAD event-study sup-t bootstrap): each replicate draws an `(n_bootstrap, n_psu)` Mammen multiplier matrix, broadcast to per-obs perturbation `eta_obs[g] = eta_psu[psu(g)]`, weighted OLS refit, weighted CvM via new `_cvm_statistic_weighted` helper. Joint Stute SHARES the multiplier matrix across horizons within each replicate, preserving both the vector-valued empirical-process unit-level dependence AND PSU clustering. Yatchew uses **closed-form weighted OLS + pweight-sandwich variance components** (no bootstrap): `sigma2_lin = sum(w·eps²)/sum(w)`, `sigma2_diff = sum(w_avg·diff²)/(2·sum(w))` with arithmetic-mean pair weights `w_avg_g = (w_g+w_{g-1})/2`, `sigma4_W = sum(w_avg·prod)/sum(w_avg)`, `T_hr = sqrt(sum(w))·(sigma2_lin-sigma2_diff)/sigma2_W`. All three Yatchew components reduce bit-exactly to the unweighted formulas at `w=ones(G)` (locked at `atol=1e-14` by direct helper test). The pweight `weights=` shortcut routes through a synthetic trivial `ResolvedSurveyDesign` (new `survey._make_trivial_resolved` helper) so the same kernel handles both entry paths. `did_had_pretest_workflow(..., survey=, weights=)` removes the Phase 4.5 C0 `NotImplementedError`, dispatches to the survey-aware sub-tests, **skips the QUG step with `UserWarning`** (per C0 deferral), sets `qug=None` on the report, and appends a `"linearity-conditional verdict; QUG-under-survey deferred per Phase 4.5 C0"` suffix to the verdict. `HADPretestReport.qug` retyped from `QUGTestResults` to `Optional[QUGTestResults]`; `summary()` / `to_dict()` / `to_dataframe()` updated to None-tolerant rendering. 
Replicate-weight survey designs (BRR/Fay/JK1/JKn/SDR) raise `NotImplementedError` at every entry point (defense in depth, reciprocal-guard discipline) — parallel follow-up after this PR. **Stratified designs (`SurveyDesign(strata=...)`) also raise `NotImplementedError` on the Stute family** — the within-stratum demean + `sqrt(n_h/(n_h-1))` correction that the HAD sup-t bootstrap applies to match the Binder-TSL stratified target has not been derived for the Stute CvM functional, so applying raw multipliers from `generate_survey_multiplier_weights_batch` directly to residual perturbations would leave the bootstrap p-value silently miscalibrated. Phase 4.5 C narrows survey support to **pweight-only**, **PSU-only** (`SurveyDesign(weights=, psu=)`), and **FPC-only** (`SurveyDesign(weights=, fpc=)`) designs; stratified is a follow-up after the matching Stute-CvM stratified-correction derivation lands. Strictly positive weights required on Yatchew (the adjacent-difference variance is undefined under contiguous-zero blocks). Per-row `weights=` / `survey=col` aggregated to per-unit via existing HAD helpers `_aggregate_unit_weights` / `_aggregate_unit_resolved_survey` (constant-within-unit invariant enforced). Unweighted code paths preserved bit-exactly. Patch-level addition (additive on stable surfaces). See `docs/methodology/REGISTRY.md` § "QUG Null Test" — Note (Phase 4.5 C) for the full methodology.
-- **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`) and as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`. Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. **Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. 
Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract.
+- **`ChaisemartinDHaultfoeuille.by_path` + `n_bootstrap > 0` joint sup-t bands** — per-path joint sup-t simultaneous confidence intervals across horizons `1..L_max` within each path. A single shared `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon. Surfaced on `results.path_sup_t_bands` (dict keyed by path tuple, each entry with `crit_value / alpha / n_bootstrap / method / n_valid_horizons`); as `cband_conf_int` per horizon entry on `path_effects[path]["horizons"][l]`; and as `cband_lower` / `cband_upper` columns on `results.to_dataframe(level="by_path")` (mirrors the OVERALL `level="event_study"` schema; positive-horizon rows of banded paths get populated values, placebo / unbanded / empty-window rows get NaN). Gates: a path needs `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band. Empty-state contract: `path_sup_t_bands is None` when not requested; `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL `event_study_sup_t_bands`:** the per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — asymptotically equivalent to OVERALL's self-consistent reuse but NOT bit-identical. Documented intentional choice to preserve RNG-state isolation for existing per-path SE seed-reproducibility tests. Inherits the cross-path cohort-sharing SE deviation from R documented for `path_effects`. 
**Deviation from R:** `did_multiplegt_dyn` does not provide joint / sup-t bands at any surface — this is a Python-only methodology extension consistent with the existing OVERALL sup-t bands (also Python-only). Bands cover joint inference WITHIN a single path across horizons; they do NOT provide simultaneous coverage across paths. Pre-audit fix bundled: stale "Phase 2 placeholder" docstring on the existing `sup_t_bands` field updated to the actual contract description. Tests at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands` (`@pytest.mark.slow`). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path joint sup-t bands)` for the full contract.
 - **`ChaisemartinDHaultfoeuille.by_path` + `placebo=True`** — per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max`. The same per-path SE convention used for the event-study (joiners/leavers IF precedent: switcher-side contributions zeroed for non-path groups; cohort structure and control pool unchanged; plug-in SE with path-specific divisor `N^{pl}_{l, path}`) is applied to backward horizons via the new `switcher_subset_mask` parameter on `_compute_per_group_if_placebo_horizon`. Surfaced on `results.path_placebo_event_study[path][-l]` (negative-int inner keys mirroring `placebo_event_study`); `summary()` renders the rows alongside per-path event-study horizons; `to_dataframe(level="by_path")` emits negative-horizon rows alongside the existing positive-horizon rows. **Bootstrap** (when `n_bootstrap > 0`) propagates per-`(path, lag)` percentile CI / p-value through the same `_bootstrap_one_target` dispatch as the per-path event-study, with the canonical NaN-on-invalid contract enforced on the new surface (PR #364 library-wide invariant). **SE inherits the cross-path cohort-sharing deviation from R** documented for `path_effects` (full-panel cohort-centered plug-in vs R's per-path re-run): tracks R within tolerance on single-path-cohort panels, diverges materially on cohort-mixed panels — the bootstrap SE is a Monte Carlo analog of the analytical SE and inherits the same deviation. R-parity confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the new `multi_path_reversible_by_path_placebo` scenario (point estimates exact match; SE within Phase-2 envelope rtol ≤ 5%); positive analytical + bootstrap invariants at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (and the gated `::TestBootstrap` subclass). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path ...)` → "Per-path placebos" for the full contract.
 - **Tutorial 19: dCDH for Marketing Pulse Campaigns** (`docs/tutorials/19_dcdh_marketing_pulse.ipynb`) — end-to-end practitioner walkthrough on a 60-market reversible-treatment panel covering the TWFE decomposition diagnostic (`twowayfeweights`), `DCDH` Phase 1 (DID_M, joiners-vs-leavers, single-lag placebo), the `L_max` multi-horizon event study with multiplier bootstrap, a stakeholder communication template, and drift guards. README listing for Tutorial 17 (Brand Awareness Survey) backfilled in the same edit. Cross-link from `docs/practitioner_decision_tree.rst` § "Reversible Treatment" added.
 
diff --git a/diff_diff/chaisemartin_dhaultfoeuille.py b/diff_diff/chaisemartin_dhaultfoeuille.py
index f626199b..2d11e252 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille.py
@@ -437,9 +437,12 @@ class ChaisemartinDHaultfoeuille(ChaisemartinDHaultfoeuilleBootstrapMixin):
         ``c_p`` (constructed from a fresh shared-weights multiplier-
         bootstrap draw per path) is surfaced at top level as
         ``results.path_sup_t_bands[path] = {"crit_value", "alpha",
-        "n_bootstrap", "method", "n_valid_horizons"}`` and applied
+        "n_bootstrap", "method", "n_valid_horizons"}``, applied
         per-horizon as ``cband_conf_int`` on
-        ``path_effects[path]["horizons"][l]``. Bands cover joint
+        ``path_effects[path]["horizons"][l]``, and rendered as
+        ``cband_lower`` / ``cband_upper`` columns on
+        ``results.to_dataframe(level="by_path")`` (mirroring the
+        OVERALL ``level="event_study"`` schema). Bands cover joint
         inference WITHIN a single path across horizons; they do NOT
         provide simultaneous coverage across paths. Python-only
         library extension; R ``did_multiplegt_dyn`` provides no joint
diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py
index c464abc8..e93db4bd 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille_results.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille_results.py
@@ -425,7 +425,9 @@ class ChaisemartinDHaultfoeuilleResults:
         "method": str, "n_valid_horizons": int}``. Populated when
         ``by_path`` is a positive int AND ``n_bootstrap > 0``. The
         band itself is applied per-horizon as ``cband_conf_int`` on
-        ``path_effects[path]["horizons"][l]``. Empty-state contract:
+        ``path_effects[path]["horizons"][l]`` and rendered as
+        ``cband_lower`` / ``cband_upper`` columns on
+        ``to_dataframe(level="by_path")``. Empty-state contract:
         ``None`` when not requested (no bootstrap or ``by_path is None``);
         ``{}`` when requested but no path passed both gates (``>=2``
         valid horizons with finite bootstrap SE ``> 0`` AND a strict
@@ -1632,6 +1634,8 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
                         "conf_int_lower",
                         "conf_int_upper",
                         "n_obs",
+                        "cband_lower",
+                        "cband_upper",
                     ]
                 )
             rows = []
@@ -1655,6 +1659,12 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
                 )
                 for lag_key in sorted(placebo_horizons.keys()):
                     ph_entry = placebo_horizons[lag_key]
+                    # Placebos do not get joint sup-t bands in this
+                    # release (only positive event-study horizons do —
+                    # mirrors OVERALL placebo / event-study sup-t
+                    # convention). Emit NaN cband columns for schema
+                    # parity with the OVERALL level="event_study" table.
+                    ph_cband = ph_entry.get("cband_conf_int", (np.nan, np.nan))
                     rows.append(
                         {
                             "path": path,
@@ -1668,10 +1678,17 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
                             "conf_int_lower": ph_entry["conf_int"][0],
                             "conf_int_upper": ph_entry["conf_int"][1],
                             "n_obs": ph_entry["n_obs"],
+                            "cband_lower": ph_cband[0] if ph_cband else np.nan,
+                            "cband_upper": ph_cband[1] if ph_cband else np.nan,
                         }
                     )
                 for l_h in sorted(horizons.keys()):
                     h_entry = horizons[l_h]
+                    # Per-path joint sup-t band (when populated) mirrors
+                    # OVERALL `level="event_study"` cband emission. Absent
+                    # key / missing path entry -> NaN columns. Pinned at
+                    # `TestByPathSupTBands::test_path_sup_t_to_dataframe_emits_cband_columns`.
+                    h_cband = h_entry.get("cband_conf_int", (np.nan, np.nan))
                     rows.append(
                         {
                             "path": path,
@@ -1685,6 +1702,8 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
                             "conf_int_lower": h_entry["conf_int"][0],
                             "conf_int_upper": h_entry["conf_int"][1],
                             "n_obs": h_entry["n_obs"],
+                            "cband_lower": h_cband[0] if h_cband else np.nan,
+                            "cband_upper": h_cband[1] if h_cband else np.nan,
                         }
                     )
             return pd.DataFrame(rows)
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
index c50dac35..1f7bc9d3 100644
--- a/docs/methodology/REGISTRY.md
+++ b/docs/methodology/REGISTRY.md
@@ -638,9 +638,9 @@ The guard is fired by `_survey_se_from_group_if` (analytical and replicate) and
 
 - **Note (Phase 3 Design-2 switch-in/switch-out):** Convenience wrapper for Web Appendix Section 1.6 (Assumption 16). Identifies groups with exactly 2 treatment changes (join then leave), reports switch-in and switch-out mean effects. This is a descriptive summary, not a full re-estimation with specialized control pools as described in the paper. **Always uses raw (unadjusted) outcomes** regardless of active `controls`, `trends_linear`, or `trends_nonparam` options - those adjustments apply to the main estimator surface but not to the Design-2 descriptive block. For full adjusted Design-2 estimation with proper control pools, the paper recommends "running the command on a restricted subsample and using `trends_nonparam` for the entry-timing grouping." Activated via `design2=True` in `fit()`, requires `drop_larger_lower=False` to retain 2-switch groups.
 
-- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_{g-1}, F_{g-1+l}]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. 
This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs]`. Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. 
**R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. 
**Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. **SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. 
**Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. **Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. 
Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass).
+- **Note (Phase 3 `by_path` per-path event-study disaggregation):** Per-path disaggregation of the multi-horizon event study, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Activated via `ChaisemartinDHaultfoeuille(by_path=k, drop_larger_lower=False)` where `k` is a positive integer (top-k most common observed paths by switcher-group frequency). **Window convention:** the path tuple for a switcher group `g` is `(D_{g, F_g-1}, D_{g, F_g}, ..., D_{g, F_g-1+L_max})` — length `L_max + 1`, matching R's window `[F_g - 1, F_g - 1 + l]`. **Ranking:** paths are ranked by descending frequency; ties are broken lexicographically on the path tuple for deterministic ordering, so every selected path has a unique `frequency_rank`. If `by_path` exceeds the number of observed paths, all observed paths are returned with a `UserWarning`. **Per-path SE convention (joiners/leavers precedent):** the per-path influence function follows the joiners-only / leavers-only IF construction at `chaisemartin_dhaultfoeuille.py:5495-5504`: the switcher-side contribution `+S_g * (Y_{g,out} - Y_{g,ref})` is zeroed for groups whose observed trajectory is NOT the selected path; control contributions and the full cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. After applying the singleton-baseline eligible mask and cohort-recentering with the original cohort IDs, the plug-in SE uses the path-specific divisor `N_l_path` (count of path switchers eligible at horizon `l`) — same pattern as `joiners_se` using `joiner_total`. This gives the **within-path mean** estimand `DID_{path,l}` as the within-path average of `DID_{g,l}`. **Degenerate-cohort behavior per path:** when a path's centered IF at some horizon is identically zero (every variance-eligible path switcher forms its own `(D_{g,1}, F_g, S_g)` cohort, or the path has a single contributing group), SE / t_stat / p_value / conf_int are NaN-consistent and a `UserWarning` is emitted scoped to `(path, horizon)`. 
This mirrors the overall-path degenerate-cohort surface and is common for rare paths with few contributing groups. **Empty-state contract:** `results.path_effects` distinguishes "not requested" (`None`) from "requested but empty" (`{}` — all switchers have windows outside the panel or unobserved cells). The empty-dict case emits a `UserWarning` at fit-time and renders as an explicit "no observed paths" notice in `summary()`; `to_dataframe(level="by_path")` returns an empty DataFrame with the canonical column set (mirrors the `linear_trends` pattern when `trends_linear=True` but no horizons survive). **Requirements:** `drop_larger_lower=False` (multi-switch groups are the object of interest; default `True` filters them out) and `L_max >= 1` (path window depends on the horizon). **Scope:** binary treatment only; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, and `survey_design` remain gated behind explicit `NotImplementedError` (deferred to follow-up wave PRs). `n_bootstrap > 0` is now supported — see the **Bootstrap SE** paragraph below. `placebo=True` is now supported per-path — see the **Per-path placebos** paragraph below. **TWFE diagnostic** remains a sample-level summary (not computed per path) in this release. Results are exposed on `results.path_effects` as `Dict[Tuple[int, ...], Dict[str, Any]]` with nested `horizons` dicts per horizon `l`, and on `results.to_dataframe(level="by_path")` as a long-format table with columns `[path, frequency_rank, n_groups, horizon, effect, se, t_stat, p_value, conf_int_lower, conf_int_upper, n_obs, cband_lower, cband_upper]` (the last two are added by the joint sup-t Note below; populated for positive-horizon rows of paths with a finite sup-t crit, NaN otherwise). Gated tests live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathGates` / `::TestByPathBehavior` / `::TestByPathEdgeCases`. 
**R-parity** against `DIDmultiplegtDYN 2.3.3` is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath` via two scenarios: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 paths, `by_path=3`; path-assignment deterministic on `F_g` so each `(D_{g,1}, F_g, S_g)` cohort contains switchers from a single path). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path mixed scenario, ≤ 4.2% on the 4-path cohort-clean scenario). **Deviation from R (cross-path cohort-sharing SE):** our analytical SE is the marginal variance of the path-contribution estimator cohort-centered on the *full-panel* cohort structure (joiners/leavers precedent — non-path switchers contribute to cohort means via their zeroed switcher row). R's `did_multiplegt_dyn(..., by_path=k)` re-runs the estimator per path, so cohort means are computed over the path's own switchers only. When a cohort `(D_{g,1}, F_g, S_g)` spans multiple observed paths, Python and R SE diverge materially (our empirical probes with random post-window toggling saw rtol > 100%); when every cohort is single-path (scenario 13 by design, scenario 14 by construction), the two approaches coincide up to the documented Phase 2 envelope. Practitioners with cohort structures that mix paths should interpret the per-path SE as a within-full-panel marginal variance, not a per-path conditional variance. 
**Bootstrap SE:** when `n_bootstrap > 0` is set, the top-k paths are enumerated once on the observed data (R-faithful: matches `did_multiplegt_dyn(..., by_path=k, bootstrap=B)`'s path-stability convention — verified empirically against DIDmultiplegtDYN 2.3.3) and the multiplier bootstrap (`bootstrap_weights ∈ {"rademacher", "mammen", "webb"}`) runs per `(path, horizon)` target via the shared `_bootstrap_one_target` / `compute_effect_bootstrap_stats` helpers. Point estimates are unchanged from the analytical path. Bootstrap SE replaces the analytical SE in `path_effects[path]["horizons"][l]["se"]`, and `p_value` / `conf_int` are taken as the **bootstrap percentile** statistics, matching the Round-10 library convention for overall / joiners / leavers / multi-horizon bootstrap (see the `Note (bootstrap inference surface)` elsewhere in this file and the pinned regression `test_bootstrap_p_value_and_ci_propagated_to_top_level`). `t_stat` is SE-derived via `safe_inference` per the anti-pattern rule. Interpretation: inference is *conditional on the observed path set*. **SE inherits the analytical cross-path cohort-sharing deviation:** the bootstrap input is the exact same full-panel cohort-centered path IF that the analytical path computes (`_collect_path_bootstrap_inputs` reuses the same enumeration / cohort IDs / IF construction), so the bootstrap SE is a Monte Carlo analog of the analytical SE — it inherits the same cross-path cohort-sharing deviation from R's per-path re-run convention documented above. On single-path-cohort panels (scenarios 13 and 14 of the R-parity fixture, and any DGP where `(D_{g,1}, F_g, S_g)` cohorts never span multiple observed paths), bootstrap SE tracks analytical SE up to Monte Carlo noise and both coincide with R up to the Phase 2 envelope. On cross-path cohort panels, bootstrap SE inherits the >100% rtol divergence from R that analytical already has. 
**Deviation from R (CI method):** R's per-path CI is normal-theory around the bootstrap SE (half-width ≈ `1.96·se`); ours is the bootstrap percentile CI, intentionally diverging from R to keep the dCDH inference surface internally consistent across all bootstrap targets. Practitioners who want *unconditional* inference capturing path-selection uncertainty need a pairs-bootstrap (deferred — no R precedent). Positive regressions live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathBootstrap` (gated `@pytest.mark.slow`): point-estimate invariance, finite positive SE on non-degenerate panels, SE-within-30%-rtol of analytical on cohort-clean fixtures, degenerate-cohort NaN propagation, Rademacher/Mammen/Webb parity, seed reproducibility, and percentile-vs-normal-theory CI pinning. **Per-path placebos:** when `placebo=True` (and `L_max >= 1`) is combined with `by_path=k`, per-path backward-horizon placebos `DID^{pl}_{path, l}` for `l = 1..L_max` are computed using the same joiners/leavers IF precedent applied to `_compute_per_group_if_placebo_horizon` (with the new `switcher_subset_mask` parameter): switcher contributions are zeroed for groups not in the path; the control pool and the variance-eligible cohort structure `(D_{g,1}, F_g, S_g)` are unchanged. Plug-in SE uses the path-specific divisor `N^{pl}_{l, path}` (count of path switchers eligible at backward lag `l`). Surfaced on `results.path_placebo_event_study[path][-l]` with the same `{effect, se, t_stat, p_value, conf_int, n_obs}` shape as `placebo_event_study` (negative-int inner keys parallel the existing per-path event-study positive-int keys, so a unified forward+backward view is well-formed). **Inherits the cross-path cohort-sharing SE deviation from R** documented above for `path_effects` (same convention applied backward); tracks R within numerical tolerance on single-path-cohort panels and diverges on cohort-mixed panels. 
Multiplier bootstrap (when `n_bootstrap > 0`) runs per `(path, lag)` target via the same `_bootstrap_one_target` dispatch used for the per-path event-study, with the canonical NaN-on-invalid contract. The bootstrap SE is a Monte Carlo analog of the analytical placebo SE — same per-path centered IF input — and inherits the same deviation. Surfaced through `summary()` (negative-keyed rows rendered alongside positive-keyed event-study rows under each path block) and `to_dataframe(level="by_path")` (`horizon` column takes negative ints for placebo rows). **Empty-state contract:** `results.path_placebo_event_study` mirrors `path_effects` — `None` when `by_path + placebo` was not requested, `{}` when requested but no observed path has a complete window within the panel (same regime that returns `{}` for `path_effects`, with the same fit-time `UserWarning`). R-parity is confirmed at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathPlacebo` on the `multi_path_reversible_by_path_placebo` scenario; positive analytical + bootstrap invariants live in `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathPlacebo` (with the gated `::TestByPathPlacebo::TestBootstrap` subclass).
 
-- **Note (Phase 3 `by_path` per-path joint sup-t bands):** When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are computed across horizons `1..L_max` within each path. **Methodology:** a single `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons within the path. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is then used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon, surfaced in `path_effects[path]["horizons"][l]["cband_conf_int"]` and at top-level `results.path_sup_t_bands[path] = {"crit_value", "alpha", "n_bootstrap", "method", "n_valid_horizons"}`. **Gates:** a path must have `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band; otherwise the path is absent from `path_sup_t_bands`. Both gates mirror the OVERALL `event_study_sup_t_bands` semantics at `chaisemartin_dhaultfoeuille_bootstrap.py:605,612`: `len(valid_horizons) >= 2` AND `finite_mask.sum() > 0.5 * n_bootstrap`. Exactly half-finite draws are NOT enough — the gate is strictly greater than half. **Empty-state contract:** `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates. **Methodology asymmetry vs OVERALL:** OVERALL sup-t reuses the same multi-horizon shared-draw distribution for both the SE in the t-stat denominator and the bootstrap distribution in the numerator. The per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — numerator: fresh shared draws, denominator: bootstrap SEs from the earlier independent draws. 
Asymptotically equivalent to OVERALL's self-consistent reuse, but NOT bit-identical. The fresh draw is intentional: it preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable post-implementation. **Inherited deviation from R:** the bootstrap SE used as the t-stat denominator carries the cross-path cohort-sharing SE deviation from R documented for `path_effects` above; the per-path sup-t crit therefore inherits the same deviation. **Interpretation:** the band covers joint inference *within a single path across horizons*; it does NOT provide simultaneous coverage *across paths* (a different inference target requiring a `path × horizon` re-derivation, deferred to a future wave). **Deviation from R:** `did_multiplegt_dyn` provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension, consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Regression test anchor: `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands`.
+- **Note (Phase 3 `by_path` per-path joint sup-t bands):** When `n_bootstrap > 0` is set with `by_path=k`, per-path joint sup-t simultaneous confidence bands are computed across horizons `1..L_max` within each path. **Methodology:** a single `(n_bootstrap, n_eligible)` multiplier weight matrix (using the estimator's configured `bootstrap_weights` — Rademacher / Mammen / Webb) is drawn per path and broadcast across all horizons of that path, producing correlated bootstrap distributions across horizons within the path. The path-specific critical value `c_p = quantile(max_l |t_l|, 1 - α)` is then used to construct symmetric joint bands `effect_l ± c_p · se_l` per horizon, surfaced in `path_effects[path]["horizons"][l]["cband_conf_int"]` and at top-level `results.path_sup_t_bands[path] = {"crit_value", "alpha", "n_bootstrap", "method", "n_valid_horizons"}`. **Gates:** a path must have `>= 2` valid horizons (finite bootstrap SE > 0) AND a strict majority (more than 50%) of finite sup-t draws to receive a band; otherwise the path is absent from `path_sup_t_bands`. Both gates mirror the OVERALL `event_study_sup_t_bands` semantics at `chaisemartin_dhaultfoeuille_bootstrap.py:605,612`: `len(valid_horizons) >= 2` AND `finite_mask.sum() > 0.5 * n_bootstrap`. Exactly half-finite draws are NOT enough — the gate is strictly greater than half. **Empty-state contract:** `path_sup_t_bands is None` when not requested (no bootstrap or `by_path is None`); `{}` when requested but no path passes both gates. **`to_dataframe(level="by_path")` integration:** the table now includes `cband_lower` / `cband_upper` columns for parity with OVERALL `level="event_study"`; populated for positive-horizon rows of paths with a finite sup-t crit, NaN for placebo rows / unbanded paths / the requested-but-empty fallback DataFrame. 
**Methodology asymmetry vs OVERALL:** OVERALL sup-t reuses the same multi-horizon shared-draw distribution for both the SE in the t-stat denominator and the bootstrap distribution in the numerator. The per-path sup-t draws a fresh shared weight matrix per path AFTER the per-path SE bootstrap block has already populated `results.path_ses` via independent per-(path, horizon) draws — numerator: fresh shared draws, denominator: bootstrap SEs from the earlier independent draws. Asymptotically equivalent to OVERALL's self-consistent reuse, but NOT bit-identical. The fresh draw is intentional: it preserves RNG-state isolation and keeps every existing per-path SE seed-reproducibility test bit-stable post-implementation. **Inherited deviation from R:** the bootstrap SE used as the t-stat denominator carries the cross-path cohort-sharing SE deviation from R documented for `path_effects` above; the per-path sup-t crit therefore inherits the same deviation. **Interpretation:** the band covers joint inference *within a single path across horizons*; it does NOT provide simultaneous coverage *across paths* (a different inference target requiring a `path × horizon` re-derivation, deferred to a future wave). **Deviation from R:** `did_multiplegt_dyn` provides no joint / sup-t / simultaneous bands at any surface — this is a Python-only methodology extension, consistent with the existing OVERALL `event_study_sup_t_bands` (also Python-only). Regression test anchor: `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathSupTBands`.
 
 **Reference implementation(s):**
 - R: [`DIDmultiplegtDYN`](https://cran.r-project.org/package=DIDmultiplegtDYN) (CRAN, maintained by the paper authors). The Python implementation matches `did_multiplegt_dyn(..., effects=1)` at horizon `l = 1`. Parity tests live in `tests/test_chaisemartin_dhaultfoeuille_parity.py`.
diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py
index 80e914c7..1d51b9da 100644
--- a/tests/test_chaisemartin_dhaultfoeuille.py
+++ b/tests/test_chaisemartin_dhaultfoeuille.py
@@ -5871,6 +5871,80 @@ def test_path_sup_t_bands_empty_dict_when_no_complete_window(self):
             for l_h, h in entry["horizons"].items():
                 assert "cband_conf_int" not in h
 
+    def test_path_sup_t_to_dataframe_emits_cband_columns(self):
+        """``to_dataframe(level="by_path")`` includes ``cband_lower`` /
+        ``cband_upper`` columns mirroring the OVERALL
+        ``level="event_study"`` table at ``:1495-1496,1531-1532``.
+
+        For positive-horizon rows of paths with a finite sup-t crit,
+        the columns equal the per-horizon ``cband_conf_int`` tuple. For
+        placebo rows (negative horizons) and rows of paths absent from
+        ``path_sup_t_bands``, the columns are NaN. The empty-window
+        fallback (``path_effects == {}``) also includes the columns in
+        its canonical schema."""
+        data = _by_path_three_path_data()
+        _est, res = self._fit_with_bootstrap(data, by_path=3, L_max=3, n_bootstrap=200)
+        df = res.to_dataframe(level="by_path")
+        assert "cband_lower" in df.columns
+        assert "cband_upper" in df.columns
+        # Per-row alignment with `path_effects[path]["horizons"][l]
+        # ["cband_conf_int"]`. Only positive horizons can have populated
+        # cband (placebos and unbanded paths get NaN).
+        for _, row in df.iterrows():
+            path = row["path"]
+            horizon = int(row["horizon"])
+            if horizon > 0 and path in res.path_sup_t_bands:
+                # Should match the horizon's cband_conf_int.
+                expected_cband = res.path_effects[path]["horizons"][horizon].get(
+                    "cband_conf_int"
+                )
+                if expected_cband is not None:
+                    np.testing.assert_allclose(row["cband_lower"], expected_cband[0])
+                    np.testing.assert_allclose(row["cband_upper"], expected_cband[1])
+            else:
+                assert np.isnan(row["cband_lower"]), (
+                    f"path={path} horizon={horizon}: cband_lower should be NaN "
+                    f"(placebo / unbanded path), got {row['cband_lower']}"
+                )
+                assert np.isnan(row["cband_upper"])
+
+    def test_path_sup_t_to_dataframe_empty_path_fallback_has_cband_columns(self):
+        """The ``path_effects == {}`` fallback DataFrame schema includes
+        the cband columns for parity with the populated-path schema."""
+        rng = np.random.default_rng(0)
+        rows = []
+        # Empty-window panel: switchers at t=3, L_max=3 -> window past panel.
+        for g in (1, 2, 3, 4):
+            for t in range(4):
+                d = 1 if t >= 3 else 0
+                rows.append(
+                    {"group": g, "period": t, "treatment": d, "outcome": rng.normal()}
+                )
+        for g in (5, 6):
+            for t in range(4):
+                rows.append(
+                    {"group": g, "period": t, "treatment": 0, "outcome": rng.normal()}
+                )
+        data = pd.DataFrame(rows)
+        with warnings.catch_warnings():
+            warnings.simplefilter("ignore", UserWarning)
+            est = ChaisemartinDHaultfoeuille(
+                drop_larger_lower=False, by_path=3, twfe_diagnostic=False, placebo=False
+            )
+            res = est.fit(
+                data,
+                outcome="outcome",
+                group="group",
+                time="period",
+                treatment="treatment",
+                L_max=3,
+            )
+        assert res.path_effects == {}
+        df = res.to_dataframe(level="by_path")
+        assert df.empty
+        assert "cband_lower" in df.columns
+        assert "cband_upper" in df.columns
+
     def test_path_sup_t_strict_majority_gate_at_exact_50pct(self, monkeypatch):
         """The 50%-finite-draws gate is **strict majority**, not >=:
         the implementation requires ``finite_mask.sum() > 0.5 *

From 1febbb190836089855af8f9ccfbe825f1adce840 Mon Sep 17 00:00:00 2001
From: igerber <isaac.gerber@gmail.com>
Date: Sat, 25 Apr 2026 15:26:04 -0400
Subject: [PATCH 3/3] Address PR #374 R5 P3: stale to_dataframe docstring +
 Mammen/Webb coverage

P3 #1: ``to_dataframe`` method docstring at
``chaisemartin_dhaultfoeuille_results.py:1375-1379`` listed the
pre-change ``level="by_path"`` schema (no ``cband_*`` columns) even
though the implementation now returns them. Updated the bullet to
include ``cband_lower / cband_upper``, document the negative-horizon
placebo convention, and document the NaN-on-absent-band behavior.

P3 #2: ``TestByPathSupTBands::test_path_sup_t_seed_reproducibility``
only exercised the default ``rademacher`` weight family. Parameterized
over ``["rademacher", "mammen", "webb"]`` to pin that the per-path
sup-t branch correctly threads ``self.bootstrap_weights`` through
``_generate_psu_or_group_weights`` for all three multiplier families
the feature advertises. The existing OVERALL machinery handles all
three uniformly, but the per-path surface lacked direct coverage.
Each variant must produce a finite, reproducible crit on the standard
3-path fixture.
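
For orientation, the contract this test pins can be sketched in plain
NumPy. This is an illustration under stated assumptions, not the library
code: ``draw_weights`` and ``sup_t_crit`` are hypothetical names, the
three weight laws are the textbook mean-zero, unit-variance definitions
(``_generate_psu_or_group_weights`` may differ in detail), and the
influence matrix is simulated.

```python
import numpy as np


def draw_weights(kind, shape, rng):
    """Textbook multiplier weights (mean 0, variance 1); illustrative only."""
    if kind == "rademacher":
        return rng.choice([-1.0, 1.0], size=shape)
    if kind == "mammen":
        s5 = np.sqrt(5.0)
        vals = [-(s5 - 1) / 2, (s5 + 1) / 2]
        probs = [(s5 + 1) / (2 * s5), (s5 - 1) / (2 * s5)]
        return rng.choice(vals, size=shape, p=probs)
    if kind == "webb":
        # Six-point law: +/- sqrt(1/2), +/- 1, +/- sqrt(3/2), equally likely.
        vals = np.sqrt([0.5, 1.0, 1.5])
        return rng.choice(np.concatenate([-vals, vals]), size=shape)
    raise ValueError(f"unknown weight family: {kind}")


def sup_t_crit(if_matrix, se, kind, n_bootstrap=200, alpha=0.05, seed=42):
    """Sup-t critical value across horizons from one shared weight matrix.

    if_matrix: (n_groups, n_horizons) toy influence values.
    se:        (n_horizons,) standard errors used as t-stat denominators.
    """
    rng = np.random.default_rng(seed)
    n_groups = if_matrix.shape[0]
    # One weight matrix shared by every horizon, so draws are correlated.
    weights = draw_weights(kind, (n_bootstrap, n_groups), rng)
    boot = weights @ if_matrix / n_groups          # (n_bootstrap, n_horizons)
    sup_t = np.max(np.abs(boot / se), axis=1)      # max over horizons per draw
    finite = np.isfinite(sup_t)
    # Strict-majority gate: exactly half finite draws is not enough.
    if finite.sum() <= 0.5 * n_bootstrap:
        return float("nan")
    return float(np.quantile(sup_t[finite], 1 - alpha))


# Simulated inputs (assumption): 40 groups, 3 horizons.
rng0 = np.random.default_rng(0)
toy_if = rng0.normal(size=(40, 3))
toy_se = toy_if.std(axis=0, ddof=1) / np.sqrt(40)
crits = {k: sup_t_crit(toy_if, toy_se, k) for k in ("rademacher", "mammen", "webb")}
```

In this sketch each family returns a finite critical value, and
repeating the call with the same seed returns the identical value, which
is the property the parameterized test asserts per path.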

17 tests pass on TestByPathSupTBands (was 15: +2 new parameterized
variants on the existing seed_reproducibility test).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 .../chaisemartin_dhaultfoeuille_results.py    | 13 +++++-
 tests/test_chaisemartin_dhaultfoeuille.py     | 41 ++++++++++++++++---
 2 files changed, 47 insertions(+), 7 deletions(-)

diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py
index e93db4bd..8f090a58 100644
--- a/diff_diff/chaisemartin_dhaultfoeuille_results.py
+++ b/diff_diff/chaisemartin_dhaultfoeuille_results.py
@@ -1373,10 +1373,19 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
             - ``"design2"``: Design-2 switch-in/switch-out descriptive
               summary. Available when ``design2=True``.
             - ``"by_path"``: one row per (path, horizon) when
-              ``by_path=k`` was passed to the estimator. Columns include
+              ``by_path=k`` was passed to the estimator. Columns:
               ``path``, ``frequency_rank``, ``n_groups``, ``horizon``,
               ``effect``, ``se``, ``t_stat``, ``p_value``,
-              ``conf_int_lower``, ``conf_int_upper``, ``n_obs``.
+              ``conf_int_lower``, ``conf_int_upper``, ``n_obs``,
+              ``cband_lower``, ``cband_upper``. The ``horizon`` column
+              takes negative ints for placebo rows when
+              ``placebo=True``. The ``cband_*`` columns mirror the
+              OVERALL ``level="event_study"`` schema (joint sup-t
+              simultaneous bands); they are populated for positive-
+              horizon rows of paths with a finite per-path sup-t crit
+              (``n_bootstrap > 0``) and NaN otherwise (placebo rows,
+              unbanded paths, or the requested-but-empty fallback
+              DataFrame).
 
         Returns
         -------
diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py
index 1d51b9da..ed80f83c 100644
--- a/tests/test_chaisemartin_dhaultfoeuille.py
+++ b/tests/test_chaisemartin_dhaultfoeuille.py
@@ -5661,23 +5661,54 @@ def test_path_sup_t_crit_finite_and_positive(self):
             assert entry["method"] == "multiplier_bootstrap"
             assert entry["n_valid_horizons"] >= 2
 
-    def test_path_sup_t_seed_reproducibility(self):
-        """Same seed -> bit-identical ``crit_value`` for every path."""
+    @pytest.mark.parametrize("bootstrap_weights", ["rademacher", "mammen", "webb"])
+    def test_path_sup_t_seed_reproducibility(self, bootstrap_weights):
+        """Same seed -> bit-identical ``crit_value`` for every path,
+        across all three multiplier-weight families. Pins that the
+        per-path sup-t branch correctly threads ``bootstrap_weights``
+        through ``_generate_psu_or_group_weights`` and that
+        Rademacher / Mammen / Webb each produce a finite, reproducible
+        crit (the helper handles all three uniformly under the
+        existing OVERALL sup-t machinery; this is a per-path direct
+        regression on that contract)."""
         data = _by_path_three_path_data()
         _est_a, res_a = self._fit_with_bootstrap(
-            data, by_path=3, L_max=3, n_bootstrap=200, seed=42
+            data,
+            by_path=3,
+            L_max=3,
+            n_bootstrap=200,
+            seed=42,
+            bootstrap_weights=bootstrap_weights,
         )
         _est_b, res_b = self._fit_with_bootstrap(
-            data, by_path=3, L_max=3, n_bootstrap=200, seed=42
+            data,
+            by_path=3,
+            L_max=3,
+            n_bootstrap=200,
+            seed=42,
+            bootstrap_weights=bootstrap_weights,
         )
         assert res_a.path_sup_t_bands is not None
         assert res_b.path_sup_t_bands is not None
         assert set(res_a.path_sup_t_bands.keys()) == set(res_b.path_sup_t_bands.keys())
+        # At least one path should produce a finite crit on this fixture
+        # (3 paths, each with 3 valid horizons under all three weight
+        # families). This pins that the new dispatch path actually fires
+        # for `mammen` / `webb`, not just `rademacher`.
+        assert len(res_a.path_sup_t_bands) >= 1, (
+            f"bootstrap_weights={bootstrap_weights}: expected at least "
+            "one path with a finite crit; got empty dict"
+        )
         for path in res_a.path_sup_t_bands:
             crit_a = res_a.path_sup_t_bands[path]["crit_value"]
             crit_b = res_b.path_sup_t_bands[path]["crit_value"]
+            assert np.isfinite(crit_a), (
+                f"bootstrap_weights={bootstrap_weights} path={path}: "
+                f"crit_value not finite ({crit_a})"
+            )
             assert crit_a == crit_b, (
-                f"path={path}: seed-pinned crits diverge: {crit_a} vs {crit_b}"
+                f"bootstrap_weights={bootstrap_weights} path={path}: "
+                f"seed-pinned crits diverge: {crit_a} vs {crit_b}"
             )
 
     def test_path_sup_t_skipped_when_path_has_only_one_valid_horizon(self):