igerber · igerber · Apr 24, 2026 · Apr 24, 2026 · Apr 24, 2026 · Apr 24, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
 - **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.
 - **`ChaisemartinDHaultfoeuille.by_path`** — per-path event-study disaggregation, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Passing `by_path=k` (positive int) to the estimator reports separate `DID_{path,l}` + SE + inference for the top-k most common observed treatment paths in the window `[F_g-1, F_g-1+L_max]`, answering the practitioner question "is a single pulse enough, or do you need sustained exposure?" across paths like `(0,1,0,0)` vs `(0,1,1,0)` vs `(0,1,1,1)`. The per-path SE follows the joiners-only / leavers-only IF precedent (switcher-side contribution zeroed for non-path groups; control pool and cohort structure unchanged; plug-in SE with path-specific divisor). Requires `drop_larger_lower=False` (multi-switch groups are the object of interest) and `L_max >= 1`. Binary treatment only in this release; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, `survey_design`, and `n_bootstrap > 0` raise `NotImplementedError` and are deferred to follow-up PRs. Results expose `results.path_effects: Dict[Tuple[int, ...], Dict[str, Any]]` and `results.to_dataframe(level="by_path")`; the summary grows a "Treatment-Path Disaggregation" block. Ties in path frequency are broken lexicographically on the path tuple for deterministic ranking. Overflow (`by_path > n_observed_paths`) returns all observed paths with a `UserWarning`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path event-study disaggregation)` for the full contract.
+- **R-parity for `ChaisemartinDHaultfoeuille.by_path`** against `DIDmultiplegtDYN 2.3.3`. Two new scenarios in `benchmarks/data/dcdh_dynr_golden_values.json` generated from `did_multiplegt_dyn(..., by_path=k)`: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 observed paths, `by_path=3`, via a new deterministic multi-path DGP pattern in the R generator). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path scenario, ≤ 4.2% on the 4-path scenario). Parity tests live at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath`, matching paths by tuple label via set-equality (robust to R's undocumented frequency-tie tiebreak) and cross-checking per-path switcher counts before SE comparison. **Deviation documented:** cross-path cohort sharing — our full-panel cohort-centered plug-in vs R's per-path re-run diverges materially when a `(D_{g,1}, F_g, S_g)` cohort spans multiple observed paths; the two coincide when every cohort is single-path. The parity scenarios are constructed to keep cohorts single-path (scenario 13 by design, scenario 14 via path-assignment-deterministic-on-F_g). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path...)` for the full write-up.
 - **`target_parameter` block in BR/DR schemas (experimental; schema version bumped to 2.0)** — `BUSINESS_REPORT_SCHEMA_VERSION` and `DIAGNOSTIC_REPORT_SCHEMA_VERSION` bumped from `"1.0"` to `"2.0"` because the new `"no_scalar_by_design"` value on the `headline.status` / `headline_metric.status` enum (dCDH `trends_linear=True, L_max>=2` configuration) is a breaking change per the REPORTING.md stability policy. BusinessReport and DiagnosticReport now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), `reference` (citation pointer). BR's summary emits the short `name` right after the headline; DR's overall-interpretation paragraph does the same; both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); others emit a fixed tag (the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar — disambiguating horizon / group tables is tracked under gap #9). See `docs/methodology/REPORTING.md` "Target parameter" section.
 - SyntheticDiD coverage Monte Carlo calibration table added to `docs/methodology/REGISTRY.md` §SyntheticDiD — rejection rates at α ∈ {0.01, 0.05, 0.10} across `placebo` / `bootstrap` / `jackknife` on 3 representative DGPs (balanced / exchangeable, unbalanced, and Arkhangelsky et al. (2021) AER §6.3 non-exchangeable). Artifact at `benchmarks/data/sdid_coverage.json` (500 seeds × B=200), regenerable via `benchmarks/python/coverage_sdid.py`.
 

diff --git a/benchmarks/R/generate_dcdh_dynr_test_values.R b/benchmarks/R/generate_dcdh_dynr_test_values.R
@@ -29,6 +29,16 @@ library(DIDmultiplegtDYN)
 library(jsonlite)
 suppressMessages(library(polars))  # required by DIDmultiplegtDYN >= 2.x
 
+# Pin DIDmultiplegtDYN to an exact version because the `by_path` output
+# slots (res$by_levels, res$by_level_i) were introduced in v2.3.3 and
+# the structure is not version-stable per the R package's own docs. A
+# floor constraint (`>= 2.3.3`) could silently drift the fixture schema
+# when regenerated against a future release. Update this pin *and* re-
+# run TestDCDHDynRParityByPath when bumping to a newer known-compatible
+# release; extend to an explicit allowlist (e.g. `%in% c("2.3.3",
+# "2.3.4")`) once a second version is verified.
+stopifnot(packageVersion("DIDmultiplegtDYN") == "2.3.3")
+
 cat("Generating dCDH golden values via DIDmultiplegtDYN at l=1...\n")
 
 output_path <- file.path("benchmarks", "data", "dcdh_dynr_golden_values.json")
@@ -51,7 +61,8 @@ gen_reversible <- function(n_groups, n_periods, pattern, seed,
                            cycle_length = 2, treatment_effect = 2.0,
                            heterogeneous_effects = FALSE, effect_sd = 0.5,
                            group_fe_sd = 2.0, time_trend = 0.1, noise_sd = 0.5,
-                           n_never_treated = 20, n_always_treated = 20) {
+                           n_never_treated = 20, n_always_treated = 20,
+                           L_max = 3) {
   # n_never_treated and n_always_treated add stable control cohorts so
   # both Python (AER 2020 zero-retention) and R DIDmultiplegtDYN (dynamic
   # paper, drop-cohort) implementations have controls available at every
@@ -99,6 +110,76 @@ gen_reversible <- function(n_groups, n_periods, pattern, seed,
         D[g, seq_len(st - 1L)] <- 1L
       }
     }
+  } else if (pattern == "multi_path_reversible") {
+    # Deterministic multi-path DGP designed for by_path R-parity:
+    #   - 4 distinct joiner-style target paths with unequal frequencies
+    #     (so top-k ranking produces unique ranks with no ties)
+    #   - path assignment is a DETERMINISTIC FUNCTION OF F_g, so each
+    #     cohort (D_{g,1}, F_g, S_g) contains switchers from a single
+    #     path. This avoids cross-path cohort sharing in the
+    #     cohort-recentered influence function, which otherwise blows
+    #     out SE parity with R's re-run-per-path convention.
+    #   - post-window treatment is stable at path[L_max+1] (clean
+    #     control-pool eligibility — no post-window contamination)
+    #
+    # Each group:
+    #   - F_g in [2, n_periods - L_max] so the length-(L_max+1) window
+    #     [F_g-1, F_g-1+L_max] fits the panel
+    #   - path is determined by F_g: two F_g values per path (groups in
+    #     {F_g in {2,3}} share path 1, {F_g in {4,5}} share path 2,
+    #     {F_g in {6}} = path 3, {F_g in {7}} = path 4); within a path,
+    #     F_g distribution yields n_groups * path_prop total groups
+    max_switch <- n_periods - L_max - 1L
+    stopifnot(max_switch >= 1L)
+    # With n_periods=10, L_max=3: max_switch=6, F_g in [2,7] (6 values).
+
+    target_paths <- list(
+      c(0L, 1L, 1L, 1L),  # sustained on (rank 1)
+      c(0L, 1L, 1L, 0L),  # on then off  (rank 2)
+      c(0L, 1L, 0L, 0L),  # on briefly   (rank 3)
+      c(0L, 1L, 0L, 1L)   # on-off-on    (rank 4, truncated under by_path=3)
+    )
+    stopifnot(length(target_paths[[1]]) == L_max + 1L)
+
+    # Per-F_g path assignment: 2 F_g values for paths 1 & 2, 1 for paths
+    # 3 & 4. This keeps each (D_{g,1}, F_g, S_g) cohort single-path.
+    f_g_to_path <- c(1L, 1L, 2L, 2L, 3L, 4L)
+    stopifnot(length(f_g_to_path) == max_switch)
+    # Group counts per F_g (rank 1 > rank 2 > rank 3 > rank 4 with unique
+    # ranks — no frequency ties, robust to R's undocumented tiebreak):
+    #   F_g=2: 20 groups (path 1)
+    #   F_g=3: 20 groups (path 1) → rank 1 has 40 switchers total
+    #   F_g=4: 15 groups (path 2)
+    #   F_g=5: 10 groups (path 2) → rank 2 has 25 switchers total
+    #   F_g=6: 10 groups (path 3) → rank 3 has 10 switchers
+    #   F_g=7:  5 groups (path 4) → rank 4 has  5 switchers (excluded by by_path=3)
+    counts_per_F_g <- c(20L, 20L, 15L, 10L, 10L, 5L)
+    stopifnot(sum(counts_per_F_g) == n_groups)
+
+    # Build the group-to-(F_g, path) assignment, deterministic with seed
+    g_idx <- 1L
+    for (f_idx in seq_along(counts_per_F_g)) {
+      F_g <- f_idx + 1L  # F_g in [2, 7] for f_idx in [1, 6]
+      path_idx <- f_g_to_path[f_idx]
+      target <- target_paths[[path_idx]]
+      n_here <- counts_per_F_g[f_idx]
+      for (k in seq_len(n_here)) {
+        g <- g_idx
+        # Pre-baseline [1 .. F_g-2]: initial state
+        if (F_g >= 3L) {
+          D[g, 1:(F_g - 2L)] <- target[1]
+        }
+        # Window [F_g-1 .. F_g-1+L_max]: exactly the target path
+        for (j in 0:L_max) {
+          D[g, F_g - 1L + j] <- target[j + 1L]
+        }
+        # Post-window [F_g+L_max .. n_periods]: stable at path[L_max+1]
+        if (F_g + L_max <= n_periods) {
+          D[g, (F_g + L_max):n_periods] <- target[L_max + 1L]
+        }
+        g_idx <- g_idx + 1L
+      }
+    }
   } else {
     stop(sprintf("Unknown pattern: %s", pattern))
   }
@@ -482,6 +563,87 @@ scenarios$joiners_only_controls_trends_lin <- list(
   results = extract_dcdh_multi(res12, n_effects = 2, n_placebos = 1)
 )
 
+# ---------------------------------------------------------------------------
+# Phase 3 extension: by_path per-path event-study disaggregation
+# ---------------------------------------------------------------------------
+
+# Helper: extract per-path by_path results. When did_multiplegt_dyn is
+# called with by_path=k, the result object has no $results slot; instead
+# per-path results live at res$by_level_1, res$by_level_2, ... in rank
+# order (1 = most frequent observed path). res$by_levels is a character
+# vector of comma-joined path labels (e.g. "0,1,1,1") in the same order.
+extract_dcdh_by_path <- function(res, n_effects) {
+  by_levels <- res$by_levels
+  out <- list()
+  for (i in seq_along(by_levels)) {
+    slot <- res[[paste0("by_level_", i)]]
+    effects <- slot$results$Effects
+    horizons <- list()
+    for (h in seq_len(min(n_effects, nrow(effects)))) {
+      horizons[[as.character(h)]] <- list(
+        effect = as.numeric(effects[h, "Estimate"]),
+        se = as.numeric(effects[h, "SE"]),
+        ci_lo = as.numeric(effects[h, "LB CI"]),
+        ci_hi = as.numeric(effects[h, "UB CI"]),
+        n_switchers = as.numeric(effects[h, "Switchers"]),
+        n_obs = as.numeric(effects[h, "N"])
+      )
+    }
+    out[[i]] <- list(
+      path = by_levels[i],
+      frequency_rank = i,
+      horizons = horizons
+    )
+  }
+  list(by_path = out)
+}
+
+# Scenario 13: mixed_single_switch + by_path=2 (basic 2-path case).
+# The mixed_single_switch DGP produces joiners (path 0,1,1,1) and
+# leavers (path 1,0,0,0) as its only two observed paths at L_max=3, so
+# by_path=2 captures both and tests core per-path parity.
+cat("  Scenario 13: mixed_single_switch_by_path\n")
+d13 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 8,
+                      pattern = "mixed_single_switch", seed = 113)
+res13 <- did_multiplegt_dyn(
+  df = d13, outcome = "outcome", group = "group", time = "period",
+  treatment = "treatment", effects = 3, by_path = 2, ci_level = 95
+)
+scenarios$mixed_single_switch_by_path <- list(
+  data = export_data(d13),
+  params = list(pattern = "mixed_single_switch", n_groups = N_GOLDEN,
+                n_periods = 8, seed = 113, effects = 3, by_path = 2,
+                ci_level = 95),
+  results = extract_dcdh_by_path(res13, n_effects = 3)
+)
+
+# Scenario 14: multi_path_reversible + by_path=3 (top-k ranking case).
+# The `multi_path_reversible` pattern is a DETERMINISTIC multi-path DGP:
+# path assignment is a fixed function of F_g (so every (D_{g,1}, F_g,
+# S_g) cohort contains switchers from a single path), path proportions
+# are fixed at 20/20/15/10/10/5 across the 6 F_g values, and
+# post-window treatment is stable at path[L_max+1]. by_path=3 exercises
+# top-k selection when observed paths exceed k (4 observed paths, top-3
+# selected). n_periods=10 gives every switch_time a complete length-
+# (L_max+1) window. The old `p_switch`-driven random-toggle variant
+# (pre-PR) blew out SE parity with R via cross-path cohort mixing;
+# see the REGISTRY.md `Note (Phase 3 by_path ...)` Deviation bullet.
+cat("  Scenario 14: multi_path_reversible_by_path\n")
+d14 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 10,
+                      pattern = "multi_path_reversible", seed = 114,
+                      L_max = 3)
+res14 <- did_multiplegt_dyn(
+  df = d14, outcome = "outcome", group = "group", time = "period",
+  treatment = "treatment", effects = 3, by_path = 3, ci_level = 95
+)
+scenarios$multi_path_reversible_by_path <- list(
+  data = export_data(d14),
+  params = list(pattern = "multi_path_reversible", n_groups = N_GOLDEN,
+                n_periods = 10, seed = 114, effects = 3, by_path = 3,
+                ci_level = 95),
+  results = extract_dcdh_by_path(res14, n_effects = 3)
+)
+
 # ---------------------------------------------------------------------------
 # Write output
 # ---------------------------------------------------------------------------