Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
- **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.
- **`ChaisemartinDHaultfoeuille.by_path`** — per-path event-study disaggregation, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Passing `by_path=k` (positive int) to the estimator reports separate `DID_{path,l}` + SE + inference for the top-k most common observed treatment paths in the window `[F_g-1, F_g-1+L_max]`, answering the practitioner question "is a single pulse enough, or do you need sustained exposure?" across paths like `(0,1,0,0)` vs `(0,1,1,0)` vs `(0,1,1,1)`. The per-path SE follows the joiners-only / leavers-only IF precedent (switcher-side contribution zeroed for non-path groups; control pool and cohort structure unchanged; plug-in SE with path-specific divisor). Requires `drop_larger_lower=False` (multi-switch groups are the object of interest) and `L_max >= 1`. Binary treatment only in this release; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, `survey_design`, and `n_bootstrap > 0` raise `NotImplementedError` and are deferred to follow-up PRs. Results expose `results.path_effects: Dict[Tuple[int, ...], Dict[str, Any]]` and `results.to_dataframe(level="by_path")`; the summary grows a "Treatment-Path Disaggregation" block. Ties in path frequency are broken lexicographically on the path tuple for deterministic ranking. Overflow (`by_path > n_observed_paths`) returns all observed paths with a `UserWarning`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path event-study disaggregation)` for the full contract.
- **R-parity for `ChaisemartinDHaultfoeuille.by_path`** against `DIDmultiplegtDYN 2.3.3`. Two new scenarios in `benchmarks/data/dcdh_dynr_golden_values.json` generated from `did_multiplegt_dyn(..., by_path=k)`: `mixed_single_switch_by_path` (2 paths, `by_path=2`) and `multi_path_reversible_by_path` (4 observed paths, `by_path=3`, via a new deterministic multi-path DGP pattern in the R generator). Per-path point estimates and per-path switcher counts match R exactly; per-path SE matches within the Phase 2 multi-horizon SE envelope (observed rtol ≤ 10.2% on the 2-path scenario, ≤ 4.2% on the 4-path scenario). Parity tests live at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPath`, matching paths by tuple label via set-equality (robust to R's undocumented frequency-tie tiebreak) and cross-checking per-path switcher counts before SE comparison. **Deviation documented:** cross-path cohort sharing — our full-panel cohort-centered plug-in vs R's per-path re-run diverges materially when a `(D_{g,1}, F_g, S_g)` cohort spans multiple observed paths; the two coincide when every cohort is single-path. The parity scenarios are constructed to keep cohorts single-path (scenario 13 by design, scenario 14 via path-assignment-deterministic-on-F_g). See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path...)` for the full write-up.
- **`target_parameter` block in BR/DR schemas (experimental; schema version bumped to 2.0)** — `BUSINESS_REPORT_SCHEMA_VERSION` and `DIAGNOSTIC_REPORT_SCHEMA_VERSION` bumped from `"1.0"` to `"2.0"` because the new `"no_scalar_by_design"` value on the `headline.status` / `headline_metric.status` enum (dCDH `trends_linear=True, L_max>=2` configuration) is a breaking change per the REPORTING.md stability policy. BusinessReport and DiagnosticReport now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), `reference` (citation pointer). BR's summary emits the short `name` right after the headline; DR's overall-interpretation paragraph does the same; both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); others emit a fixed tag (the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar — disambiguating horizon / group tables is tracked under gap #9). See `docs/methodology/REPORTING.md` "Target parameter" section.
- SyntheticDiD coverage Monte Carlo calibration table added to `docs/methodology/REGISTRY.md` §SyntheticDiD — rejection rates at α ∈ {0.01, 0.05, 0.10} across `placebo` / `bootstrap` / `jackknife` on 3 representative DGPs (balanced / exchangeable, unbalanced, and Arkhangelsky et al. (2021) AER §6.3 non-exchangeable). Artifact at `benchmarks/data/sdid_coverage.json` (500 seeds × B=200), regenerable via `benchmarks/python/coverage_sdid.py`.

Expand Down
164 changes: 163 additions & 1 deletion benchmarks/R/generate_dcdh_dynr_test_values.R
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,16 @@ library(DIDmultiplegtDYN)
library(jsonlite)
suppressMessages(library(polars)) # required by DIDmultiplegtDYN >= 2.x

# Pin DIDmultiplegtDYN to an exact version because the `by_path` output
# slots (res$by_levels, res$by_level_i) were introduced in v2.3.3 and
# the structure is not version-stable per the R package's own docs. A
# floor constraint (`>= 2.3.3`) could silently drift the fixture schema
# when regenerated against a future release. Update this pin *and* re-
# run TestDCDHDynRParityByPath when bumping to a newer known-compatible
# release; extend to an explicit allowlist (e.g. `%in% c("2.3.3",
# "2.3.4")`) once a second version is verified.
stopifnot(packageVersion("DIDmultiplegtDYN") == "2.3.3")

cat("Generating dCDH golden values via DIDmultiplegtDYN at l=1...\n")

output_path <- file.path("benchmarks", "data", "dcdh_dynr_golden_values.json")
Expand All @@ -51,7 +61,8 @@ gen_reversible <- function(n_groups, n_periods, pattern, seed,
cycle_length = 2, treatment_effect = 2.0,
heterogeneous_effects = FALSE, effect_sd = 0.5,
group_fe_sd = 2.0, time_trend = 0.1, noise_sd = 0.5,
n_never_treated = 20, n_always_treated = 20) {
n_never_treated = 20, n_always_treated = 20,
L_max = 3) {
# n_never_treated and n_always_treated add stable control cohorts so
# both Python (AER 2020 zero-retention) and R DIDmultiplegtDYN (dynamic
# paper, drop-cohort) implementations have controls available at every
Expand Down Expand Up @@ -99,6 +110,76 @@ gen_reversible <- function(n_groups, n_periods, pattern, seed,
D[g, seq_len(st - 1L)] <- 1L
}
}
} else if (pattern == "multi_path_reversible") {
# Deterministic multi-path DGP designed for by_path R-parity:
# - 4 distinct joiner-style target paths with unequal frequencies
# (so top-k ranking produces unique ranks with no ties)
# - path assignment is a DETERMINISTIC FUNCTION OF F_g, so each
# cohort (D_{g,1}, F_g, S_g) contains switchers from a single
# path. This avoids cross-path cohort sharing in the
# cohort-recentered influence function, which otherwise blows
# out SE parity with R's re-run-per-path convention.
# - post-window treatment is stable at path[L_max+1] (clean
# control-pool eligibility — no post-window contamination)
#
# Each group:
# - F_g in [2, n_periods - L_max] so the length-(L_max+1) window
# [F_g-1, F_g-1+L_max] fits the panel
# - path is determined by F_g: two F_g values per path (groups in
# {F_g in {2,3}} share path 1, {F_g in {4,5}} share path 2,
# {F_g in {6}} = path 3, {F_g in {7}} = path 4); within a path,
# F_g distribution yields n_groups * path_prop total groups
max_switch <- n_periods - L_max - 1L
stopifnot(max_switch >= 1L)
# With n_periods=10, L_max=3: max_switch=6, F_g in [2,7] (6 values).

target_paths <- list(
c(0L, 1L, 1L, 1L), # sustained on (rank 1)
c(0L, 1L, 1L, 0L), # on then off (rank 2)
c(0L, 1L, 0L, 0L), # on briefly (rank 3)
c(0L, 1L, 0L, 1L) # on-off-on (rank 4, truncated under by_path=3)
)
stopifnot(length(target_paths[[1]]) == L_max + 1L)

# Per-F_g path assignment: 2 F_g values for paths 1 & 2, 1 for paths
# 3 & 4. This keeps each (D_{g,1}, F_g, S_g) cohort single-path.
f_g_to_path <- c(1L, 1L, 2L, 2L, 3L, 4L)
stopifnot(length(f_g_to_path) == max_switch)
# Group counts per F_g (rank 1 > rank 2 > rank 3 > rank 4 with unique
# ranks — no frequency ties, robust to R's undocumented tiebreak):
# F_g=2: 20 groups (path 1)
# F_g=3: 20 groups (path 1) → rank 1 has 40 switchers total
# F_g=4: 15 groups (path 2)
# F_g=5: 10 groups (path 2) → rank 2 has 25 switchers total
# F_g=6: 10 groups (path 3) → rank 3 has 10 switchers
# F_g=7: 5 groups (path 4) → rank 4 has 5 switchers (excluded by by_path=3)
counts_per_F_g <- c(20L, 20L, 15L, 10L, 10L, 5L)
stopifnot(sum(counts_per_F_g) == n_groups)

# Build the group-to-(F_g, path) assignment, deterministic with seed
g_idx <- 1L
for (f_idx in seq_along(counts_per_F_g)) {
F_g <- f_idx + 1L # F_g in [2, 7] for f_idx in [1, 6]
path_idx <- f_g_to_path[f_idx]
target <- target_paths[[path_idx]]
n_here <- counts_per_F_g[f_idx]
for (k in seq_len(n_here)) {
g <- g_idx
# Pre-baseline [1 .. F_g-2]: initial state
if (F_g >= 3L) {
D[g, 1:(F_g - 2L)] <- target[1]
}
# Window [F_g-1 .. F_g-1+L_max]: exactly the target path
for (j in 0:L_max) {
D[g, F_g - 1L + j] <- target[j + 1L]
}
# Post-window [F_g+L_max .. n_periods]: stable at path[L_max+1]
if (F_g + L_max <= n_periods) {
D[g, (F_g + L_max):n_periods] <- target[L_max + 1L]
}
g_idx <- g_idx + 1L
}
}
} else {
stop(sprintf("Unknown pattern: %s", pattern))
}
Expand Down Expand Up @@ -482,6 +563,87 @@ scenarios$joiners_only_controls_trends_lin <- list(
results = extract_dcdh_multi(res12, n_effects = 2, n_placebos = 1)
)

# ---------------------------------------------------------------------------
# Phase 3 extension: by_path per-path event-study disaggregation
# ---------------------------------------------------------------------------

# Helper: extract per-path by_path results. When did_multiplegt_dyn is
# called with by_path=k, the result object has no $results slot; instead
# per-path results live at res$by_level_1, res$by_level_2, ... in rank
# order (1 = most frequent observed path). res$by_levels is a character
# vector of comma-joined path labels (e.g. "0,1,1,1") in the same order.
extract_dcdh_by_path <- function(res, n_effects) {
by_levels <- res$by_levels
out <- list()
for (i in seq_along(by_levels)) {
slot <- res[[paste0("by_level_", i)]]
effects <- slot$results$Effects
horizons <- list()
for (h in seq_len(min(n_effects, nrow(effects)))) {
horizons[[as.character(h)]] <- list(
effect = as.numeric(effects[h, "Estimate"]),
se = as.numeric(effects[h, "SE"]),
ci_lo = as.numeric(effects[h, "LB CI"]),
ci_hi = as.numeric(effects[h, "UB CI"]),
n_switchers = as.numeric(effects[h, "Switchers"]),
n_obs = as.numeric(effects[h, "N"])
)
}
out[[i]] <- list(
path = by_levels[i],
frequency_rank = i,
horizons = horizons
)
}
list(by_path = out)
}

# Scenario 13: mixed_single_switch + by_path=2 (basic 2-path case).
# The mixed_single_switch DGP produces joiners (path 0,1,1,1) and
# leavers (path 1,0,0,0) as its only two observed paths at L_max=3, so
# by_path=2 captures both and tests core per-path parity.
cat(" Scenario 13: mixed_single_switch_by_path\n")
d13 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 8,
pattern = "mixed_single_switch", seed = 113)
res13 <- did_multiplegt_dyn(
df = d13, outcome = "outcome", group = "group", time = "period",
treatment = "treatment", effects = 3, by_path = 2, ci_level = 95
)
scenarios$mixed_single_switch_by_path <- list(
data = export_data(d13),
params = list(pattern = "mixed_single_switch", n_groups = N_GOLDEN,
n_periods = 8, seed = 113, effects = 3, by_path = 2,
ci_level = 95),
results = extract_dcdh_by_path(res13, n_effects = 3)
)

# Scenario 14: multi_path_reversible + by_path=3 (top-k ranking case).
# The `multi_path_reversible` pattern is a DETERMINISTIC multi-path DGP:
# path assignment is a fixed function of F_g (so every (D_{g,1}, F_g,
# S_g) cohort contains switchers from a single path), path proportions
# are fixed at 20/20/15/10/10/5 across the 6 F_g values, and
# post-window treatment is stable at path[L_max+1]. by_path=3 exercises
# top-k selection when observed paths exceed k (4 observed paths, top-3
# selected). n_periods=10 gives every switch_time a complete length-
# (L_max+1) window. The old `p_switch`-driven random-toggle variant
# (pre-PR) blew out SE parity with R via cross-path cohort mixing;
# see the REGISTRY.md `Note (Phase 3 by_path ...)` Deviation bullet.
cat(" Scenario 14: multi_path_reversible_by_path\n")
d14 <- gen_reversible(n_groups = N_GOLDEN, n_periods = 10,
pattern = "multi_path_reversible", seed = 114,
L_max = 3)
res14 <- did_multiplegt_dyn(
df = d14, outcome = "outcome", group = "group", time = "period",
treatment = "treatment", effects = 3, by_path = 3, ci_level = 95
)
scenarios$multi_path_reversible_by_path <- list(
data = export_data(d14),
params = list(pattern = "multi_path_reversible", n_groups = N_GOLDEN,
n_periods = 10, seed = 114, effects = 3, by_path = 3,
ci_level = 95),
results = extract_dcdh_by_path(res14, n_effects = 3)
)

# ---------------------------------------------------------------------------
# Write output
# ---------------------------------------------------------------------------
Expand Down
Loading
Loading