igerber · Apr 19, 2026
diff --git a/ROADMAP.md b/ROADMAP.md
@@ -100,7 +100,7 @@ Research-informed candidates. Each has a rationale, a tractability note, and a c
 
 ### Methodology extensions
 
-- **DiD with no untreated group** (de Chaisemartin, Ciccia, D'Haultfœuille & Knau, arXiv:2405.04465, 2024, plus continuous-treatment-with-no-stayers companion, AEA P&P 2024). New estimator for designs where treatment is universal with heterogeneous dose (the inverse of the few-treated-many-donors case). Uses quasi-untreated units as controls. No existing diff-diff estimator handles this. Tractability: medium; closed-form identification. **Commit when**: methodology plan drafted and validated against the paper's Pierce (2016) solar-panel replication.
+- **DiD with no untreated group** (de Chaisemartin, Ciccia, D'Haultfœuille & Knau, arXiv:2405.04465, 2024, plus continuous-treatment-with-no-stayers companion, AEA P&P 2024). New estimator for designs where treatment is universal with heterogeneous dose (the inverse of the few-treated-many-donors case). Uses quasi-untreated units as controls. No existing diff-diff estimator handles this. Tractability: medium; closed-form identification. **Status (2026-04-18):** methodology plan approved; paper review at `docs/methodology/papers/dechaisemartin-2026-review.md`, REGISTRY stub at `docs/methodology/REGISTRY.md#heterogeneousadoptiondid`, class name `HeterogeneousAdoptionDiD`, implementation queued across 7 phased PRs. **Commit when**: the approved methodology plan is validated against the paper's Pierce and Schott (2016) PNTR manufacturing-employment replication (Figure 2).
 - **Nonparametric / flexible outcome regression for `EfficientDiD` DR covariate path** (Chen, Sant'Anna & Xie, arXiv:2506.17729, 2025, Section 4). The shipped staggered `EfficientDiD` uses a linear OLS outcome regression in its doubly-robust covariate path; that preserves DR consistency but does not generically attain the semiparametric efficiency bound unless the conditional mean is linear in the covariates. Replacing the OLS outcome regression with sieve / kernel / ML nuisance estimation (as the paper's Section 4 allows) would close the efficiency gap on the covariate path. Tractability: medium; the hook points are in `diff_diff/efficient_did_covariates.py`. **Commit when**: a paper-review synthesis is written, with an implementation plan for the nonparametric OR that preserves the existing DR consistency guarantees and survey-weighted variance surface.
 - **Distributional DiD for staggered timing** (Ciaccio, arXiv:2408.01208, 2024). New estimator extending Callaway-Li QTT to staggered adoption. `CallawaySantAnna` currently gives mean ATT only; this unlocks quantile effects. Tractability: medium. **Commit when**: a health-econ or public-health user reports need for quantile effects in a repeated-cross-section design.
 - **Local Projections DiD** (Dube, Girardi, Jordà & Taylor, JAE 2025). New estimator with flexible impulse-response and robustness to dynamic misspecification; natural for anticipation-prone settings. Tractability: well-scoped. **Commit when**: a methodology review confirms the dynamic variant's variance derivation fits our SE helpers.

diff --git a/TODO.md b/TODO.md
@@ -77,6 +77,11 @@ Deferred items from PR reviews that were not addressed before merge.
 | WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
 | WooldridgeDiD: canonical link requirement (W2023 Prop 3.1) not enforced — no warning if user applies wrong method to outcome type. Estimator is consistent regardless, but equivalence with imputation breaks. | `wooldridge.py` | #216 | Low |
 | WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
+| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
+| Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
+| HC2 / HC2 + Bell-McCaffrey on absorbed-FE fits currently raises `NotImplementedError` in three places: `TwoWayFixedEffects` unconditionally; `DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})`; `MultiPeriodDiD(absorb=..., vcov_type in {"hc2","hc2_bm"})`. Within-transformation preserves coefficients and residuals under FWL but not the hat matrix, so the reduced-design `h_ii` is not the diagonal of the full FE projection and CR2's block adjustment `A_g = (I - H_gg)^{-1/2}` is likewise wrong on absorbed cluster blocks. Lifting the guard needs HC2/CR2-BM computed from the full absorbed projection (unit/time FE dummies reconstructed internally, or a FE-aware hat-matrix formulation) and a parity harness against a full-dummy OLS run or R `fixest`/`clubSandwich`. HC1/CR1 are unaffected by this because they have no leverage term. | `twfe.py::fit`, `estimators.py::DifferenceInDifferences.fit`, `estimators.py::MultiPeriodDiD.fit` | Phase 1a | Medium |
+| Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. The weighted hat matrix and residual rebalancing need to be threaded through, following clubSandwich's WLS handling. | `linalg.py::_compute_cr2_bm` | Phase 1a | Medium |
+| Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). The current JSON has `source: python_self_reference` and serves as a stability anchor until an authoritative R run replaces it. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium |
 
 #### Performance
 
@@ -85,6 +90,9 @@ Deferred items from PR reviews that were not addressed before merge.
 | ImputationDiD event-study SEs recompute full conservative variance per horizon (should cache A0/A1 factorization) | `imputation.py` | #141 | Low |
 | Rust faer SVD ndarray-to-faer conversion overhead (minimal vs SVD cost) | `rust/src/linalg.rs:67` | #115 | Low |
 | Unrelated label events (e.g., adding `bug` label) re-trigger CI workflows when `ready-for-ci` is already present; filter `labeled`/`unlabeled` events to only `ready-for-ci` transitions | `.github/workflows/rust-test.yml`, `notebooks.yml` | #269 | Low |
+| `bread_inv` as a performance kwarg on `compute_robust_vcov` to avoid re-inverting `(X'WX)` when the caller already has it. Deferred from Phase 1a for scope. HC2 and HC2+BM both need the bread inverse, so a shared hint would save one `np.linalg.solve` per sandwich. | `linalg.py::compute_robust_vcov` | Phase 1a | Low |
+| Rust-backend HC2 implementation. The Rust path currently supports only HC1; HC2 and CR2 Bell-McCaffrey fall through to the NumPy backend, which is noticeably slower on large-n fits. | `rust/src/linalg.rs` | Phase 1a | Low |
+| CR2 Bell-McCaffrey DOF uses a naive `O(n² k)` per-coefficient loop over cluster pairs. Pustejovsky-Tipton (2018) Appendix B has a scores-based formulation that avoids the full `n × n` `M` matrix. Switch when a user hits a large-`n` cluster-robust design. | `linalg.py::_compute_cr2_bm` | Phase 1a | Low |
 
 #### Testing/Docs
 

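Two of the Phase 1a guards above (weighted one-way Bell-McCaffrey, and HC2 on absorbed-FE fits) rest on facts about projection matrices. The NumPy sketch below checks that algebra on small synthetic data. It does not call diff_diff, all variable names are illustrative, and a one-way unit fixed effect stands in for the unit/time effects in the TODO row. Part (1) shows that `X (X'WX)^{-1} X' W` is idempotent but not symmetric, while the `sqrt(W)` form is symmetric, idempotent, and equal to the ordinary hat matrix of `X* = sqrt(w) X`. The two forms share the same diagonal, so the difference matters only where off-diagonal entries enter (the Bell-McCaffrey `G` and CR2 cluster blocks). Part (2) shows that demeaning reproduces the coefficient and residuals, but the full-dummy leverages exceed the reduced-design leverages by the dummy projection's diagonal (here `1/n_periods`).

```python
import numpy as np

rng = np.random.default_rng(0)

# (1) Weighted least squares: unscaled projection vs. transformed-design projection.
n = 12
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
w = rng.uniform(0.5, 2.0, size=n)                # illustrative positive weights
W, sqrtW = np.diag(w), np.diag(np.sqrt(w))
XtWX_inv = np.linalg.inv(X.T @ W @ X)

H_unscaled = X @ XtWX_inv @ X.T @ W              # X (X'WX)^{-1} X' W
H_star = sqrtW @ X @ XtWX_inv @ X.T @ sqrtW      # sqrt(W) X (X'WX)^{-1} X' sqrt(W)
M_star = np.eye(n) - H_star                      # residual-maker on X* = sqrt(w) X

Xs = np.sqrt(w)[:, None] * X                     # the transformed design X*
H_direct = Xs @ np.linalg.inv(Xs.T @ Xs) @ Xs.T  # plain OLS hat matrix of X*

print(np.allclose(H_unscaled @ H_unscaled, H_unscaled))          # True: idempotent
print(np.allclose(H_unscaled, H_unscaled.T))                     # False: not symmetric
print(np.allclose(M_star, M_star.T), np.allclose(M_star @ M_star, M_star))  # True True
print(np.allclose(H_star, H_direct))                             # True
print(np.allclose(np.diag(H_unscaled), np.diag(H_star)))         # True: same leverages

# (2) One-way absorbed FE: FWL keeps beta and residuals, not the leverages.
n_units, n_periods = 4, 3
unit = np.repeat(np.arange(n_units), n_periods)
d = rng.normal(size=unit.size)                   # regressor of interest
y = 0.5 * d + unit + rng.normal(size=unit.size)  # outcome with unit effects
D = np.eye(n_units)[unit]                        # unit dummy matrix
X_full = np.column_stack([d, D])
B_full = np.linalg.inv(X_full.T @ X_full)
beta_full = B_full @ X_full.T @ y
resid_full = y - X_full @ beta_full
H_full = X_full @ B_full @ X_full.T


def demean(v):
    """Within transformation: subtract each unit's mean."""
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]


d_w, y_w = demean(d), demean(y)
beta_w = (d_w @ y_w) / (d_w @ d_w)
resid_w = y_w - d_w * beta_w
h_reduced = d_w**2 / (d_w @ d_w)                 # leverages of the demeaned design

print(np.isclose(beta_full[0], beta_w))                          # True
print(np.allclose(resid_full, resid_w))                          # True
print(np.allclose(np.diag(H_full), h_reduced))                   # False
print(np.allclose(np.diag(H_full), h_reduced + 1 / n_periods))   # True
```

The last check reflects the orthogonal split `H_full = P_D + P_{M_D d}`: an HC2 computed from `h_reduced` alone would understate every leverage, which is why the guard stays until the full absorbed projection is available.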
diff --git a/benchmarks/R/generate_clubsandwich_golden.R b/benchmarks/R/generate_clubsandwich_golden.R
@@ -0,0 +1,82 @@
+# Generate CR2 Bell-McCaffrey golden values via R clubSandwich.
+#
+# This script is the parity source for CR2 Bell-McCaffrey cluster-robust
+# inference implemented in diff_diff/linalg.py::_compute_cr2_bm.
+#
+# Usage:
+#   Rscript benchmarks/R/generate_clubsandwich_golden.R
+#
+# Requirements:
+#   clubSandwich (CRAN), jsonlite
+#
+# Output:
+#   benchmarks/data/clubsandwich_cr2_golden.json
+#
+# Phase 1a of the HeterogeneousAdoptionDiD implementation (de Chaisemartin,
+# Ciccia, D'Haultfoeuille & Knau 2026, arXiv:2405.04465v6). The parity
+# dataset below consists of three small deterministic designs; the Python
+# test at tests/test_linalg_hc2_bm.py::TestCR2BMParityClubSandwich loads
+# this JSON and checks agreement to 6 digits.
+
+suppressPackageStartupMessages({
+  library(clubSandwich)
+  library(jsonlite)
+})
+
+set.seed(20260420)
+
+# --- Three deterministic datasets ---------------------------------------------
+
+make_dataset <- function(name, n_clusters, cluster_sizes, seed) {
+  set.seed(seed)
+  cluster_ids <- rep(seq_len(n_clusters), times = cluster_sizes)
+  n <- length(cluster_ids)
+  x <- runif(n, 0, 1)
+  # Cluster-level shock to induce within-cluster correlation, plus idiosyncratic noise.
+  shock <- rnorm(n_clusters, sd = 0.5)
+  y <- 1 + 0.5 * x + shock[cluster_ids] + rnorm(n, sd = 0.2)
+  data.frame(name = name, cluster = cluster_ids, x = x, y = y)
+}
+
+datasets <- list(
+  balanced_small = make_dataset("balanced_small", 5, rep(6, 5), 101),
+  unbalanced_medium = make_dataset("unbalanced_medium", 8, c(3, 4, 5, 6, 7, 8, 9, 10), 202),
+  singletons_present = make_dataset("singletons_present", 10, c(1, 1, 2, 3, 4, 5, 6, 7, 8, 9), 303)
+)
+
+output <- list()
+
+for (nm in names(datasets)) {
+  d <- datasets[[nm]]
+  fit <- lm(y ~ x, data = d)
+  vcov_cr2 <- vcovCR(fit, cluster = d$cluster, type = "CR2")
+  # Per-coefficient Bell-McCaffrey (Satterthwaite) DOF via clubSandwich::coef_test,
+  # which reports it in the df_Satt column (one row per coefficient).
+  coef_names <- names(coef(fit))
+  ct <- coef_test(fit, vcov = vcov_cr2, test = "Satterthwaite")
+  dof_vec <- ct$df_Satt
+  stopifnot(length(dof_vec) == length(coef_names))
+  output[[nm]] <- list(
+    x = d$x,
+    y = d$y,
+    cluster = d$cluster,
+    coef = as.numeric(coef(fit)),
+    coef_names = coef_names,
+    vcov_cr2 = as.numeric(vcov_cr2),
+    vcov_shape = dim(vcov_cr2),
+    dof_bm = as.numeric(dof_vec),
+    cluster_sizes = as.numeric(table(d$cluster))
+  )
+}
+
+output$meta <- list(
+  source = "clubSandwich",
+  clubSandwich_version = as.character(packageVersion("clubSandwich")),
+  R_version = R.version.string,
+  generated_at = format(Sys.time(), tz = "UTC", usetz = TRUE),
+  note = "CR2 Bell-McCaffrey cluster-robust parity target for diff_diff._compute_cr2_bm"
+)
+
+out_path <- file.path("benchmarks", "data", "clubsandwich_cr2_golden.json")
+writeLines(toJSON(output, pretty = TRUE, digits = 15, auto_unbox = TRUE), out_path)
+cat("Wrote", out_path, "\n")
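For readers comparing against the golden file, here is a minimal NumPy sketch of the quantities the script pins down, under the same assumptions as the R fits: unweighted OLS, working-independence CR2 with `A_g = (I - H_gg)^{-1/2}`, and the Satterthwaite DOF `(tr G)^2 / tr(G^2)` with `G[g, h] = p_g' p_h` and `p_g = (I - H)_g' A_g X_g (X'X)^{-1} c` for a unit contrast `c`. The helpers `cr2_bm` and `inv_sqrt_psd` are hypothetical illustrations, not `diff_diff._compute_cr2_bm`. The data only mimic the `balanced_small` layout, because NumPy's RNG does not reproduce R's draws.

```python
import numpy as np


def inv_sqrt_psd(A, tol=1e-12):
    """Symmetric (pseudo-)inverse square root of a symmetric PSD matrix."""
    vals, vecs = np.linalg.eigh(A)
    inv_root = np.where(vals > tol, 1.0 / np.sqrt(np.clip(vals, tol, None)), 0.0)
    return (vecs * inv_root) @ vecs.T


def cr2_bm(X, y, cluster):
    """Hypothetical reference: OLS coefficients, CR2 vcov, and per-coefficient
    Bell-McCaffrey DOF under working independence (no weights, no absorbed FE)."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    beta = bread @ X.T @ y
    resid = y - X @ beta
    M = np.eye(n) - X @ bread @ X.T              # residual-maker I - H, n x n

    groups = [np.flatnonzero(cluster == g) for g in np.unique(cluster)]
    AX = []                                      # A_g X_g for each cluster
    meat = np.zeros((k, k))
    for idx in groups:
        A_g = inv_sqrt_psd(M[np.ix_(idx, idx)])  # (I - H_gg)^{-1/2}
        AX_g = A_g @ X[idx]
        AX.append(AX_g)
        score = AX_g.T @ resid[idx]              # X_g' A_g e_g
        meat += np.outer(score, score)
    vcov = bread @ meat @ bread

    dof = np.empty(k)
    for j in range(k):
        Bc = bread @ np.eye(k)[j]                # (X'X)^{-1} c for unit contrast c
        # Column g holds p_g = (I - H)_g' A_g X_g (X'X)^{-1} c.
        P = np.column_stack([M[idx].T @ (AX_g @ Bc) for idx, AX_g in zip(groups, AX)])
        G = P.T @ P                              # G[g, h] = p_g' p_h
        dof[j] = np.trace(G) ** 2 / np.sum(G * G)  # (tr G)^2 / tr(G^2), G symmetric
    return beta, vcov, dof


# Illustrative data shaped like balanced_small (5 clusters of 6 observations).
rng = np.random.default_rng(101)
cluster = np.repeat(np.arange(5), 6)
x = rng.uniform(size=cluster.size)
shock = rng.normal(scale=0.5, size=5)            # cluster-level shock
y = 1 + 0.5 * x + shock[cluster] + rng.normal(scale=0.2, size=cluster.size)
X = np.column_stack([np.ones_like(x), x])
beta, vcov, dof = cr2_bm(X, y, cluster)
print(beta, np.sqrt(np.diag(vcov)), dof)
```

To compare with clubSandwich directly, feed the sketch the `x`, `y` and `cluster` arrays stored in the JSON; `dof` should then correspond to `dof_bm`. The loop materializes the full `n × n` residual-maker, which is exactly the cost the scores-based formulation in the Performance TODO row would avoid.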