igerber · igerber · May 21, 2026 · May 20, 2026 · May 20, 2026 · May 20, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
diff --git a/TODO.md b/TODO.md
@@ -101,8 +101,14 @@ Deferred items from PR reviews that were not addressed before merge.
 | Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 7 standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham` (this row tracks the remaining 7). | multiple | Phase 1b | Medium |
 | Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
 | Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |
-| Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
-| Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. Weighted hat matrix and residual rebalancing need threading per clubSandwich WLS handling. | `linalg.py::_compute_cr2_bm` | Phase 1a | Medium |
+<!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
+     form matches clubSandwich's specific algebra (W not sqrt(W), W^2 in bias term,
+     unweighted residuals) rather than the textbook PT2018 transform-once derivation
+     which historically diverged by 0.5-30%. Pinned at atol=1e-10 on 6 weighted
+     scenarios in benchmarks/data/clubsandwich_cr2_golden.json. -->
+| ~~Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster)~~ — LIFTED via clubSandwich WLS-CR2 port. Routes through `_compute_cr2_bm_contrast_dof` with singleton-cluster reduction (each obs is its own cluster). See REGISTRY.md Phase 1a row for the algebra. | `linalg.py` | LIFTED | — |
+| ~~Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`)~~ — LIFTED via clubSandwich WLS-CR2 port. `_compute_cr2_bm` now accepts `weights=` and threads through clubSandwich's specific WLS-CR2 algebra. | `linalg.py::_compute_cr2_bm` | LIFTED | — |
+| `LinearRegression.fit()` pays the CR2 cost twice on the weighted `hc2_bm` path: once inside `solve_ols(..., return_vcov=True)` and again via `compute_robust_vcov(..., return_dof=True)` to populate `_bm_dof`. Correct but redundant. Fix: thread `return_dof` through `solve_ols` so the same CR2 computation produces both vcov + DOF, or cache the per-cluster `A_g` / `MUWTWUM` precomputes between calls. CI codex P3 on PR #475. | `linalg.py::LinearRegression.fit`, `linalg.py::solve_ols` | PR #475 follow-up | Low |
 | `TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` with replicate-weight survey designs raises `NotImplementedError` (`twfe.py:~233`). The replicate path re-demeans per replicate (re-demeaning depends on the per-replicate weight vector), which doesn't compose with the full-dummy HC2/HC2-BM build — a correct implementation would need per-replicate full-dummy refit. Workaround: use `vcov_type="hc1"` for replicate-weight CR1. | `twfe.py::fit` | follow-up | Low |
 | TWFE's HC2/HC2-BM inline full-dummy build (`twfe.py:280-315`) duplicates the dummy-construction logic in `DifferenceInDifferences(fixed_effects=...)` (`estimators.py:478-486`). Extract a shared helper (or delegate TWFE's HC2/HC2-BM path to DiD's `fixed_effects=` branch, with TWFE-specific cluster default threading) to reduce drift risk on FE naming, survey behavior, and result-surface conventions. Substantive refactor — touches both estimators. | `twfe.py::fit`, `estimators.py::DifferenceInDifferences.fit` | follow-up | Low |
 | Unify Rust local-method `estimate_model` solver path to `solve_wls_svd` (the same SVD helper used by the global-method since PR #348) for sub-1e-14 bootstrap SE parity. Current local-method bootstrap parity test (`tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local`) passes at `atol=1e-5` — the residual ~1e-7 gap is roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq`, which accumulates differently across per-replicate bootstrap fits. Main-fit ATT parity is regime-dependent (`atol=1e-14` for `lambda_nn=inf`, `atol=1e-10` for finite `lambda_nn` — see `test_local_method_main_fit_parity`); the bootstrap gap is a same-solver-path roundoff concern and not a user-visible correctness bug. | `rust/src/trop.rs::estimate_model`, `rust/src/linalg.rs::solve_wls_svd` | follow-up | Low |
@@ -201,7 +207,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
 #### Tier C — Heavy / derivation required
 
 - HonestDiD Δ^RM ARP conditional/hybrid confidence sets (`honest_did.py`)
-- Weighted one-way Bell-McCaffrey + weighted CR2 Bell-McCaffrey (linalg derivations + R parity harness) (`linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_compute_cr2_bm`)
+- ~~Weighted one-way Bell-McCaffrey + weighted CR2 Bell-McCaffrey~~ — LIFTED via the clubSandwich WLS-CR2 port (PR #475); see rows 109-110 above for the closed rows.
 - Multi-absorb weighted demeaning: alternating-projection iteration for N>1 absorb + weights (`estimators.py`)
 - ImputationDiD dense `(A0'A0).toarray()` OOM: alternative dense fallback or richer sparse strategy (`imputation.py:1531`)
 - HAD mass-point `vcov_type ∈ {hc2, hc2_bm}`: 2SLS-specific leverage derivation (`had.py::_fit_mass_point_2sls`)

diff --git a/benchmarks/R/generate_clubsandwich_golden.R b/benchmarks/R/generate_clubsandwich_golden.R
@@ -23,6 +23,7 @@ suppressPackageStartupMessages({
   library(jsonlite)
 })
 
+stopifnot(packageVersion("clubSandwich") >= "0.7.0")
 set.seed(20260420)
 
 # --- Three deterministic datasets ---------------------------------------------
@@ -468,12 +469,242 @@ output$sun_abraham_two_cohort <- list(
   dof_bm_contrast_overall_unit = unname(sa_dof_bm_overall_unit)
 )
 
+# =============================================================================
+# Weighted scenarios (clubSandwich WLS-CR2 port)
+# =============================================================================
+# Pin diff-diff's weighted CR2-BM port against clubSandwich's specific WLS-CR2
+# algebra (R/CR-adjustments.R::CR2 + R/get_arrays.R::get_GH + coef_test.R).
+# Verified algorithm uses W (not sqrt(W)) in the hat matrix, W^2 in the bias
+# correction term, and unweighted residuals in the score construction.
+
+# ---- Scenario: weighted_one_way_no_cluster ----------------------------------
+# 12 observations, single "cluster" via vcovCR(cluster=1:n, type="CR2") for the
+# one-way HC2-BM reduction. Heteroskedastic weights.
+set.seed(50100)
+d_w_oneway <- data.frame(
+  x = c(-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -0.8, 0.3, 1.2, -1.2),
+  z = c(0.2, -0.3, 0.5, -0.1, 0.4, -0.5, 0.6, -0.2, 0.1, -0.4, 0.3, 0.5)
+)
+d_w_oneway$y <- 1 + 0.5 * d_w_oneway$x + 0.3 * d_w_oneway$z +
+  rnorm(nrow(d_w_oneway), sd = 0.5)
+w_oneway <- c(0.5, 0.5, 1, 1, 1.5, 1.5, 2, 2, 2.5, 2.5, 3, 3)
+fit_w_oneway <- lm(y ~ x + z, data = d_w_oneway, weights = w_oneway)
+vcov_w_oneway <- vcovCR(fit_w_oneway,
+                        cluster = seq_len(nrow(d_w_oneway)), type = "CR2")
+ct_w_oneway <- coef_test(fit_w_oneway, vcov = vcov_w_oneway)
+output$weighted_one_way_no_cluster <- list(
+  x = d_w_oneway$x,
+  z = d_w_oneway$z,
+  y = d_w_oneway$y,
+  weights = w_oneway,
+  coef = as.numeric(coef(fit_w_oneway)),
+  coef_names = names(coef(fit_w_oneway)),
+  vcov_hc2_bm = as.numeric(vcov_w_oneway),
+  vcov_hc2_bm_shape = dim(vcov_w_oneway),
+  dof_bm_one_way = as.numeric(ct_w_oneway$df_Satt),
+  se_hc2_bm = as.numeric(ct_w_oneway$SE)
+)
+
+# ---- Scenario: weighted_balanced_clusters -----------------------------------
+# 20 observations, 4 clusters of 5; weights vary within and across clusters.
+set.seed(50200)
+n_w_bal <- 20
+cluster_w_bal <- rep(1:4, each = 5)
+d_w_bal <- data.frame(
+  cluster = cluster_w_bal,
+  x = rnorm(n_w_bal),
+  z = rnorm(n_w_bal)
+)
+d_w_bal$y <- 1 + 0.5 * d_w_bal$x + 0.3 * d_w_bal$z + rnorm(n_w_bal, sd = 0.5)
+w_bal <- runif(n_w_bal, min = 0.5, max = 3.0)
+fit_w_bal <- lm(y ~ x + z, data = d_w_bal, weights = w_bal)
+vcov_w_bal <- vcovCR(fit_w_bal, cluster = d_w_bal$cluster, type = "CR2")
+ct_w_bal <- coef_test(fit_w_bal, vcov = vcov_w_bal)
+output$weighted_balanced_clusters <- list(
+  x = d_w_bal$x,
+  z = d_w_bal$z,
+  y = d_w_bal$y,
+  weights = w_bal,
+  cluster = d_w_bal$cluster,
+  coef = as.numeric(coef(fit_w_bal)),
+  coef_names = names(coef(fit_w_bal)),
+  vcov_cr2 = as.numeric(vcov_w_bal),
+  vcov_cr2_shape = dim(vcov_w_bal),
+  dof_bm = as.numeric(ct_w_bal$df_Satt),
+  se_cr2 = as.numeric(ct_w_bal$SE)
+)
+
+# ---- Scenario: weighted_unbalanced_clusters ---------------------------------
+# 52 observations, 8 clusters of sizes 3-10. Heteroskedastic weights.
+set.seed(50300)
+cluster_sizes_unbal <- c(3, 4, 5, 6, 7, 8, 9, 10)
+cluster_w_unbal <- rep(1:8, times = cluster_sizes_unbal)
+n_w_unbal <- length(cluster_w_unbal)
+d_w_unbal <- data.frame(
+  cluster = cluster_w_unbal,
+  x = rnorm(n_w_unbal),
+  z = rnorm(n_w_unbal)
+)
+shock_unbal <- rnorm(8, sd = 0.3)
+d_w_unbal$y <- 1 + 0.5 * d_w_unbal$x + 0.3 * d_w_unbal$z +
+  shock_unbal[d_w_unbal$cluster] + rnorm(n_w_unbal, sd = 0.4)
+w_unbal <- runif(n_w_unbal, min = 0.3, max = 3.0)
+fit_w_unbal <- lm(y ~ x + z, data = d_w_unbal, weights = w_unbal)
+vcov_w_unbal <- vcovCR(fit_w_unbal, cluster = d_w_unbal$cluster, type = "CR2")
+ct_w_unbal <- coef_test(fit_w_unbal, vcov = vcov_w_unbal)
+output$weighted_unbalanced_clusters <- list(
+  x = d_w_unbal$x,
+  z = d_w_unbal$z,
+  y = d_w_unbal$y,
+  weights = w_unbal,
+  cluster = d_w_unbal$cluster,
+  coef = as.numeric(coef(fit_w_unbal)),
+  coef_names = names(coef(fit_w_unbal)),
+  vcov_cr2 = as.numeric(vcov_w_unbal),
+  vcov_cr2_shape = dim(vcov_w_unbal),
+  dof_bm = as.numeric(ct_w_unbal$df_Satt),
+  se_cr2 = as.numeric(ct_w_unbal$SE),
+  cluster_sizes = as.numeric(table(d_w_unbal$cluster))
+)
+
+# ---- Scenario: weighted_singletons_present ----------------------------------
+# Adversarial: prior PT2018 transform-once derivation hit ~30% gap on
+# singleton-cluster scenarios. Verifies the clubSandwich port handles this.
+set.seed(50400)
+cluster_sizes_sing <- c(1, 1, 2, 3, 4, 5, 6, 6, 4, 3)
+cluster_w_sing <- rep(1:10, times = cluster_sizes_sing)
+n_w_sing <- length(cluster_w_sing)
+d_w_sing <- data.frame(
+  cluster = cluster_w_sing,
+  x = rnorm(n_w_sing),
+  z = rnorm(n_w_sing)
+)
+shock_sing <- rnorm(10, sd = 0.3)
+d_w_sing$y <- 1 + 0.5 * d_w_sing$x + 0.3 * d_w_sing$z +
+  shock_sing[d_w_sing$cluster] + rnorm(n_w_sing, sd = 0.4)
+w_sing <- runif(n_w_sing, min = 0.3, max = 3.0)
+fit_w_sing <- lm(y ~ x + z, data = d_w_sing, weights = w_sing)
+vcov_w_sing <- vcovCR(fit_w_sing, cluster = d_w_sing$cluster, type = "CR2")
+ct_w_sing <- coef_test(fit_w_sing, vcov = vcov_w_sing)
+output$weighted_singletons_present <- list(
+  x = d_w_sing$x,
+  z = d_w_sing$z,
+  y = d_w_sing$y,
+  weights = w_sing,
+  cluster = d_w_sing$cluster,
+  coef = as.numeric(coef(fit_w_sing)),
+  coef_names = names(coef(fit_w_sing)),
+  vcov_cr2 = as.numeric(vcov_w_sing),
+  vcov_cr2_shape = dim(vcov_w_sing),
+  dof_bm = as.numeric(ct_w_sing$df_Satt),
+  dof_per_coef = as.numeric(ct_w_sing$df_Satt),
+  se_cr2 = as.numeric(ct_w_sing$SE),
+  cluster_sizes = as.numeric(table(d_w_sing$cluster))
+)
+
+# ---- Scenario: weighted_did_absorbed_fe -------------------------------------
+# DiD-style analytical CR2 design-matrix parity fixture: 8 units x 4 periods,
+# treat_post + unit + period FE, analytics weights varying by unit. Pins the
+# clubSandwich WLS-CR2 analytical surface (compute_robust_vcov / solve_ols /
+# LinearRegression direct callers) on a DiD-shaped design. NOTE: this is NOT
+# a public-estimator `DifferenceInDifferences(survey_design=...)` parity
+# fixture — estimator-level survey designs route through the TSL survey
+# variance path, which takes precedence over the analytical CR2 sandwich.
+set.seed(50500)
+d_did_w <- make_did_panel(n_units = 8, n_periods = 4, treatment_period = 2,
+                          seed = 50501)
+# Unit-level weight (stratum-like): vary by unit, constant within unit-period.
+unit_w_did <- runif(8, min = 0.5, max = 2.5)
+d_did_w$weights <- unit_w_did[d_did_w$unit_int]
+fit_did_w <- lm(y ~ treat_post + unit + period, data = d_did_w,
+                weights = d_did_w$weights)
+vcov_did_w_cr2 <- vcovCR(fit_did_w, cluster = d_did_w$unit_int, type = "CR2")
+ct_did_w_cr2 <- coef_test(fit_did_w, vcov = vcov_did_w_cr2)
+output$weighted_did_absorbed_fe <- list(
+  unit = d_did_w$unit_int,
+  period = d_did_w$period_int,
+  treated = d_did_w$treated,
+  post = d_did_w$post,
+  treat_post = d_did_w$treat_post,
+  y = d_did_w$y,
+  weights = d_did_w$weights,
+  coef = as.numeric(coef(fit_did_w)),
+  coef_names = names(coef(fit_did_w)),
+  vcov_cr2 = as.numeric(vcov_did_w_cr2),
+  vcov_cr2_shape = dim(vcov_did_w_cr2),
+  dof_cr2 = as.numeric(ct_did_w_cr2$df_Satt)
+)
+
+# ---- Scenario: weighted_mpd_avg_att_dof -------------------------------------
+# MPD-style analytical CR2 contrast-DOF parity fixture: 15 units x 4 periods,
+# MPD parameterization with analytics weights + cluster=unit. Compound
+# contrast = post-period-average ATT. Pins the clubSandwich `Wald_test(test=
+# "HTZ")$df_denom` against `_compute_cr2_bm_contrast_dof(weights=)` on an
+# MPD-shaped design. NOTE: like `weighted_did_absorbed_fe`, this fixture
+# targets the analytical surface only — estimator-level
+# `MultiPeriodDiD(survey_design=...)` paths use TSL.
+set.seed(50600)
+d_mpd_w <- make_mpd_panel(n_total = 15, units_per_cohort = 5, n_periods = 4,
+                          seed = 50601)
+d_mpd_w$period_f <- relevel(factor(d_mpd_w$period), ref = "1")
+for (p in 2:4) {
+  d_mpd_w[[paste0("treated_period_", p)]] <-
+    d_mpd_w$treated * (d_mpd_w$period == p)
+}
+unit_w_mpd <- runif(15, min = 0.5, max = 2.5)
+d_mpd_w$weights <- unit_w_mpd[d_mpd_w$unit]
+fit_mpd_w <- lm(y ~ treated + period_f +
+                    treated_period_2 + treated_period_3 + treated_period_4 +
+                    factor(unit),
+                data = d_mpd_w, weights = d_mpd_w$weights)
+vcov_mpd_w_cr2 <- vcovCR(fit_mpd_w, cluster = d_mpd_w$unit, type = "CR2")
+ct_mpd_w_cr2 <- coef_test(fit_mpd_w, vcov = vcov_mpd_w_cr2)
+# Compound contrast: post-period-average over treated_period_{2,3,4}.
+all_coef_names_w <- names(coef(fit_mpd_w))
+n_coef_w <- length(all_coef_names_w)
+c_avg_vec_w <- setNames(rep(0, n_coef_w), all_coef_names_w)
+post_names_w <- c("treated_period_2", "treated_period_3", "treated_period_4")
+c_avg_vec_w[post_names_w] <- 1 / length(post_names_w)
+finite_mask_w <- !is.na(coef(fit_mpd_w))
+c_avg_kept_w <- c_avg_vec_w[finite_mask_w]
+dof_avg_w <- Wald_test(
+  fit_mpd_w,
+  constraints = matrix(c_avg_kept_w, 1),
+  vcov = vcov_mpd_w_cr2,
+  test = "HTZ"
+)$df_denom
+output$weighted_mpd_avg_att_dof <- list(
+  unit = d_mpd_w$unit,
+  period = d_mpd_w$period,
+  treated = d_mpd_w$treated,
+  y = d_mpd_w$y,
+  weights = d_mpd_w$weights,
+  cluster = d_mpd_w$unit,
+  coef = as.numeric(coef(fit_mpd_w)),
+  coef_names = all_coef_names_w,
+  finite_coef_names = all_coef_names_w[finite_mask_w],
+  vcov_cr2 = as.numeric(vcov_mpd_w_cr2),
+  vcov_cr2_shape = dim(vcov_mpd_w_cr2),
+  dof_per_coef = as.numeric(ct_mpd_w_cr2$df_Satt),
+  c_avg = as.numeric(c_avg_kept_w),
+  dof_avg = unname(dof_avg_w),
+  post_interaction_names = post_names_w,
+  reference_period = 1L,
+  n_post_periods = length(post_names_w)
+)
+
 output$meta <- list(
   source = "clubSandwich",
   clubSandwich_version = as.character(packageVersion("clubSandwich")),
   R_version = R.version.string,
   generated_at = format(Sys.time(), tz = "UTC", usetz = TRUE),
-  note = "CR2 Bell-McCaffrey cluster-robust parity target for diff_diff._compute_cr2_bm"
+  note = paste0(
+    "CR2 Bell-McCaffrey cluster-robust parity target for ",
+    "diff_diff._compute_cr2_bm. Unweighted scenarios pin against ",
+    "_compute_cr2_bm / _compute_bm_dof_oneway; weighted scenarios pin ",
+    "the clubSandwich WLS-CR2 port (W not sqrt(W), W^2 bias term, ",
+    "unweighted residuals)."
+  )
 )
 
 out_path <- file.path("benchmarks", "data", "clubsandwich_cr2_golden.json")