Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md

Large diffs are not rendered by default.

12 changes: 9 additions & 3 deletions TODO.md
Original file line number Diff line number Diff line change
Expand Up @@ -101,8 +101,14 @@ Deferred items from PR reviews that were not addressed before merge.
| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 7 standalone estimators that expose `cluster=` but not yet `vcov_type=`: `CallawaySantAnna`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added the chain to DiD/MPD/TWFE; Phase 1b PR 1/8 added `SunAbraham` (this row tracks the remaining 7). | multiple | Phase 1b | Medium |
| Extend `SunAbraham` with `vcov_type="conley"` (Conley spatial-HAC) as a first-class feature: thread `conley_coords` / `conley_cutoff_km` / `conley_metric` / `conley_kernel` / `conley_time` / `conley_unit` / `conley_lag_cutoff` through `_fit_saturated_regression`. Phase 1b PR 1/8 deferred this; SA currently rejects `vcov_type="conley"` at `__init__` with a deferral message. | `diff_diff/sun_abraham.py` | follow-up | Medium |
| Harmonize SunAbraham's HC1 within-transform finite-sample correction with `fixest::sunab()`. SA's `solve_ols` applies `n / (n - k_dm)` (within-transform columns only); fixest applies `n / (n - k_total)` (counts absorbed FE). SE values differ by ~1-2% on typical panel sizes (documented in REGISTRY.md "Deviation from R"; pinned at `atol=5e-3` in `tests/test_methodology_sun_abraham.py`). Either thread `df_adjustment` into the vcov scaling or document as an intentional difference. | `diff_diff/sun_abraham.py`, `diff_diff/linalg.py::compute_robust_vcov` | follow-up | Low |
| Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
| Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. Weighted hat matrix and residual rebalancing need threading per clubSandwich WLS handling. | `linalg.py::_compute_cr2_bm` | Phase 1a | Medium |
<!-- Rows 104-105 LIFTED 2026-05-20 via the clubSandwich WLS-CR2 port. The diff-diff
form matches clubSandwich's specific algebra (W not sqrt(W), W^2 in bias term,
unweighted residuals) rather than the textbook PT2018 transform-once derivation
which historically diverged by 0.5-30%. Pinned at atol=1e-10 on 6 weighted
scenarios in benchmarks/data/clubsandwich_cr2_golden.json. -->
| ~~Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster)~~ — LIFTED via clubSandwich WLS-CR2 port. Routes through `_compute_cr2_bm_contrast_dof` with singleton-cluster reduction (each obs is its own cluster). See REGISTRY.md Phase 1a row for the algebra. | `linalg.py` | LIFTED | — |
| ~~Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`)~~ — LIFTED via clubSandwich WLS-CR2 port. `_compute_cr2_bm` now accepts `weights=` and threads through clubSandwich's specific WLS-CR2 algebra. | `linalg.py::_compute_cr2_bm` | LIFTED | — |
| `LinearRegression.fit()` pays the CR2 cost twice on the weighted `hc2_bm` path: once inside `solve_ols(..., return_vcov=True)` and again via `compute_robust_vcov(..., return_dof=True)` to populate `_bm_dof`. Correct but redundant. Fix: thread `return_dof` through `solve_ols` so the same CR2 computation produces both vcov + DOF, or cache the per-cluster `A_g` / `MUWTWUM` precomputes between calls. CI codex P3 on PR #475. | `linalg.py::LinearRegression.fit`, `linalg.py::solve_ols` | PR #475 follow-up | Low |
| `TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` with replicate-weight survey designs raises `NotImplementedError` (`twfe.py:~233`). The replicate path re-demeans per replicate (re-demeaning depends on the per-replicate weight vector), which doesn't compose with the full-dummy HC2/HC2-BM build — a correct implementation would need per-replicate full-dummy refit. Workaround: use `vcov_type="hc1"` for replicate-weight CR1. | `twfe.py::fit` | follow-up | Low |
| TWFE's HC2/HC2-BM inline full-dummy build (`twfe.py:280-315`) duplicates the dummy-construction logic in `DifferenceInDifferences(fixed_effects=...)` (`estimators.py:478-486`). Extract a shared helper (or delegate TWFE's HC2/HC2-BM path to DiD's `fixed_effects=` branch, with TWFE-specific cluster default threading) to reduce drift risk on FE naming, survey behavior, and result-surface conventions. Substantive refactor — touches both estimators. | `twfe.py::fit`, `estimators.py::DifferenceInDifferences.fit` | follow-up | Low |
| Unify Rust local-method `estimate_model` solver path to `solve_wls_svd` (the same SVD helper used by the global-method since PR #348) for sub-1e-14 bootstrap SE parity. Current local-method bootstrap parity test (`tests/test_rust_backend.py::TestTROPRustEdgeCaseParity::test_bootstrap_seed_reproducibility_local`) passes at `atol=1e-5` — the residual ~1e-7 gap is roundoff between Rust's `estimate_model` matrix factorization and numpy's `lstsq`, which accumulates differently across per-replicate bootstrap fits. Main-fit ATT parity is regime-dependent (`atol=1e-14` for `lambda_nn=inf`, `atol=1e-10` for finite `lambda_nn` — see `test_local_method_main_fit_parity`); the bootstrap gap is a same-solver-path roundoff concern and not a user-visible correctness bug. | `rust/src/trop.rs::estimate_model`, `rust/src/linalg.rs::solve_wls_svd` | follow-up | Low |
Expand Down Expand Up @@ -201,7 +207,7 @@ Ordered paydown view across the tables above. Tier A → D is by effort × risk,
#### Tier C — Heavy / derivation required

- HonestDiD Δ^RM ARP conditional/hybrid confidence sets (`honest_did.py`)
- Weighted one-way Bell-McCaffrey + weighted CR2 Bell-McCaffrey (linalg derivations + R parity harness) (`linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_compute_cr2_bm`)
- ~~Weighted one-way Bell-McCaffrey + weighted CR2 Bell-McCaffrey~~ — LIFTED via the clubSandwich WLS-CR2 port (PR #475); see rows 109-110 above for the closed rows.
- Multi-absorb weighted demeaning: alternating-projection iteration for N>1 absorb + weights (`estimators.py`)
- ImputationDiD dense `(A0'A0).toarray()` OOM: alternative dense fallback or richer sparse strategy (`imputation.py:1531`)
- HAD mass-point `vcov_type ∈ {hc2, hc2_bm}`: 2SLS-specific leverage derivation (`had.py::_fit_mass_point_2sls`)
Expand Down
233 changes: 232 additions & 1 deletion benchmarks/R/generate_clubsandwich_golden.R
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,7 @@ suppressPackageStartupMessages({
library(jsonlite)
})

stopifnot(packageVersion("clubSandwich") >= "0.7.0")
set.seed(20260420)

# --- Three deterministic datasets ---------------------------------------------
Expand Down Expand Up @@ -468,12 +469,242 @@ output$sun_abraham_two_cohort <- list(
dof_bm_contrast_overall_unit = unname(sa_dof_bm_overall_unit)
)

# =============================================================================
# Weighted scenarios (clubSandwich WLS-CR2 port)
# =============================================================================
# Pin diff-diff's weighted CR2-BM port against clubSandwich's specific WLS-CR2
# algebra (R/CR-adjustments.R::CR2 + R/get_arrays.R::get_GH + coef_test.R).
# Verified algorithm uses W (not sqrt(W)) in the hat matrix, W^2 in the bias
# correction term, and unweighted residuals in the score construction.

# ---- Scenario: weighted_one_way_no_cluster ----------------------------------
# 12 observations, single "cluster" via vcovCR(cluster=1:n, type="CR2") for the
# one-way HC2-BM reduction. Heteroskedastic weights.
set.seed(50100)
d_w_oneway <- data.frame(
x = c(-1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, -0.8, 0.3, 1.2, -1.2),
z = c(0.2, -0.3, 0.5, -0.1, 0.4, -0.5, 0.6, -0.2, 0.1, -0.4, 0.3, 0.5)
)
d_w_oneway$y <- 1 + 0.5 * d_w_oneway$x + 0.3 * d_w_oneway$z +
rnorm(nrow(d_w_oneway), sd = 0.5)
w_oneway <- c(0.5, 0.5, 1, 1, 1.5, 1.5, 2, 2, 2.5, 2.5, 3, 3)
fit_w_oneway <- lm(y ~ x + z, data = d_w_oneway, weights = w_oneway)
vcov_w_oneway <- vcovCR(fit_w_oneway,
cluster = seq_len(nrow(d_w_oneway)), type = "CR2")
ct_w_oneway <- coef_test(fit_w_oneway, vcov = vcov_w_oneway)
output$weighted_one_way_no_cluster <- list(
x = d_w_oneway$x,
z = d_w_oneway$z,
y = d_w_oneway$y,
weights = w_oneway,
coef = as.numeric(coef(fit_w_oneway)),
coef_names = names(coef(fit_w_oneway)),
vcov_hc2_bm = as.numeric(vcov_w_oneway),
vcov_hc2_bm_shape = dim(vcov_w_oneway),
dof_bm_one_way = as.numeric(ct_w_oneway$df_Satt),
se_hc2_bm = as.numeric(ct_w_oneway$SE)
)

# ---- Scenario: weighted_balanced_clusters -----------------------------------
# 20 observations, 4 clusters of 5; weights vary within and across clusters.
set.seed(50200)
n_w_bal <- 20
cluster_w_bal <- rep(1:4, each = 5)
d_w_bal <- data.frame(
cluster = cluster_w_bal,
x = rnorm(n_w_bal),
z = rnorm(n_w_bal)
)
d_w_bal$y <- 1 + 0.5 * d_w_bal$x + 0.3 * d_w_bal$z + rnorm(n_w_bal, sd = 0.5)
w_bal <- runif(n_w_bal, min = 0.5, max = 3.0)
fit_w_bal <- lm(y ~ x + z, data = d_w_bal, weights = w_bal)
vcov_w_bal <- vcovCR(fit_w_bal, cluster = d_w_bal$cluster, type = "CR2")
ct_w_bal <- coef_test(fit_w_bal, vcov = vcov_w_bal)
output$weighted_balanced_clusters <- list(
x = d_w_bal$x,
z = d_w_bal$z,
y = d_w_bal$y,
weights = w_bal,
cluster = d_w_bal$cluster,
coef = as.numeric(coef(fit_w_bal)),
coef_names = names(coef(fit_w_bal)),
vcov_cr2 = as.numeric(vcov_w_bal),
vcov_cr2_shape = dim(vcov_w_bal),
dof_bm = as.numeric(ct_w_bal$df_Satt),
se_cr2 = as.numeric(ct_w_bal$SE)
)

# ---- Scenario: weighted_unbalanced_clusters ---------------------------------
# 52 observations, 8 clusters of sizes 3-10. Heteroskedastic weights.
set.seed(50300)
cluster_sizes_unbal <- c(3, 4, 5, 6, 7, 8, 9, 10)
cluster_w_unbal <- rep(1:8, times = cluster_sizes_unbal)
n_w_unbal <- length(cluster_w_unbal)
d_w_unbal <- data.frame(
cluster = cluster_w_unbal,
x = rnorm(n_w_unbal),
z = rnorm(n_w_unbal)
)
shock_unbal <- rnorm(8, sd = 0.3)
d_w_unbal$y <- 1 + 0.5 * d_w_unbal$x + 0.3 * d_w_unbal$z +
shock_unbal[d_w_unbal$cluster] + rnorm(n_w_unbal, sd = 0.4)
w_unbal <- runif(n_w_unbal, min = 0.3, max = 3.0)
fit_w_unbal <- lm(y ~ x + z, data = d_w_unbal, weights = w_unbal)
vcov_w_unbal <- vcovCR(fit_w_unbal, cluster = d_w_unbal$cluster, type = "CR2")
ct_w_unbal <- coef_test(fit_w_unbal, vcov = vcov_w_unbal)
output$weighted_unbalanced_clusters <- list(
x = d_w_unbal$x,
z = d_w_unbal$z,
y = d_w_unbal$y,
weights = w_unbal,
cluster = d_w_unbal$cluster,
coef = as.numeric(coef(fit_w_unbal)),
coef_names = names(coef(fit_w_unbal)),
vcov_cr2 = as.numeric(vcov_w_unbal),
vcov_cr2_shape = dim(vcov_w_unbal),
dof_bm = as.numeric(ct_w_unbal$df_Satt),
se_cr2 = as.numeric(ct_w_unbal$SE),
cluster_sizes = as.numeric(table(d_w_unbal$cluster))
)

# ---- Scenario: weighted_singletons_present ----------------------------------
# Adversarial: prior PT2018 transform-once derivation hit ~30% gap on
# singleton-cluster scenarios. Verifies the clubSandwich port handles this.
set.seed(50400)
cluster_sizes_sing <- c(1, 1, 2, 3, 4, 5, 6, 6, 4, 3)
cluster_w_sing <- rep(1:10, times = cluster_sizes_sing)
n_w_sing <- length(cluster_w_sing)
d_w_sing <- data.frame(
cluster = cluster_w_sing,
x = rnorm(n_w_sing),
z = rnorm(n_w_sing)
)
shock_sing <- rnorm(10, sd = 0.3)
d_w_sing$y <- 1 + 0.5 * d_w_sing$x + 0.3 * d_w_sing$z +
shock_sing[d_w_sing$cluster] + rnorm(n_w_sing, sd = 0.4)
w_sing <- runif(n_w_sing, min = 0.3, max = 3.0)
fit_w_sing <- lm(y ~ x + z, data = d_w_sing, weights = w_sing)
vcov_w_sing <- vcovCR(fit_w_sing, cluster = d_w_sing$cluster, type = "CR2")
ct_w_sing <- coef_test(fit_w_sing, vcov = vcov_w_sing)
output$weighted_singletons_present <- list(
x = d_w_sing$x,
z = d_w_sing$z,
y = d_w_sing$y,
weights = w_sing,
cluster = d_w_sing$cluster,
coef = as.numeric(coef(fit_w_sing)),
coef_names = names(coef(fit_w_sing)),
vcov_cr2 = as.numeric(vcov_w_sing),
vcov_cr2_shape = dim(vcov_w_sing),
dof_bm = as.numeric(ct_w_sing$df_Satt),
dof_per_coef = as.numeric(ct_w_sing$df_Satt),
se_cr2 = as.numeric(ct_w_sing$SE),
cluster_sizes = as.numeric(table(d_w_sing$cluster))
)

# ---- Scenario: weighted_did_absorbed_fe -------------------------------------
# DiD-style analytical CR2 design-matrix parity fixture: 8 units x 4 periods,
# treat_post + unit + period FE, analytics weights varying by unit. Pins the
# clubSandwich WLS-CR2 analytical surface (compute_robust_vcov / solve_ols /
# LinearRegression direct callers) on a DiD-shaped design. NOTE: this is NOT
# a public-estimator `DifferenceInDifferences(survey_design=...)` parity
# fixture — estimator-level survey designs route through the TSL survey
# variance path, which takes precedence over the analytical CR2 sandwich.
set.seed(50500)
d_did_w <- make_did_panel(n_units = 8, n_periods = 4, treatment_period = 2,
seed = 50501)
# Unit-level weight (stratum-like): vary by unit, constant within unit-period.
unit_w_did <- runif(8, min = 0.5, max = 2.5)
d_did_w$weights <- unit_w_did[d_did_w$unit_int]
fit_did_w <- lm(y ~ treat_post + unit + period, data = d_did_w,
weights = d_did_w$weights)
vcov_did_w_cr2 <- vcovCR(fit_did_w, cluster = d_did_w$unit_int, type = "CR2")
ct_did_w_cr2 <- coef_test(fit_did_w, vcov = vcov_did_w_cr2)
output$weighted_did_absorbed_fe <- list(
unit = d_did_w$unit_int,
period = d_did_w$period_int,
treated = d_did_w$treated,
post = d_did_w$post,
treat_post = d_did_w$treat_post,
y = d_did_w$y,
weights = d_did_w$weights,
coef = as.numeric(coef(fit_did_w)),
coef_names = names(coef(fit_did_w)),
vcov_cr2 = as.numeric(vcov_did_w_cr2),
vcov_cr2_shape = dim(vcov_did_w_cr2),
dof_cr2 = as.numeric(ct_did_w_cr2$df_Satt)
)

# ---- Scenario: weighted_mpd_avg_att_dof -------------------------------------
# MPD-style analytical CR2 contrast-DOF parity fixture: 15 units x 4 periods,
# MPD parameterization with analytics weights + cluster=unit. Compound
# contrast = post-period-average ATT. Pins the clubSandwich `Wald_test(test=
# "HTZ")$df_denom` against `_compute_cr2_bm_contrast_dof(weights=)` on an
# MPD-shaped design. NOTE: like `weighted_did_absorbed_fe`, this fixture
# targets the analytical surface only — estimator-level
# `MultiPeriodDiD(survey_design=...)` paths use TSL.
set.seed(50600)
d_mpd_w <- make_mpd_panel(n_total = 15, units_per_cohort = 5, n_periods = 4,
seed = 50601)
d_mpd_w$period_f <- relevel(factor(d_mpd_w$period), ref = "1")
for (p in 2:4) {
d_mpd_w[[paste0("treated_period_", p)]] <-
d_mpd_w$treated * (d_mpd_w$period == p)
}
unit_w_mpd <- runif(15, min = 0.5, max = 2.5)
d_mpd_w$weights <- unit_w_mpd[d_mpd_w$unit]
fit_mpd_w <- lm(y ~ treated + period_f +
treated_period_2 + treated_period_3 + treated_period_4 +
factor(unit),
data = d_mpd_w, weights = d_mpd_w$weights)
vcov_mpd_w_cr2 <- vcovCR(fit_mpd_w, cluster = d_mpd_w$unit, type = "CR2")
ct_mpd_w_cr2 <- coef_test(fit_mpd_w, vcov = vcov_mpd_w_cr2)
# Compound contrast: post-period-average over treated_period_{2,3,4}.
all_coef_names_w <- names(coef(fit_mpd_w))
n_coef_w <- length(all_coef_names_w)
c_avg_vec_w <- setNames(rep(0, n_coef_w), all_coef_names_w)
post_names_w <- c("treated_period_2", "treated_period_3", "treated_period_4")
c_avg_vec_w[post_names_w] <- 1 / length(post_names_w)
finite_mask_w <- !is.na(coef(fit_mpd_w))
c_avg_kept_w <- c_avg_vec_w[finite_mask_w]
dof_avg_w <- Wald_test(
fit_mpd_w,
constraints = matrix(c_avg_kept_w, 1),
vcov = vcov_mpd_w_cr2,
test = "HTZ"
)$df_denom
output$weighted_mpd_avg_att_dof <- list(
unit = d_mpd_w$unit,
period = d_mpd_w$period,
treated = d_mpd_w$treated,
y = d_mpd_w$y,
weights = d_mpd_w$weights,
cluster = d_mpd_w$unit,
coef = as.numeric(coef(fit_mpd_w)),
coef_names = all_coef_names_w,
finite_coef_names = all_coef_names_w[finite_mask_w],
vcov_cr2 = as.numeric(vcov_mpd_w_cr2),
vcov_cr2_shape = dim(vcov_mpd_w_cr2),
dof_per_coef = as.numeric(ct_mpd_w_cr2$df_Satt),
c_avg = as.numeric(c_avg_kept_w),
dof_avg = unname(dof_avg_w),
post_interaction_names = post_names_w,
reference_period = 1L,
n_post_periods = length(post_names_w)
)

output$meta <- list(
source = "clubSandwich",
clubSandwich_version = as.character(packageVersion("clubSandwich")),
R_version = R.version.string,
generated_at = format(Sys.time(), tz = "UTC", usetz = TRUE),
note = "CR2 Bell-McCaffrey cluster-robust parity target for diff_diff._compute_cr2_bm"
note = paste0(
"CR2 Bell-McCaffrey cluster-robust parity target for ",
"diff_diff._compute_cr2_bm. Unweighted scenarios pin against ",
"_compute_cr2_bm / _compute_bm_dof_oneway; weighted scenarios pin ",
"the clubSandwich WLS-CR2 port (W not sqrt(W), W^2 bias term, ",
"unweighted residuals)."
)
)

out_path <- file.path("benchmarks", "data", "clubsandwich_cr2_golden.json")
Expand Down
Loading
Loading