Merged
2 changes: 1 addition & 1 deletion ROADMAP.md
@@ -100,7 +100,7 @@ Research-informed candidates. Each has a rationale, a tractability note, and a c

### Methodology extensions

- **DiD with no untreated group** (de Chaisemartin, Ciccia, D'Haultfœuille & Knau, arXiv:2405.04465, 2024, plus continuous-treatment-with-no-stayers companion, AEA P&P 2024). New estimator for designs where treatment is universal with heterogeneous dose (the inverse of the few-treated-many-donors case). Uses quasi-untreated units as controls. No existing diff-diff estimator handles this. Tractability: medium; closed-form identification. **Commit when**: methodology plan drafted and validated against the paper's Pierce (2016) solar-panel replication.
- **DiD with no untreated group** (de Chaisemartin, Ciccia, D'Haultfœuille & Knau, arXiv:2405.04465, 2024, plus continuous-treatment-with-no-stayers companion, AEA P&P 2024). New estimator for designs where treatment is universal with heterogeneous dose (the inverse of the few-treated-many-donors case). Uses quasi-untreated units as controls. No existing diff-diff estimator handles this. Tractability: medium; closed-form identification. **Status (2026-04-18):** methodology plan approved; paper review at `docs/methodology/papers/dechaisemartin-2026-review.md`, REGISTRY stub at `docs/methodology/REGISTRY.md#heterogeneousadoptiondid`, class name `HeterogeneousAdoptionDiD`, implementation queued across 7 phased PRs. **Commit when**: methodology plan drafted and validated against the paper's Pierce and Schott (2016) PNTR manufacturing-employment replication (Figure 2).
- **Nonparametric / flexible outcome regression for `EfficientDiD` DR covariate path** (Chen, Sant'Anna & Xie, arXiv:2506.17729, 2025, Section 4). The shipped staggered `EfficientDiD` uses a linear OLS outcome regression in its doubly-robust covariate path; that preserves DR consistency but does not generically attain the semiparametric efficiency bound unless the conditional mean is linear in the covariates. Replacing the OLS outcome regression with sieve / kernel / ML nuisance estimation (as the paper's Section 4 allows) would close the efficiency gap on the covariate path. Tractability: medium; the hook points are in `diff_diff/efficient_did_covariates.py`. **Commit when**: a paper-review synthesis is written, with an implementation plan for the nonparametric OR that preserves the existing DR consistency guarantees and survey-weighted variance surface.
- **Distributional DiD for staggered timing** (Ciaccio, arXiv:2408.01208, 2024). New estimator extending Callaway-Li QTT to staggered adoption. `CallawaySantAnna` currently gives mean ATT only; this unlocks quantile effects. Tractability: medium. **Commit when**: a health-econ or public-health user reports need for quantile effects in a repeated-cross-section design.
- **Local Projections DiD** (Dube, Girardi, Jordà & Taylor, JAE 2025). New estimator with flexible impulse-response and robustness to dynamic misspecification; natural for anticipation-prone settings. Tractability: well-scoped. **Commit when**: a methodology review confirms the dynamic variant's variance derivation fits our SE helpers.
8 changes: 8 additions & 0 deletions TODO.md
@@ -77,6 +77,11 @@ Deferred items from PR reviews that were not addressed before merge.
| WooldridgeDiD: aggregation weights use cell-level n_{g,t} counts. Paper (W2025 Eqs. 7.2-7.4) defines cohort-share weights. Add optional `weights="cohort_share"` parameter to `aggregate()`. | `wooldridge_results.py` | #216 | Medium |
| WooldridgeDiD: canonical link requirement (W2023 Prop 3.1) not enforced — no warning if user applies wrong method to outcome type. Estimator is consistent regardless, but equivalence with imputation breaks. | `wooldridge.py` | #216 | Low |
| WooldridgeDiD: Stata `jwdid` golden value tests — add R/Stata reference script and `TestReferenceValues` class. | `tests/test_wooldridge.py` | #216 | Medium |
| Thread `vcov_type` (classical / hc1 / hc2 / hc2_bm) through the 8 standalone estimators that expose `cluster=`: `CallawaySantAnna`, `SunAbraham`, `ImputationDiD`, `TwoStageDiD`, `TripleDifference`, `StackedDiD`, `WooldridgeDiD`, `EfficientDiD`. Phase 1a added `vcov_type` to the `DifferenceInDifferences` inheritance chain only. | multiple | Phase 1a | Medium |
| Weighted one-way Bell-McCaffrey (`vcov_type="hc2_bm"` + `weights`, no cluster) currently raises `NotImplementedError`. `_compute_bm_dof_from_contrasts` builds its hat matrix from the unscaled design via `X (X'WX)^{-1} X' W`, but `solve_ols` solves the WLS problem by transforming to `X* = sqrt(w) X`, so the correct symmetric idempotent residual-maker is `M* = I - sqrt(W) X (X'WX)^{-1} X' sqrt(W)`. Rederive the Satterthwaite `(tr G)^2 / tr(G^2)` ratio on the transformed design and add weighted parity tests before lifting the guard. | `linalg.py::_compute_bm_dof_from_contrasts`, `linalg.py::_validate_vcov_args` | Phase 1a | Medium |
| HC2 / HC2 + Bell-McCaffrey on absorbed-FE fits currently raises `NotImplementedError` in three places: `TwoWayFixedEffects` unconditionally; `DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})`; `MultiPeriodDiD(absorb=..., vcov_type in {"hc2","hc2_bm"})`. Within-transformation preserves coefficients and residuals under FWL but not the hat matrix, so the reduced-design `h_ii` is not the diagonal of the full FE projection and CR2's block adjustment `A_g = (I - H_gg)^{-1/2}` is likewise wrong on absorbed cluster blocks. Lifting the guard needs HC2/CR2-BM computed from the full absorbed projection (unit/time FE dummies reconstructed internally, or a FE-aware hat-matrix formulation) and a parity harness against a full-dummy OLS run or R `fixest`/`clubSandwich`. HC1/CR1 are unaffected by this because they have no leverage term. | `twfe.py::fit`, `estimators.py::DifferenceInDifferences.fit`, `estimators.py::MultiPeriodDiD.fit` | Phase 1a | Medium |
| Weighted CR2 Bell-McCaffrey cluster-robust (`vcov_type="hc2_bm"` + `cluster_ids` + `weights`) currently raises `NotImplementedError`. Weighted hat matrix and residual rebalancing need threading per clubSandwich WLS handling. | `linalg.py::_compute_cr2_bm` | Phase 1a | Medium |
| Regenerate `benchmarks/data/clubsandwich_cr2_golden.json` from R (`Rscript benchmarks/R/generate_clubsandwich_golden.R`). Current JSON has `source: python_self_reference` as a stability anchor until an authoritative R run. | `benchmarks/R/generate_clubsandwich_golden.R` | Phase 1a | Medium |
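
The weighted one-way Bell-McCaffrey row above contrasts two projection forms. The following is a minimal NumPy sketch of that contrast, not the code in `linalg.py`: the helper name `weighted_hat_matrices` and the toy design are assumptions made for illustration. It builds both `X (X'WX)^{-1} X' W` and `sqrt(W) X (X'WX)^{-1} X' sqrt(W)` so their properties can be compared directly.

```python
import numpy as np


def weighted_hat_matrices(X, w):
    """Return (H_unscaled, H_star) for a WLS fit with observation weights w.

    Hypothetical helper for illustration only; it is not part of diff_diff.
    """
    XtWX_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    # Unscaled form quoted in the TODO row: X (X'WX)^{-1} X' W
    H_unscaled = X @ XtWX_inv @ (X.T * w)
    # Transformed-design form: sqrt(W) X (X'WX)^{-1} X' sqrt(W)
    Xs = np.sqrt(w)[:, None] * X
    H_star = Xs @ XtWX_inv @ Xs.T
    return H_unscaled, H_star


# Toy design: intercept plus one regressor, non-constant weights.
rng = np.random.default_rng(0)
n = 12
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
w = rng.uniform(0.5, 2.0, size=n)

H_unscaled, H_star = weighted_hat_matrices(X, w)
M_star = np.eye(n) - H_star  # residual-maker on the transformed design

print("H_star symmetric:    ", np.allclose(H_star, H_star.T))
print("H_unscaled symmetric:", np.allclose(H_unscaled, H_unscaled.T))
print("M_star idempotent:   ", np.allclose(M_star @ M_star, M_star))
print("same leverages:      ", np.allclose(np.diag(H_star), np.diag(H_unscaled)))
```

Both forms share the same diagonal, so the leverages `h_ii` agree. They differ in the off-diagonal entries, which is where a Satterthwaite ratio built from products of the residual-maker would pick up the discrepancy.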

#### Performance

@@ -85,6 +90,9 @@ Deferred items from PR reviews that were not addressed before merge.
| ImputationDiD event-study SEs recompute full conservative variance per horizon (should cache A0/A1 factorization) | `imputation.py` | #141 | Low |
| Rust faer SVD ndarray-to-faer conversion overhead (minimal vs SVD cost) | `rust/src/linalg.rs:67` | #115 | Low |
| Unrelated label events (e.g., adding `bug` label) re-trigger CI workflows when `ready-for-ci` is already present; filter `labeled`/`unlabeled` events to only `ready-for-ci` transitions | `.github/workflows/rust-test.yml`, `notebooks.yml` | #269 | Low |
| `bread_inv` as a performance kwarg on `compute_robust_vcov` to avoid re-inverting `(X'WX)` when the caller already has it. Deferred from Phase 1a for scope. HC2 and HC2+BM both need the bread inverse, so a shared hint would save one `np.linalg.solve` per sandwich. | `linalg.py::compute_robust_vcov` | Phase 1a | Low |
| Rust-backend HC2 implementation. Current Rust path only supports HC1; HC2 and CR2 Bell-McCaffrey fall through to the NumPy backend. For large-n fits this is noticeable. | `rust/src/linalg.rs` | Phase 1a | Low |
| CR2 Bell-McCaffrey DOF uses a naive `O(n² k)` per-coefficient loop over cluster pairs. Pustejovsky-Tipton (2018) Appendix B has a scores-based formulation that avoids the full `n × n` `M` matrix. Switch when a user hits a large-`n` cluster-robust design. | `linalg.py::_compute_cr2_bm` | Phase 1a | Low |
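
The `bread_inv` row above describes letting a caller pass in an already-computed bread inverse. A rough sketch of that idea for an unweighted HC2 sandwich follows. The function `hc2_vcov` and its signature are hypothetical and are not the actual `compute_robust_vcov` API; the sketch also uses `np.linalg.inv` for brevity rather than the `np.linalg.solve` call the row mentions.

```python
import numpy as np


def hc2_vcov(X, resid, bread_inv=None):
    """HC2 sandwich for unweighted OLS (hypothetical helper, illustration only).

    If the caller already holds (X'X)^{-1}, passing it as bread_inv skips
    the inversion here.
    """
    if bread_inv is None:
        bread_inv = np.linalg.inv(X.T @ X)
    # Leverage h_ii = x_i' (X'X)^{-1} x_i, computed row by row.
    h = np.einsum("ij,jk,ik->i", X, bread_inv, X)
    # HC2 rescales each squared residual by 1 / (1 - h_ii).
    scaled = resid**2 / (1.0 - h)
    meat = X.T @ (scaled[:, None] * X)
    return bread_inv @ meat @ bread_inv


rng = np.random.default_rng(1)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 1.0 + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=n)

# The caller computes the bread inverse once for the coefficient solve...
bread_inv = np.linalg.inv(X.T @ X)
beta = bread_inv @ X.T @ y
resid = y - X @ beta

# ...and reuses it for the variance estimate.
V_default = hc2_vcov(X, resid)
V_hint = hc2_vcov(X, resid, bread_inv=bread_inv)
print(np.allclose(V_default, V_hint))
```

Keeping the hint optional means existing callers are unaffected, and the result is the same either way.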

#### Testing/Docs

82 changes: 82 additions & 0 deletions benchmarks/R/generate_clubsandwich_golden.R
@@ -0,0 +1,82 @@
# Generate CR2 Bell-McCaffrey golden values via R clubSandwich.
#
# This script is the parity source for CR2 Bell-McCaffrey cluster-robust
# inference implemented in diff_diff/linalg.py::_compute_cr2_bm.
#
# Usage:
# Rscript benchmarks/R/generate_clubsandwich_golden.R
#
# Requirements:
# clubSandwich (CRAN), jsonlite
#
# Output:
# benchmarks/data/clubsandwich_cr2_golden.json
#
# Phase 1a of the HeterogeneousAdoptionDiD implementation (de Chaisemartin,
# Ciccia, D'Haultfoeuille & Knau 2026, arXiv:2405.04465v6). The parity
# dataset below consists of three small deterministic designs; the Python
# test at tests/test_linalg_hc2_bm.py::TestCR2BMParityClubSandwich loads
# this JSON and checks agreement to 6 digits.

suppressPackageStartupMessages({
  library(clubSandwich)
  library(jsonlite)
})

set.seed(20260420)

# --- Three deterministic datasets ---------------------------------------------

make_dataset <- function(name, n_clusters, cluster_sizes, seed) {
  set.seed(seed)
  cluster_ids <- rep(seq_len(n_clusters), times = cluster_sizes)
  n <- length(cluster_ids)
  x <- runif(n, 0, 1)
  # Cluster-level shock to induce within-cluster correlation, plus idiosyncratic noise.
  shock <- rnorm(n_clusters, sd = 0.5)
  y <- 1 + 0.5 * x + shock[cluster_ids] + rnorm(n, sd = 0.2)
  data.frame(name = name, cluster = cluster_ids, x = x, y = y)
}

datasets <- list(
  balanced_small = make_dataset("balanced_small", 5, rep(6, 5), 101),
  unbalanced_medium = make_dataset("unbalanced_medium", 8, c(3, 4, 5, 6, 7, 8, 9, 10), 202),
  singletons_present = make_dataset("singletons_present", 10, c(1, 1, 2, 3, 4, 5, 6, 7, 8, 9), 303)
)

output <- list()

for (nm in names(datasets)) {
  d <- datasets[[nm]]
  fit <- lm(y ~ x, data = d)
  vcov_cr2 <- vcovCR(fit, cluster = d$cluster, type = "CR2")
  # Per-coefficient Bell-McCaffrey DOF: Satterthwaite degrees of freedom from
  # coef_test(), one value per coefficient in the order of coef(fit).
  coef_names <- names(coef(fit))
  ct <- coef_test(fit, vcov = vcov_cr2, test = "Satterthwaite")
  # The DOF column is named df_Satt in recent clubSandwich releases and df in older ones.
  dof_vec <- if ("df_Satt" %in% names(ct)) ct$df_Satt else ct$df
  output[[nm]] <- list(
    x = d$x,
    y = d$y,
    cluster = d$cluster,
    coef = as.numeric(coef(fit)),
    coef_names = coef_names,
    # as.numeric() flattens the matrix in column-major order; vcov_shape restores it.
    vcov_cr2 = as.numeric(vcov_cr2),
    vcov_shape = dim(vcov_cr2),
    dof_bm = as.numeric(dof_vec),
    cluster_sizes = as.numeric(table(d$cluster))
  )
}

output$meta <- list(
  source = "clubSandwich",
  clubSandwich_version = as.character(packageVersion("clubSandwich")),
  R_version = R.version.string,
  generated_at = format(Sys.time(), tz = "UTC", usetz = TRUE),
  note = "CR2 Bell-McCaffrey cluster-robust parity target for diff_diff._compute_cr2_bm"
)

out_path <- file.path("benchmarks", "data", "clubsandwich_cr2_golden.json")
writeLines(toJSON(output, pretty = TRUE, digits = 15, auto_unbox = TRUE), out_path)
cat("Wrote", out_path, "\n")
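
For orientation on the Python side of the parity check, here is a minimal NumPy sketch of an unweighted CR2 sandwich using the block adjustment `A_g = (I - H_gg)^{-1/2}`. It is not the code in `diff_diff/linalg.py::_compute_cr2_bm`; the helper names `sym_inv_sqrt` and `cr2_vcov` are hypothetical, and the sketch assumes OLS with an identity working matrix. The synthetic data are shaped like `balanced_small`, but NumPy's generator does not reproduce R's draws, so the numbers are not the golden values.

```python
import numpy as np


def sym_inv_sqrt(S, tol=1e-12):
    """Symmetric inverse square root via eigendecomposition.

    Eigenvalues at or below tol are treated as zero (pseudo-inverse).
    """
    vals, vecs = np.linalg.eigh(S)
    safe = np.clip(vals, tol, None)
    inv_sqrt = np.where(vals > tol, 1.0 / np.sqrt(safe), 0.0)
    return (vecs * inv_sqrt) @ vecs.T


def cr2_vcov(X, resid, cluster):
    """Unweighted CR2 cluster-robust vcov (hypothetical helper, illustration only)."""
    bread_inv = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(cluster):
        idx = cluster == g
        Xg = X[idx]
        # Diagonal block of the hat matrix for cluster g.
        H_gg = Xg @ bread_inv @ Xg.T
        A_g = sym_inv_sqrt(np.eye(Xg.shape[0]) - H_gg)
        score = Xg.T @ A_g @ resid[idx]
        meat += np.outer(score, score)
    return bread_inv @ meat @ bread_inv


# Synthetic design: 5 clusters of 6 observations, intercept plus one regressor.
rng = np.random.default_rng(101)
cluster = np.repeat(np.arange(5), 6)
x = rng.uniform(size=cluster.size)
shock = rng.normal(scale=0.5, size=5)
y = 1 + 0.5 * x + shock[cluster] + rng.normal(scale=0.2, size=cluster.size)
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
vcov = cr2_vcov(X, y - X @ beta, cluster)

# Round-trip through the JSON layout: a column-major flat vector plus its shape.
flat, shape = vcov.flatten(order="F").tolist(), list(vcov.shape)
restored = np.asarray(flat).reshape(shape, order="F")
print(np.allclose(restored, vcov))
```

Because R stores matrices column-major, a reader of the JSON would reshape `vcov_cr2` with `order="F"` before comparing. For a symmetric matrix the order makes no numerical difference, but being explicit avoids surprises if non-symmetric quantities are added to the file later.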