Skip to content

DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally#458

Merged
igerber merged 6 commits into
mainfrom
wave-5-hc2-cr2-bm-extensions
May 17, 2026
Merged

DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally#458
igerber merged 6 commits into
mainfrom
wave-5-hc2-cr2-bm-extensions

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 16, 2026

Summary

  • Lift DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) NotImplementedError at diff_diff/estimators.py:382 (was line 373). Auto-route promotes absorb columns to fixed_effects= internally for HC2/HC2-BM fits so the existing full-dummy-design code path computes the algebraically correct vcov.
  • Empirical methodology gate: read clubSandwich source (R/CR-adjustments.R) before writing code. Confirmed the unweighted CR2 algebra (A_g = (I - H_gg)^{-1/2} with H on the full model matrix) is what diff-diff's existing _compute_cr2_bm already produces. Singleton-cluster CR2 trick (cluster=1:n) reduces to one-way HC2-BM Satterthwaite DOF.
  • R-parity at ~1e-10 vs lm() + sandwich::vcovHC(type="HC2") and lm() + clubSandwich::vcovCR(cluster=..., type="CR2") via new absorbed_fe_did scenario in benchmarks/data/clubsandwich_cr2_golden.json and new tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity test class.
  • TODO row 100 partial drain: DiD-absorb sub-gate addressed; TWFE and MPD-absorb sub-gates remain as documented follow-ups (different fit-path structure).

Methodology references (required if estimator / math changes)

  • Method name(s): HC2 leverage (Eicker-Huber-White), CR2 Bell-McCaffrey Satterthwaite DOF (Bell & McCaffrey 2002; Imbens & Kolesar 2016; Pustejovsky & Tipton 2018).
  • Paper / source link(s): Pustejovsky & Tipton (2018, J Business & Economic Statistics) §3.3 "absorbing form"; clubSandwich source at jepusto/clubSandwich/R/CR-adjustments.R.
  • Any intentional deviations from the source (and why): None. The auto-route produces algebra bit-equal to clubSandwich's vcovCR.lm(... type="CR2") for the unweighted case. Weighted variant (with the more elaborate Theta quadratic correction documented in CR-adjustments.R's inverse_var=FALSE branch) is deferred to a follow-up.

Validation

  • Tests added/updated:
    • New tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity class (2 tests) — R-parity at 1e-10 against absorbed_fe_did golden scenario.
    • tests/test_estimators_vcov_type.py::test_did_absorb_rejects_hc2_and_hc2_bm flipped from "raises NotImplementedError" to "auto-routes; matches fixed_effects= path bit-equal" (renamed test_did_absorb_hc2_and_hc2_bm_auto_route).
    • tests/test_linalg_hc2_bm.py::test_cr2_parity_with_golden skips scenarios that don't fit the simple y ~ x 2-column contract.
  • Backtest / simulation / notebook evidence (if applicable): N/A — no tutorial updates.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions
Copy link
Copy Markdown

Overall Assessment

Needs changes

Executive Summary

  • The core routing idea is methodologically plausible: HC2 depends on projection-matrix leverage, and CR2 depends on cluster-block hat-matrix adjustments; Pustejovsky-Tipton’s absorbed-form result supports the unweighted fixed-effects case, so recovering the full FE hat via a full-dummy path is consistent with the source material. citeturn1search15turn0search0turn1search14
  • DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) is now supported in code, but the official methodology registry still says that exact surface must raise NotImplementedError.
  • The new parity harness generates clustered CR2 and Bell-McCaffrey df targets, but the tests only pin the unclustered SE/ATT path and explicitly discard the df target.
  • No security issues stood out in the changed files.

Methodology

  • P1 Impact: the shipped methodology contract is now self-contradictory. docs/methodology/REGISTRY.md L2550-L2553 still documents DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) as rejected, while diff_diff/estimators.py L366-L386 now silently auto-routes that case through fixed_effects=. Under your review rubric, this is an undocumented methodology/default-behavior change, and the changelog note at CHANGELOG.md does not mitigate it. Concrete fix: replace the stale REGISTRY note with a new **Note:** describing the supported DiD auto-route, its unweighted scope, the remaining TWFE/MPD rejections, and the user-visible result-object behavior.

Code Quality

  • No material findings in the changed implementation beyond the contract issues below.

Performance

  • No blocker. The dummy-expansion cost is the expected tradeoff of this implementation choice.

Maintainability

  • P2 Impact: the auto-route changes more than the variance path. Once diff_diff/estimators.py L382-L386 flips absorb into fixed_effects, the stored coefficients, vcov, residuals, fitted_values, and r_squared all come from the full-dummy branch at diff_diff/estimators.py L635-L678. The changelog only documents result.coefficients, and existing absorb guidance still talks about within-transformed R² semantics (for example tests/test_methodology_did.py L1247-L1252). Concrete fix: either normalize the returned DiDResults back to the absorb-style surface, or document and regression-test the full set of changed fields in REGISTRY.

Tech Debt

  • No separate finding. Updating TODO.md to keep only the remaining TWFE/MPD gates is appropriate.

Security

  • No findings.

Documentation/Tests

  • P1 Impact: the new validation does not actually pin the newly opened clustered CR2 path or the Bell-McCaffrey df-based inference. The R generator writes both clustered CR2 targets and per-coefficient df at benchmarks/R/generate_clubsandwich_golden.R L97-L117, and the golden JSON carries vcov_cr2 / dof_cr2 at benchmarks/data/clubsandwich_cr2_golden.json L45-L48. But tests/test_estimators_vcov_type.py L1032-L1053 only checks unclustered hc2_bm SE/ATT, explicitly throws away expected_dof_slope, and never exercises DifferenceInDifferences(cluster="unit", vcov_type="hc2_bm").fit(..., absorb=[...]). That leaves a silent mis-inference lane for both clustered CR2 and BM p-value/CI propagation. Concrete fix: add estimator-level golden tests for the clustered absorb+CR2 path using vcov_cr2/dof_cr2, and assert one-way hc2_bm inference through p_value/conf_int (or equivalent df-sensitive quantities), not just se.

Path to Approval

  1. Update docs/methodology/REGISTRY.md so the DiD absorbed-FE HC2/HC2-BM surface is documented as supported via auto-route, with an explicit Note covering scope and remaining exclusions.
  2. Add an end-to-end golden test for DifferenceInDifferences(vcov_type="hc2_bm", cluster="unit").fit(..., absorb=["unit","period"]) against the stored CR2 targets.
  3. Add a df-sensitive regression test for one-way hc2_bm on the absorb path, and either document or normalize the broader DiDResults surface change (coefficients, vcov, residuals, fitted_values, r_squared).

igerber added a commit that referenced this pull request May 16, 2026
R1 review surfaced 2 P1 + 1 P2; all in-scope fixes.

**P1.1 — REGISTRY contradicted code.** REGISTRY.md:2552 still said
`DiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` raises NotImplementedError.
Replaced the blanket-rejection Note with a per-estimator status block:
DiD path is now SUPPORTED via auto-route (with the full DiDResults
surface change documented inline); TWFE and MultiPeriodDiD paths still
reject and are tracked as follow-ups.

**P1.2 — Parity tests missed clustered-CR2 and df-sensitive inference.**
The previous test class pinned only unclustered HC2-BM SE/ATT and
explicitly discarded the df target. Two new tests:
- `test_absorb_hc2_bm_clustered_matches_clubsandwich`: exercises
  `DiD(vcov_type="hc2_bm", cluster="unit").fit(..., absorb=[...])`
  against the R `vcovCR(..., cluster=d$unit, type="CR2")` target,
  asserting SE+ATT match at 1e-10.
- `test_absorb_hc2_bm_df_sensitive_inference`: asserts HC2 and HC2-BM
  give the SAME `se` but DIFFERENT `p_value` and `conf_int` (the BM
  Satterthwaite DOF must propagate to inference; CI width is wider
  under BM). This catches silent regressions where the auto-route
  passes SE through but uses n-k for inference.

**P2 — CHANGELOG only mentioned `result.coefficients`.** The auto-route
also affects `vcov`, `residuals`, `fitted_values`, `r_squared` (all
come from the full-dummy fit under the route; `r_squared` is computed
on the un-demeaned outcome and will typically be higher than the
within-R²). Extended the CHANGELOG entry with the full
`DiDResults`-surface contract change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: d88354606957eafe4bc4030ccdd2a0f2f23687f3


Overall Assessment

Looks good

Executive Summary

  • The prior P1 methodology-contract issue is resolved: the registry now documents the DiD absorbed-FE auto-route, its remaining scope limits, and the broader DiDResults surface change at docs/methodology/REGISTRY.md L2551-L2555; the changelog mirrors that at CHANGELOG.md.
  • The prior P1 validation gap is largely resolved: the PR adds absorbed-FE parity fixtures plus estimator-level checks for auto-route equivalence, clustered CR2, and df-sensitive HC2 vs HC2-BM inference at benchmarks/R/generate_clubsandwich_golden.R L72-L118, tests/test_estimators_vcov_type.py L668-L708, and tests/test_estimators_vcov_type.py L1032-L1136.
  • The routing choice itself is consistent with the cited methodology: Pustejovsky-Tipton extend CR2/BRL to fixed-effects models, including absorbed-FE application, and clubSandwich's CR2 path uses Satterthwaite degrees of freedom; the package also tests CR2 against HC behavior when all clusters are singletons. (jepusto.com)
  • [Newly identified] One non-blocking scope mismatch remains: the legacy multi-absorb survey-weight guard still fires before the new auto-route, so the blanket "SUPPORTED" wording is slightly broader than the current implementation.

Methodology

  • Severity: P2
    Impact: [Newly identified] DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) is documented as supported in docs/methodology/REGISTRY.md L2552-L2555, but DiD.fit() still raises for survey_design= with len(absorb) > 1 because the weighted multi-absorb guard at diff_diff/estimators.py L347-L355 runs before the auto-route at diff_diff/estimators.py L366-L386. The old survey regression test still locks in that rejection at tests/test_survey.py L2648-L2671. This is a scope/documentation mismatch, not a silent numbers bug.
    Concrete fix: Either move the auto-route ahead of the survey-weight guard when the fit is going to the full-dummy fixed_effects= path, or narrow the REGISTRY/CHANGELOG wording to exclude survey-weighted multi-absorb fits.

Code Quality

  • No findings.

Performance

  • No findings. The full-dummy cost is the explicit tradeoff of this implementation choice.

Maintainability

Tech Debt

  • No findings. Remaining absorbed-FE HC2/HC2-BM follow-up work is now correctly narrowed to TwoWayFixedEffects and MultiPeriodDiD in TODO.md.

Security

  • No findings.

Documentation/Tests

  • Severity: P3
    Impact: The new fixture stores one-way and clustered Satterthwaite targets at benchmarks/R/generate_clubsandwich_golden.R L97-L117, but the estimator tests only assert exact SE/ATT parity for the clustered CR2 path and only directional df sensitivity for the one-way BM path at tests/test_estimators_vcov_type.py L1032-L1136. Because BM df is computed centrally in diff_diff/linalg.py L2799-L2841 and consumed in diff_diff/linalg.py L3181-L3226, I do not see evidence of a current correctness bug; this is test hardening, not a blocker.
    Concrete fix: Optionally add exact p_value/CI or critical-value assertions against the stored dof_hc2_bm and dof_cr2 targets.
  • Reviewer note: I could not execute the suite in this environment because pytest and numpy are not installed.

igerber added a commit that referenced this pull request May 16, 2026
R2 review flagged that REGISTRY/CHANGELOG documented
`DiD(absorb=..., vcov_type in {hc2,hc2_bm})` as SUPPORTED, but the
legacy `len(absorb) > 1 + survey_weights` guard at estimators.py:347
fired BEFORE the auto-route, so weighted multi-absorb fits still raised.

The guard's rationale ("single-pass demeaning isn't the correct weighted
FWL projection for N>1 absorbed dimensions") doesn't apply when we're
auto-routing to fixed_effects= — the fixed_effects= path builds the
full-dummy design and solves WLS directly with no within-transform.

Reorder: move the auto-route block above the multi-absorb-survey guard.
The guard now only fires when absorb was NOT consumed by the auto-route
(i.e., hc1/classical/conley/etc. — paths that still demean).

Adds `test_absorb_hc2_bm_survey_multi_absorb_auto_routes` to pin the
new placement against silent regression. The existing `test_survey.py`
multi-absorb-survey rejection tests continue to pass (they use the
default vcov_type=hc1 path which still hits the guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 7d5ecd6d1889a1c565b116d30dc57ba3ff6d4f05


Overall Assessment

✅ Looks good

Executive Summary

  • The prior re-review scope issue is resolved: DifferenceInDifferences.fit() now auto-routes absorb to fixed_effects before the legacy multi-absorb survey guard, and the scope notes/tests were updated to match (diff_diff/estimators.py:L347-L382, tests/test_estimators_vcov_type.py:L1098-L1138, docs/methodology/REGISTRY.md:L2551-L2555, TODO.md:L100-L100).
  • The methodology change is consistent with the existing registry rationale: HC2/CR2 small-sample corrections are now sent through the full-dummy FE path for DiD, while MultiPeriodDiD and TwoWayFixedEffects remain explicitly out of scope (docs/methodology/REGISTRY.md:L2551-L2555).
  • Validation is materially better than in the last review: the PR adds a new clubSandwich golden scenario plus estimator-level checks for unclustered HC2-BM, clustered CR2, survey multi-absorb ordering, and df-sensitive HC2 vs HC2-BM inference (benchmarks/R/generate_clubsandwich_golden.R:L72-L118, tests/test_estimators_vcov_type.py:L981-L1178).
  • I did not find any new unmitigated P0 or P1 issues in the changed diff.
  • [Newly identified] One P3 documentation-precision issue remains around the survey path.

Methodology
No findings. The changed estimator path in diff_diff/estimators.py:L347-L382 matches the registry’s full-hat-matrix rationale for HC2 / HC2-BM on absorbed FE, and the updated scope split in docs/methodology/REGISTRY.md:L2551-L2555 is now internally consistent.

Code Quality
No findings.

Performance
No findings. The explicit full-dummy route is the intended tradeoff for obtaining the correct leverage/hat-matrix quantities.

Maintainability
No findings.

Tech Debt
No findings. The remaining absorbed-FE HC2/HC2-BM follow-up work is still correctly narrowed to TwoWayFixedEffects and MultiPeriodDiD in TODO.md:L100-L100.

Security
No findings.

Documentation/Tests

  • Severity: P3 [Newly identified]
    Impact: The new support wording in CHANGELOG.md:L11-L11 and docs/methodology/REGISTRY.md:L2552-L2555 reads as though absorb + hc2/hc2_bm now directly computes HC2/Bell-McCaffrey variance on every fit path. On survey fits, the library still bypasses analytical HC/CR variance and uses survey TSL or replicate variance instead (diff_diff/linalg.py:L2706-L2713, diff_diff/linalg.py:L2778-L2786, diff_diff/linalg.py:L2913-L2966; summary suppression rationale at diff_diff/results.py:L213-L221). This is not a correctness bug on the shipped path, but the wording is broader than the actual variance behavior.
    Concrete fix: Add one sentence to the CHANGELOG/REGISTRY note clarifying that survey_design= continues to use survey variance, and that the new auto-route only changes the FE handling / removes the old absorbed-FE rejection.

Reviewer note: I could not execute the automated tests here because pytest and numpy are not installed in this environment.

igerber added a commit that referenced this pull request May 16, 2026
R3 informational P3: REGISTRY/CHANGELOG wording could be read as
implying survey fits compute HC2/HC2-BM analytically. Survey fits
actually use Taylor-series linearization or replicate-weight variance
regardless of `vcov_type` — the auto-route only changes the FE
handling and removes the prior absorbed-FE reject.

Added one-sentence clarifications in both surfaces so the documentation
matches the variance dispatch in linalg.py / results.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 366ef8481b231e21321209cd02cc50ffc75519d6


Overall Assessment

✅ Looks good

Executive Summary

Methodology
No findings. The new route in diff_diff/estimators.py:L347-L376 matches the registry rationale in docs/methodology/REGISTRY.md:L2551-L2555: HC2 relies on model leverage (hatvalues), clubSandwich CR2 builds cluster adjustments from cluster hat blocks and an inverse square root, and Pustejovsky-Tipton explicitly discuss that BRL/CR2 adjustments for fixed-effects models depend on the full regression representation rather than a naively absorbed fit. citeturn1view0turn1view1turn1view2

Code Quality
No findings.

Performance
No findings. Routing to the full-dummy path is the expected tradeoff for a leverage-based vcov family.

Maintainability

  • Severity: P3 [Newly identified]
  • Impact: diff_diff/estimators.py:L347-L376 rewrites absorb into fixed_effects before the mixed-input guard, so DifferenceInDifferences(vcov_type="hc2"|"hc2_bm") now silently accepts absorb= together with fixed_effects=. That is probably numerically fine, but it drifts from the existing public contract still described in the fit() docstring at diff_diff/estimators.py:L254-L261 and the unchanged rejection tests at tests/test_survey.py:L3179-L3224.
  • Concrete fix: Either move the absorb and fixed_effects validation above the new auto-route to preserve the old API, or explicitly document and regression-test that mixed fixed_effects + absorb is intentionally supported on the HC2/HC2-BM full-dummy path.

Tech Debt
No findings. The remaining absorbed-FE HC2/HC2-BM work is correctly narrowed to MPD/TWFE in TODO.md:L100.

Security
No findings.

Documentation/Tests
No additional findings. The new coverage is strong for the changed lane, and the prior survey-scope wording issue is addressed. I could not execute the test suite in this environment because pytest and pandas are not installed.

igerber added a commit that referenced this pull request May 16, 2026
R4 informational P3: my prior reordering moved the auto-route ahead of
the multi-absorb-survey guard (correct) but ALSO left it ahead of the
existing `absorb + fixed_effects` mutual-exclusion check. On hc2/hc2_bm
the user-facing rejection vanished — the two args silently merged.

Move the `absorb + fixed_effects` validation ABOVE the auto-route so the
public-contract rejection fires regardless of vcov_type. Add a regression
test that pins the rejection across hc1/hc2/hc2_bm to prevent silent
regression on this contract.

The legacy multi-absorb + survey-weights guard stays BELOW the auto-route
(intentional from R2: when auto-routing, the demeaning rationale of that
guard doesn't apply).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 3a8a20195508af1526d43e6c2618a5c9b4cc0f6f


Overall Assessment

✅ Looks good

Executive Summary

  • The prior re-review concern is resolved: the absorb + fixed_effects rejection now runs before the HC2/HC2-BM auto-route in diff_diff/estimators.py:L347-L361, and regression coverage was added in tests/test_estimators_vcov_type.py:L1098-L1127.
  • I did not find any unmitigated P0/P1 methodology defects in the changed DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) path. The implementation matches the registry rationale in docs/methodology/REGISTRY.md:L2551-L2555.
  • Validation is materially stronger: absorb-vs-fixed-effects equivalence is pinned in tests/test_estimators_vcov_type.py:L664-L708, CR2/HC2-BM golden coverage is added in tests/test_estimators_vcov_type.py:L1032-L1096, and the new R fixture is generated in benchmarks/R/generate_clubsandwich_golden.R:L72-L118.
  • Remaining absorbed-FE HC2/HC2-BM limitations are correctly narrowed to TwoWayFixedEffects and MultiPeriodDiD in TODO.md:L100 and docs/methodology/REGISTRY.md:L2553-L2555.
  • I could not execute the targeted tests here because the environment is missing pytest and numpy; this review is based on static diff/code inspection.

Methodology

No findings.

Code Quality

No findings.

Performance

No findings. Routing leverage-based HC2/CR2 work to the full-dummy design is the expected tradeoff for correctness here.

Maintainability

No findings.

Tech Debt

No findings. The remaining unsupported absorbed-FE HC2/HC2-BM surfaces are properly tracked in TODO.md:L100.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: CHANGELOG.md:L11 and docs/methodology/REGISTRY.md:L2552 say the new absorbed-FE DiD route was validated against both sandwich::vcovHC(type="HC2") and clubSandwich::vcovCR(...), but the committed R golden generator only materializes CR2-derived targets (vcov_did_hc2_bm and clustered vcov_did_cr2) in benchmarks/R/generate_clubsandwich_golden.R:L95-L117, and the new hc2 test in tests/test_estimators_vcov_type.py:L1055-L1063 only checks finite SE plus ATT parity, not an external HC2 SE target. That leaves the external HC2-parity claim under-pinned in-tree for this scenario.
Concrete fix: Add an R sandwich::vcovHC(type="HC2") target for absorbed_fe_did and assert the treat_post SE in tests/test_estimators_vcov_type.py:L1055-L1063, or narrow the changelog/registry wording so it only claims external golden parity for the CR2/HC2-BM lanes.

igerber added a commit that referenced this pull request May 16, 2026
R5 informational P3: CHANGELOG/REGISTRY claimed parity against BOTH
`sandwich::vcovHC(type="HC2")` AND `clubSandwich::vcovCR(...)`, but the
R generator only materialized CR2-derived targets for `absorbed_fe_did`.
The HC2-parity claim was verified in a throwaway smoke test but not
pinned in-tree.

Added `vcov_hc2` to the R generator output (computed via
`sandwich::vcovHC(fit_did, type = "HC2")`) and a corresponding Python
parity test `test_absorb_hc2_matches_sandwich_vcovhc` that asserts the
treat_post slope SE matches at 1e-10. Replaces the prior weaker
`test_absorb_hc2_matches_full_dummy_design` (which only checked
finite-SE + ATT parity).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 597d0fc48b35317663749da8d96d1eb2d685e156


Overall Assessment

✅ Looks good

Executive Summary

  • The prior HC2-parity gap is resolved: the R generator now materializes an external sandwich::vcovHC(type="HC2") target for the absorbed-FE DiD scenario, and the test suite now asserts the treat_post SE against it in benchmarks/R/generate_clubsandwich_golden.R:L72-L123 and tests/test_estimators_vcov_type.py:L1055-L1074.
  • The estimator change remains methodologically consistent with the registry: DifferenceInDifferences.fit() only auto-routes absorb to fixed_effects for vcov_type in {"hc2","hc2_bm"}, matching the documented full-hat-matrix rationale in diff_diff/estimators.py:L363-L404 and docs/methodology/REGISTRY.md:L2550-L2555.
  • The earlier contract-drift risk is still closed off: absorb + fixed_effects is rejected before the auto-route, with regression coverage in diff_diff/estimators.py:L347-L361 and tests/test_estimators_vcov_type.py:L1109-L1137.
  • Clustered CR2 parity and the weighted survey multi-absorb placement are both explicitly covered in tests/test_estimators_vcov_type.py:L1076-L1179.
  • The generic CR2 golden harness correctly skips the new multi-column DiD fixture and leaves that case to the DiD-specific parity class in tests/test_linalg_hc2_bm.py:L548-L574.
  • I did not find any unmitigated P0/P1 issues in the changed files. I could not execute pytest here because this environment is missing both pytest and numpy, so this is a static review.

Methodology

No findings. The change in diff_diff/estimators.py:L363-L404 matches the methodology note in docs/methodology/REGISTRY.md:L2550-L2555, and the remaining unsupported absorbed-FE HC2/HC2-BM paths are accurately narrowed to TWFE and MPD in TODO.md:L100-L101.

Code Quality

No findings. The PR preserves the absorb + fixed_effects mutual-exclusion contract before any auto-route logic in diff_diff/estimators.py:L347-L361.

Performance

No findings. Routing only the HC2/HC2-BM branch to the full-dummy path is the expected correctness tradeoff; HC1/CR1 absorbed fits remain on the cheaper within-transformed route in diff_diff/estimators.py:L372-L388.

Maintainability

No findings. The new benchmark/test split is clean: the DiD-specific absorbed-FE scenario lives in the estimator parity tests, while the generic y ~ x CR2 harness stays focused on _compute_cr2_bm inputs in tests/test_linalg_hc2_bm.py:L548-L574.

Tech Debt

No findings. The follow-up surface is documented and tracked in TODO.md:L100-L101.

Security

No findings.

Documentation/Tests

No findings. The previous documentation/test mismatch around external HC2 parity is addressed by the added R generator target and matching assertion in benchmarks/R/generate_clubsandwich_golden.R:L97-L123 and tests/test_estimators_vcov_type.py:L1055-L1074.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 16, 2026
igerber and others added 6 commits May 16, 2026 20:00
Lifts `DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})`
NotImplementedError at `estimators.py:373` (previous) → auto-route at line
382 (new). FWL preserves coefficients and residuals under within-transform
but not the hat matrix, so HC2 leverage and CR2 Bell-McCaffrey DOF need
the FULL FE hat. Internally promoting `absorb=` to `fixed_effects=` for
HC2/HC2-BM fits builds the full-dummy design and routes through the
existing fixed-effects code path, which already computes the correct vcov.

Verified by reading clubSandwich's `R/CR-adjustments.R` source (the CR2
unweighted branch's `A_g = (I - H_gg)^{-1/2}` with H built on the full
model matrix is exactly what diff-diff's existing `_compute_cr2_bm`
produces). Singleton-cluster CR2 (`cluster=1:n`) reduces to one-way
HC2-BM Satterthwaite DOF — the PT2018-blessed workaround we use for the
unclustered HC2-BM goldens.

Parity tested at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and
`lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` via new
`tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity` against
new `absorbed_fe_did` scenario in `benchmarks/data/clubsandwich_cr2_golden.json`
(regenerated via the extended `benchmarks/R/generate_clubsandwich_golden.R`).

Out of scope (TODO.md partial drain): `TwoWayFixedEffects` and
`MultiPeriodDiD(absorb=...)` rejections remain — they have different
fit-path structure that needs separate surgery. Weighted variants
(`hc2_bm + weights`) and Conley + absorb paths are unchanged.

Behavioral note: under the auto-route, `result.coefficients` now contains
the FE-dummy entries (matching the `fixed_effects=` path), not the
slope-only view a plain `absorb=` returns. Downstream consumers reading
`result.att` are unaffected.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R1 review surfaced 2 P1 + 1 P2; all in-scope fixes.

**P1.1 — REGISTRY contradicted code.** REGISTRY.md:2552 still said
`DiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` raises NotImplementedError.
Replaced the blanket-rejection Note with a per-estimator status block:
DiD path is now SUPPORTED via auto-route (with the full DiDResults
surface change documented inline); TWFE and MultiPeriodDiD paths still
reject and are tracked as follow-ups.

**P1.2 — Parity tests missed clustered-CR2 and df-sensitive inference.**
The previous test class pinned only unclustered HC2-BM SE/ATT and
explicitly discarded the df target. Two new tests:
- `test_absorb_hc2_bm_clustered_matches_clubsandwich`: exercises
  `DiD(vcov_type="hc2_bm", cluster="unit").fit(..., absorb=[...])`
  against the R `vcovCR(..., cluster=d$unit, type="CR2")` target,
  asserting SE+ATT match at 1e-10.
- `test_absorb_hc2_bm_df_sensitive_inference`: asserts HC2 and HC2-BM
  give the SAME `se` but DIFFERENT `p_value` and `conf_int` (the BM
  Satterthwaite DOF must propagate to inference; CI width is wider
  under BM). This catches silent regressions where the auto-route
  passes SE through but uses n-k for inference.

**P2 — CHANGELOG only mentioned `result.coefficients`.** The auto-route
also affects `vcov`, `residuals`, `fitted_values`, `r_squared` (all
come from the full-dummy fit under the route; `r_squared` is computed
on the un-demeaned outcome and will typically be higher than the
within-R²). Extended the CHANGELOG entry with the full
`DiDResults`-surface contract change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R2 review flagged that REGISTRY/CHANGELOG documented
`DiD(absorb=..., vcov_type in {hc2,hc2_bm})` as SUPPORTED, but the
legacy `len(absorb) > 1 + survey_weights` guard at estimators.py:347
fired BEFORE the auto-route, so weighted multi-absorb fits still raised.

The guard's rationale ("single-pass demeaning isn't the correct weighted
FWL projection for N>1 absorbed dimensions") doesn't apply when we're
auto-routing to fixed_effects= — the fixed_effects= path builds the
full-dummy design and solves WLS directly with no within-transform.

Reorder: move the auto-route block above the multi-absorb-survey guard.
The guard now only fires when absorb was NOT consumed by the auto-route
(i.e., hc1/classical/conley/etc. — paths that still demean).

Adds `test_absorb_hc2_bm_survey_multi_absorb_auto_routes` to pin the
new placement against silent regression. The existing `test_survey.py`
multi-absorb-survey rejection tests continue to pass (they use the
default vcov_type=hc1 path which still hits the guard).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R3 informational P3: REGISTRY/CHANGELOG wording could be read as
implying survey fits compute HC2/HC2-BM analytically. Survey fits
actually use Taylor-series linearization or replicate-weight variance
regardless of `vcov_type` — the auto-route only changes the FE
handling and removes the prior absorbed-FE reject.

Added one-sentence clarifications in both surfaces so the documentation
matches the variance dispatch in linalg.py / results.py.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R4 informational P3: my prior reordering moved the auto-route ahead of
the multi-absorb-survey guard (correct) but ALSO left it ahead of the
existing `absorb + fixed_effects` mutual-exclusion check. On hc2/hc2_bm
the user-facing rejection vanished — the two args silently merged.

Move the `absorb + fixed_effects` validation ABOVE the auto-route so the
public-contract rejection fires regardless of vcov_type. Add a regression
test that pins the rejection across hc1/hc2/hc2_bm to prevent silent
regression on this contract.

The legacy multi-absorb + survey-weights guard stays BELOW the auto-route
(intentional from R2: when auto-routing, the demeaning rationale of that
guard doesn't apply).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R5 informational P3: CHANGELOG/REGISTRY claimed parity against BOTH
`sandwich::vcovHC(type="HC2")` AND `clubSandwich::vcovCR(...)`, but the
R generator only materialized CR2-derived targets for `absorbed_fe_did`.
The HC2-parity claim was verified in a throwaway smoke test but not
pinned in-tree.

Added `vcov_hc2` to the R generator output (computed via
`sandwich::vcovHC(fit_did, type = "HC2")`) and a corresponding Python
parity test `test_absorb_hc2_matches_sandwich_vcovhc` that asserts the
treat_post slope SE matches at 1e-10. Replaces the prior weaker
`test_absorb_hc2_matches_full_dummy_design` (which only checked
finite-SE + ATT parity).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the wave-5-hc2-cr2-bm-extensions branch from 597d0fc to 3b49324 Compare May 17, 2026 00:03
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 3b49324b55bb23ca3966b34d5a49f9e50c1231b6


Overall Assessment

✅ Looks good

Executive Summary

  • Previous re-review P1+ items appear addressed: the absorb + fixed_effects rejection now fires before the auto-route, preserving the public API contract in diff_diff/estimators.py:L347-L360.
  • The implementation is methodologically aligned with the registry: only DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"}) is lifted, while TWFE and MPD remain rejected and explicitly tracked in docs/methodology/REGISTRY.md:L2550-L2557 and TODO.md:L99-L100.
  • The new parity surface is materially stronger: the R generator now emits absorbed-FE DiD HC2, singleton-cluster HC2-BM, and unit-cluster CR2 targets in benchmarks/R/generate_clubsandwich_golden.R:L72-L123, and the estimator tests exercise those paths in tests/test_estimators_vcov_type.py:L981-L1219.
  • I did not find any unmitigated P0/P1 issues in the changed files.
  • Static review only: I could not execute pytest here because pytest is not installed in this environment.

Methodology

  • No findings. The new route in diff_diff/estimators.py:L363-L404 matches the methodology note in docs/methodology/REGISTRY.md:L2551-L2555, and the remaining unsupported absorbed-FE surfaces are correctly narrowed to TWFE/MPD in TODO.md:L99-L100.

Code Quality

  • No findings. The contract-preserving absorb + fixed_effects rejection is correctly placed ahead of the auto-route in diff_diff/estimators.py:L347-L360.

Performance

  • No findings. The expensive full-dummy path is limited to the hc2 / hc2_bm branch, while HC1/CR1 absorbed fits stay on the cheaper within-transformed route in diff_diff/estimators.py:L363-L404.

Maintainability

  • No findings. The implementation, registry note, TODO narrowing, R benchmark generator, and estimator parity tests are internally consistent across the changed files.

Tech Debt

  • No findings. The remaining absorbed-FE HC2/HC2-BM work is explicitly tracked rather than silently broadened in scope (TODO.md:L99-L100).

Security

  • No findings.

Documentation/Tests

  • Severity: P3. Impact: The new absorbed-FE fixture records exact dof_hc2_bm and dof_cr2, but the estimator tests only pin ATT/SE parity and a qualitative HC2-vs-HC2-BM inference difference; a future regression in exact Satterthwaite DOF propagation on this rank-deficient FE design could still pass as long as BM still widens the CI. Concrete fix: assert exact p_value/conf_int parity (or derived DOF parity) for the treat_post coefficient using the stored fixture values on both the singleton-cluster and unit-cluster CR2 paths. References: benchmarks/R/generate_clubsandwich_golden.R:L102-L123, tests/test_estimators_vcov_type.py:L1040-L1053, tests/test_estimators_vcov_type.py:L1076-L1107, tests/test_estimators_vcov_type.py:L1181-L1219.

@igerber igerber merged commit a7bd40d into main May 17, 2026
26 checks passed
@igerber igerber deleted the wave-5-hc2-cr2-bm-extensions branch May 17, 2026 01:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant