DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally#458
Conversation
|
Overall Assessment Needs changes Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
R1 review surfaced 2 P1 + 1 P2; all in-scope fixes.
**P1.1 — REGISTRY contradicted code.** REGISTRY.md:2552 still said
`DiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` raises NotImplementedError.
Replaced the blanket-rejection Note with a per-estimator status block:
DiD path is now SUPPORTED via auto-route (with the full DiDResults
surface change documented inline); TWFE and MultiPeriodDiD paths still
reject and are tracked as follow-ups.
**P1.2 — Parity tests missed clustered-CR2 and df-sensitive inference.**
The previous test class pinned only unclustered HC2-BM SE/ATT and
explicitly discarded the df target. Two new tests:
- `test_absorb_hc2_bm_clustered_matches_clubsandwich`: exercises
`DiD(vcov_type="hc2_bm", cluster="unit").fit(..., absorb=[...])`
against the R `vcovCR(..., cluster=d$unit, type="CR2")` target,
asserting SE+ATT match at 1e-10.
- `test_absorb_hc2_bm_df_sensitive_inference`: asserts HC2 and HC2-BM
give the SAME `se` but DIFFERENT `p_value` and `conf_int` (the BM
Satterthwaite DOF must propagate to inference; CI width is wider
under BM). This catches silent regressions where the auto-route
passes SE through but uses n-k for inference.
**P2 — CHANGELOG only mentioned `result.coefficients`.** The auto-route
also affects `vcov`, `residuals`, `fitted_values`, `r_squared` (all
come from the full-dummy fit under the route; `r_squared` is computed
on the un-demeaned outcome and will typically be higher than the
within-R²). Extended the CHANGELOG entry with the full
`DiDResults`-surface contract change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R2 review flagged that REGISTRY/CHANGELOG documented
`DiD(absorb=..., vcov_type in {hc2,hc2_bm})` as SUPPORTED, but the
legacy `len(absorb) > 1 + survey_weights` guard at estimators.py:347
fired BEFORE the auto-route, so weighted multi-absorb fits still raised.
The guard's rationale ("single-pass demeaning isn't the correct weighted
FWL projection for N>1 absorbed dimensions") doesn't apply when we're
auto-routing to fixed_effects= — the fixed_effects= path builds the
full-dummy design and solves WLS directly with no within-transform.
Reorder: move the auto-route block above the multi-absorb-survey guard.
The guard now only fires when absorb was NOT consumed by the auto-route
(i.e., hc1/classical/conley/etc. — paths that still demean).
Adds `test_absorb_hc2_bm_survey_multi_absorb_auto_routes` to pin the
new placement against silent regression. The existing `test_survey.py`
multi-absorb-survey rejection tests continue to pass (they use the
default vcov_type=hc1 path which still hits the guard).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology Code Quality Performance Maintainability Tech Debt Security Documentation/Tests
Reviewer note: I could not execute the automated tests here because |
R3 informational P3: REGISTRY/CHANGELOG wording could be read as implying survey fits compute HC2/HC2-BM analytically. Survey fits actually use Taylor-series linearization or replicate-weight variance regardless of `vcov_type` — the auto-route only changes the FE handling and removes the prior absorbed-FE reject. Added one-sentence clarifications in both surfaces so the documentation matches the variance dispatch in linalg.py / results.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology Code Quality Performance Maintainability
Tech Debt Security Documentation/Tests |
R4 informational P3: my prior reordering moved the auto-route ahead of the multi-absorb-survey guard (correct) but ALSO left it ahead of the existing `absorb + fixed_effects` mutual-exclusion check. On hc2/hc2_bm the user-facing rejection vanished — the two args silently merged. Move the `absorb + fixed_effects` validation ABOVE the auto-route so the public-contract rejection fires regardless of vcov_type. Add a regression test that pins the rejection across hc1/hc2/hc2_bm to prevent silent regression on this contract. The legacy multi-absorb + survey-weights guard stays BELOW the auto-route (intentional from R2: when auto-routing, the demeaning rationale of that guard doesn't apply). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology No findings. Code Quality No findings. Performance No findings. Routing leverage-based HC2/CR2 work to the full-dummy design is the expected tradeoff for correctness here. Maintainability No findings. Tech Debt No findings. The remaining unsupported absorbed-FE HC2/HC2-BM surfaces are properly tracked in Security No findings. Documentation/Tests Severity: P3 |
R5 informational P3: CHANGELOG/REGISTRY claimed parity against BOTH `sandwich::vcovHC(type="HC2")` AND `clubSandwich::vcovCR(...)`, but the R generator only materialized CR2-derived targets for `absorbed_fe_did`. The HC2-parity claim was verified in a throwaway smoke test but not pinned in-tree. Added `vcov_hc2` to the R generator output (computed via `sandwich::vcovHC(fit_did, type = "HC2")`) and a corresponding Python parity test `test_absorb_hc2_matches_sandwich_vcovhc` that asserts the treat_post slope SE matches at 1e-10. Replaces the prior weaker `test_absorb_hc2_matches_full_dummy_design` (which only checked finite-SE + ATT parity). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology No findings. The change in Code Quality No findings. The PR preserves the Performance No findings. Routing only the HC2/HC2-BM branch to the full-dummy path is the expected correctness tradeoff; HC1/CR1 absorbed fits remain on the cheaper within-transformed route in Maintainability No findings. The new benchmark/test split is clean: the DiD-specific absorbed-FE scenario lives in the estimator parity tests, while the generic Tech Debt No findings. The follow-up surface is documented and tracked in Security No findings. Documentation/Tests No findings. The previous documentation/test mismatch around external HC2 parity is addressed by the added R generator target and matching assertion in |
Lifts `DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})`
NotImplementedError at `estimators.py:373` (previous) → auto-route at line
382 (new). FWL preserves coefficients and residuals under within-transform
but not the hat matrix, so HC2 leverage and CR2 Bell-McCaffrey DOF need
the FULL FE hat. Internally promoting `absorb=` to `fixed_effects=` for
HC2/HC2-BM fits builds the full-dummy design and routes through the
existing fixed-effects code path, which already computes the correct vcov.
Verified by reading clubSandwich's `R/CR-adjustments.R` source (the CR2
unweighted branch's `A_g = (I - H_gg)^{-1/2}` with H built on the full
model matrix is exactly what diff-diff's existing `_compute_cr2_bm`
produces). Singleton-cluster CR2 (`cluster=1:n`) reduces to one-way
HC2-BM Satterthwaite DOF — the PT2018-blessed workaround we use for the
unclustered HC2-BM goldens.
Parity tested at ~1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and
`lm() + clubSandwich::vcovCR(cluster=..., type="CR2")` via new
`tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParity` against
new `absorbed_fe_did` scenario in `benchmarks/data/clubsandwich_cr2_golden.json`
(regenerated via the extended `benchmarks/R/generate_clubsandwich_golden.R`).
Out of scope (TODO.md partial drain): `TwoWayFixedEffects` and
`MultiPeriodDiD(absorb=...)` rejections remain — they have different
fit-path structure that needs separate surgery. Weighted variants
(`hc2_bm + weights`) and Conley + absorb paths are unchanged.
Behavioral note: under the auto-route, `result.coefficients` now contains
the FE-dummy entries (matching the `fixed_effects=` path), not the
slope-only view a plain `absorb=` returns. Downstream consumers reading
`result.att` are unaffected.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R1 review surfaced 2 P1 + 1 P2; all in-scope fixes.
**P1.1 — REGISTRY contradicted code.** REGISTRY.md:2552 still said
`DiD(absorb=..., vcov_type in {"hc2","hc2_bm"})` raises NotImplementedError.
Replaced the blanket-rejection Note with a per-estimator status block:
DiD path is now SUPPORTED via auto-route (with the full DiDResults
surface change documented inline); TWFE and MultiPeriodDiD paths still
reject and are tracked as follow-ups.
**P1.2 — Parity tests missed clustered-CR2 and df-sensitive inference.**
The previous test class pinned only unclustered HC2-BM SE/ATT and
explicitly discarded the df target. Two new tests:
- `test_absorb_hc2_bm_clustered_matches_clubsandwich`: exercises
`DiD(vcov_type="hc2_bm", cluster="unit").fit(..., absorb=[...])`
against the R `vcovCR(..., cluster=d$unit, type="CR2")` target,
asserting SE+ATT match at 1e-10.
- `test_absorb_hc2_bm_df_sensitive_inference`: asserts HC2 and HC2-BM
give the SAME `se` but DIFFERENT `p_value` and `conf_int` (the BM
Satterthwaite DOF must propagate to inference; CI width is wider
under BM). This catches silent regressions where the auto-route
passes SE through but uses n-k for inference.
**P2 — CHANGELOG only mentioned `result.coefficients`.** The auto-route
also affects `vcov`, `residuals`, `fitted_values`, `r_squared` (all
come from the full-dummy fit under the route; `r_squared` is computed
on the un-demeaned outcome and will typically be higher than the
within-R²). Extended the CHANGELOG entry with the full
`DiDResults`-surface contract change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R2 review flagged that REGISTRY/CHANGELOG documented
`DiD(absorb=..., vcov_type in {hc2,hc2_bm})` as SUPPORTED, but the
legacy `len(absorb) > 1 + survey_weights` guard at estimators.py:347
fired BEFORE the auto-route, so weighted multi-absorb fits still raised.
The guard's rationale ("single-pass demeaning isn't the correct weighted
FWL projection for N>1 absorbed dimensions") doesn't apply when we're
auto-routing to fixed_effects= — the fixed_effects= path builds the
full-dummy design and solves WLS directly with no within-transform.
Reorder: move the auto-route block above the multi-absorb-survey guard.
The guard now only fires when absorb was NOT consumed by the auto-route
(i.e., hc1/classical/conley/etc. — paths that still demean).
Adds `test_absorb_hc2_bm_survey_multi_absorb_auto_routes` to pin the
new placement against silent regression. The existing `test_survey.py`
multi-absorb-survey rejection tests continue to pass (they use the
default vcov_type=hc1 path which still hits the guard).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R3 informational P3: REGISTRY/CHANGELOG wording could be read as implying survey fits compute HC2/HC2-BM analytically. Survey fits actually use Taylor-series linearization or replicate-weight variance regardless of `vcov_type` — the auto-route only changes the FE handling and removes the prior absorbed-FE reject. Added one-sentence clarifications in both surfaces so the documentation matches the variance dispatch in linalg.py / results.py. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R4 informational P3: my prior reordering moved the auto-route ahead of the multi-absorb-survey guard (correct) but ALSO left it ahead of the existing `absorb + fixed_effects` mutual-exclusion check. On hc2/hc2_bm the user-facing rejection vanished — the two args silently merged. Move the `absorb + fixed_effects` validation ABOVE the auto-route so the public-contract rejection fires regardless of vcov_type. Add a regression test that pins the rejection across hc1/hc2/hc2_bm to prevent silent regression on this contract. The legacy multi-absorb + survey-weights guard stays BELOW the auto-route (intentional from R2: when auto-routing, the demeaning rationale of that guard doesn't apply). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R5 informational P3: CHANGELOG/REGISTRY claimed parity against BOTH `sandwich::vcovHC(type="HC2")` AND `clubSandwich::vcovCR(...)`, but the R generator only materialized CR2-derived targets for `absorbed_fe_did`. The HC2-parity claim was verified in a throwaway smoke test but not pinned in-tree. Added `vcov_hc2` to the R generator output (computed via `sandwich::vcovHC(fit_did, type = "HC2")`) and a corresponding Python parity test `test_absorb_hc2_matches_sandwich_vcovhc` that asserts the treat_post slope SE matches at 1e-10. Replaces the prior weaker `test_absorb_hc2_matches_full_dummy_design` (which only checked finite-SE + ATT parity). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
597d0fc to
3b49324
Compare
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
Summary
DifferenceInDifferences(absorb=..., vcov_type in {"hc2","hc2_bm"})NotImplementedError atdiff_diff/estimators.py:382(was line 373). Auto-route promotes absorb columns tofixed_effects=internally for HC2/HC2-BM fits so the existing full-dummy-design code path computes the algebraically correct vcov.R/CR-adjustments.R) before writing code. Confirmed the unweighted CR2 algebra (A_g = (I - H_gg)^{-1/2}with H on the full model matrix) is what diff-diff's existing_compute_cr2_bmalready produces. Singleton-cluster CR2 trick (cluster=1:n) reduces to one-way HC2-BM Satterthwaite DOF.lm() + sandwich::vcovHC(type="HC2")andlm() + clubSandwich::vcovCR(cluster=..., type="CR2")via newabsorbed_fe_didscenario inbenchmarks/data/clubsandwich_cr2_golden.jsonand newtests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParitytest class.Methodology references (required if estimator / math changes)
jepusto/clubSandwich/R/CR-adjustments.R.vcovCR.lm(... type="CR2")for the unweighted case. Weighted variant (with the more elaborateThetaquadratic correction documented inCR-adjustments.R'sinverse_var=FALSEbranch) is deferred to a follow-up.Validation
tests/test_estimators_vcov_type.py::TestDiDAbsorbedFERParityclass (2 tests) — R-parity at 1e-10 againstabsorbed_fe_didgolden scenario.tests/test_estimators_vcov_type.py::test_did_absorb_rejects_hc2_and_hc2_bmflipped from "raises NotImplementedError" to "auto-routes; matchesfixed_effects=path bit-equal" (renamedtest_did_absorb_hc2_and_hc2_bm_auto_route).tests/test_linalg_hc2_bm.py::test_cr2_parity_with_goldenskips scenarios that don't fit the simpley ~ x2-column contract.Security / privacy
Generated with Claude Code