Lift Gate 1: HC2/HC2-BM for TwoWayFixedEffects via full-dummy auto-route #469
Replaces the unconditional NotImplementedError in twfe.py for
`vcov_type in {"hc2","hc2_bm"}` with an inline full-dummy branch.
TWFE has no absorb=/fixed_effects= parameter to swap (unit + time FE
are baked into the estimator's identity), so the auto-route trick
used for DiD-absorb / MPD-absorb doesn't apply directly. Instead,
`TwoWayFixedEffects.fit()` bypasses the within-transform on
hc2/hc2_bm and stacks [intercept, treated×post, covariates,
factor(unit), factor(time)] explicitly, so the leverage correction
and BM DOF are computed on the full FE projection (FWL preserves
coefficients and residuals but NOT the hat matrix).
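A small numerical check makes the FWL caveat concrete. The numpy sketch below uses a made-up balanced panel (4 units × 3 periods). The data, the seed, and the helpers `ols`, `leverage`, `hc2_se`, and `twoway_demean` are illustrative and are not the PR's code. Two-way demeaning reproduces the full-dummy ATT and residuals. However, every observation's leverage on the one-column demeaned design is lower by the FE projection's diagonal, which is 1/T + 1/N - 1/(NT) = 0.5 here. As a result, HC2 computed on the demeaned design understates the ATT's SE.

```python
import numpy as np

# Illustrative balanced panel (assumed sizes): 4 units x 3 periods.
rng = np.random.default_rng(0)
n_units, n_times = 4, 3
unit = np.repeat(np.arange(n_units), n_times)
time = np.tile(np.arange(n_times), n_units)
d = ((unit >= 2) & (time >= 1)).astype(float)  # treated x post
y = 1.0 + 0.5 * d + 0.3 * unit + 0.2 * time + rng.normal(size=unit.size)


def ols(X, y):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    return beta, y - X @ beta


def leverage(X):
    # Diagonal of the hat matrix H = X (X'X)^{-1} X', via a thin QR.
    Q, _ = np.linalg.qr(X)
    return np.sum(Q**2, axis=1)


def hc2_se(X, resid, h, j):
    # HC2 sandwich: (X'X)^{-1} X' diag(e_i^2 / (1 - h_i)) X (X'X)^{-1}
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (resid**2 / (1.0 - h))[:, None])
    return float(np.sqrt((bread @ meat @ bread)[j, j]))


# Full-dummy design: [intercept, treated x post, unit dummies, time dummies].
U = (unit[:, None] == np.arange(1, n_units)).astype(float)
T = (time[:, None] == np.arange(1, n_times)).astype(float)
X_full = np.column_stack([np.ones_like(d), d, U, T])
beta_full, resid_full = ols(X_full, y)
h_full = leverage(X_full)


def twoway_demean(v):
    # Exact within-transform for a balanced panel.
    unit_mean = np.array([v[unit == i].mean() for i in range(n_units)])[unit]
    time_mean = np.array([v[time == t].mean() for t in range(n_times)])[time]
    return v - unit_mean - time_mean + v.mean()


X_within = twoway_demean(d)[:, None]
beta_within, resid_within = ols(X_within, twoway_demean(y))
h_within = leverage(X_within)

att_full, att_within = beta_full[1], beta_within[0]
se_full = hc2_se(X_full, resid_full, h_full, j=1)
se_within = hc2_se(X_within, resid_within, h_within, j=0)

print(f"ATT full={att_full:.6f} within={att_within:.6f}")
print(f"HC2 SE full={se_full:.6f} within={se_within:.6f}")
```

The demeaned treatment column is orthogonal to the FE columns. The full hat matrix is therefore the FE projection plus the within projection, which is why the leverage gap is the same constant for every observation in a balanced panel.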
**Auto-cluster default:** preserved on hc2_bm (routes to CR2-BM at
unit) and on hc2 + wild_bootstrap; dropped on explicit hc2 +
analytical to match the one-way contract (the linalg validator
rejects hc2 + cluster_ids).
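The default-cluster rule above can be summarized as a small decision function. This is a hypothetical sketch: the function name and arguments are not the library's API, and it assumes that "no explicit `cluster=`" means "apply the default". It also folds in the survey-design caveat clarified later in this thread, namely that no auto-cluster is injected into the survey PSU.

```python
from typing import Optional


def resolve_default_cluster(
    vcov_type: str,
    inference: str = "analytical",
    cluster: Optional[str] = None,
    has_survey_design: bool = False,
    unit_col: str = "unit",
) -> Optional[str]:
    """Hypothetical sketch of TWFE's hc2/hc2_bm cluster default (not library API)."""
    if vcov_type not in {"hc2", "hc2_bm"}:
        raise ValueError("this sketch only models the hc2/hc2_bm branch")
    if cluster is not None:
        # Explicit user choice is passed through; downstream validation
        # (e.g. the linalg check that rejects hc2 + cluster_ids) is not modeled.
        return cluster
    if has_survey_design:
        # Survey path: no auto-cluster is injected into the survey PSU.
        return None
    if vcov_type == "hc2_bm":
        return unit_col  # CR2-BM clustered at the unit
    if inference == "wild_bootstrap":
        return unit_col  # hc2 + wild bootstrap keeps the unit-level default
    return None  # hc2 + analytical: one-way HC2, no auto-cluster
```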
**Surface change disclosure** (matches DiD-absorb / MPD-absorb):
under vcov_type in {"hc2","hc2_bm"}, result.coefficients,
result.vcov, result.residuals, result.fitted_values, and
result.r_squared reflect the full-dummy fit. FE-dummy entries are
included alongside the "ATT" key (len(coefficients) ==
vcov.shape[0] invariant upheld). result.att, its SE, and analytical
inference are unchanged (FWL-equivalent).
**Rejected combos:** vcov_type in {"hc2","hc2_bm"} + replicate-
weight survey designs raises NotImplementedError because the
replicate path re-demeans per replicate, which doesn't compose with
the full-dummy build. Survey variance precedence: any resolved
SurveyDesign drives variance via TSL/replicate (matches existing
DiD/MPD contract), not the analytical small-sample sandwich.
Verified at atol=1e-10 vs `lm() + sandwich::vcovHC(type="HC2")` and
`lm() + clubSandwich::vcovCR(cluster=seq_len(n), type="CR2") +
coef_test()$df_Satt` on a new `twfe_two_period` scenario in
benchmarks/data/clubsandwich_cr2_golden.json. New tests:
- TestFitBehavior (10 behavioral tests including refactor regression
vs DiD(fixed_effects=[unit, time]) at atol=1e-12, auto-cluster
distinguishability check vs one-way HC2-BM at 1% gap, replicate-
weight rejection, coefficients-vs-vcov alignment invariant)
- TestTWFEHC2RParity (3 R-parity tests at atol=1e-10)
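The `cluster=seq_len(n)` parity target rests on a simple identity. When every observation is its own cluster, the CR2 adjustment (I - H_gg)^{-1/2} collapses to the scalar (1 - h_ii)^{-1/2}, so CR2 reduces to HC2. Below is a minimal numpy check on simulated data. It assumes unweighted OLS with the identity working model, and the helper names are hypothetical. It covers the variance matrix only, not the Bell-McCaffrey/Satterthwaite DOF.

```python
import numpy as np


def hat_matrix(X):
    return X @ np.linalg.solve(X.T @ X, X.T)


def hc2_vcov(X, e):
    # HC2: observation weights e_i^2 / (1 - h_ii)
    bread = np.linalg.inv(X.T @ X)
    h = np.diag(hat_matrix(X))
    meat = X.T @ (X * (e**2 / (1.0 - h))[:, None])
    return bread @ meat @ bread


def inv_sqrt_sym(M):
    # Symmetric inverse square root via eigendecomposition (M assumed PD).
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals**-0.5) @ vecs.T


def cr2_vcov(X, e, cluster):
    # CR2 for unweighted OLS: A_g = (I - H_gg)^{-1/2}
    bread = np.linalg.inv(X.T @ X)
    H = hat_matrix(X)
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(cluster):
        idx = np.flatnonzero(cluster == g)
        A_g = inv_sqrt_sym(np.eye(idx.size) - H[np.ix_(idx, idx)])
        u_g = X[idx].T @ (A_g @ e[idx])
        meat += np.outer(u_g, u_g)
    return bread @ meat @ bread


rng = np.random.default_rng(42)
n = 20
X = np.column_stack([np.ones(n), rng.normal(size=n), np.arange(n) % 2])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)
e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]

V_hc2 = hc2_vcov(X, e)
V_cr2_singleton = cr2_vcov(X, e, cluster=np.arange(n))  # analogue of seq_len(n)
```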
Lifts Gate 1 of the six HC2/HC2-BM NotImplementedError gates — the
last absorbed-FE gate. Remaining gates: weighted one-way HC2-BM,
weighted CR2-BM (both blocked on the clubSandwich WLS algebra
derivation).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Overall Assessment: ✅ Looks good. No unmitigated P0/P1 findings. The HC2/HC2-BM methodology change in …
Review sections: Executive Summary, Methodology, Code Quality, Performance, Maintainability, Tech Debt, Security, Documentation/Tests
Addresses CI Codex review findings on PR #469:
- P2 (Performance): the HC2/HC2-BM full-dummy build can OOM on large TWFE panels (n × (n_units + n_times) float64 entries). Add a memory-size warning at >50M entries (~400 MB) suggesting hc1 (within-transform) for large panels.
- P3 (Docs/Tests): the new tests pinned ATT/SE/DOF but not the documented full-surface change (residuals/fitted_values/r_squared reflect the full-dummy fit). Add `test_twfe_hc2_full_surface_matches_did_fixed_effects`, parametrized over hc2/hc2_bm, asserting agreement with DifferenceInDifferences(fixed_effects=[unit, time]) at atol=1e-12 on all three fields.
- P3 (Maintainability): TWFE's inline full-dummy builder duplicates DiD's fixed_effects= dummy-construction logic. This is a substantive refactor, better done as a follow-up than inline in this PR; added a TODO row.
- P3 (Tech debt, replicate weights): already tracked; no action needed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
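The memory check is simple arithmetic: 50M float64 entries × 8 bytes = 400 MB. The sketch below is hypothetical. The function names, message text, and `UserWarning` category are assumptions, not the PR's implementation, and it counts only the n × (n_units + n_times) term quoted above.

```python
import warnings

BYTES_PER_FLOAT64 = 8
MAX_DUMMY_ENTRIES = 50_000_000  # ~400 MB of float64, the threshold quoted above


def full_dummy_entries(n_obs: int, n_units: int, n_times: int) -> int:
    # Dominant size term from the commit message: n x (n_units + n_times).
    # The intercept, ATT, and covariate columns are ignored here.
    return n_obs * (n_units + n_times)


def warn_if_full_dummy_too_large(n_obs: int, n_units: int, n_times: int) -> bool:
    entries = full_dummy_entries(n_obs, n_units, n_times)
    if entries <= MAX_DUMMY_ENTRIES:
        return False
    approx_mb = entries * BYTES_PER_FLOAT64 / 1e6
    warnings.warn(
        f"HC2/HC2-BM full-dummy design needs ~{approx_mb:,.0f} MB "
        f"({entries:,} float64 entries); consider vcov_type='hc1' "
        "(within-transform) for large panels.",
        UserWarning,
        stacklevel=2,
    )
    return True


# 1,000 units x 10 periods: 10,000 x 1,010 = 10.1M entries -> no warning.
small = warn_if_full_dummy_too_large(10_000, 1_000, 10)
# 5,000 units x 10 periods: 50,000 x 5,010 = 250.5M entries (~2 GB) -> warns.
large = warn_if_full_dummy_too_large(50_000, 5_000, 10)
```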
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: ✅ Looks good
CI review flagged the missing end-to-end test for the new non-replicate, survey-weighted hc2/hc2_bm TWFE path; a regression there could ship while existing tests still pass.

The new parametrized test compares TWFE(vcov_type=vcov, cluster=...) with SurveyDesign(weights=...) against DifferenceInDifferences(vcov_type=vcov, cluster=..., fixed_effects=['unit', 'time']) with the same design. Both paths feed the survey-resolved full-dummy X to LinearRegression's compute_survey_vcov (TSL), so ATT and SE match to atol=1e-12.

The test documents the explicit-cluster requirement on hc2_bm. TWFE's implicit auto-cluster + survey path intentionally drops PSU injection (per the survey-design scope rule in _resolve_effective_cluster). The explicit cluster='unit' form is what aligns with DiD's clustered-survey behavior, and it is the documented user-facing way to invoke clustered, survey-aware HC2-BM on TWFE.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: ✅ Looks good
… for survey path

CI review polish:
- Add `test_twfe_hc2_with_survey_strata_psu_matches_did_fixed_effects` (parametrized over hc2/hc2_bm), which extends the weights-only survey regression with `SurveyDesign(weights="w", strata="stratum", psu="psu")`. Both the TWFE and DiD(fixed_effects=[unit, time]) paths feed the survey-resolved full-dummy X to LinearRegression's TSL with stratified-design adjustments, so ATT/SE match to atol=1e-12.
- Clarify the "auto-cluster preserved on hc2_bm" wording in the TWFE docstring and REGISTRY: the auto-cluster default applies to the non-survey analytical path. Under `survey_design=` with no explicit `cluster=`, TWFE keeps the documented implicit-PSU path (the auto-cluster is NOT injected into the survey PSU). Users who want unit-level PSUs under a survey design must pass an explicit `cluster="unit"` or set `survey_design.psu`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: ✅ Looks good
CI review (R4) flagged that the new top entry says TWFE HC2/HC2-BM is supported, but two older bullets in the same [Unreleased] block (the DiD-absorb and MPD-absorb entries) still claim "TwoWayFixedEffects rejection remains as a follow-up" / "remain as follow-ups". Those sentences were factually correct when those PRs landed, but they become contradictory once Gate 1 ships in the same release. Replace both with cross-references to the Gate 1 top entry, noting that TWFE uses a separate full-dummy branch (there is no fixed_effects= equivalent inside TWFE) rather than the absorb→fixed_effects parameter swap used by DiD/MPD.

Per feedback_changelog_accuracy_fixes.md, scanned all CHANGELOG bullets for similar stale claims. The remaining matches are unrelated entries (Spillover, BaconDecomposition, Conley waves) that don't reference TWFE rejection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment:
Path to Approval
CI review (R5) identified a P1 bug in wild_bootstrap_se() that became reachable through the TWFE HC2/HC2-BM full-dummy path.

Before this fix, wild_bootstrap_se built each draw's pseudo-outcome from `X @ beta_restricted`. When solve_ols dropped a rank-deficient nuisance column (e.g. a time-invariant covariate collinear with the unit FE on the full-dummy design), beta_restricted contained NaN in the dropped slot, and X @ beta_restricted propagated NaN to every observation. The ATT was analytically identified, but the bootstrap crashed because y_star was all-NaN. Before this PR the bug was unreachable on TWFE, because the within-transform absorbed time-invariant covariates before they entered X. The new full-dummy HC2/HC2-BM branch keeps the unit/time dummies explicit alongside the covariates, which exposes the bug.

Two fixes in wild_bootstrap_se (diff_diff/utils.py):
1. Use solve_ols(return_fitted=True) to get NaN-safe fitted values from the kept columns, and build y_star = fitted_restricted + residuals_restricted * obs_weights instead of using X @ beta_restricted. Because solve_ols computes fitted_restricted from the kept columns, the dropped nuisance NaN doesn't propagate.
2. Replace the bootstrap_t_stats[b] = 0.0 fallback for singular draws with np.nan plus a finite_mask filter at the p-value step. Setting t* = 0 biased the p-value downward: |0| < |t_original| counts as a non-rejection, but those draws are invalid, not non-rejections. The same NaN-safe filter applies to bootstrap_coefs for the SE and the percentile CI.

New regression test `test_twfe_hc2_wild_bootstrap_survives_rank_deficient_full_dummy` fits TwoWayFixedEffects(vcov_type='hc2', inference='wild_bootstrap', covariates=['x_invariant']) on a panel where x_invariant is time-invariant (collinear with the unit FE on the full-dummy design). It asserts a finite ATT, SE, p-value, and CI. Before the fix, this test crashed with an all-NaN y_star.

No regression in the existing 53 wild_bootstrap tests across test_wild_bootstrap, test_methodology_did, test_methodology_twfe, test_conley_vcov, test_estimators_vcov_type, test_business_report, test_replicate_weight_expansion, and test_survey.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
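Both failure modes are easy to reproduce in isolation. The numpy sketch below uses made-up numbers and emulates the "fitted values from kept columns" step by masking out the NaN slot; the PR itself gets those values from solve_ols(return_fitted=True). The symmetric |t*| >= |t| p-value form and the `min_valid=2` floor are illustrative assumptions. The floor mirrors the all-or-nothing rule adopted later in R6.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
x = rng.normal(size=n)
# Third column is 2 x the intercept, so a solver that drops it reports NaN
# for its coefficient (the situation described above for solve_ols).
X = np.column_stack([np.ones(n), x, 2.0 * np.ones(n)])
beta_restricted = np.array([1.0, 0.5, np.nan])
resid_restricted = rng.normal(size=n)
rademacher = rng.choice([-1.0, 1.0], size=n)

# Old construction: one NaN slot poisons every row of X @ beta.
y_star_old = X @ beta_restricted + resid_restricted * rademacher

# Fixed construction: fitted values from the kept (identified) columns only.
kept = ~np.isnan(beta_restricted)
fitted_restricted = X[:, kept] @ beta_restricted[kept]
y_star_new = fitted_restricted + resid_restricted * rademacher


def symmetric_bootstrap_p(t_original, t_star, min_valid=2):
    """Two-sided bootstrap-t p-value over finite draws only (illustrative)."""
    t_star = np.asarray(t_star, dtype=float)
    finite = np.isfinite(t_star)
    if finite.sum() < min_valid:
        return float("nan")  # too few valid draws: no p-value
    return float(np.mean(np.abs(t_star[finite]) >= abs(t_original)))


draws = [0.5, -2.5, np.nan, 3.0, np.nan]  # NaN marks a singular draw
zeroed = [0.5, -2.5, 0.0, 3.0, 0.0]       # old t* = 0 fallback
p_filtered = symmetric_bootstrap_p(2.0, draws)  # 2 of 3 valid draws exceed
p_zeroed = symmetric_bootstrap_p(2.0, zeroed)   # 2 of 5: biased downward
```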
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: ⛔ Blocker
Path to Approval
…otstrap

CI review (R6) caught a new P0 in my R5 wild_bootstrap_se() fix: the degenerate-bootstrap branches violated the all-or-nothing NaN contract from feedback_bootstrap_nan_on_invalid_contract:
- n_valid == 0 returned p_value = 1.0 with se = NaN (split inference)
- valid_coefs.size == 1 returned a finite percentile CI from a single draw alongside se = NaN
- t_stat_original was always finite (analytical), so it surfaced alongside a NaN bootstrap se when the bootstrap was degenerate

Fix: when n_valid < 2 or valid_coefs.size < 2, NaN out the entire inference quadruple (se, p_value, ci_lower, ci_upper) and the surfaced t_stat_original. The analytical t-stat from step 1 is still computed for diagnostic use inside the helper, but it is not propagated to the user-facing result on a degenerate bootstrap. This prevents the estimator wrapper from emitting an analytical t-stat alongside NaN bootstrap fields, which would mix inference families on the same coefficient.

New regression tests in tests/test_wild_bootstrap.py::TestWildBootstrapDegenerateAllNaN:
- test_degenerate_n_valid_zero_returns_all_nan: monkeypatches solve_ols so every bootstrap draw has a singular vcov; asserts that all five user-facing fields are NaN.
- test_degenerate_single_valid_draw_returns_all_nan: forces exactly one valid draw (n_valid == 1); asserts that all five fields are NaN, so no percentile CI is built from a single-point sample.

The analytical-design tests never exercised either branch. That is why the R5 fix passed, and why the R6 reviewer caught the contract violation through code inspection.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
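The all-or-nothing rule can be stated as a small pure-Python function. This is a hypothetical, simplified summary: the names, the SE-as-std-of-draws choice, and the linear-interpolation percentile are assumptions, not `wild_bootstrap_se`'s actual code. When fewer than two finite draws survive, every user-facing field (including the analytical t-stat) comes back NaN.

```python
import math
from dataclasses import dataclass

NAN = float("nan")


@dataclass
class BootstrapInference:
    se: float
    p_value: float
    ci_lower: float
    ci_upper: float
    t_stat: float


def _percentile(sorted_vals, q):
    # Linear interpolation between order statistics (numpy's default method).
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize_bootstrap(coef_draws, t_draws, t_stat_original, alpha=0.05):
    valid_coefs = sorted(c for c in coef_draws if math.isfinite(c))
    valid_t = [t for t in t_draws if math.isfinite(t)]
    if len(valid_t) < 2 or len(valid_coefs) < 2:
        # Degenerate bootstrap: NaN out ALL user-facing fields, including the
        # analytical t-stat, so inference families are never mixed.
        return BootstrapInference(NAN, NAN, NAN, NAN, NAN)
    mean = sum(valid_coefs) / len(valid_coefs)
    se = math.sqrt(sum((c - mean) ** 2 for c in valid_coefs) / (len(valid_coefs) - 1))
    p_value = sum(abs(t) >= abs(t_stat_original) for t in valid_t) / len(valid_t)
    return BootstrapInference(
        se=se,
        p_value=p_value,
        ci_lower=_percentile(valid_coefs, alpha / 2),
        ci_upper=_percentile(valid_coefs, 1 - alpha / 2),
        t_stat=t_stat_original,
    )
```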
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: ✅ Looks good. No unmitigated P0/P1 findings remain in the changed estimator/inference paths. The prior blocker around degenerate wild-bootstrap inference appears resolved. The highest remaining unmitigated issue is a P2 documentation/test-contract gap.
CI review (R7) flagged a P2 scope creep in the R1 coefficients-dict
fix: I built `_twfe_var_names` on BOTH the full-dummy and within-
transform branches, which silently broadened
`result.coefficients` on HC1/classical/Conley paths from
`{"ATT": att}` to `{"const": c, "ATT": att, ...covariates}`. That's
a user-visible API change on unchanged TWFE paths that wasn't
documented in CHANGELOG/REGISTRY or regression-tested.
Per the reviewer's recommendation, restoring the historical
`{"ATT": att}` contract on within-transform paths by setting
`_twfe_var_names = None` on the else branch (the fallback at the
DiDResults construction site handles None via the existing
`{"ATT": float(att)}` literal). Only the HC2/HC2-BM full-dummy path
now broadens the dict — which is what the REGISTRY/CHANGELOG
surface-change disclosure documents, and what the alignment-invariant
test and full-surface regression test pin.
Verified end-to-end: hc1/classical → `{'ATT'}`; hc2/hc2_bm →
`{'const', 'ATT', '_fe_unit_*', '_fe_time_*'}`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
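The restored contract amounts to a `None` fallback when building the coefficients dict. The sketch below is hypothetical: `build_coefficients` and the dummy names `_fe_unit_1` / `_fe_time_1` are illustrative stand-ins for the `_twfe_var_names` handling at the DiDResults construction site, which is not shown here.

```python
from typing import Dict, List, Optional, Sequence


def build_coefficients(att: float, params: Sequence[float],
                       var_names: Optional[List[str]]) -> Dict[str, float]:
    """Hypothetical sketch of the coefficients-dict fallback described above."""
    if var_names is None:
        # Within-transform paths (hc1/classical/Conley): historical contract.
        return {"ATT": float(att)}
    if len(var_names) != len(params):
        # Full-dummy path: keys must line up with the vcov rows/columns.
        raise ValueError("var_names and params must have the same length")
    return {name: float(b) for name, b in zip(var_names, params)}


within = build_coefficients(0.42, [0.42], None)
names = ["const", "ATT", "_fe_unit_1", "_fe_time_1"]  # illustrative dummy names
full = build_coefficients(0.42, [1.0, 0.42, 0.3, -0.1], names)
```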
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment: ✅ Looks good. No unmitigated P0/P1 findings remain in the changed TWFE HC2/HC2-BM or wild-bootstrap paths.
Summary
- `TwoWayFixedEffects(vcov_type in {"hc2","hc2_bm"})` now produces finite small-sample inference via an inline full-dummy build inside `TWFE.fit()`. This lifts the last absorbed-FE HC2/HC2-BM gate (already merged: DiD-absorb, "DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally" #458; MPD-absorb, "Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route" #459; MPD cluster+contrast-DOF, "Lift Gate 6: cluster-aware CR2 Bell-McCaffrey contrast DOF for MultiPeriodDiD avg_att" #465).
- Auto-cluster default: preserved on `hc2_bm` (routes to CR2-BM at unit) and on `hc2 + wild_bootstrap`; dropped on explicit `hc2 + analytical` to match the one-way contract.
- Surface change: under `hc2`/`hc2_bm`, `result.coefficients` (incl. FE-dummy entries), `vcov`, `residuals`, `fitted_values`, and `r_squared` reflect the full-dummy fit; `result.att` and analytical inference are FWL-invariant. The `len(coefficients) == vcov.shape[0]` invariant is upheld.
- `hc2 / hc2_bm` + replicate-weight survey designs are explicitly rejected (the replicate path re-demeans per replicate, which doesn't compose with the full-dummy build).

Methodology references (required if estimator / math changes)
- `sandwich::vcovHC(type="HC2")` and `clubSandwich::vcovCR(type="CR2") + coef_test()$df_Satt` (R parity targets; singleton-cluster trick = one-way HC2-BM)
- `atol=1e-10` verified on the new `twfe_two_period` scenario in `benchmarks/data/clubsandwich_cr2_golden.json`. The full-dummy build is the FWL-correct algebra: the within-transform preserves coefficients but NOT the hat matrix, so HC2 leverage on the demeaned design would be wrong.

Validation
- `tests/test_estimators_vcov_type.py::TestFitBehavior`: 10 behavioral tests (rejection flip → behavioral; refactor regression vs `DiD(fixed_effects=[unit, time])` at `atol=1e-12`; auto-cluster default coverage on `hc2_bm` with a distinguishability check vs one-way HC2-BM at >1% gap; explicit `hc2 + analytical` with no auto-cluster; `hc2 + wild_bootstrap` auto-cluster preserved; `hc2 / hc2_bm + replicate` rejection; always-treated unit finite ATT; coefficients-vs-vcov alignment invariant)
- `tests/test_methodology_twfe.py::TestTWFEHC2RParity`: 3 R-parity tests at `atol=1e-10` (HC2 SE vs `sandwich::vcovHC`; one-way BM DOF vs singleton-cluster CR2; CR2-BM clustered-at-unit DOF vs `vcovCR(cluster=unit)`)
- `benchmarks/R/generate_clubsandwich_golden.R`: new `twfe_two_period` scenario (8 units × 4 periods, binary post indicator); JSON regenerated with `meta.source = "clubSandwich"`

Security / privacy
🤖 Generated with Claude Code