Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route by igerber · Pull Request #459 · igerber/diff-diff

igerber · 2026-05-17T11:08:58Z

Summary

Lift MultiPeriodDiD(absorb=..., vcov_type in {"hc2","hc2_bm"}) NotImplementedError gate at estimators.py:1476 via 5-LoC auto-route to fixed_effects= internally. This mirrors the DiD-absorb pattern from PR #458 ("DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally") and rests on the same algebraic justification: FWL preserves coefficients/residuals but not the hat matrix, and HC2/CR2 leverage corrections need the full FE-dummy hat.
Three-guard reorder so the auto-route sits BETWEEN the absorb+fixed_effects mutual-exclusion check (above) and the multi-absorb+survey-weights reject (below), matching the DiD ordering.
Survey-replicate absorb-refit branch at estimators.py:1693 is correctly short-circuited under the auto-route (standard compute_replicate_vcov path applies to the fixed full-dummy design; no per-replicate refit needed) — verified by new JK1 replicate-weights regression test.
New TestMPDAbsorbedFERParity class (10 tests): bit-equality auto-route invariants (single-absorb, multi-absorb, HC2-BM), R-parity vs sandwich::vcovHC and clubSandwich::vcovCR (1e-10), df-sensitive inference on both period_effects and avg_att, survey-weighted multi-absorb auto-route, JK1 replicate-weight regression, and mutual-exclusion preservation.
After this PR, TODO row 99 narrows to TWFE only (separate surgery — TWFE has no fixed_effects= equivalent path).
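The FWL argument in the first bullet can be checked numerically. The following is a minimal NumPy sketch on a made-up balanced panel (the data, sizes, and variable names are illustrative assumptions, not taken from diff-diff): the within-transformed fit reproduces the slope and residuals of the full-dummy fit, but its leverages are too small by exactly 1/T, which is the part HC2/CR2 depend on.

```python
import numpy as np

# Illustrative balanced panel: 6 units x 4 periods (assumed sizes, not from the PR).
rng = np.random.default_rng(0)
n_units, n_periods = 6, 4
unit = np.repeat(np.arange(n_units), n_periods)
x = rng.normal(size=unit.size)
y = 1.5 * x + unit + rng.normal(size=unit.size)  # unit-specific intercepts

# Full-dummy design: the regressor plus one dummy per unit (no separate intercept).
D = (unit[:, None] == np.arange(n_units)[None, :]).astype(float)
X_full = np.column_stack([x, D])
beta_full, *_ = np.linalg.lstsq(X_full, y, rcond=None)
resid_full = y - X_full @ beta_full
h_full = np.diag(X_full @ np.linalg.pinv(X_full))  # leverages of the full design


def demean(v):
    """Subtract unit means (the within transformation)."""
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]


# Within-transformed (absorbed) fit of the same model.
x_w, y_w = demean(x), demean(y)
beta_w = (x_w @ y_w) / (x_w @ x_w)
resid_w = y_w - beta_w * x_w
h_w = x_w**2 / (x_w @ x_w)  # leverages of the demeaned design only

print("same slope:     ", np.isclose(beta_full[0], beta_w))
print("same residuals: ", np.allclose(resid_full, resid_w))
print("same leverages: ", np.allclose(h_full, h_w))
print("leverage gap:   ", np.unique(np.round(h_full - h_w, 12)))
```

In this balanced case the gap is the constant 1/T contributed by the unit dummies, so any estimator that divides by (1 - h) needs the full-dummy hat rather than the demeaned one.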

Methodology references (required if estimator / math changes)

Method name(s): Eicker-Huber-White HC2 leverage correction; Bell-McCaffrey (2002) / Imbens-Kolesar (2016) / Pustejovsky-Tipton (2018) Satterthwaite DOF; Frisch-Waugh-Lovell theorem
Paper / source link(s): clubSandwich R/CR-adjustments.R (unweighted CR2 algebra reference); PT2018 §3.3 singleton-cluster CR2 trick for one-way HC2-BM; sandwich vcovHC for HC2 anchor. See docs/methodology/REGISTRY.md § HC2 + Bell-McCaffrey scope-limitation block (now-updated MultiPeriodDiD status).
Any intentional deviations from the source: None. Under the auto-route the full-dummy design is bit-identical to lm(y ~ treated + period_X dummies + treated:period_X + factor(unit), data=d); HC2/HC2-BM SEs match R at ~1e-10 (smoke test ≤ 1e-15).
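The singleton-cluster CR2 trick referenced above can be illustrated for the simplest case. This is a sketch under stated assumptions (unweighted OLS, identity working matrix, synthetic data); `hc2_vcov` and `cr2_vcov` are hypothetical helper names written for this example and are not the diff-diff or clubSandwich implementations. With one observation per cluster, the CR2 adjustment matrix reduces to the scalar (1 - h_ii)^(-1/2), so the CR2 sandwich coincides with HC2.

```python
import numpy as np


def hc2_vcov(X, resid):
    """HC2 sandwich: squared residuals are scaled by 1 / (1 - h_ii)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)  # h_ii = x_i' (X'X)^-1 x_i
    meat = (X * (resid**2 / (1.0 - h))[:, None]).T @ X
    return XtX_inv @ meat @ XtX_inv


def cr2_vcov(X, resid, cluster):
    """Unweighted CR2: per-cluster adjustment A_g = (I - H_gg)^(-1/2)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(cluster):
        idx = cluster == g
        Xg, eg = X[idx], resid[idx]
        Hgg = Xg @ XtX_inv @ Xg.T
        # Symmetric inverse square root via eigendecomposition.
        w, V = np.linalg.eigh(np.eye(int(idx.sum())) - Hgg)
        Ag = V @ np.diag(w**-0.5) @ V.T
        score = Xg.T @ Ag @ eg
        meat += np.outer(score, score)
    return XtX_inv @ meat @ XtX_inv


# Synthetic regression (illustrative only).
rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

V_hc2 = hc2_vcov(X, resid)
V_cr2_singleton = cr2_vcov(X, resid, cluster=np.arange(n))  # one obs per cluster
print(np.allclose(V_hc2, V_cr2_singleton))
```

The variance estimates agree; what HC2-BM adds on top is the Satterthwaite degrees-of-freedom correction, which this sketch does not compute.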

Validation

Tests added/updated: tests/test_estimators_vcov_type.py::TestMPDAbsorbedFERParity (10 new tests); existing test_multi_period_absorb_rejects_hc2_and_hc2_bm removed (its gate is now lifted, and the new class covers the same single-absorb shape with the new contract).
R goldens: new mpd_absorbed_fe_did scenario in benchmarks/R/generate_clubsandwich_golden.R + regenerated benchmarks/data/clubsandwich_cr2_golden.json — 5-cohort × 5-period event-study fixture (4 ever-treated + 1 never-treated cohort) with HC2 (sandwich) + HC2-BM singleton-cluster CR2 (clubSandwich) targets pinned on treated_period_4.
No regressions: 247 tests pass across test_estimators_vcov_type.py (67), test_estimators.py (180), test_linalg_hc2_bm.py.
Backtest / simulation / notebook evidence: N/A (analytical-sandwich methodology only; no tutorials updated).

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…ixed_effects=

Mirrors PR #458 (DiD-absorb auto-route) on MultiPeriodDiD: when absorb= is paired with vcov_type in {hc2, hc2_bm}, the fit promotes the absorb columns to fixed_effects= internally so the existing full-dummy-design code path computes the algebraically correct vcov on the event-study design (treated + period_X dummies + treated:period_X interactions + factor(unit)). Verified at ~1e-15 vs lm() + sandwich::vcovHC(type="HC2") and lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2") on a new 5-cohort x 5-period mpd_absorbed_fe_did fixture.

Includes three-guard reorder so the auto-route sits BETWEEN the absorb + fixed_effects mutual-exclusion check (above) and the multi-absorb + survey-weights reject (below), matching the DiD ordering.

The survey-replicate absorb-refit branch at estimators.py:1689 is short-circuited under the auto-route (the standard compute_replicate_vcov path applies on the fixed full-dummy design; no per-replicate refit needed).

Tests: new TestMPDAbsorbedFERParity class (7 tests) mirrors PR #458's TestDiDAbsorbedFERParity, pinning parity targets on per-period interaction coefficients (treated:period_4) to avoid the treated x unit collinearity baked into MPD's time-invariant ever-treated indicator. Existing test_multi_period_absorb_rejects_hc2_and_hc2_bm deleted.

REGISTRY.md per-estimator status block updated (MPD moves REJECT -> SUPPORTED; TWFE remains the only REJECT case). TODO row 99 narrowed to TWFE-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
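The three-guard ordering described in this commit can be sketched as a standalone function. Everything here is an assumption made for illustration: `route_absorb`, its signature, the exception types, and the messages are hypothetical and are not the code in estimators.py. The point is only the ordering, which lets a survey-weighted multi-absorb call reach the auto-route before the reject can fire.

```python
def route_absorb(absorb, fixed_effects, vcov_type, has_survey_weights):
    """Hypothetical sketch of the guard order; returns (absorb, fixed_effects)."""
    # Guard 1 (above): absorb= and fixed_effects= cannot both be supplied.
    if absorb and fixed_effects:
        raise ValueError("absorb= and fixed_effects= are mutually exclusive")

    # Guard 2 (the auto-route): leverage-based vcov needs the full FE-dummy hat,
    # so promote the absorbed columns to explicit fixed effects.
    if absorb and vcov_type in {"hc2", "hc2_bm"}:
        fixed_effects, absorb = list(absorb), None

    # Guard 3 (below): only calls still on the demeaning path can be rejected here.
    if absorb and len(absorb) > 1 and has_survey_weights:
        raise NotImplementedError("multi-absorb with survey weights is unsupported")

    return absorb, fixed_effects


# HC2-BM with two absorbed dimensions and survey weights is routed, not rejected.
print(route_absorb(["unit", "period"], None, "hc2_bm", True))
# The HC1 path is untouched and keeps using absorb=.
print(route_absorb(["unit"], None, "hc1", False))
```

If the auto-route sat below the third guard instead, the first call would raise before it could be routed.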

Codex local review surfaced two findings on the MPD-absorb gate lift:

P3 (methodology accuracy): REGISTRY/CHANGELOG/test-class docstring claimed the `treated` main-effect coefficient becomes NaN under rank deficiency. Empirically false: in the shipped parity fixture solve_ols drops a never-treated unit dummy (`unit_25`) and keeps `treated` finite. The pivot order is data-dependent. Rewrite to say one column from the collinear set is dropped under R-style rank-deficiency handling; per-period interactions and avg_att are identified and invariant to that choice.

P2 (test coverage): the new MPD test class missed two surfaces that the DiD analogue covers:

1. Survey-weighted multi-absorb auto-route bypass of the `len(absorb) > 1 + survey_weights` reject, exercised by new `test_absorb_hc2_bm_survey_multi_absorb_auto_routes` with parity assertion against the explicit `fixed_effects=` path on both `period_effects[target]` and `avg_att`.
2. The MPD-specific `avg_att` (post-period-average) contrast did not have a direct HC2-vs-HC2-BM inference pin. Added `test_absorb_hc2_bm_avg_att_df_sensitive_inference` asserting same avg_se but different avg_p_value / wider avg_conf_int under HC2-BM (the contract guard the per-period df_sensitive test cannot reach).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Codex R2 returned ✅ with two P3s.

P3 (test coverage upgraded to actionable per feedback_test_coverage_gap_treat_as_actionable.md): the survey test added in R1 used aweight (generic survey-vcov path), but the CHANGELOG/REGISTRY claim specifically that the auto-route short-circuits the absorb-refit replicate-variance branch at estimators.py:1693. Added test_absorb_hc2_bm_replicate_weights_auto_routes using SurveyDesign(replicate_method="JK1", replicate_weights=[...]) that exercises the replicate path and pins SE parity vs the explicit fixed_effects= form on both period_effects[target_period] and avg_att. Passing at atol=1e-12 confirms the documented short-circuit works as claimed.

P3 (doc accuracy): REGISTRY/CHANGELOG/test-class docstring described the `treated` alias as "exactly collinear with the sum of treated-cohort unit dummies". Under pd.get_dummies(drop_first=True) the exact alias depends on the omitted FE reference category (and the intercept), not just on the cohort-dummy sum. Rewrite to say `treated` lies in the span of the intercept plus the post-auto-route unit FE dummies; which specific nuisance column gets dropped is pivot-order and dummy-coding dependent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
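The doc-accuracy point about the `treated` alias can be reproduced on a toy panel. This is a minimal sketch with made-up sizes (6 units, 3 periods, units 0-3 ever-treated); it is not the shipped fixture. Because `drop_first=True` omits the dummy for unit 0, which is itself treated, the sum of the remaining treated-unit dummies does not reproduce `treated`, yet `treated` still lies in the span of the intercept plus the unit dummies.

```python
import numpy as np
import pandas as pd

# Toy panel: 6 units x 3 periods; units 0-3 are ever-treated (illustrative).
units = pd.Series(np.repeat(np.arange(6), 3))
treated = (units < 4).to_numpy(dtype=float)

# Unit FE dummies with the first category (unit 0) as the omitted reference.
fe = pd.get_dummies(units, prefix="unit", drop_first=True)
fe_values = fe.to_numpy(dtype=float)

intercept = np.ones(len(units))
X = np.column_stack([intercept, treated, fe_values])

# One column is redundant: rank is one less than the column count.
rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank)

# The exact alias here involves the intercept and the never-treated dummies.
alias = intercept - fe["unit_4"].to_numpy(dtype=float) - fe["unit_5"].to_numpy(dtype=float)
print("treated == 1 - unit_4 - unit_5:", np.allclose(treated, alias))

# The treated-cohort dummy sum misses the reference unit 0, so it is not an alias.
cohort_sum = fe[["unit_1", "unit_2", "unit_3"]].to_numpy(dtype=float).sum(axis=1)
print("treated == unit_1 + unit_2 + unit_3:", np.allclose(treated, cohort_sum))
```

Changing which unit is the reference category changes the alias, which is why the wording was moved to a span statement rather than a specific sum.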

github-actions · 2026-05-17T11:15:14Z

Overall assessment

⚠️ Needs changes

Executive summary

The methodology change itself is documented in docs/methodology/REGISTRY.md:L2551-L2555 and matches the same full-hat/FWL rationale already used for DifferenceInDifferences; I did not find an undocumented estimator or variance-formula deviation in the changed MultiPeriodDiD math.
[Newly identified] P1: the new auto-route now makes absorb=["unit","period"] land on a full-dummy design with duplicate period_* names. Those duplicates are later zipped into coef_dict, so MultiPeriodDiDResults.coefficients silently overwrites one set of coefficients with another and no longer faithfully represents the fitted model. diff_diff/estimators.py:L1574-L1609, diff_diff/estimators.py:L1968-L1985.
The new parity tests explicitly exercise that destination path (tests/test_estimators_vcov_type.py:L1283-L1314) but only check period_effects/avg_att; they do not verify the broader result-surface contract that the PR documents.
Survey/replicate short-circuiting looks internally consistent from the source; I did not find a separate SE/DOF defect there.
Runtime verification was not possible here because this environment is missing numpy/pytest, so this is a source-based review.

Methodology

P3 informational. Impact: none. Concrete fix: none required.
The absorb→fixed_effects lift for MultiPeriodDiD(vcov_type in {"hc2","hc2_bm"}) is documented in docs/methodology/REGISTRY.md:L2551-L2555 and matches the in-code rationale at diff_diff/estimators.py:L1464-L1496: HC2/CR2 need the full FE hat matrix, while within-transformation preserves coefficients/residuals but not leverage.

Code Quality

P1 [Newly identified]. Impact: the newly supported multi-absorb route can return a lossy and ambiguous results.coefficients mapping. On absorb=["unit","period"], the auto-route at diff_diff/estimators.py:L1493-L1496 produces fixed_effects=["unit","period"]; later var_names already contains period_2, period_3, ... from the event-study dummies, and pd.get_dummies(..., prefix="period") appends a second period_* block (diff_diff/estimators.py:L1574-L1609). When results are materialized, coef_dict = {name: coef for ...} collapses duplicate keys (diff_diff/estimators.py:L1968-L1985). That silently breaks the exported coefficient surface and makes it impossible to reliably align results.coefficients with the returned vcov.
Concrete fix: ensure the routed full-dummy design has unique exported names before building coef_dict (for example, disambiguate FE dummy labels when fe == time), or normalize away redundant time fixed effects on MPD before result construction. Add an assertion that exported coefficient names are unique.
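The collapse mechanism is plain Python dict behavior and can be shown in isolation. In this sketch the names and values are illustrative assumptions (not output from the estimator), and `duplicate_names` is a hypothetical helper of the kind the suggested uniqueness assertion could use.

```python
from collections import Counter

import numpy as np

# Illustrative design: event-study period dummies followed by a second,
# redundant block of FE dummies that reuse the same "period_*" labels.
var_names = ["const", "treated", "period_2", "period_3", "period_2", "period_3"]
coefficients = np.array([1.0, 0.5, 0.345, 0.2, np.nan, np.nan])

# Later duplicates overwrite earlier entries, so the mapping silently shrinks.
coef_dict = {name: coef for name, coef in zip(var_names, coefficients)}
print(len(var_names), "design columns ->", len(coef_dict), "exported coefficients")
print("period_2 ->", coef_dict["period_2"])


def duplicate_names(names):
    """Hypothetical helper: return the labels that appear more than once."""
    return [name for name, count in Counter(names).items() if count > 1]


dupes = duplicate_names(var_names)
print("duplicates:", dupes)
```

A guard such as `assert not duplicate_names(var_names)` placed before the dict is built would turn the silent overwrite into an immediate failure.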

Performance

No findings.

Maintainability

No findings beyond the naming-collision issue above.

Tech Debt

P3 informational. Impact: none. Concrete fix: none required.
The remaining unsupported HC2/HC2-BM absorbed-FE work is correctly narrowed to TWFE only in TODO.md:L99-L99; that deferral is properly tracked and is not a blocker for this PR.

Security

No findings.

Documentation/Tests

P2. Impact: the new regression coverage does not protect the result-surface behavior the PR now documents. tests/test_estimators_vcov_type.py:L1283-L1314 deliberately exercises the absorb=["unit","period"] / fixed_effects=["unit","period"] collision case, but it only asserts period_effects parity. At the same time, docs/methodology/REGISTRY.md:L2554-L2555 says the routed full-dummy fit changes the broader MultiPeriodDiDResults surface, including coefficients.
Concrete fix: add a regression test on the new multi-absorb route asserting that results.coefficients is complete and uniquely named, and that its cardinality stays compatible with the fitted design / vcov; if redundant time FE inputs are going to be rejected or normalized away instead, document that explicitly.

Path to Approval

Fix the MPD full-dummy naming collision so results.coefficients remains a complete, unambiguous representation on the newly supported absorb=["unit","period"] route.
Add regression coverage for that surface contract on both the plain and survey-weighted multi-absorb auto-route, and update the REGISTRY/CHANGELOG wording if the chosen fix changes how redundant time FEs are exposed.

CI Codex review on PR #459 surfaced a P1 newly exposed by the auto-route: when MPD(absorb=["unit","period"]) auto-routes to fixed_effects=["unit", "period"], the existing fixed_effects= expansion loop adds `period_X` dummies via `pd.get_dummies(prefix="period")` that collide on name with the event-study period dummies MPD already builds for non-reference periods. The duplicate `var_names` entries silently collapse in `coef_dict = {name: coef for name, coef in zip(var_names, coefficients)}`, overwriting the real event-study coefficients with the rank-deficient NaN drops on the redundant FE block. Result: `len(coefficients) < vcov.shape[0]` and `coefficients["period_X"] = NaN` even though `period_effects[X]` (read by position) was correct. Bug was pre-existing on MPD's `fixed_effects=[<time_col>]` path; the auto-route just made it newly reachable via `absorb=`.

Fix: in MPD's fixed_effects expansion at estimators.py:1604, skip entries where `fe == time`. MPD's design already absorbs the time dimension via non-reference period dummies, so the FE-block dummies would be perfectly redundant anyway (NaN'd by solve_ols, dropping nothing useful while corrupting the result surface).

Empirical evidence:
- Pre-fix: `MPD(absorb=["unit","period"])` -> len(coefs)=34, vcov.shape=(38,38), coefs["period_2"]=NaN.
- Post-fix: same call -> len(coefs)=34, vcov.shape=(34,34), coefs["period_2"]=0.345 (finite, matches MPD's event-study fit).

Tests: new `test_absorb_hc2_result_surface_invariants_multi_absorb` asserts `len(coefficients) == vcov.shape[0]`, no duplicate names, and finite event-study `period_X` on BOTH the auto-route and the explicit `fixed_effects=` paths (Codex P2: regression coverage for the result-surface contract on the newly reachable path). 11/11 MPD tests pass; 249/249 in the broader sweep (test_estimators.py / test_linalg_hc2_bm.py unchanged).

REGISTRY/CHANGELOG: documented the time-FE skip rule for both auto-route and pre-existing `fixed_effects=[<time_col>]` invocations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
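The time-FE skip rule can be sketched as follows. This is a minimal illustration, not the loop at estimators.py:1604: `expand_fixed_effects`, its signature, the `drop_first=True` choice, and the toy data are all assumptions made for the example. The only behavior taken from the commit message is skipping any fixed effect whose column is the time column.

```python
import pandas as pd


def expand_fixed_effects(data, fixed_effects, time, var_names):
    """Hypothetical sketch: append FE dummy blocks, skipping the time column."""
    names = list(var_names)
    blocks = []
    for fe in fixed_effects:
        if fe == time:
            # The event-study design already carries non-reference period
            # dummies, so a time FE block would only duplicate their names.
            continue
        dummies = pd.get_dummies(data[fe], prefix=fe, drop_first=True)
        blocks.append(dummies)
        names.extend(dummies.columns)
    return names, blocks


# Toy panel: 3 units x 3 periods (illustrative).
data = pd.DataFrame(
    {"unit": [1, 1, 1, 2, 2, 2, 3, 3, 3], "period": [1, 2, 3] * 3}
)
event_study_names = ["const", "treated", "period_2", "period_3"]

names, blocks = expand_fixed_effects(
    data, ["unit", "period"], time="period", var_names=event_study_names
)
print(names)
print("unique:", len(names) == len(set(names)))
```

Without the `continue`, the `period` entry would contribute a second `period_2`/`period_3` pair and the exported names would no longer be unique.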

github-actions · 2026-05-17T11:23:01Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 1e25f306f8576501eb861507007bbb686a06b9ce

Overall assessment

✅ Looks good

Executive summary

Re-review focus: the prior P1 on lossy MultiPeriodDiDResults.coefficients output is addressed. The new fe == time skip prevents duplicate period_* names before coef_dict is built, so the coefficients mapping no longer collapses entries or misaligns with vcov. diff_diff/estimators.py:L1603-L1619, diff_diff/estimators.py:L1981-L1985
The methodology change is now explicitly documented in the Methodology Registry, including the absorbed-FE HC2/HC2-BM auto-route, the survey-scope narrowing, and the redundant time-FE skip. That makes the behavior change documented rather than an undocumented deviation. docs/methodology/REGISTRY.md:L2551-L2555
The survey and replicate-weight routing is internally consistent with the new design: once the auto-route clears absorb, the code correctly falls onto the fixed-design survey vcov paths instead of the demeaning/refit branch. diff_diff/estimators.py:L1480-L1508, diff_diff/estimators.py:L1728-L1807
The new regression coverage now pins the exact failure mode from the previous review, along with R-parity and BM-DOF-sensitive inference on both period effects and avg_att. tests/test_estimators_vcov_type.py:L1291-L1610
No new unmitigated P0/P1 findings identified in the changed code.
Runtime verification was not possible here because pytest is not installed in this environment.

Methodology

Severity: P3-informational
Impact: none.
Concrete fix: none required.
The absorbed-FE HC2/HC2-BM support for MultiPeriodDiD is now documented in REGISTRY.md, and the implementation matches that documented contract: full-dummy routing for leverage-based vcov, unchanged HC1/CR1 demeaned paths, survey variance taking precedence, and the redundant time-FE skip called out explicitly as a documented deviation rather than a defect. docs/methodology/REGISTRY.md:L2551-L2555, diff_diff/estimators.py:L1464-L1496, diff_diff/estimators.py:L1603-L1619

Code Quality

No findings. The previously reported duplicate-name collision is resolved by skipping redundant time FE dummies before coef_dict construction, and the repaired surface contract is now regression-tested. diff_diff/estimators.py:L1603-L1619, diff_diff/estimators.py:L1981-L1985, tests/test_estimators_vcov_type.py:L1291-L1341

Performance

No findings. The new route reuses the existing fixed_effects= branch and only activates for vcov_type in {"hc2", "hc2_bm"}. diff_diff/estimators.py:L1464-L1496

Maintainability

No findings. The code, registry, changelog, and TODO entry now tell the same story about what is supported and what remains deferred. diff_diff/estimators.py:L1464-L1496, docs/methodology/REGISTRY.md:L2551-L2557, TODO.md:L99-L99

Tech Debt

Severity: P3-informational
Impact: none.
Concrete fix: none required.
The remaining absorbed-FE HC2/HC2-BM limitation is correctly narrowed to TwoWayFixedEffects, and that follow-up is explicitly tracked in TODO.md, so it is not blocking under the project’s deferral rules. TODO.md:L99-L99

Security

No findings.

Documentation/Tests

No findings. The new tests cover the repaired coefficients/vcov alignment contract, R-parity for HC2 and HC2-BM, DF-sensitive inference for both period effects and avg_att, and the survey/replicate auto-route behavior. tests/test_estimators_vcov_type.py:L1291-L1610

igerber and others added 3 commits May 17, 2026 06:44

igerber added the ready-for-ci label (Triggers CI test workflows) May 17, 2026

igerber merged commit 6a3e50b into main May 17, 2026
33 of 34 checks passed

igerber deleted the mpd-absorb-hc2-auto-route branch May 17, 2026 12:48

igerber merged 4 commits into main from mpd-absorb-hc2-auto-route
