
Lift MultiPeriodDiD-absorb HC2/HC2-BM gate via auto-route #459

Merged
igerber merged 4 commits into main from mpd-absorb-hc2-auto-route
May 17, 2026

@igerber igerber commented May 17, 2026

Summary

  • Lift the MultiPeriodDiD(absorb=..., vcov_type in {"hc2","hc2_bm"}) NotImplementedError gate at estimators.py:1476 via a 5-LoC auto-route to fixed_effects= internally. This mirrors the DiD-absorb pattern from PR #458 ("DiD-absorb HC2/HC2-BM: auto-route to fixed_effects internally") and rests on the same algebraic justification: FWL preserves coefficients and residuals but not the hat matrix, and HC2/CR2 leverage corrections need the full FE-dummy hat.
  • Three-guard reorder so the auto-route sits BETWEEN the absorb+fixed_effects mutual-exclusion check (above) and the multi-absorb+survey-weights reject (below), matching the DiD ordering.
  • Survey-replicate absorb-refit branch at estimators.py:1693 is correctly short-circuited under the auto-route (standard compute_replicate_vcov path applies to the fixed full-dummy design; no per-replicate refit needed) — verified by new JK1 replicate-weights regression test.
  • New TestMPDAbsorbedFERParity class (10 tests): bit-equality auto-route invariants (single-absorb, multi-absorb, HC2-BM), R-parity vs sandwich::vcovHC and clubSandwich::vcovCR (1e-10), df-sensitive inference on both period_effects and avg_att, survey-weighted multi-absorb auto-route, JK1 replicate-weight regression, and mutual-exclusion preservation.
  • After this PR, TODO row 99 narrows to TWFE only (separate surgery — TWFE has no fixed_effects= equivalent path).
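The FWL argument in the first bullet can be checked numerically. The following is a minimal NumPy sketch on a made-up balanced panel, not the library's code: the helper names (`demean`, `ols`, `hc2_se_first`) and the simulated data are assumptions for illustration only. It fits the same regression once with explicit unit dummies and once after within-unit demeaning, then compares coefficients, residuals, leverages, and the HC2 standard error of the slope.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_periods = 6, 4
unit = np.repeat(np.arange(n_units), n_periods)
x = rng.normal(size=unit.size)
y = 1.5 * x + rng.normal(size=n_units)[unit] + rng.normal(size=unit.size)

# Full-dummy design: the regressor plus one indicator per unit (no intercept).
D = (unit[:, None] == np.arange(n_units)[None, :]).astype(float)
X_full = np.column_stack([x, D])

def demean(v):
    """Subtract unit means (the within transformation)."""
    means = np.bincount(unit, weights=v) / np.bincount(unit)
    return v - means[unit]

X_within = demean(x)[:, None]
y_within = demean(y)

def ols(X, y):
    """Return coefficients, residuals, and the hat-matrix diagonal."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    h = np.einsum("ij,ji->i", X, np.linalg.solve(X.T @ X, X.T))
    return beta, resid, h

def hc2_se_first(X, resid, h):
    """HC2 sandwich SE of the first coefficient: weights are e_i^2 / (1 - h_ii)."""
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * (resid**2 / (1.0 - h))[:, None])
    return np.sqrt((bread @ meat @ bread)[0, 0])

beta_f, e_f, h_f = ols(X_full, y)
beta_w, e_w, h_w = ols(X_within, y_within)
se_full = hc2_se_first(X_full, e_f, h_f)
se_within = hc2_se_first(X_within, e_w, h_w)

print("slope:", beta_f[0], beta_w[0])          # equal by FWL
print("max |resid diff|:", np.abs(e_f - e_w).max())
print("leverage gap:", (h_f - h_w)[:3])        # 1 / n_periods on a balanced panel
print("HC2 SE full vs within:", se_full, se_within)
```

On this balanced panel the demeaned fit understates every leverage by exactly 1/n_periods, so its HC2 weights are too small even though the slope and residuals agree. That gap is what the auto-route avoids by fitting the full-dummy design.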

Methodology references (required if estimator / math changes)

  • Method name(s): Eicker-Huber-White HC2 leverage correction; Bell-McCaffrey (2002) / Imbens-Kolesar (2016) / Pustejovsky-Tipton (2018) Satterthwaite DOF; Frisch-Waugh-Lovell theorem
  • Paper / source link(s): clubSandwich R/CR-adjustments.R (unweighted CR2 algebra reference); PT2018 §3.3 singleton-cluster CR2 trick for one-way HC2-BM; sandwich vcovHC for HC2 anchor. See docs/methodology/REGISTRY.md § HC2 + Bell-McCaffrey scope-limitation block (now-updated MultiPeriodDiD status).
  • Any intentional deviations from the source: None. Under the auto-route the full-dummy design is bit-identical to lm(y ~ treated + period_X dummies + treated:period_X + factor(unit), data=d); HC2/HC2-BM SEs match R at ~1e-10 (smoke test ≤ 1e-15).

Validation

  • Tests added/updated: tests/test_estimators_vcov_type.py::TestMPDAbsorbedFERParity (10 new tests); existing test_multi_period_absorb_rejects_hc2_and_hc2_bm removed (its gate is now lifted, and the new class covers the same single-absorb shape with the new contract).
  • R goldens: new mpd_absorbed_fe_did scenario in benchmarks/R/generate_clubsandwich_golden.R + regenerated benchmarks/data/clubsandwich_cr2_golden.json — 5-cohort × 5-period event-study fixture (4 ever-treated + 1 never-treated cohort) with HC2 (sandwich) + HC2-BM singleton-cluster CR2 (clubSandwich) targets pinned on treated_period_4.
  • No regressions: 247 tests pass across test_estimators_vcov_type.py (67), test_estimators.py (180), test_linalg_hc2_bm.py.
  • Backtest / simulation / notebook evidence: N/A (analytical-sandwich methodology only; no tutorials updated).

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

igerber and others added 3 commits May 17, 2026 06:44
…ixed_effects=

Mirrors PR #458 (DiD-absorb auto-route) on MultiPeriodDiD: when absorb=
is paired with vcov_type in {hc2, hc2_bm}, the fit promotes the absorb
columns to fixed_effects= internally so the existing full-dummy-design
code path computes the algebraically correct vcov on the event-study
design (treated + period_X dummies + treated:period_X interactions +
factor(unit)). Verified at ~1e-15 vs lm() + sandwich::vcovHC(type="HC2")
and lm() + clubSandwich::vcovCR(cluster=1:n, type="CR2") on a new
5-cohort x 5-period mpd_absorbed_fe_did fixture.

Includes three-guard reorder so the auto-route sits BETWEEN the absorb +
fixed_effects mutual-exclusion check (above) and the multi-absorb +
survey-weights reject (below), matching the DiD ordering.
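
The guard ordering described above can be sketched as a small standalone function. This is a hypothetical illustration, not the code in estimators.py: the function name `resolve_fe_args`, its signature, the error messages, and the exception types chosen for guards 1 and 3 are all assumptions.

```python
def resolve_fe_args(absorb, fixed_effects, vcov_type, has_survey_weights):
    """Return (absorb, fixed_effects) after applying the three guards in order.

    Hypothetical helper; names and exception types are illustrative only.
    """
    # Guard 1: absorb= and fixed_effects= remain mutually exclusive.
    if absorb and fixed_effects:
        raise ValueError("absorb= and fixed_effects= are mutually exclusive")

    # Guard 2 (auto-route): leverage-based vcov needs the full FE-dummy hat
    # matrix, so promote the absorb columns to fixed_effects= and clear absorb.
    if absorb and vcov_type in {"hc2", "hc2_bm"}:
        return None, list(absorb)

    # Guard 3: on the demeaned path, multi-absorb plus survey weights is
    # still rejected. Auto-routed calls never reach this check.
    if absorb and len(absorb) > 1 and has_survey_weights:
        raise NotImplementedError("multi-absorb with survey weights is unsupported")

    return absorb, fixed_effects


# Auto-routed call: survey-weighted multi-absorb passes because guard 2 returns first.
print(resolve_fe_args(["unit", "period"], None, "hc2_bm", True))
```

Placing the auto-route between the two rejects is what lets a survey-weighted multi-absorb call succeed under HC2/HC2-BM while the same call under another vcov type is still rejected.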

The survey-replicate absorb-refit branch at estimators.py:1689 is short-
circuited under the auto-route (the standard compute_replicate_vcov path
applies on the fixed full-dummy design; no per-replicate refit needed).

Tests: new TestMPDAbsorbedFERParity class (7 tests) mirrors PR #458's
TestDiDAbsorbedFERParity, pinning parity targets on per-period
interaction coefficients (treated:period_4) to avoid the treated x unit
collinearity baked into MPD's time-invariant ever-treated indicator.
Existing test_multi_period_absorb_rejects_hc2_and_hc2_bm deleted.

REGISTRY.md per-estimator status block updated (MPD moves REJECT ->
SUPPORTED; TWFE remains the only REJECT case). TODO row 99 narrowed to
TWFE-only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex local review surfaced two findings on the MPD-absorb gate lift:

P3 (methodology accuracy): REGISTRY/CHANGELOG/test-class docstring
claimed the `treated` main-effect coefficient becomes NaN under
rank deficiency. Empirically false — in the shipped parity fixture
solve_ols drops a never-treated unit dummy (`unit_25`) and keeps
`treated` finite. The pivot order is data-dependent. Rewrite to
say one column from the collinear set is dropped under R-style
rank-deficiency handling; per-period interactions and avg_att are
identified and invariant to that choice.

P2 (test coverage): the new MPD test class missed two surfaces that
the DiD analogue covers:
  1. Survey-weighted multi-absorb auto-route bypass of the
     `len(absorb) > 1 + survey_weights` reject — exercised by new
     `test_absorb_hc2_bm_survey_multi_absorb_auto_routes` with
     parity assertion against the explicit `fixed_effects=` path
     on both `period_effects[target]` and `avg_att`.
  2. The MPD-specific `avg_att` (post-period-average) contrast did
     not have a direct HC2-vs-HC2-BM inference pin. Added
     `test_absorb_hc2_bm_avg_att_df_sensitive_inference` asserting
     same avg_se but different avg_p_value / wider avg_conf_int
     under HC2-BM (the contract guard the per-period
     df_sensitive test cannot reach).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex R2 returned ✅ with two P3s.

P3 (test coverage upgraded to actionable per
feedback_test_coverage_gap_treat_as_actionable.md): the survey test
added in R1 used aweight (generic survey-vcov path), but the
CHANGELOG/REGISTRY claim specifically that the auto-route
short-circuits the absorb-refit replicate-variance branch at
estimators.py:1693. Added test_absorb_hc2_bm_replicate_weights_auto_routes
using SurveyDesign(replicate_method="JK1", replicate_weights=[...])
that exercises the replicate path and pins SE parity vs the explicit
fixed_effects= form on both period_effects[target_period] and avg_att.
Passing at atol=1e-12 confirms the documented short-circuit works as
claimed.

P3 (doc accuracy): REGISTRY/CHANGELOG/test-class docstring described
the `treated` alias as "exactly collinear with the sum of treated-cohort
unit dummies". Under pd.get_dummies(drop_first=True) the exact alias
depends on the omitted FE reference category (and the intercept), not
just on the cohort-dummy sum. Rewrite to say `treated` lies in the span
of the intercept plus the post-auto-route unit FE dummies; which
specific nuisance column gets dropped is pivot-order and dummy-coding
dependent.
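
A small NumPy check of the span claim, on a made-up 5-unit panel with drop-first dummy coding built by hand (an assumption standing in for the pd.get_dummies(drop_first=True) call; the variable names are illustrative). Units 0 to 2 are ever-treated and unit 0 is the omitted reference category.

```python
import numpy as np

n_units, n_periods = 5, 3
unit = np.repeat(np.arange(n_units), n_periods)
treated = (unit < 3).astype(float)  # time-invariant ever-treated indicator

intercept = np.ones(unit.size)
# Drop-first coding: unit 0 is the reference, so dummies exist for units 1..4.
unit_dummies = (unit[:, None] == np.arange(1, n_units)[None, :]).astype(float)

fe_block = np.column_stack([intercept, unit_dummies])
with_treated = np.column_stack([fe_block, treated])

# Adding `treated` does not raise the rank: it lies in the span of fe_block.
rank_fe = np.linalg.matrix_rank(fe_block)
rank_with = np.linalg.matrix_rank(with_treated)

# Coefficients that reproduce `treated` from [intercept, d1, d2, d3, d4].
alias, *_ = np.linalg.lstsq(fe_block, treated, rcond=None)

print(rank_fe, rank_with)
print(np.round(alias, 12))
```

Here the alias works out to the intercept minus the two never-treated dummies, not the sum of treated-unit dummies, because the treated reference unit has no dummy of its own. Changing the reference category changes the alias, which is the point of the reworded docstring.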

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

Overall assessment

⚠️ Needs changes

Executive summary

  • The methodology change itself is documented in docs/methodology/REGISTRY.md:L2551-L2555 and matches the same full-hat/FWL rationale already used for DifferenceInDifferences; I did not find an undocumented estimator or variance-formula deviation in the changed MultiPeriodDiD math.
  • [Newly identified] P1: the new auto-route now makes absorb=["unit","period"] land on a full-dummy design with duplicate period_* names. Those duplicates are later zipped into coef_dict, so MultiPeriodDiDResults.coefficients silently overwrites one set of coefficients with another and no longer faithfully represents the fitted model. diff_diff/estimators.py:L1574-L1609, diff_diff/estimators.py:L1968-L1985.
  • The new parity tests explicitly exercise that destination path (tests/test_estimators_vcov_type.py:L1283-L1314) but only check period_effects/avg_att; they do not verify the broader result-surface contract that the PR documents.
  • Survey/replicate short-circuiting looks internally consistent from the source; I did not find a separate SE/DOF defect there.
  • Runtime verification was not possible here because this environment is missing numpy/pytest, so this is a source-based review.

Methodology

  • P3 informational. Impact: none. Concrete fix: none required.
    The absorb→fixed_effects lift for MultiPeriodDiD(vcov_type in {"hc2","hc2_bm"}) is documented in docs/methodology/REGISTRY.md:L2551-L2555 and matches the in-code rationale at diff_diff/estimators.py:L1464-L1496: HC2/CR2 need the full FE hat matrix, while within-transformation preserves coefficients/residuals but not leverage.

Code Quality

  • P1 [Newly identified]. Impact: the newly supported multi-absorb route can return a lossy and ambiguous results.coefficients mapping. On absorb=["unit","period"], the auto-route at diff_diff/estimators.py:L1493-L1496 produces fixed_effects=["unit","period"]; later var_names already contains period_2, period_3, ... from the event-study dummies, and pd.get_dummies(..., prefix="period") appends a second period_* block (diff_diff/estimators.py:L1574-L1609). When results are materialized, coef_dict = {name: coef for ...} collapses duplicate keys (diff_diff/estimators.py:L1968-L1985). That silently breaks the exported coefficient surface and makes it impossible to reliably align results.coefficients with the returned vcov.
    Concrete fix: ensure the routed full-dummy design has unique exported names before building coef_dict (for example, disambiguate FE dummy labels when fe == time), or normalize away redundant time fixed effects on MPD before result construction. Add an assertion that exported coefficient names are unique.

Performance

  • No findings.

Maintainability

  • No findings beyond the naming-collision issue above.

Tech Debt

  • P3 informational. Impact: none. Concrete fix: none required.
    The remaining unsupported HC2/HC2-BM absorbed-FE work is correctly narrowed to TWFE only in TODO.md:L99-L99; that deferral is properly tracked and is not a blocker for this PR.

Security

  • No findings.

Documentation/Tests

  • P2. Impact: the new regression coverage does not protect the result-surface behavior the PR now documents. tests/test_estimators_vcov_type.py:L1283-L1314 deliberately exercises the absorb=["unit","period"] / fixed_effects=["unit","period"] collision case, but it only asserts period_effects parity. At the same time, docs/methodology/REGISTRY.md:L2554-L2555 says the routed full-dummy fit changes the broader MultiPeriodDiDResults surface, including coefficients.
    Concrete fix: add a regression test on the new multi-absorb route asserting that results.coefficients is complete and uniquely named, and that its cardinality stays compatible with the fitted design / vcov; if redundant time FE inputs are going to be rejected or normalized away instead, document that explicitly.

Path to Approval

  1. Fix the MPD full-dummy naming collision so results.coefficients remains a complete, unambiguous representation on the newly supported absorb=["unit","period"] route.
  2. Add regression coverage for that surface contract on both the plain and survey-weighted multi-absorb auto-route, and update the REGISTRY/CHANGELOG wording if the chosen fix changes how redundant time FEs are exposed.

CI Codex review on PR #459 surfaced a P1 newly exposed by the auto-route:
when MPD(absorb=["unit","period"]) auto-routes to fixed_effects=["unit",
"period"], the existing fixed_effects= expansion loop adds `period_X`
dummies via `pd.get_dummies(prefix="period")` that collide on name with
the event-study period dummies MPD already builds for non-reference
periods. The duplicate `var_names` entries silently collapse in
`coef_dict = {name: coef for name, coef in zip(var_names, coefficients)}`,
overwriting the real event-study coefficients with the rank-deficient
NaN drops on the redundant FE block. Result: `len(coefficients) <
vcov.shape[0]` and `coefficients["period_X"] = NaN` even though
`period_effects[X]` (read by position) was correct.
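
The collapse can be reproduced in plain Python. The names and values below are made up for illustration (a 6-column toy design rather than the real one); only the dict-comprehension pattern is taken from the description above.

```python
import math

# Toy design: event-study period dummies first, then a redundant FE block
# whose dummies reuse the same "period_*" names (values are illustrative).
var_names = ["const", "treated", "period_2", "period_3", "period_2", "period_3"]
coefficients = [1.0, 0.5, 0.345, 0.2, math.nan, math.nan]

# Later duplicates overwrite earlier keys, so the NaN entries win.
coef_dict = {name: coef for name, coef in zip(var_names, coefficients)}

# Reading by position still finds the real event-study coefficient.
by_position = coefficients[var_names.index("period_2")]

print(len(coef_dict), len(var_names))
print(coef_dict["period_2"], by_position)
```

The dict ends up with fewer entries than the design has columns, and the surviving `period_2` entry is the NaN from the redundant block, while the positional read is unaffected.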

Bug was pre-existing on MPD's `fixed_effects=[<time_col>]` path; the
auto-route just made it newly reachable via `absorb=`.

Fix: in MPD's fixed_effects expansion at estimators.py:1604, skip
entries where `fe == time` — MPD's design already absorbs the time
dimension via non-reference period dummies, so the FE-block dummies
would be perfectly redundant anyway (NaN'd by solve_ols, dropping
nothing useful while corrupting the result surface).
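
A minimal sketch of the skip rule, assuming a pandas-based expansion loop. The helper name `expand_fixed_effects`, its signature, and the toy panel are hypothetical; the actual loop in estimators.py may be structured differently.

```python
import pandas as pd

def expand_fixed_effects(data, fixed_effects, time):
    """Build FE dummy columns, skipping the time column (hypothetical helper)."""
    blocks = []
    for fe in fixed_effects:
        if fe == time:
            # The event-study design already carries non-reference period
            # dummies, so a second period_* block would be redundant and
            # would collide on column names.
            continue
        blocks.append(pd.get_dummies(data[fe], prefix=fe, drop_first=True, dtype=float))
    if not blocks:
        return pd.DataFrame(index=data.index)
    return pd.concat(blocks, axis=1)

panel = pd.DataFrame({"unit": [1, 1, 2, 2, 3, 3], "period": [1, 2, 1, 2, 1, 2]})
fe_cols = expand_fixed_effects(panel, ["unit", "period"], time="period")
print(list(fe_cols.columns))
```

With the skip in place only the unit dummies are appended, so no `period_*` name appears twice in the final design.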

Empirical evidence:
- Pre-fix: `MPD(absorb=["unit","period"])` -> len(coefs)=34,
  vcov.shape=(38,38), coefs["period_2"]=NaN.
- Post-fix: same call -> len(coefs)=34, vcov.shape=(34,34),
  coefs["period_2"]=0.345 (finite, matches MPD's event-study fit).

Tests: new `test_absorb_hc2_result_surface_invariants_multi_absorb`
asserts `len(coefficients) == vcov.shape[0]`, no duplicate names, and
finite event-study `period_X` on BOTH the auto-route and the explicit
`fixed_effects=` paths (Codex P2: regression coverage for the result-
surface contract on the newly reachable path). 11/11 MPD tests pass;
249/249 in the broader sweep (test_estimators.py / test_linalg_hc2_bm.py
unchanged).
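
The three invariants the new test asserts can be expressed as one reusable check. This is an illustrative sketch only: `result_surface_problems` and its arguments are hypothetical and do not correspond to a function in the test suite.

```python
import math

def result_surface_problems(var_names, coefficients, vcov_shape, event_study_names):
    """Return a list of violated invariants (empty when the surface is sound)."""
    problems = []
    if len(set(var_names)) != len(var_names):
        problems.append("duplicate coefficient names")
    if len(coefficients) != vcov_shape[0]:
        problems.append("coefficients/vcov size mismatch")
    for name in event_study_names:
        if not math.isfinite(coefficients.get(name, math.nan)):
            problems.append(f"non-finite {name}")
    return problems

# Toy pre-fix surface: duplicate names collapse and NaN overwrites period_2.
bad_names = ["const", "period_2", "period_2"]
bad_coefs = dict(zip(bad_names, [1.0, 0.345, math.nan]))
print(result_surface_problems(bad_names, bad_coefs, (3, 3), ["period_2"]))

# Toy post-fix surface: unique names, matching sizes, finite estimate.
good_names = ["const", "period_2"]
good_coefs = dict(zip(good_names, [1.0, 0.345]))
print(result_surface_problems(good_names, good_coefs, (2, 2), ["period_2"]))
```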

REGISTRY/CHANGELOG: documented the time-FE skip rule for both auto-route
and pre-existing `fixed_effects=[<time_col>]` invocations.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

🔁 AI review rerun (requested by @igerber)

Head SHA: 1e25f306f8576501eb861507007bbb686a06b9ce


Overall assessment

Looks good

Executive summary

  • Re-review focus: the prior P1 on lossy MultiPeriodDiDResults.coefficients output is addressed. The new fe == time skip prevents duplicate period_* names before coef_dict is built, so the coefficients mapping no longer collapses entries or misaligns with vcov. diff_diff/estimators.py:L1603-L1619, diff_diff/estimators.py:L1981-L1985
  • The methodology change is now explicitly documented in the Methodology Registry, including the absorbed-FE HC2/HC2-BM auto-route, the survey-scope narrowing, and the redundant time-FE skip. That makes the behavior change documented rather than an undocumented deviation. docs/methodology/REGISTRY.md:L2551-L2555
  • The survey and replicate-weight routing is internally consistent with the new design: once the auto-route clears absorb, the code correctly falls onto the fixed-design survey vcov paths instead of the demeaning/refit branch. diff_diff/estimators.py:L1480-L1508, diff_diff/estimators.py:L1728-L1807
  • The new regression coverage now pins the exact failure mode from the previous review, along with R-parity and BM-DOF-sensitive inference on both period effects and avg_att. tests/test_estimators_vcov_type.py:L1291-L1610
  • No new unmitigated P0/P1 findings identified in the changed code.
  • Runtime verification was not possible here because pytest is not installed in this environment.

Methodology

  • Severity: P3-informational
    Impact: none.
    Concrete fix: none required.
    The absorbed-FE HC2/HC2-BM support for MultiPeriodDiD is now documented in REGISTRY.md, and the implementation matches that documented contract: full-dummy routing for leverage-based vcov, unchanged HC1/CR1 demeaned paths, survey variance taking precedence, and the redundant time-FE skip called out explicitly as a documented deviation rather than a defect. docs/methodology/REGISTRY.md:L2551-L2555, diff_diff/estimators.py:L1464-L1496, diff_diff/estimators.py:L1603-L1619

Code Quality

  • No findings. The previously reported duplicate-name collision is resolved by skipping redundant time FE dummies before coef_dict construction, and the repaired surface contract is now regression-tested. diff_diff/estimators.py:L1603-L1619, diff_diff/estimators.py:L1981-L1985, tests/test_estimators_vcov_type.py:L1291-L1341

Performance

  • No findings. The new route reuses the existing fixed_effects= branch and only activates for vcov_type in {"hc2", "hc2_bm"}. diff_diff/estimators.py:L1464-L1496

Maintainability

  • No findings. The code, registry, changelog, and TODO entry now tell the same story about what is supported and what remains deferred. diff_diff/estimators.py:L1464-L1496, docs/methodology/REGISTRY.md:L2551-L2557, TODO.md:L99-L99

Tech Debt

  • Severity: P3-informational
    Impact: none.
    Concrete fix: none required.
    The remaining absorbed-FE HC2/HC2-BM limitation is correctly narrowed to TwoWayFixedEffects, and that follow-up is explicitly tracked in TODO.md, so it is not blocking under the project’s deferral rules. TODO.md:L99-L99

Security

  • No findings.

Documentation/Tests

  • No findings. The new tests cover the repaired coefficients/vcov alignment contract, R-parity for HC2 and HC2-BM, DF-sensitive inference for both period effects and avg_att, and the survey/replicate auto-route behavior. tests/test_estimators_vcov_type.py:L1291-L1610

@igerber igerber added the ready-for-ci label (Triggers CI test workflows) May 17, 2026
@igerber igerber merged commit 6a3e50b into main May 17, 2026
33 of 34 checks passed
@igerber igerber deleted the mpd-absorb-hc2-auto-route branch May 17, 2026 12:48