TwoStageDiD: thread vcov_type as narrow {hc1} contract (Phase 1b interstitial #5, final) by igerber · Pull Request #498 · igerber/diff-diff

igerber · 2026-05-30T11:25:13Z

Summary

Thread vcov_type="hc1" through TwoStageDiD (Phase 1b interstitial Add comprehensive code review for diff-diff library #5 — the final standalone estimator, completing the initiative across all 8). Accept only {"hc1"}; reject {classical, hc2, hc2_bm, conley} with GMM-meat-specific messages. Adds vcov_type / cluster_name / n_clusters + to_dict() to TwoStageDiDResults; summary() renders the unit-cluster CR1 label with bootstrap + survey suppression gates. Defensive bootstrap n_clusters<2 / n_psu<2 NaN guard; cluster= + replicate-weights raises NotImplementedError.
Report the post-drop effective cluster count in metadata (always-treated units are excluded before estimation), counted via np.unique(df[cluster_var]) so it matches the variance exactly — including always-treated drops and NaN cluster IDs.
Expose vcov_type explicitly on the two_stage_did() convenience wrapper (was hidden behind **kwargs), matching the sibling wrappers.
Docs: REGISTRY IF-vs-sandwich taxonomy → N=5 + TwoStageDiD **Note**; llms-full.txt signature; both checked-in autosummary RSTs; CHANGELOG; TODO (initiative marked complete + conley follow-up row).

Methodology references

Method name(s): TwoStageDiD — Gardner (2022) two-stage DiD, GMM cluster-sandwich (did2s).
Paper / source link(s): Gardner (2022) arXiv:2207.05943; Butts & Gardner (2022) did2s, R Journal 14(1).
Intentional deviations: vcov_type permanently narrow to {"hc1"} — the GMM-corrected meat S_g = γ̂'c_g − X_2g'ε_2g folds first-stage FE uncertainty into the score, so no single hat matrix spans both stages on which HC2 leverage / Bell-McCaffrey DOF can be defined (documented in REGISTRY.md IF-vs-sandwich taxonomy + TwoStageDiD section, mirroring the SpilloverDiD classical rejection). did2s no-FSA convention: the CR1 family label carries no (n-1)/(n-p) factor (**Note (deviation from R):**). vcov_type="conley" deferred (TODO.md).

Validation

Tests added/updated: tests/test_two_stage.py::TestTwoStageDiDVcovType (36 tests) — invalid-vcov rejection, hc1 no-op bit-equality across analytical / cluster / TSL-survey / replicate-survey / bootstrap paths (parametrized over aggregate modes), degenerate-bootstrap full public NaN propagation, post-drop + NaN-cluster metadata parity, summary labels, set_params fit-time revalidation, cluster=+replicate rejection, convenience-fn threading.
Evidence: ATT/SE bit-identical vs pre-PR baseline across default / cluster / bootstrap (incl. per-horizon + per-group SE). 4 local Codex rounds (full standard mode) converged clean (no actionable findings).

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

…terstitial #5, final) TwoStageDiD's variance is the Gardner/did2s two-stage GMM cluster-sandwich (always clusters; default at unit) — a structural twin of ImputationDiD, NOT the GMM×HC2-BM beast the tracker described (that was SpilloverDiD's helper). Add vcov_type="hc1" accepting only {hc1}; reject {classical,hc2,hc2_bm} (the GMM-corrected meat S_g = gamma_hat' c_g - X_2g' eps_2g folds in first-stage uncertainty, so no single hat matrix spans both stages for HC2 leverage / BM-DOF) and conley (deferred). Results gains vcov_type/cluster_name/n_clusters + to_dict(); summary() renders the unit-cluster CR1 label with bootstrap + survey suppression gates. Bootstrap n_clusters<2 NaN guard (load-bearing, post-drop perturbation count) + survey n_psu<2 defense. cluster= + replicate weights raises NotImplementedError. Docs: REGISTRY taxonomy -> N=5 + TwoStageDiD Note, llms-full signature, both autosummary RSTs, CHANGELOG, TODO (initiative complete + conley follow-up). 34 new tests; ATT/SE bit-identical vs baseline across default/cluster/bootstrap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

… P2) n_clusters / cluster_name were derived from the full input `data`, but the GMM sandwich computes variance over `cluster_ids=df[cluster_var]` on the POST-DROP fit sample (always-treated units are removed before estimation). When an always-treated unit/cluster is excluded, reporting the full-input count overstates the effective G the SE is based on. Count clusters on `df` instead, matching the variance. Survey suppression (cluster_name=None) is unchanged; the Wave E.3 full-domain survey accounting is a separate, intentional path. Adds a regression test asserting n_clusters equals the post-drop count when an always-treated cohort is dropped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…ct tests (codex P3) - two_stage_did(): expose vcov_type="hc1" explicitly (was hidden behind **kwargs) and forward it, matching the imputation_did/efficient_did sibling wrappers — the convenience API surface, generated signature, and IDE help now show the param. - Degenerate-bootstrap tests now assert the FULL public NaN-propagation contract (overall t_stat/p_value/conf_int + every event-study/group inference field) via a shared _assert_full_bootstrap_nan helper, not just overall_se, so a partial regression in _build_nan_bootstrap_results can't slip through. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…(codex P2) n_clusters used Series.nunique() (drops NaN), but the GMM sandwich counts np.unique(cluster_ids) (keeps a single NaN group). A non-survey cluster= column with missing IDs would make the reported G undercount the SE's actual cluster count. Count clusters the same way the variance does — np.unique(df[cluster_var]) — which also consolidates the two non-survey branches and still excludes always-treated-dropped units (df, not data). Adds a NaN-cluster regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-30T11:32:18Z

Overall assessment

✅ Looks good — I did not find any unmitigated P0/P1 issues in the changed TwoStageDiD estimator, bootstrap, or result-reporting paths.

Executive summary

Affected method: TwoStageDiD (Gardner 2022 / Butts & Gardner 2022). The source references in the estimator docstring and the Methodology Registry are aligned. diff_diff/two_stage.py L1297-L1308; docs/methodology/REGISTRY.md L1390-L1408
The new vcov_type={"hc1"} restriction is a documented methodology choice, not an undocumented deviation. Enforcement at __init__ and fit() matches the registry’s Gardner-GMM rationale. diff_diff/two_stage.py L3261-L3308; docs/methodology/REGISTRY.md L348-L357
The new bootstrap degeneracy guards correctly fail closed to all-NaN inference for n_clusters<2 / n_psu<2, and the NaNs are propagated through overall, event-study, and group outputs. diff_diff/two_stage_bootstrap.py L219-L403; diff_diff/two_stage.py L2019-L2065
Result metadata/reporting stays aligned with the actual inference path: default non-survey fits expose unit-cluster CR1 metadata, survey fits suppress cluster metadata, and bootstrap suppresses the analytical variance label. diff_diff/two_stage.py L2067-L2113; diff_diff/two_stage_results.py L231-L260; diff_diff/two_stage_results.py L456-L503
The remaining vcov_type="conley" gap is explicitly tracked as deferred work, so it is non-blocking for this PR. TODO.md L116-L116

Methodology

Severity: P3 informational. Impact: The permanent {"hc1"} contract, CR1-at-unit default labeling, no-finite-sample-adjustment note, replicate-weight cluster= rejection, and deferred conley path are all explicitly documented in the registry and match the changed implementation, so I am not treating them as defects. Concrete fix: none. docs/methodology/REGISTRY.md L348-L357; docs/methodology/REGISTRY.md L1403-L1408; diff_diff/two_stage.py L1403-L1447; diff_diff/two_stage.py L3261-L3308

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 informational. Impact: vcov_type="conley" remains intentionally deferred, and that follow-up is already tracked in TODO.md, so this is not blocker-level debt for the current PR. Concrete fix: none in this PR; handle the tracked follow-up separately. TODO.md L116-L116; diff_diff/two_stage.py L3296-L3303

Security

No findings.

Documentation/Tests

Severity: P3 informational. Impact: The added tests cover the new public contract, summary/reporting behavior, replicate-weight rejection, degenerate-bootstrap NaN propagation, and metadata parity checks. I could not independently execute them in this sandbox because pytest and numpy are not installed here. Concrete fix: rely on CI or rerun tests/test_two_stage.py in the normal project environment. tests/test_two_stage.py L2019-L2280

igerber · 2026-05-30T11:39:57Z

/ai-review

github-actions · 2026-05-30T11:48:52Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 9f3368a7d96af6ff1dee5f22456996d442cb3147

Overall Assessment

✅ Looks good — no unmitigated P0/P1 findings.

Executive Summary

Re-review scope: the prior AI review had no open P1+ items, and I did not find newly introduced P1+ regressions in the changed TwoStageDiD vcov-threading work.
Affected method: TwoStageDiD. The PR constrains the public vcov_type API to the already-implemented Gardner GMM sandwich, without changing point-estimation or analytical variance arithmetic. diff_diff/two_stage.py:L1259-L1268, diff_diff/two_stage.py:L3261-L3308, docs/methodology/REGISTRY.md:L348-L357, docs/methodology/REGISTRY.md:L1403-L1408
The new bootstrap degeneracy guards fail closed correctly: n_clusters<2 / n_psu<2 now return all-NaN bootstrap inference, and the downstream fit path propagates that to overall, event-study, and group outputs. diff_diff/two_stage_bootstrap.py:L219-L267, diff_diff/two_stage_bootstrap.py:L354-L403, diff_diff/two_stage.py:L2019-L2065, tests/test_two_stage.py:L1982-L2221
Results metadata now matches the actual variance path, including post-drop effective cluster counts and NaN-cluster parity. diff_diff/two_stage.py:L2067-L2088, diff_diff/two_stage_results.py:L231-L261, diff_diff/two_stage_results.py:L456-L503, tests/test_two_stage.py:L2225-L2280
Minor P3 only: the shared LLM guide still describes the variance-metadata surface as if TwoStageDiDResults were excluded. diff_diff/guides/llms-full.txt:L1286-L1292

Methodology

Severity: P3 informational. Impact: The affected method is TwoStageDiD; Gardner frames it as a two-stage procedure that estimates group/time effects from untreated observations in the first stage, then estimates ATT after removing those effects, and the did2s R Journal article presents the corresponding implementation with custom SE handling. This PR’s permanent vcov_type={"hc1"} restriction, CR1-at-unit labeling, and cluster= + replicate-weight rejection are all explicitly documented in the Methodology Registry, so I am not treating them as defects. Concrete fix: none. diff_diff/two_stage.py:L1297-L1308, diff_diff/two_stage.py:L1432-L1447, diff_diff/two_stage.py:L3261-L3308, docs/methodology/REGISTRY.md:L1403-L1408. citeturn1view0turn0view0

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 informational. Impact: vcov_type="conley" is still deferred for TwoStageDiD, but this PR tracks that follow-up explicitly in TODO.md, so it is not blocker-level debt for the current change. Concrete fix: none in this PR; handle the tracked follow-up separately. diff_diff/two_stage.py:L3296-L3303, TODO.md:L116-L116

Security

No findings.

Documentation/Tests

Severity: P3. Impact: diff_diff/guides/llms-full.txt:L1286-L1292 still says only ImputationDiDResults and EfficientDiDResults carry the shared variance metadata, and its vcov_type description omits TwoStageDiD’s new permanently narrow contract. That leaves the internal guide partially stale after this PR’s public-surface update. Concrete fix: update that guide block to list TwoStageDiDResults among the results carrying vcov_type / cluster_name / n_clusters, and mention the Gardner-GMM {"hc1"} restriction.
No findings on changed-path test coverage. The added tests cover invalid-vcov rejection, wrapper threading, aggregate/cluster/survey/replicate/bootstrap parity, degenerate-bootstrap NaN propagation, and metadata parity in tests/test_two_stage.py:L2019-L2280. I could not execute pytest here because pytest is not installed in this environment.

…ed stale rows Cleanup of TODO.md now that the vcov_type threading initiative is complete (all 8 standalone estimators merged, TwoStageDiD #498 last). TODO.md only — no methodology or source changes. Compact prune of the completed initiative's leftovers: - the `| Done |` umbrella row + its orphan blank lines (rejoins the methodology table to its header), the Tier B threading bullet, and the stale duplicate TwoStageDiD-Conley row whose "`__init__` lacks `vcov_type`" premise is false post-#498 - the "Rows 104-105 LIFTED" comment block, the two ~~LIFTED~~ weighted-BM rows, and the Tier C LIFTED bullet (clubSandwich WLS-CR2 port, #475) - two resolved-marker HTML comments (WooldridgeDiD cohort_share; PreTrendsPower) - rewrote the Standard Error Consistency prose to "complete" and repointed its weighted-CR2 gate at the open multi-absorb row Staleness audit of the ~50 remaining follow-up rows (5 subagents; every finding re-verified against current source before acting -- the vast majority are genuine open deferrals): - removed the `bias_corrected_local_linear: weights=` row (shipped; residual unweighted-DPI gap already tracked by the sibling row) and narrowed the Tier D lprobust bullet's stale `weights` -> `weight-aware auto-bandwidth DPI` - removed the `compute_survey_metadata`/`raw_w_for_meta` dedup row (done via the shared `survey._resolve_survey_for_fit` helper) - tightened the HAD Phase-4.5-C survey-aware-pretests row: dropped the shipped pweight+PSU+FPC+strata narration, kept the two open items (replicate-weight designs; lonely_psu='adjust'+singleton on the Stute family) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

igerber and others added 4 commits May 30, 2026 07:23

igerber added the ready-for-ci Triggers CI test workflows label May 30, 2026

igerber merged commit b3dc8d0 into main May 30, 2026
33 of 34 checks passed

igerber mentioned this pull request May 30, 2026

chore: prune completed-vcov-initiative leftovers + audit-verified stale TODO rows #502

Merged

igerber deleted the feature/two-stage-did-vcov-type-phase1b branch May 31, 2026 18:42

igerber mentioned this pull request Jun 1, 2026

Bump version to 3.5.0 #516

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TwoStageDiD: thread vcov_type as narrow {hc1} contract (Phase 1b interstitial #5, final)#498

TwoStageDiD: thread vcov_type as narrow {hc1} contract (Phase 1b interstitial #5, final)#498
igerber merged 4 commits into
mainfrom
feature/two-stage-did-vcov-type-phase1b

igerber commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

igerber commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 30, 2026

Summary

Methodology references

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

igerber commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant