TwoStageDiD: thread vcov_type as narrow {hc1} contract (Phase 1b interstitial #5, final) #498
Merged
…terstitial #5, final) TwoStageDiD's variance is the Gardner/did2s two-stage GMM cluster-sandwich (always clusters; default at unit) — a structural twin of ImputationDiD, NOT the GMM×HC2-BM beast the tracker described (that was SpilloverDiD's helper). Add vcov_type="hc1" accepting only {hc1}; reject {classical,hc2,hc2_bm} (the GMM-corrected meat S_g = gamma_hat' c_g - X_2g' eps_2g folds in first-stage uncertainty, so no single hat matrix spans both stages for HC2 leverage / BM-DOF) and conley (deferred). Results gains vcov_type/cluster_name/n_clusters + to_dict(); summary() renders the unit-cluster CR1 label with bootstrap + survey suppression gates. Bootstrap n_clusters<2 NaN guard (load-bearing, post-drop perturbation count) + survey n_psu<2 defense. cluster= + replicate weights raises NotImplementedError. Docs: REGISTRY taxonomy -> N=5 + TwoStageDiD Note, llms-full signature, both autosummary RSTs, CHANGELOG, TODO (initiative complete + conley follow-up). 34 new tests; ATT/SE bit-identical vs baseline across default/cluster/bootstrap. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
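A minimal sketch of the narrow `{hc1}` contract described in this commit, assuming a standalone validator. The function name, the exception type, and the message wording are hypothetical and are not taken from the library source; only the accepted and rejected values come from the commit message.

```python
# Hypothetical sketch of a narrow vcov_type contract; not the library's code.
ACCEPTED = frozenset({"hc1"})
STRUCTURALLY_REJECTED = frozenset({"classical", "hc2", "hc2_bm"})
DEFERRED = frozenset({"conley"})


def validate_vcov_type(vcov_type: str) -> str:
    """Return vcov_type if it is supported, otherwise raise ValueError."""
    if vcov_type in ACCEPTED:
        return vcov_type
    if vcov_type in STRUCTURALLY_REJECTED:
        # The two-stage GMM meat folds first-stage uncertainty into the score,
        # so there is no single hat matrix to define leverage or BM-DOF on.
        raise ValueError(
            f"vcov_type={vcov_type!r} is not defined for the two-stage GMM "
            "cluster sandwich; use 'hc1'."
        )
    if vcov_type in DEFERRED:
        # Assumption: deferred options are rejected with the same exception type.
        raise ValueError(f"vcov_type={vcov_type!r} is deferred; use 'hc1'.")
    raise ValueError(f"Unknown vcov_type={vcov_type!r}; expected 'hc1'.")
```

Calling `validate_vcov_type("hc1")` returns the string unchanged, while each of `"classical"`, `"hc2"`, `"hc2_bm"`, and `"conley"` raises with a message naming the reason.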
… P2) n_clusters / cluster_name were derived from the full input `data`, but the GMM sandwich computes variance over `cluster_ids=df[cluster_var]` on the POST-DROP fit sample (always-treated units are removed before estimation). When an always-treated unit/cluster is excluded, reporting the full-input count overstates the effective G the SE is based on. Count clusters on `df` instead, matching the variance. Survey suppression (cluster_name=None) is unchanged; the Wave E.3 full-domain survey accounting is a separate, intentional path. Adds a regression test asserting n_clusters equals the post-drop count when an always-treated cohort is dropped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
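The gap between the full-input and post-drop cluster counts can be illustrated with a toy panel. This is a sketch under stated assumptions: the column names (`unit`, `time`, `first_treat`), the `0 = never treated` coding, and the always-treated rule are hypothetical and only mimic the behavior the commit message describes.

```python
import numpy as np
import pandas as pd

# Toy panel: unit 1 is treated from the first period (always-treated),
# unit 2 is treated in period 2, unit 3 is never treated (coded 0 here).
data = pd.DataFrame(
    {
        "unit": [1, 1, 2, 2, 3, 3],
        "time": [1, 2, 1, 2, 1, 2],
        "first_treat": [1, 1, 2, 2, 0, 0],
    }
)

# Assumed rule: a unit is always-treated if it is treated at or before
# the first observed period, so it has no untreated observations.
first_period = data["time"].min()
always_treated = (data["first_treat"] > 0) & (data["first_treat"] <= first_period)
df = data.loc[~always_treated]  # post-drop fit sample

n_clusters_full = len(np.unique(data["unit"]))  # counts the dropped unit too
n_clusters_fit = len(np.unique(df["unit"]))     # matches the variance sample
print(n_clusters_full, n_clusters_fit)
```

Here the full input has three clusters but the fit sample has two, so reporting the full-input count would overstate the number of clusters behind the standard error.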
…ct tests (codex P3)
- two_stage_did(): expose vcov_type="hc1" explicitly (was hidden behind **kwargs) and forward it, matching the imputation_did/efficient_did sibling wrappers, so the convenience API surface, generated signature, and IDE help now show the param.
- Degenerate-bootstrap tests now assert the FULL public NaN-propagation contract (overall t_stat/p_value/conf_int + every event-study/group inference field) via a shared _assert_full_bootstrap_nan helper, not just overall_se, so a partial regression in _build_nan_bootstrap_results can't slip through.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
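The idea of asserting the full NaN-propagation contract, rather than a single field, might look like the following. This is an illustrative sketch only: the dictionary layout, the field names, and the helper name are hypothetical and do not reproduce the `_assert_full_bootstrap_nan` helper from the test suite.

```python
import math

# Hypothetical inference fields; the real results object may differ.
INFERENCE_FIELDS = ("se", "t_stat", "p_value", "conf_int")


def is_all_nan(value) -> bool:
    """True if value is NaN, or a tuple/list whose entries are all NaN."""
    if isinstance(value, (tuple, list)):
        return all(is_all_nan(v) for v in value)
    return isinstance(value, float) and math.isnan(value)


def assert_full_nan(results: dict) -> None:
    """Check every inference field, overall and per event-study horizon."""
    for field in INFERENCE_FIELDS:
        assert is_all_nan(results["overall"][field]), f"overall.{field}"
    for horizon, row in results["event_study"].items():
        for field in INFERENCE_FIELDS:
            assert is_all_nan(row[field]), f"event_study[{horizon}].{field}"


nan = float("nan")
nan_row = {"se": nan, "t_stat": nan, "p_value": nan, "conf_int": (nan, nan)}
degenerate = {"overall": dict(nan_row), "event_study": {0: dict(nan_row), 1: dict(nan_row)}}
assert_full_nan(degenerate)  # passes: every field is NaN
```

Checking every field means a result that correctly sets `se` to NaN but leaves a finite `p_value` at some horizon fails the assertion instead of passing unnoticed.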
…(codex P2) n_clusters used Series.nunique() (drops NaN), but the GMM sandwich counts np.unique(cluster_ids) (keeps a single NaN group). A non-survey cluster= column with missing IDs would make the reported G undercount the SE's actual cluster count. Count clusters the same way the variance does — np.unique(df[cluster_var]) — which also consolidates the two non-survey branches and still excludes always-treated-dropped units (df, not data). Adds a NaN-cluster regression test. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
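The counting mismatch is easy to reproduce in isolation. The snippet below is a small sketch with made-up cluster IDs; it assumes NumPy 1.21 or later, where `np.unique` collapses multiple NaNs into a single entry.

```python
import numpy as np
import pandas as pd

# Made-up cluster IDs with two missing values.
cluster_ids = pd.Series([10.0, 10.0, 20.0, np.nan, np.nan])

# Series.nunique() drops NaN by default (dropna=True).
n_via_nunique = cluster_ids.nunique()

# np.unique keeps NaN as one group (NumPy >= 1.21 collapses repeated NaNs).
n_via_np_unique = len(np.unique(cluster_ids.to_numpy()))

print(n_via_nunique, n_via_np_unique)
```

With these IDs the two counts differ by one, which is the undercount the commit describes when the reported count uses `nunique()` while the variance groups by `np.unique`.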
Overall assessment
✅ Looks good: I did not find any unmitigated P0/P1 issues in the changed
Executive summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
/ai-review
🔁 AI review rerun (requested by @igerber)
Head SHA:
Overall Assessment
✅ Looks good: no unmitigated P0/P1 findings.
Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
igerber added a commit that referenced this pull request May 30, 2026
…ed stale rows
Cleanup of TODO.md now that the vcov_type threading initiative is complete (all 8 standalone estimators merged, TwoStageDiD #498 last). TODO.md only; no methodology or source changes.
Compact prune of the completed initiative's leftovers:
- the `| Done |` umbrella row + its orphan blank lines (rejoins the methodology table to its header), the Tier B threading bullet, and the stale duplicate TwoStageDiD-Conley row whose "`__init__` lacks `vcov_type`" premise is false post-#498
- the "Rows 104-105 LIFTED" comment block, the two ~~LIFTED~~ weighted-BM rows, and the Tier C LIFTED bullet (clubSandwich WLS-CR2 port, #475)
- two resolved-marker HTML comments (WooldridgeDiD cohort_share; PreTrendsPower)
- rewrote the Standard Error Consistency prose to "complete" and repointed its weighted-CR2 gate at the open multi-absorb row
Staleness audit of the ~50 remaining follow-up rows (5 subagents; every finding re-verified against current source before acting -- the vast majority are genuine open deferrals):
- removed the `bias_corrected_local_linear: weights=` row (shipped; residual unweighted-DPI gap already tracked by the sibling row) and narrowed the Tier D lprobust bullet's stale `weights` -> `weight-aware auto-bandwidth DPI`
- removed the `compute_survey_metadata`/`raw_w_for_meta` dedup row (done via the shared `survey._resolve_survey_for_fit` helper)
- tightened the HAD Phase-4.5-C survey-aware-pretests row: dropped the shipped pweight+PSU+FPC+strata narration, kept the two open items (replicate-weight designs; lonely_psu='adjust'+singleton on the Stute family)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Summary
- Thread `vcov_type="hc1"` through `TwoStageDiD` (Phase 1b interstitial #5: the final standalone estimator, completing the initiative across all 8). Accept only `{"hc1"}`; reject `{classical, hc2, hc2_bm, conley}` with GMM-meat-specific messages. Adds `vcov_type` / `cluster_name` / `n_clusters` + `to_dict()` to `TwoStageDiDResults`; `summary()` renders the unit-cluster CR1 label with bootstrap + survey suppression gates. Defensive bootstrap `n_clusters<2` / `n_psu<2` NaN guard; `cluster=` + replicate-weights raises `NotImplementedError`.
- Count `n_clusters` via `np.unique(df[cluster_var])` so it matches the variance exactly, including always-treated drops and NaN cluster IDs.
- Expose `vcov_type` explicitly on the `two_stage_did()` convenience wrapper (was hidden behind `**kwargs`), matching the sibling wrappers.
- Docs: REGISTRY taxonomy -> N=5 + TwoStageDiD **Note**; `llms-full.txt` signature; both checked-in autosummary RSTs; CHANGELOG; TODO (initiative marked complete + `conley` follow-up row).

Methodology references
- … `did2s`).
- … `did2s`, R Journal 14(1).
- `vcov_type` permanently narrow to `{"hc1"}`: the GMM-corrected meat `S_g = γ̂'c_g − X_2g'ε_2g` folds first-stage FE uncertainty into the score, so no single hat matrix spans both stages on which HC2 leverage / Bell-McCaffrey DOF can be defined (documented in `REGISTRY.md` IF-vs-sandwich taxonomy + TwoStageDiD section, mirroring the SpilloverDiD `classical` rejection).
- did2s no-FSA convention: the `CR1` family label carries no `(n-1)/(n-p)` factor (**Note (deviation from R):**).
- `vcov_type="conley"` deferred (`TODO.md`).

Validation
- `tests/test_two_stage.py::TestTwoStageDiDVcovType` (36 tests): invalid-vcov rejection, `hc1` no-op bit-equality across analytical / cluster / TSL-survey / replicate-survey / bootstrap paths (parametrized over aggregate modes), degenerate-bootstrap full public NaN propagation, post-drop + NaN-cluster metadata parity, summary labels, `set_params` fit-time revalidation, `cluster=` + replicate rejection, convenience-fn threading.

Security / privacy
🤖 Generated with Claude Code