chore: prune completed-vcov-initiative leftovers + audit-verified stale TODO rows by igerber · Pull Request #502 · igerber/diff-diff

igerber · 2026-05-30T22:25:56Z

Summary

Prune TODO.md leftovers from the now-complete vcov_type threading initiative (all 8 standalone estimators merged, TwoStageDiD TwoStageDiD: thread vcov_type as narrow {hc1} contract (Phase 1b interstitial #5, final) #498 last): the | Done | umbrella row, the Tier B threading bullet, the stale duplicate TwoStageDiD-Conley row, the "Rows 104-105 LIFTED" comment block + the two ~~LIFTED~~ weighted-BM rows + the Tier C LIFTED bullet, and two resolved-marker HTML comments; rewrite the Standard Error Consistency prose to "complete" and repoint its weighted-CR2 gate at the open multi-absorb row.
Audit-verified prune of stale follow-up rows (5 subagents over ~50 rows, every finding re-verified against current source before acting — the vast majority are genuine open deferrals): remove the bias_corrected_local_linear: weights= row (shipped) and narrow the Tier D lprobust bullet's weights → weight-aware auto-bandwidth DPI; remove the compute_survey_metadata/raw_w_for_meta dedup row (done via the shared survey._resolve_survey_for_fit helper); tighten the HAD Phase-4.5-C survey-aware-pretests row to its two remaining open items.
Net: TODO.md 363 → 346 lines. Documentation/tracker only — no methodology or source changes.

Methodology references (required if estimator / math changes)

Method name(s): N/A — no methodology changes (TODO.md tracker only; nothing changed in diff_diff/, rust/src/, or docs/methodology/)
Paper / source link(s): N/A
Any intentional deviations from the source (and why): None

Validation

Tests added/updated: No test changes (documentation/tracker only)
Backtest / simulation / notebook evidence (if applicable): N/A
Local AI review: codex (full standard mode, --force-fresh) — ✅ clean, zero findings after iterating two P3s (Tier D weights cross-reference; an L111 wording fold reverted to the TwoStageDiD: thread vcov_type as narrow {hc1} contract (Phase 1b interstitial #5, final) #498-merged text)

Security / privacy

Confirm no secrets/PII in this PR: Yes

🤖 Generated with Claude Code

github-actions · 2026-05-30T22:30:59Z

Overall Assessment
✅ Looks good

Executive Summary

This is a tracker/documentation-only PR; I found no methodology or implementation regressions.
The methodology-facing removals and rewrites are consistent with the current registry and code: shipped weighted CR2-BM support, completed vcov_type threading, retained {"hc1"} narrowing for IF/GMM estimators, and the narrowed HAD / local-linear follow-up rows all check out.
No P0/P1 issues found.
One minor P3 documentation drift remains: TODO.md still cites REGISTRY:2382, but the relevant HAD sup-t singleton-strata deviation now lives at docs/methodology/REGISTRY.md.
I could not run pytest in this environment because pytest is not installed; validation here was static against source and existing test files.

Methodology

No findings. The rewritten TODO.md “Standard Error Consistency” summary matches the registry’s variance taxonomy in docs/methodology/REGISTRY.md, the shipped weighted CR2-BM note in docs/methodology/REGISTRY.md, and existing threading coverage for TwoStageDiD in tests/test_two_stage.py.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P3
Impact: TODO.md says the Stute singleton-strata gap is “the same gap as the HAD sup-t deviation at REGISTRY:2382,” but docs/methodology/REGISTRY.md is now in the TROP section. The relevant HAD note is docs/methodology/REGISTRY.md. This does not affect behavior, but it weakens the accuracy of an audit/pruning PR.
Concrete fix: Replace the brittle absolute line reference with the current HAD location, or better, use a section title/anchor instead of a raw line number.

Tech Debt

No findings. The deleted helper/weights rows are consistent with current helper/wrapper surfaces, including _resolve_survey_for_fit in diff_diff/survey.py and the documented weighted local-linear gap in diff_diff/local_linear.py.

Security

No findings.

Documentation/Tests

No approval-affecting findings. The narrowed HAD pretest follow-up row matches the current registry/tests (docs/methodology/REGISTRY.md, tests/test_had_pretests.py), and the narrowed local-linear row matches current docs/tests (diff_diff/local_linear.py, tests/test_bias_corrected_lprobust.py).

github-actions · 2026-05-30T22:39:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 770af785917f66d2da6791f5cd917239ea2aaeb2

Overall Assessment
✅ Looks good

Executive Summary

This is still a tracker-only PR. I found no unmitigated P0/P1 issues in the changed lines after cross-checking the live code and Methodology Registry.
The prior review’s stale TODO-to-registry pointer is fixed: the HAD pretests row now cites the current Stute calibration section by name at TODO.md:128, consistent with docs/methodology/REGISTRY.md:2733.
The deleted TwoStageDiD threading row is stale as claimed: vcov_type is already threaded, exposed in params/results, and regression-tested in diff_diff/two_stage.py:1311, diff_diff/two_stage.py:2067, diff_diff/two_stage.py:3262, and tests/test_two_stage.py:2020.
The other pruned tracker rows also match shipped state: weighted WLS-CR2 is documented as lifted in docs/methodology/REGISTRY.md:2827, weighted bias_corrected_local_linear is live in diff_diff/local_linear.py:1019 and tested in tests/test_bias_corrected_lprobust.py:563, and the upstream survey-metadata dedup now lives in diff_diff/survey.py:1232 with only the post-cluster-injection recompute remaining in diff_diff/spillover.py:3018.
One minor P3 wording nit remains in the rewritten Standard Error Consistency paragraph: it attributes TwoStageDiD’s permanent {"hc1"} narrowing to “influence-function variance,” while the registry documents a distinct Gardner GMM-meat rationale at TODO.md:231, docs/methodology/REGISTRY.md:348, and docs/methodology/REGISTRY.md:1408.

Methodology

Severity: P3
Impact: TODO.md:231 correctly says TwoStageDiD is permanently narrow to {"hc1"}, but the rationale is slightly overstated: the four IF-based estimators are narrow because of IF-based variance, while TwoStageDiD is narrow because its Gardner GMM-corrected meat has no single cross-stage hat-matrix analogue for classical/hc2/hc2_bm. The conclusion is right; the audit trail is just less precise than the registry. See docs/methodology/REGISTRY.md:348, docs/methodology/REGISTRY.md:1408, and diff_diff/two_stage.py:3262.
Concrete fix: Split the rationale in that sentence, e.g. “the four IF-based estimators … are permanently narrow to {"hc1"} per their IF variance, and TwoStageDiD is likewise permanently narrow because its GMM-corrected meat has no derived HC2/CR2/classical analogue.”

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings. The previous TODO→REGISTRY drift called out in the last review is resolved in the changed file at TODO.md:128.

Tech Debt

No findings. The removed rows now reflect shipped state rather than hiding open work, per docs/methodology/REGISTRY.md:2827, diff_diff/local_linear.py:1019, and diff_diff/survey.py:1232.

Security

No findings.

Documentation/Tests

No findings. Static review only; I did not run the test suite in this environment.

…ed stale rows Cleanup of TODO.md now that the vcov_type threading initiative is complete (all 8 standalone estimators merged, TwoStageDiD #498 last). TODO.md only — no methodology or source changes. Compact prune of the completed initiative's leftovers: - the `| Done |` umbrella row + its orphan blank lines (rejoins the methodology table to its header), the Tier B threading bullet, and the stale duplicate TwoStageDiD-Conley row whose "`__init__` lacks `vcov_type`" premise is false post-#498 - the "Rows 104-105 LIFTED" comment block, the two ~~LIFTED~~ weighted-BM rows, and the Tier C LIFTED bullet (clubSandwich WLS-CR2 port, #475) - two resolved-marker HTML comments (WooldridgeDiD cohort_share; PreTrendsPower) - rewrote the Standard Error Consistency prose to "complete" and repointed its weighted-CR2 gate at the open multi-absorb row Staleness audit of the ~50 remaining follow-up rows (5 subagents; every finding re-verified against current source before acting -- the vast majority are genuine open deferrals): - removed the `bias_corrected_local_linear: weights=` row (shipped; residual unweighted-DPI gap already tracked by the sibling row) and narrowed the Tier D lprobust bullet's stale `weights` -> `weight-aware auto-bandwidth DPI` - removed the `compute_survey_metadata`/`raw_w_for_meta` dedup row (done via the shared `survey._resolve_survey_for_fit` helper) - tightened the HAD Phase-4.5-C survey-aware-pretests row: dropped the shipped pweight+PSU+FPC+strata narration, kept the two open items (replicate-weight designs; lonely_psu='adjust'+singleton on the Stute family) Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-30T22:44:39Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 68c0026e1f2e92ffdba867be97037b1673c2c4e3

Overall Assessment
✅ Looks good

Executive Summary

This is still a tracker-only PR. I found no unmitigated P0/P1 issues in the changed lines.
The previous review’s only methodology nit is addressed: the revised Standard Error Consistency paragraph now separates the four IF-based estimators from TwoStageDiD’s distinct Gardner-GMM rationale, consistent with TODO.md:L229-L231, docs/methodology/REGISTRY.md:L348-L357, and diff_diff/two_stage.py:L3262-L3295.
The deleted standalone-vcov_type tracker rows are stale rather than hiding open work; the remaining per-estimator Conley follow-ups are still tracked at TODO.md:L103-L111.
The removed bias_corrected_local_linear weights= row is also stale: weighted fits are live, and the remaining gap is correctly narrowed to weight-aware auto-bandwidth DPI at TODO.md:L223, diff_diff/local_linear.py:L1019-L1028, and tests/test_bias_corrected_lprobust.py:L563-L678.
The narrowed HAD pretests row matches the registry and current guards: stratified survey support is shipped, while replicate weights and lonely_psu='adjust' + singleton strata remain the only open items at TODO.md:L128, docs/methodology/REGISTRY.md:L2633-L2633, docs/methodology/REGISTRY.md:L2729-L2745, and diff_diff/had_pretests.py:L3308-L3359.
The deleted survey-metadata dedup row appears correctly closed: _resolve_survey_for_fit now owns the shared pre-fit path, and the remaining spillover.py recompute is a distinct post-PSU-injection update rather than the old duplication at diff_diff/survey.py:L1232-L1258, diff_diff/spillover.py:L2371-L2380, and diff_diff/spillover.py:L3018-L3034.

Methodology

No findings. The changed prose in TODO.md:L229-L231 now matches the Methodology Registry and runtime guard for TwoStageDiD; the prior P3 wording issue is resolved.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings. The tracker deletions I spot-checked correspond to shipped state or genuine de-duplication, not disappearing active work.

Tech Debt

No findings. Remaining deferred items are still tracked where expected, including standalone vcov_type="conley" follow-ups and the HAD pretest gaps at TODO.md:L103-L128.

Security

No findings.

Documentation/Tests

No findings. Static review only; I did not run the test suite in this environment.

igerber force-pushed the chore/todo-cleanup branch from 23f390e to 770af78 Compare May 30, 2026 22:34

igerber force-pushed the chore/todo-cleanup branch from 770af78 to 68c0026 Compare May 30, 2026 22:41

igerber added the ready-for-ci Triggers CI test workflows label May 30, 2026

igerber merged commit 56db3a8 into main May 30, 2026
5 of 6 checks passed

igerber deleted the chore/todo-cleanup branch May 30, 2026 22:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore: prune completed-vcov-initiative leftovers + audit-verified stale TODO rows#502

chore: prune completed-vcov-initiative leftovers + audit-verified stale TODO rows#502
igerber merged 1 commit into
mainfrom
chore/todo-cleanup

igerber commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 30, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

github-actions Bot commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant