Rank-guarded generalized inverse for DR/OR influence-function SEs (CallawaySantAnna, TripleDifference, StaggeredTripleDifference)#507
Conversation
|
Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
Static review only; I did not run the suite in this environment because the shell Python here is missing the project dependencies. |
034356e to
bac5ad2
Compare
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology Affected methods:
Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests No additional findings. The re-review target from the prior AI pass is materially improved by the new registry/test coverage for treated-side aliasing in Path to Approval
|
… SEs A near-singular covariate Gram matrix (constant/collinear covariate) made the per-cell propensity-score Hessian / outcome-regression bread blow up in the influence-function SE of CallawaySantAnna, TripleDifference, and StaggeredTripleDifference: np.linalg.solve/inv raise LinAlgError only on EXACTLY singular matrices, so a near-singular Gram returned a garbage inverse (~1e13) that flowed into the SE while the ATT point estimate stayed valid. Reproduced: CS dr overall_se 5.1e13, TD reg se 1.8e17, SDDD SEs 30-100x inflated. - Add shared _rank_guarded_inv(A, *, rcond=1e-10, tracker) to linalg.py: when rank-deficient it inverts a COLUMN-DROPPED principal submatrix (pivoted QR on the symmetric-equilibrated Gram) — the SAME generalized-inverse convention the point estimate uses (_detect_rank_deficiency / R lm()). The fast path returns solve(A, I) unchanged (R-parity); all-NaN only on true rank-0. - Because it uses the point estimate's column-drop (not a minimum-norm pseudo-inverse, which diverges when the IF multiplier leaves range(A)), the analytical SE equals the well-conditioned near-collinear limit (verified se_ratio ~ 1 across reg/ipw/dr) for every per-cell bread, control AND treated. A covariate rank-deficient only within one cell still enters the other cells' full-rank fits, so a degenerate covariate spec legitimately moves the ATT/SE (surfaced by the aggregate warning) — no min-norm divergence. - CS: route _safe_inv through the helper; fix the var_psi>0-else-0.0 mask so a rank-0 cell yields NaN. SDDD: route the OR-IF and PS-Hessian inversions. TD: route 7 inv/pinv sites + add the per-fit aggregate rank-guard warning. - Rank-guard warning suppressed under rank_deficient_action="silent" (uniform); "error" is enforced upstream at the point-estimate solve (raises before any IF SE), so the IF guard is reached only under warn/silent. - Docs: REGISTRY rank-guard Notes (column-drop = full-rank limit + 1e-10 rationale + error enforcement), CHANGELOG entry, TODO structural-inverse follow-up. - Tests: helper contract, finite-SE/warning/golden-unchanged + survey-weighted + RCS/notyettreated + error-mode + cell-aliasing (control vs drop-one, treated vs near-collinear full-rank limit) for CS/TD/SDDD. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bac5ad2 to
a83bb7f
Compare
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology Affected methods:
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
…ate pivot The rank-guarded IF inverse selects kept columns via pivoted QR on the *equilibrated* Gram, which is scale-invariant by construction and so can drop a different member of a collinear set than the point estimate's raw-pivot `_detect_rank_deficiency` under mixed-scale exact collinearity. Downgrade the REGISTRY/CHANGELOG claim of "the same generalized-inverse convention the point estimate uses" to a documented equilibrated column-drop in the same *family*, and add an explicit selection caveat: the differing member choice leaves the identified subspace — and hence the SE — unchanged (order-invariant, verified for both column orders and under survey weighting). The full-rank-limit property (se_ratio ~= 1, column-drop vs minimum-norm) is retained verbatim. Add tests/test_staggered.py::test_exact_duplicate_covariate_survey_weighted: the survey-weighted bread/PS-Hessian gives well-scaled exact-duplicate SE == drop-one, and mixed-scale exact collinearity is order-invariant across both column orders under non-uniform weights. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…lumn-drop The helper's Notes docstring still said it uses "the SAME generalized-inverse convention the point estimate uses," while REGISTRY/CHANGELOG (and the inline comment) now document a narrower contract: same column-drop family, with a scale-invariant equilibrated selection that may drop a different member of a collinear set than the point estimate's raw pivot under mixed-scale exact collinearity. Reword the docstring to match — "same family" + the equilibrated-selection caveat — so the docstring no longer claims raw-pivot parity. Docstring-only; no logic change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…/tests The REGISTRY note and the three test_error_mode_raises_before_rank_guard comments said "error" raises upstream so the IF rank-guard is reached only under warn/silent. That overstates: the upstream gate (_detect_rank_deficiency) uses a 1e-7 *design* threshold, while the IF guard uses a stricter 1e-10 *equilibrated-Gram* threshold — and a Gram squares X's condition number, so the IF guard column-drops once X's singular-value ratio falls below ~1e-5, well above the design's 1e-7. A cell that is near-singular yet still design-full-rank therefore passes the "error" gate without an exception and is still IF-column- dropped (the guard does not re-raise; the aggregate warning still fires under error/warn). Reword the REGISTRY enforcement paragraph and the three test comments to say "error" blocks design-rank-deficient covariates (the exactly- collinear duplicates these tests use), not every near-singular IF bread. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests
|
…-dup tests test_exact_duplicate_covariate[reg] failed in CI (both Pure Python Fallback and the Rust matrix) at the mixed-scale (xbig = 1e8*x1) order-invariance assertion big_ab.overall_se ≈ big_ba.overall_se (rtol=1e-9). It passed locally (bit- identical across column orders on Accelerate BLAS) but CI's LAPACK showed a ~21% swing. Root cause: under reg the SE follows the un-equilibrated local OR solve (TODO-82), whose solution for the near-singular 1e8-scale X'WX (condition ~1e32 after squaring) is roundoff- and column-order-dependent — a point-estimate pathology, not the rank-guard. ipw/dr are genuinely order-invariant (variance flows through the equilibrated rank-guarded inverse) and passed CI. Assert only finiteness for reg at mixed scale; keep the tight order-invariance assertion for ipw/dr. Apply the same guard to the survey-weighted variant (same un-equilibrated reg OR solve). Well-scaled exact-duplicate == drop-one is unchanged (stable, passed CI). Test-only change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality No findings. Performance No findings. Maintainability No findings. Tech Debt
Security No findings. Documentation/Tests No findings. Static review only: I could not execute the test suite in this environment because |
Summary
CallawaySantAnna,TripleDifference, andStaggeredTripleDifference.np.linalg.solve/invraiseLinAlgErroronly on exactly singular matrices, so a near-singular propensity-score Hessian / outcome-regression bread previously returned a garbage inverse (~1e13) that flowed into the SE while the ATT point estimate stayed valid. Reproduced: CSdroverall_se~5.1e13, TDregse~1.8e17, SDDD SEs inflated 30–100×._rank_guarded_inv(A, *, rcond=1e-10, tracker)todiff_diff/linalg.py: symmetric-equilibrated (D^{-1/2} A D^{-1/2}) eigh-truncation. The well-conditioned fast path returnsnp.linalg.solve(A, I)unchanged (R-parity preserved); near-singular cells get a finite SE on the identified covariate subset; an all-NaN inverse (NaN SE) is returned only on true rank-0._safe_invthrough the helper; fix avar_psi > 0 else 0.0mask so a rank-0 cell yields NaN instead of a misleading 0.0. SDDD: route the OR-IF and PS-Hessian inversions through the helper. TD: route its 7 inlineinv/pinvsites through the helper and add the per-fit aggregate rank-guard warning it previously lacked.rank_deficient_action="silent"suppresses the rank-guard warning (uniform across the three);"error"is enforced upstream at the point-estimate solve (solve_ols/solve_logit), which raises before any IF SE is computed.EfficientDiDis already rank-safe (pinv(rcond=tol/max_eigval));ContinuousDiD/SpilloverDiDhave no user-covariate path; non-covariate structural inverse sites are tracked as aTODO.mdfollow-up.Methodology references (required if estimator / math changes)
docs/methodology/REGISTRY.md("rank-guarded IF standard errors" Note under CallawaySantAnna, plus the StaggeredTripleDifference notes).range(A)(PS Hessian; OR bread under consistent aliasing, incl. globally-constant/collinear covariates — verified machine-precision-equal). In the rarer degenerate case where a covariate is collinear only within the control cell but varies in the treated cell, the treated-mean multiplier leavesrange(A)and the minimum-norm inverse diverges from column-drop / Rdidby an estimator/data-dependent amount (≈0.6% CS / ≈2.5% SDDD / up to ~2× the column-drop SE for TD). It stays finite (the estimand is itself extrapolation-dependent in that edge); exact column-drop / R parity is documented and tracked as aTODO.mdfollow-up.Validation
tests/test_linalg.py(helper contract: fast-path ==solve, scale-invariance, rank-0 → NaN, boundary),tests/test_staggered.py,tests/test_methodology_triple_diff.py,tests/test_methodology_staggered_triple_diff.py,tests/test_staggered_triple_diff.py(constant-covariate finite SE == drop-one; survey-weighted; RCS / not-yet-treated;rank_deficient_actioninteractions incl. error-mode raise; control-only-aliasing finite SE; warning fires once / suppressed under silent; well-conditioned goldens unchanged).mypy diff_diffintroduces no new errors.Security / privacy
Generated with Claude Code