Rank-guarded generalized inverse for DR/OR influence-function SEs (CallawaySantAnna, TripleDifference, StaggeredTripleDifference) by igerber · Pull Request #507 · igerber/diff-diff

igerber · 2026-05-31T16:28:06Z

Summary

Fix garbage influence-function standard errors when a constant/collinear covariate makes a per-(g,t) covariate Gram matrix near-singular in CallawaySantAnna, TripleDifference, and StaggeredTripleDifference. np.linalg.solve/inv raise LinAlgError only on exactly singular matrices, so a near-singular propensity-score Hessian / outcome-regression bread previously returned a garbage inverse (~1e13) that flowed into the SE while the ATT point estimate stayed valid. Reproduced: CS dr overall_se ~5.1e13, TD reg se ~1.8e17, SDDD SEs inflated 30–100×.
Add a shared _rank_guarded_inv(A, *, rcond=1e-10, tracker) to diff_diff/linalg.py: symmetric-equilibrated (D^{-1/2} A D^{-1/2}) eigh-truncation. The well-conditioned fast path returns np.linalg.solve(A, I) unchanged (R-parity preserved); near-singular cells get a finite SE on the identified covariate subset; an all-NaN inverse (NaN SE) is returned only on true rank-0.
CS: route _safe_inv through the helper; fix a var_psi > 0 else 0.0 mask so a rank-0 cell yields NaN instead of a misleading 0.0. SDDD: route the OR-IF and PS-Hessian inversions through the helper. TD: route its 7 inline inv/pinv sites through the helper and add the per-fit aggregate rank-guard warning it previously lacked.
rank_deficient_action="silent" suppresses the rank-guard warning (uniform across the three); "error" is enforced upstream at the point-estimate solve (solve_ols/solve_logit), which raises before any IF SE is computed.
Diagnosed the rest of the covariate surface: EfficientDiD is already rank-safe (pinv(rcond=tol/max_eigval)); ContinuousDiD/SpilloverDiD have no user-covariate path; non-covariate structural inverse sites are tracked as a TODO.md follow-up.

Methodology references (required if estimator / math changes)

Method name(s): CallawaySantAnna (2021) group-time ATT; TripleDifference / StaggeredTripleDifference (Ortiz-Villavicencio & Sant'Anna 2025) — the doubly-robust / outcome-regression / IPW influence-function analytical standard errors. This is a numerical-stability fix to the IF variance computation, not a change to identification, weighting, or the point estimate.
Paper / source link(s): documented in docs/methodology/REGISTRY.md ("rank-guarded IF standard errors" Note under CallawaySantAnna, plus the StaggeredTripleDifference notes).
Any intentional deviations from the source (and why): A generalized inverse equals the point estimate's column-drop whenever the IF multiplier lies in range(A) (PS Hessian; OR bread under consistent aliasing, incl. globally-constant/collinear covariates — verified machine-precision-equal). In the rarer degenerate case where a covariate is collinear only within the control cell but varies in the treated cell, the treated-mean multiplier leaves range(A) and the minimum-norm inverse diverges from column-drop / R did by an estimator/data-dependent amount (≈0.6% CS / ≈2.5% SDDD / up to ~2× the column-drop SE for TD). It stays finite (the estimand is itself extrapolation-dependent in that edge); exact column-drop / R parity is documented and tracked as a TODO.md follow-up.

Validation

Tests added/updated: tests/test_linalg.py (helper contract: fast-path == solve, scale-invariance, rank-0 → NaN, boundary), tests/test_staggered.py, tests/test_methodology_triple_diff.py, tests/test_methodology_staggered_triple_diff.py, tests/test_staggered_triple_diff.py (constant-covariate finite SE == drop-one; survey-weighted; RCS / not-yet-treated; rank_deficient_action interactions incl. error-mode raise; control-only-aliasing finite SE; warning fires once / suppressed under silent; well-conditioned goldens unchanged).
Backtest / simulation / notebook evidence: full default suite (excluding the unrelated slow SCM files) green — 6884 passed; R-parity methodology goldens unchanged; both backends unaffected (pure-Python change); mypy diff_diff introduces no new errors.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

github-actions · 2026-05-31T16:35:25Z

Overall Assessment

⚠️ Needs changes

Executive Summary

CallawaySantAnna and StaggeredTripleDifference look aligned with the new methodology notes: the rank-guarded IF-SE behavior and the control-only aliasing caveat are documented in docs/methodology/REGISTRY.md:L389 and L2160-L2162, with the follow-up tracked in TODO.md:L84.
The CallawaySantAnna NaN-propagation fix is correct in the touched analytical-SE branches; the old var_psi > 0 else 0.0 masking is gone at diff_diff/staggered.py:L2726-L2735, L3034-L3043, and L3100-L3109.
TripleDifference now applies _rank_guarded_inv not just to control-side breads/Hessians, but also to the treated-pre/post OR breads in the DR path (diff_diff/triple_diff.py:L1836-L1865).
The new REGISTRY/TODO/tests only cover the control-only aliasing caveat. They do not cover the treated-cell analogue, even though the new treated-bread path uses multipliers built from broader treated samples (diff_diff/triple_diff.py:L1982-L1997).
I did not find other new P0/P1 issues in the touched warning plumbing, safe_inference() usage, performance surface, or security surface.

Methodology

P1 Impact: TripleDifference DR still fits treated/pre/post nuisance models with solve_ols column-drop semantics at diff_diff/triple_diff.py:L1435-L1479, but the new IF path rank-guards the treated-pre/post breads with _rank_guarded_inv at diff_diff/triple_diff.py:L1836-L1865 and then applies them through mom_pre/mom_post built from all treated observations at diff_diff/triple_diff.py:L1982-L1997. That is the same structural setup the new note uses to explain control-only aliasing divergence, but the PR only documents the control-side case (docs/methodology/REGISTRY.md:L2096-L2104, TODO.md:L84, tests/test_methodology_triple_diff.py:L1680-L1702). In repeated-cross-section data, treated-cell-only aliasing can therefore produce SEs that diverge from the point-estimate/R column-drop convention with no documented contract. Concrete fix: either preserve treated-cell column-drop parity in these IF breads, or explicitly document and track the treated-pre/post aliasing convention in REGISTRY.md and TODO.md.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings beyond the methodology contract gap above.

Tech Debt

P3 Impact: the remaining non-covariate structural inverse sites are explicitly tracked and scoped out in TODO.md:L83. Concrete fix: none required in this PR.

Security

No findings.

Documentation/Tests

P2 Impact: the new regression coverage is otherwise strong, but TripleDifference only adds a control-only aliasing case (tests/test_methodology_triple_diff.py:L1680-L1702). There is no treated-post-only or treated-pre-only aliasing test for the new treated-bread rank-guard path. Concrete fix: add both cases and assert the intended SE/warning contract.

Path to Approval

Decide the intended contract for treated-pre/post-only aliasing in TripleDifference DR.
Either implement treated-cell column-drop parity in the IF breads, or document the minimum-norm treated-cell divergence in docs/methodology/REGISTRY.md and track it in TODO.md.
Add tests/test_methodology_triple_diff.py regressions that isolate aliasing in treated-post and treated-pre cells and lock the chosen behavior.

Static review only; I did not run the suite in this environment because the shell Python here is missing the project dependencies.

github-actions · 2026-05-31T17:11:12Z

🔁 AI review rerun (requested by @igerber)

Head SHA: bac5ad2d9465f0b8e77e337747f2ef48e3aeaadd

Overall Assessment

⚠️ Needs changes

Executive Summary

The previous re-review concern around TripleDifference treated-cell aliasing looks addressed: the DR path now rank-guards treated pre/post breads, the registry note covers treated-side behavior, and new tests exercise a treated-cell aliasing case (tests/test_methodology_triple_diff.py:L1680-L1737).
CallawaySantAnna’s NaN-propagation fix in the touched analytical-SE branches looks correct: non-finite var_psi now yields NaN SE instead of a misleading 0.0 (diff_diff/staggered.py:L2726-L2733, diff_diff/staggered.py:L3034-L3041, diff_diff/staggered.py:L3100-L3107).
One P1 remains in the new shared _rank_guarded_inv() contract: its kept-column selection can diverge from the point-estimate/R column-drop rule in mixed-scale exact-collinearity, even though the new registry/changelog now claim exact parity.
I did not find additional P0/P1 issues in the touched warning suppression, safe_inference() usage, performance surface, or security surface.
The new structural-inverse follow-up row in TODO.md is malformed/merged with the next row, so deferred-work tracking is not cleanly rendered (TODO.md:L84-L84).

Methodology

Affected methods: CallawaySantAnna, TripleDifference, StaggeredTripleDifference analytical influence-function SEs.

Severity: P1. Impact: _detect_rank_deficiency() explicitly preserves the raw pivoted-QR drop selection for genuine collinearity (diff_diff/linalg.py:L152-L190, locked by tests/test_linalg.py:L1492-L1513). That is the nuisance-fit contract CallawaySantAnna uses directly (diff_diff/staggered.py:L1319-L1341) and TripleDifference / StaggeredTripleDifference inherit through solve_ols() (diff_diff/triple_diff.py:L1454-L1477, diff_diff/staggered_triple_diff.py:L1647-L1656). _rank_guarded_inv(), however, always re-pivots on the symmetrically equilibrated Gram (diff_diff/linalg.py:L385-L429). In an exact-collinearity + mixed-scale case (for example x_dup = 1e8 * x), equilibration erases the raw-scale distinction that decides which redundant column survives, so the IF-SE path can silently drop a different column than the documented point-estimate / R lm() convention. That contradicts the new methodology contract claimed in docs/methodology/REGISTRY.md:L389-L389 and CHANGELOG.md:L22-L22. Concrete fix: either make the guarded inverse reuse the same kept-column set as the weighted nuisance fit (for example from _detect_rank_deficiency(sqrt(W) * X) or by threading kept columns through the solve), or explicitly document mixed-scale exact-collinearity as a deviation instead of claiming point-estimate/R parity.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3. Impact: the new structural-inverse follow-up entry is concatenated with the pre-existing ImputationDiD OOM row, so the markdown table no longer tracks those deferrals as separate items (TODO.md:L84-L84). Because this project uses TODO.md to mitigate deferred review items, that tracking should stay clean. Concrete fix: split the merged line into two distinct table rows.

Security

No findings.

Documentation/Tests

No additional findings. The re-review target from the prior AI pass is materially improved by the new registry/test coverage for treated-side aliasing in TripleDifference (docs/methodology/REGISTRY.md:L2096-L2105, tests/test_methodology_triple_diff.py:L1680-L1737). Static review only; I could not run the suite here because the available Python in this environment is missing NumPy/project dependencies.

Path to Approval

Either align _rank_guarded_inv() with the nuisance fit’s actual kept-column selection, or replace the new parity claims in REGISTRY.md / CHANGELOG.md with an explicit deviation note for mixed-scale exact-collinearity.
Add a regression covering exact-collinear mixed-scale duplicates in both column orders, plus at least one weighted branch, so the chosen IF-SE contract is locked.

… SEs A near-singular covariate Gram matrix (constant/collinear covariate) made the per-cell propensity-score Hessian / outcome-regression bread blow up in the influence-function SE of CallawaySantAnna, TripleDifference, and StaggeredTripleDifference: np.linalg.solve/inv raise LinAlgError only on EXACTLY singular matrices, so a near-singular Gram returned a garbage inverse (~1e13) that flowed into the SE while the ATT point estimate stayed valid. Reproduced: CS dr overall_se 5.1e13, TD reg se 1.8e17, SDDD SEs 30-100x inflated. - Add shared _rank_guarded_inv(A, *, rcond=1e-10, tracker) to linalg.py: when rank-deficient it inverts a COLUMN-DROPPED principal submatrix (pivoted QR on the symmetric-equilibrated Gram) — the SAME generalized-inverse convention the point estimate uses (_detect_rank_deficiency / R lm()). The fast path returns solve(A, I) unchanged (R-parity); all-NaN only on true rank-0. - Because it uses the point estimate's column-drop (not a minimum-norm pseudo-inverse, which diverges when the IF multiplier leaves range(A)), the analytical SE equals the well-conditioned near-collinear limit (verified se_ratio ~ 1 across reg/ipw/dr) for every per-cell bread, control AND treated. A covariate rank-deficient only within one cell still enters the other cells' full-rank fits, so a degenerate covariate spec legitimately moves the ATT/SE (surfaced by the aggregate warning) — no min-norm divergence. - CS: route _safe_inv through the helper; fix the var_psi>0-else-0.0 mask so a rank-0 cell yields NaN. SDDD: route the OR-IF and PS-Hessian inversions. TD: route 7 inv/pinv sites + add the per-fit aggregate rank-guard warning. - Rank-guard warning suppressed under rank_deficient_action="silent" (uniform); "error" is enforced upstream at the point-estimate solve (raises before any IF SE), so the IF guard is reached only under warn/silent. - Docs: REGISTRY rank-guard Notes (column-drop = full-rank limit + 1e-10 rationale + error enforcement), CHANGELOG entry, TODO structural-inverse follow-up. - Tests: helper contract, finite-SE/warning/golden-unchanged + survey-weighted + RCS/notyettreated + error-mode + cell-aliasing (control vs drop-one, treated vs near-collinear full-rank limit) for CS/TD/SDDD. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-31T17:31:31Z

🔁 AI review rerun (requested by @igerber)

Head SHA: a83bb7fa2de012c1156fbf446d35df902d453e4e

Overall Assessment

⚠️ Needs changes

Executive Summary

The previous TripleDifference treated-cell aliasing concern looks materially addressed by the new regression coverage and registry updates.
The CallawaySantAnna NaN-propagation fix in the touched analytical-SE branches now fail-closes correctly instead of turning non-finite IF variance into a misleading 0.0 SE.
One P1 remains: the shared _rank_guarded_inv() still does not implement the same kept-column selection rule that the new REGISTRY/CHANGELOG text says it does.
I did not find additional P0/P1 issues in the new warning suppression, tracker reset, safe_inference() usage, or TODO.md tracking.
Static review only; I could not run the test suite here because the available Python environment is missing NumPy/project dependencies.

Methodology

Affected methods: CallawaySantAnna, TripleDifference, StaggeredTripleDifference analytical influence-function SEs.

Severity: P1. Impact: the source papers do not prescribe a specific generalized inverse here, so the binding contract is the new Methodology Registry note. That note now says the IF inverse uses “the same generalized-inverse convention the point estimate uses” / R-style column drop, but the code still diverges from that contract. _detect_rank_deficiency() preserves the raw pivot when collinearity is genuine (diff_diff/linalg.py:L152-L185), while _rank_guarded_inv() always re-pivots the symmetrically equilibrated Gram to choose kept columns (diff_diff/linalg.py:L415-L424). In mixed-scale exact-collinearity, those rules can keep different redundant columns, so the analytical-SE path is not actually guaranteed to follow the nuisance fit’s documented/R-style drop convention even though REGISTRY.md and CHANGELOG.md now claim that it does (docs/methodology/REGISTRY.md:L389-L389, CHANGELOG.md:L22-L22). The new tests improve coverage for cell-aliasing and CS mixed-scale order-invariance, but they still do not lock equality to the nuisance-fit kept-column rule in this exact-collinear mixed-scale case (tests/test_staggered.py:L1458-L1496, tests/test_methodology_triple_diff.py:L1680-L1737, tests/test_methodology_staggered_triple_diff.py:L784-L815). Concrete fix: either thread the actual kept-column set from each nuisance fit into _rank_guarded_inv()/its callers, or revise the REGISTRY/CHANGELOG to document an explicit equilibrated-column-drop deviation instead of claiming parity with the point estimate / R, then add regressions for exact-collinear mixed-scale duplicates in both column orders and at least one weighted branch.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The new TODO.md follow-up row is now cleanly separated and usable for deferred tracking.

Security

No findings.

Documentation/Tests

No additional findings beyond the Methodology item. The re-review target around TD treated-cell aliasing now has direct regression coverage (tests/test_methodology_triple_diff.py:L1680-L1737), and the CS NaN fail-closed path is covered by the new monkeypatch test (tests/test_staggered.py:L1285-L1307). I could not execute the suite in this environment.

Path to Approval

Make _rank_guarded_inv() reuse the nuisance fit’s actual kept-column set for rank-deficient cells, or explicitly downgrade the new REGISTRY/CHANGELOG claim from “same convention as the point estimate” to a documented equilibrated-column-drop deviation.
Add regression coverage for exact-collinear mixed-scale duplicates in the shared helper and estimator-level analytical-SE paths, with both column orders and a weighted branch, asserting whichever contract you choose to ship.

…ate pivot The rank-guarded IF inverse selects kept columns via pivoted QR on the *equilibrated* Gram, which is scale-invariant by construction and so can drop a different member of a collinear set than the point estimate's raw-pivot `_detect_rank_deficiency` under mixed-scale exact collinearity. Downgrade the REGISTRY/CHANGELOG claim of "the same generalized-inverse convention the point estimate uses" to a documented equilibrated column-drop in the same *family*, and add an explicit selection caveat: the differing member choice leaves the identified subspace — and hence the SE — unchanged (order-invariant, verified for both column orders and under survey weighting). The full-rank-limit property (se_ratio ~= 1, column-drop vs minimum-norm) is retained verbatim. Add tests/test_staggered.py::test_exact_duplicate_covariate_survey_weighted: the survey-weighted bread/PS-Hessian gives well-scaled exact-duplicate SE == drop-one, and mixed-scale exact collinearity is order-invariant across both column orders under non-uniform weights. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-31T17:42:46Z

🔁 AI review rerun (requested by @igerber)

Head SHA: dac0cd9234e054651d80de2444ee857da93d0482

Overall Assessment

✅ Looks good

Executive Summary

Re-review outcome: the previous P1 is resolved. The registry/changelog now document the equilibrated-column-selection caveat instead of claiming raw-pivot parity, which matches the implementation in _rank_guarded_inv() (docs/methodology/REGISTRY.md:L389-L389, CHANGELOG.md:L22-L22, diff_diff/linalg.py:L315-L429).
Affected methods are CallawaySantAnna, TripleDifference, and StaggeredTripleDifference analytical influence-function SEs. In the changed estimator paths, the new helper is threaded into the OR-bread / PS-Hessian inversions consistently (diff_diff/staggered.py:L2700-L3109, diff_diff/triple_diff.py:L1176-L1189, diff_diff/triple_diff.py:L1690-L1863, diff_diff/staggered_triple_diff.py:L1474-L1589).
The CallawaySantAnna NaN fail-closed fix is correct: touched branches now propagate non-finite var_psi to se=np.nan instead of silently returning 0.0, so downstream safe_inference() will also return NaN inference fields (diff_diff/staggered.py:L2725-L2735, diff_diff/staggered.py:L3033-L3043, diff_diff/staggered.py:L3099-L3109, diff_diff/staggered.py:L1982-L1987, diff_diff/staggered.py:L2094-L2099).
One non-blocking documentation inconsistency remains: the new helper’s own docstring still says it uses the “SAME generalized-inverse convention” as the point estimate, while the registry/changelog now correctly describe this as the same column-drop family with a scale-invariant selection deviation under mixed-scale exact collinearity (diff_diff/linalg.py:L361-L366, docs/methodology/REGISTRY.md:L389-L389, CHANGELOG.md:L22-L22).
Regression coverage on the changed surface is materially stronger, including helper contract tests and estimator-level weighted / aliasing / silent / error-mode cases (tests/test_linalg.py:L2081-L2189, tests/test_staggered.py:L1460-L1539, tests/test_methodology_triple_diff.py:L1680-L1737, tests/test_methodology_staggered_triple_diff.py:L746-L818).
Static review only; I could not execute the tests here because pytest is unavailable in the environment.

Methodology

Severity: P3 (informational). Impact: diff_diff/linalg.py:L361-L366 still states that _rank_guarded_inv() uses the “SAME generalized-inverse convention” as the point estimate, but the authoritative registry/changelog now document a narrower contract: same column-drop family, with possibly different dropped-member selection under mixed-scale exact collinearity. Because that deviation is documented in docs/methodology/REGISTRY.md:L389-L389, this is not a blocker, but the helper docstring can still mislead future maintainers. Concrete fix: reword the docstring to match docs/methodology/REGISTRY.md:L389-L389 and CHANGELOG.md:L22-L22 by saying “same family” and explicitly mentioning the equilibrated-selection caveat.
No P0/P1 methodology findings. I did not find an undocumented change to identification, weighting, or variance/SE in the touched estimator code.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. The remaining point-estimate scale-equilibration follow-up for CallawaySantAnna / StaggeredTripleDifference is explicitly tracked in TODO.md:L83-L84, so it is properly deferred rather than silently left undocumented.

Security

No findings.

Documentation/Tests

No findings beyond the P3 docstring wording above.
The new tests appropriately cover the changed surface, and I did not see an untracked test gap in the PR diff. Static review only; I could not run the suite because pytest is not installed here.

…lumn-drop The helper's Notes docstring still said it uses "the SAME generalized-inverse convention the point estimate uses," while REGISTRY/CHANGELOG (and the inline comment) now document a narrower contract: same column-drop family, with a scale-invariant equilibrated selection that may drop a different member of a collinear set than the point estimate's raw pivot under mixed-scale exact collinearity. Reword the docstring to match — "same family" + the equilibrated-selection caveat — so the docstring no longer claims raw-pivot parity. Docstring-only; no logic change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-31T17:51:59Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 12db8f0278d847ede5cb78b75d92d528dae8b8c0

Overall Assessment

✅ Looks good

Executive Summary

The prior re-review item is resolved: _rank_guarded_inv() now describes the new inverse as the same family as the point-estimate column-drop, matching the registry note rather than claiming identical selection behavior (diff_diff/linalg.py:L361-L369, docs/methodology/REGISTRY.md:L389-L389).
No unmitigated P0/P1 issues found in the changed analytical-SE paths. The new helper is wired into the intended PS-Hessian / OR-bread sites for CallawaySantAnna, TripleDifference, and StaggeredTripleDifference (diff_diff/staggered.py:L2702-L2702, diff_diff/triple_diff.py:L1176-L1189, diff_diff/triple_diff.py:L1690-L1864, diff_diff/staggered_triple_diff.py:L1474-L1489, diff_diff/staggered_triple_diff.py:L1575-L1589).
The CallawaySantAnna NaN fail-closed fix is correct: non-finite var_psi now propagates se=np.nan instead of silently returning 0.0, so downstream inference also becomes NaN as intended (diff_diff/staggered.py:L2725-L2735, diff_diff/staggered.py:L3033-L3043, diff_diff/staggered.py:L3099-L3109).
Regression coverage on the changed surface is materially stronger, including helper-contract tests plus estimator-level constant-covariate, survey-weighted, silent/error-mode, and cell-aliasing cases (tests/test_linalg.py:L2081-L2189, tests/test_staggered.py:L1205-L1457, tests/test_methodology_triple_diff.py:L1585-L1736, tests/test_methodology_staggered_triple_diff.py:L644-L818, tests/test_staggered_triple_diff.py:L612-L687).
One minor P3 documentation nuance remains: the new registry wording overstates when the IF rank-guard can be reached under rank_deficient_action="error".

Methodology

Severity: P3 (informational). Impact: docs/methodology/REGISTRY.md:L389-L389 says the rank-guarded IF inverse is reached only under rank_deficient_action="warn"/"silent", but _rank_guarded_inv() truncates on a stricter equilibrated-Gram threshold (diff_diff/linalg.py:L337-L344) than upstream design-rank detection (diff_diff/linalg.py:L122-L125, diff_diff/linalg.py:L139-L141). Numerically near-singular yet still full-rank cells can therefore still hit the IF guard under "error" without an upstream exception. This is a wording/contract issue, not a numerical bug. Concrete fix: narrow the note to exact rank-deficient designs, or explicitly say that near-singular full-rank IF breads/Hessians may still be rank-guarded under "error"; update the matching test comments in tests/test_staggered.py:L1442-L1447, tests/test_methodology_triple_diff.py:L1666-L1670, and tests/test_methodology_staggered_triple_diff.py:L768-L771.
No other methodology findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 (tracked). Impact: the remaining structural near-singular inverse surface is explicitly deferred in TODO.md:L83-L84, so the PR’s narrowed scope is properly tracked and non-blocking. Concrete fix: none for approval; handle the listed non-covariate inverse sites in a follow-up.

Security

No findings.

Documentation/Tests

No coverage findings on the changed surface; the new tests are targeted and appropriate for the helper contract plus CS/TD/SDDD estimator paths.
Static review only: I could not execute the test suite here because pytest and numpy are unavailable in this environment.

…/tests The REGISTRY note and the three test_error_mode_raises_before_rank_guard comments said "error" raises upstream so the IF rank-guard is reached only under warn/silent. That overstates: the upstream gate (_detect_rank_deficiency) uses a 1e-7 *design* threshold, while the IF guard uses a stricter 1e-10 *equilibrated-Gram* threshold — and a Gram squares X's condition number, so the IF guard column-drops once X's singular-value ratio falls below ~1e-5, well above the design's 1e-7. A cell that is near-singular yet still design-full-rank therefore passes the "error" gate without an exception and is still IF-column- dropped (the guard does not re-raise; the aggregate warning still fires under error/warn). Reword the REGISTRY enforcement paragraph and the three test comments to say "error" blocks design-rank-deficient covariates (the exactly- collinear duplicates these tests use), not every near-singular IF bread. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-31T18:03:01Z

🔁 AI review rerun (requested by @igerber)

Head SHA: f0a5e6f46edeacd8e300c03af74fe68e38e24c7c

Overall Assessment

✅ Looks good

Executive Summary

The affected methodology surface is limited to analytical influence-function SE linear algebra in CallawaySantAnna, TripleDifference, and StaggeredTripleDifference; identification, weighting, aggregation, and the published IF formulas are otherwise unchanged (diff_diff/linalg.py:L315-L434, diff_diff/staggered.py:L194-L222, diff_diff/triple_diff.py:L1176-L1189, diff_diff/staggered_triple_diff.py:L1474-L1489, diff_diff/staggered_triple_diff.py:L1575-L1589).
The prior re-review item is resolved: REGISTRY.md now explicitly documents that near-singular but design-full-rank cells can still hit the IF guard under rank_deficient_action="error", and the matching test comments were updated (docs/methodology/REGISTRY.md:L389-L389, tests/test_staggered.py:L1444-L1456, tests/test_methodology_triple_diff.py:L1668-L1679, tests/test_methodology_staggered_triple_diff.py:L770-L783).
No unmitigated P0/P1 issues found. At the source-material level, these methods are still RA/IPW/DR influence-function estimators; I did not find a paper-level requirement for a specific numerical matrix inversion routine, so the rank guard reads as a documented numerical stabilization choice rather than an undocumented methodology change. (arxiv.org)
The CallawaySantAnna NaN fail-closed fix appears complete on the changed surface: all three affected var_psi branches now propagate se=np.nan for non-finite IF variance instead of masking rank-0 bread as 0.0 (diff_diff/staggered.py:L2725-L2735, diff_diff/staggered.py:L3033-L3043, diff_diff/staggered.py:L3099-L3109).
Edge-case coverage is materially stronger: helper contract, NaN propagation, survey-weighted paths, silent/error behavior, not-yet-treated control paths, and cell-specific aliasing are all exercised (tests/test_linalg.py:L2082-L2186, tests/test_staggered.py:L1205-L1456, tests/test_methodology_triple_diff.py:L1585-L1743, tests/test_methodology_staggered_triple_diff.py:L644-L818, tests/test_staggered_triple_diff.py:L612-L689).

Methodology

Severity: P3 (informational). Impact: The PR intentionally replaces exact inverse/solve calls with an equilibrated, rank-guarded column-drop inverse for near-singular IF bread/Hessian matrices in CallawaySantAnna, TripleDifference, and StaggeredTripleDifference (diff_diff/linalg.py:L315-L434, diff_diff/staggered.py:L2700-L2735, diff_diff/triple_diff.py:L1176-L1189, diff_diff/triple_diff.py:L1690-L1864, diff_diff/staggered_triple_diff.py:L1474-L1489, diff_diff/staggered_triple_diff.py:L1575-L1589). Cross-checking the cited source material, the papers define the estimators and IF-based inference families but do not prescribe a specific numerical factorization for these nuisance inverses; the repo-specific deviation from raw-pivot / R column selection is now explicitly documented in docs/methodology/REGISTRY.md:L389-L389, docs/methodology/REGISTRY.md:L2096-L2105, and docs/methodology/REGISTRY.md:L2160-L2162. (arxiv.org) Concrete fix: none.
No other methodology findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 (tracked). Impact: Remaining non-blocking follow-ups are explicitly tracked in TODO.md:L83-L84 (CS/SDDD local OR point-estimate scale-equilibration and separate structural inverse sites), so they do not block approval under the repo’s deferred-work policy. Concrete fix: none for approval.

Security

No findings.

Documentation/Tests

No findings on the changed surface.
Static review only: I could not execute the tests here because pytest and numpy are unavailable in this environment.

…-dup tests test_exact_duplicate_covariate[reg] failed in CI (both Pure Python Fallback and the Rust matrix) at the mixed-scale (xbig = 1e8*x1) order-invariance assertion big_ab.overall_se ≈ big_ba.overall_se (rtol=1e-9). It passed locally (bit- identical across column orders on Accelerate BLAS) but CI's LAPACK showed a ~21% swing. Root cause: under reg the SE follows the un-equilibrated local OR solve (TODO-82), whose solution for the near-singular 1e8-scale X'WX (condition ~1e32 after squaring) is roundoff- and column-order-dependent — a point-estimate pathology, not the rank-guard. ipw/dr are genuinely order-invariant (variance flows through the equilibrated rank-guarded inverse) and passed CI. Assert only finiteness for reg at mixed scale; keep the tight order-invariance assertion for ipw/dr. Apply the same guard to the survey-weighted variant (same un-equilibrated reg OR solve). Well-scaled exact-duplicate == drop-one is unchanged (stable, passed CI). Test-only change. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-31T19:52:57Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 5aa48e9d751e6614a981bc2278264133d2b957cf

Overall Assessment

✅ Looks good

Executive Summary

Re-review scope: I did not find a regression against the previously approved rank-guarded SE work; the earlier REGISTRY.md clarification about rank_deficient_action="error" remains present at docs/methodology/REGISTRY.md:L389-L389.
The changed methodology surface remains confined to analytical influence-function SE linear algebra in diff_diff/linalg.py:L315-L434, diff_diff/staggered.py:L194-L222, diff_diff/triple_diff.py:L1176-L1189, diff_diff/triple_diff.py:L1690-L1865, and diff_diff/staggered_triple_diff.py:L1474-L1489, diff_diff/staggered_triple_diff.py:L1580-L1589.
The only substantive deviation I see is the equilibrated column-drop choice under mixed-scale exact collinearity, and that deviation is now explicitly documented in docs/methodology/REGISTRY.md:L389-L389, docs/methodology/REGISTRY.md:L2096-L2105, and docs/methodology/REGISTRY.md:L2160-L2162; the cited papers describe the RA/IPW/DR estimators and IF-based inference but do not appear to require a specific matrix-inversion routine, so this reads as a documented numerical stabilization rather than an undocumented methodology change. (arxiv.org)
The CallawaySantAnna NaN fail-closed fix appears complete on the changed surface: all three affected var_psi branches now propagate se=np.nan instead of 0.0 for non-finite IF variance in diff_diff/staggered.py:L2725-L2735, diff_diff/staggered.py:L3033-L3043, and diff_diff/staggered.py:L3099-L3109.
Coverage is materially stronger: helper-contract tests live in tests/test_linalg.py:L2081-L2189, and estimator-level tests now cover constant/duplicate covariates, survey-weighted paths, repeated cross-sections, warning suppression, and NaN propagation across tests/test_staggered.py:L1214-L1485, tests/test_methodology_triple_diff.py:L1595-L1742, tests/test_methodology_staggered_triple_diff.py:L657-L820, and tests/test_staggered_triple_diff.py:L612-L689.

Methodology

Severity: P3 (informational). Impact: _rank_guarded_inv() changes the IF-side generalized-inverse selection to an equilibrated column-drop rule on deficient Gram matrices (diff_diff/linalg.py:L315-L434), but that exact deviation is now documented in REGISTRY.md, and I did not find a paper-level requirement that these estimators use a specific numerical inverse rather than a numerically stable equivalent. Concrete fix: none. docs/methodology/REGISTRY.md:L389-L389, docs/methodology/REGISTRY.md:L2096-L2105, docs/methodology/REGISTRY.md:L2160-L2162. (arxiv.org)
No other methodology findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3 (tracked). Impact: the remaining CS/SDDD local OR point-estimate scale-equilibration gap is still explicitly documented and tracked, so it is non-blocking under the repo’s deferred-work policy. Concrete fix: follow TODO.md:L83-L84 by routing those estimator-local OR point-estimate solves through the shared scale-robust solver or adding local equilibration. docs/methodology/REGISTRY.md:L389-L389, docs/methodology/REGISTRY.md:L475-L475, TODO.md:L83-L84.

Security

No findings.

Documentation/Tests

No findings.

Static review only: I could not execute the test suite in this environment because numpy is unavailable, so my confidence in the added tests is based on code inspection rather than runtime validation.

igerber force-pushed the feature/dr-or-se-rank-guard branch 2 times, most recently from 034356e to bac5ad2 Compare May 31, 2026 17:05

igerber force-pushed the feature/dr-or-se-rank-guard branch from bac5ad2 to a83bb7f Compare May 31, 2026 17:26

igerber added the ready-for-ci Triggers CI test workflows label May 31, 2026

igerber merged commit 839736d into main May 31, 2026
26 checks passed

igerber deleted the feature/dr-or-se-rank-guard branch May 31, 2026 21:08

igerber mentioned this pull request Jun 1, 2026

Bump version to 3.5.0 #516

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Rank-guarded generalized inverse for DR/OR influence-function SEs (CallawaySantAnna, TripleDifference, StaggeredTripleDifference)#507

Rank-guarded generalized inverse for DR/OR influence-function SEs (CallawaySantAnna, TripleDifference, StaggeredTripleDifference)#507
igerber merged 5 commits into
mainfrom
feature/dr-or-se-rank-guard

igerber commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 31, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

github-actions Bot commented May 31, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant