Skip to content

Rank-guarded generalized inverse for DR/OR influence-function SEs (CallawaySantAnna, TripleDifference, StaggeredTripleDifference)#507

Merged
igerber merged 5 commits into
mainfrom
feature/dr-or-se-rank-guard
May 31, 2026
Merged

Rank-guarded generalized inverse for DR/OR influence-function SEs (CallawaySantAnna, TripleDifference, StaggeredTripleDifference)#507
igerber merged 5 commits into
mainfrom
feature/dr-or-se-rank-guard

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 31, 2026

Summary

  • Fix garbage influence-function standard errors when a constant/collinear covariate makes a per-(g,t) covariate Gram matrix near-singular in CallawaySantAnna, TripleDifference, and StaggeredTripleDifference. np.linalg.solve/inv raise LinAlgError only on exactly singular matrices, so a near-singular propensity-score Hessian / outcome-regression bread previously returned a garbage inverse (~1e13) that flowed into the SE while the ATT point estimate stayed valid. Reproduced: CS dr overall_se ~5.1e13, TD reg se ~1.8e17, SDDD SEs inflated 30–100×.
  • Add a shared _rank_guarded_inv(A, *, rcond=1e-10, tracker) to diff_diff/linalg.py: symmetric-equilibrated (D^{-1/2} A D^{-1/2}) eigh-truncation. The well-conditioned fast path returns np.linalg.solve(A, I) unchanged (R-parity preserved); near-singular cells get a finite SE on the identified covariate subset; an all-NaN inverse (NaN SE) is returned only on true rank-0.
  • CS: route _safe_inv through the helper; fix a var_psi > 0 else 0.0 mask so a rank-0 cell yields NaN instead of a misleading 0.0. SDDD: route the OR-IF and PS-Hessian inversions through the helper. TD: route its 7 inline inv/pinv sites through the helper and add the per-fit aggregate rank-guard warning it previously lacked.
  • rank_deficient_action="silent" suppresses the rank-guard warning (uniform across the three); "error" is enforced upstream at the point-estimate solve (solve_ols/solve_logit), which raises before any IF SE is computed.
  • Diagnosed the rest of the covariate surface: EfficientDiD is already rank-safe (pinv(rcond=tol/max_eigval)); ContinuousDiD/SpilloverDiD have no user-covariate path; non-covariate structural inverse sites are tracked as a TODO.md follow-up.

Methodology references (required if estimator / math changes)

  • Method name(s): CallawaySantAnna (2021) group-time ATT; TripleDifference / StaggeredTripleDifference (Ortiz-Villavicencio & Sant'Anna 2025) — the doubly-robust / outcome-regression / IPW influence-function analytical standard errors. This is a numerical-stability fix to the IF variance computation, not a change to identification, weighting, or the point estimate.
  • Paper / source link(s): documented in docs/methodology/REGISTRY.md ("rank-guarded IF standard errors" Note under CallawaySantAnna, plus the StaggeredTripleDifference notes).
  • Any intentional deviations from the source (and why): A generalized inverse equals the point estimate's column-drop whenever the IF multiplier lies in range(A) (PS Hessian; OR bread under consistent aliasing, incl. globally-constant/collinear covariates — verified machine-precision-equal). In the rarer degenerate case where a covariate is collinear only within the control cell but varies in the treated cell, the treated-mean multiplier leaves range(A) and the minimum-norm inverse diverges from column-drop / R did by an estimator/data-dependent amount (≈0.6% CS / ≈2.5% SDDD / up to ~2× the column-drop SE for TD). It stays finite (the estimand is itself extrapolation-dependent in that edge); exact column-drop / R parity is documented and tracked as a TODO.md follow-up.

Validation

  • Tests added/updated: tests/test_linalg.py (helper contract: fast-path == solve, scale-invariance, rank-0 → NaN, boundary), tests/test_staggered.py, tests/test_methodology_triple_diff.py, tests/test_methodology_staggered_triple_diff.py, tests/test_staggered_triple_diff.py (constant-covariate finite SE == drop-one; survey-weighted; RCS / not-yet-treated; rank_deficient_action interactions incl. error-mode raise; control-only-aliasing finite SE; warning fires once / suppressed under silent; well-conditioned goldens unchanged).
  • Backtest / simulation / notebook evidence: full default suite (excluding the unrelated slow SCM files) green — 6884 passed; R-parity methodology goldens unchanged; both backends unaffected (pure-Python change); mypy diff_diff introduces no new errors.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions
Copy link
Copy Markdown

Overall Assessment

⚠️ Needs changes

Executive Summary

  • CallawaySantAnna and StaggeredTripleDifference look aligned with the new methodology notes: the rank-guarded IF-SE behavior and the control-only aliasing caveat are documented in docs/methodology/REGISTRY.md:L389 and L2160-L2162, with the follow-up tracked in TODO.md:L84.
  • The CallawaySantAnna NaN-propagation fix is correct in the touched analytical-SE branches; the old var_psi > 0 else 0.0 masking is gone at diff_diff/staggered.py:L2726-L2735, L3034-L3043, and L3100-L3109.
  • TripleDifference now applies _rank_guarded_inv not just to control-side breads/Hessians, but also to the treated-pre/post OR breads in the DR path (diff_diff/triple_diff.py:L1836-L1865).
  • The new REGISTRY/TODO/tests only cover the control-only aliasing caveat. They do not cover the treated-cell analogue, even though the new treated-bread path uses multipliers built from broader treated samples (diff_diff/triple_diff.py:L1982-L1997).
  • I did not find other new P0/P1 issues in the touched warning plumbing, safe_inference() usage, performance surface, or security surface.

Methodology

  • P1 Impact: TripleDifference DR still fits treated/pre/post nuisance models with solve_ols column-drop semantics at diff_diff/triple_diff.py:L1435-L1479, but the new IF path rank-guards the treated-pre/post breads with _rank_guarded_inv at diff_diff/triple_diff.py:L1836-L1865 and then applies them through mom_pre/mom_post built from all treated observations at diff_diff/triple_diff.py:L1982-L1997. That is the same structural setup the new note uses to explain control-only aliasing divergence, but the PR only documents the control-side case (docs/methodology/REGISTRY.md:L2096-L2104, TODO.md:L84, tests/test_methodology_triple_diff.py:L1680-L1702). In repeated-cross-section data, treated-cell-only aliasing can therefore produce SEs that diverge from the point-estimate/R column-drop convention with no documented contract. Concrete fix: either preserve treated-cell column-drop parity in these IF breads, or explicitly document and track the treated-pre/post aliasing convention in REGISTRY.md and TODO.md.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings beyond the methodology contract gap above.

Tech Debt

  • P3 Impact: the remaining non-covariate structural inverse sites are explicitly tracked and scoped out in TODO.md:L83. Concrete fix: none required in this PR.

Security

  • No findings.

Documentation/Tests

  • P2 Impact: the new regression coverage is otherwise strong, but TripleDifference only adds a control-only aliasing case (tests/test_methodology_triple_diff.py:L1680-L1702). There is no treated-post-only or treated-pre-only aliasing test for the new treated-bread rank-guard path. Concrete fix: add both cases and assert the intended SE/warning contract.

Path to Approval

  1. Decide the intended contract for treated-pre/post-only aliasing in TripleDifference DR.
  2. Either implement treated-cell column-drop parity in the IF breads, or document the minimum-norm treated-cell divergence in docs/methodology/REGISTRY.md and track it in TODO.md.
  3. Add tests/test_methodology_triple_diff.py regressions that isolate aliasing in treated-post and treated-pre cells and lock the chosen behavior.

Static review only; I did not run the suite in this environment because the shell Python here is missing the project dependencies.

@igerber igerber force-pushed the feature/dr-or-se-rank-guard branch 2 times, most recently from 034356e to bac5ad2 Compare May 31, 2026 17:05
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: bac5ad2d9465f0b8e77e337747f2ef48e3aeaadd


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The previous re-review concern around TripleDifference treated-cell aliasing looks addressed: the DR path now rank-guards treated pre/post breads, the registry note covers treated-side behavior, and new tests exercise a treated-cell aliasing case (tests/test_methodology_triple_diff.py:L1680-L1737).
  • CallawaySantAnna’s NaN-propagation fix in the touched analytical-SE branches looks correct: non-finite var_psi now yields NaN SE instead of a misleading 0.0 (diff_diff/staggered.py:L2726-L2733, diff_diff/staggered.py:L3034-L3041, diff_diff/staggered.py:L3100-L3107).
  • One P1 remains in the new shared _rank_guarded_inv() contract: its kept-column selection can diverge from the point-estimate/R column-drop rule in mixed-scale exact-collinearity, even though the new registry/changelog now claim exact parity.
  • I did not find additional P0/P1 issues in the touched warning suppression, safe_inference() usage, performance surface, or security surface.
  • The new structural-inverse follow-up row in TODO.md is malformed/merged with the next row, so deferred-work tracking is not cleanly rendered (TODO.md:L84-L84).

Methodology

Affected methods: CallawaySantAnna, TripleDifference, StaggeredTripleDifference analytical influence-function SEs.

  • Severity: P1. Impact: _detect_rank_deficiency() explicitly preserves the raw pivoted-QR drop selection for genuine collinearity (diff_diff/linalg.py:L152-L190, locked by tests/test_linalg.py:L1492-L1513). That is the nuisance-fit contract CallawaySantAnna uses directly (diff_diff/staggered.py:L1319-L1341) and TripleDifference / StaggeredTripleDifference inherit through solve_ols() (diff_diff/triple_diff.py:L1454-L1477, diff_diff/staggered_triple_diff.py:L1647-L1656). _rank_guarded_inv(), however, always re-pivots on the symmetrically equilibrated Gram (diff_diff/linalg.py:L385-L429). In an exact-collinearity + mixed-scale case (for example x_dup = 1e8 * x), equilibration erases the raw-scale distinction that decides which redundant column survives, so the IF-SE path can silently drop a different column than the documented point-estimate / R lm() convention. That contradicts the new methodology contract claimed in docs/methodology/REGISTRY.md:L389-L389 and CHANGELOG.md:L22-L22. Concrete fix: either make the guarded inverse reuse the same kept-column set as the weighted nuisance fit (for example from _detect_rank_deficiency(sqrt(W) * X) or by threading kept columns through the solve), or explicitly document mixed-scale exact-collinearity as a deviation instead of claiming point-estimate/R parity.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity: P3. Impact: the new structural-inverse follow-up entry is concatenated with the pre-existing ImputationDiD OOM row, so the markdown table no longer tracks those deferrals as separate items (TODO.md:L84-L84). Because this project uses TODO.md to mitigate deferred review items, that tracking should stay clean. Concrete fix: split the merged line into two distinct table rows.

Security

No findings.

Documentation/Tests

No additional findings. The re-review target from the prior AI pass is materially improved by the new registry/test coverage for treated-side aliasing in TripleDifference (docs/methodology/REGISTRY.md:L2096-L2105, tests/test_methodology_triple_diff.py:L1680-L1737). Static review only; I could not run the suite here because the available Python in this environment is missing NumPy/project dependencies.

Path to Approval

  1. Either align _rank_guarded_inv() with the nuisance fit’s actual kept-column selection, or replace the new parity claims in REGISTRY.md / CHANGELOG.md with an explicit deviation note for mixed-scale exact-collinearity.
  2. Add a regression covering exact-collinear mixed-scale duplicates in both column orders, plus at least one weighted branch, so the chosen IF-SE contract is locked.

… SEs

A near-singular covariate Gram matrix (constant/collinear covariate) made the
per-cell propensity-score Hessian / outcome-regression bread blow up in the
influence-function SE of CallawaySantAnna, TripleDifference, and
StaggeredTripleDifference: np.linalg.solve/inv raise LinAlgError only on EXACTLY
singular matrices, so a near-singular Gram returned a garbage inverse (~1e13)
that flowed into the SE while the ATT point estimate stayed valid. Reproduced:
CS dr overall_se 5.1e13, TD reg se 1.8e17, SDDD SEs 30-100x inflated.

- Add shared _rank_guarded_inv(A, *, rcond=1e-10, tracker) to linalg.py: when
  rank-deficient it inverts a COLUMN-DROPPED principal submatrix (pivoted QR on
  the symmetric-equilibrated Gram) — the SAME generalized-inverse convention the
  point estimate uses (_detect_rank_deficiency / R lm()). The fast path returns
  solve(A, I) unchanged (R-parity); all-NaN only on true rank-0.
- Because it uses the point estimate's column-drop (not a minimum-norm
  pseudo-inverse, which diverges when the IF multiplier leaves range(A)), the
  analytical SE equals the well-conditioned near-collinear limit (verified
  se_ratio ~ 1 across reg/ipw/dr) for every per-cell bread, control AND treated.
  A covariate rank-deficient only within one cell still enters the other cells'
  full-rank fits, so a degenerate covariate spec legitimately moves the ATT/SE
  (surfaced by the aggregate warning) — no min-norm divergence.
- CS: route _safe_inv through the helper; fix the var_psi>0-else-0.0 mask so a
  rank-0 cell yields NaN. SDDD: route the OR-IF and PS-Hessian inversions.
  TD: route 7 inv/pinv sites + add the per-fit aggregate rank-guard warning.
- Rank-guard warning suppressed under rank_deficient_action="silent" (uniform);
  "error" is enforced upstream at the point-estimate solve (raises before any
  IF SE), so the IF guard is reached only under warn/silent.
- Docs: REGISTRY rank-guard Notes (column-drop = full-rank limit + 1e-10
  rationale + error enforcement), CHANGELOG entry, TODO structural-inverse
  follow-up.
- Tests: helper contract, finite-SE/warning/golden-unchanged + survey-weighted +
  RCS/notyettreated + error-mode + cell-aliasing (control vs drop-one,
  treated vs near-collinear full-rank limit) for CS/TD/SDDD.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the feature/dr-or-se-rank-guard branch from bac5ad2 to a83bb7f Compare May 31, 2026 17:26
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: a83bb7fa2de012c1156fbf446d35df902d453e4e


Overall Assessment

⚠️ Needs changes

Executive Summary

  • The previous TripleDifference treated-cell aliasing concern looks materially addressed by the new regression coverage and registry updates.
  • The CallawaySantAnna NaN-propagation fix in the touched analytical-SE branches now fail-closes correctly instead of turning non-finite IF variance into a misleading 0.0 SE.
  • One P1 remains: the shared _rank_guarded_inv() still does not implement the same kept-column selection rule that the new REGISTRY/CHANGELOG text says it does.
  • I did not find additional P0/P1 issues in the new warning suppression, tracker reset, safe_inference() usage, or TODO.md tracking.
  • Static review only; I could not run the test suite here because the available Python environment is missing NumPy/project dependencies.

Methodology

Affected methods: CallawaySantAnna, TripleDifference, StaggeredTripleDifference analytical influence-function SEs.

  • Severity: P1. Impact: the source papers do not prescribe a specific generalized inverse here, so the binding contract is the new Methodology Registry note. That note now says the IF inverse uses “the same generalized-inverse convention the point estimate uses” / R-style column drop, but the code still diverges from that contract. _detect_rank_deficiency() preserves the raw pivot when collinearity is genuine (diff_diff/linalg.py:L152-L185), while _rank_guarded_inv() always re-pivots the symmetrically equilibrated Gram to choose kept columns (diff_diff/linalg.py:L415-L424). In mixed-scale exact-collinearity, those rules can keep different redundant columns, so the analytical-SE path is not actually guaranteed to follow the nuisance fit’s documented/R-style drop convention even though REGISTRY.md and CHANGELOG.md now claim that it does (docs/methodology/REGISTRY.md:L389-L389, CHANGELOG.md:L22-L22). The new tests improve coverage for cell-aliasing and CS mixed-scale order-invariance, but they still do not lock equality to the nuisance-fit kept-column rule in this exact-collinear mixed-scale case (tests/test_staggered.py:L1458-L1496, tests/test_methodology_triple_diff.py:L1680-L1737, tests/test_methodology_staggered_triple_diff.py:L784-L815). Concrete fix: either thread the actual kept-column set from each nuisance fit into _rank_guarded_inv()/its callers, or revise the REGISTRY/CHANGELOG to document an explicit equilibrated-column-drop deviation instead of claiming parity with the point estimate / R, then add regressions for exact-collinear mixed-scale duplicates in both column orders and at least one weighted branch.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. The new TODO.md follow-up row is now cleanly separated and usable for deferred tracking.

Security

  • No findings.

Documentation/Tests

  • No additional findings beyond the Methodology item. The re-review target around TD treated-cell aliasing now has direct regression coverage (tests/test_methodology_triple_diff.py:L1680-L1737), and the CS NaN fail-closed path is covered by the new monkeypatch test (tests/test_staggered.py:L1285-L1307). I could not execute the suite in this environment.

Path to Approval

  1. Make _rank_guarded_inv() reuse the nuisance fit’s actual kept-column set for rank-deficient cells, or explicitly downgrade the new REGISTRY/CHANGELOG claim from “same convention as the point estimate” to a documented equilibrated-column-drop deviation.
  2. Add regression coverage for exact-collinear mixed-scale duplicates in the shared helper and estimator-level analytical-SE paths, with both column orders and a weighted branch, asserting whichever contract you choose to ship.

…ate pivot

The rank-guarded IF inverse selects kept columns via pivoted QR on the
*equilibrated* Gram, which is scale-invariant by construction and so can drop a
different member of a collinear set than the point estimate's raw-pivot
`_detect_rank_deficiency` under mixed-scale exact collinearity. Downgrade the
REGISTRY/CHANGELOG claim of "the same generalized-inverse convention the point
estimate uses" to a documented equilibrated column-drop in the same *family*,
and add an explicit selection caveat: the differing member choice leaves the
identified subspace — and hence the SE — unchanged (order-invariant, verified
for both column orders and under survey weighting). The full-rank-limit
property (se_ratio ~= 1, column-drop vs minimum-norm) is retained verbatim.

Add tests/test_staggered.py::test_exact_duplicate_covariate_survey_weighted:
the survey-weighted bread/PS-Hessian gives well-scaled exact-duplicate SE ==
drop-one, and mixed-scale exact collinearity is order-invariant across both
column orders under non-uniform weights.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: dac0cd9234e054651d80de2444ee857da93d0482


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review outcome: the previous P1 is resolved. The registry/changelog now document the equilibrated-column-selection caveat instead of claiming raw-pivot parity, which matches the implementation in _rank_guarded_inv() (docs/methodology/REGISTRY.md:L389-L389, CHANGELOG.md:L22-L22, diff_diff/linalg.py:L315-L429).
  • Affected methods are CallawaySantAnna, TripleDifference, and StaggeredTripleDifference analytical influence-function SEs. In the changed estimator paths, the new helper is threaded into the OR-bread / PS-Hessian inversions consistently (diff_diff/staggered.py:L2700-L3109, diff_diff/triple_diff.py:L1176-L1189, diff_diff/triple_diff.py:L1690-L1863, diff_diff/staggered_triple_diff.py:L1474-L1589).
  • The CallawaySantAnna NaN fail-closed fix is correct: touched branches now propagate non-finite var_psi to se=np.nan instead of silently returning 0.0, so downstream safe_inference() will also return NaN inference fields (diff_diff/staggered.py:L2725-L2735, diff_diff/staggered.py:L3033-L3043, diff_diff/staggered.py:L3099-L3109, diff_diff/staggered.py:L1982-L1987, diff_diff/staggered.py:L2094-L2099).
  • One non-blocking documentation inconsistency remains: the new helper’s own docstring still says it uses the “SAME generalized-inverse convention” as the point estimate, while the registry/changelog now correctly describe this as the same column-drop family with a scale-invariant selection deviation under mixed-scale exact collinearity (diff_diff/linalg.py:L361-L366, docs/methodology/REGISTRY.md:L389-L389, CHANGELOG.md:L22-L22).
  • Regression coverage on the changed surface is materially stronger, including helper contract tests and estimator-level weighted / aliasing / silent / error-mode cases (tests/test_linalg.py:L2081-L2189, tests/test_staggered.py:L1460-L1539, tests/test_methodology_triple_diff.py:L1680-L1737, tests/test_methodology_staggered_triple_diff.py:L746-L818).
  • Static review only; I could not execute the tests here because pytest is unavailable in the environment.

Methodology

  • Severity: P3 (informational). Impact: diff_diff/linalg.py:L361-L366 still states that _rank_guarded_inv() uses the “SAME generalized-inverse convention” as the point estimate, but the authoritative registry/changelog now document a narrower contract: same column-drop family, with possibly different dropped-member selection under mixed-scale exact collinearity. Because that deviation is documented in docs/methodology/REGISTRY.md:L389-L389, this is not a blocker, but the helper docstring can still mislead future maintainers. Concrete fix: reword the docstring to match docs/methodology/REGISTRY.md:L389-L389 and CHANGELOG.md:L22-L22 by saying “same family” and explicitly mentioning the equilibrated-selection caveat.
  • No P0/P1 methodology findings. I did not find an undocumented change to identification, weighting, or variance/SE in the touched estimator code.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. The remaining point-estimate scale-equilibration follow-up for CallawaySantAnna / StaggeredTripleDifference is explicitly tracked in TODO.md:L83-L84, so it is properly deferred rather than silently left undocumented.

Security

  • No findings.

Documentation/Tests

  • No findings beyond the P3 docstring wording above.
  • The new tests appropriately cover the changed surface, and I did not see an untracked test gap in the PR diff. Static review only; I could not run the suite because pytest is not installed here.

…lumn-drop

The helper's Notes docstring still said it uses "the SAME generalized-inverse
convention the point estimate uses," while REGISTRY/CHANGELOG (and the inline
comment) now document a narrower contract: same column-drop family, with a
scale-invariant equilibrated selection that may drop a different member of a
collinear set than the point estimate's raw pivot under mixed-scale exact
collinearity. Reword the docstring to match — "same family" + the
equilibrated-selection caveat — so the docstring no longer claims raw-pivot
parity. Docstring-only; no logic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 12db8f0278d847ede5cb78b75d92d528dae8b8c0


Overall Assessment

✅ Looks good

Executive Summary

  • The prior re-review item is resolved: _rank_guarded_inv() now describes the new inverse as the same family as the point-estimate column-drop, matching the registry note rather than claiming identical selection behavior (diff_diff/linalg.py:L361-L369, docs/methodology/REGISTRY.md:L389-L389).
  • No unmitigated P0/P1 issues found in the changed analytical-SE paths. The new helper is wired into the intended PS-Hessian / OR-bread sites for CallawaySantAnna, TripleDifference, and StaggeredTripleDifference (diff_diff/staggered.py:L2702-L2702, diff_diff/triple_diff.py:L1176-L1189, diff_diff/triple_diff.py:L1690-L1864, diff_diff/staggered_triple_diff.py:L1474-L1489, diff_diff/staggered_triple_diff.py:L1575-L1589).
  • The CallawaySantAnna NaN fail-closed fix is correct: non-finite var_psi now propagates se=np.nan instead of silently returning 0.0, so downstream inference also becomes NaN as intended (diff_diff/staggered.py:L2725-L2735, diff_diff/staggered.py:L3033-L3043, diff_diff/staggered.py:L3099-L3109).
  • Regression coverage on the changed surface is materially stronger, including helper-contract tests plus estimator-level constant-covariate, survey-weighted, silent/error-mode, and cell-aliasing cases (tests/test_linalg.py:L2081-L2189, tests/test_staggered.py:L1205-L1457, tests/test_methodology_triple_diff.py:L1585-L1736, tests/test_methodology_staggered_triple_diff.py:L644-L818, tests/test_staggered_triple_diff.py:L612-L687).
  • One minor P3 documentation nuance remains: the new registry wording overstates when the IF rank-guard can be reached under rank_deficient_action="error".

Methodology

  • Severity: P3 (informational). Impact: docs/methodology/REGISTRY.md:L389-L389 says the rank-guarded IF inverse is reached only under rank_deficient_action="warn"/"silent", but _rank_guarded_inv() truncates on a stricter equilibrated-Gram threshold (diff_diff/linalg.py:L337-L344) than upstream design-rank detection (diff_diff/linalg.py:L122-L125, diff_diff/linalg.py:L139-L141). Numerically near-singular yet still full-rank cells can therefore still hit the IF guard under "error" without an upstream exception. This is a wording/contract issue, not a numerical bug. Concrete fix: narrow the note to exact rank-deficient designs, or explicitly say that near-singular full-rank IF breads/Hessians may still be rank-guarded under "error"; update the matching test comments in tests/test_staggered.py:L1442-L1447, tests/test_methodology_triple_diff.py:L1666-L1670, and tests/test_methodology_staggered_triple_diff.py:L768-L771.
  • No other methodology findings.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • Severity: P3 (tracked). Impact: the remaining structural near-singular inverse surface is explicitly deferred in TODO.md:L83-L84, so the PR’s narrowed scope is properly tracked and non-blocking. Concrete fix: none for approval; handle the listed non-covariate inverse sites in a follow-up.

Security

  • No findings.

Documentation/Tests

  • No coverage findings on the changed surface; the new tests are targeted and appropriate for the helper contract plus CS/TD/SDDD estimator paths.
  • Static review only: I could not execute the test suite here because pytest and numpy are unavailable in this environment.

…/tests

The REGISTRY note and the three test_error_mode_raises_before_rank_guard
comments said "error" raises upstream so the IF rank-guard is reached only
under warn/silent. That overstates: the upstream gate (_detect_rank_deficiency)
uses a 1e-7 *design* threshold, while the IF guard uses a stricter 1e-10
*equilibrated-Gram* threshold — and a Gram squares X's condition number, so the
IF guard column-drops once X's singular-value ratio falls below ~1e-5, well
above the design's 1e-7. A cell that is near-singular yet still design-full-rank
therefore passes the "error" gate without an exception and is still IF-column-
dropped (the guard does not re-raise; the aggregate warning still fires under
error/warn). Reword the REGISTRY enforcement paragraph and the three test
comments to say "error" blocks design-rank-deficient covariates (the exactly-
collinear duplicates these tests use), not every near-singular IF bread.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: f0a5e6f46edeacd8e300c03af74fe68e38e24c7c


Overall Assessment

✅ Looks good

Executive Summary

  • The affected methodology surface is limited to analytical influence-function SE linear algebra in CallawaySantAnna, TripleDifference, and StaggeredTripleDifference; identification, weighting, aggregation, and the published IF formulas are otherwise unchanged (diff_diff/linalg.py:L315-L434, diff_diff/staggered.py:L194-L222, diff_diff/triple_diff.py:L1176-L1189, diff_diff/staggered_triple_diff.py:L1474-L1489, diff_diff/staggered_triple_diff.py:L1575-L1589).
  • The prior re-review item is resolved: REGISTRY.md now explicitly documents that near-singular but design-full-rank cells can still hit the IF guard under rank_deficient_action="error", and the matching test comments were updated (docs/methodology/REGISTRY.md:L389-L389, tests/test_staggered.py:L1444-L1456, tests/test_methodology_triple_diff.py:L1668-L1679, tests/test_methodology_staggered_triple_diff.py:L770-L783).
  • No unmitigated P0/P1 issues found. At the source-material level, these methods are still RA/IPW/DR influence-function estimators; I did not find a paper-level requirement for a specific numerical matrix inversion routine, so the rank guard reads as a documented numerical stabilization choice rather than an undocumented methodology change. (arxiv.org)
  • The CallawaySantAnna NaN fail-closed fix appears complete on the changed surface: all three affected var_psi branches now propagate se=np.nan for non-finite IF variance instead of masking rank-0 bread as 0.0 (diff_diff/staggered.py:L2725-L2735, diff_diff/staggered.py:L3033-L3043, diff_diff/staggered.py:L3099-L3109).
  • Edge-case coverage is materially stronger: helper contract, NaN propagation, survey-weighted paths, silent/error behavior, not-yet-treated control paths, and cell-specific aliasing are all exercised (tests/test_linalg.py:L2082-L2186, tests/test_staggered.py:L1205-L1456, tests/test_methodology_triple_diff.py:L1585-L1743, tests/test_methodology_staggered_triple_diff.py:L644-L818, tests/test_staggered_triple_diff.py:L612-L689).

Methodology

  • Severity: P3 (informational). Impact: The PR intentionally replaces exact inverse/solve calls with an equilibrated, rank-guarded column-drop inverse for near-singular IF bread/Hessian matrices in CallawaySantAnna, TripleDifference, and StaggeredTripleDifference (diff_diff/linalg.py:L315-L434, diff_diff/staggered.py:L2700-L2735, diff_diff/triple_diff.py:L1176-L1189, diff_diff/triple_diff.py:L1690-L1864, diff_diff/staggered_triple_diff.py:L1474-L1489, diff_diff/staggered_triple_diff.py:L1575-L1589). Cross-checking the cited source material, the papers define the estimators and IF-based inference families but do not prescribe a specific numerical factorization for these nuisance inverses; the repo-specific deviation from raw-pivot / R column selection is now explicitly documented in docs/methodology/REGISTRY.md:L389-L389, docs/methodology/REGISTRY.md:L2096-L2105, and docs/methodology/REGISTRY.md:L2160-L2162. (arxiv.org) Concrete fix: none.
  • No other methodology findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity: P3 (tracked). Impact: Remaining non-blocking follow-ups are explicitly tracked in TODO.md:L83-L84 (CS/SDDD local OR point-estimate scale-equilibration and separate structural inverse sites), so they do not block approval under the repo’s deferred-work policy. Concrete fix: none for approval.

Security

No findings.

Documentation/Tests

  • No findings on the changed surface.
  • Static review only: I could not execute the tests here because pytest and numpy are unavailable in this environment.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 31, 2026
…-dup tests

test_exact_duplicate_covariate[reg] failed in CI (both Pure Python Fallback and
the Rust matrix) at the mixed-scale (xbig = 1e8*x1) order-invariance assertion
big_ab.overall_se ≈ big_ba.overall_se (rtol=1e-9). It passed locally (bit-
identical across column orders on Accelerate BLAS) but CI's LAPACK showed a ~21%
swing. Root cause: under reg the SE follows the un-equilibrated local OR solve
(TODO-82), whose solution for the near-singular 1e8-scale X'WX (condition ~1e32
after squaring) is roundoff- and column-order-dependent — a point-estimate
pathology, not the rank-guard. ipw/dr are genuinely order-invariant (variance
flows through the equilibrated rank-guarded inverse) and passed CI.

Assert only finiteness for reg at mixed scale; keep the tight order-invariance
assertion for ipw/dr. Apply the same guard to the survey-weighted variant
(same un-equilibrated reg OR solve). Well-scaled exact-duplicate == drop-one is
unchanged (stable, passed CI). Test-only change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 5aa48e9d751e6614a981bc2278264133d2b957cf


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review scope: I did not find a regression against the previously approved rank-guarded SE work; the earlier REGISTRY.md clarification about rank_deficient_action="error" remains present at docs/methodology/REGISTRY.md:L389-L389.
  • The changed methodology surface remains confined to analytical influence-function SE linear algebra in diff_diff/linalg.py:L315-L434, diff_diff/staggered.py:L194-L222, diff_diff/triple_diff.py:L1176-L1189, diff_diff/triple_diff.py:L1690-L1865, and diff_diff/staggered_triple_diff.py:L1474-L1489, diff_diff/staggered_triple_diff.py:L1580-L1589.
  • The only substantive deviation I see is the equilibrated column-drop choice under mixed-scale exact collinearity, and that deviation is now explicitly documented in docs/methodology/REGISTRY.md:L389-L389, docs/methodology/REGISTRY.md:L2096-L2105, and docs/methodology/REGISTRY.md:L2160-L2162; the cited papers describe the RA/IPW/DR estimators and IF-based inference but do not appear to require a specific matrix-inversion routine, so this reads as a documented numerical stabilization rather than an undocumented methodology change. (arxiv.org)
  • The CallawaySantAnna NaN fail-closed fix appears complete on the changed surface: all three affected var_psi branches now propagate se=np.nan instead of 0.0 for non-finite IF variance in diff_diff/staggered.py:L2725-L2735, diff_diff/staggered.py:L3033-L3043, and diff_diff/staggered.py:L3099-L3109.
  • Coverage is materially stronger: helper-contract tests live in tests/test_linalg.py:L2081-L2189, and estimator-level tests now cover constant/duplicate covariates, survey-weighted paths, repeated cross-sections, warning suppression, and NaN propagation across tests/test_staggered.py:L1214-L1485, tests/test_methodology_triple_diff.py:L1595-L1742, tests/test_methodology_staggered_triple_diff.py:L657-L820, and tests/test_staggered_triple_diff.py:L612-L689.

Methodology

  • Severity: P3 (informational). Impact: _rank_guarded_inv() changes the IF-side generalized-inverse selection to an equilibrated column-drop rule on deficient Gram matrices (diff_diff/linalg.py:L315-L434), but that exact deviation is now documented in REGISTRY.md, and I did not find a paper-level requirement that these estimators use a specific numerical inverse rather than a numerically stable equivalent. Concrete fix: none. docs/methodology/REGISTRY.md:L389-L389, docs/methodology/REGISTRY.md:L2096-L2105, docs/methodology/REGISTRY.md:L2160-L2162. (arxiv.org)
  • No other methodology findings.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

  • Severity: P3 (tracked). Impact: the remaining CS/SDDD local OR point-estimate scale-equilibration gap is still explicitly documented and tracked, so it is non-blocking under the repo’s deferred-work policy. Concrete fix: follow TODO.md:L83-L84 by routing those estimator-local OR point-estimate solves through the shared scale-robust solver or adding local equilibration. docs/methodology/REGISTRY.md:L389-L389, docs/methodology/REGISTRY.md:L475-L475, TODO.md:L83-L84.

Security

No findings.

Documentation/Tests

No findings.

Static review only: I could not execute the test suite in this environment because numpy is unavailable, so my confidence in the added tests is based on code inspection rather than runtime validation.

@igerber igerber merged commit 839736d into main May 31, 2026
26 checks passed
@igerber igerber deleted the feature/dr-or-se-rank-guard branch May 31, 2026 21:08
@igerber igerber mentioned this pull request Jun 1, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant