Surface silent np.linalg.solve fallbacks across axis-A minor solver paths by igerber · Pull Request #334 · igerber/diff-diff

igerber · 2026-04-19T15:31:34Z

Summary

Phase 2 silent-failures audit — axis-A minor solver paths (findings #17, #18, #19), all Minor severity. Each site previously ran np.linalg.solve against a matrix that could be rank-deficient or near-singular with no user-facing signal.

Fix quick start guide output accuracy #17 StaggeredTripleDifference per-pair solve (staggered_triple_diff.py:1330): _compute_did_panel records a condition-number sample in an instance tracker on LinAlgError; fit() emits ONE aggregate UserWarning at the end listing affected (g, g_c, t) cells and the max condition number, instead of silently falling back to np.linalg.lstsq per pair. Tracker resets on repeat fit().
Fix result summary table to appropriately show the significance level in the table #18 EfficientDiD covariate sieve (efficient_did_covariates.py:253,401): Precondition-check the normal-equations matrix via np.linalg.cond before np.linalg.solve — near-singular A above 1/sqrt(eps) (≈ 6.7e7) is rejected explicitly. Partial-K skips now surface via UserWarning listing skipped K values, instead of being swallowed by continue.
Fix placebo test examples in README documentation #19 Survey TSL variance (survey.py:1450): compute_survey_vcov checks cond(X'WX) before the sandwich solve and warns above the 1/sqrt(eps) threshold so ill-conditioned bread matrices don't silently produce unstable SEs.

Sibling sites picked up via repo-wide linalg.solve pattern grep (per the pattern-check feedback memory):

two_stage.py:1768 — TSL variance bread silent lstsq fallback → now warns
two_stage_bootstrap.py:197 — multiplier bootstrap bread silent lstsq fallback → now warns

No behavioral change on well-conditioned inputs. Behavior change is strictly additive warnings + K-rejection precondition check that can only produce a fit that is at-least-as-well-conditioned as before.

Out of scope, to be evaluated in a follow-up if needed: staggered.py:_safe_inv helper (called many times in CS for analytical SE paths; silent lstsq fallback per call). The bread matrices it operates on are downstream of solve_ols which already signals rank deficiency via rank_deficient_action.

Methodology references (required if estimator / math changes)

Method name(s): Ortiz-Villavicencio & Sant'Anna (2025) StaggeredTripleDifference; Chen, Sant'Anna & Xie (2025) EfficientDiD; Binder (1983) / Lumley (2004) TSL survey variance
Paper / source link(s): REGISTRY.md sections §StaggeredTripleDifference (updated), §EfficientDiD (updated), §Survey Data Support / Taylor Series Linearization (updated)
Any intentional deviations from the source (and why): None. These are defensive signaling changes; numerical behavior on well-conditioned inputs is unchanged.

Validation

Tests added/updated:
- tests/test_staggered_triple_diff.py::TestStaggeredTripleDiffORSolveFallback (3 tests: collinear covariates warn, well-conditioned no-warning, no-covariates no-warning)
- tests/test_efficient_did.py::TestSievePartialKSkipWarning (3 tests: ratio-sieve partial skip warns, inverse-propensity partial skip warns, clean data no-warning)
- tests/test_survey.py::TestSurveyVcovIllConditionedWarning (2 tests: ill-conditioned X warns, well-conditioned no-warning)
- Repeat-fit idempotency verified manually: instance tracker resets cleanly between consecutive fit() calls on the same estimator.
Backtest / simulation / notebook evidence (if applicable): N/A — additive-warning PR.
Full tests/test_staggered_triple_diff.py, tests/test_efficient_did.py, tests/test_survey.py, tests/test_two_stage.py pass locally (379 tests).

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

…aths Addresses findings #17, #18, #19 from the Phase 2 silent-failures audit (axis A, all Minor). Each site previously ran np.linalg.solve against a matrix that could be rank-deficient or near-singular with no user-facing signal. - StaggeredTripleDifference: `_compute_did_panel` now appends a condition-number sample to an instance tracker on LinAlgError; `fit()` emits ONE aggregate UserWarning listing affected (g, g_c, t) cells and the max condition number instead of silently falling back to np.linalg.lstsq per pair. Tracker resets on repeat fit. - EfficientDiD covariate sieve (estimate_propensity_ratio_sieve, estimate_inverse_propensity_sieve): precondition-check the normal-equations matrix via np.linalg.cond before solve and reject K values above 1/sqrt(eps); partial-K skips now surface via UserWarning listing the skipped K values, instead of being swallowed by `continue`. - compute_survey_vcov: check cond(X'WX) before the sandwich solve; emit UserWarning above the 1/sqrt(eps) threshold so ill-conditioned bread matrices don't silently produce unstable variance estimates. Sibling sites picked up via repo-wide lstsq-fallback pattern grep (per the pattern-check feedback memory): - two_stage.py:1768 (TSL variance bread) - two_stage_bootstrap.py:197 (multiplier bootstrap bread) Both now warn before the silent lstsq fallback. Adds 8 targeted tests across test_staggered_triple_diff.py, test_efficient_did.py, and test_survey.py, covering collinear/ill-conditioned triggers and happy-path negatives. REGISTRY.md notes added for each affected estimator section. No behavioral change on well-conditioned inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-04-19T15:39:47Z

Overall Assessment

⚠️ Needs changes

Executive Summary

No unmitigated P0s in the diff.
The methodology-affecting changes that are in the diff are consistent with the new registry notes: EfficientDiD’s sieve K precheck is documented, and the StaggeredTripleDiff / Survey / TwoStage changes are warning-surface additions rather than undocumented estimator changes.
No new inline inference or partial-NaN anti-patterns were introduced in the touched files.
P1: the repo-wide silent-solver audit is incomplete. The unchanged helper diff_diff/staggered.py:L95-L100 still silently falls back from np.linalg.solve to np.linalg.lstsq on Callaway-Sant’Anna analytical inference paths, which is the same pattern this PR is trying to surface.
P3: the sieve helper docstrings lag the new condition-number rejection behavior.
P3: there is no direct regression coverage for the new TwoStage Stage-2 bread warnings. I also could not execute tests in this sandbox because pytest/numpy are unavailable.

Methodology

Severity: P1
Impact: Because this PR explicitly audits repo-wide silent solve fallbacks, leaving diff_diff/staggered.py:L95-L100 unchanged makes the fix incomplete. That helper still feeds analytical inference paths in Callaway-Sant’Anna, including propensity-score Hessian inversion at diff_diff/staggered.py:L2277-L2279, OR bread inversion at diff_diff/staggered.py:L2585-L2587, and other bread/Hessian sites such as diff_diff/staggered.py:L3201-L3208. Users can still get degraded analytical SE paths with no user-facing signal, and this is not tracked in TODO.md:L51-L85.
Concrete fix: Instrument _safe_inv() to warn on fallback, or aggregate a one-time warning at the estimator level, and add a focused regression test that forces one analytical path through that fallback.
Severity: P3-informational
Impact: EfficientDiD’s new cond(A) > 1/sqrt(eps) sieve precheck does change which K values are eligible for IC selection, but that deviation is explicitly documented in docs/methodology/REGISTRY.md:L745-L756 and matches the new code in diff_diff/efficient_did_covariates.py:L228-L310 and diff_diff/efficient_did_covariates.py:L404-L472. Under the review policy, this is documented and therefore not a defect.
Concrete fix: None.

Code Quality

No findings. I did not find new inline t = effect / se inference code or partial NaN guards in the changed files.

Performance

No material findings. The added cond() checks are on small normal-equation / bread matrices and are unlikely to dominate estimator runtime.

Maintainability

Severity: P3
Impact: The sieve helper docstrings still describe only singular-basis fallback and do not mention the new cond(A) > 1/sqrt(eps) rejection or partial-K warning surface, so the in-code documentation now lags the implemented behavior. See diff_diff/efficient_did_covariates.py:L161-L175 and diff_diff/efficient_did_covariates.py:L343-L356.
Concrete fix: Update both docstrings to mention the condition-number precheck, partial-K warnings, and all-K fallback semantics.

Tech Debt

No separate findings. No new deferred-work entry was added to TODO.md:L51-L85, so the remaining P1 above is currently untracked.

Security

No findings.

Documentation/Tests

Severity: P3
Impact: The diff adds new warning surfaces in diff_diff/two_stage.py:L1762-L1782 and diff_diff/two_stage_bootstrap.py:L191-L211, but there is no direct regression coverage for those exact Stage-2 bread fallbacks. The added tests in this PR cover EfficientDiD, StaggeredTripleDiff, and Survey only: tests/test_efficient_did.py:L2057-L2136, tests/test_staggered_triple_diff.py:L539-L634, tests/test_survey.py:L3342-L3410.
Concrete fix: Add one analytical and one bootstrap TwoStage test that forces a rank-deficient Stage-2 bread matrix and asserts both the warning text and finite returned variance/SE behavior.
Review limitation: I could not execute the test suite in this sandbox because pytest, numpy, and the project runtime deps are unavailable; this review is based on static inspection plus a no-write compile() syntax pass of the changed modules.

Path to Approval

Surface the remaining _safe_inv() fallback in Callaway-Sant’Anna analytical inference with a user-visible warning, or equivalent aggregated warning, instead of leaving it silent.
Add a regression test that drives at least one Callaway-Sant’Anna analytical path through that fallback and asserts the warning is emitted.

…oStage tests CI review on PR #334 flagged one P1 (incomplete audit of silent solve fallbacks) and two P3s. P1 — Callaway-Sant'Anna `_safe_inv()` sibling site: The shared `_safe_inv(A)` helper in staggered.py was still silently falling back from np.linalg.solve to np.linalg.lstsq. It feeds ~13 analytical SE paths (propensity-score Hessian, OR bread, event-study bread, etc.), so a rank-deficient design could silently ship degraded analytical SEs. Extended `_safe_inv(A, tracker: Optional[list] = None)` to append a condition-number sample on LinAlgError when a tracker is passed. Initialize `self._safe_inv_tracker: List[float] = []` at the top of `CallawaySantAnna.fit()`, thread `tracker=self._safe_inv_tracker` through all 13 callsites, and emit ONE aggregate UserWarning at the end of fit() listing the number of fallbacks and max condition number. Matches the tracker pattern established in STD finding #17. Added TestCallawaySantAnnaSafeInvFallback with two tests: collinear covariates trigger the aggregate warning; well-conditioned data emits no warning (happy-path regression guard). REGISTRY.md §CallawaySantAnna notes the new warning contract. P3 — Sieve docstrings lag behavior: Updated estimate_propensity_ratio_sieve and estimate_inverse_propensity_sieve docstrings to describe the new cond(A) > 1/sqrt(eps) precondition check, the partial-K skip warning, and the all-K fallback semantics. P3 — No TwoStage regression coverage: Added TestTwoStageStage2BreadWarning with two tests covering both the analytical TSL and bootstrap bread paths (contract: if the lstsq fallback triggers, it must warn). TODO.md: logged honest_did.py:1907 basis-enumeration skip as an intentional algorithm behavior (not a silent failure per the Phase 2 audit definition) but notable for a future diagnostic enhancement. No behavioral change on well-conditioned inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-19T16:05:46Z

/ai-review

github-actions · 2026-04-19T16:12:57Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 90a38316506e2604030627ffba091edb8f441eae

Overall Assessment

⚠️ Needs changes

Executive Summary

The previous P1 on CallawaySantAnna._safe_inv() appears addressed: the fallback is now tracked, surfaced once per fit, and documented in the Methodology Registry.
EfficientDiD’s new sieve cond(A) rejection is documented in REGISTRY.md, so that methodology deviation is informational rather than defective.
Severity: P1 [Newly identified]. StaggeredTripleDifference still has a silent lstsq fallback in the propensity-score Hessian path, so the silent-solver audit is incomplete for IPW/DR inference.
Survey and TwoStage changes are additive warning surfaces; I did not find new inline-inference or partial-NaN anti-patterns in the touched code.
Severity: P3. The new TwoStage tests do not actually force the warned Stage-2 bread path, so that warning surface is still effectively unverified.
Review limitation: pytest is not installed in this sandbox, so this is a static review only.

Methodology

Severity: P1 [Newly identified]. Impact: StaggeredTripleDifference still silently falls back from np.linalg.inv(XWX) to np.linalg.lstsq(...) in the propensity-score Hessian inversion path at diff_diff/staggered_triple_diff.py:L1462-L1470. That Hessian is then used in the PS influence-function correction at diff_diff/staggered_triple_diff.py:L1331-L1338, so rank-deficient PS designs can still silently degrade IPW/DR inference. This is part of the documented estimator formulation (hessian = (X'WX)^{-1} * n_pair) at docs/methodology/REGISTRY.md:L1802-L1809, but the StaggeredTripleDifference registry note added in this PR only documents the OR-side fallback, not this PS-side one, at docs/methodology/REGISTRY.md:L1694-L1701. This also is not tracked in the existing StaggeredTripleDifference TODO items at TODO.md:L71-L73. Concrete fix: surface the PS-Hessian fallback with the same aggregated warning pattern used for the OR solve, update the registry note if the fallback remains intentional, and add a regression test that forces collinear covariates under estimation_method="ipw" or "dr" and asserts the warning.
Severity: P3-informational. Impact: EfficientDiD’s new sieve cond(A) > 1/sqrt(eps) rejection is explicitly documented in docs/methodology/REGISTRY.md:L746-L757 and matches the implementation in diff_diff/efficient_did_covariates.py:L264-L318 and diff_diff/efficient_did_covariates.py:L444-L488. Under the stated review policy, this is not a defect. Concrete fix: None.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P3. Impact: The new TwoStage warnings tell users to “drop collinear covariates” at diff_diff/two_stage.py:L1774-L1778 and diff_diff/two_stage_bootstrap.py:L203-L207, but the warned matrix is X_2'WX_2, and X_2 is built from Stage-2 treatment/horizon/group indicators, not the covariate block, at diff_diff/two_stage.py:L1141-L1146, diff_diff/two_stage.py:L1304-L1320, and diff_diff/two_stage.py:L1431-L1442. That guidance points users at the wrong failure mode. Concrete fix: revise the warning text to reference singular or zero-weight second-stage indicator columns, and ideally mention which aggregation path triggered it.

Tech Debt

Severity: P3-informational. Impact: The new honest_did.py diagnostic follow-up is properly tracked in TODO.md:L85, so it does not block this PR. Concrete fix: None.

Security

No findings.

Documentation/Tests

Severity: P3. Impact: The new TwoStage tests do not directly exercise the warning path they claim to cover. They create collinear z1/z2 covariates at tests/test_two_stage.py:L1450-L1457, then explicitly allow the warning not to fire at tests/test_two_stage.py:L1479-L1493 and tests/test_two_stage.py:L1517-L1519. But those covariates do not enter the warned Stage-2 bread matrix X_2'WX_2 at all; X_2 is built only from Stage-2 indicators at diff_diff/two_stage.py:L1141-L1146, diff_diff/two_stage.py:L1304-L1320, and diff_diff/two_stage.py:L1431-L1442. So the new warning sites in diff_diff/two_stage.py:L1762-L1782 and diff_diff/two_stage_bootstrap.py:L191-L211 remain unverified. Concrete fix: add a direct test that makes X_2'WX_2 singular, e.g. an event-study/group case with an all-zero or all-zero-weight Stage-2 column, or a lower-level unit test of the variance helpers.
Review limitation: pytest is unavailable in this sandbox (pytest: command not found), so I could not execute the targeted tests.

Path to Approval

Surface the remaining StaggeredTripleDifference PS-Hessian fallback in _compute_pscore() with a user-visible warning, preferably using the same aggregate-warning pattern already added for the OR fallback.
Add a regression test that forces that PS-Hessian path under estimation_method="ipw" or "dr" and asserts the warning is emitted.

…+ stronger tests Round 2 CI review surfaced one P1 and two P3s. P1 — StaggeredTripleDifference PS-Hessian fallback: `_compute_pscore` had a second silent lstsq fallback (np.linalg.inv(X'WX) → lstsq on LinAlgError) that was missed in the initial PR. Under IPW/DR inference a rank-deficient propensity-score design could silently degrade influence-function corrections. Added a separate `self._ps_lstsq_fallback_tracker` alongside the existing OR tracker; `_compute_pscore` appends a condition-number sample per LinAlgError. `fit()` emits a sibling aggregate UserWarning with cell count + max condition number. Added TestStaggeredTripleDiffORSolveFallback ::test_collinear_covariates_emit_ps_hessian_warning which forces the PS-Hessian path under estimation_method="ipw". REGISTRY note added. Also scoped the existing OR-side test to the OR message text so the two aggregate warnings don't collide in the assertion. P3 — TwoStage warning text accuracy: Reviewer correctly pointed out that "drop collinear covariates" was misleading because X_2 is the Stage-2 indicator design (treatment, event-time, or group dummies), not user covariates. Reworded both the analytical and bootstrap warnings to name the actual failure mode (zero- weight or all-zero indicator column from an aggregation path with no qualifying observations). P3 — TwoStage tests were not verifying the warning path: Reviewer noted that my previous z1/z2 collinear-covariate tests never reached X_2'WX_2 at all because user covariates go into Stage 1, not Stage 2. Rewrote both tests to patch np.linalg.solve and raise LinAlgError specifically on the `solve(X'WX, np.eye(k))` shape, forcing the Stage-2 bread fallback. Tests now directly assert the warning fires and the lstsq fallback still produces finite SEs. No behavioral change on well-conditioned inputs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-04-19T16:42:39Z

/ai-review

github-actions · 2026-04-19T16:47:46Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 099507dc1987605a8cc9087c7c9c8ee6e1b2668d

Overall Assessment

Looks good

Executive Summary

All previously raised findings appear addressed in the changed code.
The prior P1 on the silent StaggeredTripleDifference propensity-score Hessian fallback is resolved: the fallback is now surfaced once per fit(), documented in the Methodology Registry, and covered by a regression test in the IPW path.
EfficientDiD's new sieve cond(A) rejection changes behavior only in a documented registry-noted way, so it is informational rather than a methodology defect.
The previous TwoStage warning-text and test-reachability concerns are resolved: the warning now points to the singular Stage-2 indicator design, and the tests directly force both analytical and bootstrap bread fallback handlers.
I did not find new incorrect variance/SE handling, NaN-inference regressions, control-group errors, or parameter-propagation gaps in the touched code.
Review limitation: pytest is unavailable in this sandbox, so this is a static review only.

Methodology

Severity: P3-informational. Impact: The prior StaggeredTripleDifference P1 is resolved. The PR now surfaces the per-pair propensity-score Hessian fallback once per fit(), documents that behavior in the Methodology Registry, and adds an IPW regression test for the warning surface. See diff_diff/staggered_triple_diff.py:L537-L555, diff_diff/staggered_triple_diff.py:L1488-L1501, docs/methodology/REGISTRY.md:L1701-L1702, and tests/test_staggered_triple_diff.py:L590-L631. Concrete fix: None.
Severity: P3-informational. Impact: EfficientDiD's new cond(A) > 1/sqrt(eps) K-rejection is explicitly documented in the Methodology Registry and matches the implementation and tests, so under the review policy it is not a defect. See diff_diff/efficient_did_covariates.py:L175-L181, diff_diff/efficient_did_covariates.py:L264-L318, diff_diff/efficient_did_covariates.py:L366-L372, docs/methodology/REGISTRY.md:L757-L757, and tests/test_efficient_did.py:L2067-L2135. Concrete fix: None.
Severity: P3-informational. Impact: CallawaySantAnna now aggregates _safe_inv() analytical-SE fallbacks into one fit-level warning, and all touched _safe_inv() call sites in the changed file thread the tracker through. The registry note matches the implementation, so I do not see an undocumented methodology deviation here. See diff_diff/staggered.py:L95-L118, diff_diff/staggered.py:L1457-L1461, diff_diff/staggered.py:L2003-L2021, docs/methodology/REGISTRY.md:L300-L304, and tests/test_staggered.py:L4171-L4232. Concrete fix: None.
Severity: P3-informational. Impact: The survey and TwoStage changes are additive warning surfaces around existing SE and bread behavior, with corresponding registry notes. I did not find a new undocumented variance/SE mismatch or NaN-inference regression in the touched paths. See diff_diff/survey.py:L1448-L1477, diff_diff/two_stage.py:L1762-L1789, diff_diff/two_stage_bootstrap.py:L191-L217, docs/methodology/REGISTRY.md:L1178-L1181, docs/methodology/REGISTRY.md:L2727-L2732, tests/test_survey.py:L3350-L3412, and tests/test_two_stage.py:L1449-L1548. Concrete fix: None.

Code Quality

No findings.

Performance

No findings.

Maintainability

No findings.

Tech Debt

Severity: P3-informational. Impact: The out-of-scope honest_did.py silent basis-rejection diagnostic is now explicitly tracked in TODO.md, so the remaining follow-up is properly deferred and non-blocking under the stated policy. See TODO.md:L85-L85 and diff_diff/honest_did.py:L1906-L1909. Concrete fix: None.

Security

No findings.

Documentation/Tests

Severity: P3-informational. Impact: The previous TwoStage test gap is addressed: the new tests now force the analytical and bootstrap bread fallback handlers directly, and the new StaggeredTripleDifference, CallawaySantAnna, EfficientDiD, and survey tests each cover the warning surface introduced by this PR. See tests/test_two_stage.py:L1449-L1548, tests/test_staggered_triple_diff.py:L546-L679, tests/test_staggered.py:L4171-L4232, tests/test_efficient_did.py:L2064-L2135, and tests/test_survey.py:L3350-L3412. Concrete fix: None.
Review limitation: pytest is not installed in this sandbox (pytest: command not found), so I could not execute the targeted tests locally.

PR #330 marked `test_timing_performance` and `TestPerformanceRegression` with `@pytest.mark.slow`, which the default pytest `addopts = "-m 'not slow'"` already excludes. That catches the default Python CI matrix but misses the Rust-backend CI jobs at `.github/workflows/rust-test.yml:155, 162, 190`, which explicitly override the marker filter with `-m ''` so they can exercise the full slow suite (intentional — TROP parity tests live there). That's why our PR #334 tripped a 0.120s vs 0.1s threshold on Windows py3.11 under the Rust backend. Add a `skipif(os.environ.get("CI") == "true", ...)` marker in addition to `@pytest.mark.slow` on the affected tests: - `test_se_accuracy.py::TestCallawaySantAnnaSEAccuracy::test_timing_performance` - `test_se_accuracy.py::TestPerformanceRegression` (class-level) - `test_methodology_honest_did.py::TestOptimalFLCI::test_m0_short_circuit` GitHub Actions sets `CI=true` on every runner, so the skip covers both the default-CI and Rust-CI invocation patterns. Local development flows (`pytest`, `pytest -m slow`, `pytest -m ''`) are unaffected — no `CI` env var means the tests still run as on-demand performance sanity. The `test_m0_short_circuit` case is special: it uses wall-clock time as a proxy for "short-circuit path taken" (fast path <0.5s, slow optimization would blow past that). The existing PR #330 TODO.md entry already tracks replacing it with a mock/spy; the `skipif` here is the interim guard until that refactor lands. Verified locally: `CI=true pytest ... -m ''` reports 5 skipped (all three targets); unset `CI` and the tests run and pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…-failures audit Packages 161 commits across 18 PRs since v3.1.3 as minor release 3.2.0. Per project SemVer convention, minor bumps are reserved for new estimators or new module-level public API — BusinessReport / DiagnosticReport / DiagnosticReportResults (PR #318) add a new public API surface and drive this bump. Headline work: - PR #318 BusinessReport + DiagnosticReport (experimental preview) - practitioner- ready output layer. Plain-English narrative summaries across all 16 result types, with AI-legible to_dict() schemas. See docs/methodology/REPORTING.md. - PR #327, #335 did-no-untreated foundation - kernel infrastructure, local linear regression, HC2/Bell-McCaffrey variance, nprobust port. Foundation for the upcoming HeterogeneousAdoptionDiD estimator. - PR #323, #329, #332 dCDH survey completion - cell-period IF allocator (Class A contract), heterogeneity + within-group-varying PSU under Binder TSL, and PSU-level Hall-Mammen wild bootstrap at cell granularity. - PR #333 performance review - docs/performance-scenarios.md documents 5-7 realistic practitioner workflows; benchmark harness extended. Silent-failures audit closeouts (PRs #324, #326, #328, #331, #334, #337, #339) continue the reliability work started in v3.1.2-3.1.3 across axes A/C/E/G/J. CI infrastructure: PRs #330 and #336 exclude wall-clock timing tests from default CI after false-positive flakes; perf-review harness is the principled replacement. Version strings bumped in diff_diff/__init__.py, pyproject.toml, rust/Cargo.toml, diff_diff/guides/llms-full.txt, and CITATION.cff (version: 3.2.0, date-released: 2026-04-19). CHANGELOG populated with Added / Changed / Fixed sections and the comparison-link footer. CITATION.cff retains v3.1.3 versioned DOI in identifiers; the v3.2.0 versioned DOI will be minted by Zenodo on GitHub Release and added in a follow-up. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber added the ready-for-ci Triggers CI test workflows label Apr 19, 2026

igerber merged commit 4a48487 into main Apr 19, 2026
22 of 24 checks passed

igerber deleted the fix/axis-a-minor-solver-paths branch April 19, 2026 18:18

igerber mentioned this pull request Apr 19, 2026

Skip wall-clock timing tests on CI to stop false-positive flake #336

Merged

igerber mentioned this pull request Apr 20, 2026

Release 3.2.0: BusinessReport preview, dCDH survey completion, silent-failures audit #342

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Surface silent np.linalg.solve fallbacks across axis-A minor solver paths#334

Surface silent np.linalg.solve fallbacks across axis-A minor solver paths#334
igerber merged 3 commits intomainfrom
fix/axis-a-minor-solver-paths

igerber commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

igerber commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

igerber commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Apr 19, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

igerber commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

igerber commented Apr 19, 2026

Uh oh!

github-actions Bot commented Apr 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant