Add LSMR/MLSMR Krylov solver alongside CG by schroedk · Pull Request #25 · py-econometrics/within

schroedk · 2026-04-28T14:07:53Z

Summary

Adds LSMR (Fong & Saunders) and a preconditioned variant as
KrylovMethod::Lsmr { local_size }, fully wired through to the Python
API as within.LSMR(tol, maxiter, local_size). CG remains the default;
LSMR is opt-in via config=LSMR(...).

The motivation is accuracy on ill-conditioned designs. CG's residual
stopping rule under-resolves rank-deficient fixed-effects systems
(sparse-mobility AKM, multi-factor high-cardinality panels), while
LSMR's normal-equation stop bounds the demean error directly.

Headline benchmark

Sparse-mobility AKM, 5 M obs (worker = 833 k × firm = 25 k, 1 % mobility):

method	tol	per_solve	demean_rel	demean_max
CG	1e-12	692 ms	5.73e-10	1.82e-8
LSMR	1e-12	773 ms	4.98e-12	1.96e-10

~100× tighter demean at +12 % wall time. Tightening CG by one tolerance
magnitude does not close the gap on this shape (CG-1e-13 still
trails LSMR-1e-12 by ~10× in both relative L2 and pointwise max).

On well-conditioned designs (1-/2-/3-FE panels, AKM with 10 % mobility)
the two methods are within 10–25 % on time, and CG with one-magnitude
tighter tolerance does match LSMR's accuracy. CG remains the right
default for those cases.

Full sweep over 1fe-highcard / 2fe-panel / 3fe-panel / akm / akm-sparse / 3fe-highcard / 4fe-chain at {500 k, 1 M, 5 M} obs:
crates/within/examples/lsmr_vs_cg.rs.

Diff buckets

Solver core — crates/schwarz-precond/src/solve/lsmr/{,bidiag,recurrence,tests}.rs
within wiring + Python bindings — crates/within/src/{solver,operator,config}.rs,
crates/within-py/src/lib.rs, python/within/{__init__.py,_within.pyi}
Examples and bench-suite additions — ~850 lines, splittable to a
follow-up PR if reviewer scope is tight

Modified LSMR (Fong & Saunders 2011) using the Modified Golub-Kahan bidiagonalization (Arridge et al. 2014), which requires only one M^-1 application per iteration — the preconditioner M ≈ A^T A is applied directly with no square-root factorization needed. Exposed as `schwarz_precond::solve::lsmr::mlsmr`. Covered by unit tests for unpreconditioned, preconditioned, inconsistent, and maxiter paths.

- KrylovMethod::Lsmr variant dispatched inline in Solver::solve. - WeightedDesignOperator adapts WeightedDesign to the rectangular A = W^{1/2}D operator MLSMR expects; its normal equations A^T A coincide with the Gramian, so the existing Schwarz preconditioner carries over unchanged. - Python: new LSMR config class (tol, maxiter). CG-style preconditioner validation now also covers LSMR. - Tests: orchestrate_solve covers weighted LSMR and LSMR-vs-CG agreement.

The old threshold `EPS * rz_init` fired when `rz_new/rz_init < EPS`, i.e. `||r||/||b|| < sqrt(EPS) ~ 1.5e-8`, colliding with user tolerances near 1e-8 and cutting off CG above the attainable-accuracy floor of `||r||/||b|| ~ u` for well-conditioned problems. The new threshold EPS^2 * rz_init fires only when rz reaches true numerical noise (`||r|| ~ EPS*||b||`), restoring legitimate convergence near sqrt(EPS) tol. Resolves the `prop_single_factor_converges` property-test failure.

Documents the LSMR rectangular least-squares solver and the CG stagnation guard fix landed on this branch.

Include LSMR(Schwarz) alongside CG(Schwarz) and GMRES(Mult-Schwarz) in standard_solver_configs so every comparison suite exercises all three Krylov methods. Thread an optional maxiter override through standard_solver_configs for suites that raise the iteration cap. Suites with the plain CG+GMRES pair (preconditioners_3fe, high_fe, high_fe_scaling) now delegate to standard_solver_configs. preconditioner_comparison keeps its CG(none) baseline and appends the standard trio. Suites whose purpose is a different comparison axis (fixest_comparison exact/approx Schur, ac_comparison AC/AC2 local solver, graph_backend_comparison) append the standard trio alongside their existing configs with clearer label naming.

Fuse and parallelize the per-iteration axpy-with-sq-norm, axpby, dot, and the h̄/x/h/v update block behind a 10k-element threshold with 4096-sized chunks. Small problems stay sequential to avoid rayon dispatch overhead; the chunk size keeps each task L1-resident, which beat alternative schemes (per-thread chunks, always-parallel) in profiling on 1M/5M-obs 3FE panels.

- Cargo profile `profiling` inherits release + line-table debug info so samply/perf/dtrace can resolve stack frames without LTO symbol loss. - samply added to pixi dev deps. - Two new examples under within/: lsmr_profile (single-shape 3FE DGP for samply capture) and lsmr_vs_cg (head-to-head CG vs LSMR across shapes and tolerances, reporting wall time and demean error vs a reference).

Restructure lsmr.rs so the code mirrors the four blocks Fong & Saunders write in Algorithm 2.8: bidiagonalization, P̂_k, P̄_k, and the (x, h, h̄) recurrence. - Add `Givens { c, s, r }` with `Givens::new(a, b)` so each rotation construction is one named call, not three lines duplicated twice. - Introduce a `Bidiagonalization` trait with two impls. `GolubKahan` handles the unpreconditioned path with no `p_tilde` buffer and no `M⁻¹` apply; `ModifiedGolubKahan` (renamed from `MgkBidiag`) keeps the M-norm dot-product trick. The choice of preconditioner is now confined to picking which bidiagonalization to construct. - `LsmrRecurrenceState::step` shrinks to constructing P̂_k and P̄_k and committing chain state; it returns a `RotationStep` of natural rotation outputs (ρ, ρ̄, θ_new, θ̄, ζ) instead of pre-baked solution coefficients. - `SolutionState::update` takes the current and previous `RotationStep` and derives its own (t_x, t_hbar, t_h) ratios at the point of use. `RotationStep::initial()` seeds the first iteration so `h̄₀ = 0` falls out without a special case. - `mlsmr` dispatches `None`/`Some` once into a generic `lsmr_from_bidiag` that knows nothing about `M`. No behavioral change; all four LSMR tests pass.

…ence `mlsmr` with `None` (GolubKahan) and with `Some(&IdentityOperator)` (ModifiedGolubKahan with M = I) are mathematically the same algorithm. Add a test that runs both on the existing 4x3 `OverdeterminedOp` system and asserts the iteration counts, solutions, and residual-norm estimates agree to 1e-10. Guards against future drift between the two bidiagonalization implementations.

Adds local (windowed) modified Gram-Schmidt to both Golub-Kahan and Modified Golub-Kahan bidiagonalizations, restoring v-orthogonality on ill-conditioned problems where the short recurrence drifts. Surfaced as `local_size: Option<usize>` on `KrylovMethod::Lsmr` and the Python `LSMR` config. `None` (default) is the zero-cost no-op fast path. Includes a Vandermonde demo example, regression and orthogonality tests for the windowed path, and a Python sweep script comparing window sizes across panel-data topologies.

Three follow-ups from review of the windowed-reorth change: * `mlsmr` no longer treats `‖b‖ < EPSILON` as exactly zero — uses `b_norm == 0.0` instead, so scaled-small RHS vectors flow through the normal solve path rather than silently returning x = 0 with a false residual of 0. * `LocalReorth::new` and `ModifiedLocalReorth::new` now clamp ring capacity by `min(m, n)` instead of just `n`, matching the actual bound on the bidiagonalization sequence and avoiding unnecessary n-length allocations on wide rectangular operators. * New `test_mlsmr_none_matches_identity_precond_windowed` extends the existing None-vs-IdentityOperator equivalence guarantee to the windowed path, guarding the M-weighted scaling logic in `ModifiedLocalReorth::push` against future drift.

`solve/lsmr.rs` had grown to ~1500 lines mixing the bidiagonalization machinery, the Givens/recurrence/convergence state machines, and a ~500-line inline test module. Split along the producer/consumer seam: - `lsmr.rs` (~90 lines) — facade with `mlsmr`, `LsmrResult`, the `lsmr_from_bidiag` driver, and the shared parallel-kernel constants. - `lsmr/bidiag.rs` — the `(α, β)` stream: `Bidiagonalization` trait, `GolubKahan`/`ModifiedGolubKahan`, `LocalReorth`/`ModifiedLocalReorth`, and the vector kernels they consume. - `lsmr/recurrence.rs` — the consumer: `Givens`, `RotationStep`, `LsmrRecurrenceState`, `SolutionState`, `ConvergenceState`, `Stop`. - `lsmr/tests.rs` — the existing test suite, unchanged. Public API frozen: `schwarz_precond::solve::lsmr::{mlsmr, LsmrResult}` still resolves identically for `within::solver` and the `lsmr_local_reorth` example.

The "0 = disabled" convention was carried via empty `slots` Vecs and `is_empty()` short-circuits in `reorthogonalize`/`push`. Encoding the disabled state as `Option<LocalReorth>` / `Option<ModifiedLocalReorth>` on the buffer puts it in the type, drops the sentinel branches, and removes the `is_empty` helpers.

…rage) Comment-only changes to bidiag.rs and recurrence.rs (algebra unchanged): - bidiag: debug_assert alpha>0 before p_coeff division; restructure ModifiedGolubKahan::step comments into three self-contained phases - recurrence: doc-comment Stop enum and variants; expand Givens-rotation comments to map onto Fong & Saunders Algorithm 2.8; document why absolute f64::EPSILON guards are acceptable in SolutionState::update Test improvements: - Extract normal_equation_residual helper, reuse across 3 sites - Relax iter-count assert in none/identity equivalence test to <=1 - Replace exact equality on r1.x / residual_norm with <1e-15 tolerance to keep determinism testable under future parallel reductions - Add test_mlsmr_step1_alpha_zero_early_exit covering the A^T b == 0 path - Add test_mlsmr_local_reorth_window_boundary_sizes for window in {1, n_cols, n_cols+1} on Vandermonde(30,12) - Rename closure param size -> window_size; add cond(A) ~ 1e10 preface

- Drop lsmr_profile (samply harness), lsmr_local_reorth (one-off reorth-window study), and bench_lsmr_local_size.py (manual local-size sweep). All three were exploration scaffolding; their findings are baked into the solver defaults and the lsmr/cg comparison in lsmr_vs_cg. - Extend lsmr_vs_cg with akm-sparse / 3fe-highcard / 4fe-chain shapes and CG tol=1e-13 to back the PR's headline accuracy claims.

- Expose lsmr() and preconditioned_lsmr() as standalone entry points; mlsmr() remains as a thin Option-dispatched wrapper for the internal round-trip test and the existing public API. - Add LsmrStopReason for diagnostics, splitting Stop::Converged into ResidualTolerance / NormalEquationTolerance and surfacing ZeroRhs / InitialNormalEquationResidualZero / BidiagonalizationBreakdown / MaxIterations. - Add SolveError::InvalidInput, validate rhs length / finiteness, tol, and (preconditioned) shape before any work runs. - Convergence wins over breakdown when both fire on the same step; the user-supplied tolerance is the contract. - Factor the zero-rhs early-return path into zero_rhs_result(n). - Tests: cover underdetermined / rank-deficient / zero-column-and-row systems, mid-stream beta=0 lucky breakdown, zero-rhs and bad-input rejection, and dispatcher equivalence.

- Replace Vec<Vec<f64>> reorth slots with a single flat Vec<f64> and index helpers. One allocation per LocalReorth / ModifiedLocalReorth instead of `cap`; better locality for the windowed MGS pass. - Use par_dot(b, b).sqrt() in init() instead of vec_norm(b); par_dot falls back to sequential below LSMR_PAR_THRESHOLD. - Handle the beta == 0 lucky breakdown explicitly: zero v (and the paired p_tilde in the modified path) and return alpha = 0 so the caller's solution.update is a no-op. The outer solver's alpha == 0 branch then exits cleanly with BidiagonalizationBreakdown.

Make future variant additions to SolveError non-breaking for downstream Rust consumers. Bundle the attribute with the InvalidInput variant added earlier in this branch so the breakage happens once, not twice. Also fold in CHANGELOG entries for the non_exhaustive/InvalidInput change and the iterations-semantics shift (now includes refinement iterations), and patch a duplicated docstring line + missing time_total entry in the SolveResult attributes block of _within.pyi.

schroedk added 14 commits April 23, 2026 10:15

docs: add Unreleased section to CHANGELOG

1a9061e

Documents the LSMR rectangular least-squares solver and the CG stagnation guard fix landed on this branch.

schroedk force-pushed the feature/mlsmr-mlsqr branch from 98cdaf4 to 8cddfb0 Compare April 28, 2026 14:27

schroedk force-pushed the feature/mlsmr-mlsqr branch from 8cddfb0 to 9c98f2d Compare April 28, 2026 14:31

schroedk added 3 commits April 29, 2026 09:21

schroedk merged commit ef3ad04 into main Apr 29, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add LSMR/MLSMR Krylov solver alongside CG#25

Add LSMR/MLSMR Krylov solver alongside CG#25
schroedk merged 18 commits intomainfrom
feature/mlsmr-mlsqr

schroedk commented Apr 28, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

schroedk commented Apr 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Headline benchmark

Diff buckets

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

schroedk commented Apr 28, 2026 •

edited

Loading