feral v0.6.0

@jkitchin jkitchin released this 23 May 22:31
· 214 commits to main since this release

Added — Solver::with_scaling() builder ([#51][i51])

New public builder method on Solver to pin the scaling strategy
explicitly. Lets IPM hosts override ScalingStrategy::Auto when
they already know what they want — recommended escape hatch when
the picker's per-matrix heuristic is wrong for a problem class.
Default (Auto) is now sticky on cached pattern (see "Fixed" below)
so the picker no longer flaps; this builder is for callers that
want full control.
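
As a rough illustration of the builder shape described above, here is a minimal, self-contained sketch. The `Solver` and `ScalingStrategy` types below are hypothetical stand-ins that only borrow names from these notes; the real feral definitions, fields, and exact `with_scaling` signature are assumptions and may differ.

```rust
// Hypothetical stand-ins: names mirror the release notes, but the real
// feral types and signatures are not shown here and may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum ScalingStrategy {
    #[default]
    Auto,
    InfNorm,
    Mc64Symmetric,
    Identity,
}

#[derive(Debug, Default)]
struct Solver {
    // Assumed field name, for illustration only.
    scaling: ScalingStrategy,
}

impl Solver {
    fn new() -> Self {
        Self::default()
    }

    /// Consuming builder: pins the scaling strategy and hands the solver back.
    fn with_scaling(mut self, strategy: ScalingStrategy) -> Self {
        self.scaling = strategy;
        self
    }
}

fn main() {
    // Default construction leaves the strategy on Auto.
    let default_solver = Solver::new();
    // A host that already knows what it wants pins the strategy up front.
    let pinned = Solver::new().with_scaling(ScalingStrategy::InfNorm);
    println!("{:?} {:?}", default_solver.scaling, pinned.scaling);
}
```

A consuming `self -> Self` builder is assumed here because it chains naturally off `Solver::new()`; a `&mut self` setter would serve the same purpose.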

Fixed — sticky Auto scaling on cached pattern ([#51][i51])

Three coordinated fixes in src/numeric/solver.rs address a ~50×
numeric-phase slowdown on IPM workloads that occurs when the picker
re-routes across iterations, or when PartialSingular outcomes drop
the MC64 cache.

Reproducer (pounce/gams/nlpbench/feral_repro/powerflow22/,
n=2,813,976 IPM KKT, default Auto):

call                                        pre-fix   post-fix
factor (cold)                               54.77 s   54.42 s
refactor (cached symbolic, iter A values)    1.00 s    1.02 s
factor #2 (cached symbolic, iter B values)  53.80 s    1.07 s

The three fixes:

  1. Sticky Auto pick. First factor() on a pattern runs the full
    compute_scaling_auto_with_cache pipeline (preserves Policy-4
    fallback semantics — InfNorm-spread guard, off-diag-ratio guard,
    MC64 catastrophic-spread guard). Post-call we derive the resolved
    strategy from factors.scaling_info
    (Mc64FallbackToInfnorm → InfNorm,
    PartialSingular → Mc64Symmetric, NotApplied → Identity,
    Applied → pick_scaling_strategy(matrix)) and stash it on the
    Solver as auto_picked_strategy. Every subsequent factor on
    the same pattern uses the stashed strategy directly, bypassing
    Auto. Pattern change clears it alongside the MC64 cache. Mirrors
    MUMPS ICNTL(7) / SSIDS options%ordering: structural decision
    once at first call, reuse every refactor.

  2. MC64 cache gate widened to PartialSingular. The Hungarian
    matching on a structurally rank-deficient KKT still produces a
    real scaling vector (unmatched positions land at 1.0 per
    mc64.rs:222), and the value-bound check still gates reuse, so
    caching is correctness-safe. Pre-fix, the post-#49 gate required
    ScalingInfo::Applied, which dropped the cache on every IPM iter
    over a structurally rank-deficient KKT and forced the Hungarian
    to rerun from scratch.

  3. Solver::with_scaling() builder (see "Added" above).
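
The sticky-pick mapping and the widened cache gate from fixes 1 and 2 can be sketched as follows. This is a simplified model, not the code in src/numeric/solver.rs: `StickyState`, `resolve_sticky`, `may_cache_mc64`, and `on_pattern_change` are hypothetical names, the enums are local stand-ins, and `picker_choice` stands in for the result of pick_scaling_strategy(matrix).

```rust
// Hypothetical stand-ins modelled on names that appear in these notes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScalingStrategy { InfNorm, Mc64Symmetric, Identity }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ScalingInfo { Applied, PartialSingular, Mc64FallbackToInfnorm, NotApplied }

/// Map the first factor's scaling outcome to the strategy that gets pinned.
/// `picker_choice` stands in for pick_scaling_strategy(matrix).
fn resolve_sticky(info: ScalingInfo, picker_choice: ScalingStrategy) -> ScalingStrategy {
    match info {
        ScalingInfo::Mc64FallbackToInfnorm => ScalingStrategy::InfNorm,
        ScalingInfo::PartialSingular => ScalingStrategy::Mc64Symmetric,
        ScalingInfo::NotApplied => ScalingStrategy::Identity,
        ScalingInfo::Applied => picker_choice,
    }
}

/// Post-fix cache gate: PartialSingular is cacheable alongside Applied.
fn may_cache_mc64(info: ScalingInfo) -> bool {
    matches!(info, ScalingInfo::Applied | ScalingInfo::PartialSingular)
}

#[derive(Default)]
struct StickyState {
    auto_picked_strategy: Option<ScalingStrategy>,
    full_pipeline_runs: u32,
}

impl StickyState {
    /// Returns the strategy used for this factor call. `outcome` simulates
    /// what the full Auto pipeline would report if it were run.
    fn factor(&mut self, outcome: ScalingInfo, picker_choice: ScalingStrategy) -> ScalingStrategy {
        if let Some(pinned) = self.auto_picked_strategy {
            return pinned; // stashed pick: bypass Auto entirely
        }
        self.full_pipeline_runs += 1;
        let resolved = resolve_sticky(outcome, picker_choice);
        self.auto_picked_strategy = Some(resolved);
        resolved
    }

    /// A pattern change clears the pin, so the next factor re-runs Auto.
    fn on_pattern_change(&mut self) {
        self.auto_picked_strategy = None;
    }
}

fn main() {
    let mut state = StickyState::default();
    // Iter A: the full pipeline reports PartialSingular, pinning Mc64Symmetric.
    let a = state.factor(ScalingInfo::PartialSingular, ScalingStrategy::InfNorm);
    // Iter B: new values might have re-routed pre-fix; the pin now holds.
    let b = state.factor(ScalingInfo::Mc64FallbackToInfnorm, ScalingStrategy::InfNorm);
    println!("{:?} {:?} runs={}", a, b, state.full_pipeline_runs);
    println!("cache on PartialSingular: {}", may_cache_mc64(ScalingInfo::PartialSingular));
    state.on_pattern_change();
}
```

In this model the second call returns the pinned strategy without re-running the pipeline, which is the behaviour the "factor #2" row of the reproducer table depends on.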

Tests

Three unit tests in src/numeric/solver.rs:

  • issue_51_with_scaling_builder_overrides_default
  • issue_51_auto_pick_is_sticky_on_cached_pattern
  • issue_51_partial_singular_populates_cache

And one #[ignore]'d corpus regression test:
issue_51_corpus_sticky_auto_holds_across_ipm_iters walks
tests/data/parity/<family>/*.mtx for every family with ≥ 2 IPM
iter snapshots, factors all iters against one Solver::new(), and
asserts the sticky pick holds across every iter (13 families
covered: acopp30, hatfldbne, hahn1, ssi, …).

mc64_fallback_surfaces_via_solver_api was updated to reflect the
new sticky semantics: iter 2 now asserts
mc64_fallback_count == 1 (sticky pin runs straight InfNorm; no
fallback to surface) and locks
auto_picked_strategy = Some(InfNorm).

Changed — Auto dispatcher rewrites ([#50][i50])

The OrderingMethod::Auto dispatcher
(src/symbolic/mod.rs::choose_adaptive) was simplified to two rules:
the very-large-and-sparse catch is kept
(n > 100_000 && full_avg_deg < 5.0 → Amd), and everything else
delegates to pick_default_method (n ≤ 10_000 → Amf,
n > 10_000 → MetisND).
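
The resulting decision rule is small enough to write out. The sketch below encodes only the thresholds stated above; the function signatures and the trimmed-down `OrderingMethod` enum are assumptions for illustration, and the real choose_adaptive in src/symbolic/mod.rs presumably takes a matrix or pattern rather than two scalars.

```rust
// Hypothetical, trimmed-down stand-in for the crate's OrderingMethod.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum OrderingMethod { Amd, Amf, MetisND }

/// Size-based default, using the thresholds quoted in these notes.
fn pick_default_method(n: usize) -> OrderingMethod {
    if n <= 10_000 {
        OrderingMethod::Amf
    } else {
        OrderingMethod::MetisND
    }
}

/// Post-#50 Auto dispatch: one large-and-sparse catch, then the default.
/// The (n, full_avg_deg) signature is an assumption for this sketch.
fn choose_adaptive(n: usize, full_avg_deg: f64) -> OrderingMethod {
    if n > 100_000 && full_avg_deg < 5.0 {
        OrderingMethod::Amd
    } else {
        pick_default_method(n)
    }
}

fn main() {
    // powerflow22-sized input (n = 2,813,976, avg degree about 3.7).
    println!("{:?}", choose_adaptive(2_813_976, 3.7));
    // Small problem: falls through to the size-based default.
    println!("{:?}", choose_adaptive(5_000, 3.0));
    // Large but denser than the catch threshold: also the default.
    println!("{:?}", choose_adaptive(150_000, 8.0));
}
```

With these inputs the three calls select Amd, Amf, and MetisND respectively.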

  • Fix A — large-and-sparse swap (c442a0c). The pre-fix
    n > 100_000 && full_avg_deg < 5.0 → ScotchND branch is swapped
    to Amd. On powerflow22 (n=2.8 M, full_avg_deg ≈ 3.7) prior
    ScotchND took 113.8 s symbolic (15.8 M nnz_L); AMD takes 55 s
    (10.4 M nnz_L). The ScotchND advantage at very large n had been
    load-bearing against the same BK pivoting cascade that motivated
    the chain catches; issue #46 eliminated that amplifier in May
    2026 and removed the justification for routing very-large sparse
    matrices through nested dissection. Corpus inventory:
    dev/research/issue-50-numeric-inventory.csv shows the IPM
    corpus's [100k, 200k) bucket has AMD/MetisND num_nnz_l ratio 1.00
    on both representatives. Validation: 258 chain-catch corpus rows
    under post-Fix Auto — 0 failures, 0 num_nnz_l regressions for
    matrices that actually reroute. See
    dev/research/issue-50-metisnd-symbolic-cost.md §F7–§F11.

  • F11 follow-up — small-and-sparse retire (3f8f6f6). The
    pre-fix n < 10_000 && full_avg_deg < 15.0 → KahipND branch is
    deleted entirely; the population now falls through to
    pick_default_method's n ≤ 10_000 → Amf default (MUMPS
    ana_set_ordering.F SYM=2 N≤10000). Justified by an 838-matrix
    4-way inventory (dev/research/small-sparse-inventory.csv): AMF
    wins 169/838 strict per-matrix (KahipND 16); aggregate num_nnz_l
    ratios to AMD are AMF 0.870×, KahipND 0.984×; aggregate factor_us
    ratios are AMF 0.832×, KahipND 0.990×. KahipND remains reachable
    via OrderingMethod::KahipND for the 41 high-avg-deg cases
    (STEENBRD, HADAMARD, TABLE8) where it still wins — all sub-22k
    nnz_L absolute. See dev/research/issue-50-metisnd-symbolic-cost.md
    §F12.