feral v0.8.0

Released by @jkitchin on 29 May at 01:40.

Removed — 4 synthetic rank-deficient stress-corpus matrices

Dropped rankdef_10_3, rankdef_50_5, rankdef_exact_50_5, and
stokes_q1p0_8 from the stress corpus. Under #54's SSIDS-aligned
strict-zero routing, feral reported inertia.zero = 1 on all four.
That contradicted MUMPS, SSIDS, and MA57 simultaneously, which
violates CLAUDE.md's "must agree with at least one canonical" rule,
and it turned the stress-smoke gate red on the v0.8.0 release commit.

Rather than allowlist them or narrow #54's zero_tol (which would
reopen the IPM δ-cascade instability on nuffield2_trap_iter1.mtx
that motivated #54 in the first place), the four matrices were
removed entirely. They were synthetic borderline fixtures whose
"correct" zero count depends on round-off of order 1e-15, and no
3-of-4 oracle consensus exists for them. This is the same kind of
disagreement that compute_consensus.py already tags as excluded. The rank-deficient
regime remains covered by rankdef_5_2, rankdef_200_20,
rankdef_exact_100_10, saddle_rankdef_50_10_3,
saddle_rankdef_100_20_5. See dev/decisions.md (2026-05-28 entry).
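The two gates mentioned above can be sketched as follows. This illustrates the rules as they are stated in these notes. It is not a port of compute_consensus.py. The `Inertia` struct and both function names are hypothetical. Treating "agreement" as exact equality of the (positive, negative, zero) triple is also an assumption.

```rust
/// Hypothetical inertia triple; feral's real `Inertia` type may differ.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Inertia {
    positive: usize,
    negative: usize,
    zero: usize,
}

/// "Must agree with at least one canonical": feral's answer passes if it
/// exactly matches the triple reported by any one canonical solver.
fn agrees_with_any(feral: Inertia, canonical: &[Inertia]) -> bool {
    canonical.iter().any(|o| *o == feral)
}

/// Quorum consensus: returns a triple reported by at least `quorum`
/// oracles (3 for a 3-of-4 rule), or `None` when no triple reaches the
/// quorum, i.e. the case a consensus script would tag as excluded.
fn consensus(oracles: &[Inertia], quorum: usize) -> Option<Inertia> {
    oracles
        .iter()
        .copied()
        .find(|cand| oracles.iter().filter(|o| *o == cand).count() >= quorum)
}
```

Under these rules, a matrix with `consensus(..) == None` has no reference answer to test against. The same round-off sensitivity can also leave feral disagreeing with every canonical solver, which is the situation that red-lit the gate.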

Performance — Thomson-Hessian per-iter throughput (#56)

Three additive levers on the dense / wide-supernode IPM-KKT hot path,
developed on the issue-56-thomson-hessian-throughput branch and merged to main:

  • Lever A — pre-built permuted_pattern + permute structure cache.
    Collapsed the symbolic permute (P A P^T) + symmetric_pattern
    phases to <0.5 % of total wall on warm calls.
  • Lever B — fused single-pass contribextract write. Replaced
    resize(cdim², 0.0) + lower-triangle overwrite with
    reserve + unsafe set_len + write-each-cell-once. Bit-identical
    contrib block (extend_add reads only ci ≥ cj; root-Schur
    extractor canonicalizes via transpose; parallel_corpus_parity
    binary-compares the full buffer). Roughly 33 % fewer writes.
    Re-measurement at Thomson n=200 (sequential, 9 warm reps):
    factor min −5.5 % (parallel ON), −10 % (parallel OFF).
  • Lever C — InfNorm Knight-Ruiz inner-loop vectorization. Hoisted
    the loop-carried row_max[j] dependency to a register accumulator
    (sparse and dense paths, bit-identical by associativity of max on
    non-NaN finite inputs); added a pulp-dispatched SIMD off-diagonal
    kernel on the dense path. Re-measurement at Thomson n=200:
    scaling phase −19 %, total wall −5 %.
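Levers B and C can be illustrated with two small, self-contained sketches. Neither is feral's code. The column-major dense layout and all function names are assumptions. The sketches also use safe `Vec::with_capacity` + `push` in place of the release's `reserve` + unsafe `set_len`, which has the same one-write-per-cell count.

The old Lever B path writes cdim² zeros and then cdim(cdim+1)/2 lower-triangle values. The fused path writes each of the cdim² cells once. For cdim = 100 that saves 5050 of 15050 writes (about 33.6 %), and the saving tends to 1/3 as cdim grows, which is consistent with the "roughly 33 %" above.

The Lever C pair shows the hoist. The running column maximum is kept in a local instead of `row_max[j]`, which can alias `row_max[i]` on the diagonal. The result is the same bits because `max` is associative and commutative on non-NaN inputs.

```rust
/// Lever B, old path: zero-fill all cdim*cdim cells, then overwrite the
/// lower triangle (ci >= cj). Returns (buffer, number of element writes).
fn contrib_two_pass(cdim: usize, src: impl Fn(usize, usize) -> f64) -> (Vec<f64>, usize) {
    let mut buf = Vec::new();
    buf.resize(cdim * cdim, 0.0);
    let mut writes = cdim * cdim; // the zero-fill
    for cj in 0..cdim {
        for ci in cj..cdim {
            buf[ci + cj * cdim] = src(ci, cj); // column-major (ci, cj)
            writes += 1;
        }
    }
    (buf, writes)
}

/// Lever B, fused path: write every cell exactly once, in storage order.
/// Upper-triangle cells get the same 0.0 the zero-fill used to leave, so
/// the buffer is bit-identical to the two-pass result.
fn contrib_single_pass(cdim: usize, src: impl Fn(usize, usize) -> f64) -> (Vec<f64>, usize) {
    let mut buf = Vec::with_capacity(cdim * cdim);
    for cj in 0..cdim {
        for ci in 0..cdim {
            buf.push(if ci >= cj { src(ci, cj) } else { 0.0 });
        }
    }
    let writes = buf.len();
    (buf, writes)
}

/// Lever C, before: the running max for column j lives in row_max[j].
/// The compiler generally cannot keep it in a register, because
/// row_max[i] may alias it (i == j on the diagonal).
fn row_max_naive(n: usize, a: &[f64], row_max: &mut [f64]) {
    for j in 0..n {
        for i in j..n {
            let v = a[i + j * n].abs(); // lower triangle, column-major
            row_max[i] = row_max[i].max(v);
            row_max[j] = row_max[j].max(v);
        }
    }
}

/// Lever C, after: accumulate column j's max in a local and store it
/// once at the end of the column. Same result on non-NaN input.
fn row_max_hoisted(n: usize, a: &[f64], row_max: &mut [f64]) {
    for j in 0..n {
        let mut acc = row_max[j];
        for i in j..n {
            let v = a[i + j * n].abs();
            row_max[i] = row_max[i].max(v);
            acc = acc.max(v);
        }
        row_max[j] = acc;
    }
}
```

Readers that only look at ci >= cj never need the upper-triangle zeros. Writing them keeps the full buffer bit-identical, which matters for a binary parity check like parallel_corpus_parity.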

No regression on the Phase 2.8.1 corpus partition gates; both
small-frontal and medium buckets improved relative to the 2026-05-27
baseline. KIRBY2 family worst-case sparse outlier improved from
10.25× → 7.97× vs MUMPS.

See dev/sessions/2026-05-28-01.md and
dev/research/issue-56-thomson-hessian-throughput-2026-05-27.md for
the localization data and re-measurement tables.

Added — symbolic-analysis-time delayed-pivot budget + CB rewire (#55)

Per-supernode delayed_capacity is assigned during symbolic
analysis. The numeric phase enforces the bound at the frontal
expansion site: if more pivots are delayed into a supernode than its
capacity allows, the factor returns the structured
FeralError::DelayBudgetExceeded { supernode, required, capacity }
(MUMPS INFO(2) workspace-overflow analog) rather than growing
the front unboundedly. Cascade-break now triggers on budget
exhaustion instead of the heuristic ratio gate, matching MUMPS's
dfac_front_aux.F:1251-1331 invariant that static perturbation
fires only when delay is structurally impossible.
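The enforcement rule can be sketched as a check performed when a front is expanded. The `FeralError` enum below is a local stand-in that mirrors only the variant named above. The function name, the message wording, and treating `required` as the number of incoming delayed columns are assumptions, not feral's API.

```rust
use std::fmt;

/// Local stand-in mirroring the variant named in these notes.
#[derive(Debug, Clone, PartialEq, Eq)]
enum FeralError {
    DelayBudgetExceeded { supernode: usize, required: usize, capacity: usize },
}

impl fmt::Display for FeralError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            FeralError::DelayBudgetExceeded { supernode, required, capacity } => write!(
                f,
                "delay budget exceeded at supernode {supernode}: {required} delayed columns required, capacity {capacity}"
            ),
        }
    }
}

impl std::error::Error for FeralError {}

/// Hypothetical check at the frontal expansion site. The incoming delayed
/// pivots are accepted only if they fit the capacity fixed during symbolic
/// analysis. Otherwise a structured error is returned instead of growing
/// the front.
fn check_delay_budget(supernode: usize, incoming_delayed: usize, capacity: usize) -> Result<(), FeralError> {
    if incoming_delayed > capacity {
        Err(FeralError::DelayBudgetExceeded { supernode, required: incoming_delayed, capacity })
    } else {
        Ok(())
    }
}
```

Returning an error here, rather than reallocating, is what makes the bound a hard invariant in the same spirit as MUMPS's INFO(2) workspace overflow.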

Capacity formula:

  • tight(s) = max(4 * own_ncol(s), 16)
  • capacity(s) = min(subtree_ncol(s) - own_ncol(s), tight(s))

Root-supernode cap (defensive, applied only when n >= 1024): symbolic
analysis declines amalgamations that would push the root past
min(0.05 * n, 2048) columns.
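The capacity formula and the root cap can be written out directly. The sketch assumes that `subtree_ncol` counts the columns of the supernode and all of its descendants. Under that assumption, `subtree_ncol - own_ncol` is the most columns that could ever be delayed into it from below. Rounding 0.05 * n down (computed as `n / 20`) and the `allow_root_amalgamation` helper are hypothetical.

```rust
/// Per-supernode delayed-pivot capacity:
///   tight    = max(4 * own_ncol, 16)
///   capacity = min(subtree_ncol - own_ncol, tight)
fn delayed_capacity(own_ncol: usize, subtree_ncol: usize) -> usize {
    let tight = (4 * own_ncol).max(16);
    // saturating_sub guards against inconsistent inputs; with a valid tree
    // subtree_ncol >= own_ncol and this is a plain subtraction.
    subtree_ncol.saturating_sub(own_ncol).min(tight)
}

/// Defensive root cap, active only for n >= 1024:
/// min(0.05 * n, 2048) columns, with 0.05 * n rounded down (assumption).
fn root_column_cap(n: usize) -> Option<usize> {
    if n >= 1024 {
        Some((n / 20).min(2048))
    } else {
        None
    }
}

/// Hypothetical amalgamation guard: decline a merge that would make the
/// root wider than the cap; with no cap (small n), always allow it.
fn allow_root_amalgamation(n: usize, merged_root_ncol: usize) -> bool {
    match root_column_cap(n) {
        Some(cap) => merged_root_ncol <= cap,
        None => true,
    }
}
```

A leaf supernode (own_ncol == subtree_ncol) gets capacity 0, since nothing can be delayed into it. Narrow supernodes still get at least 16 columns of slack when their subtree can supply that many.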

FeralConfig::default() now ships with cascade-break armed
(cascade_break_ratio = Some(0.5), cascade_break_eps = Some(1e-10))
as the budget-exhausted fallback. Pounce's per-problem .opt
overrides for nql180 / pinene_3200 are no longer required.

See dev/research/symbolic-delay-budget-2026-05-27.md and
dev/decisions.md (2026-05-27 entry, frozen convention).

Changed — strict-zero pivots route to inertia.zero (SSIDS-aligned, #54)

When ZeroPivotAction::ForceAccept accepts a 1×1 pivot whose
magnitude satisfies |d| <= zero_tol, the inertia now increments
zero instead of routing by sign(d). This matches SSIDS
(NumericSubtree.hxx:259-267) and HSL MA57 (INFO(24) = neig,
INFO(25) = number of zero pivots) and supersedes the Issue #42
Option A sign-routing rule.
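The routing change for a force-accepted 1×1 pivot d can be sketched as two small counters. The `Inertia` struct and the function names are hypothetical stand-ins. The treatment of d == 0.0 under Option A (counted as positive here) is an assumption.

```rust
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct Inertia {
    positive: usize,
    negative: usize,
    zero: usize,
}

/// Option A (superseded): every force-accepted 1x1 pivot is routed by its
/// sign. A pivot that is mathematically zero therefore lands in `positive`
/// or `negative` depending on the sign its round-off produced.
fn route_option_a(inertia: &mut Inertia, d: f64) {
    if d < 0.0 {
        inertia.negative += 1;
    } else {
        inertia.positive += 1;
    }
}

/// #54 rule: a force-accepted 1x1 pivot with |d| <= zero_tol counts as
/// zero; only pivots above the tolerance are routed by sign.
fn route_strict_zero(inertia: &mut Inertia, d: f64, zero_tol: f64) {
    if d.abs() <= zero_tol {
        inertia.zero += 1;
    } else if d < 0.0 {
        inertia.negative += 1;
    } else {
        inertia.positive += 1;
    }
}
```

With the strict-zero rule, flipping the sign of a sub-tolerance pivot no longer moves a count between `positive` and `negative`. That independence from round-off sign is what keeps the counter from jumping backwards.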

Motivation: pounce's IPM δ-cascade on nuffield2_trap_iter1.mtx
(LP-shaped KKT, n=26155) stalled for 600 s (vs 1.8 s on MA57)
because Option A split bit-exact zero pivots by IEEE round-off and
the counter jumped backwards mid-cascade (13042 → 12615). The new
accounting restores monotonicity (probe confirms 0 backwards jumps
across δ_x ∈ {0 … 6.99e19}).

Behavior changes visible to callers:

  • Solver::inertia() and factor()'s Inertia field now report
    the mathematical (Sylvester) inertia on matrices with rank
    deficiency that lands in strict-zero pivots.
  • num_negative_eigenvalues() returns strict inertia.negative
    only (unchanged convention). IPM callers comparing against an
    expected oracle should now sum negative + zero to match
    MA57's INFO(24)+INFO(25) convention.
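On the caller side, the adjustment amounts to comparing a different sum. The sketch below uses a hypothetical local `Inertia` struct rather than feral's own type, and the helper names are illustrative. It also checks that the three counts add up to the dimension n, as a full Sylvester inertia does.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Inertia {
    positive: usize,
    negative: usize,
    zero: usize,
}

/// Count to compare against an oracle that reports INFO(24) + INFO(25)
/// in the convention these notes describe. Under the new accounting,
/// `num_negative_eigenvalues()` alone reports only `negative`.
fn negative_plus_zero(inertia: &Inertia) -> usize {
    inertia.negative + inertia.zero
}

/// Hypothetical IPM-side check: the inertia must account for all n
/// eigenvalues, and negative + zero must match the oracle's count.
fn matches_ma57_oracle(inertia: &Inertia, n: usize, oracle_neg_plus_zero: usize) -> bool {
    inertia.positive + inertia.negative + inertia.zero == n
        && negative_plus_zero(inertia) == oracle_neg_plus_zero
}
```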

Tests updated to reflect the new convention; see
dev/decisions.md (2026-05-26 entry) for the full trade-off
analysis and dev/research/issue-54-lp-kkt-inertia.md for the
oracle cross-check.

Added — DelayBudgetExceeded exposed through Python bindings

feral-solver (Python) now re-exports the #55 error as a named
exception:

  • feral.DelayBudgetExceeded — subclass of feral.FactorError
    (and transitively feral.FeralError), raised when
    factor()/refactor() hits the symbolic-analysis-time
    delayed-pivot budget. The message carries the supernode index
    and the required and capacity column counts. Python callers no
    longer see this error routed through the generic NumericFailure wildcard.

Added — documentation site (mdBook + rustdoc on GitHub Pages)

  • book/ — mdBook skeleton (introduction, getting-started,
    inertia, api). Content is intentionally thin; the dev/research/
    notes remain internal and are not auto-published.
  • .github/workflows/pages.yml — builds the book and
    cargo doc --workspace --no-deps --lib, mounts rustdoc under /api/,
    and deploys to GitHub Pages on every push to main. Site URL is
    https://jkitchin.github.io/feral/; rustdoc at /api/feral/.
  • Cargo.toml: /book added to the crate exclude list so the book
    source does not ship in the published crate tarball.
  • .gitignore: /book/book ignored (mdBook output is regenerated by
    the Pages workflow).

Repo setting required once: Settings → Pages → Source: GitHub Actions.
Until that flag is set, the deploy job will fail with a
"Pages not enabled" error; the build job is unaffected.