Skip to content

stable_sort events in Result::Finalize for cpp-mirror parity (#375)#20

Open
matsen wants to merge 3 commits into
mainfrom
375-stable-sort-events
Open

stable_sort events in Result::Finalize for cpp-mirror parity (#375)#20
matsen wants to merge 3 commits into
mainfrom
375-stable-sort-events

Conversation

@matsen
Copy link
Copy Markdown
Collaborator

@matsen matsen commented May 8, 2026

Summary

Replace sort + reverse with stable_sort + descending comparator in Result::Finalize. Tied events (kets with f64 viterbi scores within ~1e-6 that collapse to identical f32 values when stored in RecoEvent::score_) are now resolved deterministically by push order — first-pushed kset wins.

This is the C++ side of the cpp-mirror fix for partis issue #375. Companion partis PR: psathyrella/partis#376.

Why

RecoEvent::score_ is float; Result::Finalize sorts events by score and picks events_[0] as best_event_. When two ksets compute f64 viterbi scores within ~1e-6 of each other, the f32 narrowing collapses them to identical float values. std::sort is unstable (libstdc++ uses introsort) and picks an algorithm-dependent winner. The Zig backend uses std.sort.block (stable), so the two backends could pick different best_events for the same query and parameter set — different (v, d, j) gene calls, different naive_seq, different hfrac, different cluster merges. Latent until ≥25k where float-tied ksets become frequent enough.

Push order in DPHandler::Run

C++ pushes events in nested reverse-iteration order (dphandler.cc:101-119):

for(size_t k_v = kbounds.vmax - 1; k_v >= kbounds.vmin; --k_v)
  for(size_t k_d = kbounds.dmax - 1; k_d >= kbounds.dmin; --k_d)
    ...
    if(algorithm_ == "viterbi" && best_scores[kset] != -INFINITY)
      result.PushBackRecoEvent(FillRecoEvent(seqs, kset, best_genes[kset], best_scores[kset]));

Zig matches this order. Both backends with stable sort + descending comparator pick events_[0] = first-pushed tied event = highest-(k_v, k_d) tied kset.

Test plan

  • Builds cleanly (scons bcrham)
  • partis 1-query repro: cpp-fixed picks the same (d_3p_del=3, dj_insertion='') as Zig (was (4, 'A') pre-fix)
  • partis 5k cpp-fixed bit-identical to 5k cpp-pre-fix and to Zig (regression — no f32-tied ksets at small N)
  • partis 30k cpp-fixed bit-identical to 30k Zig stable: 10750 igh + 10815 igk events match
  • partis-test.py --paired --no-simu passes (both with and without --zig)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

matsen and others added 2 commits April 24, 2026 15:52
…gedQuery

- glomerator.cc ReadCacheFile: stof→stod for log_probs_ and naive_hfracs_
  (both maps are map<string,double>; cache files are written with setprecision(20))
- glomerator.cc GetMergedQuery: double(n)*mute_freq_ so products use double
  arithmetic instead of float (size_t*float→float was losing precision)
- text.cc Floatify: stof→stod (return type is vector<double>)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…pace (#19)

Replaces the glibc log + exp pair inside AddInLogSpace with a single
65 K-entry linear-interpolation lookup of softplus(d) = log(1+exp(d)).
On modern glibc 2.29+ with FMA-optimized scalar log/exp, this saves
20% of bcrham wall and 15% of partis wall on a 5k paired partition
(median-of-3 on AMD Zen 5; bigger wins on older Intel/AMD per
cross-arch microbench at psathyrella/partis#366 item 3.2).

Implementation in src/fast_math.c, declared via extern "C" in
mathutils.h. Linear-interp arithmetic dominates over the table size
in cycle count, so picking 64 K is essentially free; 64 K nodes give
6.5e-9 max abs error over [-30, 0] — five orders of magnitude under
the partis-test 1e-5 gate. Verified bit-identical partis output vs
glibc baseline at 5k partition (3797 + 3802 events).

Built with -O3 -ffp-contract=off -fno-builtin so the linear-interp
is bit-deterministic across compilers, matching the Zig mirror at
packages/zig-core/src/ham/fast_math.c on the partis side.

For psathyrella/partis#366 item 3.2.
RecoEvent::score_ is float (single-precision); two ksets whose f64 viterbi
scores differ by ~1e-6 collapse to identical float values. The previous
sort+reverse used std::sort (unstable introsort), which picks an
algorithm-dependent winner among tied events. The Zig backend uses
std.sort.block (stable), so the two backends could pick different
best_events for the same query and parameter set.

Switch to stable_sort with a descending comparator. Push order in
DPHandler::Run is k_v=vmax-1..vmin, k_d=dmax-1..dmin, so the first-pushed
tied event wins; both backends now agree.

The divergence was visible in unbounded paired-loci partition at 30k on
/tmp/paired-paper-all-seqs.fa — different best_events propagated through
naive_seq -> hfrac -> cluster merge decisions, giving 919 different paired
clusters per locus despite identical cluster counts. Smaller workloads
(<=20k) only triggered the float-tie pattern rarely enough that the
divergence didn't surface.

See partis issue #375.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant