stable_sort events in Result::Finalize for cpp-mirror parity (#375) by matsen · Pull Request #20 · psathyrella/ham

matsen · 2026-05-08T16:25:30Z

Summary

Replace sort + reverse with stable_sort + descending comparator in Result::Finalize. Tied events (kets with f64 viterbi scores within ~1e-6 that collapse to identical f32 values when stored in RecoEvent::score_) are now resolved deterministically by push order — first-pushed kset wins.

This is the C++ side of the cpp-mirror fix for partis issue #375. Companion partis PR: psathyrella/partis#376.

Why

RecoEvent::score_ is float; Result::Finalize sorts events by score and picks events_[0] as best_event_. When two ksets compute f64 viterbi scores within ~1e-6 of each other, the f32 narrowing collapses them to identical float values. std::sort is unstable (libstdc++ uses introsort) and picks an algorithm-dependent winner. The Zig backend uses std.sort.block (stable), so the two backends could pick different best_events for the same query and parameter set — different (v, d, j) gene calls, different naive_seq, different hfrac, different cluster merges. Latent until ≥25k where float-tied ksets become frequent enough.

Push order in DPHandler::Run

C++ pushes events in nested reverse-iteration order (dphandler.cc:101-119):

for(size_t k_v = kbounds.vmax - 1; k_v >= kbounds.vmin; --k_v)
  for(size_t k_d = kbounds.dmax - 1; k_d >= kbounds.dmin; --k_d)
    ...
    if(algorithm_ == "viterbi" && best_scores[kset] != -INFINITY)
      result.PushBackRecoEvent(FillRecoEvent(seqs, kset, best_genes[kset], best_scores[kset]));

Zig matches this order. Both backends with stable sort + descending comparator pick events_[0] = first-pushed tied event = highest-(k_v, k_d) tied kset.

Test plan

Builds cleanly (scons bcrham)
partis 1-query repro: cpp-fixed picks the same (d_3p_del=3, dj_insertion='') as Zig (was (4, 'A') pre-fix)
partis 5k cpp-fixed bit-identical to 5k cpp-pre-fix and to Zig (regression — no f32-tied ksets at small N)
partis 30k cpp-fixed bit-identical to 30k Zig stable: 10750 igh + 10815 igk events match
partis-test.py --paired --no-simu passes (both with and without --zig)

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.7 (1M context) noreply@anthropic.com

…gedQuery - glomerator.cc ReadCacheFile: stof→stod for log_probs_ and naive_hfracs_ (both maps are map<string,double>; cache files are written with setprecision(20)) - glomerator.cc GetMergedQuery: double(n)*mute_freq_ so products use double arithmetic instead of float (size_t*float→float was losing precision) - text.cc Floatify: stof→stod (return type is vector<double>) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…pace (#19) Replaces the glibc log + exp pair inside AddInLogSpace with a single 65 K-entry linear-interpolation lookup of softplus(d) = log(1+exp(d)). On modern glibc 2.29+ with FMA-optimized scalar log/exp, this saves 20% of bcrham wall and 15% of partis wall on a 5k paired partition (median-of-3 on AMD Zen 5; bigger wins on older Intel/AMD per cross-arch microbench at psathyrella/partis#366 item 3.2). Implementation in src/fast_math.c, declared via extern "C" in mathutils.h. Linear-interp arithmetic dominates over the table size in cycle count, so picking 64 K is essentially free; 64 K nodes give 6.5e-9 max abs error over [-30, 0] — five orders of magnitude under the partis-test 1e-5 gate. Verified bit-identical partis output vs glibc baseline at 5k partition (3797 + 3802 events). Built with -O3 -ffp-contract=off -fno-builtin so the linear-interp is bit-deterministic across compilers, matching the Zig mirror at packages/zig-core/src/ham/fast_math.c on the partis side. For psathyrella/partis#366 item 3.2.

RecoEvent::score_ is float (single-precision); two ksets whose f64 viterbi scores differ by ~1e-6 collapse to identical float values. The previous sort+reverse used std::sort (unstable introsort), which picks an algorithm-dependent winner among tied events. The Zig backend uses std.sort.block (stable), so the two backends could pick different best_events for the same query and parameter set. Switch to stable_sort with a descending comparator. Push order in DPHandler::Run is k_v=vmax-1..vmin, k_d=dmax-1..dmin, so the first-pushed tied event wins; both backends now agree. The divergence was visible in unbounded paired-loci partition at 30k on /tmp/paired-paper-all-seqs.fa — different best_events propagated through naive_seq -> hfrac -> cluster merge decisions, giving 919 different paired clusters per locus despite identical cluster counts. Smaller workloads (<=20k) only triggered the float-tie pattern rarely enough that the divergence didn't surface. See partis issue #375. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

matsen and others added 2 commits April 24, 2026 15:52

matsen mentioned this pull request May 8, 2026

Fix Zig-vs-C++ 30k partition parity: stable_sort tied events (#375) psathyrella/partis#376

Closed

5 tasks

matsen force-pushed the 375-stable-sort-events branch from d354b76 to 35bf4a9 Compare May 8, 2026 17:16

matsen mentioned this pull request May 26, 2026

zig perf round 2 + Zig↔C++ 30k parity fix (#366, #375, #377) psathyrella/partis#385

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stable_sort events in Result::Finalize for cpp-mirror parity (#375)#20

stable_sort events in Result::Finalize for cpp-mirror parity (#375)#20
matsen wants to merge 3 commits into
mainfrom
375-stable-sort-events

matsen commented May 8, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

matsen commented May 8, 2026

Summary

Why

Push order in DPHandler::Run

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant