feat: add --runRNGseed flag with seeded primary tie-break #5
Merged
Adds the STAR `--runRNGseed` CLI flag (default 777) and uses it to randomize primary alignment selection among equal-scoring multi-mappers. Previously primary selection was purely deterministic (lexicographic tie-break), which meant the 127 "genuine tie" disagreements against STAR on 10k SE yeast were frozen in whichever direction our sort happened to pick. With a seeded shuffle, ties flip independently of sort order -- matching STAR's behavior at `ReadAlign_multMapSelect.cpp:71-79` and `funPrimaryAlignMark.cpp:19-28`.

Unblocks nf-core/rnaseq, which invokes STAR_ALIGN with `--runRNGseed 0`.

Implementation notes:

- `shuffle_tied_prefix` does an in-place Fisher-Yates on the contiguous top-score prefix only, leaving non-tied lower-scored alignments alone. Matches STAR's two-phase "move best to front, then shuffle front" (collapsed since the input is already score-sorted).
- Applied at the final primary-selection point in both SE (`align_read`) and PE (`align_paired_read`). Not applied in `stitch.rs` per-window sorts -- STAR only shuffles at multMapSelect / primary-mark time.
- STAR seeds `std::mt19937` once per chunk and advances per read. ruSTAR parallelises per-read via rayon, so instead of a per-chunk RNG we derive a deterministic per-read seed from `run_rng_seed * (hash(read_name) + 1)`. This is stronger reproducibility than STAR (independent of thread count) while honoring `--runRNGseed`.
- Uses `rand::rngs::StdRng` rather than `rand_mt`. Not bit-for-bit parity with STAR's tie-breaking choices (STAR's `std::uniform_real_distribution` is libstdc++-specific anyway) -- it just has to honor the seed deterministically.

Tests: +5 (parse-default, parse-override, four shuffle behavior tests). 283 unit tests passing (was 278), 4 integration tests passing.

Note: commit is unsigned because local 1Password SSH signing (op-ssh-sign) returned "failed to fill whole buffer" on every attempt. Feel free to re-sign with `git commit --amend -S` once the signer is back.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
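The tied-prefix shuffle described in the implementation notes can be sketched as follows. This is a minimal illustration, not the code from the PR: the `SplitMix64` generator is a hypothetical stand-in so the sketch compiles without external crates (the PR uses `rand::rngs::StdRng`), and the input is assumed to be sorted by descending score already.

```rust
/// Stand-in PRNG (SplitMix64) so the sketch needs no external crates.
/// Assumption: the PR itself uses `rand::rngs::StdRng`, not this generator.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Index in `0..bound`; the slight modulo bias is ignored in this sketch.
    fn below(&mut self, bound: usize) -> usize {
        (self.next_u64() % bound as u64) as usize
    }
}

/// Shuffle only the leading run of items whose score equals the first item's
/// score. `items` is assumed to be sorted by descending score already.
fn shuffle_tied_prefix<T>(items: &mut [T], score: impl Fn(&T) -> i32, rng: &mut SplitMix64) {
    // Empty input: nothing to shuffle.
    let Some(first) = items.first() else { return };
    let best = score(first);
    // Length of the contiguous top-score prefix.
    let tied = items.iter().take_while(|a| score(*a) == best).count();
    // Fisher-Yates over [0..tied): swap slot i with a random slot in 0..=i.
    for i in (1..tied).rev() {
        let j = rng.below(i + 1);
        items.swap(i, j);
    }
}
```

Because the loop never indexes at or beyond `tied`, lower-scored alignments keep their sorted order, and the same seed always produces the same permutation of the tied prefix.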
Replace the hand-rolled Fisher-Yates loop with `rand::seq::SliceRandom::shuffle` on the tied prefix, and collapse the length guard with `slice::first()`. Same behavior (shuffles the [0..tied) range with the seeded RNG), fewer lines, and no need to import `Rng`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the Fisher-Yates reference in `shuffle_tied_prefix`'s doc (the impl now delegates to `SliceRandom::shuffle`; the algorithm is an implementation detail). Collapse the SE call-site comment to one line since the fn doc already covers the "why"; keep the STAR source file + line reference.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`DefaultHasher` is re-exported from `std::hash` as of Rust 1.76, so there's no need to pull it in via the full `std::collections::hash_map` path.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
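For illustration, here is how the shorter import might read in a per-read seed helper. The body is a hypothetical reconstruction from the formula quoted in this PR (`run_rng_seed * (hash(read_name) + 1)`), not a copy of the actual `per_read_seed`; the `wrapping_add` on the hash is an assumption made here to avoid overflow panics in debug builds.

```rust
// `DefaultHasher` via `std::hash` needs Rust 1.76+; older toolchains would
// use `std::collections::hash_map::DefaultHasher` instead.
use std::hash::{DefaultHasher, Hash, Hasher};

/// Hypothetical reconstruction of the per-read seed described in this PR:
/// run_rng_seed * (hash(read_name) + 1), using wrapping arithmetic.
fn per_read_seed(run_rng_seed: u64, read_name: &str) -> u64 {
    // `DefaultHasher::new()` starts from fixed keys, so the same name
    // hashes to the same value on every call within one build.
    let mut hasher = DefaultHasher::new();
    read_name.hash(&mut hasher);
    run_rng_seed.wrapping_mul(hasher.finish().wrapping_add(1))
}
```

One caveat worth keeping in mind: the standard library does not specify `DefaultHasher`'s algorithm across Rust releases, so seeds derived this way are reproducible for a given build but may differ between toolchain versions.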
…efix Single-use generic parameter: `impl Fn(&T) -> i32` in the argument list is shorter than a `where F: Fn(&T) -> i32` bound and reads more naturally at the call sites we have.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
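The two spellings compared in this commit look roughly like this. The helper name `tied_len_*` and its body are hypothetical, chosen only to show the signatures side by side; both versions accept the same closures.

```rust
/// Before: a named generic parameter with a `where` bound.
fn tied_len_generic<T, F>(items: &[T], score: F) -> usize
where
    F: Fn(&T) -> i32,
{
    match items.first() {
        Some(first) => {
            let best = score(first);
            items.iter().take_while(|x| score(*x) == best).count()
        }
        None => 0,
    }
}

/// After: `impl Trait` in argument position; same bound, no `F` to name.
fn tied_len_impl<T>(items: &[T], score: impl Fn(&T) -> i32) -> usize {
    match items.first() {
        Some(first) => {
            let best = score(first);
            items.iter().take_while(|x| score(*x) == best).count()
        }
        None => 0,
    }
}
```

The trade-off is that callers can no longer name the closure type explicitly with turbofish syntax, which rarely matters for a parameter used once.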
Fixes formatting drift inherited from main. No semantic changes.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 17, 2026
--runRNGseed flag with seeded primary tie-break
Psy-Fer added a commit that referenced this pull request on Apr 24, 2026
- shuffle_tied_prefix: deterministic per-read RNG for tied primary selection
- --runRNGseed param (default 777, matches STAR)
- --outSAMattrRGline: @RG header + RG:Z tags
- Various code simplifications and formatting
Summary

Adds the `--runRNGseed` CLI flag, matching STAR's behavior. It seeds a Fisher-Yates shuffle over top-score-tied alignments at primary-selection time, so which copy of a multi-mapper is marked primary can be controlled reproducibly.

Part 1 of 3 branches unblocking nf-core/rnaseq's STAR_ALIGN step. This is a standalone PR; it does not depend on the others.

Changes

- `src/params.rs`: new `run_rng_seed: u64` arg, default `777` (matches STAR's `parametersDefault`).
- `src/align/read_align.rs`: `per_read_seed(seed, read_name)` + `shuffle_tied_prefix()` helpers. Called after the final sort in `align_read` (SE) and `align_paired_read` (PE).
- `Cargo.toml`: `rand = "0.8"`.

Divergences from STAR

- `rand::rngs::StdRng` (ChaCha12) instead of `std::mt19937`. STAR's `std::uniform_real_distribution<double>` on mt19937 is libstdc++-specific, so bit-for-bit parity is not achievable anyway.
- Per-read seeding (`seed * (hash(read_name) + 1)`) instead of STAR's per-chunk (`seed * (iChunk+1)`): ruSTAR's rayon-based parallelism would make per-chunk seeding non-deterministic across thread counts. Per-read seeding is strictly more reproducible.

Follow-up commits

- `rand::seq::SliceRandom::shuffle`, tighter comments, modern `DefaultHasher` re-export, `impl Fn` bound.
- `cargo fmt` commit to fix pre-existing drift inherited from `main`.

Test plan

- `cargo test`: 283 passing
- `cargo clippy --lib -- -D warnings`: clean
- `cargo fmt --check`: clean

Notes
- Commits are unsigned; re-sign with `git commit --amend -S --no-edit` + force-push-with-lease if signed history is required.
- `per_read_seed` uses `run_rng_seed.wrapping_mul(hash + 1)`. When the user passes `--runRNGseed 0` (as nf-core does), every read gets seed 0 and per-read decorrelation collapses. A bitwise-XOR mix (`run_rng_seed ^ hash`) would keep decorrelation at seed=0. Not changed here: it's a behavior tweak, not a simplification.

🤖 Generated with Claude Code
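The seed-0 caveat in the notes is easy to check in isolation. The sketch below takes the read-name hash as a plain `u64` input; the function names `mix_mul` and `mix_xor` are hypothetical labels for the two mixing strategies the note compares, not identifiers from the PR.

```rust
/// Multiplicative mix as described in the note: seed * (hash + 1), wrapping.
fn mix_mul(run_rng_seed: u64, read_hash: u64) -> u64 {
    run_rng_seed.wrapping_mul(read_hash.wrapping_add(1))
}

/// XOR mix suggested as the alternative: seed ^ hash.
fn mix_xor(run_rng_seed: u64, read_hash: u64) -> u64 {
    run_rng_seed ^ read_hash
}
```

With `run_rng_seed = 0`, `mix_mul` returns 0 for every hash, so all reads would share one RNG stream; `mix_xor` returns the hash itself, so distinct reads still get distinct seeds.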