fix(stats): route no-seeds-no-clusters reads to UnmappedReason::Other #49

Open
pinin4fjords wants to merge 1 commit into scverse:main from pinin4fjords:fix/unmapped-other-bucket

@pinin4fjords

Summary

Log.final.out reports "Number of reads unmapped: too short" and "Number of reads unmapped: other" using STAR's exact field labels, but rustar was folding everything into too short and always reporting other = 0. Reads that failed before a transcript was generated (no seeds / no clusters) were being misclassified as too short, inflating that count and leaving the bucket STAR calls other at zero.

On WT_REP2 pre-fix, rustar reported too short = 4 193 (8.46 %) and other = 0 (0 %), vs STAR's too short = 1 540 (3.11 %) and other = 2 656 (5.36 %).

Fix

Audit the record_unmapped_reason(...) call sites and route the no-seeds / no-clusters path to UnmappedReason::Other instead of UnmappedReason::TooShort. Totals unchanged; only the bucketing changes.
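
The routing described above can be sketched roughly as follows. This is a minimal illustration, not rustar's actual code: only the `UnmappedReason` variant names come from this PR, while `AlignAttempt`, its fields, `classify`, and the order of the later checks are hypothetical.

```rust
// Variant names are taken from the PR; everything else here is an assumption.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnmappedReason {
    Other,
    TooShort,
    TooManyMismatches,
    TooManyLoci,
}

/// Hypothetical summary of how far a read got through alignment.
#[derive(Debug, Clone, Copy)]
struct AlignAttempt {
    seeds: usize,
    clusters: usize,
    within_multimap_limit: bool,
    long_enough: bool,
    within_mismatch_limit: bool,
}

/// Returns `None` for a mapped read, otherwise the bucket it is counted in.
fn classify(a: &AlignAttempt) -> Option<UnmappedReason> {
    // No seeds or no clusters: no transcript was ever generated, so the
    // read goes to Other (before the fix this path recorded TooShort).
    if a.seeds == 0 || a.clusters == 0 {
        return Some(UnmappedReason::Other);
    }
    // The order of the remaining filters is assumed for illustration only.
    if !a.within_multimap_limit {
        return Some(UnmappedReason::TooManyLoci);
    }
    if !a.long_enough {
        return Some(UnmappedReason::TooShort);
    }
    if !a.within_mismatch_limit {
        return Some(UnmappedReason::TooManyMismatches);
    }
    None
}

fn main() {
    let no_seeds = AlignAttempt {
        seeds: 0,
        clusters: 0,
        within_multimap_limit: true,
        long_enough: false,
        within_mismatch_limit: true,
    };
    println!("{:?}", classify(&no_seeds));
}
```

Checking the no-seeds / no-clusters case first keeps a read that never produced a transcript from falling through to the length filter.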

Test plan

  • New unit/integration test asserts unmapped_other > 0 and totals conserved
  • Existing tests pass; any that hard-coded other = 0 updated
  • cargo build
  • cargo clippy --lib -- -D warnings
  • cargo fmt --check
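
The first test-plan item (other > 0 with totals conserved) could take a shape like the one below. It is a sketch under stated assumptions: `UnmappedCounts`, `Failure`, and `bucket` are hypothetical stand-ins, not rustar types, and the boolean flag only models the pre-fix and post-fix routing side by side.

```rust
/// Hypothetical per-bucket counters mirroring the Log.final.out fields.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
struct UnmappedCounts {
    other: u64,
    too_short: u64,
    too_many_mismatches: u64,
}

impl UnmappedCounts {
    fn total(&self) -> u64 {
        self.other + self.too_short + self.too_many_mismatches
    }
}

/// Hypothetical stages at which a read can fail to map.
#[derive(Debug, Clone, Copy)]
enum Failure {
    NoSeedsOrClusters,
    ShortTranscript,
    Mismatches,
}

/// Buckets failures; `no_transcript_is_other` switches between the
/// pre-fix routing (false) and the post-fix routing (true).
fn bucket(failures: &[Failure], no_transcript_is_other: bool) -> UnmappedCounts {
    let mut counts = UnmappedCounts::default();
    for failure in failures {
        match failure {
            Failure::NoSeedsOrClusters if no_transcript_is_other => counts.other += 1,
            Failure::NoSeedsOrClusters | Failure::ShortTranscript => counts.too_short += 1,
            Failure::Mismatches => counts.too_many_mismatches += 1,
        }
    }
    counts
}

fn main() {
    let reads = [
        Failure::NoSeedsOrClusters,
        Failure::ShortTranscript,
        Failure::NoSeedsOrClusters,
        Failure::Mismatches,
    ];
    let before = bucket(&reads, false);
    let after = bucket(&reads, true);

    assert_eq!(before.other, 0); // pre-fix: other is always 0
    assert!(after.other > 0); // post-fix: other is populated
    assert_eq!(before.total(), after.total()); // only the bucketing changes
    println!("before: {before:?}\nafter:  {after:?}");
}
```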

Fixes #48

The UnmappedReason enum had all four STAR-compatible variants
(Other, TooShort, TooManyMismatches, TooManyLoci), but reads that
failed before any transcript was generated were being recorded as
TooShort. Log.final.out's "Number of reads unmapped: other" therefore
always read 0 even when STAR would have classified the read in that
bucket, inflating "too short" by the same count.

Fix the routing at the no-seeds / no-clusters call site so reads
without alignments are recorded as Other. Totals are unchanged;
only the bucketing changes.

Fixes scverse#48

Co-Authored-By: Claude <noreply@anthropic.com>
@pinin4fjords
Author

Verified end-to-end with paired STAR + rustar runs on the same yeast PE inputs (--twopassMode Basic --sjdbGTFfile genes.gtf --runRNGseed 0, 50 000 read pairs).

Unmapped buckets in Log.final.out:

| Field | rustar pre-fix | rustar post-fix (this PR) | STAR 2.7.11b |
| --- | --- | --- | --- |
| Number of reads unmapped: too short | 4 193 | 3 777 | 3 786 |
| Number of reads unmapped: other | 0 | 3 592 | 3 558 |
| Number of reads unmapped: too many mismatches | 0 | 0 | 0 |
| Total unmapped | 4 193 | 7 369 | 7 344 |

other is now populated instead of always reading 0, and the too short / other split is consistent with STAR's bucketing: parity_loss = |3777 - 3786| + |3592 - 3558| = 9 + 34 = 43 reads, versus |4193 - 3786| + |0 - 3558| = 407 + 3558 = 3 965 pre-fix on the same table (a 99% reduction). The remaining differences are well within RNG / tie-breaking tolerance. No double-counting.
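
For anyone wanting to recompute the parity figure from two Log.final.out files, a small sketch follows. It assumes the usual `label |<tab>value` line layout of Log.final.out; the helper names `field` and `parity_loss` are hypothetical and not part of rustar or STAR.

```rust
/// Extracts the count for `label` from Log.final.out-style text, where each
/// data line is assumed to look like `<label> |<tab><value>`.
fn field(log: &str, label: &str) -> Option<u64> {
    log.lines().find_map(|line| {
        let (key, value) = line.split_once('|')?;
        if key.trim() == label {
            value.trim().parse().ok()
        } else {
            None
        }
    })
}

/// Sum of absolute differences over the too short and other buckets,
/// matching the parity_loss formula in the comment above.
fn parity_loss(log_a: &str, log_b: &str) -> Option<u64> {
    const LABELS: [&str; 2] = [
        "Number of reads unmapped: too short",
        "Number of reads unmapped: other",
    ];
    let mut loss = 0;
    for label in LABELS {
        loss += field(log_a, label)?.abs_diff(field(log_b, label)?);
    }
    Some(loss)
}

fn main() {
    // Values copied from the post-fix and STAR columns of the table.
    let rustar = "  Number of reads unmapped: too short |\t3777\n\
                  \x20 Number of reads unmapped: other |\t3592\n";
    let star = "  Number of reads unmapped: too short |\t3786\n\
                \x20 Number of reads unmapped: other |\t3558\n";
    println!("{:?}", parity_loss(rustar, star)); // prints Some(43)
}
```

Returning `Option` means a missing or unparsable field yields `None` rather than silently counting as zero.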

LGTM.

Development

Successfully merging this pull request may close these issues.

Log.final.out folds all unmapped reads into too short; other bucket always 0
