Skip to content

feat(bam): emit XS:A: strand tag when --outSAMstrandField intronMotif#43

Open
pinin4fjords wants to merge 1 commit into
scverse:mainfrom
pinin4fjords:fix/xs-strand-tag
Open

feat(bam): emit XS:A: strand tag when --outSAMstrandField intronMotif#43
pinin4fjords wants to merge 1 commit into
scverse:mainfrom
pinin4fjords:fix/xs-strand-tag

Conversation

@pinin4fjords
Copy link
Copy Markdown

Summary

--outSAMstrandField intronMotif was a no-op at tag-write time - accepted at the CLI parser, but the output BAM contained zero XS:A: tags. Downstream tools that need strand inference (StringTie, Cufflinks, rseqc infer_experiment.py) saw nothing.

samtools view <bam> | grep -c 'XS:A:' pre-fix: 0. STAR on the same data: 1 738.

Fix

Mirror STAR's two coupling paths from Parameters_samAttributes.cpp:

  1. XS in --outSAMattributes -> force outSAMstrandField = intronMotif and add XS to the effective attribute set (lines 172-179).
  2. --outSAMstrandField intronMotif (regardless of --outSAMattributes) -> add XS to the effective attribute set (lines 213-216).

Per-record write: map Transcript::junction_motifs to a strand char (canonical GT/AG donor -> +, canonical CT/AC -> -, non-canonical or mixed -> omit). Emitted alongside the existing NH / HI / MD tags wherever a record is built.

The motif detection itself is already implemented (src/align/score::SpliceMotif, Transcript::junction_motifs); this PR just plumbs the existing data into the tag write.

Test plan

  • Unit test: --outSAMstrandField intronMotif adds XS to the effective attribute set
  • Unit test: --outSAMattributes XS force-enables intronMotif strand inference
  • Unit test: motif-to-strand mapping (canonical fwd → +, canonical rev → -, non-canonical → None)
  • Integration-style test: spliced record with a canonical motif produces XS:A:+ in the emitted record
  • cargo build / cargo clippy --lib -- -D warnings / cargo fmt --check

After this fix, samtools view <bam> | grep -c 'XS:A:' returns a positive count on spliced data, and infer_experiment.py no longer reports "stranding could not be inferred".

Fixes #30

The flag was accepted at the CLI parser but never produced any XS:A:
tags on the output BAM. Downstream tools that need strand inference
(StringTie, Cufflinks, rseqc infer_experiment.py) saw nothing.

Mirror STAR's two coupling paths: --outSAMattributes XS auto-enables
--outSAMstrandField intronMotif, and vice versa. For each output record
with at least one canonical intron motif, map the motif to + or - per
STAR's convention; non-canonical or mixed motifs omit the tag.

Fixes scverse#30
@pinin4fjords
Copy link
Copy Markdown
Author

Verified end-to-end on macOS/aarch64 against the rebuilt fix branch with --outSAMstrandField intronMotif and NO explicit XS in --outSAMattributes:

$ samtools view <bam> | grep -c 'XS:A:'    -> 549
$ samtools view <bam> | grep -oE 'XS:A:[+-]' | sort | uniq -c
    378  XS:A:+
    171  XS:A:-

Pre-fix the same invocation produced 0 XS:A: tags. The coupling works — --outSAMstrandField intronMotif alone is enough to populate the tag, no explicit XS in --outSAMattributes required.

Sample spliced records (all spliced records on long introns have XS populated):

SRR6357072.23987692  61M366N40M    XS:A:+
SRR6357072.11621763  8S83M11893N10M  XS:A:+
SRR6357072.36219571  87M11893N13M1S  XS:A:+

The total count (549) is lower than STAR's 1738 on the same data, but that gap is downstream of #27 (~50 % fewer spliced reads in rustar overall on this profile) plus the mixed-motif omit-XS policy — not a hole in the XS-tag fix itself.

LGTM.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

--outSAMstrandField intronMotif does not produce any XS:A: tags

1 participant