Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update BIN values in 'samtools depad' & check in 'samtools index' #43

Merged
merged 5 commits into from
Apr 23, 2013

Conversation

peterjc
Copy link
Contributor

@peterjc peterjc commented Apr 19, 2013

By its nature, 'samtools depad' edits mapped reads, altering both the mapping position and CIGAR string. This means the mapped start and end can change, and therefore the BIN can change.

Prior to this fix the pre-editing BIN (from the padded reference) was being written to the BAM output, leading to subtle problems downstream when trying to use the depadded BAM file via the BAI index. For example, in Picard v1.89 this could trigger an java.lang.ArrayIndexOutOfBoundsException where an old BIN was outside the length of the unpadded reference sequence.

A workaround is to use 'samtools depad' with SAM output, and pipe this via 'samtools view' to convert from SAM to BAM (and thus force the recalculation of the BIN values).

By its nature, 'samtools depad' edits mapped reads, altering both
the mapping position and CIGAR string. This means the mapped start
and end can change, and therefore the BIN can change.

Prior to this fix the pre-editing BIN (from the padded reference)
was being written to the BAM output, leading to subtle problems
downstream when trying to use the depadded BAM file via the BAI
index. For example, in Picard v1.89 this could trigger an
java.lang.ArrayIndexOutOfBoundsException where an old BIN was
outside the length of the unpadded reference sequence.
This simple check catches invalid BIN values in reads which can cause
subtle bugs due to position-based retrieval missing data, and simple
out of bounds errors if a BIN is used beyond the length of the reference.
Previously unmapped reads with a POS entry (e.g. those reads with
a mate which was mapped) were not being updated.

This has passed my testing with some non-trivial de novo MIRA 3.9
assemblies, coupled with the new 'samtools index' BIN verification.
It would be nice to include the RNAME, but currently the header
is being discarded before parsing.
pd3 added a commit that referenced this pull request Apr 23, 2013
Update BIN values in 'samtools depad' & check in 'samtools index'
@pd3 pd3 merged commit 99b55f5 into samtools:master Apr 23, 2013
jmarshall added a commit that referenced this pull request Jul 29, 2013
The abort introduced in pull request #43 (8e3b3f2 "Verify old BIN value
of reads during 'samtools index'") is too Draconian, as this field is
in practice used only during indexing -- so the BAM file is mostly fine.

If we're recalculating it here anyway, we might as well use the corrected
value to generate the index and just warn (in a rate-limited way) about
the anomalous records.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants