CRC32 header errors when using multi-iterator mode of samtools view #819
Also, please note that the change to the debug message can be seen here: https://github.com/samtools/samtools/compare/develop...ernfrid:dbg_msg?expand=1
Thanks for the report @ernfrid. Can you also provide us with the CRAM and BED files you used?
The regions BED file is here. I had to add a .txt extension for GitHub to accept it. I'll look into whether or not I can share the NA12878 CRAM, as it is from our own data and I'm unsure how easily it can be shared.
Please note that this is for GRCh38DH.
Thank you! Any CRAM file that triggers the unwanted behaviour should do.
I've managed to reproduce the
Glad to hear it. Thanks for taking a look!
`bed_unify` on the region list, when regions are provided via BED file only.
Fixes samtools#819.
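Going by its name and the commit message, `bed_unify` presumably normalizes the region list read from the BED file before the multi-region iterator uses it. As a rough illustration only (this is not the samtools C code, and the exact merge rules, including whether touching intervals are joined, are an assumption), "unifying" a region list can be pictured as sorting the intervals per contig and merging any that overlap, so the iterator never sees duplicated or overlapping regions. The helper name `unify_regions` is hypothetical.

```python
from collections import defaultdict


def unify_regions(regions):
    """Sort and merge overlapping or touching intervals per contig.

    Hypothetical helper for illustration, not samtools' bed_unify.
    `regions` is an iterable of (contig, start, end) tuples in BED's
    0-based, half-open coordinates. Returns a dict mapping each contig
    to a list of merged (start, end) tuples in ascending order.
    """
    # Group intervals by contig so each contig is merged independently.
    by_contig = defaultdict(list)
    for contig, start, end in regions:
        by_contig[contig].append((start, end))

    merged = {}
    for contig, intervals in by_contig.items():
        intervals.sort()  # sort by start, then end
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            last = out[-1]
            # Assumption: intervals that overlap or abut are joined.
            if start <= last[1]:
                last[1] = max(last[1], end)
            else:
                out.append([start, end])
        merged[contig] = [tuple(iv) for iv in out]
    return merged
```

For example, `unify_regions([("chr1", 100, 200), ("chr1", 150, 300), ("chr1", 300, 400), ("chr2", 5, 10), ("chr1", 500, 600)])` returns `{"chr1": [(100, 400), (500, 600)], "chr2": [(5, 10)]}`: the unsorted, overlapping chr1 entries collapse into two disjoint regions.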
I've recently been testing the new-ish `-M` option to `samtools view` on a CRAM file and am observing what I believe to be buggy behavior. I'll try to detail it as best I can. I've simply been trying to compare the performance of `samtools view -L` to `samtools view -M -L` on the same BED file and CRAM. I don't see any errors with `-L` by itself.
However, when using a combination of `-M` and `-L` to retrieve reads overlapping regions in a CRAM file, I'm repeatedly seeing errors like:
I see these with both samtools 1.7 and samtools 1.8, and they remain after re-indexing the CRAM with samtools 1.8 (it was originally indexed with samtools 1.3.1).
I've seen a variety of errors immediately after the one above as I've tried different things (switching versions, using different Docker containers, re-indexing). These range from bus errors to segfaults to a backtrace:
The integer id in the error message has also changed slightly with different configurations.
Since the file decodes without issue when using `-L` alone, I've assumed that the CRAM itself is not corrupted and that perhaps a single region was causing the issue. I modified the `samtools view` error message to output the problematic region and see the following:
However, this region doesn't exist in my input BED file.
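Pulling out the BED regions that overlap a reported region the most can be sketched in a few lines of Python. This is a hypothetical helper, not part of samtools; it assumes the BED regions are already parsed into (contig, start, end) tuples in BED's 0-based, half-open coordinates, so a 1-based region string such as `chr1:151-350` has to be converted to the half-open interval `[150, 350)` before querying.

```python
def overlap_len(a_start, a_end, b_start, b_end):
    """Length of the overlap between two half-open intervals (0 if disjoint)."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def regions_overlapping(bed_regions, contig, start, end):
    """Return BED regions on `contig` that overlap [start, end).

    Hypothetical helper for illustration. Results are ordered by overlap
    length, largest first, with ties broken by region start.
    """
    hits = []
    for r_contig, r_start, r_end in bed_regions:
        if r_contig != contig:
            continue
        n = overlap_len(r_start, r_end, start, end)
        if n > 0:
            hits.append((n, r_contig, r_start, r_end))
    hits.sort(key=lambda h: (-h[0], h[2]))
    return [(c, s, e) for _, c, s, e in hits]
```

With `bed = [("chr1", 100, 200), ("chr1", 180, 400), ("chr1", 500, 600), ("chr2", 100, 200)]`, calling `regions_overlapping(bed, "chr1", 150, 350)` returns `[("chr1", 180, 400), ("chr1", 100, 200)]`. Each hit can then be spot-checked on the command line with `samtools view -c <cram> <region>`, which counts the reads overlapping that region.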
I experimented with pulling out regions that overlapped the most with the reported coordinates and observed that the regions I identified do not report any reads overlapping them, but DO report reads overlapping them when specifying the region on the command line:
I have not dug further and thought I should report this back. If there's any additional information that I can add, please let me know.