CRC32 header errors when using multi-iterator mode of samtools view #819

Closed
ernfrid opened this issue Apr 13, 2018 · 7 comments

@ernfrid
Contributor

ernfrid commented Apr 13, 2018

I've recently been testing the new-ish -M option to samtools view on a CRAM file and am observing what I believe to be buggy behavior. I'll describe it as best I can. I've simply been trying to compare the performance of samtools view -L with samtools view -M -L on the same BED file and CRAM. I don't see any errors with -L by itself.

However, when using a combination of -M and -L to retrieve reads overlapping regions in a CRAM file, I'm repeatedly seeing errors like:

[E::cram_read_container] Container header CRC32 failure
[main_samview] retrieval of region 10 failed due to truncated file or corrupt BAM index file

I see these with both samtools 1.7 and samtools 1.8 and they remain after re-indexing the CRAM with samtools 1.8 (originally it was indexed with samtools 1.3.1).

I've seen a variety of errors immediately after the one above as I've tried different things (switching versions, using different docker containers, reindexing). These range from bus errors to segfaults to a backtrace:

*** glibc detected *** /gscmnt/gc2802/halllab/idas/software/htslib-1.8/bin/samtools: munmap_chunk(): invalid pointer: 0x00000000020781c0 ***
======= Backtrace: =========
/lib/libc.so.6(+0x788d6)[0x7fe0067268d6]
/gscmnt/gc2802/halllab/idas/software/htslib-1.8/lib/libhts.so.2(cram_free_container+0x58)[0x7fe0073cd77a]
/gscmnt/gc2802/halllab/idas/software/htslib-1.8/lib/libhts.so.2(cram_close+0x32d)[0x7fe0073d271a]
/gscmnt/gc2802/halllab/idas/software/htslib-1.8/lib/libhts.so.2(hts_close+0xc0)[0x7fe00735f8a8]
/gscmnt/gc2802/halllab/idas/software/htslib-1.8/bin/samtools[0x40ddff]
/gscmnt/gc2802/halllab/idas/software/htslib-1.8/bin/samtools[0x40f310]
/gscmnt/gc2802/halllab/idas/software/htslib-1.8/bin/samtools[0x42eeaf]
/lib/libc.so.6(__libc_start_main+0xfd)[0x7fe0066ccc8d]
/gscmnt/gc2802/halllab/idas/software/htslib-1.8/bin/samtools[0x406519]

The integer id in the error message has also changed slightly with different configurations.

Since the file decodes without issue when using -L alone, I've assumed that the CRAM itself is uncorrupted and perhaps a single region was causing the issue. I modified the samtools view error message to output the problematic region and see the following:

[E::cram_read_container] Container header CRC32 failure
[main_samview] retrieval of region chr11 106124810 106124961 failed due to truncated file or corrupt BAM index file
*** Error in `samtools': munmap_chunk(): invalid pointer: 0x0000000002101380 ***

However, this region doesn't exist in my input BED file.
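A claim like this can be checked by scanning the BED file for an exact match and for overlapping records. The sketch below is illustrative only: `Interval`, `parse_bed`, and `overlapping` are hypothetical helper names, the sample records are the two test regions shown later in this report, and it assumes the coordinates printed by the modified error message follow BED conventions (0-based, half-open), which the thread does not confirm.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (BED convention)
    end: int    # exclusive


def parse_bed(text: str) -> List[Interval]:
    """Parse the first three columns of each BED data line."""
    records = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, comments, and track/browser header lines.
        if len(fields) < 3 or fields[0].startswith(("#", "track", "browser")):
            continue
        records.append(Interval(fields[0], int(fields[1]), int(fields[2])))
    return records


def overlapping(records: List[Interval], query: Interval) -> List[Interval]:
    """Return records on the query's chromosome that share at least one base with it."""
    return [
        r for r in records
        if r.chrom == query.chrom and r.start < query.end and query.start < r.end
    ]


# Sample data: the two test regions from this report.
bed_text = """\
chr7    106124962    106125963
chr11    106123818    106124819
"""
records = parse_bed(bed_text)

# Region named in the modified error message (coordinate convention assumed).
reported = Interval("chr11", 106124810, 106124961)

exact = reported in records           # is the region literally in the BED file?
hits = overlapping(records, reported)  # which BED records overlap it?
print(exact)
print(hits)
```

On this sample the reported region is not an exact BED record, and only the chr11 record overlaps it.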

I then pulled out the BED regions that overlapped most with the reported coordinates. With -M -L these regions return no reads, but the same regions DO return reads when specified directly on the command line:

$ cat region1_test.bed
chr7    106124962    106125963
$ /usr/local/bin/samtools view -M -L region1_test.bed H_IJ-NA12878-NA12878_K10.cram | wc -l
0
$ /usr/local/bin/samtools view H_IJ-NA12878-NA12878_K10.cram chr7:106124963-106125963 | wc -l
176
$ cat region2_test.bed
chr11    106123818    106124819
$ /usr/local/bin/samtools view H_IJ-NA12878-NA12878_K10.cram chr11:106123819-106124819 | wc -l
171
$ /usr/local/bin/samtools view -M -L region2_test.bed H_IJ-NA12878-NA12878_K10.cram | wc -l
0
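The command-line regions above differ from the BED start coordinates by one because BED intervals are 0-based and half-open, while samtools region strings are 1-based and inclusive. A minimal sketch of that conversion follows; `bed_to_region` is a hypothetical helper name, not part of samtools.

```python
def bed_to_region(chrom: str, start: int, end: int) -> str:
    """Convert a BED interval (0-based, half-open) to a 1-based inclusive region string."""
    if start < 0 or end <= start:
        raise ValueError(f"invalid BED interval: {chrom} {start} {end}")
    # Only the start shifts by one; the exclusive 0-based end equals
    # the inclusive 1-based end.
    return f"{chrom}:{start + 1}-{end}"


# The two test regions from this report.
region1 = bed_to_region("chr7", 106124962, 106125963)
region2 = bed_to_region("chr11", 106123818, 106124819)
print(region1)
print(region2)
```

The resulting strings match the regions passed on the command line in the transcript above, so the -M -L and command-line queries cover the same bases.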

I have not dug further and thought I should report this back. If there's any additional information that I can add, please let me know.

@ernfrid
Contributor Author

ernfrid commented Apr 13, 2018

Also - please note that the change to the debug message can be seen here: https://github.com/samtools/samtools/compare/develop...ernfrid:dbg_msg?expand=1

@valeriuo valeriuo self-assigned this Apr 13, 2018
@valeriuo
Contributor

Thanks for the report @ernfrid. Can you also provide us with the CRAM and BED files you used?

@ernfrid
Contributor Author

ernfrid commented Apr 13, 2018

The regions bed file is here. I had to add a .txt extension for github to accept it.

I'll look into whether or not I can share the NA12878 CRAM as it is from our own data and I'm unsure where it falls in terms of ease of sharing.

@ernfrid
Contributor Author

ernfrid commented Apr 13, 2018

Please note that this is for GRCH38DH.

@valeriuo
Contributor

Thank you! Any CRAM file that triggers the unwanted behaviour should do.

@daviesrob
Member

I've managed to reproduce the [main_samview] retrieval of region 19 failed due to truncated file or corrupt BAM index file message with a locally-available cram file. Hopefully we should be able to track the problem down with that.

@ernfrid
Contributor Author

ernfrid commented Apr 13, 2018 via email

valeriuo added a commit to valeriuo/samtools that referenced this issue Apr 17, 2018
`bed_unify` on the region list, when regions are provided via
BED file only
Fixes samtools#819.
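
The commit message does not say what `bed_unify` does internally. Assuming it normalizes the region list by sorting it and merging overlapping intervals on each chromosome, the idea can be sketched as follows. This is an illustration in Python, not the samtools C implementation; the function name `unify`, the choice to also merge directly adjacent intervals, and the sample regions are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open


def unify(regions: Iterable[Region]) -> Dict[str, List[Tuple[int, int]]]:
    """Sort intervals per chromosome and merge any that overlap or touch."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    merged: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= out[-1][1]:
                # Overlaps or abuts the previous interval: extend it.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = [(s, e) for s, e in out]
    return merged


# Hypothetical, unsorted input with overlapping and adjacent intervals.
sample = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr2", 5, 10),
    ("chr1", 400, 500),
    ("chr1", 300, 350),
]
result = unify(sample)
print(result)
```

After this step each chromosome has a sorted list of non-overlapping intervals, which is the kind of input a multi-region iterator can walk in a single forward pass.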