samtools treats many errors as EOF, silently hiding problems #101
Comments
On a Linux machine with commit 99b55f5 (the current stable master branch),
While not ideal, this is not a silent data corruption. I would try the
What version of samtools are you using, and can you give an example command line where samtools silently mishandles this file?
Full reproduction:
#!/bin/sh
Thanks!
On Tue, Dec 03, 2013 at 06:56:12AM -0800, Peter Cock wrote:
On Tue, Dec 03, 2013 at 06:56:10AM -0800, Peter Cock wrote:
I pushed a whole set of robustness changes to samtools a while back
$ ./samtools view blah.bam
However this still doesn't fix the issue of not being able to
James Bonfield (jkb@sanger.ac.uk)
That script will give different results as the samtools & htslib repositories are updated. What are the latest commits in each case? (i.e. try
$ cd samtools ; git log | head -1 ; git branch ; cd -
On Tue, Dec 03, 2013 at 07:08:54AM -0800, Peter Cock wrote:
Thanks @beaumontlab - using those same revisions I get:
(i.e. same as James), however using the stats command:
This confirms your original report - there is no error about the invalid BAM file (silent failure and potential data corruption).
Suggested fix, based on how James did this in samtools view (and the same possibly ambiguous error message): peterjc@2f1a16f
Right now there are likely similar issues in
On Tue, Dec 03, 2013 at 08:40:21AM -0800, Peter Cock wrote:
I can't recall doing this fix to samtools view. I just meant that the
Anyway your fix looks appropriate. We just need to do some greps and
James
Apologies, the return code check in
Based on a grep, fixes are needed in at least
Commands failing to detect error conditions and set exit statuses accordingly is a long-standing samtools problem and is issue #51 and probably several others.
This particular case of the mostly-undocumented return values of
For compatibility with third-party code, the old-API
Sadly, auditing the code for these problems is boring, painstaking work. E.g. Peter's suggested fix is incomplete -- there is an unchecked
I didn't do anything to the
To me this reinforces the need for an automated test suite (issue #1), which should include checking error conditions like this.
As to @jmarshall, in many contexts it makes sense to follow the UNIX/POSIX philosophy: EOF is 0, an error is negative. But even then people often don't want to type the whole three-pronged if-statement required. Exceptions? ;-)
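For reference, the three-pronged check mentioned here looks roughly like this with POSIX `read()`, which returns a positive byte count on success, 0 at end of file, and -1 on error. This is only a minimal sketch; `drain_fd` is a hypothetical helper name, not something from samtools or htslib:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: reads fd to the end and returns the number of bytes
 * seen, or -1 if read() failed. EOF and error are kept distinct. */
long drain_fd(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;          /* prong 1: data was read */
        } else if (n == 0) {
            return total;        /* prong 2: genuine end of file */
        } else if (errno == EINTR) {
            continue;            /* interrupted by a signal: retry */
        } else {
            fprintf(stderr, "read failed: %s\n", strerror(errno));
            return -1;           /* prong 3: real error, reported to caller */
        }
    }
}
```

The extra typing is the price of the caller being able to tell a short but valid input from a failed read.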
In the meantime, could 2f1a16f be applied please? It is a small improvement, but I appreciate it is part of a larger set of problems. Do you want it as a pull request?
Checking this I see that the bugs reported here have mostly been fixed, including an equivalent fix to Peter's patch, but grepping for
So leaving this open still (sorry), but it's unlikely we'll go through all these prior to 1.4.
More fixes unlikely before 1.4
Unfortunately even back in 2013 it proved impractical to alter the codes returned by
At the time of writing, all direct invocations of
In the development of Antonie, I generated BAM files that were subtly malformed. This is of course my own bug, and of course samtools should not compensate for my bugs. However, samtools silently accepted my BAM files, and appeared to process them quite well!
Further investigation found that 'samread()' returns negative values for both errors and EOFs, and that many loops within samtools treat all negative values equally. In other words, they turn an error into a normal EOF, which generates no error or warning message.
Any loop like this is problematic:
while (samread(sam,bam_line) >= 0) {... }
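A loop of this shape can be made to tell the two cases apart by keeping the return value and inspecting it after the loop ends. The sketch below assumes the reader returns -1 at end of file and a value below -1 on error, which is the convention htslib documents for `sam_read1()`; whether every old-API `samread()` caller can rely on it is an assumption here. `mock_reader`, `mock_read` and `count_records` are hypothetical stand-ins, not samtools functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for an open SAM/BAM stream: yields `good` records,
 * then returns `final_status` (-1 for EOF, below -1 for a read error). */
typedef struct {
    int good;
    int final_status;
} mock_reader;

/* Hypothetical reader using the assumed convention:
 * >= 0 on success, -1 at EOF, < -1 on error. */
int mock_read(mock_reader *r)
{
    assert(r != NULL);
    if (r->good > 0) {
        r->good--;
        return 0;
    }
    return r->final_status;
}

/* Counts records. Returns 0 on a clean EOF and -1 if the stream ended
 * because of an error, so the caller can set a non-zero exit status. */
int count_records(mock_reader *r, long *n_out)
{
    int ret;
    long n = 0;

    while ((ret = mock_read(r)) >= 0)
        n++;

    *n_out = n;
    if (ret < -1) {
        fprintf(stderr, "read error (code %d) after %ld records\n", ret, n);
        return -1;
    }
    return 0; /* ret == -1: genuine end of file */
}
```

A command built this way would exit with a failure status when `count_records` returns -1, instead of printing partial results and exiting with status 0.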
As an example, http://ds9a.nl/tmp/blah.bam contains an invalid sequence id. 'samtools stats blah.bam' processes it without any apparent error, but produces no statistics beyond the problematic read.
While I of course appreciate the samtools software, I would suggest screaming bloody murder on any kind of unexpected error, lest our users end up with invalid results because part of their data was silently skipped!
Thanks for your attention.