Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

samtools treats many errors as EOF, silently hiding problems #101

Closed
beaumontlab opened this issue Dec 3, 2013 · 17 comments · Fixed by #1379
Closed

samtools treats many errors as EOF, silently hiding problems #101

beaumontlab opened this issue Dec 3, 2013 · 17 comments · Fixed by #1379
Assignees

Comments

@beaumontlab
Copy link

In the development of Antonie, I generated BAM files that were subtly malformed. This is of course my own bug, and of course samtools should not compensate for my bugs. However, samtools silently accepted my BAM files, and appeared to process them quite well!

Further investigation found that 'samread()' returns negative values for both errors and EOFS, and that many loops within samtools treat all negative values equally. In other words, they turn an error into a normal EOF, which generates no error or warning message.

Any loop like this is problematic:
while (samread(sam,bam_line) >= 0) {... }

As an example, http://ds9a.nl/tmp/blah.bam has an invalid sequence id in there, but 'samtools stats blah.bam' processes it without apparent error, but also without producing any statistics beyond the problematic read.

While I of course appreciate the samtools software, I would suggest screaming bloody murder on any kind of unexpected error, lest our users end up with invalid results because part of their data was silently skipped!

Thanks for your attention.

@peterjc
Copy link
Contributor

peterjc commented Dec 3, 2013

On a Linux machine with commit 99b55f5 (the current stable master branch),

$ ./samtools view blah.bam 
Segmentation fault (core dumped)

While not ideal, this is not a silent data corruption. I would try the develop branch using htslib but I'm having trouble building it right now...

What version of samtools are you using, and can you give an example command line where samtools silently mishandles this file?

@beaumontlab
Copy link
Author

Full reproduction:

#!/bin/sh
wget http://ds9a.nl/tmp/blah.bam
rm -rf samtools htslib
git clone https://github.com/samtools/samtools.git
git clone https://github.com/samtools/htslib.git
cd samtools
make -j4
./samtools stats ../blah.bam

Thanks!

On Tue, Dec 03, 2013 at 06:56:12AM -0800, Peter Cock wrote:

On a Linux machine with commit 99b55f5 (the current stable master branch),

$ ./samtools view blah.bam 
Segmentation fault (core dumped)

While not ideal, this is not a silent data corruption. I would try the develop branch using htslib but I'm having trouble building it right now...

What version of samtools are you using, and can you give an example command line where samtools silently mishandles this file?


Reply to this email directly or view it on GitHub:
#101 (comment)

@jkbonfield
Copy link
Contributor

On Tue, Dec 03, 2013 at 06:56:10AM -0800, Peter Cock wrote:

While not ideal, this is not a silent data corruption. I would try
the develop branch using htslib but I'm having trouble building it
right now...

I pushed a whole set of robustness changes to samtools a while back
that fixes many corruption errors. My checkout gives:

$ ./samtools view blah.bam
[main_samview] truncated file.
$ echo $?
1

However this still doesn't fix the issue of not being able to
distinguish some errors from EOF. No doubt the while loop listed is a
code-meme and the same error occurs in many places.

James

James Bonfield (jkb@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
| Plurima gyrabant gymbolitare vabo;
A Staden Package developer: | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

@peterjc
Copy link
Contributor

peterjc commented Dec 3, 2013

That script will give different results as the samtools & htslib repositories are updated. What are the latest commits in each case? (i.e. try git log | head and git branch).

@beaumontlab
Copy link
Author

$ cd samtools ; git log | head -1 ; git branch ; cd -
commit 9a39e01

  • develop
    $ cd htslib/ ; git log | head -1 ; git branch ; cd -
    commit 7092725
  • bcftools+calling

On Tue, Dec 03, 2013 at 07:08:54AM -0800, Peter Cock wrote:

That script will give different results as the samtools & htslib repositories are updated. What are the latest commits in each case? (i.e. try git log | head and git branch).


Reply to this email directly or view it on GitHub:
#101 (comment)

@peterjc
Copy link
Contributor

peterjc commented Dec 3, 2013

Thanks @beaumontlab - using those same revisions I get:

$ ./samtools view blah.bam 
[main_samview] truncated file.
$ echo $?
1

(i.e. same as James), however using the stats command:

$ ./samtools stats blah.bam 
# This file was produced by samtools stats (0.2.0-rc5-11-g9a39e01:0.2.0-rc5-5-g7092725) and can be plotted using plot-bamstats
...
$ echo $?
0

This confirms your original report - there is no error about the invalid BAM file (silent failure and potential data corruption).

peterjc added a commit to peterjc/samtools that referenced this issue Dec 3, 2013
@peterjc
Copy link
Contributor

peterjc commented Dec 3, 2013

Suggested fix, based on how James did this in samtools view (and the same possibly ambiguous error message): peterjc@2f1a16f

@peterjc
Copy link
Contributor

peterjc commented Dec 3, 2013

Right now there are likely similar issues in bam_rmdup.c and bam_rmdupse.c

@jkbonfield
Copy link
Contributor

On Tue, Dec 03, 2013 at 08:40:21AM -0800, Peter Cock wrote:

Suggested fix, based on how James did this in samtools view

I can't recall doing this fix to samtools view. I just meant that the
crashes and core-dumps were likely in the fixes I committed.

Anyway your fix looks appropriate. We just need to do some greps and
find all the other copies of the same code-meme.

James

James Bonfield (jkb@sanger.ac.uk) | Hora aderat briligi. Nunc et Slythia Tova
| Plurima gyrabant gymbolitare vabo;
A Staden Package developer: | Et Borogovorum mimzebant undique formae,
https://sf.net/projects/staden/ | Momiferique omnes exgrabure Rathi.

The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

@peterjc
Copy link
Contributor

peterjc commented Dec 3, 2013

Apologies, the return code check in sam_view.c is actually quite old dating back to r16 prior to the GitHub repository.

Based on a grep, fixes are needed in at least bam_rmdup.c and bam_rmdupse.c but ideally we'd have a test case which triggered the silent error (or new error message after fixing them).

@jmarshall
Copy link
Member

Commands failing to detect error conditions and set exit statuses accordingly is a long-standing samtools problem and is issue #51 and probably several others.

This particular case of the mostly-undocumented return values of samread() and friends is something that coincidentally enough I am working on at the moment. Having no clear distinction between a little negative is EOF and more negative is an error makes it very easy to omit this error check after a loop. I am in the process of changing the htslib xyz_read1()/hts_itr_next() functions to all return negative for error, 0 for EOF, or positive for record-received and document them as such. This IMHO makes more sense, and the change will mean that existing new-API-based code using while (sam_read1(...) >= 0) will infinite loop, which will be easy to track down and audit that it is followed by a if (result < 0) test!

For compatibility with third-party code, the old-API samread() etc needs to stay with -1=EOF as is. Probably the best time to ensure the various samtools commands check for errors after read loops is as we change them to use the htslib API directly. But we can apply some fixes along the way too.

Sadly auditing the code for these problems is boring painstaking work. e.g. Peter's suggested fix is incomplete -- there is an unchecked bam_fetch() alongside which exhibits the same problem.

@peterjc
Copy link
Contributor

peterjc commented Dec 3, 2013

I didn't do anything to the bam_fetch() since it isn't part of the report above - do you have a test case for this where calling samtools stats with a region? I could change it, but am loath to without an example to validate it on.

To me this reinforces the need for an automated test suite (issue #1), which should include checking error conditions like this.

@ahupowerdns
Copy link

As to @jmarshall, in many contexts it makes sense to follow the UNIX/Posix philosophy: EOF=0, error is negative. But even then people often don't want to type the whole three-pronged if-statement required. Exceptions? ;-)

@peterjc
Copy link
Contributor

peterjc commented Dec 5, 2013

In the meantime, could 2f1a16f be applied please? It is a small improvement, but I appreciate it is part of a larger set of problems. Do you want it as a pull request?

@jmarshall jmarshall added this to the 1.1 milestone Mar 24, 2014
@jmarshall jmarshall modified the milestones: 1.1, 1.2 Sep 19, 2014
@jmarshall jmarshall modified the milestones: 1.3, 1.2 Jan 30, 2015
@jmarshall jmarshall modified the milestones: 1.4, 1.3 Jul 27, 2015
@jkbonfield
Copy link
Contributor

Checking this I see that the bugs reported here have mostly been fixed, including an equivalent fix to Peter's patch, but grepping for sam_read1 calls I still see failure to check in other cases; fastq, fixmate, phase, targetcut, bedcov and maybe more. It looks like there is a lack of iterator checking in many places too.

So leaving this open still (sorry), but it's unlikely we'll go through all these prior to 1.4.

@dkj dkj removed this from the 1.4 milestone Feb 13, 2017
@dkj
Copy link
Member

dkj commented Feb 13, 2017

More fixes unlikely before 1.4

@jmarshall
Copy link
Member

jmarshall commented Feb 17, 2021

Unfortunately even back in 2013 it proved impractical to alter the codes returned by sam_read1() as discussed in #101 (comment), so error checking code to this day must check ret < -1 to detect errors.

At the time of writing, all direct invocations of sam_read1() in the samtools code (the legacy samread() API is no longer used, so irrelevant to samtools) have been correctly checked, producing a diagnostic and EXIT_FAILURE, for a long time. However there remained some checking infelicities when the call is via iterator or pileup machinery, as @jkbonfield's most recent comment mentions, all of which are fixed by PR #1379.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants