Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bam slicing issue: Unable to query by chromosome #73

Closed
chilinglam opened this issue Jul 21, 2014 · 11 comments
Closed

Bam slicing issue: Unable to query by chromosome #73

chilinglam opened this issue Jul 21, 2014 · 11 comments
Assignees

Comments

@chilinglam
Copy link

I tried to use the SamReader:

reader.query(intervals, false);

where intervals contain a single QueryInterval which is:

QueryInterval interval = new QueryInterval(0,0,0);

It will throw an exception as follows:

htsjdk.samtools.util.RuntimeIOException: Expected 8 bytes, got 4
    at htsjdk.samtools.AbstractBAMFileIndex$IndexStreamBuffer.readLong(AbstractBAMFileIndex.java:684) ~[dcc-htsjdk-1.117.jar:${hts-version}(d65f6b3af647b0e4610f17760b17c3323df64aab_1405347869)]
    at htsjdk.samtools.AbstractBAMFileIndex.readLong(AbstractBAMFileIndex.java:438) ~[dcc-htsjdk-1.117.jar:${hts-version}(d65f6b3af647b0e4610f17760b17c3323df64aab_1405347869)]
    at htsjdk.samtools.AbstractBAMFileIndex.query(AbstractBAMFileIndex.java:332) ~[dcc-htsjdk-1.117.jar:${hts-version}(d65f6b3af647b0e4610f17760b17c3323df64aab_1405347869)]
    at htsjdk.samtools.DiskBasedBAMFileIndex.getSpanOverlapping(DiskBasedBAMFileIndex.java:60) ~[dcc-htsjdk-1.117.jar:${hts-version}(d65f6b3af647b0e4610f17760b17c3323df64aab_1405347869)]
    at htsjdk.samtools.BAMFileReader.createIndexIterator(BAMFileReader.java:729) ~[dcc-htsjdk-1.117.jar:${hts-version}(d65f6b3af647b0e4610f17760b17c3323df64aab_1405347869)]
    at htsjdk.samtools.BAMFileReader.query(BAMFileReader.java:412) ~[dcc-htsjdk-1.117.jar:${hts-version}(d65f6b3af647b0e4610f17760b17c3323df64aab_1405347869)]
    at htsjdk.samtools.SamReader$PrimitiveSamReaderToSamReaderAdapter.query(SamReader.java:468) ~[dcc-htsjdk-1.117.jar:${hts-version}(d65f6b3af647b0e4610f17760b17c3323df64aab_1405347869)]

The BAM file size is about 15 GB downloaded from 1000 Genomes: ftp://ftp-trace.ncbi.nih.gov/1000genomes/ftp/data/HG00595/alignment/HG00595.mapped.ILLUMINA.bwa.CHS.low_coverage.20120522.bam

Any help is appreciated.

@chilinglam
Copy link
Author

more info:

SamReader seems to have bugs. When I switched back to use SAMFileReader, the above API works.

comments??

@nh13
Copy link
Member

nh13 commented Jul 24, 2014

A few things:

  1. The start and end in your interval should be one-based regardless.
  2. Could you provide what top level function you called for SamReader and SAMFileReader respectively? I am assuming for SamReader it was:
    public SAMRecordIterator query(final QueryInterval[] intervals, final boolean contained);
    and for SAMFileReader it was:
    public SAMRecordIterator query(final QueryInterval[] intervals, final boolean contained) {
    I am wondering if you use the following instead for SAMFileReader:
    public SAMRecordIterator query(final String sequence, final int start, final int end, final boolean contained) {
    which could be more robust to querying outside of the reference.

@mccowan
Copy link
Contributor

mccowan commented Jul 25, 2014

Hey @chilinglam, thanks for reporting this!

I can't seem to reproduce the problem, however. Can you please post your code verbatim?

Thanks!

@chilinglam
Copy link
Author

Hi @mccowan, thank you for your help.

What I did was simply:

SamReader reader = SamReaderFactory.make().open(getResource(bamUri, baiUri));

The getResource is just a function to return a SamInputResource that points to a http resource for the BAM file and a local file resource for the bai file.

Later in the code, I iterates through all the records from

reader.query(intervals, false);

The intervals are of type QueryInterval[].

@mccowan
Copy link
Contributor

mccowan commented Jul 25, 2014

Got it; I thought you were reading the file locally. I can reproduce your problem now - thanks!

@chilinglam
Copy link
Author

That's great to hear! I now use SAMFileReader like the following:

SAMFileReader reader = new SAMFileReader(getSeekableStream(bamUri),getSeekableStream(baiUri),true);

The getSeekableStream return a seekable stream that points to the http resource specified by bamUri. For baiUri, it is mostly pointed to a local file (i.e. SeekableFileStream)
and the rest of the code is just iterating through all records.

Thank you for your help to debug this. I would like to move away from the deprecated SAMFileReader but as of now, I just need to get it to work first!

@mccowan
Copy link
Contributor

mccowan commented Jul 25, 2014

Opened #75 to address this issue.

@mccowan
Copy link
Contributor

mccowan commented Jul 29, 2014

Hi again, #75 has been merged so this issue should be resolved. Please let me know if you continue to experience it!

Thanks again @chilinglam for reporting!

@mccowan mccowan closed this as completed Jul 29, 2014
@chilinglam
Copy link
Author

Hi @mccowan, I will try it right now! I assume this is available in sourceforge (http://sourceforge.net/projects/picard/files/)? I saw version 1.118.

Also, if you have time, can you also please look at #76?

Thank you once again for fix this bug so quickly!

@mccowan
Copy link
Contributor

mccowan commented Jul 29, 2014

No, unfortunately; this fix has not been included in a release. You'd have to build picard manually from master of this repository. Alternatively, it should be ferried out as part of the next picard release on sourceforge.

@chilinglam
Copy link
Author

I see. I will wait for the next release that includes this fix. Thank you for letting me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants