Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fetch does not work properly on CRAM files when multiple_iterators is True #611

Closed
nspies opened this issue Jan 31, 2018 · 2 comments
Closed

Comments

@nspies
Copy link
Contributor

nspies commented Jan 31, 2018

CRAM files do not apparently work properly when used with cram.fetch(..., multiple_iterators=True). Using the examples from the tests/pysam_data directory and the following test code:

import pysam

def test_fetcher(fetcher):
    print(next(fetcher).reference_name)

chrom = "chr2"
start = 200
end = 2005

# these all work
bam = pysam.AlignmentFile("pysam_data/ex1.bam")
test_fetcher(bam.fetch(chrom, start, end))

bam = pysam.AlignmentFile("pysam_data/ex1.bam")
test_fetcher(bam.fetch(chrom, start, end, multiple_iterators=True))

cram = pysam.AlignmentFile("pysam_data/ex1.cram")
test_fetcher(cram.fetch(chrom, start, end))

# this one does not
cram = pysam.AlignmentFile("pysam_data/ex1.cram")
test_fetcher(cram.fetch(chrom, start, end, multiple_iterators=True))

I get the output:

chr2
chr2
chr2
chr1

Where I'd expect all the results to be chr2. It appears that the fetch on the cram file is starting from the beginning, rather than the specified coordinates.

I'm using the latest version of pysam from github. I'm not sure exactly what's going on in #293, but this may be a related issue.

@AndreasHeger
Copy link
Contributor

Thanks for the nice test case, something is clearly wrong.

@AndreasHeger
Copy link
Contributor

The reason is: when using multiple iterators I use the same index across the iterators. This works for BAM, but not for CRAM. Not sure I fully get all the details, but it looks as if while setting up the query, options are set, for example in cram_itr_query():

cram_range r = { tid, beg+1, end };
int ret = cram_set_option(cidx->cram, CRAM_OPT_RANGE, &r);

A solution is to provide each iterator with its own index.

AndreasHeger added a commit that referenced this issue Feb 7, 2018
{AH} each iterator now loads own copy of index, fixes #611 and fixes …
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants