
Significant slow downs reading small CRAM files #605

Open
cshenanigans opened this issue Jan 19, 2018 · 3 comments

Comments

@cshenanigans

@bioinformed cc @AndreasHeger

We've noticed a significant drop in performance when reading small CRAM files (in our current examples, files smaller than ~300MB). It appears to be an issue with HTSlib iteration: when the file is small enough to fit into a few CRAM slices and we access adjacent regions, HTSlib decompresses the same slices over and over, causing the slowdown.

See the related issue and fix in another repo, tfwillems/HipSTR#24; perhaps it can be incorporated here.

@cshenanigans changed the title from "Exponential slow downs reading small CRAM files" to "Significant slow downs reading small CRAM files" on Jan 19, 2018
@dpryan79
Contributor

It makes sense to alert the htslib folks if people think the underlying issue lies there. They'll want an example to look at, of course.

Ping @jkbonfield and @jmarshall

@AndreasHeger
Contributor

Thanks. This looks like a complex fix and might not easily fit the way iteration is encapsulated in pysam. Basically, each fetch call is handled by its own iterator object, so even if fetch recognized that two calls were adjacent, it would not have access to the previous iterator.

As a workaround, if you have adjacent regions, couldn't the higher-level code use a single .fetch() call instead of several?
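A minimal sketch of that workaround, assuming pysam's `AlignmentFile.fetch(contig, start, stop)`, which yields reads overlapping a 0-based, half-open region and exposes `reference_start`/`reference_end` on each read. The helpers `merge_regions` and `fetch_grouped` and the `max_gap` parameter are hypothetical, not part of pysam. Adjacent or overlapping regions are merged into one span, `fetch` is called once per span, and each read is then assigned to every original region it overlaps.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# (contig, start, stop), 0-based and half-open, matching pysam's fetch() coordinates
Region = Tuple[str, int, int]


def merge_regions(regions: Iterable[Region], max_gap: int = 0) -> List[Region]:
    """Merge regions on the same contig that overlap, touch, or lie within max_gap bases.

    Hypothetical helper, not part of pysam.
    """
    merged: List[Region] = []
    for contig, start, stop in sorted(regions):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + max_gap:
            _, prev_start, prev_stop = merged[-1]
            # Extend the previous span instead of opening a new one.
            merged[-1] = (contig, prev_start, max(prev_stop, stop))
        else:
            merged.append((contig, start, stop))
    return merged


def fetch_grouped(
    fetch: Callable[[str, int, int], Iterable],
    regions: Iterable[Region],
    max_gap: int = 0,
) -> Dict[Region, list]:
    """Call fetch() once per merged span and bucket reads by original region.

    `fetch` is assumed to behave like pysam.AlignmentFile.fetch(contig, start, stop):
    it yields reads with reference_start / reference_end attributes.
    Hypothetical helper, not part of pysam.
    """
    # Dict keys double as a de-duplicated list of the requested regions.
    by_region: Dict[Region, list] = {r: [] for r in regions}
    for contig, span_start, span_stop in merge_regions(by_region, max_gap):
        # Original regions fully covered by this merged span.
        members = [
            r for r in by_region
            if r[0] == contig and span_start <= r[1] and r[2] <= span_stop
        ]
        for read in fetch(contig, span_start, span_stop):
            if read.reference_end is None:  # end may be unavailable, e.g. unmapped reads
                continue
            for r in members:
                # Half-open interval overlap test.
                if read.reference_start < r[2] and read.reference_end > r[1]:
                    by_region[r].append(read)
    return by_region
```

Usage would look something like `fetch_grouped(pysam.AlignmentFile("sample.cram", "rc").fetch, regions)`, where the file name is a placeholder. A larger `max_gap` trades reading some unneeded reads for fewer fetch calls. Whether a single iterator fully avoids re-decompressing slices still depends on htslib, so it is worth timing on a real file.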

@lnovara

lnovara commented May 18, 2018

Any update on this issue? Is there a corresponding one in htslib that we can reference?


4 participants