We've noticed a significant drop in performance when reading small CRAM files (in our current examples, files smaller than ~300MB). It appears to be an issue with HTSlib iteration: when a file is small enough to fit into a few CRAM slices, accessing adjacent regions decompresses the same slices over and over, causing slowdowns.
See the related issue and fix in another repo, tfwillems/HipSTR#24; perhaps it can be incorporated here.
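To make the cost concrete, here is a toy model, not HTSlib's actual logic. It assumes, as the report describes, that each region query decodes every slice it overlaps from scratch, with nothing cached between queries. The slice layout and region sizes are hypothetical.

```python
from typing import List, Tuple

# Toy model (assumption): each slice covers a half-open [start, end) range on one
# reference, and every region query re-decodes every slice it overlaps.
Slice = Tuple[int, int]
Region = Tuple[int, int]


def overlaps(a: Tuple[int, int], b: Tuple[int, int]) -> bool:
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def count_decodes(slices: List[Slice], regions: List[Region]) -> int:
    """Total slice decodes when each region is queried independently."""
    return sum(1 for r in regions for s in slices if overlaps(s, r))


# Hypothetical small file: 3 slices spanning positions 0-300,000.
slices = [(0, 100_000), (100_000, 200_000), (200_000, 300_000)]

# 300 adjacent 1 kb regions tiling the same span.
adjacent = [(i, i + 1_000) for i in range(0, 300_000, 1_000)]

print(count_decodes(slices, adjacent))        # → 300 (each slice decoded 100 times)
print(count_decodes(slices, [(0, 300_000)]))  # → 3 (each slice decoded once)
```

Under this model the work grows with the number of queries rather than the amount of data, which is why a file covered by only a few slices suffers most.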
cshenanigans changed the title from "Exponential slow downs reading small CRAM files" to "Significant slow downs reading small CRAM files" on Jan 19, 2018
Thanks. This looks like a complex fix that might not fit easily into the way pysam encapsulates iteration: each fetch call is handled by its own iterator object, so even if fetch recognized that two calls were adjacent, it would not have access to the previous iterator.
As a workaround, if you have adjacent regions, couldn't the higher-level code use a single .fetch() call instead of multiple .fetch() calls?
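A minimal sketch of that workaround, assuming regions are given as 0-based, half-open (contig, start, end) tuples. `merge_adjacent` is a hypothetical helper, not part of pysam: it coalesces regions that overlap, touch, or lie within `gap` bp of each other, so each contiguous span needs only one .fetch() call. The commented usage assumes an indexed CRAM and its reference FASTA; the file names are placeholders.

```python
from typing import Iterable, List, Tuple

# (contig, start, end) with 0-based, half-open coordinates, matching pysam's fetch().
Region = Tuple[str, int, int]


def merge_adjacent(regions: Iterable[Region], gap: int = 0) -> List[Region]:
    """Hypothetical helper: coalesce regions on the same contig that overlap,
    touch, or are separated by at most `gap` bp."""
    merged: List[Region] = []
    for contig, start, end in sorted(regions):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + gap:
            # Extend the current span instead of starting a new fetch.
            prev_contig, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_contig, prev_start, max(prev_end, end))
        else:
            merged.append((contig, start, end))
    return merged


regions = [("chr1", 1_000, 2_000), ("chr1", 0, 1_000), ("chr1", 5_000, 6_000), ("chr2", 0, 100)]
print(merge_adjacent(regions))
# → [('chr1', 0, 2000), ('chr1', 5000, 6000), ('chr2', 0, 100)]

# Usage with pysam (assumes pysam is installed; file names are placeholders):
# import pysam
# with pysam.AlignmentFile("sample.cram", "rc", reference_filename="ref.fa") as cram:
#     for contig, start, end in merge_adjacent(regions):
#         for read in cram.fetch(contig, start, end):
#             ...  # assign each read back to the original region(s) it overlaps
```

One trade-off: a read spanning the boundary between two original regions is returned once by the merged fetch instead of once per region, so code that needs per-region results has to dispatch each read back to every region it overlaps.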
@bioinformed cc @AndreasHeger