Async BAM performance improvements #1249
base: master
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1249 +/- ##
===============================================
+ Coverage 69.203% 69.462% +0.260%
+ Complexity 8701 8177 -524
===============================================
Files 588 549 -39
Lines 34581 32930 -1651
Branches 5779 5567 -212
===============================================
- Hits 23931 22874 -1057
+ Misses 8368 7817 -551
+ Partials 2282 2239 -43
Direct incorporation of samtools/htsjdk#1249 Added async CollectGridssMetrics incorporating the performance improvements
@d-cameron Thank you for this PR. It sounds like a big speedup for bam reading. It's fairly complicated multithreaded code in a mission critical section of the codebase so it might take some time to review. |
Hey @d-cameron. This seems like a really useful feature for htsjdk. A few of us have read through it and were wondering why you didn't use CompletableFutures as the underlying synchronization pattern. Using this pattern would simplify the review process, address concerns about possible locks, and eliminate the need for explicit sleeps in the code. In addition, exceptions will be propagated across threads, helping with error messaging and debugging. Would you be amenable to modifying your code to use CompletableFutures? We understand the uncertainty around the process of getting a PR merged, and are committed to working with you towards that goal. |
Switching to CompletableFutures will remove some of the sleeps, but not all. Note that even with CompletableFutures, the code still needs a review for deadlocks and race conditions, although the complexity would mostly be limited to just the async read task. The issue with the async read task is that the next read task can be scheduled either by the foreground thread or the background thread, and at most one of these tasks can be running at any point. We can't just have a single long-running task, as that can deadlock when the thread pool is limited, and it results in unfair read-ahead scheduling when there are multiple files being read in parallel. Given this scheduling constraint, I've thought of an alternate design in which the read-ahead tasks are all scheduled from the calling thread, so I'll rewrite using that approach with CompletableFutures and see how much cleaner it ends up being. |
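The scheduling pattern described above, with all read-ahead tasks issued as CompletableFutures from the calling thread and results consumed in order, might be sketched roughly as follows. This is an illustrative toy, not htsjdk code: the class name, the fixed read-ahead depth, and `readNextBlock` (a stand-in for a decompression task) are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the foreground thread keeps a bounded queue of
// read-ahead CompletableFutures, so no background thread ever schedules work.
public class ReadAheadSketch {
    private static final int READ_AHEAD_DEPTH = 4; // illustrative fixed depth
    private final ExecutorService pool = Executors.newFixedThreadPool(2);
    private final ArrayDeque<CompletableFuture<Integer>> pending = new ArrayDeque<>();
    private int nextBlock = 0;

    // Stand-in for reading and decompressing one BGZF block.
    private int readNextBlock(int blockNumber) {
        return blockNumber * 10;
    }

    /** Called only from the foreground thread; tops up the queue, then blocks
     *  on the oldest future. Exceptions propagate via ExecutionException. */
    public int read() throws Exception {
        while (pending.size() < READ_AHEAD_DEPTH) {
            final int n = nextBlock++;
            pending.add(CompletableFuture.supplyAsync(() -> readNextBlock(n), pool));
        }
        return pending.poll().get();
    }

    public void close() {
        pool.shutdown();
    }
}
```

Because only the foreground thread enqueues tasks, the at-most-one-scheduler invariant holds by construction, and a bounded queue avoids the deadlock risk of a single long-running task on a limited thread pool.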
much appreciated. |
@yfarjoun @lbergelson refactored so that all background processing tasks are CompletableFutures scheduled from the foreground thread. This version should be much simpler to code review. I've split the refactoring into two separate commits: one with the CompletableFuture refactor, and a second adding the |
@d-cameron Thank you! That's great. We will review this this week. |
Hi @d-cameron It looks like this is a couple of months behind master - can you rebase on current master before we start reviewing ? Thanks. |
@d-cameron I'm sorry I haven't gotten through reviewing this yet. I expected to get an htsjdk release out this week and then review it after that, but I've run into a number of blocking issues that ate my time and I haven't gotten to give this a proper look yet. It is one of my priorities to get to next week though. It will not be abandoned forever this time... |
@d-cameron I'm not finished reviewing this yet, but I thought I'd post my initial thoughts so you can see that we're not totally ignoring you. I had to make an unplanned trip last week that ate into my review time.
Generally this looks much nicer to me than the previous version.
There are a few things so far that I think could be made clearer. The buffer/decoder leasing is confusing and adds a lot of lines of code right now (especially in the BlockCompressedInputStream classes). I think it could be cleaned up by adding an explicit notion of ObjectPools. That might also enable the two codepaths to be unified, with less special-casing. I'll finish taking a look tomorrow.
There are a lot of little cleanup things that could be done (making more things final, adding blank lines between methods, etc.) that I haven't labelled every instance of. I don't mind doing my own pass on that sort of thing after we work out the details of this.
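The review comment above proposes an explicit ObjectPool abstraction to tidy up the buffer/decoder leasing. A minimal generic sketch of what such a pool might look like follows; no such class exists in this PR, and the name and API are purely illustrative.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.function.Supplier;

// Hypothetical minimal object pool: lease with acquire(), return with release().
// A real htsjdk version would pool Inflaters or byte buffers.
public class ObjectPool<T> {
    private final ArrayBlockingQueue<T> free;
    private final Supplier<T> factory;

    public ObjectPool(int capacity, Supplier<T> factory) {
        this.free = new ArrayBlockingQueue<>(capacity);
        this.factory = factory;
    }

    /** Lease an object, creating a fresh one if the pool is empty. */
    public T acquire() {
        T obj = free.poll();
        return obj != null ? obj : factory.get();
    }

    /** Return a leased object; it is silently discarded if the pool is full. */
    public void release(T obj) {
        free.offer(obj);
    }
}
```

Centralizing the lease/return logic like this would let both the sync and async codepaths share one mechanism instead of special-casing each caller.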
src/main/java/htsjdk/samtools/util/BlockCompressedInputStream.java
Back to you @lbergelson |
Is there a README / documentation with all the settings exposed through |
@lbergelson @yfarjoun Any update on this PR? |
@lbergelson please review this when you have time. |
Oh no. The PR that I swore to review quickly and then never did. @d-cameron It looks like there are build failures due to a missing logger and a typo. Inflater/inflator strikes again. I uploaded a fix here https://github.com/samtools/htsjdk/tree/lb_async_bam_performance_improvements |
Fixes in; updated to latest master |
Force-pushed from 57e19f4 to 312f7d1
…tsjdk
* Less custom code
* Faster: Indexing the 129 GB file took 1h instead of 1h30m
* More correct: Before, the files we wrote didn't have the right BGZF headers (FLG was 0 instead of 4) and so were not readable by other tools
* We can take advantage of future improvements such as samtools/htsjdk#1249
…pression and BAM record parsing
Refactored the AsyncReadTaskRunner test case to handle our early-abort redesign. Now need to test against the thread pool itself, as we will now skip any scheduled read-aheads of transforms that we haven't already sent to the thread pool.
…complete after the sleep
Force-pushed from 279e51d to 9b8638b
I have taken the liberty of rebasing this onto current master, to resolve the (trivial) merge conflict and rerun the CI tests. |
Codecov Report
@@ Coverage Diff @@
## master #1249 +/- ##
===============================================
+ Coverage 69.814% 69.842% +0.028%
- Complexity 9627 9668 +41
===============================================
Files 702 704 +2
Lines 37607 37765 +158
Branches 6107 6119 +12
===============================================
+ Hits 26255 26376 +121
- Misses 8903 8916 +13
- Partials 2449 2473 +24
Description
The existing async read code performs read-ahead and decompression on a separate thread. Combined with AsyncBufferedIterator, this gives a ~3x performance increase when doing light processing (e.g. Picard CollectInsertSizeMetrics). This PR increases the level of parallelism to close to linear with the number of cores (tested to ~16x) through the following changes:
The default 128k buffer size results in read-ahead of only 2 gzip blocks, and thus only 2 concurrent decompression tasks. Specifying a larger buffer size (e.g. -Dsamjdk.buffer_size=4194304) enables scaling to additional cores.