Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async BAM performance improvements #1249

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

d-cameron
Copy link
Contributor

@d-cameron d-cameron commented Jan 2, 2019

Description

The existing aysnc read code performs read-ahead and decompression on a seperate thread. Combining with AsyncBufferedIterator and we can get ~3x performance increase when doing light processing (eg. picard CollectInsertSizeMetrics). This PR increases the level of parallelism to close to linear with the number of cores (tested to ~16x) through the following changes:

  • Separation of async BlockCompressedInputStream read-ahead and decompression into separate tasks to enable parallel decompression of multiple gzip blocks
  • Parallel async decoding of BAM records.
    • whilst this incurs the overhead of an additional buffer copy, it enables SAMRecord decoding to be performed in parallel
    • By default, SAMRecords are decoded in 8KiB batches to prevent excessive thread synchronisation overhead when processing small SAM records.

The default 128k buffer size results in read-ahead of only 2 gzip block thus only 2 concurrent decompression tasks. Specifying a larger buffer size (e.g. -Dsamjdk.buffer_size=4194304) enables scaling to additional cores.

Checklist

  • [ X ] Code compiles correctly
  • [ X ] New tests covering changes and new functionality
  • [ X ] All tests passing (tested on windows so some unrelated test cases fail)
  • [ NA ] Extended the README / documentation, if necessary
  • [ ] Is not backward compatible (breaks binary or source compatibility)

@codecov-io
Copy link

codecov-io commented Jan 7, 2019

Codecov Report

Merging #1249 into master will increase coverage by 0.260%.
The diff coverage is 82.216%.

@@               Coverage Diff               @@
##              master     #1249       +/-   ##
===============================================
+ Coverage     69.203%   69.462%   +0.260%     
+ Complexity      8701      8177      -524     
===============================================
  Files            588       549       -39     
  Lines          34581     32930     -1651     
  Branches        5779      5567      -212     
===============================================
- Hits           23931     22874     -1057     
+ Misses          8368      7817      -551     
+ Partials        2282      2239       -43     
Impacted Files Coverage Δ Complexity Δ
src/main/java/htsjdk/samtools/BAMRecordCodec.java 77.419% <20.000%> (-4.032%) 23.000 <0.000> (-3.000)
...java/htsjdk/samtools/util/AsyncReadTaskRunner.java 79.144% <79.144%> (ø) 23.000 <23.000> (?)
src/main/java/htsjdk/samtools/SAMUtils.java 59.500% <80.000%> (-3.302%) 126.000 <1.000> (-5.000)
...sjdk/samtools/util/BlockCompressedInputStream.java 84.962% <83.582%> (-0.597%) 87.000 <17.000> (-3.000)
...samtools/util/AsyncBlockCompressedInputStream.java 78.571% <85.000%> (+6.571%) 8.000 <8.000> (-4.000) ⬆️
src/main/java/htsjdk/samtools/BAMFileReader.java 67.070% <90.476%> (-1.396%) 43.000 <2.000> (-9.000)
...coding/core/experimental/ExperimentalEncoding.java 0.000% <0.000%> (-100.000%) 0.000% <0.000%> (-1.000%)
src/main/java/htsjdk/variant/vcf/VCFConstants.java 0.000% <0.000%> (-87.500%) 0.000% <0.000%> (-1.000%)
src/main/java/htsjdk/tribble/FeatureCodec.java 50.000% <0.000%> (-50.000%) 1.000% <0.000%> (-1.000%)
...ain/java/htsjdk/samtools/cram/structure/Slice.java 44.545% <0.000%> (-36.946%) 20.000% <0.000%> (-75.000%)
... and 351 more

d-cameron added a commit to PapenfussLab/gridss that referenced this pull request Jan 7, 2019
Direct incorporation of samtools/htsjdk#1249
Added async CollectGridssMetrics incorporating the performance improvements
@lbergelson
Copy link
Member

@d-cameron Thank you for this PR. It sounds like a big speedup for bam reading.

It's fairly complicated multithreaded code in a mission critical section of the codebase so it might take some time to review.

@yfarjoun
Copy link
Contributor

Hey @d-cameron.

This seems like a really useful feature for htsjdk. A few of us have read through and were wondering why you didn't use CompletableFutures as the underlying synchronization pattern.

Using this pattern would simplify the review process, further concerns about possible locks and eliminate the need for explicit sleeps in the code. In addition, Expections will be propagated across threads helping in error messaging and debugging.

Would you be amenable to modifying your code to use CompletableFutures? We understand the uncertainty around the process of getting a PR merged, and are committed to working with you towards that goal.

@d-cameron
Copy link
Contributor Author

Switching to CompleteableFutures will remove some of the sleeps, but not all. Note that even with CompleteableFutures, the code still needs a review for deadlocks and race conditions, although the complexity would mostly be limited to just the async read task.

The issue with the async read task is that the next read task can be scheduled either by the foreground thread, or the background thread and at most one of these tasks can be running at any point. We can't just have a single long-running task as that can deadlock when the thread pool is limited and results in unfair read-ahead scheduling when there are multiple files being read in parallel. Since the task can be scheduled

I've thought of an alternate design in which the read-ahead task are all scheduled from the calling thread so I'll rewrite using that approach with CompleteableFutures and see how much cleaner it ends up being.

@yfarjoun
Copy link
Contributor

much appreciated.

@yfarjoun yfarjoun added the Waiting for revisions This PR has received comments from reviewers and is waiting for the Author to respond label Feb 25, 2019
@d-cameron
Copy link
Contributor Author

@yfarjoun @lbergelson refactored so that all background processing tasks are CompletableFutures scheduled from the foreground thread. This version should be much simpler to code review. I've split the refactoring to two separate commits - one with the CompletableFuture refactor, and a second adding the RecordOrException complexity required for exceptions to be raised at the same time as they are raised when reading synchronously

@lbergelson
Copy link
Member

@d-cameron Thank you! That's great. We will review this this week.

@cmnbroad
Copy link
Collaborator

cmnbroad commented Mar 4, 2019

Hi @d-cameron It looks like this is a couple of months behind master - can you rebase on current master before we start reviewing ? Thanks.

@lbergelson
Copy link
Member

@d-cameron I'm sorry I haven't gotten through reviewing this yet. I expected to get an htsjdk release out this week and then review it after that, but I've run into a number of blocking issues that ate my time and I haven't gotten to give this a proper look yet. It is one of my priorities to get to next week though. It will not be abandoned forever this time...

Copy link
Member

@lbergelson lbergelson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@d-cameron I'm not finished reviewing this yet, but I thought I'd post my initial thoughts so you can see that we're not totally ignoring you. I had to make an unplanned trip last week that ate into my review time.

Generally this looks much nicer to me than the previous version.

There are few things so far that I think could be made clearer. The buffer/ decoder leasing is confusing and adds a lot of lines of code right now. (especially in the BlockCompressedInputStream classes.) I think it could be cleaned up by adding an explicit notion of ObjectPools. That might also enable the two codepaths to be unified more with less special casing of things. I'll finish taking a look tomorrow.

There's a lot of little cleanup things that could be done, making things more things final, adding new lines between methods, etc, that I haven't labelled every instance of. I don't mind doing my own pass on that sort of thing after we work out the details of this.

src/main/java/htsjdk/samtools/BAMFileReader.java Outdated Show resolved Hide resolved
src/main/java/htsjdk/samtools/BAMFileReader.java Outdated Show resolved Hide resolved
@d-cameron
Copy link
Contributor Author

Back to you @lbergelson

@d-cameron
Copy link
Contributor Author

Extended the README / documentation, if necessary

Is there a README / documentation with all the settings exposed through Defaults.java? This PR now exposes the threadpool size through a new one.

@d-cameron
Copy link
Contributor Author

@lbergelson @yfarjoun Any update on this PR?

@yfarjoun
Copy link
Contributor

@lbergelson please review this when you have time.

@lbergelson
Copy link
Member

Oh no. The PR that I swore to review quickly and then never did. @d-cameron It looks like there are build failures due to a missing logger and a typo. Inflater/inflator strikes again.

I uploaded a fix here https://github.com/samtools/htsjdk/tree/lb_async_bam_performance_improvements

@d-cameron
Copy link
Contributor Author

Fixes in; updated to latest master

@d-cameron d-cameron force-pushed the async_bam_performance_improvements branch from 57e19f4 to 312f7d1 Compare April 25, 2020 00:08
robinst added a commit to VCCRI/atlantool that referenced this pull request Sep 25, 2020
…tsjdk

* Less custom code
* Faster: Indexing the 129 GB file took 1h instead of 1h30m
* More correct: Before, the files we wrote didn't have the right BGZF
  headers (FLG was 0 instead of 4) and so were not readable by other
  tools
* We can take advantage of future improvements such as
  samtools/htsjdk#1249
@brainstorm brainstorm mentioned this pull request Jan 13, 2021
Daniel Cameron added 2 commits August 18, 2021 09:44
Refactored AsyncReadTaskRunner test case to handle our early abort redesign.
Now need to test against the threadpool itself as we will now skip any scheduled read-aheads of transforms that we haven't already sent to the thread pool.
@jmarshall jmarshall force-pushed the async_bam_performance_improvements branch from 279e51d to 9b8638b Compare August 18, 2021 09:45
@jmarshall
Copy link
Member

I have taken the liberty of rebasing this onto current master, to resolve the (trivial) merge conflict and rerun the CI tests.

@codecov-commenter
Copy link

Codecov Report

Merging #1249 (925cd2c) into master (6a60de7) will increase coverage by 0.028%.
The diff coverage is 85.246%.

@@               Coverage Diff               @@
##              master     #1249       +/-   ##
===============================================
+ Coverage     69.814%   69.842%   +0.028%     
- Complexity      9627      9668       +41     
===============================================
  Files            702       704        +2     
  Lines          37607     37765      +158     
  Branches        6107      6119       +12     
===============================================
+ Hits           26255     26376      +121     
- Misses          8903      8916       +13     
- Partials        2449      2473       +24     
Impacted Files Coverage Δ
...ava/htsjdk/samtools/apps/TimeRandomAccessFile.java 0.000% <0.000%> (ø)
src/main/java/htsjdk/samtools/SAMUtils.java 62.651% <80.000%> (-0.151%) ⬇️
...java/htsjdk/samtools/util/AsyncReadTaskRunner.java 81.944% <81.944%> (ø)
...samtools/util/AsyncBlockCompressedInputStream.java 79.661% <83.333%> (+0.994%) ⬆️
...sjdk/samtools/util/BlockCompressedInputStream.java 84.314% <84.286%> (-1.246%) ⬇️
src/main/java/htsjdk/samtools/BAMFileReader.java 70.992% <89.744%> (+1.491%) ⬆️
src/main/java/htsjdk/samtools/BAMRecordCodec.java 84.496% <100.000%> (+1.432%) ⬆️
src/main/java/htsjdk/samtools/Defaults.java 76.271% <100.000%> (+0.409%) ⬆️
src/main/java/htsjdk/samtools/SamStreams.java 59.649% <100.000%> (-4.966%) ⬇️
src/main/java/htsjdk/samtools/util/IOUtil.java 60.284% <100.000%> (+0.668%) ⬆️
... and 26 more

@jmarshall
Copy link
Member

jmarshall commented Aug 18, 2021

I have been unable to reproduce the testDisableAsyncProcessingLetsExistingTasksComplete failure locally with OpenJDK 8 (java-1.8.0-openjdk on CentOS 8). Looking at the difference between builds 6577 and 6578 it appears that the issue is nondeterministic… (the worst kind!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Large 🐋 Waiting for revisions This PR has received comments from reviewers and is waiting for the Author to respond
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants