Async BAM performance improvements #1249

d-cameron · 2019-01-02T12:47:38Z

Description

The existing aysnc read code performs read-ahead and decompression on a seperate thread. Combining with AsyncBufferedIterator and we can get ~3x performance increase when doing light processing (eg. picard CollectInsertSizeMetrics). This PR increases the level of parallelism to close to linear with the number of cores (tested to ~16x) through the following changes:

Separation of async BlockCompressedInputStream read-ahead and decompression into separate tasks to enable parallel decompression of multiple gzip blocks
Parallel async decoding of BAM records.
- whilst this incurs the overhead of an additional buffer copy, it enables SAMRecord decoding to be performed in parallel
- By default, SAMRecords are decoded in 8KiB batches to prevent excessive thread synchronisation overhead when processing small SAM records.

The default 128k buffer size results in read-ahead of only 2 gzip block thus only 2 concurrent decompression tasks. Specifying a larger buffer size (e.g. -Dsamjdk.buffer_size=4194304) enables scaling to additional cores.

Checklist

[ X ] Code compiles correctly
[ X ] New tests covering changes and new functionality
[ X ] All tests passing (tested on windows so some unrelated test cases fail)
[ NA ] Extended the README / documentation, if necessary
[ ] Is not backward compatible (breaks binary or source compatibility)

codecov-io · 2019-01-07T00:48:49Z

Codecov Report

Merging #1249 into master will increase coverage by 0.260%.
The diff coverage is 82.216%.

@@               Coverage Diff               @@
##              master     #1249       +/-   ##
===============================================
+ Coverage     69.203%   69.462%   +0.260%     
+ Complexity      8701      8177      -524     
===============================================
  Files            588       549       -39     
  Lines          34581     32930     -1651     
  Branches        5779      5567      -212     
===============================================
- Hits           23931     22874     -1057     
+ Misses          8368      7817      -551     
+ Partials        2282      2239       -43

Impacted Files	Coverage Δ	Complexity Δ
src/main/java/htsjdk/samtools/BAMRecordCodec.java	`77.419% <20.000%> (-4.032%)`	`23.000 <0.000> (-3.000)`
...java/htsjdk/samtools/util/AsyncReadTaskRunner.java	`79.144% <79.144%> (ø)`	`23.000 <23.000> (?)`
src/main/java/htsjdk/samtools/SAMUtils.java	`59.500% <80.000%> (-3.302%)`	`126.000 <1.000> (-5.000)`
...sjdk/samtools/util/BlockCompressedInputStream.java	`84.962% <83.582%> (-0.597%)`	`87.000 <17.000> (-3.000)`
...samtools/util/AsyncBlockCompressedInputStream.java	`78.571% <85.000%> (+6.571%)`	`8.000 <8.000> (-4.000)`	⬆️
src/main/java/htsjdk/samtools/BAMFileReader.java	`67.070% <90.476%> (-1.396%)`	`43.000 <2.000> (-9.000)`
...coding/core/experimental/ExperimentalEncoding.java	`0.000% <0.000%> (-100.000%)`	`0.000% <0.000%> (-1.000%)`
src/main/java/htsjdk/variant/vcf/VCFConstants.java	`0.000% <0.000%> (-87.500%)`	`0.000% <0.000%> (-1.000%)`
src/main/java/htsjdk/tribble/FeatureCodec.java	`50.000% <0.000%> (-50.000%)`	`1.000% <0.000%> (-1.000%)`
...ain/java/htsjdk/samtools/cram/structure/Slice.java	`44.545% <0.000%> (-36.946%)`	`20.000% <0.000%> (-75.000%)`
... and 351 more

Direct incorporation of samtools/htsjdk#1249 Added async CollectGridssMetrics incorporating the performance improvements

lbergelson · 2019-01-08T21:56:38Z

@d-cameron Thank you for this PR. It sounds like a big speedup for bam reading.

It's fairly complicated multithreaded code in a mission critical section of the codebase so it might take some time to review.

yfarjoun · 2019-02-11T19:50:51Z

Hey @d-cameron.

This seems like a really useful feature for htsjdk. A few of us have read through and were wondering why you didn't use CompletableFutures as the underlying synchronization pattern.

Using this pattern would simplify the review process, further concerns about possible locks and eliminate the need for explicit sleeps in the code. In addition, Expections will be propagated across threads helping in error messaging and debugging.

Would you be amenable to modifying your code to use CompletableFutures? We understand the uncertainty around the process of getting a PR merged, and are committed to working with you towards that goal.

d-cameron · 2019-02-12T02:11:44Z

Switching to CompleteableFutures will remove some of the sleeps, but not all. Note that even with CompleteableFutures, the code still needs a review for deadlocks and race conditions, although the complexity would mostly be limited to just the async read task.

The issue with the async read task is that the next read task can be scheduled either by the foreground thread, or the background thread and at most one of these tasks can be running at any point. We can't just have a single long-running task as that can deadlock when the thread pool is limited and results in unfair read-ahead scheduling when there are multiple files being read in parallel. Since the task can be scheduled

I've thought of an alternate design in which the read-ahead task are all scheduled from the calling thread so I'll rewrite using that approach with CompleteableFutures and see how much cleaner it ends up being.

yfarjoun · 2019-02-12T14:18:51Z

much appreciated.

d-cameron · 2019-03-04T11:38:12Z

@yfarjoun @lbergelson refactored so that all background processing tasks are CompletableFutures scheduled from the foreground thread. This version should be much simpler to code review. I've split the refactoring to two separate commits - one with the CompletableFuture refactor, and a second adding the RecordOrException complexity required for exceptions to be raised at the same time as they are raised when reading synchronously

lbergelson · 2019-03-04T16:44:17Z

@d-cameron Thank you! That's great. We will review this this week.

cmnbroad · 2019-03-04T19:08:17Z

Hi @d-cameron It looks like this is a couple of months behind master - can you rebase on current master before we start reviewing ? Thanks.

lbergelson · 2019-03-08T22:34:53Z

@d-cameron I'm sorry I haven't gotten through reviewing this yet. I expected to get an htsjdk release out this week and then review it after that, but I've run into a number of blocking issues that ate my time and I haven't gotten to give this a proper look yet. It is one of my priorities to get to next week though. It will not be abandoned forever this time...

lbergelson

@d-cameron I'm not finished reviewing this yet, but I thought I'd post my initial thoughts so you can see that we're not totally ignoring you. I had to make an unplanned trip last week that ate into my review time.

Generally this looks much nicer to me than the previous version.

There are few things so far that I think could be made clearer. The buffer/ decoder leasing is confusing and adds a lot of lines of code right now. (especially in the BlockCompressedInputStream classes.) I think it could be cleaned up by adding an explicit notion of ObjectPools. That might also enable the two codepaths to be unified more with less special casing of things. I'll finish taking a look tomorrow.

There's a lot of little cleanup things that could be done, making things more things final, adding new lines between methods, etc, that I haven't labelled every instance of. I don't mind doing my own pass on that sort of thing after we work out the details of this.

src/main/java/htsjdk/samtools/BAMFileReader.java

src/main/java/htsjdk/samtools/util/AsyncReadTaskRunner.java

src/main/java/htsjdk/samtools/util/BlockCompressedInputStream.java

src/main/java/htsjdk/samtools/BAMFileReader.java

d-cameron · 2019-03-31T14:12:38Z

Back to you @lbergelson

d-cameron · 2019-03-31T14:16:06Z

Extended the README / documentation, if necessary

Is there a README / documentation with all the settings exposed through Defaults.java? This PR now exposes the threadpool size through a new one.

d-cameron · 2019-04-26T04:15:59Z

@lbergelson @yfarjoun Any update on this PR?

yfarjoun · 2019-06-17T18:57:27Z

@lbergelson please review this when you have time.

lbergelson · 2020-02-10T22:15:53Z

Oh no. The PR that I swore to review quickly and then never did. @d-cameron It looks like there are build failures due to a missing logger and a typo. Inflater/inflator strikes again.

I uploaded a fix here https://github.com/samtools/htsjdk/tree/lb_async_bam_performance_improvements

d-cameron · 2020-03-07T06:57:41Z

Fixes in; updated to latest master

…tsjdk * Less custom code * Faster: Indexing the 129 GB file took 1h instead of 1h30m * More correct: Before, the files we wrote didn't have the right BGZF headers (FLG was 0 instead of 4) and so were not readable by other tools * We can take advantage of future improvements such as samtools/htsjdk#1249

…pression and BAM record parsing

Refactored AsyncReadTaskRunner test case to handle our early abort redesign. Now need to test against the threadpool itself as we will now skip any scheduled read-aheads of transforms that we haven't already sent to the thread pool.

…complete after the sleep

… is empty

jmarshall · 2021-08-18T09:47:05Z

I have taken the liberty of rebasing this onto current master, to resolve the (trivial) merge conflict and rerun the CI tests.

codecov-commenter · 2021-08-18T12:13:45Z

Codecov Report

Merging #1249 (925cd2c) into master (6a60de7) will increase coverage by 0.028%.
The diff coverage is 85.246%.

@@               Coverage Diff               @@
##              master     #1249       +/-   ##
===============================================
+ Coverage     69.814%   69.842%   +0.028%     
- Complexity      9627      9668       +41     
===============================================
  Files            702       704        +2     
  Lines          37607     37765      +158     
  Branches        6107      6119       +12     
===============================================
+ Hits           26255     26376      +121     
- Misses          8903      8916       +13     
- Partials        2449      2473       +24

Impacted Files	Coverage Δ
...ava/htsjdk/samtools/apps/TimeRandomAccessFile.java	`0.000% <0.000%> (ø)`
src/main/java/htsjdk/samtools/SAMUtils.java	`62.651% <80.000%> (-0.151%)`	⬇️
...java/htsjdk/samtools/util/AsyncReadTaskRunner.java	`81.944% <81.944%> (ø)`
...samtools/util/AsyncBlockCompressedInputStream.java	`79.661% <83.333%> (+0.994%)`	⬆️
...sjdk/samtools/util/BlockCompressedInputStream.java	`84.314% <84.286%> (-1.246%)`	⬇️
src/main/java/htsjdk/samtools/BAMFileReader.java	`70.992% <89.744%> (+1.491%)`	⬆️
src/main/java/htsjdk/samtools/BAMRecordCodec.java	`84.496% <100.000%> (+1.432%)`	⬆️
src/main/java/htsjdk/samtools/Defaults.java	`76.271% <100.000%> (+0.409%)`	⬆️
src/main/java/htsjdk/samtools/SamStreams.java	`59.649% <100.000%> (-4.966%)`	⬇️
src/main/java/htsjdk/samtools/util/IOUtil.java	`60.284% <100.000%> (+0.668%)`	⬆️
... and 26 more

jmarshall · 2021-08-18T12:30:48Z

I have been unable to reproduce the testDisableAsyncProcessingLetsExistingTasksComplete failure locally with OpenJDK 8 (java-1.8.0-openjdk on CentOS 8). Looking at the difference between builds 6577 and 6578 it appears that the issue is nondeterministic… (the worst kind!)

d-cameron added a commit to PapenfussLab/gridss that referenced this pull request Jan 7, 2019

Updated htsjdk/picard to latest version

1243674

Direct incorporation of samtools/htsjdk#1249 Added async CollectGridssMetrics incorporating the performance improvements

d-cameron mentioned this pull request Feb 11, 2019

CollectMultipleMetrics performance improvements broadinstitute/picard#1277

Open

yfarjoun added the Waiting for revisions This PR has received comments from reviewers and is waiting for the Author to respond label Feb 25, 2019

lbergelson requested changes Mar 19, 2019

View reviewed changes

yfarjoun assigned lbergelson Jun 17, 2019

yfarjoun added the Large 🐋 label Sep 30, 2019

d-cameron force-pushed the async_bam_performance_improvements branch from 57e19f4 to 312f7d1 Compare April 25, 2020 00:08

d-cameron mentioned this pull request May 12, 2020

gridss failure with exit code 6 PapenfussLab/gridss#323

Closed

lbergelson mentioned this pull request May 28, 2020

BAM async compression could use multiple threads #559

Open

robinst mentioned this pull request Sep 25, 2020

Use BlockCompressedInputStream and BlockCompressedOutputStream from htsjdk VCCRI/atlantool#17

Merged

brainstorm mentioned this pull request Jan 13, 2021

New release? #1515

Closed

d-cameron mentioned this pull request Jul 15, 2021

parallelization of bgzf compression zaeleus/noodles#17

Closed

Daniel Cameron added 2 commits August 18, 2021 09:44

Using separate async tasks for IO and computation for both gzip decom…

543574f

…pression and BAM record parsing

Corrected incorrectly failing test case

a8826fe

Refactored AsyncReadTaskRunner test case to handle our early abort redesign. Now need to test against the threadpool itself as we will now skip any scheduled read-aheads of transforms that we haven't already sent to the thread pool.

Daniel Cameron added 6 commits August 18, 2021 09:44

Added RecyclingSupplier test coverage

fddabcc

Ensuring all test classes extend HtsJdkTest

12b3839

Optimised imports; fixed spelling of HtsjdkTest

24701db

Removed AsyncReadTaskRunnerTest logging spam

2ac60c4

Updated test case to handle possibility that not all read tasks were …

8b40b86

…complete after the sleep

Reduced lock contention by only attempting job scheduling when buffer…

9b8638b

… is empty

jmarshall force-pushed the async_bam_performance_improvements branch from 279e51d to 9b8638b Compare August 18, 2021 09:45

Put assertEquals() actual/expected the right way around [minor]

925cd2c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Async BAM performance improvements #1249

Async BAM performance improvements #1249

d-cameron commented Jan 2, 2019 •

edited

Loading

codecov-io commented Jan 7, 2019 •

edited

Loading

lbergelson commented Jan 8, 2019

yfarjoun commented Feb 11, 2019

d-cameron commented Feb 12, 2019

yfarjoun commented Feb 12, 2019

d-cameron commented Mar 4, 2019

lbergelson commented Mar 4, 2019

cmnbroad commented Mar 4, 2019

lbergelson commented Mar 8, 2019

lbergelson left a comment

d-cameron commented Mar 31, 2019

d-cameron commented Mar 31, 2019

d-cameron commented Apr 26, 2019

yfarjoun commented Jun 17, 2019

lbergelson commented Feb 10, 2020

d-cameron commented Mar 7, 2020

jmarshall commented Aug 18, 2021

codecov-commenter commented Aug 18, 2021

jmarshall commented Aug 18, 2021 •

edited

Loading

Async BAM performance improvements #1249

Are you sure you want to change the base?

Async BAM performance improvements #1249

Conversation

d-cameron commented Jan 2, 2019 • edited Loading

Description

Checklist

codecov-io commented Jan 7, 2019 • edited Loading

Codecov Report

lbergelson commented Jan 8, 2019

yfarjoun commented Feb 11, 2019

d-cameron commented Feb 12, 2019

yfarjoun commented Feb 12, 2019

d-cameron commented Mar 4, 2019

lbergelson commented Mar 4, 2019

cmnbroad commented Mar 4, 2019

lbergelson commented Mar 8, 2019

lbergelson left a comment

Choose a reason for hiding this comment

d-cameron commented Mar 31, 2019

d-cameron commented Mar 31, 2019

d-cameron commented Apr 26, 2019

yfarjoun commented Jun 17, 2019

lbergelson commented Feb 10, 2020

d-cameron commented Mar 7, 2020

jmarshall commented Aug 18, 2021

codecov-commenter commented Aug 18, 2021

Codecov Report

jmarshall commented Aug 18, 2021 • edited Loading

d-cameron commented Jan 2, 2019 •

edited

Loading

codecov-io commented Jan 7, 2019 •

edited

Loading

jmarshall commented Aug 18, 2021 •

edited

Loading