Async BAM decompression #576
Conversation
d-cameron
referenced
this pull request
Apr 18, 2016
Closed
Asynchronous read performance enhancements #78 #482
|
An additional feature in this PR not in #482 is async buffer reuse. This also nicely makes mOperations less awkward, as the values actually mean something now. |
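The buffer-reuse idea can be sketched as a bounded free list (hypothetical names below, not the PR's actual code): decompression hands back finished byte[] buffers, and new blocks reuse them instead of allocating fresh ones.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Minimal sketch of async buffer reuse: a bounded free list of byte[] buffers. */
class BufferPool {
    private final BlockingQueue<byte[]> freeBuffers;
    private final int bufferSize;

    BufferPool(int poolSize, int bufferSize) {
        this.freeBuffers = new ArrayBlockingQueue<>(poolSize);
        this.bufferSize = bufferSize;
    }

    /** Reuse a previously released buffer when one is available, otherwise allocate. */
    byte[] acquire() {
        byte[] buf = freeBuffers.poll();
        return buf != null ? buf : new byte[bufferSize];
    }

    /** Return a buffer to the pool; silently dropped when the pool is full. */
    void release(byte[] buf) {
        if (buf != null && buf.length == bufferSize) {
            freeBuffers.offer(buf);
        }
    }
}
```

The bounded queue means the pool never retains more than `poolSize` idle buffers, and `offer`/`poll` never block, so releasing is always cheap.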
akiezun
and 1 other
commented on an outdated diff
Apr 18, 2016
| + try { | ||
| + while (true) { | ||
| + byte[] buffer = mOperations.take(); | ||
| + // snapshot where to output to. If seek() is called | ||
| + // on the input stream, then under no circumstances | ||
| + // do we want to write our result to the post-seek | ||
| + // results | ||
| + BlockingQueue<DecompressedBlock> result = mResult; | ||
| + if (result != null) { | ||
| + DecompressedBlock decompressed = nextBlock(buffer); | ||
| + result.put(decompressed); | ||
| + } | ||
| + } | ||
| + } catch (InterruptedException e) { | ||
| + // Reset interrupt status | ||
| + Thread.interrupted(); |
akiezun
Contributor
|
|
@d-cameron I've tried to plug in this PR's code to gatk4 and run tests (which use asyncIO). However everything deadlocks when I run the tests. See the attached jstack dump. I also see this in the console
Can you help me figure out what's causing this? It may be that my test suite is behaving badly, or there may be a bug in the new code. |
akiezun
self-assigned this
Apr 19, 2016
|
Another stacktrace in the logs:
|
|
I cleaned up my reader closing a bit and now the jstack dump is like this (again, tests are stalled): I think it essentially boils down to the test runner thread being stuck at
|
|
If the worker thread just dies due to an exception, the main thread may never find out if it's blocking on the |
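One common way to avoid that kind of silent-death deadlock is to have the worker enqueue a sentinel block carrying the failure, so a consumer blocked on take() wakes up. A minimal sketch with hypothetical names (the PR's actual classes differ):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** A result block: either decompressed bytes or a captured failure. */
class Block {
    final byte[] data;       // decompressed bytes, or null on failure
    final Throwable error;   // non-null marks a failed/poison block
    Block(byte[] data, Throwable error) { this.data = data; this.error = error; }
}

/** Worker that never dies silently: failures are propagated via the queue. */
class Worker implements Runnable {
    private final BlockingQueue<Block> results;
    Worker(BlockingQueue<Block> results) { this.results = results; }

    public void run() {
        try {
            // stand-in for the real decompression loop
            throw new RuntimeException("decompression failed");
        } catch (Throwable t) {
            // enqueue a poison block so a consumer blocked on take() wakes up
            while (true) {
                try { results.put(new Block(null, t)); break; }
                catch (InterruptedException ignored) { /* retry the put */ }
            }
        }
    }
}
```

The consumer then checks `block.error` after every `take()` and rethrows, instead of blocking forever on a queue no one will ever fill.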
akiezun
referenced
this pull request
in broadinstitute/gatk
Apr 19, 2016
Merged
fix resource leaks in MarkDuplicates #1729
|
Is your test harness currently leaking thousands of threads? I'll refactor the code to use a thread pool tonight and add an extra constructor so I can incorporate tests into htsjdk. This should make testing easier, the class instantiation more lightweight, and remove the thread leak penalty of code that doesn't call close(). |
|
@d-cameron It did leak threads - I fixed the thread leakage now, at least in MarkDuplicates. |
|
@d-cameron do let me know when the PR is ready for re-review and re-test |
|
Ready to go from my perspective. The test case picked up a few bugs in my implementation during development, so I'm happy that it is actually testing the async code in a meaningful way. This async code is shorter and has a simpler API, so reasoning about correctness should be an easier task. |
|
@d-cameron I plugged in this new code to gatk4 and ran the test suite. I see this in the log (it does not happen on master htsjdk)
|
|
another exception on a rerun (not sure how easy it would be to diagnose from this one because there's no htsjdk on the stack though). These must be due to the async reading because I changed nothing else, and those tests run reliably on Travis on https://github.com/broadinstitute/gatk (and we do use async IO in the tests all the time).
|
|
I had a look at org.seqdoop.hadoop_bam.BAMRecordReader and it does seem very much like the Hadoop-BAM code is making assumptions about the synchronization between the position exposed by the BlockCompressedInputStream and the position of the underlying stream. Digging around a bit I found the following comment in org.seqdoop.hadoop_bam.util.BGZFSplitCompressionInputStream::
Looking at the API https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/io/compress/SplitCompressionInputStream.html I see references to the underlying stream being passed to a base class that has a seek() method as well as to BlockCompressedInputStream. It looks like Hadoop-BAM is not compatible with any sort of read-ahead on the stream. |
|
@tomwhite can you comment on whether Hadoop-BAM can be made to work with async reading? If not, we may have to just be able to turn off async IO for Hadoop-BAM. |
|
I suspect you'll find it is the read-ahead of the underlying stream that's the issue, not the async itself. Edit: replacing threadpool.execute(this); with run(); would be an even simpler approach as a temporary testing measure. |
|
Changing the BlockCompressedInputStream constructors to pass async=false in the Hadoop-BAM codebase would be straight-forward - there are only a handful of references to the class. |
|
@akiezun @d-cameron I expected readahead to be a problem with Hadoop-BAM, and to have it disabled for the reasons you say. Even if async is disabled by default (which I believe it is), Hadoop-BAM should explicitly use the non-async constructor. |
|
@d-cameron @tomwhite how should we proceed? We can't update Hadoop-BAM until the new constructor is in, and I can't properly test gatk until Hadoop-BAM is updated. @d-cameron can you move the code that checks |
|
If I'm to do that then it would make more sense to refactor into a separate AsyncBlockCompressedInputStream class. Any objections? |
|
no objections from me. that may indeed be better |
|
Back to you @akiezun |
|
thanks @d-cameron , i'm on it |
akiezun
commented on an outdated diff
Apr 26, 2016
| @@ -126,7 +128,8 @@ public boolean hasIndex(){ | ||
| } | ||
| public CloseableTribbleIterator<T> iterator() throws IOException { | ||
| - final InputStream is = new BlockCompressedInputStream(ParsingUtils.openInputStream(path)); | ||
| + final InputStream compressedStream = ParsingUtils.openInputStream(path); | ||
| + final InputStream is = Defaults.USE_ASYNC_IO_FOR_TRIBBLE ? new AsyncBlockCompressedInputStream(compressedStream) : new BlockCompressedInputStream(compressedStream); |
akiezun
Contributor
|
akiezun
commented on an outdated diff
Apr 26, 2016
| @@ -183,7 +184,7 @@ private static InputStream indexFileInputStream(final String indexFile) throws I | ||
| return new GZIPInputStream(inputStreamInitial); | ||
| } | ||
| else if (indexFile.endsWith(TabixUtils.STANDARD_INDEX_EXTENSION)) { | ||
| - return new BlockCompressedInputStream(inputStreamInitial); | ||
| + return Defaults.USE_ASYNC_IO_FOR_TRIBBLE ? new AsyncBlockCompressedInputStream(inputStreamInitial) : new BlockCompressedInputStream(inputStreamInitial); |
|
|
akiezun
commented on an outdated diff
Apr 26, 2016
| @@ -261,7 +262,7 @@ public SamReader open(final SamInputResource resource) { | ||
| primitiveSamReader = new BAMFileReader(sourceFile, indexFile, false, validationStringency, this.samRecordFactory); | ||
| } | ||
| } else if (BlockCompressedInputStream.isValidFile(bufferedStream)) { | ||
| - primitiveSamReader = new SAMTextReader(new BlockCompressedInputStream(bufferedStream), validationStringency, this.samRecordFactory); | ||
| + primitiveSamReader = new SAMTextReader(Defaults.USE_ASYNC_IO_FOR_SAMTOOLS ? new AsyncBlockCompressedInputStream(bufferedStream) : new BlockCompressedInputStream(bufferedStream), validationStringency, this.samRecordFactory); |
akiezun
Contributor
|
akiezun
commented on an outdated diff
Apr 26, 2016
| +import java.util.concurrent.ArrayBlockingQueue; | ||
| +import java.util.concurrent.BlockingQueue; | ||
| +import java.util.concurrent.Executor; | ||
| +import java.util.concurrent.Executors; | ||
| +import java.util.concurrent.Semaphore; | ||
| + | ||
| +/** | ||
| + * Asynchronous read-ahead implementation of {@link htsjdk.samtools.util.BlockCompressedInputStream}. | ||
| + * | ||
| + * Note that this implementation is not synchronized. If multiple threads access an instance concurrently, it must be synchronized externally. | ||
| + */ | ||
| +public class AsyncBlockCompressedInputStream extends BlockCompressedInputStream { | ||
| + private static final int READ_AHEAD_BUFFERS = (int)Math.ceil(Defaults.NON_ZERO_BUFFER_SIZE / BlockCompressedStreamConstants.MAX_COMPRESSED_BLOCK_SIZE); | ||
| + private static final Executor threadpool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()); | ||
| + private final BlockingQueue<DecompressedBlock> mResult = new ArrayBlockingQueue<>(READ_AHEAD_BUFFERS); | ||
| + private final BlockingQueue<byte[]> freeBuffers = new ArrayBlockingQueue<>(READ_AHEAD_BUFFERS); |
|
|
akiezun
commented on an outdated diff
Apr 26, 2016
| + | ||
| + public AsyncBlockCompressedInputStream(final SeekableStream strm) { | ||
| + super(strm); | ||
| + } | ||
| + | ||
| + protected DecompressedBlock nextBlock(byte[] bufferAvailableForReuse) { | ||
| + if (bufferAvailableForReuse != null) { | ||
| + freeBuffers.offer(bufferAvailableForReuse); | ||
| + } | ||
| + return nextBlockSync(); | ||
| + } | ||
| + | ||
| + @Override | ||
| + protected void prepareForSeek() { | ||
| + flushReadAhead(); | ||
| + super.prepareForSeek(); |
akiezun
Contributor
|
akiezun
and 1 other
commented on an outdated diff
Apr 26, 2016
| +import java.io.InputStream; | ||
| +import java.net.URL; | ||
| +import java.util.concurrent.ArrayBlockingQueue; | ||
| +import java.util.concurrent.BlockingQueue; | ||
| +import java.util.concurrent.Executor; | ||
| +import java.util.concurrent.Executors; | ||
| +import java.util.concurrent.Semaphore; | ||
| + | ||
| +/** | ||
| + * Asynchronous read-ahead implementation of {@link htsjdk.samtools.util.BlockCompressedInputStream}. | ||
| + * | ||
| + * Note that this implementation is not synchronized. If multiple threads access an instance concurrently, it must be synchronized externally. | ||
| + */ | ||
| +public class AsyncBlockCompressedInputStream extends BlockCompressedInputStream { | ||
| + private static final int READ_AHEAD_BUFFERS = (int)Math.ceil(Defaults.NON_ZERO_BUFFER_SIZE / BlockCompressedStreamConstants.MAX_COMPRESSED_BLOCK_SIZE); | ||
| + private static final Executor threadpool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors()); |
akiezun
Contributor
|
akiezun
commented on an outdated diff
Apr 26, 2016
| - byte[] buffer = mCurrentBlock; | ||
| - mCurrentBlock = null; | ||
| - if (buffer == null || buffer.length != uncompressedLength) { | ||
| - try { | ||
| - buffer = new byte[uncompressedLength]; | ||
| - } catch (final NegativeArraySizeException e) { | ||
| - throw new RuntimeIOException("BGZF file has invalid uncompressedLength: " + uncompressedLength, e); | ||
| - } | ||
| + private byte[] inflateBlock(final byte[] compressedBlock, final int compressedLength, | ||
| + final byte[] bufferAvailableForReuse) throws IOException { | ||
| + final int uncompressedLength = unpackInt32(compressedBlock, compressedLength - 4); | ||
| + if (uncompressedLength < 0) { | ||
| + throw new RuntimeIOException("BGZF file has invalid uncompressedLength: " + uncompressedLength); | ||
| + } | ||
| + byte[] buffer = bufferAvailableForReuse; | ||
| + if (buffer == null || uncompressedLength != buffer.length) { |
akiezun
Contributor
|
akiezun
and 1 other
commented on an outdated diff
Apr 26, 2016
| } | ||
| - private void readBlock() | ||
| - throws IOException { | ||
| - | ||
| + private void readBlock() throws IOException { | ||
| + mCurrentBlock = nextBlock(getBufferForReuse(mCurrentBlock)); | ||
| + mCurrentOffset = 0; | ||
| + checkAndRethrowDecompressionException(); | ||
| + } | ||
| + /** | ||
| + * Reads and decompresses the next block | ||
| + * @param bufferAvailableForReuse decompression buffer available for reuse | ||
| + * @return next block in the decompressed stream | ||
| + */ | ||
| + protected DecompressedBlock nextBlock(byte[] bufferAvailableForReuse) { |
akiezun
Contributor
|
akiezun
and 1 other
commented on an outdated diff
Apr 26, 2016
| * | ||
| * c.f. http://samtools.sourceforge.net/SAM1.pdf for details of BGZF format | ||
| + * | ||
| + * | ||
| */ | ||
| public class BlockCompressedInputStream extends InputStream implements LocationAware { |
akiezun
Contributor
|
|
I added a few comments. Thanks for your ongoing work on it. Back to you @d-cameron. Two major points: |
|
|
This PR is really an async block decompression PR. Performing async stream decompression as well as async iteration (using AsyncBufferedIterator) increases throughput by more than 100% over a single-threaded synchronous iterator for a moderate amount of computation on each record. This is only possible by buffering both the stream and the iterator. Processing time per record is: Quite a number of Picard tool workloads benefit from having both async. |
|
hmm, nevermind, hadoop-bam still uses the async stream, allocated like this
and others, all sharing |
|
@tomwhite what do you think can be done here to enforce the use of the sync stream in Hadoop-BAM? The SAMHeaderReader in hadoop-bam is using the SamReaderFactory to read the header but it does not close the underlying reader or the wrapped stream (which is an Async stream that wraps FSDataInputStream) - this leaves the background thread running and blows up in seek on the wrapped FSDataInputStream a few lines down. (BTW, I feel like all these ultimately are due to the way sync/async is specified, which is through this single global setting in |
|
@akiezun Perhaps use SAMHeaderReader#readSAMHeaderFrom() that takes a Path and does close the stream. Would that help? |
|
As discussed with @tomwhite, I'll attempt to add an option to the reader factory to enable overriding |
akiezun
referenced
this pull request
May 10, 2016
Merged
adding a useAsynchronousIO flag to SamReaderFactory #601
|
@d-cameron can you rebase and use the new API in the SamReaderFactory (see #601) to not rely on |
d-cameron
added a commit
to d-cameron/htsjdk
that referenced
this pull request
May 12, 2016
|
|
d-cameron |
b2fdb56
|
|
Back to @akiezun |
|
I've tried this and an updated Hadoop-BAM in GATK and the tests are still not passing. Looking into why now... |
|
Investigated a bit more and the issue I'm seeing is not due to this change, but another one in Hadoop-BAM (HadoopGenomics/Hadoop-BAM#80). When I remove that change, all the GATK tests pass for me when running with the change in this issue (#576) and updates in Hadoop-BAM to disable async IO. So +1 from me. |
tomwhite
referenced
this pull request
in HadoopGenomics/Hadoop-BAM
May 12, 2016
Closed
Release Hadoop-BAM 7.5.0 #88
|
thanks @tomwhite. Can Hadoop-BAM switch to the synchronous reader for |
|
@akiezun here's the PR to switch Hadoop-BAM to use synchronous streams: HadoopGenomics/Hadoop-BAM#94. BTW I've also opened broadinstitute/gatk#1817 to address the other issue I was seeing, which was due to sharded BAM files not having a .bam extension. |
This was referenced May 13, 2016
|
The GATK tests now work on this branch, which is great. But the threads in the pool are non-daemon and so the JVM does not exit even when the main thread is dead. Run this for sync IO
and this for async IO
The latter never terminates and there is a bunch of threads from the pool that are still alive:
back to @d-cameron |
|
Should be good to go now @akiezun. |
|
Thanks. that is working now. I'll test it on gatk and picard next |
|
BTW, perf gain is still the same, running on a 1.6GB file (reading only + eager decode). (I had those runs alternated, printing them consecutively here for clarity) Sync Async |
akiezun
and 1 other
commented on an outdated diff
May 24, 2016
| + * Asynchronous read-ahead implementation of {@link htsjdk.samtools.util.BlockCompressedInputStream}. | ||
| + * | ||
| + * Note that this implementation is not synchronized. If multiple threads access an instance concurrently, it must be synchronized externally. | ||
| + */ | ||
| +public class AsyncBlockCompressedInputStream extends BlockCompressedInputStream { | ||
| + private static final int READ_AHEAD_BUFFERS = (int)Math.ceil(Defaults.NON_ZERO_BUFFER_SIZE / BlockCompressedStreamConstants.MAX_COMPRESSED_BLOCK_SIZE); | ||
| + private static final Executor threadpool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors(),new ThreadFactory() { | ||
| + public Thread newThread(Runnable r) { | ||
| + Thread t = Executors.defaultThreadFactory().newThread(r); | ||
| + t.setDaemon(true); | ||
| + return t; | ||
| + } | ||
| + }); | ||
| + private final BlockingQueue<DecompressedBlock> mResult = new ArrayBlockingQueue<>(READ_AHEAD_BUFFERS); | ||
| + private final BlockingQueue<byte[]> freeBuffers = new ArrayBlockingQueue<>(READ_AHEAD_BUFFERS); | ||
| + private final Semaphore running = new Semaphore(1); |
akiezun
Contributor
|
akiezun
commented on an outdated diff
May 24, 2016
| + return nextBlockSync(); | ||
| + } | ||
| + | ||
| + @Override | ||
| + protected void prepareForSeek() { | ||
| + flushReadAhead(); | ||
| + super.prepareForSeek(); | ||
| + } | ||
| + | ||
| + @Override | ||
| + public void close() throws IOException { | ||
| + // Suppress interrupts while we close. | ||
| + final boolean isInterrupted = Thread.interrupted(); | ||
| + mAbort = true; | ||
| + flushReadAhead(); | ||
| + super.close(); |
akiezun
Contributor
|
akiezun
and 1 other
commented on an outdated diff
May 24, 2016
| + private void flushReadAhead() { | ||
| + final boolean abortStatus = mAbort; | ||
| + mAbort = true; | ||
| + try { | ||
| + // block until the thread pool operation has completed | ||
| + running.acquire(); | ||
| + } catch (InterruptedException e) { | ||
| + throw new RuntimeException("Interrupted waiting for decompression thread", e); | ||
| + } | ||
| + // flush any read-ahead results | ||
| + mResult.clear(); | ||
| + mAbort = abortStatus; | ||
| + running.release(); | ||
| + } | ||
| + private void ensureReadAhead() { | ||
| + if (running.tryAcquire()) { |
akiezun
Contributor
|
akiezun
commented on an outdated diff
May 24, 2016
| + } | ||
| + | ||
| + public AsyncBlockCompressedInputStream(final File file) | ||
| + throws IOException { | ||
| + super(file); | ||
| + } | ||
| + | ||
| + public AsyncBlockCompressedInputStream(final URL url) { | ||
| + super(url); | ||
| + } | ||
| + | ||
| + public AsyncBlockCompressedInputStream(final SeekableStream strm) { | ||
| + super(strm); | ||
| + } | ||
| + | ||
| + protected DecompressedBlock nextBlock(byte[] bufferAvailableForReuse) { |
|
|
|
Back to you @akiezun |
d-cameron
referenced
this pull request
in PapenfussLab/gridss
May 27, 2016
Closed
error after variant calling using gridss-0.11.4-jar-with-dependencies #30
d-cameron
added a commit
to PapenfussLab/gridss
that referenced
this pull request
May 29, 2016
|
|
d-cameron |
725c6ac
|
|
performance testing update: on my use cases (HaplotypeCaller and BQSR in GATK4) the improvements are somewhat less pronounced because these tools don't do as much reading as the simple print-reads example. Example (all tested on an 8-core MacOSX machine): |
akiezun
removed their assignment
Jun 8, 2016
|
Correctness update: with this change on, tests pass on both https://github.com/broadinstitute/gatk and https://github.com/broadinstitute/gatk-protected test suites (both use async IO everywhere), so I don't have any problems with merging it. However, given that BQSR runs on 1 core in our production, there's no speedup for HaplotypeCaller, and complex multi-threaded code like this always carries risks, I feel like I can't champion this PR. I will unassign myself as a reviewer accordingly. Thanks @d-cameron for your work on this PR. What do others in the community think? Should this be merged given the benefits and the risks? |
|
@akiezun Perhaps we should provide a way to turn async on/off separately for reading and writing bams, so that we can insulate ourselves from the risks you mention while turning it on where we want/need it. These global properties in |
|
I agree that control of read vs. write separately is important - I often use async write because the performance benefit is large, but for read I don't see a huge benefit. That said @droazen I disagree that the reader/writer factories should be the exclusive place to control this, as that means that every single command line tool has to expose (hopefully consistent) parameters for these. Having a toolkit that is configured via system properties is very convenient. |
|
@tfenne I'd argue that other possible solutions to that would be to create a common base class for your tools to handle exposing settings for reader/writer creation, or to create shared utility methods for reader/writer creation called by convention. Centralizing reader/writer creation in your toolkit has other benefits as well (e.g., you can make sure that CRAM is properly supported across all tools). That said, I think I'd be satisfied as long as the factories have a mechanism to control async directly, and we're not compelled to rely on the system properties. |
|
@droazen And if I have a dozen independent toolkits that all need the same functionality? That have to remain separate for licensing/IP issues? |
|
@tfenne Writing one shared utility method or base class per toolkit (not per tool) to configure your readers/writers in a consistent fashion still seems better than dealing with global configuration via properties, which is never sufficiently granular for everyone's needs, and only covers a subset of the total configuration options available anyway. |
|
Async read performance improvements only become significant when the computation per read is relatively low. Have you benchmarked any of the read-intensive Picard tools? I expect you'll get a lot more than 16% out of Collect*Metrics and the like. |
|
FYI I looked at 2 simple metrics - speedup is indeed decent: @droazen @tfenne how about splitting samjdk.use_async_io_samtools into 2 options so that everyone can pick the level of exposure they want (the reader factory itself also supports this setting): and we'll do the same for tribble once async reading is turned back on (it's deleted now, see #583). |
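A sketch of what that split could look like; the property names below are illustrative guesses based on the existing samjdk.use_async_io_samtools naming, not confirmed names:

```java
/**
 * Sketch: separate read/write async toggles read from system properties.
 * The property names here are hypothetical placeholders for the proposal above.
 */
class AsyncIoDefaults {
    // Boolean.getBoolean returns true only if the named property exists and equals "true"
    static final boolean USE_ASYNC_IO_READ =
            Boolean.getBoolean("samjdk.use_async_io_read_samtools");
    static final boolean USE_ASYNC_IO_WRITE =
            Boolean.getBoolean("samjdk.use_async_io_write_samtools");
}
```

Both default to false when the properties are unset, so nothing changes for existing callers, and a reader factory option can still override them per instance.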
|
@akiezun I'm happy with anything as long as it's possible to configure/override at the factory level. |
|
Re-assigning to @akiezun now that he's gotten the requested feedback |
akiezun
was assigned
by droazen
Jun 10, 2016
yfarjoun
added Review-party candidate and removed Review-party candidate
labels
Jun 11, 2016
This was referenced Jun 11, 2016
|
I think this should wait until #641 is merged |
|
@d-cameron can you rebase on top of master (now contains #641) and use the |
coveralls
commented
Jun 23, 2016
|
First attempt at merging trashed symlinks since it was done on a Windows machine. Redid the merge from scratch on a Linux machine, so it should be good to go now @akiezun |
akiezun
commented on an outdated diff
Jun 24, 2016
| + abstract public SamReaderFactory referenceSequence(File referenceSequence); | ||
| + | ||
| + /** Sets the specified reference sequence * */ | ||
| + abstract public SamReaderFactory referenceSource(CRAMReferenceSource referenceSequence); | ||
| + | ||
| + /** Utility method to open the file get the header and close the file */ | ||
| + abstract public SAMFileHeader getFileHeader(File samFile); | ||
| + | ||
| + /** Reapplies any changed options to the reader * */ | ||
| + abstract public void reapplyOptions(SamReader reader); | ||
| + | ||
| + /** Set this factory's {@link ValidationStringency} to the provided one, then returns itself. */ | ||
| + abstract public SamReaderFactory validationStringency(final ValidationStringency validationStringency); | ||
| + | ||
| + /** Set whether readers created by this factory will use asynchronous IO. | ||
| + * If this methods is not called, this flag will default to the value of {@link Defaults#USE_ASYNC_IO_FOR_SAMTOOLS}. |
|
|
coveralls
commented
Jun 27, 2016
|
@jabbarish can you help review this PR? |
|
I tested on gatk and picard. GATK test suite works (both sync and async reading). Picard test suite fails with 5 strange errors:
(and it does work fine on master picard) |
|
@akiezun Sorry for the delay. Sure, we will review this PR. |
akiezun
removed their assignment
Jul 26, 2016
|
@jabbarish Any update on the review? I'm going to un-assign myself because I wont be able to work on this for now. |
|
@akiezun I just noticed you mentioned me here. I haven't seen that error in picard. It looks like either a packaging error in htsjdk or in picard. Do you see the same error again if you try with the current picard master? It's possible it was some messed up intermediate build that's fixed now? |
|
@akiezun Sorry for the delay, we will come back with detailed reviews of several issues soon. Quick answer: this exception occurs when you use a 'not complete' jar. Try the one created with shadowJar; it should work. |
|
@jabbarish just a quick note to say that we are still waiting for the review. Please let us know if you are still planning on reviewing it. Thanks. |
lbergelson
commented on an outdated diff
Aug 23, 2016
| @@ -1,528 +1,528 @@ | ||
| -package htsjdk.samtools; |
lbergelson
Contributor
|
|
@jacarey We need some multithreading expertise on this PR. Would you be able to take another final look? |
|
@jacarey, I just heard via a private channel that there might be a competing PR being prepared that has (supposedly) better performance. It would be best if we can do a comparative review somehow... |
|
@jabbarish have you had time to look at this? You are the reviewer for this branch as nominated by @akiezun, and @d-cameron was recently asking about the status of this. |
|
In short: we propose to introduce a centralized thread management tool to control CPU load in all metrics (we will make a separate PR), we think the proposed changes should be implemented using Java 8 features, and finally there is more room for improvement here when using threads. I think we should make another PR with our solution rather than commenting on this one. |
|
@jabbarish Since this PR has already undergone several review passes and is now pretty mature, it seems worth merging this work while waiting for your proposed more generalized solution to materialize. Are you able to take a look at the code here and offer specific suggestions/improvements? |
|
Sure. I'll make it by Monday.
|
ZLyanov
reviewed
Oct 3, 2016
Finally, the consequence of the above mentioned decisions is the intensive and complicated usage of methods like ensureReadAhead and tryQueueTask in both foreground and background threads.
We propose to consider the classic single-threaded producer-consumer solution with a minimum of sync primitives involved, based on a loop in the background thread and communication via the existing bounded blocking queue (mResult).
Thanks.
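The proposed shape can be sketched as follows (hypothetical names, not @ZLyanov's actual code): one dedicated daemon producer thread fills a bounded blocking queue, and the reader simply takes from it; back-pressure comes from the queue bound alone.

```java
import java.util.Iterator;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/**
 * Sketch of a single-threaded producer-consumer read-ahead: one dedicated
 * daemon thread loops over the block source and puts results into a bounded
 * queue; the only sync primitive is the queue itself.
 */
class SingleThreadedReadAhead {
    private static final byte[] EOF = new byte[0];   // end-of-stream sentinel
    private final BlockingQueue<byte[]> results = new ArrayBlockingQueue<>(4);

    SingleThreadedReadAhead(Iterator<byte[]> blocks) {
        Thread producer = new Thread(() -> {
            try {
                while (blocks.hasNext()) {
                    results.put(blocks.next());      // blocks when the queue is full
                }
                results.put(EOF);
            } catch (InterruptedException ignored) { /* shut down quietly */ }
        });
        producer.setDaemon(true);
        producer.start();
    }

    /** Returns the next decompressed block, or null at end of stream. */
    byte[] nextBlock() {
        try {
            byte[] b = results.take();
            return b == EOF ? null : b;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }
}
```

The trade-off discussed later in this thread still applies: this is one thread per stream, which is a problem when thousands of streams are open at once.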
| + * Note that this implementation is not synchronized. If multiple threads access an instance concurrently, it must be synchronized externally. | ||
| + */ | ||
| +public class AsyncBlockCompressedInputStream extends BlockCompressedInputStream { | ||
| + private static final int READ_AHEAD_BUFFERS = (int)Math.ceil(Defaults.NON_ZERO_BUFFER_SIZE / BlockCompressedStreamConstants.MAX_COMPRESSED_BLOCK_SIZE); |
d-cameron
Oct 4, 2016
•
Contributor
Yes. Math.ceil and the upstream guarantee of a non-zero Defaults.NON_ZERO_BUFFER_SIZE ensure this.
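For reference, since both operands in that expression are ints, the division truncates before Math.ceil is applied, so the expression relies on the quotient already being at least 1 (as the reply above argues). An integer ceiling that avoids the truncation is the usual (a + b - 1) / b idiom:

```java
/** Integer ceiling division that avoids the int-division-then-Math.ceil pitfall. */
class CeilDiv {
    static int ceilDiv(int a, int b) {
        return (a + b - 1) / b;   // assumes a >= 0, b > 0, and no int overflow in a + b
    }
}
```

With 65 537 bytes and a 65 536-byte block, ceilDiv correctly yields 2 buffers, whereas (int) Math.ceil(65537 / 65536) yields 1 because the division truncates first.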
| + t.setDaemon(true); | ||
| + return t; | ||
| + } | ||
| + }); |
ZLyanov
Oct 3, 2016
•
Contributor
this object will always use no more than one thread, so why create a thread pool for all available cores? What if we were to use this as part of some pipeline on an AWS instance with 128 or more cores? Even if we had several AsyncBlockCompressedInputStreams in one app, one dedicated thread per AsyncBlockCompressedInputStream would be enough.
d-cameron
Oct 4, 2016
•
Contributor
The use of a thread pool ensures that when reading from many BAM files at once, the number of threads spawned is limited to the number of CPUs available. The scenario the thread pool protects against is the concurrent opening of many (hundreds of) BAM files, a scenario I have personally encountered.
| + * Indicates whether a read-ahead task has been scheduled to run. Only one read-ahead task | ||
| + * per stream can be scheduled at any one time. | ||
| + */ | ||
| + private final Semaphore running = new Semaphore(1); |
ZLyanov
Oct 3, 2016
Contributor
Because of the decision to use a static multi-threaded pool, you have to enforce the sequential nature of block reading via the semaphore. It also provides some sort of critical-section support. Neither would be necessary if you used a single-threaded executor: it is sequential by nature.
d-cameron
Oct 4, 2016
Contributor
The executor is static and shared across all input streams. A single-threaded executor is not appropriate.
|
What we must also keep in mind is that this approach makes it difficult to provide further improvements in BlockCompressedInputStream. For example, reading from the file and decompressing a block could also be done in separate background threads. |
|
One more thought: we should not consider this approach, or ours mentioned above, as a final solution for fast reading in BlockCompressedInputStream. For example, we could concurrently load SAMRecords via an iterator and process them. Here are my measurements from different launches of CollectAlignmentSummaryMetrics with combinations of Async Block Compressed Read and Async Load/Process (time to process 20 million records): |
|
As for performance, do you have an idea of why yours differs from the earlier benchmarking of this PR (see #482)? That parent PR also includes an async iterator which allows for eager SAMRecord parsing and which, when combined with async decompression, resulted in processing times of less than half that of the single-threaded version (of course this is dependent on the amount of processing that is done per record). |
|
An early version of this PR did indeed use the classic single-threaded producer-consumer solution with a dedicated background thread, but it was found that in use cases such as merging many BAM files (such as occurs in the final stage of a sort), a thread-per-BAM model was not acceptable. In my use case, not only was the memory overhead of thousands of threads unacceptable, but I also ran into the per-user OS limits. A typical Linux distribution will have a process limit at least an order of magnitude lower than the file handle limit. This PR uses a shared executor (with the associated additional complications in the synchronisation logic) because the producer-consumer implementation resulted in unacceptable per-BAM overhead. |
|
You are correct that async decompression and file reading would provide further improvements, but this design does not preclude such an optimisation, as eager file reading could be provided by a wrapper around the stream passed to the BlockCompressedInputStream. For example, an eager-loading version of BufferedInputStream could be implemented and passed to the BlockCompressedInputStream. Such a design would improve multi-BAM BlockCompressedInputStream performance, since CPU under-utilisation can occur when background threads block on the underlying IO while the total number of background threads is limited. |
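Such a wrapper might look like the following minimal sketch (illustrative only, not part of this PR): a stream whose next chunk is fetched on a single background daemon thread while the caller consumes the current one.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Sketch of an eager-loading stream: the next chunk is prefetched in the background. */
class ReadAheadInputStream extends InputStream {
    private final InputStream in;
    private final ExecutorService exec = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true);            // never block JVM shutdown
        return t;
    });
    private final int chunkSize;
    private Future<byte[]> pending;   // chunk currently being read ahead
    private byte[] current = new byte[0];
    private int pos = 0;

    ReadAheadInputStream(InputStream in, int chunkSize) {
        this.in = in;
        this.chunkSize = chunkSize;
        this.pending = exec.submit(this::readChunk);   // start the first read-ahead
    }

    private byte[] readChunk() throws IOException {
        byte[] buf = new byte[chunkSize];
        int n = in.read(buf);
        if (n < 0) return null;                        // EOF marker
        return n == chunkSize ? buf : Arrays.copyOf(buf, n);
    }

    @Override
    public int read() throws IOException {
        while (pos >= current.length) {
            try {
                byte[] next = pending.get();           // wait for the prefetched chunk
                if (next == null) return -1;           // underlying stream hit EOF
                current = next;
                pos = 0;
                pending = exec.submit(this::readChunk); // immediately read ahead again
            } catch (InterruptedException | ExecutionException e) {
                throw new IOException(e);
            }
        }
        return current[pos++] & 0xff;
    }

    @Override
    public void close() throws IOException {
        exec.shutdownNow();
        in.close();
    }

    /** Convenience helper for the sketch: drain a stream to a String. */
    static String drain(InputStream s) {
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = s.read()) != -1) sb.append((char) c);
            s.close();
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
        return sb.toString();
    }
}
```

A production version would also need seek-awareness and buffer reuse, but it shows how the read-ahead concern can live entirely outside BlockCompressedInputStream.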
|
@d-cameron, |
|
@jabbarish If there's a simple design pattern that covers the concurrency requirements I'm all for that. The pattern needs to be:
The implementation complexity in this PR is due to the seek() cancelling of all outstanding work, the delaying of the eager look-ahead until data is requested (to remove any overhead from the stream open -> seek use case), and the serial scheduling of multiple background tasks. Separating out some of these concurrency concerns by replacing some of the concurrency constructs (e.g. swapping a semaphore for an Executor instance) may make reasoning about correctness simpler. You mentioned a larger concurrency/thread management PR. What time-frame are we talking about for such a design change, and is there a design pattern that this would integrate with, or would this PR get replaced wholesale by the larger PR? Edit: additional constraint |
|
@d-cameron I suppose the coming PR from our team should not replace yours. We only want to propose a convenient common solution for thread management. On the other hand, your PR seems too complicated in terms of both readability and extensibility, as I wrote before. Your current code is correct and fast, so we could accept it, but with an immediate effort to improve it. Actually, easy-to-read is not as important as easy-to-extend. |
|
@jabbarish Sounds good. I'm currently on holidays, but I'll see what I can put together. |
|
@d-cameron please have a look at https://github.com/jabbarish/htsjdk/blob/zl_async/src/main/java/htsjdk/samtools/util/AsyncBlockCompressedInputStream.java |
|
@ZLyanov @d-cameron How would you like to proceed with this PR? |
|
@droazen as I stated above, I think the proposed PR is correct, but it will be harder to extend later. Actually, no problem, we can afford it. I see a direct continuation in moving 'unpack' onto another thread. We can do it (as well as the necessary refactoring) in subsequent PRs. |
|
@droazen my preference is to merge, then refactor along the lines of @ZLyanov's proposal at a later stage. The details under discussion are entirely internal to the async class, so a reimplementation is fairly trivial to push through in another PR. The code @ZLyanov has linked is a much more elegant design, but it also looks like seek() does not guarantee that no more read-ahead will be performed (since there's no guarantee on thread ordering into synchronized blocks). It also looks vulnerable to a stream-corruption race condition, as a scheduled read-ahead task could be removed from the queues but then run processNextBlock() during a call to seek(). |
|
@d-cameron Sounds like a good plan. Can you rebase/squash this branch into a single commit, and let us know here when it's ready for merge? |
coveralls
commented
Nov 8, 2016
|
Rebase/squash complete and ready for merging. |
|
@d-cameron Looks like we need one final rebase -- sorry! We promise to merge this time :) |
coveralls
commented
Dec 1, 2016
coveralls
commented
Dec 1, 2016
|
Resolved merge conflicts. Should be good to go @droazen |
|
@d-cameron Had a final look at the branch before merge, and I'm noticing that there are no direct unit tests for |
coveralls
commented
Dec 6, 2016
|
@droazen I've added test coverage based on an input file of random (non-BAM) data which I independently compressed using tabix bgzip. |
|
@droazen do you have any ETA on when this will get merged? |
|
The pull request that lived forever... @d-cameron David should be back from vacation on Monday. I'm going to make sure he takes a look at this first thing. |
|
|
d-cameron commented Apr 18, 2016
No description provided.