-
Notifications
You must be signed in to change notification settings - Fork 447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Sam multi-threading #786
Closed
Closed
Sam multi-threading #786
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This function is now around double the speed of develop.
Benchmarks: Best of 3 runs, elapsed time measured by perf. File is first 3 million records from 9827_2#49.bam. Built using --with-libdeflate ===================== Read speed: (view -c) gcc8 clang50 icc Orig CRAM.u 2.72 3.14 3.19 Orig CRAM.ux 2.22 2.48 2.53 Orig CRAM 4.36 4.79 5.26 Orig BAM.u 0.37 0.38 0.44 Orig BAM 2.07 2.08 2.15 Orig SAM 2.54 2.63 3.08 Orig SAM.gz 4.59 4.55 5.31 New SAM 1.20 1.39 1.65 New SAM.gz 3.26 3.36 3.95 CRAM.u and BAM.u are uncompressed CRAM/BAM. CRAM.ux is uncompressed and referenceless. SAM.gz is bgzipped SAM. So reading speed is 87%(icc) to 112%(gcc) faster for uncompressed SAM, and up to 40% faster for compressed SAM. Compressed SAM is now 57(gcc)-84%(icc) slower than compressed BAM. ===================== Write speed (convert BAM.u to new format, so includes ~0.4 BAM decode time): gcc8 clang50 icc Orig CRAM.u 3.97 4.24 4.17 Orig CRAM 9.97 10.30 10.23 Orig BAM.u 0.63 0.65 0.79 Orig BAM 12.27 12.27 12.52 Orig SAM 2.49 2.62 3.35 Orig SAM.gz 14.82 14.79 14.71 New SAM 1.40 1.46 2.14 New SAM.gz 14.73 14.86 14.73 NB: Writing SAM.gz is samtools view | bgzip. Writing uncompressed SAM is 56(icc) to 79% faster. Writing compressed data is dominated by libdeflate speeds, and differences between new/orig are likely in the noise rather than meaningful. It's clear though that writing gzipped SAM (albeit via a pipe) is about 15% slower than BAM. Uncompressed BAM is still obviously the best format for piping, as expected. Not shown here, but level 1 BAM took about 8s to write and is 5% larger than the default BAM. (That doesn't sound like a bad tradeoff to make.) Level 9 BAM took 57s and was only 3% smaller than default. ===================== -rw-r--r-- 1 jkb team117 866008213 Jun 28 11:20 3m.u.bam -rw-r--r-- 1 jkb team117 350739851 Jun 28 11:20 3m.bam -rw-r--r-- 1 jkb team117 1121385629 Jun 28 11:20 3m.sam -rw-r--r-- 1 jkb team117 333746006 Jun 28 11:28 3m.sam.gz -rw-r--r-- 1 jkb team117 809884391 Jun 28 11:38 3m.ux.cram -rw-r--r-- 1 jkb team117 485074200 Jun 28 11:37 3m.u.cram -rw-r--r-- 1 jkb team117 173363126 Jun 28 11:36 3m.cram
Ditched the (non-working due to lack of m4_include) autoconf check for __builtin_clz as it doesn't work well anyway, given it's in a public header file.
Bug fix to kputw and negative values. It wasn't checking the size needed for '-' char. Silence valgrind warnings in SAM parser as it's minimal speed impact to do so. Dropped the now unneeded HAVE___BUILTIN_CLZ from Makefile. (The code still checks for it incase your compiler supports this and isn't gcc or clang, permitting users to manually set it).
Needs tidy up as it is using global memory and hard-coded on at the moment. No thread clean up code either.
Use a fixed sized buffer (checking to come!) instead of repeated hts_getline. Benchmarks main+10 thread: (NB faster than 10 thread on test.u.bam) real 0m0.978s user 0m8.137s sys 0m1.048s main+1 thread: real 0m5.918s user 0m6.964s sys 0m0.688s main only (unthreaded): real 0m6.291s user 0m5.600s sys 0m0.676s
TODO: handle exiting early (test_view -N 100000 -@10 -B foo.sam)
This follows a similar strategy to the multi-threaded BAM writer with a thread dedicated to pulling things off the result queue and writing to the hfile. ABI BREAKING: The sam_write1 prototype has changed. The bam_hdr_t pointer is no longer const, so we can record the use of this header by incrementing the reference count. (It may be possible we can avoid this, but some tools destroy the header prior to closing the file, which causes crashes instead.)
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This adds multi-threaded reading and writing for the SAM format.
SAM output has been tested with full and partial (region) based work flows. SAM input has been tested with full files, but it's hard to evaluate seek handling as this needs PR #718 merging in order to be able to do index and random access on SAMs.
Some metrics on
./test/test_view -@16 /tmp/10m.sam > /dev/null
Dev 11.41sec elapsed
This PR: 2.0sec elapsed (~18s cpu)
It still has bottlenecks somewhere, although maybe it is I/O. It doesn't gain much performance beyond 10 threads. Note this faster elapsed time than decoding from 10m.u.BAM and outputting to u.BAM (3.1sec elapsed, 10.3s cpu), possibly due to CRC32 calculations.
NB: this needs squashing, but rebase including resolving all the SAM speed changes has been done.