Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sam multi-threading #786

Closed
wants to merge 15 commits into from
Closed

Conversation

jkbonfield
Copy link
Contributor

@jkbonfield jkbonfield commented Oct 26, 2018

This adds multi-threaded reading and writing for the SAM format.

SAM output has been tested with full and partial (region) based work flows. SAM input has been tested with full files, but it's hard to evaluate seek handling as this needs PR #718 merging in order to be able to do index and random access on SAMs.

Some metrics on ./test/test_view -@16 /tmp/10m.sam > /dev/null

Dev 11.41sec elapsed
This PR: 2.0sec elapsed (~18s cpu)

It still has bottlenecks somewhere, although maybe it is I/O. It doesn't gain much performance beyond 10 threads. Note this faster elapsed time than decoding from 10m.u.BAM and outputting to u.BAM (3.1sec elapsed, 10.3s cpu), possibly due to CRC32 calculations.

NB: this needs squashing, but rebase including resolving all the SAM speed changes has been done.

This function is now around double the speed of develop.
Benchmarks:

Best of 3 runs, elapsed time measured by perf.  File is first 3
million records from 9827_2#49.bam.

Built using --with-libdeflate

=====================
Read speed: (view -c)

                gcc8            clang50         icc
Orig CRAM.u     2.72            3.14            3.19
Orig CRAM.ux    2.22            2.48            2.53
Orig CRAM       4.36            4.79            5.26
Orig BAM.u      0.37            0.38            0.44
Orig BAM        2.07            2.08            2.15
Orig SAM        2.54            2.63            3.08
Orig SAM.gz     4.59            4.55            5.31
New  SAM        1.20            1.39            1.65
New  SAM.gz     3.26            3.36            3.95

CRAM.u and BAM.u are uncompressed CRAM/BAM.
CRAM.ux is uncompressed and referenceless.
SAM.gz is bgzipped SAM.

So reading speed is 87%(icc) to 112%(gcc) faster for uncompressed SAM,
and up to 40% faster for compressed SAM.  Compressed SAM is now
57(gcc)-84%(icc) slower than compressed BAM.

=====================
Write speed (convert BAM.u to new format, so includes ~0.4 BAM decode time):
                gcc8            clang50         icc
Orig CRAM.u     3.97            4.24            4.17
Orig CRAM       9.97           10.30           10.23
Orig BAM.u      0.63            0.65            0.79
Orig BAM       12.27           12.27           12.52
Orig SAM        2.49            2.62            3.35
Orig SAM.gz    14.82           14.79           14.71
New  SAM        1.40            1.46            2.14
New  SAM.gz    14.73           14.86           14.73

NB: Writing SAM.gz is samtools view | bgzip.

Writing uncompressed SAM is 56(icc) to 79% faster.

Writing compressed data is dominated by libdeflate speeds, and
differences between new/orig are likely in the noise rather than
meaningful.  It's clear though that writing gzipped SAM (albeit via a
pipe) is about 15% slower than BAM.

Uncompressed BAM is still obviously the best format for piping, as
expected.  Not shown here, but level 1 BAM took about 8s to write
and is 5% larger than the default BAM.  (That doesn't sound like a bad
tradeoff to make.)  Level 9 BAM took 57s and was only 3% smaller than
default.

=====================
-rw-r--r-- 1 jkb team117  866008213 Jun 28 11:20 3m.u.bam
-rw-r--r-- 1 jkb team117  350739851 Jun 28 11:20 3m.bam

-rw-r--r-- 1 jkb team117 1121385629 Jun 28 11:20 3m.sam
-rw-r--r-- 1 jkb team117  333746006 Jun 28 11:28 3m.sam.gz

-rw-r--r-- 1 jkb team117  809884391 Jun 28 11:38 3m.ux.cram
-rw-r--r-- 1 jkb team117  485074200 Jun 28 11:37 3m.u.cram
-rw-r--r-- 1 jkb team117  173363126 Jun 28 11:36 3m.cram
Ditched the (non-working due to lack of m4_include) autoconf check for
__builtin_clz as it doesn't work well anyway, given it's in a public
header file.
Bug fix to kputw and negative values.  It wasn't checking the size
needed for '-' char.

Silence valgrind warnings in SAM parser as it's minimal speed impact
to do so.

Dropped the now unneeded HAVE___BUILTIN_CLZ from Makefile. (The code
still checks for it incase your compiler supports this and isn't gcc
or clang, permitting users to manually set it).
Needs tidy up as it is using global memory and hard-coded on at the
moment.  No thread clean up code either.
Use a fixed sized buffer (checking to come!) instead of repeated
hts_getline.

Benchmarks
main+10 thread: (NB faster than 10 thread on test.u.bam)
real	0m0.978s
user	0m8.137s
sys	0m1.048s

main+1 thread:
real   0m5.918s
user   0m6.964s
sys    0m0.688s

main only (unthreaded):
real   0m6.291s
user   0m5.600s
sys    0m0.676s
TODO: handle exiting early (test_view -N 100000 -@10 -B foo.sam)
This follows a similar strategy to the multi-threaded BAM writer with
a thread dedicated to pulling things off the result queue and writing
to the hfile.

ABI BREAKING:  The sam_write1 prototype has changed.  The bam_hdr_t
pointer is no longer const, so we can record the use of this header by
incrementing the reference count.  (It may be possible we can avoid
this, but some tools destroy the header prior to closing the file,
which causes crashes instead.)
@jkbonfield jkbonfield changed the title Sam speed+thread Sam multi-threading Nov 27, 2018
@jkbonfield jkbonfield closed this Aug 20, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant