wtpoa-cns not using all requested threads #19

lh3 · 2018-09-24T19:15:35Z

I asked wtpoa-cns to use 16 threads. However, on average it only uses about 500% CPU on my machine. I changed the default memory allocator and it seems to improve the multi-threaded performance.

I am using the E. coli example from PBcR:

http://www.cbcb.umd.edu/software/PBcR/data/selfSampleData.tar.gz

The command lines I was using:

wtdbg2 -i ecoli.fa.gz -t 16 -fo test -L5000 -e2
wtpoa-cns -i test.ctg.lay -t 16 -fo test.ctg.fa

You can override the system allocator with LD_PRELOAD:

LD_PRELOAD=libtcmalloc.so wtpoa-cns -i test.ctg.lay -t 16 -fo test.ctg.fa

Here are some results:

Library	Real time (sec)	User time (sec)	Sys time (sec)	Max RSS (kB)
glibc-2.12	285.901	848.230	575.720	1660412.0
jemalloc	75.703	814.820	41.580	3274516.0
tcmalloc	72.275	1023.740	26.120	1765996.0
lockless	100.658	953.020	102.220	4018172.0

You can see that the default glibc allocator (I am using CentOS 6) is quite bad, spending lots of system time on thread scheduling. tcmalloc is much better. You get almost a 4-fold speedup. jemalloc is good, too, but it takes too much extra memory.
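As a rough cross-check of the table, (user + sys) / real estimates how many cores were busy on average, and the ratio of real times gives the speedup. The sketch below only does arithmetic on the numbers reported above; the struct and function names are made up for illustration. For glibc it comes out at roughly 5 cores, which matches the ~500% CPU observation, and about 40% of that CPU time is system time.

```c
#include <assert.h>
#include <stdio.h>

/* One row of the timing table above (times in seconds). */
typedef struct {
    const char *name;
    double real, user, sys;
} timing_t;

/* Values copied from the table in this issue. */
const timing_t timings[] = {
    { "glibc-2.12", 285.901,  848.230, 575.720 },
    { "jemalloc",    75.703,  814.820,  41.580 },
    { "tcmalloc",    72.275, 1023.740,  26.120 },
    { "lockless",   100.658,  953.020, 102.220 },
};
const int n_timings = (int)(sizeof timings / sizeof timings[0]);

/* Average number of cores kept busy: total CPU time over wall-clock time. */
double cpu_cores_used(const timing_t *t)
{
    assert(t->real > 0.0);
    return (t->user + t->sys) / t->real;
}

/* Share of CPU time spent in the kernel rather than in user code. */
double sys_fraction(const timing_t *t)
{
    return t->sys / (t->user + t->sys);
}

/* Wall-clock speedup of run t relative to a baseline run. */
double speedup_over(const timing_t *base, const timing_t *t)
{
    assert(t->real > 0.0);
    return base->real / t->real;
}

/* Print one summary line per allocator, with glibc as the baseline. */
void print_summary(void)
{
    for (int i = 0; i < n_timings; i++)
        printf("%-11s cores=%.2f sys=%.1f%% speedup=%.2fx\n",
               timings[i].name,
               cpu_cores_used(&timings[i]),
               100.0 * sys_fraction(&timings[i]),
               speedup_over(&timings[0], &timings[i]));
}
```

Calling print_summary() from main prints the derived figures for all four allocators.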

Typically, you see the effect of memory allocators when you frequently malloc/free in each thread. Bwa suffers from this problem, too. I think there are two ways to fix this:

1. Use a custom memory allocator. tcmalloc has been quite good for the few examples I have tried. This solution doesn't require you to modify the C source code. However, it is a little difficult for general users to build performant binaries.
2. Reorganize malloc/free calls. You allocate a buffer before spawning the workers and try to avoid frequent malloc/free in each worker. Minimap2 takes this approach with a thread-local buffer. With this buffer disabled, minimap2 will become noticeably slower on many threads.
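To make the second option concrete, here is a minimal sketch of a per-worker scratch buffer using pthreads. It is not taken from minimap2 or wtdbg2; all names (worker_t, worker_reserve, run_workers, task_size) and the toy workload are hypothetical. Each worker gets a buffer allocated before the threads start and grows it only when a task needs more space, so the hot loop does not call malloc/free once per task.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_WORKERS 64

/* Hypothetical per-worker state: each thread owns one reusable buffer. */
typedef struct {
    uint8_t *buf;       /* scratch space reused across tasks */
    size_t   cap;       /* current capacity of buf in bytes */
    int      first;     /* first task id handled by this worker */
    int      last;      /* one past the last task id */
    uint64_t checksum;  /* toy result accumulated by this worker */
} worker_t;

/* Toy workload: scratch bytes needed by a task (64 to 256). */
size_t task_size(int id) { return 64 + (size_t)(id % 7) * 32; }

/* Grow the worker's buffer only if the task does not fit. */
static uint8_t *worker_reserve(worker_t *w, size_t need)
{
    if (need > w->cap) {
        size_t cap = w->cap ? w->cap : 64;
        uint8_t *p;
        while (cap < need) cap *= 2;
        p = realloc(w->buf, cap);
        if (p == NULL) return NULL;
        w->buf = p;
        w->cap = cap;
    }
    return w->buf;
}

/* Thread body: no malloc/free per task, only occasional growth. */
static void *worker_main(void *arg)
{
    worker_t *w = arg;
    for (int id = w->first; id < w->last; id++) {
        size_t n = task_size(id);
        uint8_t *scratch = worker_reserve(w, n);
        if (scratch == NULL) return NULL; /* out of memory: stop early */
        memset(scratch, id & 0xff, n);
        for (size_t i = 0; i < n; i++) w->checksum += scratch[i];
    }
    return NULL;
}

/* Single-threaded reference result for the same toy workload. */
uint64_t serial_checksum(int n_tasks)
{
    uint64_t sum = 0;
    for (int id = 0; id < n_tasks; id++)
        sum += (uint64_t)(id & 0xff) * task_size(id);
    return sum;
}

/* Allocate buffers first, then spawn workers over contiguous task ranges. */
uint64_t run_workers(int n_threads, int n_tasks)
{
    worker_t w[MAX_WORKERS];
    pthread_t tid[MAX_WORKERS];
    int started[MAX_WORKERS];
    uint64_t total = 0;

    assert(n_threads >= 1 && n_threads <= MAX_WORKERS && n_tasks >= 0);
    for (int t = 0; t < n_threads; t++) {
        w[t].cap = 128;
        w[t].buf = malloc(w[t].cap);
        if (w[t].buf == NULL) w[t].cap = 0; /* worker_reserve retries later */
        w[t].first = (int)((long long)n_tasks * t / n_threads);
        w[t].last = (int)((long long)n_tasks * (t + 1) / n_threads);
        w[t].checksum = 0;
    }
    for (int t = 0; t < n_threads; t++) {
        started[t] = pthread_create(&tid[t], NULL, worker_main, &w[t]) == 0;
        if (!started[t]) worker_main(&w[t]); /* fall back to running inline */
    }
    for (int t = 0; t < n_threads; t++) {
        if (started[t]) pthread_join(tid[t], NULL);
        total += w[t].checksum;
        free(w[t].buf);
    }
    return total;
}
```

With this layout, the number of allocator calls scales with the number of workers rather than with the number of tasks. Depending on the toolchain, the program may need to be built with -pthread.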

lh3 · 2018-09-24T19:24:44Z

PS: I should add that the issue may depend on the available RAM and on other processes running on the same machine. On a freshly rebooted machine, or on a machine with lots of free RAM, the issue may be alleviated.

lh3 · 2018-09-24T19:54:22Z

PPS: I have updated the precompiled binaries on the release page. wtdbg2, wtdbg-cns and wtpoa-cns in the binary tarball are now statically linked to TCMalloc. They are faster on my machine. Not sure if you can see the difference on your side.

ruanjue · 2018-09-25T02:41:34Z

Dear Heng,

wtpoa-cns and wtdbg-cns use the same parallelization scheme. They take one contig at a time, split the work into multiple parallel parts along the edges in wtdbg, and then merge the consensus edge sequences into the contig sequence.

If I parallelized across contigs instead of across edges within a contig, CPU usage should reach the level requested with -t 16. My concern, though, was that the varying contig lengths might leave some threads running much longer than others.
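The load-balance concern can be illustrated with a small calculation. The sketch below assumes that consensus cost is proportional to contig length, which is a simplification, and uses made-up contig lengths; the function names are hypothetical and nothing here is taken from wtdbg2. It compares a fixed round-robin assignment, a "next free thread takes the next contig" assignment, and a simple lower bound on the finishing time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 64

/* Hypothetical contig lengths (bp): one long contig and a few short ones. */
const int64_t demo_lengths[] = { 4600000, 50000, 30000, 20000 };
const size_t demo_n = sizeof demo_lengths / sizeof demo_lengths[0];

/* Largest per-thread load among the first n_threads entries. */
static int64_t max_load(const int64_t *load, int n_threads)
{
    int64_t m = 0;
    for (int t = 0; t < n_threads; t++)
        if (load[t] > m) m = load[t];
    return m;
}

/* Static assignment: contig i always goes to thread i % n_threads. */
int64_t makespan_round_robin(const int64_t *len, size_t n, int n_threads)
{
    int64_t load[MAX_THREADS] = {0};
    if (n_threads < 1 || n_threads > MAX_THREADS) return -1;
    assert(len != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        load[i % (size_t)n_threads] += len[i];
    return max_load(load, n_threads);
}

/* Dynamic assignment: each contig goes to the least loaded thread so far,
 * which approximates threads pulling work from a shared queue. */
int64_t makespan_least_loaded(const int64_t *len, size_t n, int n_threads)
{
    int64_t load[MAX_THREADS] = {0};
    if (n_threads < 1 || n_threads > MAX_THREADS) return -1;
    assert(len != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int best = 0;
        for (int t = 1; t < n_threads; t++)
            if (load[t] < load[best]) best = t;
        load[best] += len[i];
    }
    return max_load(load, n_threads);
}

/* No contig-level schedule can beat max(longest contig, total / threads). */
int64_t makespan_lower_bound(const int64_t *len, size_t n, int n_threads)
{
    int64_t total = 0, longest = 0, even;
    if (n_threads < 1 || n_threads > MAX_THREADS) return -1;
    assert(len != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        total += len[i];
        if (len[i] > longest) longest = len[i];
    }
    even = (total + n_threads - 1) / n_threads; /* ceiling division */
    return longest > even ? longest : even;
}
```

With these example lengths, the lower bound equals the length of the longest contig no matter how many threads are used, so most threads would sit idle while one finishes the long contig.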

Another way might be to separate 'building edge consensus sequences' from 'merging edges'. For all contigs, we first build the consensus edge sequences and write them to a prefix.ctg.lay.edges.fa file, which should use all the requested CPUs. The merging step should then also use all of them. Let me try it.

lh3 · 2018-09-25T03:08:45Z

On my machine, the problem is not caused by some threads running longer. Using tcmalloc wouldn't help in that case. I believe the slowdown is due to frequent malloc calls. I have seen similar behaviors a few times before.

ruanjue · 2018-10-05T09:11:26Z

Hi Heng,

You are right: the frequent malloc calls slowed down the multi-threaded run. I have located the code responsible.

Thanks very much!
Jue

ruanjue closed this as completed Oct 5, 2018
