samtools sort number of threads in reading phase #891

bernt-matthias · 2018-07-11T15:48:04Z

Is your feature request related to a problem? Please specify.

When using mapper | samtools sort -, it is difficult to specify the number of threads for the mapper and for samtools.
Until all data has been read, samtools rarely uses the available CPUs efficiently (CPU usage is seldom above 100%).

Describe the solution you would like.

I suggest allowing the number of CPUs that samtools uses while reading the data (and producing pre-sorted chunks) to be specified separately. This would make it simpler to divide threads between the two programs. For instance, until the mapper has finished, samtools could use a single thread for reading and chunking, and then use the full number of threads afterwards. This way:

the CPU usage could be limited more precisely (in shared environments you need to specify the number of cores, and sometimes admins really check);
the currently suboptimal performance of samtools sort during reading would be nicely hidden.

I guess the single thread for the first phase could nicely fill the CPU capacity that the mapper leaves unused.


jkbonfield · 2018-07-11T16:22:07Z

Sort could certainly be more efficient. Ideally it would be using asynchronous I/O too.

However, this particular problem is perhaps one of expectation. Over-specifying the number of threads is not a catastrophically bad thing to do, and you can also use cgroups or hwloc-bind to govern how many cores the entire process can take up.

Also, I don't think it's true to say that samtools sort uses only one CPU until the mapper has finished. It uses one thread until it has read enough data, and then it uses multiple threads to sort and write that temporary data to disk, repeatedly. On finishing (no more stdin) it then has a separate merge stage. If your mapper is the slow part, then yes samtools will likely be stuck at under 100% CPU, but that's not really a samtools issue I think.

Note there is more or less a way to handle what you want already (untested, but I think it's equivalent), eg:

mapper | samtools sort -l 0 -O bam -@2 | samtools view -O bam -@16 -o out.bam

The second merge stage only starts when the mapper has finished, and this will be I/O bound and won't be threading on output as there are no lengthy bgzf compression steps. The samtools view command will only start consuming cpu after the mapper has finished so both mapper and view can be given the same cores to work on.
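A minimal way to see why the samtools view stage in the pipeline above sits idle until the mapper is done, assuming only a POSIX shell and coreutils: coreutils sort stands in for samtools sort (neither can emit its first sorted record before its input is closed), and slow_mapper is a hypothetical function standing in for the aligner.

```shell
#!/bin/sh
# Hypothetical stand-in for the mapper: emits three unsorted records,
# pausing one second after each to mimic a slow aligner.
slow_mapper() {
    for rec in 3 1 2; do
        echo "$rec"
        sleep 1
    done
}

start=$(date +%s)

# sort(1) stands in for samtools sort: it cannot write its first record until
# its input is closed, so the downstream consumer (head here, samtools view in
# the real pipeline) receives nothing and stays idle until the mapper exits.
first=$(slow_mapper | sort -n | head -n 1)

elapsed=$(( $(date +%s) - start ))
echo "first sorted record: $first, available after ${elapsed}s"
```

Because the downstream stage only wakes up once the upstream one has finished, giving it the same cores as the mapper costs little.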

Finally, you may have more luck with mapper | mbuffer | samtools on some systems and/or with some aligners. This can avoid issues with small pipe sizes.

bernt-matthias · 2018-07-11T17:09:19Z

Thanks for the info and suggestions.

> On finishing (no more stdin) it then has a separate merge stage. If your mapper is the slow part, then yes samtools will likely be stuck at under 100% CPU, but that's not really a samtools issue I think.

Actually (in my case the mapper is hisat2), CPU usage is approximately 100% most of the time and then spikes briefly to approximately x*100%, where x is the number of threads given to samtools. But these spikes are really short.

> Note there is more or less a way to handle what you want already (untested, but I think it's equivalent), eg:
>
> mapper | samtools sort -l 0 -O bam -@2 | samtools view -O bam -@16 -o out.bam
>
> The second merge stage only starts when the mapper has finished, and this will be I/O bound and won't be threading on output as there are no lengthy bgzf compression steps. The samtools view command will only start consuming cpu after the mapper has finished so both mapper and view can be given the same cores to work on.

Sounds like a cool idea. The result should be equivalent.

Efficiency depends a bit on how sort merges the temporary files. If the merge is done in a tree-like fashion, the final output would only be written at the top level of the merge tree. But if all temporary files are merged at once, output would be written immediately (which would start view earlier). For the suggested solution the latter would be better, I guess.
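The two merge strategies can be sketched with coreutils sort -m as a stand-in for the samtools merge code. This is an analogy under that assumption, not what samtools actually runs. Four pre-sorted chunk files are merged either all at once or pairwise in a two-level tree. Both give the same result, but only the flat merge can start writing final output immediately, at the cost of keeping one file open per chunk.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)

# Four pre-sorted chunks, standing in for sort's temporary files.
printf '1\n5\n9\n'  > "$tmp/c1"
printf '2\n6\n10\n' > "$tmp/c2"
printf '3\n7\n11\n' > "$tmp/c3"
printf '4\n8\n12\n' > "$tmp/c4"

# Flat merge: every chunk is open at once, so the first output record
# can be written as soon as the merge starts.
flat=$(sort -m -n "$tmp/c1" "$tmp/c2" "$tmp/c3" "$tmp/c4")

# Tree merge: pairwise rounds write intermediate files first; final output
# only appears in the last round (the top of the tree).
sort -m -n "$tmp/c1" "$tmp/c2" > "$tmp/m12"
sort -m -n "$tmp/c3" "$tmp/c4" > "$tmp/m34"
tree=$(sort -m -n "$tmp/m12" "$tmp/m34")

rm -r "$tmp"
echo "$flat" | tr '\n' ' '
echo
```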

jkbonfield · 2018-07-11T17:14:12Z

Sadly, sort is pretty noddy. It simply reads until hitting the memory limit, sorts, writes to a temporary file, and repeats. At the end it opens ALL the files and merges them. This isn't particularly efficient and can cause major I/O bottlenecks and/or running out of file descriptors if you've set the memory limit too low.
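The read/sort/spill/merge-all scheme described above can be mimicked with coreutils. This is a sketch only: CHUNK_LINES is a hypothetical stand-in for the memory limit, split cuts the input into chunks, each chunk is sorted into its own temporary file, and a single sort -m pass opens every chunk at once. The ulimit -n check shows why a small memory limit, and therefore many chunks, can run out of file descriptors.

```shell
#!/bin/sh
set -e
# Hypothetical stand-in for the memory limit: records per in-memory chunk.
CHUNK_LINES=1000
tmp=$(mktemp -d)

# Phase 1: cut the (reverse-ordered) input into CHUNK_LINES-record pieces,
# then sort each piece in place, leaving one sorted temporary file per chunk.
seq 10000 -1 1 | split -l "$CHUNK_LINES" - "$tmp/chunk."
for f in "$tmp"/chunk.*; do
    sort -n "$f" -o "$f"
done
nchunks=$(( $(ls "$tmp"/chunk.* | wc -l) ))

# The final merge opens every temporary file at once, so the chunk count
# has to stay below the per-process file-descriptor limit.
limit=$(ulimit -n)
if [ "$limit" != unlimited ] && [ "$nchunks" -ge "$limit" ]; then
    echo "too many temporary files ($nchunks) for ulimit -n ($limit)" >&2
    exit 1
fi

# Phase 2: one merge over ALL temporary files (10 here, which is within
# GNU sort's default of merging 16 inputs per pass).
sort -m -n "$tmp"/chunk.* > "$tmp/merged"

lines=$(( $(wc -l < "$tmp/merged") ))
if sort -n -c "$tmp/merged"; then result=sorted; else result=unsorted; fi
rm -r "$tmp"
echo "$nchunks temporary files, $lines records, output $result"
```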

Sort can perhaps be sped up by making the block size larger than the file system hints at (as reported by fstat), for example via the --input-fmt-option block_size=10000000 option. This would use more memory (but probably still less than you used for sorting), but would perhaps thrash the system less.
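To see what the file system hints at on a given machine, GNU stat's %o format prints the optimal I/O transfer size hint (st_blksize, the value fstat returns). The sketch below assumes GNU coreutils. It compares that hint with the 10000000-byte override from the comment above and prints the corresponding samtools option. The trailing ... marks where the rest of your sort arguments would go.

```shell
#!/bin/sh
set -e
# Directory (or file) whose file system we ask about; defaults to the current one.
target=${1:-.}

# GNU stat's %o prints the optimal I/O transfer size hint (st_blksize).
hint=$(stat -c %o "$target")

# The larger buffer size suggested in the comment, in bytes.
override=10000000

if [ "$hint" -gt 0 ]; then ratio=$(( override / hint )); else ratio=unknown; fi

echo "file system hint for $target: $hint bytes"
echo "suggested override: $override bytes (about ${ratio}x the hint)"
echo "samtools sort --input-fmt-option block_size=$override ..."
```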

kemin711 · 2020-04-14T03:45:19Z

I have computers with 500G of memory; could sort use more memory to speed things up? I was using the pipe bwa | samtools sort -t 6. I saw that bwa had finished, but sort was still working hard, using only about 50% of one CPU. Maybe it is in the merging stage, which does not need more CPU.

kemin711 · 2020-04-14T03:47:55Z

I saw that samtools sort has the -m option (e.g. -m 10G); I will explore using it to speed up sorting.
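Here is a rough sizing sketch for the -m question, written as shell arithmetic. It assumes that -m is memory per thread, so total sort memory is roughly -@ times -m. It also assumes that each temporary file holds about -m worth of records. The values budget_gb, threads and data_gb are hypothetical examples, not measurements. In samtools sort the thread count is set with -@; -t instead sorts by the value of an alignment tag.

```shell
#!/bin/sh
# Hypothetical example values (assumptions, not measurements).
budget_gb=60   # total memory you are willing to hand to samtools sort
threads=6      # value for -@
data_gb=200    # rough uncompressed size of the alignments being sorted

# Assumption: -m is per thread, so total sort memory is roughly threads x -m.
per_thread_gb=$(( budget_gb / threads ))

# Assumption: each temporary file holds roughly -m worth of records, so the
# final merge has to open about data / -m files at once (rounded up).
est_files=$(( (data_gb + per_thread_gb - 1) / per_thread_gb ))

echo "samtools sort -@ $threads -m ${per_thread_gb}G ..."
echo "roughly $est_files temporary files to merge (ulimit -n is $(ulimit -n))"
```

With these example numbers it arrives at the -m 10G setting mentioned above. Raising -m reduces the number of temporary files the merge has to juggle.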

alexjacobsCDS · 2021-06-15T17:08:17Z

@kemin711, I'm curious whether you had any luck increasing the memory per thread to speed up sorting?

bernt-matthias mentioned this issue on Jul 11, 2018 in galaxyproject/tools-iuc#1979 (hisat2 serialize samtools, merged).
