samtools sort number of threads in reading phase #891
Comments
Sort could certainly be more efficient; ideally it would use asynchronous I/O too. However, this particular problem is perhaps one of expectation. Over-specifying the number of threads is not a catastrophically bad thing to do, and you can also use cgroups or hwloc-bind to govern how many cores the entire process may take up.

Also, I don't think it's accurate to say that samtools sort uses only one CPU until the mapper has finished. It uses one thread until it has read enough data, then uses multiple threads to sort that batch and write it to a temporary file on disk, repeatedly. On finishing (no more stdin) it runs a separate merge stage. If your mapper is the slow part, then yes, samtools will likely sit below 100% CPU, but that's not really a samtools issue I think.

Note there is more or less a way to handle what you want already (untested, but I think it's equivalent), e.g.:
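The actual command was not captured in this thread; a plausible shape for the suggested pipeline (my reconstruction, untested — the mapper invocation, thread counts, and temp path are placeholders) would be:

```shell
# Untested sketch, not the original example: sort emits uncompressed BAM (-l0)
# so its merge stage has no expensive bgzf work, and a separate view step does
# the compression with its own thread count after the mapper has finished.
mapper --threads 14 input.fq \
  | samtools sort -l0 -O bam -T /tmp/sorttmp - \
  | samtools view -@ 16 -b -o sorted.bam -
```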
The second merge stage only starts when the mapper has finished; it will be I/O bound and won't be threading on output, as there are no lengthy bgzf compression steps. The samtools view command will only start consuming CPU after the mapper has finished, so both the mapper and view can be given the same cores to work on. Finally, on some systems and/or aligners you may get more luck with mapper | mbuffer | samtools too; this can avoid issues with small pipe sizes.
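The mbuffer variant mentioned above might look like the following (a sketch only; the mapper invocation, buffer size, and thread count are illustrative, not prescribed):

```shell
# mbuffer inserts a large user-space buffer so bursty mapper output doesn't
# stall on the small kernel pipe buffer (sizes here are arbitrary examples):
mapper --threads 14 input.fq \
  | mbuffer -m 2G \
  | samtools sort -@ 16 -o sorted.bam -
```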
Thanks for the info and suggestions.
Actually, in my case (the mapper is hisat2) CPU usage is around 100% most of the time and then spikes briefly to approx. x*100%, where x is the number of threads given to samtools. But that time is really short.
Sounds like a cool idea. The result should be equivalent. Efficiency depends a bit on how sort merges the temporary files. If it is done in a tree-like fashion, then output would only start at the top level of the merge tree; but if all temporary files are merged at once, then output starts immediately (which would start view earlier). For the suggested solution the latter would be better, I guess.
Sadly sort is pretty noddy. It simply reads until hitting the memory limit, sorts, writes a temporary file, and repeats. At the end it then opens ALL of the files and merges them. This isn't particularly efficient and can cause major I/O bottlenecks and/or run out of file descriptors if you've set the memory limit too low. It can perhaps be sped up by adjusting the block size to be larger than what the file system hints at.
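The read/sort/spill/merge-everything pattern described above can be sketched with coreutils alone (this is an analogy, not samtools itself; samtools sort applies the same strategy to BAM records):

```shell
# Spill-and-merge sketch: read fixed-size chunks (the "memory limit"), sort
# each chunk to its own temporary file, then open ALL runs and merge at once.
set -e
tmp=$(mktemp -d)
seq 1000 | sort -R > "$tmp/input"        # unsorted input
split -l 100 "$tmp/input" "$tmp/chunk."  # chunk size stands in for -m
for c in "$tmp"/chunk.*; do
  sort -n "$c" -o "$c.sorted"            # sort each chunk, spill to disk
done
# final pass: every sorted run is an open file descriptor during the merge
sort -n -m "$tmp"/chunk.*.sorted > "$tmp/output"
seq 1000 > "$tmp/expected"
if cmp -s "$tmp/expected" "$tmp/output"; then
  result="merge OK"
else
  result="merge FAILED"
fi
echo "$result"
rm -r "$tmp"
```

With 1000 lines and 100-line chunks this merges ten runs at once; with a small memory limit on a huge input, the number of simultaneously open runs is what exhausts file descriptors.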
I have computers with 500G of memory; could it use more memory to speed it up? I was using a pipe.
I saw that the sort command has a -m 10G option; I will explore using this one to speed up sorting.
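One thing to watch here: -m sets the memory limit per sort thread, so the total reservation is roughly threads × memory. A sketch (paths and numbers are illustrative only):

```shell
# -m is per thread: -@ 8 -m 4G allows roughly 32G of sort buffers in total,
# meaning fewer temporary files and a cheaper final merge stage.
mapper --threads 8 input.fq \
  | samtools sort -@ 8 -m 4G -T /fast/tmp/sorttmp -o sorted.bam -
```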
@kemin711 curious if you had luck with increasing the memory per thread to speed up sorting?
Is your feature request related to a problem? Please specify.
mapper | samtools sort -
it is difficult to specify the number of threads for the mapper and for samtools.

Describe the solution you would like.
I suggest allowing the number of CPUs that samtools uses while reading the data (and producing pre-sorted chunks) to be specified separately. This would simplify choosing the number of threads for both programs. Until the mapper has finished, samtools could for instance use a single thread for reading and chunking, and then use the full number of threads afterwards (when the mapper has finished).
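To make this concrete, a hypothetical interface might look like the following (--read-threads does not exist in samtools; it is only an illustration of the proposal, and the mapper invocation is a placeholder):

```shell
# Hypothetical option, not a real samtools flag: one thread while the mapper
# is still producing data, the full count for sorting/merging afterwards.
mapper --threads 15 input.fq \
  | samtools sort --read-threads 1 -@ 16 -o sorted.bam -
```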