
samtools sort setting memory requirement #831

Open
bernt-matthias opened this issue Apr 25, 2018 · 3 comments

@bernt-matthias
I'm trying to set the available memory for samtools sort with `-m` in a cluster environment. If I use the full memory that is available for the job (as reported by the cluster environment), I get: `samtools sort: couldn't allocate memory for bam_mem`. I guess this is because samtools sort also uses memory for things other than bam_mem.

Can you suggest a way to set the memory parameter (automatically) so that as much of the system's memory as possible is used?

Maybe related: #807

FYI: I need it for that: galaxyproject/tools-iuc#1801

@daviesrob (Member)

The memory limit for samtools sort is actually per thread, so you probably want to use GALAXY_MEMORY_MB / GALAXY_SLOTS when setting the `-m` option. Yes, this is crazy and I don't know why it was done that way, but we're a bit stuck with it now.
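The division above can be sketched as a small shell snippet. `GALAXY_MEMORY_MB` and `GALAXY_SLOTS` are the Galaxy variables named in this comment; the default values, file names, and the fact that the command is only echoed (rather than run) are illustrative assumptions.

```shell
# Illustrative defaults; in a real Galaxy job these are set by the scheduler.
GALAXY_MEMORY_MB="${GALAXY_MEMORY_MB:-8192}"  # total memory for the job, in MB
GALAXY_SLOTS="${GALAXY_SLOTS:-4}"             # threads available to the job

# samtools sort's -m limit is per *thread*, so divide the job total by the
# slot count. Integer division; any remainder is left as extra headroom.
MEM_PER_THREAD_MB=$(( GALAXY_MEMORY_MB / GALAXY_SLOTS ))

# Echo the resulting command line instead of running it, for the sketch.
echo "samtools sort -@ ${GALAXY_SLOTS} -m ${MEM_PER_THREAD_MB}M -o out.bam in.bam"
```

With the defaults above this yields `-m 2048M` per thread for an 8192 MB, 4-slot job.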

A future update will add a new option to allow the memory limit to be set for the entire program, and we'll just have to work out how it should interact with the existing -m option.

Sort does use a bit more memory than specified, but it shouldn't go over by too much these days.

@bernt-matthias (Author)

@daviesrob Thanks for your response, but since I'm currently running single-threaded jobs, this doesn't seem to be the solution.

My guess is that the value specified on the command line is the amount of memory that samtools uses for buffering BAM data. But samtools uses some memory beyond that (for other data structures), so the total memory used by samtools is the value given on the command line plus some overhead X.

FYI: We recently introduced GALAXY_MEMORY_MB_PER_SLOT (galaxyproject/galaxy#5625), but I forgot to use it here 😀 .. though if I'm right, it would not help anyway..?

@daviesrob (Member)

GALAXY_MEMORY_MB_PER_SLOT sounds ideal, but you would have to subtract a bit to allow for overheads. My experiments suggest that setting `-m` to about 75% of the absolute limit should be safe enough (when reading/writing BAM or SAM; CRAM may need extra space for references).
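The 75% rule of thumb can be sketched as follows. `GALAXY_MEMORY_MB_PER_SLOT` is the Galaxy variable mentioned above and already accounts for the thread count, so only the overhead discount is applied; the default value and the echoed command are illustrative assumptions, not a definitive recipe.

```shell
# Illustrative default; in a real Galaxy job the scheduler sets this.
GALAXY_MEMORY_MB_PER_SLOT="${GALAXY_MEMORY_MB_PER_SLOT:-2048}"

# Keep ~75% of the per-slot allowance for -m; the remaining ~25% absorbs
# samtools sort's own overhead (CRAM references may need even more slack).
SORT_MEM_MB=$(( GALAXY_MEMORY_MB_PER_SLOT * 3 / 4 ))

# Echo the resulting command line instead of running it, for the sketch.
echo "samtools sort -m ${SORT_MEM_MB}M -o out.bam in.bam"
```

For a 2048 MB per-slot allowance this gives `-m 1536M`, leaving 512 MB of headroom.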

Once you've started spilling to disk it doesn't make much difference to the run time if you under-specify the limit, as long as it's not too far out. The only sorts that will get much slower are the ones that would otherwise have fitted in memory with a more generous limit.

fgvieira added a commit to snakemake/snakemake-wrappers that referenced this issue Feb 22, 2024
According to the manual, `-m` specifies the approximate maximum required memory per thread. In some cases, `samtools sort` can use more memory than specified (e.g. samtools/samtools#831), so we account for some overhead.
