samtools sort setting memory requirement #831
The memory limit for `-m` is per thread, so with multiple threads the total is correspondingly larger. A future update will add a new option to allow the memory limit to be set for the entire program, and we'll just have to work out how it should interact with the existing `-m` option. Sort does use a bit more memory than specified, but it shouldn't go over by too much these days.
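To make the per-thread behaviour of `-m` concrete, here is a minimal Python sketch of the arithmetic. It assumes the sort buffers add up to roughly the `-m` value multiplied by the thread count given with `-@`, with anything below one thread treated as single-threaded. The function name `total_sort_buffer_mb` is hypothetical, and the result covers only the sort buffers, not the extra overhead mentioned above.

```python
def total_sort_buffer_mb(per_thread_mb: int, threads: int) -> int:
    """Approximate total sort-buffer memory for `samtools sort`.

    Assumption: `-m` is a per-thread limit, so the buffers add up to
    roughly per_thread_mb * threads. A thread count below 1 (the
    single-threaded case) is treated as one thread.
    """
    effective_threads = max(1, threads)
    return per_thread_mb * effective_threads


if __name__ == "__main__":
    # `-m 768M -@ 4` would ask for about 3 GiB of sort buffers in total.
    print(total_sort_buffer_mb(768, 4))  # 3072
    # Single-threaded: the total is just the -m value.
    print(total_sort_buffer_mb(768, 0))  # 768
```

Note that even in the single-threaded case the process needs some memory beyond this figure, which is why passing the job's entire allocation as `-m` can still fail.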
@daviesrob Thanks for your response, but since I'm currently running single-threaded jobs, this doesn't seem to be the explanation. My guess is that the value specified on the command line is the amount of memory samtools uses for buffering BAM data, but samtools also needs some memory for other data structures. So the total memory used by samtools is the value given on the command line + X. FYI: We recently introduced GALAXY_MEMORY_MB_PER_SLOT (galaxyproject/galaxy#5625), but I forgot to use it here 😀 ... but if I'm right, it wouldn't help anyway?
GALAXY_MEMORY_MB_PER_SLOT sounds ideal, but you would have to subtract a bit to allow for overheads. My experiments suggest that setting … Once you've started spilling to disk, under-specifying the limit doesn't make much difference to the run time, as long as it's not too far out. The only sorts that will get much slower are those that would otherwise have fitted in memory with a more generous limit.
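A minimal Python sketch of "subtract a bit to allow for overheads", assuming the job exposes `GALAXY_MEMORY_MB_PER_SLOT` as an integer number of megabytes and that one slot corresponds to one sort thread (so the value maps onto the per-thread `-m`). The function name, the 20% headroom (`overhead_percent=20`) and the 768 MB fallback are hypothetical placeholders, not values recommended in this thread.

```python
import os
from typing import Mapping, Optional


def sort_mem_per_thread_mb(
    env: Optional[Mapping[str, str]] = None,
    overhead_percent: int = 20,  # hypothetical headroom, not a samtools value
    fallback_mb: int = 768,  # hypothetical value used when the variable is unset
    minimum_mb: int = 1,
) -> int:
    """Derive a per-thread `-m` value (in MB) from GALAXY_MEMORY_MB_PER_SLOT.

    The per-slot allocation is reduced by `overhead_percent` so that
    samtools' other data structures still fit inside the job's limit.
    """
    if env is None:
        env = os.environ
    raw = env.get("GALAXY_MEMORY_MB_PER_SLOT")
    if raw is None or not raw.strip():
        return fallback_mb
    per_slot_mb = int(raw)
    # Integer arithmetic avoids floating-point rounding surprises.
    usable_mb = per_slot_mb * (100 - overhead_percent) // 100
    return max(minimum_mb, usable_mb)


if __name__ == "__main__":
    print(sort_mem_per_thread_mb({"GALAXY_MEMORY_MB_PER_SLOT": "4000"}))  # 3200
    print(sort_mem_per_thread_mb({}))  # 768
```

The result would then be passed to samtools as `-m <value>M`. Because sorts that spill to disk are not much slower when the limit is somewhat under-specified, erring on the side of more headroom is cheap.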
According to the manual, `-m` specifies the approximate maximum memory required per thread. In some cases, `samtools sort` can use more memory than specified (e.g. samtools/samtools#831), so we account for some overhead.
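One way a wrapper might "account for some overhead" is sketched below in Python. It assumes the job's total memory is known in MB (for example from a workflow manager's memory resource), that `-@` is given the number of sort threads, and that the usable memory is split evenly across threads before being passed to the per-thread `-m` with an `M` suffix. The helper name `build_sort_command` and the 20% headroom are hypothetical; this is not the actual wrapper code.

```python
from typing import List


def build_sort_command(
    input_bam: str,
    output_bam: str,
    threads: int,
    total_mem_mb: int,
    overhead_percent: int = 20,  # hypothetical headroom for non-buffer memory
) -> List[str]:
    """Build a `samtools sort` argument list whose `-m` fits the job's memory.

    `-m` is per thread, so the usable memory (after headroom) is divided
    by the number of threads before being passed with an `M` suffix.
    """
    threads = max(1, threads)
    usable_mb = total_mem_mb * (100 - overhead_percent) // 100
    per_thread_mb = max(1, usable_mb // threads)
    return [
        "samtools", "sort",
        "-@", str(threads),
        "-m", f"{per_thread_mb}M",
        "-o", output_bam,
        input_bam,
    ]


if __name__ == "__main__":
    cmd = build_sort_command("in.bam", "out.sorted.bam", threads=4, total_mem_mb=8000)
    print(cmd)
    # ['samtools', 'sort', '-@', '4', '-m', '1600M', '-o', 'out.sorted.bam', 'in.bam']
```

Returning an argument list rather than a shell string keeps file names safe from quoting problems; the list could be run with `subprocess.run(cmd, check=True)`.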
I'm trying to set the available memory for `samtools sort` with `-m` in a cluster environment. If I use the memory that is available for the job (as reported by the cluster environment), I get:

`samtools sort: couldn't allocate memory for bam_mem`

I guess this is because `samtools sort` also uses memory for things other than `bam_mem`. Can you suggest a way to set the memory parameter (automatically) so that as much of the system's memory as possible is used?
Maybe related: #807
FYI: I need this for galaxyproject/tools-iuc#1801.