diff --git a/docs/executing/cli.rst b/docs/executing/cli.rst index 18f1a4662..6c763db79 100644 --- a/docs/executing/cli.rst +++ b/docs/executing/cli.rst @@ -115,8 +115,8 @@ Therefore, since Snakemake 4.1, it is possible to specify configuration profiles to be used to obtain default options. Since Snakemake 7.29, two kinds of profiles are supported: -* A global profile that is defined in a system-wide or user-specific configuration directory (on Linux, this will be ``$HOME/.config/snakemake`` and ``/etc/xdg/snakemake``, you can find the answer for your system via ``snakemake --help``). -* A workflow specific profile (introduced in Snakemake 7.29) that is defined via a flag (``--workflow-profile``) or searched in a default location (``profile/default``) in the working directory or next to the Snakefile. +* A **global profile** that is defined in a system-wide or user-specific configuration directory (on Linux, this will be ``$HOME/.config/snakemake`` and ``/etc/xdg/snakemake``, you can find the answer for your system via ``snakemake --help``). +* A **workflow specific profile** (introduced in Snakemake 7.29) that is defined via a flag (``--workflow-profile``) or searched in a default location (``profile/default``) in the working directory or next to the Snakefile. The workflow specific profile is meant to be used to define default options for a particular workflow, like providing constraints for certain custom resources the workflow uses (e.g. ``api_calls``) or overwriting the threads and resource definitions of individual rules without modifying the workflow code itself. In contrast, the global profile is meant to be used to define default options for a particular environment, like the default cluster submission command or the default number of jobs to run in parallel. @@ -148,23 +148,41 @@ The profile can be used to set a default for each option of the Snakemake comman For this, option ``--someoption`` becomes ``someoption:`` in the profile. 
The profile folder can additionally contain auxilliary files, e.g., jobscripts, or any kind of wrappers. See https://github.com/snakemake-profiles/doc for examples. If options accept multiple arguments these must be given as YAML list in the profile. -If options expect structured arguments (like ``--set-threads RULE=VALUE`` or ``--set-resources RULE:RESOURCE=VALUE``), those can be given as strings in the expected forms, i.e. +If options expect structured arguments (like ``--default-resources RESOURCE=VALUE``, ``--set-threads RULE=VALUE``, or ``--set-resources RULE:RESOURCE=VALUE``), those can be given as strings in the expected forms, i.e. .. code-block:: yaml + default-resources: mem_mb=200 set-threads: myrule=5 set-resources: myrule:mem=500MB -or alternatively (which is preferable) as YAML maps, e.g.: +or as YAML maps, which is easier to read: .. code-block:: yaml + default-resources: + mem_mb: 200 set-threads: myrule: 5 set-resources: myrule: mem: 500MB +All of these resource specifications can also be made dynamic by using Python expressions over a small set of predefined variables. +For details of the variables you can use, refer to the callable signatures given in the documentation sections on the specification of :ref:`threads <snakefiles-threads>` and :ref:`dynamic resources <snakefiles-dynamic-resources>`. +These enable ``config.yaml`` entries like: + +.. code-block:: yaml + + default-resources: + mem_mb: max(1.5 * input.size_mb, 100) + set-threads: + myrule: max(input.size_mb / 5, 2) + set-resources: + myrule: + mem_mb: attempt * 200 + + Setting resources or threads via the profile is of course rather a job for the workflow profile instead of the global profile (as such settings are likely workflow specific). Under https://github.com/snakemake-profiles/doc, you can find publicly available global profiles (e.g. for cluster systems). 
diff --git a/docs/snakefiles/rules.rst b/docs/snakefiles/rules.rst index 80ceb15d9..e55e3d19f 100644 --- a/docs/snakefiles/rules.rst +++ b/docs/snakefiles/rules.rst @@ -341,11 +341,11 @@ In addition to threads, a rule can use arbitrary user-defined resources by speci shell: "..." -If limits for the resources are given via the command line, e.g. +If workflow-wide limits for the resources are given via the command line, e.g. .. code-block:: console - $ snakemake --resources mem_mb=100 + $ snakemake --resources mem_mb=200 the scheduler will ensure that the given resources are not exceeded by running jobs. @@ -361,11 +361,29 @@ If no limits are given, the resources are ignored in local execution. Resources can have any arbitrary name, and must be assigned ``int`` or ``str`` values. In case of ``None``, the resource is considered to be unset (i.e. ignored) in the rule. -Resources can also be callables (e.g. functions or lambda expressions) that return ``int``, ``str`` or ``None`` values. +.. _snakefiles-dynamic-resources: + +Dynamic Resources +~~~~~~~~~~~~~~~~~ + +It is often useful to determine resource specifications dynamically during workflow execution. +A common example is determining the amount of memory that a job needs, based on the input file size of that particular rule instance. +To enable this, resource specifications can also be callables (for example functions or lambda expressions) that return ``int``, ``str`` or ``None`` values. The signature of the callable must be ``callable(wildcards [, input] [, threads] [, attempt])`` (``input``, ``threads``, and ``attempt`` are optional parameters). Such callables are evaluated immediately before the job is executed (or printed during a dry-run). -Since the callables can take e.g. ``input`` as an argument, they can for example be used to obtain the size of an input file and infer the amount of memory needed for the job. 
+The example described above, using input size to determine memory requirements, could for instance be realized via a lambda expression (here also enforcing a minimum of 300 MB of memory): + +.. code-block:: python + + rule: + input: ... + output: ... + resources: + mem_mb=lambda wc, input: max(2.5 * input.size_mb, 300) + shell: + "..." + In order to make this work with a dry-run, where the input files are not yet present, Snakemake automatically converts a ``FileNotFoundError`` that is raised by the callable into a placeholder called ``<TBD>`` that will be displayed during dry-run in such a case. The parameter ``attempt`` allows us to adjust resources based on how often the job has been restarted (see :ref:`all_options`, option ``--retries``). @@ -406,19 +424,22 @@ Another application of callables as resources is when memory usage depends on th shell: "..." -Here, the value the function ``get_mem_mb`` returns grows linearly with the number of threads. +Here, the value that the function ``get_mem_mb`` returns grows linearly with the number of threads. Of course, any other arithmetic could be performed in that function. -Both threads and resources can be defined (or overwritten) upon invocation (without modifying the workflow code) via `--set-threads` and `--set-resources`, see :ref:`user_manual-snakemake_options` and via workflow profiles, see :ref:`profiles`. -To quickly exemplify the latter, you could provide the following workflow profile in a file ``profiles/default/config.yaml`` relative to the Snakefile or the current working directory: +Both threads and resources can be defined (or overwritten) upon invocation (without modifying the workflow code) via ``--set-threads`` and ``--set-resources``, see :ref:`user_manual-snakemake_options`. +Alternatively, they can be defined via workflow :ref:`profiles`, where expressions can use the variables ``input`` and ``attempt`` (and, for resources, ``threads``) from the callable signature above. 
+You could, for example, provide the following workflow profile in a file ``profiles/default/config.yaml`` relative to the Snakefile or the current working directory: .. code-block:: yaml + set-threads: + b: 3 set-resources: b: mem_mb: 1000 -to set the memory requirement of rule ``b`` to 1000 MB. +to set the requirements for rule ``b`` to 3 threads and 1000 MB of memory. .. _snakefiles-standard-resources: @@ -453,13 +474,11 @@ Because of these special meanings, the above names should always be used instead Default Resources ~~~~~~~~~~~~~~~~~~ -Since it could be cumbersome to define these standard resources for every rule, you can set default values at -the terminal or in a :ref:`profile <profiles>`. -This works via the command line flag ``--default-resources``, see ``snakemake --help`` for more information. +Since it could be cumbersome to define these standard resources for every rule, you can set default values via the command line flag ``--default-resources`` or in a :ref:`profile <profiles>`. +As with ``--set-resources``, this can be done dynamically, using the variables specified for the callables in the section on :ref:`snakefiles-dynamic-resources`. If those resource definitions are mandatory for a certain execution mode, Snakemake will fail with a hint if they are missing. Any resource definitions inside a rule override what has been defined with ``--default-resources``. -If ``--default-resources`` are not specified, Snakemake uses ``'mem_mb=max(2*input.size_mb, 1000)'``, -``'disk_mb=max(2*input.size_mb, 1000)'``, and ``'tmpdir=system_tmpdir'``. +If ``--default-resources`` is not specified, Snakemake uses ``'mem_mb=max(2*input.size_mb, 1000)'``, ``'disk_mb=max(2*input.size_mb, 1000)'``, and ``'tmpdir=system_tmpdir'``. The latter points to whatever is the default of the operating system or specified by any of the environment variables ``$TMPDIR``, ``$TEMP``, or ``$TMP`` as outlined `here `_. 
If ``--default-resources`` is specified with some definitions, but any of the above defaults (e.g. ``mem_mb``) is omitted, these are still used. In order to explicitly unset these defaults, assign them a value of ``None``, e.g. ``--default-resources mem_mb=None``. diff --git a/snakemake/cli.py b/snakemake/cli.py index ceea7b7c5..5565c3b94 100644 --- a/snakemake/cli.py +++ b/snakemake/cli.py @@ -4,6 +4,7 @@ __license__ = "MIT" import argparse +from functools import partial import sys from typing import Set @@ -59,7 +60,12 @@ get_container_image, parse_key_value_arg, ) -from snakemake.resources import ResourceScopes, parse_resources, DefaultResources +from snakemake.resources import ( + ResourceScopes, + eval_resource_expression, + parse_resources, + DefaultResources, +) from snakemake.settings import RerunTrigger @@ -68,13 +74,14 @@ def parse_set_threads(args): args, "Invalid threads definition: entries have to be defined as RULE=THREADS pairs " "(with THREADS being a positive integer).", + fallback=partial(eval_resource_expression, threads_arg=False), ) def parse_set_resources(args): errmsg = ( "Invalid resource definition: entries have to be defined as RULE:RESOURCE=VALUE, with " - "VALUE being a positive integer or a string." + "VALUE being a positive integer, a quoted string, or a Python expression (e.g. min(max(2*input.size_mb, 1000), 8000))." 
) from collections import defaultdict @@ -90,9 +97,8 @@ def parse_set_resources(args): try: value = int(value) except ValueError: - assignments[rule][resource] = value - continue - if value < 0: + value = eval_resource_expression(value) + if isinstance(value, int) and value < 0: raise ValueError(errmsg) assignments[rule][resource] = value return assignments @@ -125,7 +131,7 @@ def parse_set_resource_scope(args): return ResourceScopes() -def parse_set_ints(arg, errmsg): +def parse_set_ints(arg, errmsg, fallback=None): assignments = dict() if arg is not None: for entry in arg: @@ -133,8 +139,14 @@ def parse_set_ints(arg, errmsg): try: value = int(value) except ValueError: - raise ValueError(errmsg) - if value < 0: + if fallback is not None: + try: + value = fallback(value) + except Exception as e: + raise ValueError(errmsg) + else: + raise ValueError(errmsg) + if isinstance(value, int) and value < 0: raise ValueError(errmsg) assignments[key] = value return assignments diff --git a/snakemake/resources.py b/snakemake/resources.py index 7a7f3a272..706009a2a 100644 --- a/snakemake/resources.py +++ b/snakemake/resources.py @@ -47,41 +47,10 @@ def __init__(self, args=None, from_other=None, mode="full"): {name: value for name, value in map(self.decode_arg, args)} ) - def fallback(val): - def callable(wildcards, input, attempt, threads, rulename): - try: - value = eval( - val, - { - "input": input, - "attempt": attempt, - "threads": threads, - "system_tmpdir": tempfile.gettempdir(), - }, - ) - # Triggers for string arguments like n1-standard-4 - except NameError: - return val - except Exception as e: - if not ( - isinstance(e, FileNotFoundError) and e.filename in input - ): - # Missing input files are handled by the caller - raise WorkflowError( - "Failed to evaluate default resources value " - "'{}'.\n" - " String arguments may need additional " - "quoting. 
Ex: --default-resources " - "\"tmpdir='/home/user/tmp'\".".format(val), - e, - ) - raise e - return value - - return callable - self.parsed = dict(_cores=1, _nodes=1) - self.parsed.update(parse_resources(self._args, fallback=fallback)) + self.parsed.update( + parse_resources(self._args, fallback=eval_resource_expression) + ) def set_resource(self, name, value): self._args[name] = f"{value}" @@ -538,6 +507,58 @@ def _highest_proportion(group): return rows +def eval_resource_expression(val, threads_arg=True): + def generic_callable(**kwargs): + args = { + "input": kwargs["input"], + "attempt": kwargs["attempt"], + "system_tmpdir": tempfile.gettempdir(), + } + if threads_arg: + args["threads"] = kwargs["threads"] + try: + value = eval( + val, + args, + ) + # Triggers for string arguments like n1-standard-4 + except NameError: + return val + except Exception as e: + if not (isinstance(e, FileNotFoundError) and e.filename in kwargs["input"]): + # Missing input files are handled by the caller + raise WorkflowError( + "Failed to evaluate default resources value " + f"'{val}'.\n" + " String arguments may need additional " + "quoting. 
Ex: --default-resources " + "\"tmpdir='/home/user/tmp'\".", + e, + ) + raise e + return value + + if threads_arg: + + def callable(wildcards, input, attempt, threads, rulename): + return generic_callable( + wildcards=wildcards, + input=input, + attempt=attempt, + threads=threads, + rulename=rulename, + ) + + else: + + def callable(wildcards, input, attempt, rulename): + return generic_callable( + wildcards=wildcards, input=input, attempt=attempt, rulename=rulename + ) + + return callable + + def parse_resources(resources_args, fallback=None): """Parse resources from args.""" resources = dict() diff --git a/snakemake/workflow.py b/snakemake/workflow.py index 3fe0be591..c3524a092 100644 --- a/snakemake/workflow.py +++ b/snakemake/workflow.py @@ -1538,17 +1538,18 @@ def decorate(ruleinfo): "Threads value has to be an integer, float, or a callable.", rule=rule, ) - if name in self.resource_settings.overwrite_threads: - rule.resources["_cores"] = self.resource_settings.overwrite_threads[ - name - ] - else: + if name not in self.resource_settings.overwrite_threads: if isinstance(ruleinfo.threads, float): ruleinfo.threads = int(ruleinfo.threads) rule.resources["_cores"] = ruleinfo.threads else: rule.resources["_cores"] = 1 + if name in self.resource_settings.overwrite_threads: + rule.resources["_cores"] = self.resource_settings.overwrite_threads[ + name + ] + if ruleinfo.shadow_depth: if ruleinfo.shadow_depth not in ( True, diff --git a/tests/test_profile/Snakefile b/tests/test_profile/Snakefile index 00914b5ae..02247fb9b 100644 --- a/tests/test_profile/Snakefile +++ b/tests/test_profile/Snakefile @@ -1,3 +1,3 @@ rule: shell: - "python -m snakemake --cores 1 --profile . -s Snakefile.internal" + "python -m snakemake --cores 3 --profile . 
-s Snakefile.internal" diff --git a/tests/test_profile/Snakefile.internal b/tests/test_profile/Snakefile.internal index c8d4a32cd..92ac9a0b0 100644 --- a/tests/test_profile/Snakefile.internal +++ b/tests/test_profile/Snakefile.internal @@ -1,5 +1,15 @@ +shell.executable("bash") + rule a: + input: + "input.txt", output: - config["out"] + config["out"], shell: - "touch {output}" + 'echo "' + 'threads: {threads}\n' + 'mem_mb: {resources.mem_mb}\n' + 'eggs_factor: {resources.eggs_factor}\n' + 'spam_factor: {resources.spam_factor}\n' + 'double_jeopardy: {resources.double_jeopardy}' + '"> {output}' \ No newline at end of file diff --git a/tests/test_profile/config.yaml b/tests/test_profile/config.yaml index c3e99960f..af9743162 100644 --- a/tests/test_profile/config.yaml +++ b/tests/test_profile/config.yaml @@ -1,4 +1,13 @@ configfile: "workflow-config.yaml" cores: all set-threads: - - a=2 + - a=max(1024*24*input.size_mb, 2) + - b=4 +default-resources: + - mem_mb=max(1024*32*input.size_mb, 5) + - eggs_factor=twitter +set-resources: + a: + mem_mb: max(1024*24*input.size_mb, 1) + spam_factor: X + double_jeopardy: attempt diff --git a/tests/test_profile/expected-results/test.out b/tests/test_profile/expected-results/test.out index e69de29bb..5779b2401 100644 --- a/tests/test_profile/expected-results/test.out +++ b/tests/test_profile/expected-results/test.out @@ -0,0 +1,5 @@ +threads: 2 +mem_mb: 1 +eggs_factor: twitter +spam_factor: X +double_jeopardy: 1 \ No newline at end of file diff --git a/tests/test_profile/input.txt b/tests/test_profile/input.txt new file mode 100644 index 000000000..e69de29bb
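The fallback behaviour that this change shares between ``--default-resources``, ``--set-threads``, and ``--set-resources`` can be sketched roughly as follows. This is a simplified, standalone illustration and not the Snakemake implementation: ``eval_expression_sketch`` and ``empty_input`` are hypothetical names, Snakemake is not imported, and the ``FileNotFoundError``/``WorkflowError`` handling from ``eval_resource_expression`` is omitted. It assumes, as in the test profile above, an empty input file (``size_mb`` of 0).

```python
import tempfile
from types import SimpleNamespace


def eval_expression_sketch(expr, threads_arg=True):
    """Return a callable that evaluates the string `expr` lazily, once per job.

    Hypothetical stand-in for the pattern in the diff: values that are not
    plain integers are treated as Python expressions over a few known names.
    """

    def evaluate(input, attempt, threads=None):
        # Names an expression may refer to; `threads` is only offered when
        # the expression defines a resource, not the thread count itself.
        names = {
            "input": input,
            "attempt": attempt,
            "system_tmpdir": tempfile.gettempdir(),
        }
        if threads_arg:
            names["threads"] = threads
        try:
            return eval(expr, names)
        except NameError:
            # Plain strings such as "twitter" or "X" reference unknown
            # names, so they are passed through unchanged.
            return expr

    return evaluate


# Stand-in for a job's input files; an empty file has a size of 0 MB.
empty_input = SimpleNamespace(size_mb=0.0)

threads = eval_expression_sketch("max(1024*24*input.size_mb, 2)", threads_arg=False)
mem_mb = eval_expression_sketch("max(1024*24*input.size_mb, 1)")
eggs_factor = eval_expression_sketch("twitter")
double_jeopardy = eval_expression_sketch("attempt")

print(threads(empty_input, attempt=1))                     # 2
print(mem_mb(empty_input, attempt=1, threads=2))           # 1
print(eggs_factor(empty_input, attempt=1, threads=2))      # twitter
print(double_jeopardy(empty_input, attempt=1, threads=2))  # 1
```

The printed values line up with ``tests/test_profile/expected-results/test.out``. One consequence of catching ``NameError`` is that a misspelled variable name in an expression is silently kept as a literal string rather than reported as an error.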