feat: allow python expressions in --set-resources (#2521)
Co-authored-by: David Laehnemann <david.laehnemann@hhu.de>
johanneskoester and dlaehnemann committed Nov 27, 2023
1 parent 7864a76 commit 022a31e
Showing 10 changed files with 163 additions and 68 deletions.
26 changes: 22 additions & 4 deletions docs/executing/cli.rst
@@ -115,8 +115,8 @@ Therefore, since Snakemake 4.1, it is possible to specify configuration profiles
to be used to obtain default options.
Since Snakemake 7.29, two kinds of profiles are supported:

* A global profile that is defined in a system-wide or user-specific configuration directory (on Linux, this will be ``$HOME/.config/snakemake`` and ``/etc/xdg/snakemake``, you can find the answer for your system via ``snakemake --help``).
* A workflow specific profile (introduced in Snakemake 7.29) that is defined via a flag (``--workflow-profile``) or searched in a default location (``profile/default``) in the working directory or next to the Snakefile.
* A **global profile** that is defined in a system-wide or user-specific configuration directory (on Linux, this will be ``$HOME/.config/snakemake`` and ``/etc/xdg/snakemake``; you can find the locations for your system via ``snakemake --help``).
* A **workflow specific profile** (introduced in Snakemake 7.29) that is defined via a flag (``--workflow-profile``) or searched in a default location (``profiles/default``) in the working directory or next to the Snakefile.

The workflow specific profile is meant to be used to define default options for a particular workflow, like providing constraints for certain custom resources the workflow uses (e.g. ``api_calls``) or overwriting the threads and resource definitions of individual rules without modifying the workflow code itself.
In contrast, the global profile is meant to be used to define default options for a particular environment, like the default cluster submission command or the default number of jobs to run in parallel.
@@ -148,23 +148,41 @@ The profile can be used to set a default for each option of the Snakemake comman
For this, option ``--someoption`` becomes ``someoption:`` in the profile.
The profile folder can additionally contain auxiliary files, e.g., jobscripts, or any kind of wrappers. See https://github.com/snakemake-profiles/doc for examples.
If options accept multiple arguments, these must be given as a YAML list in the profile.
If options expect structured arguments (like ``--set-threads RULE=VALUE`` or ``--set-resources RULE:RESOURCE=VALUE``), those can be given as strings in the expected forms, i.e.
If options expect structured arguments (like ``--default-resources RESOURCE=VALUE``, ``--set-threads RULE=VALUE``, or ``--set-resources RULE:RESOURCE=VALUE``), those can be given as strings in the expected forms, i.e.

.. code-block:: yaml
default-resources: mem_mb=200
set-threads: myrule=5
set-resources: myrule:mem=500MB
or alternatively (which is preferable) as YAML maps, e.g.:
or as YAML maps, which is easier to read:

.. code-block:: yaml
default-resources:
mem_mb: 200
set-threads:
myrule: 5
set-resources:
myrule:
mem: 500MB
All of these resource specifications can also be made dynamic, by using expressions and certain variables that are available.
For details of the variables you can use, refer to the callable signatures given in the documentation sections on the specification of :ref:`threads <snakefiles-threads>` and :ref:`dynamic resources <snakefiles-dynamic-resources>`.
These enable ``config.yaml`` entries like:

.. code-block:: yaml
default-resources:
mem_mb: max(1.5 * input.size_mb, 100)
set-threads:
myrule: max(input.size_mb / 5, 2)
set-resources:
myrule:
mem_mb: attempt * 200
Setting resources or threads via a profile is best done in the workflow profile rather than the global profile, since such settings are likely workflow specific.
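
To make dynamic entries like the ones shown above more concrete, the following is a minimal sketch of how such an expression string could be evaluated against per-job variables. It is an illustration under stated assumptions and not Snakemake's implementation: ``FakeInput`` and ``evaluate_expression`` are hypothetical names, and only ``input.size_mb``, ``attempt`` and ``threads`` are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class FakeInput:
    """Hypothetical stand-in for a job's input files; only size_mb is modelled."""

    size_mb: float


def evaluate_expression(expr, input, attempt=1, threads=1):
    """Evaluate a resource expression string with the job variables in scope."""
    variables = {"input": input, "attempt": attempt, "threads": threads}
    # eval() is used for illustration only; never evaluate untrusted strings.
    return eval(expr, variables)


job_input = FakeInput(size_mb=400)
mem_default = evaluate_expression("max(1.5 * input.size_mb, 100)", job_input)
threads_myrule = evaluate_expression("max(input.size_mb / 5, 2)", job_input)
mem_retry = evaluate_expression("attempt * 200", job_input, attempt=3)
print(mem_default, threads_myrule, mem_retry)
```

With a 400 MB input, the memory default is governed by the input size rather than the 100 MB floor, and the ``attempt``-based value grows with every retry.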

Under https://github.com/snakemake-profiles/doc, you can find publicly available global profiles (e.g. for cluster systems).
45 changes: 32 additions & 13 deletions docs/snakefiles/rules.rst
@@ -341,11 +341,11 @@ In addition to threads, a rule can use arbitrary user-defined resources by speci
shell:
"..."
If limits for the resources are given via the command line, e.g.
If workflow-wide limits for the resources are given via the command line, e.g.

.. code-block:: console
$ snakemake --resources mem_mb=100
$ snakemake --resources mem_mb=200
the scheduler will ensure that the given resources are not exceeded by running jobs.
@@ -361,11 +361,29 @@ If no limits are given, the resources are ignored in local execution.
Resources can have any arbitrary name, and must be assigned ``int`` or ``str`` values.
In case of ``None``, the resource is considered to be unset (i.e. ignored) in the rule.

Resources can also be callables (e.g. functions or lambda expressions) that return ``int``, ``str`` or ``None`` values.
.. _snakefiles-dynamic-resources:

Dynamic Resources
~~~~~~~~~~~~~~~~~

It is often useful to determine resource specifications dynamically during workflow execution.
A common example is determining the amount of memory that a job needs, based on the input file size of that particular rule instance.
To enable this, resource specifications can also be callables (for example functions or lambda expressions) that return ``int``, ``str`` or ``None`` values.
The signature of the callable must be ``callable(wildcards [, input] [, threads] [, attempt])`` (``input``, ``threads``, and ``attempt`` are optional parameters).
Such callables are evaluated immediately before the job is executed (or printed during a dry-run).

Since the callables can take e.g. ``input`` as an argument, they can for example be used to obtain the size of an input file and infer the amount of memory needed for the job.
The example described above, using the input size to determine memory requirements, could be realized via a lambda expression (here also enforcing a minimum of 300 MB of memory):

.. code-block:: python
rule:
input: ...
output: ...
resources:
mem_mb=lambda wc, input: max(2.5 * input.size_mb, 300)
shell:
"..."
In order to make this work with a dry-run, where the input files are not yet present, Snakemake automatically converts a ``FileNotFoundError`` that is raised by the callable into a placeholder called ``<TBD>`` that will be displayed during dry-run in such a case.
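
The dry-run behaviour described above can be sketched as follows. This is a simplified illustration, not Snakemake's actual code: ``evaluate_for_dry_run`` is a hypothetical wrapper, and the callable receives a single file path instead of Snakemake's ``input`` object.

```python
import os
import tempfile

TBD = "<TBD>"  # placeholder text quoted in the paragraph above


def get_mem_mb(wildcards, input):
    """Resource callable: 2.5x the input size in MB, but at least 300 MB."""
    size_mb = os.path.getsize(input) / (1024 * 1024)  # FileNotFoundError if missing
    return max(2.5 * size_mb, 300)


def evaluate_for_dry_run(func, wildcards, input):
    """Hypothetical wrapper: show <TBD> while the input file does not exist yet."""
    try:
        return func(wildcards, input)
    except FileNotFoundError:
        return TBD


with tempfile.TemporaryDirectory() as workdir:
    missing_path = os.path.join(workdir, "missing.txt")
    existing_path = os.path.join(workdir, "existing.txt")
    open(existing_path, "w").close()  # empty file, 0 MB

    missing = evaluate_for_dry_run(get_mem_mb, {}, missing_path)
    present = evaluate_for_dry_run(get_mem_mb, {}, existing_path)

print(missing, present)
```

Only ``FileNotFoundError`` is translated into the placeholder here, so any other error raised by the callable would still surface.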

The parameter ``attempt`` allows us to adjust resources based on how often the job has been restarted (see :ref:`all_options`, option ``--retries``).
@@ -406,19 +424,22 @@ Another application of callables as resources is when memory usage depends on th
shell:
"..."
Here, the value the function ``get_mem_mb`` returns grows linearly with the number of threads.
Here, the value that the function ``get_mem_mb`` returns grows linearly with the number of threads.
Of course, any other arithmetic could be performed in that function.

Both threads and resources can be defined (or overwritten) upon invocation (without modifying the workflow code) via `--set-threads` and `--set-resources`, see :ref:`user_manual-snakemake_options` and via workflow profiles, see :ref:`profiles`.
To quickly exemplify the latter, you could provide the following workflow profile in a file ``profiles/default/config.yaml`` relative to the Snakefile or the current working directory:
Both threads and resources can be defined (or overwritten) upon invocation (without modifying the workflow code) via ``--set-threads`` and ``--set-resources``, see :ref:`user_manual-snakemake_options`.
They can also be defined via workflow :ref:`profiles`, where expressions may use the variables listed above in the callable signature.
You could, for example, provide the following workflow profile in a file ``profiles/default/config.yaml`` relative to the Snakefile or the current working directory:

.. code-block:: yaml
set-threads:
b: 3
set-resources:
b:
mem_mb: 1000
to set the memory requirement of rule ``b`` to 1000 MB.
to set the requirements for rule ``b`` to 3 threads and 1000 MB of memory.

.. _snakefiles-standard-resources:

@@ -453,13 +474,11 @@ Because of these special meanings, the above names should always be used instead
Default Resources
~~~~~~~~~~~~~~~~~~

Since it could be cumbersome to define these standard resources for every rule, you can set default values at
the terminal or in a :ref:`profile <profiles>`.
This works via the command line flag ``--default-resources``, see ``snakemake --help`` for more information.
Since it could be cumbersome to define these standard resources for every rule, you can set default values via the command line flag ``--default-resources`` or in a :ref:`profile <profiles>`.
As with ``--set-resources``, this can be done dynamically, using the variables specified for the callables in the section on :ref:`snakefiles-dynamic-resources`.
If those resource definitions are mandatory for a certain execution mode, Snakemake will fail with a hint if they are missing.
Any resource definitions inside a rule override what has been defined with ``--default-resources``.
If ``--default-resources`` are not specified, Snakemake uses ``'mem_mb=max(2*input.size_mb, 1000)'``,
``'disk_mb=max(2*input.size_mb, 1000)'``, and ``'tmpdir=system_tmpdir'``.
If ``--default-resources`` is not specified, Snakemake uses ``'mem_mb=max(2*input.size_mb, 1000)'``, ``'disk_mb=max(2*input.size_mb, 1000)'``, and ``'tmpdir=system_tmpdir'``.
The latter points to whatever is the default of the operating system or specified by any of the environment variables ``$TMPDIR``, ``$TEMP``, or ``$TMP`` as outlined `here <https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir>`_.
If ``--default-resources`` is specified with some definitions, but any of the above defaults (e.g. ``mem_mb``) is omitted, these are still used.
In order to explicitly unset these defaults, assign them a value of ``None``, e.g. ``--default-resources mem_mb=None``.
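
The precedence rules described above can be summarised in a small sketch. It is an illustration of the documented behaviour only, not Snakemake's code: the function names ``effective_defaults`` and ``rule_resources`` are hypothetical, and the built-in defaults are kept as the expression strings quoted in the text.

```python
BUILTIN_DEFAULTS = {
    "mem_mb": "max(2*input.size_mb, 1000)",
    "disk_mb": "max(2*input.size_mb, 1000)",
    "tmpdir": "system_tmpdir",
}


def effective_defaults(user_defaults):
    """Merge user-supplied defaults over the built-in ones; None unsets an entry."""
    merged = dict(BUILTIN_DEFAULTS)
    merged.update(user_defaults)
    return {name: value for name, value in merged.items() if value is not None}


def rule_resources(defaults, rule_defined):
    """Definitions inside a rule override whatever the defaults provide."""
    resources = dict(defaults)
    resources.update(rule_defined)
    return resources


# Equivalent in spirit to: --default-resources mem_mb=None disk_mb=5000
defaults = effective_defaults({"mem_mb": None, "disk_mb": 5000})
final = rule_resources(defaults, {"disk_mb": 200})
print(defaults)
print(final)
```

Omitted entries (here ``tmpdir``) keep their built-in default, ``mem_mb`` is dropped because it was set to ``None``, and the rule-level ``disk_mb`` wins over the default.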
28 changes: 20 additions & 8 deletions snakemake/cli.py
@@ -4,6 +4,7 @@
__license__ = "MIT"

import argparse
from functools import partial
import sys
from typing import Set

@@ -59,7 +60,12 @@
get_container_image,
parse_key_value_arg,
)
from snakemake.resources import ResourceScopes, parse_resources, DefaultResources
from snakemake.resources import (
ResourceScopes,
eval_resource_expression,
parse_resources,
DefaultResources,
)
from snakemake.settings import RerunTrigger


@@ -68,13 +74,14 @@ def parse_set_threads(args):
args,
"Invalid threads definition: entries have to be defined as RULE=THREADS pairs "
"(with THREADS being a positive integer).",
fallback=partial(eval_resource_expression, threads_arg=False),
)


def parse_set_resources(args):
errmsg = (
"Invalid resource definition: entries have to be defined as RULE:RESOURCE=VALUE, with "
"VALUE being a positive integer or a string."
"VALUE being a positive integer, a quoted string, or a Python expression (e.g. min(max(2*input.size_mb, 1000), 8000))."
)

from collections import defaultdict
@@ -90,9 +97,8 @@ def parse_set_resources(args):
try:
value = int(value)
except ValueError:
assignments[rule][resource] = value
continue
if value < 0:
value = eval_resource_expression(value)
if isinstance(value, int) and value < 0:
raise ValueError(errmsg)
assignments[rule][resource] = value
return assignments
@@ -125,16 +131,22 @@ def parse_set_resource_scope(args):
return ResourceScopes()


def parse_set_ints(arg, errmsg):
def parse_set_ints(arg, errmsg, fallback=None):
assignments = dict()
if arg is not None:
for entry in arg:
key, value = parse_key_value_arg(entry, errmsg=errmsg)
try:
value = int(value)
except ValueError:
raise ValueError(errmsg)
if value < 0:
if fallback is not None:
try:
value = fallback(value)
except Exception as e:
raise ValueError(errmsg) from e
else:
raise ValueError(errmsg)
if isinstance(value, int) and value < 0:
raise ValueError(errmsg)
assignments[key] = value
return assignments
89 changes: 55 additions & 34 deletions snakemake/resources.py
@@ -47,41 +47,10 @@ def __init__(self, args=None, from_other=None, mode="full"):
{name: value for name, value in map(self.decode_arg, args)}
)

def fallback(val):
def callable(wildcards, input, attempt, threads, rulename):
try:
value = eval(
val,
{
"input": input,
"attempt": attempt,
"threads": threads,
"system_tmpdir": tempfile.gettempdir(),
},
)
# Triggers for string arguments like n1-standard-4
except NameError:
return val
except Exception as e:
if not (
isinstance(e, FileNotFoundError) and e.filename in input
):
# Missing input files are handled by the caller
raise WorkflowError(
"Failed to evaluate default resources value "
"'{}'.\n"
" String arguments may need additional "
"quoting. Ex: --default-resources "
"\"tmpdir='/home/user/tmp'\".".format(val),
e,
)
raise e
return value

return callable

self.parsed = dict(_cores=1, _nodes=1)
self.parsed.update(parse_resources(self._args, fallback=fallback))
self.parsed.update(
parse_resources(self._args, fallback=eval_resource_expression)
)

def set_resource(self, name, value):
self._args[name] = f"{value}"
@@ -538,6 +507,58 @@ def _highest_proportion(group):
return rows


def eval_resource_expression(val, threads_arg=True):
def generic_callable(**kwargs):
args = {
"input": kwargs["input"],
"attempt": kwargs["attempt"],
"system_tmpdir": tempfile.gettempdir(),
}
if threads_arg:
args["threads"] = kwargs["threads"]
try:
value = eval(
val,
args,
)
# Triggers for string arguments like n1-standard-4
except NameError:
return val
except Exception as e:
if not (isinstance(e, FileNotFoundError) and e.filename in kwargs["input"]):
# Missing input files are handled by the caller
raise WorkflowError(
"Failed to evaluate default resources value "
f"'{val}'.\n"
" String arguments may need additional "
"quoting. Ex: --default-resources "
"\"tmpdir='/home/user/tmp'\".",
e,
)
raise e
return value

if threads_arg:

def callable(wildcards, input, attempt, threads, rulename):
return generic_callable(
wildcards=wildcards,
input=input,
attempt=attempt,
threads=threads,
rulename=rulename,
)

else:

def callable(wildcards, input, attempt, rulename):
return generic_callable(
wildcards=wildcards, input=input, attempt=attempt, rulename=rulename
)

return callable


def parse_resources(resources_args, fallback=None):
"""Parse resources from args."""
resources = dict()
11 changes: 6 additions & 5 deletions snakemake/workflow.py
@@ -1538,17 +1538,18 @@ def decorate(ruleinfo):
"Threads value has to be an integer, float, or a callable.",
rule=rule,
)
if name in self.resource_settings.overwrite_threads:
rule.resources["_cores"] = self.resource_settings.overwrite_threads[
name
]
else:
if name not in self.resource_settings.overwrite_threads:
if isinstance(ruleinfo.threads, float):
ruleinfo.threads = int(ruleinfo.threads)
rule.resources["_cores"] = ruleinfo.threads
else:
rule.resources["_cores"] = 1

if name in self.resource_settings.overwrite_threads:
rule.resources["_cores"] = self.resource_settings.overwrite_threads[
name
]

if ruleinfo.shadow_depth:
if ruleinfo.shadow_depth not in (
True,
2 changes: 1 addition & 1 deletion tests/test_profile/Snakefile
@@ -1,3 +1,3 @@
rule:
shell:
"python -m snakemake --cores 1 --profile . -s Snakefile.internal"
"python -m snakemake --cores 3 --profile . -s Snakefile.internal"
14 changes: 12 additions & 2 deletions tests/test_profile/Snakefile.internal
@@ -1,5 +1,15 @@
shell.executable("bash")

rule a:
input:
"input.txt",
output:
config["out"]
config["out"],
shell:
"touch {output}"
'echo "'
'threads: {threads}\n'
'mem_mb: {resources.mem_mb}\n'
'eggs_factor: {resources.eggs_factor}\n'
'spam_factor: {resources.spam_factor}\n'
'double_jeopardy: {resources.double_jeopardy}'
'"> {output}'
11 changes: 10 additions & 1 deletion tests/test_profile/config.yaml
@@ -1,4 +1,13 @@
configfile: "workflow-config.yaml"
cores: all
set-threads:
- a=2
- a=max(1024*24*input.size_mb, 2)
- b=4
default-resources:
- mem_mb=max(1024*32*input.size_mb, 5)
- eggs_factor=twitter
set-resources:
a:
mem_mb: max(1024*24*input.size_mb, 1)
spam_factor: X
double_jeopardy: attempt
5 changes: 5 additions & 0 deletions tests/test_profile/expected-results/test.out
@@ -0,0 +1,5 @@
threads: 2
mem_mb: 1
eggs_factor: twitter
spam_factor: X
double_jeopardy: 1
Empty file added tests/test_profile/input.txt
Empty file.
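
The expected results above follow from the profile expressions once the input size is known to be zero. The sketch below reproduces that arithmetic under two assumptions: ``attempt`` is 1 on a first run, and plain strings fall back to themselves on ``NameError``, mirroring the ``except NameError`` branch in ``eval_resource_expression``. The helper ``evaluate`` is hypothetical and much simpler than the real function.

```python
from types import SimpleNamespace


def evaluate(expr, size_mb, attempt):
    """Evaluate a profile value; fall back to the raw string on NameError."""
    names = {"input": SimpleNamespace(size_mb=size_mb), "attempt": attempt}
    try:
        return eval(expr, names)
    except NameError:
        # Plain strings such as "twitter" are not defined Python names and stay as-is.
        return expr


profile = {
    "threads": "max(1024*24*input.size_mb, 2)",
    "mem_mb": "max(1024*24*input.size_mb, 1)",
    "eggs_factor": "twitter",
    "spam_factor": "X",
    "double_jeopardy": "attempt",
}

# input.txt is empty, so size_mb is 0; attempt is assumed to be 1 (first run).
results = {name: evaluate(expr, size_mb=0, attempt=1) for name, expr in profile.items()}
for name, value in results.items():
    print(f"{name}: {value}")
```

Because the input file is empty, both ``max(...)`` expressions resolve to their lower bounds, which is why the test expects 2 threads and 1 MB of memory.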
