feat: add ability to return input functions from input functions. Such nesting is evaluated 10 times at most. Beyond that, an error is thrown. (#2717)

### Description

Snakemake fails to build the DAG when a function containing an
inner function is called within an `expand` call.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change modifies neither the
language nor the behavior or functionality of Snakemake).

---------

Co-authored-by: Johannes Koester <johannes.koester@uni-due.de>
FelixMoelder and johanneskoester committed Feb 23, 2024
1 parent b6636e9 commit 7a47924
Showing 6 changed files with 75 additions and 24 deletions.
5 changes: 5 additions & 0 deletions docs/snakefiles/rules.rst
@@ -186,6 +186,11 @@ The function has to accept a single argument that will be the wildcards object g
 Note that you can also use `lambda expressions <https://docs.python.org/3/tutorial/controlflow.html#lambda-expressions>`_ instead of full function definitions.
 By this, rules can have entirely different input files (both in form and number) depending on the inferred wildcards. E.g. you can assign input files that appear in entirely different parts of your filesystem based on some wildcard value and a dictionary that maps the wildcard value to file paths.

+.. sidebar:: Note
+
+    Input functions can themselves return input functions again (this also holds for functions given to params and resources.)
+    Such nested evaluation is allowed for a depth up to 10. Afterwards, an exception will be thrown.
+
 In addition to a single wildcards argument, input functions can optionally take a ``groupid`` (with exactly that name) as second argument, see :ref:`snakefiles_group-local` for details.

 Finally, when implementing the input function, it is best practice to make sure that it can properly handle all possible wildcard values your rule can have.
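To make the documented behavior concrete, here is a minimal pure-Python sketch of an input function that returns another input function. It does not use Snakemake itself: `SAMPLES`, `reads`, `paired_reads`, `single_read` and the `evaluate` helper are hypothetical names chosen for illustration, and `evaluate` only approximates the idea of calling the result again while it is still callable (without the depth limit described in the note).

```python
from types import SimpleNamespace

# Hypothetical sample sheet mapping a wildcard value to file paths.
SAMPLES = {
    "a": ["raw/a_1.fq", "raw/a_2.fq"],
    "b": ["archive/b.fq"],
}


def paired_reads(wildcards):
    # Ordinary input function: returns a list of files.
    return SAMPLES[wildcards.sample]


def single_read(wildcards):
    # Ordinary input function: returns a single file.
    return SAMPLES[wildcards.sample][0]


def reads(wildcards):
    # Nested case: returns another input function instead of files.
    if len(SAMPLES[wildcards.sample]) == 2:
        return paired_reads
    return single_read


def evaluate(func, wildcards):
    # Hypothetical stand-in for the evaluation step: keep calling the
    # result with the same wildcards while it is still a function.
    value = func(wildcards)
    while callable(value):
        value = value(wildcards)
    return value


print(evaluate(reads, SimpleNamespace(sample="a")))
print(evaluate(reads, SimpleNamespace(sample="b")))
```

In a real Snakefile, `reads` would simply be given as the rule's `input`, and the repeated evaluation would be left to Snakemake.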
67 changes: 43 additions & 24 deletions snakemake/rules.py
@@ -598,7 +598,6 @@ def apply_input_function(
         groupid=None,
         **aux_params,
     ):
-        incomplete = False
         if isinstance(func, _IOFile):
             func = func._file.callable
         elif isinstance(func, AnnotatedString):
@@ -621,29 +620,49 @@ def apply_input_function(
             if callable(value):
                 _aux_params[name] = value()

-        try:
-            value = func(Wildcards(fromdict=wildcards), **_aux_params)
-            if isinstance(value, types.GeneratorType):
-                # generators should be immediately collected here,
-                # otherwise we would miss any exceptions and
-                # would have to capture them again later.
-                value = list(value)
-        except IncompleteCheckpointException as e:
-            value = incomplete_checkpoint_func(e)
-            incomplete = True
-        except Exception as e:
-            if "input" in aux_params and is_file_not_found_error(
-                e, aux_params["input"]
-            ):
-                # Function evaluation can depend on input files. Since expansion can happen during dryrun,
-                # where input files are not yet present, we need to skip such cases and
-                # mark them as <TBD>.
-                value = TBDString()
-            elif raw_exceptions:
-                raise e
-            else:
-                raise InputFunctionException(e, rule=self, wildcards=wildcards)
-        return value, incomplete
+        wildcards_arg = Wildcards(fromdict=wildcards)
+
+        def apply_func(func):
+            incomplete = False
+            try:
+                value = func(wildcards_arg, **_aux_params)
+                if isinstance(value, types.GeneratorType):
+                    # generators should be immediately collected here,
+                    # otherwise we would miss any exceptions and
+                    # would have to capture them again later.
+                    value = list(value)
+            except IncompleteCheckpointException as e:
+                value = incomplete_checkpoint_func(e)
+                incomplete = True
+            except Exception as e:
+                if "input" in aux_params and is_file_not_found_error(
+                    e, aux_params["input"]
+                ):
+                    # Function evaluation can depend on input files. Since expansion can happen during dryrun,
+                    # where input files are not yet present, we need to skip such cases and
+                    # mark them as <TBD>.
+                    value = TBDString()
+                elif raw_exceptions:
+                    raise e
+                else:
+                    raise InputFunctionException(e, rule=self, wildcards=wildcards)
+            return value, incomplete
+
+        res = func
+        tries = 0
+        while (callable(res) or tries == 0) and tries < 10:
+            res, incomplete = apply_func(res)
+            tries += 1
+        if tries == 10:
+            raise WorkflowError(
+                "Evaluated 10 nested input functions (i.e. input functions that "
+                "themselves return an input function.). More than 10 such nested "
+                "evaluations are not allowed. Does the workflow accidentally return a "
+                "function instead of calling it in the input function?",
+                rule=self,
+            )
+
+        return res, incomplete

     def _apply_wildcards(
         self,
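The bounded loop added in this hunk can be studied in isolation with a small sketch. This is a simplified model, not Snakemake's code: `NestingError` stands in for `WorkflowError`, `resolve` keeps only the loop and drops the exception handling and the `incomplete` flag, and `nest` is a hypothetical helper that builds a function needing a given number of calls before it yields a value.

```python
MAX_TRIES = 10  # same limit as the literal 10 in the hunk above


class NestingError(Exception):
    """Hypothetical stand-in for Snakemake's WorkflowError."""


def resolve(func, wildcards):
    # Mirrors the loop condition from the diff: always evaluate once,
    # then keep evaluating while the result is still callable.
    res = func
    tries = 0
    while (callable(res) or tries == 0) and tries < MAX_TRIES:
        res = res(wildcards)
        tries += 1
    if tries == MAX_TRIES:
        raise NestingError("Evaluated 10 nested input functions.")
    return res


def nest(value, depth):
    # Build a function that must be called `depth` times to reach `value`.
    def func(wildcards):
        return value if depth == 1 else nest(value, depth - 1)

    return func


print(resolve(nest("a.in", 3), {}))
print(resolve(nest("a.in", 9), {}))
try:
    resolve(nest("a.in", 10), {})
except NestingError as err:
    print("error:", err)
```

Reading the condition as written, the check `tries == 10` fires whenever ten evaluations have taken place, including the case where the tenth call already returned a plain value. In this model a chain of nine nested functions resolves, while a chain of exactly ten raises.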
22 changes: 22 additions & 0 deletions tests/test_inner_call/Snakefile
@@ -0,0 +1,22 @@
+def some_b():
+    def inner(wildcards):
+        return {wildcards.x}
+    return inner
+
+def some_a():
+    def inner(wildcards):
+        return expand("{x}.in", x=some_b())
+    return inner
+
+rule all:
+    input:
+        "a.txt"
+
+rule b:
+    input:
+        some_a()
+    output:
+        "{x}.txt"
+    shell:
+        "touch {output}"

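The following sketch traces why this test workflow exercises nested evaluation. It rests on an assumption that is not shown in this diff: that `expand` defers its work and returns a function when one of its values is callable. `model_expand` is a hypothetical, heavily simplified stand-in written for this illustration; it is not Snakemake's `expand`.

```python
from types import SimpleNamespace


def model_expand(pattern, **values):
    # Assumption: when a value is callable, expansion is deferred and a
    # function is returned that finishes once the wildcards are known.
    if any(callable(v) for v in values.values()):
        def deferred(wildcards):
            resolved = {
                key: value(wildcards) if callable(value) else value
                for key, value in values.items()
            }
            return model_expand(pattern, **resolved)

        return deferred
    # This model handles exactly one wildcard name with an iterable of values.
    ((name, candidates),) = values.items()
    return [pattern.format(**{name: c}) for c in sorted(candidates)]


def some_b():
    def inner(wildcards):
        return {wildcards.x}
    return inner


def some_a():
    def inner(wildcards):
        return model_expand("{x}.in", x=some_b())
    return inner


wildcards = SimpleNamespace(x="a")
first = some_a()(wildcards)   # first evaluation yields another function
files = first(wildcards)      # second evaluation yields the file list
print(callable(first), files)
```

Under that assumption, the input function of rule `b` returns a function on its first evaluation, which is the situation the loop in `apply_input_function` is meant to handle.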
Empty file added tests/test_inner_call/a.in
Empty file.
Empty file.
5 changes: 5 additions & 0 deletions tests/tests.py
@@ -2028,3 +2028,8 @@ def test_set_resources_human_readable():
         dpath("test05"),
         shellcmd="snakemake -c1 --set-resources \"compute1:runtime='50h'\"",
     )
+
+
+@skip_on_windows
+def test_call_inner():
+    run(dpath("test_inner_call"))
