fix: check template rendering output for leaked input file paths (#2850)

### QC

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).
johanneskoester committed Apr 28, 2024
1 parent 74b99ec commit 433302e
Showing 3 changed files with 52 additions and 2 deletions.
30 changes: 28 additions & 2 deletions docs/snakefiles/rules.rst
@@ -2712,7 +2712,7 @@ Apart from Jinja2, Snakemake supports `YTE <https://github.com/koesterlab/yte>`_
 
 .. code-block:: python
 
-    rule render_jinja2_template:
+    rule render_yte_template:
         input:
             "some-yte-template.yaml"
         output:
@@ -2737,7 +2737,33 @@ Analogously to the jinja2 case YTE has access to ``params``, ``wildcards``, and
       - b
       - ?config["threshold"]
 
-Template rendering rules are always executed locally, without submission to cluster or cloud processes (since templating is usually not resource intensive).
+By default, template rendering rules are executed locally, without submission to cluster or cloud processes (since templating is usually not resource intensive).
+However, if a :ref:`storage plugin <storage-support>` is used, a template rule can leak paths to local copies of storage files into the rendered output.
+This happens if the template inserts the path of an input file into the output; such local paths are not stable, because they can change between runs (e.g. when the storage local prefix is modified).
+Snakemake tries to detect such cases by scanning the rendered output for the local paths of storage input files, and fails with an error if it finds one (unless the output is marked as ``temp()``).
+This is only a concern if your template actually writes input file paths into its output; in that case, assign the same :ref:`group <job_grouping>` to the template rule and the consuming rule, and mark the template output as ``temp()``, e.g.:
+
+.. code-block:: python
+
+    rule render_yte_template:
+        input:
+            "some-yte-template.yaml"
+        output:
+            temp("results/{sample}.rendered-version.yaml")
+        params:
+            foo=0.1
+        group: "some-group"
+        template_engine:
+            "yte"
+
+    rule consume_template:
+        input:
+            "results/{sample}.rendered-version.yaml"
+        output:
+            "results/{sample}.some-output.txt"
+        group: "some-group"
+        shell:
+            "sometool {input} {output}"
 
 .. _snakefiles_mpi_support:
 
10 changes: 10 additions & 0 deletions snakemake/jobs.py
@@ -18,6 +18,7 @@
 from abc import ABC, abstractmethod
 from snakemake.settings import DeploymentMethod
 
+from snakemake.template_rendering import check_template_output
 from snakemake_interface_common.utils import lazy_property
 from snakemake_interface_executor_plugins.jobs import (
     JobExecutorInterface,
@@ -1101,6 +1102,15 @@ async def postprocess(
                 wait_for_local=True,
             )
         self.dag.unshadow_output(self, only_log=error)
+
+        if (
+            not error
+            and self.rule.is_template_engine
+            and not is_flagged(self.output[0], "temp")
+        ):
+            # TODO also check if consumers are executed on the same node
+            check_template_output(self)
+
         await self.dag.handle_storage(
             self, store_in_storage=store_in_storage, store_only_log=error
         )
14 changes: 14 additions & 0 deletions snakemake/template_rendering/__init__.py
@@ -57,3 +57,17 @@ def render_template(engine, input, output, params, wildcards, config, rule):
         )
     except Exception as e:
         raise WorkflowError(f"Error rendering template in rule {rule}.", e)
+
+
+def check_template_output(job):
+    with open(job.output[0]) as out:
+        for l in out:
+            for f in job.input:
+                if f.is_storage and f in l:
+                    raise WorkflowError(
+                        "Output of template_engine rule contains local path to input file "
+                        f"from storage: {f} for {f.storage_object.query}. "
+                        "However, this path is variable as it can change between runs (e.g. when "
+                        "the storage local prefix is modified). To circumvent this issue, place the "
+                        "rule in one group with the consumer(s) and mark the output as temp()."
+                    )
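The following is a minimal, self-contained Python sketch of the logic this commit adds. It does not use Snakemake at all. `InputFile`, `should_check` and `find_leaked_paths` are hypothetical stand-ins: the first replaces the job's input file objects, the second replaces the condition added to `postprocess()`, and the third replaces `check_template_output()`. The local path and query are made up. One difference from the real function: `check_template_output()` raises a `WorkflowError` on the first match, while this sketch returns all matches so they can be inspected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class InputFile:
    """Hypothetical stand-in for a job's input file object."""

    local_path: str  # path of the (local copy of the) file
    is_storage: bool = False  # True if the file comes from a storage plugin
    query: Optional[str] = None  # storage query, e.g. a remote URL


def should_check(error: bool, is_template_rule: bool, output_is_temp: bool) -> bool:
    # Same condition as the one added to postprocess(): only successful
    # template_engine jobs whose output is not marked temp() are checked.
    return not error and is_template_rule and not output_is_temp


def find_leaked_paths(rendered: str, inputs: List[InputFile]) -> List[InputFile]:
    # Scan the rendered output line by line and collect every storage input
    # whose local path occurs as a substring of a line. Non-storage inputs
    # are skipped: only local copies of storage files can change between runs.
    leaked: List[InputFile] = []
    for line in rendered.splitlines():
        for f in inputs:
            if f.is_storage and f.local_path in line and f not in leaked:
                leaked.append(f)
    return leaked


# Usage with made-up paths: one storage input, one plain local input.
inputs = [
    InputFile(".snakemake/storage/s3/bucket/data.csv", is_storage=True,
              query="s3://bucket/data.csv"),
    InputFile("config/local.yaml"),
]
rendered = "table: .snakemake/storage/s3/bucket/data.csv\nsettings: config/local.yaml\n"

if should_check(error=False, is_template_rule=True, output_is_temp=False):
    for f in find_leaked_paths(rendered, inputs):
        print(f"leaked local path {f.local_path} for {f.query}")
```

The rendered text contains both input paths, but only the storage-backed one is reported. Marking the output as `temp()` (`output_is_temp=True`) skips the scan entirely, which matches the grouping-plus-`temp()` workaround described in the docs change.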
