feat: add flag to mark files where path should not be modified (#2888)

When using modules with a prefix, the prefix is added to all input and
output files. If an input file is defined in the config, its prefixed path
no longer points to the file and becomes invalid. It is therefore necessary
to mark such files so that their paths are not changed.
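
To make the problem concrete, here is a minimal Python sketch of the behavior described above. It is not Snakemake's implementation: `add_prefix` and its `is_local` parameter are hypothetical names used only for illustration, and the pass-through for absolute paths is an assumption based on the documentation, which speaks of relative input and output files.

```python
import posixpath


def add_prefix(path: str, prefix: str, is_local: bool = False) -> str:
    """Hypothetical stand-in for module prefixing (not Snakemake's API)."""
    # Files marked as local keep their path as written.
    if is_local:
        return path
    # Assumption: only relative paths receive the prefix.
    if posixpath.isabs(path):
        return path
    return posixpath.join(prefix, path)


# An output of the module is moved under the prefix ...
prefixed_output = add_prefix("test_final.txt", "out_1")
# ... and so is an input from outside the module, which then no longer exists there ...
broken_input = add_prefix("input.txt", "out_1")
# ... unless it is marked as local.
kept_input = add_prefix("input.txt", "out_1", is_local=True)
```

Without the marker, the input would be looked up under the prefix folder, where it does not exist.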

Not sure if `path_modified` is the best name... maybe `protect_path`,
or `not_modify_path`, or `no_prefix`, or `fixed_path`?

### QC

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).
fgvieira authored Jun 7, 2024
1 parent 6f3669d commit d142b46
Showing 9 changed files with 55 additions and 9 deletions.
3 changes: 2 additions & 1 deletion docs/snakefiles/modularization.rst
Original file line number Diff line number Diff line change
@@ -165,8 +165,9 @@ It is possible to overwrite the global config dictionary for the module, which i
In this case, any ``configfile`` statements inside the module are ignored.
In addition, it is possible to skip any :ref:`validation <snakefiles_config_validation>` statements in the module, by specifying ``skip_validation: True`` in the module statement.
Moreover, one can automatically move all relative input and output files of a module into a dedicated folder: by specifying ``prefix: "foo"`` in the module definition, e.g. any output file ``path/to/output.txt`` in the module would be stored under ``foo/path/to/output.txt`` instead.
Moreover, one can automatically move all relative input and output files of a module into a dedicated folder by specifying ``prefix: "foo"`` in the module definition, e.g. any output file ``path/to/output.txt`` in the module would be stored under ``foo/path/to/output.txt`` instead.
This becomes particularly useful when combining multiple modules, see :ref:`use_with_modules`.
However, if you have some input files that come from outside the workflow, you can use the ``local`` flag so that their paths are not modified (see :ref:`snakefiles-storage-local-files`).

Instead of using all rules, it is possible to import specific rules.
Specific rules may even be modified before using them, via a final ``with:`` followed by a block that lists items to overwrite.
10 changes: 7 additions & 3 deletions docs/snakefiles/storage.rst
@@ -56,12 +56,16 @@ Custom settings can be passed as well::
    snakemake --default-storage-provider s3 --default-storage-prefix s3://mybucket/ \
        --storage-s3-max-requests-per-second 10


.. _snakefiles-storage-local-files:

Local input/output files
""""""""""""""""""""""""

Despite using a default storage provider, you might have certain files in your workflow
that still come from the local filesystem. In this case, you can use the ``local``
flag::
that still come from the local filesystem. Likewise, when importing a module while
specifying a prefix (see :ref:`snakefiles-modules`), you might have some input files
that come from outside the workflow. In either case, you can use the ``local`` flag::

    rule example:
        input:
@@ -156,4 +160,4 @@ Depending on the storage provider, you might have to provide credentials.
Usually, this can be done via environment variables, e.g. for S3::

export SNAKEMAKE_STORAGE_S3_ACCESS_KEY=...
export SNAKEMAKE_STORAGE_S3_SECRET_KEY=...
export SNAKEMAKE_STORAGE_S3_SECRET_KEY=...
13 changes: 10 additions & 3 deletions snakemake/path_modifier.py
@@ -34,11 +34,18 @@ def __init__(self, replace_prefix: dict, prefix: str, workflow):
                self.trie[prefix] = replacement

    def modify(self, path, property=None):
        if get_flag_value(path, PATH_MODIFIER_FLAG) is self:
            logger.debug(f"Flag PATH_MODIFIER_FLAG found in file {path}")
        if get_flag_value(path, PATH_MODIFIER_FLAG):
            logger.debug(
                f"Not modifying path of file {path}, as it has already been modified"
            )
            # Path has been modified before and is reused now, no need to modify again.
            return path

        if get_flag_value(path, "local"):
            logger.debug(f"Not modifying path of file {path}, as it is local")
            # File is local
            return path

        modified_path = self.apply_default_storage(self.replace_prefix(path, property))
        if modified_path == path:
            # nothing has changed
@@ -54,7 +61,7 @@ def modify(self, path, property=None):
                self.replace_prefix(modified_path.flags["multiext"], property)
            )
        # Flag the path as modified and return.
        modified_path = flag(modified_path, PATH_MODIFIER_FLAG, self)
        modified_path = flag(modified_path, PATH_MODIFIER_FLAG)
        return modified_path

def replace_prefix(self, path, property=None):
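
The control flow of the updated `modify` can be approximated with a small, self-contained Python sketch. It is a toy model under stated assumptions, not the Snakemake code: `FlaggedPath`, `get_flag`, and the standalone `modify` function are hypothetical stand-ins, the string value of `PATH_MODIFIER_FLAG` is assumed, and prefixing is reduced to simple string concatenation.

```python
PATH_MODIFIER_FLAG = "path_modified"  # assumed value, for illustration only


class FlaggedPath(str):
    """Toy string subclass carrying a flags dict (hypothetical, not Snakemake's class)."""

    def __new__(cls, value, flags=None):
        obj = super().__new__(cls, value)
        obj.flags = dict(flags or {})
        return obj


def get_flag(path, name):
    """Return the value of a flag, or False for plain strings and unset flags."""
    return getattr(path, "flags", {}).get(name, False)


def modify(path, prefix):
    """Toy version of the guard order in the diff: modified first, then local."""
    if get_flag(path, PATH_MODIFIER_FLAG):
        # Already modified once, so it is returned unchanged.
        return path
    if get_flag(path, "local"):
        # Local files keep their original path.
        return path
    # Simplification: prefixing is plain concatenation here.
    return FlaggedPath(f"{prefix}/{path}", {PATH_MODIFIER_FLAG: True})


once = modify("test_final.txt", "out_1")
twice = modify(once, "out_2")  # flagged already, so the second prefix is ignored
kept = modify(FlaggedPath("input.txt", {"local": True}), "out_1")
print(once, twice, kept)
```

In this model, a path that already carries the modifier flag is returned unchanged regardless of which prefix is passed, mirroring the switch from the `is self` identity check to a plain truthiness check in the diff above.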
4 changes: 2 additions & 2 deletions snakemake/workflow.py
@@ -692,8 +692,8 @@ def files(items):
        else:

            def files(items):
                relpath = (
                    lambda f: f
                relpath = lambda f: (
                    f
                    if os.path.isabs(f) or f.startswith("root://")
                    else os.path.relpath(f)
                )
17 changes: 17 additions & 0 deletions tests/test_modules_prefix_local/Snakefile
@@ -0,0 +1,17 @@

module module1:
    snakefile:
        "module1/Snakefile"
    config:
        config
    prefix:
        "out_1"


use rule * from module1 as module1_*


rule joint_all:
    input:
        "out_1/test_final.txt",
    default_target: True
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
test_a
1 change: 1 addition & 0 deletions tests/test_modules_prefix_local/input.txt
@@ -0,0 +1 @@
test_a
7 changes: 7 additions & 0 deletions tests/test_modules_prefix_local/module1/Snakefile
@@ -0,0 +1,7 @@
rule a:
    input:
        local("input.txt"),
    output:
        "test_final.txt",
    shell:
        "cat {input} > {output}"
8 changes: 8 additions & 0 deletions tests/tests.py
@@ -1623,6 +1623,14 @@ def test_module_no_prefixing_modified_paths():
    )


@skip_on_windows
def test_modules_prefix_local():
    run(
        dpath("test_modules_prefix_local"),
        targets=["out_1/test_final.txt"],
    )


def test_module_with_script():
    run(dpath("test_module_with_script"))

