Snakemake version
and I checked against the changelog; apparently nothing related has been worked on since 7.19.1.
Describe the bug
When modularizing Snakemake workflows, one would naturally expect (and need) the workflows to apply rule renaming and path prefixing strictly recursively.
In short: when `top.Snakefile` modularizes `nested-1st-level.Snakefile` with prefix `prefix_a/`, and `nested-1st-level.Snakefile` in turn modularizes `nested-2nd-level.Snakefile` with prefix `prefix_b/`, then the outputs of `nested-2nd-level.Snakefile` should be prefixed with `prefix_a/prefix_b/` when executed via `top.Snakefile`. The same rationale should apply to rule renaming.
This is not the case, as the following example demonstrates.
Minimal example
We have three Snakefiles:

```
$ ls -1
nested-1st-level.Snakefile
nested-2nd-level.Snakefile
top.Snakefile
```
with the following contents:

```
$ cat nested-2nd-level.Snakefile
from snakemake.utils import min_version
min_version("6.0")

rule leaf:
    output: ".done"
    shell: "touch {output}"

rule default:
    input: rules.leaf.output
    default_target: True
```

```
$ cat nested-1st-level.Snakefile
from snakemake.utils import min_version
min_version("6.0")

module module_2:
    snakefile: "nested-2nd-level.Snakefile"
    prefix: "nested-2nd-level"

use rule * from module_2 as nested_2nd_level_*

rule default:
    input: rules.nested_2nd_level_default.input
    default_target: True
```

```
$ cat top.Snakefile
from snakemake.utils import min_version
min_version("6.0")

module module_1:
    snakefile: "nested-1st-level.Snakefile"
    prefix: "nested-1st-level"

use rule * from module_1 as nested_1st_level_*

rule default:
    input: rules.nested_1st_level_default.input
    default_target: True
```
So `top.Snakefile` imports, renames, and prefixes `nested-1st-level.Snakefile`, and `nested-1st-level.Snakefile` in turn imports, renames, and prefixes `nested-2nd-level.Snakefile`.
All Snakefiles are syntactically correct:
```
$ snakemake --lint --snakefile nested-2nd-level.Snakefile
Lints for rule leaf (line 4, /mnt/dat2/git/ContactEngineering/snakemake-nested-modules/nested-2nd-level.Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In distributed environments, this means that errors are harder to discover. In local environments, output of concurrent jobs will be mixed and become unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

$ snakemake --lint --snakefile nested-1st-level.Snakefile
Lints for rule nested_2nd_level_leaf (line 4, /mnt/dat2/git/ContactEngineering/snakemake-nested-modules/nested-2nd-level.Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In distributed environments, this means that errors are harder to discover. In local environments, output of concurrent jobs will be mixed and become unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

$ snakemake --lint --snakefile top.Snakefile
Lints for rule nested_2nd_level_leaf (line 4, /mnt/dat2/git/ContactEngineering/snakemake-nested-modules/nested-2nd-level.Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In distributed environments, this means that errors are harder to discover. In local environments, output of concurrent jobs will be mixed and become unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers
```
The bottom Snakefile has its rules and output files as expected:

```
$ snakemake --list --snakefile nested-2nd-level.Snakefile
default
leaf

$ snakemake --summary --snakefile nested-2nd-level.Snakefile
Building DAG of jobs...
output_file  date  rule  version  log-file(s)  status   plan
.done        -     -     -        -            missing  update pending
```
and so does the middle Snakefile:

```
$ snakemake --list --snakefile nested-1st-level.Snakefile
default
nested_2nd_level_default
nested_2nd_level_leaf

$ snakemake --summary --snakefile nested-1st-level.Snakefile
Building DAG of jobs...
output_file             date  rule  version  log-file(s)  status   plan
nested-2nd-level/.done  -     -     -        -            missing  update pending
```
but at the top-level Snakefile, the behavior deviates from what I would expect from clean recursive inclusion, which renders modules practically impossible to use efficiently when nesting deeper than one level:

```
$ snakemake --list --snakefile top.Snakefile
default
nested_1st_level_default
nested_2nd_level_default
nested_2nd_level_leaf

$ snakemake --summary --snakefile top.Snakefile
Building DAG of jobs...
MissingInputException in rule default in file /mnt/dat2/git/ContactEngineering/snakemake-nested-modules/top.Snakefile, line 10:
Missing input files for rule default:
    affected files:
        nested-1st-level/nested-2nd-level/.done
```
Instead, I would have expected this output:

```
$ snakemake --list --snakefile top.Snakefile
default
nested_1st_level_default
nested_1st_level_nested_2nd_level_default
nested_1st_level_nested_2nd_level_leaf

$ snakemake --summary --snakefile top.Snakefile
Building DAG of jobs...
output_file                              date  rule  version  log-file(s)  status   plan
nested-1st-level/nested-2nd-level/.done  -     -     -        -            missing  update pending
```
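The expected names and paths above follow from composing the module settings from the outermost module inward. The Python sketch below spells out that composition rule under stated assumptions: `compose_prefix` and `compose_rule_name` are hypothetical helpers written only to illustrate the expectation in this report; they are not part of Snakemake's API and do not describe how Snakemake actually implements modules.

```python
import posixpath


def compose_prefix(prefixes: list[str], path: str) -> str:
    """Join module prefixes (outermost first) in front of a rule's output path.

    Hypothetical helper modelling the *expected* recursive prefixing.
    """
    return posixpath.join(*prefixes, path)


def compose_rule_name(patterns: list[str], name: str) -> str:
    """Apply `use rule * ... as <pattern>` renames, innermost module first.

    Each pattern contains a single `*` standing for the rule name as seen
    by the enclosing module. Hypothetical helper, not Snakemake code.
    """
    for pattern in reversed(patterns):
        name = pattern.replace("*", name)
    return name


# Module settings from the minimal example, outermost module first.
prefixes = ["nested-1st-level", "nested-2nd-level"]
patterns = ["nested_1st_level_*", "nested_2nd_level_*"]

print(compose_prefix(prefixes, ".done"))
# nested-1st-level/nested-2nd-level/.done
for rule in ["default", "leaf"]:
    print(compose_rule_name(patterns, rule))
# nested_1st_level_nested_2nd_level_default
# nested_1st_level_nested_2nd_level_leaf
```

Applying renames innermost first means each module only ever sees the names its direct child exposes, which is what strictly recursive inclusion would imply.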
Additional context
I did not find any documentation, other issues, or Stack Overflow posts on this limiting behavior, hence I am reporting it here as a bug.
I started using `snakemake` in December for quick-and-dirty parameter-space exploration on my local machine and have been pretty happy with it for that purpose; it is very useful! It gets the user productive quickly, without a learning curve, as long as one knows Python and understands Makefiles, at least conceptually. But now that I am trying to get a little cleaner and better organized, I am unfortunately slowed down here.
Thanks for clarifying whether this behavior is actually intended for some reason or whether it is worth fixing.
jotelha changed the title from "Unintuitive and limitting behavior in naming and prefixing nested modules" to "Unintuitive and limiting behavior in naming and prefixing nested modules" on Feb 5, 2023.
I've just merged PR #1817, which I think should fix the naming issues. If you think there is still something wrong with it or with the prefixing, may I ask you to provide a PR with a test case that illustrates the failure? Thanks a lot!