Unintuitive and limiting behavior in naming and prefixing nested modules #2084

Open
jotelha opened this issue Feb 1, 2023 · 1 comment

@jotelha
jotelha commented Feb 1, 2023

Snakemake version

$ snakemake --version
7.19.1

I also checked the changelog; apparently nothing related has been worked on since 7.19.1.

Describe the bug
When modularizing Snakemake workflows, one would naturally expect (and need) the workflows to apply rule renaming and path prefixing strictly recursively.

In short: when top.Snakefile imports nested-1st-level.Snakefile as a module with prefix prefix_a/, and nested-1st-level.Snakefile in turn imports nested-2nd-level.Snakefile with prefix prefix_b/, then the outputs of nested-2nd-level.Snakefile should be prefixed prefix_a/prefix_b/ when executed via top.Snakefile. The same rationale should apply to rule renaming.

This is not the case, as the following example demonstrates.

Minimal example
We have three Snakefiles,

$ ls -1
nested-1st-level.Snakefile
nested-2nd-level.Snakefile
top.Snakefile

with their content

$ cat nested-2nd-level.Snakefile 
from snakemake.utils import min_version
min_version("6.0")

rule leaf:
    output:
        ".done"
    shell:
        "touch {output}"

rule default:
    input:
        rules.leaf.output
    default_target: True

$ cat nested-1st-level.Snakefile 
from snakemake.utils import min_version
min_version("6.0")

module module_2:
    snakefile: "nested-2nd-level.Snakefile"
    prefix: "nested-2nd-level"

use rule * from module_2 as nested_2nd_level_*

rule default:
    input:
       rules.nested_2nd_level_default.input
    default_target: True

$ cat top.Snakefile 
from snakemake.utils import min_version
min_version("6.0")

module module_1:
    snakefile: "nested-1st-level.Snakefile"
    prefix: "nested-1st-level"

use rule * from module_1 as nested_1st_level_*

rule default:
    input:
       rules.nested_1st_level_default.input
    default_target: True

so top.Snakefile imports, renames, and prefixes nested-1st-level.Snakefile, and nested-1st-level.Snakefile in turn imports, renames, and prefixes nested-2nd-level.Snakefile.

All Snakefiles are syntactically correct:

$ snakemake --lint --snakefile nested-2nd-level.Snakefile 
Lints for rule leaf (line 4, /mnt/dat2/git/ContactEngineering/snakemake-nested-modules/nested-2nd-level.Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In distributed environments, this means that errors are harder to discover. In local environments, output of concurrent jobs will be
      mixed and become unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

$ snakemake --lint --snakefile nested-1st-level.Snakefile 
Lints for rule nested_2nd_level_leaf (line 4, /mnt/dat2/git/ContactEngineering/snakemake-nested-modules/nested-2nd-level.Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In distributed environments, this means that errors are harder to discover. In local environments, output of concurrent jobs will be
      mixed and become unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

$ snakemake --lint --snakefile top.Snakefile 
Lints for rule nested_2nd_level_leaf (line 4, /mnt/dat2/git/ContactEngineering/snakemake-nested-modules/nested-2nd-level.Snakefile):
    * No log directive defined:
      Without a log directive, all output will be printed to the terminal. In distributed environments, this means that errors are harder to discover. In local environments, output of concurrent jobs will be
      mixed and become unreadable.
      Also see:
      https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#log-files
    * Specify a conda environment or container for each rule.:
      This way, the used software for each specific step is documented, and the workflow can be executed on any machine without prerequisites.
      Also see:
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#integrated-package-management
      https://snakemake.readthedocs.io/en/latest/snakefiles/deployment.html#running-jobs-in-containers

The bottom Snakefile has its rules and output files as expected,

$ snakemake --list --snakefile nested-2nd-level.Snakefile 
default
leaf

$ snakemake --summary --snakefile nested-2nd-level.Snakefile 
Building DAG of jobs...
output_file	date	rule	version	log-file(s)	status	plan
.done	-	-	-	-	missing	update pending

and so does the middle Snakefile,

$ snakemake --list --snakefile nested-1st-level.Snakefile 
default
nested_2nd_level_default
nested_2nd_level_leaf

$ snakemake --summary --snakefile nested-1st-level.Snakefile 
Building DAG of jobs...
output_file	date	rule	version	log-file(s)	status	plan
nested-2nd-level/.done	-	-	-	-	missing	update pending

but at the top-level Snakefile, the behavior deviates from what I would expect of clean recursive inclusion, which makes modules impractical to use when nesting deeper than one level:

$ snakemake --list --snakefile top.Snakefile 
default
nested_1st_level_default
nested_2nd_level_default
nested_2nd_level_leaf

$ snakemake --summary --snakefile top.Snakefile 
Building DAG of jobs...
MissingInputException in rule default in file /mnt/dat2/git/ContactEngineering/snakemake-nested-modules/top.Snakefile, line 10:
Missing input files for rule default:
    affected files:
        nested-1st-level/nested-2nd-level/.done

Instead, I would have expected this output,

$ snakemake --list --snakefile top.Snakefile 
default
nested_1st_level_default
nested_1st_level_nested_2nd_level_default
nested_1st_level_nested_2nd_level_leaf

$ snakemake --summary --snakefile top.Snakefile 
Building DAG of jobs...
output_file	date	rule	version	log-file(s)	status	plan
nested-1st-level/nested-2nd-level/.done	-	-	-	-	missing	update pending
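
The expected listing above follows from applying each module's rename pattern and path prefix once per nesting level, outermost first. A minimal Python sketch of that expectation is given here. It only models the behavior requested in this report and is not Snakemake's implementation; `compose` and `chain` are hypothetical names introduced for illustration, and the rule lists are copied from the outputs above.

```python
from pathlib import PurePosixPath


def compose(chain, rule, output):
    """Apply rule-name and path prefixes of nested modules, outermost first.

    chain is a list of (rule_name_prefix, path_prefix) pairs, one per
    module level. This models the expected behavior only; it is not
    taken from Snakemake's code base.
    """
    name = "".join(rule_prefix for rule_prefix, _ in chain) + rule
    path = PurePosixPath(*[path_prefix for _, path_prefix in chain], output)
    return name, str(path)


# Prefixes as declared in top.Snakefile and nested-1st-level.Snakefile.
chain = [
    ("nested_1st_level_", "nested-1st-level"),
    ("nested_2nd_level_", "nested-2nd-level"),
]

expected_leaf = compose(chain, "leaf", ".done")

# Rule names expected at the top level under strictly recursive renaming.
expected_rules = {
    "default",
    compose(chain[:1], "default", "")[0],
    compose(chain, "default", "")[0],
    compose(chain, "leaf", "")[0],
}

# Rule names actually reported by `snakemake --list --snakefile top.Snakefile`.
observed_rules = {
    "default",
    "nested_1st_level_default",
    "nested_2nd_level_default",
    "nested_2nd_level_leaf",
}

# Names that recursive renaming would produce but that are missing in practice.
missing_rules = expected_rules - observed_rules
print(expected_leaf)
print(sorted(missing_rules))
```

In this model the two rules that originate in nested-2nd-level.Snakefile are exactly the ones whose expected names are missing from the observed listing: they carry only the inner rename and lack the outer one.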

Additional context
I did not find documentation, other issues, or Stack Overflow posts on this limiting behavior, hence I am reporting it here as a bug.

I started using Snakemake in December for quick-and-dirty parameter space exploration on my local machine and have been pretty happy with it for that purpose. Very useful! It gets the user productive quickly, with hardly any learning curve, as long as one knows Python and understands Makefiles, at least conceptually. But now that I am trying to organize things more cleanly, I am unfortunately slowed down here.

Thanks for clarifying whether this behavior is intended for some reason or whether it is worth fixing.

jotelha added the bug label Feb 1, 2023
jotelha changed the title from "Unintuitive and limitting behavior in naming and prefixing nested modules" to "Unintuitive and limiting behavior in naming and prefixing nested modules" Feb 5, 2023
@johanneskoester
Contributor

I've just merged PR #1817, which I think should fix the naming issues. If you think there is still something wrong with the naming or with the prefixing, may I ask you to provide a PR with a test case that illustrates the failure? Thanks a lot!

@johanneskoester johanneskoester self-assigned this Aug 5, 2023