Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect shell command executed when defined within a loop. #2832

Open
ethan-shanahan opened this issue Apr 21, 2024 · 5 comments
Open

Incorrect shell command executed when defined within a loop. #2832

ethan-shanahan opened this issue Apr 21, 2024 · 5 comments
Labels
bug Something isn't working

Comments

@ethan-shanahan
Copy link

Snakemake version

8.10.7 and 8.1.0

Describe the bug

When rules are defined within a loop, some rules execute the wrong command if the shell keyword is given an f-string containing variables assigned by the loop. The shell appears to interpret the f-string correctly, via inspection using the -p flag, however, a different command is actually executed.

Logs

n/a

Minimal example

rule all:
    input: "outputfile_a", "outputfile_b"

for l in ["a", "b"]:
    rule:
        name:   f"letter_{l}"
        input:  f"existingfile_{l}"
        output: f"outputfile_{l}"
        shell:  f"echo {l} > outputfile_{l}"

The first rule executed is letter_b which completes successfully, creating the file outputfile_b. Then rule letter_a is executed, but outputfile_a is not created and a MissingOutputException is raised.

Using the -p flag from the command line indicates that the correct command is run by both rules. However, if we replace the shell assignment with the following, we can see that in fact a different command is run.

        shell:  f"echo checking_{l} && echo {l} > outputfile_{l}"

It appears that the incorrect command is the same command that was run by rule letter_b. Below is a snippet of the output generated. The last two lines are printed due to the -p flag and the echo checking_{l} command respectively.

localrule letter_a:
    input: inputfile_a
    output: outputfile_a
    jobid: 1
    reason: Missing output files: outputfile_a
    resources: tmpdir=C:\Users\Ethan\AppData\Local\Temp

echo checking_a && echo a > outputfile_a
checking_b

Additional context

I am on Windows and using Python 3.12.3. I installed Snakemake from bioconda.
This is my first "Issue", apologies if I've made a mistake!

@ethan-shanahan ethan-shanahan added the bug Something isn't working label Apr 21, 2024
@fgvieira
Copy link
Contributor

Never tried to make a for loop to define rules, but it is always nice to see new approaches. 😄
However, I don't think you need it since you can use {l} as a wildcards:

rule all:
    input: "outputfile_a", "outputfile_b"

rule:
    name:   "letter_{l}"
    input:  "existingfile_{l}"
    output: "outputfile_{l}"
    shell:  "echo {l} > outputfile_{l}"

@ethan-shanahan
Copy link
Author

Thanks, that may help some people but not me. I only provided a minimal example of the bug of course. My actual application makes wildcards impractical if even impossible.
Using a for loop to define rules is demonstrated in the documentation here.

@cademirch
Copy link
Contributor

This is an interesting one. I have never dug into Snakemake's parser before (so take this with a grain of salt) but it seems the issue is there.

The parser takes the Snakefile and makes it into python.

This snakefile:

rule all:
    input: "bcftools/outputfile", "freebayes/outputfile"


for tool in ["bcftools", "freebayes"]:
    rule:
        name: f"call_variants_{tool}"
        output: f"{tool}/outputfile"
        shell: f"{tool} > {{output}}"

Gives this python:

@workflow.rule(name='all', lineno=1, snakefile='/Users/cade/dev/sandbox/snakemake_issues/issue_2824/Snakefile')

@workflow.input( "bcftools/outputfile" , "freebayes/outputfile"

)


@workflow.norun()
@workflow.run
def __rule_all(input, output, params, wildcards, threads, resources, log, rule, conda_env, container_img, singularity_args, use_singularity, env_modules, bench_record, jobid, is_shell, bench_iteration, cleanup_scripts, shadow_dir, edit_notebook, conda_base_path, basedir, sourcecache_path, runtime_sourcecache_path, __is_snakemake_rule_func=True):
        pass
for tool in [ "bcftools" , "freebayes" ] :
                                @workflow.rule(name=None, lineno=13, snakefile='/Users/cade/dev/sandbox/snakemake_issues/issue_2824/Snakefile')

                                @workflow.name( f"call_variants_{tool}"

                                )

                                @workflow.output( f"{tool}/outputfile"

                                )
                                @workflow.shellcmd ( f"{tool} > {{output}}" 
                                )
                                @workflow.run
                                def __rule_2(input, output, params, wildcards, threads, resources, log, rule, conda_env, container_img, singularity_args, use_singularity, env_modules, bench_record, jobid, is_shell, bench_iteration, cleanup_scripts, shadow_dir, edit_notebook, conda_base_path, basedir, sourcecache_path, runtime_sourcecache_path, __is_snakemake_rule_func=True):
                                        shell ( f"{tool} > {{output}}" , bench_record=bench_record, bench_iteration=bench_iteration
                                )

The functions at the bottom of each rule (decorated with @workflow.run) are called when it actually comes time to run the rule. Since the function name is static (__rule_2), then it'll be redefined each iteration of the for loop, leaving only the last definition.

The name keyword (added in #811) seems to be mostly superficial in that it's only used to change the rule name after the Snakefile is parsed. This was pretty confusing since the dryruns for the above workflow show the rules call_variants_bcftools/freebayes. However, both rules use the same run function.

Furthermore, the full shell command is not formatted until

cmd = format(cmd, *args, stepout=2, **kwargs)

which is why the proper output files were being created, but the shell command was wrong.

All of this to say, I'm not really sure the example in the docs works, and possibly never did? I'm still not super familiar with all of the inner workings of Snakemake (though I just learned alot lol) and it is getting late, so I very well could be wrong!

I'll think about how to go about fixing this, but it may take me awhile to get to this. Hopefully someone else can chime in!

Lastly, out of curiosity @ethan-shanahan what is your use case for this? Might be able to come up with a workaround using wildcards to get you something working until this is fixed.

@ethan-shanahan
Copy link
Author

That's really fascinating @cademirch. It sounds like you're definitely on the right track! Is there a way to output the python code that the Snakemake parser generates for any Snakefile?

As for my use-case, its rather convoluted. Simply put, I'm building on top of Snakemake to generalise/automate the creation of Snakefiles. Thankfully, I have now found a workaround for myself. I first run a metaSnakefile.py script that contains all the logic and looping to resolve things like formatted strings, etc. This script creates the Snakefile with all the required rules written out explicitly. I'm still keen on this bug being fixed though! :)

@cademirch
Copy link
Contributor

Of the top of my head, no. I just put print(join_compilation) after this line:

join_compilation = "".join(format_tokens(compilation))

Glad you were able to figure something out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants