Incorrect shell command executed when defined within a loop. #2832

ethan-shanahan · 2024-04-21T20:39:54Z

Snakemake version

8.10.7 and 8.1.0

Describe the bug

When rules are defined within a loop, some rules execute the wrong command if the shell keyword is given an f-string containing variables assigned by the loop. The shell appears to interpret the f-string correctly, via inspection using the -p flag, however, a different command is actually executed.

Logs

n/a

Minimal example

rule all:
    input: "outputfile_a", "outputfile_b"

for l in ["a", "b"]:
    rule:
        name:   f"letter_{l}"
        input:  f"existingfile_{l}"
        output: f"outputfile_{l}"
        shell:  f"echo {l} > outputfile_{l}"

The first rule executed is letter_b which completes successfully, creating the file outputfile_b. Then rule letter_a is executed, but outputfile_a is not created and a MissingOutputException is raised.

Using the -p flag from the command line indicates that the correct command is run by both rules. However, if we replace the shell assignment with the following, we can see that in fact a different command is run.

        shell:  f"echo checking_{l} && echo {l} > outputfile_{l}"

It appears that the incorrect command is the same command that was run by rule letter_b. Below is a snippet of the output generated. The last two lines are printed due to the -p flag and the echo checking_{l} command respectively.

localrule letter_a:
    input: inputfile_a
    output: outputfile_a
    jobid: 1
    reason: Missing output files: outputfile_a
    resources: tmpdir=C:\Users\Ethan\AppData\Local\Temp

echo checking_a && echo a > outputfile_a
checking_b

Additional context

I am on Windows and using Python 3.12.3. I installed Snakemake from bioconda.
This is my first "Issue", apologies if I've made a mistake!

The text was updated successfully, but these errors were encountered:

fgvieira · 2024-04-23T08:34:39Z

Never tried to make a for loop to define rules, but it is always nice to see new approaches. 😄
However, I don't think you need it since you can use {l} as a wildcards:

rule all:
    input: "outputfile_a", "outputfile_b"

rule:
    name:   "letter_{l}"
    input:  "existingfile_{l}"
    output: "outputfile_{l}"
    shell:  "echo {l} > outputfile_{l}"

ethan-shanahan · 2024-04-23T11:53:45Z

Thanks, that may help some people but not me. I only provided a minimal example of the bug of course. My actual application makes wildcards impractical if even impossible.
Using a for loop to define rules is demonstrated in the documentation here.

cademirch · 2024-04-24T06:42:55Z

This is an interesting one. I have never dug into Snakemake's parser before (so take this with a grain of salt) but it seems the issue is there.

The parser takes the Snakefile and makes it into python.

This snakefile:

rule all:
    input: "bcftools/outputfile", "freebayes/outputfile"


for tool in ["bcftools", "freebayes"]:
    rule:
        name: f"call_variants_{tool}"
        output: f"{tool}/outputfile"
        shell: f"{tool} > {{output}}"

Gives this python:

@workflow.rule(name='all', lineno=1, snakefile='/Users/cade/dev/sandbox/snakemake_issues/issue_2824/Snakefile')

@workflow.input( "bcftools/outputfile" , "freebayes/outputfile"

)


@workflow.norun()
@workflow.run
def __rule_all(input, output, params, wildcards, threads, resources, log, rule, conda_env, container_img, singularity_args, use_singularity, env_modules, bench_record, jobid, is_shell, bench_iteration, cleanup_scripts, shadow_dir, edit_notebook, conda_base_path, basedir, sourcecache_path, runtime_sourcecache_path, __is_snakemake_rule_func=True):
        pass
for tool in [ "bcftools" , "freebayes" ] :
                                @workflow.rule(name=None, lineno=13, snakefile='/Users/cade/dev/sandbox/snakemake_issues/issue_2824/Snakefile')

                                @workflow.name( f"call_variants_{tool}"

                                )

                                @workflow.output( f"{tool}/outputfile"

                                )
                                @workflow.shellcmd ( f"{tool} > {{output}}" 
                                )
                                @workflow.run
                                def __rule_2(input, output, params, wildcards, threads, resources, log, rule, conda_env, container_img, singularity_args, use_singularity, env_modules, bench_record, jobid, is_shell, bench_iteration, cleanup_scripts, shadow_dir, edit_notebook, conda_base_path, basedir, sourcecache_path, runtime_sourcecache_path, __is_snakemake_rule_func=True):
                                        shell ( f"{tool} > {{output}}" , bench_record=bench_record, bench_iteration=bench_iteration
                                )

The functions at the bottom of each rule (decorated with @workflow.run) are called when it actually comes time to run the rule. Since the function name is static (__rule_2), then it'll be redefined each iteration of the for loop, leaving only the last definition.

The name keyword (added in #811) seems to be mostly superficial in that it's only used to change the rule name after the Snakefile is parsed. This was pretty confusing since the dryruns for the above workflow show the rules call_variants_bcftools/freebayes. However, both rules use the same run function.

Furthermore, the full shell command is not formatted until

snakemake/snakemake/shell.py

Line 135 in 6ec1e93

cmd = format(cmd, *args, stepout=2, **kwargs)

which is why the proper output files were being created, but the shell command was wrong.

All of this to say, I'm not really sure the example in the docs works, and possibly never did? I'm still not super familiar with all of the inner workings of Snakemake (though I just learned alot lol) and it is getting late, so I very well could be wrong!

I'll think about how to go about fixing this, but it may take me awhile to get to this. Hopefully someone else can chime in!

Lastly, out of curiosity @ethan-shanahan what is your use case for this? Might be able to come up with a workaround using wildcards to get you something working until this is fixed.

ethan-shanahan · 2024-04-24T07:59:56Z

That's really fascinating @cademirch. It sounds like you're definitely on the right track! Is there a way to output the python code that the Snakemake parser generates for any Snakefile?

As for my use-case, its rather convoluted. Simply put, I'm building on top of Snakemake to generalise/automate the creation of Snakefiles. Thankfully, I have now found a workaround for myself. I first run a metaSnakefile.py script that contains all the logic and looping to resolve things like formatted strings, etc. This script creates the Snakefile with all the required rules written out explicitly. I'm still keen on this bug being fixed though! :)

cademirch · 2024-04-24T20:23:28Z

Of the top of my head, no. I just put print(join_compilation) after this line:

snakemake/snakemake/parser.py

Line 1307 in de12463

join_compilation = "".join(format_tokens(compilation))

Glad you were able to figure something out!

ethan-shanahan added the bug Something isn't working label Apr 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect shell command executed when defined within a loop. #2832

Incorrect shell command executed when defined within a loop. #2832

ethan-shanahan commented Apr 21, 2024

fgvieira commented Apr 23, 2024

ethan-shanahan commented Apr 23, 2024

cademirch commented Apr 24, 2024

ethan-shanahan commented Apr 24, 2024

cademirch commented Apr 24, 2024

Incorrect shell command executed when defined within a loop. #2832

Incorrect shell command executed when defined within a loop. #2832

Comments

ethan-shanahan commented Apr 21, 2024

fgvieira commented Apr 23, 2024

ethan-shanahan commented Apr 23, 2024

cademirch commented Apr 24, 2024

ethan-shanahan commented Apr 24, 2024

cademirch commented Apr 24, 2024