Snakemake version
7.16.0
Describe the bug
Temp files get deleted too early even when they are still required as input of an unfinished rule downstream of a checkpoint, resulting in a premature, incomplete finish like #823. It happens when the checkpoint output is a directory and the downstream rules take both the directory contents and a temp() output from an earlier rule as input. It seems fine when the checkpoint output is just a file.
I'm not sure whether this is the same bug as #823 or just an unsupported way of writing a Snakemake workflow, so I'm reporting it as a new issue.
Logs
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job count min threads max threads
--------- ------- ------------- -------------
all 1 1 1
process01 1 1 1
process02 1 1 1
somestep 1 1 1
total 4 1 1
Select jobs to execute...
[Tue Oct 25 17:23:02 2022]
rule process01:
output: processed01.txt
jobid: 3
reason: Missing output files: processed01.txt
resources: tmpdir=/tmp
[Tue Oct 25 17:23:02 2022]
Finished job 3.
1 of 4 steps (25%) done
Select jobs to execute...
[Tue Oct 25 17:23:02 2022]
rule process02:
input: processed01.txt
output: processed02.txt
jobid: 2
reason: Missing output files: processed02.txt; Input files updated by another job: processed01.txt
resources: tmpdir=/tmp
[Tue Oct 25 17:23:02 2022]
Finished job 2.
2 of 4 steps (50%) done
Removing temporary output processed01.txt.
Select jobs to execute...
[Tue Oct 25 17:23:02 2022]
checkpoint somestep:
input: processed02.txt
output: my_directory
jobid: 1
reason: Missing output files: my_directory; Input files updated by another job: processed02.txt
resources: tmpdir=/tmp
Downstream jobs will be updated after completion.
[Tue Oct 25 17:23:02 2022]
Finished job 1.
3 of 4 steps (75%) done
BUG: Out of jobs ready to be started, but not all files built yet. Please check https://github.com/snakemake/snakemake/issues/823 for more information.
Remaining jobs:
- all:
- process: processed1/3.txt
- process: processed1/2.txt
- process: processed1/1.txt
Complete log: .snakemake/log/2022-10-25T172301.913665.snakemake.log
Minimal example
import os

# a target rule to define the desired final output
rule all:
    input:
        lambda wildcards: aggregate_input(wildcards),

rule process01:
    output:
        temp("processed01.txt"),
    shell:
        "echo PROCESSED > {output}"

rule process02:
    input:
        "processed01.txt",
    output:
        "processed02.txt",
    shell:
        "echo PROCESSED > {output}"

# the checkpoint that shall trigger re-evaluation of the DAG;
# a number of files are created in a defined directory
checkpoint somestep:
    input:
        "processed02.txt",
    output:
        directory("my_directory/"),
    shell:
        """
        mkdir my_directory/
        cd my_directory
        for i in 1 2 3; do touch $i.txt; done
        """

rule process:
    input:
        "processed01.txt",
        "my_directory/{i}.txt",
    output:
        "processed1/{i}.txt",
    shell:
        "echo PROCESSED > {output}"

# collect the per-file output of rule process after the checkpoint has run
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.somestep.get(**wildcards).output[0]
    return expand(
        "processed1/{i}.txt",
        i=glob_wildcards(os.path.join(checkpoint_output, "{i}.txt")).i,
    )
Additional context
The unfinished jobs can be resolved by a re-run, which re-creates the temp files. With the --keep-going flag, the workflow exits with 0 but still stops at the same step. It only finishes successfully in a single run when the temp files are kept, e.g. using --notemp.
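For reference, these are the invocations meant above (assuming the minimal example is the active Snakefile; the core count does not matter):

snakemake --cores 1               # stops with the BUG message; an identical second run then completes
snakemake --cores 1 --keep-going  # exits with 0 but still stops at the same step
snakemake --cores 1 --notemp      # finishes in one go, since processed01.txt is never deleted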
This seems to violate the contract of temp(), whose files should only be deleted after all rules that use them as input have completed. Also, it is a bit unpleasant to have to re-collect the output. 😿
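For context, here is a minimal sketch (rule and file names made up) of the contract temp() normally honors, without any checkpoint involved:

rule all:
    input:
        "b.txt",
        "c.txt",

rule a:
    output:
        temp("a.txt"),
    shell:
        "echo A > {output}"

rule b:
    input:
        "a.txt",
    output:
        "b.txt",
    shell:
        "cp {input} {output}"

# a.txt is removed only once both consumers, b and c, have finished
rule c:
    input:
        "a.txt",
    output:
        "c.txt",
    shell:
        "cp {input} {output}"

In the checkpoint example above, by contrast, processed01.txt is removed before rule process, which also consumes it, has even been scheduled.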
temp() works correctly if the temp file is also an input of the checkpoint. Something related to the DAG re-evaluation at the checkpoint?
The workflow below works.
import os

# a target rule to define the desired final output
rule all:
    input:
        lambda wildcards: aggregate_input(wildcards),

rule process01:
    output:
        temp("processed01.txt"),
    shell:
        "echo PROCESSED > {output}"

rule process02:
    input:
        "processed01.txt",
    output:
        "processed02.txt",
    shell:
        "echo PROCESSED > {output}"

# the checkpoint that shall trigger re-evaluation of the DAG;
# a number of files are created in a defined directory
checkpoint somestep:
    input:
        "processed01.txt",
        "processed02.txt",
    output:
        directory("my_directory/"),
    shell:
        """
        mkdir my_directory/
        cd my_directory
        for i in 1 2 3; do touch $i.txt; done
        """

rule process:
    input:
        "processed01.txt",
        "my_directory/{i}.txt",
    output:
        "processed1/{i}.txt",
    shell:
        "echo PROCESSED > {output}"

# collect the per-file output of rule process after the checkpoint has run
def aggregate_input(wildcards):
    checkpoint_output = checkpoints.somestep.get(**wildcards).output[0]
    return expand(
        "processed1/{i}.txt",
        i=glob_wildcards(os.path.join(checkpoint_output, "{i}.txt")).i,
    )
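In other words, the only change relative to the failing example is the extra processed01.txt entry in the checkpoint's input: listing the temp file as a direct input of the checkpoint seems to keep it from being deleted across the DAG re-evaluation.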