MANY missing output files; Input files updated by another job #65

Closed
mihinduk opened this issue Feb 18, 2022 · 1 comment

@mihinduk
Contributor

Hi Mike,
After Hecatomb crashed, I ran this:
hecatomb run --reads RC2_freeze_2_samples_C.tsv --profile slurm --configfile hecatomb.config.yaml --snake=-n --snake=--reason


[Thu Feb 17 08:52:52 2022]
rule secondary_nt_lca_table:
input: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.m8
output: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.lin
log: hecatomb_out/STDERR/secondary_nt_lca_table.log
jobid: 2034
benchmark: hecatomb_out/BENCHMARKS/secondary_nt_lca_table.txt
reason: Missing output files: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.lin
resources: mem_mb=16000, disk_mb=893298, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule secondary_nt_calc_lca:
input: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.lin, /opt/apps/labs/sahlab/software/miniconda3/envs/hecatomb/snakemake/workflow/../../databases/tax/taxonomy
output: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/lca.lineage, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/secondary_nt_lca.tsv
log: hecatomb_out/STDERR/secondary_nt_calc_lca.log
jobid: 2033
benchmark: hecatomb_out/BENCHMARKS/secondary_nt_calc_lca.txt
reason: Missing output files: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/secondary_nt_lca.tsv; Input files updated by another job: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.lin
threads: 24
resources: mem_mb=64000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

    {
    # calculate lca and lineage
    taxonkit lca -i 2 -s ';' --data-dir /opt/apps/labs/sahlab/software/miniconda3/envs/hecatomb/snakemake/workflow/../../databases/tax/taxonomy hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/all.lin |
        taxonkit lineage -i 3 --data-dir /opt/apps/labs/sahlab/software/miniconda3/envs/hecatomb/snakemake/workflow/../../databases/tax/taxonomy |
        cut --complement -f 2 > hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/lca.lineage 2> hecatomb_out/STDERR/secondary_nt_calc_lca.log

    # Reformat lineages
    awk -F '\t' '$2 != 0' hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/lca.lineage |
        taxonkit reformat --data-dir /opt/apps/labs/sahlab/software/miniconda3/envs/hecatomb/snakemake/workflow/../../databases/tax/taxonomy -i 3 -f "{k}\t{p}\t{c}\t{o}\t{f}\t{g}\t{s}" -F --fill-miss-rank 2>> hecatomb_out/STDERR/secondary_nt_calc_lca.log |
        cut --complement -f3 > hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/secondary_nt_lca.tsv
    } &> hecatomb_out/STDERR/secondary_nt_calc_lca.log
    rm hecatomb_out/STDERR/secondary_nt_calc_lca.log

[Thu Feb 17 08:52:52 2022]
rule SECONDARY_NT_generate_output_table:
input: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/tophit.m8, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/SECONDARY_nt.tsv, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/secondary_nt_lca.tsv, hecatomb_out/RESULTS/sampleSeqCounts.tsv, /opt/apps/labs/sahlab/software/miniconda3/envs/hecatomb/snakemake/workflow/../../databases/tables/2020_07_27_Viral_classification_table_ICTV2019.txt
output: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv
log: hecatomb_out/STDERR/SECONDARY_NT_generate_output_table.log
jobid: 2026
benchmark: hecatomb_out/BENCHMARKS/SECONDARY_NT_generate_output_table.txt
reason: Missing output files: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv; Input files updated by another job: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/results/secondary_nt_lca.tsv
resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule combine_AA_NT:
input: hecatomb_out/RESULTS/MMSEQS_AA_SECONDARY/AA_bigtable.tsv, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv
output: hecatomb_out/RESULTS/bigtable.tsv
log: hecatomb_out/STDERR/combine_AA_NT.log
jobid: 2036
benchmark: hecatomb_out/BENCHMARKS/combine_AA_NT.txt
reason: Missing output files: hecatomb_out/RESULTS/bigtable.tsv; Input files updated by another job: hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv
resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

    { cat hecatomb_out/RESULTS/MMSEQS_AA_SECONDARY/AA_bigtable.tsv > hecatomb_out/RESULTS/bigtable.tsv;
    tail -n+2 hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv >> hecatomb_out/RESULTS/bigtable.tsv; } &> hecatomb_out/STDERR/combine_AA_NT.log
    rm hecatomb_out/STDERR/combine_AA_NT.log

[Thu Feb 17 08:52:52 2022]
rule tax_level_counts:
input: hecatomb_out/RESULTS/bigtable.tsv
output: hecatomb_report/taxonLevelCounts.tsv
log: hecatomb_out/STDERR/tax_level_counts.log
jobid: 2045
reason: Missing output files: hecatomb_report/taxonLevelCounts.tsv; Input files updated by another job: hecatomb_out/RESULTS/bigtable.tsv
threads: 2
resources: mem_mb=16000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule contig_read_taxonomy:
input: hecatomb_out/PROCESSING/MAPPING/assembly.seqtable.bam, hecatomb_out/PROCESSING/MAPPING/assembly.seqtable.bam.bai, hecatomb_out/RESULTS/bigtable.tsv
output: hecatomb_out/RESULTS/contigSeqTable.tsv
log: hecatomb_out/STDERR/contig_read_taxonomy.log
jobid: 2041
benchmark: hecatomb_out/BENCHMARKS/contig_read_taxonomy.txt
reason: Missing output files: hecatomb_out/RESULTS/contigSeqTable.tsv; Input files updated by another job: hecatomb_out/RESULTS/bigtable.tsv
threads: 2
resources: mem_mb=16000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule krona_text_format:
input: hecatomb_out/RESULTS/bigtable.tsv
output: hecatomb_report/krona.txt
log: hecatomb_out/STDERR/krona_text_format.log
jobid: 2047
benchmark: hecatomb_out/BENCHMARKS/krona_text_format.txt
reason: Missing output files: hecatomb_report/krona.txt; Input files updated by another job: hecatomb_out/RESULTS/bigtable.tsv
resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule contig_krona_text_format:
input: hecatomb_out/RESULTS/contigSeqTable.tsv
output: hecatomb_report/contigKrona.txt
log: hecatomb_out/STDERR/contig_krona_text_format.log
jobid: 2043
reason: Missing output files: hecatomb_report/contigKrona.txt; Input files updated by another job: hecatomb_out/RESULTS/contigSeqTable.tsv
resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

[Thu Feb 17 08:52:52 2022]
rule krona_plot:
input: hecatomb_report/krona.txt
output: hecatomb_report/krona.html
log: hecatomb_out/STDERR/krona_plot.log
jobid: 2046
benchmark: hecatomb_out/BENCHMARKS/krona_plot.txt
reason: Missing output files: hecatomb_report/krona.html; Input files updated by another job: hecatomb_report/krona.txt
resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

    ktImportText hecatomb_report/krona.txt -o hecatomb_report/krona.html &> hecatomb_out/STDERR/krona_plot.log
    rm hecatomb_out/STDERR/krona_plot.log

[Thu Feb 17 08:52:52 2022]
rule contig_krona_plot:
input: hecatomb_report/contigKrona.txt
output: hecatomb_report/contigKrona.html
log: hecatomb_out/STDERR/contig_krona_plot.log
jobid: 2042
reason: Missing output files: hecatomb_report/contigKrona.html; Input files updated by another job: hecatomb_report/contigKrona.txt
resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

    ktImportText hecatomb_report/contigKrona.txt -o hecatomb_report/contigKrona.html &> hecatomb_out/STDERR/contig_krona_plot.log
    rm hecatomb_out/STDERR/contig_krona_plot.log

[Thu Feb 17 08:52:52 2022]
localrule all:
input: hecatomb_out/RESULTS/seqtable.fasta, hecatomb_out/RESULTS/sampleSeqCounts.tsv, hecatomb_out/RESULTS/seqtable.properties.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/assembly.fasta, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/MAPPING/contig_count_table.tsv, hecatomb_out/RESULTS/assembly.properties.tsv, hecatomb_out/RESULTS/MMSEQS_AA_SECONDARY/AA_bigtable.tsv, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv, hecatomb_out/RESULTS/bigtable.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_phylum_summary.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_class_summary.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_order_summary.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_family_summary.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_genus_summary.tsv, hecatomb_out/PROCESSING/ASSEMBLY/CONTIG_DICTIONARY/FLYE/SECONDARY_nt_species_summary.tsv, hecatomb_out/PROCESSING/MAPPING/assembly.seqtable.bam, hecatomb_out/PROCESSING/MAPPING/assembly.seqtable.bam.bai, hecatomb_out/RESULTS/contigSeqTable.tsv, hecatomb_report/contigKrona.html, hecatomb_report/Step00_counts.tsv, hecatomb_report/Step01_counts.tsv, hecatomb_report/Step02_counts.tsv, hecatomb_report/Step03_counts.tsv, hecatomb_report/Step04_counts.tsv, hecatomb_report/Step05_counts.tsv, hecatomb_report/Step06_counts.tsv, hecatomb_report/Step07_counts.tsv, hecatomb_report/Step08_counts.tsv, hecatomb_report/Step09_counts.tsv, hecatomb_report/Step10_counts.tsv, hecatomb_report/Step11_counts.tsv, hecatomb_report/Step12_counts.tsv, hecatomb_report/Step13_counts.tsv, hecatomb_report/Sankey.svg, hecatomb_report/hecatomb.samples.tsv, hecatomb_report/taxonLevelCounts.tsv, hecatomb_report/krona.html
jobid: 0
reason: Input files updated by another job: hecatomb_out/RESULTS/bigtable.tsv, hecatomb_report/contigKrona.html, hecatomb_report/krona.html, hecatomb_report/taxonLevelCounts.tsv, hecatomb_out/RESULTS/contigSeqTable.tsv, hecatomb_out/RESULTS/MMSEQS_NT_SECONDARY/NT_bigtable.tsv
resources: mem_mb=2000, disk_mb=, tmpdir=/tmp, time=1440, jobs=100

Job stats:
job                                   count    min threads    max threads
----------------------------------  -------  -------------  -------------
SECONDARY_NT_generate_output_table        1              1              1
all                                       1              1              1
combine_AA_NT                             1              1              1
contig_krona_plot                         1              1              1
contig_krona_text_format                  1              1              1
contig_read_taxonomy                      1              2              2
krona_plot                                1              1              1
krona_text_format                         1              1              1
secondary_nt_calc_lca                     1             24             24
secondary_nt_lca_table                    1              1              1
tax_level_counts                          1              2              2
total                                    11              1             24

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

What the hecatomb?

@beardymcjohnface
Collaborator

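I'm not sure what the problem is. When you run Snakemake with --reason, it prints an explanation of why it is scheduling each rule. 'Missing output files' and 'Input files updated by another job' are simply the reasons Snakemake gives for planning to run those rules: here the chain starts at secondary_nt_lca_table, whose output all.lin is missing, and every downstream rule is scheduled because one of its inputs will be regenerated.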
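For a long dry-run log like the one above, the reasons can be condensed to one line per rule. This is a minimal sketch, assuming bash and a POSIX awk, and assuming each reason: entry sits on a single line (the terminal wrapped the paste above). summarize_reasons is a hypothetical helper, not part of Hecatomb or Snakemake.

```shell
# Hypothetical helper (not part of Hecatomb or Snakemake): reads a saved
# `snakemake -n --reason` log on stdin and prints one line per planned job,
# "<rule><TAB><reason categories>". Assumes reason lines are not wrapped.
summarize_reasons() {
  awk '
    # Job headers look like "rule NAME:" or "localrule NAME:".
    /^[ \t]*(local)?rule [^ ]+:$/ {
      rule = $2
      sub(/:$/, "", rule)        # drop the trailing colon
      next
    }
    # Reason lines look like "reason: A: files; B: files".
    /^[ \t]*reason: / {
      line = $0
      sub(/^[ \t]*reason: /, "", line)
      n = split(line, parts, "; ")
      labels = ""
      for (i = 1; i <= n; i++) {
        label = parts[i]
        sub(/:.*/, "", label)    # keep the category, drop the file list
        sep = (i > 1) ? "; " : ""
        labels = labels sep label
      }
      print rule "\t" labels
    }
  '
}
```

For example, append > dryrun.log 2>&1 to the dry-run command above, then run summarize_reasons < dryrun.log to get one rule and its reason categories per line.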
