DAStool finishes without errors but output not recognized by Anvi'o #2160

Closed
eneas01 opened this issue Oct 31, 2023 · 3 comments


eneas01 commented Oct 31, 2023

Short description of the problem

Please help. DAS Tool is not creating the critical output file "OUTPUT_DASTool_scaffolds2bin.txt". It finishes without errors in logs.txt and creates "OUTPUT_DASTool_contig2bin.tsv" instead, which Anvi'o does not recognize.

anvi'o version

Anvi'o .......................................: hope (v7.1)

Profile database .............................: 38
Contigs database .............................: 20
Pan database .................................: 15
Genome data storage ..........................: 7
Auxiliary data storage .......................: 2
Structure database ...........................: 2
Metabolic modules database ...................: 2
tRNA-seq database ............................: 2

System info

Distributor ID: Ubuntu
Description: Ubuntu 22.04.2 LTS
Release: 22.04
Codename: jammy

Anvi'o was installed using Anaconda3
DAS Tool 1.1.6

Detailed description of the issue

DAStool finishes without issues, but does not produce the expected file "OUTPUT_DASTool_scaffolds2bin.txt". Instead, I have "OUTPUT_DASTool_contig2bin.tsv", which seems to contain the binning results. As a result, the bins are not added to the database, and I get an error instead. This is related to issue #1510, but not the same, because in this case DAStool did finish without errors, as far as I can tell (see logs.txt below).

Commands to reproduce the issue

Command that produced the problem:

 anvi-cluster-contigs -p 04_MAPPING_ANVIO/M22-MERGED/PROFILE.db \
                         -S concoct_bins,maxbin2_bins,metabat2_bins \
                         -c 05_CONTIGS/M22_contigs.db \
                         -C dastool_bins \
                         -T 60 \
                         --driver dastool \
                         --search-engine diamond \
                         --just-do-it

Output:

Config Error: One of the critical output files is missing                              
              ('OUTPUT_DASTool_scaffolds2bin.txt'). Please take a look at the log file:
              /tmp/tmpibqg3h47/logs.txt
cat  /tmp/tmpibqg3h47/logs.txt

Output from cat:

# DATE: 30 Oct 23 14:01:00
# CMD LINE: DAS_Tool -c /tmp/tmpibqg3h47/sequence_splits.fa -i /tmp/tmpibqg3h47/metabat2_bins.txt,/tmp/tmpibqg3h47/maxbin2_bins.txt,/tmp/tmpibqg3h47/concoct_bins.txt -l metabat2_bins,maxbin2_bins,concoct_bins -o /tmp/tmpibqg3h47/OUTPUT --threads 60 --search_engine diamond
DAS Tool 1.1.6 
Analyzing assembly 
Predicting genes 
Annotating single copy genes using diamond 
Dereplicating, aggregating, and scoring bins 

Hmmmm... no errors reported!

ls  /tmp/tmpibqg3h47

Output from ls:

concoct_bins-info.txt          OUTPUT_proteins.faa
concoct_bins.txt               OUTPUT_proteins.faa.all.b6
contig_coverages_log_norm.txt  OUTPUT_proteins.faa.archaea.scg
contig_coverages.txt           OUTPUT_proteins.faa.bacteria.scg
logs.txt                       OUTPUT_proteins.faa.findSCG.b6
maxbin2_bins-info.txt          OUTPUT_proteins.faa.scg.candidates.faa
maxbin2_bins.txt               OUTPUT.seqlength
metabat2_bins-info.txt         sequence_contigs.fa
metabat2_bins.txt              sequence_splits.fa
OUTPUT_DASTool_contig2bin.tsv  split_coverages_log_norm.txt
OUTPUT_DASTool.log             split_coverages.txt
OUTPUT_DASTool_summary.tsv
head /tmp/tmpibqg3h47/OUTPUT_DASTool_contig2bin.tsv

Output from head:

c_000000006057_split_00001	MAXBIN__040
c_000000007621_split_00001	MAXBIN__040
c_000000007723_split_00001	MAXBIN__040
c_000000008164_split_00001	MAXBIN__040
c_000000010277_split_00001	MAXBIN__040
c_000000014078_split_00001	MAXBIN__040
c_000000014541_split_00001	MAXBIN__040
c_000000017535_split_00001	MAXBIN__040
c_000000034075_split_00001	MAXBIN__040
c_000000034843_split_00001	MAXBIN__040

I think "OUTPUT_DASTool_contig2bin.tsv" is the expected output and has the right format, just under a different name. Am I wrong?
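One way to check the format question is to confirm that every line of the file has exactly two tab-separated columns (item name, bin name) and to count how many items land in each bin. The following is only a minimal sketch: `count_splits_per_bin` is a hypothetical helper written for illustration, not part of anvi'o or DAS Tool.

```python
import csv
from collections import Counter
from pathlib import Path


def count_splits_per_bin(tsv_path):
    """Return a Counter mapping each bin name to the number of items assigned to it.

    Hypothetical helper: assumes a header-less, two-column, tab-separated file
    like the `head` output shown above.
    """
    counts = Counter()
    with Path(tsv_path).open(newline="") as handle:
        for line_no, row in enumerate(csv.reader(handle, delimiter="\t"), start=1):
            if not row:
                continue  # tolerate blank lines
            if len(row) != 2:
                raise ValueError(
                    f"line {line_no}: expected 2 tab-separated columns, got {len(row)}"
                )
            _item_name, bin_name = row
            counts[bin_name] += 1
    return counts
```

Calling `count_splits_per_bin("/tmp/tmpibqg3h47/OUTPUT_DASTool_contig2bin.tsv")` would raise a `ValueError` on the first malformed line, or return per-bin counts if the layout matches.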

Lines 138-141 of dastool.py seem to check for 'OUTPUT_DASTool_scaffolds2bin.txt' and throw this error when it is not found. Can I just change it to "OUTPUT_DASTool_contig2bin.tsv" on line 138, or is this not the correct binning results file?
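Rather than hard-coding a single file name, a check like this could accept either of the two names that appear in this report. This is a sketch of the idea only; it is not the code in dastool.py or the change that was eventually committed, and `find_dastool_bins_file` and `CANDIDATE_SUFFIXES` are names made up for illustration.

```python
from pathlib import Path

# Both suffixes come from this report: the first is the name anvi'o looked for,
# the second is the name DAS Tool 1.1.6 actually wrote.
CANDIDATE_SUFFIXES = ("_DASTool_scaffolds2bin.txt", "_DASTool_contig2bin.tsv")


def find_dastool_bins_file(work_dir, prefix="OUTPUT"):
    """Return the first existing binning results file in work_dir.

    Hypothetical helper: raises FileNotFoundError listing every name tried
    if none of the candidates exists.
    """
    work_dir = Path(work_dir)
    candidates = [work_dir / f"{prefix}{suffix}" for suffix in CANDIDATE_SUFFIXES]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    tried = ", ".join(c.name for c in candidates)
    raise FileNotFoundError(f"none of the expected output files found: {tried}")
```

With the directory listing above, `find_dastool_bins_file("/tmp/tmpibqg3h47")` would return the path to OUTPUT_DASTool_contig2bin.tsv instead of failing on the older name.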

I could just run DASTool outside Anvi'o and import the collection, but as I am trying to automate a pipeline, I would prefer to stay within Anvi'o, if possible. Can you please help me solve this issue? Any help will be greatly appreciated. Thanks in advance.

meren added a commit that referenced this issue Oct 31, 2023
Member

meren commented Oct 31, 2023

hi @eneas01, a8c0c55 is an attempt to address this. if you install anvio-dev, you may be able to try it and see if this solves your problem.

In the worst case scenario you will need to import your binning results with anvi-import-collection, but I hope this change can help you.
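For an automated pipeline, the fallback import could be scripted roughly as follows. This is a sketch under assumptions: `build_import_command` is a hypothetical helper, and it assumes `anvi-import-collection` takes the collection file as a positional argument with `-p`, `-c` and `-C` for the profile database, contigs database and collection name. It is worth confirming this against `anvi-import-collection --help` for the installed version. The names in the DAS Tool file above look like split names, so the sketch does not pass `--contigs-mode`.

```python
import shlex


def build_import_command(bins_file, profile_db, contigs_db, collection_name):
    """Build (but do not run) an anvi-import-collection command as an argument list.

    Hypothetical helper; the flags used here are assumptions to verify with
    `anvi-import-collection --help`.
    """
    return [
        "anvi-import-collection",
        str(bins_file),
        "-p", str(profile_db),
        "-c", str(contigs_db),
        "-C", collection_name,
    ]


def as_shell_string(command):
    """Render the argument list as a copy-pasteable shell command."""
    return shlex.join(command)
```

Using the paths from the original report, `build_import_command("OUTPUT_DASTool_contig2bin.tsv", "04_MAPPING_ANVIO/M22-MERGED/PROFILE.db", "05_CONTIGS/M22_contigs.db", "dastool_bins")` gives a list that can be handed to `subprocess.run(..., check=True)`, or printed with `as_shell_string` for inspection first. The results file would need to be copied out of the temporary directory beforehand.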

meren closed this as completed Oct 31, 2023
Author

eneas01 commented Oct 31, 2023

Thanks a lot @meren. I will try with anvio-dev, and otherwise import the collection.

Author

eneas01 commented Oct 31, 2023

The change made to dastool.py in the anvio-dev version worked! Many thanks!
