AntiSMASH data were not properly processed #736

Open
jae0326 opened this issue Jun 22, 2022 · 5 comments

Comments

@jae0326

jae0326 commented Jun 22, 2022

Are you using the latest release?
version: 1.8.9

Describe the bug
antiSMASH results do not seem to be incorporated properly by the annotate command.
Although the run finished without errors, only nine clusters appear in 'annotations.antismash.clusters.txt' (the log below reports 68), and each of them contains more than a hundred genes, as shown here:

CNY61_000480-T1 note antiSMASH:Cluster_1
CNY61_000481-T1 note antiSMASH:Cluster_1
CNY61_000482-T1 note antiSMASH:Cluster_1
CNY61_000483-T1 note antiSMASH:Cluster_1
CNY61_000484-T1 note antiSMASH:Cluster_1
CNY61_002216-T1 note antiSMASH:Cluster_1
CNY61_002217-T1 note antiSMASH:Cluster_1
CNY61_002218-T1 note antiSMASH:Cluster_1
CNY61_002219-T1 note antiSMASH:Cluster_1
CNY61_002220-T1 note antiSMASH:Cluster_1
CNY61_002222-T1 note antiSMASH:Cluster_1
CNY61_002223-T1 note antiSMASH:Cluster_1
CNY61_002224-T1 note antiSMASH:Cluster_1
CNY61_002225-T1 note antiSMASH:Cluster_1
CNY61_005335-T1 note antiSMASH:Cluster_1
CNY61_005336-T1 note antiSMASH:Cluster_1
CNY61_005337-T1 note antiSMASH:Cluster_1
CNY61_005338-T1 note antiSMASH:Cluster_1
CNY61_005340-T1 note antiSMASH:Cluster_1
CNY61_005341-T1 note antiSMASH:Cluster_1
CNY61_005342-T1 note antiSMASH:Cluster_1
CNY61_005343-T1 note antiSMASH:Cluster_1
CNY61_005344-T1 note antiSMASH:Cluster_1
CNY61_005345-T1 note antiSMASH:Cluster_1
CNY61_005346-T1 note antiSMASH:Cluster_1

....... (more than a hundred lines)

CNY61_013295-T1 note antiSMASH:Cluster_1
CNY61_013296-T1 note antiSMASH:Cluster_1
CNY61_013297-T1 note antiSMASH:Cluster_1
CNY61_013298-T1 note antiSMASH:Cluster_1
CNY61_013299-T1 note antiSMASH:Cluster_1
CNY61_000691-T1 note antiSMASH:Cluster_2
CNY61_000692-T1 note antiSMASH:Cluster_2
CNY61_000693-T1 note antiSMASH:Cluster_2
CNY61_000694-T1 note antiSMASH:Cluster_2
CNY61_000695-T1 note antiSMASH:Cluster_2
CNY61_000696-T1 note antiSMASH:Cluster_2
CNY61_000697-T1 note antiSMASH:Cluster_2
CNY61_000698-T1 note antiSMASH:Cluster_2
....

The antiSMASH analysis was run manually on the fungiSMASH web server (https://fungismash.secondarymetabolites.org/#!/start), and the downloaded .gbk file was passed to funannotate annotate with --antismash.

What command did you issue?
funannotate annotate -i fun_pred --cpus 32 --sbt template_js0361.sbt --phobius ./fun_pred/annotate_misc/phobius_results.txt --antismash ./fun_pred/annotate_misc/Colletotrichum_nymphaeae_JS-0361.gbk --iprscan ./fun_pred/annotate_misc/iprscan_result.xml -s "Colletotrichum nymphaeae" --isolate JS-0361

Logfiles
Please provide relevant log files of the error.

[05/15/22 07:01:23]: Now parsing antiSMASH v6 results, finding SM clusters
[05/15/22 07:01:27]: Found 68 clusters, 154 biosynthetic enyzmes, and 242 smCOGs predicted by antiSMASH
[05/15/22 07:01:34]: Found 0 duplicated annotations, adding 107,113 valid annotations
[05/15/22 07:01:35]: Parsing tbl file: /home/linu/workspace/genomes/Colletotrichum/JS-0361/GenomeAssembly/fun_pred/annotate_misc/genome.tbl
[05/15/22 07:01:36]: Converting to final Genbank format, good luck!
[05/15/22 07:01:36]: /home/linu/anaconda3/envs/funannotate/bin/python /home/linu/anaconda3/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py -i fun_pred/annotate_misc/tbl2asn/genome.tbl -f fun_pred/annotate_misc/tbl2asn/genome.fsa -o fun_pred/annotate_misc/tbl2asn --sbt template_js0361.sbt -d discrepency.report.txt -s Colletotrichum nymphaeae -t -l paired-ends -v 1 -c 32 --isolate JS-0361
[05/15/22 07:02:55]: Creating AGP file and corresponding contigs file
[05/15/22 07:02:57]: Cross referencing SM cluster hits with MIBiG database version 1.4
[05/15/22 07:02:57]: diamond blastp --sensitive --query fun_pred/annotate_misc/antismash/smcluster.proteins.fasta --threads 32 --out fun_pred/annotate_misc/antismash/smcluster.MIBiG.blast.txt --db /home/linu/workspace/tools/funannotate_db/mibig.dmnd --max-hsps 1 --evalue 0.001 --max-target-seqs 1 --outfmt 6
[05/15/22 07:03:04]: Creating tab-delimited SM cluster output
[05/15/22 07:03:09]: Writing genome annotation table.
[05/15/22 07:04:35]: Funannotate annotate has completed successfully!

OS/Install Information

Ubuntu 18.04.6 LTS /

Checking dependencies for 1.8.9
You are running Python v 3.7.10. Now checking python packages...
biopython: 1.77
goatools: 1.1.12
matplotlib: 3.4.3
natsort: 8.1.0
numpy: 1.21.5
pandas: 1.3.5
psutil: 5.9.0
requests: 2.27.1
scikit-learn: 1.0.2
scipy: 1.7.3
seaborn: 0.11.2
All 11 python packages installed

You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.7.4
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed

Checking Environmental Variables...
$FUNANNOTATE_DB=/home/linu/workspace/tools/funannotate_db
$PASAHOME=/home/linu/anaconda3/envs/funannotate/opt/pasa-2.4.1
$TRINITY_HOME=/home/linu/anaconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/home/linu/anaconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/linu/anaconda3/envs/funannotate/config/
$GENEMARK_PATH=/home/linu/workspace/tools/genemark/gmes_linux_64
All 6 environmental variables are set

Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.14
emapper.py: 2.1.7
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.13
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.11.0+
minimap2: 2.24-r1122
proteinortho: 6.0.33
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.12
signalp: 4.1
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.11.0+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: gmes_petap.pl not installed

@nextgenusfs
Owner

Okay, it sounds like the cluster tags in the antiSMASH v6 GenBank output have changed. I'll need to check what they are now; I think funannotate currently uses the protocluster tags to define a cluster.
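
To see which numbering qualifiers a downloaded antiSMASH .gbk actually carries, a minimal standard-library sketch like the one below lists each region and protocluster feature together with its record (contig) name and number. It assumes antiSMASH v6 marks clusters with "region" and "protocluster" features numbered by /region_number and /protocluster_number qualifiers. The names iter_sm_features, SMFeature and the toy SAMPLE text are illustrative only; this is not funannotate's own parser, and Biopython's Bio.SeqIO would be the more robust choice for real files.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class SMFeature(NamedTuple):
    record: str    # LOCUS name of the contig/scaffold
    kind: str      # feature key, e.g. "region" or "protocluster"
    number: int    # value of the numbering qualifier
    location: str  # raw location string, e.g. "1000..25000"


# Assumed antiSMASH v6 feature keys and the qualifier that numbers each one.
NUMBER_QUALIFIERS = {"region": "region_number", "protocluster": "protocluster_number"}

FEATURE_RE = re.compile(r"^ {5}(\S+)\s+(\S+)")      # feature key starts in column 6
QUALIFIER_RE = re.compile(r'^\s+/(\w+)="?([^"]*)"?')  # indented /name="value"


def iter_sm_features(lines: Iterable[str]) -> Iterator[SMFeature]:
    """Yield numbered region/protocluster features from GenBank text."""
    record = ""
    in_features = False
    current = None  # (kind, location) of the SM feature being read, if any
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("LOCUS"):
            record = line.split()[1]
        elif line.startswith("FEATURES"):
            in_features = True
        elif line.startswith(("ORIGIN", "//")):
            in_features, current = False, None
        elif in_features:
            feature = FEATURE_RE.match(line)
            if feature:
                kind = feature.group(1)
                # Only track feature types we know how to number.
                current = (kind, feature.group(2)) if kind in NUMBER_QUALIFIERS else None
                continue
            qualifier = QUALIFIER_RE.match(line)
            if current and qualifier and qualifier.group(1) == NUMBER_QUALIFIERS[current[0]]:
                yield SMFeature(record, current[0], int(qualifier.group(2)), current[1])


# Toy input (not real antiSMASH output): two contigs, one region each.
SAMPLE = """\
LOCUS       contig_1               50000 bp    DNA     linear   UNK
FEATURES             Location/Qualifiers
     region          1000..25000
                     /product="T1PKS"
                     /region_number="1"
     protocluster    1000..25000
                     /protocluster_number="1"
ORIGIN
//
LOCUS       contig_2               30000 bp    DNA     linear   UNK
FEATURES             Location/Qualifiers
     region          500..9000
                     /product="NRPS"
                     /region_number="1"
//
"""

if __name__ == "__main__":
    for feat in iter_sm_features(SAMPLE.splitlines()):
        print(feat.record, feat.kind, feat.number, feat.location)
```

On the toy SAMPLE both regions report region_number 1 because they sit on different records; if only the number is used as the cluster name, distinct clusters collapse into one Cluster_1.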

@IanDMedeiros
Contributor

I am seeing the same discrepancy. Just to make sure I understand the problem, the issue is only that multiple clusters are all getting called Cluster_1 (or Cluster_2 or Cluster_3), not that there is any other problem with the annotations?

@ernesfranco

ernesfranco commented Aug 30, 2022

Hi! Thank you for developing this great tool!

I am having an issue that I think is related to this post:

Funannotate compare creates a "secmet" folder containing a table and a graph showing counts for "Other: other backbone enzyme" only.

Thank you!

Funannotate version: 1.8.12 (conda)
AntiSMASH version: 6.1.1

$ funannotate mask -i input -o assembly.fa
$ funannotate predict -i assembly.fa -o fun --species species --strain strain --busco_db eurotiomycetes
$ funannotate iprscan -i fun -m docker
$ funannotate remote -i fun -m phobius antismash -e email
$ funannotate annotate -i fun --busco_db eurotiomycetes
$ funannotate compare -i gbk_files_from_funannotate_annotate

Screenshot from 2022-08-30 15-09-41

@IanDMedeiros
Contributor

The issue in the original post and in my earlier comment (I'm not sure about @ernesfranco's) seems to be that antiSMASH numbers clusters per contig. The online graphical output presents these numbers intelligibly (i.e., cluster 1.1, 1.2, 1.3, 2.1, 2.2, 4.1, etc.), but the gbk file uses only the per-contig cluster numbers without the contig prefix, which is why there can be as many Cluster_1s as there are contigs with at least one cluster. Perhaps this is the intended behavior for antiSMASH, but it seems weird.
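
If each per-contig number is kept together with the record it came from, turning them into unique genome-wide labels is a simple mapping. The sketch below assumes each region is identified by a (record index, per-record region number) pair, with the record index taken to be the contig's 1-based position in the .gbk so that web_label reproduces the x.y style mentioned above. renumber_regions and web_label are hypothetical helper names, and the sequential Cluster_N scheme is just one possible choice, not what antiSMASH or funannotate does.

```python
from typing import Dict, Iterable, Tuple

Region = Tuple[int, int]  # (1-based record index in the .gbk, per-record region number)


def renumber_regions(regions: Iterable[Region]) -> Dict[Region, str]:
    """Give every (record, region) pair a unique genome-wide label.

    Labels follow genome order (record index, then region number), so the
    first region on the first record becomes Cluster_1, and so on.
    Duplicate pairs are collapsed.
    """
    labels: Dict[Region, str] = {}
    for record_idx, region_num in sorted(set(regions)):
        labels[(record_idx, region_num)] = f"Cluster_{len(labels) + 1}"
    return labels


def web_label(region: Region) -> str:
    """Format a pair in the x.y style of the web output, e.g. (4, 1) -> '4.1'."""
    return f"{region[0]}.{region[1]}"


if __name__ == "__main__":
    # Per-record numbering as in the .gbk: several regions share number 1.
    regions = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (4, 1)]
    for region, label in renumber_regions(regions).items():
        print(web_label(region), "->", label)
```

Sorting before numbering keeps the labels stable regardless of the order in which regions are encountered.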

My current workaround is to (1) run a script that renumbers the clusters in annotate_misc/annotations.antismash.clusters.txt and (2) rerun funannotate annotate with an extra argument I have added that uses the edited annotations.antismash.clusters.txt when merging all of the annotations. Note that the resulting cluster count may not match the number of clusters in annotate_misc/antismash/clusters.bed, because some of the clusters delimited in that file can have identical ranges. I'm not certain whether that is a bug, but it doesn't show up when you view the same antiSMASH run in the online graphical output.
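
Once the contig information is gone, as in annotations.antismash.clusters.txt, a renumbering script has to infer cluster boundaries from the rows themselves. The sketch below is a hypothetical version of such a script, not the one used in the workaround above. It assumes tab-delimited gene/note/label rows listed in genome order, locus tags shaped like CNY61_000480-T1, and treats a jump of more than max_gap (an arbitrary default of 10) in the locus-tag number as the start of a new cluster.

```python
import re
from typing import Iterable, Iterator, Optional

# Assumes locus tags like CNY61_000480-T1: numeric part before the -T suffix.
LOCUS_NUM = re.compile(r"_(\d+)-T\d+$")


def renumber_cluster_rows(lines: Iterable[str], max_gap: int = 10) -> Iterator[str]:
    """Rewrite 'gene<TAB>note<TAB>antiSMASH:Cluster_N' rows with unique numbers.

    A new cluster starts when the original label changes or when the
    locus-tag number jumps by more than max_gap from the previous row
    (a heuristic for "same label, different contig").
    """
    cluster = 0
    prev_label: Optional[str] = None
    prev_num: Optional[int] = None
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        gene, qualifier, label = line.split(None, 2)
        label = label.strip()
        match = LOCUS_NUM.search(gene)
        if match is None:
            raise ValueError(f"unexpected locus tag: {gene}")
        num = int(match.group(1))
        if label != prev_label or prev_num is None or abs(num - prev_num) > max_gap:
            cluster += 1
        prev_label, prev_num = label, num
        yield f"{gene}\t{qualifier}\tantiSMASH:Cluster_{cluster}"


if __name__ == "__main__":
    import sys

    # Usage: python renumber.py < annotations.antismash.clusters.txt > renumbered.txt
    sys.stdout.writelines(row + "\n" for row in renumber_cluster_rows(sys.stdin))
```

The gap heuristic can merge two clusters if the last genes of one contig's Cluster_1 and the first genes of the next contig's Cluster_1 have nearly consecutive locus tags, so carrying the contig through from the .gbk is the safer fix.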

IanDMedeiros added a commit to IanDMedeiros/funannotate that referenced this issue Sep 15, 2022
This is my current solution to the issue described in nextgenusfs#736 where antiSMASH cluster numbering starts over from 1 on each contig. It isn't terribly elegant, but at least each cluster ends up with a different number.
nextgenusfs pushed a commit that referenced this issue Oct 19, 2022
This is my current solution to the issue described in #736 where antiSMASH cluster numbering starts over from 1 on each contig. It isn't terribly elegant, but at least each cluster ends up with a different number.
IanDMedeiros added a commit to IanDMedeiros/funannotate that referenced this issue Oct 20, 2022
Deals with the secondary metabolite cluster plotting error @ernesfranco described in a comment on nextgenusfs#736.
@IanDMedeiros
Contributor

Hi @ernesfranco: I was able to reproduce your error with my own data, and I think it is another problem with how antiSMASH v6 output is being parsed. In this case it seems to be caused by the greater flexibility in the product names assigned to NRPS and PKS genes, though I'm not completely sure. I have a working solution and will submit a pull request shortly.
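
As an illustration of how exact name matching can push everything into an "Other" bucket, the sketch below groups antiSMASH product names by case-insensitive substring instead of exact equality, so names such as "NRPS-like", "T1PKS", or "T3PKS" still land in an NRPS or PKS category. The function name classify_backbone and its category labels are hypothetical; they do not reproduce funannotate's secmet categories or the fix in the pull request mentioned above.

```python
from typing import Iterable


def classify_backbone(products: Iterable[str]) -> str:
    """Collapse antiSMASH product names into a coarse backbone category.

    Matching is by case-insensitive substring rather than exact equality,
    so variants such as "NRPS-like" or "T3PKS" are not sent to "Other".
    Category names are illustrative only.
    """
    names = [p.strip().lower() for p in products if p.strip()]
    has_nrps = any("nrps" in n for n in names)
    has_pks = any("pks" in n for n in names)
    if has_nrps and has_pks:
        return "Hybrid NRPS-PKS"
    if has_nrps:
        return "NRPS"
    if has_pks:
        return "PKS"
    if any("terpene" in n for n in names):
        return "Terpene"
    return "Other"


if __name__ == "__main__":
    for product in ["NRPS-like", "NRPS,T1PKS", "T3PKS", "terpene", "indole"]:
        print(product, "->", classify_backbone(product.split(",")))
```

Substring matching is deliberately loose; a real classifier would need a reviewed list of product names so that unrelated names containing "pks" or "nrps" are not misassigned.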
