AntiSMASH data were not properly processed #736
Comments
Okay, sounds like they have changed the cluster tags in GenBank format for antiSMASH v6... I'll need to see what it is now, it is using
I am seeing the same discrepancy. Just to make sure I understand the problem, the issue is only that multiple clusters are all getting called Cluster_1 (or Cluster_2 or Cluster_3), not that there is any other problem with the annotations?
Hi! Thank you for developing this great tool! I am having an issue that I think is related to this post: Funannotate compare creates a "secmet" folder containing a table and a graph showing counts for "Other: other backbone enzyme" only. Thank you!
The issue from the original post and my response (not sure about @ernesfranco) seems to be that antiSMASH numbers clusters at the contig level. The online graphical output displays those numbers intelligibly (i.e., cluster 1.1, 1.2, 1.3, 2.1, 2.2, 4.1, etc.), but the gbk file just uses the per-contig cluster numbers without modification, which is why there can be as many Cluster_1's as there are contigs. Perhaps this is the desired behavior for antiSMASH, but it seems weird. My current workaround is to (1) run a script to renumber the clusters in annotate_misc/annotations.antismash.clusters.txt and (2) rerun funannotate annotate with an additional argument I have added that uses the edited annotations.antismash.clusters.txt when merging all of the annotations. Note that this cluster numbering may not match the number of clusters in annotate_misc/antismash/clusters.bed, because some of the clusters delimited in that file can have identical ranges. I'm not certain that this is a bug, but it isn't something that shows up when you view the same antiSMASH run in the online graphical output.
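For anyone who wants to try the renumbering step, here is a rough sketch of the idea, not the actual script from the workaround. It assumes you can build a gene-to-contig lookup yourself (for example from the GFF3 or tbl file); the function name `renumber_clusters` and the `gene_to_contig` dictionary are hypothetical, and the rows are assumed to be (gene, field, value) triples like those in annotations.antismash.clusters.txt.

```python
import re

# Matches values such as "antiSMASH:Cluster_1" (format taken from the excerpt in this issue).
CLUSTER_RE = re.compile(r"antiSMASH:Cluster_(\d+)")


def renumber_clusters(rows, gene_to_contig):
    """Give every (contig, per-contig cluster number) pair its own global number.

    rows: iterable of (gene_id, field, value) tuples.
    gene_to_contig: hypothetical dict mapping gene_id -> contig name,
                    which you would have to build from your own annotation.
    New numbers are assigned in order of first appearance in rows.
    """
    global_ids = {}  # (contig, local cluster number) -> new global number
    renumbered = []
    for gene, field, value in rows:
        match = CLUSTER_RE.fullmatch(value)
        if match is None:
            # Not a cluster annotation: pass it through unchanged.
            renumbered.append((gene, field, value))
            continue
        key = (gene_to_contig[gene], int(match.group(1)))
        if key not in global_ids:
            global_ids[key] = len(global_ids) + 1
        renumbered.append((gene, field, f"antiSMASH:Cluster_{global_ids[key]}"))
    return renumbered


# Toy example with made-up gene and contig names.
example_rows = [
    ("geneA-T1", "note", "antiSMASH:Cluster_1"),
    ("geneB-T1", "note", "antiSMASH:Cluster_1"),
    ("geneC-T1", "note", "antiSMASH:Cluster_2"),
]
example_contigs = {"geneA-T1": "contig_1", "geneB-T1": "contig_2", "geneC-T1": "contig_1"}
example_result = renumber_clusters(example_rows, example_contigs)
```

Keying on the (contig, local number) pair means two genes only share a new cluster number if they were in the same cluster on the same contig. The resulting numbers follow file order rather than genome order.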
This is my current solution to the issue described in nextgenusfs#736 where antiSMASH cluster numbering starts over from 1 on each contig. It isn't terribly elegant, but at least each cluster ends up with a different number.
Deals with the secondary metabolite cluster plotting error @ernesfranco described in a comment on nextgenusfs#736.
Hi @ernesfranco: I was able to reproduce your error with my own data and I think it is another issue with how antiSMASH v6 output is getting parsed... in this case it seems to be because of more flexibility in the product names assigned to NRPS and PKS genes? I have a solution that is working and will submit a pull request shortly.
Are you using the latest release?
version: 1.8.9
Describe the bug
AntiSMASH data do not seem to be incorporated properly during the annotate function.
Although there was no error during the analysis, only nine clusters were shown in 'annotations.antismash.clusters.txt', as follows. Each cluster has more than a hundred genes.
CNY61_000480-T1 note antiSMASH:Cluster_1
CNY61_000481-T1 note antiSMASH:Cluster_1
CNY61_000482-T1 note antiSMASH:Cluster_1
CNY61_000483-T1 note antiSMASH:Cluster_1
CNY61_000484-T1 note antiSMASH:Cluster_1
CNY61_002216-T1 note antiSMASH:Cluster_1
CNY61_002217-T1 note antiSMASH:Cluster_1
CNY61_002218-T1 note antiSMASH:Cluster_1
CNY61_002219-T1 note antiSMASH:Cluster_1
CNY61_002220-T1 note antiSMASH:Cluster_1
CNY61_002222-T1 note antiSMASH:Cluster_1
CNY61_002223-T1 note antiSMASH:Cluster_1
CNY61_002224-T1 note antiSMASH:Cluster_1
CNY61_002225-T1 note antiSMASH:Cluster_1
CNY61_005335-T1 note antiSMASH:Cluster_1
CNY61_005336-T1 note antiSMASH:Cluster_1
CNY61_005337-T1 note antiSMASH:Cluster_1
CNY61_005338-T1 note antiSMASH:Cluster_1
CNY61_005340-T1 note antiSMASH:Cluster_1
CNY61_005341-T1 note antiSMASH:Cluster_1
CNY61_005342-T1 note antiSMASH:Cluster_1
CNY61_005343-T1 note antiSMASH:Cluster_1
CNY61_005344-T1 note antiSMASH:Cluster_1
CNY61_005345-T1 note antiSMASH:Cluster_1
CNY61_005346-T1 note antiSMASH:Cluster_1
.......(more than a hundred lines)
CNY61_013295-T1 note antiSMASH:Cluster_1
CNY61_013296-T1 note antiSMASH:Cluster_1
CNY61_013297-T1 note antiSMASH:Cluster_1
CNY61_013298-T1 note antiSMASH:Cluster_1
CNY61_013299-T1 note antiSMASH:Cluster_1
CNY61_000691-T1 note antiSMASH:Cluster_2
CNY61_000692-T1 note antiSMASH:Cluster_2
CNY61_000693-T1 note antiSMASH:Cluster_2
CNY61_000694-T1 note antiSMASH:Cluster_2
CNY61_000695-T1 note antiSMASH:Cluster_2
CNY61_000696-T1 note antiSMASH:Cluster_2
CNY61_000697-T1 note antiSMASH:Cluster_2
CNY61_000698-T1 note antiSMASH:Cluster_2
....
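A quick way to confirm this symptom is to count how many genes share each cluster label and compare the number of distinct labels with the cluster count reported in the log. This is only a minimal sketch: the function name `genes_per_cluster` is hypothetical, and it assumes each line holds three whitespace-separated fields, as in the excerpt above.

```python
from collections import Counter


def genes_per_cluster(lines):
    """Count genes per antiSMASH cluster label.

    lines: iterable of strings such as
           "CNY61_000480-T1 note antiSMASH:Cluster_1"
           (an open file handle works as well).
    Lines that do not have three fields or do not carry a cluster label are skipped.
    """
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[2].startswith("antiSMASH:Cluster_"):
            counts[parts[2]] += 1
    return counts


# Three lines copied from the excerpt above.
sample_lines = [
    "CNY61_000480-T1 note antiSMASH:Cluster_1",
    "CNY61_002216-T1 note antiSMASH:Cluster_1",
    "CNY61_000691-T1 note antiSMASH:Cluster_2",
]
sample_counts = genes_per_cluster(sample_lines)
distinct_labels = len(sample_counts)
```

If the number of distinct labels is far below the number of clusters reported in the log, then clusters from different contigs have most likely been merged under the same label.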
The antiSMASH analysis was performed manually on the server homepage (https://fungismash.secondarymetabolites.org/#!/start).
The .gbk file was downloaded and used for the annotate function.
What command did you issue?
funannotate annotate -i fun_pred --cpus 32 --sbt template_js0361.sbt --phobius ./fun_pred/annotate_misc/phobius_results.txt --antismash ./fun_pred/annotate_misc/Colletotrichum_nymphaeae_JS-0361.gbk --iprscan ./fun_pred/annotate_misc/iprscan_result.xml -s "Colletotrichum nymphaeae" --isolate JS-0361
Logfiles
Please provide relevant log files of the error.
[05/15/22 07:01:23]: Now parsing antiSMASH v6 results, finding SM clusters
[05/15/22 07:01:27]: Found 68 clusters, 154 biosynthetic enyzmes, and 242 smCOGs predicted by antiSMASH
[05/15/22 07:01:34]: Found 0 duplicated annotations, adding 107,113 valid annotations
[05/15/22 07:01:35]: Parsing tbl file: /home/linu/workspace/genomes/Colletotrichum/JS-0361/GenomeAssembly/fun_pred/annotate_misc/genome.tbl
[05/15/22 07:01:36]: Converting to final Genbank format, good luck!
[05/15/22 07:01:36]: /home/linu/anaconda3/envs/funannotate/bin/python /home/linu/anaconda3/envs/funannotate/lib/python3.7/site-packages/funannotate/aux_scripts/tbl2asn_parallel.py -i fun_pred/annotate_misc/tbl2asn/genome.tbl -f fun_pred/annotate_misc/tbl2asn/genome.fsa -o fun_pred/annotate_misc/tbl2asn --sbt template_js0361.sbt -d discrepency.report.txt -s Colletotrichum nymphaeae -t -l paired-ends -v 1 -c 32 --isolate JS-0361
[05/15/22 07:02:55]: Creating AGP file and corresponding contigs file
[05/15/22 07:02:57]: Cross referencing SM cluster hits with MIBiG database version 1.4
[05/15/22 07:02:57]: diamond blastp --sensitive --query fun_pred/annotate_misc/antismash/smcluster.proteins.fasta --threads 32 --out fun_pred/annotate_misc/antismash/smcluster.MIBiG.blast.txt --db /home/linu/workspace/tools/funannotate_db/mibig.dmnd --max-hsps 1 --evalue 0.001 --max-target-seqs 1 --outfmt 6
[05/15/22 07:03:04]: Creating tab-delimited SM cluster output
[05/15/22 07:03:09]: Writing genome annotation table.
[05/15/22 07:04:35]: Funannotate annotate has completed successfully!
OS/Install Information
Ubuntu 18.04.6 LTS /
Checking dependencies for 1.8.9
You are running Python v 3.7.10. Now checking python packages...
biopython: 1.77
goatools: 1.1.12
matplotlib: 3.4.3
natsort: 8.1.0
numpy: 1.21.5
pandas: 1.3.5
psutil: 5.9.0
requests: 2.27.1
scikit-learn: 1.0.2
scipy: 1.7.3
seaborn: 0.11.2
All 11 python packages installed
You are running Perl v b'5.026002'. Now checking perl modules...
Bio::Perl: 1.7.4
Carp: 1.38
Clone: 0.42
DBD::SQLite: 1.64
DBD::mysql: 4.046
DBI: 1.642
DB_File: 1.855
Data::Dumper: 2.173
File::Basename: 2.85
File::Which: 1.23
Getopt::Long: 2.5
Hash::Merge: 0.300
JSON: 4.02
LWP::UserAgent: 6.39
Logger::Simple: 2.0
POSIX: 1.76
Parallel::ForkManager: 2.02
Pod::Usage: 1.69
Scalar::Util::Numeric: 0.40
Storable: 3.15
Text::Soundex: 3.05
Thread::Queue: 3.12
Tie::File: 1.02
URI::Escape: 3.31
YAML: 1.29
threads: 2.15
threads::shared: 1.56
All 27 Perl modules installed
Checking Environmental Variables...
$FUNANNOTATE_DB=/home/linu/workspace/tools/funannotate_db
$PASAHOME=/home/linu/anaconda3/envs/funannotate/opt/pasa-2.4.1
$TRINITY_HOME=/home/linu/anaconda3/envs/funannotate/opt/trinity-2.8.5
$EVM_HOME=/home/linu/anaconda3/envs/funannotate/opt/evidencemodeler-1.1.1
$AUGUSTUS_CONFIG_PATH=/home/linu/anaconda3/envs/funannotate/config/
$GENEMARK_PATH=/home/linu/workspace/tools/genemark/gmes_linux_64
All 6 environmental variables are set
Checking external dependencies...
PASA: 2.4.1
CodingQuarry: 2.0
Trinity: 2.8.5
augustus: 3.3.3
bamtools: bamtools 2.5.1
bedtools: bedtools v2.30.0
blat: BLAT v36
diamond: 2.0.14
emapper.py: 2.1.7
ete3: 3.1.2
exonerate: exonerate 2.4.0
fasta: no way to determine
glimmerhmm: 3.0.4
gmap: 2017-11-15
hisat2: 2.2.1
hmmscan: HMMER 3.3.2 (Nov 2020)
hmmsearch: HMMER 3.3.2 (Nov 2020)
java: 11.0.13
kallisto: 0.46.1
mafft: v7.490 (2021/Oct/30)
makeblastdb: makeblastdb 2.11.0+
minimap2: 2.24-r1122
proteinortho: 6.0.33
pslCDnaFilter: no way to determine
salmon: salmon 0.14.1
samtools: samtools 1.12
signalp: 4.1
snap: 2006-07-28
stringtie: 2.2.1
tRNAscan-SE: 2.0.9 (July 2021)
tantan: tantan 26
tbl2asn: no way to determine, likely 25.X
tblastn: tblastn 2.11.0+
trimal: trimAl v1.4.rev15 build[2013-12-17]
trimmomatic: 0.39
ERROR: gmes_petap.pl not installed