Potential Typo in Antismash parser #299

Closed
PlantDr430 opened this issue Jun 30, 2019 · 6 comments

@PlantDr430

Currently using funannotate v1.6.0-ad5c0de as I had to re-run one sample prior to NCBI submission.

When I ran my sample through, I got a MIBiG error where the program wasn't finding the correct file. I also noticed that the parser was finding SM clusters and reporting them to standard output, but it only created the cluster.bed file and didn't create any other file associated with antiSMASH (i.e. secmet.clusters.txt was left blank). The .bed file looked like this.

contig_5	0	37056	Cluster_5.1	0	+
contig_16	<57	8796	Cluster_16.1	0	+
contig_16	56302	127287	Cluster_16.2	0	+
contig_39	926	53245	Cluster_39.1	0	+
contig_39	926	53245	Cluster_39.2	0	+
contig_151	0	28522	Cluster_151.1	0	+
contig_172	0	44595	Cluster_172.1	0	+
contig_199	0	38485	Cluster_199.1	0	+
contig_205	19193	45936	Cluster_205.1	0	+
contig_215	0	27391	Cluster_215.1	0	+
contig_222	3292	44079	Cluster_222.1	0	+
contig_230	0	24777	Cluster_230.1	0	+

I noticed that the start coordinate of the first cluster on contig_16 (the second row) has a "<" in front of the integer. I believe this caused an error somewhere downstream, which is why the cluster.txt file was not created. I checked my antismash.gbk file and noticed that it contains the "<" in the features, such as:

COMMENT     'Annotated using funannotate v1.5.1'.
            ##antiSMASH-Data-START##
            Version      :: 5.0.0
            Run date     :: 2019-06-29 16:58:09
            ##antiSMASH-Data-END##
FEATURES             Location/Qualifiers
     source          1..129268
                     /db_xref="taxon:83212"
                     /mol_type="genomic DNA"
                     /organism="Claviceps africana"
                     /strain="CCC489"
     mRNA            complement(join(<58..1256,1476..1642,1720..2899))
                     /locus_tag="E4U42_001851"
                     /product="hypothetical protein"
     gene            complement(<58..2899)
                     /locus_tag="E4U42_001851"
     CDS             complement(join(<58..1256,1476..1642,1720..2899))
                     /codon_start=1
                     /gene_functions="transport (smcogs) SMCOG1288:ABC
                     transporter related protein (Score: 134.9; E-value: 6e-41)"
                     /gene_kind="transport"
                     /locus_tag="E4U42_001851"
                     /product="hypothetical protein"
                     /protein_id="ncbi_E4U42_001851-T1"
                     /transl_table=1
                     /translation="MAAAQALTQILPQMIAVSKAMAAAQNLFSTIDRVSNMDTLSEDGI
                     EPADFQGHIRLQGVGFSYPARPNTPVLQDVNLEIRPNQVTAIVGASGSGKSTIFGLIER
                     WYAYSSGEMTLDGHRLESIKLRWLRTKIRLVQQEPTLFSGSIYQNVMDGLAGCDDGLSD
                     GEKKHRVVAACKAVLMHDFIAELPRGYDSCIGERGASLSGGQRQRLVIARAIVSDPKVL
                     LLDEATSALDAHAEKAVQAALNNIARGRTVVVIAHRLSTVRDSDNIIVLGKGGRVMESG
                     THARLVALGGAYASLARTQDLAENMPDPVEGEEGSVASGEEEERAVAAPDVDSAQTPTA
                     RRGSGSGSGKKGESRRHGTLSSYGLLHGLFLIIKEQRTLWRPLSVTLVCCTAGGLLSSS
                     MAVVVANSLEVYRGADFDKARFFAIMFFAIGLCSILVYATIGWISNVIAQTIIRFYRRD
                     ILDNTLRQDMAFFDRPENNTGALVARLASEPLSLQELLSFNVSLVVISIVNAVCGCTVA
                     VISGWKLGLAMCLGAMPVIVGAGYLRIRLEVRFEQDTARSFASSSAVAAEAVMGIRTVC
                     SLALEEAVVERYSQSLQDLVRDSIGGLGVKAFLYALSQSASLLVMGLGFWYGGRLVSTG
                     EYTLRQFYVVYMVVIYSGGATAALFQHTTSISKACTAINYILGLRQTRVLLDDDDAEED
                     EDHDPGAAVARPVDEKGPGLEAGLERVHFAYPLRPKQKVLRGIDMSIRPGQMTALVGAS
                     GCGKSTLIGLLERFYDPSSGTVWVRDDGRRRDIRTLHRRRHRRDVALVQQEPVLYQGSI
                     LDNVALGIEHDRLRPADPPEARIEAACRAAHIWDFIA"
     protocluster    <58..8796
                     /aStool="rule-based-clusters"

When I removed the "<" from the .gbk file and re-ran the program, everything ran smoothly. I am not sure why antiSMASH inserted the "<", but I also saw it on the website.
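For anyone hitting the same symptom, the sketch below is one way to check a cluster .bed file for rows whose start or end column is not a plain integer. It is only an illustration: the function name `find_bad_bed_rows` is hypothetical and is not part of funannotate, and the sample rows are copied from the .bed file above.

```python
def find_bad_bed_rows(lines):
    """Return (line_number, row) pairs whose start/end are not plain integers.

    Hypothetical helper for illustration; not a funannotate function.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line.strip():
            continue  # ignore empty lines
        fields = line.split("\t")
        if len(fields) < 3:
            # A BED row needs at least chrom, start, end
            bad.append((number, line))
            continue
        start, end = fields[1], fields[2]
        # "<57".isdigit() is False, so partial markers are caught here
        if not (start.isdigit() and end.isdigit()):
            bad.append((number, line))
    return bad


rows = [
    "contig_5\t0\t37056\tCluster_5.1\t0\t+",
    "contig_16\t<57\t8796\tCluster_16.1\t0\t+",
    "contig_16\t56302\t127287\tCluster_16.2\t0\t+",
]

for number, row in find_bad_bed_rows(rows):
    print(number, row)
```

With the three sample rows, only the `contig_16 <57` row is reported.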


@nextgenusfs
Owner

Those carets (< and >) in GenBank format indicate a partial gene model. So you are saying that in the GenBank file you submitted to antiSMASH, the caret was not in the file for this gene?

@PlantDr430
Author

PlantDr430 commented Jun 30, 2019 via email

@nextgenusfs
Owner

Okay will have a look today at the code.

@nextgenusfs
Owner

Commit 2fe943e should strip the partial notation from the coordinates written to the bed file, which, as you found out, is what gets parsed further downstream.
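As a rough illustration of what stripping the partial notation can look like, here is a minimal sketch. It is not the code from the commit: the helper names `clean_coord` and `clean_bed_line` are hypothetical, and it assumes the only non-numeric characters in a coordinate are a leading "<" or ">".

```python
def clean_coord(value):
    """Convert a coordinate such as '<57' or '>8796' to an int.

    Hypothetical helper; assumes the partial marker is a leading '<' or '>'.
    """
    return int(str(value).lstrip("<>"))


def clean_bed_line(line):
    """Return a tab-separated BED row with partial markers removed
    from the start (column 2) and end (column 3) fields."""
    fields = line.rstrip("\n").split("\t")
    fields[1] = str(clean_coord(fields[1]))
    fields[2] = str(clean_coord(fields[2]))
    return "\t".join(fields)


row = "contig_16\t<57\t8796\tCluster_16.1\t0\t+"
print(clean_bed_line(row))
```

Dropping the marker loses the information that the cluster boundary is partial, which is fine for a file that is only used as a coordinate lookup downstream.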

@PlantDr430
Author

PlantDr430 commented Jun 30, 2019 via email

@hyphaltip
Collaborator

hyphaltip commented Jun 30, 2019 via email
