Antismash v5.0 isn't being parsed #292

Closed
PlantDr430 opened this issue May 23, 2019 · 17 comments

Comments

@PlantDr430

I am currently using the latest version which I pulled off of github today (v1.5.3-21ad095).

I am also using the newest version of antiSMASH (v5). However, I noticed that the qualifiers in the .gbk output differ from those in the .gbk files I had from antiSMASH v4. This may be why no clusters or smCOGs are being parsed out.

I have attached my log file and a version of the .gbk results showing a portion of the output of antiSMASH v5.
funannotate-annotate.log
antiSMASH.results.txt

@nextgenusfs
Owner

nextgenusfs commented May 23, 2019

Yippeee! Always fun when formats change..... is there a tag saying which version of antismash the result is from?

Edit: looks like the version is in the comment section. So an updated parser will need to be added to the code.

@PlantDr430
Author

yea, in the .gbk they have this:

##antiSMASH-Data-START##
Version :: 5.0.0rc1
Run date :: 2019-05-10 16:52:23
##antiSMASH-Data-END##

but there isn't a tag like this in the v4 .gbk files
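
One way to branch on the format is to look for that structured comment. The following is a minimal sketch, assuming the block appears in the .gbk text as pasted above. The function name `detect_antismash_version` is hypothetical and this is not funannotate's actual code; treating a missing block as "v4 or older" is likewise an assumption based only on this thread.

```python
import re

# Matches the "Version :: X" line inside the antiSMASH-Data block.
VERSION_RE = re.compile(
    r"##antiSMASH-Data-START##.*?Version\s*::\s*(\S+).*?##antiSMASH-Data-END##",
    re.DOTALL,
)


def detect_antismash_version(gbk_text):
    """Return the version string from the antiSMASH-Data block, or None.

    None is interpreted by the caller as "no version tag found"
    (assumed here to mean v4 or older output).
    """
    match = VERSION_RE.search(gbk_text)
    return match.group(1) if match else None


sample = """COMMENT     ##antiSMASH-Data-START##
            Version :: 5.0.0rc1
            Run date :: 2019-05-10 16:52:23
            ##antiSMASH-Data-END##
"""
version = detect_antismash_version(sample)
# Take the leading integer as the major version, if a tag was found.
major = int(version.split(".")[0]) if version else None
print(version, major)
```

A parser could then dispatch on `major` and fall back to the older code path when it is `None`.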

@nextgenusfs
Owner

nextgenusfs commented May 27, 2019

In the example you posted above, it seems that the annotation is not numerically incrementing properly, i.e. there are two 'protocluster' features, but both say they are from the same "number". Is this the case throughout the gbk file output?
Here are the two "protocluster" features:

     protocluster    31439..78329
                     /aStool="rule-based-clusters"
                     /contig_edge="False"
                     /core_location="join{[51438:51715](+), [51814:52199](+),
                     [52265:52794](+), [52859:57416](+), [57480:58329](+)}"
                     /cutoff="20000"
                     /detection_rule="cds(PKS_AT and (PKS_KS or ene_KS or mod_KS
                     or hyb_KS or itr_KS or tra_KS))"
                     /neighbourhood="20000"
                     /product="T1PKS"
                     /protocluster_number="1"
                     /tool="antismash"
     proto_core      join(51439..51715,51815..52199,52266..52794,52860..57416,
                     57481..58329)
                     /aStool="rule-based-clusters"
                     /cutoff="20000"
                     /detection_rule="cds(PKS_AT and (PKS_KS or ene_KS or mod_KS
                     or hyb_KS or itr_KS or tra_KS))"
                     /neighbourhood="20000"
                     /product="T1PKS"
                     /protocluster_number="1"

And then there is one other one, which looks like this:

     protocluster    64344..107816
                     /aStool="rule-based-clusters"
                     /contig_edge="True"
                     /core_location="join{[91647:92574](-), [91554:91580](-),
                     [91368:91464](-), [91070:91264](-), [85323:90989](-),
                     [85064:85241](-), [84343:84982](-)}"
                     /cutoff="20000"
                     /detection_rule="cds(PKS_AT and (PKS_KS or ene_KS or mod_KS
                     or hyb_KS or itr_KS or tra_KS))"
                     /neighbourhood="20000"
                     /product="T1PKS"
                     /protocluster_number="1"
                     /tool="antismash"
     proto_core      complement(join(84344..84982,85065..85241,85324..90989,
                     91071..91264,91369..91464,91555..91580,91648..92574))
                     /aStool="rule-based-clusters"
                     /cutoff="20000"
                     /detection_rule="cds(PKS_AT and (PKS_KS or ene_KS or mod_KS
                     or hyb_KS or itr_KS or tra_KS))"
                     /neighbourhood="20000"
                     /product="T1PKS"
                     /protocluster_number="1"

So I'm wondering if this is correct? These two "protocluster" features are part of the same cluster? Or is this a mistake? Does the html output match this? They appear to overlap -- so perhaps underlying code is correct. So does that mean that all "clusters" have this protocluster annotation or is this a subset of the cluster annotation?

@nextgenusfs
Owner

nextgenusfs commented May 27, 2019

[Screenshot attached: Screen Shot 2019-05-27 at 12 41 04 PM]
Update: I think I figured out what is happening. It seems that the numbering is contig specific, i.e. it starts over counting from 1 for each GenBank record (contig). And it looks like they are now using contig.num for naming in the HTML output.
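
To make the consequence concrete: because `/protocluster_number` restarts at 1 in every record, it has to be combined with something record-specific to get a unique cluster ID. Below is a rough sketch, assuming the input is the raw GenBank text and that records are separated by `//` lines. The function name `unique_cluster_ids`, the `recordIndex.number` format, and the sample contig names are all hypothetical, chosen only to mirror the contig.num idea described above.

```python
import re

LOCUS_RE = re.compile(r"^LOCUS\s+(\S+)", re.MULTILINE)
PROTO_RE = re.compile(r'/protocluster_number="(\d+)"')


def unique_cluster_ids(gbk_text):
    """Return [(contig_name, 'recordIndex.number'), ...] across all records."""
    ids = []
    # GenBank records end with a line containing only "//".
    records = [
        r for r in re.split(r"^//\s*$", gbk_text, flags=re.MULTILINE) if r.strip()
    ]
    for record_index, record in enumerate(records, start=1):
        locus = LOCUS_RE.search(record)
        contig = locus.group(1) if locus else "record%d" % record_index
        # protocluster and proto_core both carry the qualifier,
        # so de-duplicate while keeping first-seen order.
        numbers = list(dict.fromkeys(PROTO_RE.findall(record)))
        for number in numbers:
            ids.append((contig, "%d.%s" % (record_index, number)))
    return ids


# Hypothetical two-record input; both records use protocluster_number "1".
sample = (
    "LOCUS       scaffold_1  100 bp\n"
    "     protocluster    31439..78329\n"
    '                     /protocluster_number="1"\n'
    "     proto_core      51439..58329\n"
    '                     /protocluster_number="1"\n'
    "//\n"
    "LOCUS       scaffold_2  100 bp\n"
    "     protocluster    64344..107816\n"
    '                     /protocluster_number="1"\n'
    "//\n"
)
print(unique_cluster_ids(sample))
```

With this keying, the two `/protocluster_number="1"` features no longer collide as long as they sit in different records.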

@PlantDr430
Author

PlantDr430 commented May 27, 2019

That would make sense, as my results usually only have one cluster per contig. Interesting, though, that your run appears to indicate more clusters than mine.
[Image attached]

I also noticed that in some other contigs /protocluster_number="1" appeared with different products, such as NRPS-like or terpenes, which would indicate that the number isn't product related and does appear to be contig related.

     protocluster    9399..53295
                     /aStool="rule-based-clusters"
                     /contig_edge="False"
                     /core_location="[29398:33295](-)"
                     /cutoff="0"
                     /detection_rule="cds((PP-binding or NAD_binding_4) and
                     (AMP-binding or A-OX))"
                     /neighbourhood="20000"
                     /product="NRPS-like"
                     /protocluster_number="1"
                     /tool="antismash"
     proto_core      complement(29399..33295)
                     /aStool="rule-based-clusters"
                     /cutoff="0"
                     /detection_rule="cds((PP-binding or NAD_binding_4) and
                     (AMP-binding or A-OX))"
                     /neighbourhood="20000"
                     /product="NRPS-like"
                     /protocluster_number="1"

@nextgenusfs
Owner

I didn't use the same genome ;)

@nextgenusfs
Owner

Goal is to get this updated today; I'll post here when it's working.

@nextgenusfs
Owner

Okay, I think I have it fixed. If you wouldn't mind testing the latest commit, that would be helpful. Version should be:

$ funannotate version
funannotate v1.6.0-046e957

@PlantDr430
Author

The parser picked up on clusters and smCOGs, but stated that I don't have any backbone biosynthetic enzymes. I believe I do have some, as antiSMASH is flagging some genes as "core biosynthetic genes".

[03:48 PM]: Now parsing antiSMASH v5 results, finding SM clusters
[03:48 PM]: Found 32 clusters, 0 backbone biosynthetic enyzmes, and 77 smCOGs predicted by antiSMASH
[03:48 PM]: Found 0 duplicated annotations, adding 52,327 valid annotations
[03:48 PM]: Converting to final Genbank format, good luck!
[03:50 PM]: Creating AGP file and corresponding contigs file
[03:50 PM]: Cross referencing SM cluster hits with MIBiG database version 1.3
[03:50 PM]: Creating tab-delimited SM cluster output
[03:50 PM]: Writing genome annotation table.
[03:50 PM]: Funannotate annotate has completed successfully!

    We need YOUR help to improve gene names/product descriptions:
       0 gene/products names MUST be fixed, see LM461_fun_output/annotate_results/Gene2Products.must-fix.txt
       1 gene/product names need to be curated, see LM461_fun_output/annotate_results/Gene2Products.need-curating.txt
       60 gene/product names passed but are not in Database, see LM461_fun_output/annotate_results/Gene2Products.new-names-passed.txt

    Please consider contributing a PR at https://github.com/nextgenusfs/gene2product

-------------------------------------------------------
stephenwyka@bspmgenomics:/data/wyka$

@nextgenusfs
Owner

Ok, thanks. It's not really a big deal/change, I don't think, as it is simply a counter. Do the results in annotate_result/*.cluster.txt make sense?

I wonder if this is a difference between 5.0.0 [what I ran on the web server] and 5.0.0rc1 [which seems to be what you have].

@PlantDr430
Author

Yes, the results in annotate_result/*.cluster.txt make sense

@nextgenusfs
Owner

Thanks, I'll see if I can fix the counter.

@nextgenusfs
Owner

Okay, it should now be counting the biosynthetic enzymes based on 'gene_kind' = 'biosynthetic' in the CDS metadata.
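
For anyone wanting to sanity-check that count outside funannotate, here is a rough sketch that tallies CDS features carrying `/gene_kind="biosynthetic"` straight from the GenBank text. It assumes the standard flat-file layout (feature keys indented 5 spaces, qualifiers on their own lines). The function name `count_biosynthetic_cds`, the locus tags, and the second `gene_kind` value in the sample are illustrative assumptions, and this is not the code funannotate uses.

```python
import re

# A feature line in a GenBank flat file: 5 spaces, the feature key, then the location.
FEATURE_RE = re.compile(r"^ {5}(\S+)\s+\S")


def count_biosynthetic_cds(gbk_text):
    """Count CDS features that carry /gene_kind="biosynthetic" (once per CDS)."""
    count = 0
    in_cds = False
    counted = False
    for line in gbk_text.splitlines():
        feature = FEATURE_RE.match(line)
        if feature:
            # A new feature starts; remember whether it is a CDS.
            in_cds = feature.group(1) == "CDS"
            counted = False
        elif in_cds and not counted and line.strip() == '/gene_kind="biosynthetic"':
            count += 1
            counted = True
    return count


# Hypothetical excerpt: only the first CDS is tagged as biosynthetic.
sample = '''     CDS             51439..58329
                     /locus_tag="GENE_0001"
                     /gene_kind="biosynthetic"
     CDS             60000..61000
                     /locus_tag="GENE_0002"
                     /gene_kind="transport"
     protocluster    31439..78329
                     /product="T1PKS"
'''
print(count_biosynthetic_cds(sample))
```

If the number printed for a real file disagrees with the "backbone biosynthetic enzymes" figure in the funannotate log, that points at the counter rather than at the antiSMASH output.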

@PlantDr430
Author

Thank you

@PlantDr430
Author

So this fix worked on all my genomes except one, where I got this error:

[03:12 PM]: Now parsing antiSMASH v5 results, finding SM clusters
Traceback (most recent call last):
  File "/data/wyka/funannotate-master/bin/funannotate-functional.py", line 878, in <module>
    lib.ParseAntiSmash(antismash_input, AntiSmashFolder, AntiSmashBed, AntiSmash_annotations) #results in several global dictionaries
  File "/data/wyka/funannotate-master/lib/library.py", line 5320, in ParseAntiSmash
    numericalContig = int(''.join(filter(str.isdigit, chr)))
UnboundLocalError: local variable 'chr' referenced before assignment
stephenwyka@bspmgenomics:/data/wyka/final_funannotate/Cpur20_1$

@nextgenusfs
Owner

Thanks, that one was a typo: 0c6732d. git pull should fix it.
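
The contents of that commit are not reproduced in this thread. For readers who hit the same class of error: Python raises `UnboundLocalError` when a name that is assigned somewhere in a function is read on a code path where the assignment has not yet run. The sketch below shows the digit-extraction idiom from the traceback with the input bound explicitly as a parameter. The function name `numerical_contig`, the example contig names, and the `ValueError` fallback are assumptions for illustration, not what 0c6732d does.

```python
def numerical_contig(contig_name):
    """Concatenate every digit in a contig name and return it as an int."""
    # The name is a parameter, so it is always bound before use.
    digits = "".join(filter(str.isdigit, contig_name))
    if not digits:
        raise ValueError("no digits in contig name: %r" % contig_name)
    return int(digits)


print(numerical_contig("scaffold_12"))
```

Note that the idiom joins all digits in the name, so a name containing several numeric parts is collapsed into a single number.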

@PlantDr430
Author

Got it to work
