[FEATURE REQUEST] Preserve prodigal metadata for anvi-export-gene-calls
#2181
Comments
Just a quick note as I'm passing through: when we do this, we need to avoid a design that locks us into Prodigal for gene calling. We need a generic design that can keep track of additional features for genes (or genomic regions, nucleotides, or codons) and that can also be populated from Prodigal output.
anti-export-gene-calls
anvi-export-gene-calls
Because I understand this exists in the context of broader changes that need to be made (and that are beyond my current mastery of the anvi'o codebase), here is a temporary pseudo-solution for anyone who ends up here. I wrote this script, which essentially piggybacks on anvi'o's Prodigal caller and response parser:
This won't update your contigs database or otherwise modify any anvi'o functionality. However, if you call it with a FASTA file as input, it will return the same dict anvi'o generates while also keeping the other Prodigal outputs chosen here (e.g. 'gc_cont'). Example run command
Thanks for letting me know about the typo, @Ge0rges. I'm not sure how it survived this long. I guess because no one is using Prodigal. Your temporary workaround is masterful and beautiful. Regarding the original feature request: this has been a difficult one to address because it requires changing how we keep gene calls in the relevant table by adding a few new columns. That will likely add millions of additional data points to the table and increase the contigs-db size considerably, while only being relevant to a fraction of users. A better solution would be to extend that table if
That makes sense. I was wondering whether the revamp mentioned in #2152 is meant to affect the contigs-db, or to involve creating a new type of artifact centered around genomes/MAGs? If the latter, this feature could be relegated to that artifact rather than expanding the contigs-db.
I think it will have to be new, optional tables in contigs-db. We already have the code to mark nucleotide and codon/amino acid positions in contig sequences in contigs-db files, but they are not currently used outside of anvi'o structure. We will have to make them more accessible to mainstream programs :) The best way to get these things done is to have a project in the lab that needs this solution to be in place. That's why there is a delay currently :(
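For anyone trying to picture what "new, optional tables" could look like, here is a minimal sketch, not anvi'o's actual schema. It assumes a hypothetical long-format table named `gene_call_attributes` that stores one row per gene, source, and attribute, so a gene caller other than Prodigal could populate it too. Only the `gene_callers_id` column name is borrowed from anvi'o. The table name, the other columns, and the helper function are assumptions made for illustration.

```python
import sqlite3


def add_gene_attributes(conn, source, attrs_by_gene):
    """Store per-gene attributes in a hypothetical, optional side table.

    conn          -- an open sqlite3 connection (e.g. to a copy of a contigs-db)
    source        -- name of the tool that produced the attributes, e.g. "prodigal"
    attrs_by_gene -- {gene_callers_id: {attribute_name: value, ...}, ...}
    """
    # Hypothetical table: created only when someone actually has attributes to
    # store, so databases that never use it do not grow.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS gene_call_attributes (
            gene_callers_id INTEGER NOT NULL,
            source          TEXT    NOT NULL,
            attribute       TEXT    NOT NULL,
            value           TEXT,
            PRIMARY KEY (gene_callers_id, source, attribute)
        )
        """
    )
    rows = [
        (gene_id, source, name, str(value))
        for gene_id, attrs in attrs_by_gene.items()
        for name, value in attrs.items()
    ]
    conn.executemany(
        "INSERT OR REPLACE INTO gene_call_attributes VALUES (?, ?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)
```

Because the table is keyed by source and attribute name rather than by fixed columns, adding a new kind of metadata would not require a schema change, at the cost of storing every value as text.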
Hi @meren, I've run into an issue with my workaround and wanted to see if you had any ideas. My goal is to get a list of gene calls and a list of functions (for an anvi'o-curated MAG, i.e. a bin in my contigs DB / a FASTA file) that I can use separately and then downstream match the The issue with my workaround is that it produces an export of gene calls with What would be a good way to go about this? I thought of modifying the function annotation commands to take in a FASTA file as well, but that seems pretty daunting and perhaps inefficient?
Dear @Ge0rges, I just sent a PR for anvio-dev (#2306), which replaces prodigal with pyrodigal. The PR also includes a new parameter for
The What do you think about this as a solution? |
Hi @meren, sorry this slipped by me for too long (I was on vacation last month). Your solution is great. I had worked around my workaround, so I lost track of this, but this is an elegant feature to add to anvi'o. Thanks for working on it! Feel free to close this issue.
Perfect :) Thank you for the feedback, @Ge0rges. I'm closing the issue.
The need
Identifying things like the RBS motif, start codon, etc. can come in handy for gene-aware analyses. Such analyses may become more frequent in the future, especially given the current effort in #2152.
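For context on where this metadata lives: Prodigal writes it into the header of each sequence in its protein and nucleotide FASTA outputs, as `name # start # stop # strand # key=value;...`. The sketch below parses one such header into a dict. It is an illustration only, not the anvi'o parser, and the function name `parse_prodigal_header` is made up for this example. Coordinates are kept exactly as Prodigal reports them (1-based), and attribute values are left as strings.

```python
def parse_prodigal_header(header):
    """Parse one Prodigal FASTA header line into a dict (illustrative sketch)."""
    fields = [field.strip() for field in header.lstrip(">").split(" # ")]
    if len(fields) != 5:
        raise ValueError(f"Unexpected Prodigal header format: {header!r}")

    name, start, stop, strand, attributes = fields
    call = {
        "name": name,
        "start": int(start),    # as reported by Prodigal, not converted
        "stop": int(stop),
        "strand": int(strand),  # 1 for forward, -1 for reverse
    }

    # The last field holds the extra metadata, e.g. start_type, rbs_motif,
    # rbs_spacer and gc_cont, as semicolon-separated key=value pairs.
    for pair in attributes.split(";"):
        if not pair:
            continue
        key, _, value = pair.partition("=")
        call[key] = value

    return call
```

For example, `parse_prodigal_header(">contig_1_1 # 2 # 181 # 1 # ID=1_1;partial=10;start_type=Edge;rbs_motif=None;rbs_spacer=None;gc_cont=0.550")` returns a dict that includes `"start_type": "Edge"` and `"gc_cont": "0.550"` alongside the coordinates.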
The solution
From @ivagljiva on discord:
Perhaps this should be relegated to the effort mentioned by @semiller10 in #2152, but I thought it pertinent to bring it up.
Beneficiaries
Anyone doing gene aware analyses.