Result assessment! #30
Hi Jiulong,
Thank you for the kind words! I will address your questions one by one.
The last two rows in your table look very much like giant viruses and were predicted by the AAI model. I hope you find this information helpful. Best,
Hi @joacjo,
Thanks for your reply!
Hi Jiulong,
Gene annotation could potentially be affected by the random order, although I think this is quite rare and I have not seen a benchmark paper on it. For example, if the last gene on Contig 1 is partial and lacks a stop codon, and it gets joined to the first gene on Contig 2, which is also partial and only has a stop codon, the join creates an artificial complete gene. So in the worst case, if you're very unlucky, you might create one artificial gene at every junction between two concatenated contigs in your bin.
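A toy illustration of the junction effect described above. The two contigs are made up, and `complete_orfs` is a deliberately simplified forward-strand ATG-to-stop scan, not how real gene callers such as Prodigal work. Neither contig contains a complete ORF on its own, but concatenating them produces one that spans the junction:

```python
# Stop codons in the standard genetic code
STOP_CODONS = {"TAA", "TAG", "TGA"}


def complete_orfs(seq, min_codons=2):
    """Return (start, end) pairs for ATG...stop ORFs on the forward strand.

    A toy scanner: it checks the three forward frames only and ignores the
    reverse strand, alternative start codons and partial genes.
    """
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


# Hypothetical contigs from one bin:
# contig1 ends in a partial gene (ATG AAA CCC, no stop codon);
# contig2 begins with the tail of a gene (GGG TAA, no start codon).
contig1 = "CCATGAAACCC"
contig2 = "GGGTAACC"

joined = contig1 + contig2
boundary = len(contig1)

# ORFs that start in contig1 and end in contig2 exist only because of the join
spanning = [(s, e) for s, e in complete_orfs(joined) if s < boundary < e]

print(complete_orfs(contig1), complete_orfs(contig2))  # [] []
print(spanning)  # [(2, 17)]
```

The artificial ORF only appears because the partial gene at the end of `contig1` happens to be in frame with the stop codon at the start of `contig2`, which is why this is expected to be rare in practice.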
Best,
Hi Joachim,
Best,
Hi Jiulong,
Regarding (1): yes, writing the contigs of your viral bins of interest into separate FASTA files is the most straightforward and safest way to do gene annotation, and, as you say, it also gives you your viral bins in a format suitable for dereplication with tools like dRep 👍 I hope the information was helpful. I will close the issue; feel free to open another if other questions or suggestions arise.
Best,
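A minimal sketch of the approach in (1): write each bin's contigs, unconcatenated, to its own FASTA file. It assumes a VAMB-style clusters TSV with the bin name in the first column and the contig name in the second (a header row whose first field is `clustername` is skipped), and that contig names match the first word of the FASTA headers. The function names are hypothetical, and this is not part of phamb itself, so check the column layout against your own output before relying on it.

```python
import csv
import os
from collections import defaultdict


def read_fasta(path):
    """Yield (name, sequence) pairs; name is the first word after '>'."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def load_bins(clusters_tsv, keep=None):
    """Map contig name -> bin name, optionally restricted to bins in `keep`."""
    contig_to_bin = {}
    with open(clusters_tsv) as handle:
        for row in csv.reader(handle, delimiter="\t"):
            # Assumption: column 1 = bin name, column 2 = contig name,
            # with an optional 'clustername' header row.
            if len(row) < 2 or row[0] == "clustername":
                continue
            bin_name, contig = row[0], row[1]
            if keep is None or bin_name in keep:
                contig_to_bin[contig] = bin_name
    return contig_to_bin


def write_bin_fastas(contigs_fasta, clusters_tsv, out_dir, keep=None):
    """Write each bin's contigs as separate records to <out_dir>/<bin>.fna.

    Returns a dict of bin name -> number of contigs written.
    """
    contig_to_bin = load_bins(clusters_tsv, keep)
    records = defaultdict(list)
    for name, seq in read_fasta(contigs_fasta):
        if name in contig_to_bin:
            records[contig_to_bin[name]].append((name, seq))
    os.makedirs(out_dir, exist_ok=True)
    for bin_name, recs in records.items():
        with open(os.path.join(out_dir, f"{bin_name}.fna"), "w") as out:
            for name, seq in recs:
                out.write(f">{name}\n")
                # Wrap sequence lines at 60 characters
                for i in range(0, len(seq), 60):
                    out.write(seq[i:i + 60] + "\n")
    return {b: len(r) for b, r in records.items()}
```

For example, `write_bin_fastas("contigs.fna", "clusters.tsv", "viral_bins", keep={"vbin1", "vbin2"})` (hypothetical paths and bin names) would write `viral_bins/vbin1.fna` and `viral_bins/vbin2.fna`, each holding that bin's contigs as separate records rather than one concatenated sequence.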
Hi, I have some questions I would like to ask you. I apologize for my lack of experience in metagenomic analysis, which leads me to question the practice of directly concatenating the sequences within the same "bin" in the output file. I would like to understand the purpose behind this approach and how to interpret and use the output from phamb.

What is the typical purpose of using phamb? I thought that the "binning" step clusters similar contigs (perhaps to select a representative sequence for redundancy reduction, or simply to understand the clustering patterns of sequences) and potentially assembles them further to obtain more complete sequences. However, if all contigs from the same bin are concatenated in random order, it seems that neither of these purposes is achieved.

So, if my goal is to assemble and recover viral sequences from a metagenomic sequencing dataset as comprehensively as possible, should I avoid using the FASTA file output by phamb? Instead, should I run virus prediction or completeness assessment software (e.g., CheckV) on all sequences within each viral bin predicted by phamb, and then extract each viral contig sequence separately? In that case, I'm not sure when and how to make use of the results generated by phamb/vamb. More specifically, even though the vamb results indicate clusters, it seems that my downstream analysis still needs to be performed on individual sequences rather than on clusters. And the concatenated sequences output by phamb also cannot be treated as a complete or fragmented genome sequence in downstream analysis, can they?

I would greatly appreciate your answer.
Hi actledge,
The binning step with VAMB clusters contigs that likely originate from the same genome (not similar contigs). Therefore, the phamb workflow can in some cases help you recover more complete virus genomes as vMAGs than you would by evaluating single contigs alone. Hope this helps!
Hi joacjo,
Thank you very much for your response; it has been really helpful to me. But I am still slightly confused because I am relatively new to metagenomic analysis. The contig order within vMAGs is indeed shuffled, and it seems that the shuffled order may not significantly affect gene annotation and completeness assessment. But does this also mean that vMAGs obtained through this binning approach cannot be considered "genome" sequences and cannot be uploaded to public databases (such as NCBI) as draft or complete genomes? Or is a shuffled contig order generally accepted for vMAGs obtained from metagenomic sequencing? Or is this a strategy specific to virus research? As far as I understand, most current analysis software for phages or viruses treats a single sequence as one virus by default. Is that why all sequences are concatenated to form a single vMAG, instead of being treated as different "contigs" from the same genome, as is done for other organisms, and placed in the same FASTA file to indicate that they originate from the same genome? Thanks!
Hi actledge,
Ah! If the purpose of your research is to identify new, complete virus genomes and upload them to NCBI, then I would not recommend phamb, which is more oriented towards virus research. For this purpose, I would advise you to run geNomad followed by CheckV on the individual contigs. There are recommended, strict guidelines for submitting new, complete viruses to NCBI, and they do not currently cover vMAGs. Check out this paper: https://www.nature.com/articles/s41587-023-01844-2
Hi developers!
Thanks for your contribution to the field of viral ecology!
Recently, I used the PHAMB tool to identify viral bins from my bulk metagenomes, and I have some questions about the output.
Thank you for your attention and patience!
Looking forward to your reply!
Jiulong