Add long-read support #101
Comments
This won't be in the 2019Q2 update. Tentatively targeting this for end-of-year 2019. |
What's the current status? People love those long reads! 🧬 |
Waiting on the R package 1.12 to propagate through bioconda to Q2, as that made some important long read improvements. |
IMO, this is the next bit of functionality I'd really like to propagate through to the plugin. First step would be to upgrade Q2 to the 1.14 version of the package or later. Then, it will be pretty straightforward to add a Also consider adding |
Are we ready to try to do this? It looks like we're still on |
Can't add this functionality until Q2's version Versions up to 1.16 are already available through bioconda: https://bioconda.github.io/recipes/bioconductor-dada2/README.html |
I finished installing qiime2 recently, and DADA2 is still 1.10. When are the upgrades planned to incorporate PacBio Long Reads? @benjjneb Would you suggest upgrading the installed version and modifying the existing script (for Pyro or single ends) to incorporate PacBio specific error profiles? |
@harish0201 DADA2 1.10 is an excellent release of DADA2 that holds up to this day. That said, yes, it is missing some features needed for long-read amplicon sequencing. I don't know when the Q2 DADA2 version will be updated. Until then, one could modify the script in place on a Q2 install to achieve something like PacBio long-read processing, but I couldn't in good faith recommend that. Can you do the initial data processing in R, and then import into Q2? That would probably be a better idea at this point in time. |
Thanks for the idea! I'm planning to do the same, but I had a doubt.
I'll have to run dada2 till bimera removal and taxon assignments? Once that
is done, can I export the outcome in a biom format? Or should I go towards
phyloseq and get it done?
|
Or even just until chimera removal if you want to go back to QIIME2 and use one of their taxonomy tools.
I think so, but I suspect there is a straightforward guide for moving data to/from R and QIIME2 already out there somewhere. Might want to search the QIIME2 forum, I think something like that has been posted there before.
You can also do downstream analysis in R/phyloseq if you are more comfortable with that route. But I don't think moving data between R/Q2 will be that hard assuming that guide can be found. |
I think I have implemented the method But the behavior is a little bit different between the Then I compare the filterAndTrim result between the original
I have already changed the R package to the same version: I am not sure why the two ways behave differently. |
@sixvable Interesting! Will evaluate and get to you! |
@sixvable Great work!
I'll note that the Zymo processing in the paper was done on DADA2 version 1.12.1, not 1.10. |
I tested the R code with versions 1.10.1, 1.12.1 and 1.16 on both Linux and Windows, and always got the same result! I do not think it is because of the DADA2 version. Could you try the QIIME 2 version? |
My guess (also from my own experience) is that the discrepancy is then caused by a mismatch in how arguments are passed from the Q2 call into the R script. Unfortunately, because the R script depends entirely on positional arguments, it is more susceptible to difficult-to-recognize errors at that step. Is it possible to have the Rscript echo out the exact arguments it is using in the |
Sorry, that is beyond my coding ability. I have no idea how to implement that. |
I think adding the following code directly before the
|
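The argument-echo idea suggested above can be illustrated with a small sketch. The plugin's script is written in R, so this Python version only shows the principle on the calling side: number every positional argument before the script is launched, so a shifted or missing value is easy to spot. The function name `describe_positional_args` and the example command are hypothetical and not part of q2-dada2.

```python
def describe_positional_args(cmd):
    """Return one labelled line per positional argument.

    cmd is a list such as one passed to subprocess.run:
    cmd[0] is the script name, the rest are its positional arguments.
    """
    script, *args = cmd
    lines = [f"script: {script}"]
    for position, value in enumerate(args, start=1):
        # repr() makes empty strings and stray whitespace visible
        lines.append(f"  arg {position:>2}: {value!r}")
    return lines


if __name__ == "__main__":
    # Hypothetical, shortened command; real calls carry many more arguments.
    example_cmd = ["run_dada_ccs.R", "input_dir", "output.tsv.biom", "1000", "1600"]
    print("\n".join(describe_positional_args(example_cmd)))
```

Comparing this listing with the argument order the script expects would show whether, for example, the minimum and maximum length values arrive in the intended slots.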
I found a mistake in Anyway, thank you @benjjneb for patiently answering my questions! |
It looks like dada2 1.18 has been incorporated into the latest Q2 branch. fb5b7ff Pending busywork updates to make sure it is passing, this would now enable straightforward implementation of an official |
I think Danger: Some of the forward PHRED quality values are out of range. This is likely because an incorrect PHRED offset was chosen on import of your raw data. You can learn how to choose your PHRED offset during import in the importing tutorial. |
Does Pacbio CCS use something different than PHRED 33 for encoding quality scores? Can you provide some links or other information? |
PacBio CCS uses the same PHRED 33 encoding, but normally with higher Q values than Illumina (for most bases in CCS sequences the quality ASCII character is ~, which is ASCII 126 and so decodes to a Q value of 93 at offset 33). The Quality Plot in |
This is the key difference -- much, much higher Q scores in the fastqs produced by PacBio CCS basecaller than in other technologies. |
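A minimal sketch of the decoding behind this point, assuming the standard FASTQ convention that the Q score is the character's ASCII code minus the offset (33 here). The helper names and the `limit` parameter are hypothetical; the threshold QIIME 2 uses for its "out of range" warning is not stated in this thread, so no default is assumed.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into a list of integer Q scores."""
    return [ord(ch) - offset for ch in quality_string]


def exceeds_limit(quality_string, limit, offset=33):
    """Return True if any decoded Q score is above the caller-supplied limit."""
    return any(q > limit for q in phred_scores(quality_string, offset))


if __name__ == "__main__":
    # '~' is ASCII 126, the highest printable character, so it decodes to Q93.
    print(phred_scores("~~~~"))
    # 'I' decodes to Q40, a typical high score for Illumina reads.
    print(phred_scores("IIII"))
```

With offset 33 the `~` character gives 93, which is why a check tuned to Illumina-style scores can flag correctly imported CCS reads.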
Hello @sixvable! Thank you so much for your efforts on making this new function! I followed the instructions on your GitHub page and ran my data with the following command. However, it is stuck at "An error was encountered while running DADA2 in R (return code 127)". Details of the error are attached below. Do you have any idea which part went wrong? Any help from you will be greatly appreciated!
--i-demultiplexed-seqs single-end-demux34.qza --p-front AGRGTTTGATCMTGGCTCAG --p-adapter GGGTTACCTTGTTACGACTT --p-trunc-len 0 --p-min-len 1000 --p-max-len 1600 --p-n-threads 0 --output-dir dada_ccs --verbose
Command: run_dada_ccs.R /var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/qiime2-archive-w7ksn8h7/2c82a2ca-8402-46d1-8fb7-e0cfc427a21b/data /var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/output.tsv.biom /var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/track.tsv /var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/nop /var/folders/xb/yclwjhdn4b93p249x541cg3c0000gn/T/tmpuyh28ely/filt AGRGTTTGATCMTGGCTCAG GGGTTACCTTGTTACGACTT 2 False 0 0 2.0 2 1000 1600 independent consensus 3.5 0 1000000
env: Rscript\r: No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
Plugin error from dada2: An error was encountered while running DADA2 in R (return code 127), please inspect stdout and stderr to learn more. |
That is strange. I guess your QIIME2 environment is broken. Try removing and reinstalling the whole environment, following the instructions in the repo https://github.com/sixvable/q2-dada2-CCS. |
Thank you very much! I will try it. Do I need to prepare anything in R, such as downloading certain packages? The message said that the error was encountered while running in R. |
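One detail in the error output above is worth checking before reinstalling. The message `env: Rscript\r: No such file or directory` is what `env` prints when a script's `#!` line ends in a Windows-style carriage return, so the interpreter name is read as `Rscript\r`, and return code 127 means the command was not found. The thread does not confirm that this was the cause here. The following is only a sketch of how one might test for it and convert the line endings; both function names are hypothetical.

```python
def has_crlf_shebang(script_bytes):
    """Return True if the first line is a shebang that ends with a carriage return."""
    first_line = script_bytes.split(b"\n", 1)[0]
    return first_line.startswith(b"#!") and first_line.endswith(b"\r")


def to_unix_line_endings(script_bytes):
    """Replace every CRLF pair with a plain LF."""
    return script_bytes.replace(b"\r\n", b"\n")


if __name__ == "__main__":
    # Inline stand-in for a script saved with Windows line endings.
    sample = b"#!/usr/bin/env Rscript\r\nlibrary(dada2)\r\n"
    print(has_crlf_shebang(sample))
    print(has_crlf_shebang(to_unix_line_endings(sample)))
```

To apply this to a real file, one would read it in binary mode, run the check, and write the converted bytes back only after keeping a copy of the original.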
Just checking on this, we have a few local researchers inquiring about using QIIME2 for PacBio (at the moment we're steering them to using DADA2 directly). |
Improvement Description
Add long-read support
Current Behavior
We've added support for PacBio long-read amplicon sequencing to the devel version of the R package and it seems to work quite well. Preprint: High-throughput amplicon sequencing of the full-length 16S rRNA gene with single-nucleotide resolution.
Proposed Behavior
I think it will make sense to add this as a tech-specific denoise-pacbio command in the plugin. This is similar to the denoise-pyro approach already in the plugin, with the purpose of having a dedicated command being to automatically turn on the right flags and options for PacBio data rather than relying on the user to do so. There is a downside in the repetition of much of the code between the different denoise-[technology] commands.
References