Improving results for nanopore #2236

Open
jsgounot opened this issue Aug 23, 2022 · 6 comments
@jsgounot

Hi. I'm exploring whether sourmash can be used to identify the species of origin of isolates from nanopore data. Each sample is supposed to contain only one species. I know that ONT reads are not ideal for a k-mer approach but, as reported in this tracker, at least one paper has used them. I tried gather with and without trimming (even though it's not really appropriate; trim-low-abund.py -C 3 -Z 18 -V -M 2e9), and while the best hit seems concordant with what is expected, f_orig_query is very low for both raw (mean=2%) and trimmed (mean=5%) data. Have you explored other sourmash or khmer parameters to improve results with nanopore reads?
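
For anyone wanting to reproduce the numbers above, here is a minimal sketch of pulling the top hit and its f_orig_query out of a gather CSV. It assumes the CSV has `name` and `f_orig_query` columns; the helper name `best_hit` and the two example rows are made up for illustration and are not real results.

```python
import csv
import io


def best_hit(gather_csv_text):
    """Return (name, f_orig_query) for the row with the highest f_orig_query.

    Hypothetical helper: assumes 'name' and 'f_orig_query' columns exist.
    Returns None if the CSV has no data rows.
    """
    rows = list(csv.DictReader(io.StringIO(gather_csv_text)))
    if not rows:
        return None
    top = max(rows, key=lambda r: float(r["f_orig_query"]))
    return top["name"], float(top["f_orig_query"])


# Made-up rows for illustration only.
example = "name,f_orig_query\nspecies_A,0.021\nspecies_B,0.004\n"
print(best_hit(example))
```

With a real file, the text could be read with `open(path).read()` and passed in the same way.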

@ctb
Contributor

ctb commented Sep 22, 2022

this came across lab slack today -

https://labs.epi2me.io/progressive-kraken2/

Luiz said:

Granularity is different (reads, not contigs/genomes), but would be fun to try
with sourmash (maybe with a s=100 db it would work with reads too?)
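
To make the s=100 idea concrete, here is a rough sketch of FracMinHash-style subsampling, where a k-mer hash is kept only if it falls below 2^64 / scaled. This is a simplification and not sourmash's implementation: it swaps in `hashlib.blake2b` for the real hash function, skips canonical k-mers, and uses a random 10 kb sequence as a stand-in for a read. All function names are hypothetical.

```python
import hashlib
import random


def kmer_hashes(seq, k):
    """Yield a 64-bit hash for every k-mer in seq (blake2b stand-in hash)."""
    for i in range(len(seq) - k + 1):
        digest = hashlib.blake2b(seq[i:i + k].encode(), digest_size=8).digest()
        yield int.from_bytes(digest, "big")


def frac_minhash(seq, k=31, scaled=1000):
    """Keep only hashes below 2**64 // scaled, i.e. roughly 1/scaled of k-mers."""
    max_hash = 2**64 // scaled
    return {h for h in kmer_hashes(seq, k) if h < max_hash}


# Random 10 kb sequence standing in for a single long read (illustration only).
rng = random.Random(0)
read = "".join(rng.choices("ACGT", k=10_000))

small = frac_minhash(read, scaled=1000)
large = frac_minhash(read, scaled=100)
print(len(small), len(large))
```

Under these assumptions a 10 kb read keeps on the order of 10 hashes at scaled=1000 and on the order of 100 at scaled=100, which is why a lower scaled value might give individual reads enough hashes to match against.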

@ctb
Contributor

ctb commented Nov 16, 2022

hi @jsgounot this paper systematically confirms that ONT messes up sourmash -

Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets: https://www.biorxiv.org/content/10.1101/2022.01.31.478527v2

See Fig 3 in particular; screenshot:

[image: Screen Shot 2022-11-16 at 6 08 45 AM]

It seems pretty clear that the error profile for nanopore is terrible for sourmash :(.
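
A back-of-the-envelope sketch of why per-base errors hurt exact k-mer matching: if errors were independent and uniform along the read (a simplifying assumption, and real nanopore errors are not), the expected fraction of k-mers with no error would be (1 - e)^k. The error rates below are illustrative values, not measured ONT rates, and the function name is hypothetical.

```python
def error_free_kmer_fraction(error_rate, k):
    """Expected fraction of k-mers containing no error, assuming independent
    per-base errors at a uniform rate (a simplification)."""
    return (1.0 - error_rate) ** k


for e in (0.001, 0.01, 0.05, 0.10):
    frac = error_free_kmer_fraction(e, 31)
    print(f"error rate {e:.3f}: {frac:.3f} of 31-mers expected error-free")
```

Under this model, at a 5% error rate only about 20% of 31-mers survive intact, so most of a read's k-mers cannot match a reference sketch exactly.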

@dportik, @bluegenes and I are thinking of doing a bit more exploring, but we have no simple solution to offer. thoughts welcome!

@ctb
Contributor

ctb commented Nov 16, 2022

(see #2360 for some discussion of thresholding that is not entirely irrelevant ;)

@jsgounot
Author

hi @ctb, thank you for sharing this. It looks like MEGAN-LR is the best approach for this kind of data at the moment; do you share the same conclusion?

@ctb
Contributor

ctb commented Nov 17, 2022

that's my reading as well but @bluegenes @dportik should weigh in!

@dportik

dportik commented Nov 17, 2022

Hi @jsgounot - as @ctb mentioned, the error profile of ONT appears to negatively affect sourmash's performance (at least for now).

There are two good options for ONT. We found BugSeq actually had the best performance - it is highly tuned to ONT. However, it is a cloud-based analysis and you have to sign up for it. If you are looking for a DIY option, I would recommend the DIAMOND & MEGAN-LR approach. That pipeline is available as a snakemake workflow at https://github.com/PacificBiosciences/pb-metagenomics-tools. If you choose to build an independent pipeline for this, just be aware that there are some landmines involved in getting the DIAMOND outputs into MEGAN.
