Improving results for nanopore #2236

jsgounot · 2022-08-23T09:04:38Z

Hi. I am exploring the possibility of using sourmash to identify the origin of isolates from nanopore data. Each sample is supposed to contain only one species. I know that ONT reads are not ideal for a k-mer approach but, as reported in this tracker, at least one paper has used them. I tried gather with and without trimming (even though it's not really appropriate: trim-low-abund.py -C 3 -Z 18 -V -M 2e9), and while the best hit seems concordant with what is expected, f_orig_query is very low for both raw (mean=2%) and trimmed (mean=5%) data. Have you explored other sourmash or khmer parameters to improve results with nanopore reads?
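
For anyone wanting to reproduce this kind of per-sample summary, the means quoted above can be computed from gather's CSV output. The following is only a minimal sketch, assuming each sample was run with gather writing a CSV (for example via `-o`), that the CSV has an `f_orig_query` column, and that the first row is the best hit. The function names and any file paths passed in are hypothetical.

```python
import csv
from statistics import mean


def best_hit_f_orig_query(csv_path):
    """Return f_orig_query from the first row of a gather CSV, or None if there are no rows."""
    with open(csv_path, newline="") as handle:
        for row in csv.DictReader(handle):
            # Assumption: gather writes rows in rank order, so row 0 is the best hit.
            return float(row["f_orig_query"])
    return None


def mean_best_hit(csv_paths):
    """Mean best-hit f_orig_query across samples, skipping samples without any hit."""
    values = [v for v in map(best_hit_f_orig_query, csv_paths) if v is not None]
    return mean(values) if values else None
```

Calling `mean_best_hit` once on the raw-read CSVs and once on the trimmed-read CSVs gives the two means to compare.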


ctb · 2022-09-22T11:51:57Z

this came across lab slack today -

https://labs.epi2me.io/progressive-kraken2/

Luiz said:

Granularity is different (reads, not contigs/genomes), but would be fun to try
with sourmash (maybe with a s=100 db it would work with reads too?)
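
To see why a denser sketch might matter at read granularity: a FracMinHash sketch keeps roughly 1 in `scaled` distinct k-mers, so a single read contributes very few hashes at sourmash's default DNA setting of scaled=1000. A rough back-of-the-envelope sketch follows. The 10 kb read length is a hypothetical example, all k-mers are assumed distinct, and sequencing errors are ignored.

```python
def expected_hashes(read_length, ksize=31, scaled=1000):
    """Approximate number of hashes a FracMinHash sketch keeps for a single read."""
    n_kmers = max(read_length - ksize + 1, 0)  # k-mers in the read
    return n_kmers / scaled                    # about 1 in `scaled` is retained


# Compare the default density with the s=100 idea for a hypothetical 10 kb read.
for scaled in (1000, 100):
    print(f"scaled={scaled}: ~{expected_hashes(10_000, ksize=31, scaled=scaled):.1f} hashes")
```

Under these assumptions a 10 kb read yields about 10 hashes at scaled=1000 and about 100 at scaled=100, before any loss to sequencing errors.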

ctb · 2022-11-16T14:11:59Z

hi @jsgounot this paper systematically confirms that ONT messes up sourmash -

Evaluation of taxonomic classification and profiling methods for long-read shotgun metagenomic sequencing datasets: https://www.biorxiv.org/content/10.1101/2022.01.31.478527v2

See Fig 3 in particular.

It seems pretty clear that the error profile for nanopore is terrible for sourmash :(.
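
One way to build intuition for this: a k-mer can only match the reference if all k of its bases were called correctly. Under the simplifying assumption of independent per-base errors at rate e, the expected fraction of error-free k-mers is (1 - e)^k. Real ONT errors are not independent, and the rates below are illustrative rather than measured, so treat this as a sketch only.

```python
def intact_kmer_fraction(error_rate, ksize):
    """Expected fraction of k-mers with no errors, assuming independent per-base errors."""
    return (1.0 - error_rate) ** ksize


# Illustrative (not measured) per-base error rates and common k-mer sizes.
for error_rate in (0.01, 0.05, 0.10):
    for ksize in (21, 31, 51):
        frac = intact_kmer_fraction(error_rate, ksize)
        print(f"e={error_rate:.2f} k={ksize}: {frac:.3f}")
```

Under this model, at a 5% error rate only about a fifth of 31-mers are expected to be intact, which points in the same direction as the low f_orig_query values reported above.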

@dportik, @bluegenes and I are thinking of doing a bit more exploring, but we have no simple solution to offer. thoughts welcome!

ctb · 2022-11-16T14:12:43Z

(see #2360 for some discussion of thresholding that is not entirely irrelevant ;)

jsgounot · 2022-11-17T01:18:27Z

hi @ctb, thank you for sharing this. It looks like MEGAN-LR is the right approach for this kind of data at the moment. Do you share the same conclusion?

ctb · 2022-11-17T02:19:44Z

that's my reading as well but @bluegenes @dportik should weigh in!

dportik · 2022-11-17T22:26:38Z

Hi @jsgounot - as @ctb mentioned the error profile of ONT appears to negatively affect sourmash's performance (at least for now).

There are two good options for ONT. We found BugSeq actually had the best performance - it is highly tuned to ONT. But that is a cloud-based analysis and you've got to sign up for it. If you are looking for a DIY option, I would recommend the DIAMOND & MEGAN-LR approach. That pipeline is available as a snakemake workflow at https://github.com/PacificBiosciences/pb-metagenomics-tools. If you choose to build an independent pipeline for this, just be aware there are some landmines involved in getting the DIAMOND outputs into MEGAN.

ctb mentioned this issue Apr 21, 2023

Low recovered matches for nanopore? #2585

Open

ezherman mentioned this issue Mar 8, 2024

Best practice for ONT metagenomics #3070

Open
