long read metagenomic profiling #27

JensUweUlrich · 2023-03-27T19:43:56Z

Dear Wei Shen,
I really like your tool and your tutorials. I just have a question about long-read metagenomic profiling. Is there a specific parameter combination you would recommend for taxonomic profiling? I seem to be missing some organisms from the Zymo Mock Community even when using profiling mode m=0.
Thanks
Jens

shenwei356 · 2023-03-28T01:06:32Z

Thanks for your interest.

KMCP is only suitable for short-read metagenomic profiling; its sensitivity on long-read datasets is much lower. My initial plan was to support both short and long reads, but the read matching strategy, i.e., keeping reads with enough (>= 50%) of their k-mers contained in a genome chunk, has low sensitivity for long reads, even for HiFi reads.
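A rough way to see why a 50% k-mer threshold is hard to meet for noisy reads: if sequencing errors hit bases independently at rate e, a k-mer stays error-free with probability (1 - e)^k. The sketch below is a back-of-the-envelope model, not KMCP code. The function names, the error rates, and k = 21 are illustrative assumptions, and the model ignores other effects such as divergence between the sampled strain and the reference genome.

```python
def intact_kmer_fraction(error_rate: float, k: int) -> float:
    """Expected fraction of a read's k-mers that contain no sequencing error.

    Assumes independent, uniformly distributed per-base errors (a simplification).
    """
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return (1.0 - error_rate) ** k


def max_error_rate_for_threshold(threshold: float, k: int) -> float:
    """Largest per-base error rate at which the expected intact fraction
    still reaches the given threshold (inverse of the formula above)."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    return 1.0 - threshold ** (1.0 / k)


if __name__ == "__main__":
    k = 21  # illustrative k-mer size
    # Hypothetical error rates, chosen only to span a realistic range.
    for label, e in [("low error", 0.001), ("moderate error", 0.01), ("high error", 0.10)]:
        frac = intact_kmer_fraction(e, k)
        print(f"{label:15s} e={e:.3f}  expected intact k-mers: {frac:.3f}")
    limit = max_error_rate_for_threshold(0.5, k)
    print(f"error rate at which the expectation drops to 50%: {limit:.4f}")
```

Under this model, the expected share of intact k-mers falls below one half once the per-base error rate exceeds a few percent, before any genuine sequence divergence is taken into account.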

Several strategies were tried, but none produced the expected results:

- Setting a lower similarity threshold. With our probabilistic data structure, lower thresholds significantly increase the false-positive rate of read matches, though the FPR can be reduced at the cost of bigger databases.
- Using sketching algorithms. Scaled MinHash, Closed Syncmers, and Minimizer were all implemented (available in the current version) and tested, but their sensitivity on error-prone long reads was low. Although tools like minimap2 benefit from minimizers with location information for seeding and chaining in sequence alignment, we failed to make use of them in taxonomic profiling.
- Using multiple k-mer sizes. K-mers of different lengths, e.g., 17, 21, 31, did no better than a single value and doubled the database size.
- Using SimHash, which tolerates base substitutions better than k-mers. It was slower and, unexpectedly, had lower sensitivity.
- Breaking long reads into short ones. This only applies to HiFi reads, and it wastes the strength of long reads.
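The first point, that a lower threshold raises the false-positive rate, can be illustrated with a simple binomial model. Assume each of a read's n k-mers independently gives a false hit against an unrelated genome chunk with probability p (the per-k-mer false-positive rate of the index). The read is then wrongly matched if at least ceil(t * n) k-mers hit, where t is the similarity threshold. This is a simplified sketch with made-up numbers and hypothetical function names, not KMCP's actual computation.

```python
import math


def log_binom_pmf(n: int, i: int, p: float) -> float:
    """Log of the binomial probability of exactly i successes in n trials."""
    return (
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
    )


def read_false_positive_rate(n_kmers: int, kmer_fpr: float, threshold: float) -> float:
    """Probability that a read with no true match still passes the threshold.

    Assumes each k-mer is an independent false hit with probability kmer_fpr.
    """
    if not 0.0 < kmer_fpr < 1.0:
        raise ValueError("kmer_fpr must be in (0, 1)")
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    min_hits = math.ceil(threshold * n_kmers)
    # Upper tail of the binomial distribution: P(hits >= min_hits).
    tail = sum(
        math.exp(log_binom_pmf(n_kmers, i, kmer_fpr))
        for i in range(min_hits, n_kmers + 1)
    )
    return min(1.0, tail)


if __name__ == "__main__":
    n = 130  # illustrative: a 150 bp read yields 150 - 21 + 1 = 130 21-mers
    for p in (0.3, 0.1):  # hypothetical per-k-mer false-positive rates
        for t in (0.5, 0.4, 0.3):
            fpr = read_false_positive_rate(n, p, t)
            print(f"kmer_fpr={p:.1f} threshold={t:.1f} read-level FPR={fpr:.3e}")
```

In this model the read-level false-positive rate rises steeply as the threshold approaches the per-k-mer false-positive rate, and reducing the per-k-mer rate pulls it back down. That matches the trade-off described above, since a lower per-k-mer rate generally requires a larger index.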

shenwei356 · 2023-06-14T04:47:43Z

The answer has been added to the FAQ page: https://bioinf.shenwei.me/kmcp/faq/#does-kmcp-support-long-read-metagenomic-profiling

AstrobioMike mentioned this issue May 22, 2023

suitable for CDS and/or contig taxonomic assignment? #28

Open

shenwei356 added the documentation label Jun 6, 2023
