Metabuli for eukaryotes: use cases and future plans. #55

Open
jaebeom-kim opened this issue Feb 6, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@jaebeom-kim
Collaborator

The Metabuli project originally targeted prokaryotes and viruses.
However, since users are reporting use cases for eukaryotes, some with promising performance,
we plan to optimize the default settings or add parameters for eukaryotes.
Providing a pre-built database covering both eukaryotes and prokaryotes is also on the to-do list.

Here are some use cases of Metabuli with eukaryotes.

  1. Environmental DNA metabarcoding for surveying marine vertebrates (benchmarks)
    Metabuli showed promising performance in classifying simulated 12S and 16S amplicon data of marine vertebrates.
    Working parameters: --seq-mode 1 --min-cons-cnt-euk 4 --tie-ratio 0.99

  2. Testing Metabuli on fungi
    With --min-cons-cnt-euk 4, Metabuli correctly classified 97% of paired-end reads simulated from a fungal species when its genome was included in the database.
    With the default setting (--min-cons-cnt-euk 9), that percentage dropped to 12%.
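
For anyone scripting these runs, the parameters from the two cases above can be assembled into a command line as in the rough sketch below. The flags and values are the ones reported in this issue. The positional order (reads, database directory, output directory, job ID) is an assumption that should be checked against `metabuli classify -h`, and all file and directory names are hypothetical. The helper only builds the argument list and does not run anything.

```python
def build_classify_args(reads, db_dir, out_dir, job_id,
                        single_end=False, min_cons_cnt_euk=4, tie_ratio=None):
    """Assemble an argument list for `metabuli classify` (nothing is executed)."""
    args = ["metabuli", "classify"]
    if single_end:
        # --seq-mode 1 is the value reported in this issue for amplicon data.
        args += ["--seq-mode", "1"]
    args += list(reads)
    # ASSUMED positional order; verify with the CLI help before use.
    args += [db_dir, out_dir, job_id]
    args += ["--min-cons-cnt-euk", str(min_cons_cnt_euk)]
    if tie_ratio is not None:
        args += ["--tie-ratio", str(tie_ratio)]
    return args


# Hypothetical file and directory names, for illustration only.
amplicon_cmd = build_classify_args(
    ["amplicons.fasta"], "metabuli_db", "results", "edna_run",
    single_end=True, min_cons_cnt_euk=4, tie_ratio=0.99,
)
paired_cmd = build_classify_args(
    ["fungi_1.fastq", "fungi_2.fastq"], "metabuli_db", "results", "fungi_run",
    min_cons_cnt_euk=4,
)
print(" ".join(amplicon_cmd))
print(" ".join(paired_cmd))
```

Keeping the threshold as a named argument makes it easy to rerun the same data with 4, 5, and 9 and compare the classified fraction.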

For now, --min-cons-cnt-euk appears to be the critical parameter.
It sets the minimum number of consecutive k-mer hits required for a read to be classified.
The strict default of --min-cons-cnt-euk 9 was chosen in an older version of Metabuli as a quick remedy to reduce false-positive eukaryote hits caused by their larger genomes.
Although we have since added noise-filtering steps to reduce false positives, we did not adjust this value for eukaryotes.
Based on user reports, setting --min-cons-cnt-euk to a lower value such as 4 or 5 is a reasonable choice for now.
After some tests, we will make a new release with an optimized default value.
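
To illustrate why this threshold matters, here is a toy model of a "minimum consecutive k-mer hits" rule. It is not Metabuli's actual implementation, which involves more than this check; the function names and the hit pattern are made up for the example. It only shows how a read whose longest run of matching k-mers is 6 passes a threshold of 4 but fails a threshold of 9.

```python
def longest_hit_run(hits):
    """Return the length of the longest run of consecutive True values."""
    best = current = 0
    for hit in hits:
        current = current + 1 if hit else 0  # a miss resets the current run
        best = max(best, current)
    return best


def passes_threshold(hits, min_cons_cnt):
    """Toy rule: accept a read only if some run of hits reaches the threshold."""
    return longest_hit_run(hits) >= min_cons_cnt


# Hypothetical hit pattern along one read: 1 = k-mer matched, 0 = no match.
read_hits = [c == "1" for c in "1111101111110111"]

print(longest_hit_run(read_hits))      # longest run is 6
print(passes_threshold(read_hits, 4))  # True
print(passes_threshold(read_hits, 9))  # False
```

Under this simplified view, a couple of mismatches per read are enough to break every run shorter than 9, which is consistent with the large drop in classified reads reported above.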

+++
Please share your thoughts on what to optimize in Metabuli for eukaryotes, and how!
Your feedback helps us a lot in making Metabuli more useful for your research.

jaebeom-kim added the enhancement label on Feb 6, 2024