Availability of old gtdb databases? #41

Closed
fplazaonate opened this issue Nov 11, 2023 · 5 comments

Comments

@fplazaonate

Hi @shenwei356,

I am performing some benchmarks and I was wondering if the gtdb r207 database was still available for download?

Best,
Florian

@shenwei356
Owner

shenwei356 commented Nov 11, 2023

Yes, they are still there. Follow the download page: https://1drv.ms/u/s!Ag89cZ8NYcqtjHwpe0ND3SUEhyrp?e=QDRbEC

path: kmcp/v2021.12/metagenomic-profiling

@fplazaonate
Author

Hi @shenwei356 ,
I have just reopened the issue.
The gtdb v2021.12 database has 47,894 representative genomes.
It corresponds to gtdb r202, not gtdb r207 (https://gtdb.ecogenomic.org/stats/r202).
Do you have a prebuilt database for gtdb r207?

@shenwei356
Owner

Oh... I see. Sorry, I don't think I have a prebuilt database for gtdb r207. But you can make one; it's easy.

Please follow these steps: https://bioinf.shenwei.me/kmcp/database/#gtdb . Please skip the step of "Masking prophage regions and removing plasmid sequences with genomad (optional)".

@fplazaonate
Author

"# reference genomes are split into 10 chunks with 100bp overlap"
I think it is 150bp.
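
To make concrete what "10 chunks with 150bp overlap" means for a single genome, here is a minimal sketch. It is not KMCP's actual splitting code: `split_with_overlap` is a hypothetical helper, and the chunk-size arithmetic is an assumption chosen only so that adjacent chunks share exactly `overlap` bases.

```python
def split_with_overlap(seq_len, n_chunks, overlap):
    """Return half-open (start, end) intervals covering a sequence.

    Hypothetical illustration only, not KMCP's implementation.
    Adjacent chunks share exactly `overlap` positions.
    """
    if seq_len < 1 or n_chunks < 1 or overlap < 0:
        raise ValueError("seq_len and n_chunks must be >= 1, overlap >= 0")

    # n chunks of equal size cover n*size - (n-1)*overlap positions,
    # so pick the smallest size that covers the whole sequence.
    size = -(-(seq_len + (n_chunks - 1) * overlap) // n_chunks)  # ceiling division
    step = size - overlap
    if step <= 0:
        raise ValueError("sequence too short for this chunk count and overlap")

    chunks = []
    for i in range(n_chunks):
        start = i * step
        end = min(start + size, seq_len)  # clip the last chunk to the sequence end
        chunks.append((start, end))
    return chunks


if __name__ == "__main__":
    # Example: a 10,000 bp sequence, 10 chunks, 150 bp overlap.
    for start, end in split_with_overlap(10_000, 10, 150):
        print(start, end)
```

With a 100bp overlap instead of 150bp, the same call produces slightly shorter chunks, so it is worth checking which value the build command in the docs actually passes.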

@fplazaonate
Author

I successfully created the database.
Thanks.
