build a streamlined Genbank/RefSeq based on GTDB 25k genomes #8

Closed
ctb opened this issue Jul 17, 2020 · 1 comment

Comments

@ctb
Contributor

ctb commented Jul 17, 2020

ref #4 and #7: I think providing a small database built from the GTDB 25k genomes, but with NCBI names/taxonomies instead of GTDB ones, would be quite useful to many people.

the logic is that:

  • NCBI taxonomy is a mess, but people like it and are used to it;
  • GTDB 25k is a nice low-redundancy collection of genomes, so it's a good set to match against;
  • so we could provide just those genomes, but with NCBI taxonomy instead of GTDB taxonomy.

I guess we'd want to make sure the names are NCBI names where possible, and we'd want to provide a lineages CSV with it.
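
As a rough illustration of what building that lineages CSV could look like, here is a minimal sketch. It assumes each genome record carries an `accession` and an `ncbi_taxonomy` string in the `d__...;p__...;...;s__...` style; those field names, the `ident` column, and the rank column names are assumptions chosen for the example, not a confirmed format, and the helper names (`split_lineage`, `write_lineages`, `lineages_csv`) are hypothetical.

```python
import csv
import io

# Assumed column layout for the lineages CSV (not confirmed by the issue).
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]
PREFIXES = ["d__", "p__", "c__", "o__", "f__", "g__", "s__"]


def split_lineage(taxonomy):
    """Split 'd__X;p__Y;...' into a rank -> name dict; missing ranks stay ''."""
    names = dict.fromkeys(RANKS, "")
    for field in taxonomy.split(";"):
        field = field.strip()
        for rank, prefix in zip(RANKS, PREFIXES):
            if field.startswith(prefix):
                names[rank] = field[len(prefix):]
    return names


def write_lineages(records, out):
    """Write one CSV row per genome record to the file-like object `out`."""
    writer = csv.DictWriter(out, fieldnames=["ident"] + RANKS, lineterminator="\n")
    writer.writeheader()
    for rec in records:
        row = {"ident": rec["accession"]}
        row.update(split_lineage(rec["ncbi_taxonomy"]))
        writer.writerow(row)


def lineages_csv(records):
    """Return the lineages CSV as a string (handy for small tests)."""
    buf = io.StringIO()
    write_lineages(records, buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Made-up accession, used only to show the shape of the output.
    example = [{
        "accession": "GCF_000000001.1",
        "ncbi_taxonomy": "d__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;"
                         "o__Enterobacterales;f__Enterobacteriaceae;"
                         "g__Escherichia;s__Escherichia coli",
    }]
    print(lineages_csv(example), end="")
```

In practice the records would come from whatever metadata table lists the GTDB 25k genomes alongside their NCBI lineages; the sketch only covers the reshaping step.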

see also sourmash-bio/sourmash#969

@ctb
Contributor Author

ctb commented Apr 2, 2022

closing in favor of our modern database build processes; will document those elsewhere.
