
horkko/taxadb

 
 

Taxadb


Taxadb is an application to locally query the NCBI taxonomy. Taxadb is written in Python and accesses its database using the peewee library.

Taxadb is very much a work in progress. The following are still not implemented:

  • taxadb download: download all the required files from the NCBI FTP server
  • taxadb create: build the SQLite database
  • API: Python library to query the database

Installation

Taxadb requires Python 3.5. To install it, type the following in your terminal:

pip install taxadb

Available databases

The databases used by Taxadb take a long time to build, so we provide pre-built databases. They are available for download below.

SQLite

Name  Size  Size (gzipped)  Download link
full  21G   4.4G            link
nucl  14G   2.9G            link
prot  7.1G  1.6G            link
gb    2.5G  576M            link
wgs   8.5G  1.9G            link
gss   880M  172M            link
est   1.6G  320M            link

Build date: December 2016
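
The pre-built databases are distributed gzipped, so they have to be decompressed before they can be queried. A minimal sketch using only the Python standard library is shown here. The file names `full.sqlite.gz` and `full.sqlite` are hypothetical placeholders; use the name of the archive you actually downloaded.

```python
import gzip
import shutil


def gunzip(src, dst, chunk_size=1024 * 1024):
    """Decompress the gzip file `src` into `dst`, streaming in chunks.

    Streaming matters here: the uncompressed databases are several
    gigabytes, so reading one fully into memory is not an option.
    """
    with gzip.open(src, "rb") as compressed, open(dst, "wb") as out:
        shutil.copyfileobj(compressed, out, chunk_size)
    return dst


# Hypothetical file names; substitute the archive you downloaded.
# gunzip("full.sqlite.gz", "full.sqlite")
```

Make sure the target disk has room for the uncompressed size listed in the table above before decompressing.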

Usage

Querying the Database

First, make sure you have downloaded or built the database.

Below are some basic examples. For more complete examples, please refer to the full documentation (available soon).

    >>> from taxadb import taxid

    >>> name = taxid.sci_name(33208, 'mydb.sqlite')
    >>> print(name)
    Metazoa

    >>> lineage = taxid.lineage_name(33208, 'mydb.sqlite')
    >>> print(lineage)
    ['Metazoa', 'Opisthokonta', 'Eukaryota', 'cellular organisms']
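
To illustrate what a lineage is, the sketch below walks parent links in a toy SQLite table until it reaches the root of the taxonomy. This is not taxadb's implementation or schema: the table `taxa`, its columns, and the helpers `build_toy_db` and `toy_lineage` are assumptions made up for this example. Only the taxids and names come from the NCBI taxonomy.

```python
import sqlite3

# Toy table, NOT taxadb's real schema: each row stores a taxon's id,
# its scientific name, and the id of its parent taxon.
ROWS = [
    (1, "root", 1),
    (131567, "cellular organisms", 1),
    (2759, "Eukaryota", 131567),
    (33154, "Opisthokonta", 2759),
    (33208, "Metazoa", 33154),
]


def build_toy_db():
    """Create an in-memory SQLite database holding the toy taxonomy."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxa (taxid INTEGER PRIMARY KEY, name TEXT, parent INTEGER)"
    )
    conn.executemany("INSERT INTO taxa VALUES (?, ?, ?)", ROWS)
    return conn


def toy_lineage(conn, taxid):
    """Return names from `taxid` up to, but excluding, the root (taxid 1)."""
    names = []
    while taxid != 1:
        row = conn.execute(
            "SELECT name, parent FROM taxa WHERE taxid = ?", (taxid,)
        ).fetchone()
        if row is None:
            raise KeyError("unknown taxid: {}".format(taxid))
        name, parent = row
        names.append(name)
        taxid = parent  # move one level up the tree
    return names


conn = build_toy_db()
print(toy_lineage(conn, 33208))
# ['Metazoa', 'Opisthokonta', 'Eukaryota', 'cellular organisms']
```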

To get the taxonomic information for accession numbers, you need to know which NCBI division they originated from. Here is an example with accession numbers from the gb division:

    >>> from taxadb.schema import Gb
    >>> from taxadb import accession

    >>> my_accessions = ['X17276', 'Z12029']
    >>> taxids = accession.taxid(my_accessions, 'mydb.sqlite', Gb)
    >>> taxids
    <generator object taxid at 0x1051b0830>

    >>> for tax in taxids:
    ...     print(tax)
    ...
    ('X17276', 9646)
    ('Z12029', 9915)
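
Because the result is a generator, it can only be iterated once. If you need to look up taxids repeatedly, one option is to collect the pairs into a dictionary first. The sketch below uses `fake_taxid_lookup`, a hypothetical stand-in for `accession.taxid`, so that it runs without a database. The accession numbers and taxids are the ones from the example above.

```python
def fake_taxid_lookup(accessions):
    """Hypothetical stand-in for accession.taxid.

    Yields (accession, taxid) pairs lazily, like the real call does.
    """
    known = {"X17276": 9646, "Z12029": 9915}
    for acc in accessions:
        if acc in known:
            yield acc, known[acc]


pairs = fake_taxid_lookup(["X17276", "Z12029"])

# dict() consumes the generator and keeps the pairs for repeated lookups.
acc_to_taxid = dict(pairs)
print(acc_to_taxid["X17276"])  # 9646

# The generator is now exhausted, so iterating it again yields nothing.
print(list(pairs))  # []
```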

Creating the Database

SQLite

The following commands download the necessary files from the NCBI FTP server and build a database called taxadb.sqlite in the current directory:

taxadb download -o taxadb
taxadb create -i taxadb --dbname taxadb

You can then safely remove the downloaded files:

rm -r taxadb
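
If you prefer to script these steps, the same commands can be driven from Python. This is a minimal sketch that assumes the `taxadb` executable is on your PATH. The helper names `build_commands` and `build_database` are made up for this example and are not part of taxadb.

```python
import shutil
import subprocess


def build_commands(workdir="taxadb", dbname="taxadb"):
    """Return the two taxadb invocations shown above as argument lists."""
    return [
        ["taxadb", "download", "-o", workdir],
        ["taxadb", "create", "-i", workdir, "--dbname", dbname],
    ]


def build_database(workdir="taxadb", dbname="taxadb"):
    """Run download, then create, and delete the downloads on success."""
    for cmd in build_commands(workdir, dbname):
        # check=True raises CalledProcessError if a step exits non-zero,
        # so the downloaded files are kept whenever the build fails.
        subprocess.run(cmd, check=True)
    shutil.rmtree(workdir)  # equivalent of `rm -r taxadb`


# Uncomment to run the full download and build (slow, needs network access):
# build_database()
```

Keeping the downloaded files when a step fails lets you rerun `taxadb create` without downloading everything again.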

MySQL

Due to a problem with foreign keys, MySQL support has been put on hold for the time being.

Creating databases is a very vendor-specific task. Peewee, like most ORMs, can create tables but not databases. To use taxadb with MySQL, you'll have to create the database yourself.

Connect to your MySQL server:

mysql -u $user -p
mysql> create database taxadb;

Then run taxadb:

taxadb download -o taxadb
taxadb create -i taxadb -d taxadb -t mysql -u $user -p $password

License

Code is under the MIT license.

Issues

Found a bug or have a question? Please open an issue.

Contributing

Have an idea for a new feature that you'd like us to implement? Open an issue, or fork the repository and submit a pull request.
