Skip to content

rehrlich/prokka_database_maker

Repository files navigation

WARNING: Prokka doesn't recommend using a genus database anymore, so you probably don't want to make one.

This is a tool for creating genus databases for Prokka. The main input is a list of ncbi taxids. This program finds each taxid in the ncbi taxonomy tree, identifies all descendants, downloads the associated genome files from ncbi, and creates a single Prokka genus database.

Requirements: python 3.5 pandas Prokka https://github.com/tseemann/prokka

Usage: database_maker.py [-h] [-d DB_DIR] [-n NAME] [-o OUTDIR] [-a] [-i] -t TAXID [TAXID ...]

Create a genus database for use with Prokka

optional arguments: -h, --help show this help message and exit -d DB_DIR, --db_dir DB_DIR The directory for the database files Example: /path/to/prokka/db/genus (default: database_files/db) -n NAME, --name NAME A name for the genus database (default: new_genus_database) -o OUTDIR, --outdir OUTDIR A directory for storing intermediate outputs (default: database_files) -a, --all_files Use -a if you want to download all files, don't use it if you only want the files for the genus database (default: False) -i, --include_incomplete Use -i if you want incomplete genomes, don't use it if you don't. (default: False)

Required argument: -t TAXID [TAXID ...], --taxid TAXID [TAXID ...] The ncbi taxids for the database (default: None)

Detailed explanation of arguments: DB_DIR - This is where the final database files will be stored. In order to be used by Prokka, the files will eventually need to be in /path/to/prokka/db/genus, but you can move them manually later. NAME - This is the name of the database. When using the database with Prokka, this will be the argument to --genus. Remember to also set the --usegenus flag. ALL_FILES - Include this flag to use the progmram as a batch download tool. It will only download the current assemblies. INCLUDE_INCOMPLETE - I use this flag when creating a Prokka database unless there aren't many complete genomes. Using this flag will only download the current assemblies. TAXID - This is a list of ncbi taxonomy ids. This is a good tool for looking up ncbi taxonomy id's http://www.ncbi.nlm.nih.gov/Taxonomy/TaxIdentifier/tax_identifier.cgi. At least one taxonomy id is required so '-t 1' is valid and so is '-t 1 2'

Usage as a batch download tool: Set -a to use. This program can be used to download all files associated with the latest version of an assembly. ftp://ftp.ncbi.nlm.nih.gov/genomes/genbank/bacteria/Escherichia_coli/all_assembly_versions/GCA_000005845.2_ASM584v2 is an example of all of the files from an assembly. Files (and directories) containing the phrase 'assembly_structure' will not be downloaded for legacy reasons related to symbolic links in the ncbi file structure. If you need them, open an issue and include an ftp path that includes those files. Prokka is not required to use this as a batch download tool. The creation of the Prokka database is the final step of the program so it won't hurt anything when it fails.

Installing Prokka: Instructions are here: https://github.com/tseemann/prokka You only need to be able to use prokka-genbank_to_fasta_db, cd-hit, and makeblastdb so you may not have to install everything.

Installing python: The easy way to install python is to use Anaconda. This will download a lot of things you don't need right now. Go to https://www.continuum.io/downloads and follow the instructions for pythonn 3.5 for linux.

The slightly more difficult way is to use Miniconda which is like Anaconda but doesn't include the stuff you don't need. Go to http://conda.pydata.org/miniconda.html and follow the instructions for pythonn 3.5 for linux. After installing go to the command line and type: conda install pandas

If you don't want another package manager and just want python you can download it at https://www.python.org/downloads. Install pandas by typing: pip install pandas It has some dependencies. pip install should help with that.

About

Create a genus database to use with Prokka

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages