Database Requirements

Lyndon Coghill edited this page Jul 15, 2015 · 23 revisions

General List

Running the entire pipeline requires two additional data sources, each accessed in a specific way. Everything you need to get set up is outlined below. Altogether this can take a significant amount of disk space (~10 GB or more), so make sure you have that much free before getting started.

  1. gi_taxid dump from NCBI, loaded into a sqlite database
  2. Repbase (only needed if you want to filter transposable elements)
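Since the downloads are large, it can be worth checking free space before starting. The following is a minimal sketch using only Python's standard library; the 10 GB default mirrors the rough estimate above, and the function name `has_free_space` is purely illustrative, not part of the pipeline.

```python
import shutil

def has_free_space(path=".", required_gb=10):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3
```

Calling `has_free_space("/path/to/data")` on the directory where you plan to keep the databases gives a quick yes/no answer before any download begins.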

Linux Installation Guide

Install Required Software
sudo apt-get install ncbi-blast+
sudo apt-get install sqlite3 libsqlite3-dev

Set Up and Build Repbase BLAST Database

To set up a local BLAST database, you'll need to download the latest copy of Repbase in FASTA format. The steps are below:

  1. Download Repbase
  2. Choose the eukaryote files you want to include.
  3. Place those files in a directory
  4. Download and run convert-repbase.py from Transeeker, specifying the directory where you decompressed Repbase.
  5. Build a BLAST database from the resulting FASTA file:
    makeblastdb -in <REPBASE.FASTA> -out <BLASTDB FILENAME> -dbtype 'nucl'
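If you prefer to script the final step, the sketch below wraps the same `makeblastdb` call in Python. It assumes BLAST+ is already installed; the helper names `makeblastdb_command` and `build_blast_db` are illustrative and do not come from Transeeker.

```python
import shutil
import subprocess

def makeblastdb_command(fasta_path, db_name):
    """Build the argument list for the makeblastdb call shown above."""
    return ["makeblastdb", "-in", fasta_path, "-out", db_name, "-dbtype", "nucl"]

def build_blast_db(fasta_path, db_name):
    """Run makeblastdb, failing early if BLAST+ is not on the PATH."""
    if shutil.which("makeblastdb") is None:
        raise RuntimeError("makeblastdb not found; install ncbi-blast+ first")
    # check=True raises CalledProcessError if makeblastdb exits with an error.
    subprocess.run(makeblastdb_command(fasta_path, db_name), check=True)
```

For example, `build_blast_db("repbase.fasta", "repbase_db")` would build a nucleotide database, where both file names are placeholders for your own.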

Set Up and Prime SQLite3 Database

The last main step is loading the NCBI GI -> TAXID tab-separated dump (~6 GB) into sqlite3 for fast and easy renaming of the sequences in the clusters. Later parts of the pipeline need to know the taxid of each sequence, and it is faster and simpler to rewrite each sequence ID early on from a local source like this than to make repeated Entrez calls. You'll need to download the gi_taxid_nucl.zip file:

wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/gi_taxid_nucl.zip
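The downloaded archive has to be unpacked before it can be imported. A minimal sketch using Python's `zipfile` module follows; it assumes the archive contains the `gi_taxid_nucl.dmp` file that the import step expects, and the function name `extract_dump` is illustrative.

```python
import zipfile

def extract_dump(zip_path="gi_taxid_nucl.zip", dest_dir="."):
    """Extract every member of the archive and return the extracted file names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        return archive.namelist()
```

The returned list lets you confirm which file names were actually written before moving on. Running `unzip gi_taxid_nucl.zip` from the shell does the same job.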

Then unzip the archive and create and prime the sqlite3 database with the resulting gi_taxid_nucl.dmp file:

sqlite3 gi_taxid_nucl.db
sqlite> create table gi_to_ti (gi integer primary key, ti integer);
sqlite> create index ti on gi_to_ti (ti);
sqlite> .separator "\t"
sqlite> .import gi_taxid_nucl.dmp gi_to_ti
sqlite> .quit
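Once the database is primed, a GI can be mapped to its taxid with a single indexed query instead of an Entrez call. The sketch below shows one way to do that from Python, assuming the `gi_to_ti` table created above; the function name `taxid_for_gi` is illustrative and is not taken from the pipeline's code.

```python
import sqlite3

def taxid_for_gi(conn, gi):
    """Return the taxid stored for `gi`, or None if the GI is not in the table."""
    row = conn.execute("select ti from gi_to_ti where gi = ?", (gi,)).fetchone()
    return row[0] if row else None
```

A typical use would be to open the database once with `conn = sqlite3.connect("gi_taxid_nucl.db")` and then call `taxid_for_gi(conn, gi)` for each sequence, reusing the same connection throughout.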

Mac OS X Installation Guide

COMING SOON