sr320 edited this page Aug 5, 2014 · 1 revision

# BLASTing

Below is a list of the different kinds of BLAST algorithms.
BLAST__Basic_Local_Alignment_Search_Tool_19911BE7.png

You will find the corresponding programs on the command line:

dhcp075:ipynbs sr320$ cd /Applications/BLAST/ncbi-blast-2.2.29+/bin    
dhcp075:bin sr320$ ls
blast_formatter		deltablast		rpstblastn
blastdb_aliastool	dustmasker		segmasker
blastdbcheck		legacy_blast.pl		tblastn
blastdbcmd		makeblastdb		tblastx
blastn			makembindex		update_blastdb.pl
blastp			makeprofiledb		windowmasker
blastx			psiblast
convert2blastmask	rpsblast
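
Which of these programs to run depends on whether the query and the database are nucleotide or protein sequences. As a rough sketch (the helper name `pick_blast_program` is hypothetical and not part of BLAST+), the core search programs map to sequence types like this:

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST+): suggest a search program
# from the query type and the database type, each "nucl" or "prot".
pick_blast_program() {
    case "$1:$2" in
        nucl:nucl) echo "blastn" ;;   # nucleotide query vs nucleotide database
        prot:prot) echo "blastp" ;;   # protein query vs protein database
        nucl:prot) echo "blastx" ;;   # translated nucleotide query vs protein database
        prot:nucl) echo "tblastn" ;;  # protein query vs translated nucleotide database
        *) echo "unknown sequence type: $1 / $2" >&2; return 1 ;;
    esac
}

pick_blast_program nucl prot
```

The remaining search program, `tblastx`, also takes a nucleotide query and a nucleotide database but translates both, so it is left out of this simple lookup.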

Any FASTA file can be made into a BLAST database. NCBI also makes all of its pre-formatted databases available at

ftp://ftp.ncbi.nlm.nih.gov/blast/db/

Index_of__blast_db__19911CBE.png
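
Turning your own FASTA file into a database is done with `makeblastdb`, one of the programs in the `bin` listing. A minimal sketch, assuming a nucleotide FASTA file; the file name `demo.fasta`, its two sequences, and the database name `demo_db` are made up for illustration:

```shell
#!/bin/sh
# Write a tiny two-record nucleotide FASTA file (made-up sequences).
cat > demo.fasta <<'EOF'
>seq1 example record
ATGGCGTACGTTAGCCTAGGCTAACGTTAGGCATCGATCGGATCCGATTACAGGCTAGCA
>seq2 example record
ATGCCGTTAGCATCGGATCCGATTACAGGCTAGCATTGCCGATAGGCTTAACGGATCCAT
EOF

# Build and search the database only if BLAST+ is on the PATH.
if command -v makeblastdb >/dev/null 2>&1; then
    # -dbtype is "nucl" for nucleotide input or "prot" for protein input.
    makeblastdb -in demo.fasta -dbtype nucl -out demo_db
    # Search the file against its own database; -outfmt 6 is tabular output.
    blastn -query demo.fasta -db demo_db -outfmt 6 -out demo_hits.tsv
else
    echo "makeblastdb not found; add the BLAST+ bin directory to PATH" >&2
fi
```

For a protein FASTA file, the same pattern would use `-dbtype prot` and `blastp` instead.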

From the README file:

2. Contents of the /blast/db/ directory

The pre-formatted BLAST databases are archived in this directory. The
names of these databases and their contents are listed below.
+----------------------+-----------------------------------------------+
|File Name             | Content Description                           |
+----------------------+-----------------------------------------------+
/FASTA                 | subdirectory for FASTA formatted sequences
    
README                 | README for this subdirectory (this file)

env_nr.*tar.gz         | Environmental protein sequences
env_nt.*tar.gz         | Environmental nucleotide sequences

est.*tar.gz            | volumes of the formatted est database
                       | from the EST division of GenBank, EMBL, 
                       | and DDBJ

est_human.tar.gz       | alias and mask files for human subset of the est
est_mouse.tar.gz       | alias and mask files for mouse subset of the est
est_others.tar.gz      | alias and mask files for non-human and non-mouse
                       | subset of the est database
                       | These alias and mask files need all volumes of
                       | est to function properly.

gss.*tar.gz            | volumes of the formatted gss database
                       | from the GSS division of GenBank, EMBL, and
                       | DDBJ

htgs.*tar.gz           | volumes of htgs database with entries
                       | from HTG division of GenBank, EMBL, and DDBJ

human_genomic.*tar.gz  | human RefSeq (NC_######) chromosome records
                       | with gap adjusted concatenated NT_ contigs
 
nr.*tar.gz             | non-redundant protein sequence database with 
                       | entries from GenPept, Swissprot, PIR, PRF, PDB,
                       | and NCBI RefSeq

nt.*tar.gz             | nucleotide sequence database, with entries 
                       | from all traditional divisions of GenBank,  
                       | EMBL, and DDBJ excluding bulk divisions (gss, 
                       | sts, pat, est, and htg divisions). wgs entries
                       | are also excluded. Not non-redundant.

other_genomic.*tar.gz  | RefSeq chromosome records (NC_######) for 
                       | organisms other than human

pataa.*tar.gz          | patent protein sequence database
patnt.*tar.gz          | patent nucleotide sequence database
                       | The above two databases are directly from 
                       | USPTO or from EU/Japan Patent Agencies via 
                       | EMBL/DDBJ

pdbaa.*tar.gz          | protein sequences from pdb protein structures,
                       | its parent database is nr.
pdbnt.*tar.gz          | nucleotide sequences from pdb nucleic acid 
                       | structures, its parent database is nt. They are 
                       | NOT the protein coding sequences for the 
                       | corresponding pdbaa entries.

refseq_genomic.*tar.gz | NCBI genomic reference sequences
refseq_protein.*tar.gz | NCBI protein reference sequences
refseq_rna.*tar.gz     | NCBI Transcript reference sequences

sts.*tar.gz            | Sequences from the STS division of GenBank, EMBL,
                       | and DDBJ

swissprot.tar.gz       | swiss-prot sequence databases (last major update),
                       | its parent database is nr.

taxdb.tar.gz           | Additional taxonomy information for the formatted 
                       | database (contains common and scientific names)

wgs.*tar.gz            | volumes for whole genome shotgun sequence assemblies 
                       | for different organisms
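
The `*` in names such as `nt.*tar.gz` stands for a volume number (for example `nt.00.tar.gz`), and all volumes of a database need to be unpacked into the same directory before use. A minimal sketch of that step, assuming the archives have already been downloaded; the function name `extract_volumes` and its directory arguments are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: unpack every downloaded volume of one database.
# Usage: extract_volumes <download_dir> <db_name> <target_dir>
extract_volumes() {
    src=$1; db=$2; dest=$3
    mkdir -p "$dest"
    # Same pattern as the README, so it matches both multi-volume
    # archives (nt.00.tar.gz, ...) and single ones (swissprot.tar.gz).
    for archive in "$src/$db".*tar.gz; do
        [ -e "$archive" ] || { echo "no archives for $db in $src" >&2; return 1; }
        tar -xzf "$archive" -C "$dest"
    done
}
```

For example, `extract_volumes ~/Downloads nt ~/blastdb` would unpack every `nt` volume into `~/blastdb`. Pointing the `BLASTDB` environment variable at that directory lets the search programs find the database by name.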
                       