Blasting
sr320 edited this page Aug 5, 2014 · 1 revision
# BLASTing
Below is a list of the different BLAST programs and utilities, as found on the command line in the bin directory of a BLAST+ installation (here version 2.2.29+):

    dhcp075:ipynbs sr320$ cd /Applications/BLAST/ncbi-blast-2.2.29+/bin
    dhcp075:bin sr320$ ls
    blast_formatter      deltablast         rpstblastn
    blastdb_aliastool    dustmasker         segmasker
    blastdbcheck         legacy_blast.pl    tblastn
    blastdbcmd           makeblastdb        tblastx
    blastn               makembindex        update_blastdb.pl
    blastp               makeprofiledb      windowmasker
    blastx               psiblast
    convert2blastmask    rpsblast

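Which search program to use depends on whether the query and the database are nucleotide or protein sequences. The sketch below summarizes the usual pairing for the five core search programs in the listing. The helper name `pick_blast` and the type labels `nucl` and `prot` are made up for illustration and are not part of BLAST+.

```shell
# Map (query type, database type) to the BLAST+ search program that handles it.
# pick_blast is a hypothetical helper for illustration, not a BLAST+ command.
pick_blast() {
  query="$1"
  db="$2"
  case "${query}:${db}" in
    nucl:nucl) echo "blastn" ;;   # nucleotide query vs nucleotide database
    prot:prot) echo "blastp" ;;   # protein query vs protein database
    nucl:prot) echo "blastx" ;;   # translated nucleotide query vs protein database
    prot:nucl) echo "tblastn" ;;  # protein query vs translated nucleotide database
    *) echo "unknown combination: ${query}:${db}" >&2; return 1 ;;
  esac
}
# Note: tblastx also takes a nucleotide query and a nucleotide database,
# but compares the six-frame translations of both instead of the raw bases.

pick_blast nucl prot
```

Running the last line prints the program suited to searching a nucleotide query against a protein database.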
As for databases, any FASTA file can be made into a BLAST database (using makeblastdb, listed above). NCBI also makes all of its pre-formatted databases available at
ftp://ftp.ncbi.nlm.nih.gov/blast/db/
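
As a minimal sketch of turning a FASTA file into a database, the commands below write a tiny nucleotide FASTA file and pass it to makeblastdb. The file name `example.fasta`, the output name `example_db`, and the sequences themselves are made up for illustration. The build step only runs if makeblastdb is on the PATH.

```shell
# Write a tiny two-record nucleotide FASTA file (made-up sequences, illustration only).
cat > example.fasta <<'EOF'
>seq1 hypothetical example record
ATGGCGTACGTTAGCCTAGGCTAACGTTAGC
>seq2 hypothetical example record
ATGCCGTTAGGCTAACCGGTTAACGTAGGCA
EOF

# Build a nucleotide database from it, but only if makeblastdb is available.
if command -v makeblastdb >/dev/null 2>&1; then
  makeblastdb -in example.fasta -dbtype nucl -out example_db -title "example"
else
  echo "makeblastdb not found; skipping database build" >&2
fi
```

For a protein FASTA file, `-dbtype prot` would be used instead of `-dbtype nucl`.
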
From the README file in that directory:

    2. Contents of the /blast/db/ directory

    The pre-formatted BLAST databases are archived in this directory. The
    names of these databases and their contents are listed below.

    +----------------------+------------------------------------------------------+
    |File Name             | Content Description                                  |
    +----------------------+------------------------------------------------------+
    /FASTA                 | subdirectory for FASTA formatted sequences
    README                 | README for this subdirectory (this file)
    env_nr.*tar.gz         | Environmental protein sequences
    env_nt.*tar.gz         | Environmental nucleotide sequences
    est.*tar.gz            | volumes of the formatted est database
                           | from the EST division of GenBank, EMBL,
                           | and DDBJ
    est_human.tar.gz       | alias and mask files for human subset of the est
    est_mouse.tar.gz       | alias and mask files for mouse subset of the est
    est_others.tar.gz      | alias and mask files for non-human and non-mouse
                           | subset of the est database
                           | These alias and mask files need all volumes of
                           | est to function properly.
    gss.*tar.gz            | volumes of the formatted gss database
                           | from the GSS division of GenBank, EMBL, and
                           | DDBJ
    htgs.*tar.gz           | volumes of htgs database with entries
                           | from HTG division of GenBank, EMBL, and DDBJ
    human_genomic.*tar.gz  | human RefSeq (NC_######) chromosome records
                           | with gap adjusted concatenated NT_ contigs
    nr.*tar.gz             | non-redundant protein sequence database with
                           | entries from GenPept, Swissprot, PIR, PDF, PDB,
                           | and NCBI RefSeq
    nt.*tar.gz             | nucleotide sequence database, with entries
                           | from all traditional divisions of GenBank,
                           | EMBL, and DDBJ excluding bulk divisions (gss,
                           | sts, pat, est, and htg divisions). wgs entries
                           | are also excluded. Not non-redundant.
    other_genomic.*tar.gz  | RefSeq chromosome records (NC_######) for
                           | organisms other than human
    pataa.*tar.gz          | patent protein sequence database
    patnt.*tar.gz          | patent nucleotide sequence database
                           | The above two databases are directly from
                           | USPTO or from EU/Japan Patent Agencies via
                           | EMBL/DDBJ
    pdbaa.*tar.gz          | protein sequences from pdb protein structures,
                           | its parent database is nr.
    pdbnt.*tar.gz          | nucleotide sequences from pdb nucleic acid
                           | structures, its parent database is nt. They are
                           | NOT the protein coding sequences for the
                           | corresponding pdbaa entries.
    refseq_genomic.*tar.gz | NCBI genomic reference sequences
    refseq_protein.*tar.gz | NCBI protein reference sequences
    refseq_rna.*tar.gz     | NCBI Transcript reference sequences
    sts.*tar.gz            | Sequences from the STS division of GenBank, EMBL,
                           | and DDBJ
    swissprot.tar.gz       | swiss-prot sequence databases (last major update),
                           | its parent database is nr.
    taxdb.tar.gz           | Additional taxonomy information for the formatted
                           | database (contains common and scientific names)
    wgs.*tar.gz            | volumes for whole genome shotgun sequence assemblies
                           | for different organisms
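
Entries written as `name.*tar.gz` in the table are split into numbered volumes (for example `nt.00.tar.gz`, `nt.01.tar.gz`), while entries such as `swissprot.tar.gz` are a single archive. The sketch below recovers the database name from an archive file name and, only when explicitly requested, hands it to update_blastdb.pl from the listing above. The helper `db_name_of` and the `FETCH_DB` switch are hypothetical names introduced here, and the exact options accepted by update_blastdb.pl may differ between BLAST+ versions.

```shell
# Strip the archive suffix and any volume number to recover the database name,
# e.g. nt.00.tar.gz -> nt, swissprot.tar.gz -> swissprot.
# db_name_of is a hypothetical helper for illustration, not part of BLAST+.
db_name_of() {
  echo "$1" | sed -e 's/\.tar\.gz$//' -e 's/\.[0-9][0-9]*$//'
}

db_name_of nt.00.tar.gz
db_name_of swissprot.tar.gz

# Optional download: only attempted when FETCH_DB=1 is set (an assumed switch)
# and update_blastdb.pl is on the PATH. This needs network access and disk space.
if [ "${FETCH_DB:-0}" = "1" ] && command -v update_blastdb.pl >/dev/null 2>&1; then
  update_blastdb.pl --decompress "$(db_name_of swissprot.tar.gz)"
fi
```

Passing the bare database name rather than individual archive names lets the script fetch every volume of a multi-volume database in one go.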