# BLAST Docker Jupyter Notebook

This notebook is adapted from NCBI's [BLAST Docker documentation](https://github.com/ncbi/docker/tree/master/blast) and uses a customized Docker image that bundles [BLAST](https://www.ncbi.nlm.nih.gov/books/NBK279690/) and [E-direct](https://www.ncbi.nlm.nih.gov/books/NBK179288/).


# What is NCBI BLAST?<a class="anchor" id="what-is-ncbi-blast"></a>
The National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool [(BLAST)](https://www.ncbi.nlm.nih.gov/pubmed/2231712) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
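As an outline of what "statistical significance" means here, the expect value (E-value) of a local alignment with raw score $S$ follows Karlin-Altschul statistics. $E$ is the number of alignments scoring at least $S$ that you would expect by chance. In these equations, $m$ and $n$ are the query and database lengths in residues, and $K$ and $\lambda$ depend on the scoring matrix and residue composition. Converting $S$ to a bit score $S'$ removes the dependence on $K$ and $\lambda$. BLAST actually uses effective lengths corrected for edge effects and empirically estimated parameters for gapped alignments, so treat this as a sketch rather than its exact computation:

$$E = K\,m\,n\,e^{-\lambda S}$$

$$S' = \frac{\lambda S - \ln K}{\ln 2}, \qquad E = m\,n\,2^{-S'}$$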

For a full description of the features and capabilities of BLAST+, please refer to the [BLAST Command Line Applications User Manual](https://www.ncbi.nlm.nih.gov/books/NBK279690/).

## How to use this notebook?<a class="anchor" id="how-to-use-this-image"></a>

Jupyter Notebook lets you share explanatory text and runnable code in a single document. If you are not familiar with it, take a look at the [documentation](https://jupyter-notebook-beginner-guide.readthedocs.io/en/latest/). The BLAST+ and E-direct tools are already installed, via Docker, in the environment that generated this notebook.


### Data provisioning<a class="anchor" id="data-provisioning"></a>
To create the directories used to store data, run the following code cell: click anywhere inside the grey code box, then click the "Run" button above (or press Shift+Enter).


In [4]:
# "!cd" would run in a throwaway subshell and has no effect, so use absolute paths under $HOME.
# -p creates any missing directories and does not fail on ones that already exist.
!mkdir -p $HOME/blastdb $HOME/queries $HOME/fasta $HOME/results $HOME/blastdb_custom
!ls -ld $HOME/blastdb $HOME/queries $HOME/fasta $HOME/results $HOME/blastdb_custom
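To confirm that the provisioning step worked, you can use a small shell helper that reports any of the five working directories missing under `$HOME`. `check_blast_dirs` is a hypothetical name, not part of BLAST+ or E-direct. Run it in a terminal or inside a single `%%bash` cell, because each `!` line starts a new shell and would forget the function:

```bash
# Hypothetical helper (not part of BLAST+ or E-direct): report which of the
# notebook's working directories are missing under a base directory.
# Usage: check_blast_dirs [base_dir]   (base_dir defaults to $HOME)
# Exit status: number of missing directories (0 means all present).
check_blast_dirs() {
    base=${1:-$HOME}
    missing=0
    for d in blastdb queries fasta results blastdb_custom; do
        if [ ! -d "$base/$d" ]; then
            echo "missing: $base/$d"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}
```

For example, `check_blast_dirs && echo "all directories present"` prints nothing but the confirmation when the setup is complete.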


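To populate these directories with the sample data used in these examples, run the commands below. They use E-direct's `efetch` to download the query protein P01349 and nine nurse shark proteins in FASTA format: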

In [None]:
!efetch -db protein -format fasta \
    -id P01349 > $HOME/queries/P01349.fsa
!efetch -db protein -format fasta \
    -id Q90523,P80049,P83981,P83982,P83983,P83977,P83984,P83985,P27950 \
    > $HOME/fasta/nurse-shark-proteins.fsa
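A quick sanity check after downloading is to count the FASTA records (lines starting with `>`) in each file. The helper below is a hypothetical sketch using only `grep`. Run it in a terminal or a single `%%bash` cell:

```bash
# Hypothetical helper: print each FASTA file with its number of records
# (lines starting with '>'), separated by a tab.
count_fasta_records() {
    for f in "$@"; do
        if [ ! -r "$f" ]; then
            echo "cannot read: $f" >&2
            continue
        fi
        # grep -c prints 0 but exits 1 when nothing matches; "|| true" keeps
        # that from being treated as a failure.
        n=$(grep -c '^>' "$f" || true)
        printf '%s\t%s\n' "$f" "$n"
    done
}
```

Running `count_fasta_records $HOME/queries/P01349.fsa $HOME/fasta/nurse-shark-proteins.fsa` should report 1 and 9 if every requested accession was retrieved.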

### Install NCBI-provided BLAST databases<a class="anchor" id="install-ncbi-provided-blast-databases"></a>

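The following command downloads the `swissprot_v5` BLAST database from Google Cloud Platform (GCP) into `$HOME/blastdb`. Because `update_blastdb.pl` saves files to the current working directory, the command first changes into that directory:

In [None]:
!cd $HOME/blastdb && update_blastdb.pl --source gcp swissprot_v5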

### Make and install my own BLAST databases<a class="anchor" id="make-and-install-my-own-blast-databases"></a>

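To make a BLAST database from your own sequences, pass a FASTA file to `makeblastdb`. The command below builds a protein database from the nurse shark sequences downloaded above (`$HOME/fasta/nurse-shark-proteins.fsa`). For your own data, for example a file called `$HOME/fasta/sequences.fsa`, change `-in`, `-dbtype`, `-out`, `-title` and `-taxid` to match:

In [None]:
# -parse_seqids keeps the FASTA identifiers so entries can be retrieved by accession;
# -taxid 7801 (nurse shark) is assigned to every sequence; -blastdb_version 5 selects the v5 format.
!makeblastdb -in $HOME/fasta/nurse-shark-proteins.fsa -dbtype prot \
    -parse_seqids -out nurse-shark-proteins -title "Nurse shark proteins" \
    -taxid 7801 -blastdb_version 5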
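When `-parse_seqids` is used, `makeblastdb` parses the identifier on each defline and can reject a file that repeats one. Checking your own FASTA file beforehand can save a failed build. The sketch below is a hypothetical `awk` helper that treats the first word after `>` as the identifier. Run it in a terminal or a single `%%bash` cell:

```bash
# Hypothetical helper: print FASTA identifiers (the first word after '>')
# that appear on more than one defline, each reported once.
find_duplicate_ids() {
    awk '/^>/ {
        id = substr($1, 2)            # drop the leading ">"
        if (seen[id]++ == 1) print id # true only on the second occurrence
    }' "$1"
}
```

An empty result from `find_duplicate_ids $HOME/fasta/sequences.fsa` means no identifier is repeated.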

To verify the new database, run the command below. It prints the accession (`%a`), sequence length (`%l`), and common name (`%C`) of each sequence in the database:

In [None]:
!blastdbcmd -entry all -db nurse-shark-proteins -outfmt "%a %l %C"
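To turn that listing into totals, you can pipe it through a short `awk` summary. This is a hypothetical helper that assumes each line starts with an accession followed by a length. Note that `blastdbcmd -db nurse-shark-proteins -info` also reports sequence and letter counts directly. Run the helper in a terminal or a single `%%bash` cell:

```bash
# Hypothetical helper: read "accession length common-name" lines (as printed
# by blastdbcmd -outfmt "%a %l %C") on stdin and report the number of
# sequences and the total number of residues (field 2).
summarize_db_listing() {
    awk 'NF >= 2 { n++; total += $2 }
         END { printf "sequences: %d\nresidues: %d\n", n, total }'
}
```

Usage: `blastdbcmd -entry all -db nurse-shark-proteins -outfmt "%a %l %C" | summarize_db_listing`.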

### Show available BLAST databases on local host<a class="anchor" id="show-avaiable-blast-databases-on-local-host"></a>

In [None]:
!blastdbcmd -list $HOME/blastdb -remove_redundant_dbs
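If the directory holds both protein and nucleotide databases, you can keep only one molecule type. This sketch assumes the default `-list` output of one "path molecule-type" pair per line, with types such as `Protein` or `Nucleotide`. The helper name is hypothetical. Run it in a terminal or a single `%%bash` cell:

```bash
# Hypothetical helper: read "path molecule-type" lines on stdin (assumed to be
# blastdbcmd's default -list output) and print the paths whose type matches
# the first argument, e.g. Protein or Nucleotide.
filter_dbs_by_type() {
    awk -v want="$1" '$NF == want {
        sub(/[ \t]+[^ \t]+$/, "")   # strip the trailing type column
        print
    }'
}
```

Usage: `blastdbcmd -list $HOME/blastdb -remove_redundant_dbs | filter_dbs_by_type Protein`.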

### Show BLAST databases available for download from NCBI<a class="anchor" id="show-blast-databases-available-to-download-from-ncbi"></a>

In [None]:
!update_blastdb.pl --showall --source ncbi

For instructions on how to download them, please see [the documentation for update_blastdb.pl](https://www.ncbi.nlm.nih.gov/books/NBK62345/).
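To check whether one particular database is offered before downloading it, you can match its name against the list with an exact, fixed-string `grep`. This sketch assumes that the plain `--showall` output (without `pretty`) prints one database name per line. `db_is_listed` is a hypothetical name. Run it in a terminal or a single `%%bash` cell:

```bash
# Hypothetical helper: succeed if the database name given as $1 appears as a
# whole line on stdin (assumes one name per line, as in plain --showall output).
db_is_listed() {
    grep -qxF -- "$1"
}
```

Usage: `update_blastdb.pl --showall --source ncbi | db_is_listed swissprot && echo "swissprot is available"`. The `-x` and `-F` flags prevent `swissprot` from also matching names that merely contain it.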

### Show BLAST databases available for download from GCP<a class="anchor" id="show-blast-databases-available-for-download-from-gcp"></a>

*This feature is experimental*.  

In [None]:
!update_blastdb.pl --showall pretty --source gcp
 

For instructions on how to download them, please see [the documentation for update_blastdb.pl](https://www.ncbi.nlm.nih.gov/books/NBK62345/).

## Running BLAST<a class="anchor" id="running-blast"></a>

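To run a BLAST search, call the BLAST+ program that matches your query and database. For example, the command below searches the protein query P01349 against the nurse shark protein database with `blastp`:

In [None]:
!blastp -query $HOME/queries/P01349.fsa -db nurse-shark-proteins -out $HOME/results/blastp.out

The results are written to `$HOME/results/blastp.out` in BLAST's default pairwise report format.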
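The pairwise report is meant to be read by people. For further processing, `blastp -outfmt 6` writes one tab-separated line per hit. Its default 12 columns are query id, subject id, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value and bit score. The sketch below wraps that call and filters hits by an E-value cutoff. Both function names and the `blastp.tsv` file name are hypothetical. At search time, `blastp`'s own `-evalue` option can apply a cutoff instead. Run the sketch in a terminal or a single `%%bash` cell:

```bash
# Hypothetical wrapper: run blastp with tabular output (-outfmt 6).
# Usage: run_blastp_tabular QUERY_FASTA DB_NAME OUT_FILE
run_blastp_tabular() {
    blastp -query "$1" -db "$2" -outfmt 6 -out "$3"
}

# Hypothetical filter: print subject id, E-value and bit score (columns 2, 11
# and 12 of -outfmt 6) for hits whose E-value is at or below the cutoff.
# Usage: filter_hits_by_evalue CUTOFF [FILE...]   (reads stdin if no file)
filter_hits_by_evalue() {
    cutoff=$1
    shift
    awk -F '\t' -v max="$cutoff" '
        BEGIN { max += 0 }                  # force numeric, e.g. "1e-5"
        NF >= 12 && $11 + 0 <= max { print $2 "\t" $11 "\t" $12 }' "$@"
}
```

For example: `run_blastp_tabular $HOME/queries/P01349.fsa nurse-shark-proteins $HOME/results/blastp.tsv` followed by `filter_hits_by_evalue 1e-5 $HOME/results/blastp.tsv`.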