# *Blast Command Line Tutorial*

We will discuss basic usage of the BLAST+ tools (available on the Graham and Cedar clusters, or from the [NCBI FTP](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/)).

## *Objectives / learning outcomes:*
At the end of this tutorial you should be able to:
1. Download databases from NCBI 
2. Create your own custom local database
3. Do a basic/intermediate nucleotide blast search
4. Change the search parameters to fulfill your needs
5. Search for help regarding the available parameters

## *Prerequisites:*
Before we start, make sure you have gone over your bash notes, since we will be using several of the commands we saw in previous tutorials. Although I will touch briefly on the web-based BLAST, I am assuming you know the basics of running searches on the [BLAST website](https://blast.ncbi.nlm.nih.gov/Blast.cgi).


For the sake of time I will discuss only `blastn` (nucleotide BLAST), but I will comment briefly on other types of BLAST searches.

## Outline of the tutorial
1. Introduction to BLAST: Types of databases, searches (this is brief, remember to brush up on this)
2. Downloading databases from NCBI FTP with `update_blastdb.pl`
3. Creating local databases with `makeblastdb`
4. Basic `blastn` search
5. Tuning parameters
6. Basic bash manipulation of output

## Before we start
Log in to your Compute Canada account, create a folder for this tutorial (e.g. Blast_tutorial), and open an interactive shell with `salloc` and 2GB of memory, as we explained in the last tutorial. In the terminal, type (remember to change the account to your own group if you are not in the Cristescu lab):

```bash
salloc -A def-mcristes --mem=2000 
module load nixpkgs/16.09  gcc/5.4.0 blast+/2.6.0
update_blastdb.pl patnt --decompress
```


Let it run in the background. It takes some time (about 10 minutes after allocation), and we will get back to it later!
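
If the download stops right away with a "command not found" error, the `module load` line probably did not take effect. The helper below is a minimal sketch for checking that; `require_tools` is a name made up for this tutorial, not part of blast+.

```bash
# Report which of the given commands are missing from PATH.
# Returns 0 when all of them are found, 1 otherwise.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# The three blast+ programs used in this tutorial.
require_tools update_blastdb.pl makeblastdb blastn || echo "Did you run 'module load'?"
```

Run it in the same shell where you loaded the modules, since modules only affect the current session.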

## Introduction to BLAST
BLAST, or Basic Local Alignment Search Tool, is an alignment service available at [NCBI](https://blast.ncbi.nlm.nih.gov/Blast.cgi). As its name explains, this software aligns a query sequence (your sequence) with the sequences in a database, and returns the closest matches to your query. It does this in 4 steps (images from [Alarfaj et al.](https://www.doi.org/10.4172/jcsb.1000260)):

1. Preprocessing
<img src="https://raw.githubusercontent.com/jshleap/CristescuLab_misc/master/Tutorials/Blast/img/pre_processing.png" alt="alt text" width="3000">
2. Seeding
![alt text](https://raw.githubusercontent.com/jshleap/CristescuLab_misc/master/Tutorials/Blast/img/Seeding-step.png)
3. Extension
![alt text](https://raw.githubusercontent.com/jshleap/CristescuLab_misc/master/Tutorials/Blast/img/extension_step.png)
4. Evaluation
![alt text](https://raw.githubusercontent.com/jshleap/CristescuLab_misc/master/Tutorials/Blast/img/evaluation_step.png)
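
To make the seeding step more concrete, here is a toy sketch of the idea: take every word of length `w` in the query and report where it matches the subject exactly. This is only an illustration and not how BLAST is implemented. The function name `seeds`, the two sequences, and the word size of 4 are all made up for the example (real searches use longer words, which you can set with `-word_size`).

```bash
# Print "word query_pos subject_pos" (1-based) for every exact match
# of a length-w word between a query and a subject sequence.
seeds() {
    awk -v q="$1" -v s="$2" -v w="$3" 'BEGIN {
        for (i = 1; i + w - 1 <= length(q); i++)
            for (j = 1; j + w - 1 <= length(s); j++)
                if (substr(q, i, w) == substr(s, j, w))
                    print substr(q, i, w), i, j
    }'
}

# Toy query and subject, word size 4.
seeds "ACGTTG" "TTACGTA" 4   # prints: ACGT 1 3
```

Each reported hit is a seed: the extension step would then try to grow the alignment in both directions from that position.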

Let's check their [webpage](https://blast.ncbi.nlm.nih.gov/Blast.cgi) and run an example BLAST search.


## Downloading databases from NCBI FTP with update_blastdb.pl

Now let's go back to the commands we typed before; by now the download should be done! Let's examine the contents of the folder. You should have multiple files with extensions `.nhd`, `.nhi`, `.nhr`, `.nin`, `.nnd`, `.nni`, `.nog`, `.nsd`, `.nsi`, and `.nsq`. You don't need to worry about their contents, but just in case:
- nhr: deflines
- nin: indices
- nsq: sequence data
- nnd: GI data
- nni: GI indices
- nsd: non-GI data
- nsi: non-GI indices
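
Since the database comes in several volumes, it can be handy to count how many files of each type you ended up with. The function below is a small sketch for that; `count_db_files` is a hypothetical helper written for this tutorial, and it assumes the files sit in the current folder unless you pass another one.

```bash
# Count the files of a BLAST database by extension.
# $1: database name (e.g. patnt); $2: folder holding the files (default: .)
count_db_files() {
    local db="$1" dir="${2:-.}" f
    for f in "$dir"/"$db"*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        echo "${f##*.}"              # keep only the extension
    done | sort | uniq -c
}

count_db_files patnt
```

If blast+ is loaded, `blastdbcmd -db patnt -info` should also print a summary of the database (number of sequences, total length, and volumes).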

If you downloaded the sequences with the `update_blastdb.pl` script, you will also find a `.nal` file, which is an alias for the database to be searched, as well as two `taxdb.*` binary files with the relationships with taxonomy. However, for us to be able to search taxonomy-related information we must download the taxonomy database as well.

As you can see, we have downloaded the `patnt` database, which is "Patent nucleotide sequences. Both patent databases are directly from the USPTO, or from the EPO/JPO via EMBL/DDBJ". I chose this database mostly for convenience: it has more than one volume, but it is small enough for us to do the tutorial with. But let's explore what other databases we can use. For that, go to the [NCBI FTP](ftp://ftp.ncbi.nlm.nih.gov/blast/db/). We can see all the volumes and databases available for download, but we cannot tell what they are. Let's peek at the README file... BINGO! All the information about these databases is summarized in this file. Now, I mentioned that we cannot access taxonomy information without downloading the `taxdb` database, so let's give it a try:
```bash
update_blastdb.pl taxdb --decompress
```
This command should run fast, since it is a smaller database. We now have one database set up to search against, so all we need is a query file in FASTA format. For convenience, I've uploaded a working example [here]().
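
Before searching, a quick sanity check can save some confusion later. The sketch below checks that the two taxonomy files (`taxdb.btd` and `taxdb.bti`) are in place and counts the records in a FASTA file. The helper names `has_taxdb` and `count_fasta`, the file `toy_query.fasta`, and its sequences are all made up for illustration; swap in your own query file.

```bash
# Return 0 if both taxonomy files sit in the given folder (default: .).
has_taxdb() {
    local dir="${1:-.}"
    [ -f "$dir/taxdb.btd" ] && [ -f "$dir/taxdb.bti" ]
}

# Count the records in a FASTA file: one ">" header line per record.
count_fasta() {
    grep -c '^>' "$1"
}

# Toy query; the file name and sequences are made up for illustration.
cat > toy_query.fasta <<'EOF'
>seq1 hypothetical example
ACGTACGTTAGC
>seq2 hypothetical example
TTGACCGTA
EOF

if has_taxdb .; then
    echo "taxdb found in $PWD"
else
    echo "taxdb.btd / taxdb.bti missing in $PWD"
fi
echo "records in toy_query.fasta: $(count_fasta toy_query.fasta)"
```

As far as I know, BLAST looks for `taxdb` in the working folder or in the folder named by the `BLASTDB` environment variable, so if you keep your databases somewhere else, `export BLASTDB=/path/to/your/databases` is one way to point to them.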
