# Aligning the query sequence

Now that we have built the reference index using BWA, we can align the query sequence (`input.fq`) to this reference.

![](images/workflow-alignment.png)

Let us take another look at the files in the working directory:

In [None]:
ls -lh

### Taking a peek at the input FASTQ file

In [None]:
head input.fq
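Each FASTQ record spans four lines (read identifier, sequence, a `+` separator, and the quality string), so the number of reads can be estimated from the line count. The following is a minimal sketch that assumes the file is not line-wrapped; `count_reads` is a hypothetical helper name, not part of BWA.

```shell
# count_reads: hypothetical helper; assumes exactly 4 lines per FASTQ record
count_reads() {
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# Run only if the tutorial file is present in the working directory
if [ -f input.fq ]; then
  count_reads input.fq
fi
```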

### Aligning the sequence file using BWA

Running `bwa` without any arguments prints the list of available commands:

In [None]:
bwa

BWA provides several alignment commands (for example `mem` and `bwasw`), each optimized for sequences of different lengths. For most purposes, BWA `mem` will give good results.

Let us take a look at the options for alignment using the `mem` command:

In [None]:
bwa mem 

For a simple alignment, we need to specify just two things:

- the reference index (given by its prefix name)
- the input query FASTQ file

To speed up the alignment, we can use additional cores in the CPU by specifying the `-t` option.

BWA writes the alignments to standard output in the SAM format (we will look at this format shortly). To save the alignments to a file, we redirect the output using the `>` operator.

In [None]:
bwa mem -t 4 chr5.fa input.fq > mapped.sam
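The value passed to `-t` does not have to be hard-coded. As a rough sketch, the thread count can be taken from the number of CPU cores reported by the system, and the command can be guarded so that it only runs when its inputs exist. This assumes the index was built with `bwa index chr5.fa`, so that a file named `chr5.fa.bwt` is present; adjust the names if your prefix differs.

```shell
# Use the number of online CPU cores as the thread count (fall back to 1)
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Run only when bwa, the index (chr5.fa.bwt is one of the files written by
# `bwa index chr5.fa`) and the query file are all available
if command -v bwa >/dev/null 2>&1 && [ -f chr5.fa.bwt ] && [ -f input.fq ]; then
  bwa mem -t "$threads" chr5.fa input.fq > mapped.sam
fi
```

Using every core is not always desirable on a shared machine, so a smaller fixed value such as `-t 4` remains a reasonable choice.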

Let us take a look at the SAM output:

In [None]:
head mapped.sam

## Looking at the SAM format

SAM is a tab-delimited text format for storing alignments. A SAM file usually starts with a header of one or more lines, each beginning with the `@` character. The header typically lists the reference sequences used in the alignment, as well as the program and parameters used to produce it.
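Because header lines are the only lines that begin with `@`, they can be separated from the alignments with `grep`. The sketch below uses a tiny made-up SAM file so that it runs on its own; `sam_header` is a hypothetical helper name, and the demo records are not real tutorial output.

```shell
# sam_header: hypothetical helper that prints only the header lines
sam_header() {
  grep '^@' "$1"
}

# A tiny made-up SAM file for illustration: two header lines, one alignment
demo=$(mktemp)
printf '@SQ\tSN:chr5\tLN:1000\n@PG\tID:bwa\tPN:bwa\nread1\t0\tchr5\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n' > "$demo"

sam_header "$demo"
rm -f "$demo"
```

The same helper can be applied to the alignment produced earlier with `sam_header mapped.sam`.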

Following the header, each line of alignment consists of several tab-delimited columns.

<pre>QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL [TAG:VTYPE:VALUE[...]]</pre>
* The first 11 columns are mandatory (the current SAM specification names columns 7 to 9 `RNEXT`, `PNEXT` and `TLEN`)
* Additional optional columns can be added using the format TAG:VTYPE:VALUE
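To make the column layout concrete, the mandatory columns of an alignment line can be printed one per line with `awk`. This is only a sketch: `sam_fields` is a hypothetical helper name, the column labels follow the names used in this section, and the demo record is made up.

```shell
# sam_fields: hypothetical helper printing "NAME = value" for the 11 mandatory
# columns of the first alignment line (header lines starting with @ are skipped)
sam_fields() {
  awk -F '\t' '
    BEGIN { split("QNAME FLAG RNAME POS MAPQ CIGAR MRNM MPOS ISIZE SEQ QUAL", name, " ") }
    /^@/ { next }
    { for (i = 1; i <= 11; i++) print name[i] " = " $i; exit }
  ' "$1"
}

# A tiny made-up SAM file for illustration
demo=$(mktemp)
printf '@SQ\tSN:chr5\tLN:1000\nread1\t0\tchr5\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\n' > "$demo"

sam_fields "$demo"
rm -f "$demo"
```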

From: http://zyxue.github.io/assets/sam_format_example.jpg
![](http://zyxue.github.io/assets/sam_format_example.jpg)

Let us take a look at one alignment from the SAM output:

In [None]:
head -n 3 mapped.sam

We can break down the contents according to the tab-delimited columns:


* QNAME = SRRQ866988.19885082 
* FLAG = 0 
* RNAME = chr5
* POS = 148351452
* MAPQ = 60
* CIGAR = 91M
* MRNM = *
* MPOS = 0
* ISIZE = 0
* SEQ = CCAAGTAAGATTGAGCTTGAAGGCTGTTCTCATTTTGTAAAAACATAAGCTCAGGAAGTGTTGAAGATATTTTAACTCTACACTGAGACTT
* QUAL = GIIGIIIIIIIIHIIIIIIIIIIIIIIIIIIGIIIIIIIIIIHIIIIIGIIIIEHBGGEGIIHIHIIIFIIIIHIIBHIIGEHIE<EII<G
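The FLAG column is a set of bits packed into one integer. As a small sketch covering only two of those bits (4 marks an unmapped read, 16 marks a read aligned to the reverse strand), shell arithmetic can be used to interpret a value; `decode_flag` is a hypothetical helper name.

```shell
# decode_flag: hypothetical helper that tests two bits of the SAM FLAG field
decode_flag() {
  flag=$1
  if [ $(( flag & 4 )) -ne 0 ]; then
    echo "unmapped"
  elif [ $(( flag & 16 )) -ne 0 ]; then
    echo "mapped, reverse strand"
  else
    echo "mapped, forward strand"
  fi
}

decode_flag 0     # the FLAG of the record shown above
decode_flag 16
```

For the record above, a FLAG of 0 means that none of the bits are set, so the read is mapped to the forward strand.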

The tags (after column 11) form additional columns:

* NM:i:0 = tag `NM` (edit distance to the reference), type `i` (integer), value 0
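Since optional tags start at column 12 and follow the TAG:VTYPE:VALUE layout, a specific tag can be pulled out with `awk`. The sketch below assumes a two-character tag name and a one-character type; `sam_tag` is a hypothetical helper name and the demo record is made up.

```shell
# sam_tag: hypothetical helper; prints the read name and the value of a given
# tag (e.g. NM) for every alignment line of a SAM file
sam_tag() {
  awk -F '\t' -v tag="$2" '
    /^@/ { next }
    {
      for (i = 12; i <= NF; i++)
        # columns look like NM:i:0, so the value starts at character 6
        if (substr($i, 1, 2) == tag) print $1 "\t" substr($i, 6)
    }
  ' "$1"
}

# A tiny made-up SAM file for illustration
demo=$(mktemp)
printf '@SQ\tSN:chr5\tLN:1000\nread1\t0\tchr5\t10\t60\t4M\t*\t0\t0\tACGT\tIIII\tNM:i:0\tAS:i:4\n' > "$demo"

sam_tag "$demo" NM
rm -f "$demo"
```

On the tutorial output, `sam_tag mapped.sam NM` would list the edit distance of each aligned read.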
