# Sequence Alignment

## Building the index for alignment


Before we can align the query sequences, we need to build an index of the reference genome. In this case, we will use the `ref/Homo_sapiens.GRCh38.dna.primary_assembly.fa` file as the reference.

One program for alignment is Bowtie 2 (http://bowtie-bio.sourceforge.net/bowtie2/index.shtml).

Bowtie 2 is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes. Bowtie 2 indexes the genome with an FM Index to keep its memory footprint small: for the human genome, its memory footprint is typically around 3.2 GB. Bowtie 2 supports gapped, local, and paired-end alignment modes.


We will run `bowtie2` without arguments to see what options are available.

In [None]:
bowtie2

Prior to the alignment, the reference genome must be indexed.
This process may take several hours when indexing the full human genome (~4 GB).
Indexing the reference genome produces six files whose names start with `hg38`, the index basename given as the second argument.

In [None]:
bowtie2-build Homo_sapiens.GRCh38.dna.primary_assembly.fa hg38
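Once indexing finishes, it can be useful to confirm that all six index files were written. The sketch below assumes the small-index suffixes that `bowtie2-build` normally uses (`.1.bt2` through `.4.bt2`, plus `.rev.1.bt2` and `.rev.2.bt2`); the helper names `expected_index_files` and `check_index` are hypothetical and are not part of Bowtie 2.

```shell
# Hypothetical helper: print the six file names bowtie2-build is expected
# to write for a given index basename (small-index ".bt2" suffix assumed).
expected_index_files() {
    base="$1"
    for suffix in 1 2 3 4 rev.1 rev.2; do
        printf '%s.%s.bt2\n' "$base" "$suffix"
    done
}

# Hypothetical helper: report which expected files exist in the current
# directory; returns non-zero if any of them is missing.
check_index() {
    missing=0
    for f in $(expected_index_files "$1"); do
        if [ -e "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    return "$missing"
}

check_index hg38 || echo "index incomplete"
```

If the index was built with the large-index format instead, the files end in `.bt2l` and the suffix in the sketch would need to be adjusted.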

## Aligning the query sequence

Now that we have built the reference index using Bowtie2, we can align the query sequences (`SRR12165154_1.fastq` and `SRR12165154_2.fastq`) to this reference.

In [None]:
bowtie2 -x hg38 -1 SRR12165154_1.fastq -2 SRR12165154_2.fastq -S SRR12165154.sam

Let us take a look at the SAM output.

## Looking at the SAM format

The SAM format is a tab-delimited text format for storing alignments. The file usually starts with a header of one or more lines, each beginning with the `@` character. The header usually specifies the reference chromosomes used in the alignment, as well as the parameters used for the alignment.
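To illustrate how the header is separated from the alignments, here is a minimal sketch that writes a tiny, made-up SAM file and counts its header and alignment lines with `grep`. The file contents are invented for illustration and are not real Bowtie 2 output.

```shell
# Write a tiny, made-up SAM file to a temporary location.
sam=$(mktemp)
printf '@HD\tVN:1.0\tSO:unsorted\n' > "$sam"
printf '@SQ\tSN:1\tLN:248956422\n' >> "$sam"
printf '@PG\tID:bowtie2\tPN:bowtie2\n' >> "$sam"
printf 'read1\t99\t1\t10001\t42\t5M\t=\t10101\t105\tACGTA\tIIIII\n' >> "$sam"

# Header lines start with "@"; every other line is an alignment record.
grep -c '^@' "$sam"      # count the header lines
grep '^@SQ' "$sam"       # list the reference sequences
grep -vc '^@' "$sam"     # count the alignment records
```

The same `grep` commands can be pointed at `SRR12165154.sam` to inspect the real header.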

Following the header, each line of alignment consists of several tab-delimited columns.

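<pre>QNAME FLAG RNAME POS MAPQ CIGAR RNEXT PNEXT TLEN SEQ QUAL [TAG:TYPE:VALUE[...]]</pre>
* The first 11 columns are mandatory
* Columns 7 to 9 (RNEXT, PNEXT, TLEN) are named MRNM, MPOS, and ISIZE in older versions of the SAM specification
* Additional columns can be added using the format TAG:TYPE:VALUE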
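The column layout above can be explored with `awk`. The following is a minimal sketch that splits one made-up alignment line on tabs, prints a few of the mandatory fields, and lists any optional fields from column 12 onward. The example line and the helper name `summarize` are assumptions for illustration.

```shell
# One made-up, tab-separated alignment line with a single optional field.
line=$(printf 'read1\t99\t1\t10001\t42\t5M\t=\t10101\t105\tACGTA\tIIIII\tNM:i:0')

# Hypothetical helper: print the read name, reference name, position, and
# mapping quality, then every optional field found after column 11.
summarize() {
    printf '%s\n' "$1" | awk -F '\t' '{
        printf "QNAME=%s RNAME=%s POS=%s MAPQ=%s\n", $1, $3, $4, $5
        for (i = 12; i <= NF; i++) print "optional: " $i
    }'
}

summarize "$line"
```

To apply the same idea to the real output, the header lines would first be skipped, for example with `grep -v '^@' SRR12165154.sam | head`.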

From: http://zyxue.github.io/assets/sam_format_example.jpg
![](http://zyxue.github.io/assets/sam_format_example.jpg)