# Variant Calling

Now that we have aligned the sequencing reads to the reference, we can look for differences between the aligned reads and the reference. This process is known as variant calling.

Before calling variants, we need to preprocess the SAM file as well as the reference chromosome 5 FASTA file. The following steps will be performed:

* Converting the SAM file to BAM format (the compressed binary version of SAM)
* Sorting the BAM file
* Indexing the BAM file
* Indexing the reference fasta file

### Processing the SAM file generated by Bowtie2

We will process the SAM file using samtools, which becomes available after loading its module.

We will begin by converting the SAM file to the compressed BAM format. This step only needs the SAM file (a reference FASTA is required only when writing CRAM), and the `-o` option names the new BAM file.

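```shell
# -b: write BAM output; -S (SAM input) is accepted for compatibility but ignored,
# since samtools 1.x detects the input format automatically
samtools view -Sb SRR12165154.sam -o SRR12165154.bam
```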

Notice that the BAM file is much smaller than the SAM file.

Next, we will sort the BAM file by genomic coordinate and then index it; indexing requires a coordinate-sorted BAM.

```shell
# Sort alignments by genomic coordinate (the samtools sort default)
samtools sort SRR12165154.bam -o SRR12165154_sorted.bam
```

```shell
# Create SRR12165154_sorted.bam.bai, the index used for fast region lookups
samtools index SRR12165154_sorted.bam
```

### Indexing the reference fasta file

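To prepare the reference FASTA file for variant calling, we need to index it. `samtools faidx` writes a `.fai` index next to the FASTA, which lets tools such as bcftools retrieve any region of the reference without reading the whole file.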

```shell
# Create Homo_sapiens.GRCh38.dna.primary_assembly.fa.fai
samtools faidx Homo_sapiens.GRCh38.dna.primary_assembly.fa
```

### Variant calling using Bcftools

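BCFtools is a set of utilities for variant calling and for manipulating files in VCF and BCF (binary VCF) format. Variant calling happens in two steps: `bcftools mpileup` computes genotype likelihoods at each position covered by the reads, and `bcftools call` uses those likelihoods to decide which sites are variant.
http://samtools.github.io/bcftools/bcftools.html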


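```shell
# -f: indexed reference FASTA; -d 80: max reads per file considered at a position
# -Ou: uncompressed BCF output (despite the .pileup name, the file is BCF, not text pileup)
bcftools mpileup -Ou -d 80 -f Homo_sapiens.GRCh38.dna.primary_assembly.fa -o SRR12165154_output.pileup SRR12165154_sorted.bam
```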

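```shell
# -m: multiallelic caller; -v: output variant sites only; -Ov: uncompressed VCF
# --ploidy 1: treat every site as haploid
# The output holds both SNPs and indels, despite the _snp file name
bcftools call -mv -Ov --ploidy 1 -o SRR12165154_snp.vcf SRR12165154_output.pileup
```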

### VCF Format

The Variant Call Format (VCF) is a text-based format for specifying variants. An example is shown below:

![Example VCF file](https://raw.githubusercontent.com/fikaparamita/var-calling/main/images/vcfexample.png?token=ARW6RR7SE67JOD742UJ6CZS7VSB5I)


The basic fields are as follows:

![Basic VCF fields](https://raw.githubusercontent.com/fikaparamita/var-calling/main/images/vcf.png?token=ARW6RR3E5QW7527EJ26H2CC7VSB3K)
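A VCF starts with meta-information lines beginning with `##`, then a single header line beginning with `#CHROM` that names the eight fixed columns (`CHROM`, `POS`, `ID`, `REF`, `ALT`, `QUAL`, `FILTER`, `INFO`), followed, when samples are present, by `FORMAT` and one column per sample. Each remaining line describes one variant site. A sketch for inspecting these fields in the file produced above:

```shell
# Column header line naming the fixed fields and the sample column(s)
grep -m 1 '^#CHROM' SRR12165154_snp.vcf
# First few variant records, header suppressed (-H)
bcftools view -H SRR12165154_snp.vcf | head -n 5
# Selected fixed fields as tab-separated text
bcftools query -f '%CHROM\t%POS\t%REF\t%ALT\t%QUAL\n' SRR12165154_snp.vcf | head -n 5
# Number of SNP records and indel records (-v selects variant types)
echo "SNPs:   $(bcftools view -H -v snps SRR12165154_snp.vcf | wc -l)"
echo "Indels: $(bcftools view -H -v indels SRR12165154_snp.vcf | wc -l)"
```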

