<a href="https://colab.research.google.com/github/sk-sahu/notebooks/blob/master/variant_call.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Variant Calling
From raw reads (.fastq) to called variants (.vcf), followed by downstream analysis.



In [7]:
!date
!echo $HOME

Mon Sep 23 16:28:52 UTC 2019
/root


## Tools Required:
* BWA
* Picard
* GATK 4
* SnpEff
* Samtools

## Tool installation
Install the required tools into a conda environment defined in the `variant_call.yml` file.

First, conda itself must be installed. If it is not already available, install Miniconda from the command line:

In [0]:
!wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh
!bash miniconda.sh -b -p $HOME/miniconda3

Once conda is installed, create the environment. This can take a while, because all the tools are resolved and installed at once.


In [10]:
!$HOME/miniconda3/bin/conda env create -f variant_call.yml


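Running `conda env list` should now show an environment called `variant_call`. In a terminal, switch to it with `conda activate variant_call`; inside a notebook, activation does not carry over between `!` commands, so the environment's `bin` directory has to be added to `PATH` instead.

In [0]:
!$HOME/miniconda3/bin/conda env list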
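Because each `!` command starts a fresh subshell, one workaround is to edit `PATH` in the notebook's own Python process, which every later `!` command inherits. The sketch below assumes the Miniconda prefix `~/miniconda3` used above and the environment name `variant_call`. `prepend_to_path` and `check_tools` are illustrative helpers, not part of any tool, and the executable names (in particular `snpEff`) are assumed to match the bioconda packages.

```python
import os
import shutil

# Assumption: Miniconda lives in ~/miniconda3 and the environment is named
# "variant_call"; change ENV_BIN if your prefix or environment name differs.
ENV_BIN = os.path.expanduser("~/miniconda3/envs/variant_call/bin")
# Executable names as assumed for the bioconda packages of the listed tools.
TOOLS = ["bwa", "picard", "gatk", "snpEff", "samtools"]


def prepend_to_path(directory, path=None):
    """Return a PATH string with `directory` first, dropping any later copy of it."""
    if path is None:
        path = os.environ.get("PATH", "")
    rest = [entry for entry in path.split(os.pathsep) if entry and entry != directory]
    return os.pathsep.join([directory] + rest)


def check_tools(tools=TOOLS):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


# Child processes started by later `!` commands inherit this environment.
os.environ["PATH"] = prepend_to_path(ENV_BIN)
for tool, location in check_tools().items():
    print(f"{tool:10s} {location or 'NOT FOUND'}")
```

Any tool reported as `NOT FOUND` points to a failed environment creation or a different prefix.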


## Start Analysis


### Step1: Alignment – Map to Reference

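Index the reference once, then align the paired-end reads. `-M` marks shorter split hits as secondary (for Picard compatibility), and `-R` attaches a read group, which GATK requires; `sample1` is a placeholder sample name.

In [0]:
!bwa index ref.fa
!bwa mem -M -t 16 -R '@RG\tID:sample1\tSM:sample1\tPL:ILLUMINA' ref.fa read1.fq read2.fq > aligned_reads.sam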
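The `-M` option changes how BWA flags the extra alignments of a split read: shorter hits get the secondary bit (0x100) instead of the supplementary bit (0x800). Those bits live in the FLAG column (column 2) of the SAM output. A small decoder makes them readable. The bit names below are paraphrased from the SAM specification, and `decode_flag` and `is_primary` are illustrative helpers.

```python
# FLAG bits from the SAM format specification; names are paraphrased.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """List the names of all bits set in a SAM FLAG, lowest bit first."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_primary(flag):
    """True unless the secondary (0x100) or supplementary (0x800) bit is set."""
    return not (flag & (0x100 | 0x800))


print(decode_flag(99))   # ['paired', 'proper_pair', 'mate_reverse_strand', 'first_in_pair']
print(decode_flag(355))  # same bits plus 'secondary' (355 = 99 + 0x100)
print(is_primary(355))   # False
```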

### Step2: Generate sorted BAM file


In [0]:
!picard SortSam INPUT=aligned_reads.sam OUTPUT=sorted_reads.bam SORT_ORDER=coordinate
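`SORT_ORDER=coordinate` orders records by reference sequence, following the order of the `@SQ` header lines, and then by leftmost position. Fully unmapped reads (`RNAME` of `*`) go at the end. The check below is a sketch of that rule over plain-text SAM lines, which you could get from `samtools view -h sorted_reads.bam`. It ignores tie-breaking within a position and assumes every mapped `RNAME` has an `@SQ` line. `is_coordinate_sorted` is a hypothetical helper.

```python
def is_coordinate_sorted(sam_lines):
    """Check SAM text lines against coordinate order.

    Mapped records must follow the @SQ header order, then ascending POS;
    records with RNAME '*' (fully unmapped) must come last.
    Assumes every mapped RNAME has an @SQ line (otherwise KeyError).
    """
    contig_rank = {}
    previous = None
    for line in sam_lines:
        line = line.rstrip("\n")
        if line.startswith("@"):
            if line.startswith("@SQ"):
                # Header tags look like SN:chr1 and LN:248956422.
                tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
                contig_rank[tags["SN"]] = len(contig_rank)
            continue
        columns = line.split("\t")
        rname, pos = columns[2], int(columns[3])
        rank = len(contig_rank) if rname == "*" else contig_rank[rname]
        key = (rank, pos)
        if previous is not None and key < previous:
            return False
        previous = key
    return True
```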

### Step3: Check Alignment Summary



In [0]:
!picard CollectAlignmentSummaryMetrics R=ref.fa I=sorted_reads.bam O=alignment_metrics.txt

!picard CollectInsertSizeMetrics INPUT=sorted_reads.bam OUTPUT=insert_metrics.txt HISTOGRAM_FILE=insert_size_histogram.pdf

!samtools depth -a sorted_reads.bam > depth_out.txt
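`samtools depth` writes one line per reference position: reference name, 1-based position, and depth. With `-a`, zero-depth positions are included too. The sketch below turns `depth_out.txt` into mean depth and breadth of coverage, meaning the fraction of positions at or above a threshold. It assumes a single input BAM (three columns), and the `min_depth` default of 1 is an arbitrary choice.

```python
def summarize_depth(lines, min_depth=1):
    """Summarize `samtools depth -a` output for a single BAM.

    Each line holds: reference name, 1-based position, read depth.
    Returns the number of positions, mean depth, and breadth (fraction of
    positions with depth >= min_depth).
    """
    positions = total = covered = 0
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        depth = int(line.rstrip("\n").split("\t")[2])
        positions += 1
        total += depth
        if depth >= min_depth:
            covered += 1
    if positions == 0:
        return {"positions": 0, "mean_depth": 0.0, "breadth": 0.0}
    return {
        "positions": positions,
        "mean_depth": total / positions,
        "breadth": covered / positions,
    }


# Usage on the file written above:
# with open("depth_out.txt") as handle:
#     print(summarize_depth(handle))
```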

### Step4: Mark Duplicates 


In [0]:
!picard MarkDuplicates INPUT=sorted_reads.bam OUTPUT=dedup_reads.bam METRICS_FILE=metrics.txt
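Conceptually, MarkDuplicates groups reads that start at the same position on the same strand. It keeps the best-scoring read in each group and flags the rest with bit 0x400. The sketch below is a deliberately simplified, single-end version of that idea; the `Read` record is hypothetical. The real tool also uses mate positions, unclipped ends, library, and optical-duplicate detection. Scoring by the sum of base qualities follows Picard's default `DUPLICATE_SCORING_STRATEGY`.

```python
from collections import namedtuple

# Hypothetical, simplified read record: single-end reads only.
# five_prime is the (unclipped) 5' coordinate; strand is "+" or "-".
Read = namedtuple("Read", "name chrom five_prime strand quals")


def mark_duplicates(reads):
    """Return the names of reads that would be flagged as duplicates.

    Reads sharing (chrom, five_prime, strand) form a group. The read with the
    highest sum of base qualities is kept; every other read in the group is
    reported as a duplicate.
    """
    groups = {}
    for read in reads:
        key = (read.chrom, read.five_prime, read.strand)
        groups.setdefault(key, []).append(read)

    duplicates = set()
    for group in groups.values():
        keep = max(group, key=lambda r: sum(r.quals))
        duplicates.update(r.name for r in group if r is not keep)
    return duplicates


reads = [
    Read("a", "chr1", 100, "+", [30, 30, 30]),
    Read("b", "chr1", 100, "+", [40, 40, 40]),  # same start/strand as "a", higher quality
    Read("c", "chr1", 100, "-", [30, 30, 30]),  # same start, other strand: not a duplicate
    Read("d", "chr1", 250, "+", [20, 20]),
]
print(mark_duplicates(reads))  # {'a'}
```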

### Step5: Build BAM Index

In [0]:
!picard BuildBamIndex INPUT=dedup_reads.bam


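### Step6: Create Realignment Targets

Steps 6 and 7 apply only to GATK 3.x. `RealignerTargetCreator` and `IndelRealigner` were removed in GATK 4, whose HaplotypeCaller reassembles reads locally around indels. With GATK 4, skip both steps; otherwise run them from a GATK 3 jar (`GenomeAnalysisTK.jar` below is a placeholder path).

In [0]:
!java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref.fa -I dedup_reads.bam -o realignment_targets.list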

### Step7: Realign Indels

In [0]:
!java -jar GenomeAnalysisTK.jar -T IndelRealigner -R ref.fa -I dedup_reads.bam -targetIntervals realignment_targets.list -o realigned_reads.bam
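Realignment matters because an indel inside a repeat can be written at several positions that all produce the same read sequence, so aligners may place it inconsistently across reads. A common convention is to shift such indels to their leftmost equivalent position, which normalization tools such as `bcftools norm` apply to VCF records. The sketch below shows that left shift for a single deletion. It illustrates the ambiguity only and is not the IndelRealigner algorithm. Both helper names are hypothetical.

```python
def apply_deletion(ref, start, length):
    """Return the sequence left after deleting ref[start:start + length] (0-based)."""
    return ref[:start] + ref[start + length:]


def left_align_deletion(ref, start, length):
    """Shift a deletion to its leftmost position that yields the same sequence.

    Moving the deleted window one base left is equivalent whenever the base
    entering the window (ref[start - 1]) equals the base leaving it
    (ref[start + length - 1]).
    """
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


ref = "GATTTTC"
left = left_align_deletion(ref, 4, 1)  # one T deleted, first written at index 4
print(left)  # 2 (the first T of the run)
print(apply_deletion(ref, 4, 1) == apply_deletion(ref, left, 1))  # True
```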

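### Step8: Call Variants

HaplotypeCaller expects a FASTA index (`samtools faidx ref.fa`) and a sequence dictionary (`picard CreateSequenceDictionary R=ref.fa O=ref.dict`) next to the reference. With GATK 4 it runs directly on the deduplicated BAM.

In [0]:
# If you ran the GATK 3 realignment steps, pass realigned_reads.bam instead
!gatk HaplotypeCaller -R ref.fa -I dedup_reads.bam -O raw_variants.vcf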
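HaplotypeCaller writes a VCF file. It begins with `##` meta-information lines and a single `#CHROM` header line. After that comes one tab-separated record per site, with the sample's genotype in the `FORMAT` and sample columns. A minimal reader for inspecting `raw_variants.vcf` without extra libraries could look like the sketch below. It assumes an uncompressed, single-sample file and extracts only `GT`; `parse_vcf_records` is a hypothetical helper.

```python
def parse_vcf_records(lines):
    """Yield CHROM, POS, REF, ALT (list) and the first sample's GT per data line.

    Assumes an uncompressed VCF; for a bgzipped file, open it with gzip.open(..., "rt").
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # '##' meta-information, '#CHROM' header, or blank line
        columns = line.rstrip("\n").split("\t")
        genotype = None
        if len(columns) > 9:  # FORMAT (column 9) plus at least one sample column
            fields = dict(zip(columns[8].split(":"), columns[9].split(":")))
            genotype = fields.get("GT")
        yield {
            "CHROM": columns[0],
            "POS": int(columns[1]),
            "REF": columns[3],
            "ALT": columns[4].split(","),
            "GT": genotype,
        }


# Usage on the file written above:
# with open("raw_variants.vcf") as handle:
#     for record in parse_vcf_records(handle):
#         print(record["CHROM"], record["POS"], record["REF"], record["ALT"], record["GT"])
```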

### Step9: Extract SNPs & Indels


#### For SNP

In [0]:
!gatk SelectVariants -R ref.fa -V raw_variants.vcf -select-type SNP -O raw_snps.vcf

#### For Indel

In [0]:
!gatk SelectVariants -R ref.fa -V raw_variants.vcf -select-type INDEL -O raw_indels.vcf
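The `SelectVariants` commands above pick records by variant type. The helper below roughly approximates that classification from the `REF` and `ALT` strings:

* single-base substitutions are SNPs,
* equal-length multi-base substitutions are MNPs,
* length changes are INDELs,
* records whose alternate alleles disagree are MIXED.

The type names mirror GATK's variant types. The helper is an approximation for plain A/C/G/T alleles only, since GATK's own rules also handle symbolic and spanning-deletion alleles. `variant_type` is a hypothetical helper.

```python
def variant_type(ref, alts):
    """Roughly classify a VCF record as SNP, MNP, INDEL, MIXED or NO_VARIATION.

    Only plain base alleles are handled; '.' (no alternate allele) is skipped.
    """
    kinds = set()
    for alt in alts:
        if alt == ".":
            continue
        if len(ref) == len(alt) == 1:
            kinds.add("SNP")
        elif len(ref) == len(alt):
            kinds.add("MNP")
        else:
            kinds.add("INDEL")
    if not kinds:
        return "NO_VARIATION"
    return kinds.pop() if len(kinds) == 1 else "MIXED"


for ref, alts in [("A", ["G"]), ("AT", ["A"]), ("AT", ["GC"]), ("A", ["G", "AT"])]:
    print(ref, alts, variant_type(ref, alts))
```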