# Annotation of Variants

We have uncovered variants that differ from the reference genome, but we do not yet know whether these variants affect genes or regions of the genome that could explain a disease or phenotype.

![](images/workflow-annotate.png)

To find out, we will annotate the VCF file using the `SnpEff`/`SnpSift` toolkit.

http://snpeff.sourceforge.net

We will use `SnpSift` specifically, to compare our variants against an external variant database. Running `SnpSift` with no arguments prints the available commands and options.

In [None]:
SnpSift

We will first take a look at the list of files again:

In [None]:
ls -lh

We will annotate the VCF file against the ClinVar database:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5753237/

ClinVar aggregates information from many laboratories and expert panels about the clinical interpretation of variants.

We will download the GRCh38 version of the ClinVar VCF, together with its `.tbi` index, because our reads were mapped to the GRCh38 version of chromosome 5: https://www.ncbi.nlm.nih.gov/variation/docs/ClinVar_vcf_files/


In [None]:
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz
wget https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh38/clinvar.vcf.gz.tbi
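Annotation tools generally match records by chromosome name and position, so it is worth checking that our calls and the database use the same chromosome naming. ClinVar's VCF files typically name chromosomes without a `chr` prefix (`5` rather than `chr5`), while some GRCh38 reference FASTA files use the prefix. The sketch below lists the distinct values of the `CHROM` column; the `vcf_contigs` function and the toy files are hypothetical illustrations, not part of SnpSift.

```shell
# Sketch: list the distinct chromosome names (CHROM column) used in a VCF.
# Handles plain-text files and gzip/bgzip-compressed files (*.gz).
vcf_contigs() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;     # bgzip output is gzip-compatible
        *)    cat "$1" ;;
    esac | awk -F'\t' '!/^#/ { print $1 }' | sort -u
}

# Hypothetical toy files, for illustration only.
demo_dir=$(mktemp -d)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '5\t1000\t.\tA\tG\t.\t.\t.\n'
} | gzip > "$demo_dir/database.vcf.gz"
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf 'chr5\t1000\t.\tA\tG\t50\tPASS\t.\n'
} > "$demo_dir/calls.vcf"

echo "database: $(vcf_contigs "$demo_dir/database.vcf.gz")"
echo "calls:    $(vcf_contigs "$demo_dir/calls.vcf")"
```

In the notebook the equivalent calls would be `vcf_contigs result.vcf` and `vcf_contigs clinvar.vcf.gz` (the latter decompresses the whole ClinVar file, so it takes a little while). If the names differ, one of the files needs renaming first, for example with `bcftools annotate --rename-chrs`.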

Before annotating, we can filter the variants by their quality score, `QUAL`, which is Phred-scaled. A common convention is to keep variants with a `QUAL` of 30 or above, which we can do with `SnpSift filter`:

In [None]:
SnpSift filter "( QUAL >= 30 )" result.vcf > result.filter.vcf
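Because `QUAL` is Phred-scaled, a value of Q corresponds to an estimated probability of `10^(-Q/10)` that the variant call is wrong, so `QUAL >= 30` keeps calls with an estimated error probability of at most 1 in 1,000. To illustrate what the filter does, here is the same threshold written with `awk` on a hypothetical toy VCF (the function names are our own). This sketch drops records whose `QUAL` is missing (`.`); how `SnpSift filter` handles such records may differ.

```shell
# Convert a Phred-scaled QUAL value into an error probability: 10^(-QUAL/10).
phred_to_error() {
    awk -v q="$1" 'BEGIN { printf "%g\n", 10 ^ (-q / 10) }'
}

# Keep header lines plus records whose QUAL (column 6) is >= the threshold.
# Records with a missing QUAL (".") are dropped by this sketch.
filter_qual() {
    awk -F'\t' -v min="$1" '/^#/ || ($6 != "." && $6 + 0 >= min + 0)' "$2"
}

# Hypothetical toy VCF, for illustration only.
demo_dir=$(mktemp -d)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '5\t100\t.\tA\tG\t45.2\t.\t.\n'
    printf '5\t200\t.\tC\tT\t12\t.\t.\n'
    printf '5\t300\t.\tG\tA\t30\t.\t.\n'
    printf '5\t400\t.\tT\tC\t.\t.\t.\n'
} > "$demo_dir/toy.vcf"

phred_to_error 30                                   # error probability for QUAL 30
filter_qual 30 "$demo_dir/toy.vcf" | grep -v '^#'   # keeps positions 100 and 300
```

Note that a record with `QUAL` exactly 30 is kept, matching the `>=` in the `SnpSift filter` expression.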

We can now annotate our filtered VCF against the ClinVar VCF using `SnpSift annotate` (`-v` turns on verbose progress messages):

In [None]:
SnpSift annotate -v clinvar.vcf.gz result.filter.vcf > result.filter.annotate.vcf
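Conceptually, annotation looks up each of our variants in the database and copies the database's information onto the matching record. The sketch below is a deliberately simplified illustration of that idea, not SnpSift's actual algorithm: the hypothetical `toy_annotate` function matches records on `CHROM`, `POS`, `REF` and `ALT` exactly and appends the database's `INFO` field to ours, ignoring complications such as multi-allelic sites, the `ID` column and header definitions.

```shell
# Simplified illustration of VCF annotation (NOT SnpSift's implementation):
# append the database INFO field to query records with identical CHROM/POS/REF/ALT.
toy_annotate() {
    awk -F'\t' -v OFS='\t' '
        FNR == NR {                       # first file: the database VCF
            if ($0 !~ /^#/) info[$1 SUBSEP $2 SUBSEP $4 SUBSEP $5] = $8
            next
        }
        /^#/ { print; next }              # second file: copy header lines unchanged
        {
            key = $1 SUBSEP $2 SUBSEP $4 SUBSEP $5
            if (key in info) $8 = ($8 == "." ? info[key] : $8 ";" info[key])
            print
        }' "$1" "$2"
}

# Hypothetical toy database and calls, for illustration only.
demo_dir=$(mktemp -d)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '5\t100\t1001\tA\tG\t.\t.\tCLNSIG=Pathogenic\n'
    printf '5\t250\t1002\tC\tT\t.\t.\tCLNSIG=Benign\n'
} > "$demo_dir/database.vcf"
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '5\t100\t.\tA\tG\t50\tPASS\tDP=20\n'
    printf '5\t250\t.\tC\tA\t40\tPASS\tDP=15\n'
} > "$demo_dir/calls.vcf"

toy_annotate "$demo_dir/database.vcf" "$demo_dir/calls.vcf"
```

The second call (`C>A` at position 250) is left unannotated because the database only has `C>T` at that position, which shows why the alleles, not just the position, matter.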

# Taking a look at the annotated variant file

![](images/clinvar.png)

In [None]:
cat result.filter.annotate.vcf
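`cat` prints every header line as well as the records, which makes the annotations hard to spot. A more compact view prints a few columns plus a single `INFO` key such as `CLNSIG`. SnpSift's own `extractFields` command does this (e.g. `SnpSift extractFields result.filter.annotate.vcf CHROM POS REF ALT CLNSIG`); the `awk` sketch below, with a hypothetical `info_table` function and toy records, only illustrates how the `INFO` column is parsed.

```shell
# Print CHROM, POS, REF, ALT and the value of one INFO key as tab-separated columns.
# Records that do not carry the key get "." in the last column.
info_table() {
    awk -F'\t' -v OFS='\t' -v key="$1" '
        /^#/ { next }                      # skip header lines
        {
            value = "."
            n = split($8, fields, ";")     # INFO entries are separated by ";"
            for (i = 1; i <= n; i++) {
                if (index(fields[i], key "=") == 1) {
                    value = substr(fields[i], length(key) + 2)
                    break
                }
            }
            print $1, $2, $4, $5, value
        }' "$2"
}

# Hypothetical toy annotated VCF, for illustration only.
demo_dir=$(mktemp -d)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '5\t100\t.\tA\tG\t50\tPASS\tDP=20;CLNSIG=Pathogenic\n'
    printf '5\t250\t.\tC\tA\t40\tPASS\tDP=15\n'
} > "$demo_dir/annotated.vcf"

info_table CLNSIG "$demo_dir/annotated.vcf"
```

Matching on `key=` rather than just `key` avoids confusing, say, `DP` with a longer key that merely starts with `DP`.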

ClinVar reports the clinical significance of each variant in the `CLNSIG` INFO field. Significance is classified into tiers according to the strength of the evidence:
- pathogenic
- likely pathogenic
- uncertain significance
- likely benign
- benign

These tiers follow the standards and guidelines for interpreting sequence variants recommended by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP):
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4544753/
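To see how our filtered variants are distributed across these tiers, we can count the `CLNSIG` values. The hypothetical `clnsig_counts` sketch below treats each distinct string as its own category, so combined values that ClinVar also uses, such as `Pathogenic/Likely_pathogenic`, are counted separately, and records without a `CLNSIG` entry are reported as `no_CLNSIG`. The toy records are invented for illustration.

```shell
# Count records per CLNSIG value; records without CLNSIG are reported as "no_CLNSIG".
# Output: count<TAB>value, most frequent first (ties in byte order).
clnsig_counts() {
    awk -F'\t' '
        /^#/ { next }
        {
            sig = "no_CLNSIG"
            n = split($8, fields, ";")
            for (i = 1; i <= n; i++)
                if (fields[i] ~ /^CLNSIG=/) { sig = substr(fields[i], 8); break }
            count[sig]++
        }
        END { for (s in count) print count[s] "\t" s }' "$1" | LC_ALL=C sort -k1,1nr -k2,2
}

# Hypothetical toy annotated VCF, for illustration only.
demo_dir=$(mktemp -d)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '5\t100\t.\tA\tG\t50\tPASS\tCLNSIG=Pathogenic\n'
    printf '5\t200\t.\tC\tT\t50\tPASS\tCLNSIG=Benign\n'
    printf '5\t300\t.\tG\tA\t50\tPASS\tCLNSIG=Benign\n'
    printf '5\t400\t.\tT\tC\t50\tPASS\tCLNSIG=Likely_pathogenic\n'
    printf '5\t500\t.\tA\tT\t50\tPASS\tDP=30\n'
} > "$demo_dir/annotated.vcf"

clnsig_counts "$demo_dir/annotated.vcf"
```

`LC_ALL=C` makes the tie-breaking order the same on every machine, since locale-aware sorting can order mixed-case strings differently.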

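We can look for records that contain the keyword `Pathogenic`. Note that `grep` is case-sensitive, so this matches `Pathogenic` and `Pathogenic/Likely_pathogenic` but not `Likely_pathogenic` on its own: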

In [None]:
grep Pathogenic result.filter.annotate.vcf
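Because `grep` searches the whole line, it can also pick up records where `Pathogenic` appears only in another field; in ClinVar's VCF, for example, the `CLNSIGCONF` field can list the individual classifications behind a conflicting interpretation. A stricter alternative inspects only the `CLNSIG` value. The hypothetical `select_pathogenic` function below keeps header lines (so the output is still a valid VCF) and every record whose `CLNSIG` contains the exact term `Pathogenic` or `Likely_pathogenic`, splitting combined values on `/`, `|` and `,`. The toy records are invented for illustration.

```shell
# Keep header lines plus records whose CLNSIG includes the exact term
# "Pathogenic" or "Likely_pathogenic" (combined values split on "/", "|" and ",").
select_pathogenic() {
    awk -F'\t' '
        /^#/ { print; next }
        {
            n = split($8, fields, ";")
            for (i = 1; i <= n; i++) {
                if (fields[i] !~ /^CLNSIG=/) continue
                m = split(substr(fields[i], 8), terms, "[/|,]")
                for (j = 1; j <= m; j++)
                    if (terms[j] == "Pathogenic" || terms[j] == "Likely_pathogenic") {
                        print
                        next
                    }
            }
        }' "$1"
}

# Hypothetical toy annotated VCF, for illustration only.
demo_dir=$(mktemp -d)
{
    printf '##fileformat=VCFv4.2\n'
    printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
    printf '5\t100\t.\tA\tG\t50\tPASS\tCLNSIG=Pathogenic\n'
    printf '5\t200\t.\tC\tT\t50\tPASS\tCLNSIG=Likely_pathogenic\n'
    printf '5\t300\t.\tG\tA\t50\tPASS\tCLNSIG=Pathogenic/Likely_pathogenic\n'
    printf '5\t400\t.\tT\tC\t50\tPASS\tCLNSIG=Conflicting_classifications_of_pathogenicity;CLNSIGCONF=Pathogenic(1)|Uncertain_significance(2)\n'
    printf '5\t500\t.\tA\tT\t50\tPASS\tCLNSIG=Benign\n'
} > "$demo_dir/annotated.vcf"

echo "grep Pathogenic:   $(grep Pathogenic "$demo_dir/annotated.vcf" | cut -f2 | paste -sd' ' -)"
echo "select_pathogenic: $(select_pathogenic "$demo_dir/annotated.vcf" | grep -v '^#' | cut -f2 | paste -sd' ' -)"
```

On the notebook's data this could be used as, for example, `select_pathogenic result.filter.annotate.vcf > result.pathogenic.vcf` (the output filename is only a suggestion).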

# Preparing the VCF file for visualization

We can compress the VCF file with `bgzip` and index it with `tabix` so that it can be viewed in the IGV genome browser:

In [None]:
bgzip -c result.filter.annotate.vcf > result.filter.annotate.vcf.gz
tabix -p vcf result.filter.annotate.vcf.gz
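`tabix` can only index a bgzip-compressed file whose records are grouped by chromosome and sorted by position. Variant callers usually write sorted output, but if `tabix` reports an error, a quick check on the uncompressed VCF can confirm the ordering. The hypothetical `check_vcf_sorted` function below sketches such a check on toy files; an unsorted file can be fixed with a dedicated tool, for example `bcftools sort`.

```shell
# Check that VCF records are grouped by chromosome and sorted by position
# within each chromosome. Prints "sorted" and returns 0, or reports the first
# problem and returns 1.
check_vcf_sorted() {
    awk -F'\t' '
        /^#/ { next }
        {
            c = $1 ""                            # force a string comparison
            if (c != chrom) {                    # entering a new chromosome block
                if (c in seen) {
                    printf "unsorted: %s reappears at line %d\n", c, FNR
                    bad = 1; exit
                }
                seen[c] = 1; chrom = c; last = 0
            }
            if ($2 + 0 < last) {
                printf "unsorted: %s:%s comes after position %d\n", c, $2, last
                bad = 1; exit
            }
            last = $2 + 0
        }
        END {
            if (bad) exit 1
            print "sorted"
        }' "$1"
}

# Hypothetical toy files, for illustration only.
demo_dir=$(mktemp -d)
hdr='#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
printf "$hdr"'5\t100\t.\tA\tG\t50\t.\t.\n5\t200\t.\tC\tT\t50\t.\t.\n6\t50\t.\tG\tA\t50\t.\t.\n' > "$demo_dir/sorted.vcf"
printf "$hdr"'5\t200\t.\tC\tT\t50\t.\t.\n5\t100\t.\tA\tG\t50\t.\t.\n' > "$demo_dir/unsorted.vcf"
printf "$hdr"'5\t100\t.\tA\tG\t50\t.\t.\n6\t50\t.\tG\tA\t50\t.\t.\n5\t300\t.\tT\tC\t50\t.\t.\n' > "$demo_dir/split.vcf"

check_vcf_sorted "$demo_dir/sorted.vcf"
check_vcf_sorted "$demo_dir/unsorted.vcf" || echo "needs sorting before bgzip/tabix"
```

The check reads the uncompressed file, so in the notebook it would run on `result.filter.annotate.vcf` before the `bgzip` step.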

In [None]:
ls -lh

To view the aligned reads together with the variants, we will need to download 4 files:
- mapped.dedup.sort.bam
- mapped.dedup.sort.bam.bai
- result.filter.annotate.vcf.gz
- result.filter.annotate.vcf.gz.tbi

We then load these files into IGV, with the GRCh38 human genome selected:

![](images/igv-genome.png)
![](images/igv-load.png)
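The file list above pairs each data file with its index. A quick check such as the hypothetical `check_igv_files` sketch below can confirm that every data file is present, non-empty and accompanied by its index before loading, using this tutorial's naming convention (`.bam.bai` and `.vcf.gz.tbi`). The demo uses placeholder files, not real BAM or VCF data.

```shell
# Check that each BAM / bgzipped VCF exists, is non-empty, and has its index
# next to it (.bam -> .bam.bai, .vcf.gz -> .vcf.gz.tbi). Returns 1 on any problem.
check_igv_files() {
    local status=0 f idx
    for f in "$@"; do
        case "$f" in
            *.bam)    idx="$f.bai" ;;
            *.vcf.gz) idx="$f.tbi" ;;
            *)        echo "skipped (unrecognised type): $f"; continue ;;
        esac
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"; status=1
        elif [ ! -s "$idx" ]; then
            echo "missing index: $idx"; status=1
        else
            echo "ok: $f"
        fi
    done
    return "$status"
}

# Hypothetical demo with placeholder files; the VCF index is deliberately absent.
demo_dir=$(mktemp -d)
for name in mapped.dedup.sort.bam mapped.dedup.sort.bam.bai result.filter.annotate.vcf.gz; do
    echo placeholder > "$demo_dir/$name"
done
check_igv_files "$demo_dir/mapped.dedup.sort.bam" "$demo_dir/result.filter.annotate.vcf.gz" \
    || echo "at least one file needs attention"
```

In the notebook directory this would be `check_igv_files mapped.dedup.sort.bam result.filter.annotate.vcf.gz`.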

# Using a web tool (VarMap)

We can also use a web tool to annotate a subset of variants and map them onto the 3D structure of the affected protein, where one is available.

![](images/varmap.png)

https://academic.oup.com/bioinformatics/article/35/22/4854/5514476
https://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/DisaStr/GetPage.pl?varmap=TRUE