# Annotation of Variants

We have identified variants that differ from the reference genome, but we do not yet know whether they affect genes or regions of the genome that could explain a disease or a phenotype.


To find out, we will annotate the VCF file using SnpEff and its companion tool SnpSift:

http://snpeff.sourceforge.net

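We will first run SnpEff to predict how each variant affects genes and transcripts (for example, whether it is missense, synonymous or intronic) using the `GRCh38.86` database. We will then use SnpSift to compare our variants against other variant databases; running `java -jar /home/snpEff/SnpSift.jar` without arguments lists the available commands.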

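```bash
# Predict the effect of each variant (gene, consequence, impact) with the GRCh38.86 database; -v prints progress messages
java -jar /home/snpEff/snpEff.jar eff -v GRCh38.86 SRR13882963_snp.vcf > SRR13882963_ann.vcf
```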
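SnpEff stores its predictions in an `ANN` INFO field: each comma-separated entry is a `|`-separated list that starts with the allele, the effect (e.g. `missense_variant`), the putative impact (`HIGH`, `MODERATE`, `LOW` or `MODIFIER`) and the gene name. The sketch below pulls those fields out of the first `ANN` entry of each record with `awk`. It runs on a small made-up file (`toy_ann.vcf`, with a hypothetical gene `GENE1`) and assumes the `ANN` layout; if your output carries an older-style `EFF` field instead, the field order differs.

```bash
# Hypothetical toy record mimicking SnpEff output (all values are made up)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t12345\t.\tA\tG\t50\tPASS\tDP=20;ANN=G|missense_variant|MODERATE|GENE1|ENSG0001|transcript|ENST0001|protein_coding|2/5|c.100A>G|p.Lys34Glu||||||\n'
} > toy_ann.vcf

# For each record, print CHROM, POS, effect, impact and gene of the first ANN entry
awk -F'\t' -v OFS='\t' '!/^#/ {
    n = split($8, info, ";")
    for (i = 1; i <= n; i++) {
        if (info[i] ~ /^ANN=/) {
            split(substr(info[i], 5), entries, ",")  # strip "ANN=", split entries
            split(entries[1], f, "|")                # fields of the first entry
            print $1, $2, f[2], f[3], f[4]
        }
    }
}' toy_ann.vcf > toy_ann_summary.tsv

cat toy_ann_summary.tsv
```

On the real data, replace `toy_ann.vcf` with `SRR13882963_ann.vcf`.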

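Next, we annotate the VCF file against the dbSNP and ClinVar databases. NCBI describes its human variation VCF files here:

https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/

dbSNP catalogues known variants and assigns them `rs` identifiers, while ClinVar aggregates data from clinical laboratories and expert panels about the clinical interpretation of variants.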

```bash
# Add dbSNP rs IDs and INFO fields; -Xmx16G gives Java a 16 GB heap for the large dbSNP file
java -Xmx16G -jar /home/snpEff/SnpSift.jar annotate -v /home/ref/All_20180418.vcf.gz SRR13882963_ann.vcf > SRR13882963_dbSNP.vcf
```
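After the dbSNP step, variants already catalogued in dbSNP should carry an `rs` identifier in the `ID` column, while the others keep whatever was there before (usually `.`). A quick sanity check is to count how many records gained an `rs` ID; the sketch below does this with `awk` on a made-up file (`toy_dbsnp.vcf`). If almost none of your variants receive an ID, a common cause is that the two files name chromosomes differently (for example `1` versus `chr1`), so no positions match.

```bash
# Hypothetical toy output of the dbSNP step (all values are made up)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\trs111\tA\tG\t60\tPASS\tDP=25\n'
  printf '1\t2000\t.\tC\tT\t45\tPASS\tDP=30\n'
  printf '2\t3000\trs222\tG\tA\t20\tPASS\tDP=10\n'
} > toy_dbsnp.vcf

# Count data lines, and how many of them have an rs ID in column 3 (ID)
awk -F'\t' '!/^#/ { total++; if ($3 ~ /^rs/) known++ }
END { printf "%d of %d variants have an rs ID\n", known, total }' \
    toy_dbsnp.vcf > toy_dbsnp_summary.txt

cat toy_dbsnp_summary.txt
```

On the real data, replace `toy_dbsnp.vcf` with `SRR13882963_dbSNP.vcf`.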

```bash
# Add ClinVar INFO fields (e.g. CLNSIG, CLNDN) to the dbSNP-annotated variants
java -jar /home/snpEff/SnpSift.jar annotate -v /home/ref/clinvar_20220122.vcf.gz SRR13882963_dbSNP.vcf > SRR13882963_clinvar.vcf
```

We can filter the variants by their quality score `QUAL`, which is Phred-scaled; a common convention is to keep variants with a `QUAL` of 30 or more. We also require a read depth (`DP`) of at least 18. Both thresholds are applied with SnpSift's `filter` command.
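For reference, the VCF specification defines `QUAL` as a Phred-scaled probability that the alternative allele call is wrong, so the threshold of 30 translates into an error probability:

$$
\text{QUAL} = -10 \log_{10} P(\text{ALT call is wrong})
\quad\Longleftrightarrow\quad
P = 10^{-\text{QUAL}/10},
\qquad
\text{QUAL} = 30 \;\Rightarrow\; P = 10^{-3} = 0.001
$$

In other words, a variant with `QUAL >= 30` has at most about a 1 in 1000 chance of being a false call, according to the variant caller's own model.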

```bash
# Keep variants with QUAL >= 30 and read depth (INFO field DP) >= 18
cat SRR13882963_clinvar.vcf | java -jar /home/snpEff/SnpSift.jar filter "( QUAL >= 30 ) & (DP >= 18)" > SRR13882963_pass.vcf
```
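To make the filter expression concrete, here is a plain `awk` sketch that roughly mirrors it on a made-up file (`toy_clinvar.vcf`). `QUAL` is the sixth VCF column, while `DP` is read from the INFO column, which is how SnpSift interprets a bare field name such as `DP`. The variables `min_qual` and `min_dp` are names chosen for this sketch, and treating a missing `QUAL` or `DP` as a failure is an assumption that may not match SnpSift's own handling of missing values. It is meant for understanding and spot checks, not as a replacement for SnpSift.

```bash
# Hypothetical toy input (all values are made up): one record passes,
# one fails on depth and one fails on quality
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\trs111\tA\tG\t60\tPASS\tDP=25\n'
  printf '1\t2000\t.\tC\tT\t45\tPASS\tDP=12\n'
  printf '2\t3000\trs222\tG\tA\t20\tPASS\tDP=40\n'
} > toy_clinvar.vcf

# Copy header lines unchanged, then keep records with QUAL >= min_qual and INFO DP >= min_dp
awk -F'\t' -v min_qual=30 -v min_dp=18 '
/^#/ { print; next }
{
    dp = -1                                   # stays -1 if there is no DP field
    n = split($8, info, ";")
    for (i = 1; i <= n; i++)
        if (info[i] ~ /^DP=/) dp = substr(info[i], 4) + 0
    # a missing QUAL (".") is treated as failing the filter
    if ($6 != "." && $6 + 0 >= min_qual && dp >= min_dp) print
}' toy_clinvar.vcf > toy_pass.vcf

grep -v '^#' toy_pass.vcf
```

Only the record at position 1000 survives: the second fails on depth (12 < 18) and the third on quality (20 < 30).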

## Taking a look at the annotated variant file

<img src="images/clinvar.png"/>



ClinVar classifies the clinical significance of a variant (reported in the `CLNSIG` INFO field) into tiers according to the strength of the evidence:
- pathogenic
- likely pathogenic
- uncertain significance
- likely benign
- benign

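These tiers are the terms recommended in the standards and guidelines for the interpretation of sequence variants published by the American College of Medical Genetics and Genomics (ACMG) and the Association for Molecular Pathology (AMP):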
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4544753/
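To see how the filtered variants fall into these tiers, one option is to tally the `CLNSIG` values that the ClinVar annotation step added to the INFO column, then pull out the pathogenic and likely pathogenic records for closer review. The sketch below does this with `awk` and `grep` on a small made-up file (`toy_pass_clinvar.vcf`); variants with no ClinVar match are counted under `not_in_ClinVar`, a label invented here for illustration. Real `CLNSIG` values can also combine tiers (for example `Benign/Likely_benign`) or read `Conflicting_interpretations_of_pathogenicity`, and each such value would appear as its own row in the tally.

```bash
# Hypothetical toy output of the QUAL/DP filter (all values are made up)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '1\t1000\trs111\tA\tG\t60\tPASS\tDP=25;CLNSIG=Benign\n'
  printf '1\t2000\trs333\tC\tT\t45\tPASS\tDP=30;CLNSIG=Uncertain_significance\n'
  printf '2\t3000\trs222\tG\tA\t80\tPASS\tDP=40;CLNSIG=Benign\n'
  printf '2\t5000\trs444\tT\tA\t70\tPASS\tDP=35;CLNSIG=Pathogenic\n'
  printf '3\t4000\t.\tT\tC\t90\tPASS\tDP=50\n'
} > toy_pass_clinvar.vcf

# Tally CLNSIG values; records without CLNSIG are counted as "not_in_ClinVar"
awk -F'\t' -v OFS='\t' '!/^#/ {
    sig = "not_in_ClinVar"
    n = split($8, info, ";")
    for (i = 1; i <= n; i++)
        if (info[i] ~ /^CLNSIG=/) sig = substr(info[i], 8)  # strip "CLNSIG="
    count[sig]++
}
END { for (s in count) print s, count[s] }' toy_pass_clinvar.vcf | sort > toy_clnsig_counts.tsv
cat toy_clnsig_counts.tsv

# Keep header lines plus records whose CLNSIG starts with Pathogenic or
# Likely_pathogenic (this also matches combined values such as Pathogenic/Likely_pathogenic)
grep -E '^#|CLNSIG=(Pathogenic|Likely_pathogenic)' toy_pass_clinvar.vcf > toy_pathogenic.vcf
grep -v '^#' toy_pathogenic.vcf
```

On the real data, replace `toy_pass_clinvar.vcf` with `SRR13882963_pass.vcf`.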