# Pangenomics
--------------------------------------------

# Variant Calling with vg

## Overview

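Variants can be called by aligning reads to a pangenome graph and measuring their support, either for variation that is already represented in the graph or for novel variation introduced by the reads. You will learn how to call variants both ways in this submodule.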

## Learning Objectives
+ Understand different types of variants
+ Understand our ability to call variants with different types of reads and pangenomic graphs
+ Learn how to call and interpret variants with vg

## Get Started

First we will learn how to identify variants that are supported by the graph. Then we'll look at identifying novel variants that are not in the graph.

### Call Variants

We will look for variants that are supported by the graph as well as for variants that are novel (not in the graph but supported by the reads aligned to the graph).

We will call variants against the graph, though you could also call variants using the surjected BAM file and traditional variant calling methods.

## Calling Graph Supported Variants

Compute read support for variation already in the graph using `vg pack`.

The parameters:

+ `-x`  the graph
+ `-g`  the alignments in GAM format
+ `-Q`  ignore mapping and base qualities < N
+ `-s`  ignore the first and last N nucleotides of each read
+ `-o`  the output pack file
+ `-t`  use N threads

```
!vg pack -x yprp.chrVIII.pggb.giraffe.gbz -g SK1xyprp.chrVIII.pggb.mapped.gam -Q 5 -s 5 -o yprp.chrVIII.pggb.mapped.pack -t 4
```
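As a rough mental model of what `vg pack` computes, the toy Python below counts, for each (node, offset) position, how many aligned bases cover it, while applying filters analogous to `-Q` and `-s`. This is a simplified sketch only: the `Alignment` class and `pack_coverage` function are hypothetical, real GAM alignments carry much more information, and vg's own implementation differs.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified read alignment.

    steps holds one (node_id, offset, base_quality) tuple per aligned base, in read order.
    """
    mapq: int
    steps: list


def pack_coverage(alignments, min_qual=5, trim_ends=5):
    """Toy per-position coverage in the spirit of `vg pack -Q min_qual -s trim_ends`."""
    coverage = Counter()
    for aln in alignments:
        # Like -Q: skip reads whose mapping quality is below the threshold.
        if aln.mapq < min_qual:
            continue
        # Like -s: drop the first and last `trim_ends` bases of the read.
        kept = aln.steps[trim_ends:len(aln.steps) - trim_ends]
        for node_id, offset, base_qual in kept:
            # Like -Q: skip individual bases whose base quality is below the threshold.
            if base_qual < min_qual:
                continue
            coverage[(node_id, offset)] += 1
    return coverage


# Example: one confidently mapped 12-base read and one read with low mapping quality.
good = Alignment(mapq=60, steps=[(1, i, 30) for i in range(12)])
poor = Alignment(mapq=2, steps=[(1, i, 30) for i in range(12)])
print(sorted(pack_coverage([good, poor]).items()))  # [((1, 5), 1), ((1, 6), 1)]
```

In this toy model, positions covered only by trimmed read ends, low-quality bases, or poorly mapped reads receive no support, so raising the thresholds lowers the counts available to the caller.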

Generate a VCF from the read support using `vg call`.

The parameters:

+ `-k`  the read support (pack) file to read in
+ `-t`  the number of threads
+ the graph (positional argument)

```
!vg call -k yprp.chrVIII.pggb.mapped.pack -t 4 yprp.chrVIII.pggb.giraffe.gbz > yprp.chrVIII.pggb.graph_calls.vcf
```
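To get a first look at the calls, you can tally the ALT alleles in the VCF by rough variant type. The sketch below is plain Python (no VCF library) and assumes an uncompressed VCF like the one written above. `classify` and `summarize_vcf` are hypothetical helpers, and the length-based rules are a simplification: graph-based callers can report multi-base or unnormalized alleles that a tool such as `bcftools norm` would represent differently.

```python
from collections import Counter
from pathlib import Path


def classify(ref, alt):
    """Rough, length-based variant type for one REF/ALT pair."""
    if alt == ".":
        return "no ALT"
    if alt.startswith("<") or alt == "*":
        return "symbolic"  # e.g. <DEL> or spanning-deletion alleles
    if len(ref) == len(alt):
        return "SNV" if len(ref) == 1 else "MNP"
    if len(alt) > len(ref) and alt.startswith(ref):
        return "insertion"
    if len(ref) > len(alt) and ref.startswith(alt):
        return "deletion"
    return "complex"


def summarize_vcf(lines):
    """Count ALT alleles by rough type, skipping header and blank lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        # Multi-allelic records list several comma-separated ALT alleles.
        for alt in alts.split(","):
            counts[classify(ref, alt)] += 1
    return counts


vcf_path = Path("yprp.chrVIII.pggb.graph_calls.vcf")
if vcf_path.exists():
    with vcf_path.open() as vcf:
        print(summarize_vcf(vcf))
```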

<div class="alert alert-block alert-info"> <b>Try this:</b>  
    <ul>
        <li>Create a blank code cell below.</li>
        <li>Call variants in the full genome graph by computing read support and then generating a VCF file.</li>
    </ul>
</div>

<details>
<summary>Click for help</summary>
<br>

```
!vg pack -x yprp.fullgenome.pggb.giraffe.gbz -g SK1xyprp.fullgenome.pggb.mapped.gam -Q 5 -s 5 -o yprp.fullgenome.pggb.mapped.pack -t 4

!vg call -k yprp.fullgenome.pggb.mapped.pack -t 4 yprp.fullgenome.pggb.giraffe.gbz > yprp.fullgenome.pggb.graph_calls.vcf
```

</details>
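The chrVIII and full-genome runs differ only in their file prefixes, so one way to avoid mismatched file names is to build the commands from those prefixes. The hypothetical `graph_call_commands` helper below reproduces the `vg pack` and `vg call` commands used in this submodule with the same flags. It only builds strings; running them through `subprocess` is switched off by default.

```python
import subprocess


def graph_call_commands(graph, reads, threads=4, min_qual=5, trim=5):
    """Build the vg pack / vg call commands for graph-supported variant calling.

    graph: graph prefix, e.g. "yprp.fullgenome.pggb" (expects <graph>.giraffe.gbz)
    reads: alignment prefix, e.g. "SK1xyprp.fullgenome.pggb" (expects <reads>.mapped.gam)
    """
    gbz = f"{graph}.giraffe.gbz"
    pack = f"{graph}.mapped.pack"
    return [
        # Compute read support for variation already in the graph.
        f"vg pack -x {gbz} -g {reads}.mapped.gam -Q {min_qual} -s {trim} -o {pack} -t {threads}",
        # Turn that support into a VCF.
        f"vg call -k {pack} -t {threads} {gbz} > {graph}.graph_calls.vcf",
    ]


RUN = False  # set to True to execute the commands (requires vg and the input files)
for cmd in graph_call_commands("yprp.fullgenome.pggb", "SK1xyprp.fullgenome.pggb"):
    print(cmd)
    if RUN:
        # shell=True is needed for the ">" redirection in the vg call command.
        subprocess.run(cmd, shell=True, check=True)
```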

## Calling Novel Variants

To call novel variants (variants supported by the aligned reads but not yet present in the graph), we first need to embed the variation from the aligned reads into the graph. The .gbz file cannot be modified, so we use `vg convert` to convert it to a .vg file that we can change.

```
!vg convert yprp.chrVIII.pggb.giraffe.gbz > yprp.chrVIII.pggb.giraffe.vg
```

Now, we can augment the graph with the mapped reads using `vg augment`. This will embed the variation from the alignments back into the graph.

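The parameters:

+ `-A`  write the alignments, updated to refer to the augmented graph, to this GAM file
+ `-t`  the number of threads to use
+ the graph to augment (first positional argument)
+ the input alignment (GAM) file (second positional argument)

The augmented graph itself is written to standard output, which we redirect to a new .vg file.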


```
!vg augment yprp.chrVIII.pggb.giraffe.vg SK1xyprp.chrVIII.pggb.mapped.gam -A SK1xyprp.chrVIII.pggb.mapped.aug.gam -t 4 > SK1xyprp.chrVIII.pggb.aug.vg
```
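Conceptually, augmentation edits the graph so that differences observed in the alignments become part of it: a node is cut where a read disagrees with it, and a new node carrying the read's base is attached, creating a bubble that `vg call` can later genotype. The toy Python below illustrates that idea for a single SNV on a simple directed graph. It is a hypothetical simplification: `add_snv` is not a vg function, real vg graphs are bidirected, and `vg augment` also handles indels, many reads at once, and the updated alignments written with `-A`.

```python
def add_snv(nodes, edges, node_id, offset, alt_base, next_id):
    """Toy augmentation: embed one SNV into a simple directed sequence graph.

    nodes: dict mapping node id -> sequence (modified in place)
    edges: set of (from_id, to_id) pairs (modified in place)
    The old node is replaced by up to four new nodes with ids next_id..next_id+3:
    left flank, reference base, alternate base, right flank (empty flanks are skipped).
    """
    seq = nodes.pop(node_id)
    left_seq, ref_base, right_seq = seq[:offset], seq[offset], seq[offset + 1:]
    if alt_base == ref_base:
        raise ValueError("alt_base matches the reference base; nothing to embed")
    left = next_id if left_seq else None
    ref, alt = next_id + 1, next_id + 2
    right = next_id + 3 if right_seq else None

    if left is not None:
        nodes[left] = left_seq
    nodes[ref] = ref_base
    nodes[alt] = alt_base
    if right is not None:
        nodes[right] = right_seq

    # Edges into the old node now enter its first piece(s); edges out leave its last piece(s).
    heads = [left] if left is not None else [ref, alt]
    tails = [right] if right is not None else [ref, alt]
    rewired = set()
    for a, b in edges:
        sources = tails if a == node_id else [a]
        targets = heads if b == node_id else [b]
        rewired.update((s, t) for s in sources for t in targets)

    # Connect the flanks through both alleles, forming a bubble.
    for allele in (ref, alt):
        if left is not None:
            rewired.add((left, allele))
        if right is not None:
            rewired.add((allele, right))
    edges.clear()
    edges.update(rewired)


# Example: reads support a G->T change at offset 2 of node 1 ("ACGT").
nodes = {1: "ACGT", 2: "TT"}
edges = {(1, 2)}
add_snv(nodes, edges, node_id=1, offset=2, alt_base="T", next_id=10)
print(nodes)          # {2: 'TT', 10: 'AC', 11: 'G', 12: 'T', 13: 'T'}
print(sorted(edges))  # [(10, 11), (10, 12), (11, 13), (12, 13), (13, 2)]
```

After the edit, the paths 10→11→13→2 and 10→12→13→2 spell the reference (ACGTTT) and alternate (ACTTTT) sequences.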

Index the augmented graph using `vg index`. We will make a .xg index.

The parameters:

+ `-x`  the output xg index file
+ `-t`  the number of threads
+ the input graph (positional argument)

```
!vg index -t 4 -x SK1xyprp.chrVIII.pggb.aug.xg SK1xyprp.chrVIII.pggb.aug.vg
```

Now that the variation from the reads is embedded into the graph, we can proceed to call variants as we did above.

Compute read support on the augmented graph, using the alignments written by `vg augment -A`.

```
!vg pack -x SK1xyprp.chrVIII.pggb.aug.xg -g SK1xyprp.chrVIII.pggb.mapped.aug.gam -Q 5 -s 5 -o SK1xyprp.chrVIII.pggb.mapped.aug.pack -t 4
```

Generate a VCF from the support.

```
!vg call SK1xyprp.chrVIII.pggb.aug.xg -k SK1xyprp.chrVIII.pggb.mapped.aug.pack -t 4 > SK1xyprp.chrVIII.pggb.aug_calls.vcf
```
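A simple way to look for candidate novel variants is to list alleles that appear in the augmented call set but not in the graph-supported call set from earlier. The sketch below is plain Python with hypothetical helper names. It matches records exactly on CHROM, POS, REF, and ALT, so the same variant written differently in the two VCFs (for example, with different flanking bases) will show up as a difference; normalizing both files first, e.g. with `bcftools norm`, would make the comparison more reliable.

```python
from pathlib import Path


def variant_keys(lines):
    """Collect (CHROM, POS, REF, ALT) for every non-missing ALT allele in a VCF."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            if alt != ".":
                keys.add((chrom, int(pos), ref, alt))
    return keys


def calls_only_in(first_lines, second_lines):
    """Sorted (CHROM, POS, REF, ALT) keys present in the first VCF but not the second."""
    return sorted(variant_keys(first_lines) - variant_keys(second_lines))


aug_vcf = Path("SK1xyprp.chrVIII.pggb.aug_calls.vcf")
graph_vcf = Path("yprp.chrVIII.pggb.graph_calls.vcf")
if aug_vcf.exists() and graph_vcf.exists():
    with aug_vcf.open() as aug, graph_vcf.open() as graph:
        candidates = calls_only_in(aug, graph)
    print(f"{len(candidates)} alleles called only on the augmented graph")
```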

<div class="alert alert-block alert-info"> <b>Try this:</b>  
    <ul>
        <li>Create a blank code cell below.</li>
        <li>Call novel variants for the yprp.fullgenome.pggb.giraffe.gbz graph:
            <ul>
                <li>Convert the graph to vg format.</li>
                <li>Augment the graph to embed the variation from the read alignments into it.</li>
                <li>Create an index (xg).</li>
                <li>Compute read support.</li>
                <li>Generate a VCF.</li>
            </ul>
        </li>
    </ul>
</div>

<details>
<summary>Click for help</summary>
<br>

```
!vg convert yprp.fullgenome.pggb.giraffe.gbz > yprp.fullgenome.pggb.giraffe.vg

!vg augment yprp.fullgenome.pggb.giraffe.vg SK1xyprp.fullgenome.pggb.mapped.gam -A SK1xyprp.fullgenome.pggb.mapped.aug.gam -t 4 > SK1xyprp.fullgenome.pggb.aug.vg

!vg index -t 4 -x SK1xyprp.fullgenome.pggb.aug.xg SK1xyprp.fullgenome.pggb.aug.vg

!vg pack -x SK1xyprp.fullgenome.pggb.aug.xg -g SK1xyprp.fullgenome.pggb.mapped.aug.gam -Q 5 -s 5 -o SK1xyprp.fullgenome.pggb.mapped.aug.pack -t 4

!vg call SK1xyprp.fullgenome.pggb.aug.xg -k SK1xyprp.fullgenome.pggb.mapped.aug.pack -t 4 > SK1xyprp.fullgenome.pggb.aug_calls.vcf
```

</details>
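Because the chrVIII and full-genome pipelines differ only in file prefixes, generating the five commands from the prefixes keeps intermediate file names consistent from one step to the next. `novel_variant_commands` is a hypothetical helper that reproduces the commands used in this submodule with the same flags. It only builds strings; execution through `subprocess` is switched off by default.

```python
import subprocess


def novel_variant_commands(graph, reads, threads=4, min_qual=5, trim=5):
    """Build the convert/augment/index/pack/call commands for novel variant calling.

    graph: graph prefix, e.g. "yprp.fullgenome.pggb" (expects <graph>.giraffe.gbz)
    reads: alignment prefix, e.g. "SK1xyprp.fullgenome.pggb" (expects <reads>.mapped.gam)
    """
    t = f"-t {threads}"
    aug_gam = f"{reads}.mapped.aug.gam"
    return [
        # Make a modifiable copy of the graph.
        f"vg convert {graph}.giraffe.gbz > {graph}.giraffe.vg",
        # Embed read-supported variation; -A saves the alignments against the new graph.
        f"vg augment {graph}.giraffe.vg {reads}.mapped.gam -A {aug_gam} {t} > {reads}.aug.vg",
        # Index the augmented graph.
        f"vg index {t} -x {reads}.aug.xg {reads}.aug.vg",
        # Compute read support on the augmented graph.
        f"vg pack -x {reads}.aug.xg -g {aug_gam} -Q {min_qual} -s {trim} -o {reads}.mapped.aug.pack {t}",
        # Call variants from that support.
        f"vg call {reads}.aug.xg -k {reads}.mapped.aug.pack {t} > {reads}.aug_calls.vcf",
    ]


RUN = False  # set to True to execute the commands (requires vg and the input files)
for cmd in novel_variant_commands("yprp.fullgenome.pggb", "SK1xyprp.fullgenome.pggb"):
    print(cmd)
    if RUN:
        # shell=True is needed for the ">" redirections.
        subprocess.run(cmd, shell=True, check=True)
```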

## Conclusion

In this submodule, you learned two ways to call variants with vg: calling variants already represented in the graph, and calling novel variants supported by reads mapped to the graph after augmenting the graph with those reads.

## Clean up
No cleanup is necessary for this submodule. Don't forget to shut down your Workbench when you are done working through this module!