# Pangenomics
--------------------------------------------

# Indexing Graphs with vg

## Overview

The Variation Graph Toolkit (VG) allows us to perform different operations on pangenomic graphs. You will learn about VG's capabilities and use VG to index PGGB graphs.

## Learning Objectives
+ Understand VG and the different things it can be used for
+ Learn how to index graphs using vg 

## Get Started

We will use the Variation Graph Toolkit (VG) to index our PGGB graphs, map sequences to them, and call variants.

However, VG can also create graphs and do many other steps in pangenomic analysis.
If you would like to learn how to construct and manipulate graphs using VG and other pangenomic tools, please see [our virtual workshops](https://inbre.ncgr.org/ncgr-workshops/upcoming-ncgr-workshops.html).


### Variation Graph Toolkit (VG)

While we will not use VG to create pangenomic graphs in this module, it is important to understand the kinds of graphs that VG works with.

VG graphs can be cyclic, meaning that a path through the graph can visit the same node more than once.
This is important for capturing, for example, duplicated genomic regions.
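
To make the idea of a cycle concrete, here is a minimal Python sketch of a toy graph in which a duplicated segment is stored once and visited twice by a path. This is not vg's own data model; the node IDs, sequences, and the `spell_path` helper are hypothetical and exist only for illustration.

```python
# Toy sequence graph: node ID -> sequence (hypothetical values).
nodes = {1: "ACG", 2: "TTAGG", 3: "CAT"}

# Directed edges; the edge (2, 2) is a self-loop, which makes the graph cyclic.
edges = {(1, 2), (2, 2), (2, 3)}


def spell_path(path, nodes, edges):
    """Return the sequence spelled by a path, checking that every step is an edge."""
    for a, b in zip(path, path[1:]):
        if (a, b) not in edges:
            raise ValueError(f"no edge {a} -> {b}")
    return "".join(nodes[n] for n in path)


# One genome carries a single copy of node 2; the other carries a tandem duplication.
single_copy = spell_path([1, 2, 3], nodes, edges)
duplicated = spell_path([1, 2, 2, 3], nodes, edges)

print(single_copy)  # ACGTTAGGCAT
print(duplicated)   # ACGTTAGGTTAGGCAT
```

Both genomes share the same three nodes; only the path differs, which is why a cyclic graph can represent a duplication without storing the repeated sequence twice.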

VG graphs are otherwise general.
VG can work with graphs built by reference-based, iterative, or reference-free approaches.

VG has tools for the following steps of a pangenomic analysis:

+ Constructing graphs
+ Manipulating graphs
+ Indexing graphs
+ Mapping sequences to graphs
+ Calling variants on mapped sequences
+ Visualizing graphs

VG can also do:

+ [Transcriptomic analysis](https://github.com/vgteam/vg#transcriptomic-analysis)
+ Assembly-based pipelines
+ So much more

Citation:

![Garrison, E., Sirén, J., Novak, A. et al.](./Figures/VGref.png)

![vg Graph Genomics Pipeline: https://github.com/vgteam/vg](./Figures/VGpipe.png)


A reference genome "decorated" with variants:

![GRAF™ Pan Genome Reference: https://www.sevenbridges.com/graf/](./Figures/GRAF.png)
 

### VG Index Formats

VG has several different index formats.

XG (lightweight graph / path representation)

+ Binary file containing the graph structure (nodes, edges, paths) along with the node sequences
+ Compact, read-only data structure that answers graph queries efficiently

GCSA (Generalized Compressed Suffix Array)

+ Analogous to the .sa file created by bwa index
+ Binary file containing a suffix array that efficiently looks up where sequences occur in the graph

For more information, visit https://github.com/vgteam/vg/wiki/File-Formats
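
The suffix-array idea behind GCSA can be illustrated on a plain string. The sketch below is a simplified linear analogue, not the GCSA algorithm itself (which indexes paths through a graph); the example sequence and the function names are hypothetical.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, query):
    """Binary-search the suffix array for every position where query occurs."""
    k = len(query)
    # Lower bound: first suffix whose first k characters are >= query.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < query:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose first k characters are > query.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] <= query:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


text = "GATTACATTAG"  # toy sequence, assumed for illustration
sa = build_suffix_array(text)

print(find_occurrences(text, sa, "TTA"))  # [2, 7]
print(find_occurrences(text, sa, "CCC"))  # []
```

Because the suffixes are sorted, every occurrence of a query sits in one contiguous block of the array, so a lookup needs only two binary searches instead of a scan of the whole sequence.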

### Converting our graphs from GFA to VG format

Previously, you created two graphs using PGGB:  
yprp.chrVIII.pggb.gfa (a graph for yeast chromosome VIII)  
yprp.all.pggb.gfa (a graph for the entire yeast genome)

The first step is to convert the graphs from GFA to VG format.

NOTE: You can index a GFA file directly rather than converting it to VG first, but this may affect how reads are mapped.
vg also has an [autoindex](https://github.com/vgteam/vg/wiki/Automatic-indexing-for-read-mapping-and-downstream-inference) command, which builds the indexes needed for read mapping automatically.

In [None]:
!vg convert -g yprp.chrVIII.pggb.gfa -p > yprp.chrVIII.pggb.vg
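
It can help to know what a GFA file contains before handing it to vg. The following Python sketch counts the segment (`S`), link (`L`), and path (`P`) records in a small hand-written GFA 1.0 example; the toy graph and the `summarize_gfa` helper are hypothetical and are not taken from the yeast graphs.

```python
import io
from collections import Counter

# A tiny hand-written GFA 1.0 graph (hypothetical): a bubble with two paths.
toy_gfa = """\
H\tVN:Z:1.0
S\t1\tACG
S\t2\tT
S\t3\tG
S\t4\tTTA
L\t1\t+\t2\t+\t0M
L\t1\t+\t3\t+\t0M
L\t2\t+\t4\t+\t0M
L\t3\t+\t4\t+\t0M
P\tref\t1+,2+,4+\t*
P\talt\t1+,3+,4+\t*
"""


def summarize_gfa(handle):
    """Count S, L, and P records and the total bases stored in segments."""
    counts = Counter()
    total_bases = 0
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        fields = line.rstrip("\n").split("\t")
        counts[fields[0]] += 1
        if fields[0] == "S":
            total_bases += len(fields[2])  # third column holds the sequence
    return {
        "segments": counts["S"],
        "links": counts["L"],
        "paths": counts["P"],
        "bases": total_bases,
    }


summary = summarize_gfa(io.StringIO(toy_gfa))
print(summary)  # {'segments': 4, 'links': 4, 'paths': 2, 'bases': 8}
```

To run the same summary on a real graph, pass an open file such as `open("yprp.chrVIII.pggb.gfa")` in place of the `io.StringIO` object.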

<div class="alert alert-block alert-info"> <b>Try this:</b>  
    <ul>
        <li>Create a blank code cell below (hover over the icons in the upper right of this cell and choose "Insert a cell below (B)").</li>
        <li>Convert yprp.all.pggb.gfa from GFA format to VG.</li>
        <li>Call the resulting file yprp.all.pggb.vg.</li>
    </ul>
</div>


<details>
<summary>Click for help</summary>
<br>
vg convert -g yprp.all.pggb.gfa -p > yprp.all.pggb.vg
</details>

### Indexing with VG

Generate .xg and .gcsa index files from the yprp.chrVIII.pggb.vg file that you just converted from your PGGB graph.

The parameters:

-x Name of the .xg index file  
-g Name of the .gcsa index file  
-t The number of threads


In [None]:
!vg index -x yprp.chrVIII.pggb.xg -g yprp.chrVIII.pggb.gcsa yprp.chrVIII.pggb.vg -t 20
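
Since the same indexing command is repeated for each graph with only the file prefix changing, it can be wrapped in a small helper. The Python sketch below builds the argument list from a prefix and only runs it if `vg` is installed and the input file exists; `build_index_command` is a hypothetical helper, not part of vg.

```python
import os
import shutil
import subprocess


def build_index_command(prefix, threads=4):
    """Return the vg index command (as an argument list) for <prefix>.vg."""
    return [
        "vg", "index",
        "-x", f"{prefix}.xg",    # name of the .xg index file to write
        "-g", f"{prefix}.gcsa",  # name of the .gcsa index file to write
        "-t", str(threads),      # number of threads
        f"{prefix}.vg",          # input graph
    ]


cmd = build_index_command("yprp.chrVIII.pggb", threads=20)
print(" ".join(cmd))

# Run the command only when vg is on the PATH and the input graph is present.
if shutil.which("vg") and os.path.exists("yprp.chrVIII.pggb.vg"):
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems, and `check=True` raises an error if indexing fails rather than continuing silently.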

<div class="alert alert-block alert-info"> <b>Try this:</b>  
    <ul>
        <li>Create a blank code cell below.</li>
        <li>Index yprp.all.pggb.vg.</li>
    </ul>
</div>

<details>
<summary>Click for help</summary>
<br>
vg index -x yprp.all.pggb.xg -g yprp.all.pggb.gcsa yprp.all.pggb.vg -t 20
</details>

## Conclusion
In this submodule you learned about the VG toolkit and what it can be used for. You also learned how to index graphs with VG and indexed the PGGB graph that you had previously made. In the next submodule, you will learn how to map reads to the indexed graph.

## Clean up
No cleanup is necessary for this submodule. Don't forget to shut down your Workbench when you are done working through this module!