# Pangenomics
--------------------------------------------

# Visualizing Graphs with Bandage

## Overview

Bandage can be used to visualize and interact with pangenomic graphs. You will learn how to visualize graphs with Bandage and how to BLAST DNA sequences and visualize them on the graph.

## Learning Objectives
+ Learn how to visualize pangenomic graphs using Bandage
+ Learn how to BLAST sequences against the graph and visualize them within Bandage
+ Learn how to trace paths through the graph

## Get Started

In this submodule you will learn how to visualize graphs with Bandage, how to BLAST query sequences directly against the graph in Bandage, and how to trace paths through the graph.

#### Visualizing Genomes

#### Bandage 
- Overview
- Visualization in Bandage
- BLASTing Pangenomic Graphs
- Paths

----------------------

## Visualizing Genomes

Pangenomic graphs can be very complex, often looking like "hairballs" such as those in the figure below. Bandage allows us to zoom in on interesting regions to visualize and interact with them.

<figure>
  <img
    src="./Figures/hairballsB.png"
    alt="Compressed de Bruijn graphs “hairballs”" />
  <figcaption><a href="https://academic.oup.com/bioinformatics/article/30/24/3476/2422268">https://academic.oup.com/bioinformatics/article/30/24/3476/2422268</a></figcaption>
</figure>


----------------------

## Bandage

Bandage is an interactive graph visualizer. The Bandage acronym stands for:
a **B**ioinformatics **A**pplication for **N**avigating **D**e novo **A**ssembly **G**raphs **E**asily

<figure>
  <img
    src="./Figures/Bandage.png"
    alt="Bandage logo" />
  <figcaption><a href="https://rrwick.github.io/Bandage/">https://rrwick.github.io/Bandage/</a></figcaption>
</figure>

In addition to visualizing graphs, Bandage has BLAST integration that allows you to: 
- Query the graph with DNA sequences.
- Build a local BLAST database of the graph.
- Do a web BLAST search with sequences from nodes.

Finally, Bandage also allows you to label and color nodes by loading comma-separated values (CSV) text files with metadata. Details on creating these files can be found on the [Bandage CSV labels wiki page](https://github.com/rrwick/Bandage/wiki/CSV-labels).
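As a sketch of what such a file can look like, the snippet below writes a small labels CSV with Python's standard `csv` module. It assumes, based on the wiki page above, that the first column holds node names and that a column headed `Color` supplies node colors; check the wiki for the exact rules Bandage applies. The function name, node names, labels, and colors are hypothetical and only for illustration.

```python
import csv
import io

def write_bandage_csv(rows, handle):
    """Write node metadata as CSV: a header row, then one row per node.

    Assumption: column 1 holds the node name, and a column named "Color"
    holds a color name or hex code that Bandage can use to color the node.
    """
    writer = csv.writer(handle)
    writer.writerow(["Name", "Label", "Color"])
    for name, label, color in rows:
        writer.writerow([name, label, color])

# Hypothetical node names and annotations, for illustration only.
example_rows = [
    ("7201", "CUP1 region", "blue"),
    ("7202", "CUP1 region", "blue"),
    ("7350", "YHR054C region", "#00aa00"),
]

buffer = io.StringIO()
write_bandage_csv(example_rows, buffer)
print(buffer.getvalue())
```

To produce a file you can load into Bandage, pass an open file instead of the in-memory buffer, for example `with open("node_labels.csv", "w", newline="") as fh: write_bandage_csv(example_rows, fh)`.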


### Open Bandage

1. Open Bandage in your browser: from the Launcher tab (File : New Launcher), scroll down to the bottom and click "Bandage" in the "Visualization Software" section.

### Load Graph

2. Once in Bandage, choose "File : Load Graph". Navigate to "yprp.chrVIII.pggb.gfa" (click on "Computer" then "/" then navigate to /home/jupyter/NIGMS-Sandbox-Pangenomics-Module/module_notebooks/graphs/yprp.chrVIII.pggb.gfa).


3. Click on "Draw Graph".

<div class="alert alert-block alert-info"> <b>NOTE:</b> The layout will be different every time the graph is drawn.</div>

4. Under "Graph Information", choose "More Info" to see more stats about the graph.
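The "More Info" panel summarizes the graph's size. A rough version of the same idea can be computed directly from a GFA 1.0 file, where `S` lines are segments (nodes), `L` lines are links (edges), and `P` lines are paths. The sketch below counts these record types and sums segment lengths; it is not Bandage's own calculation, so its numbers may not match Bandage's exactly, and the function name and toy graph are hypothetical.

```python
from collections import Counter

def gfa_summary(lines):
    """Count GFA 1.0 record types and the total length of segment sequences."""
    counts = Counter()
    total_length = 0
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        record = fields[0]
        if not record or record.startswith("#"):
            continue  # skip blank lines and comments
        counts[record] += 1
        if record == "S" and len(fields) > 2 and fields[2] != "*":
            total_length += len(fields[2])
    return {
        "segments": counts["S"],
        "links": counts["L"],
        "paths": counts["P"],
        "total_length": total_length,
    }

# A made-up graph: segment 2 is an insertion carried by path A_chr but not B_chr.
toy_gfa = """H\tVN:Z:1.0
S\t1\tACGT
S\t2\tT
S\t3\tGG
L\t1\t+\t2\t+\t0M
L\t1\t+\t3\t+\t0M
L\t2\t+\t3\t+\t0M
P\tA_chr\t1+,2+,3+\t*
P\tB_chr\t1+,3+\t*
"""
print(gfa_summary(toy_gfa.splitlines()))
```

The same function accepts an open file, e.g. `with open("graphs/yprp.chrVIII.pggb.gfa") as fh: print(gfa_summary(fh))`.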

----------------------

## BLAST graph

1. Open up "Graph Search" and then click on "Create/view graph search".

2. Click on "Build BLAST database" to build a database from the graph.

3. Click on "Load from FASTA file" and choose the "genes.fa" file.

4. Click "Run BLAST search".

5. Look at the results table at the bottom of the window to see the hits for both genes and the color assigned to each.

6. Click "Close".

7. Under "Graph Drawing", change the scope to "Around Query Hits" and type 100 in the "Distance" box. This draws the nodes containing the CUP1 and YHR054C hits plus every node within 100 steps of them.

8. Under "Graph Display", change "Random Colours" to "Gray Color" so that the gene hits will stand out.

9. Scroll down the left-hand panel and click "Annotations". Double-click the "Blast Hits" entry that appears under Annotations, select "Solid", and close the window with the "x" in the upper right. The BLAST hits will now be shown in solid colors.
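Building a BLAST database from the graph amounts to treating each node's sequence as a database entry. Bandage does this internally; as a rough sketch of the idea (not Bandage's actual code), the function below turns GFA segment lines into FASTA records that could then be indexed with NCBI's `makeblastdb` (for example `makeblastdb -in nodes.fasta -dbtype nucl`). The function name and toy segments are hypothetical.

```python
def segments_to_fasta(lines, width=60):
    """Yield one FASTA record per GFA segment (S line) that carries a sequence."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "S" or len(fields) < 3 or fields[2] == "*":
            continue  # skip non-segment records and segments stored without sequence
        name, seq = fields[1], fields[2]
        # Wrap the sequence into lines of at most `width` bases.
        wrapped = "\n".join(seq[i:i + width] for i in range(0, len(seq), width))
        yield f">{name}\n{wrapped}\n"

# Hypothetical records: segment 2 has no stored sequence, and the L line is ignored.
toy_gfa = ["S\t1\tACGTACGT", "S\t2\t*", "L\t1\t+\t3\t+\t0M", "S\t3\tGGC"]
fasta = "".join(segments_to_fasta(toy_gfa, width=4))
print(fasta)
```

For a real graph you would stream the GFA file through the generator and write the records out, e.g. `out.writelines(segments_to_fasta(fh))`.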

<div class="alert alert-block alert-info"> <b>NOTE:</b> If you don't see both the blue (CUP1) and green (YHR054C) hits, choose "Query: All" under "Graph Search".</div>
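The "Around Query Hits" scope from step 7 can be thought of as a breadth-first search that starts at the nodes carrying BLAST hits and stops a fixed number of steps away (100 in step 7). The sketch below shows that idea on a toy adjacency list. It is not Bandage's source code, and for simplicity it assumes an undirected graph, ignoring strands and edge direction; the function name and graph are hypothetical.

```python
from collections import deque

def nodes_within(adjacency, start_nodes, max_distance):
    """Return {node: steps} for every node within max_distance steps of start_nodes (BFS)."""
    distance = {node: 0 for node in start_nodes}
    queue = deque(start_nodes)
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # do not expand past the distance limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance

# Toy undirected graph (an assumption): a chain 1-2-3-4-5 with a side branch 3-6.
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (3, 6)]
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)

# Pretend node 3 holds a BLAST hit and keep nodes up to 1 step away.
print(sorted(nodes_within(adjacency, [3], 1)))
```

Because BFS visits nodes in order of distance, each node's recorded step count is the shortest distance from any hit, which is what a distance cutoff needs.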

Your graph should look something like this:

Blue = CUP1 genes  
Green = YHR054C genes

<figure>
  <img
    src="./Figures/cup1only.png"
    alt="Yeast CUP1 structure" />
  <figcaption>Yeast CUP1 Structure</figcaption>
</figure>


<div class="alert alert-block alert-info"> <b>NOTE:</b> The graph layout algorithm does not lay out the graph the same way each time, so yours might look slightly different.</div>

----------------------

## Paths

A path records the ordered series of nodes (segments) that one assembly traverses through the graph. For example, the SK1_chrVIII path is the sequence of nodes you would step through if you laid SK1's chromosome VIII onto the graph.
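In a GFA 1.0 file, each path is a `P` line whose third field lists oriented segment names, such as `11+,12-,13+`. The sketch below parses such a line and spells out the sequence the path represents, reverse-complementing segments visited in the `-` orientation. It assumes segments join without overlaps (links with `0M` or `*` overlaps), which we take to hold for pggb graphs; the function names, segments, and path are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def parse_path_line(line):
    """Return (path_name, [(segment, orientation), ...]) from a GFA 1.0 P line."""
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "P":
        raise ValueError("not a P line")
    steps = [(step[:-1], step[-1]) for step in fields[2].split(",")]
    return fields[1], steps

def spell_path(steps, segments):
    """Concatenate segment sequences along the path (assumes blunt, zero-overlap joins)."""
    pieces = []
    for name, orientation in steps:
        seq = segments[name]
        if orientation == "-":
            seq = seq.translate(COMPLEMENT)[::-1]  # reverse complement
        pieces.append(seq)
    return "".join(pieces)

# Hypothetical segment sequences and path line.
segments = {"11": "ACG", "12": "TTA", "13": "GC"}
name, steps = parse_path_line("P\tSK1_chrVIII\t11+,12-,13+\t*")
print(name, steps)
print(spell_path(steps, segments))
```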

<div class="alert alert-block alert-info"> <b>NOTE:</b> Some theoretical paths through the graph may not be taken by any assembly, and not every path traverses every node.</div>
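The second point in the note can be checked by comparing the set of nodes each path visits: nodes found in every path are shared by all assemblies, while nodes found in only some paths mark variation between them. A minimal sketch, using made-up step lists written as in the third field of GFA `P` lines (the path names match this graph, but the node IDs are hypothetical):

```python
def node_sets(paths):
    """Map each path name to the set of segment names it visits (orientation ignored)."""
    return {name: {step[:-1] for step in steps.split(",")} for name, steps in paths.items()}

# Hypothetical step lists, for illustration only.
paths = {
    "S288C_chrVIII": "1+,2+,4+,5+",
    "SK1_chrVIII": "1+,3+,4+,5+",
    "Y12_chrVIII": "1+,2+,4+",
}
sets = node_sets(paths)
all_nodes = set().union(*sets.values())
shared = set.intersection(*sets.values())

# Report, for every node, whether all paths visit it or which ones do.
for node in sorted(all_nodes, key=int):
    carriers = sorted(name for name, nodes in sets.items() if node in nodes)
    print(node, "in all paths" if node in shared else carriers)
```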

1. Look at the SK1 path through the CUP1 region of the graph. On the right-hand side under "Find paths", start typing "SK1_chrVIII" in the name box (it should appear as you type, and you can select it).

2. Click "Find Path" and dismiss the "Nodes not found" dialog that pops up (the "x" in its upper right corner). The nodes not found are nodes in the SK1 assembly path that lie outside the current view.

3. Click "recolor", then "set colour", and choose a color. This colors every node the SK1 path passes through, although the path color sits underneath the BLAST hit colors. It also switches the "Graph Display" setting to "Custom Colours".

<div class="alert alert-block alert-info"> <b>NOTE:</b> You can toggle the BLAST hit colors on and off by double-clicking "Blast Hits" under "Annotations": turn them off to see the paths more clearly, or leave them on to see the genes in relation to the path. You can also drag nodes around to explore the graph further.</div>

4. See if you can figure out how to trace the highlighted path through the graph. To help, have Bandage show the node labels (`Node labels: Name` on the left-hand panel) and compare them with the paths in the GFA file (see more information in the video linked below).

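Example: extract the S288C subpath for the CUP1 region from the GFA file. Based on the Bandage visualization with node labels on, the relevant nodes range from about 7200 to 7700; the command below uses a slightly wider window (7200-7800). We use `grep` to get the S288C_chrVIII path (P) line, `cut` to keep only its third field (the comma-separated list of steps), `tr` to put each step on its own line, and `awk` to keep the steps whose node ID falls in the window (`$1 + 0` reads the numeric ID and ignores the `+`/`-` orientation, so steps in both orientations are kept). The output is redirected into a file called *S288C_CUP1_subpath.txt*, which we then inspect with `head`. The cell runs under `%%bash` so that IPython does not try to expand the `$` and `{...}` in the `awk` program as Python variables.

In [None]:
%%bash
grep "^P.S288C_chrVIII" graphs/yprp.chrVIII.pggb.gfa | cut -f3 | tr ',' '\n' | awk '{id = $1 + 0} id >= 7200 && id <= 7800' > graphs/S288C_CUP1_subpath.txt

head graphs/S288C_CUP1_subpath.txt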
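If you prefer to stay in Python, the same kind of extraction can be sketched without shell tools. This version finds the `P` line for one path, splits its steps, and keeps those whose node ID falls in a window such as the 7200-7800 range used in the cell above. It assumes numeric segment names (as in this pggb graph) and an exact path-name match, whereas the `grep` pattern above matches a name prefix; the function name and toy input are hypothetical.

```python
def extract_subpath(lines, path_name, first_id, last_id):
    """Return the steps (e.g. "7201+") of one GFA path whose node IDs lie in [first_id, last_id]."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "P" and fields[1] == path_name:
            # step[:-1] drops the +/- orientation; assumes numeric segment names.
            return [step for step in fields[2].split(",")
                    if first_id <= int(step[:-1]) <= last_id]
    raise KeyError(f"path {path_name!r} not found")

# Hypothetical GFA lines, for illustration only.
toy_gfa = [
    "S\t7199\tA",
    "P\tS288C_chrVIII\t7199+,7200+,7450-,7800+,7801+\t*",
]
print(extract_subpath(toy_gfa, "S288C_chrVIII", 7200, 7800))
```

On the real graph this would be called with an open file, e.g. `with open("graphs/yprp.chrVIII.pggb.gfa") as fh: steps = extract_subpath(fh, "S288C_chrVIII", 7200, 7800)`.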

5. When you are done, set the color under "Graph Display" to "Gray Color" to remove the path.

<div class="alert alert-block alert-success"> <b>Try this in Bandage:</b>
    <ul>
        <li>Identify the Y12_chrVIII path and count how many CUP1 genes there are.</li>
        <li>Identify the S288C_chrVIII path and count how many CUP1 genes there are.</li>
    </ul>
</div>
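One way to think about this exercise: when tandem copies of a gene collapse onto the same nodes, a path that carries several copies passes through those nodes several times. Assuming you have picked one node that lies inside every CUP1 hit, counting how often a path visits it gives a rough copy count; partial copies or pseudogenes may be missed or miscounted. The node IDs and steps below are hypothetical placeholders, not taken from the real graph.

```python
def count_visits(steps, marker_node):
    """Count how many times a path (list of oriented steps like "7301+") visits marker_node."""
    return sum(1 for step in steps if step[:-1] == marker_node)

# Hypothetical path through a collapsed tandem repeat: 7300 -> (7301 -> 7302) x 3 -> 7303.
toy_steps = "7300+,7301+,7302+,7301+,7302+,7301+,7302+,7303+".split(",")
print(count_visits(toy_steps, "7301"))  # 3 passes through the repeated node
```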

### Quiz

Run the code below to take the quiz.

In [None]:
#Display quiz as html
#Instructions for creating quiz .json files and converting to html provided in the links below
from IPython.display import IFrame
IFrame('../html/quiz_cup1.html', width=800, height=400)

Your graphs of the CUP1 region with the paths highlighted should look similar to those below, although the layout might differ slightly. Take a look at the video for more information about how to follow the three paths through the CUP1 region.

Blue = CUP1 genes  
Green = YHR054C genes


SK1 path = light blue

<figure>
  <img
    src="./Figures/SK1.png"
    alt="SK1 CUP1 structure" />
  <figcaption>SK1 CUP1 Path</figcaption>
</figure>


Y12 path = yellow

<figure>
  <img
    src="./Figures/Y12.png"
    alt="Y12 CUP1 structure" />
  <figcaption>Y12 CUP1 Path</figcaption>
</figure>


S288C path = pink

<figure>
  <img
    src="./Figures/S288C.png"
    alt="S288C CUP1 structure" />
  <figcaption>S288C CUP1 Path</figcaption>
</figure>


Here is the published figure that you saw in the previous chapter. The numbers of CUP1 and YHR054C genes we found in our graph match those in the figure (see the video for more details).

<figure>
  <img
    src="./Figures/StructuralRearrangements.png"
    alt="Yeast CUP1 structure" />
  <figcaption><a href="https://yjx1217.github.io/Yeast_PacBio_2016/welcome/">https://yjx1217.github.io/Yeast_PacBio_2016/welcome/</a></figcaption>
</figure>


<div class="alert alert-block alert-info"> <b>NOTE:</b> Some of the genes in the figure above are labelled as pseudogenes or partial genes, a distinction that is hard to see in our graphs.</div>

----------------------

## Conclusion
You have learned how to visualize a pangenomic graph, find genes using BLAST, and interact with the graph structures.

In the next submodule, you will learn how to index graphs to get them ready for downstream analyses.

----------------------

## Clean up

<div class="alert alert-warning">No cleanup is necessary for this submodule. Don't forget to shut down your Workbench when you are done working through this module!</div>