# Pangenomics
--------------------------------------------

# Visualizing Graphs with Bandage

## Overview

Bandage can be used to visualize and interact with pangenomic graphs. You will learn how to visualize graphs with Bandage and how to BLAST DNA sequences and visualize them on the graph.

## Learning Objectives
+ Learn how to visualize pangenomic graphs using Bandage
+ Learn how to BLAST sequences against the graph and visualize them within Bandage

## Get Started

### Visualizing Genomes

Pangenomic graphs can be very complex, often looking like "hairballs". Bandage allows us to zoom in on interesting regions to visualize and interact with them.

![Compressed de Bruijn graphs “hairballs”: https://academic.oup.com/bioinformatics/article/30/24/3476/2422268](./Figures/hairballsB.png)


### Bandage

Bandage is an interactive graph visualizer. The Bandage acronym stands for:
a **B**ioinformatics **A**pplication for **N**avigating **D**e novo **A**ssembly **G**raphs **E**asily

![Bandage: https://rrwick.github.io/Bandage/](./Figures/Bandage.png)

In addition to visualizing graphs, Bandage has BLAST integration that allows you to query the graph with DNA sequences.

+ BLAST integration
  + Can build a local BLAST database of the graph
  + Can do a web BLAST search with sequences from nodes

Bandage also allows you to label and color nodes by uploading comma-separated values (CSV) text files with metadata.

Details on creating CSV files can be found [here](https://github.com/rrwick/Bandage/wiki/CSV-labels).
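As a minimal sketch of what such a file could look like, the Python below writes a small label CSV. It assumes the layout described on the wiki page linked above: a header row, node names in the first column, and an optional column titled "Colour" that Bandage uses for node colors. The node names, the "Gene" column, and the helper `write_label_csv` are hypothetical and only for illustration; check the wiki page for the exact rules.

```python
import csv
import io

# Hypothetical node names and metadata; replace with node names from your own graph.
rows = [
    {"Name": "1", "Gene": "CUP1", "Colour": "blue"},
    {"Name": "2", "Gene": "YHR054C", "Colour": "green"},
]


def write_label_csv(rows, handle):
    """Write a header row followed by one row per node (hypothetical helper)."""
    writer = csv.DictWriter(
        handle, fieldnames=["Name", "Gene", "Colour"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer so the example runs without touching the disk.
buffer = io.StringIO()
write_label_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)

# To save a file that can be loaded into Bandage instead:
# with open("labels.csv", "w", newline="") as handle:
#     write_label_csv(rows, handle)
```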


Open up Bandage in your browser:

- From the Launcher tab (File : New Launcher), scroll down to the "Visualization Software" section and click on "Bandage".

Once in Bandage, choose "File : Load Graph". Navigate to "yprp.chrVIII.pggb.gfa" (click on "Computer" then "/" then navigate to /home/jupyter/NIGMS-Sandbox-Pangenomics-Module/module_notebooks/).

Click on "Draw Graph". Note: it will draw the graph differently every time.

Under "Graph Information", choose "More Info" to see more stats about the graph.
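If you want to cross-check a few of these numbers outside of Bandage, a GFA file is plain tab-separated text, so basic counts are easy to compute. The sketch below is not how Bandage computes its statistics; it simply counts segment (`S`), link (`L`), and path (`P`) records and sums the segment lengths. The toy graph and the function name `gfa_summary` are hypothetical. For the real graph, you would read the text of "yprp.chrVIII.pggb.gfa" instead.

```python
from collections import Counter

# A tiny hypothetical GFA 1 graph so that the example runs on its own.
TOY_GFA = "\n".join([
    "H\tVN:Z:1.0",
    "S\t1\tACGT",
    "S\t2\tGG",
    "S\t3\tTTA",
    "L\t1\t+\t2\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "L\t1\t+\t3\t+\t0M",
    "P\tsampleA\t1+,2+,3+\t*",
    "P\tsampleB\t1+,3+\t*",
])


def gfa_summary(gfa_text):
    """Count nodes, edges, and paths, and sum the node sequence lengths."""
    counts = Counter()
    total_length = 0
    for line in gfa_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        counts[fields[0]] += 1  # first field is the record type (H, S, L, P, ...)
        if fields[0] == "S" and fields[2] != "*":
            total_length += len(fields[2])  # "*" means the sequence is not stored
    return {
        "nodes": counts["S"],
        "edges": counts["L"],
        "paths": counts["P"],
        "total_length": total_length,
    }


summary = gfa_summary(TOY_GFA)
print(summary)

# For the real graph (path as given in the instructions above):
# with open("yprp.chrVIII.pggb.gfa") as handle:
#     print(gfa_summary(handle.read()))
```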

### BLAST graph

Open up "Graph Search" and then click on "Create/view graph search".

Click on "Build BLAST database" to build a database from the graph.

Click on "Load from FASTA file" and choose the "genes.fa" file.

Click "Run BLAST search".

Take a look at the table down below and see the hits that both of the genes have and the colors each has been assigned.

Hit "close".

Under "Graph Drawing", change scope to "Around Query Hits" and in the "Distance" box, type in 100. This will show the CUP1 and YHR054C gene hits plus the surrounding 100 nodes.

Under "Graph Display", change "Random Colours" to "Gray Color" so that the gene hits will stand out.

Scroll down on the left-hand side and click on "Annotations". Double-click on the "BLAST hits" entry that appears under "Annotations". Click on "Solid" and then click on the "x" in the upper right to close the window. This will show the BLAST hits in solid colors.

NOTE: If you don't see both the blue (CUP1) and green (YHR054C) genes, choose "Query: All" under "Graph Search".

<details>
<summary>Graph of the CUP1 region.</summary>
<br>
The graph layout algorithm does not lay out the graph the same way each time, so yours might look slightly different.  
    
Blue = CUP1 genes  
Green = YHR054C genes

![CUP1 region](./Figures/CUP1region.png)
</details>

### Paths

Look at the paths through the graph (i.e., the path that each accession's genome assembly takes through the graph). On the right-hand side under "Find paths", start typing "Y12_chrVIII" in the name box (it should pop up as you type, and you can then select it).

Click on "Find Path" and dismiss the "Nodes not found" dialog that pops up ("x" in the upper right of the dialog box). The nodes not found are the nodes in the Y12 genome assembly path that are not in our view.
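The "Nodes not found" message can be thought of as a set difference: the nodes on the path minus the nodes currently drawn. The sketch below illustrates this idea on a toy graph. It assumes that paths are stored as GFA 1 `P` records, which is an assumption about the graph file rather than something stated here. The path name `toy_path`, the `visible` set, and the function `path_nodes` are hypothetical.

```python
# A tiny hypothetical GFA 1 graph with a single path.
TOY_GFA = "\n".join([
    "S\t1\tACGT",
    "S\t2\tGG",
    "S\t3\tTTA",
    "S\t4\tC",
    "P\ttoy_path\t1+,2+,3-,4+\t*",
])


def path_nodes(gfa_text, path_name):
    """Return the node names a path visits, in order, without the +/- orientation."""
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "P" and fields[1] == path_name:
            # Each step looks like "12+" or "12-"; drop the trailing orientation sign.
            return [step[:-1] for step in fields[2].split(",")]
    raise KeyError(f"path {path_name!r} not found")


nodes = path_nodes(TOY_GFA, "toy_path")

# Hypothetical set of nodes that are currently drawn on screen.
visible = {"2", "3"}

# Path nodes outside the current view, analogous to the "Nodes not found" list.
not_found = [name for name in nodes if name not in visible]
print(nodes)
print(not_found)
```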

Click on "recolor" and then "set colour" and choose a color. This will color any nodes this genome goes through, though the coloring will be underneath the BLAST hit colors. It will also change the color under "Graph Display" to "Custom Colours."

NOTE: You can toggle the BLAST hit colors on and off by double-clicking on "BLAST hits" under "Annotations". With the BLAST hits off, you can see the paths more clearly; with the BLAST hits on, you can see the genes in relation to the paths.

NOTE: You can drag the nodes around to further explore the graph.

When you are done, set the color under "Graph Display" to "Gray Color" to remove the path.

Now identify the other two paths (SK1_chrVIII and S288C_chrVIII) and count how many CUP1 genes there are on each path.
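One way to reason about copy number in a graph: when a repeated region is collapsed into a loop, a genome with more copies traverses the same node more times. The sketch below counts how often each path visits a chosen node in a toy graph. This is only an illustration of the idea, under the assumptions that paths are stored as GFA 1 `P` records and that the repeat is collapsed into a single node. The toy paths, the node name, and the function `traversal_counts` are hypothetical, and the output says nothing about the real CUP1 counts.

```python
# A tiny hypothetical graph in which node "2" sits in a self-loop (a collapsed repeat).
TOY_GFA = "\n".join([
    "S\t1\tACGT",
    "S\t2\tGGC",
    "S\t3\tTTA",
    "L\t1\t+\t2\t+\t0M",
    "L\t2\t+\t2\t+\t0M",
    "L\t2\t+\t3\t+\t0M",
    "P\ttoy_A\t1+,2+,3+\t*",
    "P\ttoy_B\t1+,2+,2+,2+,3+\t*",
])


def traversal_counts(gfa_text, node_name):
    """For every path, count how many times it visits the given node."""
    counts = {}
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "P":
            continue  # only path records are of interest here
        # Strip the trailing +/- orientation from each step.
        steps = [step[:-1] for step in fields[2].split(",")]
        counts[fields[1]] = steps.count(node_name)
    return counts


copies = traversal_counts(TOY_GFA, "2")
print(copies)
```

In a real graph a gene usually spans several nodes, so counting by eye in Bandage, as described above, remains the more reliable approach for this exercise.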

```python
# Load the jupyterquiz library
from jupyterquiz import display_quiz
```

```python
# Display the quiz as HTML
# Instructions for creating quiz .json files and converting to html provided in the links below
from IPython.display import IFrame
IFrame('../html/quiz_cup1.html', width=800, height=400)
```

<details>
<summary>Graphs of the CUP1 region with the paths highlighted.</summary>
<br>
The graph layout algorithm does not lay out the graph the same way each time, so yours might look slightly different.  
    
Blue = CUP1 genes  
Green = YHR054C genes

Y12 = yellow

![Y12](./Figures/Y12cup1.png)


SK1 = light blue

![SK1](./Figures/SK1cup1.png)


S288C = pink

![S288C](./Figures/S288Ccup1.png)

</details>

Below is the published figure that you saw in the previous chapter. The numbers of CUP1 and YHR054C genes we found in our graph match those in the figure. Note that the figure labels some of the genes as pseudogenes or partial genes, a distinction that is hard to make from our graphs.

<details>
<summary>Figure showing the CUP1 region across several yeast accessions.</summary>
<br>

![Yeast Genomes: https://yjx1217.github.io/Yeast_PacBio_2016/welcome/](./Figures/YeastB.png)

</details>



## Conclusion
You have learned how to visualize a pangenomic graph, find genes using BLAST, and interact with the graph structures.

## Clean up
No cleanup is necessary for this submodule. Don't forget to shut down your Workbench when you are done working through this module!