# Pangenomics
--------------------------------------------

# Searching Graphs with BLAST


## Overview
Here you will learn how to search a pangenome graph with BLAST. Starting from a DNA sequence, such as your favorite gene, you can find where it occurs in the graph, discover the structure of the graph around it, and explore homologous sequences.

## Learning Objectives
+ Learn how to use BLAST to search a pangenome graph

## Get Started

### Get the CUP1 and YHR054C gene sequences

We will BLAST the CUP1 (YHR053C) and YHR054C gene sequences against a linearized version of the graph.

First, get the gene sequences. There are multiple copies of each gene, but we'll grab the first instance and use it to identify all copies through BLAST alignment.

CUP1  
S288C_chrVIII:213043-213228

YHR054C  
S288C_chrVIII:213693-214757

Both are on the "-" strand.

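Use `samtools faidx` to extract both regions. Because the genes are on the "-" strand, reverse-complement the extracted sequences.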

The parameters:

-i  reverse-complement  
input fasta  
region coordinates

In [None]:
!samtools faidx -i yprp.chrVIII.fa S288C_chrVIII:213043-213228 > genes.bed

!samtools faidx -i yprp.chrVIII.fa S288C_chrVIII:213693-214757 >> genes.bed
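
To make the region syntax and the reverse-complement step concrete, here is a minimal Python sketch of the same idea. It is not how `samtools` is implemented. The sequence `toy_chr` and the helper names are made up for illustration, and the sketch assumes region coordinates are 1-based and inclusive.

```python
# Toy illustration of region extraction plus reverse-complement.
# The sequence and region below are made up, not taken from yprp.chrVIII.fa.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_region(region: str) -> tuple[str, int, int]:
    """Split 'name:start-end' into (name, start, end)."""
    name, _, span = region.rpartition(":")
    start, end = span.split("-")
    return name, int(start), int(end)


def fetch_minus_strand(sequences: dict[str, str], region: str) -> str:
    """Extract a 1-based, inclusive region and reverse-complement it."""
    name, start, end = parse_region(region)
    forward = sequences[name][start - 1:end]  # 0-based, half-open slice
    return reverse_complement(forward)


toy = {"toy_chr": "AACCGGTTAC"}
print(fetch_minus_strand(toy, "toy_chr:2-5"))  # prints CGGT
```

Bases 2 to 5 of the toy sequence are `ACCG`, and their reverse complement is `CGGT`.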

Take a look at the file you just made. Despite the `.bed` extension, it contains FASTA records.

In [None]:
!cat genes.bed

Let's rename the sequences so they have the gene names rather than coordinates. Use `sed`.

The parameters:

-i edit in place

In [None]:
!sed -i 's/S288C_chrVIII:213043-213228.rc/CUP1/' genes.bed

!sed -i 's/S288C_chrVIII:213693-214757.rc/YHR054C/' genes.bed
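
The same renaming can be sketched in Python, which makes the matching rule explicit. In the `sed` patterns above, `.` matches any single character, so it matches whatever separator precedes `rc` in the header. The sketch below assumes that separator is `/` (a header ending in `/rc`). The function name and the placeholder sequence `ACGT` are hypothetical.

```python
import re

# Mapping from coordinate-based names to gene names, taken from the text above.
RENAMES = {
    "S288C_chrVIII:213043-213228": "CUP1",
    "S288C_chrVIII:213693-214757": "YHR054C",
}


def rename_headers(fasta_text: str, renames: dict[str, str]) -> str:
    """Replace FASTA header names found in `renames`; leave other lines as-is."""
    out = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption: reverse-complemented records carry a trailing "/rc".
            key = re.sub(r"/rc$", "", line[1:].strip())
            line = ">" + renames.get(key, line[1:])
        out.append(line)
    return "\n".join(out) + "\n"


# Placeholder record; ACGT is not the real gene sequence.
example = ">S288C_chrVIII:213043-213228/rc\nACGT\n"
print(rename_headers(example, RENAMES), end="")
```

Headers that are not in the mapping pass through unchanged, so the function is safe to run on a file with other records.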

Take a look at it again.

In [None]:
!cat genes.bed

### BLAST the graph manually

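Create a FASTA file containing the graph sequence. `gfatools gfa2fa` writes the sequence of each graph segment (node) as a FASTA record, so each BLAST hit can later be traced back to a node.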

In [None]:
!gfatools gfa2fa yprp.chrVIII.pggb.gfa > yprp.chrVIII.pggb.fa
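
To show what "linearizing" the graph means here, the following is a rough Python sketch of the conversion: each segment (`S`) line of the GFA becomes one FASTA record named after the segment. It illustrates the idea only and is not the `gfatools` implementation. The function name and the toy GFA are made up.

```python
def gfa_segments_to_fasta(gfa_text: str) -> str:
    """Turn every GFA segment line into a FASTA record."""
    records = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        # Segment lines look like: S <name> <sequence> [optional tags]
        if len(fields) >= 3 and fields[0] == "S":
            records.append(f">{fields[1]}\n{fields[2]}\n")
    return "".join(records)


# Toy graph: a header, two segments, and one link between them.
toy_gfa = "H\tVN:Z:1.0\nS\t1\tACGT\nS\t2\tGG\nL\t1\t+\t2\t+\t0M\n"
print(gfa_segments_to_fasta(toy_gfa), end="")
```

Link (`L`) and path lines are ignored, which is why the resulting FASTA carries the node sequences but none of the graph topology.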

Build a BLAST database for the FASTA using `makeblastdb`.

The parameters:

-in fasta_file_from_graph  the file to build a database for  
-input_type fasta  the format of the input file (fasta)  
-dbtype nucl  type of sequence (nucl=DNA)

In [None]:
!makeblastdb -in yprp.chrVIII.pggb.fa -input_type fasta -dbtype nucl

   
Query the database for [CUP1](https://www.yeastgenome.org/locus/S000001095) and [YHR054C](https://www.yeastgenome.org/locus/S000001096) with `blastn`, using the `genes.bed` file created above as the query:
```
blastn -db yprp.chrVIII.pggb.fa -query genes.bed
```
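
The `blastn` command above prints its default pairwise report. If you instead rerun it with `-outfmt 6` (tabular output, an assumption not used in the original command), the node IDs of the hits are easy to collect with a few lines of Python. The sketch below uses made-up rows rather than real results, and `hits_by_query` is a hypothetical helper.

```python
import csv
import io

# Column order of BLAST's default tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def hits_by_query(tabular_text: str) -> dict[str, list[str]]:
    """Map each query name to the subject (node) IDs it hit, in order seen."""
    hits: dict[str, list[str]] = {}
    reader = csv.DictReader(io.StringIO(tabular_text),
                            fieldnames=COLUMNS, delimiter="\t")
    for row in reader:
        nodes = hits.setdefault(row["qseqid"], [])
        if row["sseqid"] not in nodes:
            nodes.append(row["sseqid"])
    return hits


# Made-up rows for illustration only; these are not real BLAST results.
toy = (
    "geneA\t12\t100.000\t50\t0\t0\t1\t50\t1\t50\t1e-20\t93.5\n"
    "geneA\t15\t98.000\t50\t1\t0\t1\t50\t1\t50\t1e-18\t87.9\n"
    "geneB\t12\t100.000\t30\t0\t0\t1\t30\t60\t89\t1e-10\t56.5\n"
)
print(hits_by_query(toy))
```

Because the database was built from one FASTA record per graph node, the `sseqid` column holds node IDs, which is what you need to locate the hits in a graph viewer.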

View your Chromosome VIII chunk graph with Bandage (exercise):

1. Find the CUP1 and YHR054C BLAST hits by node ID
2. Take a screenshot



## Conclusion

You learned how to BLAST against a pangenomic graph. Specifically, you searched for the CUP1 and YHR054C genes in the graph.

## Clean up
No cleanup is necessary for this submodule. Don't forget to shut down your Workbench when you are done working through this module!