# Multiple Alignment - Real-World Example

Extending our conversation from Canvas - recall that we used Nanopore to sequence the SARS-CoV-2 virus from several patients with COVID-19. 
In some of those samples, we found a variant at position 26,305 that looked like a frame-shift deletion. 
Because a frame-shift deletion has such drastic consequences, this variant seemed unlikely to be real, so we sent some of the samples carrying it to another lab for Sanger sequencing.

In an experiment like this, we are tasked with ultimately comparing multiple sequences to figure out where there are differences. 
On a small scale, this probably seems pretty simple. 
But in most cases, you are working with long sequences (100s to 1000s of nucleotides) where there are multiple possible differences. 

## Solution: Multiple Sequence Alignment

This is a very large and complex area. 
For this activity, we will focus on this very basic case in which we have a reference sequence (the "baseline" SARS-CoV-2 genome), the Nanopore sequence, and the Sanger sequence. 
To make things simpler, we will work with a region of 101 nucleotides from 26,250 to 26,350.
Here's where I get that information: https://www.ncbi.nlm.nih.gov/nuccore/NC_045512.2?report=genbank&log$=seqview&from=26250&to=26350

- The reference sequence is: "CTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTG"
- The Nanopore sequence for one sample was: "CTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTG"
- The Sanger sequence for the same sample was: "CTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTTTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTG"
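Before aligning anything, a quick sanity check can be useful. The sketch below uses plain Python strings (no Biopython yet) to print the length of each sequence and to look up the reference base at the variant position. The names `REGION_START`, `VARIANT_POSITION`, and `variant_index` are my own, chosen for illustration; the coordinates come from the text above.

```python
# Sequences copied from the list above, as plain Python strings.
reference = "CTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTG"
nanopore = "CTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTG"
sanger = "CTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTTTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTG"

REGION_START = 26250      # first genome coordinate of our region
VARIANT_POSITION = 26305  # genome coordinate of the suspected variant

# Convert the genome coordinate into a 0-based index into the region.
variant_index = VARIANT_POSITION - REGION_START

for name, seq in [("reference", reference), ("nanopore", nanopore), ("sanger", sanger)]:
    print(f"{name:>9}: {len(seq)} nt")

# Show the reference base at the variant position, plus a little context.
print("Reference base at variant:", reference[variant_index])
print("Reference context:", reference[variant_index - 5 : variant_index + 6])
```

If the lengths differ between sequences, that is already a hint that an insertion or deletion is involved, which is exactly what the alignment will help us pin down.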

We will use a Python library called Biopython, specifically its pairwise sequence alignment tools, to compare the Nanopore and Sanger sequences to the reference.
This should help us find out if the variant identified in Nanopore was correct, or if the Sanger results indicate something different. 

<span style="color: blue; background-color: white">**TASK**: Prepare your environment</span>

You need to make two imports for this activity. Copy/paste the following code into the next cell and run it. 

```python
from Bio import pairwise2
from Bio.Seq import Seq
```

## Create Seq Objects

To work with sequence data, biopython needs to convert your raw sequence data into "Seq" objects. 
This makes it so that the pairwise alignment method can properly access and manipulate the sequences. 

The syntax is this: 

```python
sequence_variable = Seq("Your Sequence")
```

For example, if you had a unicorn sequence "ACTGTG", you might use the following code:

```python
unicorn_reference = Seq("ACTGTG")
```

<span style="color: blue; background-color: white">**TASK**: Make Seq objects</span>

Use these sequences from the first cell: 
- The reference sequence is: "CTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTCTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTG"
- The Nanopore sequence for one sample was: "CTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTG"
- The Sanger sequence for the same sample was: "CTCATTCGTTTCGGAAGAGACAGGTACGTTAATAGTTAATAGCGTACTTCTTTTTTTTGCTTTCGTGGTATTCTTGCTAGTTACACTAGCCATCCTTACTG"

Store them as Seq variables called:

- reference
- nanopore
- sanger

## Run pairwise alignment

We are going to use a function called "globalms".

This function performs a global alignment, meaning the two sequences are aligned along their full length. Its parameters control how the program scores matches, mismatches, and gaps (i.e., places where a nucleotide appears to be deleted or added relative to the comparator).
You don't need to know how to tune those parameters, just use the commands that I give you. 
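To give a feel for what those scoring parameters do, here is a minimal pure-Python sketch that scores an alignment that has already been written out with "-" for gaps. It is not Biopython's implementation; `score_alignment` is a hypothetical helper, and it assumes that the first character of a gap costs the gap-open penalty and every further character costs the gap-extend penalty (which, as far as I know, matches the default behavior of `globalms`).

```python
def score_alignment(aligned_a, aligned_b, match, mismatch, gap_open, gap_extend):
    """Score two already-aligned sequences of equal length ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("Aligned sequences must have the same length")

    score = 0.0
    in_gap_a = False  # True while we are inside a run of '-' in aligned_a
    in_gap_b = False  # True while we are inside a run of '-' in aligned_b
    for base_a, base_b in zip(aligned_a, aligned_b):
        if base_a == "-" and base_b == "-":
            raise ValueError("A column cannot be a gap in both sequences")
        if base_a == "-":
            score += gap_extend if in_gap_a else gap_open
            in_gap_a, in_gap_b = True, False
        elif base_b == "-":
            score += gap_extend if in_gap_b else gap_open
            in_gap_a, in_gap_b = False, True
        else:
            score += match if base_a == base_b else mismatch
            in_gap_a = in_gap_b = False
    return score


# Toy unicorn example with the same parameters used in this activity.
with_gap = score_alignment("ACTGTG", "AC-GTG", 5, -4, -3, -0.1)
with_mismatch = score_alignment("ACTGTG", "ACAGTG", 5, -4, -3, -0.1)
print("One deleted base:", with_gap)
print("One substituted base:", with_mismatch)
```

The aligner's job is to try different ways of placing gaps and keep the arrangement(s) with the highest total score, so these numbers decide whether a difference is reported as a gap or as a mismatch.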

<span style="color: blue; background-color: white">**TASK**: Run pairwise alignment</span>

Run pairwise global alignment for sanger vs. reference and nanopore vs. reference with the following code: 

```python
# The four numbers are: match score, mismatch score, gap-open penalty, gap-extend penalty.
nanopore_vs_reference = pairwise2.align.globalms(reference, nanopore, 5, -4, -3, -0.1)
sanger_vs_reference = pairwise2.align.globalms(reference, sanger, 5, -4, -3, -0.1)
```

## View Nanopore alignment output

Now, let's iterate over the results (there may be more than one alignment with the same best score).
Biopython has a function that displays each result in an intuitive way.

<span style="color: blue; background-color: white">**TASK**: View Nanopore results</span>

Use the following code: 

```python
for alignment in nanopore_vs_reference:
    print(pairwise2.format_alignment(*alignment))
```
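To help read that output: the printed alignment puts a line of symbols between the two sequences. As I understand the `pairwise2` format, "|" marks a match, "." marks a mismatch, and a blank marks a gap. The sketch below imitates that middle line in plain Python; `match_line` is a hypothetical helper written for illustration, not a Biopython function.

```python
def match_line(aligned_a, aligned_b):
    """Build a symbol line for two aligned sequences ('-' marks a gap)."""
    symbols = []
    for base_a, base_b in zip(aligned_a, aligned_b):
        if base_a == "-" or base_b == "-":
            symbols.append(" ")  # gap in one of the sequences
        elif base_a == base_b:
            symbols.append("|")  # identical bases
        else:
            symbols.append(".")  # different bases
    return "".join(symbols)


# Toy unicorn example: one base is missing from the second sequence.
top = "ACTGTG"
bottom = "AC-GTG"
print(top)
print(match_line(top, bottom))
print(bottom)
```

When you scan your own output, the blanks and dots in the middle line are the places worth a closer look.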

## View Sanger alignment output

Now, let's iterate over the Sanger results in the same way (again, there may be more than one alignment with the same best score).
Biopython's formatting function works exactly as it did for the Nanopore results.

<span style="color: blue; background-color: white">**TASK**: View Sanger results</span>

Use the following code: 

```python
for alignment in sanger_vs_reference:
    print(pairwise2.format_alignment(*alignment))
```

## Observe

What's different about the two alignments?
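If the differences are hard to spot by eye, a small helper can list them. This is a minimal sketch in plain Python that takes two aligned strings (with "-" for gaps) and reports each differing column as a substitution, insertion, or deletion relative to the reference. `list_differences` and its `ref_start` parameter are hypothetical names of my own. I believe each `pairwise2` result exposes its aligned strings as `alignment.seqA` and `alignment.seqB`, but treat that as an assumption and check it in your own environment.

```python
def list_differences(aligned_ref, aligned_query, ref_start=1):
    """List (reference position, kind, reference base, query base) per difference.

    ref_start is the coordinate of the first reference base. For an insertion,
    the reported position is the last reference base before the inserted one.
    """
    differences = []
    ref_pos = ref_start - 1  # coordinate of the most recent reference base seen
    for ref_base, query_base in zip(aligned_ref, aligned_query):
        if ref_base != "-":
            ref_pos += 1
        if ref_base == query_base:
            continue
        if query_base == "-":
            kind = "deletion"
        elif ref_base == "-":
            kind = "insertion"
        else:
            kind = "substitution"
        differences.append((ref_pos, kind, ref_base, query_base))
    return differences


# Toy unicorn examples, pretending the region starts at coordinate 101.
print(list_differences("ACTGTG", "AC-GTG", ref_start=101))
print(list_differences("ACTGTG", "ACAGTG", ref_start=101))
```

For this activity you would pass `ref_start=26250`, so that the reported positions line up with the genome coordinates used above.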