# Day 9 notebook

The objective of this notebook is to practice running, by hand, the dynamic programming algorithms for:

* global alignment with linear gap penalty
* local alignment with linear gap penalty
* global alignment with affine gap penalty

## Sequences to align

In this activity, you will align the same pair of sequences three times, each time with a different alignment algorithm. The two sequences are `CAATATG` and `CATA`.

You may find the included [worksheet](day09_activity_worksheet.pdf) useful for running the dynamic programming algorithms.

### PROBLEM 1: Global alignment with linear gap penalty (3 POINTS)

Align the sequences by hand using the Needleman–Wunsch algorithm (global alignment with linear gap penalty).  Use the following scoring scheme:
* Match: +1
* Mismatch: -1
* Space: -2
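To check a hand-filled Needleman–Wunsch matrix, a minimal sketch of the fill step is given here. It assumes that rows are indexed by `CAATATG` and columns by `CATA`, so the matrix is 8 × 5. The function name `nw_matrix` is hypothetical and not part of the notebook or any library.

```python
def nw_matrix(x, y, match=1, mismatch=-1, space=-2):
    """Fill the Needleman-Wunsch matrix F, with rows indexed by x and columns by y."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    # First column and first row: a prefix aligned entirely against spaces.
    for i in range(1, n + 1):
        F[i][0] = i * space
    for j in range(1, m + 1):
        F[0][j] = j * space
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,   # x_i aligned with y_j
                          F[i - 1][j] + space,   # x_i aligned with a space
                          F[i][j - 1] + space)   # y_j aligned with a space
    return F

F = nw_matrix("CAATATG", "CATA")
print(F[-1])      # last row of the matrix
print(F[-1][-1])  # optimal global alignment score (bottom-right cell)
```

Comparing your worksheet row by row against `F` is a quick way to locate the first cell where a hand computation went wrong.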

To submit your solution, make the following variable assignments in the solution cell below:

* assign to the variable `global_linear_opt_score` the optimal alignment *score*
* assign to the variable `global_linear_opt_alignments` a *list* of *all* alignments that achieve that optimal score
* assign to the variable `global_linear_last_row` a *list* of the entries in the last row of the dynamic programming matrix. Rows are indexed by `CAATATG` and columns by `CATA`, so the last row has five entries, one for each prefix of `CATA` (including the empty prefix).

Each alignment should be represented as a list of two strings of equal length, with `-` marking a space. The first string holds the first sequence, `CAATATG`, and the second holds `CATA`. For example, here is a list of three valid, but non-optimal, alignments:

In [25]:
# example of a list of alignments
[["CAATATG",
  "CATA---"],
 ["CAATATG",
  "--C-ATA"],
 ["CA-ATATG",
  "CATA----"]]


To make sure your alignments are formatted correctly, the functions below check whether an alignment is valid.

In [26]:
def check_global_alignment(alignment, x, y):
    """Checks if alignment is a valid global alignment of strings x and y"""
    check_alignment_format(alignment)
    assert alignment[0].replace('-', '') == x, "The first alignment string does not match x"
    assert alignment[1].replace('-', '') == y, "The second alignment string does not match y"
    
def check_local_alignment(alignment, x, y):
    """Checks if alignment is a valid local alignment of strings x and y"""
    check_alignment_format(alignment)
    assert alignment[0].replace('-', '') in x, "The first alignment string is not a substring of x"
    assert alignment[1].replace('-', '') in y, "The second alignment string is not a substring of y"
    
def check_alignment_format(alignment):
    """Checks if alignment is in the correct format"""
    assert isinstance(alignment, list), "Alignment is not a list"
    assert len(alignment) == 2, "Alignment does not have two elements"
    assert all(isinstance(s, str) for s in alignment), "Elements of alignment are not strings"
    assert len(alignment[0]) == len(alignment[1]), "Alignment strings do not have the same length"
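The checks above only confirm that an alignment is well-formed, not that it is optimal. For sequences this short, one way to confirm that your list of optimal alignments is complete is brute force: enumerate every global alignment, score each under the linear gap scheme, and keep the best. This is a sketch, not the dynamic programming algorithm; it takes time exponential in the sequence lengths, but `CAATATG` and `CATA` have only 2,241 global alignments. The helper names `score_linear`, `all_alignments`, and `optimal_alignments` are hypothetical.

```python
def score_linear(alignment, match=1, mismatch=-1, space=-2):
    """Score a [row_x, row_y] alignment column by column with a linear gap penalty."""
    total = 0
    for a, b in zip(alignment[0], alignment[1]):
        if a == "-" or b == "-":
            total += space
        elif a == b:
            total += match
        else:
            total += mismatch
    return total

def all_alignments(x, y):
    """Yield every global alignment of x and y as a [row_x, row_y] list."""
    if not x and not y:
        yield ["", ""]
        return
    if x and y:  # first column pairs x[0] with y[0]
        for ax, ay in all_alignments(x[1:], y[1:]):
            yield [x[0] + ax, y[0] + ay]
    if x:        # first column pairs x[0] with a space
        for ax, ay in all_alignments(x[1:], y):
            yield [x[0] + ax, "-" + ay]
    if y:        # first column pairs a space with y[0]
        for ax, ay in all_alignments(x, y[1:]):
            yield ["-" + ax, y[0] + ay]

def optimal_alignments(x, y, score=score_linear):
    """Return the best score and every alignment achieving it (brute force)."""
    alignments = list(all_alignments(x, y))
    best = max(score(a) for a in alignments)
    return best, [a for a in alignments if score(a) == best]

best, winners = optimal_alignments("CAATATG", "CATA")
print(best, winners)
```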

In [60]:
###
global_linear_opt_score = -2
###
###
global_linear_opt_alignments = [['CAATATG', 'C-ATA--'], ['CAATATG', 'CA-TA--']]
###
###
global_linear_last_row = [-14, -11, -8, -5, -2]
###


In [61]:
# tests for global_linear_opt_score
assert isinstance(global_linear_opt_score, int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [62]:
# test for global_linear_opt_alignments
assert isinstance(global_linear_opt_alignments, list)
for alignment in global_linear_opt_alignments: check_global_alignment(alignment, "CAATATG", "CATA")
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [63]:
# test for global_linear_last_row_entry_0
assert isinstance(global_linear_last_row[0], int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [64]:
# test for global_linear_last_row_entry_1
assert isinstance(global_linear_last_row[1], int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [65]:
# test for global_linear_last_row_entry_2
assert isinstance(global_linear_last_row[2], int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [66]:
# test for global_linear_last_row_entry_3
assert isinstance(global_linear_last_row[3], int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


### PROBLEM 2: Local alignment with linear gap penalty (3 POINTS)

Align the sequences by hand using the Smith–Waterman algorithm (local alignment with linear gap penalty).  Use the following scoring scheme:
* Match: +1
* Mismatch: -1
* Space: -2
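The Smith–Waterman fill differs from Needleman–Wunsch in two places: every cell is floored at 0, and the optimal local score is the largest entry anywhere in the matrix rather than the bottom-right one. A minimal sketch, again assuming rows indexed by `CAATATG` and columns by `CATA`, is shown here. The function name `smith_waterman` is hypothetical, and the traceback returns only *one* optimal local alignment (from the first maximal cell found), so ties still need to be checked by hand.

```python
def smith_waterman(x, y, match=1, mismatch=-1, space=-2):
    """Fill the Smith-Waterman matrix H (rows indexed by x, columns by y) and
    trace back one optimal local alignment from a highest-scoring cell."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_i, best_j = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(0,                      # start a new local alignment
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + space,
                          H[i][j - 1] + space)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until an entry of 0 is reached.
    i, j = best_i, best_j
    row_x, row_y = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if x[i - 1] == y[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:      # diagonal: x_i aligned with y_j
            row_x.append(x[i - 1])
            row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + space:    # up: x_i aligned with a space
            row_x.append(x[i - 1])
            row_y.append("-")
            i -= 1
        else:                                   # left: y_j aligned with a space
            row_x.append("-")
            row_y.append(y[j - 1])
            j -= 1
    alignment = ["".join(reversed(row_x)), "".join(reversed(row_y))]
    return H, best, alignment

H, best, alignment = smith_waterman("CAATATG", "CATA")
print(H[-1], best, alignment)
```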

To submit your solution, make the following variable assignments in the solution cell below:

* assign to the variable `local_linear_opt_score` the optimal alignment *score*
* assign to the variable `local_linear_opt_alignments` a *list* of *all* alignments that achieve that optimal score, each represented as a list of two equal-length strings that cover only the aligned portions of `CAATATG` and `CATA`
* assign to the variable `local_linear_last_row` a *list* of the entries in the last row of the dynamic programming matrix (rows indexed by `CAATATG`, columns by `CATA`, so five entries).

In [75]:
###
local_linear_opt_score = 3
###
###
local_linear_opt_alignments = [['ATA', 'ATA']]
###
###
local_linear_last_row = [0, 0, 0, 0, 1]
###


In [76]:
# tests for local_linear_opt_score
assert isinstance(local_linear_opt_score, int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [77]:
# test for local_linear_opt_alignments
assert isinstance(local_linear_opt_alignments, list)
for alignment in local_linear_opt_alignments: check_local_alignment(alignment, "CAATATG", "CATA")
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [78]:
# test for local_linear_last_row_entry_0
assert isinstance(local_linear_last_row[0], int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [79]:
# test for local_linear_last_row_entry_1
assert isinstance(local_linear_last_row[1], int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [80]:
# test for local_linear_last_row_entry_2
assert isinstance(local_linear_last_row[2], int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [81]:
# test for local_linear_last_row_entry_3
assert isinstance(local_linear_last_row[3], int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [82]:
# test for local_linear_last_row_entry_4
assert isinstance(local_linear_last_row[4], int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


### PROBLEM 3: Global alignment with affine gap penalty (3 POINTS)

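Align the sequences by hand using the global alignment algorithm with an affine gap penalty. Use the following scoring scheme:
* Match: +1
* Mismatch: -1
* Gap: -3 (charged once per gap, that is, once per maximal run of consecutive spaces in the same row)
* Space: -2 (charged for every space, so a gap of length $k$ scores $-3 + k(-2)$)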
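To check the score of a candidate alignment under this scheme, a small scorer can charge the gap penalty once when a run of spaces opens and the space penalty for every space. This sketch assumes that a run of spaces in the top row and a run in the bottom row are separate gaps even when adjacent; the function name `score_affine` is hypothetical.

```python
def score_affine(alignment, match=1, mismatch=-1, gap=-3, space=-2):
    """Score a [row_x, row_y] alignment, charging `gap` once per run of spaces
    in a row plus `space` for every space in the run."""
    total = 0
    prev = None  # row holding the space in the previous column: 0, 1, or None
    for a, b in zip(alignment[0], alignment[1]):
        if a == "-" or b == "-":
            cur = 0 if a == "-" else 1
            if cur != prev:  # a new gap opens in this row
                total += gap
            total += space
            prev = cur
        else:
            total += match if a == b else mismatch
            prev = None
    return total

print(score_affine(["CAATATG", "C---ATA"]))  # one gap of length 3
print(score_affine(["CAATATG", "C-ATA--"]))  # two gaps, of lengths 1 and 2
```

Comparing the two calls shows why affine scoring favors fewer, longer gaps: each extra gap pays the opening penalty again.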

To submit your solution, make the following variable assignments in the solution cell below:

* assign to the variable `global_affine_opt_score` the optimal alignment *score*
* assign to the variable `global_affine_opt_alignments` a *list* of *all* alignments that achieve that optimal score
* assign to the variable `global_affine_last_row` a *list* of the entries in the last row of the dynamic programming matrix (rows indexed by `CAATATG`, columns by `CATA`, so five entries, each a tuple as described below).

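For the last row, imagine that the three matrices $M$, $I_x$, and $I_y$ have been collapsed into a single matrix $C$ whose cells hold the corresponding entries of the three matrices as a tuple: $C[i, j] = (M[i,j], I_x[i,j], I_y[i,j])$. With $x$ = `CAATATG` and $y$ = `CATA`, $M[i,j]$ is the best score of an alignment of $x_1 \ldots x_i$ with $y_1 \ldots y_j$ that ends with $x_i$ aligned to $y_j$, $I_x[i,j]$ the best score of one that ends with $x_i$ aligned to a space, and $I_y[i,j]$ the best score of one that ends with $y_j$ aligned to a space.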
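A minimal sketch of the three-matrix fill is given here for checking hand-computed tuples. It assumes the common recurrences in which $I_x$ and $I_y$ are entered only from $M$ (no direct transitions between $I_x$ and $I_y$), with boundary values $I_x[i, 0] = -3 + i(-2)$ and $I_y[0, j] = -3 + j(-2)$. The notebook does not state its recurrences, so they may differ in detail from the ones used in lecture, although under these assumptions the sketch reproduces the last row in the solution cell below. The function name `affine_collapsed` is hypothetical.

```python
NEG_INF = float("-inf")

def affine_collapsed(x, y, match=1, mismatch=-1, gap=-3, space=-2):
    """Fill M, Ix, Iy for global alignment with an affine gap penalty and return
    the collapsed matrix C, where C[i][j] = (M[i][j], Ix[i][j], Iy[i][j])."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = gap + i * space  # x_1..x_i aligned against a single gap
    for j in range(1, m + 1):
        Iy[0][j] = gap + j * space  # y_1..y_j aligned against a single gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            # x_i aligned with y_j, after any of the three states
            M[i][j] = s + max(M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            # x_i aligned with a space: open a new gap from M or extend one
            Ix[i][j] = max(M[i - 1][j] + gap + space, Ix[i - 1][j] + space)
            # y_j aligned with a space: open a new gap from M or extend one
            Iy[i][j] = max(M[i][j - 1] + gap + space, Iy[i][j - 1] + space)
    return [[(M[i][j], Ix[i][j], Iy[i][j]) for j in range(m + 1)]
            for i in range(n + 1)]

C = affine_collapsed("CAATATG", "CATA")
print(C[-1])
print(max(C[-1][-1]))  # optimal score: best of M, Ix, Iy in the final cell
```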

In [85]:
# Negative infinity, for last-row entries that no alignment can reach (e.g., M[7, 0])
NEG_INF = float("-inf")

In [86]:
###
global_affine_opt_score = -7
###
###
global_affine_opt_alignments = [["CAATATG",
                                 "C---ATA"],
                                ["CAATATG",
                                 "CA---TA"]]
###
###
global_affine_last_row = [(NEG_INF, -17, NEG_INF),
                          (-16, -14, NEG_INF),
                          (-13, -11, -21),
                          (-10, -10, -18),
                          (-7, -8, -15)]
###


In [87]:
# tests for global_affine_opt_score
assert isinstance(global_affine_opt_score, int)
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [88]:
# test for global_affine_opt_alignments
assert isinstance(global_affine_opt_alignments, list)
for alignment in global_affine_opt_alignments: check_global_alignment(alignment, "CAATATG", "CATA")
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [89]:
# test for global_affine_last_row_entry_0
assert isinstance(global_affine_last_row[0], tuple)
assert len(global_affine_last_row[0]) == 3
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [90]:
# test for global_affine_last_row_entry_1
assert isinstance(global_affine_last_row[1], tuple)
assert len(global_affine_last_row[1]) == 3
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [91]:
# test for global_affine_last_row_entry_2
assert isinstance(global_affine_last_row[2], tuple)
assert len(global_affine_last_row[2]) == 3
###
### AUTOGRADER TEST - DO NOT REMOVE
###


In [92]:
# test for global_affine_last_row_entry_3
assert isinstance(global_affine_last_row[3], tuple)
assert len(global_affine_last_row[3]) == 3
###
### AUTOGRADER TEST - DO NOT REMOVE
###
