# Day 9 notebook

The objective of this notebook is to practice running, by hand, the dynamic programming algorithms for:

* global alignment with linear gap penalty
* local alignment with linear gap penalty
* global alignment with affine gap penalty

## Sequences to align

In this activity, you will align the same pair of sequences multiple times, but with different alignment algorithms.  The two sequences to align are: `CAATATG` and `CATA`.

You may find the included [worksheet](day09_activity_worksheet.pdf) useful for running the dynamic programming algorithms.

### PROBLEM 1: Global alignment with linear gap penalty (3 POINTS)

Align the sequences by hand using the Needleman–Wunsch algorithm (global alignment with linear gap penalty).  Use the following scoring scheme:
* Match: +1
* Mismatch: -1
* Space: -2
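
For reference while filling in the worksheet, here is the Needleman–Wunsch recurrence with this scheme substituted in. It assumes that rows are indexed by $x =$ `CAATATG` and columns by $y =$ `CATA`, and that $s(a,b)$ is $+1$ when $a = b$ and $-1$ otherwise:

$$
F[i,j] = \max\begin{cases}
F[i-1,j-1] + s(x_i, y_j)\\
F[i-1,j] - 2\\
F[i,j-1] - 2
\end{cases}
\qquad F[i,0] = -2i,\quad F[0,j] = -2j
$$

The optimal global score is the bottom-right entry $F[7,4]$.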

To submit your solution, do the following variable assignments in the solution cell below:

* assign to the variable `global_linear_opt_score` the optimal alignment *score* 
* assign to the variable `global_linear_opt_alignments` a *list* of *all* alignments that achieve that optimal score
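* assign to the variable `global_linear_last_row` a *list* representing the entries in the last row of the dynamic programming matrix. Rows of the matrix correspond to prefixes of `CAATATG` and columns to prefixes of `CATA`, so the last row has five entries.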
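
A hand-filled matrix can be checked mechanically. The following is a minimal plain-Python sketch of Needleman–Wunsch. It assumes rows index `CAATATG` and columns index `CATA`. The names `sub_score`, `nw_fill`, and `nw_all_optimal` are hypothetical helpers and are not part of the assignment. To collect *all* optimal alignments, the traceback follows every predecessor that could have produced a cell's value, not only the first one it finds.

```python
# Sketch: Needleman-Wunsch with a linear gap penalty (Problem 1 scoring).
# Rows index prefixes of x, columns index prefixes of y.
MATCH, MISMATCH, SPACE = 1, -1, -2


def sub_score(a, b):
    """Substitution score for aligning character a with character b."""
    return MATCH if a == b else MISMATCH


def nw_fill(x, y):
    """Return the (len(x)+1) x (len(y)+1) global-alignment matrix F."""
    n, m = len(x), len(y)
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * SPACE          # x[:i] aligned entirely to spaces
    for j in range(1, m + 1):
        F[0][j] = j * SPACE          # y[:j] aligned entirely to spaces
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub_score(x[i - 1], y[j - 1]),
                          F[i - 1][j] + SPACE,   # x[i-1] over a space
                          F[i][j - 1] + SPACE)   # a space over y[j-1]
    return F


def nw_all_optimal(x, y, F):
    """Follow every traceback path from F[n][m] back to F[0][0]."""
    found = []

    def walk(i, j, top, bottom):
        if i == 0 and j == 0:
            # Paths are built from the end backwards, so reverse them here.
            found.append([top[::-1], bottom[::-1]])
            return
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub_score(x[i - 1], y[j - 1]):
            walk(i - 1, j - 1, top + x[i - 1], bottom + y[j - 1])
        if i > 0 and F[i][j] == F[i - 1][j] + SPACE:
            walk(i - 1, j, top + x[i - 1], bottom + "-")
        if j > 0 and F[i][j] == F[i][j - 1] + SPACE:
            walk(i, j - 1, top + "-", bottom + y[j - 1])

    walk(len(x), len(y), "", "")
    return found


F = nw_fill("CAATATG", "CATA")
print(F[-1])                                          # last row
print(F[-1][-1])                                      # optimal score
print(sorted(nw_all_optimal("CAATATG", "CATA", F)))  # all optimal alignments
```

The printed values can be compared directly with the entries of a hand-filled worksheet.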

Each alignment should be represented by a list of two strings, with the first sequence, `CAATATG`, as the first string.  Here is an example of a list of (non-optimal) alignments:

In [1]:
# example of a list of alignments
[["CAATATG",
  "CATA---"],
 ["CAATATG",
  "--C-ATA"],
 ["CA-ATATG",
  "CATA----"]]

[['CAATATG', 'CATA---'], ['CAATATG', '--C-ATA'], ['CA-ATATG', 'CATA----']]
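
Before submitting, it can help to check that each hand-derived alignment is well formed and has the score you expect. The sketch below does both checks under the Problem 1 scheme. The helpers `score_linear` and `is_alignment_of` are hypothetical names introduced here for illustration.

```python
# Hypothetical helpers for checking alignments written as [top, bottom] lists.
def score_linear(alignment, match=1, mismatch=-1, space=-2):
    """Score one alignment with a linear gap penalty (Problem 1 defaults)."""
    top, bottom = alignment
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += space            # a character aligned to a space
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


def is_alignment_of(alignment, x, y):
    """True if deleting the spaces recovers x (top row) and y (bottom row)."""
    top, bottom = alignment
    return top.replace("-", "") == x and bottom.replace("-", "") == y


examples = [["CAATATG", "CATA---"], ["CAATATG", "--C-ATA"], ["CA-ATATG", "CATA----"]]
for a in examples:
    print(a, is_alignment_of(a, "CAATATG", "CATA"), score_linear(a))
```

Under this scheme the three example alignments score -6, -6, and -7.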

In [2]:
### BEGIN SOLUTION TEMPLATE=global_linear_opt_score = ?
global_linear_opt_score = -2
### END SOLUTION
### BEGIN SOLUTION TEMPLATE=global_linear_opt_alignments = ?
global_linear_opt_alignments = [["CAATATG",
                                 "C-ATA--"],
                                ["CAATATG",
                                 "CA-TA--"]]
### END SOLUTION
### BEGIN SOLUTION TEMPLATE=global_linear_last_row = ?
global_linear_last_row = [-14, -11, -8, -5, -2]
### END SOLUTION

In [3]:
# tests for global_linear_opt_score
assert isinstance(global_linear_opt_score, int)
### BEGIN HIDDEN TESTS
assert global_linear_opt_score == -2
### END HIDDEN TESTS

In [4]:
# test for global_linear_opt_alignments
assert isinstance(global_linear_opt_alignments, list)
### BEGIN HIDDEN TESTS
assert sorted(global_linear_opt_alignments) == [["CAATATG",
                                                 "C-ATA--"],
                                                ["CAATATG",
                                                 "CA-TA--"]]
### END HIDDEN TESTS

In [5]:
# test for global_linear_last_row_entry_0
assert isinstance(global_linear_last_row[0], int)
### BEGIN HIDDEN TESTS
assert global_linear_last_row[0] == -14
### END HIDDEN TESTS

In [6]:
# test for global_linear_last_row_entry_1
assert isinstance(global_linear_last_row[1], int)
### BEGIN HIDDEN TESTS
assert global_linear_last_row[1] == -11
### END HIDDEN TESTS

In [7]:
# test for global_linear_last_row_entry_2
assert isinstance(global_linear_last_row[2], int)
### BEGIN HIDDEN TESTS
assert global_linear_last_row[2] == -8
### END HIDDEN TESTS

In [8]:
# test for global_linear_last_row_entry_3
assert isinstance(global_linear_last_row[3], int)
### BEGIN HIDDEN TESTS
assert global_linear_last_row[3] == -5
### END HIDDEN TESTS

### PROBLEM 2: Local alignment with linear gap penalty (3 POINTS)

Align the sequences by hand using the Smith–Waterman algorithm (local alignment with linear gap penalty).  Use the following scoring scheme:
* Match: +1
* Mismatch: -1
* Space: -2
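
For reference, the Smith–Waterman recurrence with this scheme substituted in is shown below. As before, it assumes $x =$ `CAATATG` indexes the rows, $y =$ `CATA` indexes the columns, and $s(a,b)$ is $+1$ for a match and $-1$ for a mismatch:

$$
H[i,j] = \max\begin{cases}
0\\
H[i-1,j-1] + s(x_i, y_j)\\
H[i-1,j] - 2\\
H[i,j-1] - 2
\end{cases}
\qquad H[i,0] = H[0,j] = 0
$$

The optimal local score is the largest entry anywhere in $H$. The traceback starts from that cell and stops at the first entry equal to 0.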

To submit your solution, do the following variable assignments in the solution cell below:

* assign to the variable `local_linear_opt_score` the optimal alignment *score* 
* assign to the variable `local_linear_opt_alignments` a *list* of *all* alignments that achieve that optimal score
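* assign to the variable `local_linear_last_row` a *list* representing the entries in the last row of the dynamic programming matrix. Rows of the matrix correspond to prefixes of `CAATATG` and columns to prefixes of `CATA`, so the last row has five entries.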
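
The local version differs from the global one in three places: entries are floored at 0, the traceback may start at any cell holding the maximum, and the traceback ends at the first 0 it reaches. The following is a minimal sketch, assuming the same matrix orientation. `sw_fill` and `sw_all_optimal` are hypothetical names. The traceback assumes the best score is positive.

```python
# Sketch: Smith-Waterman with a linear gap penalty (Problem 2 scoring).
MATCH, MISMATCH, SPACE = 1, -1, -2


def sub_score(a, b):
    """Substitution score for aligning character a with character b."""
    return MATCH if a == b else MISMATCH


def sw_fill(x, y):
    """Return the local-alignment matrix H; the first row and column stay 0."""
    n, m = len(x), len(y)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,                       # start a fresh alignment
                          H[i - 1][j - 1] + sub_score(x[i - 1], y[j - 1]),
                          H[i - 1][j] + SPACE,
                          H[i][j - 1] + SPACE)
    return H


def sw_all_optimal(x, y, H):
    """Trace back from every cell holding the maximum until a 0 is reached."""
    best = max(max(row) for row in H)   # assumed > 0
    found = []

    def walk(i, j, top, bottom):
        if H[i][j] == 0:                # the local alignment starts here
            found.append([top[::-1], bottom[::-1]])
            return
        if i > 0 and j > 0 and H[i][j] == H[i - 1][j - 1] + sub_score(x[i - 1], y[j - 1]):
            walk(i - 1, j - 1, top + x[i - 1], bottom + y[j - 1])
        if i > 0 and H[i][j] == H[i - 1][j] + SPACE:
            walk(i - 1, j, top + x[i - 1], bottom + "-")
        if j > 0 and H[i][j] == H[i][j - 1] + SPACE:
            walk(i, j - 1, top + "-", bottom + y[j - 1])

    for i, row in enumerate(H):
        for j, value in enumerate(row):
            if value == best:
                walk(i, j, "", "")
    return found


H = sw_fill("CAATATG", "CATA")
print(H[-1])                                  # last row
print(max(max(row) for row in H))             # optimal local score
print(sw_all_optimal("CAATATG", "CATA", H))   # all optimal local alignments
```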

In [9]:
### BEGIN SOLUTION TEMPLATE=local_linear_opt_score = ?
local_linear_opt_score = 3
### END SOLUTION
### BEGIN SOLUTION TEMPLATE=local_linear_opt_alignments = ?
local_linear_opt_alignments = [["ATA",
                                "ATA"]]
### END SOLUTION
### BEGIN SOLUTION TEMPLATE=local_linear_last_row = ?
local_linear_last_row = [0, 0, 0, 0, 1]
### END SOLUTION

In [10]:
# tests for local_linear_opt_score
assert isinstance(local_linear_opt_score, int)
### BEGIN HIDDEN TESTS
assert local_linear_opt_score == 3
### END HIDDEN TESTS

In [11]:
# test for local_linear_opt_alignments
assert isinstance(local_linear_opt_alignments, list)
### BEGIN HIDDEN TESTS
assert sorted(local_linear_opt_alignments) == [["ATA",
                                                "ATA"]]
### END HIDDEN TESTS

In [12]:
# test for local_linear_last_row_entry_0
assert isinstance(local_linear_last_row[0], int)
### BEGIN HIDDEN TESTS
assert local_linear_last_row[0] == 0
### END HIDDEN TESTS

In [13]:
# test for local_linear_last_row_entry_1
assert isinstance(local_linear_last_row[1], int)
### BEGIN HIDDEN TESTS
assert local_linear_last_row[1] == 0
### END HIDDEN TESTS

In [14]:
# test for local_linear_last_row_entry_2
assert isinstance(local_linear_last_row[2], int)
### BEGIN HIDDEN TESTS
assert local_linear_last_row[2] == 0
### END HIDDEN TESTS

In [15]:
# test for local_linear_last_row_entry_3
assert isinstance(local_linear_last_row[3], int)
### BEGIN HIDDEN TESTS
assert local_linear_last_row[3] == 0
### END HIDDEN TESTS

### PROBLEM 3: Global alignment with affine gap penalty (3 POINTS)

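Align the sequences by hand using global alignment with an affine gap penalty.  Use the following scoring scheme. A gap of $k$ consecutive spaces scores $-3 + k \cdot (-2)$: the gap penalty is charged once per gap, and the space penalty once per space.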
* Match: +1
* Mismatch: -1
* Gap: -3
* Space: -2
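
To see how the gap and space penalties combine, here is a sketch that scores a single alignment under this scheme. It treats a maximal run of spaces in the same row as one gap. `score_affine` is a hypothetical helper name.

```python
# Hypothetical helper: score one [top, bottom] alignment with an affine gap
# penalty, where a run of k spaces in the same row scores GAP + k * SPACE.
MATCH, MISMATCH = 1, -1
GAP, SPACE = -3, -2


def score_affine(alignment):
    top, bottom = alignment
    if len(top) != len(bottom):
        raise ValueError("both rows of an alignment must have the same length")
    total = 0
    open_row = None  # row ("top" or "bottom") whose gap continues into this column
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot hold two spaces")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            if row != open_row:
                total += GAP          # this space opens a new gap
            total += SPACE            # every space pays the space penalty
            open_row = row
        else:
            total += MATCH if a == b else MISMATCH
            open_row = None           # a match/mismatch column closes any gap
    return total


print(score_affine(["CAATATG", "C---ATA"]))
print(score_affine(["CAATATG", "C-ATA--"]))
```

Consider the two alignments printed above. The bottom row `C-ATA--` contains two separate gaps, so it pays the -3 gap penalty twice and scores -8. The bottom row `C---ATA` contains a single gap of three spaces and scores -7.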

To submit your solution, do the following variable assignments in the solution cell below:

* assign to the variable `global_affine_opt_score` the optimal alignment *score* 
* assign to the variable `global_affine_opt_alignments` a *list* of *all* alignments that achieve that optimal score
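* assign to the variable `global_affine_last_row` a *list* representing the entries in the last row of the dynamic programming matrix, one tuple per cell as described below. Rows correspond to prefixes of `CAATATG` and columns to prefixes of `CATA`, so the last row has five entries.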

For the last row, imagine that the three matrices $M$, $I_x$, and $I_y$ have been collapsed into a single matrix in which each cell holds the corresponding entries of the three matrices as a tuple.  That is, if $C$ is the collapsed matrix, then $C[i, j] = (M[i,j], I_x[i,j], I_y[i,j])$.  Here $I_x$ scores alignments that end with a character of `CAATATG` aligned to a space, and $I_y$ scores alignments that end with a character of `CATA` aligned to a space.
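
One common way to write the three recurrences is shown below, with this problem's scores substituted in. The formulation rests on two assumptions. First, a gap of length $k$ scores $-3 + k(-2)$. Second, there are no direct transitions between $I_x$ and $I_y$. As before, $s(a,b)$ is $+1$ for a match and $-1$ for a mismatch:

$$
\begin{aligned}
M[i,j] &= s(x_i, y_j) + \max\{M[i-1,j-1],\ I_x[i-1,j-1],\ I_y[i-1,j-1]\}\\
I_x[i,j] &= \max\{M[i-1,j] - 3 - 2,\ I_x[i-1,j] - 2\}\\
I_y[i,j] &= \max\{M[i,j-1] - 3 - 2,\ I_y[i,j-1] - 2\}
\end{aligned}
$$

The boundary entries are $M[0,0] = 0$, $I_x[i,0] = -3 - 2i$, and $I_y[0,j] = -3 - 2j$. Every other boundary entry is $-\infty$. The optimal score is the largest of the three entries in $C[7,4]$.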

In [16]:
# Negative infinity, for last-row entries that no alignment can reach
NEG_INF = float("-inf")
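
The following is a sketch of the three-matrix fill, assuming the same recurrences (gap of length $k$ scores $-3 + k(-2)$, and no direct $I_x$/$I_y$ transitions). It returns the collapsed matrix $C$ in the tuple layout described above. `affine_fill` and `sub_score` are hypothetical names. `NEG_INF` is redefined here only so that the block runs on its own.

```python
import math

NEG_INF = -math.inf       # same value as float("-inf")
MATCH, MISMATCH = 1, -1
GAP, SPACE = -3, -2       # a gap of k spaces scores GAP + k * SPACE


def sub_score(a, b):
    """Substitution score for aligning character a with character b."""
    return MATCH if a == b else MISMATCH


def affine_fill(x, y):
    """Return the collapsed matrix C[i][j] = (M[i][j], Ix[i][j], Iy[i][j])."""
    n, m = len(x), len(y)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Ix = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # ends with x[i-1] over a space
    Iy = [[NEG_INF] * (m + 1) for _ in range(n + 1)]   # ends with a space over y[j-1]
    M[0][0] = 0
    for i in range(1, n + 1):
        Ix[i][0] = GAP + i * SPACE    # x[:i] aligned to a single gap
    for j in range(1, m + 1):
        Iy[0][j] = GAP + j * SPACE    # y[:j] aligned to a single gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            M[i][j] = sub_score(x[i - 1], y[j - 1]) + max(
                M[i - 1][j - 1], Ix[i - 1][j - 1], Iy[i - 1][j - 1])
            Ix[i][j] = max(M[i - 1][j] + GAP + SPACE,   # open a new gap
                           Ix[i - 1][j] + SPACE)        # extend the current gap
            Iy[i][j] = max(M[i][j - 1] + GAP + SPACE,
                           Iy[i][j - 1] + SPACE)
    return [[(M[i][j], Ix[i][j], Iy[i][j]) for j in range(m + 1)]
            for i in range(n + 1)]


C = affine_fill("CAATATG", "CATA")
print(C[-1])           # last row of the collapsed matrix
print(max(C[-1][-1]))  # optimal global score
```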

In [17]:
### BEGIN SOLUTION TEMPLATE=global_affine_opt_score = ?
global_affine_opt_score = -7
### END SOLUTION
### BEGIN SOLUTION TEMPLATE=global_affine_opt_alignments = ?
global_affine_opt_alignments = [["CAATATG",
                                 "C---ATA"],
                                ["CAATATG",
                                 "CA---TA"]]
### END SOLUTION
### BEGIN SOLUTION TEMPLATE=global_affine_last_row = ?
global_affine_last_row = [(NEG_INF, -17, NEG_INF),
                          (-16, -14, NEG_INF),
                          (-13, -11, -21),
                          (-10, -10, -18),
                          (-7, -8, -15)]
### END SOLUTION

In [18]:
# tests for global_affine_opt_score
assert isinstance(global_affine_opt_score, int)
### BEGIN HIDDEN TESTS
assert global_affine_opt_score == -7
### END HIDDEN TESTS

In [19]:
# test for global_affine_opt_alignments
assert isinstance(global_affine_opt_alignments, list)
### BEGIN HIDDEN TESTS
assert sorted(global_affine_opt_alignments) == [["CAATATG",
                                                 "C---ATA"],
                                                ["CAATATG",
                                                 "CA---TA"]]
### END HIDDEN TESTS

In [20]:
# test for global_affine_last_row_entry_0
assert isinstance(global_affine_last_row[0], tuple)
assert len(global_affine_last_row[0]) == 3
### BEGIN HIDDEN TESTS
assert global_affine_last_row[0] == (NEG_INF, -17, NEG_INF)
### END HIDDEN TESTS

In [21]:
# test for global_affine_last_row_entry_1
assert isinstance(global_affine_last_row[1], tuple)
assert len(global_affine_last_row[1]) == 3
### BEGIN HIDDEN TESTS
assert global_affine_last_row[1] == (-16, -14, NEG_INF)
### END HIDDEN TESTS

In [22]:
# test for global_affine_last_row_entry_2
assert isinstance(global_affine_last_row[2], tuple)
assert len(global_affine_last_row[2]) == 3
### BEGIN HIDDEN TESTS
assert global_affine_last_row[2] == (-13, -11, -21)
### END HIDDEN TESTS

In [23]:
# test for global_affine_last_row_entry_3
assert isinstance(global_affine_last_row[3], tuple)
assert len(global_affine_last_row[3]) == 3
### BEGIN HIDDEN TESTS
assert global_affine_last_row[3] == (-10, -10, -18)
### END HIDDEN TESTS