# Space-Efficient Fitting Alignment Using the Hirschberg Algorithm

The fitting alignment problem asks for the substring of a longer reference string w whose global alignment with a shorter string v has the highest score among all substrings of w. Space-efficient algorithms reduce the memory needed to solve a problem without changing the result. The goal of this final project is to implement an algorithm that finds an optimal fitting alignment in linear space and quadratic time. Specifically, for v of length m and w of length n, the algorithm uses O(m) working space and O(mn) time. It is built on the Hirschberg Algorithm described in class and drawn from the paper “A Linear Space Algorithm for Computing Maximal Common Subsequences”.

The Hirschberg Algorithm requires an initial call that specifies where the alignment begins and where it ends. In a global alignment problem these positions are known in advance, because the alignment spans both strings completely. In a fitting alignment problem they are not, because the alignment may begin and end anywhere in w. We therefore define two subroutines, one that finds the end of the optimal fitting alignment and one that finds its beginning, each preserving O(m) memory usage and O(mn) time complexity. This is done below.
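Before moving to linear space, it can help to have a plain quadratic-space baseline to compare against. The sketch below fills the full (m+1) x (n+1) table with the fitting boundary conditions: row 0 is all zeros because the alignment may start anywhere in w, and column 0 accumulates gap penalties because all of v must be used. The function name `fitting_score_full_table` and the fixed unit scores (+1 match, -1 mismatch, -1 indel) are assumptions made for this illustration, not part of the project code.

```python
def fitting_score_full_table(v, w, match=1, mismatch=-1, indel=-1):
    """Quadratic-space baseline: best fitting score of v in w and its end column."""
    m, n = len(v), len(w)
    # S[i][j] = best score of v[:i] against any substring of w that ends at j.
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        # v[:i] against the empty substring costs i indels.
        S[i][0] = S[i - 1][0] + indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = match if v[i - 1] == w[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + sub,   # align v[i-1] with w[j-1]
                          S[i - 1][j] + indel,     # v[i-1] against a gap
                          S[i][j - 1] + indel)     # w[j-1] against a gap
    # Row 0 stays 0 (free start in w); the answer is the best entry of row m.
    end = max(range(n + 1), key=lambda j: S[m][j])
    return S[m][end], end


score, end = fitting_score_full_table("TAGATA", "GTAGGCTTAAGGTTA")
print(score, end)
```

The score from this baseline should agree with the linear-space routine. The end column can differ when several columns tie for the best score, since this sketch keeps the first maximum it sees.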

In [88]:
import numpy as np

def findEnd(short, reference, delta):
    m = len(short)
    # Two columns suffice: column 0 holds the previous reference position and
    # column 1 the current one, so memory stays O(m).
    M = np.zeros((m + 1, 2), dtype=int)

    # Column j = 0: aligning short[:i] against an empty substring costs i gaps.
    for i in range(1, m + 1):
        M[i][0] = M[i-1][0] + delta[short[i-1]]['-']

    bestEnd = (m, 0)
    maxScore = int(M[m][0])
    for j in range(1, len(reference) + 1):
        # Row 0 stays 0: a fitting alignment may start anywhere in reference.
        M[0][1] = 0
        for i in range(1, m + 1):
            diag = M[i-1][0] + delta[short[i-1]][reference[j-1]]
            delete = M[i-1][1] + delta[short[i-1]]['-']
            insert = M[i][0] + delta[reference[j-1]]['-']
            M[i][1] = max(diag, delete, insert)
        # The bottom row scores all of short against a substring ending at j.
        if M[m][1] >= maxScore:
            maxScore = int(M[m][1])
            bestEnd = (m, j)
        # Shift once per column, after the whole column has been filled.
        M[:, 0] = M[:, 1]

    return maxScore, bestEnd

In [89]:
keys = ['A', 'C', 'T', 'G', '-']
# Unit scoring: +1 for a match, -1 for a mismatch or an indel.
delta = {a: {b: 1 if a == b else -1 for b in keys} for a in keys}

score, fend = findEnd("TAGATA", "GTAGGCTTAAGGTTA", delta)
score, fend

(2, (6, 15))

In [90]:
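def findStart(short, reference, delta, end):
    # Run the same two-column recurrence backward from the known end point:
    # reverse short and the part of reference that stops at end, then look
    # for the best alignment that is pinned to the reversed origin.
    revShort = short[:end[0]][::-1]
    revRef = reference[:end[1]][::-1]
    m = len(revShort)
    M = np.zeros((m + 1, 2), dtype=int)

    # Column 0: a prefix of revShort against nothing costs only gaps.
    for i in range(1, m + 1):
        M[i][0] = M[i-1][0] + delta[revShort[i-1]]['-']

    # bestStart = (0, k) means the alignment covers reference[k:end[1]].
    bestStart = (0, end[1])
    maxScore = M[m][0]
    for j in range(1, len(revRef) + 1):
        # Row 0 is not free here: the alignment must reach end, so every
        # reference character consumed without short costs a gap.
        M[0][1] = M[0][0] + delta[revRef[j-1]]['-']
        for i in range(1, m + 1):
            diag = M[i-1][0] + delta[revShort[i-1]][revRef[j-1]]
            delete = M[i-1][1] + delta[revShort[i-1]]['-']
            insert = M[i][0] + delta[revRef[j-1]]['-']
            M[i][1] = max(diag, delete, insert)
        # Bottom row: all of short aligned against the last j characters.
        if M[m][1] > maxScore:
            maxScore = M[m][1]
            bestStart = (0, end[1] - j)
        M[:, 0] = M[:, 1]

    return bestStart

In [91]:
keys = ['A', 'C', 'T', 'G', '-']
# Unit scoring: +1 for a match, -1 for a mismatch or an indel.
delta = {a: {b: 1 if a == b else -1 for b in keys} for a in keys}

start = findStart("TAGATA", "GTAGGCTTAAGGTTA", delta, fend)
start

(0, 7)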
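
Once the start and end columns are known, what remains is an ordinary global alignment of v against the selected slice of w, which is the setting Hirschberg's divide-and-conquer scheme handles. The following is a minimal sketch of that step, not the project's final implementation. The names `nw_last_row` and `hirschberg`, the fixed unit scores, and the example slice `"TAAGGTTA"` (taken here to be reference[7:15]) are assumptions made for illustration.

```python
def nw_last_row(a, b, match=1, mismatch=-1, indel=-1):
    """Scores of aligning all of a against every prefix of b, in O(len(b)) space."""
    prev = [j * indel for j in range(len(b) + 1)]
    for ch in a:
        curr = [prev[0] + indel]
        for j, bj in enumerate(b, start=1):
            sub = match if ch == bj else mismatch
            curr.append(max(prev[j - 1] + sub, prev[j] + indel, curr[j - 1] + indel))
        prev = curr
    return prev


def hirschberg(a, b, match=1, mismatch=-1, indel=-1):
    """Return one optimal global alignment of a and b as two gapped strings."""
    if len(a) == 0:
        return '-' * len(b), b
    if len(b) == 0:
        return a, '-' * len(a)
    if len(a) == 1:
        # Base case: pair the single character with its best partner in b,
        # unless leaving it unpaired scores higher under the given penalties.
        k = max(range(len(b)), key=lambda j: match if b[j] == a else mismatch)
        paired = (match if b[k] == a else mismatch) + (len(b) - 1) * indel
        unpaired = (len(b) + 1) * indel
        if paired >= unpaired:
            return '-' * k + a + '-' * (len(b) - k - 1), b
        return a + '-' * len(b), '-' + b

    # Split a in half and find the column of b where an optimal path crosses.
    mid = len(a) // 2
    left = nw_last_row(a[:mid], b, match, mismatch, indel)
    right = nw_last_row(a[mid:][::-1], b[::-1], match, mismatch, indel)
    n = len(b)
    k = max(range(n + 1), key=lambda j: left[j] + right[n - j])

    # Solve the two independent subproblems and concatenate their alignments.
    top1, bot1 = hirschberg(a[:mid], b[:k], match, mismatch, indel)
    top2, bot2 = hirschberg(a[mid:], b[k:], match, mismatch, indel)
    return top1 + top2, bot1 + bot2


top, bottom = hirschberg("TAGATA", "TAAGGTTA")
print(top)
print(bottom)
```

In this sketch the working space is linear in the length of the second argument, because `nw_last_row` keeps only two rows indexed by `b`. Swapping the arguments, or iterating over columns as `findEnd` does, would tie the working space to the shorter string instead.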