<h1>Challenge 6: Longest Common Subsequence</h1>

Another classic dynamic programming problem: the longest common subsequence problem.

> **Problem statement**

Given two strings, find the length of their longest common subsequence. A common subsequence of two strings is a sequence of characters that appears in both strings in the same relative order, though not necessarily contiguously. For example, the strings "two" and "too" have the common subsequence "to". Even though "to" does not appear contiguously in "two", the order of characters is still preserved, i.e., "o" follows "t". These strings also share shorter subsequences, such as "t" and "o", but in this problem we are only interested in the length of the longest one.
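To make the definition concrete, here is a minimal sketch of a subsequence check. The helper name `is_subsequence` is an assumption introduced for illustration; it is not part of the challenge itself.

```python
def is_subsequence(candidate, text):
    """Return True if candidate can be formed by deleting characters from text."""
    pos = 0  # index of the next character of candidate we still need to find
    for ch in text:
        if pos < len(candidate) and candidate[pos] == ch:
            pos += 1
    return pos == len(candidate)

print(is_subsequence("to", "two"))  # True: "t" then "o", skipping "w"
print(is_subsequence("to", "too"))  # True
print(is_subsequence("ot", "two"))  # False: the order of characters matters
```

A common subsequence is any string for which this check succeeds against both inputs.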

> **Input**

Your algorithm takes two strings, str1 and str2, as input. The strings can be of any length, including empty.

str1 = "two"

str2 = "too"

> **Output**

Your algorithm should return an integer representing the length of the longest common subsequence.

LCS("two", "too") = 2

> **Coding challenge**

You have already solved a similar problem, the longest common substring, in the previous chapter. This problem is slightly different because here we are looking for a subsequence instead of a substring. Think about a simple recursive solution first and then build on that to write a dynamic programming solution.

> **Solution #1: Simple recursion**




In [1]:
# helper function with updated signature: i is current index in str1, j is current index in str2
def LCS_(str1, str2, i, j): 
    if i == len(str1) or j == len(str2): # base case
        return 0
    elif str1[i] == str2[j]:  # if current characters match, increment 1
        return 1 + LCS_(str1, str2, i+1, j+1)
    # else take max of either of two possibilities
    return max(LCS_(str1, str2, i+1, j), LCS_(str1, str2, i, j+1))

def LCS(str1, str2):
    return LCS_(str1, str2, 0, 0)

print(LCS("bed", "read"))

2


> **Solution #2: Top-down dynamic programming**




Let’s see how this problem satisfies both prerequisites for using dynamic programming.

**Optimal substructure**

If we have a pair of strings str1 and str2 with lengths n and m, we can construct their optimal solution if we have answers to the following three subproblems:

The Case of Match:

*  The solution of substrings of str1 and str2 formed by removing the first characters. (i+1, j+1)

The Case of Mismatch:

*   The solution of the substring of str1 formed by removing its first character and str2 as it is. (i+1, j)

*   The solution of the substring of str2 formed by removing its first character and str1 as it is. (i, j+1)
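The three cases above can be written as a single recurrence. This is a sketch in the notation of the recursive solution, where $L(i, j)$ (a name introduced here for illustration) denotes the length of the longest common subsequence of the suffixes of str1 and str2 starting at indices $i$ and $j$:

$$
L(i, j) =
\begin{cases}
0 & \text{if } i = n \text{ or } j = m \\
1 + L(i+1,\ j+1) & \text{if } \text{str1}[i] = \text{str2}[j] \\
\max\bigl(L(i+1,\ j),\ L(i,\ j+1)\bigr) & \text{otherwise}
\end{cases}
$$

The answer to the whole problem is $L(0, 0)$.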

**Overlapping subproblem**
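One way to see the overlap is to count how often the simple recursion evaluates each (i, j) pair. The following is a minimal sketch; `count_calls` is a hypothetical helper that wraps the same recursion as Solution #1 and records every call.

```python
from collections import Counter

def count_calls(str1, str2):
    """Run the plain recursion and count how often each (i, j) is evaluated."""
    calls = Counter()

    def lcs(i, j):
        calls[(i, j)] += 1
        if i == len(str1) or j == len(str2):  # base case
            return 0
        if str1[i] == str2[j]:
            return 1 + lcs(i + 1, j + 1)
        return max(lcs(i + 1, j), lcs(i, j + 1))

    lcs(0, 0)
    return calls

calls = count_calls("abc", "xyz")  # no characters match, so every call branches
print(sum(calls.values()))         # 39 recursive calls in total
print(len(calls))                  # 15 distinct (i, j) pairs
print(calls[(2, 2)])               # 6: this one subproblem is solved six times
```

Even for two strings of length 3, the recursion makes more than twice as many calls as there are distinct subproblems.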

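The recursive solution evaluates the same (i, j) pair many times: after a mismatch, both the (i+1, j) branch and the (i, j+1) branch can go on to evaluate (i+1, j+1). Because the same subproblems keep recurring, our algorithm can benefit from tabulation or memoization. Let’s first look at a solution with memoization.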

In [2]:
# helper function with updated signature: i is current index in str1, j is current index in str2
def LCS_(str1, str2, i, j, memo): 
    if i == len(str1) or j == len(str2): # base case
        return 0
    elif (i,j) in memo:
        return memo[(i,j)]
    elif str1[i] == str2[j]:  # if current characters match, increment 1
        memo[(i,j)] = 1 + LCS_(str1, str2, i+1, j+1, memo)
        return memo[(i,j)]
    # else take max of either of two possibilities
    memo[(i,j)] = max(LCS_(str1, str2, i+1, j, memo), LCS_(str1, str2, i, j+1, memo))
    return memo[(i,j)]

def LCS(str1, str2):
    memo = {}
    return LCS_(str1, str2, 0, 0, memo)

print(LCS("bed", "read"))

2


> **Solution #3: Bottom-up dynamic programming**




In [3]:
def LCS(str1, str2):
    n = len(str1)   # length of str1
    m = len(str2)   # length of str2

    dp = [[0 for j in range(m+1)] for i in range(n+1)]  # table for tabulation of size (n+1) x (m+1)
    
    # iterating to fill table
    for i in range(1, n+1):           
        for j in range(1, m+1):
            # if the characters at this position match,
            if str1[i-1] == str2[j-1]:
                # add 1 to the value of the previous diagonal cell
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                # if the characters don't match, take the max of the cell above and the cell to the left
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    return dp[n][m]

print(LCS("bed", "read"))

2
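To see what the tabulation actually stores, the following sketch returns the whole table instead of only the last cell. The function name `lcs_table` is an assumption for illustration; the filling logic is the same as in the solution above.

```python
def lcs_table(str1, str2):
    """Build and return the full (n+1) x (m+1) tabulation table."""
    n, m = len(str1), len(str2)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp

for row in lcs_table("bed", "read"):
    print(row)
# [0, 0, 0, 0, 0]
# [0, 0, 0, 0, 0]
# [0, 0, 1, 1, 1]
# [0, 0, 1, 1, 2]
```

Row i corresponds to the first i characters of str1 and column j to the first j characters of str2, so the bottom-right cell holds the answer for the full strings.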


> **Solution #4: Space-optimized bottom-up dynamic programming**




As we can see in the visualization above, we only use the results of the previous row, together with the row being filled, to evaluate the next row. Thus, there is no point in keeping the complete (n+1)×(m+1) table. The following implementation replaces the 2-D list with 1-D lists of size n+1: dp holds the previous row, and thisrow holds the row currently being computed.

In [5]:
def LCS(str1, str2):
    n = len(str1)   # length of str1
    m = len(str2)   # length of str2

    # table for tabulation, only maintaining state of last row
    dp = [0 for i in range(n+1)]  

    for j in range(1, m+1):           # iterating to fill table
        # calculate new row (based on previous row i.e. dp)
        thisrow = [0 for i in range(n+1)] 
        for i in range(1, n+1):
            # if the characters at this position match,
            if str1[i-1] == str2[j-1]:
                # add 1 to the previous diagonal value, which lives in the previous row
                thisrow[i] = dp[i-1] + 1
            else:
                # if the characters don't match, use the i-th result from dp, and the previous result from thisrow
                thisrow[i] = max(dp[i], thisrow[i-1])
        # after evaluating thisrow, set dp equal to this row to be used in the next iteration
        dp = thisrow   
    return dp[n]

print(LCS("who", "wow"))

2
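Since the result does not depend on the order of the two strings, one possible refinement is to build the rows over the shorter string, so the extra space is proportional to the shorter length. The following is a sketch under that assumption; `lcs_min_space` is a hypothetical name, not part of the original solution.

```python
def lcs_min_space(str1, str2):
    # Assumption: swapping the arguments is safe because LCS is symmetric.
    # After the swap, str1 is the shorter string, so each row is as small as possible.
    if len(str1) > len(str2):
        str1, str2 = str2, str1

    prev = [0] * (len(str1) + 1)  # results for the previous character of str2
    for ch2 in str2:
        curr = [0]  # column 0 is always 0 (empty prefix of str1)
        for i, ch1 in enumerate(str1, start=1):
            if ch1 == ch2:
                curr.append(prev[i - 1] + 1)            # extend the diagonal value
            else:
                curr.append(max(prev[i], curr[i - 1]))  # best of the two neighbours
        prev = curr
    return prev[-1]

print(lcs_min_space("who", "wow"))   # 2
print(lcs_min_space("bed", "read"))  # 2
```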
