<h1>Challenge 8: The Edit Distance Problem</h1>

Another classic string-related dynamic programming problem.

> **Problem statement**

Given two strings, str1 and str2, find the minimum number of operations needed to convert str1 into str2. Three kinds of operations are allowed: i) inserting a character at a specific position, ii) deleting the character at a specific position, or iii) changing the character at a specific position into some other character. The visualization below shows all of these operations. Each operation costs one unit, so you want to find the minimum cost of converting str1 into str2. The visualization also shows how different sequences of operations can have different costs.

Note: This problem has a direct application in autocorrect features.

The alignment depicted in the above visualization is a hint towards the solution. You basically need to find the least-cost alignment of the two strings. Matching characters cost nothing, but there is a cost of one unit when two aligned characters differ, or when a character in either string is skipped (aligned with a gap).
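To make the cost of an alignment concrete, here is a minimal sketch that scores one given alignment. The helper name `alignmentCost` and the use of `-` as the gap symbol are assumptions made for illustration; they are not part of the challenge.

```python
def alignmentCost(row1, row2):
    # Both rows must have the same length; "-" marks a skipped position (gap).
    if len(row1) != len(row2):
        raise ValueError("aligned rows must have the same length")
    # Every column whose two symbols differ costs one unit:
    # a mismatch (two different characters) or a skip (character against a gap).
    return sum(1 for a, b in zip(row1, row2) if a != b)

print(alignmentCost("teh", "the"))      # two mismatches
print(alignmentCost("t-eh", "the-"))    # two skips
print(alignmentCost("teh---", "---the"))  # six skips, a much worse alignment
```

The first two alignments both cost 2 and the third costs 6, so the edit distance is the cost of the cheapest alignment, not of an arbitrary one.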

> **Input**

Your algorithm will take as input two strings, str1 and str2.

str1 = "teh"

str2 = "the"

> **Output**

Your algorithm should output the minimum cost of converting str1 into str2 or, equivalently, the minimum cost of aligning the two strings.

editDistance("teh", "the") = 2

> **Coding challenge**

Hopefully, the above visualization gives you a hint about the solution. Work through a few examples and convince yourself that the cheapest alignment gives the edit distance. Then think about a way to align two strings.

> **Solution #1: Simple recursion**




In [1]:
def editDistanceRecurse(str1, str2, i, j):
    if i == len(str1):  # base case of reaching the end of str1
        return len(str2) - j
    
    if j == len(str2):  # base case of reaching the end of str2
        return len(str1) - i
    
    if str1[i] == str2[j]:  # if the characters match, we move ahead
        return editDistanceRecurse(str1, str2, i+1, j+1)
    # if characters don't match
    return 1 + min(editDistanceRecurse(str1, str2, i+1, j+1),   # change str1[i] into str2[j]
                   editDistanceRecurse(str1, str2, i, j+1),     # insert str2[j] into str1 (str2[j] is aligned with a gap)
                   editDistanceRecurse(str1, str2, i+1, j))     # delete str1[i] (str1[i] is aligned with a gap)

def editDistance(str1, str2):
    return editDistanceRecurse(str1, str2, 0, 0)

print(editDistance("teh", "the"))

2
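Every mismatch branches into three recursive calls, so the simple recursion can revisit the same pair of indices many times. The sketch below instruments the same recursion with a counter to make this visible; the name `countRecursiveCalls` and the test strings are assumptions chosen for illustration.

```python
from collections import Counter

def countRecursiveCalls(str1, str2):
    calls = Counter()  # how often each state (i, j) is visited

    def recurse(i, j):
        calls[(i, j)] += 1
        if i == len(str1):
            return len(str2) - j
        if j == len(str2):
            return len(str1) - i
        if str1[i] == str2[j]:
            return recurse(i + 1, j + 1)
        return 1 + min(recurse(i + 1, j + 1),
                       recurse(i, j + 1),
                       recurse(i + 1, j))

    distance = recurse(0, 0)
    return distance, calls

distance, calls = countRecursiveCalls("abc", "xyz")
print(distance)             # edit distance
print(len(calls))           # number of distinct (i, j) states
print(sum(calls.values()))  # total number of calls made
```

For strings that mismatch everywhere, the total number of calls is much larger than the number of distinct states, which is at most (n+1)(m+1).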


> **Solution #2: Top-down dynamic programming**




Let’s first see how this problem satisfies both prerequisites for applying dynamic programming.

**Optimal substructure**

The optimal answer for a pair of strings of sizes n and m can be found by using the following:

*   Optimal answer to the subproblem of substrings of sizes n-1 and m-1 formed by removing the first characters.
*   Optimal answer to the subproblem of keeping the first string as it is (size of n) and removing the first character of the second string (size of m-1).
*   Optimal answer to the subproblem of keeping the second string as it is (size of m) and removing the first character of the first string (size of n-1).

Since we can break down the main problem in terms of specific subproblems, this problem has an optimal substructure.
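The three subproblems above can be written as a recurrence. The notation D(i, j) is an assumption introduced here: it stands for the edit distance between the suffix of str1 starting at index i and the suffix of str2 starting at index j, mirroring the indices used in the recursive solution.

```latex
D(i, j) =
\begin{cases}
m - j & \text{if } i = n \\
n - i & \text{if } j = m \\
D(i+1,\, j+1) & \text{if } \mathrm{str1}[i] = \mathrm{str2}[j] \\
1 + \min\bigl\{ D(i+1,\, j+1),\; D(i,\, j+1),\; D(i+1,\, j) \bigr\} & \text{otherwise}
\end{cases}
```

The answer to the whole problem is D(0, 0).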

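**Overlapping subproblems**

The recursion reaches the same pair of indices (i, j) along many different paths. For example, on a mismatch the state (i, j) branches to (i+1, j+1), (i, j+1), and (i+1, j), and the state (i+1, j+1) can then be reached again from both (i, j+1) and (i+1, j). There are at most (n+1)(m+1) distinct states in total, so we can benefit from memoization. Let’s look at the memoized version of this algorithm.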

In [2]:
def editDistanceRecurse(str1, str2, i, j, memo):
    if i == len(str1):  # base case of reaching the end of str1
        return len(str2) - j
    
    if j == len(str2):  # base case of reaching the end of str2
        return len(str1) - i
    
    if (i,j) in memo:
        return memo[(i,j)]

    if str1[i] == str2[j]:  # if the characters match, we move ahead
        memo[(i,j)] = editDistanceRecurse(str1, str2, i+1, j+1, memo)
        return memo[(i,j)]
    # if characters don't match
    memo[(i,j)] = 1 + min(editDistanceRecurse(str1, str2, i+1, j+1, memo),   # change str1[i] into str2[j]
                          editDistanceRecurse(str1, str2, i, j+1, memo),     # insert str2[j] into str1 (str2[j] is aligned with a gap)
                          editDistanceRecurse(str1, str2, i+1, j, memo))     # delete str1[i] (str1[i] is aligned with a gap)

    return memo[(i,j)]

def editDistance(str1, str2):
    memo = {}
    return editDistanceRecurse(str1, str2, 0, 0, memo)

print(editDistance("teh", "the"))

2
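As an alternative to passing a `memo` dictionary by hand, Python's standard library can do the caching. The following is a sketch of the same top-down idea using `functools.lru_cache`; the name `editDistanceCached` and the inner helper `solve` are assumptions for illustration, not part of the solution above.

```python
from functools import lru_cache

def editDistanceCached(str1, str2):
    @lru_cache(maxsize=None)  # cache every (i, j) result, no eviction
    def solve(i, j):
        if i == len(str1):    # ran out of str1: insert the rest of str2
            return len(str2) - j
        if j == len(str2):    # ran out of str2: delete the rest of str1
            return len(str1) - i
        if str1[i] == str2[j]:
            return solve(i + 1, j + 1)
        return 1 + min(solve(i + 1, j + 1),  # change
                       solve(i, j + 1),      # insert
                       solve(i + 1, j))      # delete

    return solve(0, 0)

print(editDistanceCached("teh", "the"))
```

Like the dictionary version, this is still recursive, so very long strings may exceed Python's recursion limit.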


> **Solution #3: Bottom-up dynamic programming**




In [3]:
def editDistance(str1, str2):
    n = len(str1)
    m = len(str2)
    # dp table of size (n+1) x (m+1); dp[i][j] is the edit distance between str1[:i] and str2[:j]
    dp = [[0 for j in range(m+1)] for i in range(n+1)]

    # filling up dp
    for i in range(n+1):
        for j in range(m+1):
            if i == 0:          # base case of running out of str1
                dp[i][j] = j
            elif j == 0:         # base case of running out of str2
                dp[i][j] = i
            elif str1[i-1] == str2[j-1]:    # case when both characters match
                dp[i][j] = dp[i-1][j-1]
            else:               # case of mismatch
                dp[i][j] = 1 + min(dp[i-1][j-1],    # change str1[i-1] into str2[j-1]
                                   dp[i][j-1],      # insert str2[j-1]
                                   dp[i-1][j])      # delete str1[i-1]
    return dp[n][m]

print(editDistance("teh", "the"))

2
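It can help to look at the whole table rather than only its last cell. The sketch below fills the table exactly as above but returns it; the name `editDistanceTable` is an assumption for illustration.

```python
def editDistanceTable(str1, str2):
    n, m = len(str1), len(str2)
    # dp[i][j] is the edit distance between str1[:i] and str2[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0:
                dp[i][j] = j
            elif j == 0:
                dp[i][j] = i
            elif str1[i - 1] == str2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1], dp[i][j - 1], dp[i - 1][j])
    return dp

for row in editDistanceTable("teh", "the"):
    print(row)
# [0, 1, 2, 3]
# [1, 0, 1, 2]
# [2, 1, 1, 1]
# [3, 2, 1, 2]
```

Row i corresponds to the first i characters of str1 and column j to the first j characters of str2, so the bottom-right cell holds the answer.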


> **Solution #4: Space-optimized bottom-up dynamic programming**




As we can see in the above visualization, to fill any row we only need the row before it, together with the entries already filled in the current row. This means that instead of storing a complete 2-D dp table, we can just keep a list for the previous row.

In [4]:
def editDistance(str1, str2):
    n = len(str1)
    m = len(str2)
    # dp holds one row (fixed j) of n+1 entries at a time; the base row j = 0 is [0, 1, 2, ..., n]
    dp = [i for i in range(n+1)]

    # filling up dp
    for j in range(1,m+1):
        thisrow = [0 for i in range(n+1)]
        for i in range(n+1):
            if i == 0:                      # base case of running out of str1
                thisrow[i] = j
            elif str1[i-1] == str2[j-1]:    # case when both characters match
                thisrow[i] = dp[i-1]
            else:                                # case of mismatch
                thisrow[i] = 1 + min(dp[i-1],    # change str1[i-1] into str2[j-1]
                                     dp[i],      # insert str2[j-1]
                                     thisrow[i-1])      # delete str1[i-1]
        dp = thisrow
    return dp[n]

print(editDistance("teh", "the"))

2
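The row above always has len(str1) + 1 entries. Because all three operations cost one unit, the edit distance is symmetric, so one possible refinement is to swap the strings first and size the row by the shorter one. This is a sketch of that idea under the unit-cost assumption; the name `editDistanceTwoRows` is made up for illustration.

```python
def editDistanceTwoRows(str1, str2):
    # With unit costs the distance is symmetric, so let str1 be the shorter string.
    if len(str1) > len(str2):
        str1, str2 = str2, str1
    n, m = len(str1), len(str2)

    prev = list(range(n + 1))        # row for j = 0
    for j in range(1, m + 1):
        curr = [j] + [0] * n         # curr[0] = j: str1 prefix is empty
        for i in range(1, n + 1):
            if str1[i - 1] == str2[j - 1]:
                curr[i] = prev[i - 1]
            else:
                curr[i] = 1 + min(prev[i - 1],  # change
                                  prev[i],      # insert
                                  curr[i - 1])  # delete
        prev = curr
    return prev[n]

print(editDistanceTwoRows("teh", "the"))
```

The extra space is then proportional to the length of the shorter string, while the running time stays proportional to n times m.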
