# Dynamic Programming Warm-Up

## Edit Distance

Can you find the `edit distance` between two strings? The `edit distance` is the smallest number of changes needed to turn `s1` (the first string) into `s2` (the second string). Each change must `Insert`, `Delete`, or `Replace` a single character.

*Note:* The strings consist only of lowercase English letters.

*Examples:*

1. s1: "addis" s2: "adis" -> 1 (delete second 'd')

2. s1: "ready" s2: "really" -> 2 (replace 'd' with 'l', insert 'l')
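To make the definition concrete, the examples above can be checked by brute force. The sketch below is not the dynamic programming solution this warm-up builds toward: it simply tries every possible single edit, breadth first, until `s2` appears. The names `neighbors` and `brute_force_distance` are hypothetical helpers introduced for this illustration, and the search is only practical for very small distances.

```python
from collections import deque
from string import ascii_lowercase


def neighbors(word):
    """Yield every string reachable from `word` with exactly one edit."""
    for i in range(len(word) + 1):
        for ch in ascii_lowercase:
            yield word[:i] + ch + word[i:]          # insert ch at position i
    for i in range(len(word)):
        yield word[:i] + word[i + 1:]               # delete the character at i
        for ch in ascii_lowercase:
            if ch != word[i]:
                yield word[:i] + ch + word[i + 1:]  # replace the character at i


def brute_force_distance(s1, s2):
    """Breadth-first search over single edits, starting from s1."""
    if s1 == s2:
        return 0
    seen = {s1}
    queue = deque([(s1, 0)])
    while queue:
        word, dist = queue.popleft()
        for nxt in neighbors(word):
            if nxt == s2:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))


print(brute_force_distance("addis", "adis"))    # 1
print(brute_force_distance("ready", "really"))  # 2
```

The number of candidate strings grows very quickly with the distance, which is one reason to look for a smarter approach.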

In [None]:
def check(actual, expected):
    if expected != actual:
        print(f"Function should return the value {expected}, but it returned {actual}")
    else:
        print("Congratulations, the test case passed!")

### 1.1
What is the edit distance between:

s1: "sunday" s2: "saturday"

Write your solution here: 

<details>
<summary>Click here to see the solution</summary>
Solution: 3 (replace 'n' with 'r', insert 'a', insert 't')
</details>

### 1.2
How could we break this problem into subproblems? Write down the subproblem in words.

Your solution: 

<details>
<summary>Click here to see the solution</summary>
This problem gets easier when we compare shorter words. For example, let's say we have a string s1 with m characters and a string s2 with n characters. If we remove the last characters from s1 and s2, then our subproblem becomes: "what is the edit distance between the first m - 1 characters of s1 and the first n - 1 characters of s2?" We could also just remove the last character from one of the starting strings, which would lead to a slightly different subproblem.
</details>

### 1.3

What are the "easy" subproblems or base cases? What are their solutions?

Your solution: 

<details>
<summary>Click here to see the solution</summary>
The easiest case is when either of the words is an empty string.

If the first string is empty (m is 0), we have to insert all the letters from the second string, so the edit distance is the length of the second string.

If the second string is empty (n is 0), we have to delete all the letters from the first string, so the edit distance is the length of the first string.

`if m == 0: return n`

`if n == 0: return m`


</details>

### 1.4
How can we get from one subproblem to another subproblem? What are the cases?

More specifically, if we want to know `edit_distance(s1[:m], s2[:n])`, how can we calculate it in terms of the edit distances between the prefixes `s1[:m-1]`, `s2[:n-1]`, `s1[:m]`, and `s2[:n]`?


Write your solution here:

<details>
    <summary><i>Click here to see the solution</i></summary>
There are two cases:

<i>First Case: </i> The last characters in each word are the same. In this case, we don't need to increase the edit distance. We can just move on to the two previous characters.

`return edit_distance(s1[:m-1], s2[:n-1])`

<i>Second Case:</i> The last characters in each word are different. We have three options here: insert, delete, or replace a character to make them the same. The edit distance of <code>s1[:m]</code> and <code>s2[:n]</code> is the minimum of the edit distances that result from these three options. Don't forget to add one because we made an edit!


`return 1 + min(edit_distance(s1[:m], s2[:n-1]),        # Insert 
                   edit_distance(s1[:m-1], s2[:n]),    # Remove 
                   edit_distance(s1[:m-1], s2[:n-1])   # Replace 
                   )`
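
To see which prefixes each case points to, it can help to list them without solving anything. The sketch below assumes a hypothetical helper, `subproblems`, that only reports the smaller prefix pairs the recurrence would look at next; it does not compute any distance.

```python
def subproblems(s1, s2):
    """List the smaller prefix pairs that the recurrence looks at next."""
    if not s1 or not s2:
        return []  # base case: nothing smaller to look at
    if s1[-1] == s2[-1]:
        # First case: last characters match, drop both.
        return [("match", s1[:-1], s2[:-1])]
    # Second case: last characters differ, three options.
    return [
        ("insert", s1, s2[:-1]),
        ("remove", s1[:-1], s2),
        ("replace", s1[:-1], s2[:-1]),
    ]


print(subproblems("ready", "really"))
# [('match', 'read', 'reall')]
print(subproblems("read", "reall"))
# [('insert', 'read', 'real'), ('remove', 'rea', 'reall'), ('replace', 'rea', 'real')]
```

For "ready" and "really" the final 'y' matches, so only one subproblem remains. One step later, 'd' and 'l' differ, so all three options have to be explored.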

</details>

### 1.5
Put these pieces together to write a function `edit_distance` that uses recursion to solve the problem!



In [None]:
def edit_distance(s1, s2, m, n):
    pass

### Check your answer

Test your recursive solution on the following 2 test cases before you proceed to the next exercises (just execute the following cells).

In [None]:
def test_small(fn):
    def test_1():
        s1 = "ztlqqmtppm"
        s2 = "zfewmvuc"

        m, n = len(s1), len(s2)

        check(fn(s1, s2, m, n), 8)
    
    def test_2():
        s1 = "eodh"
        s2 = "hmowpx"
        
        m, n = len(s1), len(s2)

        check(fn(s1, s2, m, n), 5)
    
    test_1()
    test_2()

test_small(edit_distance)

<details>
<summary>Click here to see the solution</summary>

def edit_distance(s1, s2, m, n):

    if m == 0:
        return n
         
    if n == 0:
        return m
        
    if s1[m-1] == s2[n-1]:
        return edit_distance(s1, s2, m-1, n-1) 
        
    return 1 + min(edit_distance(s1, s2, m, n-1),    # Insert
                   edit_distance(s1, s2, m-1, n),    # Remove 
                   edit_distance(s1, s2, m-1, n-1)   # Replace
                   )
</details>

### 1.6
We can use memoization to improve our solution (remember that memoization means keeping track of the subproblems we've already solved so we don't do unnecessary work).
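
To get a feel for how much repeated work the plain recursion does, one option is to count how often each `(m, n)` pair is visited. The sketch below is an illustration only: `edit_distance_counted` and `calls` are hypothetical names, and the function is the recursive recurrence from the previous exercises with a counter added.

```python
from collections import Counter

calls = Counter()  # how many times each (m, n) pair was visited


def edit_distance_counted(s1, s2, m, n):
    calls[(m, n)] += 1
    if m == 0:
        return n
    if n == 0:
        return m
    if s1[m - 1] == s2[n - 1]:
        return edit_distance_counted(s1, s2, m - 1, n - 1)
    return 1 + min(edit_distance_counted(s1, s2, m, n - 1),      # Insert
                   edit_distance_counted(s1, s2, m - 1, n),      # Remove
                   edit_distance_counted(s1, s2, m - 1, n - 1))  # Replace


s1, s2 = "eodh", "hmowpx"
result = edit_distance_counted(s1, s2, len(s1), len(s2))
print(result)  # 5
print(sum(calls.values()), "calls for", len(calls), "distinct (m, n) pairs")
```

The total number of calls is larger than the number of distinct pairs, so some subproblems are solved more than once. That repeated work is what memoization avoids.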

What would our data structure look like to store the subproblems?

Write your solution here:

<details>
<summary><i>Click here to see the solution</i></summary><br/>
We notice that every subproblem is completely defined by the state (m, n). That means we can use a 2-D array with one cell for every state (0, 0), (0, 1), ..., (m, n). Also note that we need to initialize our memo table with an invalid value to denote that we have not visited a state yet. Here we use `None`.
    
```
m = len(s1)
n = len(s2)
memo = [[None for i in range(n+1)] for j in range(m+1)]
```
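
One detail worth checking is why the table is built with a nested list comprehension rather than by repeating a single row. The short sketch below uses small made-up sizes for `m` and `n`, and the name `aliased` is hypothetical; it shows that repeating the outer list makes every row the same object.

```python
m, n = 2, 3

# Nested comprehension: every row is a separate list.
memo = [[None for _ in range(n + 1)] for _ in range(m + 1)]
memo[0][0] = 0
print(memo[1][0])     # None

# Repeating the outer list: all rows refer to the same list object.
aliased = [[None] * (n + 1)] * (m + 1)
aliased[0][0] = 0
print(aliased[1][0])  # 0
```

With the aliased version, writing the answer for one state would silently overwrite the same column in every row, so the memo table would return wrong answers.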
</details>

### 1.7

Write a new function called `edit_distance_memo` that uses memoization to save the answers into a memo table.

In [None]:
def edit_distance_memo(s1, s2, m, n):
    pass

### Check your answer

Now test your solution on the following 4 test cases (just execute the following cell). Also make sure that you have already tested your recursive solution.

In [None]:
def test_large(fn):
    def test_1():
        s1 = "unfhabkcodvvnhywehylksbxqrpcogowrwhfxppmekrxlmwzvpigguswoigazmwnvkblbcnpjcejetsnafbapjvaykdetadfxtapsgspvacmrzyhhtmquvrltnibowfvjsypixrkfryoaxzrmiyhmzygusdngbcobbvdsvucutgrelrgruipbljesxjudogkqmnqwwqotapqdijmslphcaylzbelbyvbuhghneq"
        s2 = "qyquxbguzngqxvqifkizoyelwpnfbcbpcjjnhwxxcuaxjxigibwfwspuouzukzdrobbpcdszhihsizewhtsgtkjncyezdxlfagpgukultsotfxheydwgmjgjrjalxaojblsddnnoavseroauchhryhzjgtgpczlozhikkiaycsteolhecwfutoyrhkrmerdrhmyeecukly"
        
        m, n = len(s1), len(s2)

        check(fn(s1, s2, m, n), 199)
    
    def test_2():
        s1 = "mzvwhhkckeszquyxfmkvjffintsnhszyvsnjbyoudmlinismsestwagqemblfmrmlakcerxphlyiqxoqmxuqmvrkjapqynworjgjndchvnyawpygkbwqiknhbjflboaauzmkexigmfhkpsckamsgqvtbmpwmnaovdvxbmfowlmarkxwnhldtrwvtbdifi"
        s2 = "ovkruoltszfadqmitjvkjakrxljydrbcoxuyiglwoebvhhhqzkopgtyjjrajlpbtkvqcnokttiaurqpueczbzqdtifwwltyxrllihsdskzdqcivcefbpobrinmmmlpodkzimqxrmhzfvdopohtgxeqdtqaugxagqwvsjmvsktjxtlcsixxomkcrrcetjbymuviwcyvssngdkudczeurhbecpjaozavlftpdubowvhfdwqvkijmbvroko"
        
        m, n = len(s1), len(s2)

        check(fn(s1, s2, m, n), 209)
    
    test_1()
    test_2()

test_small(edit_distance_memo)
test_large(edit_distance_memo)

<details>
<summary><i>Click here to see the solution</i></summary><br/>
    <code>
def edit_distance_memo(str1, str2, m, n):
    memo = [[None for i in range(n+1)] for j in range(m+1)]
    return dp(str1, str2, m, n, memo)

def dp(str1, str2, m, n, memo):
    if m == 0:
        return n
    if n == 0:
        return m
    if memo[m][n] is not None:
        return memo[m][n]
    if str1[m-1] == str2[n-1]:
        memo[m][n] = dp(str1, str2, m-1, n-1, memo)
        return memo[m][n] 
    memo[m][n] = 1 + min(dp(str1, str2, m, n-1, memo),   # Insert
                       dp(str1, str2, m-1, n, memo),     # Remove 
                       dp(str1, str2, m-1, n-1, memo)    # Replace
                     )
    return memo[m][n]

</code>
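
As a variation on the explicit memo table, Python's standard library can do the bookkeeping. The sketch below swaps the 2-D list for `functools.lru_cache`, which stores results keyed by the arguments `(i, j)`. The name `edit_distance_cached` is hypothetical and this is only one possible way to write it, not the intended answer to the exercise.

```python
from functools import lru_cache


def edit_distance_cached(s1, s2, m, n):
    @lru_cache(maxsize=None)  # remembers the result for every (i, j) seen
    def dp(i, j):
        if i == 0:
            return j
        if j == 0:
            return i
        if s1[i - 1] == s2[j - 1]:
            return dp(i - 1, j - 1)
        return 1 + min(dp(i, j - 1),      # Insert
                       dp(i - 1, j),      # Remove
                       dp(i - 1, j - 1))  # Replace

    return dp(m, n)


print(edit_distance_cached("sunday", "saturday", 6, 8))  # 3
print(edit_distance_cached("eodh", "hmowpx", 4, 6))      # 5
```

The inner function closes over `s1` and `s2`, so only the two indices take part in the cache key. The recursion still goes as deep as roughly `m + n` calls, just like the hand-written memo version.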
</details>

## Remember the Steps

As you're solving other dynamic programming problems, remember the steps you followed here to break down the problem.

1. How could we break this problem into subproblems? Write down the subproblem in words. 

2. What are the base cases? What are their solutions?

3. How can we get from one subproblem to another subproblem? What are the cases?

4. Can you incorporate memoization?