The edit distance (or Levenshtein distance) between two strings `word1` and `word2` is the minimum number of operations required to transform one string into the other. The allowed operations are:

1. **Insert a character**
2. **Delete a character**
3. **Replace a character**

To solve this problem in Python, we typically use dynamic programming. Here's how you can approach it:

### Steps to Solve the Edit Distance Problem:

1. **Define the Problem:**
   - Let `word1` be the first string and `word2` be the second string.
   - We need to find the minimum number of operations to convert `word1` to `word2`.

2. **Initialize the DP Table:**
   - Create a 2D list `dp` where `dp[i][j]` represents the minimum number of operations required to convert the first `i` characters of `word1` to the first `j` characters of `word2`.

3. **Base Cases:**
   - If either string is empty, the edit distance is the length of the other string (all insertions or deletions).
   - `dp[i][0] = i` for all `i` (convert `word1` to an empty string by deleting all characters).
   - `dp[0][j] = j` for all `j` (convert an empty string to `word2` by inserting all characters).

4. **Fill the DP Table:**
   - For each pair of positions `(i, j)` with `i, j >= 1`, compare `word1[i-1]` with `word2[j-1]`.
   - If they are the same, no new operation is needed, so `dp[i][j] = dp[i-1][j-1]`.
   - If they are different, consider the minimum of the three possible operations (insert, delete, replace) and add one to the result:
     - Insert: `dp[i][j-1] + 1`
     - Delete: `dp[i-1][j] + 1`
     - Replace: `dp[i-1][j-1] + 1`

5. **Return the Result:**
   - The value at `dp[len(word1)][len(word2)]` will be the minimum number of operations required to convert `word1` to `word2`.

Here is the Python code implementing this approach:

```python
def minDistance(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)

    # Initialize the dp table with dimensions (m+1) x (n+1)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j

    # Fill the dp table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # Characters match, no operation needed
            else:
                dp[i][j] = min(dp[i - 1][j] + 1,    # Delete
                               dp[i][j - 1] + 1,    # Insert
                               dp[i - 1][j - 1] + 1)  # Replace

    return dp[m][n]

# Example usage:
word1 = "horse"
word2 = "ros"
print(minDistance(word1, word2))  # Output: 3
```
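The table in `minDistance` is built with a list comprehension rather than by multiplying an outer list. The sketch below shows why that matters in Python (the helper name `make_table` is hypothetical): `[[0] * (n + 1)] * (m + 1)` repeats one shared row object, so writing to one cell appears to change every row.

```python
def make_table(m: int, n: int) -> list[list[int]]:
    # One independent row per prefix length of word1 (0..m), each with n + 1 columns
    return [[0] * (n + 1) for _ in range(m + 1)]


good = make_table(2, 3)
good[1][1] = 7
print(good)  # [[0, 0, 0, 0], [0, 7, 0, 0], [0, 0, 0, 0]]

# Multiplying the outer list repeats the SAME row object three times
bad = [[0] * 4] * 3
bad[1][1] = 7
print(bad)  # [[0, 7, 0, 0], [0, 7, 0, 0], [0, 7, 0, 0]]
```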

### Explanation of Example Usage:
- `word1 = "horse"`, `word2 = "ros"`
- To transform "horse" into "ros":
  1. Replace 'h' with 'r' -> "rorse"
  2. Remove 'r' -> "rose"
  3. Remove 'e' -> "ros"
- Total of 3 operations.
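The three edits listed above can be replayed with ordinary string slicing. This is a sketch assuming a hypothetical `apply_edit` helper that takes a 0-based index into the current string:

```python
def apply_edit(word: str, op: str, index: int, char: str = "") -> str:
    # Hypothetical helper: perform one edit at a 0-based index using slicing
    if op == "insert":
        return word[:index] + char + word[index:]
    if op == "delete":
        return word[:index] + word[index + 1:]
    if op == "replace":
        return word[:index] + char + word[index + 1:]
    raise ValueError(f"unknown operation: {op}")


steps = [("replace", 0, "r"), ("delete", 2, ""), ("delete", 3, "")]
word = "horse"
for op, index, char in steps:
    word = apply_edit(word, op, index, char)
    print(op, "->", word)
# replace -> rorse
# delete -> rose
# delete -> ros
```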

The `minDistance` function calculates the minimum number of operations needed to convert `word1` into `word2` using the dynamic programming approach outlined above.

# 72. Edit Distance
Medium

### Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:
- Insert a character
- Delete a character
- Replace a character

```
Example 1:
    Input: word1 = "horse", word2 = "ros"
    Output: 3
    Explanation:
    horse -> rorse (replace 'h' with 'r')
    rorse -> rose (remove 'r')
    rose -> ros (remove 'e')
Example 2:
    Input: word1 = "intention", word2 = "execution"
    Output: 5
    Explanation:
    intention -> inention (remove 't')
    inention -> enention (replace 'i' with 'e')
    enention -> exention (replace 'n' with 'x')
    exention -> exection (replace 'n' with 'c')
    exection -> execution (insert 'u')

Constraints:
    0 <= word1.length, word2.length <= 500
    word1 and word2 consist of lowercase English letters.
```
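The same recurrence can also be written top-down, which some readers find closer to the problem statement. Below is a sketch using `functools.lru_cache` for memoization; `min_distance_memo` is a hypothetical name. One caveat: the recursion can go up to `len(word1) + len(word2)` calls deep, which at the 500-character limit can hit CPython's default recursion limit of 1000 (see `sys.getrecursionlimit()`), so the bottom-up table is the safer choice for large inputs.

```python
from functools import lru_cache


def min_distance_memo(word1: str, word2: str) -> int:
    # dist(i, j) = edit distance between word1[:i] and word2[:j]
    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        if i == 0:
            return j  # insert the remaining j characters of word2
        if j == 0:
            return i  # delete the remaining i characters of word1
        if word1[i - 1] == word2[j - 1]:
            return dist(i - 1, j - 1)  # last characters match: no cost
        return 1 + min(
            dist(i, j - 1),      # insert
            dist(i - 1, j),      # delete
            dist(i - 1, j - 1),  # replace
        )

    return dist(len(word1), len(word2))


print(min_distance_memo("horse", "ros"))            # 3
print(min_distance_memo("intention", "execution"))  # 5
```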

To understand the intuition behind solving the edit distance problem using a 2D array (dynamic programming table), let's break down the problem and the role of each element in the table:

### 1. Problem Breakdown

The goal is to find the minimum number of operations required to transform one string (`word1`) into another string (`word2`). The operations allowed are insertion, deletion, and substitution.

### 2. Dynamic Programming Table Setup

We use a 2D table `dp` where `dp[i][j]` represents the minimum number of operations required to transform the first `i` characters of `word1` into the first `j` characters of `word2`.

### 3. Initialization

- **Base Cases:**
  - `dp[0][j]`: Transforming an empty `word1` to the first `j` characters of `word2` requires `j` insertions.
  - `dp[i][0]`: Transforming the first `i` characters of `word1` to an empty `word2` requires `i` deletions.

### 4. Filling the DP Table

For each cell `dp[i][j]`:
- **If the characters are the same** (`word1[i-1] == word2[j-1]`), no new operation is needed, and thus `dp[i][j] = dp[i-1][j-1]`.
- **If the characters are different**, we consider the three possible operations (insert, delete, replace) and take the minimum of these operations plus one:
  - **Insert**: `dp[i][j-1] + 1`
  - **Delete**: `dp[i-1][j] + 1`
  - **Replace**: `dp[i-1][j-1] + 1`
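To see which of the three options wins at a given cell, here is a sketch with a hypothetical `fill_cell` helper that fills one cell and reports the move it used. When options tie, it keeps the first of insert, delete, replace. Run on "horse" and "ros", it reproduces the answer of 3.

```python
def fill_cell(dp: list[list[int]], word1: str, word2: str, i: int, j: int) -> str:
    # Hypothetical helper: assumes row i-1 and dp[i][j-1] are already filled
    if word1[i - 1] == word2[j - 1]:
        dp[i][j] = dp[i - 1][j - 1]  # characters match: copy the diagonal, no cost
        return "match"
    options = {
        "insert": dp[i][j - 1] + 1,       # insert word2[j-1] at the end
        "delete": dp[i - 1][j] + 1,       # delete word1[i-1]
        "replace": dp[i - 1][j - 1] + 1,  # replace word1[i-1] with word2[j-1]
    }
    move = min(options, key=options.get)  # ties keep the first key in this order
    dp[i][j] = options[move]
    return move


word1, word2 = "horse", "ros"
m, n = len(word1), len(word2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(m + 1):
    dp[i][0] = i
for j in range(n + 1):
    dp[0][j] = j

moves = {}
for i in range(1, m + 1):
    for j in range(1, n + 1):
        moves[(i, j)] = fill_cell(dp, word1, word2, i, j)

print(dp[m][n])       # 3
print(moves[(1, 1)])  # replace
print(moves[(2, 2)])  # match
```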

### 5. Intuition Behind the Diagonal

The diagonal neighbour `dp[i-1][j-1]` is significant because it represents the state in which both prefixes, without their last characters, have already been transformed optimally. Here is how each neighbour maps to an operation:

- **Diagonal Move (`dp[i-1][j-1]`)**:
  - If the characters are the same, no operation is needed, and we carry over the previous state (`dp[i-1][j-1]`).
  - If the characters are different, replacing `word1[i-1]` with `word2[j-1]` costs the diagonal value plus one substitution.

- **Vertical Move (`dp[i-1][j]`)**:
  - Represents deleting `word1[i-1]` from `word1`.

- **Horizontal Move (`dp[i][j-1]`)**:
  - Represents inserting `word2[j-1]` into `word1`.

The diagonal move is particularly important for matching characters and performing substitutions, making it a critical part of the transformation logic.
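The three moves also make it possible to recover which operations an optimal solution uses. This is a sketch assuming a hypothetical `edit_script` helper: after filling the table, it walks back from `dp[m][n]`, at each cell picking a neighbour whose value explains the current one (diagonal for a match or replacement, up for a deletion, left for an insertion). When several moves tie, it prefers them in that order, so it returns one optimal sequence, not necessarily the only one.

```python
def edit_script(word1: str, word2: str) -> list[str]:
    """Fill the table, then walk back from dp[m][n] to list one optimal edit sequence."""
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])

    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and word1[i - 1] == word2[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1  # diagonal, characters match: no operation
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"replace {word1[i - 1]!r} with {word2[j - 1]!r}")  # diagonal
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(f"delete {word1[i - 1]!r}")  # vertical
            i -= 1
        else:
            ops.append(f"insert {word2[j - 1]!r}")  # horizontal
            j -= 1
    return ops[::-1]  # collected from the end, so reverse into forward order


print(edit_script("horse", "ros"))
# ["replace 'h' with 'r'", "delete 'r'", "delete 'e'"]
```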

### Example to Illustrate

Consider transforming "intention" into "execution":

```python
word1 = "intention"
word2 = "execution"
```

The `dp` table will be filled as follows:

1. **Initialization**:
   - Set `dp[0][j] = j` for every `j` in `range(len(word2) + 1)`.
   - Set `dp[i][0] = i` for every `i` in `range(len(word1) + 1)`.

2. **Filling the Table**:
   - Iterate over each character pair (`i`, `j`) and update `dp[i][j]` based on whether the characters match or not.

Here is the complete filled table (rows are prefixes of `word1`, columns are prefixes of `word2`, and `""` is the empty prefix):

|    | "" | e | x | e | c | u | t | i | o | n |
|----|---|---|---|---|---|---|---|---|---|---|
| "" | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| i  | 1 | 1 | 2 | 3 | 4 | 5 | 6 | 6 | 7 | 8 |
| n  | 2 | 2 | 2 | 3 | 4 | 5 | 6 | 7 | 7 | 7 |
| t  | 3 | 3 | 3 | 3 | 4 | 5 | 5 | 6 | 7 | 8 |
| e  | 4 | 3 | 4 | 3 | 4 | 5 | 6 | 6 | 7 | 8 |
| n  | 5 | 4 | 4 | 4 | 4 | 5 | 6 | 7 | 7 | 7 |
| t  | 6 | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 7 | 8 |
| i  | 7 | 6 | 6 | 6 | 6 | 6 | 6 | 5 | 6 | 7 |
| o  | 8 | 7 | 7 | 7 | 7 | 7 | 7 | 6 | 5 | 6 |
| n  | 9 | 8 | 8 | 8 | 8 | 8 | 8 | 7 | 6 | 5 |

The final value at `dp[len(word1)][len(word2)]` gives the minimum edit distance, which is 5 in this case.
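To regenerate the table for any pair of words, which is handy for checking hand-filled values, here is a small sketch. `format_dp_table` is a hypothetical name, and the layout (right-aligned two-character columns, `""` for the empty prefix) is an arbitrary choice.

```python
def format_dp_table(word1: str, word2: str) -> str:
    # Build the full edit-distance table
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])

    # Lay it out with row/column labels; '""' stands for the empty prefix
    col_labels = ['""'] + list(word2)
    row_labels = ['""'] + list(word1)
    lines = ["    " + " ".join(f"{c:>2}" for c in col_labels)]
    for label, row in zip(row_labels, dp):
        lines.append(f"{label:>2}  " + " ".join(f"{v:>2}" for v in row))
    return "\n".join(lines)


print(format_dp_table("intention", "execution"))
```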

### Conclusion

Using a 2D array allows us to keep track of the optimal transformation costs for all subproblems (prefixes of the strings). The diagonal moves are particularly significant because they represent situations where characters are either matched (no change needed) or substituted, providing a clear pathway for optimal transformation decisions. Because every pair of prefixes is solved exactly once from three already-computed neighbours, the approach runs in O(m·n) time and uses O(m·n) space, where m and n are the lengths of the two strings.
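Each row of the table reads only the row above it and the cell to its left, so memory can be reduced to O(n) by keeping just two rows. This is a common refinement that the code in this document does not use; the sketch below assumes it, with the hypothetical name `min_distance_two_rows`.

```python
def min_distance_two_rows(word1: str, word2: str) -> int:
    # Only the previous row is needed to compute the current row
    m, n = len(word1), len(word2)
    prev = list(range(n + 1))  # row 0: "" -> word2[:j] costs j insertions
    for i in range(1, m + 1):
        curr = [i] + [0] * n   # column 0: word1[:i] -> "" costs i deletions
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                curr[j] = prev[j - 1]  # diagonal, no cost
            else:
                # delete (above), insert (left), replace (diagonal)
                curr[j] = 1 + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[n]


print(min_distance_two_rows("horse", "ros"))            # 3
print(min_distance_two_rows("intention", "execution"))  # 5
```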

# Why compare the previous cells for the value of the current cell?
In the edit distance problem, each cell `dp[i][j]` is computed from three neighbouring cells that have already been filled, so the solution is built up incrementally from shorter prefixes. Let's explore why this is necessary by looking at the nature of the problem and the dynamic programming approach used to solve it.

### Edit Distance Problem

The edit distance (or Levenshtein distance) between two strings is the minimum number of operations (insertions, deletions, or substitutions) required to transform one string into the other.

### Dynamic Programming Approach

We use a 2D array `dp` where `dp[i][j]` represents the edit distance between the first `i` characters of string `A` and the first `j` characters of string `B`.

### Character Comparison

When comparing characters from the two strings, we have three possible operations:
1. **Insertion**: Inserting a character into `A` to match `B`.
2. **Deletion**: Deleting a character from `A` to match `B`.
3. **Substitution**: Replacing a character in `A` to match `B`.

### Recursive Relation

To fill in `dp[i][j]`, we consider the following cases:

1. **Characters Match**: If the characters `A[i-1]` and `B[j-1]` are the same, no new operation is needed, so:
   \[
   dp[i][j] = dp[i-1][j-1]
   \]
   This means the edit distance for these prefixes equals the distance for the same prefixes with their last characters removed.

2. **Characters Don't Match**: If the characters `A[i-1]` and `B[j-1]` are different, we consider the minimum operations required from the previous steps and add 1 (for the current operation):
   \[
   dp[i][j] = 1 + \min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1])
   \]
   - `dp[i-1][j]`: Deletion (removing a character from `A`)
   - `dp[i][j-1]`: Insertion (adding a character to `A`)
   - `dp[i-1][j-1]`: Substitution (replacing a character in `A`)
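Putting the base cases from the earlier sections together with the two cases above gives the complete recurrence in one place:

\[
dp[i][j] =
\begin{cases}
j & \text{if } i = 0 \\
i & \text{if } j = 0 \\
dp[i-1][j-1] & \text{if } i, j \ge 1 \text{ and } A[i-1] = B[j-1] \\
1 + \min(dp[i-1][j],\ dp[i][j-1],\ dp[i-1][j-1]) & \text{otherwise}
\end{cases}
\]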

### Example

Let's consider two strings `A = "kitten"` and `B = "sitting"`.

Here's how the comparison and filling of the `dp` table work:

```plaintext
  "" s i t t i n g
"" 0 1 2 3 4 5 6 7
k 1 1 2 3 4 5 6 7
i 2 2 1 2 3 4 5 6
t 3 3 2 1 2 3 4 5
t 4 4 3 2 1 2 3 4
e 5 5 4 3 2 2 3 4
n 6 6 5 4 3 3 2 3
```

- `dp[1][1]`: Compare `k` and `s`. They are different. So, `dp[1][1] = 1 + min(dp[0][1], dp[1][0], dp[0][0]) = 1 + min(1, 1, 0) = 1`.
- `dp[2][2]`: Compare `i` and `i`. They are the same. So, `dp[2][2] = dp[1][1] = 1`.
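- And so on, until the bottom-right cell `dp[6][7] = 3`, which is the edit distance between "kitten" and "sitting".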

By looking at the previous indices, we ensure that we are considering all possible ways to transform the strings up to that point and choose the minimum operations required.
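Each cell can rely on its three neighbours because of the fill order: row by row, left to right, so `dp[i-1][j]`, `dp[i][j-1]` and `dp[i-1][j-1]` are always finished before `dp[i][j]`. The sketch below makes this explicit, assuming `None` marks an unfilled cell (`fill_checking_order` is a hypothetical name), and reproduces the kitten/sitting cells discussed above.

```python
def fill_checking_order(A: str, B: str):
    m, n = len(A), len(B)
    dp = [[None] * (n + 1) for _ in range(m + 1)]  # None = not computed yet
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            neighbours = (dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
            # Row-by-row, left-to-right order guarantees all three are already filled
            assert None not in neighbours
            if A[i - 1] == B[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(neighbours)
    return dp


dp = fill_checking_order("kitten", "sitting")
print(dp[1][1], dp[2][2], dp[6][7])  # 1 1 3
```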

In summary, comparing `A[i-1]` with `B[j-1]` and reading the three previously computed neighbours lets us build the solution incrementally, accounting for every possible operation (insertion, deletion, substitution) and choosing the cheapest one at each step.

### Why `word1[i-1] == word2[j-1]` rather than `word1[i] == word2[j]`?

The loops start at 1 because row 0 and column 0 hold the base cases for empty prefixes. Since `dp[i][j]` is the edit distance between the first `i` characters of `word1` and the first `j` characters of `word2`, the last characters of those prefixes are `word1[i-1]` and `word2[j-1]` (Python strings are 0-indexed). Comparing `word1[i]` and `word2[j]` would look one character too far, never compare `word1[0]` or `word2[0]`, and raise an `IndexError` when `i == m` or `j == n`.
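A quick check of this indexing in plain Python, reusing "horse" from the earlier example:

```python
word1 = "horse"

# Row i of dp describes the prefix word1[:i]; its last character is word1[i - 1]
for i in range(1, len(word1) + 1):
    print(i, repr(word1[:i]), "ends with", repr(word1[i - 1]))

# Indexing with word1[i] instead fails on the last row, where i == len(word1)
raised = False
try:
    word1[len(word1)]
except IndexError:
    raised = True
print("word1[len(word1)] raises IndexError:", raised)  # True
```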

```python
def minDistance(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)

    # Initialize the dp table with dimensions (m+1) x (n+1)
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    print("base case, ", dp)

    # Fill the dp table
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # Characters match, no operation needed
            else:
                dp[i][j] = min(dp[i - 1][j] + 1,    # Delete
                               dp[i][j - 1] + 1,    # Insert
                               dp[i - 1][j - 1] + 1)  # Replace
    print("\ndp: ", dp)
    return dp[m][n]
```

```python
word1 = "horse"
word2 = "ros"
minDistance(word1, word2)
```

```plaintext
base case,  [[0, 1, 2, 3], [1, 0, 0, 0], [2, 0, 0, 0], [3, 0, 0, 0], [4, 0, 0, 0], [5, 0, 0, 0]]

dp:  [[0, 1, 2, 3], [1, 1, 2, 3], [2, 2, 1, 2], [3, 2, 2, 2], [4, 3, 3, 2], [5, 4, 4, 3]]

3
```

```python
word1 = "intention"
word2 = "execution"
minDistance(word1, word2)
```

```plaintext
base case,  [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [2, 0, 0, 0, 0, 0, 0, 0, 0, 0], [3, 0, 0, 0, 0, 0, 0, 0, 0, 0], [4, 0, 0, 0, 0, 0, 0, 0, 0, 0], [5, 0, 0, 0, 0, 0, 0, 0, 0, 0], [6, 0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 0, 0, 0, 0, 0, 0, 0, 0, 0], [8, 0, 0, 0, 0, 0, 0, 0, 0, 0], [9, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

dp:  [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 1, 2, 3, 4, 5, 6, 6, 7, 8], [2, 2, 2, 3, 4, 5, 6, 7, 7, 7], [3, 3, 3, 3, 4, 5, 5, 6, 7, 8], [4, 3, 4, 3, 4, 5, 6, 6, 7, 8], [5, 4, 4, 4, 4, 5, 6, 7, 7, 7], [6, 5, 5, 5, 5, 5, 5, 6, 7, 8], [7, 6, 6, 6, 6, 6, 6, 5, 6, 7], [8, 7, 7, 7, 7, 7, 7, 6, 5, 6], [9, 8, 8, 8, 8, 8, 8, 7, 6, 5]]

5
```

```python
def func(word1: str, word2: str) -> int:
    # Rows follow word1 and columns follow word2,
    # so m must be len(word1) and n must be len(word2)
    m = len(word1)
    n = len(word2)

    # dp[i][j] = minimum number of operations to turn word1[:i] into word2[:j]
    # Neighbours used when the characters differ:
    #   insert  -> dp[i][j-1] + 1
    #   delete  -> dp[i-1][j] + 1
    #   replace -> dp[i-1][j-1] + 1
    dp = [[0] * (n + 1) for _ in range(m + 1)]

    # Base cases
    for j in range(n + 1):
        dp[0][j] = j  # empty word1 -> word2[:j]: only insertions
    for i in range(m + 1):
        dp[i][0] = i  # word1[:i] -> empty string: only deletions
    print(dp)

    # Iterate over every remaining cell, starting from index 1
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                # Case 1: same characters, take the upper-left diagonal value without adding 1
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # Case 2: different characters, take the smallest of the three neighbours plus 1
                least = min(dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1])
                dp[i][j] = least + 1

    print("\ndp: ", dp)
    # The bottom-right cell holds the answer for the full strings
    return dp[m][n]
```

```python
word1 = "intention"
word2 = "execution"
func(word1, word2)
```

```plaintext
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [2, 0, 0, 0, 0, 0, 0, 0, 0, 0], [3, 0, 0, 0, 0, 0, 0, 0, 0, 0], [4, 0, 0, 0, 0, 0, 0, 0, 0, 0], [5, 0, 0, 0, 0, 0, 0, 0, 0, 0], [6, 0, 0, 0, 0, 0, 0, 0, 0, 0], [7, 0, 0, 0, 0, 0, 0, 0, 0, 0], [8, 0, 0, 0, 0, 0, 0, 0, 0, 0], [9, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

dp:  [[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 1, 2, 3, 4, 5, 6, 6, 7, 8], [2, 2, 2, 3, 4, 5, 6, 7, 7, 7], [3, 3, 3, 3, 4, 5, 5, 6, 7, 8], [4, 3, 4, 3, 4, 5, 6, 6, 7, 8], [5, 4, 4, 4, 4, 5, 6, 7, 7, 7], [6, 5, 5, 5, 5, 5, 5, 6, 7, 8], [7, 6, 6, 6, 6, 6, 6, 5, 6, 7], [8, 7, 7, 7, 7, 7, 7, 6, 5, 6], [9, 8, 8, 8, 8, 8, 8, 7, 6, 5]]

5
```