### An autocorrect system is an application that changes misspelled words into the correct ones.



In [None]:
# In this notebook I'll show how to implement an autocorrect system, a very useful tool.
# This autocorrect system only looks for spelling errors, not contextual errors.



*The implementation can be divided into 4 steps:*

[1]. **Identify a misspelled word.**

[2]. **Find strings n Edit Distance away**

[3]. **Filter Candidates** (*keep only real words that are spelled correctly*)

[4]. **Calculate Word Probabilities.** (*Choose the most likely candidate to be the replacement*)

### 1. Identify a Misspelled Word

*To identify whether a word is misspelled, you can check whether it is in the dictionary / vocabulary.*

In [1]:
vocab = ['dean','deer','dear','fries','and','coke', 'congratulations', 'my']

word_test = 'Congratulations my deah'
word_test = word_test.lower()
word_test = word_test.split()

for word in word_test:
    if word in vocab:
        print(f'The word: {word} is in the vocab')
    else:
        print(f"The word: {word} isn't in the vocabulary")


The word: congratulations is in the vocab
The word: my is in the vocab
The word: deah isn't in the vocabulary
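Because `vocab` above is a list, every `in` check scans it element by element. A minimal sketch of the same check using a set (average constant-time lookups) that collects the misspelled words instead of printing them; `find_misspelled` is a hypothetical helper, not part of the notebook:

```python
def find_misspelled(sentence, vocab):
    """Return the lower-cased words of `sentence` that are not in `vocab`."""
    vocab_set = set(vocab)  # set membership is O(1) on average, unlike a list scan
    return [word for word in sentence.lower().split() if word not in vocab_set]

vocab = ['dean', 'deer', 'dear', 'fries', 'and', 'coke', 'congratulations', 'my']
print(find_misspelled('Congratulations my deah', vocab))  # ['deah']
```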


### 2. Find strings n Edit Distance Away

*An edit is an operation performed on a string to change it into another string. Edit distance counts the number of these operations.*

*So **n Edit Distance** tells you how many operations away one string is from another.*

*For this application we'll use the following edit costs (the Levenshtein distance operations, except that a replacement costs 2 instead of 1):*

* **Insert** - Operation where you insert a letter, the cost is equal to 1.

* **Delete** - Operation where you delete a letter, the cost is equal to 1.

* **Replace** - Operation where you replace one letter with another, the cost is equal to 2.

* **Switch** - Operation where you swap 2 **adjacent** letters. A switch counts as one edit when generating candidates, but it is not part of the minimum edit distance computed later.

*We'll also use the Minimum Edit Distance, the minimum total cost of the edits needed to transform one string into another. Candidate generation looks at strings up to n = 2 edits away, and the minimum edit distance, computed with a Dynamic Programming algorithm (explained when it is implemented), is used to evaluate our model.*
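A worked example of these costs, using the word pair from the dynamic programming section below: turning **play** into **stay** takes two replacements (p→s and l→t), and deleting the two letters and inserting the new ones costs exactly the same:

$$\text{cost}(\text{play} \rightarrow \text{stay}) = 2 \cdot rep\_cost = 2 \cdot 2 = 4 = 2 \cdot del\_cost + 2 \cdot ins\_cost$$

With the unit-cost Levenshtein weighting (replace cost 1) the same pair would be only 2 apart.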


In [2]:
# To implement these operations we need to split the word into 2 parts in all possible ways

word = 'dear'

split_word = [[word[:i], word[i:]] for i in range(len(word) + 1)]
for i in split_word:
    print(i)

['', 'dear']
['d', 'ear']
['de', 'ar']
['dea', 'r']
['dear', '']


In [3]:
# The delete operation needs to delete each possible letter from the original word.

delete_operation = [[L + R[1:]] for L, R in split_word if R ]

for i in delete_operation:
    print(i)

['ear']
['dar']
['der']
['dea']


In [4]:
# In the same way, the insert operation needs to insert every letter of the alphabet at every position of the original word

letters = 'abcdefghijklmnopqrstuvwxyz'
insert_operation = [L + s + R for L, R in split_word for s in letters]

print('the first insert operations: ')
print()
for i in insert_operation[:4]:
    print(i)
print('the last insert operations:')
print()
for i in insert_operation[-4:]:
    print(i)


the first insert operations: 

adear
bdear
cdear
ddear
the last insert operations:

dearw
dearx
deary
dearz


In [5]:
# Switch Operation

switch_operation = [[L[:-1] + R[0] + L[-1] + R[1:]] for L, R in split_word if R and L]

for i in switch_operation:
    print(i)


['edar']
['daer']
['dera']


In [6]:
# Replace Operation

letters = 'abcdefghijklmnopqrstuvwxyz'
replace_operation = [L + s + R[1:] for L, R in split_word if R for s in letters]

print('the first replace operations: ')
print()
for i in replace_operation[:4]:
    print(i)

print('the last replace operations:')
print()
for i in replace_operation[-4:]:
    print(i)

# Remember that at the end we need to remove the word itself
replace_operation = set(replace_operation)
replace_operation.discard('dear')

the first replace operations: 

aear
bear
cear
dear
the last replace operations:

deaw
deax
deay
deaz
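For a word of length n the raw candidate counts grow linearly: n deletes, n − 1 switches, 26n replaces and 26(n + 1) inserts, before duplicates and the word itself are removed. A short sketch that checks this for 'dear', matching the 4, 3, 104 and 130 items generated above; `raw_edit_counts` is a hypothetical helper:

```python
letters = 'abcdefghijklmnopqrstuvwxyz'

def raw_edit_counts(word):
    """Count the candidates each one-edit operation produces before deduplication."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [L + R[1:] for L, R in splits if R]
    switches = [L[:-1] + R[0] + L[-1] + R[1:] for L, R in splits if L and R]
    replaces = [L + s + R[1:] for L, R in splits if R for s in letters]
    inserts = [L + s + R for L, R in splits for s in letters]
    return {'delete': len(deletes), 'switch': len(switches),
            'replace': len(replaces), 'insert': len(inserts)}

print(raw_edit_counts('dear'))
# {'delete': 4, 'switch': 3, 'replace': 104, 'insert': 130}
```

With two edits the candidates from the first edit are themselves expanded, so the counts multiply, which is why the two-edit sets later in the notebook run into the thousands.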


### 3. Filter Candidates

*We only want to consider real, correctly spelled words from the candidate lists, so we compare each candidate to a known dictionary.*

*If a string does not appear in the dictionary, it is removed from the candidates, leaving a list of actual words only.*

In [7]:
vocab = ['dean','deer','dear','fries','and','coke', 'congratulations', 'my']

# for example, we can filter the replace-operation candidates against our vocab

filtered_words = [word for word in replace_operation if word in vocab]
print(filtered_words)

['deer', 'dean']
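When the vocabulary is stored as a set, the filtering step becomes a single set intersection. A sketch under that assumption (`toy_vocab` is the same word list as above, turned into a set), sorting the result so its order no longer depends on set iteration:

```python
toy_vocab = {'dean', 'deer', 'dear', 'fries', 'and', 'coke', 'congratulations', 'my'}
letters = 'abcdefghijklmnopqrstuvwxyz'
word = 'dear'

# all single-letter replacements of `word`, excluding the word itself
candidates = {word[:i] + s + word[i + 1:] for i in range(len(word)) for s in letters} - {word}

# set intersection keeps only the candidates that are real words
filtered_words = sorted(candidates & toy_vocab)
print(filtered_words)  # ['dean', 'deer']
```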


### 4. Calculate the Word Probabilities

*We need to find the most likely word in the candidate list. To calculate the probability of a word, we first calculate the word frequencies and count the total number of words in the body of text, or corpus.*

*So we compute the probability that each word will appear if randomly selected from the corpus of words.*

$$P(w_i) = \frac{C(w_i)}{M} \tag{Eq 01}$$
*where*

$C(w_i)$ *is the total number of times $w_i$ appears in the corpus.*

$M$ *is the total number of words in the corpus.*

*For example, the probability of the word 'am' in the sentence **'I am happy because I am learning'** is:*

$$P(\text{am}) = \frac{C(\text{am})}{M} = \frac{2}{7} \tag{Eq 02}$$
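The same arithmetic in code, as a small sketch using `collections.Counter` (the tool the notebook uses below) on the example sentence:

```python
from collections import Counter

corpus = 'I am happy because I am learning'.lower().split()
counts = Counter(corpus)   # C(w_i) for every word
M = len(corpus)            # total number of words in the corpus

p_am = counts['am'] / M
print(f"P(am) = {counts['am']}/{M} = {p_am:.4f}")  # P(am) = 2/7 = 0.2857
```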

### Now that we know the four steps of the autocorrect system, we can start implementing it

In [8]:
# import libraries
import re
from collections import Counter
import numpy as np
import pandas as pd

*The first step is data preprocessing. For this example we'll use the file **'shakespeare.txt'**, which should be in the current directory.*

In [9]:
def process_data(filename):
    """
    Input: 
        A file_name which is found in the current directory. We just have to read it in. 
    Output: 
        words: a list containing all the words in the corpus (text file you read) in lower case. 
    """
    
    words = []
    with open(filename, 'r') as f:
        text = f.read()
    
    words = re.findall(r'\w+', text)
    words = [word.lower() for word in words]
    
    return words
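The pattern `\w+` matches runs of letters, digits and underscores, so punctuation is dropped, contractions are split at the apostrophe, and numbers are kept as tokens. A quick check on a made-up line (not taken from `shakespeare.txt`):

```python
import re

sample = "O Romeo, Romeo! Wherefore art thou Romeo? Thou'rt 2 late."
tokens = [w.lower() for w in re.findall(r'\w+', sample)]
print(tokens)
# ['o', 'romeo', 'romeo', 'wherefore', 'art', 'thou', 'romeo', 'thou', 'rt', '2', 'late']
```

Fragments such as 'rt' therefore end up in the vocabulary as if they were words.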

In [10]:
words = process_data('shakespeare.txt')
vocab = set(words) # eliminate duplicates 

print(f'The vocabulary has {len(vocab)} unique words.')

The vocabulary has 6116 unique words.


*The second step is to count the frequency of every word in the corpus, which we need later to calculate the probabilities.*


In [11]:
def get_count(word_l):
    '''
    Input:
        word_l: a list of words representing the corpus. 
    Output:
        word_count_dict: The wordcount dictionary where key is the word and value is its frequency.
    '''
    word_count_dict = Counter(word_l)
    
    return word_count_dict


word_count_dict = get_count(words)
print(f'There are {len(word_count_dict)} key-value pairs')
print(f"The count for the word 'thee' is {word_count_dict.get('thee',0)}")

There are 6116 key-value pairs
The count for the word 'thee' is 240


*Now we calculate the probability of each word using Eq 01:*

In [12]:

def get_probs(word_count_dict):
    '''
    Input:
        word_count_dict: The wordcount dictionary where key is the word and value is its frequency.
    Output:
        probs: A dictionary where keys are the words and the values are the probability that a word will occur. 
    '''
    
    total_words = sum(word_count_dict.values())  # M: total number of words in the corpus

    probs = {word: value / total_words for word, value in word_count_dict.items()}

    return probs

probs = get_probs(word_count_dict)
print(f"Length of probs is {len(probs)}")
print(f"P('thee') is {probs['thee']:.4f}")

Length of probs is 6116
P('thee') is 0.0045
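Once the probabilities exist, step 4 reduces to choosing the candidate with the highest $P(w_i)$. A minimal sketch with made-up probabilities (`toy_probs` and `toy_candidates` are hypothetical, not the values computed above):

```python
# Made-up probabilities for illustration only (not the values computed from shakespeare.txt)
toy_probs = {'days': 0.0004, 'dye': 0.00002, 'thee': 0.0045}
toy_candidates = {'days', 'dye', 'dys'}  # 'dys' is not a known word, so it has no probability

# keep only candidates that have a probability, then pick the most likely one
best = max((w for w in toy_candidates if w in toy_probs), key=toy_probs.get)
print(best)  # days
```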


*Now that we have computed $P(w_i)$ for all the words in the corpus, we'll write the delete, insert, switch and replace functions that manipulate strings, so that we can edit the erroneous strings and return the right spellings of the words.*

In [13]:
def delete_letter(word, verbose = False):
    '''
    Input:
        word: the string/word for which you will generate all possible words 
                in the vocabulary which have 1 missing character
    Output:
        delete_l: a list of all possible strings obtained by deleting 1 character from word
    '''
    
    delete = []
    split_word = []
    
    split_word = [[word[:i], word[i:]] for i in range(len(word))]
    
    delete = [L + R[1:] for L, R in split_word if R]
    
    if verbose: print(f"input word {word}, \nsplit_word = {split_word}, \ndelete_word = {delete}")

    return delete

delete_word = delete_letter(word="cans",
                        verbose=True)

input word cans, 
split_word = [['', 'cans'], ['c', 'ans'], ['ca', 'ns'], ['can', 's']], 
delete_word = ['ans', 'cns', 'cas', 'can']


In [14]:
def switch_letter(word, verbose = False):
    '''
    Input:
        word: input string
    Output:
        switches: a list of all possible strings with one adjacent character switched
    ''' 
    
    switch = []
    split_word = []
    
    split_word = [[word[:i], word[i:]] for i in range(len(word))]
    
    switch = [L[:-1] + R[0] + L[-1] + R[1:] for L, R in split_word if L and R]
    
    if verbose: print(f"Input word = {word} \nsplit = {split_word} \nswitch = {switch}") 

    return switch

switch_word_l = switch_letter(word="eta",
                         verbose=True)

Input word = eta 
split = [['', 'eta'], ['e', 'ta'], ['et', 'a']] 
switch = ['tea', 'eat']


In [15]:
def replace_letter(word, verbose=False):
    '''
    Input:
        word: the input string/word 
    Output:
        replaces: a list of all possible strings where we replaced one letter from the original word. 
    ''' 
    
    letters = 'abcdefghijklmnopqrstuvwxyz'
    replace = []
    split_word = []
    

    split_word = [(word[:i], word[i:]) for i in range(len(word))]
    
    replace = [L + s + (R[1:] if len(R) > 1 else '') for L, R in split_word if R for s in letters ]
    
    # we need to remove the actual word from the list
    replace = set(replace)
    replace.discard(word)

   
    
    replace = sorted(list(replace)) # turn the set back into a list and sort it, for easier viewing
    
    if verbose: print(f"Input word = {word} \nsplit = {split_word} \nreplace {replace}")   
    
    return replace

replace_l = replace_letter(word='can',
                              verbose=True)

Input word = can 
split = [('', 'can'), ('c', 'an'), ('ca', 'n')] 
replace ['aan', 'ban', 'caa', 'cab', 'cac', 'cad', 'cae', 'caf', 'cag', 'cah', 'cai', 'caj', 'cak', 'cal', 'cam', 'cao', 'cap', 'caq', 'car', 'cas', 'cat', 'cau', 'cav', 'caw', 'cax', 'cay', 'caz', 'cbn', 'ccn', 'cdn', 'cen', 'cfn', 'cgn', 'chn', 'cin', 'cjn', 'ckn', 'cln', 'cmn', 'cnn', 'con', 'cpn', 'cqn', 'crn', 'csn', 'ctn', 'cun', 'cvn', 'cwn', 'cxn', 'cyn', 'czn', 'dan', 'ean', 'fan', 'gan', 'han', 'ian', 'jan', 'kan', 'lan', 'man', 'nan', 'oan', 'pan', 'qan', 'ran', 'san', 'tan', 'uan', 'van', 'wan', 'xan', 'yan', 'zan']


In [16]:

def insert_letter(word, verbose=False):
    '''
    Input:
        word: the input string/word 
    Output:
        inserts: a list of all possible strings with one new letter inserted at every offset
    ''' 
    letters = 'abcdefghijklmnopqrstuvwxyz'
    insert = []
    split_word = []
    

    split_word = [(word[:i], word[i:]) for i in range(len(word) + 1 )]
    insert = [L + s + R for L, R in split_word  for s in letters]



    if verbose: print(f"Input word {word} \nsplit = {split_word} \ninsert = {insert}")
    
    return insert

insert = insert_letter('at', True)
print(f"Number of strings output by insert_letter('at') is {len(insert)}")

Input word at 
split = [('', 'at'), ('a', 't'), ('at', '')] 
insert = ['aat', 'bat', 'cat', 'dat', 'eat', 'fat', 'gat', 'hat', 'iat', 'jat', 'kat', 'lat', 'mat', 'nat', 'oat', 'pat', 'qat', 'rat', 'sat', 'tat', 'uat', 'vat', 'wat', 'xat', 'yat', 'zat', 'aat', 'abt', 'act', 'adt', 'aet', 'aft', 'agt', 'aht', 'ait', 'ajt', 'akt', 'alt', 'amt', 'ant', 'aot', 'apt', 'aqt', 'art', 'ast', 'att', 'aut', 'avt', 'awt', 'axt', 'ayt', 'azt', 'ata', 'atb', 'atc', 'atd', 'ate', 'atf', 'atg', 'ath', 'ati', 'atj', 'atk', 'atl', 'atm', 'atn', 'ato', 'atp', 'atq', 'atr', 'ats', 'att', 'atu', 'atv', 'atw', 'atx', 'aty', 'atz']
Number of strings output by insert_letter('at') is 78
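The 78 strings are not all distinct: inserting a letter next to an identical letter gives the same result from two positions (an 'a' before or after the first 'a' both yield 'aat'). This is one reason the next step converts the candidates to a set. A small check, with the insert logic re-implemented under the hypothetical name `inserts_for` so the snippet runs on its own:

```python
letters = 'abcdefghijklmnopqrstuvwxyz'

def inserts_for(word):
    """Insert every letter at every position (same logic as insert_letter above)."""
    split_word = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    return [L + s + R for L, R in split_word for s in letters]

inserted = inserts_for('at')
print(len(inserted), len(set(inserted)))  # 78 76

# strings produced more than once
duplicates = sorted({w for w in inserted if inserted.count(w) > 1})
print(duplicates)  # ['aat', 'att']
```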


*Now that we have implemented the string manipulations, we'll create two functions that, given a string, will return all the possible single and double edits on that string. These will be `edit_one_letter()` and `edit_two_letters()`.*

In [17]:
def edit_one_letter(word, allow_switches = True):
    """
    Input:
        word: the string/word for which we will generate all possible words that are one edit away.
        allow_switches: the 'switch' edit is less common, so it is only included when this is True.
    Output:
        edit_one_set: a set of all strings one edit away from word.
    """
    
    edit_one_set = set(insert_letter(word))
    edit_one_set.update(delete_letter(word))
    edit_one_set.update(replace_letter(word))
    if allow_switches:
        edit_one_set.update(switch_letter(word))
    
    # exclude the word itself (e.g. switching two identical adjacent letters returns it unchanged)
    edit_one_set.discard(word)
    
    return edit_one_set

tmp_word = "at"
tmp_edit_one_set = edit_one_letter(tmp_word)
# turn this into a list to sort it, in order to view it
tmp_edit_one = sorted(list(tmp_edit_one_set))

print(f"input word: {tmp_word} \nedit_one \n{tmp_edit_one}\n")
print(f"The type of the returned object should be a set {type(tmp_edit_one_set)}")
print(f"Number of outputs from edit_one_letter('at') is {len(edit_one_letter('at'))}")

input word: at 
edit_one 
['a', 'aa', 'aat', 'ab', 'abt', 'ac', 'act', 'ad', 'adt', 'ae', 'aet', 'af', 'aft', 'ag', 'agt', 'ah', 'aht', 'ai', 'ait', 'aj', 'ajt', 'ak', 'akt', 'al', 'alt', 'am', 'amt', 'an', 'ant', 'ao', 'aot', 'ap', 'apt', 'aq', 'aqt', 'ar', 'art', 'as', 'ast', 'ata', 'atb', 'atc', 'atd', 'ate', 'atf', 'atg', 'ath', 'ati', 'atj', 'atk', 'atl', 'atm', 'atn', 'ato', 'atp', 'atq', 'atr', 'ats', 'att', 'atu', 'atv', 'atw', 'atx', 'aty', 'atz', 'au', 'aut', 'av', 'avt', 'aw', 'awt', 'ax', 'axt', 'ay', 'ayt', 'az', 'azt', 'bat', 'bt', 'cat', 'ct', 'dat', 'dt', 'eat', 'et', 'fat', 'ft', 'gat', 'gt', 'hat', 'ht', 'iat', 'it', 'jat', 'jt', 'kat', 'kt', 'lat', 'lt', 'mat', 'mt', 'nat', 'nt', 'oat', 'ot', 'pat', 'pt', 'qat', 'qt', 'rat', 'rt', 'sat', 'st', 't', 'ta', 'tat', 'tt', 'uat', 'ut', 'vat', 'vt', 'wat', 'wt', 'xat', 'xt', 'yat', 'yt', 'zat', 'zt']

The type of the returned object should be a set <class 'set'>
Number of outputs from edit_one_letter('at') is 129
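The 129 strings for 'at' break down as 76 distinct inserts, 2 deletes, 50 replaces and 1 switch; the groups do not overlap, because they differ in length or in content. A self-contained sketch that recomputes each group, re-implementing the four helpers compactly under the hypothetical name `one_edit_groups`:

```python
letters = 'abcdefghijklmnopqrstuvwxyz'

def one_edit_groups(word):
    """Return the set of strings produced by each one-edit operation, minus the word itself."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    groups = {
        'insert': {L + s + R for L, R in splits for s in letters},
        'delete': {L + R[1:] for L, R in splits if R},
        'replace': {L + s + R[1:] for L, R in splits if R for s in letters},
        'switch': {L[:-1] + R[0] + L[-1] + R[1:] for L, R in splits if L and R},
    }
    return {name: strings - {word} for name, strings in groups.items()}

groups = one_edit_groups('at')
print({name: len(strings) for name, strings in groups.items()})
# {'insert': 76, 'delete': 2, 'replace': 50, 'switch': 1}
print(len(set().union(*groups.values())))  # 129
```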


In [18]:
def edit_two_letters(word, allow_switches = True):
    '''
    Input:
        word: the input string/word 
        allow_switches: whether adjacent-letter switches are allowed in either edit
    Output:
        edit_two_set: a set of strings with all possible two edits
    '''
    
    edit_two_set = set()
    
    # apply every one-letter edit to every result of a first one-letter edit;
    # the result also contains strings reachable in fewer than two edits, including the word itself
    for first_edit in edit_one_letter(word, allow_switches = allow_switches):
        edit_two_set.update(edit_one_letter(first_edit, allow_switches = allow_switches))
    
    return edit_two_set



In [19]:
tmp_edit_two_set = edit_two_letters("a")
tmp_edit_two_l = sorted(list(tmp_edit_two_set))
print(f"Number of strings with edit distance of two: {len(tmp_edit_two_l)}")
print(f"First 10 strings {tmp_edit_two_l[:10]}")
print(f"Last 10 strings {tmp_edit_two_l[-10:]}")
print(f"The data type of the returned object should be a set {type(tmp_edit_two_set)}")
print(f"Number of strings that are 2 edit distances from 'at' is {len(edit_two_letters('at'))}")

Number of strings with edit distance of two: 2654
First 10 strings ['', 'a', 'aa', 'aaa', 'aab', 'aac', 'aad', 'aae', 'aaf', 'aag']
Last 10 strings ['zv', 'zva', 'zw', 'zwa', 'zx', 'zxa', 'zy', 'zya', 'zz', 'zza']
The data type of the returned object should be a set <class 'set'>
Number of strings that are 2 edit distances from 'at' is 7154


*Now we will combine `edit_one_letter` and `edit_two_letters` to get the candidate strings one or two edits away from our word, keep the ones that are real words, and return the most probable of them as suggestions for the word that was typed.*

In [20]:
def get_corrections(word, probs, vocab, n=2, verbose = False):
    '''
    Input: 
        word: a user entered string to check for suggestions
        probs: a dictionary that maps each word to its probability in the corpus
        vocab: a set containing all the vocabulary
        n: number of possible word corrections you want returned
    Output: 
        n_best: a list of [word, probability] pairs with the n most probable corrections, most probable first.
    '''
    
    # If the word is in the vocab, suggest it as is; otherwise use the real words one edit away;
    # if there are none, use the real words two edits away; if there are still none, keep the input word.
    if word in vocab:
        suggestions = [word]
    else:
        suggestions = (list(edit_one_letter(word).intersection(vocab))
                       or list(edit_two_letters(word).intersection(vocab))
                       or [word])

    # pair each suggestion with its probability (0 if unseen) and keep the n most probable
    n_best = sorted([[s, probs.get(s, 0)] for s in suggestions], key=lambda pair: pair[1], reverse=True)[:n]
    
    if verbose: print("entered word = ", word, "\nsuggestions = ", set(suggestions))

    return n_best



In [21]:
my_word = 'dys' 
tmp_corrections = get_corrections(my_word, probs, vocab, 2, verbose=True)
for i, word_prob in enumerate(tmp_corrections):
    print(f"word {i}: {word_prob[0]}, probability {word_prob[1]:.6f}")

best_word = max(tmp_corrections, key=lambda pair: pair[1])[0]
print(f'The highest score for all the candidates is the word {best_word}')


entered word =  dys 
suggestions =  {'days', 'dye'}
word 0: days, probability 0.000410
word 1: dye, probability 0.000019
The highest score for all the candidates is the word days


*Now that we have implemented the auto-correct system, how do you evaluate the similarity between two strings? For example: 'waht' and 'what'.*

*Also how do you efficiently find the shortest path to go from the word, 'waht' to the word 'what'?*

*We will implement a dynamic programming system that will tell you the minimum number of edits required to convert a string into another string.*

### Dynamic Programming

*Dynamic Programming breaks a problem down into subproblems which can be combined to form the final solution. Here, for every prefix source[0..i] of the source and every prefix target[0..j] of the target, we compute the edit distance between the two prefixes. To do this efficiently, we use a table that stores the results already computed for shorter prefixes and reuse them to calculate the results for longer ones.*

*You have to create a matrix and update each element in the matrix as follows:*

$$\text{Initialization}$$

\begin{align}
D[0,0] &= 0 \\
D[i,0] &= D[i-1,0] + del\_cost(source[i]) \tag{eq 03}\\
D[0,j] &= D[0,j-1] + ins\_cost(target[j]) \\
\end{align}

*So converting the source word **play** to the target word **stay**, using an insert cost of 1, a delete cost of 1, and a replace cost of 2, gives the following table:*
<table style="width:20%">

  <tr>
    <td> <b> </b>  </td>
    <td> <b># </b>  </td>
    <td> <b>s </b>  </td>
    <td> <b>t </b> </td> 
    <td> <b>a </b> </td> 
    <td> <b>y </b> </td> 
  </tr>
   <tr>
    <td> <b>  #  </b></td>
    <td> 0</td> 
    <td> 1</td> 
    <td> 2</td> 
    <td> 3</td> 
    <td> 4</td> 
 
  </tr>
  <tr>
    <td> <b>  p  </b></td>
    <td> 1</td> 
 <td> 2</td> 
    <td> 3</td> 
    <td> 4</td> 
   <td> 5</td>
  </tr>
   
  <tr>
    <td> <b> l </b></td>
    <td>2</td> 
    <td>3</td> 
    <td>4</td> 
    <td>5</td> 
    <td>6</td>
  </tr>

  <tr>
    <td> <b> a </b></td>
    <td>3</td> 
     <td>4</td> 
     <td>5</td> 
     <td>4</td>
     <td>5</td> 
  </tr>
  
   <tr>
    <td> <b> y </b></td>
    <td>4</td> 
      <td>5</td> 
     <td>6</td> 
     <td>5</td>
     <td>4</td> 
  </tr>
  

</table>



*The operations used in this algorithm are 'insert', 'delete', and 'replace'. These correspond to the functions that we defined earlier: insert_letter(), delete_letter() and replace_letter(). switch_letter() is not used here.*

*The diagram below describes how to initialize the table. Each entry in D[i,j] represents the minimum cost of converting string source[0:i] to string target[0:j]. The first column is initialized to represent the cumulative cost of deleting the source characters to convert string "EER" to "". The first row is initialized to represent the cumulative cost of inserting the target characters to convert from "" to "NEAR".*

<div style="font-size:100%; text-align:center;"><img src='EditDistInit4.PNG' alt="Initializing the distance matrix" style="width:1000px;height:400px;"/> Figure 1 Initializing Distance Matrix</div>     

*Note that the formula for $D[i,j]$ shown in the image is equivalent to:*

\begin{align}
D[i,j] = \min
\begin{cases}
D[i-1,j] + del\_cost\\
D[i,j-1] + ins\_cost\\
D[i-1,j-1] + \begin{cases}
rep\_cost & \text{if } src[i] \neq tar[j]\\
0 & \text{if } src[i] = tar[j]
\end{cases}
\end{cases}
\tag{5}
\end{align}

*The variable `sub_cost` (for substitution cost) is the same as `rep_cost` (replacement cost). We will stick with the term "replace" whenever possible.*

<div style="font-size:100%; text-align:center;"><img src='EditDistExample1.PNG' alt="Edit distance matrix examples" style="width:1200px;height:400px;"/> Figure 2 Examples Distance Matrix</div>    
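The same recurrence can also be evaluated top-down, caching each subproblem so it is solved only once. This is a memoized alternative to the bottom-up table that `min_edit_distance()` fills in the next cell, not the notebook's own implementation. A sketch using `functools.lru_cache`; `min_edit_distance_recursive` is a hypothetical name:

```python
from functools import lru_cache

def min_edit_distance_recursive(source, target, ins_cost=1, del_cost=1, rep_cost=2):
    """Top-down version of the D[i, j] recurrence; D(i, j) is the cost of
    converting source[:i] into target[:j]."""

    @lru_cache(maxsize=None)
    def D(i, j):
        if i == 0:
            return j * ins_cost      # insert the first j target characters
        if j == 0:
            return i * del_cost      # delete the first i source characters
        r_cost = 0 if source[i - 1] == target[j - 1] else rep_cost
        return min(D(i - 1, j) + del_cost,
                   D(i, j - 1) + ins_cost,
                   D(i - 1, j - 1) + r_cost)

    return D(len(source), len(target))

print(min_edit_distance_recursive('play', 'stay'))  # 4
print(min_edit_distance_recursive('eer', 'near'))   # 3
```

The recursion depth grows with len(source) + len(target), so for long strings the iterative table is the safer choice in Python.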

In [22]:
def min_edit_distance(source, target, ins_cost = 1, del_cost = 1, rep_cost = 2):
    '''
    Input: 
        source: a string corresponding to the string you are starting with
        target: a string corresponding to the string you want to end with
        ins_cost: an integer setting the insert cost
        del_cost: an integer setting the delete cost
        rep_cost: an integer setting the replace cost
    Output:
        D: a matrix of len(source)+1 by len(target)+1 containing minimum edit distances
        med: the minimum edit distance (med) required to convert the source string to the target
    '''
    
    m = len(source)
    n = len(target)
    
    # initialize cost matrix with zeros and dimensions (m+1, n+1)
    D = np.zeros((m+1, n+1), dtype = int)
    
    # Fill in column 0, from row 1 to row m, both inclusive
    for row in range(1, m+1):
        D[row, 0] = D[row -1, 0] + del_cost
     
    # Fill in row 0, for all columns from 1 to n, both inclusive
    for column in range(1, n+1):
        D[0, column] = D[0, column - 1] + ins_cost
        
    # Loop through row 1 to row m, both inclusive
    for row in range(1, m+1):
        
        # Loop through column 1 to column n, both inclusive
        for column in range(1, n+1):
            
            # initialize r_cost to the 'replace' cost that is passed into this function
            r_cost = rep_cost
            
            # check to see if source character at the previous row
            # matches the target character at the previous column
            if source[row - 1] == target[column - 1]:
                # Update the replacement cost to 0 if source and
                # target are equal
                r_cost = 0
                
            # Update the cost at (row, column) based on previous entries in the cost matrix
            # Refer to the equation to calculate D[i,j] (the minimum of the three calculated)
            D[row, column] = min([D[row-1, column] + del_cost, D[row, column-1] + ins_cost, D[row-1, column-1] + r_cost])
            
    # Set the minimum edit distance with the cost found at row m, column n
    
    med = D[m, n]
    return D, med

In [23]:
# testing your implementation 
source =  'play'
target = 'stay'
matrix, min_edits = min_edit_distance(source, target)
print("minimum edits: ",min_edits, "\n")
idx = list('#' + source)
cols = list('#' + target)
df = pd.DataFrame(matrix, index=idx, columns= cols)
print(df)

minimum edits:  4 

   #  s  t  a  y
#  0  1  2  3  4
p  1  2  3  4  5
l  2  3  4  5  6
a  3  4  5  4  5
y  4  5  6  5  4


In [24]:
# testing your implementation 
source =  'eer'
target = 'near'
matrix, min_edits = min_edit_distance(source, target)
print("minimum edits: ",min_edits, "\n")
idx = list(source)
idx.insert(0, '#')
cols = list(target)
cols.insert(0, '#')
df = pd.DataFrame(matrix, index=idx, columns= cols)
print(df)

minimum edits:  3 

   #  n  e  a  r
#  0  1  2  3  4
e  1  2  1  2  3
e  2  3  2  3  4
r  3  4  3  4  3
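The table gives the cost but not the edits themselves. To recover the cheapest path asked about earlier (for example from 'waht' to 'what'), one can walk back from D[m, n] to D[0, 0], at each cell taking a move whose cost explains the value stored there. A sketch of such a backtrace; when several moves tie it prefers the diagonal, so it returns one optimal sequence rather than all of them. `backtrace` and `edit_distance_table` are hypothetical helpers, the latter a compact re-implementation of the table above so the block runs on its own:

```python
import numpy as np

def edit_distance_table(source, target, ins_cost=1, del_cost=1, rep_cost=2):
    """Bottom-up table with the same recurrence as min_edit_distance()."""
    m, n = len(source), len(target)
    D = np.zeros((m + 1, n + 1), dtype=int)
    D[1:, 0] = np.arange(1, m + 1) * del_cost
    D[0, 1:] = np.arange(1, n + 1) * ins_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            r_cost = 0 if source[i - 1] == target[j - 1] else rep_cost
            D[i, j] = min(D[i - 1, j] + del_cost, D[i, j - 1] + ins_cost, D[i - 1, j - 1] + r_cost)
    return D

def backtrace(source, target, D, ins_cost=1, del_cost=1, rep_cost=2):
    """Walk from D[m, n] back to D[0, 0], recording one optimal sequence of edits."""
    i, j = len(source), len(target)
    ops = []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            same = source[i - 1] == target[j - 1]
            r_cost = 0 if same else rep_cost
            if D[i, j] == D[i - 1, j - 1] + r_cost:   # diagonal move: keep or replace
                ops.append(f"keep '{source[i - 1]}'" if same
                           else f"replace '{source[i - 1]}' with '{target[j - 1]}'")
                i, j = i - 1, j - 1
                continue
        if i > 0 and D[i, j] == D[i - 1, j] + del_cost:  # vertical move: delete
            ops.append(f"delete '{source[i - 1]}'")
            i -= 1
        else:                                            # horizontal move: insert
            ops.append(f"insert '{target[j - 1]}'")
            j -= 1
    return ops[::-1]

for src, tgt in [('play', 'stay'), ('eer', 'near')]:
    table = edit_distance_table(src, tgt)
    print(src, '->', tgt, 'cost', table[-1, -1], backtrace(src, tgt, table))
```

For 'eer' → 'near' this yields insert 'n', keep 'e', replace 'e' with 'a', keep 'r', whose costs 1 + 2 add up to the minimum edit distance of 3.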
