## Null Space Research

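We implement a dynamic-programming algorithm that computes the **Levenshtein edit distance** between two strings: the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into the other. One application is suggesting corrections for spelling errors.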

- For instance, suppose a user meant to write "awesome" but typed "ossome" instead. We compute the **Levenshtein distance** between the misspelled word and a set of candidate English words.

- The candidate with the smallest Levenshtein distance is the most likely intended word. In our example, we compare **ossome** against the words **awesome** and **amazing**.

- The Levenshtein distance between **ossome** and **awesome** is 3, well below the distance of 7 between **ossome** and **amazing**.

- The first pair is therefore much closer in edit distance, so we conclude that **awesome** is the word the user most likely intended, and it can be suggested as the correct spelling.
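
The distance is computed by filling a table $D$, where $D_{i,j}$ is the edit distance between the first $i$ characters of the source $s$ and the first $j$ characters of the target $t$. As a sketch of the recurrence (the symbols $c_{\text{del}}$, $c_{\text{ins}}$, and $c_{\text{sub}}$ are notation introduced here for the deletion, insertion, and substitution costs; the standard Levenshtein distance sets $c_{\text{del}} = c_{\text{ins}} = 1$):

$$
\begin{aligned}
D_{0,0} &= 0, \qquad D_{i,0} = i, \qquad D_{0,j} = j, \\
D_{i,j} &= \min
\begin{cases}
D_{i-1,\,j} + c_{\text{del}} \\
D_{i,\,j-1} + c_{\text{ins}} \\
D_{i-1,\,j-1} + c_{\text{sub}}(i, j)
\end{cases}
\qquad
c_{\text{sub}}(i, j) =
\begin{cases}
0 & \text{if } s_i = t_j \\
1 & \text{otherwise}
\end{cases}
\end{aligned}
$$

The final answer is $D_{n,m}$, where $n$ and $m$ are the lengths of the source and target.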

In [98]:
import numpy as np
from nltk.corpus import words
from nltk.corpus import brown

In [142]:
def levenshtein(source, target):
    n = len(source)
    m = len(target)
    
    S = {i+1: source[i] for i in range(len(source))}
    T = {i+1: target[i] for i in range(len(target))}
    
    # D[i, j] holds the edit distance between source[:i] and target[:j]
    D = np.zeros(shape=(n+1, m+1), dtype=int)
    
    # initialize: turning a prefix into the empty string (or back) costs one edit per character
    for i in range(1, n+1):
        D[i, 0] = D[i-1, 0] + 1
    for j in range(1, m+1):
        D[0, j] = D[0, j-1] + 1
    
    # compute
    
    for i in range(1, n+1):
        for j in range(1, m+1):
            d_cost = 1  # deletion: unit cost, as in the standard Levenshtein distance
            i_cost = 1  # insertion
            # substitution is free when the characters already match
            if S[i] == T[j]:
                sub_cost = 0
            else:
                sub_cost = 1

            D[i, j] = min(D[i-1, j] + d_cost, D[i-1, j-1] + sub_cost, D[i, j-1] + i_cost)
    
    return int(D[n, m])

In [150]:
levenshtein('ossome', 'awesome')

3

In [151]:
levenshtein('ossome', 'amazing')

7

In [152]:
print("since LEVENSHTEIN('ossome', 'awesome') = 3 < LEVENSHTEIN('ossome', 'amazing') = 7 ...")
print()
print("we conclude that 'awesome' is the intended word when the user spelled 'ossome'")

since LEVENSHTEIN('ossome', 'awesome') = 3 < LEVENSHTEIN('ossome', 'amazing') = 7 ...

we conclude that 'awesome' is the intended word when the user spelled 'ossome'
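
The comparison above can be generalized to any list of candidate words by picking the one with the smallest distance. The following is a minimal sketch, not the notebook's own implementation: `edit_distance` and `suggest` are hypothetical helper names, the distance assumes unit costs for all three edit operations, and the two-word candidate list stands in for a real dictionary.

```python
def edit_distance(source, target):
    """Unit-cost Levenshtein distance, keeping only two rows of the DP table."""
    # previous[j] is the distance between the empty prefix of source and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        # deleting the first i characters of source yields the empty string
        current = [i]
        for j, t_char in enumerate(target, start=1):
            sub_cost = 0 if s_char == t_char else 1
            current.append(min(
                previous[j] + 1,             # deletion
                current[j - 1] + 1,          # insertion
                previous[j - 1] + sub_cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def suggest(word, candidates):
    """Return the candidate closest to `word`; ties go to the earliest candidate."""
    return min(candidates, key=lambda candidate: edit_distance(word, candidate))


# Hypothetical candidate list standing in for a full dictionary
candidates = ["awesome", "amazing"]
print(suggest("ossome", candidates))  # prints: awesome
```

Scanning a full dictionary this way costs one distance computation per candidate, so a practical spell checker would likely narrow the candidates first, for example by word length or first letter.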
