## Levenshtein distance

The Levenshtein distance is a string metric for measuring the difference between two sequences. Informally, the Levenshtein distance between two words is the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one word into the other. It is named after Vladimir Levenshtein, who defined the metric in 1965.

Levenshtein distance is also referred to as edit distance, although that term may also denote a larger family of distance metrics. It is closely related to pairwise string alignment.
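The informal definition can be written as a recurrence over prefix lengths. The notation here is an assumption chosen to match the code in the following cells: $x$ and $y$ are the two strings, and $\operatorname{lev}(i, j)$ is the distance between the first $i$ characters of $x$ and the first $j$ characters of $y$.

```latex
\operatorname{lev}(i, j) =
\begin{cases}
  j & \text{if } i = 0, \\
  i & \text{if } j = 0, \\
  \min \begin{cases}
    \operatorname{lev}(i-1, j) + 1, \\
    \operatorname{lev}(i, j-1) + 1, \\
    \operatorname{lev}(i-1, j-1) + [x_i \neq y_j]
  \end{cases} & \text{otherwise,}
\end{cases}
```

Here $[x_i \neq y_j]$ is 1 when the two characters differ and 0 when they match. The distance between the full strings is $\operatorname{lev}(|x|, |y|)$. For example, "kitten" becomes "sitting" in three edits: kitten → sitten → sittin → sitting.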

## Recursive

In [4]:
import datetime

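def levenshteinDistance_recursion(x, y):
    """Return the Levenshtein distance between x and y by plain recursion."""

    def helper(xl, yl):
        # Distance between the prefixes x[:xl] and y[:yl].
        if xl == 0:
            return yl  # insert the remaining yl characters

        if yl == 0:
            return xl  # delete the remaining xl characters

        # Substitution is free when the last characters already match.
        cost = 0 if x[xl-1] == y[yl-1] else 1

        return min(helper(xl-1, yl) + 1,        # deletion
                   helper(xl, yl-1) + 1,        # insertion
                   helper(xl-1, yl-1) + cost)   # substitution or match

    return helper(len(x), len(y))

In [5]:
start = datetime.datetime.now()
print(levenshteinDistance_recursion('kittens', 'sittings'))
print('non-DP:', datetime.datetime.now()-start)

3
non-DP: 0:00:00.091681


In [6]:
start = datetime.datetime.now()
print(levenshteinDistance_recursion('elephant', 'element'))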
print('non-DP:', datetime.datetime.now()-start)

3
non-DP: 0:00:00.106364
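
The timings above reflect how often the plain recursion re-solves the same pair of prefixes. As a rough illustration that is not part of the original notebook, the sketch below counts how many times the helper body runs with and without a cache. `count_calls` is a hypothetical name introduced here, and `functools.lru_cache` stands in for a hand-written cache.

```python
from functools import lru_cache


def count_calls(x, y, memoize):
    """Return (distance, number of times the helper body runs)."""
    calls = 0

    def helper(xl, yl):
        nonlocal calls
        calls += 1  # counts real evaluations only; cache hits skip the body
        if xl == 0:
            return yl
        if yl == 0:
            return xl
        cost = 0 if x[xl - 1] == y[yl - 1] else 1
        return min(helper(xl - 1, yl) + 1,
                   helper(xl, yl - 1) + 1,
                   helper(xl - 1, yl - 1) + cost)

    if memoize:
        # Rebinding the name makes the recursive calls go through the cache too.
        helper = lru_cache(maxsize=None)(helper)

    return helper(len(x), len(y)), calls


for memoize in (False, True):
    distance, calls = count_calls('kittens', 'sittings', memoize)
    print('memoized' if memoize else 'plain', distance, calls)
```

With a cache, the helper body runs at most once per pair of prefix lengths, so the count is bounded by `(len(x) + 1) * (len(y) + 1)`. Without one, the count grows exponentially with the string lengths.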


## Dynamic Programming

In [7]:
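def levenshteinDistance(x, y):
    """Return the Levenshtein distance between x and y using memoized recursion."""

    def helper(xl, yl, maps):
        # Distance between the prefixes x[:xl] and y[:yl].
        if xl == 0:
            return yl

        if yl == 0:
            return xl

        # Cache key for this pair of prefix lengths.
        combo = (xl, yl)

        if combo not in maps:
            cost = 0 if x[xl-1] == y[yl-1] else 1

            # Each subproblem is solved once and then read back from maps.
            maps[combo] = min(helper(xl-1, yl, maps) + 1,        # deletion
                              helper(xl, yl-1, maps) + 1,        # insertion
                              helper(xl-1, yl-1, maps) + cost)   # substitution or match

        return maps[combo]

    return helper(len(x), len(y), {})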

In [8]:
start = datetime.datetime.now()
print(levenshteinDistance('dog', 'doc'))
print('DP:', datetime.datetime.now()-start)

1
DP: 0:00:00.000466


In [9]:
start = datetime.datetime.now()
print(levenshteinDistance('kittens', 'sittings'))
print('DP:', datetime.datetime.now()-start)

3
DP: 0:00:00.001189


In [10]:
start = datetime.datetime.now()
print(levenshteinDistance('elephant', 'element'))
print('DP:', datetime.datetime.now()-start)

3
DP: 0:00:00.001179
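
The memoized version fills its cache top-down through recursion. A common alternative, sketched here as an assumption rather than as part of the original notebook, fills the same table bottom-up and keeps only two rows in memory. `levenshtein_bottom_up` is a hypothetical name introduced for this example.

```python
def levenshtein_bottom_up(x, y):
    """Iterative Levenshtein distance using two rows of the DP table."""
    # Row 0: turning the empty prefix of x into y[:j] takes j insertions.
    previous = list(range(len(y) + 1))

    for i in range(1, len(x) + 1):
        # Turning x[:i] into the empty string takes i deletions.
        current = [i]
        for j in range(1, len(y) + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current

    return previous[len(y)]


print(levenshtein_bottom_up('kittens', 'sittings'))  # 3, as in the cells above
```

This variant avoids recursion depth limits on long inputs and uses memory proportional to `len(y)` instead of `len(x) * len(y)`.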
