Exercises: Calculating Minimum Edit Distance with Dynamic Programming
===

---
Levenshtein Distance
---

![](http://image.slidesharecdn.com/knnandtreedistance2-090624110504-phpapp02/95/tree-distance-algorithm-5-728.jpg?cb=1245841553)

Levenshtein distance (LD) measures how different two strings are: the smaller the distance, the more similar the strings.

The distance is the minimum number of single-character deletions, insertions, or substitutions required to transform one string into the other.
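
Before optimizing anything, it can help to see the definition written out directly. The sketch below is a brute-force recursion, not the dynamic programming solution asked for in the exercise; `naive_levenshtein` is a hypothetical name introduced here, and the function is only practical for short strings because it recomputes the same subproblems many times.

```python
def naive_levenshtein(s1, s2):
    """Brute-force edit distance, written straight from the definition.

    Hypothetical reference helper: exponential time, short strings only.
    """
    if not s1:
        return len(s2)  # insert every remaining character of s2
    if not s2:
        return len(s1)  # delete every remaining character of s1
    if s1[0] == s2[0]:
        # First characters already match, so no edit is needed here
        return naive_levenshtein(s1[1:], s2[1:])
    return 1 + min(
        naive_levenshtein(s1[1:], s2),      # delete s1[0]
        naive_levenshtein(s1, s2[1:]),      # insert s2[0] in front of s1
        naive_levenshtein(s1[1:], s2[1:]),  # substitute s1[0] with s2[0]
    )


print(naive_levenshtein('foo', 'poo'))         # 1
print(naive_levenshtein('kitten', 'sitting'))  # 3
```

A slow reference like this is handy for checking a faster implementation on small inputs.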

We are going to calculate the Levenshtein distance using dynamic programming.

In [1]:
def levenshtein(s1, s2):
    """Takes 2 strings, returns Levenshtein distance.
    
    See https://en.wikipedia.org/wiki/Levenshtein_distance
    """
    
    if len(s1) < len(s2): # Swap the arguments so that s1 is always the longer string (bookkeeping to be consistent)
        return levenshtein(s2, s1)
 
    if len(s2) == 0: # If the shorter string is empty,
        return len(s1) # the cost is simply dropping all the letters of the other word, i.e. its length
 
    previous_row = range(len(s2)+1) # First row: one entry per prefix of the second word, so length of the second word + 1
   
    for i, c1 in enumerate(s1): # Iterate through the first word
        current_row = [i + 1]
        for j, c2 in enumerate(s2): # Iterate through the second word
            insertions = previous_row[j + 1] + 1 
            deletions = None # TODO: WRITE CODE TO DETERMINE DELETIONS     
            substitutions = None # TODO: WRITE CODE TO DETERMINE SUBSTITUTIONS
            current_row.append(min(insertions, deletions, substitutions))
        previous_row = current_row
 
    return previous_row[-1]

Hints
----

> The sooner you start to code, the longer the program will take.  
> – Roy Carlson, University of Wisconsin

- You will write 2 lines of code totaling ~140 characters. Like most computer science challenges, it will require more thinking than writing.

> If you can’t write it down in English, you can’t code it.
> – Peter Halpern

- Practice the algorithm on your white desk table several times before trying to code.
 

-----

Let's compare your solution to the implementation that ships with NLTK.

In [2]:
from nltk.metrics.distance import edit_distance 

In [3]:
edit_distance('foo', 'poo') # Assumes a substitution cost of 1

1

In [5]:
pairs = [('foo', 'poo'),
         ('intention', 'execution')]

for pair in pairs:
    assert levenshtein(*pair) == edit_distance(*pair)

TypeError: unorderable types: NoneType() < int()

TODO: extend the function to take a substitution cost argument

In [5]:
def levenshtein(s1, s2, cost_sub):
    """Takes 2 words and a cost of substitution, returns Levenshtein distance.
    """
    pass

In [7]:
assert levenshtein('foo', 'poo', cost_sub=2) == 2
assert levenshtein('intention', 'execution', cost_sub=2) == 8

TypeError: levenshtein() got an unexpected keyword argument 'cost_sub'

----
Challenge Exercises
----

__1)__ Making change

![](http://i1183.photobucket.com/albums/x464/elj4s/Screenshot2012-09-20at42821PM_zpsdb433335.png)

__Problem__: the making change problem. Given a set of currency denominations, the objective is to determine the smallest number of pieces of currency required to make change for a given amount.

This algorithm could be applied for automated self-checkout at the grocery store.

![](https://si.wsj.net/public/resources/images/MK-CG842_HIGHDE_G_20131006174545.jpg)

__Solution__:
Today we are going to explore dynamic programming (DP). 

For example, if the denominations are \$1 and \$2 and we need to make change for \$3, we would use \$1 + \$2, i.e. 2 pieces of currency. 

However, if the amount were \$4, we could use \$1+\$1+\$1+\$1, \$1+\$1+\$2, or \$2+\$2, and the minimum number of pieces would be 2 (\$2+\$2). 
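
The \$4 example can be checked by brute force. The sketch below simply enumerates every multiset of denominations that sums to the target; `all_ways` is a hypothetical helper introduced here for illustration, and it is far too slow for real amounts, which is the motivation for dynamic programming.

```python
from itertools import combinations_with_replacement


def all_ways(denominations, amount):
    """List every multiset of denominations that sums to amount (brute force)."""
    ways = []
    # Using only the smallest denomination gives the largest possible piece count
    max_pieces = amount // min(denominations)
    for k in range(1, max_pieces + 1):
        for combo in combinations_with_replacement(sorted(denominations), k):
            if sum(combo) == amount:
                ways.append(combo)
    return ways


ways = all_ways([1, 2], 4)
print(ways)                # [(2, 2), (1, 1, 2), (1, 1, 1, 1)]
print(min(ways, key=len))  # (2, 2)
```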

The minimum number of coins required to make change for \$P is the number of coins required to make change for the amount \$P-x, plus 1, where x is whichever denomination makes that total smallest (+1 because we need one more coin of value x to get us from \$P-x to \$P). 

This can be stated mathematically as: 

> Let us assume that we have $n$ currencies of distinct denominations, where the denomination of currency $i$ is $v_i$. We can sort the currencies by denomination so that $v_1<v_2<v_3<\dots<v_n$.

> Let us use $C(p)$ to denote the minimum number of pieces of currency required to make change for $\$p$.

> Using the principles of recursion, $C(p)=\min_{i:\,v_i \le p} C(p-v_i)+1$, with the base case $C(0)=0$.

> For example, assume we want to make 5, and $v_1=1, v_2=2, v_3=3$. <br>
Therefore $C(5) = \min(C(5-1)+1, C(5-2)+1, C(5-3)+1)$  $\Longrightarrow \min(C(4)+1, C(3)+1, C(2)+1)$. Since $C(4)=2$, $C(3)=1$ and $C(2)=1$, we get $C(5)=\min(3, 2, 2)=2$.
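
The recurrence for $C(p)$ translates almost line for line into a memoized recursion. This is a minimal sketch rather than a solution to the exercise below: the hypothetical `min_coins` returns only the count, not the list of values, it takes $C(0)=0$ as the base case, and it returns infinity for amounts that cannot be made, which is an assumption of this sketch.

```python
from functools import lru_cache


def min_coins(denominations, amount):
    """Minimum number of pieces needed to make amount, or inf if impossible."""
    denominations = tuple(denominations)

    @lru_cache(maxsize=None)  # cache C(p) so each amount is solved only once
    def c(p):
        if p == 0:
            return 0  # base case: no pieces are needed to make 0
        best = float('inf')
        for v in denominations:
            if v <= p:
                # C(p) = min over i of C(p - v_i) + 1
                best = min(best, c(p - v) + 1)
        return best

    return c(amount)


print(min_coins([1, 2, 3], 5))        # 2
print(min_coins([1, 5, 21, 25], 63))  # 3
```

The cache is what makes this dynamic programming: without it, the same amounts would be solved over and over.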

In [7]:
def make_change(currency=[], value=0):
    """Given a list of currency denominations and the target value, return the minimum number of pieces of currency and the list of their values.
    Dynamic programming (DP) is an efficient solution.    
    """
    pass

In [8]:
assert make_change(currency=[1, 5, 10], value=10) == (1, [10]) # 1 piece of currency, value of 10
assert make_change(currency=[1, 5, 10], value=15) == (2, [10, 5]) # 2 pieces of currency, values of 10 and 5
assert make_change(currency=[1, 5, 10], value=30) == (3, [10, 10, 10]) 
assert make_change(currency=[1, 5, 21, 25], value=63) == (3, [21, 21, 21])
assert make_change(currency=[5, 10], value=3) == 'No solution possible' # Error handling

AssertionError: 

<br>
<br>
---