# Chapter 8: Dynamic Programming (Completed 6/26: 23%)

## Edit Distance

### 8.1 [3]

Typists often make transposition errors exchanging neighboring characters, such as typing “setve” when you mean “steve.” This requires two substitutions to fix under the conventional definition of edit distance. Incorporate a swap operation into our edit distance function, so that such neighboring transposition errors can be fixed at the cost of one operation.

*Solution:*

The following is nearly verbatim from the text (including errata), except for the addition of option 4, which is the new swap operation.

Let $D[i,j]$ be the minimum number of differences between the segment of $P$ ending at position $i$ and the segment of $T$ ending at position $j$. $D[i,j]$ is the *minimum* of the **four** possible ways to extend smaller strings:

1. If ($P_i = T_j$), then $D[i-1, j-1]$, else $D[i-1, j-1] + 1$. This means we either match or substitute the $i$th and $j$th characters, depending upon whether the tail characters are the same.
2. $D[i - 1, j] + 1$. This means that there is an extra character in the pattern to account for, so we do not advance the text pointer and pay the cost of an insertion.
3. $D[i, j - 1] + 1$. This means that there is an extra character in the text to remove, so we do not advance the pattern pointer and pay the cost of a deletion.
4. If $i, j \geq 2$, $P_{i} = T_{j-1}$, and $P_{i-1} = T_{j}$, then $D[i-2, j-2] + 1$. This means that the characters in $P$ at positions $i-1$ and $i$ are the same as those in $T$ at positions $j-1$ and $j$, only swapped. We pay a cost of 1 for the swap and advance both pointers past the swapped pair.
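The four options translate directly into a bottom-up table. The following is a minimal Python sketch (the name `edit_distance_with_swaps` is our own), assuming unit cost for all four operations. It charges option 4 as $D[i-2, j-2] + 1$, i.e. one unit for the swap on top of the cost of the prefixes before the swapped pair; charging $D[i-1, j-1]$ instead would let two overlapping swaps share a single unit of cost, so that "xyx" and "yxy" would come out 1 apart rather than 2. This variant, in which no character is edited twice, is often called the *optimal string alignment* distance.

```python
def edit_distance_with_swaps(P: str, T: str) -> int:
    """Edit distance where substitution, insertion, deletion, and a swap of
    two adjacent characters each cost 1 (optimal string alignment)."""
    n, m = len(P), len(T)
    # D[i][j] = cost of turning P[:i] into T[:j]; row and column 0 are base cases.
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i
    for j in range(m + 1):
        D[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Python strings are 0-indexed, so P_i is P[i - 1].
            mismatch = 0 if P[i - 1] == T[j - 1] else 1
            D[i][j] = min(
                D[i - 1][j - 1] + mismatch,  # option 1: match or substitute
                D[i - 1][j] + 1,             # option 2: extra pattern character
                D[i][j - 1] + 1,             # option 3: extra text character
            )
            # option 4: the last two characters appear in swapped order
            if i > 1 and j > 1 and P[i - 1] == T[j - 2] and P[i - 2] == T[j - 1]:
                D[i][j] = min(D[i][j], D[i - 2][j - 2] + 1)
    return D[n][m]
```

For example, `edit_distance_with_swaps("setve", "steve")` returns 1, and `edit_distance_with_swaps("xyx", "yxy")` returns 2.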

### 8.2 [4]

Suppose you are given three strings of characters: $X$, $Y$, and $Z$, where $|X| = n$, $|Y| = m$, and $|Z| = n + m$. $Z$ is said to be a *shuffle* of $X$ and $Y$ iff $Z$ can be formed by interleaving the characters from $X$ and $Y$ in a way that maintains the left-to-right ordering of the characters from each string.

**(a):** Show that "*cchocohilaptes*" is a shuffle of "*chocolate*" and "*chips*", but "*chocochilatspe*"
is not.  
**(b):** Give an efficient dynamic-programming algorithm that determines whether $Z$ is a shuffle of $X$ and $Y$. Hint: the values of the dynamic programming matrix you construct should be Boolean, not numeric.

*Solution:*

**(a):**

"$\text{cchocohilaptes}$" can be written as "$\text{(c)choco(h)(i)la(p)te(s)}$" where letters surrounded by parentheses indicate they are from the word "chips", and letters without parentheses are from "chocolate". The letters in each group are in the same order as in their parent word. Therefore "cchocohilaptes" is indeed a shuffle of "chocolate" and "chips".

However, within "$\text{chocochilatspe}$", there is an "s" followed by a "p". Neither of these letters appears in "chocolate", and so they must both be from "chips". But in "chips" their order is "p" then "s", while in "chocochilatspe" it is "s" then "p". Proper shuffles must maintain the letter orders of the parent strings. Therefore "chocochilatspe" cannot be a shuffle of "chocolate" and "chips".

**(b):**

We wish to answer the question "can string $Z$ be expressed as a shuffle of strings $X$ and $Y$?". This is a boolean question. We would like to express this question in terms of smaller substrings, enabling us to construct a dynamic programming matrix of answers to identical questions on these smaller substrings.

Two natural possibilities are to parametrize the matrix with an index $k$ along string $Z$, or with two indices $i$ and $j$, the first along $X$ and the second along $Y$. The two are related by $i + j = k$: if the first $i$ characters of $X$ and the first $j$ characters of $Y$ can be shuffled to match the beginning of $Z$, that beginning substring of $Z$ must have exactly $i + j = k$ characters. However, $k$ alone does not tell us how many of those characters came from $X$ and how many from $Y$, and different splits can lead to different answers later on. This suggests using a 2D matrix indexed by $i$ and $j$, instead of a 1D array indexed by $k$.

Let $B[i,j]$ indicate whether the first $i$ characters of $X$ and the first $j$ characters of $Y$ can be shuffled to match the first $i + j = k$ characters of $Z$.

Two subtle but important things to note. First, our definition of $B$ assumes that the beginning of each string is padded with a space; otherwise index $i$ would refer to the first $i+1$ characters of $X$. For example, $X[1]$ would refer to the *second* character of $X$ rather than the *first*. Second, this padding is necessary for another reason: if a word has $n$ letters, we need to be able to express using anywhere from 0 of these letters up to all $n$ of them, so we need $n+1$ different values for our index.

$B$ can then be written recursively as:

$$ B[i,j] = \left(B[i-1, j] \text{ AND } Z[i+j]=X[i]\right)\text{ OR } \left( B[i, j-1] \text{ AND } Z[i+j]=Y[j] \right)$$

meaning the current character of $Z$, $Z[i+j]$, must match either $X[i]$ or $Y[j]$, and for each, the previous substring of $Z$ up to character $i + j -1$ must be a shuffle with that parent string's index decremented.

Basis cases:  
$B[0, j] = B[0, j-1] \text{ AND } Z[j]=Y[j]$  
$B[i, 0] = B[i-1, 0] \text{ AND } Z[i]=X[i]$  
$B[0, 0] = \text{True}$

Note that the matrix will be of dimension $(|X|+1)\times(|Y|+1) = (n+1) \times (m+1)$. In the program below, however, $n$ and $m$ are defined as the lengths of strings $X$ and $Y$ *after* the strings have been padded with a space.

The algorithm is $O(nm)$ since there are two nested loops of size $O(n)$ and $O(m)$, with constant time operations within the inner loop.

```python
def is_shuffle(X, Y, Z):
    # Pad each string with a leading space so that X[i] is the i-th character (1-indexed).
    X = ' ' + X
    Y = ' ' + Y
    Z = ' ' + Z

    n = len(X)  # |X| + 1
    m = len(Y)  # |Y| + 1

    # A shuffle must use every character of X and Y exactly once.
    if (n + m - 1) != len(Z):
        return False

    # B[i][j] is True iff X[1..i] and Y[1..j] can be shuffled into Z[1..i+j].
    B = [[False] * m for _ in range(n)]

    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                B[i][j] = True
            elif i == 0:
                B[i][j] = B[0][j-1] and Y[j] == Z[j]
            elif j == 0:
                B[i][j] = B[i-1][0] and X[i] == Z[i]
            else:
                B[i][j] = ((B[i][j-1] and Y[j] == Z[i+j]) or
                           (B[i-1][j] and X[i] == Z[i+j]))

    return B[n-1][m-1]

X = 'chocolate'
Y = 'chips'
Z_True = 'cchocohilaptes' # True
Z_False = 'chocochilatspe' # False

print(is_shuffle(X, Y, Z_True))
print(is_shuffle(X, Y, Z_False))
```

```
True
False
```
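The Boolean table can also be walked backwards to recover *which* string each character of $Z$ came from, reproducing the parenthesized marking used in part (a). The following is a sketch under our own conventions (0-indexed strings without padding, and a hypothetical `shuffle_origin` name). When a character could have come from either string it prefers $X$, so the marking it produces may differ from the hand-made one while still being valid.

```python
def shuffle_origin(X: str, Y: str, Z: str):
    """Return Z with the characters taken from Y wrapped in parentheses,
    or None if Z is not a shuffle of X and Y."""
    n, m = len(X), len(Y)
    if n + m != len(Z):
        return None
    # B[i][j]: can X[:i] and Y[:j] be shuffled into Z[:i + j]?
    B = [[False] * (m + 1) for _ in range(n + 1)]
    B[0][0] = True
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and B[i - 1][j] and X[i - 1] == Z[i + j - 1]:
                B[i][j] = True
            if j > 0 and B[i][j - 1] and Y[j - 1] == Z[i + j - 1]:
                B[i][j] = True
    if not B[n][m]:
        return None
    # Walk back from (n, m), recording where each character of Z came from.
    pieces = []
    i, j = n, m
    while i + j > 0:
        if i > 0 and B[i - 1][j] and X[i - 1] == Z[i + j - 1]:
            pieces.append(X[i - 1])            # character came from X
            i -= 1
        else:
            pieces.append('(' + Y[j - 1] + ')')  # character came from Y
            j -= 1
    return ''.join(reversed(pieces))

print(shuffle_origin('chocolate', 'chips', 'cchocohilaptes'))
```

For the second string, `shuffle_origin('chocolate', 'chips', 'chocochilatspe')` returns `None`.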


***

## Greedy Algorithms

### 8.5 [4]

Let $P_1, P_2, . . . , P_n$ be $n$ programs to be stored on a disk with capacity $D$ megabytes. Program $P_i$ requires $s_i$ megabytes of storage. We cannot store them all because $D < \sum_{i=1}^{n} s_i$.

**(a):** Does a greedy algorithm that selects programs in order of nondecreasing $s_i$ maximize the number of programs held on the disk? Prove or give a counterexample.  
**(b):** Does a greedy algorithm that selects programs in nonincreasing order of $s_i$ use as much of the capacity of the disk as possible? Prove or give a counterexample.

*Solution:*

**(a):**

The statement is true: a greedy algorithm would maximize the number of programs held on the disk. We'll prove this by contradiction.

First, let:  
$P = $set of $n$ programs  
$P^{(S)} = $ our "solution" set, a subset of $P$  
$P^{(U)} = $ the set of programs **not** in the solution; the complement of $P^{(S)}$; not empty since "we cannot store them all"  
$P_{largest}^{(S)} = $ the largest program in the solution  
$P_{smallest}^{(U)} = $ the smallest program **not** in the solution  

A solution $P^{(S)}$ is *greedy* if $P_i < P_{largest}^{(S)}$ implies $P_i \in P^{(S)}$, that is, if all the programs smaller than $P_{largest}^{(S)}$ are also in the solution $P^{(S)}$. This is equivalent to the condition: $P^{(S)}$ is *greedy* if $P_{smallest}^{(U)} \geq P_{largest}^{(S)}$, that is, if the smallest program **not** in the solution is larger than or equal to the largest program **in** the solution.

To prove that the greedy solution contains the largest possible number of programs, we'll assume otherwise and produce a contradiction. Suppose that $P^{(S)}$ contains the largest possible number of programs while having a total size of at most $D$, and that it is not greedy. Further, assume that a greedy solution would contain a smaller number of programs, and therefore would not be optimal.


Since the solution set $P^{(S)}$ is not the greedy solution, there must be programs **not** in the solution that are smaller than $P_{largest}^{(S)}$. Specifically, we must have $P_{smallest}^{(U)} < P_{largest}^{(S)}$. We now propose the following algorithm for converting the given non-greedy solution into the greedy solution: we keep placing the smallest not-included program $P_{smallest}^{(U)}$ into the solution set if there is room, and if there isn't, we swap it with the largest program currently in the solution $P_{largest}^{(S)}$. Repeat this until the greedy solution is reached. In pseudocode:

$\hspace{2em} \text{while } P_{smallest}^{(U)} < P_{largest}^{(S)}:$  
$\hspace{4em} \text{if } \text{size}(P^{(S)}) + P_{smallest}^{(U)} \leq D:$  
$\hspace{6em} \text{add } P_{smallest}^{(U)} \text{ to } P^{(S)}$  
$\hspace{4em} \text{else}:$  
$\hspace{6em} \text{swap } P_{smallest}^{(U)} \text{ with } P_{largest}^{(S)}$  
$\hspace{2em} \text{If there is room, extend } P^{(S)} \text{ with the greedy algorithm}$

We wish to show 2 things:

**(1)** *that this process can always be done*: as long as the solution isn't greedy, and therefore $P_{smallest}^{(U)} < P_{largest}^{(S)}$, swapping them can only decrease the total storage used by the solution, so the swap always fits on the disk; and

**(2)** *that the resulting greedy solution contains the same number of programs or greater than the original solution*: the swap operations maintain the same number of programs in the solution, but adding $P_{smallest}^{(U)}$ into the solution adds an additional program; therefore the number of programs in the solution can only increase.

Therefore the greedy solution will be at least as good (i.e., contain at least as many programs) as the original non-greedy solution. And if it contains the same number of programs, then it frees up more space on the disk by swapping larger programs for smaller ones.
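The exchange argument can be spot-checked numerically. This is a minimal sketch, assuming integer program sizes and small inputs (the function names are our own): `greedy_count` implements the smallest-first rule, and `best_count` tries every subset, which is exponential and only meant as a check.

```python
import random
from itertools import combinations

def greedy_count(sizes, D):
    """Smallest-first greedy: take programs in nondecreasing size while they fit."""
    used = count = 0
    for s in sorted(sizes):
        if used + s > D:
            break  # every remaining program is at least this large
        used += s
        count += 1
    return count

def best_count(sizes, D):
    """Exhaustive search over all subsets (exponential; small inputs only)."""
    for k in range(len(sizes), -1, -1):
        if any(sum(c) <= D for c in combinations(sizes, k)):
            return k
    return 0

# Compare the two on random instances where the disk cannot hold everything.
random.seed(1)
for _ in range(500):
    sizes = [random.randint(1, 20) for _ in range(random.randint(1, 8))]
    D = random.randint(0, sum(sizes) - 1)
    assert greedy_count(sizes, D) == best_count(sizes, D)
```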

**(b):**

No. Suppose we have three programs of sizes 3, 2, and 2, and that our disk size is $D = 4$. The greedy algorithm takes the program of size 3 first, after which neither program of size 2 fits, producing the solution set $\{3|{-1}\}$ and leaving 1 unit of free space. However, the solution $\{2, 2 | 0\}$ fills the disk completely, leaving no free space.
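A quick numerical check of a largest-first counterexample, with program sizes 3, 2, 2 and $D = 4$. This is a sketch with our own function names: `greedy_fill` keeps each program that still fits when scanning in nonincreasing size, and `best_fill` computes the best achievable total by tracking every reachable subset sum up to $D$.

```python
def greedy_fill(sizes, D):
    """Largest-first greedy: scan in nonincreasing size, keep each program that still fits."""
    used = 0
    for s in sorted(sizes, reverse=True):
        if used + s <= D:
            used += s
    return used

def best_fill(sizes, D):
    """Largest total size <= D achievable by any subset (subset-sum over capacities)."""
    reachable = {0}
    for s in sizes:
        # Each program is used at most once: new sums are built from the old set.
        reachable |= {r + s for r in reachable if r + s <= D}
    return max(reachable)

print(greedy_fill([3, 2, 2], 4), best_fill([3, 2, 2], 4))  # 3 4
```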

### 8.6 [5]

Coins in the United States are minted with denominations of 1, 5, 10, 25, and 50 cents. Now consider a country whose coins are minted with denominations of $\{d_1, . . . , d_k\}$ units. We seek an algorithm to make change of $n$ units using the minimum number of coins for this country.

**(a):** The greedy algorithm repeatedly selects the biggest coin no bigger than the amount to be changed and repeats until it is zero. Show that the greedy algorithm does not always use the minimum number of coins in a country whose denominations are $\{1, 6, 10\}$.  
**(b):** Give an efficient algorithm that correctly determines the minimum number of coins needed to make change of $n$ units using denominations $\{d_1, . . . , d_k\}$. Analyze its running time.

*Solution:*

**(a):**

Suppose we want change of 13 cents. The greedy algorithm will produce coins $\{10, 1, 1, 1\}$, while a smaller set of coins is $\{6, 6, 1\}$. Therefore the greedy algorithm did not produce a minimal set of coins.
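The greedy rule is easy to simulate. This sketch (our own helper name `greedy_change`) returns the coins it picks by taking, from the largest denomination down, as many of each coin as still fit:

```python
def greedy_change(n, D):
    """Repeatedly take the largest coin that does not exceed the remaining amount."""
    coins = []
    for d in sorted(D, reverse=True):
        while n >= d:
            coins.append(d)
            n -= d
    return coins

print(greedy_change(13, [1, 6, 10]))  # [10, 1, 1, 1]
```

That is four coins, one more than $\{6, 6, 1\}$.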

**(b):**

Let $C[n] = $ the smallest number of coins needed to make change of $n$ cents.

This can be written recursively, with base case $C[0] = 0$, as:

$$C[n] = \min_{d \in D,\, d \leq n} C[n-d] + 1$$

where $D = \{d_1, . . . , d_k\}$ are the denominations of the available coins. For example, change of 57 cents using US coins is 
$$C[57] = \min_{d \in D,\, d \leq 57} C[57-d] + 1 = \min \left\{ C[56],\, C[52],\, C[47],\, C[32],\, C[7] \right\} + 1$$

Intuitively, 57 cents can be made with one additional coin from any set of coins with value $57 - d$, for any $d \in D$.

We can implement this recursive relation using dynamic programming: to find the number of coins needed to make change of $n$ cents, we make a 1D array of length $n+1$ and calculate the value of $C[i]$ for all $i$ up to $n$.

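It is assumed that a 1-cent coin is included, so that every amount can be made (the code below relies on this, since `min` of an empty list raises a `ValueError`). Because of this, a recursive computation of $C[n]$ would need $C[n-1]$, $C[n-2]$, and so on, so every value between $1$ and $n$ is needed anyway and filling the whole array wastes nothing. Without a 1-cent coin this is no longer true, and it may be more space efficient to implement this with explicit recursion and a dictionary as a cache, so that only the needed values are computed.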
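As a sketch of the dictionary-cached recursion just mentioned (the name `min_coins` is our own), the version below does not assume a 1-cent coin and returns `None` for amounts that cannot be made. Each amount reachable from $n$ is solved once, so the running time is still $O(nk)$ in the worst case. One caveat: the recursion depth grows with $n$ divided by the smallest coin, so with Python's default recursion limit this form only suits modest $n$.

```python
def min_coins(n, D):
    """Fewest coins from denominations D summing to n, or None if impossible.
    Top-down recursion with a dictionary cache: only amounts actually
    reached from n are computed."""
    cache = {0: 0}

    def solve(amount):
        if amount not in cache:
            options = [solve(amount - d) for d in D if d <= amount]
            options = [c for c in options if c is not None]  # drop unreachable amounts
            cache[amount] = min(options) + 1 if options else None
        return cache[amount]

    return solve(n)

print(min_coins(13, [1, 6, 10]), min_coins(7, [2, 4]))  # 3 None
```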

The algorithm fills up an array of size $n+1$, and for each cell computes the minimum of $\leq k$ values, where $k$ is the number of different types of coins, each of which is a constant time look up. Therefore the algorithm is $O(nk)$.

```python
def subtract_coins(n, C, D):
    # Collect C[n - coin] for every coin that does not exceed n
    A = []
    for coin in D:
        if n - coin >= 0:
            A.append(C[n - coin])
    return A

def changer(n, D):
    # Fill C[0..n] bottom-up using C[i] = min over coins of C[i - coin] + 1
    C = [0]
    for i in range(1, n + 1):
        C.append(min(subtract_coins(i, C, D)) + 1)
    return C[n]

D = [1, 5, 10, 25, 50]
%time changer(10031, D)
```

```
CPU times: user 21 ms, sys: 178 µs, total: 21.2 ms
Wall time: 21.1 ms

203
```

**It is possible** to improve the efficiency of repeated function calls by using a **closure**. This allows the cached values to persist beyond a single call. However, the set of coin denominations $D$ must be fixed. Therefore the `changer_with_closure(D)` function implemented below takes a set of coin denominations and returns a function (`dynamic_changer`, bound to the name `changer` below) that makes change for any amount in that fixed currency, with cached values persisting between calls.

Repeated calls are shown to be much faster.

```python
# Implementation using a closure
def changer_with_closure(D):
    C = [0]

    def dynamic_changer(n):
        if len(C) - 1 < n:
            for i in range(len(C), n + 1):
                C.append(min(subtract_coins(i)) + 1)
        return C[n]

    def subtract_coins(n):
        A = []
        for coin in D:
            if n - coin >= 0:
                A.append(C[n - coin])
        return A

    return dynamic_changer

D = [1, 5, 10, 25, 50]
changer = changer_with_closure(D)
%time changer(10031)
```

```
CPU times: user 21.4 ms, sys: 107 µs, total: 21.6 ms
Wall time: 21.6 ms

203
```

```python
# Repeat the SAME computation for $100.31
%time changer(10031)
```

```
CPU times: user 5 µs, sys: 0 ns, total: 5 µs
Wall time: 6.91 µs

203
```

```python
for i in range(0, 26):
    print('Change for %d cents with %d coins' % (i, changer(i)))
```

```
Change for 0 cents with 0 coins
Change for 1 cents with 1 coins
Change for 2 cents with 2 coins
Change for 3 cents with 3 coins
Change for 4 cents with 4 coins
Change for 5 cents with 1 coins
Change for 6 cents with 2 coins
Change for 7 cents with 3 coins
Change for 8 cents with 4 coins
Change for 9 cents with 5 coins
Change for 10 cents with 1 coins
Change for 11 cents with 2 coins
Change for 12 cents with 3 coins
Change for 13 cents with 4 coins
Change for 14 cents with 5 coins
Change for 15 cents with 2 coins
Change for 16 cents with 3 coins
Change for 17 cents with 4 coins
Change for 18 cents with 5 coins
Change for 19 cents with 6 coins
Change for 20 cents with 2 coins
Change for 21 cents with 3 coins
Change for 22 cents with 4 coins
Change for 23 cents with 5 coins
Change for 24 cents with 6 coins
Change for 25 cents with 1 coins
```


***

## Number Problems

***

## Graph Problems

***

## Design Problems

### 8.18 [4] Unfinished

Consider the problem of storing $n$ books on shelves in a library. The order of the books is fixed by the cataloging system and so cannot be rearranged. Therefore, we can speak of a book $b_i$, where $1 \leq i \leq n$, that has a thickness $t_i$ and height $h_i$. The length of each bookshelf at this library is $L$.

Suppose all the books have the same height $h$ (i.e., $h = h_i = h_j$ for all $i$, $j$) and the shelves are all separated by a distance of greater than $h$, so any book fits on any shelf. The greedy algorithm would fill the first shelf with as many books as we can until we get the smallest $i$ such that $b_i$ does not fit, and then repeat with subsequent shelves. Show that the greedy algorithm always finds the optimal shelf placement, and analyze its time complexity.

*Solution:*

First note that the fact that you cannot reorder the books is crucial for the correctness of the greedy algorithm. For example, suppose 7 books have thicknesses $\{20, 20, 20, 50, 20, 20, 50\}$ and shelves have length $100$. If we had to keep them in order, the greedy algorithm would place them on shelves as $\text{Shelf 1} = \{20, 20, 20 | -40\}$, $\text{Shelf 2} = \{50, 20, 20 | -10\}$, and $\text{Shelf 3} =  \{50 | -50\}$, using 3 shelves with 100 units of free space. If they could be reordered, the optimal packing would be $\text{Shelf 1} = \{50, 50 |0\}$ and $\text{Shelf 2} = \{20, 20, 20, 20, 20 |  0\}$, using 2 shelves and leaving no free space.

Second, note that Skiena does not define what he means by the "optimal shelf placement". This could mean "using the smallest number of shelves" or "leaving the smallest amount of space free". However, these two definitions are equivalent: an arrangement of the books on $k$ shelves leaves exactly $kL - \sum_i t_i$ units of free space, so arrangements on the same number of shelves leave the same amount of free space, and for two arrangements to leave different amounts of free space, they must use different numbers of shelves.

We will prove the optimality of the greedy algorithm by contradiction. Suppose that a given arrangement $P$ uses $k$ shelves, but the greedy algorithm would use more than $k$ shelves.

For simplicity, assume that there is no empty shelf space between books, meaning that on every shelf the books are pushed all the way to the left, and any free space occurs to their right.

Without loss of generality, we will speak of the books being placed in their positions in catalog order.

Because the arrangement $P$ is not greedy, at some point a book must have been placed on the next shelf even though there was still room for it on the current shelf. Therefore in $P$ there must be a book $i$ at the beginning of a shelf that can be moved to the end of the previous shelf. Let us make that move. Doing so reduces the free space on the earlier shelf and increases the free space on the later shelf, and no other shelf changes, so the arrangement stays valid. If the moved book was the only book on its shelf, then we have just freed up a shelf, and any further moves will ignore that shelf. Every move puts a book on an earlier shelf, so the process must terminate, and when no move is possible, we have produced the greedy arrangement. Because each move can only decrease the number of shelves used, and never increase it, the greedy arrangement uses no more shelves than the original, "optimal" arrangement $P$. But this contradicts our assumption that the greedy arrangement uses more shelves than $P$.

Therefore, the greedy arrangement is optimal.
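For the complexity: the greedy placement is a single pass over the books with constant work per book, so it runs in $O(n)$ time. Below is a minimal Python sketch (function names are our own), assuming every book fits on an empty shelf ($t_i \leq L$). For comparison it also includes a reference dynamic program over prefixes of the catalog, which is slower (at most $O(n^2)$) but makes no greedy choices, so the two can be checked against each other on small inputs such as the 7-book example above.

```python
def greedy_shelves(t, L):
    """Shelves used by the greedy placement of books with thicknesses t
    (in catalog order) on shelves of length L. Assumes every t_i <= L."""
    shelves, used = 0, 0
    for thickness in t:
        if shelves == 0 or used + thickness > L:
            shelves += 1          # start a new shelf with this book
            used = thickness
        else:
            used += thickness     # book still fits on the current shelf
    return shelves

def optimal_shelves(t, L):
    """Reference answer: S[i] is the fewest shelves for the first i books,
    trying every choice of books j..i-1 for the last shelf."""
    n = len(t)
    S = [0] + [float('inf')] * n
    for i in range(1, n + 1):
        width = 0
        for j in range(i - 1, -1, -1):
            width += t[j]
            if width > L:
                break             # books j..i-1 no longer fit on one shelf
            S[i] = min(S[i], S[j] + 1)
    return S[n]

print(greedy_shelves([20, 20, 20, 50, 20, 20, 50], 100))  # 3
```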

***

## Interview Problems

### 8.24 [5]

Given a set of coin denominations, find the minimum number of coins to make a certain amount of change.

*Solution:*

Let $D = \{d_1, ..., d_k\}$ be the denominations of $k$ different types of coins in a currency, and let $C[n]$ be the minimal number of coins needed to equal $n$ cents. If we knew the value of $C[i]$ for all $i$ < $n$, we could look up the number of coins needed to produce values of $n-d$ for all $d \in D$. From each of these sets of coins, we would only need to add a single coin of denomination $d$ to equal $n$ cents. Therefore $C[n]$ is equal to the number of coins in the smallest of these sets, plus one. Written out, we have the recursion relation:

$$C[n] = \min_{d \in D, \, d \leq n} C[n-d] + 1$$

where the additional condition $d \leq n$ is needed to ensure that $n-d \geq 0$; otherwise we would be trying to produce a negative amount of cents, which... doesn't make sense.

This recursive relation can be used to implement a "dynamic" program that calculates the value of $C[i]$ for all $i$ up to $n$, caching the results. Each computation will require a comparison of up to $k$ different values. Therefore the algorithm will be $O(nk)$.
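If the actual coins are wanted, and not just their count, one option is to remember which denomination achieved each minimum and walk back from $n$. A minimal sketch, assuming a 1-unit coin is available so every amount can be made (`change_with_coins` is our own name):

```python
def change_with_coins(n, D):
    """Bottom-up C[i] recurrence that also records the last coin used for
    each amount, so one optimal set of coins can be read back.
    Assumes a 1-unit coin is in D."""
    C = [0] * (n + 1)
    last = [0] * (n + 1)
    for i in range(1, n + 1):
        # min over (count, coin) pairs; ties go to the smaller coin
        C[i], last[i] = min((C[i - d] + 1, d) for d in D if d <= i)
    coins, amount = [], n
    while amount > 0:
        coins.append(last[amount])
        amount -= last[amount]
    return C[n], coins

print(change_with_coins(30, [1, 5, 10, 25]))  # (2, [5, 25])
```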

As done above for problem 8.6, this could be implemented using a closure so that the cached computations will persist across function calls. But in the spirit of an "interview" problem, I implemented this quickly and focused on correctness, not additional functionality.

```python
def changer(n, D):
    C = []
    C.append(0)
    for i in range(1, n + 1):
        C.append(min(smaller_coins(i, C, D)) + 1)
    return C[n]

def smaller_coins(i, C, D):
    A = []
    for d in D:
        value = i - d
        if value >= 0:
            A.append(C[value])
    return A

D = [1, 5, 10, 25]
for i in range(26):
    print('%d cents with %d coins' % (i, changer(i, D)))
```

```
0 cents with 0 coins
1 cents with 1 coins
2 cents with 2 coins
3 cents with 3 coins
4 cents with 4 coins
5 cents with 1 coins
6 cents with 2 coins
7 cents with 3 coins
8 cents with 4 coins
9 cents with 5 coins
10 cents with 1 coins
11 cents with 2 coins
12 cents with 3 coins
13 cents with 4 coins
14 cents with 5 coins
15 cents with 2 coins
16 cents with 3 coins
17 cents with 4 coins
18 cents with 5 coins
19 cents with 6 coins
20 cents with 2 coins
21 cents with 3 coins
22 cents with 4 coins
23 cents with 5 coins
24 cents with 6 coins
25 cents with 1 coins
```


### 8.25 [5]

You are given an array of $n$ numbers, each of which may be positive, negative, or zero. Give an efficient algorithm to identify the index positions $i$ and $j$ that maximize the sum of the ith through jth numbers.

*Solution:*

Let $A = \{a_0, ..., a_{n-1}\}$ be the array of numbers, indexed from $0$ (so $A[x] = a_x$).

A brute force method to find the largest contiguous sum would try all possible combinations of $i$ and $j$, calculating the sum each time. This would be $O(n^3)$.

If we pre-computed the partial sums $S[i] = \sum_{x=0}^{i} a_x$ and define $S[-1] = 0$, then the sum of the numbers between $i$ and $j$ would be $S[j] - S[i-1]$. This is a constant time operation, reducing our complexity to $O(n^2)$.
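The prefix-sum idea can be sketched as follows (a minimal illustration; `max_subarray_prefix` is our own name). Python's index $-1$ means "last element", so this sketch shifts the prefix array by one: `S[k + 1]` holds $S[k]$, and `S[0]` plays the role of $S[-1] = 0$.

```python
def max_subarray_prefix(A):
    """O(n^2) search over all (i, j) using prefix sums.
    Returns (largest sum, i, j) for the first best pair found."""
    n = len(A)
    S = [0] * (n + 1)      # S[k + 1] = a_0 + ... + a_k; S[0] stands for S[-1] = 0
    for k, a in enumerate(A):
        S[k + 1] = S[k] + a
    best = (A[0], 0, 0)
    for i in range(n):
        for j in range(i, n):
            total = S[j + 1] - S[i]   # sum of A[i..j] in O(1)
            if total > best[0]:
                best = (total, i, j)
    return best

print(max_subarray_prefix([10, -1, -1, -1, -1, 5, -1]))  # (11, 0, 5)
```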

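It is tempting to sort the list in $O(n \log n)$ time, perform a binary search for $0$ in $O(\log n)$ time, and take all the numbers to the right of $0$. However, this does not solve the problem: sorting destroys the original order, so the numbers selected are in general not contiguous in $A$. For the array $[10, -1, -1, -1, -1, 5, -1]$ used below, this approach would report $10 + 5 = 15$, but the largest contiguous sum is $11$.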

This problem has an inherent left-to-right structure, so a dynamic programming solution may work. Since there are 2 indices, we will try to find a solution that constructs either a 1D or 2D matrix.

Suppose we isolate a left portion of $A$ between positions $0$ and some $j < n - 1$. Furthermore, suppose we knew the maximum possible sum of a contiguous set of numbers *ending on position $j$*. We denote this as $C[j]$. Now we want to know the largest possible sum of contiguous numbers *ending on position $j+1$*, $C[j+1]$. Certainly $A[j+1]$ must be included in the sum. Is it possible to produce a sum larger than $A[j+1]$ by also including previous numbers? If we do, the maximum possible contribution of those previous numbers is exactly $C[j]$. Therefore they should be included only if $C[j] > 0$.

Writing out this recursion relation, with $C[0] = A[0]$, we have:
$$ C[j] = A[j] + \max\left(0, C[j-1]\right)$$

This is a constant time operation. Therefore computing $C[j]$ for all $j$ from $0$ to $n-1$ will be $O(n)$. Afterwards, we can do a single linear scan to find the maximum value in $C$, which will be the largest possible contiguous sum. Then we can do one more scan backwards from the location of the maximum to find the matching $i$ index that produces that sum.
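For comparison with the one-pass program below, here is a sketch of the straightforward three-pass version just described (our own naming, assuming integer inputs so that the equality test in the backward scan is exact):

```python
def max_subarray_three_pass(A):
    """Pass 1 builds C[j] = A[j] + max(0, C[j-1]); pass 2 finds the best j;
    pass 3 walks backwards from j until the running sum equals C[j]."""
    C = []
    for j, a in enumerate(A):
        C.append(a + max(0, C[j - 1]) if j > 0 else a)
    j = max(range(len(A)), key=lambda k: C[k])   # first position of the maximum
    running, i = 0, j
    while True:
        running += A[i]
        if running == C[j]:      # A[i..j] attains the maximum sum
            return C[j], i, j
        i -= 1

print(max_subarray_three_pass([10, -1, -1, -1, -1, 5, -1]))  # (11, 0, 5)
```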

To increase efficiency, the first of those two final linear scans can be avoided by tracking the largest value seen so far during the computation of the array $C$. This costs a small, constant amount of space, and saves us a search over all $n$ items in the array. 

The second final linear scan to locate the matching index $i$ can also be avoided if, during the computation of $C$, we record in a second array $B$ the locations of the corresponding $i$ indices. This doubles the amount of linear space used, but saves up to a linear amount of time, depending on how many values are in the maximal contiguous set, which could be just $1$ or all $n$.

These two improvements reduce the number of linear scans over the array from 3 to 1.

This algorithm will be $O(n)$.

```python
def largest_contiguous_sequence(A):
    n = len(A)
    B = [0]
    C = [A[0]]
    max_j = 0

    for j in range(1, n):
        i = j
        value = A[j]
        if C[-1] > 0:
            value += C[-1]
            i = B[-1]
        B.append(i)
        C.append(value)
        if value > C[max_j]:
            max_j = j

    print('Largest sum is %d between i = %d and j = %d' % (C[max_j], B[max_j], max_j))

largest_contiguous_sequence([10, -1, -1, -1, -1, 5, -1])
```

```
Largest sum is 11 between i = 0 and j = 5
```
