# [Knapsack Problem](https://en.wikipedia.org/wiki/Knapsack_problem)

## Input
There are $n$ items where each item has a nonnegative value $v_i$ and a weight (nonnegative and integral) $w_i$.<br>
The total capacity (a nonnegative integer) $W$ is given.

## Output
A subset $S \subseteq \{1,2,\dots,n\}$ that maximizes $\sum_{i \in S} v_i$ subject to $\sum_{i \in S} w_i \leq W$.

## Algorithm

Let $S$ be a max-value solution to an instance of knapsack.<br>
**Case 1** Suppose item $n {\not \in} S$, then $S$ must be optimal with the first $n - 1$ items (same capacity $W$).<br>
**Case 2** Suppose item $n \in S$, then $S - \{n\}$ is an optimal solution with respect to the first $n - 1$ items
and capacity $W - w_n$.<br>

Let $V_{i,x}$ denote the value of the best solution that uses only the first $i$ items and has total weight $\leq x$, where $i \in \{0,1,\dots,n\}$ and $x \in \{0,1,\dots,W\}$.
The above two cases suggest:
\begin{align}
  V_{i,x} = \max\{ V_{i-1,x} (\text{case 1}), v_i + V_{i-1,x-w_i} (\text{case 2}) \}.
\end{align}
The base case is $V_{0,x} = 0$ for every $x$. If $w_i > x$, then item $i$ must be excluded and thus $V_{i,x} = V_{i-1,x}$.
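
As a small worked instance of the recurrence (the numbers are made up for illustration), take two items with $(v_1, w_1) = (3, 2)$ and $(v_2, w_2) = (4, 3)$ and capacity $W = 5$. With no items available, $V_{0,x} = 0$ for every $x$. Then
\begin{align}
  V_{1,2} &= \max\{ V_{0,2}, v_1 + V_{0,0} \} = \max\{0, 3 + 0\} = 3, \\
  V_{1,5} &= \max\{ V_{0,5}, v_1 + V_{0,3} \} = \max\{0, 3 + 0\} = 3, \\
  V_{2,5} &= \max\{ V_{1,5}, v_2 + V_{1,2} \} = \max\{3, 4 + 3\} = 7.
\end{align}
Case 2 wins in the last step, so both items are taken and the total weight is exactly $5$.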

```python
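def knapsack(W, n, v, w):
    # v and w are 0-indexed lists: item i has value v[i - 1] and weight w[i - 1].
    # A[i][x] holds V_{i,x}; row 0 (no items) stays at 0.
    A = [[0] * (W + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for x in range(W + 1):
            A[i][x] = A[i - 1][x]          # case 1: skip item i
            if w[i - 1] <= x:              # case 2: only possible if item i fits
                A[i][x] = max(A[i][x], A[i - 1][x - w[i - 1]] + v[i - 1])

    vmax = A[n][W]  # maximum value

    # Trace back through the table to recover one optimal subset.
    i, x = n, W
    items = []      # optimal solution (1-based item numbers)
    while i > 0:
        if A[i][x] != A[i - 1][x]:
            items.append(i)
            x -= w[i - 1]
        i -= 1

    return vmax, items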
```

# Problem 1

In this programming problem and the next you'll code up the knapsack algorithm from lecture.

Let's start with a warm-up. Download the text file below.

This file describes a knapsack instance, and it has the following format:

[knapsack_size] [number_of_items]

[value_1] [weight_1]

[value_2] [weight_2]

...

For example, the third line of the file is "50074 659", indicating that the second item has value 50074 and size 659, respectively.

You can assume that all numbers are positive. You should assume that item weights and the knapsack capacity are integers.

In the box below, type in the value of the optimal solution.

ADVICE: If you're not getting the correct answer, try debugging your algorithm using some small test cases. And then post them to the discussion forum!
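
One way to build such small test cases is to compare against an exhaustive search, which is easy to get right even though it is exponential in the number of items. The sketch below is only a reference checker; the name `knapsack_bruteforce` and the four-item instance are hypothetical and are not part of the assignment files.

```python
from itertools import combinations

def knapsack_bruteforce(W, pairs):
    """Try every subset of (value, weight) pairs; only practical for tiny inputs."""
    best_value, best_subset = 0, ()
    for k in range(len(pairs) + 1):
        for subset in combinations(pairs, k):
            weight = sum(w for _, w in subset)
            value = sum(v for v, _ in subset)
            # Keep the first subset that fits and strictly improves the value.
            if weight <= W and value > best_value:
                best_value, best_subset = value, subset
    return best_value, list(best_subset)

# Hypothetical instance: four (value, weight) items, capacity 6.
small = [(3, 4), (2, 3), (4, 2), (4, 3)]
print(knapsack_bruteforce(6, small))  # (8, [(4, 2), (4, 3)])
```

Any dynamic-programming implementation should report the same optimal value on instances this small.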

In [1]:
def readfile(filename):
    pairs = []
    W, nitems = 0, 0
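    # Close the file when done and tolerate blank lines (e.g. a trailing newline).
    with open(filename) as f:
        for num, line in enumerate(f):
            line = list(map(int, line.split()))
            if not line:
                continue
            if num == 0:
                W, nitems = line
            else:
                vi, wi = line
                pairs.append((vi, wi))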
    
    if len(pairs) != nitems:
        raise ValueError("Number of items mismatches the file description.")
    
    return W, nitems, pairs

def knapsack(W, nitems, pairs):
    A = [[0] * (W + 1) for i in range(nitems)]
        
    for i in range(nitems):
        v, w = pairs[i]
        for x in range(W + 1):
            if i == 0:
                if x < w:
                    continue
                else:
                    A[0][x] = v
            else:
                if x < w:
                    A[i][x] = A[i - 1][x]
                else:
                    A[i][x] = max(A[i - 1][x], A[i - 1][x - w] + v)
    
    vmax = A[nitems - 1][W]
    
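    # Trace back through the table to recover one optimal subset.
    i = nitems - 1
    x = W
    items = []
    while i >= 0:
        if i == 0:
            # Row 0 has no predecessor row: item 0 is used iff A[0][x] is nonzero.
            if A[0][x] != 0:
                items.append(pairs[0])
        elif A[i][x] != A[i - 1][x]:
            items.append(pairs[i])
            x -= pairs[i][1]
        i -= 1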
    
    return vmax, items

In [2]:
# timer grabbed from 
# https://stackoverflow.com/questions/7370801/measure-time-elapsed-in-python
from timeit import default_timer as timer
class benchmark(object):
    def __init__(self, msg, fmt="%0.3g"):
        self.msg = msg
        self.fmt = fmt

    def __enter__(self):
        self.start = timer()
        return self

    def __exit__(self, *args):
        t = timer() - self.start
        print(("%s : " + self.fmt + " seconds") % (self.msg, t))
        self.time = t

In [3]:
# test case -- 1993
W, n, pairs = readfile("test1.txt")
with benchmark("Naive bottom-up knapsack") as r:
    vmax, items = knapsack(W, n, pairs)

def check_answer(vmax, items):
    V, W = 0, 0
    for v, w in items:
        V += v
        W += w
    
    if V != vmax:
        print("Knapsack value is inconsistent to output solution.")
        print("\"Optimal\" value: {0}".format(vmax))
        print("\"Optimal\" solution (value, weight):")
        print(items)
        print("From this solution, value: {0}, total weight: {1}".format(V, W))
    else:
        print("Optimal value is consistent to optimal solution.")
        print("Optimal value: {0}, total weight: {1}".format(vmax, W))
    
    return

check_answer(vmax, items)
assert vmax == 1993, "Naive bottom-up knapsack does not pass the test case!"
print("Naive bottom-up knapsack passes the test case!")

Naive bottom-up knapsack : 0.0967 seconds
Optimal value is consistent to optimal solution.
Optimal value: 1993, total weight: 100
Naive bottom-up knapsack passes the test case!


In [4]:
W, n, pairs = readfile("knapsack1.txt")
with benchmark("Naive bottom-up knapsack on knapsack1.txt") as r:
    vmax, items = knapsack(W, n, pairs)
# print vmax

Naive bottom-up knapsack on knapsack1.txt : 0.997 seconds


# Problem 2

This problem also asks you to solve a knapsack instance, but a much bigger one.

Download the text file below.

This file describes a knapsack instance, and it has the following format:

[knapsack_size] [number_of_items]

[value_1] [weight_1]

[value_2] [weight_2]

...

For example, the third line of the file is "50074 834558", indicating that the second item has value 50074 and size 834558, respectively. As before, you should assume that item weights and the knapsack capacity are integers.

This instance is so big that the straightforward iterative implementation uses an infeasible amount of time and space. So you will have to be creative to compute an optimal solution. One idea is to go back to a recursive implementation, solving subproblems (and, of course, caching the results to avoid redundant work) only on an "as needed" basis. Also, be sure to think about appropriate data structures for storing and looking up solutions to subproblems.

In the box below, type in the value of the optimal solution.

ADVICE: If you're not getting the correct answer, try debugging your algorithm using some small test cases. And then post them to the discussion forum!
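
To get a feel for why solving subproblems only "as needed" can help, the sketch below counts the distinct (item count, capacity) subproblems reachable from the top-level call and compares that with the size of the full table. The name `count_needed_states` and the four-item instance are made up for illustration, and the explicit stack is a choice of this sketch, not something the assignment prescribes.

```python
def count_needed_states(W, pairs):
    """Count the distinct (i, x) subproblems a top-down solver would touch."""
    seen = set()
    stack = [(len(pairs), W)]
    while stack:
        i, x = stack.pop()
        # i == 0 is a base case with nothing to look up; skip repeats as well.
        if i == 0 or (i, x) in seen:
            continue
        seen.add((i, x))
        v, w = pairs[i - 1]
        stack.append((i - 1, x))          # case 1: item i is skipped
        if w <= x:
            stack.append((i - 1, x - w))  # case 2: item i is taken
    return len(seen)

# Hypothetical instance: four (value, weight) items, capacity 6.
small = [(3, 4), (2, 3), (4, 2), (4, 3)]
needed = count_needed_states(6, small)
full_table = len(small) * (6 + 1)
print(needed, full_table)  # 12 28
```

The gap tends to widen when the capacity is large relative to the number of distinct weight sums, which is the situation a dictionary keyed by `(i, x)` is meant to exploit.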

## Top-Down Dynamic Programming (Memoization)

Ask the recursive program to save each value that it computes, and to check the saved values to avoid recomputing any of them. [**Algorithms in C**, p.210]

Example of computing Fibonacci numbers:
```python
def F_recursive(i):
    if i < 1:
        return 0
    if i == 1:
        return 1
    return F_recursive(i - 1) + F_recursive(i - 2)

knownF = {}  # cache of already computed values, keyed by i

def F_recursive_DP(i):
    if i in knownF:
        return knownF[i]
    if i < 1:
        t = 0
    elif i == 1:
        t = 1
    else:
        t = F_recursive_DP(i - 1) + F_recursive_DP(i - 2)
    knownF[i] = t
    return t
```
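
The same save-and-check pattern is also available from the standard library. The following is a minimal sketch assuming Python 3, where `functools.lru_cache` keeps the cache behind a decorator; the function name `fib` is just for illustration.

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded cache: every computed value is kept
def fib(i):
    if i < 1:
        return 0
    if i == 1:
        return 1
    return fib(i - 1) + fib(i - 2)

print(fib(50))  # 12586269025
```

Each `fib(i)` is computed once, so the call above takes linear rather than exponential time in `i`.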

In [5]:
max_known = {}

# Adapted from http://www.geeksforgeeks.org/knapsack-problem/
def knapsack_recursive(W, n, items):
    try:
        return max_known[(W, n)]
    except KeyError:
        if W == 0:
            max_known[(0, n)] = 0
            return 0
        if n == 0:
            max_known[(W, 0)] = 0
            return 0
        
        v, w = items[n - 1]
        if w > W:
            t = knapsack_recursive(W, n - 1, items)
            max_known[(W, n)] = t
            return t
        else:
            t1 = knapsack_recursive(W, n - 1, items)
            t2 = v + knapsack_recursive(W - w, n - 1, items)
            t = max(t1, t2)
            max_known[(W, n)] = t
            return t

In [6]:
# test case -- 1993
import sys
sys.setrecursionlimit(5000)

max_known = {}
W, n, items = readfile("test1.txt")

with benchmark("Top-down knapsack on test1.txt") as r:
    vmax = knapsack_recursive(W, n, items)

assert vmax == 1993, "Top-down knapsack does not pass the test case!"
print("Top-down knapsack passes the test case!")

Top-down knapsack on test1.txt : 0.383 seconds
Top-down knapsack passes the test case!


In [7]:
max_known = {}
W, n, items = readfile("knapsack_big.txt")
with benchmark("Top-down knapsack on knapsack_big.txt") as r:
    vmax = knapsack_recursive(W, n, items)
# print vmax

Top-down knapsack on knapsack_big.txt : 94.6 seconds


In [8]:
DEBUG = 2
def knapsack_opt(W, n, pairs):
    # Only the previous column of the DP table is needed to build the current one.
    previous_column = [0] * (W + 1)
    
    for i in range(n):
        v, w = pairs[i]
        # A shallow copy is enough because the entries are plain integers.
        current_column = previous_column[:]
        # Start at x = w so that x - w is never negative (no wrap-around indexing).
        for x in range(w, W + 1):
            t = v + previous_column[x - w]
            if t > previous_column[x]:
                current_column[x] = t
        previous_column = current_column
        
        if DEBUG > 1:
            print(previous_column)
    
    vmax = previous_column[W]
    
    return vmax

DEBUG = 2
W, n, items = readfile("test.txt")
with benchmark("Memory optimized bottom-up knapsack on test.txt") as r:
    vmax = knapsack_opt(W, n, items)
print(vmax)

DEBUG = 0
W, n, items = readfile("test1.txt")
with benchmark("Memory optimized bottom-up knapsack on test1.txt") as r:
    vmax = knapsack_opt(W, n, items)
print(vmax)

[0, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
[0, 5, 5, 5, 5, 9, 9, 9, 9, 9, 9]
[0, 5, 5, 5, 5, 9, 9, 9, 9, 13, 13]
[0, 5, 5, 10, 10, 10, 10, 14, 14, 14, 14]
[0, 5, 5, 10, 10, 13, 13, 14, 14, 17, 17]
[0, 5, 5, 10, 10, 13, 13, 15, 15, 18, 18]
[0, 5, 5, 10, 10, 13, 13, 15, 15, 18, 18]
[0, 5, 5, 10, 10, 13, 13, 15, 15, 18, 18]
[0, 5, 5, 10, 10, 13, 14, 15, 17, 18, 19]
[0, 5, 5, 10, 10, 13, 14, 15, 17, 18, 19]
Memory optimized bottom-up knapsack on test.txt : 0.00123 seconds
19
Memory optimized bottom-up knapsack on test1.txt : 0.29 seconds
1993


In [9]:
import numpy as np

def knapsack_vertorized(W, n, pairs):
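    # np.int was removed in NumPy 1.24; use an explicit 64-bit integer type.
    column = np.zeros(W + 1, dtype=np.int64)
    
    for i in range(n):
        v, w = pairs[i]
        
        # Assumes w >= 1. The right-hand side is evaluated into a new array
        # before assignment, so it only sees values from the previous item.
        shifted = v + column[:-w]
        column[w:] = np.maximum(column[w:], shifted)
        
        if DEBUG > 1:
            print(column)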
    
    vmax = column[W]
    
    return vmax

DEBUG = 2
W, n, items = readfile("test.txt")
with benchmark("Vectorized bottom-up knapsack on test.txt") as r:
    vmax = knapsack_vertorized(W, n, items)
print(vmax)

DEBUG = 0
W, n, items = readfile("test1.txt")
with benchmark("Vectorized bottom-up knapsack on test1.txt") as r:
    vmax = knapsack_vertorized(W, n, items)
print(vmax)

[0 5 5 5 5 5 5 5 5 5 5]
[0 5 5 5 5 9 9 9 9 9 9]
[ 0  5  5  5  5  9  9  9  9 13 13]
[ 0  5  5 10 10 10 10 14 14 14 14]
[ 0  5  5 10 10 13 13 14 14 17 17]
[ 0  5  5 10 10 13 13 15 15 18 18]
[ 0  5  5 10 10 13 13 15 15 18 18]
[ 0  5  5 10 10 13 13 15 15 18 18]
[ 0  5  5 10 10 13 14 15 17 18 19]
[ 0  5  5 10 10 13 14 15 17 18 19]
Vectorized bottom-up knapsack on test.txt : 0.00406 seconds
19
Vectorized bottom-up knapsack on test1.txt : 0.00468 seconds
1993


In [10]:
DEBUG = 0
W, n, items = readfile("knapsack_big.txt")
with benchmark("Vectorized bottom-up knapsack on knapsack_big.txt") as r:
    vmax = knapsack_vertorized(W, n, items)
# print vmax

Vectorized bottom-up knapsack on knapsack_big.txt : 28.7 seconds
