
# Knapsack algorithm



```python
import sys

import numpy as np
import urllib3
```

## Programming Assignment: Part 1
A warm-up problem with just 100 items.


In this programming problem and the next you'll code up the knapsack algorithm from lecture.
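
For reference, here is a sketch of the standard 0/1 knapsack recurrence in the notation of `knapsack_tabular` below (the lecture's own notation may differ). $A_{i,x}$ corresponds to `A[i][x]`, the best total value using only the first $i$ items with capacity $x$, and item $i$ has value $v_i$ and weight $w_i$.

$$
A_{0,x} = 0, \qquad
A_{i,x} =
\begin{cases}
A_{i-1,x} & \text{if } w_i > x, \\
\max\left(A_{i-1,x},\ A_{i-1,\,x-w_i} + v_i\right) & \text{otherwise.}
\end{cases}
$$

For $n$ items and capacity $W$ the answer is $A_{n,W}$. The table has $(n+1)(W+1)$ entries, each filled in constant time, so both time and space are $O(nW)$.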

Let's start with a warm-up. Download the text file **knapsack1.txt** (fetched by `testcase()` below).

This file describes a knapsack instance, and it has the following format:

`[knapsack_size] [number_of_items]`

`[value_1] [weight_1]`

`[value_2] [weight_2]`

...

For example, the third line of the file is "50074 659", indicating that the second item has value 50074 and size 659, respectively.

You can assume that all numbers are positive.  You should assume that item weights and the knapsack capacity are integers.
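
A minimal sketch of parsing this format from a string, assuming whitespace-separated integers and ignoring blank lines. `parse_knapsack` is a hypothetical helper; the notebook's own `testcase()` does the parsing inline after downloading the file.

```python
def parse_knapsack(text):
    """Parse '[knapsack_size] [number_of_items]' followed by one
    '[value] [weight]' pair per line; return (capacity, values, weights)."""
    # Split every non-blank line into its whitespace-separated fields
    rows = [line.split() for line in text.splitlines() if line.strip()]
    capacity, n_items = int(rows[0][0]), int(rows[0][1])
    values, weights = [], []
    for value, weight in rows[1:n_items + 1]:
        values.append(int(value))
        weights.append(int(weight))
    # Guard against a truncated file
    if len(values) != n_items:
        raise ValueError("expected {} items, found {}".format(n_items, len(values)))
    return capacity, values, weights


# The small instance used in Test 1 below, written in the file format
demo_capacity, demo_values, demo_weights = parse_knapsack("8 4\n1 2\n2 3\n5 4\n6 5\n")
print(demo_capacity, demo_values, demo_weights)  # 8 [1, 2, 5, 6] [2, 3, 4, 5]
```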


```python
def knapsack_tabular(values, weights, A):
  # Fill A bottom-up. A has shape (n+1, W+1): row i covers the first i items,
  # column x is the capacity. Row 0 must already be all zeros.
  r, c = A.shape
  for i in range(1, r):
    for x in range(c):
      # Option 1: leave item i out (row i-1 is always filled in by now)
      prevVal = A[i-1, x]
      if weights[i-1] > x:
        # Item i does not fit in capacity x
        curVal = 0
      else:
        # Option 2: take item i and add its value to the best for the remaining capacity
        curVal = A[i-1, x-weights[i-1]] + values[i-1]
      A[i, x] = max(curVal, prevVal)
  return A[r-1, c-1]
```

```python
def reconstruct(values, weights, A):
  # Walk back from the bottom-right corner A[n][W]. If the best value changed
  # when item r was added, item r is in the optimal set: record it and give
  # back its weight. Stop at row 0 (no items left).
  ValueSet = []
  ItemsIndex = []
  r, c = A.shape
  r, c = r-1, c-1
  while r > 0:
    if A[r, c] != A[r-1, c]:
      ValueSet.append(values[r-1])
      ItemsIndex.append(r)
      c -= weights[r-1]
    r -= 1
  print("Optimal Value Set : {}".format(ValueSet))
  print("Items Used to find Optimal Value : {}".format(ItemsIndex))
  return ValueSet, ItemsIndex
```

```python
def knapsack(knapsackSize, nItems, values, weights):
  # Table of Python objects: row i = first i items, column x = capacity x.
  # Row 0 is all zeros (no items, no value); the other rows start as None.
  A = np.full((nItems+1, knapsackSize+1), None)
  A[0][:] = 0
  print("Inside knapsack: A = {}".format(A))
  optimalValue = knapsack_tabular(values, weights, A)
  print("After tabular call: A = {}".format(A))
  print("Optimal Value = {}".format(optimalValue))
  reconstruct(values, weights, A)
```


```python
# Same as knapsack(), but without reconstructing the chosen items
def knapsack_noreconstruct(knapsackSize, nItems, values, weights):
  A = np.full((nItems+1, knapsackSize+1), None)
  A[0][:] = 0
  print("Inside knapsack: A = {}".format(A))
  optimalValue = knapsack_tabular(values, weights, A)
  print("After tabular call: A = {}".format(A))
  print("Optimal Value = {}".format(optimalValue))
```


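```python
def knapsack_dp(W, n, values, weights):
  # Space-saving bottom-up DP: A[x] holds the best value for capacity x
  # using the items processed so far.
  A = [0] * (W + 1)
  for i in range(1, n+1):
    # Go from high to low capacity so A[x - weight] still holds the value
    # from before item i was considered (each item is used at most once).
    for x in range(W, 0, -1):
      if weights[i-1] <= x:
        A[x] = max(A[x], A[x-weights[i-1]] + values[i-1])
  return A[W]
```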

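```python
# Top-down (memoized) DP: the recursion only visits the subproblems it needs,
# but the memo below is still a full (n+1) x (W+1) table. A dictionary keyed
# on (n, W) would store only the entries actually visited.
def knapsack_dp_2(W, n, values, weights):
  memo = [[-1] * (W + 1) for _ in range(n + 1)]  # -1 marks "not computed yet"
  ans = knapsack_recursive(W, n, values, weights, memo)
  print("Optimal Value = {}".format(ans))
  return ans

def knapsack_recursive(W, n, values, weights, memo):
  # Best value using the first n items with capacity W
  if n == 0 or W == 0:
    return 0
  if memo[n][W] != -1:
    return memo[n][W]

  if weights[n-1] <= W:
    # Either take item n (and shrink the capacity) or skip it
    memo[n][W] = max(values[n-1] + knapsack_recursive(W-weights[n-1], n-1, values, weights, memo),
                     knapsack_recursive(W, n-1, values, weights, memo))
  else:
    # Item n does not fit, so skip it
    memo[n][W] = knapsack_recursive(W, n-1, values, weights, memo)
  return memo[n][W]
```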



### Small test cases
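
For instances this small, one way to cross-check the DP answers below is to try every subset of items. This is a sketch; `knapsack_brute_force` is a hypothetical helper added only for checking, and its $2^n$ running time rules it out for the assignment files.

```python
from itertools import combinations


def knapsack_brute_force(W, values, weights):
    """Return the best total value over all item subsets whose weight fits in W."""
    n = len(values)
    best = 0
    for k in range(n + 1):
        # Every subset of exactly k item indices
        for subset in combinations(range(n), k):
            if sum(weights[i] for i in subset) <= W:
                best = max(best, sum(values[i] for i in subset))
    return best


print(knapsack_brute_force(8, [1, 2, 5, 6], [2, 3, 4, 5]))  # 8
print(knapsack_brute_force(6, [3, 2, 4, 4], [4, 3, 2, 3]))  # 8
```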

```python
# Test 1
knapsack(8, 4, [1,2,5,6], [2,3,4,5])
```

```
Inside knapsack: A = [[0 0 0 0 0 0 0 0 0]
 [None None None None None None None None None]
 [None None None None None None None None None]
 [None None None None None None None None None]
 [None None None None None None None None None]]
After tabular call: A = [[0 0 0 0 0 0 0 0 0]
 [0 0 1 1 1 1 1 1 1]
 [0 0 1 2 2 3 3 3 3]
 [0 0 1 2 5 5 6 7 7]
 [0 0 1 2 5 6 6 7 8]]
Optimal Value = 8
Optimal Value Set : [6, 2]
Items Used to find Optimal Value : [4, 2]
```


```python
print(knapsack_dp_2(8, 4, [1,2,5,6], [2,3,4,5]))
```

```
Optimal Value = 8
8
```


```python
# Test 2
knapsack(6, 4, [3,2,4,4], [4,3,2,3])
```

```
Inside knapsack: A = [[0 0 0 0 0 0 0]
 [None None None None None None None]
 [None None None None None None None]
 [None None None None None None None]
 [None None None None None None None]]
After tabular call: A = [[0 0 0 0 0 0 0]
 [0 0 0 0 3 3 3]
 [0 0 0 2 3 3 3]
 [0 0 4 4 4 6 7]
 [0 0 4 4 4 8 8]]
Optimal Value = 8
Optimal Value Set : [4, 4]
Items Used to find Optimal Value : [4, 3]
```


```python
print(knapsack_dp_2(6, 4, [3,2,4,4], [4,3,2,3]))
```

```
Optimal Value = 8
8
```


#### Assignment Part 1

```python
def testcase():
    # Download knapsack1.txt and parse it into (values, weights, capacity).
    # NOTE: this is a time-limited signed CloudFront URL (see its Expires and
    # Signature parameters); once it expires, download the file yourself and
    # read it from disk instead.
    http = urllib3.PoolManager()
    r1 = http.request('GET', "https://d3c33hcgiwev3.cloudfront.net/_6dfda29c18c77fd14511ba8964c2e265_knapsack1.txt?Expires=1716076800&Signature=A2X5i1mxUpKDa94szC-KfhouM7n8hlJkZDHoq7If3dQ5-mKSAD6FL4BK3b~gdIEqBIPcMRdzmM79qu27XDThDcP8O4xPoIKZ4-T~eFS7z4JE1JzgWPFkOc0SyHiETEDBwJw0RxZCLuufFSa-HyQlINMIG5g1VightfwLQ2GAFX0_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A")
    # Drop blank lines (e.g. the trailing newline at the end of the file)
    lines = [line for line in r1.data.decode('utf8').splitlines() if line.strip()]
    # First line: knapsack capacity and number of items
    W, n = map(int, lines[0].split())
    v = []
    w = []
    # Each following line: value and weight of one item
    for line in lines[1:n + 1]:
        value, weight = map(int, line.split())
        v.append(value)
        w.append(weight)
    return v, w, W


values, weights, W = testcase()
```

```python
print(values)
print(W)
```

```
[16808, 50074, 8931, 27545, 77924, 64441, 84493, 7988, 82328, 78841, 44304, 17710, 29561, 93100, 51817, 99098, 13513, 23811, 80980, 36580, 11968, 1394, 25486, 25229, 40195, 35002, 16709, 15669, 88125, 9531, 27723, 28550, 97802, 40978, 8229, 60299, 28636, 23866, 39064, 39426, 24116, 75630, 46518, 30106, 19452, 82189, 99506, 6753, 36717, 54439, 51502, 83872, 11138, 53178, 22295, 21610, 59746, 53636, 98143, 27969, 261, 41595, 16396, 19114, 71007, 97943, 42083, 30768, 85696, 73672, 48591, 14739, 31617, 55641, 37336, 97973, 49096, 83455, 12290, 48906, 36124, 45814, 35239, 96221, 12367, 25227, 41364, 7845, 36551, 8624, 97386, 95273, 99248, 13497, 40624, 28145, 35736, 61626, 46043, 54680]
10000
```


```python
knapsack(W, len(values), values, weights)
```

```
Inside knapsack: A = [[0 0 0 ... 0 0 0]
 [None None None ... None None None]
 [None None None ... None None None]
 ...
 [None None None ... None None None]
 [None None None ... None None None]
 [None None None ... None None None]]
After tabular call: A = [[0 0 0 ... 0 0 0]
 [0 0 0 ... 16808 16808 16808]
 [0 0 0 ... 66882 66882 66882]
 ...
 [0 0 0 ... 2414601 2414601 2414601]
 [0 0 0 ... 2456923 2456923 2460644]
 [0 0 0 ... 2493893 2493893 2493893]]
Optimal Value = 2493893
Optimal Value Set : [54680, 46043, 99248, 95273, 97386, 36551, 35239, 48906, 83455, 49096, 97973, 37336, 14739, 85696, 97943, 41595, 98143, 53636, 59746, 53178, 54439, 82189, 46518, 75630, 24116, 28636, 28550, 88125, 35002, 40195, 25229, 80980, 99098, 51817, 93100, 44304, 78841, 82328, 84493, 64441]
Items Used to find Optimal Value : [100, 99, 93, 92, 91, 89, 83, 80, 78, 77, 76, 75, 72, 69, 66, 62, 59, 58, 57, 54, 50, 46, 43, 42, 41, 37, 32, 29, 26, 25, 24, 19, 16, 15, 14, 11, 10, 9, 7, 6]
```


```python
print(knapsack_dp_2(W, len(values), values, weights))
```

```
Optimal Value = 2493893
2493893
```


Answer:
Optimal Value = 2493893

## Programming Assignment: Part 2
Both the knapsack capacity and the number of items are much larger here, so filling in the full table bottom-up is impractical.
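
One way to avoid the big matrix, sketched here as an alternative rather than the approach this notebook uses for Part 2: keep only one row of the bottom-up table (the same idea as `knapsack_dp` above) and let NumPy update the whole row at once for each item. `knapsack_1d_numpy` is a hypothetical name, and the sketch assumes values fit in 64-bit integers.

```python
import numpy as np


def knapsack_1d_numpy(W, values, weights):
    """0/1 knapsack keeping a single table row: best[x] is the best value
    with capacity x using the items processed so far."""
    best = np.zeros(W + 1, dtype=np.int64)
    for v, w in zip(values, weights):
        if w > W:
            continue  # this item can never fit
        # The right-hand side is computed into a new array before assignment,
        # so every entry reads the previous row (each item is used at most once).
        best[w:] = np.maximum(best[w:], best[:W + 1 - w] + v)
    return int(best[W])


print(knapsack_1d_numpy(8, [1, 2, 5, 6], [2, 3, 4, 5]))  # 8
```

Memory drops to one array of $W + 1$ integers; the catch is that a single row is not enough for `reconstruct`, which walks the whole table to recover the chosen items.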

This problem also asks you to solve a knapsack instance, but a much bigger one.

Download the text file **knapsack_big.txt** (fetched by `testcase2()` below).

This file describes a knapsack instance, and it has the following format:

`[knapsack_size] [number_of_items]`

`[value_1] [weight_1]`

`[value_2] [weight_2]`

...

For example, the third line of the file is "50074 834558", indicating that the second item has value 50074 and size 834558, respectively.  As before, you should assume that item weights and the knapsack capacity are integers.

This instance is so big that the straightforward iterative implementation uses an infeasible amount of time and space, so you will have to be creative to compute an optimal solution. One idea is to go back to a recursive implementation, solving subproblems (and, of course, caching the results to avoid redundant work) only on an "as needed" basis. Also, be sure to think about appropriate data structures for storing and looking up solutions to subproblems.
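
A minimal sketch of the "as needed" idea with a dictionary as the cache, so only the subproblems the recursion actually reaches take memory (unlike `knapsack_dp_2` above, which preallocates a full table). `knapsack_memo_dict` is a hypothetical name; it also returns the number of cached subproblems, and whether that ends up smaller than the full table depends on the instance.

```python
import sys


def knapsack_memo_dict(W, values, weights):
    """Top-down 0/1 knapsack caching (i, capacity) results in a dict.
    Returns (optimal value, number of cached subproblems)."""
    memo = {}

    def best(i, cap):
        # Best value using the first i items with remaining capacity cap
        if i == 0 or cap == 0:
            return 0
        key = (i, cap)
        if key in memo:
            return memo[key]
        result = best(i - 1, cap)  # skip item i
        if weights[i - 1] <= cap:  # take item i if it fits
            result = max(result, values[i - 1] + best(i - 1, cap - weights[i - 1]))
        memo[key] = result
        return result

    # Recursion depth is about one frame per item, plus some headroom
    sys.setrecursionlimit(max(sys.getrecursionlimit(), len(values) + 100))
    return best(len(values), W), len(memo)


print(knapsack_memo_dict(8, [1, 2, 5, 6], [2, 3, 4, 5])[0])  # 8
```

Decorating `best` with `functools.lru_cache(maxsize=None)` would give the same caching without managing the dictionary by hand.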

```python
def testcase2():
    # Download knapsack_big.txt and parse it into (values, weights, capacity).
    # NOTE: this is a time-limited signed CloudFront URL (see its Expires and
    # Signature parameters); once it expires, download the file yourself and
    # read it from disk instead.
    http = urllib3.PoolManager()
    r1 = http.request('GET', "https://d3c33hcgiwev3.cloudfront.net/_6dfda29c18c77fd14511ba8964c2e265_knapsack_big.txt?Expires=1716076800&Signature=IbO1S7v1ixY03Zalw~VQPt1L4PiSkxCnE7IhEFzqpJE2wk939jK~JiOyRLnnp~U44u9pVCFswkccxztNKmiUJEozAgR2sXQpBWzGgA8c-31LAMzgRy1dyLtzFprkMjWtLtvdrzR9bss59eK4K3FJovTv6ExO-ck7rRmnFvj8ceI_&Key-Pair-Id=APKAJLTNE6QMUY6HBC5A")
    # Drop blank lines (e.g. the trailing newline at the end of the file)
    lines = [line for line in r1.data.decode('utf8').splitlines() if line.strip()]
    # First line: knapsack capacity and number of items
    W, n = map(int, lines[0].split())
    v = []
    w = []
    # Each following line: value and weight of one item
    for line in lines[1:n + 1]:
        value, weight = map(int, line.split())
        v.append(value)
        w.append(weight)
    return v, w, W


values, weights, W = testcase2()
```

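```python
# knapsack_recursive recurses about one level per item, and CPython's
# default recursion limit is 1000, so raise it for the big instance.
sys.setrecursionlimit(10000)
```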

```python
print(knapsack_dp_2(W, len(values), values, weights))
```

```
Optimal Value = 4243395
4243395
```


Answer: 4243395

Note: this took about 4 minutes to run, with the recursion limit set to 10000 on a High-RAM runtime.