# Greedy Heuristics: Knapsack Problem

---

**Greedy heuristics** are widely used approaches for solving **combinatorial optimisation** problems. In this tutorial, we introduce how to use greedy heuristics to solve the **knapsack problem**, a well-known NP-hard **combinatorial optimisation** problem.

## Captain Jack Sparrow on Ghost Island

---

One day, Captain [Jack Sparrow](https://en.wikipedia.org/wiki/Jack_Sparrow) arrived at a ghost island full of treasure. However, a group of ghosts spotted him and chased him. He had to carry some of the following items to his ship and flee.

| Item | Value | Weight |
| ---- | ----- | ------ |
| Crown | \$10K |  7kg  |
| Chest | \$50K |  38kg  | 
| Diamond | \$20K |  6kg  | 
| Gold coins | \$7K |  10kg  | 
| Scepter | \$25K |  20kg  | 

Jack can carry at most 50kg. **Which items should he pick?**

<img src="img/knapsack.png" width=600 />

This is a **knapsack problem**. Given a set of **items**, each with a **value** and a **weight**, and a knapsack with a limited **capacity**, the problem is to select a **subset of items** so that 

1. **The total value of the selected items is maximised**, and 
2. **The total weight of the selected items cannot exceed the capacity**.
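
The definition above can be written as an integer program. The notation here is introduced only for illustration:

- $x_i \in \{0, 1\}$ indicates whether item $i$ is selected.
- $v_i$ and $w_i$ are the value and weight of item $i$.
- $C$ is the capacity.

This is the classic 0-1 knapsack formulation:

$$
\begin{aligned}
\max_{x} \quad & \sum_{i=1}^{n} v_i x_i \\
\text{s.t.} \quad & \sum_{i=1}^{n} w_i x_i \le C, \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, n.
\end{aligned}
$$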

The above problem can be coded as follows.

```python
# Values are in $K, weights in kg.
items = [
    {'name': 'crown', 'value': 10, 'weight': 7},
    {'name': 'chest', 'value': 50, 'weight': 38},
    {'name': 'diamond', 'value': 20, 'weight': 6},
    {'name': 'gold coins', 'value': 7, 'weight': 10},
    {'name': 'scepter', 'value': 25, 'weight': 20}
]

capacity = 50
```
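
The two conditions can be checked directly for any candidate selection. The helper `evaluate` below is a hypothetical name used only for illustration. It returns the total value and total weight of a selection, and whether the selection fits in the knapsack.

```python
items = [
    {'name': 'crown', 'value': 10, 'weight': 7},
    {'name': 'chest', 'value': 50, 'weight': 38},
    {'name': 'diamond', 'value': 20, 'weight': 6},
    {'name': 'gold coins', 'value': 7, 'weight': 10},
    {'name': 'scepter', 'value': 25, 'weight': 20},
]
capacity = 50


def evaluate(selected, capacity):
    """Hypothetical helper: total value, total weight, and feasibility of a selection."""
    total_value = sum(item['value'] for item in selected)
    total_weight = sum(item['weight'] for item in selected)
    return total_value, total_weight, total_weight <= capacity


by_name = {item['name']: item for item in items}

# Chest + scepter weighs 58kg, which exceeds the 50kg capacity.
print(evaluate([by_name['chest'], by_name['scepter']], capacity))  # (75, 58, False)
# Chest + diamond weighs 44kg, which fits.
print(evaluate([by_name['chest'], by_name['diamond']], capacity))  # (70, 44, True)
```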

## Greedy Heuristics

---

The main idea of **greedy heuristics** is to construct a solution step by step. At each step, the heuristic makes the choice that looks best at that moment (the most **greedy** choice) and never revisits earlier choices.

For the knapsack problem, greedy heuristics construct the solution by selecting one item at a time. We sort the items in some order, then go through the sorted items one by one, adding each item that still fits into the knapsack. For example, we can select the most valuable item first, then the second most valuable one, and so on. *Note that different greedy heuristics sort the items in different ways.*

The framework of greedy heuristics for the knapsack problem can be written as follows.

```python
def greedy_knapsack(sorted_items, capacity):
    '''
    The greedy algorithm for the knapsack problem.
    It scans the sorted items in order and adds each item that still fits
    into the remaining capacity.
    '''
    selected = []
    remaining_capacity = capacity
    for item in sorted_items:
        # Skip items that no longer fit; later items in the order may still fit.
        if item['weight'] <= remaining_capacity:
            selected.append(item)
            remaining_capacity -= item['weight']

    return selected
```

### Most Valuable First

The **Most Valuable First (MVF)** greedy heuristic selects the most valuable item first.

It **sorts the items in the decreasing order of their value**, and then selects the sorted items one by one until the knapsack is full.

Let's use the MVF greedy heuristic to find a solution for Captain Jack Sparrow.

```python
mvf_items = sorted(items, key=lambda x: x['value'], reverse=True)
mvf_selected = greedy_knapsack(mvf_items, capacity)
```

We print the obtained solution as follows.

```python
def print_solution(sel_items, method_name):
    selected_names = [item['name'] for item in sel_items]
    selected_value = sum(item['value'] for item in sel_items)
    selected_weight = sum(item['weight'] for item in sel_items)

    # Note: this reads the global `capacity` defined earlier.
    print(f'{method_name} selected items: {selected_names}')
    print(f'{method_name} selected value: {selected_value}, weight: {selected_weight}/{capacity}.')
```

```python
print_solution(mvf_selected, "Most Value First")
```

```
Most Value First selected items: ['chest', 'diamond']
Most Value First selected value: 70, weight: 44/50.
```


### Smallest Weight First

The **Smallest Weight First (SWF)** greedy heuristic selects the lightest item first, so that as much capacity as possible remains for further items.

It **sorts the items in the increasing order of their weight**, and then selects the sorted items one by one until the knapsack is full.

Let's see how the SWF greedy heuristic works for Captain Jack Sparrow.

```python
swf_items = sorted(items, key=lambda x: x['weight'], reverse=False)
swf_selected = greedy_knapsack(swf_items, capacity)

print_solution(swf_selected, "Smallest Weight First")
```

```
Smallest Weight First selected items: ['diamond', 'crown', 'gold coins', 'scepter']
Smallest Weight First selected value: 62, weight: 43/50.
```


### Efficiency First

The **Efficiency First (EF)** greedy heuristic combines the ideas of the above two heuristics. It selects the most efficient item (i.e., the one with the largest value per unit of weight) first.

It **sorts the items in the decreasing order of their value/weight**, and then selects the sorted items one by one until the knapsack is full.
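
Before running EF, it can help to look at the order it will use on Jack's items. The sketch below prints each item's value-to-weight ratio (\$K per kg) after sorting, using the same sort key as EF.

```python
items = [
    {'name': 'crown', 'value': 10, 'weight': 7},
    {'name': 'chest', 'value': 50, 'weight': 38},
    {'name': 'diamond', 'value': 20, 'weight': 6},
    {'name': 'gold coins', 'value': 7, 'weight': 10},
    {'name': 'scepter', 'value': 25, 'weight': 20},
]

# Sort by value per kg, highest first, as the EF heuristic does.
ef_order = sorted(items, key=lambda x: x['value'] / x['weight'], reverse=True)
for item in ef_order:
    print(f"{item['name']}: {item['value'] / item['weight']:.2f}")
# diamond: 3.33
# crown: 1.43
# chest: 1.32
# scepter: 1.25
# gold coins: 0.70
```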

Let's see how the EF greedy heuristic works for Captain Jack Sparrow.

```python
ef_items = sorted(items, key=lambda x: x['value'] / x['weight'], reverse=True)
ef_selected = greedy_knapsack(ef_items, capacity)

print_solution(ef_selected, "Efficiency First")
```

```
Efficiency First selected items: ['diamond', 'crown', 'scepter', 'gold coins']
Efficiency First selected value: 62, weight: 43/50.
```


We can see that the **Most Valuable First** heuristic works best for Captain Jack Sparrow, with a total value of 70, while the other two heuristics obtained only 62. The main reason is that SWF and EF both put the diamond (6kg) and the crown (7kg) into the knapsack first. That leaves only 37kg, which is not enough for the 38kg chest, the most valuable item.
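
With only five items, we can check how good 70 is by enumerating all $2^5 = 32$ subsets. This brute-force sketch is only practical for very small instances, because the number of subsets doubles with each extra item.

```python
from itertools import combinations

items = [
    {'name': 'crown', 'value': 10, 'weight': 7},
    {'name': 'chest', 'value': 50, 'weight': 38},
    {'name': 'diamond', 'value': 20, 'weight': 6},
    {'name': 'gold coins', 'value': 7, 'weight': 10},
    {'name': 'scepter', 'value': 25, 'weight': 20},
]
capacity = 50

best_value, best_subset = 0, ()
# Enumerate every subset of every size r = 0..n.
for r in range(len(items) + 1):
    for subset in combinations(items, r):
        weight = sum(item['weight'] for item in subset)
        value = sum(item['value'] for item in subset)
        # Keep the most valuable subset that fits in the knapsack.
        if weight <= capacity and value > best_value:
            best_value, best_subset = value, subset

print(best_value, [item['name'] for item in best_subset])  # 70 ['chest', 'diamond']
```

So on this small instance, MVF happens to find the optimal solution.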

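> **NOTE**: Suppose we can take a fraction of an item, e.g., 25% of item 2 and 58% of item 3 (the *fractional* knapsack problem). In that case, the **Efficiency First** heuristic is guaranteed to obtain the optimal solution, provided it takes the first item that does not fit only partially, filling the remaining capacity.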
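
Below is a minimal sketch of this fractional variant. It assumes each item can be split in any proportion, and the function name `fractional_knapsack` is illustrative. It works in three steps:

1. Sort the items by value per weight, as in EF.
2. Take whole items while they fit.
3. Take the fraction of the next item that fills the remaining capacity, then stop.

```python
def fractional_knapsack(items, capacity):
    """Greedy by value/weight; the last item taken may be fractional.

    Returns the total value and a list of (name, fraction taken) pairs.
    """
    order = sorted(items, key=lambda x: x['value'] / x['weight'], reverse=True)
    remaining = capacity
    total_value = 0.0
    taken = []
    for item in order:
        if remaining == 0:
            break  # the knapsack is exactly full
        if item['weight'] <= remaining:
            # The whole item fits.
            taken.append((item['name'], 1.0))
            total_value += item['value']
            remaining -= item['weight']
        else:
            # Take only the fraction that fills the remaining capacity, then stop.
            fraction = remaining / item['weight']
            taken.append((item['name'], fraction))
            total_value += fraction * item['value']
            break
    return total_value, taken


items = [
    {'name': 'crown', 'value': 10, 'weight': 7},
    {'name': 'chest', 'value': 50, 'weight': 38},
    {'name': 'diamond', 'value': 20, 'weight': 6},
    {'name': 'gold coins', 'value': 7, 'weight': 10},
    {'name': 'scepter', 'value': 25, 'weight': 20},
]

value, taken = fractional_knapsack(items, 50)
print(round(value, 2))  # 78.68
```

Here the diamond and crown are taken whole and 37/38 of the chest fills the last 37kg. This is more than the 70 achievable when items cannot be split.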

## Larger Instances

---

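Let's try the greedy heuristics on a larger instance. Here we load the knapsack instance `data/100_1000.knapsack`, which has 100 items and a capacity of 995. The file is space-separated. Its first line gives the number of items and the capacity, and each following line gives the value and weight of one item: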

| No. of Items | Capacity |
| ------------ | -------- | 
| value_1 | weight_1 |
| value_2 | weight_2 |
| ...     | ...      |
| value_n | weight_n |

First, we load the instance with pandas.

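```python
import pandas as pd

# The file is space-separated and has no header row.
df = pd.read_csv("data/100_1000.knapsack", header=None, sep=" ")

# First row: number of items (column 0) and capacity (column 1).
capacity = df.iloc[0, 1]
# Remaining rows: value (column 0) and weight (column 1) of each item.
item_data = df.iloc[1:]

items = []
for i in item_data.index:
    # Use the row label (1..n) as the item name.
    items.append({'name': i, 'value': item_data.loc[i, 0], 'weight': item_data.loc[i, 1]})
```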
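
As an alternative to pandas, the same file format can be parsed with plain Python. The function `parse_knapsack` is a hypothetical helper, and the three-item instance below is made up for illustration.

```python
def parse_knapsack(text):
    """Parse 'n capacity' on the first line, then one 'value weight' line per item.

    Items are named 1..n, matching the row labels used when loading with pandas.
    """
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    n, capacity = int(lines[0][0]), int(lines[0][1])
    items = [
        {'name': i, 'value': int(v), 'weight': int(w)}
        for i, (v, w) in enumerate(lines[1:n + 1], start=1)
    ]
    return items, capacity


# A made-up instance: 3 items, capacity 10.
example = """3 10
6 4
5 3
8 7
"""
items, capacity = parse_knapsack(example)
print(capacity, items)
```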

Then, we apply the three greedy heuristics to this instance and compare their obtained solutions.

```python
mvf_items = sorted(items, key=lambda x: x['value'], reverse=True)
mvf_selected = greedy_knapsack(mvf_items, capacity)

print_solution(mvf_selected, "Most Value First")
```

```
Most Value First selected items: [43, 11]
Most Value First selected value: 1041, weight: 981/995.
```


```python
swf_items = sorted(items, key=lambda x: x['weight'], reverse=False)
swf_selected = greedy_knapsack(swf_items, capacity)

print_solution(swf_selected, "Smallest Weight First")
```

```
Smallest Weight First selected items: [11, 49, 7, 54, 38, 24, 83, 61, 14, 33, 39, 36, 13]
Smallest Weight First selected value: 862, weight: 965/995.
```


```python
ef_items = sorted(items, key=lambda x: x['value'] / x['weight'], reverse=True)
ef_selected = greedy_knapsack(ef_items, capacity)

print_solution(ef_selected, "Efficiency First")
```

```
Efficiency First selected items: [38, 24, 54, 33, 71, 57, 85, 28, 7, 11]
Efficiency First selected value: 1487, weight: 983/995.
```


## Summary

---

The greedy algorithms for the knapsack problem are based on an **item sorting heuristic**, which can be defined in different ways.
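
Since the heuristics differ only in the sort key, they can be expressed as a table of key functions passed to the same greedy loop. This sketch repeats `greedy_knapsack` so that it runs on its own, and the `heuristics` dictionary is illustrative. Running several heuristics and keeping the best result is a cheap way to avoid relying on any single one.

```python
def greedy_knapsack(sorted_items, capacity):
    # Scan items in the given order, adding each one that still fits.
    selected = []
    remaining_capacity = capacity
    for item in sorted_items:
        if item['weight'] <= remaining_capacity:
            selected.append(item)
            remaining_capacity -= item['weight']
    return selected


# Each heuristic is a sort key plus a sort direction (True = descending).
heuristics = {
    'Most Valuable First': (lambda x: x['value'], True),
    'Smallest Weight First': (lambda x: x['weight'], False),
    'Efficiency First': (lambda x: x['value'] / x['weight'], True),
}

items = [
    {'name': 'crown', 'value': 10, 'weight': 7},
    {'name': 'chest', 'value': 50, 'weight': 38},
    {'name': 'diamond', 'value': 20, 'weight': 6},
    {'name': 'gold coins', 'value': 7, 'weight': 10},
    {'name': 'scepter', 'value': 25, 'weight': 20},
]
capacity = 50

results = {}
for name, (key, descending) in heuristics.items():
    selected = greedy_knapsack(sorted(items, key=key, reverse=descending), capacity)
    results[name] = sum(item['value'] for item in selected)

best = max(results, key=results.get)
print(results)  # {'Most Valuable First': 70, 'Smallest Weight First': 62, 'Efficiency First': 62}
print(best)     # Most Valuable First
```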

No single greedy heuristic is guaranteed to be better than the others. For example, Most Valuable First found the best solution for Captain Jack Sparrow. On the larger instance, however, Efficiency First performed much better: 1487, compared with 1041 for MVF and 862 for SWF. EF also tends to perform well in many real-world cases. Moreover, no greedy heuristic is guaranteed to find the optimal solution for the knapsack problem.

> **NOTE**: The knapsack problem is NP-hard. This means no polynomial-time algorithm is known that solves it to optimality, and none exists unless P = NP.

The **advantages** of greedy heuristics are:

- Fast: the running time is dominated by sorting, i.e., O(n log n) for n items
- Reasonable solution quality in many cases

The **disadvantages** of greedy heuristics are:

- Cannot guarantee optimality
- Sometimes the solution quality can be poor

There are more advanced techniques that can find better solutions than greedy heuristics, including:

- Dynamic Programming
- [Branch and Bound](https://github.com/meiyi1986/tutorials/blob/master/notebooks/knapsack-branch-bound.ipynb)
- Local Search
- Genetic Algorithms
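
As one example from this list, dynamic programming finds the optimal value when weights are integers. Its running time is O(nC) for n items and capacity C. This is pseudo-polynomial, because it grows with the numeric value of C, so it does not contradict NP-hardness. Below is a minimal sketch using a one-dimensional table; the name `knapsack_dp` is illustrative. It could also be run on the 100-item instance to measure how far each greedy solution is from the optimum.

```python
def knapsack_dp(items, capacity):
    """Optimal 0-1 knapsack value for integer weights.

    best[c] holds the highest value achievable with total weight at most c
    using the items processed so far.
    """
    best = [0] * (capacity + 1)
    for item in items:
        w, v = item['weight'], item['value']
        # Iterate capacities downwards so each item is used at most once.
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]


items = [
    {'name': 'crown', 'value': 10, 'weight': 7},
    {'name': 'chest', 'value': 50, 'weight': 38},
    {'name': 'diamond', 'value': 20, 'weight': 6},
    {'name': 'gold coins', 'value': 7, 'weight': 10},
    {'name': 'scepter', 'value': 25, 'weight': 20},
]

print(knapsack_dp(items, 50))  # 70
```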

---

- More tutorials can be found [here](https://github.com/meiyi1986/tutorials).
- [Yi Mei's homepage](https://meiyi1986.github.io/)