# 6.0002 Lecture 1: Introduction and Optimization Problems

**Speaker:** Prof. John Guttag

## 6.0002 prerequisites
- experience writing object-oriented programs in Python
    - preferably Python 3.5
- familiarity with concepts of computational complexity
- familiarity with some simple algorithms
- 6.0001 sufficient

## How does it compare to 6.0001?
- programming assignments a bit easier
    - focus more on the problem to be solved than on programming
- lecture content more abstract
- lectures will be a bit faster paced
- less about learning to program, more about dipping your toe into data science

## Honing your programming skills
- a few additional bits of Python
- software engineering
- using packages
- How do you get to Carnegie Hall?
    - practice, practice, practice

## Computational Models
- using computation to help understand the world in which we live
- experimental devices that help us to understand something that has happened or to predict the future
- **optimization models**
    - will focus on this today
- statistical models
- simulation models

## What is an optimization model?
- start with an objective function that is to be maximized or minimized, e.g. 
    - minimize time spent traveling from New York to Boston
- a set of constraints (possibly empty) that must be honored, e.g.
    - cannot spend more than $100
    - must be in Boston before 5:00 PM
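
As a minimal sketch of these two ingredients, the snippet below first filters a set of travel options by the constraints and then minimizes the objective over what remains. The option names, durations, prices, and arrival times are invented purely for illustration, as are the function names.

```python
# Hypothetical travel options from New York to Boston:
# (name, hours of travel, cost in dollars, arrival hour on a 24-hour clock).
# All numbers are made up for illustration.
options = [
    ('plane', 1.5, 250, 13.0),
    ('train', 4.0, 90, 15.5),
    ('bus', 5.0, 30, 17.5),
    ('car', 4.5, 60, 16.0),
]

def satisfies_constraints(option, max_cost=100, deadline=17.0):
    """Constraints: spend no more than max_cost and arrive before the deadline."""
    _, _, cost, arrival = option
    return cost <= max_cost and arrival < deadline

def best_option(options):
    """Objective: minimize travel time over the options that honor the constraints."""
    feasible = [o for o in options if satisfies_constraints(o)]
    if not feasible:
        return None  # no option honors every constraint
    return min(feasible, key=lambda o: o[1])

print(best_option(options))
```

The plane is the fastest option overall, but it violates the cost constraint, so the minimization only ever sees the feasible options.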

## Knapsack problem
- you have limited strength, so there is a maximum weight that you can carry in your knapsack
- you would like to take more stuff than you can carry
- how do you choose which stuff to take and which to leave behind?
    - want to maximize value of items you take
    - constrained by weight
- two variants
    - 0/1 knapsack problem --> you either take the object or you don't (all or none)
        - more complicated, because decisions affect future decisions
    - continuous or fractional knapsack problem --> can take pieces of items
        - boring; easy to solve with a **greedy algorithm**: take as much as possible of the item with the highest value per unit weight, and when it runs out, fill the remaining space with the next most valuable item per unit weight (taking a fraction of it if necessary)
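
A minimal sketch of the greedy approach to the fractional variant, assuming items are given as hypothetical `(name, value, weight)` tuples and that "most valuable" is measured as value per unit of weight. The function name and the sample items are illustrative, not part of the lecture code.

```python
def fractional_knapsack(items, capacity):
    """items: list of (name, value, weight) tuples with weight > 0.
    Returns (list of (name, fraction taken), total value)."""
    # Consider items in order of decreasing value per unit of weight.
    by_density = sorted(items, key=lambda item: item[1] / item[2], reverse=True)
    taken, total_value, remaining = [], 0.0, capacity
    for name, value, weight in by_density:
        if remaining <= 0:
            break
        # Take the whole item if it fits, otherwise the fraction that fills the knapsack.
        fraction = min(1.0, remaining / weight)
        taken.append((name, fraction))
        total_value += fraction * value
        remaining -= fraction * weight
    return taken, total_value

# Hypothetical items: (name, value, weight)
items = [('gold', 60, 10), ('silver', 100, 20), ('bronze', 120, 30)]
print(fractional_knapsack(items, 50))
```

Here all of the gold and silver fit, and the remaining capacity is filled with two thirds of the bronze.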

## 0/1 Knapsack Problem, formalized
- each item is represented by a pair, <value, weight>
- the knapsack can accommodate items with a total weight of no more than w
- a vector, L, of length n, represents the set of available items
    - each element of the vector is an item
- a vector, V, of length n, is used to indicate whether or not items are taken
    - if V[i] = 1, item L[i] is taken
    - if V[i] = 0, item L[i] is not taken
- find a V that maximizes $$\sum_{i=0}^{n-1}V[i]\cdot L[i]\textrm{.value}$$
- subject to the constraint that $$\sum_{i=0}^{n-1}V[i] \cdot L[i]\textrm{.weight} \leq w$$
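
The objective and the constraint translate directly into two sums over the item vector. The sketch below evaluates one candidate V. The `Item` type, the helper names, and the sample numbers are assumptions made for illustration.

```python
from collections import namedtuple

# Each item is a <value, weight> pair.
Item = namedtuple('Item', ['value', 'weight'])

def total_value(V, items):
    """Objective: sum of V[i] * items[i].value."""
    return sum(V[i] * items[i].value for i in range(len(items)))

def total_weight(V, items):
    """Left-hand side of the constraint: sum of V[i] * items[i].weight."""
    return sum(V[i] * items[i].weight for i in range(len(items)))

def is_feasible(V, items, w):
    """True if the selection V honors the weight limit w."""
    return total_weight(V, items) <= w

# Hypothetical items and one candidate selection (skip item 0, take items 1 and 2).
items = [Item(60, 10), Item(100, 20), Item(120, 30)]
V = [0, 1, 1]
print(total_value(V, items), total_weight(V, items), is_feasible(V, items, 50))
```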

## Brute force algorithm
- 1.) enumerate all possible combinations of items.
    - that is to say, generate all subsets of the set of items, called the **power set**, $\mathcal{P}(L)$
- 2.) remove all of the combinations whose total weight exceeds the allowed weight
- 3.) from the remaining combinations, choose any one whose value is largest
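
The three steps above can be sketched with `itertools.combinations`, which generates the subsets of each size. This is one possible rendering, not the lecture's own code. The tuple layout `(name, value, weight)` and the function names are assumptions.

```python
from itertools import combinations

def power_set(items):
    """Step 1: yield every subset of items as a tuple (2**len(items) subsets in total)."""
    for size in range(len(items) + 1):
        yield from combinations(items, size)

def brute_force_knapsack(items, max_weight):
    """items: list of (name, value, weight) tuples.
    Returns (best subset, its total value)."""
    best_subset, best_value = (), 0
    for subset in power_set(items):
        weight = sum(w for _, _, w in subset)
        if weight > max_weight:      # step 2: discard overweight combinations
            continue
        value = sum(v for _, v, _ in subset)
        if value > best_value:       # step 3: keep the most valuable one seen so far
            best_subset, best_value = subset, value
    return best_subset, best_value

# Hypothetical items: (name, value, weight)
items = [('a', 60, 10), ('b', 100, 20), ('c', 120, 30)]
print(brute_force_knapsack(items, 50))
```

Filtering and maximizing are folded into a single pass here so that the full power set never has to be held in memory, but the number of subsets examined is unchanged.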

## Often not practical
- how big is the power set?
- how many possible different values can V have?
    - as many different binary numbers as can be represented in n bits, i.e. $2^n$
- for example, if there are 100 items to choose from, the power set is of size:
    - 1,267,650,600,228,229,401,496,703,205,376
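
Each item is either in a subset or not, so n items give 2 to the power n subsets. Python integers have arbitrary precision, so the figure above can be checked directly. The helper name `power_set_size` is made up for this sketch.

```python
def power_set_size(n):
    """Number of subsets of a set with n elements."""
    return 2 ** n

# Print the size with thousands separators for a few values of n.
for n in (10, 20, 100):
    print(n, f'{power_set_size(n):,}')
```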

## Are we just being stupid?
- alas, no
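- 0/1 knapsack problem is inherently exponential, as far as anyone knows: no known algorithm finds an optimal solution in time polynomial in the size of the input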
- but don't despair...

## Greedy algorithm: a practical alternative
- while knapsack not full
    - put "best" available item in knapsack
- but what does best mean?
    - most valuable
    - least expensive
    - highest value/units

## An example
- you are about to sit down to a meal
- you know how much you value different foods
    - e.g. you like donuts more than apples
- but you have a calorie budget, e.g. you don't want to consume more than 750 calories
- choosing what to eat is a knapsack problem
- given a menu of items, let's look at a program that we can use to decide what to order

In [1]:
class Food(object):
    def __init__(self, n, v, w):
        self.name = n
        self.value = v
        self.calories = w
    
    def getValue(self):
        return self.value
    
    def getCost(self):
        return self.calories
    
    def density(self):
        return self.getValue()/self.getCost()
    
    def __str__(self):
        return self.name + ': <' + str(self.value)\
                + ', ' + str(self.calories) + '>'

In [3]:
def buildMenu(names, values, calories):
    """ names, values, calories lists of same length.
        names a list of strings
        values and calories lists of numbers
        returns list of Foods"""
    menu = []
    for i in range(len(values)):
        menu.append(Food(names[i], values[i], calories[i]))
        
    return menu

## Implementation of flexible greedy

In [13]:
def greedy(items, maxCost, keyFunction):
    """Assumes items a list, maxCost >= 0,
        keyFunction maps elements of items to numbers"""
    itemsCopy = sorted(items, key=keyFunction, reverse=True) # sorted() makes a copy of the list
    result = []
    totalValue, totalCost = 0.0, 0.0
    
    for item in itemsCopy: # go through loop n times, once for each item
        if (totalCost+item.getCost()) <= maxCost:
            result.append(item)
            totalCost += item.getCost()
            totalValue += item.getValue()
    
    return (result, totalValue) # the sort dominates the loop, so O(n log n) overall (pretty efficient)

## Using greedy

In [22]:
def testGreedy(items, constraint, keyFunction):
    taken, val = greedy(items, constraint, keyFunction)
    print('Total value of items taken =', val)
    for item in taken:
        print('   ', item)

In [23]:
def testGreedys(foods, maxUnits):
    print('Use greedy by value to allocate', maxUnits, 'calories')
    testGreedy(foods, maxUnits, Food.getValue)
    print('\nUse greedy by cost to allocate', maxUnits, 'calories')
    # lambda is used to create an anonymous function (has no name)
    testGreedy(foods, maxUnits, lambda x: 1/Food.getCost(x)) # invert cost to prefer cheaper items
    # x has to be of type Food
    print('\nUse greedy by density to allocate', maxUnits, 'calories')
    testGreedy(foods, maxUnits, Food.density)

## lambda
- used to create anonymous functions
    - lambda <id_1, id_2, ..., id_n>:<expression>
    - returns a function of n arguments
- can be very handy, as here
- possible to write amazingly complicated lambda expressions
- **don't** -- use def instead
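
A small illustration of the trade-off: a short lambda works well as a sort key, and the equivalent `def` behaves identically but carries a name and a docstring. The function names are invented for this sketch. The pairs are (value, calories) figures for apple, wine, and donut from the menu in these notes.

```python
# A lambda of two arguments returns a function of two arguments.
add = lambda x, y: x + y
print(add(2, 3))

# A lambda and an equivalent def; both map a (value, cost) pair to value per unit cost.
density_lambda = lambda pair: pair[0] / pair[1]

def density_def(pair):
    """Return value per unit cost for a (value, cost) pair."""
    return pair[0] / pair[1]

pairs = [(50, 95), (89, 123), (10, 195)]  # (value, calories)
print(sorted(pairs, key=density_lambda, reverse=True))
print(sorted(pairs, key=density_def, reverse=True))
```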

In [24]:
names = ['wine', 'beer', 'pizza', 'burger', 'fries', 'cola', 'apple', 'donut']
values = [89, 90, 95, 100, 90, 79, 50, 10]
calories = [123, 154, 258, 354, 365, 150, 95, 195]
foods = buildMenu(names, values, calories)

testGreedys(foods, 750)

Use greedy by value to allocate 750 calories
Total value of items taken = 284.0
    burger: <100, 354>
    pizza: <95, 258>
    wine: <89, 123>

Use greedy by cost to allocate 750 calories
Total value of items taken = 318.0
    apple: <50, 95>
    wine: <89, 123>
    cola: <79, 150>
    beer: <90, 154>
    donut: <10, 195>

Use greedy by density to allocate 750 calories
Total value of items taken = 318.0
    wine: <89, 123>
    beer: <90, 154>
    cola: <79, 150>
    apple: <50, 95>
    donut: <10, 195>


## Why different answers?
- a sequence of locally "optimal" choices doesn't always yield a globally optimal solution
    - greedy algorithms can get you stuck at a local optimum rather than global optimum
- is greedy by density always a winner?
    - Try testGreedys(foods, 1000)

In [25]:
testGreedys(foods, 1000)

Use greedy by value to allocate 1000 calories
Total value of items taken = 424.0
    burger: <100, 354>
    pizza: <95, 258>
    beer: <90, 154>
    wine: <89, 123>
    apple: <50, 95>

Use greedy by cost to allocate 1000 calories
Total value of items taken = 413.0
    apple: <50, 95>
    wine: <89, 123>
    cola: <79, 150>
    beer: <90, 154>
    donut: <10, 195>
    pizza: <95, 258>

Use greedy by density to allocate 1000 calories
Total value of items taken = 413.0
    wine: <89, 123>
    beer: <90, 154>
    cola: <79, 150>
    apple: <50, 95>
    pizza: <95, 258>
    donut: <10, 195>


## The pros and cons of Greedy
- easy to implement
- computationally efficient
- but does not always yield the best solution
    - don't even know how good the approximation is
- in the next lecture we'll look at finding truly optimal solutions
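
The claim that greedy does not always yield the best solution can be checked on a tiny instance by comparing it against an exhaustive search. The sketch below uses hypothetical `(value, weight)` pairs and made-up function names. It is not the lecture's `greedy`, only a compact stand-in for the density variant.

```python
from itertools import combinations

def greedy_by_density(items, max_weight):
    """items: list of (value, weight) pairs. Returns the total value taken greedily."""
    total_value, total_weight = 0, 0
    # Visit items in order of decreasing value per unit of weight.
    for value, weight in sorted(items, key=lambda it: it[0] / it[1], reverse=True):
        if total_weight + weight <= max_weight:
            total_weight += weight
            total_value += value
    return total_value

def optimal_value(items, max_weight):
    """Exhaustively check every subset; only practical for small inputs."""
    best = 0
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            if sum(w for _, w in subset) <= max_weight:
                best = max(best, sum(v for v, _ in subset))
    return best

items = [(60, 10), (100, 20), (120, 30)]  # hypothetical (value, weight) pairs
print(greedy_by_density(items, 50), optimal_value(items, 50))
```

Greedy by density commits to the two densest items first and then has no room for the third, while the best selection skips the densest item entirely.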