In [None]:
import random

# Multiplicative Weights Update

In this notebook we explore alternative strategies for the experts problem and show why they are inferior to the multiplicative weights update algorithm covered in class. The notebook has several subparts. Each subpart provides an expert picking strategy, and your job is to implement the strategy an adversary would use to maximize regret against it.

In this notebook, we measure regret as the average regret per day, $R = \frac{1}{T}(\sum\limits_{t=1}^{T}c_{i(t)}^t - \min\limits_{1\leq i \leq n}\sum\limits_{t=1}^{T}c_i^t)$,

where $T$ is the total number of days, $n$ is the number of experts, $i(t)$ is the expert you choose on day $t$, and $c_i^t$ is the cost of expert $i$ on day $t$.
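
As a small worked example of this formula, the sketch below computes $R$ for a hand-written cost table. The function name `average_regret` and the `example_picks` list are hypothetical and used only for illustration; the cost table is the 3-expert, 4-day example that appears in the docstrings of this notebook.

```python
def average_regret(costs, picks):
    """costs[i][t] is the cost of expert i on day t; picks[t] is the expert chosen on day t."""
    num_days = len(picks)
    # Total cost paid by the chooser over all days.
    chooser_cost = sum(costs[picks[t]][t] for t in range(num_days))
    # Total cost of the single best expert in hindsight.
    best_expert_cost = min(sum(expert_costs) for expert_costs in costs)
    return (chooser_cost - best_expert_cost) / num_days


# Cost table: 3 experts (rows), 4 days (columns).
example_costs = [
    [1, 0, 1, 0.5],
    [0.5, 1, 0.2, 0.3],
    [0.1, 1, 1, 0.8],
]
# Hypothetical choices made on days 0..3.
example_picks = [0, 1, 0, 2]
example_regret = average_regret(example_costs, example_picks)
```

Here the chooser pays 1 + 1 + 1 + 0.8 = 3.8 and the best expert in hindsight (expert 1) pays 2.0, so the regret is (3.8 - 2.0) / 4 = 0.45, up to floating-point rounding.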



First, let's understand the functions you'll be working with:

- `pick_[strategy](costs_so_far, day, experts_picked_so_far, weights = [])`: returns the index of the expert we should pick on the current day, given the history of the costs accrued by each expert


- `adversary_[strategy](costs_so_far, day, experts_picked_so_far, weights = [])`: returns a loss array containing the new costs on the current day given the losses so far and a list of experts picked


- `compute_regret(strategy, adversary, n, num_experts, weights = [])`: calculates the average regret $R$ defined above, given the strategy used to pick experts, the adversary's strategy, the number of days `n`, and the number of experts `num_experts`

The chooser will use the `pick_[strategy]` function to choose experts on each day. The adversary will use the `adversary_[strategy]` function to assign the losses to each expert on each day. The `compute_regret` function will then compute the average regret given the two strategies.

For example, let's observe the following strategy:

## 1) Always pick the same expert

In [None]:
def pick_same(costs_so_far, day, experts_picked_so_far, weights = []):
    '''
    Returns the index of the expert we should pick on the current day, 
    given the history of the costs accrued by each expert
    
    In this function, we always pick expert 0 regardless of costs
    
    args:
    costs_so_far: a list of lists of numbers. Each list k of numbers represents
                  the costs expert k accrued across all previous days.
                  
                  For example: 
                  [[1, 0, 1, 0.5],                                
                   [0.5, 1, 0.2, 0.3],
                   [0.1, 1, 1, 0.8]]
                  means expert 0 accrued the costs 1, 0, 1, and 0.5 on days 0, 1, 2, 3 respectively.
                  
    day: an integer representing the current day
    experts_picked_so_far: a list of integers representing the experts we picked on each day
    weights: a list of numbers representing the weights used to pick experts if the strategy is
        not deterministic (empty otherwise)
    '''
    return 0
    

In [None]:
def adversary_pick_same(costs_so_far, day, experts_picked_so_far, weights = []):
    '''
    returns a loss array containing the new costs on the current day given the losses 
    so far and a list of experts picked
    
    args:
    costs_so_far: a list of lists of numbers. Each list k of numbers represents
                  the costs expert k accrued across all previous days.
                  
                  For example: 
                  [[1, 0, 1, 0.5],                                
                   [0.5, 1, 0.2, 0.3],
                   [0.1, 1, 1, 0.8]]
                  means expert 0 accrued the costs 1, 0, 1, and 0.5 on days 0, 1, 2, 3 respectively.
                  
    day: an integer representing the current day 
    experts_picked_so_far: a list of integers representing the experts we picked on each day
    weights: a list of numbers representing the weights used to pick experts if the strategy is
        not deterministic (empty otherwise)
    '''
    num_experts = len(costs_so_far)
    new_costs = [1] + [0 for i in range(num_experts - 1)]
    return new_costs
    

In [None]:
def compute_regret(strategy, adversary, n, num_experts, weights = []):
    """
    calculates the average regret given a strategy used to pick experts, the adversary's
    strategy, and the number of days n
    
    args:
    strategy(costs_so_far, day, experts_picked_so_far, weights): a function which returns the index of
        the expert we should pick on the current day, given the history of the costs accrued by each expert
    
    adversary(costs_so_far, day, experts_picked_so_far, weights): a function which returns a list of
        the new costs on the current day given the costs so far and a list of experts picked
        (including the expert picked on the current day)
    
    n: number of days we run the algorithm
    num_experts: an integer representing the number of experts to choose from
    
    weights: a list of numbers representing the weights used to pick experts if the strategy is
        not deterministic (empty otherwise)
    
    return: a number representing the average regret per day
    """
    costs = [[] for i in range(num_experts)]
    experts_picked = []
    
    for day in range(n):
        experts_picked.append(strategy(costs, day, experts_picked, weights))
        new_costs = adversary(costs, day, experts_picked, weights)
        for i in range(num_experts):
            costs[i].append(new_costs[i])
        
    total_cost = sum([costs[experts_picked[day]][day] for day in range(n)])
    
    expert_costs = [sum(costs[i]) for i in range(num_experts)]
    best_expert_cost = min(expert_costs)

    return (total_cost - best_expert_cost)/n

## Analysis 
If the adversary knows that the same expert will be picked every day, the adversary can maximize regret by giving that expert a loss of 1 every day and every other expert a loss of 0. Running the `compute_regret` function below shows that our regret with 100 experts and 200 days is 1.

In [None]:
compute_regret(pick_same, adversary_pick_same, 200, 100)

Below we provide four expert picking strategies which are far inferior to multiplicative weights update. We have already implemented each expert picking strategy. Your job is to implement the adversary's strategy that guarantees the highest possible (or highest expected) regret against that expert picking strategy. The subparts are separate from one another, so your adversary strategy only needs to consider the expert picking strategy in its own subpart.

## 2) Picking the best expert from the previous day
This expert picking strategy only looks at the loss from the previous day and simply picks the best expert based on that loss. You may assume the following about this strategy.

- On day 0 when you pick the first expert, you pick expert 0.
- From day 1 onward, if multiple experts are tied for the best from the previous day, you pick one of them uniformly at random.
- The lowest non-zero loss that an expert can have is 0.1 (i.e., the adversary can set the loss of an expert to 0 or to any number between 0.1 and 1.0).

The expert picking strategy is implemented below. Implement a strategy the adversary can use to guarantee a regret of 0.9.

In [None]:
def pick_previous_best(costs_so_far, day, experts_picked_so_far, weights = []):
    '''
    Returns the index of the expert we should pick on the current day, 
    given the history of the costs accrued by each expert
    
    In this function, we always pick the best expert from the previous day
    
    args:
    costs_so_far: a list of lists of numbers. Each list k of numbers represents
                  the costs expert k accrued across all previous days.
                  
                  For example: 
                  [[1, 0, 1, 0.5],                                
                   [0.5, 1, 0.2, 0.3],
                   [0.1, 1, 1, 0.8]]
                  means expert 0 accrued the costs 1, 0, 1, and 0.5 on days 0, 1, 2, 3 respectively.

    day: an integer representing the current day
    experts_picked_so_far: a list of integers representing the experts we picked on each day
    weights: a list of numbers representing the weights used to pick experts if the strategy is
        not deterministic (empty otherwise)
    '''
    if day == 0:
        return 0
    
    num_experts = len(costs_so_far)
    prev_day_costs = [costs_so_far[i][day - 1] for i in range(num_experts)]
    min_cost = min(prev_day_costs)
    previous_best = [i for i in range(num_experts) if prev_day_costs[i]== min_cost]
    return random.choice(previous_best)
    

In [None]:
def adversary_previous_best(costs_so_far, day, experts_picked_so_far, weights = []):
    '''
    returns a loss array containing the new costs on the current day given the losses 
    so far and a list of experts picked
    
    args:
    costs_so_far: a list of lists of numbers. Each list k of numbers represents
                  the costs expert k accrued across all previous days.
                  
                  For example: 
                  [[1, 0, 1, 0.5],                                
                   [0.5, 1, 0.2, 0.3],
                   [0.1, 1, 1, 0.8]]
                  means expert 0 accrued the costs 1, 0, 1, and 0.5 on days 0, 1, 2, 3 respectively.
                  
    day: an integer representing the current day 
    experts_picked_so_far: a list of integers representing the experts we picked on each day
    weights: a list of numbers representing the weights used to pick experts if the strategy is
        not deterministic (empty otherwise)
    '''
    num_experts = len(costs_so_far)
    if day % 2 == 0:
        return [1, 0] + [0.1 for i in range(num_experts - 2)]
    else: # day % 2 == 1
        return [0, 1] + [0.1 for i in range(num_experts - 2)]
    
    

## Verifier
We run the `compute_regret` function on your strategy to verify that it achieves a regret of exactly 0.9.
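
One way to see where 0.9 comes from is the short calculation below. It assumes at least three experts, at least two days, and the alternating adversary implemented above: experts 0 and 1 swap losses of 1 and 0 each day, while every other expert gets 0.1. The chooser starts on expert 0, which loses 1, and on every later day moves to the expert that just had a loss of 0, which then loses 1. The chooser therefore pays 1 every day, while any expert with index 2 or higher pays only 0.1 per day and is the best in hindsight:

$$R = \frac{1}{T}\left(\sum_{t=1}^{T} 1 - \sum_{t=1}^{T} 0.1\right) = \frac{T - 0.1T}{T} = 0.9$$

Experts 0 and 1 each accumulate roughly $T/2$, which is never below $0.1T$ once $T \geq 2$, so they do not change the minimum.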

In [None]:
for i in range(100):
    regret = compute_regret(pick_previous_best, adversary_previous_best, 5, 5)
    assert regret <= 0.9, f"your algorithm achieved a regret of {regret}, make sure your losses are in the valid range"
    assert regret >= 0.9, f"your algorithm achieved a regret of {regret}, try creating a better strategy"
    
    
print("success")

## 3) Picking uniformly at random

In this strategy, you always pick an expert uniformly at random, independent of the losses each expert accrues and independent of the experts you picked on the previous day.

The expert picking strategy is implemented below. Implement a strategy the adversary can use to guarantee an expected regret of $\frac{n - 1}{n}$, where $n$ is the number of experts (not the `n` argument of `compute_regret`, which is the number of days).

In [None]:
def pick_uniform_random(costs_so_far, day, experts_picked_so_far, weights = []):
    '''
    returns the index of the expert we should pick on the current day, 
    given the history of the costs accrued by each expert
    
    In this function, we always pick an expert uniformly at random regardless of the costs.
    
    args:
    costs_so_far: a list of lists of numbers. Each list k of numbers represents
                  the costs expert k accrued across all previous days.
                  
                  For example: 
                  [[1, 0, 1, 0.5],                                
                   [0.5, 1, 0.2, 0.3],
                   [0.1, 1, 1, 0.8]]
                  means expert 0 accrued the costs 1, 0, 1, and 0.5 on days 0, 1, 2, 3 respectively.

    day: an integer representing the current day
    experts_picked_so_far: a list of integers representing the experts we picked on each day
    weights: a list of numbers representing the weights used to pick experts if the strategy is
        not deterministic (empty otherwise)
    '''
    return random.randint(0,len(costs_so_far) - 1)

In [None]:
def adversary_pick_uniform(costs_so_far, day, experts_picked_so_far, weights = []):
    '''
    returns a list of the new costs on the current day given the costs
    so far and a list of experts picked
    
    args:
    costs_so_far: a list of lists of numbers. Each list k of numbers represents
                  the costs expert k accrued across all previous days.
                  
                  For example: 
                  [[1, 0, 1, 0.5],                                
                   [0.5, 1, 0.2, 0.3],
                   [0.1, 1, 1, 0.8]]
                  means expert 0 accrued the costs 1, 0, 1, and 0.5 on days 0, 1, 2, 3 respectively.
                  
    day: an integer representing the current day 
    experts_picked_so_far: a list of integers representing the experts we picked on each day
    weights: a list of numbers representing the weights used to pick experts if the strategy is
        not deterministic (empty otherwise)
    '''    
    num_experts = len(costs_so_far)
    new_costs = [0] + [1 for i in range(num_experts - 1)]
    return new_costs


## Verifier
We run the `compute_regret` function on your strategy to verify that it achieves an expected regret of $\frac{n-1}{n}$. Since the expert picking strategy is nondeterministic, the regret you get may deviate slightly from $\frac{n-1}{n}$; thus, our tests only require the regret, averaged over several runs, to be within 0.01 of $\frac{n-1}{n}$.
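
The target value can also be checked without simulation. The sketch below assumes the adversary from the cell above (expert 0 always costs 0 and every other expert always costs 1) and computes the exact expected regret per day; `expected_regret_uniform` is a hypothetical helper name, not part of the notebook's API.

```python
def expected_regret_uniform(num_experts):
    # Assumed adversary: expert 0 always costs 0, every other expert always costs 1.
    daily_costs = [0] + [1] * (num_experts - 1)
    # The chooser picks each expert with the same probability.
    pick_probability = 1 / num_experts
    expected_daily_cost = sum(pick_probability * cost for cost in daily_costs)
    # The best expert in hindsight is expert 0, with a cost of 0 per day.
    best_expert_daily_cost = min(daily_costs)
    return expected_daily_cost - best_expert_daily_cost
```

Because the costs are the same every day, the expected average regret over any number of days equals this per-day value, which works out to $\frac{n-1}{n}$ for $n$ experts.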

In [None]:
for n in range(10,100,10):
    total = 0
    for i in range(10):
        total += compute_regret(pick_uniform_random, adversary_pick_uniform, 1000, n)
    total /= 10
    assert abs(total - ((n-1)/n)) < 0.01, f"your strategy has a regret of {total}, but we expected {(n-1)/n}"
print("success")

## 4) Picking with a distribution

In this strategy, you will randomly pick each expert i with probability $p_i$, independent of the losses each expert accrues and independent of the experts you picked on the previous day. These probabilities are provided in the `weights` parameter. `weights` is also passed into the adversary function, so you can use it to create a strategy for the adversary.

The expert picking strategy is implemented below. Implement a strategy the adversary can use to guarantee an expected regret of $1 - \min_i p_i$, where $p_i$ is the probability of picking expert $i$.

In [None]:
def pick_with_distribution(costs_so_far, day, experts_picked_so_far, weights = []):
    '''
    returns the index of the expert we should pick on the current day, 
    given the history of the costs accrued by each expert
    
    In this function, we pick expert i with probability weights[i], regardless of costs
    
    args:
    costs_so_far: a list of lists of numbers. Each list k of numbers represents
                  the costs expert k accrued across all previous days.
                  
                  For example: 
                  [[1, 0, 1, 0.5],                                
                   [0.5, 1, 0.2, 0.3],
                   [0.1, 1, 1, 0.8]]
                  means expert 0 accrued the costs 1, 0, 1, and 0.5 on days 0, 1, 2, 3 respectively.

    day: an integer representing the current day
    experts_picked_so_far: a list of integers representing the experts we picked on each day
    weights: a list of numbers representing the probability of picking each expert
    '''
    return random.choices([i for i in range(len(costs_so_far))], weights=weights)[0]
    

In [None]:
def adversary_pick_distribution(costs_so_far, day, experts_picked_so_far, weights = []):
    '''
    returns a list of the new costs on the current day given the costs
    so far and a list of experts picked
    
    args:
    costs_so_far: a list of lists of numbers. Each list k of numbers represents
                  the costs expert k accrued across all previous days.
                  
                  For example: 
                  [[1, 0, 1, 0.5],                                
                   [0.5, 1, 0.2, 0.3],
                   [0.1, 1, 1, 0.8]]
                  means expert 0 accrued the costs 1, 0, 1, and 0.5 on days 0, 1, 2, 3 respectively.

    day: an integer representing the current day 
    experts_picked_so_far: a list of integers representing the experts we picked on each day
    weights: a list of numbers representing the probability of picking each expert
    '''
    num_experts = len(costs_so_far)
    min_expert = weights.index(min(weights))

    new_costs = [1 for i in range(num_experts)]
    new_costs[min_expert] = 0
    return new_costs

## Verifier
We run the `compute_regret` function on your strategy to verify that it achieves an expected regret of $1 - \min_i p_i$. Since the expert picking strategy is nondeterministic, the regret you get may deviate slightly from $1 - \min_i p_i$; thus, our tests only require the regret, averaged over several runs, to be within 0.01 of $1 - \min_i p_i$.
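
To see why the adversary should spare the expert with the smallest probability, the sketch below compares the exact expected regret for each choice of spared expert. It assumes the adversary gives one expert a cost of 0 and all others a cost of 1 every day; the helper name `expected_regret_if_spared` and the weights are hypothetical.

```python
def expected_regret_if_spared(weights, spared_expert):
    # Assumed adversary: `spared_expert` always costs 0, every other expert costs 1.
    daily_costs = [0 if i == spared_expert else 1 for i in range(len(weights))]
    # Expected daily cost of a chooser that picks expert i with probability weights[i].
    expected_daily_cost = sum(p * c for p, c in zip(weights, daily_costs))
    # The spared expert is the best in hindsight, with a cost of 0 per day.
    return expected_daily_cost - min(daily_costs)


# Hypothetical probabilities for three experts.
hypothetical_weights = [0.5, 0.2, 0.3]
regrets = [expected_regret_if_spared(hypothetical_weights, i) for i in range(3)]
# Index of the spared expert that yields the largest expected regret.
best_target = max(range(3), key=lambda i: regrets[i])
```

With these weights, sparing expert 1 (probability 0.2) gives the largest expected regret, 1 - 0.2 = 0.8, while sparing expert 0 would give only 0.5.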

In [None]:
for num_experts in range(10,100,10):
    for _ in range(10):

        # computes the probability of picking each expert
        weights = [random.randint(0,50) for _ in range(num_experts)]
        total = sum(weights)
        weights = [w/total for w in weights]

        total_regret = 0
        for _ in range(30):
            total_regret += compute_regret(pick_with_distribution, adversary_pick_distribution, 100, num_experts, weights)
        total_regret /= 30
        assert abs(total_regret - (1 - min(weights))) < 0.01, f"your strategy has a regret of {total_regret}, but we expected {(1 - min(weights))}"

print("success")

## 5) Picking the best expert so far
This expert picking strategy picks the expert with the smallest total loss so far. If there are ties, it picks the lowest-indexed one.

The expert picking strategy is implemented below. Implement a strategy the adversary can use to guarantee a regret of at least $\frac{n-1}{n}$, where $n$ is the number of experts.


In [None]:
def pick_best_so_far(costs_so_far, day, experts_picked_so_far, weights = []):
    '''
    returns the index of the expert we should pick on the current day, 
    given the history of the costs accrued by each expert
    
    Picks the best expert so far. If there are ties, picks the lowest-indexed one.
    
    args:
    costs_so_far: a list of lists of numbers. Each list k of numbers represents
                  the costs expert k accrued across all previous days.
                  
                  For example: 
                  [[1, 0, 1, 0.5],                                
                   [0.5, 1, 0.2, 0.3],
                   [0.1, 1, 1, 0.8]]
                  means expert 0 accrued the costs 1, 0, 1, and 0.5 on days 0, 1, 2, 3 respectively.

    day: an integer representing the current day
    experts_picked_so_far: a list of integers representing the experts we picked on each day
    weights: a list of numbers representing the weights used to pick experts if the strategy is
        not deterministic (empty otherwise)
    '''
    total_loss = [sum(expert_loss) for expert_loss in costs_so_far]
    return total_loss.index(min(total_loss))

In [None]:
def adversary_best_so_far(costs_so_far, day, experts_picked_so_far, weights = []):
    '''
    returns a list of the new costs on the current day given the costs
    so far and a list of experts picked
    
    args:
    costs_so_far: a list of lists of numbers. Each list k of numbers represents
                  the costs expert k accrued across all previous days.
                  
                  For example: 
                  [[1, 0, 1, 0.5],                                
                   [0.5, 1, 0.2, 0.3],
                   [0.1, 1, 1, 0.8]]
                  means expert 0 accrued the costs 1, 0, 1, and 0.5 on days 0, 1, 2, 3 respectively.
                 
    day: an integer representing the current day 
    experts_picked_so_far: a list of integers representing the experts we picked on each day
    weights: a list of numbers representing the weights used to pick experts if the strategy is
        not deterministic (empty otherwise)
    '''
    num_experts = len(costs_so_far)
    if len(costs_so_far[0]) == 0:
        new_costs = [0 for i in range(num_experts)]
        new_costs[0] = 1
        return new_costs
    total_loss = [sum(expert_loss) for expert_loss in costs_so_far]
    min_expert = total_loss.index(min(total_loss))
    new_costs = [1 if i == min_expert else 0 for i in range(num_experts)] 
    return new_costs

## Verifier
We run the `compute_regret` function on your strategy to verify that it achieves a regret of $\frac{n-1}{n}$. The expert picking strategy is deterministic here, but the regret may deviate slightly from $\frac{n-1}{n}$ when the number of days is not a multiple of the number of experts; thus, our tests only require your regret to be within 0.01 of $\frac{n-1}{n}$.
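
Because the chooser in this subpart is deterministic, the regret against a deterministic adversary can be computed exactly. The sketch below is a self-contained re-implementation, assuming the adversary charges a cost of 1 to the current leader (the expert that is about to be picked) and 0 to everyone else; `follow_the_leader_regret` is a hypothetical name.

```python
def follow_the_leader_regret(num_days, num_experts):
    totals = [0] * num_experts  # cumulative cost of each expert
    chooser_cost = 0
    for _ in range(num_days):
        # The chooser picks the expert with the smallest total; lowest index wins ties.
        leader = totals.index(min(totals))
        # Assumed adversary: the leader costs 1 today, every other expert costs 0.
        totals[leader] += 1
        chooser_cost += 1
    # Average regret against the best expert in hindsight.
    return (chooser_cost - min(totals)) / num_days
```

For 100 days and 10 experts this gives (100 - 10) / 100 = 0.9. For 100 days and 30 experts it gives (100 - 3) / 100 = 0.97, slightly above 29/30, which is why the check uses a tolerance.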

In [None]:
for num_experts in range(10, 100, 10):
    total = 0
    for i in range(20):
        total += compute_regret(pick_best_so_far, adversary_best_so_far, 100, num_experts)
    total /= 20
    assert abs(total - ((num_experts-1)/num_experts)) < 0.01, f"your strategy has a regret of {total}, but we expected {((num_experts-1)/num_experts)}"

print("success")