In [25]:
# !pip install pyomo
# !pyomo install-extras
#!conda install -y -c conda-forge pyomo.extras
#!conda install -y -c conda-forge glpk
import random
import numpy as np
import pandas as pd
import pdb
import math
%matplotlib inline 

Dynamic programming worked well for this example, but it has a few problems. First, when the meeting is short and the available slot is long, the algorithm may pick the same slot multiple times (as in the example below). Second, if a client provides multiple available slots in their calendar, the same client may be picked for multiple meetings in one day. You can modify the algorithm to avoid these cases, but that is most likely not a good idea: in real-world scenarios the problem will keep changing in ways that make algorithm design hard, and you will need to revamp your algorithm over and over.

For example, think about what happens if you are asked to use the algorithm with a team of marketing agents instead of one, each with different availability and workload. These modifications make the problem harder, most likely NP-complete, and no known algorithm will scale to more than a few marketing slots per calendar while still guaranteeing an optimal answer. In that case, to avoid being fired, you want to design an algorithm that produces good-enough solutions instead of optimal ones.

Instead of working hard on the algorithm, we want to work hard on modeling the problem and use a tool that comes up with good solutions even when the problem is NP-complete. This saves algorithm development time and lets us iterate by changing the objective and adding constraints. Fortunately, for a large portion of such problems there are optimization methods we can use.


#### Knapsack problem
To demonstrate the idea, let's take a simple example before going further. Suppose you are going on a trip and want to take some items with you, but your car has limited capacity, so you want to pick the items that are most valuable to you. Say you have a hammer, a wrench, a screwdriver, and a towel. The weights and importance of these items are listed in table x. This is the knapsack problem, whose decision version is NP-complete.

To solve the problem we will use Pyomo, a great Python package that wraps multiple optimizers and provides an elegant way of modeling optimization problems.

First we define the data: the items, their importance, and their weights. To model the problem we need to define a few things:
- **Optimization variables**: in this case, one Boolean variable per item that indicates whether it is selected for the trip.
- **Objective we are trying to optimize**: in this case, the sum of the values of the items we carry.
- **Constraints** imposed on the problem: here, the total weight of the picked items must be at most 14.

Fortunately, these constructs are available in Pyomo and are intuitive to use. You define variables, objectives, and constraints and attach them to a model. Then you instantiate a solver and voilà, you have your answer!


In [1]:
from pyomo.environ import *
import pyomo.environ as pe 

# the data
A = ['hammer', 'wrench', 'screwdriver', 'towel']
b = {'hammer':8, 'wrench':3, 'screwdriver':6, 'towel':11}
w = {'hammer':5, 'wrench':7, 'screwdriver':4, 'towel':3}
W_max = 14

#the model 
model = ConcreteModel()

# the variables
model.x = Var( A, within=Binary )

# the objective 
model.value = Objective(expr = sum( b[i]*model.x[i] for i in A), sense = maximize )

# weights 
model.weight = Constraint(expr = sum( w[i]*model.x[i] for i in A) <= W_max )

#solver 
opt = SolverFactory('glpk')

#Voilà!
result_obj = opt.solve(model) 
model.x.get_values()

{'screwdriver': 1.0, 'towel': 1.0, 'wrench': 0.0, 'hammer': 1.0}
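With only four items, the solver's answer can be cross-checked by trying every subset. The sketch below is not part of the Pyomo workflow; `brute_force_knapsack` is a helper name introduced here for illustration, and the approach is only practical for a handful of items because the number of subsets doubles with each one.

```python
from itertools import combinations

# Same data as the Pyomo model above.
values = {'hammer': 8, 'wrench': 3, 'screwdriver': 6, 'towel': 11}
weights = {'hammer': 5, 'wrench': 7, 'screwdriver': 4, 'towel': 3}
capacity = 14

def brute_force_knapsack(values, weights, capacity):
    """Try every subset and keep the most valuable one that fits (illustrative helper)."""
    items = list(values)
    best_value, best_subset = 0, ()
    for size in range(len(items) + 1):
        for subset in combinations(items, size):
            if sum(weights[i] for i in subset) <= capacity:
                value = sum(values[i] for i in subset)
                if value > best_value:
                    best_value, best_subset = value, subset
    return best_value, set(best_subset)

best_value, best_subset = brute_force_knapsack(values, weights, capacity)
print(best_value, sorted(best_subset))  # 25 ['hammer', 'screwdriver', 'towel']
```

The enumeration agrees with the solver output: hammer, screwdriver, and towel, with a total weight of 12.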

#### Modifying the problem

Now let's say you decide that exactly one of the hammer and the wrench should be picked. No problem: you go to the model, define a constraint, and run it again. Instead of focusing on the algorithm itself, you have room to focus on the problem and let Pyomo do the heavy lifting.

In [None]:
# A[0] is the hammer and A[1] is the wrench: exactly one of them must be picked.
model.c = Constraint(expr = sum(model.x[i] for i in [A[0], A[1]]) == 1)

#### A baseline for sanity check

So how do we know that the solution is good enough? A good idea is to build a baseline to compare against. Because we don't care about a perfect solution here, the baseline should be simple. In this example, let's say we pick the highest-value items first until we run out of space.

In [None]:
#code
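One way to fill in this baseline, as a minimal sketch: sort the items by value and take each one that still fits. `greedy_knapsack` is a name introduced here, and skipping an item that does not fit (rather than stopping at it) is an assumption about what "until we run out of space" means.

```python
def greedy_knapsack(values, weights, capacity):
    """Take items in decreasing order of value, skipping any that no longer fit."""
    chosen, remaining = [], capacity
    for item in sorted(values, key=values.get, reverse=True):
        if weights[item] <= remaining:
            chosen.append(item)
            remaining -= weights[item]
    return chosen, sum(values[i] for i in chosen)

# Same data as the Pyomo model above.
values = {'hammer': 8, 'wrench': 3, 'screwdriver': 6, 'towel': 11}
weights = {'hammer': 5, 'wrench': 7, 'screwdriver': 4, 'towel': 3}

chosen, total = greedy_knapsack(values, weights, 14)
print(chosen, total)  # ['towel', 'hammer', 'screwdriver'] 25
```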

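With these weights the baseline picks the towel, the hammer, and the screwdriver, for a total value of 25 and a total weight of 12, the same as the optimizer's solution (TODO: find weights for which the two differ).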

#### Scheduling as an optimization problem

Now let's return to our scheduling problem. We define the optimization problem as follows:

- **Variables**: a Boolean variable for each possible slot, indicating whether it is picked.
- **Objective**: maximize the sum of the values of the scheduled meetings.
- **Constraint**: sum(slots per client) <= 1 for each client (each client can be contacted at most once per day).
- **Constraint**: sum(slots) <= 1 for each group of overlapping slots.
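Written out as an integer program, the formulation above might look like the following. The symbols are notation introduced here for illustration: $x_s$ is the Boolean variable for slot $s$, $w_s$ its weight, $S_c$ the set of slots offered by client $c$, and $O_k$ a group of mutually overlapping slots.

$$
\begin{aligned}
\max_{x} \quad & \sum_{s \in S} w_s x_s \\
\text{s.t.} \quad & \sum_{s \in S_c} x_s \le 1 && \text{for each client } c \\
& \sum_{s \in O_k} x_s \le 1 && \text{for each overlap group } O_k \\
& x_s \in \{0, 1\} && \text{for each slot } s \in S
\end{aligned}
$$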

And that's pretty much it. Now let's put it into code:

**1. Generate sample problems**: instead of using a fixed problem, we generate sample problems for testing our algorithms. For each client we generate 1-2 available times per day; each starts at a random hour between 8 and 14 plus a random quarter-hour offset and has a random duration of 15, 30, 45, or 60 minutes. Finally, each available time is associated with a random weight.

In [1]:
def generate_schedual(clients):
    schedual = []
    for client in clients:
        num_slots = random.randint(1, 2)
        for i in range(num_slots):
            start = float(random.randint(8, 14)) + random.randint(0, 4) * 0.25
            duration = 0.25 * random.randint(1, 4)
            end = start + duration
            weight = random.randint(1, 5)
            schedual += [[client, start, end, duration, weight]]
    return schedual

**2. Enumerate slots**: each available time per client may contain several candidate slots. We enumerate them all and store them in a pandas DataFrame for convenience.

In [50]:
from collections import defaultdict
# tasks = [["T1", 08.50, 09.50, 1.00, 3], 
#          ["T2", 09.25, 10.00, 0.75, 4],
#          ["T3", 09.50, 10.75, 0.50, 2],
#          ["T4", 10.25, 11.50, 0.25, 1],
#          ["T5", 12.00, 13.50, 0.75, 1],
#          ["T6", 12.25, 13.75, 0.75, 2],
#          ["T1", 12.00, 13.50, 1.00, 3]]

tasks = generate_schedual(["T1", "T2", "T3", "T4", "T5", "T6"])
def find_possible_slots(start_time, end_time, duration):
    for i in range(int((end_time - start_time) / 0.25)):
        end_time_ = start_time + i * 0.25 + duration
        if end_time_ <= end_time:
            yield (start_time + i * 0.25, start_time + i * 0.25 + duration, duration)

def task_id(task, slot):
    return {"task": task[0] + "_" + str(slot[0]) + "_" + str(slot[1]), 
            "task_group": task[0], 
            "start": slot[0], 
            "finish": slot[1], 
            "duration": slot[2], 
            "weight": task[4]}

def compute_possible_tasks(tasks):
    tasks_ = []
    for task in tasks:
        slots = find_possible_slots(float(task[1]), float(task[2]), float(task[3]))
        for slot in slots:
            tasks_ += [task_id(task, slot)]
    tasks_  = sorted(tasks_, key=lambda x: x["finish"])
    return pd.DataFrame(tasks_)

**3. Find overlaps**: we find overlapping tasks using pandas. Please note that this implementation is not efficient; we use it here for illustration purposes. In production, consider a better data structure, such as an interval tree or a segment tree, to find overlapping tasks efficiently.

In [51]:
def find_task_overlaps(possible_tasks):
    overlapping_tasks = {}
    for task in possible_tasks.task.unique():
        start, finish = possible_tasks[possible_tasks.task == task].iloc[0][["start", "finish"]].values
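        # Group the tasks that are running at this task's start time: they all share that
        # instant, so they overlap pairwise and at most one of them can be scheduled.
        overlapping_tasks_ = possible_tasks[(possible_tasks.start <= start) & (possible_tasks.finish > start)]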
        overlapping_tasks_ = list(overlapping_tasks_.task.unique())
        if len(overlapping_tasks_) > 1:
            overlapping_tasks[task] = overlapping_tasks_
    return overlapping_tasks
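For larger calendars, a sort-and-sweep pass is one efficient alternative to the pandas filtering above. The sketch below is not the notebook's implementation: `overlapping_pairs` and the sample `slots` are introduced here for illustration, and intervals are treated as half-open `[start, finish)`, so back-to-back meetings do not count as overlapping.

```python
import heapq

def overlapping_pairs(intervals):
    """Return every pair of names whose half-open intervals [start, finish) overlap.

    `intervals` is a list of (name, start, finish) tuples.
    """
    pairs = []
    active = []  # min-heap of (finish, name) for intervals that may still overlap
    for name, start, finish in sorted(intervals, key=lambda iv: iv[1]):
        # Drop intervals that ended at or before this start time.
        while active and active[0][0] <= start:
            heapq.heappop(active)
        # Everything still active overlaps the current interval.
        pairs.extend((other, name) for _, other in active)
        heapq.heappush(active, (finish, name))
    return pairs

# Hypothetical candidate slots.
slots = [("T1", 8.5, 9.5), ("T2", 9.25, 10.0), ("T3", 9.5, 10.0), ("T4", 10.25, 10.5)]
print(sorted(overlapping_pairs(slots)))  # [('T1', 'T2'), ('T2', 'T3')]
```

Sorting costs O(n log n), and each reported pair costs constant extra work, so the total depends on how many overlaps actually exist rather than on comparing every slot with every other slot.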


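**4. Optimize**: we define the optimization problem: one binary variable per candidate slot, the total weight of the selected slots as the objective, and one constraint per client and per group of overlapping slots.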

In [52]:
def schedual_tasks(tasks):
    model = ConcreteModel()

    #data
    possible_tasks = compute_possible_tasks(tasks)
    tasks = possible_tasks.task.values
    w = dict(zip(possible_tasks.task, possible_tasks.weight))

    #constraint data
    overlapping_tasks = find_task_overlaps(possible_tasks)
    task_groups = possible_tasks["task_group"].unique()

    #variables
    model.x = Var(tasks, within=Binary )

    #objective
    model.value = Objective(expr = sum(w[i]*model.x[i] for i in tasks), sense = maximize )

    #constraints
    @model.Constraint(task_groups)
    def one_each_group(m, tg):
        return sum(m.x[task] for task in possible_tasks[possible_tasks["task_group"] == tg]["task"].unique()) <= 1

    @model.Constraint(overlapping_tasks.keys())
    def one_each_overlap(m, t):
        return sum(m.x[task] for task in overlapping_tasks[t]) <= 1

    #solve
    opt = SolverFactory('glpk')
    result_obj = opt.solve(model)
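    # Solver values come back as floats, so threshold at 0.5 instead of testing == 1.
    selected = [k for k, v in model.x.get_values().items() if v is not None and v > 0.5]
    
    #format results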
    results = (possible_tasks
          .loc[possible_tasks.task.isin(selected)]
          .sort_values(by=['start'])
          .set_index("task_group")
          [["start", "finish", "duration", "weight"]])
    return results


In [53]:
%%capture
results = schedual_tasks(tasks)


In [54]:
results

task_group  start  finish  duration  weight
T6          10.00   10.25      0.25       5
T4          10.25   10.50      0.25       1
T1          11.75   12.00      0.25       3
T3          12.75   13.25      0.50       5
T2          13.25   14.00      0.75       3
T5          14.25   14.75      0.50       3


In [55]:
results.weight.sum()

20

**5. Baseline sanity check**: to verify that our solution is working, we build a greedy algorithm that takes the most valuable tasks first and eliminates all conflicting tasks.

In [56]:
def solve_by_elemination(tasks):
    schedual = []
    possible_tasks = compute_possible_tasks(tasks)
    # Most valuable candidates first.
    remaining = possible_tasks.sort_values(by=['weight'], ascending=False)
    i = 0
    while i < len(remaining):
        task, task_group, start, finish = remaining.iloc[i][["task", "task_group", "start", "finish"]]
        is_chosen = remaining.task == task
        # Two intervals overlap when each starts before the other finishes;
        # this also catches candidates that fully contain the chosen slot.
        overlaps = (remaining.start < finish) & (remaining.finish > start)
        same_client = remaining.task_group == task_group
        # Keep the chosen task and drop everything that conflicts with it.
        remaining = remaining[is_chosen | ~(overlaps | same_client)]
        schedual.append(task)
        i += 1
    return possible_tasks[possible_tasks.task.isin(schedual)]

solve_by_elemination(tasks).weight.sum()

18

In this case the optimization beats the baseline, and the results make sense.

### We don't know the weights

Looking back at what we have so far: we have clients we want to meet during the day, each client provides their free time slots, and each meeting has a specific duration and importance associated with it. Your job is to schedule the meetings to maximize the total importance. You realized how complex the problem is, switched gears from building a complex algorithm to building a simple model, and used the Pyomo optimization library to solve it. So far so good.

Now imagine that the importance of each client is not predefined; we actually learn it along the way. Can we extend our algorithm to optimize the schedule and learn the importance at the same time? To answer this question, let's focus on the learning aspect. Imagine that you are traveling to a new country, you want to get some lunch, and you find two restaurants nearby. Which one should you choose? You measure quality on a scale from 0 to 1, where 0 means you don't like the food at all and 1 means perfect food. If you try one restaurant 10 times, you can compute the mean and standard deviation and form a confidence interval for the food quality at that restaurant. Comparing two restaurants is trickier. If one restaurant scores 70% and another 75%, you might expect the second to be better, but the difference could be due to chance. To make a sound decision, you can use statistical procedures such as a t-test or ANOVA to test whether the difference is statistically significant.



In [None]:
#T-Test/ANOVA
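As a sketch of the comparison described above, the following computes Welch's t statistic for two restaurants using only the standard library. The scores are made-up illustration data with means of 0.70 and 0.75, and `welch_t` is a helper introduced here. Turning the statistic into a p-value needs the t distribution, for example through `scipy.stats.ttest_ind(..., equal_var=False)` if SciPy is installed.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    va = variance(sample_a) / len(sample_a)  # squared standard error of mean A
    vb = variance(sample_b) / len(sample_b)  # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(sample_a) - 1) + vb ** 2 / (len(sample_b) - 1))
    return t, df

# Hypothetical quality scores (0 to 1) from ten visits to each restaurant.
restaurant_a = [0.70, 0.65, 0.80, 0.60, 0.75, 0.70, 0.72, 0.68, 0.66, 0.74]
restaurant_b = [0.75, 0.80, 0.70, 0.78, 0.72, 0.77, 0.69, 0.81, 0.74, 0.74]

t, df = welch_t(restaurant_b, restaurant_a)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A large |t| relative to the critical value for `df` degrees of freedom suggests the difference is unlikely to be chance alone.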

With a t-test it is pretty safe to decide which restaurant is the best choice, but you need to try them all n times to draw the conclusion. That is simply not how we solve it as humans: we don't try bad restaurants a hundred times to establish statistical significance! Instead, people try restaurants at random, come back to the best ones they have found so far, and sometimes explore new options. The key to this method is to balance exploration of new restaurants against exploitation of the best ones. The value you lose by exploring non-optimal choices is called regret, and you basically want to minimize the regret of going to bad restaurants. Fortunately, the exploration vs. exploitation problem, known as the multi-armed bandit (MAB) problem (add info in a side box), is well studied, and there are provably near-optimal strategies for it. In general we can rely on the principle of optimism in the face of uncertainty: when we are uncertain about a choice, we treat it as being as good as it plausibly could be and pick the choice that looks best under that optimistic view.

UCB1 applies this principle directly. It tries every option once, and from then on, at step t, it picks the option with the highest upper confidence bound: its average observed reward plus an exploration bonus of sqrt(2 ln t / n), where n is the number of times that option has been tried. Rarely tried options get a large bonus and are revisited, and the bonus shrinks as evidence accumulates.

In [None]:
#UCB1 and regret plot
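A minimal UCB1 simulation might look like the sketch below. The two Bernoulli "restaurants" with success probabilities 0.5 and 0.8 are hypothetical, and the function name `ucb1`, the horizon, and the seed are choices made here for illustration. The returned `regret_curve` holds the cumulative expected regret after each step and can be passed to `matplotlib.pyplot.plot` for the regret plot.

```python
import math
import random

def ucb1(true_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return pull counts and the cumulative expected regret."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms    # times each arm was pulled
    means = [0.0] * n_arms   # running average reward per arm
    best = max(true_means)
    regret, regret_curve = 0.0, []
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # play every arm once to initialise the averages
        else:
            # Optimism: empirical mean plus an exploration bonus.
            arm = max(range(n_arms),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
        regret += best - true_means[arm]
        regret_curve.append(regret)
    return counts, regret_curve

counts, regret_curve = ucb1(true_means=[0.5, 0.8], horizon=5000)
print(counts, round(regret_curve[-1], 1))
```

Regret only grows when the worse arm is pulled, so a curve that flattens over time indicates the algorithm is settling on the better restaurant.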

Going from restaurants to optimization algorithms, we want to use the same principle of optimism in the face of uncertainty. To make this possible, it helps to model the optimization algorithm as an agent that interacts with an environment to learn about the problem and find the optimal solution. Our model has the following components:

- **Problem**: encapsulates the objective, the constraints, the structure of the problem, and all other information except the problem weights.
- **Parameters**: encapsulate the weights of the problem.
- **Solver**: the optimizer we use to solve the problem.
- **Oracle**: the environment sensor that tells us the weights of the solution.
- **Agent**: takes the problem, parameters, and solver to solve the problem, then uses the oracle to observe the solution quality.

#TODO draw a schematic of the environment


In [3]:
class Oracle:
    def observe(self, problem, solution):
        pass
        
class Solver:
    def solve(self, problem, weights):
        pass
    
class Problem:
    pass

class ProblemModel:
    def get_all_weights(self):
        pass
    
    def set_weights(self, weights):
        pass
    


For our scheduling example, the problem holds the meeting information.

In [6]:
class TaskSchedualingProblem(Problem):
    def __init__(self, tasks):
        self.tasks = tasks


The parameters store the weight of each possible meeting; note that all possible tasks that belong to the same parent task have the same weight. For convenience, the weights can be initialized in two modes: known and random. Known weights are used to solve the problem as we know it, while random initial weights are used whenever we are in bandit mode.

In [7]:
class TaskSchedualingProblemModel(ProblemModel):
    def __init__(self, tasks, weights_known=False):
        if weights_known:
            self.weights = dict(compute_possible_tasks(tasks)[["task_group", "weight"]].drop_duplicates().values)
        else:
            self.weights = {i: random.random() for i in compute_possible_tasks(tasks)["task_group"].drop_duplicates().values}
        self.arms = self.weights.keys()  
        
    def get_all_weights(self):
        return self.weights
    
    def set_weights(self, weights):
        for k, v in weights.items():
            self.weights[k] = v
    



The solver simply calls the `schedual_tasks` function we defined before.

In [9]:
class TaskSchedualingSolver(Solver):
    def __init__(self):
        pass 
    
    def solve(self, problem, weights):
        tasks = [task[:4] + [weights[task[0]]] for task in problem.tasks]
        return schedual_tasks(tasks).index.values

The oracle models our environment. Depending on the nature of the environment, the observations can be true values or noisy readings.

In [10]:
class TaskSchedualingOracle(Oracle):
    def __init__(self, tasks, noise_factor=3.0):
        self.weights = dict(compute_possible_tasks(tasks)[["task_group", "weight"]].drop_duplicates().values)
        self.arms = self.weights.keys()  
        self.noise_factor = noise_factor
        
    def get_weight(self, x, noisy=False):
        if noisy:
            return self.weights[x] + (random.random() - 0.5) * self.noise_factor
        else:
            return self.weights[x]
        
    def observe(self, problem, solution, noisy=False):
        return [self.get_weight(x, noisy=noisy) for x in solution]


With the weights known, the solver finds a schedule and the oracle reports its true total value:

In [1545]:
problem = TaskSchedualingProblem(tasks)
problem_model = TaskSchedualingProblemModel(tasks, weights_known=True)
oracle = TaskSchedualingOracle(tasks, noise_factor=5)
solver = TaskSchedualingSolver()
sln = solver.solve(problem, problem_model.get_all_weights())
sum(oracle.observe(problem, sln, noisy=False))

12

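Starting from random weights instead, the schedule the solver picks is worth less when scored with the true weights: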

In [1546]:
problem_model_uncertain = TaskSchedualingProblemModel(tasks, weights_known=False)
sln = solver.solve(problem, problem_model_uncertain.get_all_weights())
sum(oracle.observe(problem, sln, noisy=False))


10

In [1]:
class CombUcb1:
    def __init__(self, problem, problem_model, solver, oracle, mode='Max'):
        self.problem = problem
        self.solver = solver
        self.oracle = oracle
        self.arms = problem_model.arms
        self.mode = mode
        self.init_algorithm()

    def init_algorithm(self):
        uu = {i: 0.0 for i in self.arms}
        w = {i: 0.0 for i in self.arms}
        t = 0
        for arm in uu.keys():
            if self.mode == 'Max':
                uu[arm] = 1.0
            elif self.mode == 'Min':
                uu[arm] = 0.0
            else:
                raise Exception('Mode is only Max or Min')
        solution_exists = True

        while ((self.mode == 'Min' and np.min(list(uu.values())) == 0)
                                        or (self.mode == 'Max' and np.max(list(uu.values())) == 1.0)):
            At = self.solver.solve(self.problem, uu)
            if At is None:
                break
            AtW = self.oracle.observe(self.problem, At, noisy=True)

            for idx, e in enumerate(At):
                w[e] = AtW[idx]
                if self.mode == 'Max':
                    uu[e] = 0.0
                elif self.mode == 'Min':
                    uu[e] = 1.0
                else:
                    raise Exception('Mode is only Max or Min')
            t += 1
        self.weights = w
        self.time_steps = {i: 1.0 for i in w.keys()}
        self.t = t

    def bandit_iter(self):
        weights = self.weights
        time_steps = self.time_steps
        t = self.t
        if self.mode == 'Max':
            u_ucb = {i: min(weights[i] + np.sqrt(1.5 * np.log(t)/time_steps[i]), 1.0) for i in weights.keys()}
        elif self.mode == 'Min':
            u_ucb = {i: max(weights[i] - np.sqrt(1.5 * np.log(t)/time_steps[i]), 0) for i in weights.keys()}
        else:
            raise Exception('Mode is only Max or Min')
        At = self.solver.solve(self.problem, u_ucb)
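        # Observe noisy rewards, as in init_algorithm; the oracles' defaults for `noisy` differ.
        wAt = self.oracle.observe(self.problem, At, noisy=True)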
        for idx, e in enumerate(At):
            weights[e] = (time_steps[e] * weights[e] + wAt[idx]) / (time_steps[e] + 1)
            time_steps[e] += 1
        t += 1
        self.time_steps = time_steps
        self.weights = weights
        self.t = t

    def solve(self):
        weights = self.weights
        u_ucb = weights
        return self.solver.solve(self.problem, u_ucb)


In [1547]:
%%capture 
b = CombUcb1(problem=problem, problem_model=problem_model_uncertain, solver=solver, oracle=oracle, mode='Max')

for i in range(5):
    print (i)
    b.bandit_iter()

In [1548]:
sln = solver.solve(problem, b.weights)
sum(oracle.observe(problem, sln, noisy=False))

10

### Not only scheduling
Wait, we can solve problems without knowing the parameters! What about other problems?

#### Knapsack with unknown weights

#### Shortest path with unknown weights


### Extend to a team of phone marketers

In [1549]:
agents = ("Alex", "Jennifer", "Andrew", "DeAnna", "Jesse")

clients = (
    "Trista", "Meredith", "Aaron", "Bob", "Jillian",
    "Ali", "Ashley", "Emily", "Desiree", "Byron")

In [1550]:
import numpy as np
from sklearn.datasets import make_swiss_roll

def score(agent_v, client_v):
    # Logistic function of the dot product between the two feature vectors.
    try:
        return 1 / (1 + math.exp(-np.dot(agent_v, client_v)))
    except OverflowError:
        # exp() overflows for a very negative dot product, where the
        # logistic function is numerically 0.
        return 0.0
    
def generate_matching_problem(agents, clients):
    num_samples = len(agents) + len(clients)
    samples = make_swiss_roll(num_samples, noise=3, random_state=0)[0]
    # random.shuffle can duplicate rows of a 2-D NumPy array; shuffle with NumPy instead.
    np.random.shuffle(samples)
    clients_v = {client: samples[i] for i, client in enumerate(clients)}
    # Agents take the rows after the clients so that no vector is shared.
    agents_v = {agent: samples[len(clients) + i] for i, agent in enumerate(agents)}
    match_scores = {(agent, client): score(agents_v[agent], clients_v[client])
                    for agent in agents
                    for client in clients}

    client_time = {client: random.randint(1, 4) for client in clients}
    agents_workload = {agent: random.randint(2, 5) for agent in agents}
    
    return match_scores, client_time, agents_workload, clients_v, agents_v

In [1551]:
def solve_matching(match_scores, client_time, agents_workload):
    agents = list(agents_workload.keys())
    clients = list(client_time.keys())
    
    model = pe.ConcreteModel()
    model.agents = agents
    model.clients = clients
    model.match_scores = match_scores
    model.agents_workload = agents_workload

    model.assignments = pe.Var(match_scores.keys(), domain=pe.Binary)
    model.objective = pe.Objective(
            expr=pe.summation(model.match_scores, model.assignments),
            sense=pe.maximize)

    @model.Constraint(model.agents)
    def respect_workload(model, agent):
        return sum(model.assignments[agent, client] * client_time[client] for client in model.clients) <= model.agents_workload[agent]

    @model.Constraint(model.clients)
    def one_agent_per_client(model, client):
        return sum(model.assignments[agent, client] for agent in model.agents) <= 1


    solver = pe.SolverFactory("glpk")
    solver.solve(model)
    sln = [k for k, v in model.assignments.get_values().items() if v == 1.0]
    return sln, sum(match_scores[i] for i in sln)
    

In [1552]:
def solve_matching_greedy(match_scores, client_time, agents_workload):
    agents = agents_workload.keys()
    clients = client_time.keys()
    matching = sorted(match_scores.items(), key=lambda x: -x[1])
    clients_indicator = {client: 0 for client in clients}
    agents_workload_ = agents_workload.copy()
    sln = []
    for (agent, client), score in matching:
        if clients_indicator[client] == 0 and agents_workload_[agent] >= client_time[client]:
            clients_indicator[client] = 1
            agents_workload_[agent] -= client_time[client]
            sln += [(agent, client)]
    return sln, sum([match_scores[i] for i in sln])



In [1553]:
match_scores, client_time, agents_workload, clients_v, agents_v = generate_matching_problem(agents, clients)

In [1554]:
solve_matching(match_scores, client_time, agents_workload)

([('DeAnna', 'Meredith'),
  ('Andrew', 'Ashley'),
  ('Jesse', 'Jillian'),
  ('DeAnna', 'Emily'),
  ('Jesse', 'Byron'),
  ('Jennifer', 'Bob'),
  ('Alex', 'Trista')],
 7.0)

In [1555]:
solve_matching_greedy(match_scores, client_time, agents_workload)

([('Alex', 'Trista'),
  ('Jennifer', 'Bob'),
  ('Andrew', 'Meredith'),
  ('DeAnna', 'Aaron'),
  ('DeAnna', 'Jillian'),
  ('Jesse', 'Ashley')],
 6.0)

In [1556]:
# the matching optimization problem
class MatchingProblem(Problem):
    def __init__(self, match_scores, client_time, agents_workload, agents_v, clients_v):
        self.match_scores = match_scores
        self.agents_workload = agents_workload
        self.client_time = client_time
        self.features = {(agent, client): np.hstack([agents_v[agent], clients_v[client]]) 
                         for agent, client in match_scores.keys()}


class MatchingProblemModel(ProblemModel):
    def __init__(self, match_scores, weights_known=False):
        if weights_known:
            self.weights = match_scores.copy()
        else:
            self.weights = {i: random.random() for i in match_scores.keys()}
        self.arms = self.weights.keys()  
        
    def get_all_weights(self):
        return self.weights
    
    def set_weights(self, weights):
        for k, v in weights.items():
            self.weights[k] = v
    

class MatchingSolver(Solver):
    def __init__(self):
        pass 
    
    def solve(self, problem, weights):
        return solve_matching(weights, problem.client_time, problem.agents_workload)
        
        
class MatchingOracle(Oracle):
    def __init__(self, match_scores, noise_factor=3.0):
        self.weights = match_scores.copy()
        self.arms = self.weights.keys()  
        self.noise_factor = noise_factor
        
    def get_weight(self, x, noisy=False):
        if noisy:
            return self.weights[x] + (random.random() - 0.5) * self.noise_factor
        else:
            return self.weights[x]
        
    def observe(self, problem, solution, noisy=True):
        return [self.get_weight(x, noisy=noisy) for x in solution]
        
        

In [1590]:
# CombTS
class CombLinTs:
    def __init__(self, problem, p_lambda, p_sigma, solver, oracle):
        # Keep the features on the instance so update_params does not rely on a global `problem`.
        self.features = problem.features
        self.d = np.array(list(problem.features.values())).shape[1]
        self.p_lambda = p_lambda
        self.p_sigma = p_sigma
        self.sigma = (p_lambda ** 2) * np.eye(self.d)
        self.theta = np.zeros(self.d)
        self.solver = solver
        self.oracle = oracle
        
    def sample_theta(self):
        return np.random.multivariate_normal(self.theta, self.sigma)
    
    def update_params(self, wAt):
        theta = self.theta
        sigma = self.sigma
        
        for k, v in wAt.items():
            f_vec = np.expand_dims(self.features[k], axis=1)
            t1 = np.matmul(sigma, np.matmul(f_vec, f_vec.T))
            t2 = np.matmul(f_vec.T, np.matmul(sigma, f_vec)) + self.p_sigma ** 2
            t3 = np.matmul(sigma, f_vec)
            t4 = np.matmul(np.matmul(f_vec.T, sigma), f_vec) + self.p_sigma ** 2
            theta = np.matmul((np.eye(sigma.shape[0]) - t1 / t2), theta) + \
                        np.squeeze(t3 / t4) * wAt[k]

            t1 = np.matmul(np.matmul(sigma, np.matmul(f_vec, f_vec.T)), sigma)
            t2 = np.matmul(np.matmul(f_vec.T, sigma), f_vec) + self.p_sigma ** 2
            sigma = sigma - t1/t2
    
        self.theta = theta
        self.sigma = sigma
            
    def bandit_iter(self, problem):
        theta = self.sample_theta()
        At, _ = self.solver.solve(problem, {k: np.dot(v, theta) for k, v in problem.features.items()})
        wAt = self.oracle.observe(problem, At, noisy=True)
        wAt = dict(zip(At, wAt))
        self.update_params(wAt)
        
    def solve(self, problem):
        theta = self.sample_theta()
        sln, _ = self.solver.solve(problem, {k: np.dot(v, theta) for k, v in problem.features.items()})
        return sln
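Reading `update_params` as equations may help. For each observed pair with feature vector $f$ and observed reward $w$, the code applies a rank-one Gaussian update to the mean $\theta$ and covariance $\Sigma$, starting from $\theta = 0$ and $\Sigma = \lambda^2 I$. The symbols are notation chosen here to mirror the code: $\lambda$ stands for `p_lambda` and $\sigma$ for `p_sigma`.

$$
\theta \leftarrow \left(I - \frac{\Sigma f f^\top}{f^\top \Sigma f + \sigma^2}\right)\theta + \frac{\Sigma f}{f^\top \Sigma f + \sigma^2}\, w,
\qquad
\Sigma \leftarrow \Sigma - \frac{\Sigma f f^\top \Sigma}{f^\top \Sigma f + \sigma^2}
$$

As in the code, $\theta$ is updated with the old $\Sigma$ before $\Sigma$ itself is updated. Each bandit iteration then samples $\tilde\theta \sim \mathcal{N}(\theta, \Sigma)$ and hands the scores $f^\top \tilde\theta$ to the solver.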

In [1634]:
match_scores, client_time, agents_workload, clients_v, agents_v = generate_matching_problem(agents, clients)
problem = MatchingProblem(match_scores, client_time, agents_workload, agents_v, clients_v)
problem_model = MatchingProblemModel(match_scores, weights_known=False)
oracle = MatchingOracle(match_scores, noise_factor=2)
solver = MatchingSolver()
sln, w = solver.solve(problem, problem_model.get_all_weights())
np.sum(oracle.observe(problem, sln, noisy=False))


6.00008340099822

In [1635]:
problem_model = MatchingProblemModel(match_scores, weights_known=True)
sln, w = solver.solve(problem, problem_model.get_all_weights())
np.sum(oracle.observe(problem, sln, noisy=False))


8.0

In [1638]:
p_lambda = 10.
p_sigma = 0.1
features_dim =  10
bandit = CombLinTs(p_lambda=p_lambda,
                     p_sigma=p_sigma, problem=problem, solver=solver, oracle=oracle)

for i in range(100):
    bandit.bandit_iter(problem=problem)
    if i % 10 == 0:
        At = bandit.solve(problem)
        print (np.sum(oracle.observe(problem, At, noisy=False)))

7.0
7.0
8.0
8.0
7.0
7.0
8.0
8.0
8.0
8.0


* Scale up to all pairs
* We assumed all weights are random; what if they are normally distributed with different means and variances?
* Empirical Bayes for new joiners
* Extend to a team
* Spending more time is useful
* Streamline with TensorFlow Probability

In [1352]:
# TODO: the bandit example
# TODO: draw Gantt chart
# TODO: shortest path 
# TODO: Knapsack

6

You may want to do both selection and scheduling, learn the importance of each client, start from reasonable weights for faster exploration, or add more cool features. Very well, we will go through these after we introduce some cool tools. For the time being, let's sharpen our skills by solving two other problems.

**The dauntless conclusion**: *Be a modeling superstar, don't fear missing info!*

### Bibliography (text style like in Introduction to Algorithms)