
# Synthetic dataset creation

In this notebook I create a synthetic dataset to test the proposed bandit algorithm and compare it to benchmarks.
The synthetic dataset has the following features:
- 10,000 time points
- At each time $t$ the environment generates a context vector $x_t$ of dimension 6. Each feature $x_t^{(i)}$ is **categorical** and takes values in `range(1, feature_size)`. This is needed for the `state_extract` function in SimOOS to number all the states correctly.
- Each context vector $x_t$ is associated with a cost vector $c_t$. The cost of each feature $c_t^{(i)}$ follows a Gaussian distribution with a fixed standard deviation and a **piecewise-constant** mean lying in the range $[0, 0.04]$. The change points of the costs differ from the change points of the rewards.
- A fixed (for all $t$) set $\mathcal{A}$ of 5 arms is available for the algorithm to choose from.
- Rewards associated with each arm and context follow a Bernoulli distribution parametrized by $p = \sigma(\bar{x}_t^T\theta_{t,a})$. Here $\bar{x}_t$ is not the context observed at time $t$ but the expected context within one stationarity interval, $\sigma(\cdot)$ is the sigmoid function, and $\theta_{t,a}$ is the ground-truth bandit parameter (not disclosed to the algorithm) associated with arm $a$ at time $t$. The bandit parameter satisfies $\lVert\theta_{t,a}\rVert_2 \le 1$. For each arm $a$, the parameter $\theta_{t,a}$ is piecewise-constant, so the corresponding **generating process of rewards is piecewise-stationary**, with change points every 2000 steps.
This method of generating rewards from contexts follows [1] and [2], but instead of a Gaussian reward with mean $x_t^T\theta_{t,a}$ the reward is Bernoulli with parameter $\sigma(\bar{x}_t^T\theta_{t,a})$.

[1] Learning Contextual Bandits in a Non-stationary Environment, Wu et al.

[2] Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests, Xu et al.
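
The generating process described above can be sketched as follows. This is a minimal illustration, not the script that produced `synthetic_data.pickle`: the function name `make_synthetic`, the values of `feature_size`, `cost_interval` and `cost_std`, the way $\theta_{t,a}$ is sampled, and the use of the empirical mean context of an interval as a stand-in for the expected context are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def make_synthetic(n_steps=10_000, n_features=6, feature_size=4, n_arms=5,
                   reward_interval=2000, cost_interval=3000,
                   cost_std=0.001, seed=0):
    """Hypothetical generator following the description above."""
    rng = np.random.default_rng(seed)

    # Categorical contexts with values in range(1, feature_size).
    contexts = rng.integers(1, feature_size, size=(n_steps, n_features))

    # Gaussian costs with a piecewise-constant mean in [0, 0.04].
    # cost_interval differs from reward_interval, so the change points differ.
    n_cost_pieces = -(-n_steps // cost_interval)  # ceiling division
    cost_means = rng.uniform(0.0, 0.04, size=(n_cost_pieces, n_features))
    cost_piece = np.arange(n_steps) // cost_interval
    costs_vector = rng.normal(cost_means[cost_piece], cost_std)

    # Piecewise-constant theta per arm, rescaled so that its L2 norm is <= 1.
    n_reward_pieces = -(-n_steps // reward_interval)
    theta = rng.normal(size=(n_reward_pieces, n_arms, n_features))
    theta /= np.maximum(1.0, np.linalg.norm(theta, axis=-1, keepdims=True))

    rewards = np.empty((n_steps, n_arms))
    for k in range(n_reward_pieces):
        start = k * reward_interval
        stop = min(start + reward_interval, n_steps)
        # Assumption: empirical mean context of the interval plays the role
        # of the expected context inside one stationarity interval.
        x_bar = contexts[start:stop].mean(axis=0)
        p = sigmoid(theta[k] @ x_bar)  # one Bernoulli parameter per arm
        rewards[start:stop] = rng.binomial(1, p, size=(stop - start, n_arms))

    return contexts, rewards, costs_vector


contexts, rewards, costs_vector = make_synthetic()
print(contexts.shape, rewards.shape, costs_vector.shape)
```

Because the Bernoulli parameter depends only on the interval and the arm, the reward distribution of every arm stays fixed between change points, which is what makes the process piecewise-stationary.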

In [1]:
import os
os.chdir('../..')
os.getcwd()

'/Users/sbokupripeku/git/work/examples/costly_nonstationary_bandits'

In [2]:
%load_ext autoreload
%autoreload 2
import numpy as np
import matplotlib.pyplot as plt
import time
import math
import pickle

import costs
from plotting.costs import plot_costs

In [3]:
# Set random seeds for reproducibility
np.random.seed(42)

#### Setting important constants

In [4]:
TIME_POINTS = 10000
NUM_FEATURES = 4
N_ARMS = 5

## Loading the synthetic dataset

In [5]:

with open('dataset/synthetic/synthetic_data.pickle', 'rb') as f:
    data = pickle.load(f)

In [6]:
contexts, rewards, costs_vector = data

## Testing algorithms on synthetic data

In [7]:
%reload_ext autoreload
%autoreload 2

import algorithms
import evaluation


In [8]:
NUM_OF_TRIALS = 10000
TUNING_NUM_TRIALS = 2500

### Random policy

#### Evaluation

In [9]:
p_random = algorithms.RandomPolicy()
gain_random = evaluation.evaluate_on_synthetic_data(
    p_random,
    contexts,
    rewards,
    costs_vector,
    stop_after=NUM_OF_TRIALS,
)

Random policy
Total gain: 12108.0
	Total reward: 12108.0
	Total cost: 0
Execution time: 0.1s



### $\epsilon$-greedy

#### Tuning

In [10]:
egreedy_gains = {}

for eps in [0.001, 0.005, 0.01, 0.05, 0.08, 0.1, 0.2, 0.3]:
    egreedy = algorithms.EpsilonGreedy(epsilon=eps, n_arms=rewards.shape[1])

    gain_egreedy = evaluation.evaluate_on_synthetic_data(
        egreedy,
        contexts,
        rewards,
        costs_vector,
        stop_after=TUNING_NUM_TRIALS,
    )
    egreedy_gains[eps] = gain_egreedy

E-greedy(epsilon=0.001)
Total gain: 2712.0
	Total reward: 2712.0
	Total cost: 0
Execution time: 0.0s

E-greedy(epsilon=0.005)
Total gain: 2056.0
	Total reward: 2056.0
	Total cost: 0
Execution time: 0.0s

E-greedy(epsilon=0.01)
Total gain: 2864.0
	Total reward: 2864.0
	Total cost: 0
Execution time: 0.0s

E-greedy(epsilon=0.05)
Total gain: 4452.0
	Total reward: 4452.0
	Total cost: 0
Execution time: 0.0s

E-greedy(epsilon=0.08)
Total gain: 4184.0
	Total reward: 4184.0
	Total cost: 0
Execution time: 0.0s

E-greedy(epsilon=0.1)
Total gain: 4704.0
	Total reward: 4704.0
	Total cost: 0
Execution time: 0.0s

E-greedy(epsilon=0.2)
Total gain: 3512.0
	Total reward: 3512.0
	Total cost: 0
Execution time: 0.0s

E-greedy(epsilon=0.3)
Total gain: 4132.0
	Total reward: 4132.0
	Total cost: 0
Execution time: 0.0s



In [11]:
# Keep the last entry of each gain sequence, then pick the epsilon with the largest one
last_gains = {k: v[-1] for k, v in egreedy_gains.items()}
best_eps = sorted(last_gains.items(), key=lambda x: x[1])[-1][0]

In [12]:
best_eps

0.1

In [13]:
del egreedy_gains
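
The selection step used in this tuning section can be written as a small helper. This is a sketch under the assumption, suggested by the `v[-1]` indexing, that `evaluate_on_synthetic_data` returns a sequence of cumulative gains; `best_setting` and `toy_gains` are hypothetical names, and the numbers are the final gains printed by the tuning loop above.

```python
def best_setting(gains_by_setting):
    """Return the setting whose gain sequence ends with the highest value."""
    final_gains = {setting: curve[-1] for setting, curve in gains_by_setting.items()}
    return max(final_gains, key=final_gains.get)


# Each curve is reduced to its final value, copied from the printed output.
toy_gains = {
    0.001: [2712.0], 0.005: [2056.0], 0.01: [2864.0], 0.05: [4452.0],
    0.08: [4184.0], 0.1: [4704.0], 0.2: [3512.0], 0.3: [4132.0],
}
print(best_setting(toy_gains))  # 0.1
```

When several settings tie, `max` returns the first one in dictionary order, whereas the `sorted(...)[-1]` form returns the last one.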

#### Evaluation

In [14]:
egreedy = algorithms.EpsilonGreedy(epsilon=best_eps, n_arms=rewards.shape[1])

gain_egreedy = evaluation.evaluate_on_synthetic_data(
    egreedy,
    contexts,
    rewards,
    costs_vector,
    stop_after=NUM_OF_TRIALS,
)

E-greedy(epsilon=0.1)
Total gain: 17496.0
	Total reward: 17496.0
	Total cost: 0
Execution time: 0.1s



### UCB1

#### Tuning

In [15]:
ucb_gains = {}

for alpha in [
    0.01, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.4, 0.45, 0.5,
    0.9, 1.0, 2.0, 5.0, 8.0, 10.0, 20.0, 30.0, 40.0,
]:
    ucb_alg = algorithms.UCB1(
        n_trials=TUNING_NUM_TRIALS,
        n_arms=rewards.shape[1],
        alpha=alpha,
    )

    gain_ucb = evaluation.evaluate_on_synthetic_data(
        ucb_alg,
        contexts,
        rewards,
        costs_vector,
        stop_after=TUNING_NUM_TRIALS,
    )
    ucb_gains[alpha] = gain_ucb

UCB1 (α=0.01)
Total gain: 1808.0
	Total reward: 1808.0
	Total cost: 0
Execution time: 0.0s

UCB1 (α=0.05)
Total gain: 1956.0
	Total reward: 1956.0
	Total cost: 0
Execution time: 0.0s

UCB1 (α=0.1)
Total gain: 2472.0
	Total reward: 2472.0
	Total cost: 0
Execution time: 0.0s

UCB1 (α=0.15)
Total gain: 3148.0
	Total reward: 3148.0
	Total cost: 0
Execution time: 0.0s

UCB1 (α=0.2)
Total gain: 3860.0
	Total reward: 3860.0
	Total cost: 0
Execution time: 0.0s

UCB1 (α=0.25)
Total gain: 1948.0
	Total reward: 1948.0
	Total cost: 0
Execution time: 0.0s

UCB1 (α=0.3)
Total gain: 4684.0
	Total reward: 4684.0
	Total cost: 0
Execution time: 0.0s

UCB1 (α=0.35)
Total gain: 4944.0
	Total reward: 4944.0
	Total cost: 0
Execution time: 0.0s

UCB1 (α=0.4)
Total gain: 4436.0
	Total reward: 4436.0
	Total cost: 0
Execution time: 0.0s

UCB1 (α=0.45)
Total gain: 4652.0
	Total reward: 4652.0
	Total cost: 0
Execution time: 0.0s

UCB1 (α=0.5)
Total gain: 4820.0
	Total reward: 4820.0
	Total cost: 0
Execution time:

In [16]:
last_gains = {k:v[-1] for k,v in ucb_gains.items()}
best_alpha_ucb = sorted(last_gains.items(), key=lambda x: x[1])[-1][0]

In [17]:
best_alpha_ucb

0.35

In [18]:
del ucb_gains

#### Evaluation

In [19]:
ucb_alg = algorithms.UCB1(
    n_trials=NUM_OF_TRIALS,
    n_arms=rewards.shape[1],
    alpha=best_alpha_ucb,
)

gain_ucb = evaluation.evaluate_on_synthetic_data(
    ucb_alg,
    contexts,
    rewards,
    costs_vector,
    stop_after=NUM_OF_TRIALS,
)

UCB1 (α=0.35)
Total gain: 16356.0
	Total reward: 16356.0
	Total cost: 0
Execution time: 0.1s



### LinUCB

#### Tuning

In [20]:
linucb_gains = {}

for alpha in [
    0.01, 0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.5,
    1.0, 5.0, 8.0, 20.0, 30.0, 40.0, 50.0, 60.0,
]:
    linucb = algorithms.LinUCB(
        n_trials=TUNING_NUM_TRIALS,
        context_dimension=contexts.shape[1],
        n_arms=rewards.shape[1],
        alpha=alpha,
    )
    gain_linucb = evaluation.evaluate_on_synthetic_data(
        linucb,
        contexts,
        rewards,
        costs_vector,
        stop_after=TUNING_NUM_TRIALS,
    )
    linucb_gains[alpha] = gain_linucb

LinUCB (alpha=0.01)
Total gain: -1019.0552734886533
	Total reward: 1808.0
	Total cost: 2827.055273488652
Execution time: 0.1s

LinUCB (alpha=0.05)
Total gain: -1027.0552734886523
	Total reward: 1800.0
	Total cost: 2827.055273488652
Execution time: 0.1s

LinUCB (alpha=0.1)
Total gain: 1372.944726511348
	Total reward: 4200.0
	Total cost: 2827.055273488652
Execution time: 0.2s

LinUCB (alpha=0.2)
Total gain: 648.9447265113465
	Total reward: 3476.0
	Total cost: 2827.055273488652
Execution time: 0.2s

LinUCB (alpha=0.3)
Total gain: 952.9447265113458
	Total reward: 3780.0
	Total cost: 2827.055273488652
Execution time: 0.1s

LinUCB (alpha=0.35)
Total gain: 740.9447265113465
	Total reward: 3568.0
	Total cost: 2827.055273488652
Execution time: 0.1s

LinUCB (alpha=0.4)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total cost: 2827.055273488652
Execution time: 0.1s

LinUCB (alpha=0.5)
Total gain: 424.94472651134635
	Total reward: 3252.0
	Total cost: 2827.055273488652
Execution time: 0.1s



In [21]:
last_gains = {k:v[-1] for k,v in linucb_gains.items()}
best_alpha_linucb = sorted(last_gains.items(), key=lambda x: x[1])[-1][0]

In [22]:
best_alpha_linucb

20.0

In [23]:
del linucb_gains

#### Evaluation

In [24]:
linucb = algorithms.LinUCB(
    n_trials=NUM_OF_TRIALS,
    context_dimension=contexts.shape[1],
    n_arms=rewards.shape[1],
    alpha=best_alpha_linucb,
)
gain_linucb = evaluation.evaluate_on_synthetic_data(
    linucb,
    contexts,
    rewards,
    costs_vector,
    stop_after=NUM_OF_TRIALS,
)

LinUCB (alpha=20.0)
Total gain: 11913.463821340843
	Total reward: 24600.0
	Total cost: 12686.536178659138
Execution time: 0.6s
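
The printed totals are consistent with gain being reward minus cost. The check below uses the three numbers from the LinUCB evaluation output above; the tolerance is an assumption, chosen because the reported gain differs from the direct subtraction in its last digits, which would be expected if the gain is accumulated step by step.

```python
import math

total_reward = 24600.0
total_cost = 12686.536178659138
reported_gain = 11913.463821340843

# Direct subtraction agrees with the reported gain up to floating-point rounding.
print(math.isclose(total_reward - total_cost, reported_gain, rel_tol=1e-9))  # True
```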



### PS-LinUCB

#### Tuning
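
The tuning cell in this subsection searches the full Cartesian product of the `alpha`, `omega` and `delta` candidates. The sketch below only counts the configurations with `itertools.product`; the list names `alphas`, `omegas` and `deltas` are introduced here for illustration, and the values are copied from that cell.

```python
from itertools import product

alphas = [0.01, 0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 1.0, 5.0, 8.0, 20.0, 30.0, 40.0]
omegas = [100, 150, 250, 500, 750, 1000, 1500, 2000]
deltas = [0.001, 0.003, 0.004, 0.005, 0.006, 0.007, 0.01, 0.03, 0.05, 0.1, 0.15, 0.2]

# product iterates in the same order as three nested for-loops.
grid = list(product(alphas, omegas, deltas))
print(len(grid))  # 1344
print(grid[0])    # (0.01, 100, 0.001)
```

At the 0.1 to 0.2 s per run reported in the output, 1344 runs take roughly two to five minutes.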

In [25]:
ps_linucb_gains = {}

for alpha in [0.01, 0.05, 0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 1.0, 5.0, 8.0, 20.0, 30.0, 40.0]:
    for omega in [100, 150, 250, 500, 750, 1000, 1500, 2000]:
        for delta in [0.001, 0.003, 0.004, 0.005, 0.006, 0.007, 0.01, 0.03, 0.05, 0.1, 0.15, 0.2]:
            ps_linucb = algorithms.PSLinUCB(
                n_trials=TUNING_NUM_TRIALS,
                context_dimension=contexts.shape[1],
                n_arms=rewards.shape[1],
                alpha=alpha,
                omega=omega,
                delta=delta,
            )

            gain_pslinucb = evaluation.evaluate_on_synthetic_data(
                ps_linucb,
                contexts,
                rewards,
                costs_vector,
                stop_after=TUNING_NUM_TRIALS,
            )

            ps_linucb_gains[(alpha, omega, delta)] = gain_pslinucb

PSLinUCB (alpha=0.01, omega=100, delta=0.001)
Total gain: 1856.9447265113467
	Total reward: 4684.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.01, omega=100, delta=0.003)
Total gain: 1856.9447265113467
	Total reward: 4684.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.01, omega=100, delta=0.004)
Total gain: 1856.9447265113467
	Total reward: 4684.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.01, omega=100, delta=0.005)
Total gain: 1856.9447265113467
	Total reward: 4684.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.01, omega=100, delta=0.006)
Total gain: 1856.9447265113467
	Total reward: 4684.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.01, omega=100, delta=0.007)
Total gain: 1856.9447265113467
	Total reward: 4684.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.01, omega=100, delta=0.01)
Total gain: 1856.9447265113467
	Total rewar

PSLinUCB (alpha=0.01, omega=750, delta=0.03)
Total gain: -1019.0552734886533
	Total reward: 1808.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.01, omega=750, delta=0.05)
Total gain: -1019.0552734886533
	Total reward: 1808.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.01, omega=750, delta=0.1)
Total gain: -1019.0552734886533
	Total reward: 1808.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.01, omega=750, delta=0.15)
Total gain: -1019.0552734886533
	Total reward: 1808.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.01, omega=750, delta=0.2)
Total gain: -1019.0552734886533
	Total reward: 1808.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.01, omega=1000, delta=0.001)
Total gain: -1023.0552734886528
	Total reward: 1804.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.01, omega=1000, delta=0.003)
Total gain: -1023.0552734886528
	Total re

PSLinUCB (alpha=0.05, omega=150, delta=0.003)
Total gain: 1764.9447265113479
	Total reward: 4592.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.05, omega=150, delta=0.004)
Total gain: 1764.9447265113479
	Total reward: 4592.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.05, omega=150, delta=0.005)
Total gain: 1764.9447265113479
	Total reward: 4592.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.05, omega=150, delta=0.006)
Total gain: 1764.9447265113479
	Total reward: 4592.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.05, omega=150, delta=0.007)
Total gain: 1764.9447265113479
	Total reward: 4592.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.05, omega=150, delta=0.01)
Total gain: 1764.9447265113479
	Total reward: 4592.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.05, omega=150, delta=0.03)
Total gain: 2116.944726511348
	Total reward:

PSLinUCB (alpha=0.05, omega=1000, delta=0.05)
Total gain: -159.0552734886517
	Total reward: 2668.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.05, omega=1000, delta=0.1)
Total gain: -159.0552734886517
	Total reward: 2668.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.05, omega=1000, delta=0.15)
Total gain: -159.0552734886517
	Total reward: 2668.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.05, omega=1000, delta=0.2)
Total gain: -159.0552734886517
	Total reward: 2668.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.05, omega=1500, delta=0.001)
Total gain: -1027.0552734886523
	Total reward: 1800.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.05, omega=1500, delta=0.003)
Total gain: -1027.0552734886523
	Total reward: 1800.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.05, omega=1500, delta=0.004)
Total gain: -1027.0552734886523
	Total 

PSLinUCB (alpha=0.1, omega=250, delta=0.006)
Total gain: 1692.9447265113474
	Total reward: 4520.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.1, omega=250, delta=0.007)
Total gain: 1692.9447265113474
	Total reward: 4520.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.1, omega=250, delta=0.01)
Total gain: 1692.9447265113474
	Total reward: 4520.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.1, omega=250, delta=0.03)
Total gain: 1712.9447265113474
	Total reward: 4540.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.1, omega=250, delta=0.05)
Total gain: 1680.9447265113477
	Total reward: 4508.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.1, omega=250, delta=0.1)
Total gain: 1460.9447265113479
	Total reward: 4288.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.1, omega=250, delta=0.15)
Total gain: 1692.9447265113472
	Total reward: 4520.0
	T

PSLinUCB (alpha=0.1, omega=1500, delta=0.2)
Total gain: 1372.944726511348
	Total reward: 4200.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.1, omega=2000, delta=0.001)
Total gain: 1372.944726511348
	Total reward: 4200.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.1, omega=2000, delta=0.003)
Total gain: 1372.944726511348
	Total reward: 4200.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.1, omega=2000, delta=0.004)
Total gain: 1372.944726511348
	Total reward: 4200.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.1, omega=2000, delta=0.005)
Total gain: 1372.944726511348
	Total reward: 4200.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.1, omega=2000, delta=0.006)
Total gain: 1372.944726511348
	Total reward: 4200.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.1, omega=2000, delta=0.007)
Total gain: 1372.944726511348
	Total reward: 4200.

PSLinUCB (alpha=0.2, omega=500, delta=0.03)
Total gain: 796.9447265113463
	Total reward: 3624.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.2, omega=500, delta=0.05)
Total gain: 796.9447265113463
	Total reward: 3624.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.2, omega=500, delta=0.1)
Total gain: 796.9447265113463
	Total reward: 3624.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.2, omega=500, delta=0.15)
Total gain: 796.9447265113463
	Total reward: 3624.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.2, omega=500, delta=0.2)
Total gain: 796.9447265113463
	Total reward: 3624.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.2, omega=750, delta=0.001)
Total gain: 648.9447265113465
	Total reward: 3476.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.2, omega=750, delta=0.003)
Total gain: 648.9447265113465
	Total reward: 3476.0
	Total cos

PSLinUCB (alpha=0.3, omega=100, delta=0.005)
Total gain: 2420.94472651135
	Total reward: 5248.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.3, omega=100, delta=0.006)
Total gain: 2420.94472651135
	Total reward: 5248.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.3, omega=100, delta=0.007)
Total gain: 2420.94472651135
	Total reward: 5248.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.3, omega=100, delta=0.01)
Total gain: 2420.94472651135
	Total reward: 5248.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.3, omega=100, delta=0.03)
Total gain: 2420.94472651135
	Total reward: 5248.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.3, omega=100, delta=0.05)
Total gain: 2576.94472651135
	Total reward: 5404.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.3, omega=100, delta=0.1)
Total gain: 2324.9447265113504
	Total reward: 5152.0
	Total cost: 

PSLinUCB (alpha=0.3, omega=750, delta=0.15)
Total gain: 952.9447265113458
	Total reward: 3780.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.3, omega=750, delta=0.2)
Total gain: 952.9447265113458
	Total reward: 3780.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.3, omega=1000, delta=0.001)
Total gain: 952.9447265113458
	Total reward: 3780.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.3, omega=1000, delta=0.003)
Total gain: 952.9447265113458
	Total reward: 3780.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.3, omega=1000, delta=0.004)
Total gain: 952.9447265113458
	Total reward: 3780.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.3, omega=1000, delta=0.005)
Total gain: 952.9447265113458
	Total reward: 3780.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.3, omega=1000, delta=0.006)
Total gain: 952.9447265113458
	Total reward: 3780.0
	

PSLinUCB (alpha=0.35, omega=150, delta=0.007)
Total gain: 1132.9447265113463
	Total reward: 3960.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.35, omega=150, delta=0.01)
Total gain: 1132.9447265113463
	Total reward: 3960.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.35, omega=150, delta=0.03)
Total gain: 1048.944726511347
	Total reward: 3876.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.35, omega=150, delta=0.05)
Total gain: 1620.9447265113472
	Total reward: 4448.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.35, omega=150, delta=0.1)
Total gain: 1796.9447265113472
	Total reward: 4624.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.35, omega=150, delta=0.15)
Total gain: 1180.9447265113465
	Total reward: 4008.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.35, omega=150, delta=0.2)
Total gain: 1244.9447265113465
	Total reward: 4072.

PSLinUCB (alpha=0.35, omega=1500, delta=0.003)
Total gain: 740.9447265113465
	Total reward: 3568.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.35, omega=1500, delta=0.004)
Total gain: 740.9447265113465
	Total reward: 3568.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.35, omega=1500, delta=0.005)
Total gain: 740.9447265113465
	Total reward: 3568.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.35, omega=1500, delta=0.006)
Total gain: 740.9447265113465
	Total reward: 3568.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.35, omega=1500, delta=0.007)
Total gain: 740.9447265113465
	Total reward: 3568.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.35, omega=1500, delta=0.01)
Total gain: 740.9447265113465
	Total reward: 3568.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.35, omega=1500, delta=0.03)
Total gain: 740.9447265113465
	Total reward

PSLinUCB (alpha=0.4, omega=250, delta=0.1)
Total gain: 876.9447265113461
	Total reward: 3704.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.4, omega=250, delta=0.15)
Total gain: 876.9447265113461
	Total reward: 3704.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.4, omega=250, delta=0.2)
Total gain: 780.9447265113467
	Total reward: 3608.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.4, omega=500, delta=0.001)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.4, omega=500, delta=0.003)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.4, omega=500, delta=0.004)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.4, omega=500, delta=0.005)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total c

PSLinUCB (alpha=0.4, omega=2000, delta=0.006)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.4, omega=2000, delta=0.007)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.4, omega=2000, delta=0.01)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.4, omega=2000, delta=0.03)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.4, omega=2000, delta=0.05)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.4, omega=2000, delta=0.1)
Total gain: 696.9447265113467
	Total reward: 3524.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.4, omega=2000, delta=0.15)
Total gain: 696.9447265113467
	Total reward: 3524.0
	T

PSLinUCB (alpha=0.5, omega=750, delta=0.001)
Total gain: 424.94472651134635
	Total reward: 3252.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.5, omega=750, delta=0.003)
Total gain: 424.94472651134635
	Total reward: 3252.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.5, omega=750, delta=0.004)
Total gain: 424.94472651134635
	Total reward: 3252.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.5, omega=750, delta=0.005)
Total gain: 424.94472651134635
	Total reward: 3252.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.5, omega=750, delta=0.006)
Total gain: 424.94472651134635
	Total reward: 3252.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=0.5, omega=750, delta=0.007)
Total gain: 424.94472651134635
	Total reward: 3252.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=0.5, omega=750, delta=0.01)
Total gain: 424.94472651134635
	Total reward: 3252

PSLinUCB (alpha=1.0, omega=100, delta=0.03)
Total gain: 3168.944726511347
	Total reward: 5996.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=1.0, omega=100, delta=0.05)
Total gain: 3196.944726511347
	Total reward: 6024.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=1.0, omega=100, delta=0.1)
Total gain: 2988.944726511347
	Total reward: 5816.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=1.0, omega=100, delta=0.15)
Total gain: 2668.944726511349
	Total reward: 5496.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=1.0, omega=100, delta=0.2)
Total gain: 3284.944726511348
	Total reward: 6112.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=1.0, omega=150, delta=0.001)
Total gain: 1364.944726511346
	Total reward: 4192.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=1.0, omega=150, delta=0.003)
Total gain: 1364.944726511346
	Total reward: 4192.0
	Total cos

PSLinUCB (alpha=1.0, omega=1000, delta=0.004)
Total gain: 412.9447265113458
	Total reward: 3240.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=1.0, omega=1000, delta=0.005)
Total gain: 412.9447265113458
	Total reward: 3240.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=1.0, omega=1000, delta=0.006)
Total gain: 412.9447265113458
	Total reward: 3240.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=1.0, omega=1000, delta=0.007)
Total gain: 412.9447265113458
	Total reward: 3240.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=1.0, omega=1000, delta=0.01)
Total gain: 412.9447265113458
	Total reward: 3240.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=1.0, omega=1000, delta=0.03)
Total gain: 412.9447265113458
	Total reward: 3240.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=1.0, omega=1000, delta=0.05)
Total gain: 412.9447265113458
	Total reward: 3240.0

PSLinUCB (alpha=5.0, omega=150, delta=0.1)
Total gain: 3060.944726511347
	Total reward: 5888.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=5.0, omega=150, delta=0.15)
Total gain: 2996.9447265113463
	Total reward: 5824.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=5.0, omega=150, delta=0.2)
Total gain: 2940.944726511347
	Total reward: 5768.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=5.0, omega=250, delta=0.001)
Total gain: 2508.944726511347
	Total reward: 5336.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=5.0, omega=250, delta=0.003)
Total gain: 2508.944726511347
	Total reward: 5336.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=5.0, omega=250, delta=0.004)
Total gain: 2508.944726511347
	Total reward: 5336.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=5.0, omega=250, delta=0.005)
Total gain: 2508.944726511347
	Total reward: 5336.0
	Total 

PSLinUCB (alpha=5.0, omega=1500, delta=0.006)
Total gain: 2016.9447265113456
	Total reward: 4844.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=5.0, omega=1500, delta=0.007)
Total gain: 2016.9447265113456
	Total reward: 4844.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=5.0, omega=1500, delta=0.01)
Total gain: 2016.9447265113456
	Total reward: 4844.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=5.0, omega=1500, delta=0.03)
Total gain: 2016.9447265113456
	Total reward: 4844.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=5.0, omega=1500, delta=0.05)
Total gain: 2016.9447265113456
	Total reward: 4844.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=5.0, omega=1500, delta=0.1)
Total gain: 2016.9447265113456
	Total reward: 4844.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=5.0, omega=1500, delta=0.15)
Total gain: 2016.9447265113456
	Total reward: 48

PSLinUCB (alpha=8.0, omega=250, delta=0.2)
Total gain: 2652.9447265113463
	Total reward: 5480.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=8.0, omega=500, delta=0.001)
Total gain: 2144.9447265113463
	Total reward: 4972.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=8.0, omega=500, delta=0.003)
Total gain: 2144.9447265113463
	Total reward: 4972.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=8.0, omega=500, delta=0.004)
Total gain: 2144.9447265113463
	Total reward: 4972.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=8.0, omega=500, delta=0.005)
Total gain: 2144.9447265113463
	Total reward: 4972.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=8.0, omega=500, delta=0.006)
Total gain: 2144.9447265113463
	Total reward: 4972.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=8.0, omega=500, delta=0.007)
Total gain: 2144.9447265113463
	Total reward: 4972.

PSLinUCB (alpha=8.0, omega=2000, delta=0.03)
Total gain: 2024.9447265113456
	Total reward: 4852.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=8.0, omega=2000, delta=0.05)
Total gain: 2024.9447265113456
	Total reward: 4852.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=8.0, omega=2000, delta=0.1)
Total gain: 2024.9447265113456
	Total reward: 4852.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=8.0, omega=2000, delta=0.15)
Total gain: 2024.9447265113456
	Total reward: 4852.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=8.0, omega=2000, delta=0.2)
Total gain: 2024.9447265113456
	Total reward: 4852.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=20.0, omega=100, delta=0.001)
Total gain: 3136.9447265113463
	Total reward: 5964.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=20.0, omega=100, delta=0.003)
Total gain: 3136.9447265113463
	Total reward: 596

PSLinUCB (alpha=20.0, omega=750, delta=0.004)
Total gain: 2388.944726511347
	Total reward: 5216.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=20.0, omega=750, delta=0.005)
Total gain: 2388.944726511347
	Total reward: 5216.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=20.0, omega=750, delta=0.006)
Total gain: 2388.944726511347
	Total reward: 5216.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=20.0, omega=750, delta=0.007)
Total gain: 2388.944726511347
	Total reward: 5216.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=20.0, omega=750, delta=0.01)
Total gain: 2388.944726511347
	Total reward: 5216.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=20.0, omega=750, delta=0.03)
Total gain: 2388.944726511347
	Total reward: 5216.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=20.0, omega=750, delta=0.05)
Total gain: 2388.944726511347
	Total reward: 5216.0

PSLinUCB (alpha=30.0, omega=100, delta=0.1)
Total gain: 3032.944726511347
	Total reward: 5860.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=30.0, omega=100, delta=0.15)
Total gain: 3008.9447265113463
	Total reward: 5836.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=30.0, omega=100, delta=0.2)
Total gain: 3060.944726511347
	Total reward: 5888.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=30.0, omega=150, delta=0.001)
Total gain: 2628.9447265113454
	Total reward: 5456.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=30.0, omega=150, delta=0.003)
Total gain: 2628.9447265113454
	Total reward: 5456.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=30.0, omega=150, delta=0.004)
Total gain: 2628.9447265113454
	Total reward: 5456.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=30.0, omega=150, delta=0.005)
Total gain: 2628.9447265113454
	Total reward: 545

PSLinUCB (alpha=30.0, omega=1000, delta=0.006)
Total gain: 2084.944726511347
	Total reward: 4912.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=30.0, omega=1000, delta=0.007)
Total gain: 2084.944726511347
	Total reward: 4912.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=30.0, omega=1000, delta=0.01)
Total gain: 2084.944726511347
	Total reward: 4912.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=30.0, omega=1000, delta=0.03)
Total gain: 2084.944726511347
	Total reward: 4912.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=30.0, omega=1000, delta=0.05)
Total gain: 2084.944726511347
	Total reward: 4912.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=30.0, omega=1000, delta=0.1)
Total gain: 2084.944726511347
	Total reward: 4912.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=30.0, omega=1000, delta=0.15)
Total gain: 2084.944726511347
	Total reward: 49

PSLinUCB (alpha=40.0, omega=150, delta=0.2)
Total gain: 2492.9447265113463
	Total reward: 5320.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=40.0, omega=250, delta=0.001)
Total gain: 2356.944726511347
	Total reward: 5184.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=40.0, omega=250, delta=0.003)
Total gain: 2356.944726511347
	Total reward: 5184.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=40.0, omega=250, delta=0.004)
Total gain: 2356.944726511347
	Total reward: 5184.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=40.0, omega=250, delta=0.005)
Total gain: 2356.944726511347
	Total reward: 5184.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=40.0, omega=250, delta=0.006)
Total gain: 2356.944726511347
	Total reward: 5184.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=40.0, omega=250, delta=0.007)
Total gain: 2356.944726511347
	Total reward: 5184

PSLinUCB (alpha=40.0, omega=1500, delta=0.03)
Total gain: 2112.9447265113504
	Total reward: 4940.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=40.0, omega=1500, delta=0.05)
Total gain: 2112.9447265113504
	Total reward: 4940.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=40.0, omega=1500, delta=0.1)
Total gain: 2112.9447265113504
	Total reward: 4940.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=40.0, omega=1500, delta=0.15)
Total gain: 2112.9447265113504
	Total reward: 4940.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=40.0, omega=1500, delta=0.2)
Total gain: 2112.9447265113504
	Total reward: 4940.0
	Total cost: 2827.055273488652
Execution time: 0.2s

PSLinUCB (alpha=40.0, omega=2000, delta=0.001)
Total gain: 2112.9447265113504
	Total reward: 4940.0
	Total cost: 2827.055273488652
Execution time: 0.1s

PSLinUCB (alpha=40.0, omega=2000, delta=0.003)
Total gain: 2112.9447265113504
	Total rewa

In [26]:
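# Keep only the final cumulative gain of each configuration.
last_gains = {k: v[-1] for k, v in ps_linucb_gains.items()}
# Pick the configuration with the highest final gain
# (the sort is stable, so among ties the last one tried wins).
best_alpha_ps_linucb, best_omega_ps_linucb, best_delta_ps_linucb = sorted(
    last_gains.items(), key=lambda x: x[1]
)[-1][0]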

In [27]:
best_alpha_ps_linucb, best_omega_ps_linucb, best_delta_ps_linucb

(5.0, 100, 0.01)
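
Several configurations in the tuning log reach identical final gains, so the selected triple can depend on how ties are broken. The sketch below uses made-up numbers (the dictionary `toy_gains` is hypothetical, not the notebook's data) to show that `sorted(...)[-1]` returns the last-inserted of the tied maxima, because Python's sort is stable, whereas `max` returns the first.

```python
# Hypothetical final gains keyed by (alpha, omega, delta); values are illustrative only.
toy_gains = {
    (1.0, 100, 0.001): 10.0,
    (1.0, 100, 0.003): 12.0,
    (1.0, 100, 0.004): 12.0,  # ties with the previous configuration
    (1.0, 150, 0.001): 11.0,
}

# Selection as written in the notebook: stable sort, then take the last item.
best_sorted = sorted(toy_gains.items(), key=lambda kv: kv[1])[-1][0]

# Alternative: max() returns the first maximal item in insertion order.
best_max = max(toy_gains.items(), key=lambda kv: kv[1])[0]

print(best_sorted)
print(best_max)
```

Either rule is defensible; what matters is that the choice is deliberate when many grid points are tied.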

In [28]:
del ps_linucb_gains

#### Evaluation

In [29]:
ps_linucb = algorithms.PSLinUCB(
    n_trials=NUM_OF_TRIALS,
    context_dimension=contexts.shape[1],
    n_arms=rewards.shape[1],
    alpha=best_alpha_ps_linucb,
    omega=best_omega_ps_linucb,
    delta=best_delta_ps_linucb,
)


gain_pslinucb = evaluation.evaluate_on_synthetic_data(
    ps_linucb,
    contexts,
    rewards,
    costs_vector,
    stop_after=NUM_OF_TRIALS,
)
change_points = ps_linucb.change_points

PSLinUCB (alpha=5.0, omega=100, delta=0.01)
Total gain: 17737.463821340938
	Total reward: 30424.0
	Total cost: 12686.536178659138
Execution time: 0.6s
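
In each log entry above, the total gain equals the total reward minus the total cost. The sketch below reproduces that bookkeeping on toy lists. How `evaluation.evaluate_on_synthetic_data` computes the gain internally is not shown in this notebook, so the helper `cumulative_gain` and its inputs are assumptions made for illustration.

```python
from itertools import accumulate

def cumulative_gain(rewards, costs):
    """Return the running sum of per-trial (reward - cost)."""
    return list(accumulate(r - c for r, c in zip(rewards, costs)))

# Hypothetical per-trial rewards and observation costs.
toy_rewards = [1.0, 0.0, 1.0, 1.0]
toy_costs = [0.25, 0.25, 0.5, 0.0]

gain = cumulative_gain(toy_rewards, toy_costs)
print(gain)      # running gain after each trial
print(gain[-1])  # the last element plays the role of "Total gain"
```

This also matches how the tuning cells read `v[-1]` as the final gain of a run.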



### SimOOS

#### Tuning

In [31]:
import warnings

simoos_gains = {}

for beta in [0.1, 0.3, 0.5, 0.7, 1.0, 3.0, 5.0, 8.0]:
    for delta in [0.005, 0.01, 0.05, 0.1, 0.5, 0.8, 1.0, 5.0]:
        # Time the construction of the algorithm object.
        s = time.time()
        p_simoos = algorithms.SimOOSAlgorithm(
            all_contexts=contexts[:TUNING_NUM_TRIALS],
            number_of_actions=rewards.shape[1],
            max_no_red_context=contexts.shape[1],
            beta_SimOOS=beta,
            delta_SimOOS=delta,
        )
        print(f"Took {time.time() - s} seconds")

        # Time the evaluation run with warnings suppressed.
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")
            s = time.time()
            gain_simoos = evaluation.evaluate_on_synthetic_data(
                p_simoos,
                contexts[:TUNING_NUM_TRIALS],
                rewards[:TUNING_NUM_TRIALS],
                costs_vector[:TUNING_NUM_TRIALS],
                stop_after=TUNING_NUM_TRIALS,
            )
            print(f"Took {time.time() - s} seconds")

        simoos_gains[(beta, delta)] = gain_simoos

Took 0.0008878707885742188 seconds
Trial 0, time 2022-04-21 12:47:58.796595
Trial 500, time 2022-04-21 12:48:15.678465
Trial 1000, time 2022-04-21 12:48:24.489415
Trial 1500, time 2022-04-21 12:48:29.055907
Trial 2000, time 2022-04-21 12:48:30.998863
SimOOS (beta=0.1, delta=0.005)
Total gain: 6686.675610743825
	Total reward: 8976.0
	Total cost: 2289.324389256159
Execution time: 32.5s

Took 32.51989674568176 seconds
Took 0.001264810562133789 seconds
Trial 0, time 2022-04-21 12:48:31.317603
Trial 500, time 2022-04-21 12:48:38.546107
Trial 1000, time 2022-04-21 12:48:38.731228
Trial 1500, time 2022-04-21 12:48:38.861593
Trial 2000, time 2022-04-21 12:48:38.931128
SimOOS (beta=0.1, delta=0.01)
Total gain: 9202.61342412426
	Total reward: 9520.0
	Total cost: 317.3865758757409
Execution time: 8.1s

Took 8.07958197593689 seconds
Took 0.0011279582977294922 seconds
Trial 0, time 2022-04-21 12:48:39.398123
Trial 500, time 2022-04-21 12:48:44.179319
Trial 1000, time 2022-04-21 12:48:44.447983
Tria

Trial 500, time 2022-04-21 12:53:00.796751
Trial 1000, time 2022-04-21 12:53:00.920350
Trial 1500, time 2022-04-21 12:53:01.044009
Trial 2000, time 2022-04-21 12:53:01.110884
SimOOS (beta=0.5, delta=0.5)
Total gain: 9686.953642035478
	Total reward: 9696.0
	Total cost: 9.04635796452224
Execution time: 1.7s

Took 1.7011687755584717 seconds
Took 0.0008499622344970703 seconds
Trial 0, time 2022-04-21 12:53:01.232431
Trial 500, time 2022-04-21 12:53:02.504604
Trial 1000, time 2022-04-21 12:53:02.625267
Trial 1500, time 2022-04-21 12:53:02.747310
Trial 2000, time 2022-04-21 12:53:02.815533
SimOOS (beta=0.5, delta=0.8)
Total gain: 9654.953642035478
	Total reward: 9664.0
	Total cost: 9.04635796452224
Execution time: 1.7s

Took 1.703516960144043 seconds
Took 0.0009982585906982422 seconds
Trial 0, time 2022-04-21 12:53:02.936996
Trial 500, time 2022-04-21 12:53:19.513118
Trial 1000, time 2022-04-21 12:53:25.859320
Trial 1500, time 2022-04-21 12:53:29.006989
Trial 2000, time 2022-04-21 12:53:30.8

KeyboardInterrupt: 
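
The sweep above was interrupted, and the cells that follow it have no execution count. One way to keep whatever results were collected before the interrupt is to catch `KeyboardInterrupt` around the grid loop. This is a minimal sketch, not the notebook's code: `sweep`, `run_one` and `fake_run` are hypothetical names, and `fake_run` merely simulates an interrupt on its fourth call.

```python
from itertools import product

def sweep(run_one, betas, deltas):
    """Call run_one(beta, delta) over the grid; stop cleanly on Ctrl-C."""
    results = {}
    try:
        for beta, delta in product(betas, deltas):
            results[(beta, delta)] = run_one(beta, delta)
    except KeyboardInterrupt:
        print(f"Interrupted after {len(results)} configurations")
    return results

calls = []

def fake_run(beta, delta):
    # Hypothetical stand-in for an evaluation run; returns a one-element gain series.
    if len(calls) == 3:
        raise KeyboardInterrupt  # simulate the user stopping the sweep
    calls.append((beta, delta))
    return [beta * delta]

partial = sweep(fake_run, [0.1, 0.3], [0.005, 0.01])

# Select the best configuration among those that finished.
best = max(partial, key=lambda k: partial[k][-1]) if partial else None
print(best)
```

With partial results retained, the selection step can still run on the configurations that completed.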

In [None]:
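# Keep only the final cumulative gain of each configuration.
last_gains = {k: v[-1] for k, v in simoos_gains.items()}
# Pick the (beta, delta) pair with the highest final gain.
best_beta_simoos, best_delta_simoos = sorted(
    last_gains.items(), key=lambda x: x[1]
)[-1][0]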

In [None]:
best_beta_simoos, best_delta_simoos

In [None]:
del simoos_gains

#### Evaluation

In [None]:
import warnings

# Time the construction of the algorithm object.
s = time.time()
p_simoos = algorithms.SimOOSAlgorithm(
    all_contexts=contexts,
    number_of_actions=rewards.shape[1],
    max_no_red_context=contexts.shape[1],
    beta_SimOOS=best_beta_simoos,
    delta_SimOOS=best_delta_simoos,
)
print(f"Took {time.time() - s} seconds")

# Time the evaluation run with warnings suppressed.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    s = time.time()
    gain_simoos = evaluation.evaluate_on_synthetic_data(
        p_simoos,
        contexts[:NUM_OF_TRIALS],
        rewards[:NUM_OF_TRIALS],
        costs_vector[:NUM_OF_TRIALS],
        stop_after=NUM_OF_TRIALS,
    )
    print(f"Took {time.time() - s} seconds")


### Algorithm1

#### Tuning

In [None]:
# Algorithm1 is tuned separately, in the script /scripts/3_evaluate_alg1_on_synthetic.py

In [None]:
# alg1_gains = {}

# for beta in [1.0]:
#     for delta in  [0.1, 0.15]:
#         for window in [100, 500, 1000, 1250]:
#             s = time.time()
#             p_alg1 = algorithms.Algorithm1(
#                 all_contexts=contexts[:TUNING_NUM_TRIALS], 
#                 number_of_actions=rewards.shape[1],
#                 max_no_red_context=contexts.shape[1],
#                 beta=beta,
#                 delta=delta,
#                 window_length=window,
#             )
#             print(f"Took {time.time() - s} seconds")

#             import warnings
#             with warnings.catch_warnings():
#                 warnings.simplefilter("ignore")
#                 s = time.time()
#                 gain_alg1 = evaluation.evaluate_on_synthetic_data(
#                     p_alg1,
#                     contexts[:TUNING_NUM_TRIALS],
#                     rewards[:TUNING_NUM_TRIALS],
#                     costs_vector[:TUNING_NUM_TRIALS],
#                     stop_after=TUNING_NUM_TRIALS,
#                 )
#                 print(f"Took {time.time() - s} seconds")
#             alg1_gains[(beta, delta, window)] = gain_alg1
            

In [None]:
# last_gains = {k:v[-1] for k,v in alg1_gains.items()}
# best_beta_alg1, best_delta_alg1, best_window_alg1  = sorted(
#     last_gains.items(), key=lambda x: x[1]
# )[-1][0]

In [None]:
# sorted(
#     last_gains.items(), key=lambda x: x[1]
# )

In [None]:
# best_beta_alg1, best_delta_alg1, best_window_alg1

In [None]:
# del alg1_gains

#### Evaluation

In [None]:
# s = time.time()
# p_alg1 = algorithms.Algorithm1(
#     all_contexts=contexts, 
#     number_of_actions=rewards.shape[1],
#     max_no_red_context=contexts.shape[1],
#     beta=best_beta_alg1,
#     delta=best_delta_alg1,
#     window_length=best_window_alg1,
# )
# print(f"Took {time.time() - s} seconds")

# import warnings
# with warnings.catch_warnings():
#     warnings.simplefilter("ignore")
#     s = time.time()
#     gain_alg1 = evaluation.evaluate_on_synthetic_data(
#         p_alg1,
#         contexts[:NUM_OF_TRIALS],
#         rewards[:NUM_OF_TRIALS],
#         costs_vector[:NUM_OF_TRIALS],
#         stop_after=NUM_OF_TRIALS,
#     )
#     print(f"Took {time.time() - s} seconds")


### Plot cumulative gain

In [None]:
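def plot_gains(gain_dict, change_points=()):
    """Plot the cumulative gain of each algorithm against the trial index."""
    fig, ax = plt.subplots(1, 1, figsize=(16, 8))

    # Track the largest gain so change-point markers span the plotted range.
    max_vline = 0

    for label, gain in gain_dict.items():
        ax.plot(gain, label=label)
        max_vline = max(max_vline, max(gain))
    ax.set_xlabel('Trial')
    ax.set_ylabel('Cumulative gain')
    ax.set_title('Evaluation on synthetic data')

    # Draw change points only when the caller supplies them.
    if len(change_points) > 0:
        ax.vlines(change_points, 0, max_vline, label='change points', colors='pink')

    ax.legend()
    plt.show()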

In [None]:
plot_gains({
    'Random': gain_random,
    'E-greedy': gain_egreedy,
    'UCB1': gain_ucb,
    'LinUCB': gain_linucb,
    'PS-LinUCB': gain_pslinucb,
    'SimOOS': gain_simoos,
#     'Algorithm1': gain_alg1,
},
)
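
The `change_points` argument of `plot_gains` and the `ps_linucb.change_points` attribute suggest overlaying detected change points on the plot. Before doing so, it can help to compare the detections numerically with the ground truth. The sketch below assumes reward change points at steps 2000, 4000, 6000 and 8000 (one reading of "every 2000 steps" in the introduction). The helper `match_change_points`, the tolerance, and the detections in `toy_detected` are all hypothetical.

```python
def match_change_points(detected, true_points, tolerance):
    """For each true change point, return the first detection within
    [true, true + tolerance], or None if it was missed."""
    ordered = sorted(detected)
    matches = {}
    for cp in true_points:
        hits = [d for d in ordered if cp <= d <= cp + tolerance]
        matches[cp] = hits[0] if hits else None
    return matches

true_points = [2000, 4000, 6000, 8000]   # assumed ground-truth change points
toy_detected = [2130, 4075, 5200, 8410]  # hypothetical detector output

matches = match_change_points(toy_detected, true_points, tolerance=300)
for cp, hit in matches.items():
    delay = None if hit is None else hit - cp
    print(f"true={cp} detected={hit} delay={delay}")
```

A detection before the true change point (5200 here) counts as a false alarm rather than a match, and one beyond the tolerance (8410) counts as a miss.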
