Bandits regressors for model selection (new PR to use Github CI/CD) #397

Merged
merged 30 commits into master from bandits_regressors_PR on Jan 4, 2021

Commits (30); the diff below shows the changes from 21 of them.
da7a823
[WIP] first layout for bandits classes
etiennekintzler Nov 15, 2020
13154a6
improve docstring, rm class not in PR, clean some
etiennekintzler Nov 22, 2020
b0fefc6
delete nb from pr, clean some
etiennekintzler Nov 22, 2020
e128e61
enumerate all parameters in __init__, fix \epsilon
etiennekintzler Nov 23, 2020
a50693f
align on convention: single quote and import
etiennekintzler Nov 23, 2020
22c7923
rm print_every, change print_info->__repr__, skip line after class
etiennekintzler Nov 23, 2020
afe1cac
substitute stdlib for numpy
etiennekintzler Nov 23, 2020
3c25c93
rm metrics tracing, add _learn_one for powerusers
etiennekintzler Nov 24, 2020
860e779
forget to rm object tracing in class __init__ signature
etiennekintzler Nov 24, 2020
9fe14f1
add type to models, _default_params, use '+=' for list append
etiennekintzler Nov 24, 2020
f0b1b6c
change parameters in _default_params
etiennekintzler Nov 25, 2020
9cb720c
intercept_lr instead of lr in LinearRegression
etiennekintzler Nov 25, 2020
d3169dc
mv argmax to utils.math
etiennekintzler Nov 25, 2020
c7b20ca
fix mistake: EpsilonGreedyRessor didnt inherit from base.Regressor
etiennekintzler Nov 25, 2020
bf1fcf5
Merge branch 'master' into bandits_regressors_PR
etiennekintzler Nov 26, 2020
35d37f8
add seed arg for reproducibility/tests
etiennekintzler Nov 29, 2020
0ba1cc1
add typing for seed parameter, rm seed from _default_params
etiennekintzler Dec 1, 2020
02097ae
first draft for docstring's example
etiennekintzler Dec 11, 2020
a954fb7
raw: sigmoid scaler, warm_up, mv explore_each_arm in Bandit,cut Examp…
etiennekintzler Dec 19, 2020
af32a24
Merge branch 'master' into bandits_regressors_PR
etiennekintzler Dec 19, 2020
7bea5bb
chg classmethod's name _default_params to _unit_test_params, fix star…
etiennekintzler Dec 19, 2020
513a982
run black, test commit hook
etiennekintzler Dec 21, 2020
929c035
more docstring; add randomize argmax, default value for metric; rm _n…
etiennekintzler Dec 24, 2020
99926a0
possibility to add function with seed in argmax
etiennekintzler Dec 27, 2020
69b0d11
fix docstring Example output
etiennekintzler Dec 27, 2020
8e6963a
small modif to docstring
etiennekintzler Dec 27, 2020
14420ef
Merge branch 'master' into bandits_regressors_PR
etiennekintzler Dec 27, 2020
8599730
shorten some lines
etiennekintzler Dec 27, 2020
0cb664e
docstring effort (not finished), rm c=1 parameter from _compute_scale…
etiennekintzler Dec 28, 2020
8d5c0a1
cosmetic
etiennekintzler Jan 4, 2021
4 changes: 4 additions & 0 deletions river/expert/__init__.py
@@ -16,15 +16,19 @@

"""

from .bandit import EpsilonGreedyRegressor
from .bandit import UCBRegressor
from .ewa import EWARegressor
from .sh import SuccessiveHalvingClassifier
from .sh import SuccessiveHalvingRegressor
from .stacking import StackingClassifier


__all__ = [
"EpsilonGreedyRegressor",
"EWARegressor",
"SuccessiveHalvingClassifier",
"SuccessiveHalvingRegressor",
"StackingClassifier",
"UCBRegressor",
]
358 changes: 358 additions & 0 deletions river/expert/bandit.py
@@ -0,0 +1,358 @@
import abc
import copy
import math
import random
import typing

from river import base
from river import compose
from river import linear_model
from river import metrics
from river import optim
from river import preprocessing
from river import utils


__all__ = [
'EpsilonGreedyRegressor',
'UCBRegressor',
]


class Bandit(base.EnsembleMixin):

def __init__(self, models: typing.List[base.Estimator], metric: metrics.Metric, explore_each_arm: int, start_after: int, seed: int = None):

if len(models) <= 1:
            raise ValueError(f"You supplied {len(models)} models. At least 2 models are required.")

# Check that the model and the metric are in accordance
for model in models:
if not metric.works_with(model):
                raise ValueError(
                    f"{metric.__class__.__name__} metric can't be used to evaluate a "
                    f"{model.__class__.__name__}"
                )
super().__init__(models)
self.metric = copy.deepcopy(metric)
        self._y_scaler = preprocessing.StandardScaler()

# Initializing bandits internals
self._n_arms = len(models)
self._n_iter = 0 # number of times learn_one is called
self._N = [0] * self._n_arms
self.explore_each_arm = explore_each_arm
self._average_reward = [0.0] * self._n_arms

# Warm up
self.start_after = start_after
self.warm_up = True

# Randomization
self.seed = seed
self._rng = random.Random(seed)

def __repr__(self):
        return (
            f"{self.__class__.__name__}"
            f"\n\t{self.metric}"
            f"\n\tBest model id: {self._best_model_idx}"
        ).expandtabs(2)

@abc.abstractmethod
def _pull_arm(self):
pass

@abc.abstractmethod
def _update_arm(self, arm, reward):
pass

@abc.abstractmethod
def _pred_func(self, model):
pass

@property
def _best_model_idx(self):
        # Average reward instead of cumulative reward, otherwise arms that are pulled more often would be favored
return utils.math.argmax(self._average_reward)

@property
def best_model(self):
return self[self._best_model_idx]

@property
def percentage_pulled(self):
if not self.warm_up:
percentages = [n / sum(self._N) for n in self._N]
else:
percentages = [0] * self._n_arms
return percentages

def predict_one(self, x):
best_arm = self._best_model_idx
y_pred = self._pred_func(self[best_arm])(x)

return y_pred

def learn_one(self, x, y):
self._learn_one(x, y)
return self

def add_models(self, new_models: typing.List[base.Estimator]):
length_new_models = len(new_models)
self.models += new_models
self._n_arms += length_new_models
self._N += [0] * length_new_models
self._average_reward += [0.0] * length_new_models

def _learn_one(self, x, y):
# Explore all arms pulled less than `explore_each_arm` times
never_pulled_arm = [i for (i, n) in enumerate(self._N) if n < self.explore_each_arm]
if never_pulled_arm:
chosen_arm = self._rng.choice(never_pulled_arm)
else:
chosen_arm = self._pull_arm()

# Predict and learn with the chosen model
chosen_model = self[chosen_arm]
y_pred = chosen_model.predict_one(x)
Member: This might be predict_proba for a classifier, right?

Contributor (author): Yes, you're right, I didn't anticipate that. Also, for a classifier the whole scaling issue is less of a problem (since the target is {0, 1}).

self.metric.update(y_pred=y_pred, y_true=y)
chosen_model.learn_one(x=x, y=y)

# Update bandit internals
if self.warm_up and (self._n_iter == self.start_after):
            self._n_iter = 0  # reset since `_n_iter` is used by the bandit policies (e.g. UCB's exploration bonus)
self.warm_up = False

self._n_iter += 1
reward = self._compute_scaled_reward(y_pred=y_pred, y_true=y)
if not self.warm_up:
self._update_bandit(chosen_arm=chosen_arm, reward=reward)

return self.metric._eval(y_pred, y), reward, chosen_arm

    def _update_bandit(self, chosen_arm, reward):
        # Updates common to all bandits. `_n_iter` is already incremented in `_learn_one`,
        # so it must not be incremented a second time here.
        self._N[chosen_arm] += 1
        self._average_reward[chosen_arm] += (
            reward - self._average_reward[chosen_arm]
        ) / self._N[chosen_arm]

        # Update specific to each bandit algorithm
        self._update_arm(chosen_arm, reward)


def _compute_scaled_reward(self, y_pred, y_true, c=1, scale_y=True):
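        # Standardize the target online so the metric is scale-free, then squash the signed
        # metric value through a sigmoid to obtain a reward in (0, 1).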
if scale_y:
y_true = self._y_scaler.learn_one(dict(y=y_true)).transform_one(dict(y=y_true))["y"]
y_pred = self._y_scaler.transform_one(dict(y=y_pred))["y"]

metric_value = self.metric._eval(y_pred, y_true)
        metric_value = metric_value if self.metric.bigger_is_better else -metric_value
reward = 1 / (1 + math.exp(- c * metric_value)) if c * metric_value > -30 else 0 # to avoid overflow

return reward


class EpsilonGreedyBandit(Bandit):

def __init__(self, models: typing.List[base.Estimator], metric: metrics.Metric, seed: int = None, start_after=25,
epsilon=0.1, epsilon_decay=None, explore_each_arm=0):
super().__init__(models=models, metric=metric, seed=seed, explore_each_arm=explore_each_arm, start_after=start_after)
self.epsilon = epsilon
self.epsilon_decay = epsilon_decay
if epsilon_decay:
self._starting_epsilon = epsilon

def _pull_arm(self):
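        # Exploit the arm with the best average reward with probability 1 - epsilon,
        # otherwise explore an arm drawn uniformly at random.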
if self._rng.random() > self.epsilon:
chosen_arm = utils.math.argmax(self._average_reward)
else:
chosen_arm = self._rng.choice(range(self._n_arms))

return chosen_arm

def _update_arm(self, arm, reward):
# The arm internals are already updated in the `learn_one` phase of class `Bandit`.
if self.epsilon_decay:
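            # Exponential decay: epsilon_t = epsilon_0 * exp(-t * epsilon_decay)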
            self.epsilon = self._starting_epsilon * math.exp(-self._n_iter * self.epsilon_decay)


class EpsilonGreedyRegressor(EpsilonGreedyBandit, base.Regressor):
"""Epsilon-greedy bandit algorithm for regression.

    This bandit selects the best arm (defined as the one with the highest average reward) with
    probability $(1 - \\epsilon)$ and draws a random arm with probability $\\epsilon$. With
    $\\epsilon = 0$, it reduces to the greedy Follow-The-Leader (FTL) algorithm.
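
    When `epsilon_decay` is set, $\\epsilon$ decays exponentially as
    $\\epsilon_t = \\epsilon_0 \\exp(-\\lambda t)$, where $\\lambda$ is the value of
    `epsilon_decay`, shifting the bandit from exploration towards exploitation over time.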


Parameters
----------
models
The models to compare.
metric
Metric used for comparing models with.
    epsilon
        Exploration parameter (default: 0.1).
    epsilon_decay
        Exponential decay rate applied to `epsilon` at each step (default: None, meaning no
        decay).
    explore_each_arm
        Number of times each arm is pulled before the $\\epsilon$-greedy policy takes over
        (default: 0).
    start_after
        Number of warm-up iterations during which the models are trained but rewards are not
        recorded (default: 25).
    seed
        Random seed for reproducibility (default: None).


Examples
--------
    Let's use `EpsilonGreedyRegressor` to select the best learning rate for a linear regression model. First, we define the grid of models:

>>> from river import compose
>>> from river import linear_model
>>> from river import preprocessing
>>> from river import optim

>>> models = [
... compose.Pipeline(
... preprocessing.StandardScaler(),
... linear_model.LinearRegression(optimizer=optim.SGD(lr=lr))
... ) for lr in [1e-4, 1e-3, 1e-2, 1e-1]
... ]

    We use the `TrumpApproval` dataset:

>>> from river import datasets
>>> dataset = datasets.TrumpApproval()

    We then define the metric and the bandit:

>>> from river.expert import EpsilonGreedyRegressor
>>> from river import metrics

>>> metric = metrics.MSE()
>>> bandit = EpsilonGreedyRegressor(models=models, metric=metric, seed=1)

We can then train the models in the bandit in an online fashion:
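
    Here is a minimal sketch of such a loop (assignments are used so that the doctest produces
    no output):

    >>> for x, y in dataset:
    ...     bandit = bandit.learn_one(x, y)

    The model with the highest average reward can then be retrieved:

    >>> best_model = bandit.best_model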

References
----------
[^1]: [Sutton, R. S., & Barto, A. G. (2018). Reinforcement learning: An introduction. MIT press.](http://incompleteideas.net/book/RLbook2020.pdf)
    [^2]: [Rivasplata, O. (2012). Subgaussian random variables: An expository note. Internet publication, PDF.](https://sites.ualberta.ca/~omarr/publications/subgaussians.pdf)
[^3]: [Lattimore, T., & Szepesvári, C. (2020). Bandit algorithms. Cambridge University Press.](https://tor-lattimore.com/downloads/book/book.pdf)
"""
@classmethod
def _unit_test_params(cls):
return {
'models': [
compose.Pipeline(
preprocessing.StandardScaler(),
linear_model.LinearRegression(optimizer=optim.SGD(lr=0.01))),
compose.Pipeline(
preprocessing.StandardScaler(),
linear_model.LinearRegression(optimizer=optim.SGD(lr=0.1)))
],
'metric': metrics.MSE(),
}

def _pred_func(self, model):
return model.predict_one


class UCBBandit(Bandit):

def __init__(self, models: typing.List[base.Estimator], metric: metrics.Metric, seed: int = None, start_after=25, explore_each_arm=1, delta=None):
if explore_each_arm < 1:
raise ValueError("Argument 'explore_each_arm' should be >= 1")
super().__init__(models=models, metric=metric, seed=seed, explore_each_arm=explore_each_arm, start_after=start_after)
        if delta is not None and not 0 < delta < 1:
            raise ValueError("The parameter delta should be in (0, 1) (or set to None)")
self.delta = delta

def _pull_arm(self):
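        # Exploration bonus: UCB(delta) when `delta` is set, otherwise the classic UCB1
        # bonus sqrt(2 * log(t) / n) from Auer et al. (2002).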
if self.delta:
exploration_bonus = [math.sqrt(2 * math.log(1/self.delta) / n) for n in self._N]
else:
exploration_bonus = [math.sqrt(2 * math.log(self._n_iter) / n) for n in self._N]
upper_bound = [
avg_reward + exploration
for (avg_reward, exploration)
in zip(self._average_reward, exploration_bonus)
]
chosen_arm = utils.math.argmax(upper_bound)

return chosen_arm

def _update_arm(self, arm, reward):
# The arm internals are already updated in the `learn_one` phase of class `Bandit`.
pass


class UCBRegressor(UCBBandit, base.Regressor):
"""Upper Confidence Bound bandit for regression.

    The class offers two implementations of UCB:

    - UCB1 from [^1], when the parameter `delta` is None
    - UCB($\\delta$) from [^2], when the parameter `delta` is in (0, 1)
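
    Both pick the arm that maximizes an optimistic index: $\\bar{r}_i + \\sqrt{2 \\log t / n_i}$
    for UCB1 and $\\bar{r}_i + \\sqrt{2 \\log (1 / \\delta) / n_i}$ for UCB($\\delta$), where
    $\\bar{r}_i$ is the average reward of arm $i$, $n_i$ the number of times it was pulled and
    $t$ the current iteration (this is the quantity computed in `_pull_arm`).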

    For this bandit, rewards are assumed to be 1-subgaussian (see Lattimore and Szepesvári,
    chapter 6, p. 91), hence the target is standardized online with a `StandardScaler` and the
    metric value is squashed through a sigmoid so that rewards lie in (0, 1).

Parameters
----------
models
The models to compare.
metric
Metric used for comparing models with.
    delta
        For the UCB(delta) implementation; a lower value means more exploration (default: None,
        which selects UCB1).
    explore_each_arm
        Number of times each arm is pulled before the UCB policy takes over (default: 1).
    start_after
        Number of warm-up iterations during which the models are trained but rewards are not
        recorded (default: 25).
    seed
        Random seed for reproducibility (default: None).


Examples
--------
Let's use `UCBRegressor` to select the best learning rate for a linear regression model. First, we define the grid of models:

>>> from river import compose
>>> from river import linear_model
>>> from river import preprocessing
>>> from river import optim

>>> models = [
... compose.Pipeline(
... preprocessing.StandardScaler(),
... linear_model.LinearRegression(optimizer=optim.SGD(lr=lr))
... ) for lr in [1e-4, 1e-3, 1e-2, 1e-1]
... ]

    We use the `TrumpApproval` dataset:

>>> from river import datasets
>>> dataset = datasets.TrumpApproval()

    We then define the metric and the bandit:

>>> from river.expert import UCBRegressor
>>> from river import metrics

>>> metric = metrics.MSE()
>>> bandit = UCBRegressor(models=models, metric=metric, seed=1)

We can then train the models in the bandit in an online fashion:
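
    A minimal sketch of the training loop (assignments keep the doctest silent):

    >>> for x, y in dataset:
    ...     bandit = bandit.learn_one(x, y)

    >>> best_model = bandit.best_model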

References
----------
[^1]: [Auer, P., Cesa-Bianchi, N., & Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Machine learning, 47(2-3), 235-256.](https://link.springer.com/content/pdf/10.1023/A:1013689704352.pdf)
[^2]: [Lattimore, T., & Szepesvári, C. (2020). Bandit algorithms. Cambridge University Press.](https://tor-lattimore.com/downloads/book/book.pdf)
    [^3]: [Rivasplata, O. (2012). Subgaussian random variables: An expository note. Internet publication, PDF.](https://sites.ualberta.ca/~omarr/publications/subgaussians.pdf)
"""
@classmethod
def _unit_test_params(cls):
return {
'models': [
compose.Pipeline(
preprocessing.StandardScaler(),
linear_model.LinearRegression(optimizer=optim.SGD(lr=0.01))),
compose.Pipeline(
preprocessing.StandardScaler(),
linear_model.LinearRegression(optimizer=optim.SGD(lr=0.1)))
],
'metric': metrics.MSE(),
}

def _pred_func(self, model):
return model.predict_one