
Demo of using skopt "ask-and-tell" API for multi-armed bandit problems #407

Closed
Naereen opened this issue Jun 19, 2017 · 7 comments

@Naereen
Contributor

Naereen commented Jun 19, 2017

Hi,
For my research, I wrote this small notebook showing how to use the (awesome) "ask-and-tell" API of skopt Optimizers.
It shows how easy it is to compare a black-box Bayesian optimization algorithm (here based on RandomForestRegressor) against classical online algorithms such as UCB and Thompson sampling on multi-armed bandit problems 🎰.
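For context, here is a minimal sketch of what such an ask-and-tell loop can look like on a toy Bernoulli bandit (illustrative only, with made-up arm means, and not the exact code from the notebook):

```python
import numpy as np
from skopt import Optimizer
from skopt.space import Integer

rng = np.random.RandomState(0)
arm_means = [0.1, 0.5, 0.9]      # toy Bernoulli arm means, unknown to the optimizer

# One integer dimension: the index of the arm to pull.
opt = Optimizer([Integer(0, len(arm_means) - 1)],
                base_estimator="RF", acq_func="EI", random_state=0)

for t in range(100):
    arm = opt.ask()[0]                          # ask which arm to pull next
    reward = rng.binomial(1, arm_means[arm])    # observe a stochastic reward
    opt.tell([arm], -reward)                    # skopt minimizes, so feed the negated reward
```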

I don't know whether it would be of interest for skopt's documentation, but I wanted to let you know. ✨

If you want to include it or point to it, don't hesitate to ask me to rewrite, clean up, or improve the notebook! The most important part is here 🛠️

Thanks 😃.

@betatim
Member

betatim commented Jun 19, 2017

I think we need more examples and narrative documentation, so we should add it to the examples section. Is there someone around who knows enough about multi-armed bandits and the like to judge the content?

@betatim
Member

betatim commented Jun 19, 2017

After a first read-through, I think we should include a shortened version in the docs and link to the long version.

@Naereen
Contributor Author

Naereen commented Jun 19, 2017

Hi,
I agree. The long version uses a toolbox I am working on, whose documentation is already online (http://banditslilian.gforge.inria.fr/), but the code itself is not (yet) open source.

I will write a shorter version, with manual implementations of UCB and Thompson sampling, as short as possible, and try to make it self-contained.
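For reference, a rough, self-contained sketch of what such manual implementations could look like (Bernoulli arms assumed; this is illustrative, not the final notebook code):

```python
import numpy as np

rng = np.random.RandomState(1)
means = [0.1, 0.5, 0.9]          # toy Bernoulli arm means
horizon, K = 1000, len(means)

# --- UCB1: pull the arm with the highest upper confidence bound ---
pulls, sums = np.zeros(K), np.zeros(K)
for t in range(horizon):
    if t < K:                    # initialisation: pull each arm once
        arm = t
    else:
        arm = int(np.argmax(sums / pulls + np.sqrt(2 * np.log(t + 1) / pulls)))
    r = rng.binomial(1, means[arm])
    pulls[arm] += 1
    sums[arm] += r

# --- Thompson sampling with Beta(1, 1) priors on Bernoulli arms ---
successes, failures = np.ones(K), np.ones(K)
for _ in range(horizon):
    arm = int(np.argmax(rng.beta(successes, failures)))   # sample, then pull the argmax
    r = rng.binomial(1, means[arm])
    successes[arm] += r
    failures[arm] += 1 - r
```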

@betatim
Member

betatim commented Jun 19, 2017

OK. Making it self-contained is good, especially if it otherwise involves code that people can't get :) We should find someone who understands both SMBO and the more classic multi-armed bandit methods, so we have some kind of sanity check on the comparison (comparing A with B almost always upsets someone if you aren't careful :) )

@Naereen
Contributor Author

Naereen commented Jun 19, 2017

Here, in that small notebook, I compare:

  • a "very slow" black-box Bayesian optimization algorithm,
  • against "quick" state-of-the-art online bandit algorithms.

Both perform almost equally well; the Bayesian optimization is slightly better (on a toy problem) but much slower.

@MechCoder
Member

Thanks a lot for the highly detailed notebook. I am not too familiar with multi-armed bandit methods either, but upon a quick read I find that the goal of multi-armed bandits is to maximize the expected reward after n iterations, while Bayesian optimization finds the argmin of an expensive function. The two seem to solve different problems, right?

Or is the rationale that, in spite of SMBO solving a different problem, it performs just as well?

@Naereen
Contributor Author

Naereen commented Jun 20, 2017

@MechCoder Indeed, the multi-armed bandit problem is a reinforcement learning problem where the task is to maximize the expected reward (or, equivalently, minimize the regret) over an (unknown) horizon n. This is essentially done by identifying the best arm and not pulling the sub-optimal arms too often. Here, the Bayesian optimization task aims at finding the optimal arm by trying all of them, which also results in a small regret.
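For concreteness, the (cumulative) regret mentioned above is the standard quantity

$$ R_n = n \, \mu^* - \mathbb{E}\left[ \sum_{t=1}^{n} r_t \right], $$

where $\mu^*$ is the mean of the best arm and $r_t$ is the reward received at time $t$.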

I think the interesting point is to show that black-box Bayesian optimization can also be applied to reinforcement learning experiments, even if the computational cost is not worth the improvement in terms of reward gained / regret reduced (black-box optimization does indeed perform better than state-of-the-art Thompson sampling or klUCB on the basic examples I considered).

I am re-running the experiments with a longer horizon and more repetitions (to reduce the noise in the plots), and afterwards I will rewrite the notebook more concisely to be self-contained (and present only one experiment).

@Naereen Naereen closed this as completed Jan 28, 2018