Demo of using skopt "ask-and-tell" API for multi-armed bandit problems #407
Comments
I think we need more examples and narrative documentation, so we should add it to the examples section. Is there someone around who knows enough about multi-armed bandits and the like to judge the content?
After a first read-through, I think we should include a shortened version in the docs and link to the long version.
Hi, I will write a smaller version, with a manual implementation of UCB and Thompson sampling, as short as possible, and try to make it self-contained.
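For concreteness, here is a minimal sketch of what such a manual implementation could look like (not the notebook's actual code): UCB1 and Thompson sampling on Bernoulli arms, with hypothetical function names and a toy reward model chosen for illustration.

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with the given true means; return total reward."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:      # pull each arm once first, so every index is defined
            arm = t - 1
        else:           # then pick the arm maximizing the UCB index
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total

def thompson(means, horizon, seed=0):
    """Thompson sampling with Beta(1, 1) priors on Bernoulli arms."""
    rng = random.Random(seed)
    k = len(means)
    alpha = [1] * k    # Beta prior: successes + 1
    beta = [1] * k     # Beta prior: failures + 1
    total = 0.0
    for _ in range(horizon):
        # Sample one plausible mean per arm, pull the arm whose sample is largest.
        arm = max(range(k), key=lambda a: rng.betavariate(alpha[a], beta[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        alpha[arm] += int(reward)
        beta[arm] += 1 - int(reward)
        total += reward
    return total
```

Both policies spend most of their pulls on the best arm once its estimate separates from the others, which is what makes them competitive baselines for the comparison below.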
Ok. Making it self-contained is good, especially if it otherwise involves code that people can't get :) We should find someone who understands both SMBO and the more classic multi-armed bandit methods so we have some kind of sanity check on the comparisons (comparing A with B almost always upsets someone if you aren't careful :) )
In that small notebook, I compare classical online bandit algorithms (UCB and Thompson sampling) against black-box Bayesian optimization.
Both perform about equally well, with Bayesian optimization being slightly better (on some toy problem) but really slower.
Thanks a lot for the highly detailed notebook. I am not too familiar with multi-armed bandit methods either, but on a quick read, I find that the goal of multi-armed bandits is to maximize the expected cumulative reward after n iterations, while Bayesian optimisation finds the argmin of the expensive function. The two seem to solve different problems, right? Or is the rationale that, in spite of SMBO solving a different problem, it performs as well?
@MechCoder Indeed, the multi-armed bandit problem is a reinforcement learning problem where the task is to maximize the expected reward (or minimize the regret) over an (unknown) horizon n. This is essentially done by finding the best arm and not pulling the sub-optimal arms. Here, the Bayesian optimization task aims at finding the optimal arm by trying all of them, which also results in a small regret. I think what is interesting is to show that black-box Bayesian optimization can also be applied to reinforcement learning experiments, even if the cost in terms of computational complexity is not worth the improvement in terms of reward gain / regret reduction.
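The distinction between the two objectives can be made concrete with a tiny helper (an illustrative sketch, not from the notebook): cumulative pseudo-regret charges every pull of a sub-optimal arm by its gap to the best mean, whereas a pure optimizer only cares about eventually identifying the argmax.

```python
def cumulative_regret(means, pulls):
    """Cumulative pseudo-regret of a sequence of arm pulls:
    the sum over time of (best mean - mean of the pulled arm)."""
    best = max(means)
    return sum(best - means[arm] for arm in pulls)

# A policy that locks onto the best arm pays nothing per pull,
# while uniform exploration pays a constant cost on every bad pull:
always_best = cumulative_regret([0.1, 0.9], [1, 1, 1, 1])   # zero regret
alternating = cumulative_regret([0.1, 0.9], [0, 1, 0, 1])   # regret grows with horizon
```

A bandit algorithm is judged by this running sum, so it must limit exploration; a black-box optimizer that happily spends many evaluations on bad arms can still return the right argmax while accumulating much more regret along the way.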
Hi,
For my research, I wrote this small notebook, showing how to use the (awesome) "ask-and-tell" API for skopt `Optimizer`s. It shows that it is easy to compare a black-box Bayesian optimization algorithm, here based on a `RandomForestRegressor`, on multi-armed bandit problems 🎰. It is compared against classical online algorithms like UCB and Thompson sampling.

I don't know if it can be interesting for `skopt`'s documentation, but I wanted to let you know. ✨ If you want to include it / point to it, don't hesitate to ask me to rewrite, clean, or improve the notebook! The most important part is here 🛠️
Thanks 😃.