Compare the results of limbo with BayesOPT #13
The first thing I tried is the simplest scenario possible: optimizing the Branin function.
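For reference, here is a minimal C++ definition of the Branin function in its standard formulation; the exact domain rescaling used in these experiments is an assumption on my part:

```cpp
#include <cmath>

// Branin function on its usual domain x1 in [-5, 10], x2 in [0, 15].
// Global minimum f* ~= 0.397887, reached at three points, e.g. (pi, 2.275).
// Benchmark suites often rescale inputs from [0, 1]^2 onto this domain.
double branin(double x1, double x2)
{
    const double pi = 3.14159265358979323846;
    const double a = 1.0;
    const double b = 5.1 / (4.0 * pi * pi);
    const double c = 5.0 / pi;
    const double r = 6.0;
    const double s = 10.0;
    const double t = 1.0 / (8.0 * pi);
    const double u = x2 - b * x1 * x1 + c * x1 - r;
    return a * u * u + s * (1.0 - t) * std::cos(x1) + s;
}
```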
To be fair, I had to remove some tricks that BayesOpt performs. For example, when optimizing the acquisition, it first does a global search with DIRECT and then a shorter local search with BOBYQA; after this optimization, it also applies some random perturbations to the current minimum sample and runs a small local search around it; another trick is that if the (squared) difference between two consecutive samples is less than the noise for N iterations (N configured by the user), it takes a completely random sample instead. Maybe there are more tricks that I haven't seen, but I think these are the main ones.
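For illustration, here is a minimal sketch of that two-stage acquisition optimization using NLopt's C++ API (which both libraries can link against). The acquisition function, bounds, and evaluation budgets are hypothetical placeholders, not BayesOpt's actual settings:

```cpp
#include <nlopt.hpp>
#include <vector>

// Toy stand-in for the acquisition function; a real one would query the GP.
double acq(const std::vector<double>& x, std::vector<double>& /*grad*/, void*)
{
    double s = 0.0;
    for (double xi : x)
        s -= (xi - 0.3) * (xi - 0.3); // fake peak at (0.3, ..., 0.3)
    return s;
}

// Global DIRECT search, then a short local BOBYQA polish started from
// the DIRECT optimum (the first of the tricks described above).
std::vector<double> optimize_acquisition(unsigned dim)
{
    std::vector<double> x(dim, 0.5), lb(dim, 0.0), ub(dim, 1.0);
    double best = 0.0;

    nlopt::opt global(nlopt::GN_DIRECT, dim);
    global.set_lower_bounds(lb);
    global.set_upper_bounds(ub);
    global.set_max_objective(acq, nullptr);
    global.set_maxeval(500); // coarse global budget (illustrative)
    global.optimize(x, best);

    nlopt::opt local(nlopt::LN_BOBYQA, dim);
    local.set_lower_bounds(lb);
    local.set_upper_bounds(ub);
    local.set_max_objective(acq, nullptr);
    local.set_maxeval(100); // short local refinement (illustrative)
    local.optimize(x, best); // starts from the DIRECT solution

    return x;
}
```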
Really curious to see what the results will be.
Hi, I am curious about the results too. However, be careful: your benchmark should consider the "vanilla" version of the lib. Typically, a normal user of the libs will not go into the source code and remove these tricks (as you did), so the performances we want to compare are those that such a user is likely to observe. Do you see what I mean? Antoine CULLY
Hi Antoine,
You are completely right. I look forward to seeing the results! Cheers, Antoine CULLY
Thank you for these preliminary results. Do you have an idea about why BayesOpt performs better than Limbo in terms of accuracy? Antoine CULLY
Honestly, I have no idea. I'm using the exact same inner-optimization process for both (the DIRECT algorithm from the NLopt library; I used the nlopt branch of limbo, with some modifications to match BayesOpt).
Hmm... interesting. Apart from the random initialization of the GP, which parts of the algorithm(s) are stochastic? I am asking because, if all the algorithms are deterministic, then we can compare them step by step and see where they behave differently. Do the two algorithms use different inversion procedures? (If our inversion is less "exact", it may also explain why limbo is faster.) Antoine CULLY
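To make the inversion question concrete: the standard numerically stable approach is to never form an explicit inverse, but to Cholesky-factor the Gram matrix once and reuse triangular solves. A generic sketch with Eigen (which limbo uses); the structure and names here are mine, not either library's actual code:

```cpp
#include <Eigen/Dense>
#include <utility>

// GP posterior via a Cholesky factorization of K + noise * I = L * L^T.
struct GPPosterior {
    Eigen::LLT<Eigen::MatrixXd> llt; // Cholesky factor of K + noise * I
    Eigen::VectorXd alpha;           // (K + noise * I)^-1 * y

    GPPosterior(const Eigen::MatrixXd& K, const Eigen::VectorXd& y, double noise)
        : llt(K + noise * Eigen::MatrixXd::Identity(K.rows(), K.cols())),
          alpha(llt.solve(y)) {}

    // Posterior mean and variance at a test point, given the kernel vector
    // k_star = k(X, x_star) and the prior variance k_ss = k(x_star, x_star).
    std::pair<double, double> predict(const Eigen::VectorXd& k_star, double k_ss) const
    {
        const double mu = k_star.dot(alpha);
        const Eigen::VectorXd v = llt.matrixL().solve(k_star); // triangular solve
        return {mu, k_ss - v.squaredNorm()};
    }
};
```

If both libraries follow this scheme, accuracy differences would come from the factorization details (e.g. added jitter on the diagonal) rather than from the solves themselves.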
I think DIRECT is deterministic, and the only source of stochasticity is the initialization. Next week I will try feeding both libraries the same samples. With limbo it's super easy, but for BayesOpt I have to dig deeper into the code to modify it.
In the case where there is no hyper-parameter optimization, the kernel matrix of the next iteration contains the kernel matrix of the current one (it only adds one row and one column). Enjoy your weekend! Antoine CULLY
Yep, we know, and we discussed it briefly, but to do that we would have to change the API of the GP to allow incremental updates of the model, and, as you said, when the hyperparameters are optimized everything gets recomputed, so we think it's not really worth it. What they do is perform incremental updates of the Cholesky decomposition, and every N iterations (configurable by the user) they optimize the hyperparameters and recompute the full decomposition. In this case, since there is no hyperparameter optimization, they always perform the incremental updates, which, as you said, may lead to greater stability. Have a good weekend too!
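For concreteness, this is the textbook rank-append update being described, sketched with Eigen; the variable names and noise handling are my assumptions, not BayesOpt's actual code:

```cpp
#include <Eigen/Dense>
#include <cmath>

// When one sample is added, K grows by one row and one column:
//   K_new = [ K    k   ]     with K = L * L^T already factored.
//           [ k^T  kss ]
// The old factor L is kept; only the new bottom row is computed:
//   l = L^-1 * k   (one triangular solve, O(n^2) instead of O(n^3))
//   d = sqrt(kss - l^T * l)
Eigen::MatrixXd cholesky_append(const Eigen::MatrixXd& L,
                                const Eigen::VectorXd& k, // k(X, x_new)
                                double kss)               // k(x_new, x_new) + noise
{
    const int n = L.rows();
    Eigen::MatrixXd L_new = Eigen::MatrixXd::Zero(n + 1, n + 1);
    L_new.topLeftCorner(n, n) = L;
    const Eigen::VectorXd l = L.triangularView<Eigen::Lower>().solve(k);
    L_new.block(n, 0, 1, n) = l.transpose();
    L_new(n, n) = std::sqrt(kss - l.squaredNorm());
    return L_new;
}
```

After a hyperparameter re-optimization the kernel values change everywhere, so the factor must be rebuilt from scratch, which is exactly why the relearning only happens every N iterations.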
Hello! For some reason not all jobs ran on the cluster, and I ended up with only 17743 samples, but I think they are more than enough. They just confirm what I posted previously. (Accuracy and Speed plots omitted.)
Over the coming days I will continue benchmarking.
Thank you. This is very interesting. What is your stopping criterion?
Forgot to write it: 190 iterations in both cases.
Hi all, do you have more explanations about the accuracy difference? Best, Antoine CULLY
I have been working on other things, and will resume this tomorrow.
I just tried both libraries in the same scenario, but both starting from the same set of points (taken from here: http://mathematica.stackexchange.com/questions/47638/distribution-of-10-points-within-a-unit-square), and they give really different results. Limbo is able to find a suitable solution (with approximately the same accuracy as before), while BayesOpt doesn't get there and gets stuck in a local optimum. Also, from now on there are many things to try and paths to follow, so I propose that we establish a "Roadmap" for the comparisons:
I'm waiting for your suggestions on which scenarios are worth comparing. Should these scenarios always be fair? (For example, by default they don't recompute the hyperparameters every step but every 50, I think; so should we set this parameter to 1, or leave the default?) Please write any ideas that you have!
I agree with this.
I think this is also a good idea.
I think we need to test the scenarios once with fair comparison and once with the defaults of each library.
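The "fair vs. default" question above boils down to this relearning schedule. A minimal sketch of the policy (all function names are hypothetical placeholders; a period of 1 is the "fair" always-relearn setting, larger values mimic the defaults):

```cpp
// Skeleton of the "relearn hyperparameters every N iterations" policy.
void relearn_hyperparameters(); // full refit + full Cholesky, O(n^3)
void incremental_update();      // rank-append update, O(n^2)
void sample_next_point();       // optimize the acquisition, evaluate f

void bo_loop(int n_iterations, int hp_period)
{
    for (int i = 1; i <= n_iterations; ++i) {
        if (hp_period > 0 && i % hp_period == 0)
            relearn_hyperparameters();
        else
            incremental_update();
        sample_next_point();
    }
}
```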
Okay people, I've got some news that explains the difference in accuracy and speed.
Thank you. I am looking forward to seeing the results! Jean-Baptiste Mouret
I agree with Konstantinos on all of this. Jean-Baptiste Mouret
Just so you can check, here are the results with 1000 runs of each. (Accuracy and Speed plots omitted.)
Thank you. How do you measure accuracy? Is it the difference with the real optimum? If so, it seems that both libs fail on Rastrigin (which is not unexpected). Does a count of 1000 mean 1000 iterations? If so, that is a lot...
Exactly, accuracy is the difference with the real optimum. The count is the number of runs, not iterations; each run still uses 190 iterations.
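In code, that accuracy measure (simple regret) is just the gap between the best observed value in a run and the known optimum; a small sketch, with f_star assumed known for each benchmark function:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Accuracy of one run: |best observed value - known optimum f_star|.
// For Branin, f_star ~= 0.397887; the reported statistics would then be
// aggregated over all runs (e.g. 1000 or 5000 of them).
double accuracy(const std::vector<double>& observed, double f_star)
{
    const double best = *std::min_element(observed.begin(), observed.end());
    return std::fabs(best - f_star);
}
```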
I've got the results with 5000 runs. They didn't change much from the previous data.

Accuracy (plots omitted): I think it makes no sense to upload the plots for this, since each function has its own scale and they are not really informative, but if someone wants them, I can put them up. As you can see, we have more or less the same accuracy as them, except for the branin function (where they are 2 orders of magnitude better), hartman6 (1 order better) and goldenprice (2 orders better), and the simple sphere, where we are 1 order of magnitude better.

Speed (plots omitted): Some good news for us: we are always faster! The plots are clear, we really beat them in this aspect, with them being, on average over all the functions, 1.61 times slower than us. I'm pretty sure the difference would be even bigger if I set the recalculation of the whole Cholesky matrix to happen on each iteration.
Maybe after some bug fixes (like random_init) and the new cmaes library, we should re-run these experiments.
We should be able to easily re-run these experiments so that we can check that there is no regression. Jean-Baptiste Mouret
Good idea!
This is solved by #96 ... Closing...