[MRG+1] Implement greedy A-optimal acquisition function for pure exploration #432
Conversation
codecov-io
commented
Jul 12, 2017
Codecov Report
@@ Coverage Diff @@
## master #432 +/- ##
==========================================
+ Coverage 86.43% 86.46% +0.02%
==========================================
Files 22 22
Lines 1563 1581 +18
==========================================
+ Hits 1351 1367 +16
- Misses 212 214 +2
Continue to review full report at Codecov.
Looks interesting! Could you elaborate a bit more on particular use cases for the function, e.g. give a bit more description of practical scenarios? Could you also provide some references to the literature where such a thing is used? That would be good so that people can look at it in more detail. One idea I have in mind is to possibly use this instead of random initialization for the optimizers, so that the initial points generated are distributed "more evenly" across the search space.
The general setting is called active learning, in which you want to learn the target function with as few evaluations as possible. "A-optimality" was established in optimal design: the goal is to specify design points in advance which reduce the average variance of the parameter estimates. See [1] for a good treatment of the different optimality criteria when applied in Bayesian optimization. This reference could also be useful if we want to implement more criteria, like the mutual information. For initialization we could calculate a fixed set of

[1] Krause, Andreas, Ajit Singh, and Carlos Guestrin. "Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies." Journal of Machine Learning Research 9 (2008): 235-284.
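A minimal sketch of the greedy step being discussed, assuming a plain scikit-learn GP (the variable names, toy objective, and candidate grid are illustrative, not the PR's actual API): fit the model to the known evaluations, then greedily pick the candidate with the largest predictive standard deviation, ignoring the predicted mean entirely.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# A few known evaluations of some toy objective on [0, 1].
X_obs = rng.uniform(0.0, 1.0, size=(5, 1))
y_obs = np.sin(6.0 * X_obs).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_obs, y_obs)

# Candidate grid; the greedy A-optimal choice uses only the predictive
# standard deviation, so the mean plays no role at all.
X_cand = np.linspace(0.0, 1.0, 201).reshape(-1, 1)
_, std = gp.predict(X_cand, return_std=True)
x_next = X_cand[np.argmax(std)]
print(x_next)
```

In the full greedy loop one would evaluate the objective at `x_next`, refit, and repeat.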
Naive question: how is this acquisition function different from evaluating the objective using a Sobol (or your favourite quasi-random) sequence? Is it because with a Sobol sequence you explore the space "evenly", while here you pick points that have large uncertainty? Is there a simple example where the two don't lead to "the same" thing? (A heteroscedastic objective?)
betatim
referenced
this pull request
Jul 12, 2017
Open
Picking initial points with latin hypercubes #433
Hmm, I think you can achieve the same by setting
@betatim I will play around with a few GPs to come up with an example where the behavior is different. In any case, the Sobol sequence is not adaptive, i.e. it will not change if the user provides an initial set of points for which the objective value is already known.

@MechCoder Yes, indeed, I was doing exactly this as a workaround before deciding to implement the acquisition function. In my opinion it is cleaner this way, since the effect of the mean is completely removed.
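An illustrative sketch of the adaptivity point above, with assumed names throughout: a Sobol sequence is a fixed space-filling design, so it cannot react to points whose values are already known, whereas a variance-based rule steers away from regions the GP is already confident about. The kernel is pinned (no hyperparameter optimization) purely to keep the toy deterministic.

```python
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# The Sobol design is the same no matter what has been observed.
sobol = qmc.Sobol(d=1, scramble=False)
design = sobol.random(8)  # fixed 8-point design on [0, 1]

# The variance-based pick depends on the observations: this GP has already
# seen points clustered near 0.5, so the most uncertain candidate lies far
# from that cluster.
X_obs = np.array([[0.45], [0.5], [0.55]])
y_obs = np.sin(6.0 * X_obs).ravel()
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), optimizer=None)
gp.fit(X_obs, y_obs)

X_cand = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
_, std = gp.predict(X_cand, return_std=True)
x_next = float(X_cand[np.argmax(std)])
print(design.ravel(), x_next)
```

Here `x_next` ends up near a boundary, away from the observed cluster, while the Sobol design would propose the same eight points regardless of `X_obs`.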
In that case, I would prefer having a special value for
OK, that sounds like a good compromise. I can make the change next week since I'm going on vacation today.
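The workaround discussed in this thread can be sketched with toy numbers (all values here are made up for illustration): with a lower-confidence-bound acquisition, lcb(x) = mu(x) - kappa * sigma(x), a very large kappa drowns out the mean, so minimizing the LCB approximately reduces to maximizing the predictive standard deviation.

```python
import numpy as np

# Toy predicted means and standard deviations at three candidate points.
mu = np.array([0.2, -0.1, 0.3])
sigma = np.array([0.05, 0.4, 0.1])

def lcb(mu, sigma, kappa):
    # Lower confidence bound: smaller is more promising to evaluate next.
    return mu - kappa * sigma

# With a huge kappa, the argmin of the LCB coincides with the argmax of sigma.
kappa = 1e6
assert np.argmin(lcb(mu, sigma, kappa)) == np.argmax(sigma)
print(np.argmin(lcb(mu, sigma, kappa)))  # → 1, the most uncertain candidate
```

This is exactly why a dedicated variance-only acquisition is cleaner: it removes the mean term instead of merely overwhelming it numerically.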
kiudee
closed this
Jul 19, 2017
I made the change by letting the user provide a special string. Somehow GitHub did not like that I rebased the commits and force-pushed. Any ideas on how to fix the pull request without recreating it?
kiudee
reopened this
Jul 19, 2017
Looks good to me. +1 for merge
glouppe
changed the title from
Implement greedy A-optimal acquisition function for pure exploration to [MRG+1] Implement greedy A-optimal acquisition function for pure exploration
Jul 21, 2017
Controls how much of the variance in the predicted values should be
taken into account. If set to be very high, then we are favouring
exploration over exploitation and vice versa.
+ If set to 'Aopt', the acquisition function will only use the variance
kiudee
Jul 24, 2017
Contributor
I agree, since we do not have any other acquisition functions approximating optimal designs, we could call it something like 'var', 'variance', 'var_only' or 'explore_only'. I am open to suggestions.
Controls how much of the variance in the predicted values should be
taken into account. If set to be very high, then we are favouring
exploration over exploitation and vice versa.
+ If set to 'variance', the acquisition function will only use the variance
iaroslav-ai
Jul 26, 2017
Owner
Are you talking about the name of the acquisition function? Some might have weird associations with 'std' as an abbreviation.
After looking at the scikit-optimize documentation, I would propose calling it uncertainty, which is the term used in the introduction to Bayesian optimization. Though it is technically true that we pick the points maximizing the standard deviation (and equivalently the variance), I would say it is more consistent to use uncertainty. Thoughts?
So the confusion on my side is because I would be fine with allowing
No strong opinion from my side. Either way is fine for me.
I would be fine calling it `'inf'` and explaining it in the docstring.
Good to go for me when Travis is happy.
The Travis build was canceled due to
MechCoder
merged commit bb73e24
into
scikit-optimize:master
Jul 28, 2017
Thanks!
kiudee commented Jul 12, 2017
This acquisition function aims at reducing the overall uncertainty of our objective function approximation.
This is useful if you want to accurately gauge the effect of every hyperparameter on the objective function, typically to set proper ranges for the subsequent optimization or to remove a parameter completely.
The `gaussian_a_opt` function uses the standard deviation provided by the base estimator and samples those points first where it is maximal.

Suggestions for improvement are welcome.