
[MRG+1] Implement greedy A-optimal acquisition function for pure exploration #432

Merged
merged 2 commits into scikit-optimize:master Jul 28, 2017

Contributor

kiudee commented Jul 12, 2017

This acquisition function aims at reducing the overall uncertainty of our objective function approximation.
This is useful if you want to accurately gauge the effect of every hyperparameter on the objective function, typically to set proper ranges for the subsequent optimization or to remove a parameter completely.

The gaussian_a_opt function uses the standard deviation provided by the base estimator and samples those points first where it is maximal.

Suggestions for improvement are welcome.
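For readers unfamiliar with the idea, here is a minimal sketch of greedily picking the candidate with the largest posterior standard deviation, using scikit-learn's `GaussianProcessRegressor`. This is not the PR's actual `gaussian_a_opt` implementation, and the helper name `max_std_point` is made up for illustration:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def max_std_point(model, candidates):
    """Return the candidate where the posterior standard deviation is largest."""
    _, std = model.predict(candidates, return_std=True)
    return candidates[np.argmax(std)]

# Toy 1-D objective: fit a GP on three observations, then greedily pick
# the most uncertain point from a grid of candidates.
X = np.array([[0.1], [0.5], [0.9]])
y = np.sin(6 * X).ravel()
gpr = GaussianProcessRegressor(random_state=0).fit(X, y)

candidates = np.linspace(0.0, 1.0, 101).reshape(-1, 1)
next_x = max_std_point(gpr, candidates)
```

Evaluating the objective at `next_x`, refitting, and repeating gives the greedy uncertainty-reduction loop described above.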

codecov-io commented Jul 12, 2017 edited

Codecov Report

Merging #432 into master will increase coverage by 0.02%.
The diff coverage is 75%.


@@            Coverage Diff             @@
##           master     #432      +/-   ##
==========================================
+ Coverage   86.43%   86.46%   +0.02%     
==========================================
  Files          22       22              
  Lines        1563     1581      +18     
==========================================
+ Hits         1351     1367      +16     
- Misses        212      214       +2
Impacted Files Coverage Δ
skopt/acquisition.py 95.95% <75%> (-0.89%) ⬇️
skopt/callbacks.py 95.65% <0%> (-0.51%) ⬇️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update afb0e49...3824ef0.

Owner

iaroslav-ai commented Jul 12, 2017

Looks interesting!

Could you elaborate a bit more on particular use cases for this function, e.g. a bit more description of practical scenarios? Also, could you provide some references to the literature where such a technique is used? That would make it easier for people to look at it in more detail.

One idea I have in mind is to use this instead of random initialization for the optimizers, so that the initial points generated are distributed "more evenly" across the search space.

Contributor

kiudee commented Jul 12, 2017 edited

The general setting is called active learning in which you want to learn the target function with as few evaluations as possible.

"A-optimality" was established in the theory of optimal design. The goal is to specify design points in advance which reduce the average variance of the parameter estimates. See [1] for a good treatment of the different optimality criteria as applied to Bayesian optimization. This reference could also be useful if we want to implement more criteria, such as mutual information.

For initialization we could calculate a fixed set of n_random_starts points to implement an optimal design.
I would advise against using the surrogate model for that purpose.
For quasi-random initialization I would recommend a low-discrepancy sequence of points (see [2] for a recent paper on quasi-Monte Carlo integration). This captures your intuition of exploring the search space "more evenly".
The library Spearmint uses a Sobol sequence for initialization. I would recommend choosing a random start value of the sequence, otherwise it will always start with the exact same points.
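As a hedged illustration of the scrambled-Sobol point (assuming SciPy's `scipy.stats.qmc` module, SciPy >= 1.7, which is not part of this PR): the sampler's `scramble=True` option randomizes the sequence so it does not always start with the exact same points, while keeping low discrepancy:

```python
import numpy as np
from scipy.stats import qmc

# Scrambled Sobol sequence: 2**5 = 32 points in the unit square.
sampler = qmc.Sobol(d=2, scramble=True, seed=42)
sobol_points = sampler.random_base2(m=5)

# For comparison: 32 i.i.d. uniform points.
rng = np.random.default_rng(42)
random_points = rng.random((32, 2))

# Lower discrepancy means the points fill the space more evenly.
print(qmc.discrepancy(sobol_points), qmc.discrepancy(random_points))
```

Changing the seed gives a different randomization of the same underlying sequence, which is exactly the "random start value" behavior recommended above.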

[1] Krause, Andreas, Ajit Singh, and Carlos Guestrin. "Near-optimal sensor placements in Gaussian processes: Theory, efficient algorithms and empirical studies." Journal of Machine Learning Research 9.Feb (2008): 235-284.
[2] Dick, Josef, Frances Y. Kuo, and Ian H. Sloan. "High-dimensional integration: the quasi-Monte Carlo way." Acta Numerica 22 (2013): 133-288.

Owner

betatim commented Jul 12, 2017 edited

Naive question: how is this acquisition function different from evaluating the objective using a Sobol (or your favourite quasi random) sequence? Is it because with a Sobol sequence you explore the space "evenly" and here you pick points that have large uncertainty? Is there a simple example where the two don't lead to "the same" thing? (a heteroscedastic objective?)

Owner

MechCoder commented Jul 12, 2017 edited

Hmm, I think you can achieve the same by setting kappa to a very high value in LCB. Is it not?

Contributor

kiudee commented Jul 13, 2017

@betatim I will play around with a few GPs to come up with an example where the behavior is different. In any case, the Sobol sequence is not adaptive, i.e., it will not change if the user provides an initial set of points for which the objective value is already known.

@MechCoder Yes, indeed I was doing exactly this as a workaround before deciding to implement the acquisition function. In my opinion it is cleaner this way, since the effect of the mean is completely removed.
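To make the large-kappa workaround concrete, here is a tiny numerical check with made-up toy means and standard deviations (not skopt code): as kappa grows, the minimizer of the LCB `mu - kappa * std` converges to the point of maximal std, so the mean's effect vanishes only in the limit:

```python
import numpy as np

# Toy posterior over five candidate points.
mu = np.array([0.2, -0.5, 0.1, 0.3, -0.1])
std = np.array([0.05, 0.10, 0.40, 0.20, 0.15])

def lcb(mu, std, kappa):
    return mu - kappa * std

print(np.argmin(lcb(mu, std, 1.96)))  # 1: the mean still matters
print(np.argmin(lcb(mu, std, 1e6)))   # 2: effectively argmax(std)
```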

Owner

MechCoder commented Jul 15, 2017

In that case, I would prefer having a special value for Kappa that will set exploitation to zero (and will have no controversy in getting merged) instead of having yet another acquisition function.

Contributor

kiudee commented Jul 15, 2017

kiudee closed this Jul 19, 2017

Karlson Pfannschmidt Implement A-optimal selection in LCB acquisition
283e830
Contributor

kiudee commented Jul 19, 2017 edited

I made the change by letting the user provide the special string 'Aopt' as the parameter kappa in LCB.

Somehow GitHub did not like that I rebased the commits and force-pushed. Any ideas on how to fix the pull request without recreating it?
edit: It appears simply reopening it fixed the history, but we need to rerun the tests.

kiudee reopened this Jul 19, 2017

Owner

glouppe commented Jul 21, 2017

Looks good to me. +1 for merge

glouppe changed the title from Implement greedy A-optimal acquisition function for pure exploration to [MRG+1] Implement greedy A-optimal acquisition function for pure exploration Jul 21, 2017

skopt/acquisition.py
Controls how much of the variance in the predicted values should be
taken into account. If set to be very high, then we are favouring
exploration over exploitation and vice versa.
+ If set to 'Aopt', the acquisition function will only use the variance
@MechCoder

MechCoder Jul 22, 2017

Owner

Sorry for being a prick but is Aopt the best name?

@kiudee

kiudee Jul 24, 2017

Contributor

I agree. Since we do not have any other acquisition functions approximating optimal designs, we could call it something like 'var', 'variance', 'var_only' or 'explore_only'. I am open to suggestions.

@glouppe

glouppe Jul 25, 2017

Owner

"variance" is fine with me.

skopt/acquisition.py
Controls how much of the variance in the predicted values should be
taken into account. If set to be very high, then we are favouring
exploration over exploitation and vice versa.
+ If set to 'variance', the acquisition function will only use the variance
@MechCoder

MechCoder Jul 26, 2017

Owner

Sorry again, but should this be 'std'?

@iaroslav-ai

iaroslav-ai Jul 26, 2017

Owner

Do you mean the name of the acquisition function? Some might have weird associations with 'std' as an abbreviation 😅

Contributor

kiudee commented Jul 26, 2017

Owner

MechCoder commented Jul 27, 2017

So the confusion on my side is because kappa denotes the value by which the std is multiplied and not the acquisition function itself.

I would be fine with allowing kappa="inf" and/or kappa=np.inf with a note that says this turns off exploitation. WDYT?
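One way such a special value could be handled (a hypothetical sketch, not skopt's actual `gaussian_lcb`; the function name `lcb_values` is made up here):

```python
import numpy as np

def lcb_values(mu, std, kappa=1.96):
    """LCB with a special case: kappa='inf' or np.inf turns off exploitation,
    so only the (negated) standard deviation is left to minimize."""
    if kappa == "inf" or np.isinf(kappa):
        return -std
    return mu - kappa * std

mu = np.array([0.0, 1.0])
std = np.array([0.3, 0.5])
print(lcb_values(mu, std, kappa="inf"))  # [-0.3 -0.5]
```

Accepting the string "inf" avoids forcing users to import numpy just to pass `np.inf`, while the numeric form keeps the parameter type consistent.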

Owner

glouppe commented Jul 27, 2017

Contributor

kiudee commented Jul 27, 2017

Karlson Pfannschmidt Rename Aopt to inf
Since in LCB the variable kappa is used to describe how much weight is
given to the standard deviation, 'inf' is a more natural name for
the limit of this weight.
3824ef0
Owner

glouppe commented Jul 27, 2017 edited

Good to go for me when Travis is happy.

Contributor

kiudee commented Jul 27, 2017

The Travis build was canceled due to:
"The job exceeded the maximum time limit for jobs, and has been terminated."

@MechCoder MechCoder merged commit bb73e24 into scikit-optimize:master Jul 28, 2017

2 checks passed

ci/circleci: Your tests passed on CircleCI!
continuous-integration/travis-ci/pr: The Travis CI build passed
Owner

MechCoder commented Jul 28, 2017

Thanks!
