
Local-search based technique to optimize the acquisition function of trees and friends #74

Open
MechCoder opened this issue Apr 29, 2016 · 10 comments


@MechCoder
Member

We cannot optimize the acquisition function of tree-based models using conventional gradient-based or second-order methods. SMAC handles this in the way described on page 13 of http://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf

Some terminology (a rough sketch of the neighbourhood sampling follows the list):

  1. If we have p parameters, a one-exchange neighbour of a parameter configuration is another configuration that differs in exactly one parameter.
  2. For a continuous parameter (say X), this neighbour is sampled from a Gaussian centered at the current value of X with std 0.2, keeping all other parameters constant.
  3. For a categorical parameter (say Y), this neighbour is any other value of Y, keeping all other parameters constant.
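
A minimal sketch of that neighbourhood sampling, assuming a toy `space` encoding of `("continuous", (low, high))` / `("categorical", values)` tuples (these names are my own, not from skopt or the SMAC code):

```python
import random

def sample_neighbor(point, space, std=0.2):
    """Return a copy of `point` that differs from it in exactly one parameter."""
    neighbor = list(point)
    i = random.randrange(len(space))           # the single parameter to change
    kind, spec = space[i]
    if kind == "continuous":
        low, high = spec
        # SMAC works on [0, 1]-normalized values and samples from a Gaussian
        # with std 0.2 around the current value (clipping is a simplification here).
        z = (point[i] - low) / (high - low)
        z = min(max(random.gauss(z, std), 0.0), 1.0)
        neighbor[i] = low + z * (high - low)
    else:
        # Categorical: any other value of the same parameter is a neighbour.
        neighbor[i] = random.choice([v for v in spec if v != point[i]])
    return neighbor
```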

It seems they do a multi-start local search from 10 starting points. Each local search works as follows:

  1. Initialize a random point p.
  2. Evaluate the acquisition function at the "4X + Y" one-exchange neighbours of p.
  3. If none of the neighbours has a lower acquisition value than p, terminate;
    otherwise reassign p to the neighbour with the lowest acquisition value and repeat.

Then return the best of the 10 local-search results, i.e. the point with the lowest acquisition value (a rough sketch of the whole loop follows).
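
A rough sketch of that loop, reusing the hypothetical `sample_neighbor` helper from the sketch above. `acquisition` is assumed to be a callable returning the value we want to minimize, and `n_neighbors` stands in for the exact "4X + Y" neighbourhood (here we just draw a fixed number of one-exchange neighbours per step):

```python
import random

def sample_random_point(space):
    """Draw one random configuration from the toy `space` encoding above."""
    return [random.uniform(*spec) if kind == "continuous" else random.choice(spec)
            for kind, spec in space]

def local_search(start, space, acquisition, n_neighbors=20):
    """Greedy one-exchange local search; stops when no neighbour improves."""
    best, best_val = start, acquisition(start)
    while True:
        neighbors = [sample_neighbor(best, space) for _ in range(n_neighbors)]
        candidate = min(neighbors, key=acquisition)
        candidate_val = acquisition(candidate)
        if candidate_val >= best_val:      # no neighbour improves: terminate
            return best, best_val
        best, best_val = candidate, candidate_val

def multi_start_local_search(space, acquisition, n_starts=10):
    """Run `n_starts` local searches and return the best point found."""
    results = [local_search(sample_random_point(space), space, acquisition)
               for _ in range(n_starts)]
    return min(results, key=lambda r: r[1])[0]
```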

@MechCoder
Member Author

ping @glouppe @betatim

@MechCoder
Copy link
Member Author

They also say "we plan to investigate better mechanisms in the future". Might be a good place for us to investigate as well :P

@glouppe
Member

glouppe commented Apr 29, 2016

Thanks for digging this up. This does look very hackish indeed and it would definitely be nice to find a better solution for that.

@MechCoder
Member Author

Should we first implement this as a baseline?

@betatim
Member

betatim commented Jun 15, 2016

Having re-read the SMAC paper, I think we should stick with random sampling for now. The procedure seems quite convoluted: they combine the results of 10 local searches with 10000 random samples, and it is not terribly clear to me that this does better than simple random sampling.

Do you understand the last paragraph in section 4.3?
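
As I read it, the selection in section 4.3 boils down to something like this rough sketch (reusing the hypothetical helpers sketched earlier in this thread, and ignoring the part of that last paragraph I don't follow; the 10 / 10000 counts are the paper's):

```python
def propose_next_point(space, acquisition, n_local=10, n_random=10000):
    # Candidates from the 10 local searches...
    local_optima = [local_search(sample_random_point(space), space, acquisition)[0]
                    for _ in range(n_local)]
    # ...combined with 10000 plain random configurations.
    random_points = [sample_random_point(space) for _ in range(n_random)]
    # Pick whichever candidate minimizes the acquisition function.
    return min(local_optima + random_points, key=acquisition)
```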

@MechCoder
Member Author

MechCoder commented Jun 17, 2016

Hmm.. They do claim, however, in the second-to-last paragraph that the "ten results of local search typically had larger EI than all randomly sampled configurations".

So it might be worth a try to just do the local searches independent of the random sampling.

I have it in my branch here (https://github.com/MechCoder/scikit-optimize/tree/paramils) and will send a PR once we test that the _optimize methods indeed handle categorical parameters.

@betatim
Member

betatim commented Jun 18, 2016

What I couldn't work out was: if the ten local searches "always" outperform the 10000 random samples, why do they keep drawing the random samples at all?

The other thing I find confusing is the statement about interleaving random samples for training purposes and unbiasedness.

Looking forward to a PR, then we can benchmark things and compare how much we gain from using a smarter/more complicated method.

@MechCoder
Member Author

OK, I now have a working version, but it fails for 2 out of the 6 minimizers. I will verify tomorrow and let you know whether the method performs worse or whether it is a bug in the code (I hope it is the latter).

@glouppe
Member

glouppe commented Jul 16, 2016

Great! Could you make a WIP PR? I am curious to see how you do that :)

@MechCoder
Member Author

done :-)
