This repository has been archived by the owner on Feb 28, 2024. It is now read-only.

Gaussian Process for Integer and Categorical Dimensions? #580

Closed
Hvass-Labs opened this issue Dec 17, 2017 · 5 comments

Comments

@Hvass-Labs
Contributor

One of the many things that I really like about skopt is that I can mix real-valued, integer and categorical dimensions in the search-space. This is REALLY powerful!

However, I'm not an expert on Gaussian Processes (GP) and Bayesian Optimization (BO), so can you explain to me (and the rest of the users) how you actually do this internally in skopt? All the lectures I have watched on GP and BO always assume a single real-valued variable. How do you model integers and categorical variables using a GP?

@Hvass-Labs Hvass-Labs changed the title Gaussian Process for Integer and Categorial Dimensions? Gaussian Process for Integer and Categorical Dimensions? Dec 17, 2017
@betatim
Copy link
Member

betatim commented Dec 21, 2017

If the search space is purely categorical, then we use a special kernel. Beyond that, take a look at the transform for categorical dimensions.
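To make the idea concrete, here is a rough numpy sketch of a Hamming-type kernel over categorical points, where similarity depends only on how many coordinates two points disagree on. This is an illustration of the general idea, not skopt's actual kernel code, and the example values are made up:

```python
import numpy as np

def hamming_kernel(x, y, theta=1.0):
    # k(x, y) = exp(-theta * #{i : x_i != y_i})
    # Identical points get k = 1; similarity decays with every
    # categorical coordinate on which the two points disagree.
    x, y = np.asarray(x), np.asarray(y)
    return np.exp(-theta * np.sum(x != y))

print(hamming_kernel(['relu', 'adam'], ['relu', 'adam']))  # 1.0
print(hamming_kernel(['relu', 'adam'], ['relu', 'sgd']))   # exp(-1) ~ 0.368
```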

Personally, I use tree-based optimisers instead of a GP-based optimiser when the space has lots of integers/categories.
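For reference, a minimal sketch of what that looks like with skopt's forest_minimize; the dimensions and the toy objective below are made up for illustration:

```python
from skopt import forest_minimize  # tree-based optimiser
from skopt.space import Integer, Categorical, Real

# A hypothetical mixed search-space.
space = [Integer(4, 13, name='n_layers'),
         Categorical(['linear', 'sigmoid', 'relu'], name='activation'),
         Real(1e-4, 1e-1, prior='log-uniform', name='learning_rate')]

def objective(params):
    # Toy objective standing in for e.g. a validation loss.
    n_layers, activation, lr = params
    return (n_layers - 8) ** 2 + (activation != 'relu') + abs(lr - 0.01)

res = forest_minimize(objective, space, n_calls=20, random_state=1)
print(res.x, res.fun)
```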

@Hvass-Labs
Contributor Author

I would like to understand this so I can explain it in my tutorial.

For Integer dimensions, it appears from the comment (yay!) in Integer.inverse_transform() that the search-space is perhaps always real-valued and this is simply cast to / from integers. Is that correct?

Is it documented somewhere that all dimensions are converted / transformed internally to real-valued? Or is this transformation only done for Gaussian Processes?
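To make my reading concrete, here is a back-of-the-envelope sketch of the cast I have in mind: the dimension is scaled to [0, 1] and rounded on the way back. This is just my guess, not skopt's exact code, and low/high are an example Integer dimension:

```python
import numpy as np

low, high = 4, 13  # an example Integer dimension

def to_real(k):
    # Map an integer in [low, high] to a real in [0, 1].
    return (k - low) / (high - low)

def to_integer(x):
    # Map a real in [0, 1] back by rescaling and rounding.
    return int(np.round(x * (high - low) + low))

print(to_real(8))         # 0.444...
print(to_integer(0.444))  # 8
```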

For Categorical dimensions, I managed to track down something called LabelBinarizer, but I don't really understand how it works with a Gaussian Process. For example, suppose we have a Categorical dimension named activation that can take one of three values: linear, sigmoid, relu. Does this get one-hot encoded, so that linear corresponds to e.g. [1.0, 0.0, 0.0], sigmoid to [0.0, 1.0, 0.0], and relu to [0.0, 0.0, 1.0]? Then a position in the search-space could be e.g. [0.1, 0.6, 0.3], and to get the category we do an argmax and find the sigmoid category? Is that how it works?
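My hypothesis can be checked directly with scikit-learn's LabelBinarizer; note that it sorts the classes alphabetically, so the one-hot positions come out in a different order than in my example above:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
lb.fit(['linear', 'sigmoid', 'relu'])
print(lb.classes_)                # ['linear' 'relu' 'sigmoid'] (sorted)
print(lb.transform(['sigmoid']))  # [[0 0 1]]

# A point proposed in the transformed space need not be a valid
# one-hot vector; inverse_transform picks the class with the
# largest score, i.e. an argmax.
print(lb.inverse_transform(np.array([[0.1, 0.2, 0.7]])))  # ['sigmoid']
```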

But the categorical dimension can have a very non-smooth impact on the objective function. Is the Gaussian Process really able to handle such non-smooth dimensions? Or is that perhaps why you prefer to use tree-based search?

@betatim
Member

betatim commented Dec 29, 2017

How each optimisation method handles the encoding/transforming of dimensions is considered an implementation detail. As a result, you need to read the code to find out how it is done. This has the advantage that we can change how it works without having to deprecate behaviour first.

Your description of what happens for a GP is correct:

```python
In [8]: import numpy as np

In [9]: from skopt.utils import normalize_dimensions

In [10]: space = normalize_dimensions([(4, 13), ('a', 'b', 'c', 'd'), (34., 42.23)])

In [11]: space.rvs(random_state=1)
Out[11]: [[8, 'c', 34.000941304746746]]

In [13]: space.transform(space.rvs(random_state=1))
Out[13]:
array([[  4.44444444e-01,   0.00000000e+00,   0.00000000e+00,
          1.00000000e+00,   0.00000000e+00,   1.14374817e-04]])

In [14]: space.inverse_transform(np.array([[0.444444444444444445, 0.2, 0.2, 0.2,
    ...:  0.4, 1.14374817e-4]]))
Out[14]: [[8, 'd', 34.000941304743911]]
```

> But the categorical dimension can have a very non-smooth impact on the objective function. Is the Gaussian Process really able to handle such non-smooth dimensions? Or is that perhaps why you prefer to use tree-based search?

Correct. To learn more, you can start by reading papers by Frank Hutter, for example https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf (section 4). I think they were the first people to use trees instead of GPs. I think the best way to understand the trade-offs and caveats of GPs vs trees is to watch some of the many videos from the various summer schools on black-box optimisation and GPs (google for Nando de Freitas and the Sheffield summer school).

@Hvass-Labs
Contributor Author

Thank you. It might be useful to add some of this information to the docstrings.

@mjmikulski

@Hvass-Labs if you are still interested in the subject, have a look at this very illustrative paper:
Dealing with Categorical and Integer-valued Variables in Bayesian Optimization with Gaussian Processes

They modify the kernel to use the information about the integer-ness of a variable; see sections 3.1 and 3.2 and figure 1 on page 7.
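A tiny numpy sketch of the idea from section 3.1: round the integer-valued inputs inside the kernel, so the GP treats all real points that round to the same integer as identical. The RBF helper and the example values are just for illustration, not the paper's code:

```python
import numpy as np

def rbf(x, y, length_scale=1.0):
    # Standard squared-exponential (RBF) kernel.
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.exp(-np.sum((x - y) ** 2) / (2 * length_scale ** 2))

def integer_aware_rbf(x, y, length_scale=1.0):
    # Round integer dimensions *before* evaluating the kernel, so the
    # GP posterior is flat between consecutive integer values.
    return rbf(np.round(x), np.round(y), length_scale)

print(rbf([3.1], [3.4]))                # ~0.956: inputs differ slightly
print(integer_aware_rbf([3.1], [3.4]))  # 1.0: both round to 3
```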
