
discrete parameter domain of GP, high-dimensional GP, multi-output BO, output-constrained BO #268

Closed
jmren168 opened this issue Sep 17, 2019 · 7 comments
Labels
enhancement New feature or request

Comments

@jmren168
Contributor

jmren168 commented Sep 17, 2019

🚀 Feature Request

Hi, I'm new to this package, and I'm trying to find out whether botorch fits my situation.

What I would like to do is

  1. create a surrogate model on a sample pool, e.g., a GP over a high-dimensional (dim. > 40), discrete parameter domain,
  2. create an acquisition function, e.g., for multi-output BO,
  3. optimize the acquisition function, e.g., with output constraints,
  4. get the next recommended sample, say X, and do a REAL experiment to get REAL_Y,
  5. augment the sample pool with (X, REAL_Y), and
  6. repeat steps 1-5 until some criterion is satisfied
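To make the loop concrete, here is a minimal sketch of steps 1-6 in plain NumPy. Everything here is a stand-in, not botorch API: the RBF-kernel GP, the UCB acquisition, the candidate pool, and the `real_experiment` objective are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def real_experiment(x):
    # stand-in for the REAL experiment in step 4 (hypothetical objective)
    return -np.sum((x - 0.5) ** 2)

def fit_gp(X, y, lengthscale=0.3, noise=1e-4):
    # step 1: minimal RBF-kernel GP "fit" (precompute Cholesky factor and weights)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / lengthscale**2) + noise * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return X, L, alpha, lengthscale

def ucb(model, Xq, beta=2.0):
    # steps 2-3: upper-confidence-bound acquisition, scored on candidates
    X, L, alpha, ls = model
    d2 = ((Xq[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    Ks = np.exp(-0.5 * d2 / ls**2)
    mean = Ks @ alpha
    v = np.linalg.solve(L, Ks.T)
    sd = np.sqrt(np.clip(1.0 - (v**2).sum(0), 1e-12, None))
    return mean + beta * sd

X = rng.random((5, 3))                          # initial sample pool
y = np.array([real_experiment(x) for x in X])
for _ in range(10):
    model = fit_gp(X, y)                        # step 1: surrogate on the pool
    pool = rng.random((256, 3))                 # finite candidate set
    x_next = pool[np.argmax(ucb(model, pool))]  # steps 2-4: recommendation
    y_next = real_experiment(x_next)            # step 4: REAL experiment
    X = np.vstack([X, x_next])                  # step 5: augment the pool
    y = np.append(y, y_next)                    # step 6: repeat
```

The stopping criterion in step 6 is a fixed iteration budget here; in practice it could be convergence of the best observed value.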

What I found is that botorch does not support a discrete parameter domain for the GP, and the recommendation is to use Ax instead (#177), but I'm not sure whether Ax covers my situation.

Any comments are highly appreciated.

@jmren168 jmren168 added the enhancement New feature or request label Sep 17, 2019
@Balandat
Contributor

Well, this reads like a standard BayesOpt loop, so on a high level, yes, that is what botorch does. A couple of questions:

  1. How many of your parameters are discrete? Are they categorical or are they ordered? How many values can they take on?
  2. What do you mean by multi-output BO here? A multi-output model on which you want to evaluate some scalar objective? Or is your goal to do something like Pareto-front optimization without scalarizing/weighting the outputs?

@jmren168
Contributor Author

jmren168 commented Sep 17, 2019

Hi Balandat,

Thanks for the quick reply.

  1. 30 parameters are continuous, and at least 10 are discrete and ordered, with 10 to 20 possible values each.
  2. I have at least three outputs (continuous variables), and would like to optimize them at the same time. So it should be a Pareto-front optimization problem.

Via BO with a GP, I would like to find the Pareto set and run REAL experiments at the Pareto-optimal points to get REAL_Y. Hope this makes my problem clearer, and many thanks again.

@Balandat
Contributor

  1. The fact that the discrete parameters are ordered is helpful, it should be possible to use a continuous relaxation for these for the modeling and optimization. The fact that you have >40 parameters is somewhat daunting though unless you either
    a. have a ton of data and a scalable model
    b. have reason to believe that the outcome is well-modeled on a lower-dimensional subspace/manifold of the parameters.
    Re a., how many data points do you expect to be able to gather?

  2. We're working on some algorithms for Pareto-optimization, but right now we don't have anything packaged up yet. The simplest approach would be to do some random scalarization similar to e.g. https://arxiv.org/abs/1805.12168 - this would be pretty straightforward to implement. Note though that your problem is relatively high-dimensional, and so I'd first see whether you can get a good handle on the modeling in 1. before going down that road.
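For what it's worth, both ideas above — a continuous relaxation of the ordered discrete parameters, and ParEGO-style random scalarization in the spirit of the linked paper — can be sketched in a few lines of plain NumPy. All names here are hypothetical stand-ins, not botorch API, and the model predictions are faked with random numbers:

```python
import numpy as np

rng = np.random.default_rng(0)

def round_discrete(X, discrete_idx, n_levels=10):
    # continuous relaxation: optimize in [0, 1], then snap the ordered
    # discrete dimensions back to their nearest grid level
    X = X.copy()
    X[:, discrete_idx] = np.round(X[:, discrete_idx] * (n_levels - 1)) / (n_levels - 1)
    return X

def chebyshev_scalarize(Y, weights, rho=0.05):
    # augmented Chebyshev scalarization (ParEGO-style): collapse m outputs
    # into one scalar so a single-output acquisition function can be used
    wY = weights * Y
    return wY.min(axis=-1) + rho * wY.sum(axis=-1)

d, m = 40, 3
discrete_idx = list(range(10))          # say the first 10 params are ordered/discrete
candidates = round_discrete(rng.random((256, d)), discrete_idx)

w = rng.dirichlet(np.ones(m))           # fresh random weights each BO iteration
Y_pred = rng.random((256, m))           # stand-in for model predictions
x_next = candidates[np.argmax(chebyshev_scalarize(Y_pred, w))]
```

Drawing a new weight vector each iteration is what steers successive candidates toward different parts of the Pareto front.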

@jmren168
Contributor Author

Hi Balandat,

Thanks for the guides.

  1. Although we face a high-dimensional optimization problem, the good news is that most of the parameters are fixed, and only 1 to 3 parameters change between REAL sequential experiments. The number of data points is at most 30. This may not be enough to construct BO models, but based on domain expertise we believe that a few SUB-PARAMETERS dominate the outcomes. For this case, do you recommend any subspace optimization techniques? Either comments or papers are highly welcome. Thanks again.

  2. We have tried applying the "Dragonfly" package to our case, but it requires a function caller rather than REAL experiments. It may be best to implement the random scalarization approach ourselves.

@Balandat
Contributor

What's the motivation behind modeling around ~40 parameters if you're optimizing over only 1-3? Is this a contextual optimization problem where you're trying to find optimal settings conditional on other parameters? Standard GP models are likely to fail if you try to model 30-40 dimensions with only 30 data points.
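One common pattern for this kind of conditional optimization is fixed-feature optimization: hold the context parameters at their current values and search the acquisition function over the few free dimensions only. A rough sketch, with a hypothetical stand-in acquisition function and random candidate search rather than botorch API:

```python
import numpy as np

rng = np.random.default_rng(0)

def acquisition(X):
    # stand-in for any acquisition function over the full 40-d input
    return -np.sum((X - 0.5) ** 2, axis=-1)

def optimize_fixed_features(acq, x_current, free_idx, n_candidates=128):
    # hold every parameter at x_current except those in free_idx,
    # and search only over the free ones
    X = np.tile(x_current, (n_candidates, 1))
    X[:, free_idx] = rng.random((n_candidates, len(free_idx)))
    return X[np.argmax(acq(X))]

x_now = rng.random(40)                    # current experimental settings
x_next = optimize_fixed_features(acquisition, x_now, free_idx=[2, 39])
```

(botorch's `FixedFeatureAcquisitionFunction` wraps an acquisition function in essentially this way, exposing only the free columns to the optimizer.)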

@jmren168
Contributor Author

Here's the situation, say, x1,...,x40, where x1,...,x10 are discrete and the others are continuous. y1,...,y3 are the 3 outcomes of REAL sequential experiments.

Since this is a SEQUENTIAL experiment, the setting of the 2nd experiment depends on the result of the 1st. E.g., x3 of the 2nd data point is modified while the others stay the same as in the 1st data point. Then, for the 3rd data point, maybe only x2 and x40 are modified compared to the 2nd. That's what we mean by most parameters being FIXED. Hope this makes our case clearer.

@saitcakmak
Contributor

Closing this since many of the features are now supported.
