
Optimizing over discrete parameter domains #177

Closed
avimit opened this issue Jun 19, 2019 · 19 comments
Labels
enhancement New feature or request

Comments

@avimit

avimit commented Jun 19, 2019

Can BoTorch be used over discrete parameter domains?
(if so, then this is a feature inquiry, not a feature "request")

We have a use case of domains which are partly continuous, partly discrete, like:
[{"name": "param1", "type": "continuous", "domain": [-5, 10]}, {"name": "param2", "type": "continuous", "domain": [1, 15]}, {"name": "param3", "type": "discrete", "domain": [1, 1.5, 2, 2.5, 3, 3.5, 4]}]

The functions under "botorch/optim/optimize.py" accept an argument called "bounds", which you define as: "bounds: A 2 x d tensor of lower and upper bounds for each column of X".

These are obviously bounds for a continuous search space. Can BoTorch be used for searching over discrete spaces?

Thank you so much for the package!
Avi

@avimit avimit added the enhancement New feature or request label Jun 19, 2019
@avimit avimit changed the title from [Feature Request] Optimizing over discrete parameter domains to Optimizing over discrete parameter domains Jun 19, 2019
@Balandat
Contributor

Hi @avimit, BoTorch itself at this point doesn't directly deal with discrete parameters, but if your goal is simply to run BayesOpt (rather than developing new BayesOpt primitives), you can easily do this using the higher-level Ax interface. Ax will perform the appropriate transformations (e.g. rounding for ordinal parameters, or one-hot encoding for categorical ones) when talking to BoTorch under the hood.

We are also working on improving support for discrete parameters directly in BoTorch (e.g. by using a concrete relaxation to still enable gradient-based optimization).

Finally, if you're willing to write your own Kernels or even models that comply with our basic API, you can do anything you like directly with discrete parameters and still use BoTorch's acquisition functions. Note, however, that optimizing over mixed domains will then also require using a different optimization algorithm, i.e., you can't use the functions in optim/optimize.py directly.

Hope this helps!

@avimit
Author

avimit commented Jun 19, 2019

Hi @Balandat,

Thank you for the answer!

I will have a look at Ax, but I feel that we need the more "low-level" character of BoTorch.

Explanation: we have our own small platform for distributing & managing jobs (mostly DL training) on the cloud, and on top of it we sometimes run HPO. Our HPO framework treats the training of models as a "black box" function, launching training jobs as resources are freed and collecting evaluations when ready. Our use case for BoTorch is as a supplier of next points for evaluation; I managed to integrate it successfully (and enjoyed the process!) as long as I don't use discrete parameters.

Thanks,
Avi

@Balandat
Contributor

Glad you were able to use BoTorch directly for this.

That said, Ax is designed for exactly your use case, and provides a number of different APIs (including lower-level ask/tell style ones) that should make that integration quite straightforward. You can also stick your own BoTorch components into these APIs if you want to retain the flexibility of customizing your own models/acquisition functions. See the tutorial on this.

As I mentioned, we are working on improving support for discrete parameters directly in BoTorch. For now, you could perform the parameter transformations that Ax does in this module yourself, directly in torch.
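For instance, a minimal sketch of such a rounding transform written directly in torch (assuming the ordinal param3 from the original post sits in column 2; round_to_choices is a hypothetical helper, not an Ax or BoTorch function):

import torch

# allowed values for the ordinal parameter (param3 from the example above)
choices = torch.tensor([1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])

def round_to_choices(X, dim=2):
    # snap column `dim` to the nearest allowed discrete value
    X = X.clone()
    idx = (X[..., dim : dim + 1] - choices).abs().argmin(dim=-1)
    X[..., dim] = choices[idx]
    return X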

@avimit
Author

avimit commented Jun 19, 2019

I will surely give it a try!
Thanks a lot,
Avi

@eytan
Contributor

eytan commented Jun 19, 2019

@avimit, based on your description, I think that Ax is the appropriate tool to plug into. For the ordered categorical parameter (param3), you'll want to use a ChoiceParameter (as described here) with the keyword argument is_ordered=True; Ax will take care of all the necessary encoding, normalization of the search space and outcomes, etc.
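For illustration, a sketch of that search space using Ax's developer-API classes (class names per the Ax docs; exact signatures may vary across versions):

from ax import ChoiceParameter, ParameterType, RangeParameter, SearchSpace

search_space = SearchSpace(parameters=[
    RangeParameter(name="param1", parameter_type=ParameterType.FLOAT, lower=-5.0, upper=10.0),
    RangeParameter(name="param2", parameter_type=ParameterType.FLOAT, lower=1.0, upper=15.0),
    # ordered choice: Ax handles the encoding under the hood
    ChoiceParameter(name="param3", parameter_type=ParameterType.FLOAT,
                    values=[1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0], is_ordered=True),
])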

Your setup sounds kind of like what one might do with Slurm or Ray Tune. We have forthcoming interfaces between Ax and these platforms, and they were quite straightforward to implement.

@eytan eytan closed this as completed Jun 19, 2019
@eytan
Contributor

eytan commented Jun 20, 2019

You might want to look at the source code for the Ray Tune interface as an example.

@avimit
Author

avimit commented Jun 20, 2019

@eytan, ChoiceParameter does look like what I need, and the is_ordered switch covers all options; the example in Ray Tune is indeed instructive. Thanks.

Taking a parameter transformation from Ax transforms is also a viable option, @Balandat; thank you too.

Now it's time for me to do the work ;}

@avimit
Author

avimit commented Jul 4, 2019

Can you please direct me to a function in Ax which returns next candidates for evaluation, similar to joint_optimize() in BoTorch?

(I have been doing something like this:

# fit a GP to the observed evaluations
model = botorch.models.SingleTaskGP(X, Y)
mll = gpytorch.mlls.ExactMarginalLogLikelihood(model.likelihood, model)
botorch.fit.fit_gpytorch_model(mll)

# optimize qNEI to get the next batch of candidates
acquisition_function = botorch.acquisition.qNoisyExpectedImprovement(model, X_baseline)
X_candidates_tensor = botorch.optim.joint_optimize(
    acquisition_function, bounds=bounds, q=batch_size, num_restarts=1, raw_samples=len(X)
)

)

Thanks!

@Balandat
Contributor

Balandat commented Jul 4, 2019

If you use Ax's service API (see API examples here), get_next_trial does what you want.

You can also use the developer API (next tab on the above page); you can access the generated candidates ("arms") via the arms property of the GeneratorRun returned by the gen function.
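Schematically, that developer-API flow might look like this (a sketch: experiment, data, and batch_size are assumed to exist from your Ax setup, and Models.BOTORCH refers to the Ax modelbridge registry; names may differ across versions):

from ax.modelbridge.registry import Models

# `experiment` and `data` come from your Ax experiment setup
model = Models.BOTORCH(experiment=experiment, data=data)
generator_run = model.gen(n=batch_size)
candidates = [arm.parameters for arm in generator_run.arms]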

@avimit
Author

avimit commented Jul 4, 2019

Thanks! I was there, but never noticed the tabs...

Aren't you supposed to be resting on 4th of July? ;)

@avimit
Author

avimit commented Jul 6, 2019

Hi,

I must admit that I find integrating Ax for my task much harder than BoTorch.

The five schematic BoTorch lines I quote above do what I want: supply X new candidates for evaluation, given the known+pending evaluations.

Haven't yet found a way to do it with Ax...
I've been trying to use the BotorchModel (for continuous params) via the developer API.

Do I have to provide an evaluation function when defining an "experiment"? Our "evaluation" is training & testing of an ML model done on a cloud server. I just want to feed the results to the model.

Also, I couldn't find how to load the known+pending evaluations into the model (and are the objective_weights that the gen() function of BotorchModel requires factors for low/high-fidelity evals?).

Thanks,
Avi

@Balandat
Contributor

Balandat commented Jul 6, 2019

Our "evaluation" is a training & testing of an ML model done on a cloud sever

If you want to manage that manually, you could just use the service API: calling parameters, trial_index = ax.get_next_trial() will give you a new set of parameters to evaluate. You can then communicate with your cloud server to run the ML evaluation and get results back however you want. Calling ax.complete_trial(trial_index=trial_index, raw_data=results) will incorporate that data into the experiment state, and calling ax.get_next_trial() again gives you the next set of parameters.
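Schematically (a sketch, not a drop-in script: run_training_job is a hypothetical stand-in for launching your cloud job, and the create_experiment arguments follow the service-API examples linked above; details may vary by Ax version):

from ax.service.ax_client import AxClient

ax = AxClient()
ax.create_experiment(
    name="hpo",
    parameters=[
        {"name": "lr", "type": "range", "bounds": [1e-5, 1e-1], "log_scale": True},
        {"name": "width", "type": "choice", "values": [64, 128, 256], "is_ordered": True},
    ],
    objective_name="val_accuracy",
)

for _ in range(20):
    parameters, trial_index = ax.get_next_trial()
    # run_training_job is hypothetical: launch the cloud job and collect the metric
    result = run_training_job(parameters)
    ax.complete_trial(trial_index=trial_index, raw_data=result)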

The benefit of using Ax for this is that it keeps track of all the metadata, so you can, for instance, store a complete history of your experiment in a SQL database (if you hook Ax up to one with the appropriate tables).

The developer API gives you even more flexibility. For instance you can write a Runner that implements the logic for talking to the ML training job on the cloud server. For more information on this see https://ax.dev/docs/runner.html
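A minimal sketch of such a Runner (submit_job is a hypothetical function for your platform; see the Runner docs above for the full interface):

from ax import Runner

class CloudRunner(Runner):
    def run(self, trial):
        # submit_job is hypothetical: launch one training job per arm
        job_ids = [submit_job(arm.parameters) for arm in trial.arms]
        # the returned metadata is stored on the trial by Ax
        return {"job_ids": job_ids}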

I hope this helps. If you have specific questions about Ax, we should probably move the discussion to an issue in the Ax repo. There you'll also have folks like @lena-kashtelyan, @amurthy1, @sdsingh who understand the Ax APIs much better than I do.

@taikichi-pc

from botorch.models import MixedSingleTaskGP
from botorch.fit import fit_gpytorch_model
from botorch.acquisition import qExpectedImprovement
from botorch.optim import optimize_acqf_mixed
from gpytorch.mlls import ExactMarginalLogLikelihood

def get_next_point(init_x, init_y, best_init_y, bounds, candidates_num=1):
    # GP that treats dimensions 2 & 3 as categorical
    gp = MixedSingleTaskGP(init_x, init_y, cat_dims=[2, 3])
    mll = ExactMarginalLogLikelihood(likelihood=gp.likelihood, model=gp)
    fit_gpytorch_model(mll)
    acq_func = qExpectedImprovement(model=gp, best_f=best_init_y)
    candidates, _ = optimize_acqf_mixed(
        acq_function=acq_func,
        bounds=bounds,
        fixed_features_list=[{3: 1}, {3: 2}, {2: 1}, {2: 2}],
        q=candidates_num,
        num_restarts=200,
        raw_samples=512,
        options={"batch_limit": 5, "maxiter": 200},
    )
    return candidates

I wrote fixed_features_list = [{3: 1}, {3: 2}, {2: 1}, {2: 2}] in order to fix the discrete features (indices 2 & 3; 0 & 1 are continuous), but only index 3 of the returned candidate is fixed; index 2 comes back as a float, as if it were continuous.

When I want a candidate where two or more inputs are discrete (in this case x2 and x3), how should I use it?

@Balandat
Contributor

So doing fixed_features_list = [{3: 1}, {3: 2}, {2: 1}, {2: 2}] will loop over the four dicts in the list, perform an optimization with the features specified in each dict held fixed, and then pick the best value across all four dicts.

If you want to optimize the continuous features and fix the discrete ones, you should be doing something like fixed_features_list = [{2: 1, 3: 1}], which will run one optimization with features 2 and 3 both fixed to 1. If you want multiple combinations of fixed features, you can add additional dicts to the list, as in the sketch below.
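For instance, to cover every combination of the two discrete features from your example (values 1 and 2 for each), one could build the list programmatically; a sketch:

from itertools import product

# one dict per combination of the discrete features at indices 2 and 3
fixed_features_list = [
    {2: v2, 3: v3} for v2, v3 in product([1, 2], [1, 2])
]
# -> [{2: 1, 3: 1}, {2: 1, 3: 2}, {2: 2, 3: 1}, {2: 2, 3: 2}]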

@jackliu333

Is this feature available now in BoTorch? I found the optimize_acqf() function only accepts continuous bounds in the bounds parameter.

@Balandat
Contributor

Yes it is, check out optimize_acqf_mixed.

We also have an incoming PR (#1533) that implements the method from [1], but it'll take a bit to get that fully cleaned up and merged.

[1] S. Daulton et al. Bayesian Optimization over Discrete and Mixed Spaces via Probabilistic Reparameterization. NeurIPS 2022

@jackliu333

So if my understanding is correct, in optimize_acqf_mixed the bounds parameter handles continuous features and fixed_features_list handles discrete ones. So if we supply both arguments, such as the learning rate for bounds and the number of nodes in a layer for fixed_features_list, the function will condition on each unique discrete value in fixed_features_list and search for the optimal continuous values? In other words, it's the same as if we had an outer for loop over the discrete values and returned the best one?

@Balandat
Contributor


That is correct, it's a convenience function for essentially doing that. While this is the gold standard for optimizing, it obviously doesn't scale with the number of discrete features, which is why we worked on [1]; that support is incoming.

If you have discrete parameters with many ordered values (e.g. integers over a large range), it's usually fine to do a continuous relaxation for those (optimize over the continuous range and then round), as in the sketch below.
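A sketch of that relaxation (assuming an acquisition function acq_func and bounds as before, with a single integer-valued column at index 2):

from botorch.optim import optimize_acqf

# optimize the integer dimension as if it were continuous...
candidates, acq_value = optimize_acqf(
    acq_function=acq_func, bounds=bounds, q=1, num_restarts=20, raw_samples=256,
)
# ...then round that column back onto the integer grid
candidates[..., 2] = candidates[..., 2].round()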

@jackliu333

Sounds pretty cool! Looking forward to it!
