
Refactor Input Preprocessing and Mixed Optimization #626

Merged
jduerholt merged 42 commits into main from refactor/mixed_optimization
Oct 24, 2025

Conversation

@jduerholt
Contributor

Motivation

This PR contains a major refactoring of BoFire. Here is a short summary:

  • botorch introduced the optimize_acqf_mixed_alternating functionality, which, thanks to @TobyBoyne, can now also deal with categoricals. I added a few other features there so that it is now ready for use within BoFire. optimize_acqf_mixed_alternating performs a block gradient descent optimization of the acqf and is an alternative to the GA on large mixed domains.
  • optimize_acqf_mixed_alternating always expects the categoricals in an ordinal encoding. So far, BoFire transformed the categoricals upfront to a vector encoding (like one-hot) before they entered the actual surrogate. This now changes: categoricals will always be ordinally encoded, and different vector encodings can be realized via a botorch input transform, namely NumericToCategoricalEncoding (https://github.com/pytorch/botorch/blob/b1097c6c475f29f694532c3282393ce8a67a9d6c/botorch/models/transforms/input.py#L1628C7-L1628C35). For this reason, the meaning of the input_preprocessing_specs changes: it is no longer a transformation applied before the data enters the actual botorch model, but a transformation applied within the model.
  • This change makes setting up the acqf optimization much cleaner, as we no longer have to account for the different encodings at this level.
  • I am currently rewriting the AcquisitionOptimizer. @LukasHebing: why is the domain not an attribute of the acquisition optimizer? Wouldn't that be cleaner? I do not remember why we decided to do it the way we did :D
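To make the second bullet point concrete, here is a minimal, self-contained sketch (plain Python, not BoFire's actual implementation): the categoricals stay ordinally encoded in the data, and a vector encoding such as one-hot is only produced by a transform applied inside the model, in the spirit of botorch's NumericToCategoricalEncoding. The function name and shapes are illustrative assumptions.

```python
def one_hot_input_transform(row, cat_dim, n_categories):
    """Replace the ordinal entry at index `cat_dim` of a feature row
    by its one-hot encoding, appended after the remaining features."""
    cat = int(row[cat_dim])
    one_hot = [1.0 if i == cat else 0.0 for i in range(n_categories)]
    return row[:cat_dim] + row[cat_dim + 1:] + one_hot

# Two points with one continuous feature and one ordinally encoded
# categorical with three levels (0, 1, 2).
X = [[0.5, 2.0], [0.1, 0.0]]
Xt = [one_hot_input_transform(row, cat_dim=1, n_categories=3) for row in X]
# Xt == [[0.5, 0.0, 0.0, 1.0], [0.1, 1.0, 0.0, 0.0]]
```

The key point is that the acqf optimizer only ever sees the ordinal representation; the expansion to a vector encoding happens model-side.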

Have you read the Contributing Guidelines on pull requests?

Yes

Test Plan

Unit tests.

@jduerholt jduerholt marked this pull request as ready for review September 17, 2025 12:51
@jduerholt
Contributor Author

Hey guys, this PR escalated a bit, but it is ready for review now. The failing tests are due to the NumericToCategoricalEncoding input transform not yet being part of the latest botorch release; for this reason, it only runs through in testing_against_latest_botorch. I pinged the botorch guys about cutting a new release.

So, happy reviewing, in case of questions reach out to me!

@jduerholt
Contributor Author

@bertiqwerty @TobyBoyne @LukasHebing

Anybody volunteering to review, or should I provide more info etc?

Best,

Johannes

@TobyBoyne
Collaborator

I will have a look over the weekend :)

@LukasHebing
Contributor

@bertiqwerty @TobyBoyne @LukasHebing

Anybody volunteering to review, or should I provide more info etc?

Best,

Johannes

I think I understood the motivation and can see where this is going. I can take a deeper look next week, because these are a lot of changes...

@LukasHebing
Contributor

@LukasHebing: the problem is that this input transform is not yet in the latest botorch release (it is in main and will be part of the next release), so one would need to install botorch from source for this. I can just copy the source code in here, should I? And as soon as it is in botorch release, we remove it again from bofire, maybe this is the smartest way of dealing with it ...

Ah, now I understand the problem. Is it possible to use the GitHub branch as a source instead of the PyPI release? I know that is possible with poetry or uv.

Otherwise, we would need to wait for the release anyway before we merge this PR, right?

I'll try to add botorch manually and check that.

@jduerholt
Contributor Author

@LukasHebing, I copied the class NumericToCategoricalEncoding from botorch main into bofire until they have it in the next release; then we can remove it again (I did not copy the tests over). But this gives us the option to move forward, as I want to implement other things based on this PR ;)

@jduerholt
Contributor Author

Main tests are now passing, so you can clone it locally; I will have a look at the rest.

@LukasHebing
Contributor

@LukasHebing, I copied the class NumericToCategoricalEncoding from botorch main into bofire until they have it in the next release; then we can remove it again (I did not copy the tests over). But this gives us the option to move forward, as I want to implement other things based on this PR ;)

That is a great solution. Thanks!

@jduerholt
Contributor Author

The test against latest botorch is failing due to a botorch bug in current main (meta-pytorch/botorch#3033). I will try to fix this, but it is just an edge case and should not affect the review process. Best, Johannes

Contributor

@LukasHebing left a comment

Sorry for the delay: I could now follow all the categorical variable handling that is done under the hood.

Contributor

Doesn't botorch also have a great library of priors, which we use later anyway?
Would it maybe be possible to just store a reference to the botorch priors/constraints with parameters, instead of building this elaborate structure? That would be more flexible; otherwise you need to add the infrastructure for all the new priors here as well.

Contributor Author

Hmm, we are using the botorch priors; what I added was the option to use more of them within our data models as well.

```diff
 if (
-    v.get(key, CategoricalEncodingEnum.ONE_HOT)
+    v.get(key, CategoricalEncodingEnum.ORDINAL)
     != CategoricalEncodingEnum.ONE_HOT
```
Contributor

Sorry, I also struggle to understand the scope of input_preprocessing_specs:

  • When ORDINAL is the only possible option, why do we have other options?
  • Where are the user-specified encodings (descriptor, etc.)?

```python
"""Default input preprocessing specs for the GA optimizer: If none given, will use OneHot encoding for all categorical inputs"""
input_preprocessing_specs = {}
for input_ in domain.inputs.get():
    if isinstance(input_, CategoricalDescriptorInput):
```
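To make the quoted fragment concrete, here is a hedged, self-contained sketch of how such a default-spec builder could look. The stub classes and the DESCRIPTOR/ONE_HOT defaults are assumptions for illustration, not BoFire's exact logic:

```python
from enum import Enum

# Stand-ins for the BoFire types referenced above (the real ones live in
# bofire.data_models); only here to make the sketch self-contained.
class CategoricalEncodingEnum(Enum):
    ONE_HOT = "ONE_HOT"
    ORDINAL = "ORDINAL"
    DESCRIPTOR = "DESCRIPTOR"

class ContinuousInput: ...
class CategoricalInput: ...
class CategoricalDescriptorInput(CategoricalInput): ...

def default_input_preprocessing_specs(inputs):
    """Assumed default: DESCRIPTOR encoding for descriptor inputs,
    ONE_HOT for plain categoricals, no spec for continuous inputs."""
    specs = {}
    for key, input_ in inputs.items():
        if isinstance(input_, CategoricalDescriptorInput):
            specs[key] = CategoricalEncodingEnum.DESCRIPTOR
        elif isinstance(input_, CategoricalInput):
            specs[key] = CategoricalEncodingEnum.ONE_HOT
    return specs

specs = default_input_preprocessing_specs({
    "temperature": ContinuousInput(),
    "catalyst": CategoricalDescriptorInput(),
    "solvent": CategoricalInput(),
})
# specs == {"catalyst": DESCRIPTOR, "solvent": ONE_HOT}
```

The `isinstance` ordering matters: descriptor inputs must be checked before the plain categorical base class, mirroring the check in the quoted snippet.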
Contributor

I guess the CategoricalDescriptorInput will be handled in the botorch transformation?

Contributor Author

Yes!
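As a hedged illustration of what "handled in the botorch transformation" could mean for a descriptor input: inside the model, the ordinal category index is mapped to its descriptor vector. The descriptor values below are made up; in BoFire they would come from the CategoricalDescriptorInput definition.

```python
# Hypothetical descriptors for a 3-level categorical input (index -> vector).
DESCRIPTORS = {
    0: [1.0, 10.0],
    1: [2.0, 20.0],
    2: [3.0, 30.0],
}

def descriptor_input_transform(row, cat_dim):
    """Replace the ordinal entry at index `cat_dim` of a feature row
    by its descriptor vector, appended after the remaining features."""
    cat = int(row[cat_dim])
    return row[:cat_dim] + row[cat_dim + 1:] + DESCRIPTORS[cat]

# One continuous feature plus a categorical in ordinal encoding (level 1).
print(descriptor_input_transform([0.5, 1.0], cat_dim=1))
# → [0.5, 2.0, 20.0]
```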

Collaborator

@TobyBoyne left a comment

Finally had the chance to look through everything again, thank you for waiting :)

Everything looks good; the only remaining unresolved comments are pretty minor. Approved!

@jduerholt
Contributor Author

I found a critical bug and fixed it, but will check over the next days whether anything else is affected ;)

Contributor

@bertiqwerty left a comment

Sorry, I am a little late to the party. Thanks Johannes. Thanks to the other reviewers. Looks good from my side. I just left some minor comments.

@jduerholt
Contributor Author

Ok, everything is now settled. Tests are failing due to a new pydantic version that was released yesterday; here is the fix: #638

But before merging this, we have to wait for a new botorch version, as there is a bug in the current release that will lead to trouble. It is fixed in botorch now, but they have to cut a release; I pinged Max about this. Let us see what they say.

@jduerholt jduerholt merged commit 5126e09 into main Oct 24, 2025
10 of 12 checks passed