Refactor Input Preprocessing and Mixed Optimization #626
Conversation
Hey guys, this PR escalated a bit, but it is ready for review now. The failing tests are due to … So, happy reviewing; in case of questions, reach out to me!
@bertiqwerty @TobyBoyne @LukasHebing Anybody volunteering to review, or should I provide more info? Best, Johannes
I will have a look over the weekend :)
I think I understood the motivation and can see where this is going. I can take a deeper look next week, because these are a lot of changes...
Ah, now I understand the problem. Is it possible to use the GitHub branch as a source instead of the PyPI release? I know that is possible with poetry or uv. Otherwise we need to wait for the release anyway before we merge this PR, right? I'll try to add botorch manually and check that.
@LukasHebing, I copied the class …
Main tests are now passing, so you could clone it locally; I will have a look at the rest.
That is a great solution. Thanks!
The test against latest botorch is failing due to a botorch bug in current main (meta-pytorch/botorch#3033). I will try to fix this, but it is just an edge case and should not affect the review process. Best, Johannes
LukasHebing
left a comment
Sorry for the delay: I could now follow all the categorical variable handling that is done under the hood.
Doesn't botorch also have a great library for priors, which we use later anyway?
Is it maybe possible to just store a reference to the botorch priors / constraints with the parameters, instead of building this elaborate structure? That would be more flexible; otherwise you need to add the infrastructure for all the new priors here as well.
Hmm, we are already using the botorch priors; what I added is the option to use more of them within our data models as well.
```diff
 if (
-    v.get(key, CategoricalEncodingEnum.ONE_HOT)
+    v.get(key, CategoricalEncodingEnum.ORDINAL)
     != CategoricalEncodingEnum.ONE_HOT
```
Sorry, I also struggle to understand the scope of input_preprocessing_specs:

- When `ORDINAL` is the only possible option, why do we have other options?
- Where are the user-specified encodings (descriptor, etc.)?
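To make the difference between the encoding options concrete, here is a small self-contained sketch. Note that `CategoricalEncodingEnum` below is a hypothetical stand-in mirroring BoFire's enum, not an import from the library, and `encode` is an illustrative helper, not BoFire's actual API:

```python
from enum import Enum


class CategoricalEncodingEnum(str, Enum):
    """Hypothetical stand-in mirroring BoFire's encoding enum."""
    ORDINAL = "ORDINAL"
    ONE_HOT = "ONE_HOT"


def encode(levels, value, encoding):
    """Encode one categorical value according to the chosen spec."""
    idx = levels.index(value)
    if encoding is CategoricalEncodingEnum.ORDINAL:
        # A single integer column, which is what the mixed
        # alternating optimizer expects.
        return [idx]
    # One-hot: one indicator column per level.
    return [1.0 if i == idx else 0.0 for i in range(len(levels))]


levels = ["red", "green", "blue"]
print(encode(levels, "green", CategoricalEncodingEnum.ORDINAL))  # [1]
print(encode(levels, "green", CategoricalEncodingEnum.ONE_HOT))  # [0.0, 1.0, 0.0]
```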
| """Default input preprocessing specs for the GA optimizer: If none given, will use OneHot encoding for all categorical inputs""" | ||
| input_preprocessing_specs = {} | ||
| for input_ in domain.inputs.get(): | ||
| if isinstance(input_, CategoricalDescriptorInput): |
I guess the `CategoricalDescriptorInput` will be handled in the botorch transformation?
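The truncated snippet above builds default specs per input type. A hedged standalone sketch of that pattern; all class names here are stand-ins rather than BoFire imports, and the `"DESCRIPTOR"` branch is an assumption about how descriptor inputs might be tagged:

```python
class CategoricalInput:
    """Hypothetical stand-in for a plain categorical input."""

    def __init__(self, key):
        self.key = key


class CategoricalDescriptorInput(CategoricalInput):
    """Hypothetical stand-in for the descriptor variant."""


def default_input_preprocessing_specs(inputs):
    """Default plain categoricals to one-hot; tag descriptor inputs
    separately (assumption: in the PR they may instead be handled by
    the botorch transformation)."""
    specs = {}
    for input_ in inputs:
        # Check the subclass first, since it is also a CategoricalInput.
        if isinstance(input_, CategoricalDescriptorInput):
            specs[input_.key] = "DESCRIPTOR"
        elif isinstance(input_, CategoricalInput):
            specs[input_.key] = "ONE_HOT"
    return specs


print(default_input_preprocessing_specs(
    [CategoricalInput("color"), CategoricalDescriptorInput("solvent")]
))  # {'color': 'ONE_HOT', 'solvent': 'DESCRIPTOR'}
```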
TobyBoyne
left a comment
Finally had the chance to look through everything again, thank you for waiting :)
Everything looks good; the only remaining unresolved comments are pretty minor. Approved!
I found a critical bug and fixed it, but I will check over the next days whether anything else is affected ;)
bertiqwerty
left a comment
Sorry, I am a little late to the party. Thanks Johannes. Thanks to the other reviewers. Looks good from my side. I just left some minor comments.
Ok, everything is now settled; the tests are failing due to a new pydantic version that was released yesterday. Here is the fix: #638. But before merging this, we have to wait for a new botorch version, as there is a bug in the current release that will lead to trouble. It is fixed in botorch now, but they have to cut a release; I pinged Max about this. Let us see what they say.
Motivation
This PR contains a major refactoring of BoFire. Here is a short summary:
- Porting of the `optimize_acqf_mixed_alternating` functionality, which thanks to @TobyBoyne can now also deal with categoricals. I added a few other features there so that it is now ready for use within BoFire. `optimize_acqf_mixed_alternating` performs a block gradient descent optimization of the acqf and is an alternative to the GA on large mixed domains.
- `optimize_acqf_mixed_alternating` always expects the categoricals in an ordinal encoding. So far, BoFire was transforming the categoricals upfront to a vector encoding (like one-hot) before entering the actual surrogate. This will now change: categoricals will always be ordinal encoded, and different vector encodings can be realized by using a botorch input transform, namely the `NumericToCategoricalEncoding` (https://github.com/pytorch/botorch/blob/b1097c6c475f29f694532c3282393ce8a67a9d6c/botorch/models/transforms/input.py#L1628C7-L1628C35). For this reason, the meaning of the `input_preprocessing_specs` will change: it is no longer a transformation applied before the data enters the actual botorch model, but a transformation applied within the model.
- Refactoring of the `AcquisitionOptimizer`. @LukasHebing: why is the `domain` not an attribute of the acquisition optimizer? It would be cleaner, or? I do not remember why we decided to do it the way we did :D

Have you read the Contributing Guidelines on pull requests?
Yes
Test Plan
Unit tests.
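The Motivation above describes moving the vector encoding of categoricals inside the model: data stays ordinal outside (which is what the mixed alternating optimizer expects), and an input transform expands it to a vector encoding within the model. Below is a minimal standalone sketch of that idea; it is not botorch's actual `NumericToCategoricalEncoding`, whose real implementation is linked above, and `one_hot_inside_model` is a name invented for this illustration:

```python
def one_hot_inside_model(x_ordinal, n_levels):
    """Toy input transform: expand an ordinal-encoded column into
    one-hot columns, the way a transform applied *within* the model
    would, so that the data outside the model can stay ordinal."""
    return [
        [1.0 if j == int(v) else 0.0 for j in range(n_levels)]
        for v in x_ordinal
    ]


# Outside the model, the optimizer only ever sees ordinal values:
x = [0, 2, 1]
# Inside the model, the transform produces the vector encoding:
print(one_hot_inside_model(x, 3))
# [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
```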