
Conversation

@fkiraly (Collaborator) commented May 3, 2025

In-progress V5 API rework suggestions.

  • BaseExperiment and BaseOptimizer interfaces
  • a Jupyter notebook explaining usage - experiment, optimizer, and sklearn tuner (a condensed usage sketch follows below)
  • documented extension contracts for both, in extension_templates
  • an sklearn experiment, SklearnCvExperiment, inheriting from the BaseExperiment class for integration
  • three common optimization test functions, also inheriting from BaseExperiment
  • optimizers inheriting from BaseOptimizer
    • example for the gfo backend: HillClimbing
    • GridSearch using sklearn's ParameterGrid, mostly equivalent to the grid search logic used in sklearn (minus parallelization - for now)
  • the existing HyperactiveSearchCV is refactored to use SklearnCvExperiment internally, instead of the custom adapter from the previous stable version
  • a new OptCV for sklearn, which allows tuning using any optimizer in the hyperactive API - e.g., random search, grid search, Bayesian optimization, tree of Parzen estimators, etc.
  • a test framework skeleton using scikit-base for consistent API contracts is added, for optimizers and experiments - this is extensible with more tests
  • a registry and retrieval utility is added, currently in a private module _registry for use by the test system, but this could be made public to the user
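
To give a feel for how these pieces fit together, here is a condensed usage sketch. The class names are taken from the list above; the import paths, constructor arguments, and the name of the "execute the search" method are illustrative assumptions, not the final API.

```python
# Rough usage sketch of the API pieces listed above - import paths, argument
# names, and method names are assumptions for illustration only.
from sklearn.datasets import load_iris
from sklearn.svm import SVC

from hyperactive.experiment.integrations import SklearnCvExperiment  # assumed path
from hyperactive.opt import HillClimbing  # assumed path

X, y = load_iris(return_X_y=True)

# an experiment encapsulates "score this parameter setting", here via CV of an SVC
experiment = SklearnCvExperiment(estimator=SVC(), X=X, y=y)

# an optimizer is configured with a search space and the experiment to optimize
optimizer = HillClimbing(
    search_space={"C": [0.01, 0.1, 1, 10], "gamma": [0.001, 0.01, 0.1]},
    n_iter=50,
    experiment=experiment,
)
best_params = optimizer.run()  # assumed name of the "execute the search" method
```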

Note: compared to #101, I have reverted most changes to the optimizers and restored all tests that had been deleted.
Those changes could be added in a separate PR, and I would consider them mostly orthogonal.

This is to preserve backward compatibility and allow a gentler refactor - or, alternatively, a merge with #101.

@fkiraly (Collaborator, Author) commented May 5, 2025

@SimonBlanke, I think this is complete and ready now

`MyClass(**params)` or `MyClass(**params[i])` creates a valid test instance.
`create_test_instance` uses the first (or only) dictionary in `params`
"""
import numpy as np
Collaborator

This is just hard-coded as an example for now, right? Will this change in future PRs?

Collaborator Author

get_test_params provides test examples - a pure user should never come into contact with those, except if they want a quick instance for testing themselves.

The test examples typically do not change, though we might add more to improve test coverage.
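
For illustration, this is roughly how the scikit-base test machinery consumes these, with HillClimbing as a stand-in for any optimizer or experiment class:

```python
# what the test framework does with get_test_params - a user would only ever
# do this to grab a quick throwaway instance for their own experiments
params_list = HillClimbing.get_test_params()  # list of parameter dicts
instance = HillClimbing(**params_list[0])     # any of the dicts gives a valid instance

# scikit-base shorthand for the same thing, using the first (or only) dict
instance = HillClimbing.create_test_instance()
```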


def __init__(
self,
search_space=None,
@SimonBlanke (Collaborator) commented May 11, 2025

I like the first steps towards an API that makes different optimizer packages available!
But I would find it confusing as a user to be able to pass the search space either here (init) or in the add_search method. There should be one correct way to do this.
What advantages would it bring to allow this?

Collaborator Author

What advantages would it bring to allow this?

I would be happy passing everything in init, especially since set_params can be used to change some (and any) of the parameters later.

I have simply left add_search as an alternative option since I thought you needed it in your downstream APIs? So, for backward compatibility.

I would not mind removing it, since I do not know of first-principles arguments for having it in. A very weak argument is to separate different types of parameters - but since we also pass them to __init__, it is not a strong or compelling one.
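
For context, a minimal sketch of the "everything in init, adjust later via set_params" pattern - the parameter names here are illustrative, the pattern itself is the standard sklearn/scikit-base one:

```python
# all configuration is passed in __init__ ...
opt = HillClimbing(
    search_space={"C": [0.1, 1, 10]},
    n_iter=100,
    experiment=experiment,  # an experiment instance created beforehand
)

# ... and any subset of parameters can be changed later via sklearn-style set_params
opt.set_params(n_iter=500, search_space={"C": [0.01, 0.1, 1, 10, 100]})
```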

@SimonBlanke (Collaborator) commented

The Jupyter notebook fails at cell 17: "OptCV tuning via GridSearch". I guess this is not implemented yet. Is it just there to show how this could work in the future?

"paramc": 42,
"experiment": AnotherExperiment("another_experiment_params"),
}
return [paramset1, paramset2]
Collaborator

Why would a user create a custom optimizer class, and why hard-code the experiment? Maybe I do not understand the purpose of a custom optimizer. Could you provide a simple example file (a *.py file) where a custom optimizer is created by a user and then used on an experiment?

Collaborator Author

Why would a user create a custom optimizer class, and why hard-code the experiment?

Sorry if there was a misunderstanding - I think there is one. Let me know your thoughts on where to document this better.

The experiment is hard-coded only for the test - get_test_params is for testing purposes only.

In all test cases, we hard-code the experiment so we are also able to hard-code things like the search space, which will depend on the experiment.
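
A rough sketch of what such a test parameter set could look like - the class and parameter names here are illustrative placeholders, not the exact fixtures in this PR:

```python
@classmethod
def get_test_params(cls, parameter_set="default"):
    """Return testing parameter settings for the optimizer."""
    # the experiment is hard-coded so that the search space below can be
    # hard-coded to match it - both exist purely for the test framework
    from hyperactive.experiment.toy import Ackley  # assumed toy experiment

    experiment = Ackley()
    paramset1 = {
        "experiment": experiment,
        "search_space": {"x0": [-5, 0, 5], "x1": [-5, 0, 5]},
        "n_iter": 10,
    }
    paramset2 = {
        "experiment": experiment,
        "search_space": {"x0": [-1, 1], "x1": [-1, 1]},
        "n_iter": 5,
    }
    return [paramset1, paramset2]
```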

@SimonBlanke (Collaborator) left a comment

Looks intriguing so far. There are some pain points for me:

  • some hard-coded stuff that I do not understand. But maybe these are just examples for now
  • the ambiguity of passing parameters to init or to add_search.

Those points should be clarified for the next review.

Please provide runnable *.py example files (in the future). Those are much easier to review and to understand. A hill-climbing, a grid-search, and a custom-optimizer example would be great. Optionally, a non-runnable Optuna example.

@fkiraly (Collaborator, Author) commented May 11, 2025

The Jupyter notebook fails at cell 17: "OptCV tuning via GridSearch". I guess this is not implemented yet. Is it just there to show how this could work in the future?

Hm, no, it should work - I re-executed the notebook and it runs for me, locally.

Could you perhaps report your environment and the nature of the failure?

@fkiraly (Collaborator, Author) commented May 11, 2025

some hard-coded stuff that I do not understand. But maybe these are just examples for now

You mean the hard-coded example parameters in get_test_params? These parameters are used in testing only - the instances created there are the ones passed to the tests in TestAllOptimizers etc. For the tests, we need valid instances of optimizers and experiments, so we need to hard-code valid test instances somewhere.

the ambiguity of passing parameters to init or to add_search.

Yes, I am also not 100% happy with this - as said above, I included it mostly for backward compatibility. We can remove the add_search method, though it does not cost much to maintain it either.

What is your preference?

@fkiraly (Collaborator, Author) commented May 11, 2025

Please provide runnable *.py example files (in the future). Those are much easier to review and to understand.

As compared to Jupyter? I disagree with this statement, but perhaps it is a matter of taste. You can also convert Jupyter notebooks to .py via nbconvert if you prefer .py files.

Most users in the data science space imo strongly prefer example code in Jupyter notebooks rather than in .py files - therefore, irrespective of personal preference, I would suggest going with Jupyter, simply since that is the strong user-base preference (afaik).

A hill-climbing, a grid-search and a custom-optimizer example would be great.

The first two are both available in the notebook hyperactive_intro.ipynb? Could you be more concrete, if you think something is missing, as to what kind of example you would like?

@SimonBlanke (Collaborator) commented

The Jupyter notebook fails at cell 17: "OptCV tuning via GridSearch". I guess this is not implemented yet. Is it just there to show how this could work in the future?

Hm, no, it should work - I re-executed the notebook and it runs for me, locally.

Could you perhaps report your environment and the nature of the failure?

I fixed it by upgrading to the newest version of sklearn

@SimonBlanke (Collaborator) commented

the ambiguity of passing parameters to init or to add_search.

Yes, I am also not 100% happy with this - as said above, I included it mostly for backward compatibility. We can remove the add_search method, though it does not cost much to maintain it either.

What is your preference?

So we pass those parameters to init because of the sklearn dataclass-like structure, right? But if the parameters belong to a method (like add_search) "semantically", it would still be okay, right? One example we discussed was the fit method, which accepts the training data of an estimator.

Does that mean that you think the search space does belong in init, because it fits better semantically?
I think you explained this before, but I cannot remember the idea here (or I need a different explanation).

However, if we are really forced to pass all those parameters to init, then we should (maybe) remove add_search some time in the future. This method is mainly for parallel computing. Maybe we can find a better way to support this in the future.

@fkiraly (Collaborator, Author) commented May 11, 2025

I fixed it by upgrading to the newest version of sklearn

Interesting - we should try to be compatible with a wider range of versions. What exactly was the failure, and with which version?

@fkiraly (Collaborator, Author) commented May 11, 2025

So we pass those parameters to init because of the sklearn dataclass-like structure, right? But if the parameters belong to a method (like add_search) "semantically", it would still be okay, right?

I suppose it is a bit of a stretch from the API, but it would be ok in the sense that it does not introduce an incompatibility.

Does that mean that you think the search space does belong in init, because it fits better semantically?

I am not sure about this - the search space being one of the fuzzier points of the design - but init is the "if in doubt" location, for now at least.

This method is mainly for parallel computing. Maybe we find a better way to support this in the future.

Ok - I did not yet understand the parallel computing case entirely. Is it simply running multiple "runs" in parallel?

@SimonBlanke (Collaborator) commented

I fixed it by upgrading to the newest version of sklearn

Interesting - we should try to be compatible with a wider range of versions. What exactly was the failure, and with which version?

The version of sklearn was 1.5. The error looked like this:
"received ImportError: cannot import name '_deprecate_Xt_in_inverse_transform' from 'sklearn.utils.deprecation'"

Ok - I did not yet understand the parallel computing case entirely. Is it simply running multiple "runs" in parallel?

Correct, they run independently from each other, but they can share a memory dictionary if the objective functions are the same.
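
To illustrate the kind of sharing meant here, a toy sketch (not the actual gfo or hyperactive implementation): several searches over the same objective reuse one evaluation cache, so a point scored by one run is not re-evaluated by another.

```python
# toy illustration of a shared memory dictionary between parallel searches
shared_memory = {}  # maps parameter settings to already-computed scores

def cached_objective(params, objective):
    key = tuple(sorted(params.items()))
    if key in shared_memory:       # another search already evaluated this point
        return shared_memory[key]
    score = objective(params)      # the (expensive) real evaluation
    shared_memory[key] = score
    return score

# each parallel search calls cached_objective(...) independently; the runs only
# interact through shared_memory, which is why sharing only makes sense when
# the objective function is the same across all of them
```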

I am not sure about this - the search space being one of the fuzzier points of the design - but init is the "if in doubt" location, for now at least.

A search space is a general parameter for optimization. I expect all optimization packages to have this parameter.

As I see it, we have two alternatives to avoid the init/add_search parameter ambiguity:

  • remove add_search and therefore the ability to do parallel computing in the optimizer. If we want to re-enable parallel computing, we could maybe do it in a separate "scheduler" class?
  • keep add_search and pass the experiment and search_space to add_search (and n_iter?).

If we go the first way, we should at least have a concept for parallel computing before we go forward.

@fkiraly (Collaborator, Author) commented May 13, 2025

The version of sklearn was 1.5. The error looked like this:
"received ImportError: cannot import name '_deprecate_Xt_in_inverse_transform' from 'sklearn.utils.deprecation'"

Would be useful to have the full traceback. This looks like a big problem, something is using private methods that it should not. The obvious question is whether the problem is in one of our packages or somewhere external.

Correct, they run independently from each other, but they can share a memory dictionary if the objective functions are the same.

I see - how do you avoid race conditions?

I am not sure about this - the search space being one of the fuzzier points of the design - but init is the "if in doubt" location, for now at least.

A search space is a general parameter for optimization. I expect all optimization packages to have this parameter.

Yes, but the Python representations will almost certainly vary widely, and often there is a mix of search space and search configuration (e.g., distributions).

As I see it, we have two alternatives to avoid the init/add_search parameter ambiguity:

Plus the third alternative, which is the current state - via add_search we can support parallelism with data sharing, whereas __init__ supports the dataclass-like specification syntax.

If we go the first way, we should at least have a concept for parallel computing before we go forward.

If I understand correctly, this is not just parallel computing but parallelism with shared search space, right?

Can you outline the simplest case in which non-trivial data sharing happens? If the search problems are completely different, it makes no sense to do that. So what is the simplest non-trivial case?

@fkiraly (Collaborator, Author) commented May 15, 2025

We should at least have a concept for parallel computing before we go forward.

This seems like the only open point - could we move to close it?

I do not understand the use case here - can you provide at least one non-trivial example where it is not just running two instances of run in parallel, ideally with currently working code?

@SimonBlanke (Collaborator) left a comment

fix error(s)

@SimonBlanke (Collaborator) left a comment

Let's merge this! :-)

@SimonBlanke merged commit 731808f into hyperactive-project:master on May 18, 2025 - 16 checks passed.