V5 API rework - unified API for optimizers and experiments #110
Conversation
@SimonBlanke, I think this is complete and ready now.
Diff context:

```python
        `MyClass(**params)` or `MyClass(**params[i])` creates a valid test instance.
        `create_test_instance` uses the first (or only) dictionary in `params`
        """
        import numpy as np
```
This is just hard-coded as an example for now, right? Will this change in future PRs?
get_test_params returns test examples - a pure user should never come into contact with those, except if they want a quick instance for testing themselves.
The test examples typically do not change, though we might add more to improve test coverage.
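For illustration, here is a minimal sketch of this pattern under a scikit-base-style contract; `MyOptimizer` and its parameters are hypothetical stand-ins, not the actual hyperactive classes:

```python
# Hypothetical sketch of the get_test_params / create_test_instance pattern.
# MyOptimizer and its parameter sets are illustrative, not the real classes.


class MyOptimizer:
    def __init__(self, search_space=None, n_iter=100):
        self.search_space = search_space
        self.n_iter = n_iter

    @classmethod
    def get_test_params(cls, parameter_set="default"):
        """Return test parameter sets, used only by the test framework."""
        paramset1 = {"search_space": {"x": [1, 2, 3]}, "n_iter": 10}
        paramset2 = {"search_space": {"x": [0, 1]}, "n_iter": 5}
        return [paramset1, paramset2]

    @classmethod
    def create_test_instance(cls, parameter_set="default"):
        """Construct an instance from the first test parameter set."""
        params = cls.get_test_params(parameter_set)
        return cls(**params[0])


# MyOptimizer(**params[i]) is a valid test instance for each i
inst = MyOptimizer.create_test_instance()
```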
Diff context:

```python
    def __init__(
        self,
        search_space=None,
```
I like the first steps towards an API that makes different optimizer packages available!
But I would find it confusing as a user to be able to pass the search space either here (in init) or in the add_search method. There should be one correct way to do this.
What advantages would it bring to allow this?
> What advantages would it bring to allow this?
I would be happy passing everything in init, especially since set_params can be used to change some (and any) of the parameters later.
I have simply left add_search as an alternative option since I thought you needed it in your downstream APIs - so, for downwards compatibility.
I would not mind removing it, since I do not know first-principles arguments for having it in. A very weak argument is that it separates different types of parameters - but since we also pass them to __init__, it is not a strong or compelling one.
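A hedged sketch of the "pass everything in `__init__`, adjust later via `set_params`" route, using `sklearn`'s `BaseEstimator` for the `get_params`/`set_params` machinery; `HillClimbingSketch` and its parameters are illustrative assumptions, not the actual hyperactive classes:

```python
# Hypothetical sklearn-style parameter handling; not the actual hyperactive classes.
from sklearn.base import BaseEstimator


class HillClimbingSketch(BaseEstimator):
    def __init__(self, search_space=None, n_iter=100, experiment=None):
        # sklearn data-class convention: store constructor args unchanged
        self.search_space = search_space
        self.n_iter = n_iter
        self.experiment = experiment


opt = HillClimbingSketch(search_space={"x": [1, 2, 3]}, n_iter=50)
# any parameter can be changed later, so a separate add_search is not strictly needed
opt.set_params(search_space={"x": [0, 1, 2, 3, 4]})
print(opt.get_params())
```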
The jupyter notebook fails at cell 17: ...
| "paramc": 42, | ||
| "experiment": AnotherExperiment("another_experiment_params"), | ||
| } | ||
| return [paramset1, paramset2] |
Why would a user create a custom optimizer class and hard-code the experiment? Maybe I do not understand the purpose of a custom optimizer. Could you provide a simple example file (a *.py file) where a custom optimizer is created by a user and then used on an experiment?
> Why would a user create a custom optimizer class and hard-code the experiment?
Sorry if there was a misunderstanding - I think there is one. Let me know your thoughts on where to document this better.
The experiment is hard-coded only for the test - get_test_params is for testing purposes only.
In all test cases, we hard-code the experiment so we are also able to hard-code things like the search space, which will depend on the experiment.
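To make the intended separation concrete - in normal use the experiment is supplied by the user, not hard-coded - here is a self-contained toy sketch; all class names and signatures are illustrative stand-ins, not the actual hyperactive API:

```python
# Self-contained toy sketch of the experiment/optimizer split discussed here.
# All class names are illustrative stand-ins, not the actual hyperactive classes.
import random


class ToyExperiment:
    """User-defined experiment: knows how to score a parameter dict."""

    def score(self, params):
        # toy objective: maximize -(x - 3)^2
        return -((params["x"] - 3) ** 2)


class ToyRandomOptimizer:
    """User-defined optimizer: the experiment is passed in, never hard-coded."""

    def __init__(self, experiment, search_space, n_iter=20):
        self.experiment = experiment
        self.search_space = search_space
        self.n_iter = n_iter

    def run(self):
        best_params, best_score = None, float("-inf")
        for _ in range(self.n_iter):
            candidate = {k: random.choice(v) for k, v in self.search_space.items()}
            score = self.experiment.score(candidate)
            if score > best_score:
                best_params, best_score = candidate, score
        return best_params


opt = ToyRandomOptimizer(ToyExperiment(), search_space={"x": list(range(10))})
print(opt.run())  # -> {'x': 3} with enough iterations
```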
SimonBlanke left a comment:
Looks intriguing so far. There are some pain points for me:
- some hard-coded stuff that I do not understand - but maybe these are just examples for now
- the ambiguity of passing parameters to init or to add_search
Those points should be clarified for the next review.
Please provide runnable *.py example files (in the future). Those are much easier to review and to understand. A hill-climbing, a grid-search and a custom-optimizer example would be great. Optionally, a non-runnable optuna example.
Hm, no, it should work - I re-executed the notebook and it runs for me, locally. Could you perhaps report your environment and the nature of the failure?
You mean the hard-coded example parameters in get_test_params?
Yes, I am also not 100% happy with this - as said above, I included it mostly for downwards compatibility. We can remove the add_search option - what is your preference?
As compared to jupyter? I disagree with this statement, but perhaps it is a matter of taste. You can also convert jupyter notebooks to py, e.g., via jupyter nbconvert. Most users in the data science space imo strongly prefer example code in jupyter notebooks and not in py files - therefore, irrespective of personal preference, I would suggest going with jupyter, simply since that is the strong user base preference (afaik).
The first two are both available in the notebook.
I fixed it by upgrading to the newest version of sklearn.
So we pass those parameters to init because of the sklearn data-class structure, right? But what if the parameters belong to the method (like ...)?
Does that mean that you think the search space does belong to init, because it fits better semantically?
However, if we are really forced to pass all those parameters to init, then we should (maybe) remove add_search.
Interesting - we should try to be compatible with a wider range of versions. What exactly was the failure, and with which version?
I suppose it is a bit of a stretch from the API, but it would be ok in the sense that it does not introduce an incompatibility.
I am not sure about this - the search space being one of the more fuzzy points of the design - but I would use init as an "if in doubt" location, for now at least.
Ok - I did not yet understand the parallel computing case entirely. Is it simply running multiple "runs" in parallel?
The version of sklearn was 1.5. The error looked like this:
Correct, they run independently from each other, but they can share a memory dictionary if the objective functions are the same.
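As a conceptual sketch of that sharing (my own illustration, not the actual gfo/hyperactive memory mechanism), two parallel runs over the same objective can reuse each other's evaluations via a shared dictionary:

```python
# Sketch of parallel runs sharing a memory dict of objective evaluations.
# This is a conceptual illustration, not the actual gfo/hyperactive mechanism.
from multiprocessing import Manager, Pool


def objective(x):
    return -(x - 3) ** 2


def run_search(args):
    candidates, shared_memory = args
    best = None, float("-inf")
    for x in candidates:
        if x in shared_memory:          # reuse an evaluation from any parallel run
            score = shared_memory[x]
        else:
            score = objective(x)
            shared_memory[x] = score    # share it with the other runs
        if score > best[1]:
            best = x, score
    return best


if __name__ == "__main__":
    with Manager() as manager:
        memory = manager.dict()
        # two independent runs over overlapping candidates, same objective
        jobs = [(range(0, 8), memory), (range(4, 12), memory)]
        with Pool(2) as pool:
            print(pool.map(run_search, jobs))
```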
A search space is a general parameter for optimization; I expect all optimization packages to have this parameter. As I see it, we have two alternatives to avoid the init/add_search parameter ambiguity: ...
If we go the first way: ...
It would be useful to have the full traceback. This looks like a big problem - something is using private methods that it should not. The obvious question is whether the problem is in one of our packages or somewhere external.
I see - how do you avoid race conditions?
Yes, but most certainly the python representations will vary widely, and often there is a mix of search space and search configuration (e.g., distributions).
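For a concrete contrast of representations, compare a pure discrete search space with one that mixes in sampling configuration, as accepted e.g. by `sklearn`'s `RandomizedSearchCV`:

```python
# Two common python representations of a "search space" for the same problem:
# a discrete grid (lists of values) vs. a mix of search space and search
# configuration (distributions), as accepted e.g. by sklearn's RandomizedSearchCV.
from scipy.stats import loguniform

# pure search space: discrete candidate values only
grid_space = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
}

# search space mixed with search configuration: a continuous range plus
# the distribution to sample it from
distribution_space = {
    "C": loguniform(1e-3, 1e3),
    "kernel": ["linear", "rbf"],
}
```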
Plus the third alternative, which is the current one - via ...
If I understand correctly, this is not just parallel computing but parallelism with a shared search space, right? Can you outline the simplest case in which non-trivial data sharing happens? If the search problems are completely different, it makes no sense to do that. So what is the simplest non-trivial case?
This seems like the only open point - could we move to close it? I do not understand the use case here - can you provide at least one non-trivial example where it is not just running two instances of ...?
SimonBlanke left a comment:
fix error(s)
SimonBlanke left a comment:
Let's merge this! :-)
In-progress V5 API rework suggestions.
- `BaseExperiment` and `BaseOptimizer` interface, `sklearn` tuner, `extension_templates`
- `sklearn` experiment `SklearnCvExperiment`, inheriting from the `BaseExperiment` class, for integration of `BaseExperiment` and `BaseOptimizer`
- `gfo` backend: `HillClimbing`
- `GridSearch` using `sklearn` `ParameterSet`, mostly equivalent to the grid search logic used in `sklearn` (minus parallelization - for now)
- `HyperactiveSearchCV` is refactored to use the `SklearnCvExperiment` internally, instead of the custom adapter from the previous stable version
- `OptCV` for `sklearn`, which allows tuning using any optimizer in the `hyperactive` API - e.g., random search, grid search, bayesian optimization, tree of Parzen, etc
- testing via `scikit-base` for consistent API contracts is added, for optimizers and experiments - this is extensible with more tests
- `_registry` for use by the test system, but this could be made public to the user

Note: compared to #101, I have reverted most changes to optimizers, and reverted all deleted tests. These changes could be added in a separate PR, and I would consider them mostly orthogonal. This is to restore downwards compatibility, and allow a more gentle refactor - or, merge with #101.
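The `GridSearch` point above maps onto `sklearn`'s parameter-grid utilities. As a rough, hedged sketch of that kind of grid-search logic, using the real `sklearn.model_selection.ParameterGrid` but my own helper function, not the PR's actual `GridSearch` class:

```python
# Sketch of grid-search logic on top of sklearn's ParameterGrid, in the spirit of
# the GridSearch optimizer described above; this is not the PR's actual class.
from sklearn.model_selection import ParameterGrid


def grid_search(experiment_score, search_space):
    """Evaluate every grid point with the experiment's score function."""
    best_params, best_score = None, float("-inf")
    for params in ParameterGrid(search_space):
        score = experiment_score(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# toy experiment: maximize -(x - 3)^2 over a small grid
print(grid_search(lambda p: -(p["x"] - 3) ** 2, {"x": [0, 1, 2, 3, 4]}))
```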