RFC BayesSearchCV in scikit-learn #26170
As one of the scikit-optimize creators: if someone with experience maintaining open-source projects wants to take over maintenance of scikit-optimize, please get in touch! The last time we (the scikit-optimize authors) paid attention to the literature/benchmarks on this topic, our conclusion was that, given the additional complexity of using such a tool, renting more CPU+RAM was way less complicated.
@betatim Thank you so much for your effort!
+1 for a community-maintained implementation. However, as @betatim said, I think this is too much complexity to be maintained as part of the main scikit-learn library.
That basically means putting more money behind it, and I don't think we can expect everybody to be able to do that. I've heard over and over that people want BayesSearchCV in scikit-learn, so from that perspective I'd be in favor of having it in the core library. Maybe a draft PR wouldn't hurt, to see how complex including it in scikit-learn would be?
Together with a PR (or maybe even before), it would be interesting to see a comparison to random search or successive halving, to evaluate how big an improvement you get (in terms of time spent to find a "nearly optimal" point). That way, no matter the outcome, we can either add it to scikit-learn or write a blog post + docs explaining why it is not needed.
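A minimal sketch of the kind of comparison suggested above, using the searchers scikit-learn already ships: `RandomizedSearchCV` versus the (still experimental) `HalvingRandomSearchCV`. The dataset, model, search space, and budget here are arbitrary assumptions for illustration, not a benchmark anyone in this thread ran.

```python
# Hypothetical timing sketch: how quickly do random search and successive
# halving reach a good score on a toy problem? All choices below
# (dataset size, forest size, grid) are illustrative assumptions.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingRandomSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
param_distributions = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 5, 10],
    "max_features": [0.3, 0.6, 1.0],
}

for name, search in [
    ("random", RandomizedSearchCV(
        RandomForestClassifier(n_estimators=50, random_state=0),
        param_distributions, n_iter=10, cv=3, random_state=0)),
    ("halving", HalvingRandomSearchCV(
        RandomForestClassifier(n_estimators=50, random_state=0),
        param_distributions, cv=3, random_state=0)),
]:
    start = time.perf_counter()
    search.fit(X, y)
    print(f"{name}: best={search.best_score_:.3f} "
          f"in {time.perf_counter() - start:.1f}s")
```

A real comparison would sweep budgets and repeat over seeds; this only shows the shape of the experiment.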
Hi, have the maintainers decided to work on this, or is it open to the public? @jiawei-zhang-a and I have been interested in this for a while; if possible, we would like to help with drafting a PR.
@Charlie-XIAO we would really appreciate it if you could start with what @betatim suggests, followed by a draft PR.
Depending on your prior about the outcome of a benchmark, it might be wise to start with that instead of a PR. Something I don't know how to benchmark/measure is the "you trade tuning your estimator for tuning the regression model used by the Bayesian optimiser" effect.

Another line of thought is to investigate what the popular optimisers/models are in tools like https://optuna.org/. Optuna seems like a popular choice. I don't think it contains any "secret sauce" that people couldn't get elsewhere, so understanding why people choose it, and which parts of it they use, could give you an unfair advantage in terms of making a hyper-parameter optimiser that is simple to maintain, simple to use, and used a lot. But this moves the goal from "create a BayesSearchCV" to "create a simple and powerful hyper-parameter optimiser". Whether this is a good thing or not depends on people's priors (I guess).

A (very) old paper comparing methods: https://proceedings.neurips.cc/paper_files/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf
Thanks for the information @betatim, I'll definitely take a look.
The original Hyperband paper, which builds on successive halving, has benchmarks against Bayesian methods. To experiment with, Optuna already has a search CV that uses their strategies. For reference, here is a recent paper comparing Optuna and HyperOpt. Given the third-party options out there, I am not too excited about adding a Bayes search CV; I would rather push people to use successive halving and further improve it. For example, we could take advantage of warm starting: #15125. @amueller may have more insights on this topic.
As for benchmarking, the Optuna developers seem to use their own kurobako, and there is also HPOBench.
This does seem reasonable. For instance, Hyperband (based on successive halving) seems to perform better than Bayesian optimization with a small-to-medium budget (because Bayesian optimization behaves like random search initially and only stands out with larger budgets). However, there is also BOHB, which combines Bayesian optimization and Hyperband and again outperforms Hyperband on most tasks (though the BOHB repo is no longer maintained). Here is an interesting post that might be worth reading. I have also seen DEHB, based on differential evolution and Hyperband, which claims to beat BOHB; the repo is here. I'm not really sure what scikit-learn wants to have: an algorithm that has been quite popular, a state-of-the-art algorithm, or something else?
I'm thinking that scikit-learn is still the first choice nowadays when dealing with machine learning problems, especially for beginners (and beginners may not even know about the third-party alternatives). Regardless, I don't think I'm as familiar with this topic as the maintainers (just providing some information and personal thoughts here). Maybe I should first wait for you to reach an agreement?
Yeah, from what I've seen, people have moved away from Bayesian approaches to approaches that take training time into consideration.
From memory, I think successive halving is close enough to Hyperband, and Hyperband is better than Bayesian approaches unless you have a lot of time to search the space. At the end of the day, I'll always recommend successive halving.
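For readers who have not tried it, successive halving is exposed in scikit-learn as the experimental `HalvingGridSearchCV`/`HalvingRandomSearchCV` estimators. A sketch, with an illustrative model and grid (the estimator and parameter values are my assumptions, not from this thread):

```python
# Successive halving in scikit-learn: start many candidates on a small
# resource budget, keep the best fraction each round, and grow the
# budget for the survivors. SVC and the grid below are illustrative.
from sklearn.datasets import make_classification
from sklearn.experimental import enable_halving_search_cv  # noqa: F401
from sklearn.model_selection import HalvingGridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)

search = HalvingGridSearchCV(
    SVC(random_state=0),
    {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1]},
    factor=3,               # each round keeps roughly the best 1/3...
    resource="n_samples",   # ...and triples the samples they train on
    min_resources=100,
    random_state=0,
).fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

The `factor`/`min_resources` knobs control how aggressively candidates are eliminated, which is where most of the speedup over an exhaustive grid comes from.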
It seems like this is becoming more of a documentation issue, then. We should probably make sure we refer people to good examples from all the SearchCV classes, so that they follow best practices.
In the pointers/docs we add, can we include "keywords" like "hyperopt" and the like, to help people (including me; I always have to look up the paper to remind myself that it is basically the same) realise that the somewhat obscurely named successive-halving estimators are what they are looking for? 👍 to improved docs, examples, and references within the docs.
Discussed in #26141
Originally posted by earlev4 April 11, 2023
Hi! First off, thanks so much for the excellent work done by all the scikit-learn contributors! The project is truly a gift and your work is greatly appreciated!
I still consider myself a novice when it comes to scikit-learn. In my usage, I typically attempt to use GridSearchCV when searching for the best parameters. However, depending on the search space, GridSearchCV might not always be the best option, and it can be computationally expensive and time-consuming in some scenarios. RandomizedSearchCV can be an alternative in these situations, but it does not always seem to find parameters as good as scikit-optimize's BayesSearchCV. In my humble opinion, scikit-optimize's BayesSearchCV is a nice compromise between GridSearchCV and RandomizedSearchCV, providing good parameters in a reasonable time.
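The cost difference described above is easy to quantify without fitting anything: an exhaustive grid fits once per combination, while random search caps the number of candidates at `n_iter`. A small illustration (the grid values are made up for the example):

```python
# Why exhaustive grids get expensive: the number of candidates is the
# product of the per-parameter list sizes, while random search samples
# a fixed n_iter of them. The grid below is an arbitrary example.
from sklearn.model_selection import ParameterGrid, ParameterSampler

grid = {
    "n_estimators": [100, 200, 400, 800],
    "max_depth": [3, 5, 8, None],
    "min_samples_leaf": [1, 2, 4],
    "max_features": [0.3, 0.6, 1.0],
}

n_grid = len(ParameterGrid(grid))                  # 4 * 4 * 3 * 3 = 144
n_random = len(list(ParameterSampler(grid, n_iter=20, random_state=0)))

print(n_grid)    # 144 candidate fits per CV fold for GridSearchCV
print(n_random)  # 20 for RandomizedSearchCV with n_iter=20
```

Multiplying by the number of CV folds gives the total fit count, which is where grid search becomes impractical for larger spaces.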
Unfortunately, scikit-optimize seems to be no longer supported; the last commit was in 2021. As of NumPy 1.24, np.int results in an error, unless the workaround np.int = int is used. This is just one example; there are numerous issues that have been untouched since 2021. It would be a shame to lose a project such as scikit-optimize's BayesSearchCV, and I humbly ask the contributors of scikit-learn whether a similar version could be implemented in scikit-learn. Thanks so much for your consideration! Looking forward to the discussion.
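The workaround mentioned above can be packaged as a small compatibility shim applied before importing scikit-optimize. This is a sketch that papers over one symptom of the unmaintained dependency, not a fix; whether it is sufficient depends on which scikit-optimize code paths you hit.

```python
# Compatibility shim for the removal of the np.int alias in NumPy 1.24.
# Older scikit-optimize releases still reference np.int; restoring the
# alias (it was always just the builtin int) lets them import. Apply
# this before "import skopt". Deliberate monkeypatch, use with care.
import numpy as np

if not hasattr(np, "int"):  # np.int was removed in NumPy 1.24
    np.int = int
```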
I thought scikit-optimize was a maintained project, but it seems it isn't, and I agree it'd be nice for the community to have a maintained BayesSearchCV available. I'm not sure if we should include it here or in scikit-learn-extra, but it would be nice to have it.