
Programmatically finding all supported solvers and losses for an estimator #14063

Open
rth opened this issue Jun 11, 2019 · 8 comments

@rth
Member

rth commented Jun 11, 2019

It would be useful to have some mechanism for determining the supported solvers of a given estimator. First, because check_estimator currently only runs with the default solver, so a number of configurations are potentially left untested.

It would also make it easier to check that all solvers yield comparable results, as in #13914.

Of course, this can also be generalized to other parameters that affect the solver but should not change the way the estimator behaves.

One way could be simply to store e.g. _supported_solvers as a private attribute, or possibly in type annotations (though this would require PEP 586 from Python 3.8, which can be backported by vendoring typing_extensions.py); related to #11170.
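
For illustration, a rough sketch of what the private-attribute option could look like (the attribute name and the toy estimator below are hypothetical):

from sklearn.base import BaseEstimator

class ToyClassifier(BaseEstimator):
    # Hypothetical private attribute enumerating the accepted values of ``solver``.
    # With PEP 586 the same information could instead live in a type annotation,
    # e.g. ``solver: Literal["lbfgs", "newton-cg", "saga"]``, recoverable via
    # typing.get_args / typing_extensions.get_args.
    _supported_solvers = ("lbfgs", "newton-cg", "saga")

    def __init__(self, solver="lbfgs"):
        self.solver = solver

# Common tests could then instantiate every declared variant instead of only the default:
variants = [ToyClassifier(solver=s) for s in ToyClassifier._supported_solvers]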

Another way could be to have some method that yields the estimator variants to be tested. @amueller, if I remember correctly you mentioned something similar in the estimator tags PR.
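
A sketch of that second option, assuming check_estimator accepts instances (the helper name _variants_to_test is made up):

from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

def _variants_to_test():
    # Hypothetical hook: yield one instance per non-default configuration that
    # the common tests should also exercise.
    for solver in ("lbfgs", "liblinear", "newton-cg", "sag", "saga"):
        yield LogisticRegression(solver=solver)

for estimator in _variants_to_test():
    check_estimator(estimator)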

@jnothman
Member

jnothman commented Jun 11, 2019 via email

@rth
Member Author

rth commented Jun 12, 2019

I agree that a more general solution might be needed. Solvers seemed like the simplest and most useful place to start. We are also somewhat limited by test runtime (i.e. if exhaustively trying all estimator variants multiplies common test execution time by 5-10x, it could become an issue).

Also, I think it's worth distinguishing parameters for which, in an ideal world, the estimator would be equivalent (e.g. solvers, initialization conditions) from those for which it wouldn't be in general (tol, max_iter).

@rth
Member Author

rth commented Jun 13, 2019

we need a more general way to list instances to run estimator checks on

I see you proposed a related implementation in #11324

@amueller
Member

Or we could list these in the tests for the estimator, i.e. parametrize over the variants we want to test and then run the yielded estimator checks on each.
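
A rough sketch of that, using parametrize_with_checks (available in scikit-learn 0.22+); the solver list is just an example:

from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import parametrize_with_checks

SOLVERS = ("lbfgs", "liblinear", "newton-cg", "sag", "saga")

@parametrize_with_checks([LogisticRegression(solver=s) for s in SOLVERS])
def test_logistic_regression_solver_variants(estimator, check):
    # pytest generates one test case per (estimator instance, check) pair.
    check(estimator)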

@jnothman
Member

jnothman commented Jun 18, 2019 via email

@amueller
Member

I'm not sure that's a good goal, tbh.

If/once we have a specification of the config space (#13031), i.e. all allowed options, we could run the common tests on random configurations sampled from the allowed space. But then we either need to sample multiple times, or we need to change the randomness on each run.
In a sense, what we want is a subset of the config space that sufficiently covers "all paths". We haven't really decided how to attach a space to an estimator, but if we name/tag them we could have a "legal" space, a "tuning" space and a "testing" space.

We could even have an offline / cron test that checks that the "testing" space has enough coverage, if we want to get really magical ;)
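
Purely hypothetical, but sampling from a declared "testing" space might look roughly like this (no such config-space API exists; the space below is written by hand):

import random

from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator

# Hand-written stand-in for the kind of specification discussed in #13031.
TESTING_SPACE = {
    "solver": ["lbfgs", "liblinear", "newton-cg", "sag", "saga"],
    "fit_intercept": [True, False],
}

def sample_config(space, rng):
    # Draw one value per parameter, independently.
    return {name: rng.choice(values) for name, values in space.items()}

rng = random.Random(0)
for _ in range(3):
    check_estimator(LogisticRegression(**sample_config(TESTING_SPACE, rng)))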

@amueller changed the title from "Programatticaly finding all supported solvers for an estimator" to "Programatically finding all supported solvers for an estimator" on Jun 18, 2019
@jnothman
Member

jnothman commented Jun 19, 2019 via email

@rth changed the title from "Programatically finding all supported solvers for an estimator" to "Programatically finding all supported solvers and losses for an estimator" on Jun 3, 2020
@rth
Member Author

rth commented Jun 3, 2020

There is a dynamic way of detecting this by running e.g. Estimator(solver=solver).fit(...), where solver is a custom string-like object that remembers what it is compared against. Code in #17441. It's a bit of a hack, but it mostly works:

>>> from sklearn.tests.test_common_non_default import detect_all_params
>>> from sklearn.linear_model import LogisticRegression
>>> detect_all_params(LogisticRegression)
{'solver': ['lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga'],
 'multi_class': ['auto', 'multinomial', 'ovr'],
 'fit_intercept': [False, True],
 'dual': [False, True]}
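
The actual code is in #17441; the core trick is roughly a str subclass that records everything it is compared against, along these lines (a minimal sketch, not the PR's implementation):

class _SpyStr(str):
    # A string-like value that records every value it is compared against, so
    # passing it as e.g. ``solver`` and calling ``fit`` reveals the accepted options.
    def __new__(cls, value=""):
        obj = super().__new__(cls, value)
        obj.seen = set()
        return obj

    def __eq__(self, other):
        self.seen.add(other)
        return str.__eq__(self, other)

    __hash__ = str.__hash__

spy = _SpyStr("lbfgs")
spy == "liblinear"
spy == "saga"
print(sorted(spy.seen))  # ['liblinear', 'saga']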
