Programmatically finding all supported solvers and losses for an estimator #14063
Comments
I've said elsewhere that I think we need a more general way to list instances to run estimator checks on. I don't see how this is limited to solver, or how these would be enumerated if solver were not the only axis of variation.
I agree that a more general solution might be needed. Solvers seemed like the simplest and most useful place to start. We are also somewhat limited by test runtime (i.e. if we multiply common test execution time by 5-10 by exhaustively trying all estimator variants, it might become an issue). Also, I think it's worth distinguishing parameters where we expect that, in an ideal world, the estimator would be equivalent (e.g. …)
I see you proposed a related implementation in #11324.
Or we could list these in the tests for the estimator, i.e. do a parametrize with the variants we want to test and then call the estimator checks on each.
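A minimal sketch of that per-estimator parametrize idea, assuming a hand-written solver list (this is not an existing sklearn test, just an illustration):

```python
import pytest

from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator


@pytest.mark.parametrize(
    "solver", ["lbfgs", "liblinear", "newton-cg", "sag", "saga"]
)
def test_logistic_regression_all_solvers(solver):
    # Run the common checks on each non-default variant.
    check_estimator(LogisticRegression(solver=solver))
```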
Adding these in the tests for the estimator is the most straightforward solution for now... Where it falls down is only in our ability to say "all estimators pass common tests" by running `pytest sklearn.tests.test_common`.
I'm not sure if that's a good goal tbh. If/once we have a specification of the config space (#13031), i.e. all allowed options, we could try common tests for random configurations from the allowed space. But then we either need to sample multiple times, or we need to change the randomness each run. We could even have an offline / cron test that checks that the "test" space has enough coverage if we want to get really magical ;) |
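A rough sketch of the random-configuration idea, assuming a hand-written config space (a real one would come from the machine-readable spec proposed in #13031):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterSampler
from sklearn.utils.estimator_checks import check_estimator

# Hypothetical config space for one estimator.
config_space = {
    "solver": ["lbfgs", "liblinear", "newton-cg", "sag", "saga"],
    "fit_intercept": [False, True],
}

# Draw a few random configurations; varying random_state between runs
# would move the sampled region around, as suggested above.
for params in ParameterSampler(config_space, n_iter=3, random_state=0):
    check_estimator(LogisticRegression(**params))
```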
Yes, it's not a great goal, and we could probably check better with some pytest hook verifying that a check_estimator fixture had been applied at least once to each public estimator class.
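A hypothetical `conftest.py` sketch of such a hook (not an existing sklearn mechanism); it records which estimator classes went through a check fixture and reports the gaps at the end of the session:

```python
# conftest.py -- hypothetical coverage hook
import pytest

from sklearn.utils import all_estimators  # sklearn.utils.testing in older versions
from sklearn.utils.estimator_checks import check_estimator

_checked = set()  # estimator classes that passed through the fixture


@pytest.fixture
def run_estimator_checks():
    def _run(estimator):
        _checked.add(type(estimator))
        check_estimator(estimator)

    return _run


def pytest_sessionfinish(session, exitstatus):
    # Report any public estimator class that was never checked; a real
    # implementation would turn this into a session failure.
    missing = {cls for _, cls in all_estimators()} - _checked
    if missing:
        print("Never passed through check_estimator:",
              sorted(cls.__name__ for cls in missing))
```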
There is a dynamic way of detecting this, by running e.g.:

```python
>>> from sklearn.tests.test_common_non_default import detect_all_params
>>> from sklearn.linear_model import LogisticRegression
>>> detect_all_params(LogisticRegression)
{'solver': ['lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga'],
 'multi_class': ['auto', 'multinomial', 'ovr'],
 'fit_intercept': [False, True],
 'dual': [False, True]}
```
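One way such a probe could plausibly work is to try candidate values for each parameter and keep those that fit without raising. A hedged sketch under that assumption (the candidate pool is hand-written here, and the actual `detect_all_params` above may well work differently):

```python
import numpy as np

# Hand-written candidate pool -- an assumption for this sketch.
CANDIDATES = {
    "solver": ["lbfgs", "liblinear", "newton-cg", "sag", "saga"],
    "multi_class": ["auto", "multinomial", "ovr"],
    "fit_intercept": [False, True],
    "dual": [False, True],
}


def _fits(estimator_cls, params, X, y):
    try:
        estimator_cls(**params).fit(X, y)
        return True
    except Exception:
        return False


def detect_supported_params(estimator_cls):
    X = np.random.RandomState(0).rand(20, 3)
    y = np.array([0, 1] * 10)
    supported = {}
    default_params = estimator_cls().get_params()
    for name, values in CANDIDATES.items():
        if name not in default_params:
            continue
        ok = [v for v in values if _fits(estimator_cls, {name: v}, X, y)]
        if ok:
            supported[name] = ok
    return supported
```

Note that probing one parameter at a time against the defaults misses values that are only valid in combination (e.g. `dual=True` requires `solver='liblinear'` in `LogisticRegression`), so a real implementation would have to probe combinations as well.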
It would be useful to have some mechanism for determining the supported solvers for a given estimator. First, because currently `check_estimator` only runs on the default solver, so we are potentially not testing a number of configurations. It would also make it easier to check that all solvers yield comparable results, as in #13914. Of course, this can also be generalized to other parameters that impact the solver but should not change the way the estimator behaves.
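For concreteness, a hedged sketch of such a solver-equivalence check in the spirit of #13914 (the tolerance is an assumption, not sklearn policy):

```python
from numpy.testing import assert_allclose
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# fit_intercept=False because liblinear penalizes the intercept, which
# would otherwise make its solution legitimately different.
solvers = ["lbfgs", "liblinear", "newton-cg", "sag", "saga"]
coef = {s: LogisticRegression(solver=s, fit_intercept=False,
                              max_iter=5000, tol=1e-8).fit(X, y).coef_
        for s in solvers}

for s in solvers[1:]:
    assert_allclose(coef[s], coef["lbfgs"], rtol=1e-3,
                    err_msg=f"solver={s} disagrees with lbfgs")
```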
One way could be just to store e.g. `_supported_solvers` as a private attribute, or possibly in type annotations (though this would require PEP 586 from Python 3.8, which can be backported by vendoring `typing_extensions.py`), related to #11170.

Another way could be to have some method that yields possible estimator variants to be tested. @amueller, if I remember correctly, you mentioned something similar in the estimator tags PR.
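A sketch of the type-annotation variant, assuming a PEP 586 `Literal` and a stand-in estimator (this is not sklearn's actual `LogisticRegression`):

```python
from typing_extensions import Literal, get_args  # plain `typing` on 3.8+

Solver = Literal["lbfgs", "liblinear", "newton-cg", "sag", "saga"]


class MyEstimator:
    def __init__(self, solver: Solver = "lbfgs"):
        self.solver = solver


# Recover the allowed values from the annotation at runtime:
print(get_args(MyEstimator.__init__.__annotations__["solver"]))
# ('lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga')
```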