Programmatically finding all supported solvers and losses for an estimator #14063
Comments
I've said elsewhere that I think we need a more general way to list instances to run estimator checks on. I don't see how this is limited to solver, or how these would be enumerated if solver were not the only axis of variation.
I agree that a more general solution might be needed. Solvers seemed like the simplest and most useful place to start. We are also somewhat limited by test runtime (i.e. if we multiply common test execution time by 5-10 by exhaustively trying all estimator variants, it might become an issue). Also, I think it's worth distinguishing parameters where we expect that, in an ideal world, the estimator would be equivalent (e.g. …)
I see you proposed a related implementation in #11324.
Or we could list these in the tests for the estimator, i.e. do a parametrize with the variants we want to test and then call the estimator checks on each.
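A minimal sketch of that per-estimator parametrize idea, assuming a hand-written solver list (this is not an existing sklearn test, just an illustration):

```python
import pytest

from sklearn.linear_model import LogisticRegression
from sklearn.utils.estimator_checks import check_estimator


@pytest.mark.parametrize(
    "solver", ["lbfgs", "liblinear", "newton-cg", "sag", "saga"]
)
def test_logistic_regression_all_solvers(solver):
    # Run the common checks on each non-default variant.
    check_estimator(LogisticRegression(solver=solver))
```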
Adding these in the tests for the estimator is the most straightforward solution for now... Where it falls down is only in our ability to say "all estimators pass common tests" by running `pytest sklearn.tests.test_common`.
I'm not sure if that's a good goal tbh. If/once we have a specification of the config space (#13031), i.e. all allowed options, we could try common tests for random configurations from the allowed space. But then we either need to sample multiple times, or we need to change the randomness each run. We could even have an offline / cron test that checks that the "test" space has enough coverage if we want to get really magical ;) |
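A rough sketch of the random-configuration idea, assuming a hand-written config space (a real one would come from the machine-readable spec proposed in #13031):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ParameterSampler
from sklearn.utils.estimator_checks import check_estimator

# Hypothetical config space for one estimator.
config_space = {
    "solver": ["lbfgs", "liblinear", "newton-cg", "sag", "saga"],
    "fit_intercept": [False, True],
}

# Draw a few random configurations; varying random_state between runs
# would move the sampled region around, as suggested above.
for params in ParameterSampler(config_space, n_iter=3, random_state=0):
    check_estimator(LogisticRegression(**params))
```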
Yes, it's not a great goal, and we could probably check better with some pytest hook verifying that a check_estimator fixture had been applied at least once to each public estimator class.
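A hypothetical `conftest.py` sketch of such a hook (not an existing sklearn mechanism); it records which estimator classes went through a check fixture and reports the gaps at the end of the session:

```python
# conftest.py -- hypothetical coverage hook
import pytest

from sklearn.utils import all_estimators  # sklearn.utils.testing in older versions
from sklearn.utils.estimator_checks import check_estimator

_checked = set()  # estimator classes that passed through the fixture


@pytest.fixture
def run_estimator_checks():
    def _run(estimator):
        _checked.add(type(estimator))
        check_estimator(estimator)

    return _run


def pytest_sessionfinish(session, exitstatus):
    # Report any public estimator class that was never checked; a real
    # implementation would turn this into a session failure.
    missing = {cls for _, cls in all_estimators()} - _checked
    if missing:
        print("Never passed through check_estimator:",
              sorted(cls.__name__ for cls in missing))
```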
There is a dynamic way of detecting this, by running e.g.:

```python
>>> from sklearn.tests.test_common_non_default import detect_all_params
>>> from sklearn.linear_model import LogisticRegression
>>> detect_all_params(LogisticRegression)
{'solver': ['lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga'],
 'multi_class': ['auto', 'multinomial', 'ovr'],
 'fit_intercept': [False, True],
 'dual': [False, True]}
```
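One way such a probe could plausibly work is to try candidate values for each parameter and keep those that fit without raising. A hedged sketch under that assumption (the candidate pool is hand-written here, and the actual `detect_all_params` above may well work differently):

```python
import numpy as np

# Hand-written candidate pool -- an assumption for this sketch.
CANDIDATES = {
    "solver": ["lbfgs", "liblinear", "newton-cg", "sag", "saga"],
    "multi_class": ["auto", "multinomial", "ovr"],
    "fit_intercept": [False, True],
    "dual": [False, True],
}


def _fits(estimator_cls, params, X, y):
    try:
        estimator_cls(**params).fit(X, y)
        return True
    except Exception:
        return False


def detect_supported_params(estimator_cls):
    X = np.random.RandomState(0).rand(20, 3)
    y = np.array([0, 1] * 10)
    supported = {}
    default_params = estimator_cls().get_params()
    for name, values in CANDIDATES.items():
        if name not in default_params:
            continue
        ok = [v for v in values if _fits(estimator_cls, {name: v}, X, y)]
        if ok:
            supported[name] = ok
    return supported
```

Note that probing one parameter at a time against the defaults misses values that are only valid in combination (e.g. `dual=True` requires `solver='liblinear'` in `LogisticRegression`), so a real implementation would have to probe combinations as well.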
It would be useful to have some mechanism for determining the supported solvers for a given estimator. First, because currently `check_estimator` only runs on the default solver, so we are potentially not testing a number of configurations. It would also make it easier to check that all solvers yield comparable results, as in #13914. Of course, this can also be generalized to other parameters that impact the solver but should not change the way the estimator behaves.
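For concreteness, a hedged sketch of such a solver-equivalence check in the spirit of #13914 (the tolerance is an assumption, not sklearn policy):

```python
from numpy.testing import assert_allclose
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, random_state=0)

# fit_intercept=False because liblinear penalizes the intercept, which
# would otherwise make its solution legitimately different.
solvers = ["lbfgs", "liblinear", "newton-cg", "sag", "saga"]
coef = {s: LogisticRegression(solver=s, fit_intercept=False,
                              max_iter=5000, tol=1e-8).fit(X, y).coef_
        for s in solvers}

for s in solvers[1:]:
    assert_allclose(coef[s], coef["lbfgs"], rtol=1e-3,
                    err_msg=f"solver={s} disagrees with lbfgs")
```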
One way could be just to store e.g. `_supported_solvers` as a private attribute, or possibly in type annotations (though this would require PEP 586 from Python 3.8, which can be backported by vendoring `typing_extensions.py`), related to #11170.

Another way could be to have some method that yields possible estimator variants to be tested. @amueller, if I remember correctly, you mentioned something similar in the estimator tags PR.
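A sketch of the type-annotation variant, assuming a PEP 586 `Literal` and a stand-in estimator (this is not sklearn's actual `LogisticRegression`):

```python
from typing_extensions import Literal, get_args  # plain `typing` on 3.8+

Solver = Literal["lbfgs", "liblinear", "newton-cg", "sag", "saga"]


class MyEstimator:
    def __init__(self, solver: Solver = "lbfgs"):
        self.solver = solver


# Recover the allowed values from the annotation at runtime:
print(get_args(MyEstimator.__init__.__annotations__["solver"]))
# ('lbfgs', 'liblinear', 'newton-cg', 'sag', 'saga')
```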