Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

check_estimator is not sufficiently general #6715

Open
jakevdp opened this issue Apr 25, 2016 · 25 comments

Comments

@jakevdp
Copy link
Member

@jakevdp jakevdp commented Apr 25, 2016

I've been writing a lot of scikit-learn compatible code lately, and I love the idea of the general checks in check_estimator to make sure that my code is scikit-learn compatible. But in nearly all cases, I'm finding that these checks are not general enough for the objects I'm creating (for example: MSTClutering, which fails because it doesn't support specification of the number of clusters through an n_clusters argument)

Digging through the code, there are a lot of hard-coded special-cases of estimators within scikit-learn itself; this would imply that absent those special-cases scikit-learn's own estimators would not pass the general estimator checks, which is obviously a huge issue.

Making these checks significantly general would be hard; I imagine it would be a rather large project, and probably even involve designing an API so that estimators themselves can tune the checks that are run on them.

In the meantime, I think it would be better for us not to recommend people running this code on their own estimators, or to let them know that failing estimator checks do not necessarily imply non-compatibility.

@mblondel

This comment has been minimized.

Copy link
Member

@mblondel mblondel commented Apr 26, 2016

Agreed that the estimator checks are not completely ready yet.

and probably even involve designing an API so that estimators themselves can tune the checks that are run on them

The plan is to address this using estimator tags
#6599

@jnothman

This comment has been minimized.

Copy link
Member

@jnothman jnothman commented Apr 26, 2016

The point is, perhaps, to make clearer at scikit-learn-contrib that
estimator_checks is a work in progress. That's why I suggested we note at
scikit-learn-contrib that necessary failures should be reported as
scikit-learn issues.

On 26 April 2016 at 19:26, Mathieu Blondel notifications@github.com wrote:

Agreed that the estimator checks are not completely ready yet.

and probably even involve designing an API so that estimators themselves
can tune the checks that are run on them

The plan is to address this using estimator tags
#6599 #6599


You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub
#6715 (comment)

@jnothman jnothman changed the title estimator_checks are not sufficiently general check_estimator is not sufficiently general Aug 11, 2016
@rth

This comment has been minimized.

Copy link
Member

@rth rth commented Sep 25, 2016

To put some numbers on this issue, at present 41 estimators out of 147 do not pass the check_estimator validation in scikit learn. Out of 41 validations that fail (+4 skipped), 29 are masked in sklearn.utils.testing.DONT_TEST and 2 are private classes (and shouldn't probably be validated anyway). Full output here.

Tested with,

validate_estimators.py

from sklearn.utils.testing import all_estimators
from sklearn.utils.estimator_checks import check_estimator
import pytest


@pytest.mark.parametrize('name,estimator',
          all_estimators(include_meta_estimators=True,
              include_dont_test=True))
def test_estimator(name, estimator):
    check_estimator(estimator)

using py.test -v --tb=no validate_estimators.py on the code in the master branch (PY3, > sklearn-0.18rc2, Linux).

@amueller

This comment has been minimized.

Copy link
Member

@amueller amueller commented Sep 30, 2016

@rth I'd argue you should should do include_meta_estimators=False. On the other hand, that could be another "tag". Though that could be done with inheritance, too.

@amueller

This comment has been minimized.

Copy link
Member

@amueller amueller commented Sep 30, 2016

also, finally a minimal example of pytest parametrized testing that I can understand. That's quite nice - compare against the current mess in check_estimators and test_common.

@rth

This comment has been minimized.

Copy link
Member

@rth rth commented Feb 24, 2019

@qinhanmin2014 I don't think this is fully solved.

Estimator tags do help, but I'm not sure they fully resolve this. #11622 could also be relevant.

@qinhanmin2014

This comment has been minimized.

Copy link
Member

@qinhanmin2014 qinhanmin2014 commented Feb 24, 2019

I'm following amueller, he said Fixes #6599, #7737, #6554, #6715 in the estimator tag PR. Reopen if you disagree. (Honestly I don't have a deep understanding of these issues/PRs)

@jnothman

This comment has been minimized.

Copy link
Member

@jnothman jnothman commented Feb 24, 2019

@rth rth reopened this Feb 25, 2019
@rth

This comment has been minimized.

Copy link
Member

@rth rth commented Feb 25, 2019

Things that remain IMO are,

  • check if there are still some scikit-learn estimators that do not pass check_estimator (cf #6715 (comment)) and if is the case adjust their tags accordingly.
  • some feedback from scikit-learn contrib on using estimator tags would be useful cc @glemaitre
  • in my opinion, a binary outcome from check_estimators is not ideal to use from contrib / third party projects. It's much more realistic to say that N of those tests pass and 2 fail for known reasons (but someone is working on those) than an all or nothing approach. This is for instance what we are doing internally in test_non_meta_estimators instead of calling check_estimator. #11622 is not ideal, but it is a way to address that.
@jnothman

This comment has been minimized.

Copy link
Member

@jnothman jnothman commented Feb 25, 2019

@amueller

This comment has been minimized.

Copy link
Member

@amueller amueller commented Feb 26, 2019

yes, I think the last one can be done with explicit calls for now. The bigger issue is that none of the meta estimators fulfill the API contracts at all #9741.

There's a couple of tags that are missing, like positive and sparse data, and then there's generating 1d data, text data etc in the tests.

I'm not sure I understand @rth's last point. I guess we're not giving individual failures now but we should? That's "just" a matter of yielding the tests, right? Or does pytest allow us to run functions as tests?

@amueller

This comment has been minimized.

Copy link
Member

@amueller amueller commented Feb 26, 2019

(doesn't look like you can run pytests as a function)

@amueller

This comment has been minimized.

Copy link
Member

@amueller amueller commented Feb 26, 2019

We can run pytest.main() and write a hook to overwrite the pytest collection mechanism. That way someone could do check_estimator(MyEstimator) and you'd get the output of the failed and passed and skipped tests as if running pytest.

@trevorstephens

This comment has been minimized.

Copy link
Contributor

@trevorstephens trevorstephens commented Mar 24, 2019

Chiming in! I'm working on an expansion to gplearn for classification and it seems that binary-only classification is an immediate fail for these checks? Is that intentional?

@trevorstephens

This comment has been minimized.

Copy link
Contributor

@trevorstephens trevorstephens commented Mar 24, 2019

In which case, I guess I should either modify the checks, or run checks individually? That's feeling awkward. Should the little winks and nods to sklearn estimators scattered throughout this module be generalised such that independent package maintainers can do something similar without a PR on sklearn?

@trevorstephens

This comment has been minimized.

Copy link
Contributor

@trevorstephens trevorstephens commented Mar 24, 2019

If I change this to allow for something more elaborate than binary classification https://github.com/trevorstephens/gplearn/pull/141/files#diff-e868c30a1191845a36e17737e4611da8R289 I can get a lot further through the test suite ... But eventually fail against a three label problem

@NicolasHug

This comment has been minimized.

Copy link
Contributor

@NicolasHug NicolasHug commented Mar 24, 2019

This is only a hacky workaround @trevorstephens but if that helps, in pygbm we ended up creating our own check_estimator() that skips test that cannot succeed (see here)

@trevorstephens

This comment has been minimized.

Copy link
Contributor

@trevorstephens trevorstephens commented Mar 24, 2019

Thanks @NicolasHug I'll do something similar for now then.

@jnothman

This comment has been minimized.

Copy link
Member

@jnothman jnothman commented Mar 25, 2019

it seems that binary-only classification is required to pass these checks

Could you clarify what you mean by this? I think now that we have "estimator tags" controlling which checks are applicable to which estimators, it would be appropriate to open a separate issue for this, clearly explaining / giving an example for what estimator features / limitations are not catered to.

@trevorstephens

This comment has been minimized.

Copy link
Contributor

@trevorstephens trevorstephens commented Mar 25, 2019

Sorry was typing too fast @jnothman and inverted what I was trying to say... Github comment was corrected to

it seems that binary-only classification is an immediate fail for these checks

Sorry for the confusion. My estimator is purely for binary classification and it fails about half the tests because they assume multi-label classification. I may be a bit behind these vogue new tag things though. Shall research.

@jnothman

This comment has been minimized.

Copy link
Member

@jnothman jnothman commented Mar 25, 2019

@trevorstephens

This comment has been minimized.

Copy link
Contributor

@trevorstephens trevorstephens commented Mar 25, 2019

Fair enough to have standards on scikit-learn's own native estimators, not really wanting to push on that here. It was just very unexpected that support of multi-class is required for passing scikit-learn compatibility checks. Acceptance into the main package as a first-class estimator, and compatibility with scikit-learn, are kinda different things.

@trevorstephens

This comment has been minimized.

Copy link
Contributor

@trevorstephens trevorstephens commented Mar 25, 2019

If there's some appetite to rethink some of these tests to relieve that requirement, and enough core team support, I might be able to help with that PR since it looks like I'm going to be re-writing a few hundred lines of test code for myself anyhow @jnothman :-D

@bladeoflight16

This comment has been minimized.

Copy link

@bladeoflight16 bladeoflight16 commented Nov 7, 2019

Binary classifiers are a fantastic example for consideration. These violate the "normal" expectation that multiclass problems are supported, but they can still be particularly useful for a specific problem. I'm also dealing with trying to create a classifier that has unusual class restrictions (much more unusual than binary), and because check_estimator has no mechanism to allow for me to specify classes that meet them, I can't validate compliance with any of the standards. The fact there's no list I can actually read makes it worse, since (being rather new to scikit-learn) I don't really have any hints as to what they are to even eyeball that I think I'm following them.

I think what's probably really needed here is to identify the requirements according to what functionality an estimator class needs to support. E.g., what's required for use with cross_validate? What's required to support a meta learner like MultiOutputRegressor? Or for ensemble learners like BaggingClassifier? Or piplines? Once the requirements for these different pieces of functionality are identified, the checks can be encoded more cleanly. Separate methods might make sense (like a bare minimum check method, which might need to allow certain behaviors to be filled in via arguments, and then a full blown one for testing models included in scikit-learn, and maybe meta learners get a separate one from regular learns).

To put it another way: it's not just that the method isn't general enough. It just makes way too many assumptions about the use cases a model needs to support to be of any use to a third party. If you want stricter standards inside scikit-learn, that's fine and maybe even great, but the rest of us writing our own bespoked models to deal with unusual situations need something to be able to give us some confidence that it will work with most scikit code requiring a model as input.

@jnothman

This comment has been minimized.

Copy link
Member

@jnothman jnothman commented Nov 7, 2019

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Andy's pets
PR phase
API and interoperability
Design/analysis stage
9 participants
You can’t perform that action at this time.