ENH Binary only estimator checks for classification #13875

Merged: 9 commits into scikit-learn:master on Jun 12, 2019


@trevorstephens
Contributor

trevorstephens commented May 14, 2019

Reference Issues/PRs

See also #6715 (comment)

What does this implement/fix? Explain your changes.

I don't think this closes the reference issue entirely, but might help it out in some cases such as mine. I maintain gplearn, a niche package that implements genetic programming with a scikit-learn API. I try to stick to scikit-learn standards and be "compatible" as much as possible, but the estimator checks will not pass due to my classifier currently only supporting binary classification as a first-pass MVP. Due to these requirements I currently have thousands of lines of re-written test code in my project that I'd love to lose. I don't think that multiclass should be a requirement to be a scikit-learn compatible estimator as an external package, though open to being challenged on that front.

Changes to the test suite re-frame tests that have multiple classes to be binary if the "binary_only" flag exists within the more_tags attribute of the classifier.
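As a rough illustration of the idea described above, the sketch below shows a classifier that declares itself binary-only and a helper that narrows multi-class data to two classes when that tag is present. The `_more_tags` method name follows the tag convention mentioned in this PR; `BinaryOnlyClassifier` and `restrict_to_binary` are hypothetical names made up for this example, and this is not the code merged into `estimator_checks.py`.

```python
class BinaryOnlyClassifier:
    """Toy classifier (hypothetical) that refuses more than two classes."""

    def _more_tags(self):
        # Tag name taken from the PR description.
        return {"binary_only": True}

    def fit(self, X, y):
        classes = sorted(set(y))
        if len(classes) > 2:
            raise ValueError("Only binary classification is supported.")
        self.classes_ = classes
        return self


def restrict_to_binary(estimator, X, y):
    """Hypothetical helper: keep only the first two classes if tagged binary_only."""
    more_tags = getattr(estimator, "_more_tags", None)
    tags = more_tags() if callable(more_tags) else {}
    if not tags.get("binary_only", False):
        return X, y
    keep = sorted(set(y))[:2]
    pairs = [(xi, yi) for xi, yi in zip(X, y) if yi in keep]
    return [p[0] for p in pairs], [p[1] for p in pairs]


X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 1, 2, 1]
clf = BinaryOnlyClassifier()
X_bin, y_bin = restrict_to_binary(clf, X, y)
clf.fit(X_bin, y_bin)  # would raise ValueError on the unfiltered three-class y
```

An estimator without the tag gets its data back unchanged, so existing multi-class checks are unaffected.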

Any other comments?

Need to write a test to ensure future tests respect this flag, will remove WIP from title when ready. Happy to chat before then :-)

@trevorstephens trevorstephens changed the title [WIP] Binary only classification checks with estimator_checks [MRG] Binary only classification checks with estimator_checks May 15, 2019
@trevorstephens

Contributor Author

trevorstephens commented May 15, 2019

@jnothman got any thoughts on this one?

@NicolasHug

Contributor

NicolasHug commented May 16, 2019

This is worth supporting, but I'm a bit concerned about the implementation. Every time a check is added, we need to remember the existence of this tag and generate the data accordingly.

Not sure what would be a better alternative though :/

@trevorstephens

Contributor Author

trevorstephens commented May 16, 2019

@NicolasHug The test I added would fail if the data is not generated to support the tag, so any new estimator check would fail if it doesn't cover the binary case.
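A minimal sketch of how such a guard can work, assuming a strict test estimator that raises on more than two classes: every check is run against it, and any check that ignores the tag surfaces as a failure. All names here (`StrictBinaryClassifier`, the two toy checks, `run_checks`) are hypothetical and only illustrate the mechanism, not the test added in this PR.

```python
class StrictBinaryClassifier:
    """Hypothetical test estimator: tagged binary_only and strict about it."""

    def _more_tags(self):
        return {"binary_only": True}

    def fit(self, X, y):
        if len(set(y)) > 2:
            raise ValueError("Only binary classification is supported.")
        return self


def check_respecting_tag(estimator):
    # Generates two-class data when the estimator is tagged binary_only.
    X, y = [[0.0], [1.0], [2.0]], [0, 1, 2]
    if estimator._more_tags().get("binary_only", False):
        y = [0, 1, 0]
    estimator.fit(X, y)


def check_ignoring_tag(estimator):
    # Always generates three classes, so it breaks on binary-only estimators.
    estimator.fit([[0.0], [1.0], [2.0]], [0, 1, 2])


def run_checks(estimator, checks):
    """Return the names of checks that fed unsupported data to the estimator."""
    failures = []
    for check in checks:
        try:
            check(estimator)
        except ValueError:
            failures.append(check.__name__)
    return failures


failed = run_checks(StrictBinaryClassifier(),
                    [check_respecting_tag, check_ignoring_tag])
```

Only the check that ignores the tag ends up in `failed`, which is the signal a future contributor would see after adding a check that always generates multi-class data.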

@trevorstephens

Contributor Author

trevorstephens commented May 16, 2019

I could add some comments explaining what it is for to the test case or the binary DTC estimator if you think that would help?

Contributor

NicolasHug left a comment

Oh I missed the check, sorry. It's pretty clear what it does ;)

I commented on a few nits, but since this doesn't seem to introduce many changes I'm OK with it.

sklearn/utils/tests/test_estimator_checks.py (outdated, resolved)
sklearn/utils/tests/test_estimator_checks.py (outdated, resolved)
sklearn/utils/estimator_checks.py (outdated, resolved)
sklearn/utils/estimator_checks.py (outdated, resolved)
@trevorstephens

Contributor Author

trevorstephens commented May 17, 2019

Thanks for the review @NicolasHug, I'll make the suggested changes tomorrow.

Member

rth left a comment

Looks good. I'm wondering whether we want to add a check that ensures an exception is raised when this tag is used on an estimator that supports multi-class classification (to make sure this tag is not misused). Though I think we don't do this for other tags, so it is probably not necessary.

@trevorstephens

Contributor Author

trevorstephens commented May 19, 2019

@rth I think it would look kinda strange if someone tagged an estimator as binary only when it isn't ... But I can add a check if you want.

So, check if the tag is there, and then assert an exception is raised when fitting multi-class if it is present? Would a specific type of exception be expected?
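For illustration only, the check proposed here could look roughly like the sketch below. The thread leaves the exception type open and does not confirm that such a check was added, so `ValueError` is an assumption, and `check_binary_only_rejects_multiclass` and both toy classes are hypothetical names.

```python
class BinaryOnlyClassifier:
    """Hypothetical classifier that is correctly tagged binary_only."""

    def _more_tags(self):
        return {"binary_only": True}

    def fit(self, X, y):
        if len(set(y)) > 2:
            raise ValueError("Only binary classification is supported.")
        return self


class MistaggedClassifier:
    """Hypothetical classifier that claims binary_only but accepts any y."""

    def _more_tags(self):
        return {"binary_only": True}

    def fit(self, X, y):
        return self


def check_binary_only_rejects_multiclass(estimator):
    """Return 'skipped' or 'passed'; raise AssertionError if the tag is misused."""
    more_tags = getattr(estimator, "_more_tags", None)
    tags = more_tags() if callable(more_tags) else {}
    if not tags.get("binary_only", False):
        return "skipped"
    X, y = [[0.0], [1.0], [2.0]], [0, 1, 2]  # three classes on purpose
    try:
        estimator.fit(X, y)
    except ValueError:  # assumed exception type
        return "passed"
    raise AssertionError("estimator tagged binary_only accepted three classes")


result = check_binary_only_rejects_multiclass(BinaryOnlyClassifier())
```

Calling the same function on `MistaggedClassifier()` raises `AssertionError`, which is the misuse this kind of check would be meant to catch.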

@trevorstephens

Contributor Author

trevorstephens commented May 19, 2019

@NicolasHug I have addressed your comments. The Windows failure doesn't appear to be related to these changes.

@trevorstephens

Contributor Author

trevorstephens commented May 21, 2019

@rth @NicolasHug .. bump :-)

Looks like that test failure was solved by another PR.

Contributor

NicolasHug left a comment

Small suggestion, but LGTM.

sklearn/utils/tests/test_estimator_checks.py (outdated, resolved)
@trevorstephens

Contributor Author

trevorstephens commented Jun 2, 2019

Anything else?

@rth
rth approved these changes Jun 12, 2019
Member

rth left a comment

Thanks @trevorstephens !

@rth rth changed the title [MRG] Binary only classification checks with estimator_checks ENH Binary only estimator checks for classification Jun 12, 2019
@rth rth merged commit d84a8d1 into scikit-learn:master Jun 12, 2019
16 checks passed
LGTM analysis: C/C++ No code changes detected
LGTM analysis: JavaScript No code changes detected
LGTM analysis: Python No new or fixed alerts
ci/circleci: deploy Your tests passed on CircleCI!
ci/circleci: doc Your tests passed on CircleCI!
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
ci/circleci: lint Your tests passed on CircleCI!
codecov/patch 98.52% of diff hit (target 96.27%)
codecov/project 96.8% (+0.53%) compared to 88846b3
scikit-learn.scikit-learn Build #20190529.29 succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
scikit-learn.scikit-learn (Linux py35_np_atlas) Linux py35_np_atlas succeeded
scikit-learn.scikit-learn (Linux pylatest_conda) Linux pylatest_conda succeeded
scikit-learn.scikit-learn (Windows py35_32) Windows py35_32 succeeded
scikit-learn.scikit-learn (Windows py37_64) Windows py37_64 succeeded
scikit-learn.scikit-learn (macOS pylatest_conda) macOS pylatest_conda succeeded
@trevorstephens trevorstephens deleted the trevorstephens:binary-only-checks branch Jun 12, 2019
@trevorstephens

Contributor Author

trevorstephens commented Jun 12, 2019

Cheers @rth 👍

Member

jnothman left a comment

Argh. I wrote these comments but apparently forgot to submit them. This also lacks a what's new entry.

@@ -1550,6 +1550,10 @@ poor_score
multioutput_only
whether estimator supports only multi-output classification or regression.

binary_only
whether estimator supports binary classification but lacks multi-class
classification support.

@jnothman

jnothman Jun 12, 2019

Member

Note that it may still support multilabel

@trevorstephens

trevorstephens Jun 12, 2019

Author Contributor

I guess just following the tag logic this is true since an estimator has to "opt in" to multilabel. I don't think there are any multilabel tests though 😕
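The opt-in behaviour described here can be sketched as a defaults dictionary that each estimator overrides selectively. This is a simplified stand-in for scikit-learn's tag resolution, not its actual implementation: `DEFAULT_TAGS` and `get_tags` are hypothetical names, and only the two tags discussed in this thread are modelled.

```python
# Simplified, hypothetical defaults: every capability is off unless declared.
DEFAULT_TAGS = {"binary_only": False, "multilabel": False}


def get_tags(estimator):
    """Merge an estimator's declared tags over the defaults."""
    tags = dict(DEFAULT_TAGS)
    more_tags = getattr(estimator, "_more_tags", None)
    if callable(more_tags):
        tags.update(more_tags())
    return tags


class BinaryOnly:
    def _more_tags(self):
        return {"binary_only": True}


class BinaryAndMultilabel:
    def _more_tags(self):
        # Multilabel support must be declared explicitly (opt in).
        return {"binary_only": True, "multilabel": True}


plain = get_tags(BinaryOnly())
both = get_tags(BinaryAndMultilabel())
```

Under this model, setting `binary_only` says nothing about multilabel support either way; an estimator is treated as multilabel-capable only when it sets that tag itself.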

sklearn/utils/estimator_checks.py (resolved)
@rth

Member

rth commented Jun 12, 2019

> This also lacks a what's new entry

I was wondering about that, but estimator tags are an experimental and technically private feature that may see some evolution in the near future (e.g. #14069). Do we really want to write a what's new entry each time we change something there? Also, this PR updates the docs, so I thought that might be sufficient.

@jnothman For your other comments, I agree. I can push a fix to master (including a what's new if you still think it's useful).

@jnothman

Member

jnothman commented Jun 12, 2019

@trevorstephens

Contributor Author

trevorstephens commented Jun 12, 2019

Yes, I'll add this feature to my package after the next release @jnothman ... I didn't think the change was major enough to warrant a what's new entry, but I'm happy to add one.

I'm fine to do another PR, or @rth can use super-powers to do it faster I'm sure :-) I'm not sure about your tags comment though. That pattern is all over this file, see my comments above.

@rth

Member

rth commented Jun 12, 2019

OK so the patch with the what's new is,

diff --git a/doc/whats_new/v0.22.rst b/doc/whats_new/v0.22.rst
index e998294e6..1be125e3c 100644
--- a/doc/whats_new/v0.22.rst
+++ b/doc/whats_new/v0.22.rst
@@ -113,3 +113,7 @@ These changes mostly affect library developers.
   ``transform`` is called before ``fit``; previously an ``AttributeError`` or
   ``ValueError`` was acceptable.
   :pr:`13013` by by :user:`Agamemnon Krasoulis <agamemnonc>`.
+
+- |Enhancement| Binary only classifiers are now supported in estimator checks.
+  Such classifiers need to have the `binary_only=True` estimator tag.
+  :pr:`13875` by `Trevor Stephens`_.

@trevorstephens does that work for you? Or if you want to change anything, probably better to make a PR indeed :)

+1 for keeping the doc as is, since that's indeed the general logic in that file.

@trevorstephens

Contributor Author

trevorstephens commented Jun 12, 2019

Sure @rth 👍

rth added a commit that referenced this pull request Jun 12, 2019
@trevorstephens trevorstephens mentioned this pull request Jun 12, 2019
@amueller

Member

amueller commented Jun 12, 2019

awesome! Thanks!

koenvandevelde added a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
koenvandevelde added a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019