ENH Binary only estimator checks for classification #13875

trevorstephens · 2019-05-14T12:27:21Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

I don't think this closes the reference issue entirely, but might help it out in some cases such as mine. I maintain gplearn, a niche package that implements genetic programming with a scikit-learn API. I try to stick to scikit-learn standards and be "compatible" as much as possible, but the estimator checks will not pass due to my classifier currently only supporting binary classification as a first-pass MVP. Due to these requirements I currently have thousands of lines of re-written test code in my project that I'd love to lose. I don't think that multiclass should be a requirement to be a scikit-learn compatible estimator as an external package, though open to being challenged on that front.

Changes to the test suite re-frame tests that have multiple classes to be binary if the "binary_only" flag exists within the more_tags attribute of the classifier.
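As a rough illustration of the re-framing described above (a sketch, not the code in this PR): the classifier declares the tag, and a check drops the extra classes before fitting. The `_more_tags` method name is assumed from the estimator-tag convention of the time; `get_tag` and `make_check_data` are hypothetical helpers invented for this example.

```python
class BinaryOnlyClassifier:
    """Toy classifier that declares itself binary-only via an estimator tag."""

    def _more_tags(self):
        # Tag name taken from this PR; the class itself is illustrative only.
        return {"binary_only": True}


def get_tag(estimator, key, default=False):
    """Hypothetical lookup: read `key` from `_more_tags`, else return `default`."""
    more_tags = getattr(estimator, "_more_tags", None)
    tags = more_tags() if callable(more_tags) else {}
    return tags.get(key, default)


def make_check_data(estimator, X, y):
    """Hypothetical helper: keep only two classes for binary_only estimators."""
    if not get_tag(estimator, "binary_only"):
        return X, y
    keep = sorted(set(y))[:2]  # the two smallest labels
    rows = [(xi, yi) for xi, yi in zip(X, y) if yi in keep]
    return [xi for xi, _ in rows], [yi for _, yi in rows]


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 1, 2, 0, 1, 2]
X_bin, y_bin = make_check_data(BinaryOnlyClassifier(), X, y)
```

An untagged estimator gets the multi-class data back unchanged, so existing checks would behave as before for everything else.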

Any other comments?

Need to write a test to ensure future tests respect this flag, will remove WIP from title when ready. Happy to chat before then :-)

trevorstephens · 2019-05-15T09:46:54Z

@jnothman got any thoughts on this one?

NicolasHug · 2019-05-16T16:16:50Z

This is worth supporting, but I'm a bit concerned about the implementation. Every time a check is added, we need to remember the existence of this tag and generate the data accordingly.

Not sure what would be a better alternative though :/

trevorstephens · 2019-05-16T22:06:08Z

The test I added would fail if the data is not generated to support the tag @NicolasHug so any new estimator checks would fail if they don't cover the binary case
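The guard described here could look roughly like the following sketch: a toy estimator that refuses more than two classes is run through every check, so a check that ignores the tag fails loudly. All names below (`ToyBinaryOnlyClassifier`, the two example checks, `failing_checks`) are hypothetical and are not the test added in this PR.

```python
class ToyBinaryOnlyClassifier:
    """Toy estimator that raises as soon as it sees more than two classes."""

    def _more_tags(self):
        return {"binary_only": True}

    def fit(self, X, y):
        classes = sorted(set(y))
        if len(classes) > 2:
            raise ValueError("Only 2 classes are supported, got %d" % len(classes))
        self.classes_ = classes
        return self


def check_respects_tag(estimator):
    """Hypothetical check that drops the third class for binary_only estimators."""
    X, y = [[0], [1], [2], [3]], [0, 1, 2, 1]
    if estimator._more_tags().get("binary_only", False):
        pairs = [(xi, yi) for xi, yi in zip(X, y) if yi != 2]
        X = [xi for xi, _ in pairs]
        y = [yi for _, yi in pairs]
    estimator.fit(X, y)


def check_ignores_tag(estimator):
    """Hypothetical check that forgets about the tag and passes three classes."""
    estimator.fit([[0], [1], [2]], [0, 1, 2])


def failing_checks(estimator_factory, checks):
    """Run each check on a fresh estimator and collect the ones that raise."""
    failed = []
    for check in checks:
        try:
            check(estimator_factory())
        except ValueError:
            failed.append(check.__name__)
    return failed


failed = failing_checks(ToyBinaryOnlyClassifier, [check_respects_tag, check_ignores_tag])
```

Only the check that ignores the tag ends up in `failed`, which is the signal a newly added check would trigger if it generated multi-class data unconditionally.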

trevorstephens · 2019-05-16T22:11:52Z

I could add some comments explaining what it is for to the test case or the binary DTC estimator if you think that would help?

NicolasHug

Oh I missed the check, sorry. It's pretty clear what it does ;)

I commented a few nits, but since this doesn't seem to introduce many changes I'm OK with it.

sklearn/utils/tests/test_estimator_checks.py

sklearn/utils/estimator_checks.py

trevorstephens · 2019-05-17T23:50:33Z

Thanks for the review @NicolasHug , I'll make suggested changes tomorrow

rth

Looks good. I'm wondering whether we want to add a check that ensures an exception is raised when this tag is used on an estimator that supports multi-class classification (to make sure this tag is not misused). Though I think we don't do this for other tags, so it is probably not necessary.

trevorstephens · 2019-05-19T00:55:37Z

@rth I think it would look kinda strange if someone tagged an estimator as binary only when it isn't ... But I can add a check if you want.

So, check if the tag is there, and then assert an exception is raised when fitting multi-class if it is present? Would a specific type of exception be expected?
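A sketch of what such a misuse check might look like, assuming `ValueError` as the expected exception (the thread leaves the exception type open) and the `_more_tags` convention; the check name and both toy classifiers are hypothetical and no such check is part of this PR.

```python
def check_binary_only_rejects_multiclass(estimator):
    """Hypothetical check: a binary_only estimator must raise on 3-class data."""
    more_tags = getattr(estimator, "_more_tags", None)
    tags = more_tags() if callable(more_tags) else {}
    if not tags.get("binary_only", False):
        return  # tag absent: nothing to verify
    X = [[0.0], [1.0], [2.0]]
    y = [0, 1, 2]
    try:
        estimator.fit(X, y)
    except ValueError:  # assumed exception type
        return
    raise AssertionError("binary_only estimator accepted multi-class targets")


class HonestBinaryClassifier:
    """Tagged binary_only and really rejects multi-class targets."""

    def _more_tags(self):
        return {"binary_only": True}

    def fit(self, X, y):
        if len(set(y)) > 2:
            raise ValueError("multi-class targets are not supported")
        return self


class MistaggedClassifier:
    """Tagged binary_only although it happily fits multi-class targets."""

    def _more_tags(self):
        return {"binary_only": True}

    def fit(self, X, y):
        return self
```

Calling the check on `HonestBinaryClassifier()` returns silently, while `MistaggedClassifier()` triggers the `AssertionError`.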

trevorstephens · 2019-05-19T01:20:11Z

@NicolasHug I have addressed your comments. The Windows failure doesn't appear to be related to these changes.

trevorstephens · 2019-05-21T12:37:32Z

@rth @NicolasHug .. bump :-)

Looks like that test failure was solved by another PR.

sklearn/utils/tests/test_estimator_checks.py

NicolasHug

small suggestion but LGTM.

sklearn/utils/tests/test_estimator_checks.py

trevorstephens · 2019-06-02T03:00:53Z

Anything else?

rth

Thanks @trevorstephens !

trevorstephens · 2019-06-12T08:34:05Z

Cheers @rth 👍

jnothman

Argh. I wrote these comments but apparently forgot to submit them. This also lacks a what's new entry

jnothman · 2019-06-11T23:32:31Z

doc/developers/contributing.rst

@@ -1550,6 +1550,10 @@ poor_score
 multioutput_only
    whether estimator supports only multi-output classification or regression.

+binary_only
+    whether estimator supports binary classification but lacks multi-class
+    classification support.


Note that it may still support multilabel

trevorstephens · 2019-06-12

I guess just following the tag logic this is true since an estimator has to "opt in" to multilabel. I don't think there are any multilabel tests though 😕
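The "opt in" logic can be sketched as defaults that an estimator overrides one tag at a time. The dictionary below is an illustrative two-tag subset and `resolve_tags` is a hypothetical helper, not the library's own lookup.

```python
# Illustrative subset of defaults: every capability tag starts out False.
DEFAULT_TAGS = {"binary_only": False, "multilabel": False}


def resolve_tags(estimator):
    """Hypothetical helper: defaults overridden by whatever `_more_tags` returns."""
    tags = dict(DEFAULT_TAGS)
    more_tags = getattr(estimator, "_more_tags", None)
    if callable(more_tags):
        tags.update(more_tags())
    return tags


class BinaryOnly:
    def _more_tags(self):
        return {"binary_only": True}


class BinaryOnlyMultilabel:
    def _more_tags(self):
        # Both tags are set independently; neither implies the other.
        return {"binary_only": True, "multilabel": True}


binary_tags = resolve_tags(BinaryOnly())
both_tags = resolve_tags(BinaryOnlyMultilabel())
```

Under this reading, setting `binary_only` says nothing about `multilabel`, which stays at its default unless the estimator sets it explicitly.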

sklearn/utils/estimator_checks.py

rth · 2019-06-12T08:42:21Z

> This also lacks a what's new entry

I was wondering about that, but estimator tags are an experimental and technically private feature that may see some evolution in the near future (e.g. #14069). Do we really want to write a what's new entry each time we change something there? Also, this PR updates the docs, so I thought that might be sufficient.

@jnothman For your other comments, I agree. I can push a fix to master (including a what's new if you still think it's useful).

jnothman · 2019-06-12T08:45:00Z

I presume Trevor wants to use this feature, so I'd think it's worth reporting in what's new.

trevorstephens · 2019-06-12T08:59:33Z

Yes, I'll add this feature to my package after the next release @jnothman ... I didn't think the change was major enough to warrant a what's new, but happy to add.

I'm fine to do another PR, or @rth can use super-powers to do it faster I'm sure :-) I'm not sure about your tags comment though. That pattern is all over this file, see my comments above.

rth · 2019-06-12T09:01:48Z

OK so the patch with the what's new is,

diff --git a/doc/whats_new/v0.22.rst b/doc/whats_new/v0.22.rst
index e998294e6..1be125e3c 100644
--- a/doc/whats_new/v0.22.rst
+++ b/doc/whats_new/v0.22.rst
@@ -113,3 +113,7 @@ These changes mostly affect library developers.
   ``transform`` is called before ``fit``; previously an ``AttributeError`` or
   ``ValueError`` was acceptable.
   :pr:`13013` by by :user:`Agamemnon Krasoulis <agamemnonc>`.
+
+- |Enhancement| Binary only classifiers are now supported in estimator checks.
+  Such classifiers need to have the `binary_only=True` estimator tag.
+  :pr:`13875` by `Trevor Stephens`_.

@trevorstephens does that work for you? Or if you want to change anything, probably better to make a PR indeed :)

+1 for keeping the doc as is, since that's indeed the general logic in that file.

trevorstephens · 2019-06-12T09:03:14Z

Sure @rth 👍

amueller · 2019-06-12T15:59:16Z

awesome! Thanks!

Also adds the binary_only estimator tag

trevorstephens added 2 commits May 14, 2019 22:08

update tests

4e3fa52

update doc

5895e27

This was referenced May 14, 2019

Remove custom estimator checks when sklearn removes multi-class requirement trevorstephens/gplearn#147

Closed

Rely on sklearns notfittederror trevorstephens/gplearn#162

Closed

add tests, correct one more test

097e887

trevorstephens changed the title ~~[WIP] Binary only classification checks with estimator_checks~~ [MRG] Binary only classification checks with estimator_checks May 15, 2019

batches, though a short bath is better than no bath

95f764b

NicolasHug reviewed May 17, 2019

rth reviewed May 18, 2019

trevorstephens added 2 commits May 19, 2019 10:40

changes for review

2eeee40

remove import

105749b

amueller reviewed May 28, 2019

trevorstephens added 2 commits May 29, 2019 07:51

add failing test

4813dce

allow single class in toy estimator

3e12beb

NicolasHug approved these changes May 29, 2019

inherit untagged estimator

dcc0888

rth approved these changes Jun 12, 2019

rth changed the title ~~[MRG] Binary only classification checks with estimator_checks~~ ENH Binary only estimator checks for classification Jun 12, 2019

rth merged commit d84a8d1 into scikit-learn:master Jun 12, 2019

trevorstephens deleted the binary-only-checks branch June 12, 2019 08:33

jnothman reviewed Jun 12, 2019

rth added a commit that referenced this pull request Jun 12, 2019

DOC Add what's new for binary only estimator checks (#13875)

4c58057

trevorstephens mentioned this pull request Jun 12, 2019

add binary tag trevorstephens/gplearn#178

Merged

rth mentioned this pull request Jun 20, 2019

check_estimator should allow for binary-only classification #6981

Closed

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

ENH Binary only estimator checks for classification (scikit-learn#13875)

2acc55f

Also adds the binary_only estimator tag

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

DOC Add what's new for binary only estimator checks (scikit-learn#13875)

13feade
