Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH Binary only estimator checks for classification #13875

Merged
merged 9 commits into from Jun 12, 2019

Conversation

trevorstephens
Copy link
Contributor

@trevorstephens trevorstephens commented May 14, 2019

Reference Issues/PRs

See also #6715 (comment)

What does this implement/fix? Explain your changes.

I don't think this closes the reference issue entirely, but might help it out in some cases such as mine. I maintain gplearn, a niche package that implements genetic programming with a scikit-learn API. I try to stick to scikit-learn standards and be "compatible" as much as possible, but the estimator checks will not pass due to my classifier currently only supporting binary classification as a first-pass MVP. Due to these requirements I currently have thousands of lines of re-written test code in my project that I'd love to lose. I don't think that multiclass should be a requirement to be a scikit-learn compatible estimator as an external package, though open to being challenged on that front.

Changes to the test suite re-frame tests that have multiple classes to be binary if the "binary_only" flag exists within the more_tags attribute of the classifier.

Any other comments?

Need to write a test to ensure future tests respect this flag, will remove WIP from title when ready. Happy to chat before then :-)

@trevorstephens trevorstephens changed the title [WIP] Binary only classification checks with estimator_checks [MRG] Binary only classification checks with estimator_checks May 15, 2019
@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented May 15, 2019

@jnothman got any thoughts on this one?

@NicolasHug
Copy link
Member

@NicolasHug NicolasHug commented May 16, 2019

This is worth supporting, but I'm a bit concerned about the implementation. Everytime a check is added, we need to remember the existence of this tag and generate the data accordingly.

Not sure what would be a better alternative though :/

@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented May 16, 2019

The test I added would fail if the data is not generated to support the tag @NicolasHug so any new estimator checks would fail if they don't cover the binary case

@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented May 16, 2019

I could add some comments explaining what it is for to the test case or the binary DTC estimator if you think that would help?

Copy link
Member

@NicolasHug NicolasHug left a comment

Oh I missed the check, sorry. It's pretty clear what it does ;)

I commented a few nits, but since this doesn't seem to introduce much changes I'm OK with it.

sklearn/utils/tests/test_estimator_checks.py Outdated Show resolved Hide resolved
sklearn/utils/tests/test_estimator_checks.py Outdated Show resolved Hide resolved
sklearn/utils/estimator_checks.py Outdated Show resolved Hide resolved
sklearn/utils/estimator_checks.py Outdated Show resolved Hide resolved
@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented May 17, 2019

Thanks for the review @NicolasHug , I'll make suggested changes tomorrow

Copy link
Member

@rth rth left a comment

Looks good. I'm wondering whether we want to add a check that ensures an exception is raised when this tag is used on an estimator that supports milti-class classification (to make sure this tag is not misused). Though I think we don't do this for other tags, so it is probably not necessary.

@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented May 19, 2019

@rth I think it would look kinda strange if someone tagged an estimator as binary only when it isn't ... But I can add a check if you want.

So, check if the tag is there, and then assert an exception is raised when fitting multi-class if it is present? Would a specific type of exception be expected?

@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented May 19, 2019

@NicolasHug I have addressed your comments. The Windows failure doesn't appear to be related to these changes.

@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented May 21, 2019

@rth @NicolasHug .. bump :-)

Looks like that test failure was solved by another PR.

Copy link
Member

@NicolasHug NicolasHug left a comment

small suggestion but LGTM.

sklearn/utils/tests/test_estimator_checks.py Outdated Show resolved Hide resolved
@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented Jun 2, 2019

Anything else?

rth
rth approved these changes Jun 12, 2019
Copy link
Member

@rth rth left a comment

Thanks @trevorstephens !

@rth rth changed the title [MRG] Binary only classification checks with estimator_checks ENH Binary only estimator checks for classification Jun 12, 2019
@rth rth merged commit d84a8d1 into scikit-learn:master Jun 12, 2019
16 checks passed
@trevorstephens trevorstephens deleted the binary-only-checks branch Jun 12, 2019
@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented Jun 12, 2019

Cheers @rth 👍

Copy link
Member

@jnothman jnothman left a comment

Argh. I wrote these comments but apparently forgot to submit them. This also lacks a what's new entry

@@ -1550,6 +1550,10 @@ poor_score
multioutput_only
whether estimator supports only multi-output classification or regression.

binary_only
whether estimator supports binary classification but lacks multi-class
classification support.
Copy link
Member

@jnothman jnothman Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it may still support multilabel

Copy link
Contributor Author

@trevorstephens trevorstephens Jun 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess just following the tag logic this is true since an estimator has to "opt in" to multilabel. I don't think there are any multilabel tests though 😕

sklearn/utils/estimator_checks.py Show resolved Hide resolved
@rth
Copy link
Member

@rth rth commented Jun 12, 2019

This also lacks a what's new entry

I was wondering about that but estimator tags is an experimental and technically private feature, that may see some evolution in the near future (e.g. #14069) do we really want to write what's new each time we change something there? Also it updates the docs, so I thought it might be sufficient.

@jnothman For your other comments, I agree. I can push a fix to master (including a what's new if you still think it's useful).

@jnothman
Copy link
Member

@jnothman jnothman commented Jun 12, 2019

@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented Jun 12, 2019

Yes, I'll add this feature it to my package after the next release @jnothman ... I didn't think the change was major enough to warrant a what's new, but happy to add.

I'm fine to do another PR, or @rth can use super-powers to do it faster I'm sure :-) I'm not sure about your tags comment though. That pattern is all over this file, see my comments above.

@rth
Copy link
Member

@rth rth commented Jun 12, 2019

OK so the patch with the what's new is,

diff --git a/doc/whats_new/v0.22.rst b/doc/whats_new/v0.22.rst
index e998294e6..1be125e3c 100644
--- a/doc/whats_new/v0.22.rst
+++ b/doc/whats_new/v0.22.rst
@@ -113,3 +113,7 @@ These changes mostly affect library developers.
   ``transform`` is called before ``fit``; previously an ``AttributeError`` or
   ``ValueError`` was acceptable.
   :pr:`13013` by by :user:`Agamemnon Krasoulis <agamemnonc>`.
+
+- |Enhancement| Binary only classifiers are now supported in estimator checks.
+  Such classifiers need to have the `binary_only=True` estimator tag.
+  :pr:`13875` by `Trevor Stephens`_.

@trevorstephens does that work for you? Or if you want to change anything, probably better to make a PR indeed :)

+1 for keeping the doc as it, since that's indeed the general logic in that file.

@trevorstephens
Copy link
Contributor Author

@trevorstephens trevorstephens commented Jun 12, 2019

Sure @rth 👍

@amueller
Copy link
Member

@amueller amueller commented Jun 12, 2019

awesome! Thanks!

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this issue Jul 12, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this issue Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants