Skip to content

TST check that binary only classifiers fail on multiclass data#29874

Merged
adrinjalali merged 19 commits intoscikit-learn:mainfrom
adrinjalali:tests/tag-multiclass-false
Oct 18, 2024
Merged

TST check that binary only classifiers fail on multiclass data#29874
adrinjalali merged 19 commits intoscikit-learn:mainfrom
adrinjalali:tests/tag-multiclass-false

Conversation

@adrinjalali
Copy link
Member

Fixes #18005

Checks that if the estimator has tags.classifier_tags.multi_class=False, then it actually fails.

I would be happier if we had a better error message telling people how to fix their issue though. Not sure what to put there.

cc @chkoar @glemaitre

@adrinjalali adrinjalali added the Developer API Third party developer API related label Sep 17, 2024
@adrinjalali
Copy link
Member Author

Do we need a changelog?

@github-actions
Copy link

github-actions bot commented Sep 17, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0f4289b. Link to the linter CI: here

@glemaitre
Copy link
Member

Do we need a changelog?

I would argue that we should. End-user are not impacted directly but it would be nice to send a signal with an entry to third-party developers.



def type_of_target(y, input_name=""):
def type_of_target(y, input_name="", raise_unknown=False):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tempted to have this always raise. Adding it since we expect estimators to raise an error message, but we don't really have a tool that raises that for the devs.

LinearRegression,
SGDClassifier,
PCA,
ExtraTreesClassifier,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removing this cause it's very slow. Removing this reduces the time of this test on my machine from over 7s to 1.7s

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is to set the parameters to have few trees and shallow depth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it doesn't really add anything for the purpose of the test anyway. So I'm happy to have it gone.

" a continuous target is passed and the message should include the word"
" 'continuous'"
)
msg = "Unknown label type: |continuous"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding the alternative since the code in TaggedBinaryClassifier in this PR should be considered valid code.

@adrinjalali
Copy link
Member Author

This is now ready for review.

@glemaitre glemaitre self-requested a review September 28, 2024 11:38
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

LinearRegression,
SGDClassifier,
PCA,
ExtraTreesClassifier,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is to set the parameters to have few trees and shallow depth.

@chkoar
Copy link
Contributor

chkoar commented Oct 11, 2024

I would be happier if we had a better error message telling people how to fix their issue though. Not sure what to put there.

Are you referring to this one?

Copy link
Member Author

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to this one?

@chkoar generally I'm working on improving all those messages.

LinearRegression,
SGDClassifier,
PCA,
ExtraTreesClassifier,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having it doesn't really add anything for the purpose of the test anyway. So I'm happy to have it gone.

adrinjalali and others added 2 commits October 15, 2024 13:09
adrinjalali and others added 2 commits October 16, 2024 17:29
@adrinjalali
Copy link
Member Author

cc @Charlie-XIAO @adam2392 maybe?

Copy link
Member

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except a few nits.

Copy link
Contributor

@Charlie-XIAO Charlie-XIAO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM! Just a few more nitpicks:

Comment on lines +3646 to +3651
err_msg = (
"When a classifier is passed a continuous target, it should raise a ValueError"
" with a message containing 'Unknown label type: ' or a message indicating that"
" a continuous target is passed and the message should include the word"
" 'continuous'"
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this err_msg unused? I think it's more like a comment.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for noticing, yeah now it's shown to the user if the test fails.

"""
if raise_unknown:
input = input_name if input_name else "data"
raise ValueError(f"Unknown label type for {input}: {repr(y)}")
Copy link
Contributor

@Charlie-XIAO Charlie-XIAO Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
raise ValueError(f"Unknown label type for {input}: {repr(y)}")
raise ValueError(f"Unknown label type for {input}: {y!r}")

Nitpick, not sure which style people think is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, can do, but the first one is much easier to understand to me than the second one, and the second one is also hard to look up for, if you don't know what it does.

@adrinjalali adrinjalali enabled auto-merge (squash) October 18, 2024 08:20
@adrinjalali adrinjalali merged commit 18dc863 into scikit-learn:main Oct 18, 2024
@chkoar
Copy link
Contributor

chkoar commented Oct 18, 2024

Thanks @adrinjalali

@adrinjalali adrinjalali deleted the tests/tag-multiclass-false branch October 18, 2024 09:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Developer API Third party developer API related module:utils

Projects

None yet

Development

Successfully merging this pull request may close these issues.

check_estimator and binary_only tag

5 participants