TST check that binary only classifiers fail on multiclass data #29874
adrinjalali merged 19 commits into scikit-learn:main
Do we need a changelog?
I would argue that we should. End users are not impacted directly, but it would be nice to send a signal to third-party developers with an entry.
- def type_of_target(y, input_name=""):
+ def type_of_target(y, input_name="", raise_unknown=False):
I'm tempted to have this always raise. Adding it since we expect estimators to raise an error message, but we don't really have a tool that raises that for the devs.
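To make the trade-off concrete, here is a minimal sketch of what an opt-in `raise_unknown` flag does. `toy_type_of_target` is a hypothetical stand-in written for this illustration, not scikit-learn's implementation; its classification rules are deliberately simplistic, and only the signature and the error message are taken from the diff in this PR.

```python
def toy_type_of_target(y, input_name="", raise_unknown=False):
    """Toy stand-in (an assumption for illustration, NOT scikit-learn's logic)."""
    labels = list(y)
    # Integers or strings: treat as class labels.
    if labels and all(isinstance(v, (int, str)) for v in labels):
        return "binary" if len(set(labels)) <= 2 else "multiclass"
    # Floats: treat as a continuous target.
    if labels and all(isinstance(v, float) for v in labels):
        return "continuous"
    # Anything else is unknown: either raise or return the sentinel string.
    if raise_unknown:
        name = input_name if input_name else "data"
        raise ValueError(f"Unknown label type for {name}: {y!r}")
    return "unknown"


print(toy_type_of_target([0, 1, 2]))       # multiclass
print(toy_type_of_target([None, object]))  # unknown (no error by default)

try:
    toy_type_of_target([None], input_name="y", raise_unknown=True)
except ValueError as exc:
    print(exc)  # Unknown label type for y: [None]
```

With the flag off, callers have to remember to handle the `"unknown"` return value themselves; with it on, the helper produces the error that estimators are expected to raise.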
LinearRegression,
SGDClassifier,
PCA,
ExtraTreesClassifier,
Removing this because it's very slow. Doing so reduces the time of this test on my machine from over 7s to 1.7s.
An alternative is to set the parameters to have few trees and shallow depth.
Having it doesn't really add anything for the purpose of the test anyway. So I'm happy to have it gone.
    " a continuous target is passed and the message should include the word"
    " 'continuous'"
)
msg = "Unknown label type: |continuous"
adding the alternative since the code in TaggedBinaryClassifier in this PR should be considered valid code.
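The pattern `"Unknown label type: |continuous"` is a regular-expression alternation, so a message passes if it contains either `Unknown label type: ` or the word `continuous`. A minimal sketch using only the standard library, assuming the check searches the message with `re.search` (as `pytest.raises(match=...)` does); the example messages are made up for illustration:

```python
import re

# Pattern from the diff above: either alternative may appear in the message.
msg = "Unknown label type: |continuous"

# Hypothetical error messages mapped to whether they should match.
candidates = {
    "Unknown label type: continuous-multioutput": True,
    "This classifier does not support a continuous target.": True,
    "Found input variables with inconsistent numbers of samples": False,
}

# re.search looks for the pattern anywhere in the message.
results = {text: re.search(msg, text) is not None for text in candidates}
for text, matched in results.items():
    print(matched, "-", text)
```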
This is now ready for review.
> An alternative is to set the parameters to have few trees and shallow depth.

Are you referring to this one?
Co-authored-by: Guillaume Lemaitre <guillaume@probabl.ai>
cc @Charlie-XIAO @adam2392 maybe?
…cikit-learn into tests/tag-multiclass-false
err_msg = (
    "When a classifier is passed a continuous target, it should raise a ValueError"
    " with a message containing 'Unknown label type: ' or a message indicating that"
    " a continuous target is passed and the message should include the word"
    " 'continuous'"
)
Is this err_msg unused? I think it's more like a comment.
Thanks for noticing. Yes, it's now shown to the user if the test fails.
sklearn/utils/multiclass.py
Outdated
"""
if raise_unknown:
    input = input_name if input_name else "data"
    raise ValueError(f"Unknown label type for {input}: {repr(y)}")
- raise ValueError(f"Unknown label type for {input}: {repr(y)}")
+ raise ValueError(f"Unknown label type for {input}: {y!r}")
Nitpick, not sure which style people think is better.
Sure, can do, but the first one is much easier for me to understand than the second one, and the second one is also hard to look up if you don't know what it does.
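For readers unfamiliar with the second spelling: in an f-string, the `!r` conversion applies `repr()` to the value before formatting, so the two lines in the suggestion produce the same message. A quick check with a made-up `y` (the variable is named `input_name` here only to avoid shadowing the built-in `input`):

```python
y = ["a", "b", 1.5]  # illustrative value, not from the PR
input_name = "y"

with_call = f"Unknown label type for {input_name}: {repr(y)}"
with_conversion = f"Unknown label type for {input_name}: {y!r}"

# Both spellings build the same string.
assert with_call == with_conversion
print(with_conversion)  # Unknown label type for y: ['a', 'b', 1.5]
```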
Thanks @adrinjalali
Fixes #18005
Checks that if the estimator has `tags.classifier_tags.multi_class=False`, then it actually fails.

I would be happier if we had a better error message telling people how to fix their issue, though. Not sure what to put there.
cc @chkoar @glemaitre
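The idea behind the check described above can be sketched without scikit-learn. Everything below is hypothetical and only illustrates the logic: `ClassifierTags`, the two toy classifiers, the helper name, and the error text are assumptions, not the code added in this PR.

```python
from dataclasses import dataclass


@dataclass
class ClassifierTags:
    # Hypothetical stand-in for the classifier tags mentioned above.
    multi_class: bool = True


class BinaryOnlyClassifier:
    """Toy classifier that is tagged binary-only and enforces it."""

    classifier_tags = ClassifierTags(multi_class=False)

    def fit(self, X, y):
        classes = sorted(set(y))
        if len(classes) > 2:
            raise ValueError(
                f"Only binary classification is supported, got {len(classes)} classes."
            )
        self.classes_ = classes
        return self


class MislabelledClassifier(BinaryOnlyClassifier):
    """Tagged binary-only, but silently accepts any number of classes."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        return self


def fails_on_multiclass_as_tagged(estimator):
    """Return True if the estimator's behaviour is consistent with its tag."""
    if estimator.classifier_tags.multi_class:
        return True  # nothing to verify for multiclass-capable estimators
    X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
    y = [0, 1, 2, 0, 1, 2]  # three classes
    try:
        estimator.fit(X, y)
    except ValueError:
        return True  # failed as the tag promises
    return False  # tag says binary-only, yet fit succeeded


print(fails_on_multiclass_as_tagged(BinaryOnlyClassifier()))   # True
print(fails_on_multiclass_as_tagged(MislabelledClassifier()))  # False
```

A check of this shape catches estimators whose tag and behaviour disagree, which is the signal to third-party developers mentioned in the changelog discussion.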