[MRG + 1] Reject regression type targets for classifiers #5084
'multilabel-indicator', 'multilabel-sequences']:
if not y_type is 'continuous':
raise ValueError("Unknown label type: %r" % y)
elif len(np.unique(y)) > 2:
Why that?
When y is centered (in a test for RandomizedLogisticRegression), [0, 1] becomes [-0.5, 0.5], so the target is detected as continuous.
This is a bit arbitrary... Maybe it's better to change that test.
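To make the centering issue concrete, here is a minimal sketch. `rough_target_type` is a hypothetical, simplified stand-in written for this illustration; it is not scikit-learn's `type_of_target`, and it only assumes that any non-integral float marks a target as continuous.

```python
def rough_target_type(y):
    """Hypothetical, simplified stand-in for a target-type check."""
    # Assumption: any float that is not a whole number means "continuous".
    if any(isinstance(v, float) and not v.is_integer() for v in y):
        return "continuous"
    return "binary" if len(set(y)) <= 2 else "multiclass"


y = [0, 1, 1, 0]
mean = sum(y) / len(y)
y_centered = [v - mean for v in y]  # [-0.5, 0.5, 0.5, -0.5]

print(rough_target_type(y))           # binary
print(rough_target_type(y_centered))  # continuous
```

The centered target still has only two distinct values, which is why a special case for two float values was considered at all.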
That seems like a bug in randomized logistic regression.
Looks good apart from my minor nitpicks.
OK, I added a test for assert_non_regression_targets, and changed the RandomizedLogisticRegression test so that targets with two float values no longer need special treatment.
@@ -417,6 +418,8 @@ def _set_oob_score(self, X, y):
self.oob_score_ = oob_score / self.n_outputs_

def _validate_y_class_weight(self, y):
assert_non_regression_targets(y)
The name assert_* makes it sound like a unit test utility function.
True
I thought about is_non_regression_targets, but the function is supposed to do nothing at all when given non-regression targets, and raise an exception otherwise. So returning a bool seemed a little superfluous.
You could do check_non_regression_target? On second thought, as it raises an exception, assert seems OK.
check functions seem to raise exceptions too, specifically check_array. I like check_non_regression_target better than assert_non_regression_target, personally.
OK. I changed that into check_non_regression_targets.
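For readers following the naming discussion, a check-style validator returns nothing on valid input and raises on invalid input. The sketch below is illustrative only: the body is an assumption (it rejects any non-integral float) and is not the code from this PR; only the function name and the error message text come from the thread and the diff.

```python
def check_non_regression_targets(y):
    """Sketch only: return None for class-like targets, raise otherwise."""
    for value in y:
        # Assumption: a non-integral float signals a regression target.
        if isinstance(value, float) and not value.is_integer():
            raise ValueError("Unknown label type: %r" % (y,))


check_non_regression_targets([0, 1, 1])  # passes silently, returns None

try:
    check_non_regression_targets([0.3, 1.7])
except ValueError as exc:
    message = str(exc)
    print(message)
```

Because the function has no useful return value, callers simply invoke it for its side effect, which matches how check_array-style helpers are used.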
One concern is that https://github.com/vermouthmjl/scikit-learn/blob/type_of_target/sklearn/utils/multiclass.py#L328 |
The first one is only done when |
We could add an |
@vermouthmjl: are you completing #4976 as well?
What is the status here? Can you please rebase?
It looks like it is good to go. @ogrisel any opinions?
6a4d076 to 56d8c52
@@ -158,6 +158,88 @@ def is_multilabel(y):
_is_integral_float(labels))

# def is_sequence_of_sequences(y):
Why is this commented function added in?
Hm I just realize we should have a common test for this. Could you please add one in |
@amueller I think I already did that by adding |
@vermouthmjl sorry, you're good. I was a bit sleep deprived.
X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
cosmetics: trailing spaces
Is it clearer to move the |
I don't think a Mixin class should implement fit just to do input checks, without performing any actual data fitting, and then expect the concrete classes to override it to do the actual fitting. I find it simpler and more explicit to call input check functions directly in the fit method of the concrete class.
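A minimal sketch of the explicit style described here, assuming a toy estimator. `MajorityClassifier` and `check_targets` are hypothetical names invented for this illustration and do not exist in scikit-learn; the point is only that the concrete class calls the check at the top of its own fit.

```python
from collections import Counter


def check_targets(y):
    """Hypothetical validator: raise if y looks like a regression target."""
    if any(isinstance(v, float) and not v.is_integer() for v in y):
        raise ValueError("Unknown label type: %r" % (y,))


class MajorityClassifier:
    """Toy estimator that always predicts the most frequent training label."""

    def fit(self, X, y):
        check_targets(y)  # explicit input check inside the concrete fit
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


clf = MajorityClassifier().fit([[0], [1], [2]], [1, 1, 0])
print(clf.predict([[5], [6]]))  # [1, 1]
```

With this layout there is no inherited fit to override, so a reader of the concrete class sees both the validation and the fitting in one place.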
Makes sense. LGTM as well (apart from inline comments). Maybe you can rebase and merge yourself (if this is important for the release)?
…th two distinct values
0f513bd to 0413fd5
I rebased. @amueller
0413fd5 to d7bf5c7
Squashed and merged as 85223b9. I think it will be good to have this in the RC.
This PR solves #5060