
[MRG + 1] Reject regression type targets for classifiers #5084

Closed
wants to merge 10 commits

Conversation

vermouthmjl
Contributor

This PR solves #5060

            'multilabel-indicator', 'multilabel-sequences']:
        if not y_type is 'continuous':
            raise ValueError("Unknown label type: %r" % y)
        elif len(np.unique(y)) > 2:
Member

Why that?

Contributor Author

When y is centered (in a test for RandomizedLogisticRegression), [0, 1] becomes [-0.5, 0.5] and is therefore detected as continuous.
This is a bit arbitrary... Maybe it's better to change that test.

Member

That seems like a bug in randomized logistic regression.
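For illustration, here is roughly how type_of_target from sklearn.utils.multiclass reacts to the centered labels described above (a sketch added for clarity, not part of the PR):

    import numpy as np
    from sklearn.utils.multiclass import type_of_target

    y = np.array([0, 1, 0, 1], dtype=float)
    print(type_of_target(y))             # 'binary'
    print(type_of_target(y - y.mean()))  # 'continuous': centering yields [-0.5, 0.5]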

@amueller
Member

amueller commented Aug 4, 2015

Looks good apart from my minor nitpicks.
I'm wondering if we should include this check in check_X_y (if y is not supposed to be numeric) but I'm not sure if this is a good idea.

@vermouthmjl
Contributor Author

OK, I added a test for assert_non_regression_targets and changed the RandomizedLogisticRegression test so that there is no longer a need to handle the two float-valued targets.

@@ -417,6 +418,8 @@ def _set_oob_score(self, X, y):
        self.oob_score_ = oob_score / self.n_outputs_

    def _validate_y_class_weight(self, y):
        assert_non_regression_targets(y)
Member

The name assert_* makes it sound like a unit test utility function.

Member

True

Contributor Author

I thought about is_non_regression_targets, but the function is supposed to do nothing at all when the targets are not regression targets and to raise an exception otherwise, so returning a bool seemed a little superfluous.

Member

You could do check_non_regression_target? On second thought, as it raises an exception, assert seems ok.

Member

check functions seem to raise exceptions too, specifically check_array. I like check_non_regression_target better than assert_non_regression_target, personally.

Contributor Author

OK. I changed that into check_non_regression_targets.
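A minimal sketch of what a validation helper along these lines might look like; the name follows the discussion, but the exact set of accepted target types is an assumption and the merged implementation may differ:

    from sklearn.utils.multiclass import type_of_target

    def check_non_regression_targets(y):
        # Sketch: reject targets whose detected type looks like a regression problem.
        y_type = type_of_target(y)
        if y_type not in ('binary', 'multiclass', 'multiclass-multioutput',
                          'multilabel-indicator', 'multilabel-sequences'):
            raise ValueError("Unknown label type: %r" % y_type)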

@mblondel
Member

mblondel commented Aug 5, 2015

@amueller
Member

amueller commented Aug 5, 2015

The first one is only done when y.dtype is float, so I'd blame the user ;)
The np.unique call is done again afterwards in the classifier, so it is not really expensive compared to what happens next. We could avoid duplicate work by replicating the float test in the new helper instead of calling type_of_target.

@amueller
Member

amueller commented Aug 5, 2015

We could add an is_regression_target and use that in the classifiers and in type_of_target maybe?
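A possible shape for such a helper, replicating the float test mentioned above rather than calling type_of_target (a hypothetical sketch, not code from the PR):

    import numpy as np

    def is_regression_target(y):
        # Hypothetical: treat float-valued targets with non-integral values
        # as regression targets.
        y = np.asarray(y)
        return y.dtype.kind == 'f' and bool(np.any(y != y.astype(int)))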

@jayflo

jayflo commented Aug 5, 2015

@vermouthmjl: are you completing #4976 as well?

@amueller
Member

amueller commented Sep 9, 2015

What is the status here? Can you please rebase?

@amueller
Member

amueller commented Sep 9, 2015

It looks like it is good to go. @ogrisel any opinions?

@amueller amueller added this to the 0.17 milestone Sep 9, 2015
@amueller amueller changed the title Reject regression type targets for classifiers [MRG + 1] Reject regression type targets for classifiers Sep 11, 2015
@@ -158,6 +158,88 @@ def is_multilabel(y):
            _is_integral_float(labels))


# def is_sequence_of_sequences(y):
Member

Why is this commented-out function added?

@amueller
Member

Hm, I just realized we should have a common test for this. Could you please add one in estimator_checks?

@vermouthmjl
Contributor Author

@amueller I think I already did that by adding check_classifiers_regression_target in estimator_checks. Maybe I misunderstood what you mean by common test?

@amueller
Member

@vermouthmjl sorry, you're good. I was a bit sleep deprived.
@ogrisel merge?
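For context, a common test along the lines of check_classifiers_regression_target could look roughly as follows; this body is a sketch and does not reproduce the actual estimator_checks code:

    import numpy as np

    def check_classifiers_regression_target(name, estimator):
        # Sketch: fitting a classifier on continuous (regression-style)
        # targets should raise a ValueError.
        rng = np.random.RandomState(0)
        X = rng.normal(size=(20, 4))
        y = rng.normal(size=20)
        try:
            estimator.fit(X, y)
        except ValueError:
            return
        raise AssertionError("%s did not reject continuous targets" % name)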


-        X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
+        X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
Member

cosmetics: trailing spaces

@MechCoder
Member

Is it clearer to move the check_non_regression_targets call for every fit into ClassifierMixin.fit, or does that make the code look more complicated (because one has to call the superclass fit every time)?

@ogrisel
Member

ogrisel commented Oct 8, 2015

Is it clearer to move the check_non_regression_targets call for every fit into ClassifierMixin.fit, or does that make the code look more complicated (because one has to call the superclass fit every time)?

I don't think a Mixin class should implement fit just to do input checks without performing any actual data fitting, and then expect the concrete classes to override it to do the fitting.

I find it simpler and more explicit to call input check functions explicitly in the fit method of the concrete class.
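As a sketch of that pattern (the classifier name is illustrative, and the helper call assumes the check function discussed in this PR):

    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.utils import check_X_y

    class SomeClassifier(BaseEstimator, ClassifierMixin):
        def fit(self, X, y):
            # Input checks are called explicitly at the top of the concrete fit.
            X, y = check_X_y(X, y)
            check_non_regression_targets(y)  # hypothetical helper from this PR
            # ... actual fitting happens here ...
            return self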

@MechCoder
Member

Makes sense. LGTM as well (apart from inline comments).

Maybe you can rebase and merge yourself (if this is important for the release)?

@vermouthmjl
Contributor Author

I rebased. @amueller

@amueller
Member

Squashed and merged as 85223b9. I think it will be good to have this in the RC.

@amueller amueller closed this Oct 14, 2015