
[MRG + 1] Reject regression type targets for classifiers #5084

Closed
wants to merge 10 commits

Conversation

vermouthmjl
Contributor

This PR solves #5060

            'multilabel-indicator', 'multilabel-sequences']:
        if not y_type is 'continuous':
            raise ValueError("Unknown label type: %r" % y)
        elif len(np.unique(y)) > 2:
Member

Why that?

Contributor Author

When y is centered (in a test for RandomizedLogisticRegression), [0, 1] becomes [-0.5, 0.5] and is therefore detected as continuous.
This is a bit arbitrary... Maybe it's better to change that test.

Member

That seems like a bug in randomized logistic regression.
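For illustration, here is roughly how type_of_target from sklearn.utils.multiclass reacts to the centered labels described above (a sketch added for clarity, not part of the PR):

    import numpy as np
    from sklearn.utils.multiclass import type_of_target

    y = np.array([0, 1, 0, 1], dtype=float)
    print(type_of_target(y))             # 'binary'
    print(type_of_target(y - y.mean()))  # 'continuous': centering yields [-0.5, 0.5]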

@amueller
Member

amueller commented Aug 4, 2015

Looks good apart from my minor nitpicks.
I'm wondering if we should include this check in check_X_y (if y is not supposed to be numeric) but I'm not sure if this is a good idea.

@vermouthmjl
Contributor Author

OK, I added a test for assert_non_regression_targets and changed the RandomizedLogisticRegression test so that there is no longer a need to handle the two float-valued targets.

@@ -417,6 +418,8 @@ def _set_oob_score(self, X, y):
        self.oob_score_ = oob_score / self.n_outputs_

    def _validate_y_class_weight(self, y):
        assert_non_regression_targets(y)
Member

The name assert_* makes it sound like a unit test utility function.

Member

True

Contributor Author

I thought about is_non_regression_targets, but the function is supposed to do nothing at all when the targets are not regression targets and to raise an exception otherwise, so returning a bool seemed a little superfluous.

Member

You could do check_non_regression_target? On second thought, as it raises an exception, assert seems ok.

Member

check functions seem to raise exceptions too, specifically check_array. I like check_non_regression_target better than assert_non_regression_target, personally.

Contributor Author

OK. I changed that into check_non_regression_targets.
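A minimal sketch of what a validation helper along these lines might look like; the name follows the discussion, but the exact set of accepted target types is an assumption and the merged implementation may differ:

    from sklearn.utils.multiclass import type_of_target

    def check_non_regression_targets(y):
        # Sketch: reject targets whose detected type looks like a regression problem.
        y_type = type_of_target(y)
        if y_type not in ('binary', 'multiclass', 'multiclass-multioutput',
                          'multilabel-indicator', 'multilabel-sequences'):
            raise ValueError("Unknown label type: %r" % y_type)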

@mblondel
Member

mblondel commented Aug 5, 2015

@amueller
Member

amueller commented Aug 5, 2015

The first one is only done when y.dtype is float, so I'd blame the user ;)
The np.unique call is done again afterwards in the classifier, so it is not really expensive compared to what happens next. We could avoid duplicate work by replicating the float test in the new helper instead of calling type_of_target.

@amueller
Member

amueller commented Aug 5, 2015

We could add an is_regression_target and use that in the classifiers and in type_of_target maybe?
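A possible shape for such a helper, replicating the float test mentioned above rather than calling type_of_target (a hypothetical sketch, not code from the PR):

    import numpy as np

    def is_regression_target(y):
        # Hypothetical: treat float-valued targets with non-integral values
        # as regression targets.
        y = np.asarray(y)
        return y.dtype.kind == 'f' and bool(np.any(y != y.astype(int)))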

@jayflo

jayflo commented Aug 5, 2015

@vermouthmjl: are you completing #4976 as well?

@amueller
Member

amueller commented Sep 9, 2015

What is the status here? Can you please rebase?

@amueller
Member

amueller commented Sep 9, 2015

It looks like it is good to go. @ogrisel any opinions?

@amueller amueller added this to the 0.17 milestone Sep 9, 2015
@amueller amueller changed the title Reject regression type targets for classifiers [MRG + 1] Reject regression type targets for classifiers Sep 11, 2015
@@ -158,6 +158,88 @@ def is_multilabel(y):
            _is_integral_float(labels))


# def is_sequence_of_sequences(y):
Member

Why is this commented-out function added?

@amueller
Member

Hm, I just realized we should have a common test for this. Could you please add one in estimator_checks?

@vermouthmjl
Contributor Author

@amueller I think I already did that by adding check_classifiers_regression_target in estimator_checks. Maybe I misunderstood what you mean by common test?

@amueller
Member

@vermouthmjl sorry, you're good. I was a bit sleep deprived.
@ogrisel merge?
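For context, a common test along the lines of check_classifiers_regression_target could look roughly as follows; this body is a sketch and does not reproduce the actual estimator_checks code:

    import numpy as np

    def check_classifiers_regression_target(name, estimator):
        # Sketch: fitting a classifier on continuous (regression-style)
        # targets should raise a ValueError.
        rng = np.random.RandomState(0)
        X = rng.normal(size=(20, 4))
        y = rng.normal(size=20)
        try:
            estimator.fit(X, y)
        except ValueError:
            return
        raise AssertionError("%s did not reject continuous targets" % name)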


-        X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
+        X, y = check_X_y(X, y, accept_sparse='csr', dtype=np.float64,
Member

cosmetics: trailing spaces

@MechCoder
Member

Is it clearer to move the check_non_regression_targets call for every fit into ClassifierMixin.fit, or does that make the code look more complicated (because one has to call the superclass fit every time)?

@ogrisel
Member

ogrisel commented Oct 8, 2015

Is it clearer to move the check_non_regression_targets call for every fit into ClassifierMixin.fit, or does that make the code look more complicated (because one has to call the superclass fit every time)?

I don't think a Mixin class should implement fit just to do input checks without performing any actual data fitting, and then expect the concrete classes to override it to do the fitting.

I find it simpler and more explicit to call input check functions explicitly in the fit method of the concrete class.
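As a sketch of that pattern (the classifier name is illustrative, and the helper call assumes the check function discussed in this PR):

    from sklearn.base import BaseEstimator, ClassifierMixin
    from sklearn.utils import check_X_y

    class SomeClassifier(BaseEstimator, ClassifierMixin):
        def fit(self, X, y):
            # Input checks are called explicitly at the top of the concrete fit.
            X, y = check_X_y(X, y)
            check_non_regression_targets(y)  # hypothetical helper from this PR
            # ... actual fitting happens here ...
            return self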

@MechCoder
Member

Makes sense. LGTM as well (apart from inline comments).

Maybe you can rebase and merge yourself (if this is important for the release)?

@vermouthmjl
Contributor Author

I rebased. @amueller

@amueller
Member

Squashed and merged as 85223b9. I think it will be good to have this in the RC.

@amueller amueller closed this Oct 14, 2015