
[MRG+1] Fix cross_val_predict behavior for binary classification in decision_function (Fixes #9589) #9593

Merged

Conversation

@reiinakano reiinakano commented Aug 21, 2017

Reference Issue

Fixes #9589

What does this implement/fix? Explain your changes.

I figured out that the purpose of this code (original):

if method in ['decision_function', 'predict_proba', 'predict_log_proba']:
    n_classes = len(set(y))
    predictions_ = np.zeros((X_test.shape[0], n_classes))
    if method == 'decision_function' and len(estimator.classes_) == 2:
        predictions_[:, estimator.classes_[-1]] = predictions
    else:
        predictions_[:, estimator.classes_] = predictions
    predictions = predictions_

is to handle cases where the cross-validation done in cross_val_predict does not properly stratify the classes, i.e., a training fold may not contain all of the classes present in the full dataset.

However, it misses an edge case for decision_function when len(estimator.classes_) == 2 (the train split has only 2 classes) and n_classes == 2 (the full dataset has only 2 classes). It wrongly assumes that the output will be of shape (n_samples, n_classes), when in fact this is only true for decision_function if n_classes > 2. For the correct behavior, we must first check whether the total number of classes is greater than 2, and if it is not, we must stick to an output shape of (n_samples,).
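The shape distinction above can be sketched in plain NumPy (a minimal illustration of the padding logic, not the actual scikit-learn code; `pad_predictions` is a hypothetical helper name):

```python
import numpy as np

def pad_predictions(predictions, train_classes, n_classes, method):
    """Pad per-fold predictions out to n_classes columns (illustrative only)."""
    if method == 'decision_function' and n_classes == 2:
        # Binary decision_function output is 1-D, shape (n_samples,): keep it.
        return predictions
    padded = np.zeros((predictions.shape[0], n_classes))
    padded[:, train_classes] = predictions
    return padded

# Binary case: decision_function stays 1-D.
binary_scores = np.array([-1.2, 0.3, 2.5])
out_binary = pad_predictions(binary_scores, np.array([0, 1]),
                             n_classes=2, method='decision_function')
print(out_binary.shape)  # (3,)

# Multiclass fold that only saw classes 0 and 2: pad to three columns.
fold_proba = np.array([[0.7, 0.3],
                       [0.2, 0.8]])
out_multi = pad_predictions(fold_proba, np.array([0, 2]),
                            n_classes=3, method='predict_proba')
print(out_multi.shape)   # (2, 3)
```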

Any other comments?

While this PR fixes the case raised in #9589, it still doesn't take into account that for sklearn.svm.SVC, decision_function does not always follow the shape (n_samples, n_classes) or (n_samples,). If decision_function_shape is not set to 'ovr', the shape is (n_samples, n_classes * (n_classes - 1) / 2), and if different train splits have different numbers of classes (not stratified), this inconsistency cannot be fixed by the approach currently taken by cross_val_predict.
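For reference, the one-vs-one column count grows quadratically with the number of classes; the arithmetic can be sketched as:

```python
# One-vs-one trains one classifier per unordered pair of classes,
# so its decision_function has n * (n - 1) / 2 columns.
def n_ovo_columns(n_classes):
    return n_classes * (n_classes - 1) // 2

print(n_ovo_columns(3))   # 3 pairwise columns for 3 classes
print(n_ovo_columns(10))  # 45 columns for 10 classes
```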

I would also like to point out my personal opinion that cross_val_predict should probably just throw an error if the classes are not stratified properly (by this I mean some classes are entirely absent from some splits) instead of handling it automatically as it tries to do now. Metrics like log loss and ROC might be affected if, for some splits, entire columns are zeroed out. It just feels inconsistent. Of course, you guys have the final say on this.

What I mean by inconsistent:

from sklearn.model_selection import cross_val_predict, KFold
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
cross_val_predict(LogisticRegression(), X, y, cv=KFold().split(X), method='predict_proba')

returns (array cut down in size)

array([[  0.00000000e+00,   9.99987841e-01,   1.21585683e-05],
       [  0.00000000e+00,   9.99963190e-01,   3.68104260e-05],
       [  0.00000000e+00,   9.99970224e-01,   2.97757119e-05],
       [  0.00000000e+00,   9.99978481e-01,   2.15185386e-05],
       [  0.00000000e+00,   9.99993136e-01,   6.86428459e-06],
       [  1.20030529e-01,   0.00000000e+00,   8.79969471e-01],
       [  1.13183668e-01,   0.00000000e+00,   8.86816332e-01],
       [  6.75506225e-02,   0.00000000e+00,   9.32449377e-01],
       [  1.07273496e-01,   0.00000000e+00,   8.92726504e-01],
       [  1.26854747e-01,   0.00000000e+00,   8.73145253e-01],
       [  1.16881976e-01,   0.00000000e+00,   8.83118024e-01],
       [  2.45077442e-04,   9.99754923e-01,   0.00000000e+00],
       [  2.70475201e-04,   9.99729525e-01,   0.00000000e+00],
       [  6.59523734e-04,   9.99340476e-01,   0.00000000e+00],
       [  4.07574999e-04,   9.99592425e-01,   0.00000000e+00],
       [  2.42451670e-03,   9.97575483e-01,   0.00000000e+00],
       [  2.03959503e-03,   9.97960405e-01,   0.00000000e+00]])

Notice how there are "zeroed out" columns.

And even worse:

from sklearn.model_selection import cross_val_predict, KFold
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
cross_val_predict(LogisticRegression(), X, y, cv=KFold().split(X), method='decision_function')

returns (array cut down in size)

array([[  0.        ,   0.        , -11.31746426],
       [  0.        ,   0.        , -10.20969263],
       [  0.        ,   0.        , -10.42178776],
       [  0.        ,   0.        ,  -9.60397656],
       [  0.        ,   0.        , -11.30005126],
       [  0.        ,   0.        , -11.1906238 ],
       [  0.        ,   0.        , -10.05510072],
       [  0.        ,   0.        , -10.74657422],
       [  0.        ,   0.        ,  -9.20295991],
       [  0.        ,   8.3136912 ,   0.        ],
       [  0.        ,   6.77281268,   0.        ],
       [  0.        ,   7.79874366,   0.        ],
       [  0.        ,   7.29615134,   0.        ],
       [  0.        ,   7.9199709 ,   0.        ],
       [  0.        ,   9.163118  ,   0.        ],
       [  0.        ,   5.8858718 ,   0.        ],
       [  0.        ,   6.37425957,   0.        ],
       [  0.        ,   6.66261924,   0.        ],
       [  0.        ,   6.19296233,   0.        ]])

which is hardly of use to anyone.
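The zeroed columns arise because the iris targets are sorted by class, so each contiguous (unshuffled) KFold training split is missing exactly one class. A pure-NumPy sketch of the 3-fold split (mimicking unshuffled KFold; not scikit-learn code):

```python
import numpy as np

y = np.repeat([0, 1, 2], 50)   # iris-like targets, sorted by class
missing_per_fold = {}
for i, test_idx in enumerate(np.array_split(np.arange(len(y)), 3)):
    train_mask = np.ones(len(y), dtype=bool)
    train_mask[test_idx] = False      # training split = everything but the fold
    missing_per_fold[i] = set(np.unique(y)) - set(np.unique(y[train_mask]))

print(missing_per_fold)  # each training split is missing exactly one class
```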

@jnothman jnothman commented Aug 21, 2017

Thanks!

Running ROC over the output of cross_val_predict is a really bad idea in any case...

@jnothman jnothman commented Aug 21, 2017

Yes, perhaps 0 isn't the best pick, particularly for predict_proba. NaN?

@jnothman jnothman commented Aug 21, 2017

I think better not to error in the context of using this in ClassifierChain or a stacking meta-estimator.

@reiinakano reiinakano commented Aug 21, 2017

I think better not to error in the context of using this in ClassifierChain or a stacking meta-estimator.

Hmm, you're right. One note though, I can't speak for ClassifierChain but for stacking, these zeroed out columns would mess with how the secondary learner learns and I wouldn't trust the results. Perhaps display a warning?

Yes, perhaps 0 isn't the best pick, particularly for predict_proba. NaN?

NaN makes it clearer that these columns weren't considered at all for this particular split, but wouldn't that break things? For instance, I frequently use np.argmax(cross_val_predict(..., method='predict_proba')) to recover the predicted labels for calculating accuracy, etc. Of course, I always make sure my folds are stratified, so I'm probably not going to face it, but still...

Also, wouldn't that break ClassifierChains and stacking meta-estimators anyway?
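The argmax recipe works with a zero fill because every column maps to a class in sorted order, but with a NaN fill np.argmax returns the position of the first NaN, which would silently break it (a small sketch; not scikit-learn code):

```python
import numpy as np

classes = np.array([0, 1, 2])
proba = np.array([[0.1, 0.7, 0.2],
                  [0.0, 0.3, 0.7]])   # column 0 zero-filled for a missing class
labels = classes[np.argmax(proba, axis=1)]
print(labels)                         # [1 2]

# If the missing class were filled with NaN instead, argmax picks the NaN column:
proba_nan = proba.copy()
proba_nan[:, 0] = np.nan
labels_nan = classes[np.argmax(proba_nan, axis=1)]
print(labels_nan)                     # [0 0]
```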

@jnothman jnothman left a comment

Could you please add a non-regression test.

reiinakano added 3 commits Aug 22, 2017
@reiinakano reiinakano commented Aug 22, 2017

I've added the necessary unit tests.

Although at this point, I'd like to point out there's another regression for the case of SVC with decision_function_shape='ovo'.

Code to test:

from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_predict

X, y = load_digits(return_X_y=True)
cross_val_predict(SVC(kernel='linear', decision_function_shape='ovo'), X, y, method='decision_function').shape

Expected Results
In previous versions, this is (1797, 45)

Actual Results

ValueError: shape mismatch: value array of shape (602,45) could not be broadcast to indexing result of shape (10,602)

This error is due to the shape of an SVC decision function with decision_function_shape set to 'ovo', which is (n_samples, n_classes * (n_classes - 1) / 2).

Should I include the fix in this PR?

@jnothman jnothman left a comment

LGTM

@jnothman jnothman changed the title [WIP] Fix cross_val_predict behavior for binary classification in decision_function (Fixes #9589) [MRG+1] Fix cross_val_predict behavior for binary classification in decision_function (Fixes #9589) Aug 22, 2017
@jnothman jnothman added this to the 0.19.1 milestone Aug 22, 2017
@reiinakano reiinakano changed the title [MRG+1] Fix cross_val_predict behavior for binary classification in decision_function (Fixes #9589) [WIP] Fix cross_val_predict behavior for binary classification in decision_function (Fixes #9589) Aug 22, 2017
@reiinakano reiinakano commented Aug 22, 2017

Hi @jnothman, I changed this back to WIP since there was a conflict with a recent merge. I also figured out a better fix that also fixes the SVC "ovo" regression.

@jnothman jnothman commented Aug 22, 2017

Okay. Let me know when it's MRG, i.e. ready for review.

reiinakano added 3 commits Aug 22, 2017
…unction' into fix-cross-val-predict-decision-function
…a', 'predict_log_proba'
@reiinakano reiinakano changed the title [WIP] Fix cross_val_predict behavior for binary classification in decision_function (Fixes #9589) [MRG] Fix cross_val_predict behavior for binary classification in decision_function (Fixes #9589) Aug 22, 2017
@reiinakano reiinakano commented Aug 22, 2017

@jnothman Ready for review

@jnothman jnothman left a comment

Did you mean to add a test for the OvO case?

else:
predictions_[:, estimator.classes_] = predictions
predictions = predictions_
if not n_classes == len(estimator.classes_):

jnothman (Aug 22, 2017):

not ... == -> ... !=

reiinakano added 2 commits Aug 22, 2017
@reiinakano reiinakano commented Aug 22, 2017

Yup, sorry. Added unit test and addressed comments.

reiinakano added 3 commits Oct 11, 2017
@reiinakano reiinakano commented Oct 11, 2017

I think I've mostly sorted this out. For the non-special case of decision_function (case iv: n_classes_train > 2 and n_classes > 2, but n_classes_train != n_classes), I handle it the same as predict_log_proba. For the other special cases, there is an error message.

Apologies for the confusing code previously, I should've done it this way in the first place, but I didn't expect decision_function to have so many edge cases.

@lesteve lesteve left a comment

I think we still need a warning in any case when n_classes_training != n_classes.

I would still like feedback from some core devs about #9593 (comment), especially about the default value for missing classes for predict_log_proba and decision_function (personally I would favour -inf but maybe there are consumers of cross_val_predict which would raise an error with infinite values)
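What the float_min fill looks like in practice can be sketched with plain NumPy (illustrative only, mirroring the approach discussed in this PR):

```python
import numpy as np

# A fold that only saw classes 0 and 2, out of 3 total classes.
predictions = np.array([[-0.5, 2.1],
                        [ 1.3, -0.2]])
train_classes = np.array([0, 2])
n_classes = 3

float_min = np.finfo(predictions.dtype).min   # most negative finite float
filled = np.full((predictions.shape[0], n_classes), float_min)
filled[:, train_classes] = predictions        # missing class 1 keeps float_min
print(filled[:, 1])                           # both entries are float_min
```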

'use a cross-validation technique resulting '
'in properly stratified folds\.',
cross_val_predict, est, X, y,
cv=KFold(), method='decision_function')

lesteve (Oct 11, 2017):

Use explicit number of splits here n_splits=3.

predictions_[:, estimator.classes_] = predictions
predictions = predictions_
if n_classes != len(estimator.classes_):
if predictions.ndim == 2 and \

lesteve (Oct 11, 2017):

Nitpick: I would use parentheses rather than backslash.

if n_classes != len(estimator.classes_):
if predictions.ndim == 2 and \
predictions.shape[1] != len(estimator.classes_):
# This handles the case when the shape of predictions

lesteve (Oct 11, 2017):

I don't think the comment matches the error message. IIUC the error can happen for one of these reasons:

  • you use SVC with decision_function_shape='ovo' whose predict output does not follow the same convention as all the other classifiers. This corresponds to the comment. In this case the recommendation in the error message is not helpful at all.
  • you don't have all the classes in your training fold. This corresponds to the error message.

reiinakano (Oct 12, 2017):

Actually, for the first case (ovo), the recommendation in the error message is helpful. We can handle the ovo case but only if the folds are stratified.

predictions_[:, estimator.classes_] = predictions

else: # Special handling logic for decision_function
err_mess = 'Output shape {} of {} does not match ' \

lesteve (Oct 11, 2017):

I'd rather have parentheses than backslashes

error_message = ('asdfsa '
    'asdfds '
    'asdfsa ')

predictions_ = np.full((_num_samples(X_test), n_classes),
np.finfo(predictions.dtype).min)
predictions_[:, estimator.classes_] = predictions

lesteve (Oct 11, 2017):

What changes for each method is the default value, you could simplify the code and use better variable names, e.g. something like this in the non-error case:

float_min = np.finfo(predictions.dtype).min
default_values = {'decision_function': float_min, 'predict_log_proba': float_min, 
    'predict_proba': 0}
predictions_for_all_classes = np.full((_num_samples(predictions), n_classes),
    default_values[method])
predictions_for_all_classes[:, estimator.classes_] = predictions
' in different folds. To fix this, use ' \
'a cross-validation technique resulting' \
' in properly stratified folds'
if predictions.ndim == 2 and \

lesteve (Oct 11, 2017):

Parentheses rather than backslashes

@@ -1267,6 +1347,18 @@ def test_cross_val_predict_class_subset():
est, method)
assert_array_almost_equal(expected_predictions, predictions)

# Special test for decision_function. This makes sure not to trigger
# any of the numerous edge cases in decision_function
X = np.arange(200).reshape(100, 2)

lesteve (Oct 11, 2017):

I think you should keep decision_function in the loop above. Just find a dataset that works for decision_function. e.g. this one or one with many classes to avoid the edge cases that error.

reiinakano added 4 commits Oct 12, 2017
@caioaao caioaao mentioned this pull request Oct 13, 2017
7 of 7 tasks complete
@lesteve lesteve left a comment

I think we still need a warning in any case when n_classes_training != n_classes.

You have not tackled this comment AFAICT. It would be good to add a test for this with assert_warns as well.

len(estimator.classes_)))
if len(estimator.classes_) <= 2:
# In this special case, `predictions` contains a 1D array.
raise ValueError(err_mess.format(predictions.shape, method,

lesteve (Oct 13, 2017):

I don't think this error message makes sense here, since predictions has the right shape compared to len(estimator.classes_). I am not sure what the error message should say, to be honest; maybe something like "only two classes in the training fold, this is not supported for method='decision_function'", plus the error message above about using properly stratified folds.

@@ -1259,7 +1339,8 @@ def test_cross_val_predict_class_subset():
assert_array_almost_equal(expected_predictions, predictions)

# Testing unordered labels
y = [1, 1, -4, 6]
y = np.array([x//20 for x in range(-100, 100, 2)])
y = shuffle(y, random_state=0)

lesteve (Oct 13, 2017):

or just y = np.repeat([1, 1, -4, 6], 25). Always a good idea to avoid unnecessary complications ...

reiinakano added 4 commits Oct 13, 2017
@@ -776,6 +777,87 @@ def split(self, X, y=None, groups=None):

assert_raises(ValueError, cross_val_predict, est, X, y, cv=BadCV())

X, y = load_iris(return_X_y=True)
assert_warns(RuntimeWarning, cross_val_predict, LogisticRegression(),

lesteve (Oct 16, 2017):

Can you use assert_warns_message to test the warning message?

jnothman (Oct 17, 2017):

Apart from this do you have any concerns with the PR?

lesteve (Oct 17, 2017):

I don't think so, I am fine with the general strategy as I mentioned above. There are some minor things I will fix shortly.

If you haven't already, maybe you can have a quick look at the warnings/error message and make sure they are fine.

lesteve (Oct 17, 2017):

Another thing: are you fine with float_min as the default value for missing classes for predict_log_proba and decision_function. Another option is -inf but maybe some consumers downstream will raise an error with infinite values.

jnothman (Oct 18, 2017):

predict_log_proba should probably use -inf... decision_function is a messy case.

jnothman (Oct 18, 2017):

I don't think float_min is perfect, but I'm documenting it at least.

Other minor fixes
@codecov codecov bot commented Oct 17, 2017

Codecov Report

Merging #9593 into master will decrease coverage by 0.18%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9593      +/-   ##
==========================================
- Coverage   96.16%   95.97%   -0.19%     
==========================================
  Files         336      336              
  Lines       62442    62566     +124     
==========================================
+ Hits        60046    60049       +3     
- Misses       2396     2517     +121
Impacted Files Coverage Δ
sklearn/model_selection/_validation.py 96.87% <100%> (+0.08%) ⬆️
sklearn/model_selection/tests/test_validation.py 98.43% <100%> (-0.34%) ⬇️
sklearn/utils/fixes.py 65.76% <0%> (-15.32%) ⬇️
sklearn/utils/tests/test_deprecation.py 85.29% <0%> (-14.71%) ⬇️
sklearn/utils/deprecation.py 85.41% <0%> (-10.42%) ⬇️
sklearn/utils/tests/test_utils.py 91.19% <0%> (-8.81%) ⬇️
sklearn/_build_utils/__init__.py 55.1% <0%> (-6.13%) ⬇️
sklearn/utils/tests/test_estimator_checks.py 90.72% <0%> (-5.97%) ⬇️
sklearn/datasets/mldata.py 91.25% <0%> (-5%) ⬇️
sklearn/utils/extmath.py 93.77% <0%> (-2.4%) ⬇️
... and 31 more


# it with. This case is found when sklearn.svm.SVC is
# set to `decision_function_shape='ovo'`.
raise ValueError('Output shape {} of {} does not match '
'number of classes ({}) in fold. Cannot '

lesteve (Oct 17, 2017):

I think this error message is wrong:

  • I would remove "Cannot reconcile different number of classes in different folds, because the problem is specific to one fold
  • the recommendation of using properly stratified folds does not apply. If we are sure that this problem only exists for SVC with decision_shape_function='ovo', we should say use decision_shape_function='ovr' maybe?

amueller (Oct 17, 2017):

Maybe. It could happen in other situations? Could also be a custom estimator?

jnothman (Oct 17, 2017):

Or just say that wacky decision functions are not supported :)

@jnothman jnothman commented Oct 17, 2017

I would like to see this merged and 0.19.1 released by the end of the week. Is that reasonable, @amueller?

default score needs to be assigned to all instances for that class if
``method`` produces columns per class, as in {'decision_function',
'predict_proba', 'predict_log_proba'}. For ``predict_proba`` this value is
0. In order to ensure finite output, we approximate negative infinity by

jnothman (Oct 19, 2017):

I think this behaviour and note is sufficient for a first release.

@amueller amueller merged commit ebc8730 into scikit-learn:master Oct 19, 2017
6 checks passed
ci/circleci: Your tests passed on CircleCI!
codecov/patch: 100% of diff hit (target 96.16%)
codecov/project: 96.17% (+0.01%) compared to 534f68b
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
lgtm analysis (Python): No alert changes
@reiinakano reiinakano deleted the reiinakano:fix-cross-val-predict-decision-function branch Oct 19, 2017
jnothman added a commit to jnothman/scikit-learn that referenced this pull request Oct 20, 2017
…ecision_function (Fixes scikit-learn#9589) (scikit-learn#9593)

* fix cross_val_predict for binary classification in decision_function

* Add unit tests

* Add unit tests

* Add unit tests

* better fix

* fix conflict

* fix broken

* only calculate n_classes if one of 'decision_function', 'predict_proba', 'predict_log_proba'

* add test for SVC ovo in cross_val_predict

* flake8 fix

* fix case of ovo and imbalanced folds for binary classification

* change assert_raises to assert_raise_message for ovo case

* fix flake8 linetoo long

* add comments and clearer tests

* improve comments and error message for OvO

* fix .format error with L

* use assert_raises_regex for better error message

* raise error in decision_function special cases. change predict_log_proba missing classes to minimum numpy value

* fix broken tests due to special cases of decision_function

* add modified test for decision_function behavior that does not trigger edge cases

* fix typos

* fix typos

* escape regex .

* escape regex .

* address comments. one unaddressed comment

* simplify code

* flake

* wrong classes range

* address comments. adjust error message

* add warning

* change warning to runtimewarning

* add test for the warning

* Use assert_warns_message rather than assert_warns

Other minor fixes

* Note on class-absent replacement values

* Improve error message
maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017