Spurious cv FutureWarning with cross_val_score and LeaveOneGroupOut #12370

Closed
GaelVaroquaux opened this issue Oct 12, 2018 · 8 comments
Labels
Bug, help wanted, Moderate (anything that requires some knowledge of conventions and best practices)

Comments

@GaelVaroquaux
Member

The following code raises a FutureWarning:

import numpy as np
from sklearn import datasets
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

cv = LeaveOneGroupOut()
model = GaussianNB()
X, y = datasets.load_iris(return_X_y=True)
# Ten groups, assigned round-robin over the samples
groups = np.arange(len(y)) % 10
cross_val_score(model, X, y, groups=groups)

Output:

/home/varoquau/dev/scikit-learn/sklearn/model_selection/_split.py:1943: FutureWarning: You should specify a value for 'cv' instead of relying on the default value. The default value will change from 3 to 5 in version 0.22.
  warnings.warn(CV_WARNING, FutureWarning)

This seems spurious to me.

GaelVaroquaux added the Bug, Moderate, and help wanted labels on Oct 12, 2018
@amueller
Member

How is that spurious? You're not passing cv to cross_val_score, right?

@GaelVaroquaux
Member Author

Yes. Passing cv to cross_val_score fixes the problem. Sorry for the noise.

@KamalakerDadi : this is not the problem in nilearn.
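For reference, a minimal sketch of the corrected call, assuming the same iris data and round-robin grouping as in the original report. The splitter is passed through `cv`, so `cross_val_score` does not fall back to its default. The `scores` variable name is introduced here only for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = datasets.load_iris(return_X_y=True)
# Ten groups, assigned round-robin over the samples
groups = np.arange(len(y)) % 10

# Pass the splitter explicitly; groups is forwarded to cv.split()
scores = cross_val_score(
    GaussianNB(), X, y, groups=groups, cv=LeaveOneGroupOut()
)
# One score per left-out group
print(len(scores))
```

With `LeaveOneGroupOut`, the number of folds equals the number of distinct groups, so `groups` must be supplied whenever this splitter is used.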

@KamalakerDadi
Contributor

I meant that the warning appears even if you specify cv, for instance cv=LeaveOneGroupOut().

@KamalakerDadi
Contributor

Nope. I don't think so. Sorry.

@GaelVaroquaux
Member Author

GaelVaroquaux commented Oct 12, 2018 via email

@KamalakerDadi
Contributor

Can you give a small example that reproduces?

Here it is. Maybe it shouldn't be used this way?

from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, cross_val_score, ShuffleSplit
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)
cv = ShuffleSplit()
svm_cv = GridSearchCV(SVC(C=1., kernel='linear'), param_grid={'C': [.1, .5]})

cross_val_score(svm_cv, X, y, cv=cv)

@adrinjalali
Member

You're doing nested cross-validation here, @KamalakerDadi. For each fold in cross_val_score, GridSearchCV runs its own cross-validated parameter search, and you're not setting the cv parameter of GridSearchCV. That's why you're getting the warning.
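A minimal sketch of the nested setup with both levels specified, assuming the same blobs data as the example above. The inner `cv=3`, the `random_state=0` on the outer splitter, and the names `outer_cv` and `scores` are illustrative choices, not something prescribed in this thread.

```python
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, ShuffleSplit, cross_val_score
from sklearn.svm import SVC

X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# Outer loop: estimates how well the tuned model generalizes
outer_cv = ShuffleSplit(n_splits=10, random_state=0)

# Inner loop: selects C on each outer training split.
# cv is set explicitly here, so GridSearchCV does not rely on its default.
svm_cv = GridSearchCV(
    SVC(kernel="linear"), param_grid={"C": [0.1, 0.5]}, cv=3
)

scores = cross_val_score(svm_cv, X, y, cv=outer_cv)
# One score per outer split
print(len(scores))
```

The `cv` argument of `cross_val_score` only controls the outer splits; the estimator passed in keeps its own `cv` setting for the inner search.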

@KamalakerDadi
Contributor

Thanks @adrinjalali, that makes sense.
