Description
Hello,
I need to define a custom cross-validation object. From what I found, this only requires an iterable that yields the (train, test) index arrays, so I implemented it with the code below. This may not be the proper way to do it, but in any case I ran into some very odd behavior: the first call to cross_val_score with this custom cv works fine, but running the exact same cross_val_score call a second time fails.
To try to understand this behavior, I ran it on different configurations (all on a Debian system):
- The bug occurs with Python 3.5.2 (the Anaconda version) using sklearn 0.18, with both model_selection.cross_val_score and cross_validation.cross_val_score.
- It does not occur with Python 2.7 (the system version that ships with Debian) using sklearn 0.16 (hence with cross_validation.cross_val_score).
Does anyone have any idea what causes this? And is there a workaround for defining a custom cv?
Cheers,
Sylvain
Code...
import numpy as np
from sklearn.svm import SVC
from sklearn.cross_validation import cross_val_score
n_samples = 100
n_features = 10
y = np.hstack([np.ones(n_samples/2),np.zeros(n_samples/2)])
X = np.random.random([n_samples,n_features])
svc = SVC(kernel='linear')
train_inds = np.arange(0,n_samples-10)
test_inds = np.arange(n_samples-10,n_samples)
custom_cv = zip([train_inds],[test_inds])
score1 = cross_val_score(svc,X,y,cv=custom_cv)
print(score1)
score2 = cross_val_score(svc,X,y,cv=custom_cv)
print(score2)
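One possible explanation for the Python 2 / Python 3 difference, which can be checked without scikit-learn: in Python 3, zip() returns a one-shot iterator, whereas in Python 2 it returned a list. The sketch below uses plain lists in place of the NumPy index arrays (an assumption made only to keep it dependency-free) and shows that a second pass over the same zip object yields nothing, while wrapping it in list() gives something that can be iterated repeatedly.

```python
# Pure-Python sketch: plain lists stand in for the NumPy index arrays.
n_samples = 100
train_inds = list(range(0, n_samples - 10))
test_inds = list(range(n_samples - 10, n_samples))

# In Python 3, zip() returns a one-shot iterator.
one_shot_cv = zip([train_inds], [test_inds])
first_pass = list(one_shot_cv)    # consumes the single (train, test) pair
second_pass = list(one_shot_cv)   # the iterator is now exhausted

print(len(first_pass), len(second_pass))

# Materialising the pairs once gives an object that can be iterated repeatedly.
reusable_cv = list(zip([train_inds], [test_inds]))
first_reuse = list(reusable_cv)
second_reuse = list(reusable_cv)

print(len(first_reuse), len(second_reuse))
```

If this is indeed the cause, passing cv=list(zip([train_inds],[test_inds])) to both cross_val_score calls may be enough to avoid the failure.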
The error I get
In [41]: %run custom_cv.py
/hpc/crise/anaconda3/lib/python3.5/site-packages/numpy/core/numeric.py:190: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
a = empty(shape, dtype, order)
/envau/userspace/takerkart/python/sklearn_bug/custom_cv.py:8: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
y = np.hstack([np.ones(n_samples/2),np.zeros(n_samples/2)])
[ 0.3]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
/envau/userspace/takerkart/python/sklearn_bug/custom_cv.py in <module>()
19 print(score1)
20
---> 21 score2 = cross_val_score(svc,X,y,cv=custom_cv)
22 print(score2)
23
/hpc/crise/anaconda3/lib/python3.5/site-packages/sklearn/model_selection/_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
139 fit_params)
140 for train, test in cv.split(X, y, groups))
--> 141 return np.array(scores)[:, 0]
142
143
IndexError: too many indices for array
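The traceback would be consistent with the second call receiving no splits at all, leaving an empty scores list that cannot be indexed with [:, 0]. As a possible workaround, here is a minimal sketch of a custom cv container that hands out a fresh iterator every time it is looped over. The class name FixedSplits is hypothetical, and the sketch assumes that cross_val_score accepts any iterable of (train, test) pairs; it has not been run against scikit-learn here.

```python
class FixedSplits:
    """Hypothetical helper: a re-iterable container of (train, test) index pairs."""

    def __init__(self, splits):
        # Store the pairs in a list so they survive repeated iteration.
        self._splits = [(list(train), list(test)) for train, test in splits]

    def __iter__(self):
        # A new iterator is created on every call, so the object never runs dry.
        return iter(self._splits)

    def __len__(self):
        return len(self._splits)


n_samples = 100
custom_cv = FixedSplits([
    (range(0, n_samples - 10), range(n_samples - 10, n_samples)),
])

# Iterate twice, as two consecutive cross_val_score calls would.
splits_per_pass = [sum(1 for _ in custom_cv) for _ in range(2)]
print(splits_per_pass)
```

The same object could then be passed as cv=custom_cv to each call, with the index lists converted to NumPy arrays if preferred.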
Versions
Versions with which I get this problem...
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.11.2
SciPy 0.18.1
Scikit-Learn 0.18