
problem with custom-defined cv and cross_val_score #8639

@SylvainTakerkart

Description


Hello,

I need to define a custom cross-validation object. I searched for how to do it and understood that it's just a matter of defining an iterable of (train indices, test indices) pairs, so I implemented it as in the code below. Maybe it's not the proper way to do it, but in any case I then encountered some very weird behavior: I run cross_val_score once with this custom cv and it works fine; I run the exact same cross_val_score command a second time and it fails with an IndexError.

To try to understand this behavior, I ran this on different configurations (all on a Debian system):

  • the bug happens with Python 3.5.2 (the Anaconda version) using sklearn 0.18, with both model_selection.cross_val_score and cross_validation.cross_val_score

  • but it does not happen with Python 2.7 (the system version that ships with Debian) using sklearn 0.16 (hence with cross_validation.cross_val_score)
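A plausible explanation for why only the Python 3 setup fails (a hypothesis, not something confirmed in this report): in Python 2, `zip` returns a list, while in Python 3 it returns an iterator that can be traversed only once. If so, the first `cross_val_score` call exhausts `custom_cv`, and the second call receives no train/test pairs at all. The sketch below shows this with plain Python; `fake_cross_val` is a hypothetical stand-in for the loop over splits inside `cross_val_score`, not scikit-learn code.

```python
def fake_cross_val(cv):
    """Hypothetical stand-in for cross_val_score's loop: one 'score' per (train, test) pair."""
    return [len(test) for train, test in cv]

train_inds = list(range(0, 90))
test_inds = list(range(90, 100))

# Python 3: zip() returns an iterator that can only be traversed once.
one_shot = zip([train_inds], [test_inds])
first = fake_cross_val(one_shot)   # consumes the iterator
second = fake_cross_val(one_shot)  # nothing left to iterate over

# Materializing the pairs into a list makes them reusable, as zip() was in Python 2.
reusable = list(zip([train_inds], [test_inds]))
third = fake_cross_val(reusable)
fourth = fake_cross_val(reusable)

print(first, second)   # [10] []
print(third, fourth)   # [10] [10]
```

In the script of this report, the equivalent change would be `custom_cv = list(zip([train_inds], [test_inds]))`, or directly `custom_cv = [(train_inds, test_inds)]`.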

Does anyone have any idea what causes this? And is there a workaround to define a custom cv?
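For a custom cv that does not depend on how `zip` behaves, one option is an object exposing the same `split`/`get_n_splits` methods as the splitters in `sklearn.model_selection` (0.18 and later). This sketch assumes that `cross_val_score` accepts any object with a `split` method, which matches the `cv.split(X, y, groups)` call visible in this report's traceback; `FixedSplit` is a hypothetical name. Because `split` is a generator method, every call starts from scratch, so the object can be reused.

```python
import numpy as np

class FixedSplit:
    """Hypothetical splitter with one predefined (train, test) pair.

    Mirrors the split/get_n_splits interface of sklearn.model_selection
    splitters; each call to split() returns a fresh generator.
    """

    def __init__(self, train_inds, test_inds):
        self.train_inds = np.asarray(train_inds)
        self.test_inds = np.asarray(test_inds)

    def split(self, X=None, y=None, groups=None):
        # Yield the single predefined (train, test) pair.
        yield self.train_inds, self.test_inds

    def get_n_splits(self, X=None, y=None, groups=None):
        return 1

n_samples = 100
cv = FixedSplit(np.arange(0, n_samples - 10), np.arange(n_samples - 10, n_samples))

# Unlike a consumed zip object, split() can be called any number of times.
first = list(cv.split())
second = list(cv.split())
print(len(first), len(second))  # 1 1
```

Usage would then be `cross_val_score(svc, X, y, cv=FixedSplit(train_inds, test_inds))`, which can be repeated. For this particular single split, `sklearn.model_selection.PredefinedSplit` may also do the job, with a `test_fold` array holding `-1` for the first 90 samples (never in a test set) and `0` for the last 10.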

Cheers,

Sylvain

Code:

import numpy as np
from sklearn.svm import SVC
from sklearn.cross_validation import cross_val_score

n_samples = 100
n_features = 10

y = np.hstack([np.ones(n_samples/2),np.zeros(n_samples/2)])
X = np.random.random([n_samples,n_features])

svc = SVC(kernel='linear')

train_inds = np.arange(0,n_samples-10)
test_inds = np.arange(n_samples-10,n_samples)

custom_cv = zip([train_inds],[test_inds])

score1 = cross_val_score(svc,X,y,cv=custom_cv)
print(score1)

score2 = cross_val_score(svc,X,y,cv=custom_cv)
print(score2)

The error I get

In [41]: %run custom_cv.py
/hpc/crise/anaconda3/lib/python3.5/site-packages/numpy/core/numeric.py:190: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  a = empty(shape, dtype, order)
/envau/userspace/takerkart/python/sklearn_bug/custom_cv.py:8: VisibleDeprecationWarning: using a non-integer number instead of an integer will result in an error in the future
  y = np.hstack([np.ones(n_samples/2),np.zeros(n_samples/2)])
[ 0.3]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/envau/userspace/takerkart/python/sklearn_bug/custom_cv.py in <module>()
     19 print(score1)
     20 
---> 21 score2 = cross_val_score(svc,X,y,cv=custom_cv)
     22 print(score2)
     23 

/hpc/crise/anaconda3/lib/python3.5/site-packages/sklearn/model_selection/_validation.py in cross_val_score(estimator, X, y, groups, scoring, cv, n_jobs, verbose, fit_params, pre_dispatch)
    139                                               fit_params)
    140                       for train, test in cv.split(X, y, groups))
--> 141     return np.array(scores)[:, 0]
    142 
    143 

IndexError: too many indices for array
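The `IndexError` is consistent with the exhausted-iterator explanation: if `cv.split(...)` yields no pairs, `scores` is an empty list, `np.array([])` is one-dimensional, and the two-dimensional index `[:, 0]` on line 141 fails. A minimal NumPy-only sketch of that last step, assuming (as the `[:, 0]` indexing implies) that each evaluated split contributes a row whose first entry is the test score; the score value is illustrative:

```python
import numpy as np

# One evaluated split: a 2-D array, so [:, 0] extracts the test scores.
one_split = [[0.3]]
print(np.array(one_split)[:, 0])  # [0.3]

# Exhausted cv: no split is evaluated and the list stays empty.
no_splits = []
arr = np.array(no_splits)
print(arr.shape, arr.ndim)  # (0,) 1

try:
    arr[:, 0]  # 2-D indexing on a 1-D array
    failed = False
except IndexError as exc:
    failed = True
    print("IndexError:", exc)
```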

Versions

Versions with which I get this problem:

Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 2 2016, 17:53:06)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]

NumPy 1.11.2

SciPy 0.18.1

Scikit-Learn 0.18
