cross_val_score does not accept a list as "y" parameter #2508

Closed
tweksteen opened this Issue Oct 10, 2013 · 9 comments


While moving from {train_test_split/fit/score} to cross_val_score, a bug was found:

File "/usr/lib64/python2.7/site-packages/sklearn/cross_validation.py", line 1058, in _cross_val_score
  y_train = y[train]
TypeError: only integer arrays with one element can be converted to an index

It seems that the y parameter is expected to be an array and not a list. This behaviour is not documented and differs from train_test_split.
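For reference, the error can be reproduced without scikit-learn at all. This is only a minimal sketch, assuming that `train` is a NumPy integer index array like the ones a cross-validation iterator yields; the variable names are made up for illustration, and the exact error message depends on the NumPy version.

```python
import numpy as np

y_list = [0, 1, 0, 1, 1]      # y passed as a plain Python list
train = np.array([0, 2, 3])   # assumed shape of the training indices

# A list cannot be indexed with an integer array, so this raises TypeError.
try:
    y_list[train]
    failed = False
except TypeError as exc:
    failed = True
    print("indexing the list failed:", exc)

# The same indexing works once y is a NumPy array.
y_array = np.array(y_list)
print(y_array[train])
```

So the failure comes from `y[train]` itself rather than from anything specific to the estimator or the scorer.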

Two solutions:

  • Document this behaviour.
  • Call check_arrays within cross_val_score.

Owner

ogrisel commented Oct 10, 2013

+1 for using check_arrays in cross_val_score. Can you please submit a PR with a non-regression test and the fix?

Actually, there is already a check_arrays in this function but the parameter "allow_lists" has been set to True in: e1972fa

The changelog is not explicit enough. Is it worth contacting the person who did the commit to know why?

Owner

arjoly commented Oct 11, 2013

Actually, there is already a check_arrays in this function but the parameter "allow_lists" has been set to True in: e1972fa

@amueller Any opinion on this?

Contributor

eloj commented Dec 9, 2013

Just wanted to add that I just ran into this issue, and as a newbie it took a while to figure out the problem. That I could pass the same list to 'train_test_split' without problem just made it that much more confusing.

I was able to convert my list using numpy.asarray() as a workaround.
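In case it helps someone else, here is roughly what the workaround looks like. It is a sketch only: the `folds` below are a hand-rolled stand-in for the index arrays produced during cross-validation, not scikit-learn's own code, and the label values are invented.

```python
import numpy as np

labels = ["spam", "ham", "spam", "ham", "spam", "ham"]  # plain Python list
y = np.asarray(labels)  # the workaround: convert once, up front

# Hypothetical stand-in for the index arrays a cross-validation iterator yields.
folds = np.array_split(np.arange(len(y)), 3)

splits = []
for k, test in enumerate(folds):
    # Every fold except the k-th one forms the training indices.
    train = np.concatenate([f for i, f in enumerate(folds) if i != k])
    # Integer-array indexing works because y is now an ndarray.
    splits.append((y[train], y[test]))
```

After the conversion, the same `y` can be passed to anything that indexes it with integer arrays.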

Owner

amueller commented Dec 27, 2013

Hum that is weird. Maybe a recent regression when the default behavior of the indices parameter was changed? Or maybe just a bug. It should pass lists through without converting them to arrays (for text for example) but obviously it shouldn't throw an error.

Owner

amueller commented Dec 27, 2013

Oh, wait, I got it. y is a list here, not X. I don't really see the difference in the code between cross_val_score and train_test_split, I'll investigate.

Owner

amueller commented Dec 27, 2013

Ok so there is no allow_lists in train_test_split, which could be considered a bug.
I think we have two options:

  1. Pass y that are lists as lists through all cross-validation functions.
  2. Convert y always into arrays.

The problem with 1) is that it creates a bit more code. It pushes the conversion / validation of y to a later point in the processing, which has pros and cons.
The problem with 2) is that we need to make check_arrays somewhat more complicated. It is also odd for train_test_split, as that function doesn't really distinguish between X and y, so treating them differently would be weird.

I guess I'll implement 1) quickly.
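To make the two options concrete, here is a rough sketch. `index_y` is a hypothetical helper written for this comment, not an existing scikit-learn function and not necessarily what the fix will look like.

```python
import numpy as np

def index_y(y, indices):
    """Hypothetical helper: select items from y by integer indices."""
    if hasattr(y, "shape"):
        # Arrays support NumPy-style integer-array indexing directly.
        return y[indices]
    # Option 1: keep lists as lists and index element by element.
    return [y[i] for i in indices]

train = np.array([0, 2, 3])
y_list = ["a", "b", "c", "d"]

kept_as_list = index_y(y_list, train)   # option 1: the result is still a list
converted = np.asarray(y_list)[train]   # option 2: y always becomes an array
```

Option 1 needs the extra branch wherever y gets indexed, while option 2 moves the work into the validation step.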

Owner

amueller commented Dec 27, 2013

See #2694.

Owner

amueller commented Dec 27, 2013

Also: sorry for the aeon-like reply time.
