As indicated by the title, as of 0.15 roc_auc_score no longer accepts a pandas Series for y_true, although it still accepts one for y_score. You can reproduce this bug as follows:
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
import pandas as pd
This raises "ValueError: Expected array-like (array or non-string sequence), got 0 1"
However, roc_auc_score(np.array(DF["C"]), DF["predict_proba"]) works fine.
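A minimal sketch of that workaround follows. It assumes a DataFrame with the same column names as the call above ("C" for the binary labels, "predict_proba" for the scores). The data values are made up for illustration and are not the reporter's data.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical toy data: "C" holds binary labels, "predict_proba" holds scores.
DF = pd.DataFrame({
    "C": [0, 0, 1, 1],
    "predict_proba": [0.1, 0.4, 0.35, 0.8],
})

# Workaround: pass a plain ndarray for y_true instead of the Series itself.
y_true = np.asarray(DF["C"])  # same values as DF["C"].values
auc = roc_auc_score(y_true, DF["predict_proba"])
print(auc)  # 0.75
```

Converting only y_true is enough here, since y_score already accepted a Series.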
Since when do we support Series as input for anything?
Gael: that returns "ValueError: Expected array-like (array or non-string sequence), got 0 False" for me.
Amueller: it works consistently for me in 0.14 with numpy 1.8.0 and pandas 0.13.1; I'm not sure about other versions.
On pandas 0.14.1 I get:
>>> roc_auc_score(a, a)
Traceback (most recent call last):
File "<ipython-input-9-462110cc3127>", line 1, in <module>
File "/Users/ogrisel/code/scikit-learn/sklearn/metrics/metrics.py", line 593, in roc_auc_score
File "/Users/ogrisel/code/scikit-learn/sklearn/metrics/metrics.py", line 468, in _average_binary_score
y_type = type_of_target(y_true)
File "/Users/ogrisel/code/scikit-learn/sklearn/utils/multiclass.py", line 277, in type_of_target
'got %r' % y)
ValueError: Expected array-like (array or non-string sequence), got 0 False
Apparently, Series is no longer a subclass of numpy.ndarray, so you have to call a.values explicitly. Its method resolution order does not include numpy.ndarray:
(<class 'pandas.core.series.Series'>, <class 'pandas.core.base.IndexOpsMixin'>, <class 'pandas.core.generic.NDFrame'>, <class 'pandas.core.base.PandasObject'>, <class 'pandas.core.base.StringMixin'>, <class 'object'>)
The fix is to make type_of_target check for an __array__ attribute to decide whether to call np.asarray on its input.
Much appreciated; thanks for all the fantastic work you all have done on this library.
This should be fixed in PR #3394 (specifically by commit bbf7ae7). Part of this PR has been backported to 0.15.X, so the fix should be available in the 0.15 release.