roc_auc_score no longer accepting Pandas Series for y_true #3390

Closed
TELSER1 opened this Issue Jul 15, 2014 · 9 comments

4 participants

@TELSER1
TELSER1 commented Jul 15, 2014

As the title says, as of 0.15 the roc_auc_score metric no longer accepts a Pandas Series as the y_true argument, although it still accepts one for y_score. You can reproduce the bug as follows:

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

DF = pd.DataFrame({
    "A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "B": [1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
    "C": [1, 1, 1, 0, 0, 0, 0, 0, 0, 0],
})
RF = RandomForestClassifier()
RF.fit(DF[["A", "B"]], DF["C"])
DF["predict_proba"] = RF.predict_proba(DF[["A", "B"]])[:, 1]
# Both arguments are Pandas Series here; y_true is the one that triggers the error.
roc_auc_score(DF["C"], DF["predict_proba"])

This raises "ValueError: Expected array-like (array or non-string sequence), got 0 1"

However, roc_auc_score(np.array(DF["C"]),DF["predict_proba"]) will work fine.

@GaelVaroquaux
scikit-learn member
@amueller
scikit-learn member

Since when do we support Series as input for anything?

@TELSER1
TELSER1 commented Jul 15, 2014

Gael: that raises "ValueError: Expected array-like (array or non-string sequence), got 0 False" for me.
amueller: it has worked reliably for me with 0.14, numpy 1.8.0, and pandas 0.13.1; I'm not sure about other versions.

@ogrisel
scikit-learn member
ogrisel commented Jul 15, 2014

On pandas 0.14.1 I get:

>>> roc_auc_score(a, a)
Traceback (most recent call last):
  File "<ipython-input-9-462110cc3127>", line 1, in <module>
    roc_auc_score(a, a)
  File "/Users/ogrisel/code/scikit-learn/sklearn/metrics/metrics.py", line 593, in roc_auc_score
    sample_weight=sample_weight)
  File "/Users/ogrisel/code/scikit-learn/sklearn/metrics/metrics.py", line 468, in _average_binary_score
    y_type = type_of_target(y_true)
  File "/Users/ogrisel/code/scikit-learn/sklearn/utils/multiclass.py", line 277, in type_of_target
    'got %r' % y)
ValueError: Expected array-like (array or non-string sequence), got 0    False
1    False
2    False
3    False
4    False
5    False
6     True
7     True
8     True
9     True
dtype: bool
@ogrisel
scikit-learn member
ogrisel commented Jul 15, 2014

Apparently, Series is no longer a subclass of the numpy array, so you have to call a.values explicitly.

>>> a.__class__.__mro__
(<class 'pandas.core.series.Series'>, <class 'pandas.core.base.IndexOpsMixin'>, <class 'pandas.core.generic.NDFrame'>, <class 'pandas.core.base.PandasObject'>, <class 'pandas.core.base.StringMixin'>, <class 'object'>)
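
The workaround mentioned above can be sketched as follows. This is a minimal illustration rather than code from the thread: it assumes a pandas version in which Series does not derive from np.ndarray (as the MRO above shows), and the names is_ndarray and y_true are chosen here only for the example.

```python
import numpy as np
import pandas as pd

# A boolean Series shaped like the one in the traceback above.
a = pd.Series([False] * 6 + [True] * 4)

# Assumption: on pandas versions where Series is not an ndarray subclass,
# this isinstance check is False.
is_ndarray = isinstance(a, np.ndarray)

# .values exposes the underlying NumPy array explicitly.
y_true = a.values
```

The resulting y_true is a plain NumPy array, so it can be passed wherever the Series itself is rejected.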
@ogrisel ogrisel modified the milestone: 0.15.1 Jul 15, 2014
@GaelVaroquaux
scikit-learn member
@ogrisel
scikit-learn member
ogrisel commented Jul 15, 2014

The fix is to make type_of_target check for the __array__ attribute to know whether or not to call np.asarray on it.
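
A rough sketch of the idea behind that fix, not the actual scikit-learn patch: the helper to_array_if_possible and the class SeriesLike are hypothetical names invented for this illustration. The point is only that an object exposing __array__ can be handed to np.asarray even if it is not an ndarray subclass.

```python
import numpy as np


class SeriesLike:
    """Hypothetical stand-in for a Series: not an ndarray subclass,
    but convertible through the __array__ protocol."""

    def __init__(self, data):
        self._data = list(data)

    def __array__(self, dtype=None, copy=None):
        return np.array(self._data, dtype=dtype)


def to_array_if_possible(y):
    """Hypothetical helper: convert y with np.asarray when it exposes
    __array__ or is a list/tuple; otherwise raise the same kind of error
    as in the traceback."""
    if hasattr(y, "__array__") or isinstance(y, (list, tuple)):
        return np.asarray(y)
    raise ValueError(
        "Expected array-like (array or non-string sequence), got %r" % (y,)
    )
```

With a check like this, to_array_if_possible(SeriesLike([0, 1, 1])) yields an ndarray, while a plain string is still rejected with a ValueError.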

@TELSER1
TELSER1 commented Jul 15, 2014

Much appreciated; thanks for all the fantastic work you all have done on this library.

@GaelVaroquaux
scikit-learn member

This should be fixed in PR #3394 (specifically by commit bbf7ae7). Part of this PR has been backported to 0.15.X, and thus should be available for release 0.15.
