[BUG] Expected array-like (array or non-string sequence), got 0 0.0 #1886

Closed
sbushmanov opened this issue Mar 17, 2020 · 3 comments
Labels
bug Something isn't working

sbushmanov commented Mar 17, 2020

Describe the bug
Trying to compute the accuracy of a pandas.core.series.Series vs. a cudf.core.series.Series with sklearn.metrics.accuracy_score.
This raises the following error:
ValueError: Expected array-like (array or non-string sequence), got 0 0.0

Steps/Code to reproduce bug

from cuml.ensemble import RandomForestClassifier as curfc
from sklearn.metrics import accuracy_score
cuml_model = curfc(n_estimators=40,
                   max_depth=16,
                   max_features=1.0,
                   seed=10)

cuml_model.fit(X_cudf_train, y_cudf_train)
fil_preds_orig = cuml_model.predict(X_cudf_test)
fil_acc_orig = accuracy_score(y_test,fil_preds_orig)

ValueError                                Traceback (most recent call last)
<ipython-input-45-3e552f306f92> in <module>
      1 # %%time
      2 fil_preds_orig = cuml_model.predict(X_cudf_test)
----> 3 fil_acc_orig = accuracy_score(y_test,fil_preds_orig)
      4 fil_acc_orig

~/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_classification.py in accuracy_score(y_true, y_pred, normalize, sample_weight)
    183 
    184     # Compute accuracy for each possible representation
--> 185     y_type, y_true, y_pred = _check_targets(y_true, y_pred)
    186     check_consistent_length(y_true, y_pred, sample_weight)
    187     if y_type.startswith('multilabel'):

~/anaconda3/lib/python3.7/site-packages/sklearn/metrics/_classification.py in _check_targets(y_true, y_pred)
     80     check_consistent_length(y_true, y_pred)
     81     type_true = type_of_target(y_true)
---> 82     type_pred = type_of_target(y_pred)
     83 
     84     y_type = {type_true, type_pred}

~/anaconda3/lib/python3.7/site-packages/sklearn/utils/multiclass.py in type_of_target(y)
    239     if not valid:
    240         raise ValueError('Expected array-like (array or non-string sequence), '
--> 241                          'got %r' % y)
    242 
    243     sparse_pandas = (y.__class__.__name__ in ['SparseSeries', 'SparseArray'])

ValueError: Expected array-like (array or non-string sequence), got 0        0.0
1        1.0
2        1.0
3        1.0
4        0.0
        ... 
26210    1.0
26211    0.0
26212    1.0
26213    1.0
26214    0.0
Length: 26215, dtype: float32

Expected behavior
accuracy_score should return the accuracy instead of raising a ValueError.

Environment details (please complete the following information):

  • Environment location: Bare-metal
  • Linux Distro/Architecture: [Ubuntu 16.04 amd64]
  • GPU Model/Driver: NVIDIA-SMI 440.59 Driver Version: 440.59
  • CUDA Version: 10.2
  • Method of cuDF & cuML install: conda
    Name          Version        Build               Channel
    cuml          0.13.0a200317  cuda10.2_py37_1494  rapidsai-nightly
    libcuml       0.13.0a200317  cuda10.2_1494       rapidsai-nightly
    libcumlprims  0.13.0a200313  cuda10.2_11         rapidsai-nightly

The error started after updating to the latest nightly build. The same code used to run fine.
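
For readers following the traceback above, the sketch below approximates the kind of validity test that sits just before the `raise ValueError` in `type_of_target`. It is a paraphrase under stated assumptions, not scikit-learn's own code: sparse-matrix handling is left out, and `looks_array_like` and `DeviceColumn` are hypothetical names used only for illustration. Whether the nightly cuDF Series failed for exactly this reason is also an assumption; the thread does not say.

```python
from collections.abc import Sequence


def looks_array_like(y):
    """Approximate the validity test that precedes the ValueError above.

    Assumption: this paraphrases the check in scikit-learn's type_of_target
    (sparse-matrix handling omitted); it is not scikit-learn's own code.
    """
    is_sequence_or_array = isinstance(y, Sequence) or hasattr(y, "__array__")
    return is_sequence_or_array and not isinstance(y, str)


class DeviceColumn:
    """Hypothetical container that neither registers as a Sequence
    nor exposes __array__, so the check above rejects it."""

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)
```

Under this approximation, a list or a NumPy array passes, while a string or an object like `DeviceColumn` is rejected, which is the branch that produces the "Expected array-like" message.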

@sbushmanov sbushmanov added ? - Needs Triage Need team to review and classify bug Something isn't working labels Mar 17, 2020
@github-actions github-actions bot added this to Needs prioritizing in Bug Squashing Mar 17, 2020
@dantegd
Member

dantegd commented Mar 17, 2020

@sbushmanov this seems to be an issue with scikit-learn expecting an array as opposed to a cuDF Series. To use that particular accuracy method, you can convert the cuDF Series to a NumPy array like:

fil_acc_orig = accuracy_score(y_test, fil_preds_orig.to_array())

Alternatively, you can use the accuracy metric from cuML itself, which does accept cuDF Series (or NumPy, CuPy, and Numba arrays, among other formats): https://rapidsai.github.io/projects/cuml/en/0.12.0/api.html?highlight=accuracy#cuml.metrics.accuracy.accuracy_score
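
The conversion suggested above can be wrapped in a small helper so the same call works whether the predictions arrive as a cuDF Series, a pandas Series, or a plain array. This is a minimal sketch, not part of cuML, cuDF, or scikit-learn: `to_host_array` and `host_accuracy` are hypothetical names, and the helper relies only on duck typing (`to_numpy()`, the older `to_array()` used in this thread, or CuPy's `get()`).

```python
import numpy as np


def to_host_array(values):
    """Return a NumPy array for host- or device-backed 1-D containers.

    Hypothetical helper for illustration only.
    """
    # pandas Series and recent cuDF Series both expose to_numpy().
    if hasattr(values, "to_numpy"):
        return np.asarray(values.to_numpy())
    # Older cuDF releases, like the nightly in this report, expose to_array().
    if hasattr(values, "to_array"):
        return np.asarray(values.to_array())
    # CuPy arrays copy to host memory with get().
    if hasattr(values, "get") and hasattr(values, "__cuda_array_interface__"):
        return np.asarray(values.get())
    # Lists, tuples, and NumPy arrays fall through unchanged.
    return np.asarray(values)


def host_accuracy(y_true, y_pred):
    """Fraction of matching labels, computed on the host with NumPy."""
    y_true = to_host_array(y_true)
    y_pred = to_host_array(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.mean(y_true == y_pred))
```

With such a helper, `accuracy_score(to_host_array(y_test), to_host_array(fil_preds_orig))` hands scikit-learn plain NumPy arrays. Note that this copies the predictions from GPU to host memory, so the cuML metric is likely the better choice when the data should stay on the device.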

@sbushmanov
Author

@dantegd Thanks for answering!

The code snippet used to work in previous releases (and it is still present in the official notebooks).

Did they change the policy on how they supply the arrays to sklearn?

@divyegala
Member

Closing this via rapidsai/cudf#4864

Bug Squashing automation moved this from Needs prioritizing to Closed Apr 20, 2020

3 participants