Add a warning when one-class vector is passed to LabelBinarizer::fit() #6009

Closed
@ysk24ok ysk24ok commented Dec 11, 2015


@@ -295,6 +296,9 @@ def fit(self, y):

        self.sparse_input_ = sp.issparse(y)
        self.classes_ = unique_labels(y)
        if len(self.classes_) == 1:
            warnings.warn("Only one label in y and this label will be "
                          "regarded as negitive one.")
Member

'negative'

Author

Oops, sorry, I have fixed the typo.

@amueller
Member

hm... do we want this? What was the motivation @ysk24ok? I guess it can lead to counter-intuitive results, but it's a normal use-case, isn't it?

@ysk24ok
Author

ysk24ok commented Sep 17, 2016

I ran into the same situation as #4546.
When y_true contains only one class, LabelBinarizer treats that label as the negative one,
and the return value of log_loss gets weird.
I thought it was necessary to make it clear that LabelBinarizer treats the label as negative when y_true contains only one label, and that's why I created this PR.

But I didn't know that #7239 had been merged and that this bug is already fixed.

scikit-learn 0.17:

>>> y_true = np.array(['class1', 'class1', 'class1'])
>>> y_predict = np.array([[0.1],[0.7],[0.5]])
>>> log_loss(y_true, y_predict)
0.66749350018123588
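
The 0.17 value above can be reproduced by hand, which makes the "treated as negative" behavior concrete. The sketch below uses only the standard library. The helper name `mean_log_loss` is made up for illustration and is not part of scikit-learn, and the sketch assumes each value in y_predict is the predicted probability of the positive class.

```python
import math

def mean_log_loss(probs_of_true_class):
    """Average negative log-likelihood of the class taken to be true."""
    return -sum(math.log(p) for p in probs_of_true_class) / len(probs_of_true_class)

p_positive = [0.1, 0.7, 0.5]  # same values as y_predict above

# If 'class1' is binarized as the negative class, the probability credited
# to the true class is 1 - p for each sample.
loss_if_negative = mean_log_loss([1 - p for p in p_positive])

# If 'class1' were the positive class, the credited probability would be p itself.
loss_if_positive = mean_log_loss(p_positive)

print(loss_if_negative)  # approximately 0.6675, in line with the 0.17 output
print(loss_if_positive)  # approximately 1.1175
```

The two numbers differ noticeably, so a silent choice of "negative" for the lone label changes the reported loss.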

current master:

>>> y_true = np.array(['class1', 'class1', 'class1'])
>>> y_pred = np.array([[0.1],[0.7],[0.5]])
>>> log_loss(y_true, y_pred)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/yusuke-nishioka/Documents/Github/scikit-learn/sklearn/metrics/classification.py", line 1620, in log_loss
    'labels argument.'.format(lb.classes_[0]))
ValueError: y_true contains only one label (class1). Please provide the true labels explicitly through the labels argument.

On current master, log_loss doesn't permit a one-class 'y_true' unless the labels argument is given.
So this PR is outdated, and closing it is fine.
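
As the error message suggests, the one-class case can still be scored by passing the full label set through the `labels` argument of `log_loss`. This is a minimal sketch, assuming a scikit-learn version that includes that argument. The second label name 'class0' is invented here purely so that there are two classes; it sorts before 'class1', so 'class1' ends up as the positive class.

```python
from sklearn.metrics import log_loss

y_true = ['class1', 'class1', 'class1']
y_pred = [0.1, 0.7, 0.5]  # assumed: predicted probability of the positive class

# 'class0' is a hypothetical second label, not taken from the thread.
loss = log_loss(y_true, y_pred, labels=['class0', 'class1'])
print(loss)

# Without `labels`, the call is expected to raise the ValueError shown above.
try:
    log_loss(y_true, y_pred)
    raised = False
except ValueError:
    raised = True
print(raised)
```

With both labels supplied, 'class1' is scored as the positive class, so the loss is the mean of -log(p) over the three predictions rather than of -log(1 - p).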

@amueller
Member

amueller commented Oct 7, 2016

ok thanks for the feedback @ysk24ok

@amueller amueller closed this Oct 7, 2016