[MRG+2] Fix log loss bug #7239

Merged: 3 commits into scikit-learn:master on Aug 25, 2016

nelson-liu (Contributor) commented Aug 24, 2016:

Reference Issue

Original PRs: #6714 and #7166.
Fixes:
metrics.log_loss fails when any classes are missing in y_true #4033
Fix a bug, the result is wrong when use sklearn.metrics.log_loss with one class, #4546
Log_loss is calculated incorrectly when only 1 class present #6703

What does this implement/fix? Explain your changes.

This PR is a cherry-picked, rebased, and squashed version of #7166. I addressed the comments there, namely by renaming the single-letter variables, adding another ValueError saying that labels should have more than one unique label if len(lb.classes_) == 1 and labels is not None, and removing a commented-out code block.
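For readers coming from the linked issues, here is a minimal usage sketch of the labels argument being added (the values mirror the test shown later in this conversation; the exact numbers are illustrative):

from sklearn.metrics import log_loss

# y_true contains only one class, so without `labels` this raises a
# ValueError asking for the true labels; passing `labels` makes the full
# class set explicit and the loss is computed against it.
y_true = [2, 2]
y_score = [[0.1, 0.9], [0.1, 0.9]]
loss = log_loss(y_true, y_score, labels=[1, 2])  # approximately -log(0.9)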

Any other comments?

@MechCoder @amueller anything else that needs to be done?

@@ -1597,36 +1602,53 @@ def log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None):
The logarithm used is the natural logarithm (base-e).
"""
lb = LabelBinarizer()
T = lb.fit_transform(y_true)

nelson-liu (Contributor) commented Aug 24, 2016:

This block was moved to line 1620 in the diff. The variable T was renamed to transformed_labels and the variable Y was renamed to y_pred.

nelson-liu changed the title from Fix log loss bug to [MRG] Fix log loss bug on Aug 24, 2016.

MechCoder added this to the 0.18 milestone on Aug 24, 2016.

else:
lb.fit(y_true)
if labels is None and len(lb.classes_) == 1:

MechCoder (Member) commented Aug 24, 2016:

You could reorganize this into:

if len(lb.classes_) == 1:
    if labels is None:
        raise
    else:
        raise

nelson-liu (Contributor) commented Aug 24, 2016:

Definitely, good catch

# This happens in cases when elements in y_pred have type "str".
if not isinstance(Y, np.ndarray):
if not isinstance(y_pred, np.ndarray):

MechCoder (Member) commented Aug 24, 2016:

Not your fault, but it is not clear whether this check is doing what it says; np.clip should raise an error if y_pred has elements of type str.

nelson-liu (Contributor) commented Aug 24, 2016:

Oh, I didn't even notice that. Hmm, should I check that the dtype of y_pred is float before the clip and remove that bit?

nelson-liu (Contributor) commented Aug 24, 2016:

Actually no, that doesn't work, since y_pred can take bool values, right?

# Renormalize
Y /= Y.sum(axis=1)[:, np.newaxis]
loss = -(T * np.log(Y)).sum(axis=1)
y_pred /= y_pred.sum(axis=1)[:, np.newaxis]

MechCoder (Member) commented Aug 24, 2016:

Umm, is the [:, np.newaxis] broadcasting necessary? It should work without that, no?

nelson-liu (Contributor) commented Aug 24, 2016:

No, I think it is necessary...

nelson-liu (Contributor) commented Aug 24, 2016:

Tests also blow up if I remove it, which seems to support the fact that it's necessary.

MechCoder (Member) commented Aug 24, 2016:

Ooh yeah, of course, sorry.
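For context, a minimal NumPy sketch (not part of the PR) of why the reshape is needed when renormalizing the rows of the probability matrix:

import numpy as np

y_pred = np.array([[0.2, 0.7], [0.6, 0.5], [0.2, 0.3]])
row_sums = y_pred.sum(axis=1)                # shape (3,)
y_pred = y_pred / row_sums[:, np.newaxis]    # shape (3, 2); each row now sums to 1

# Without the reshape, NumPy tries to align the (3,) row_sums with the last
# axis (the classes) and raises a broadcasting error; reshaping it to (3, 1)
# broadcasts the division across each row instead.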

Y = Y[:, np.newaxis]
if Y.shape[1] == 1:
Y = np.append(1 - Y, Y, axis=1)
if y_pred.ndim == 1:

MechCoder (Member) commented Aug 24, 2016:

So it might be a good time to add to the documentation that we can handle y_pred = (n_samples,) and that the score inferred is that of the positive class.

if T.shape[1] != Y.shape[1]:
check_consistent_length(transformed_labels, y_pred)
transformed_labels = check_array(transformed_labels)
y_pred = check_array(y_pred)

MechCoder (Member) commented Aug 24, 2016:

The check_array is clearly redundant.

MechCoder (Member) commented Aug 24, 2016:

Because there is another check_array on L1624

nelson-liu (Contributor) commented Aug 24, 2016:

Seems like the check at L1604 of the original file (L1624 of the diff) should do?

T = check_array(T)
Y = check_array(Y)
if T.shape[1] != Y.shape[1]:
check_consistent_length(transformed_labels, y_pred)

MechCoder (Member) commented Aug 24, 2016:

What do you think about moving this consistent_length check to the top, i.e. check_consistent_length(y_pred, y_true)?

nelson-liu (Contributor) commented Aug 24, 2016:

+1

if not isinstance(Y, np.ndarray):
raise ValueError("y_pred should be an array of floats.")
# Verify the dtype of y_pred is float.
if y_pred.dtype != float:

MechCoder (Member) commented Aug 24, 2016:

Alternatively you could provide the dtype argument in check_array as well.

MechCoder (Member) commented Aug 24, 2016:

Hmm. Seems like tests fail because np.float64 != float

nelson-liu (Contributor) commented Aug 24, 2016:

Yeah, I thought that would be a gotcha as well, but I ran the following and it made me think it was fine:

In [1]: import numpy as np

In [2]: l = np.array([.1,.2,.3,.4])

In [3]: l.dtype
Out[3]: dtype('float64')

In [4]: l.dtype == float
Out[4]: True

In [5]: l.dtype != float
Out[5]: False

Looking at the failing tests, the y_preds being passed in are of type bool, so it doesn't seem related to that.

MechCoder (Member) commented Aug 24, 2016:

I see, in that case it should be ok to cast it to a float dtype. The np.clip call does that anyhow.

y = np.array([True, False, False])
np.clip(y, 1e-7, 1e7)
array([  1.00000000e+00,   1.00000000e-07,   1.00000000e-07])

nelson-liu (Contributor) commented Aug 24, 2016:

Hmm, are you proposing casting the contents of the numpy array y_pred to float? Is the correct way to do this y_pred.astype(float), catching any conversion error from numpy and raising ValueError: y_pred should be an array of floats?

y_score = np.array([[0.1, 0.9], [0.1, 0.9]])
# because y_true label are the same, there should be an error if the
# labels option has not been used

MechCoder (Member) commented Aug 24, 2016:

I think we can do away with this comment. The error_str is explicit enough.

assert_raise_message(ValueError, error_str, log_loss, y_true, y_pred)
# when the labels argument is used, it works

MechCoder (Member) commented Aug 24, 2016:

Glad it does :p

@@ -1384,6 +1383,24 @@ def test_log_loss():
loss = log_loss(y_true, y_pred)
assert_almost_equal(loss, 1.0383217, decimal=6)
# test labels option
y_true = [2, 2]

MechCoder (Member) commented Aug 24, 2016:

Should we add this under a separate test?

nelson-liu (Contributor) commented Aug 24, 2016:

Hmm, I think it's fine as it is?

nelson-liu (Contributor) commented Aug 24, 2016:

If you want it as a separate test, I'd be fine with moving it. No strong opinions either way here.

MechCoder (Member) commented Aug 24, 2016:

Could you also add a test for when there is more than one label in y_true but still len(np.unique(y_true)) != y_pred.shape[1] as a non-regression test for this (#4033) and as a sanity check?
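A hedged sketch of the kind of non-regression test being requested (the test actually added in the PR may differ in values and wording):

import numpy as np
from sklearn.metrics import log_loss

# y_true contains two distinct labels, but y_pred has columns for three
# classes, so the full class set has to be supplied through `labels`.
y_true = [1, 2, 2]
y_score = np.array([[0.2, 0.7, 0.1],
                    [0.6, 0.2, 0.2],
                    [0.6, 0.1, 0.3]])
loss = log_loss(y_true, y_score, labels=[1, 2, 3])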

MechCoder (Member) commented Aug 24, 2016:

Also, it is possible to move all the check_array and check_consistent_length checks to the top of the function. It is not clear why all those checks are necessary. (For instance, the fit and transform in the LabelBinarizer should internally call check_array as well.)

MechCoder (Member) commented Aug 24, 2016:

Thanks for wrapping this up. Seems like changing the names of variables has given us the opportunity to do an unintentional cleanup ;)

nelson-liu (Contributor) commented Aug 24, 2016:

Also, it is possible to move all the check_array and check_consistent_length checks to the top of the function. It is not clear why all those checks are necessary. (For instance, the fit and transform in the LabelBinarizer should internally call check_array as well.)

The check for transformed_labels that I didn't move to the top seems necessary, given that the error is thrown at all (and thus isn't picked up by LabelBinarizer). I've addressed your comments above (I kept track of what I had finished with the 👍 emoji). If you can't find where I put a change, I'd be happy to point you to the associated place, as this diff is pretty big.

raise ValueError("Unable to automatically cast y_pred to "
"float. y_pred should be an array of floats.")
# sanity check
if y_pred.dtype != float:

MechCoder (Member) commented Aug 24, 2016:

So the correct way to do this is to pass a dtype argument to check_array, which raises an error if the input cannot be cast to the provided dtype. But if I pass dtype=np.float32, it fails this test (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/tests/test_classification.py#L1392) because check_array raises an error with a MockDataFrame and dtype=np.float32.

For now, I would suggest either just removing these float checks (note that nothing useful was checked previously, and np.clip should take care of string dtypes) or figuring out what is going on with the MockDataFrame and fixing that (which is beyond the scope of this PR).
What do you think?
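For reference, a small sketch of the check_array alternative being discussed (an illustration of the general behaviour, not code from the PR):

import numpy as np
from sklearn.utils import check_array

# check_array casts bool/int inputs to the requested float dtype and raises
# a ValueError for inputs (e.g. strings) that cannot be converted.
y_pred = check_array(np.array([[True, False], [False, True]]), dtype=np.float64)
print(y_pred.dtype)  # float64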

nelson-liu (Contributor) commented Aug 25, 2016:

Seems reasonable; I agree that figuring out what's going on with MockDataFrame is beyond the scope of the PR. Do you want to raise an issue for that?

raise ValueError("y_true and y_pred have different number of classes "
"%d, %d" % (T.shape[1], Y.shape[1]))
"%d, %d. Please provide the true labels explicitly "

MechCoder (Member) commented Aug 25, 2016:

This can also happen when the number of labels provided is different than the number of classes in y_pred. If you would like to add this error, then it would be better to refactor into

if len(lb.classes_) != y_pred.shape[1]:
    if labels is None:
        raise "provide labels"
    else:
        raise "number of classes in labels different from that in y_pred"

nelson-liu (Contributor) commented Aug 25, 2016:

Doesn't this break when y_pred is of shape (n_samples,)? You get a tuple index out of range on shape.

nelson-liu (Contributor) commented Aug 25, 2016:

Also, aren't there legitimate cases where len(lb.classes_) != y_pred.shape[1] and you want it to work? E.g. #4033.

MechCoder (Member) commented Aug 25, 2016:

How is that possible? Have you not appended 1 - y_pred for that case above?

nelson-liu (Contributor) commented Aug 25, 2016:

Sorry, what do you mean "appended 1-y_pred for that case above"?

MechCoder (Member) commented Aug 25, 2016:

I thought I had commented here :-(.

When you provide labels, it is assumed that these labels are from the entire training data, and your y_score matrix is obtained after fitting your model on the entire training data. So len(labels) == y_score.shape[1], which is also equal to len(lb.classes_), since you fit your binarizer with labels.

If you do not provide labels, it is assumed y_true has all possible labels. Hence len(np.unique(y_true)) == y_pred.shape[1] still holds, which is again equal to len(lb.classes_), since now you fit the binarizer with y_true.

what do you mean "appended 1-y_pred for that case above"?

I meant that for the (n_samples,) (binary) case you have explicitly converted to (n_samples, 2) above, right?
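A short sketch of the binary (n_samples,) handling referred to here, following the lines shown in the diff above:

import numpy as np

# A 1-D score vector is treated as the probability of the positive class and
# expanded into an (n_samples, 2) matrix, so the two-class shapes line up.
y_pred = np.array([0.9, 0.2, 0.4])
y_pred = y_pred[:, np.newaxis]                   # shape (3, 1)
y_pred = np.append(1 - y_pred, y_pred, axis=1)   # shape (3, 2)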

nelson-liu (Contributor) commented Aug 25, 2016:

Weird, I thought you did too; at least I got an email for two comments. Anyway, I think I understand now, thanks 😄

nelson-liu (Contributor) commented Aug 25, 2016:

Yeah, wow, I was mistakenly looking at the similar conditional above, sorry about that :/

if not isinstance(Y, np.ndarray):
raise ValueError("y_pred should be an array of floats.")
# if not isinstance(y_pred, np.ndarray):
# raise ValueError("y_pred should be an array of floats.")

MechCoder (Member) commented Aug 25, 2016:

You can remove this.

@@ -1563,8 +1564,10 @@ def log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None):
Ground truth (correct) labels for n_samples samples.
y_pred : array-like of float, shape = (n_samples, n_classes)
or (n_samples, )

MechCoder (Member) commented Aug 25, 2016:

(n_samples,)

Predicted probabilities, as returned by a classifier's
predict_proba method.
predict_proba method. If ``y_pred.shape = (n_samples, )``
the score inferred is that of the positive class.

MechCoder (Member) commented Aug 25, 2016:

I know I suggested adding this, but what do you think about rewording it like:

"the probabilities provided are assumed to be that of the positive class"

nelson-liu (Contributor) commented Aug 25, 2016:

Yeah, that's clearer.

@@ -1384,6 +1383,31 @@ def test_log_loss():
loss = log_loss(y_true, y_pred)
assert_almost_equal(loss, 1.0383217, decimal=6)
# case when len(np.unique(y_true)) != y_pred.shape[1]

MechCoder (Member) commented Aug 25, 2016:

Either the comment is misleading or the test is, but my money is on the test ;)

nelson-liu (Contributor) commented Aug 25, 2016:

Indeed, sorry about that.

@@ -263,6 +263,10 @@ Enhancements
generating attribute ``estimators_samples_`` only when it is needed.
By `David Staub`_.
- Added `labels` flag to :class:`metrics.log_loss` to correct metrics when
only one class is present in test data set.

MechCoder (Member) commented Aug 25, 2016:

What do you think about rewording this to (or something similar):

"Added labels flag to :class:`metrics.log_loss` to explicitly provide the labels when the number of classes in y_true and y_pred differ"

y_pred = [[0.2, 0.7], [0.6, 0.5], [0.2, 0.3]]
error_str = ('Found arrays with inconsistent numbers of '
'samples: [2 3]')
assert_raise_message(ValueError, error_str, log_loss, y_true, y_pred)

MechCoder (Member) commented Aug 25, 2016:

Don't we test this in metrics.test.test_common? Too bad :/

nelson-liu (Contributor) commented Aug 25, 2016:

Should I remove the test?

MechCoder (Member) commented Aug 25, 2016:

I did not have a detailed look. If we already check this, then yes.

nelson-liu (Contributor) commented Aug 25, 2016:

I couldn't find it, but I'd be fine with removing it if someone can point me to it.

amueller (Member) commented Aug 25, 2016:

Have you checked the coverage?

MechCoder (Member) commented Aug 25, 2016:

@nelson-liu That should be my last pass. LGTM pending comments.

nelson-liu (Contributor) commented Aug 25, 2016:

@MechCoder Addressed your comments; let me know if I got them right (in particular the non-regression test).

@@ -1384,6 +1383,31 @@ def test_log_loss():
loss = log_loss(y_true, y_pred)
assert_almost_equal(loss, 1.0383217, decimal=6)
# case when len(np.unique(y_true)) != y_pred.shape[1]
y_true = [1,2,2]

MechCoder (Member) commented Aug 25, 2016:

Space this out, and space out labels two lines below.

amueller (Member) commented Aug 25, 2016:

Still not PEP8-compliant.

MechCoder (Member) commented Aug 25, 2016:

Thanks! You have my +1.

@@ -1544,7 +1544,8 @@ def hamming_loss(y_true, y_pred, classes=None, sample_weight=None):
raise ValueError("{0} is not supported".format(y_type))
def log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None):
def log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None,
labels=None):
"""Log loss, aka logistic loss or cross-entropy loss.
This is the loss function used in (multinomial) logistic regression

amueller (Member) commented Aug 25, 2016:

I think the docstring should document the assumptions. I'm not sure we do this in other places, but it would be good practice. If labels is None and y_pred is 1-d, what happens? This is sort of non-obvious. Also, the labels in y_pred are assumed to be ordered alphabetically, as done by LabelBinarizer, right?
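A quick, illustrative check of the ordering assumption mentioned above: LabelBinarizer sorts the classes it sees, so the columns of y_pred are expected to follow that sorted (alphabetical, for strings) order.

from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer().fit(["spam", "ham", "eggs"])
print(lb.classes_)  # ['eggs' 'ham' 'spam'] -- sorted order, not input order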

amueller (Member) commented Aug 25, 2016:

Maybe add a sentence to the main part of the docstring "Log-loss is only defined for two or more labels."?

nelson-liu (Contributor) commented Aug 25, 2016:

Good idea, will do.

@@ -1577,6 +1581,10 @@ def log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None):
sample_weight : array-like of shape = [n_samples], optional
Sample weights.
labels : array-like, optional (default=None)
If not provided, labels will be inferred from y_true

amueller (Member) commented Aug 25, 2016:

Full stop at the end.

if len(lb.classes_) == 1:
if labels is None:
raise ValueError('y_true has only one label. Please provide '

amueller (Member) commented Aug 25, 2016:

has -> contains

amueller (Member) commented Aug 25, 2016:

Also maybe add the single label to the error message?

'the true labels explicitly through the labels '
'argument.')
else:
raise ValueError('labels should have more than one unique label.')

amueller (Member) commented Aug 25, 2016:

have -> contains

Maybe "The labels array needs to contain at least two labels for log_loss, got {}."?

if y_pred.ndim == 1:
y_pred = y_pred[:, np.newaxis]
if y_pred.shape[1] == 1:
y_pred = np.append(1 - y_pred, y_pred, axis=1)

amueller (Member) commented Aug 25, 2016:

Why was the check_array on y_pred removed?

nelson-liu (Contributor) commented Aug 25, 2016:

It was moved to the very top of the function.

if labels is None:
raise ValueError("y_true and y_pred have different number of classes "
"%d, %d. Please provide the true labels explicitly "
"through the labels argument" %

amueller (Member) commented Aug 25, 2016:

Full stop at the end.

(transformed_labels.shape[1], y_pred.shape[1]))
else:
raise ValueError('The number of classes in labels is different '
'from that in y_pred')

amueller (Member) commented Aug 25, 2016:

Full stop. Also add the shapes? Or maybe I'd actually add lb.classes_ (also in the above error message).

'samples: [2 3]')
assert_raise_message(ValueError, error_str, log_loss, y_true, y_pred)
# when the labels argument is used, it works

amueller (Member) commented Aug 25, 2016:

This comment is confusing when coming after the test above. It refers to a test two above.

'the true labels explicitly through the labels argument.')
assert_raise_message(ValueError, error_str, log_loss, y_true, y_pred)
y_pred = [[0.2, 0.7], [0.6, 0.5], [0.2, 0.3]]

amueller (Member) commented Aug 25, 2016:

y_score2?

nelson-liu (Contributor) commented Aug 25, 2016:

Hmm, I don't see why this should be renamed to y_score2? Could you explain a bit more? Sorry if I'm being slow / missing something obvious.

amueller (Member) commented Aug 25, 2016:

Just cosmetic. But in the test it's called y_score, right? I guess you switch between y_score and y_pred for the same thing multiple times.

nelson-liu (Contributor) commented Aug 25, 2016:

Ah, that makes sense. I didn't write most of these tests, so I just followed the example of the nearest neighbors 😄. I'll change it.

amueller (Member) commented Aug 25, 2016:

Looks good apart from minor comments.

amueller (Member) commented Aug 25, 2016:

Rebase?

nelson-liu (Contributor) commented Aug 25, 2016:

@amueller Addressed your comments; can you look over them (in particular the docstrings clarifying the assumptions made) to make sure I got what you wanted before I rebase and lose the history? I'll rebase after you verify it's OK.

hongguangguo and others added some commits Apr 25, 2016

fixed log_loss bug
enhance log_loss labels option feature

log_loss

changed test log_loss case

u

add ValueError in log_loss
fixed error message when y_pred and y_test labels don't match
fixes as per existing pull request #6714

fixed log_loss bug

enhance log_loss labels option feature

log_loss

changed test log_loss case

u

add ValueError in log_loss

fixes as per existing pull request #6714

fixed error message when y_pred and y_test labels don't match

fixed error message when y_pred and y_test labels don't match

corrected doc/whats_new.rst for syntax and with correct formatting of credits

additional formatting fixes for doc/whats_new.rst

fixed versionadded comment

removed superfluous line

removed superflous line
amueller (Member) commented Aug 25, 2016:

LGTM with the additional sentence added to the docstring.

Wrap up changes to fix log_loss bug and clean up log_loss
fix a typo in whatsnew

refactor conditional and move dtype check before np.clip

general cleanup of log_loss

remove dtype checks

edit non-regression test and wordings

fix non-regression test

misc doc fixes / clarifications + final touches

fix naming of y_score2 variable

specify log loss is only valid for 2 labels or more
nelson-liu (Contributor) commented Aug 25, 2016:

Squashed my commits and rebased. Do we want to merge all of the commits on this PR, seeing as there are 3 authors, instead of squashing?

amueller (Member) commented Aug 25, 2016:

One commit per author is fine.

nelson-liu (Contributor) commented Aug 25, 2016:

Perfect, just how I squashed it. Assuming CI passes, this is ready to merge on my side; let me know if anything else is needed.

amueller (Member) commented Aug 25, 2016:

No, I think it's good to go.

nelson-liu changed the title from [MRG+1] Fix log loss bug to [MRG+2] Fix log loss bug on Aug 25, 2016.

nelson-liu (Contributor) commented Aug 25, 2016:

@amueller merge?

MechCoder merged commit 104e09a into scikit-learn:master on Aug 25, 2016.

3 checks passed: CircleCI, AppVeyor, and Travis CI.
MechCoder (Member) commented Aug 25, 2016:

Thanks!

marconoe commented Dec 8, 2016:

Hey guys, I'm new to GitHub and coding, and I'm wondering: how do I use the fix that you seem to have created above? I am getting the same issue using log_loss:

ValueError: y_true and y_pred have different number of classes 2, 3

Thanks for working on this!
marco

amueller (Member) commented Dec 8, 2016:

@marconoe What version of scikit-learn are you using? This should be fixed in 0.18 and 0.18.1.
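For anyone else hitting this error on scikit-learn >= 0.18, a hypothetical example of passing the full class set via the labels argument (the class names and probabilities here are made up):

from sklearn.metrics import log_loss

y_true = ["a", "b"]            # only two of the three classes appear in y_true
y_pred = [[0.7, 0.2, 0.1],
          [0.2, 0.7, 0.1]]     # predicted probabilities for all three classes
loss = log_loss(y_true, y_pred, labels=["a", "b", "c"])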

marconoe commented Dec 9, 2016:

Hi @amueller,
Thanks, that makes sense - I was using 0.17.1. I just updated to 0.18.1.

Now I get a new error, but at least it's different, so I can work on that.

Cheers,
marco
