[MRG+1] Completely support binary y_true in roc_auc_score #9828

Merged: 7 commits merged into scikit-learn:master from qinhanmin2014:my-feature-3 on Oct 11, 2017

qinhanmin2014 (Member) commented Sep 25, 2017

Reference Issue

Fixes #2723, proposed by @jnothman
Also see the discussions in #9805, #9567, #6874, #6873, #2616

What does this implement/fix? Explain your changes.

Currently, roc_auc_score only supports binary y_true in {0, 1} or {-1, 1}.
This PR adds complete support for binary y_true. The basic idea is that, for binary y_true, y_score is taken to be the score of the class with the greater label.
The common tests serve as the regression test.
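
A minimal sketch of the behavior this PR enables; the labels and scores below are made up for illustration:

import numpy as np
from sklearn.metrics import roc_auc_score

# Any two distinct labels now work; y_score is interpreted as the
# score of the class with the greater label ('spam' > 'ham').
y_true = np.array(['ham', 'spam', 'spam', 'ham'])
y_score = np.array([0.1, 0.9, 0.8, 0.3])
print(roc_auc_score(y_true, y_score))  # 1.0 (perfect separation)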

Any other comments?

cc @jnothman

jnothman (Member) reviewed Sep 25, 2017

Otherwise LGTM

(outdated review comments on sklearn/metrics/tests/test_common.py)

qinhanmin2014 (Member) commented Sep 25, 2017

@jnothman Thanks for the review.

What are we excluding by this? What happens when we remove the condition?

The test will fail, because some metrics still only support {0, 1} or {-1, 1} y_true.

After this PR, the following THRESHOLDED_METRICS are still excluded by the if statement:

# average_precision_score variants
average_precision_score
macro_average_precision_score
micro_average_precision_score
samples_average_precision_score
weighted_average_precision_score

# Multilabel ranking metrics
coverage_error
label_ranking_average_precision_score
label_ranking_loss

jnothman (Member) commented Sep 25, 2017

I'd rather the condition be a blacklist (suggesting "not yet implemented" or "not applicable") than a whitelist, which would seem to defy the purpose of common tests.

qinhanmin2014 (Member) commented Sep 25, 2017

@jnothman Thanks.
A possible way is to use METRIC_UNDEFINED_BINARY instead of the awkward list. That also seems reasonable, because we are testing binary y_true here. But this should be based on #9786 (move roc_auc_score out of METRIC_UNDEFINED_BINARY).
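
THRESHOLDED_METRICS and METRIC_UNDEFINED_BINARY are existing names in sklearn/metrics/tests/test_common.py; the loop below is only an illustrative sketch of the blacklist pattern under discussion, not the actual diff:

import numpy as np
from sklearn.metrics.tests.test_common import (
    METRIC_UNDEFINED_BINARY, THRESHOLDED_METRICS)

rng = np.random.RandomState(0)
y_true = rng.randint(2, size=20)   # binary ground truth
y_score = rng.rand(20)             # arbitrary continuous scores

for name, metric in THRESHOLDED_METRICS.items():
    if name in METRIC_UNDEFINED_BINARY:
        continue  # blacklisted: undefined or not yet supported for binary input
    metric(y_true, y_score)  # every remaining metric must accept binary y_true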

jnothman (Member) commented Sep 25, 2017

Would it be a good idea to merge the PRs so we can see the state of affairs more completely?

Should we have a regression test to show that cross_val_score works regardless of which label is positive?

And I'm not sure I get why average precision would be undefined-binary, but I'm still not in a position to look at the code.

qinhanmin2014 (Member) commented Sep 25, 2017

@jnothman Thanks for your instant reply.

would it be a good idea to merge the PRs so we can see the state of affairs more completely?

From my perspective, #9786 solves a different problem (improving the stability of roc_auc_score) and is almost finished. There is hardly any direct relationship between the two PRs, so it might be better not to combine #9786 with this PR unless you insist.

should we have a regression test to show that cross_val_score works regardless of which label is positive?

Sorry, but I don't quite understand the necessity of such a test. If roc_auc_score works correctly, then the scorer based on roc_auc_score, and hence cross_val_score, should also work correctly. I can't find a similar test currently; if you can point me to something comparable, I will be able to understand the problem better.

And I'm not sure I get why average precision would be undefined-binary, but I'm still not in a position to look at the code.

According to the doc and a glance at the source code, it seems we should also move average_precision_score out of METRIC_UNDEFINED_BINARY. I'll take care of it after #9786.

I have wrapped up our discussion about the common tests, along with some opinions of my own, in #9829.
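
For reference, a sketch of the kind of regression test being discussed; whether the roc_auc scorer path gives identical results for either label ordering is exactly what such a test would pin down (the data and estimator here are arbitrary):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(60, 5)
y = rng.randint(2, size=60)
classes = np.array(['neg', 'pos'])

# Map each string to the positive class in turn; the cross-validated
# AUC should not depend on which label plays the positive role.
scores_a = cross_val_score(LogisticRegression(), X, classes[y],
                           cv=3, scoring='roc_auc')
scores_b = cross_val_score(LogisticRegression(), X, classes[1 - y],
                           cv=3, scoring='roc_auc')
assert np.allclose(scores_a, scores_b)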

jnothman (Member) commented Sep 25, 2017

Sounds good

jnothman changed the title from [MRG] Completely support binary y_true in roc_auc_score to [MRG+1] Completely support binary y_true in roc_auc_score on Sep 25, 2017

qinhanmin2014 (Member) commented Sep 28, 2017

I'd rather the condition be a blacklist (suggesting "not yet implemented" or "not applicable") than a whitelist, which would seem to defy the purpose of common tests

@jnothman Now we can get rid of the awkward list. Is that OK with you? Thanks.

jnothman (Member) commented Sep 28, 2017

That looks much better!

qinhanmin2014 added some commits Sep 28, 2017

qinhanmin2014 (Member) commented Sep 28, 2017

@jnothman Could you please give me some suggestions on how to make lgtm run? Thanks :)

qinhanmin2014 (Member) commented Sep 29, 2017

@lesteve @amueller Could you kindly give a second review? Thanks a lot :)

TomDLT (Member) commented Oct 11, 2017

This is indeed much better than #9567, #6874, #2616.

The basic use case seems to work:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

np.random.seed(0)
n_samples, n_features = 100, 10
est = LogisticRegression()
X = np.random.randn(n_samples, n_features)
y = np.random.randint(2, size=n_samples)
classes = np.array(['good', 'not-good'])

for y_true in (classes[y], classes[1 - y]):
    est.fit(X, y_true)
    y_score = est.decision_function(X)
    print(roc_auc_score(y_true, y_score))
# 0.678090575275
# 0.678090575275

LGTM
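
Both orderings give the same AUC because refitting with the flipped mapping essentially negates decision_function, while roc_auc_score now takes the greater label ('not-good' > 'good') as the positive class, so the two flips cancel.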

TomDLT merged commit 94db658 into scikit-learn:master on Oct 11, 2017

6 checks passed:

ci/circleci: Your tests passed on CircleCI!
codecov/patch: 100% of diff hit (target 96.17%)
codecov/project: 96.17% (+<.01%) compared to daeb3ad
continuous-integration/appveyor/pr: AppVeyor build succeeded
continuous-integration/travis-ci/pr: The Travis CI build passed
lgtm analysis (Python): No alert changes

qinhanmin2014 deleted the qinhanmin2014:my-feature-3 branch on Oct 11, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
