
[MRG+2] ENH&BUG Add pos_label parameter and fix a bug in average_precision_score #9980

Merged

merged 11 commits into scikit-learn:master from qinhanmin2014:average_precision_pos_label on Jul 16, 2018

Conversation

5 participants
@qinhanmin2014
Member

qinhanmin2014 commented Oct 23, 2017

Reference Issues/PRs

part of #9829

What does this implement/fix? Explain your changes.

(1) Add a pos_label parameter to average_precision_score. (Although we finally decided not to introduce pos_label in roc_auc_score, I think we need pos_label here: there is no fixed relationship between the results when the true labels are reversed (see the sketch after this list), and precision/recall/F-score all support pos_label.)
(2) Fix a bug where average_precision_score sometimes returns nan when sample_weight contains 0:

import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 0, 1, 1, 1])
y_score = np.array([0.1, 0.4, 0.85, 0.35, 0.8, 0.9])
average_precision_score(y_true, y_score, sample_weight=[1, 1, 0, 1, 1, 0])
# output before the fix: nan (after the fix, zero-weight samples are
# effectively ignored and the call returns a finite score)

I fix it in this PR because of (3).
(3) Move the average_precision scores out of METRIC_UNDEFINED_BINARY (this provides the regression tests for (1) and (2)).
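
To illustrate the motivation for (1), here is a minimal sketch (made-up data): ROC AUC is invariant under swapping the two classes, while average precision is not, so which class is treated as positive genuinely matters:

import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

y_true = np.array([1, 0, 1])
y_score = np.array([0.9, 0.5, 0.2])

# ROC AUC is unchanged when the two classes are swapped
# (flip the labels and the scores):
roc_auc_score(y_true, y_score)          # 0.5
roc_auc_score(1 - y_true, 1 - y_score)  # 0.5

# average precision has no such symmetry:
average_precision_score(y_true, y_score)          # 0.833...
average_precision_score(1 - y_true, 1 - y_score)  # 0.5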

Some comments:
(1) For the underlying function (precision_recall_curve), the default value of pos_label is None, but I chose to set the default value of pos_label to 1 here because this is what precision/recall/F-score do. What's more, the meaning of pos_label=None is not clear even within scikit-learn itself (see #10010). A usage sketch follows these comments.
(2) I slightly modified the common test. Currently, the part I modified only applies to brier_score_loss (I'm doing the same thing in #9562). I think this is right: as a common test, it seems wrong to force metrics to accept str y_true without pos_label.
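
As a usage sketch of the new parameter (made-up data, assuming the signature introduced by this PR):

import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array(["neg", "neg", "pos", "pos"])
y_score = np.array([0.1, 0.4, 0.35, 0.8])
# with string labels there is no implicit positive class,
# so it has to be named explicitly:
average_precision_score(y_true, y_score, pos_label="pos")  # 0.833...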

Any other comments?

cc @jnothman Could you please take some time to review or at least judge whether this is the right way to go? Thanks a lot :)

qinhanmin2014 added some commits Oct 23, 2017

@qinhanmin2014
Member

qinhanmin2014 commented Oct 26, 2017

ping @jnothman Could you please take some time to review it or at least judge whether this is the right way to go? Thanks a lot :)

@jnothman
Member

jnothman commented Oct 26, 2017

@qinhanmin2014
Member

qinhanmin2014 commented Dec 6, 2017

ping @jnothman for a review. Thanks :)

@jnothman

Otherwise LGTM

(Four review comments on sklearn/metrics/ranking.py, now outdated.)
@qinhanmin2014
Member

qinhanmin2014 commented Dec 10, 2017

@jnothman Thanks a lot for the review :) Comments addressed. I also simplified the code.

are we sure at this stage that the y_type is not multiclass?

Not at this point. But we do that check as soon as we finish the preparation and enter _average_binary_score, so a duplicate check here seems unnecessary.
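
For context, a sketch of the downstream check referred to above (assuming current scikit-learn behaviour): _average_binary_score rejects non-binary input, so a multiclass y_true fails there anyway.

import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 1, 2, 2])   # three classes
y_score = np.array([0.1, 0.4, 0.35, 0.8])
average_precision_score(y_true, y_score)
# raises ValueError: multiclass format is not supported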

@jnothman

LGTM. Please add what's new

@@ -150,7 +151,7 @@ def average_precision_score(y_true, y_score, average="macro",
     Parameters
     ----------
     y_true : array, shape = [n_samples] or [n_samples, n_classes]
-        True binary labels (either {0, 1} or {-1, 1}).
+        True binary labels or binary label indicators.

@jnothman
Member

jnothman Jan 16, 2018

label indicators -> multilabel indicators ??

@jnothman jnothman changed the title from [MRG] ENH&BUG Add pos_label parameter and fix a bug in average_precision_score to [MRG+1] ENH&BUG Add pos_label parameter and fix a bug in average_precision_score Jan 16, 2018

@qinhanmin2014
Member

qinhanmin2014 commented Jan 16, 2018

@jnothman Thanks for the review :) I updated what's new (not sure whether we need two entries).

label indicators -> multilabel indicators ??

I'm not coining a new term, just reverting the previous change (#9557), since we now extend the function. We use the term binary label indicators in several places, and the documentation states that it covers the multilabel case. Do you think I need to change it? Thanks.
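
For reference, a sketch of what binary label indicators means here (made-up data): one 0/1 column per class, as in the multilabel case.

import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([[1, 0, 1],
                   [0, 1, 0]])
y_score = np.array([[0.8, 0.1, 0.6],
                    [0.3, 0.9, 0.2]])
average_precision_score(y_true, y_score, average="macro")  # 1.0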

@jnothman
Member

jnothman commented Jan 16, 2018

@GaelVaroquaux

This looks good aside from the cosmetic comment.

(One review comment on sklearn/metrics/ranking.py, now outdated.)
@GaelVaroquaux
Member

GaelVaroquaux commented Jul 16, 2018

I am addressing the cosmetic comment myself and merging.

@GaelVaroquaux GaelVaroquaux changed the title from [MRG+1] ENH&BUG Add pos_label parameter and fix a bug in average_precision_score to [MRG+2] ENH&BUG Add pos_label parameter and fix a bug in average_precision_score Jul 16, 2018

@GaelVaroquaux

LGTM.

Merging when CI is green.

@ogrisel ogrisel added this to PRs tagged in scikit-learn 0.20 Jul 16, 2018

@qinhanmin2014
Member

qinhanmin2014 commented Jul 16, 2018

Thanks a lot @GaelVaroquaux :)
FYI, I removed an unused variable.

@amueller amueller merged commit dd69361 into scikit-learn:master Jul 16, 2018

4 of 5 checks passed

continuous-integration/appveyor/pr: Waiting for AppVeyor build to complete
ci/circleci: deploy: Your tests passed on CircleCI!
ci/circleci: python2: Your tests passed on CircleCI!
ci/circleci: python3: Your tests passed on CircleCI!
continuous-integration/travis-ci/pr: The Travis CI build passed

scikit-learn 0.20 automation moved this from PRs tagged to Done Jul 16, 2018

@qinhanmin2014 qinhanmin2014 deleted the qinhanmin2014:average_precision_pos_label branch Jul 17, 2018
