[MRG+1] Add scorer based on brier_score_loss #9521

Merged
merged 11 commits into scikit-learn:master from qinhanmin2014:my-feature-1 on Aug 21, 2017

Conversation

@qinhanmin2014
Member

qinhanmin2014 commented Aug 11, 2017

Reference Issue

Fixes #9512

What does this implement/fix? Explain your changes.

Add scorer based on brier_score_loss.

Any other comments?

qinhanmin2014 added some commits Aug 11, 2017

@qinhanmin2014 qinhanmin2014 changed the title from [MRG] Add scorer based on brier_score_loss to [WIP] Add scorer based on brier_score_loss Aug 11, 2017

@qinhanmin2014

Member

qinhanmin2014 commented Aug 11, 2017

@jnothman Sorry to disturb you, but I think we currently can't implement a scorer based on brier_score_loss. Like other similar metrics (e.g., log_loss, which already has log_loss_scorer), it depends on predicted probabilities, but it simply can't handle the result returned by predict_proba (shape = (n_samples, n_classes)).
From log_loss (which already has a scorer): [screenshot 2017-08-11_230437]
From brier_score_loss: [screenshot 2017-08-11_230453]
Am I making a mistake? Thanks.

@amueller

Member

amueller commented Aug 11, 2017

You just need to pass the second column, maybe using

make_scorer(lambda y_true, y_pred: brier_score_loss(y_true, y_pred[:, 1]),
            needs_proba=True, greater_is_better=False)

(untested)
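
For context, here is a hedged, standalone sketch (synthetic data, hypothetical example, not from this thread) of the computation the lambda above performs: score the positive-class column of predict_proba with brier_score_loss.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
# predict_proba returns shape (n_samples, 2); brier_score_loss expects the
# probability of the positive class, i.e. the second column.
print(brier_score_loss(y_test, clf.predict_proba(X_test)[:, 1]))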

@qinhanmin2014 qinhanmin2014 changed the title from [WIP] Add scorer based on brier_score_loss to [MRG] Add scorer based on brier_score_loss Aug 12, 2017

@qinhanmin2014

Member

qinhanmin2014 commented Aug 12, 2017

@amueller Thanks a lot, it works! I also added a comment explaining the reason for (and a possible limitation of) the current implementation.

qinhanmin2014 added some commits Aug 12, 2017

@jnothman

Member

jnothman commented Aug 13, 2017

@qinhanmin2014

Member

qinhanmin2014 commented Aug 14, 2017

@jnothman Thanks.
(1) According to the docstring, brier_score_loss is only meant for binary cases. If that requirement is satisfied, the implementation of the scorer seems correct.
(2) However, we currently don't have such a check in brier_score_loss: e.g., if we change y_true from [0, 1, 1, 0] to [0, 2, 2, 1], we get the same result (because the largest value is treated as 1 and the rest as 0).
I'm inspecting roc_auc_score and will reply soon :)
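
A minimal reproduction of point (2), assuming the scikit-learn version current at the time of this PR; later versions (after the follow-up checks added in #9562, mentioned later in this thread) raise an error for non-binary y_true instead:

import numpy as np
from sklearn.metrics import brier_score_loss

y_prob = np.array([0.1, 0.9, 0.8, 0.3])
# With pos_label unset, the largest label in y_true was taken as the positive
# class, so these two calls gave identical losses even though the second
# y_true is not a valid binary target.
print(brier_score_loss([0, 1, 1, 0], y_prob))
print(brier_score_loss([0, 2, 2, 1], y_prob))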

@qinhanmin2014

Member

qinhanmin2014 commented Aug 14, 2017

@jnothman Here's my observation.
For roc_auc_score, when the shape of y_score is [n_samples, n_classes], the shape of y_true also needs to be [n_samples, n_classes] (multilabel-indicator). Each corresponding column is then selected, and the per-class scores are calculated and averaged.
Core function: _average_binary_score (in base.py):

for c in range(n_classes):
    y_true_c = y_true.take([c], axis=not_average_axis).ravel()
    y_score_c = y_score.take([c], axis=not_average_axis).ravel()
    score[c] = binary_metric(y_true_c, y_score_c,
                             sample_weight=score_weight)

Note that inside the loop, the shape of y_true_c and y_score_c has become [n_samples]. That's how roc_auc_score can be built on roc_curve even though roc_curve only supports shape = [n_samples].

@amueller

Member

amueller commented Aug 14, 2017

But if someone does cross_val_score(DecisionTreeClassifier(), X, y, scoring="roc_auc"), that works for binary classification even though predict_proba returns (n_samples, 2) and y is not multilabel.
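
A minimal sketch of that case (synthetic data, not from the thread), which runs without error:

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)
# y has shape (n_samples,) and predict_proba returns (n_samples, 2), yet
# scoring="roc_auc" works because the scorer slices out the positive-class
# scores internally.
print(cross_val_score(DecisionTreeClassifier(random_state=0), X, y, scoring="roc_auc"))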

@jnothman

Member

jnothman commented Aug 14, 2017

@qinhanmin2014

Member

qinhanmin2014 commented Aug 15, 2017

@jnothman @amueller (please forgive the long comment and any typos :))
From my perspective, both @amueller and I are right. The core issue here is that a scorer has a wrapper (_PredictScorer / _ProbaScorer / _ThresholdScorer), which lets it support more complicated situations than the metric function itself.
roc_auc_score supports binary y_true (shape = [n_samples]) or multilabel-indicator y_true (shape = [n_samples, n_classes]). For binary classification we can use either form, but beyond the binary case we can only use multilabel-indicator y_true. The shape of y_pred has to be consistent with y_true.

import numpy as np
from sklearn.metrics import roc_auc_score

y_true_1 = np.array([[0, 1], [1, 0], [1, 0], [0, 1]] * 10)  # multilabel-indicator
y_true_2 = np.array([1, 0, 0, 1] * 10)  # binary
y_prob_1 = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]] * 10)
y_prob_2 = np.array([0.9, 0.2, 0.7, 0.6] * 10)
roc_auc_score(y_true_1, y_prob_1)  # no error, can extend beyond two classes
roc_auc_score(y_true_1, y_prob_2)  # error, shapes not consistent
roc_auc_score(y_true_2, y_prob_1)  # error, shapes not consistent
roc_auc_score(y_true_2, y_prob_2)
# no error, same result as the first call, only for binary classification

However, since roc_auc_scorer is wrapped by _ThresholdScorer, it can support more complicated situations.
For example, through the following code in _ThresholdScorer:

if y_type == "binary":
    y_pred = y_pred[:, 1]

roc_auc_scorer can actually support binary y_true (shape = [n_samples]) together with the output of predict_proba (shape = [n_samples, n_classes]).

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X = y_prob_1  # only for convenience
cross_val_score(DecisionTreeClassifier(), X, y_true_1, scoring="roc_auc")
# no problem, same as roc_auc_score
cross_val_score(DecisionTreeClassifier(), X, y_true_2, scoring="roc_auc")
# no problem, thanks to the wrapper

When implementing a scorer based on brier_score_loss, we need to use _ProbaScorer. But unlike _ThresholdScorer, _ProbaScorer is naive, since the only scorer based on it (log_loss_scorer) does everything by itself: log_loss accepts every combination of these shapes.

from sklearn.metrics import log_loss

log_loss(y_true_1, y_prob_1)  # no error
log_loss(y_true_1, y_prob_2)  # no error
log_loss(y_true_2, y_prob_1)  # no error
log_loss(y_true_2, y_prob_2)  # no error
# they all give the same result

@qinhanmin2014

Member

qinhanmin2014 commented Aug 15, 2017

@jnothman @amueller
I have come up with another (possibly better) approach for the scorer. Could you please have a look? Thanks.

@amueller

Member

amueller commented Aug 15, 2017

For binary classification the question was "why does it work at all?", and the answer is: _ThresholdScorer has an if y_type == 'binary' branch, and when we're using predict_proba we slice the probabilities appropriately.
However, _ProbaScorer doesn't do that. log_loss, the only other probabilistic score that we have right now, accepts either (n_samples,), which it assumes to be the probability of the positive (larger) class, or (n_samples, 2).

I think we can be flexible and do the same in brier_score_loss as we do in log_loss. Making the interface more flexible there also benefits people who use the function directly.

Btw, it looks like brier_score_loss might want to error if only one class is present and pos_label=None, as we do in log_loss.
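
To make that suggestion concrete, here is a hedged sketch (a hypothetical helper named flexible_brier_score_loss, not the actual patch) of the input policy described above: accept either shape (n_samples,) or (n_samples, 2), and raise when only one class is present and pos_label is unspecified.

import numpy as np
from sklearn.metrics import brier_score_loss

def flexible_brier_score_loss(y_true, y_prob, pos_label=None):
    # Hypothetical wrapper illustrating the proposed input handling.
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    if y_prob.ndim == 2:
        if y_prob.shape[1] != 2:
            raise ValueError("Only binary classification is supported.")
        # Treat the second column as the probability of the positive class.
        y_prob = y_prob[:, 1]
    if pos_label is None and np.unique(y_true).size < 2:
        raise ValueError("y_true contains a single class; "
                         "pass pos_label explicitly.")
    return brier_score_loss(y_true, y_prob, pos_label=pos_label)

# Both call styles give the same loss:
y_true = np.array([0, 1, 1, 0])
proba = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.6, 0.4]])
assert (flexible_brier_score_loss(y_true, proba)
        == flexible_brier_score_loss(y_true, proba[:, 1]))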

@jnothman

Yes, I think this approach is fine, copying the ThresholdScorer behaviour to ProbaScorer. LGTM
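
For reference, a simplified sketch of what _ProbaScorer's call path looks like with the _ThresholdScorer-style binary handling copied in (an illustration, not the exact scikit-learn source):

from sklearn.utils.multiclass import type_of_target

class _ProbaScorerSketch:
    # Simplified stand-in for sklearn's _ProbaScorer, for illustration only.
    def __init__(self, score_func, sign=1, kwargs=None):
        self._score_func = score_func
        self._sign = sign
        self._kwargs = kwargs or {}

    def __call__(self, clf, X, y, sample_weight=None):
        y_pred = clf.predict_proba(X)
        # Behaviour copied from _ThresholdScorer: for binary targets, keep
        # only the probability of the positive class, so metrics such as
        # brier_score_loss that expect shape (n_samples,) work directly.
        if type_of_target(y) == "binary":
            y_pred = y_pred[:, 1]
        return self._sign * self._score_func(y, y_pred, **self._kwargs)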

@jnothman jnothman changed the title from [MRG] Add scorer based on brier_score_loss to [MRG+1] Add scorer based on brier_score_loss Aug 15, 2017

@qinhanmin2014

Member

qinhanmin2014 commented Aug 16, 2017

@jnothman @amueller
I opened a new pull request #9562 to address your concerns about brier_score_loss, along with a bug fix. Could you please have a look? Thanks :)

@amueller

Member

amueller commented Aug 16, 2017

Alright, I'm fine with merging this as well. Note, though, that it changes what is passed to log_loss, but that's inconsequential (unless someone provides a malformed probability that doesn't sum to 1, in which case this patch will change the results).

Anyway, I'm OK to merge.

qinhanmin2014 added some commits Aug 17, 2017

@qinhanmin2014

Member

qinhanmin2014 commented Aug 17, 2017

@jnothman @amueller Thanks. I updated what's new, along with some minor fixes of my own (entries that are in 0.19.X but not in master, and a strange duplicate entry).
@jnothman, could you please check @amueller's previous comment? The current implementation slightly changes the behaviour of the log_loss-based scorer when users manage to pass probabilities that don't sum to 1 in the binary case.
E.g.
y_pred = np.array([[0.1, 0.1], [0.9, 0.9], [0.2, 0.2], [0.8, 0.8]])
was previously treated as equivalent to y_pred = np.array([0.5, 0.5, 0.5, 0.5]) (normalized),
but is now equivalent to y_pred = np.array([0.1, 0.9, 0.2, 0.8]) (we only take the second column, which is considered the probability of the positive class).
Note that the docs state that we only support predicted probabilities for log_loss. We could go back to the previous solution to keep everything the same, but that might seem awkward for the scorer.
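
A hedged numeric illustration of that difference, using a hypothetical y_true (the thread does not give one) and passing the already-normalised equivalents directly:

import numpy as np
from sklearn.metrics import log_loss

y_true = np.array([0, 1, 0, 1])  # hypothetical labels for illustration
# Old scorer behaviour: the malformed matrix was renormalised row-wise,
# i.e. equivalent to predicting 0.5 for every sample.
old_equivalent = log_loss(y_true, np.array([0.5, 0.5, 0.5, 0.5]))
# New scorer behaviour: only the second column is passed, taken as the
# probability of the positive class.
new_equivalent = log_loss(y_true, np.array([0.1, 0.9, 0.2, 0.8]))
print(old_equivalent, new_equivalent)  # the two losses differ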

@qinhanmin2014

Member

qinhanmin2014 commented Aug 21, 2017

Gentle ping @jnothman, thanks :)

The inline review below refers to this changed hunk in _ProbaScorer:

y_pred = clf.predict_proba(X)
if y_type == "binary":
    y_pred = y_pred[:, 1]

@jnothman

jnothman Aug 21, 2017

Member

I think this change needs a non-regression test.

Otherwise LGTM.

@qinhanmin2014

qinhanmin2014 Aug 21, 2017

Member

@jnothman Sorry, but I don't quite understand what you mean. Currently it seems we don't test the wrapper of the scorer; instead, we test the scorer directly to ensure reasonable behaviour, and I have added the new scorer to that test.
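
For illustration, a hedged sketch of what such a direct scorer check could look like (a hypothetical test; it assumes the scorer name "neg_brier_score" used by current scikit-learn, which may differ from the name registered by this PR):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, get_scorer
from sklearn.model_selection import train_test_split

def test_brier_scorer_matches_metric():
    X, y = make_classification(n_samples=200, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = LogisticRegression().fit(X_train, y_train)
    scorer = get_scorer("neg_brier_score")  # assumed scorer name
    # The scorer should equal the negated metric computed on the
    # positive-class probabilities.
    expected = -brier_score_loss(y_test, clf.predict_proba(X_test)[:, 1])
    assert np.isclose(scorer(clf, X_test, y_test), expected)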

@jnothman

jnothman Aug 21, 2017

Member

Oh. I think I got confused about what we were looking at. LGTM.

@jnothman jnothman merged commit ee2025f into scikit-learn:master Aug 21, 2017

6 checks passed

ci/circleci Your tests passed on CircleCI!
codecov/patch 100% of diff hit (target 96.16%)
codecov/project 96.16% (+<.01%) compared to dca8406
continuous-integration/appveyor/pr AppVeyor build succeeded
continuous-integration/travis-ci/pr The Travis CI build passed
lgtm analysis: Python No alert changes

@qinhanmin2014 qinhanmin2014 deleted the qinhanmin2014:my-feature-1 branch Aug 22, 2017

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
