
[MRG+2] ENH multiclass balanced accuracy #10587

Merged

merged 17 commits into scikit-learn:master on Jul 27, 2018

Conversation

4 participants
@jnothman
Member

jnothman commented Feb 5, 2018

Includes computationally simpler implementation and logically simpler description.

See also #10040. Ping @maskani-moh, @amueller.

jnothman added some commits Feb 5, 2018

ENH multiclass balanced accuracy
Includes computationally simpler implementation and logically simpler description.
@jnothman
Member

jnothman commented Feb 5, 2018

Ahh... passing tests.

jnothman added some commits Feb 5, 2018

DOC
DOC
@@ -1357,6 +1357,8 @@ functions or non-estimator constructors.
equal weight by giving each sample a weight inversely related
to its class's prevalence in the training data:
``n_samples / (n_classes * np.bincount(y))``.
**Note** however that this rebalancing does not take the weight of
samples in each class into account.
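To make the quoted heuristic concrete, here is a minimal NumPy sketch of the inverse-frequency formula above (an illustration of the formula only, not the scikit-learn implementation):

```python
import numpy as np

y = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])

# class_weight='balanced': weight each class inversely to its prevalence,
# so every class contributes the same total weight overall.
n_samples = len(y)
class_counts = np.bincount(y)
n_classes = len(class_counts)
class_weights = n_samples / (n_classes * class_counts)

# Per-sample weights; within each class these sum to n_samples / n_classes.
sample_weights = class_weights[y]
print(class_weights)  # [0.8333... 1.6667... 0.8333...]
```

As the note in the diff says, any pre-existing per-sample weights are not folded into this rebalancing.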

@jnothman
Member

jnothman Feb 5, 2018

Perhaps we should have a "weight-balanced" option for class_weight. It would be interesting to see if that improved imbalanced boosting.

@jnothman
Member

jnothman Feb 6, 2018

Apparently my phone wrote "weight-loss card" (!) there. Amended.

.. math::

   \texttt{balanced-accuracy}(y, \hat{y}) = \frac{1}{2} \left(\frac{\sum_i 1(\hat{y}_i = 1 \land y_i = 1)}{\sum_i 1(y_i = 1)} + \frac{\sum_i 1(\hat{y}_i = 0 \land y_i = 0)}{\sum_i 1(y_i = 0)}\right)

   \hat{w}_i = \frac{w_i}{\sum_j 1(y_j = y_i) w_j}

@jnothman
Member

jnothman Feb 5, 2018

Should I give the equation assuming w_i = 1?

@maskani-moh
Contributor

maskani-moh Feb 6, 2018

I think it's fine if we keep the general formula.

@jnothman
Member

jnothman commented Feb 6, 2018

While I'm interested in your critique of the docs and implementation, @maskani-moh, I'd mostly like you to verify that this interpretation of balanced accuracy, as accuracy with sample weights assigned to give equal total weight to each class, makes the choice of a multiclass generalisation clear.
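That interpretation can be checked numerically: with sample weights equal to the inverse frequency of each sample's true class (the normalised weights ŵ_i above with w_i = 1), weighted accuracy coincides with macro-averaged recall and with the `balanced_accuracy_score` this PR generalises. A small check on toy labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score, recall_score

y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 2, 1, 1, 2, 2, 0])

# Give each class equal total weight: each sample gets 1 / (count of its class).
counts = np.bincount(y_true)
sample_weight = 1.0 / counts[y_true]

weighted_acc = accuracy_score(y_true, y_pred, sample_weight=sample_weight)
macro_recall = recall_score(y_true, y_pred, average="macro")
balanced = balanced_accuracy_score(y_true, y_pred)

assert np.isclose(weighted_acc, macro_recall)  # all three equal ~0.7222
assert np.isclose(weighted_acc, balanced)
```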

@glemaitre
Contributor

glemaitre commented Feb 8, 2018

The implementation with the confusion matrix seems really straightforward. It looks like an average of the TPR per class. The generalization from binary to multiclass looks good to me. I don't see a case where it would not be correct.
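The per-class-TPR reading can be sketched in a few lines (a simplified version that ignores sample weights and classes absent from y_true, which the actual implementation handles):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def balanced_accuracy_sketch(y_true, y_pred):
    # Row i of the confusion matrix counts true class i; the diagonal holds
    # correct predictions, so diag / row-sum is the recall (TPR) per class.
    C = confusion_matrix(y_true, y_pred)
    per_class_recall = np.diag(C) / C.sum(axis=1)
    # Balanced accuracy is the unweighted mean of per-class recall.
    return per_class_recall.mean()
```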


@glemaitre glemaitre changed the title from [MRG] ENH multiclass balanced accuracy to [MRG+1] ENH multiclass balanced accuracy Feb 13, 2018

@glemaitre
Contributor

glemaitre commented Feb 13, 2018

LGTM. @maskani-moh Could you have a look and tell us WYT?

@jnothman
Member

jnothman commented Jul 26, 2018

This should be quick to review if someone (other than @glemaitre who has given his +1) is keen to throw it into 0.20.

@qinhanmin2014

LGTM at a glance. I need (and promise) to double-check the code and refs tomorrow.
Some small comments; feel free to ignore them if you think the current version is fine.
My LGTM on the PR is based on the fact that the function is already there. Honestly, I don't like the idea of including such a function, which can simply be implemented using recall.
Tagging 0.20.


@qinhanmin2014 qinhanmin2014 added this to the 0.20 milestone Jul 26, 2018

assert balanced == pytest.approx(macro_recall)
adjusted = balanced_accuracy_score(y_true, y_pred, adjusted=True)
chance = balanced_accuracy_score(y_true, np.full_like(y_true, y_true[0]))
assert adjusted == (balanced - chance) / (1 - chance)

@qinhanmin2014
Member

qinhanmin2014 Jul 26, 2018

Any reason we can't use == when adjusted=False?
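For context, the chance level that ``adjusted=True`` subtracts out in the test above is the balanced accuracy of a classifier that always predicts a single class: recall is 1 for that class and 0 for all others, so chance equals 1 over the number of classes, and

.. math::

   \texttt{adjusted} = \frac{\texttt{balanced} - \frac{1}{n_{classes}}}{1 - \frac{1}{n_{classes}}}

which is exactly the ``(balanced - chance) / (1 - chance)`` assertion in the snippet.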

@qinhanmin2014

LGTM apart from the comments above.


@qinhanmin2014 qinhanmin2014 changed the title from [MRG+1] ENH multiclass balanced accuracy to [MRG+2] ENH multiclass balanced accuracy Jul 27, 2018

@qinhanmin2014
Member

qinhanmin2014 commented Jul 27, 2018

@jnothman Do you mind if I push some cosmetic changes and merge this one?


@qinhanmin2014

LGTM, thanks @jnothman

@qinhanmin2014 qinhanmin2014 merged commit e888c0d into scikit-learn:master Jul 27, 2018

4 of 5 checks passed

- continuous-integration/appveyor/pr: Waiting for AppVeyor build to complete
- ci/circleci: deploy: Your tests passed on CircleCI!
- ci/circleci: python2: Your tests passed on CircleCI!
- ci/circleci: python3: Your tests passed on CircleCI!
- continuous-integration/travis-ci/pr: The Travis CI build passed
@jnothman
Member

jnothman commented Jul 29, 2018

Removing those backslashes broke CircleCI on master.
