Add averaging option to AMI and NMI #11124

Merged
15 commits merged into scikit-learn:master on Jul 17, 2018

Conversation

6 participants
@aryamccarthy
Contributor

aryamccarthy commented May 23, 2018

Reference Issues/PRs

See #10308; this is a first step toward eventually deprecating one behavior and making their behavior consistent.

What does this implement/fix?

Background: The measures AMI, NMI, and V-measure are intimately related. Each is a normalized version of mutual information, and AMI incorporates adjustment for chance.

  • AMI, NMI, and V-measure normalize differently: by the max (i.e. infinity-norm), the geometric mean, and the arithmetic mean of the two clusterings' entropies, respectively. (V-measure is NMI with the arithmetic mean.)
  • This makes the measures difficult to directly compare.
  • Added switch for NMI and AMI to allow choice of normalization
  • Long-term plan: unify behavior. Warning about future deprecation.
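To make the difference between the normalizers concrete, here is a minimal sketch in plain Python. It is not scikit-learn's implementation; the names `entropy`, `mutual_info`, `NORMALIZERS`, and `nmi` are illustrative assumptions, and the degenerate-case handling is a simplification.

```python
from collections import Counter
from math import log, sqrt


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_info(u, v):
    """Mutual information (in nats) between two labelings of the same items."""
    n = len(u)
    joint = Counter(zip(u, v))
    count_u, count_v = Counter(u), Counter(v)
    return sum(
        (c / n) * log(c * n / (count_u[a] * count_v[b]))
        for (a, b), c in joint.items()
    )


# The four ways of combining H(U) and H(V) discussed in this PR.
NORMALIZERS = {
    "min": min,
    "geometric": lambda hu, hv: sqrt(hu * hv),
    "arithmetic": lambda hu, hv: (hu + hv) / 2,
    "max": max,
}


def nmi(u, v, average_method):
    """Mutual information divided by the chosen mean of the two entropies."""
    hu, hv = entropy(u), entropy(v)
    denom = NORMALIZERS[average_method](hu, hv)
    if denom == 0:
        # Simplified handling: perfect score only if both labelings are trivial.
        return 1.0 if hu == 0 and hv == 0 else 0.0
    return mutual_info(u, v) / denom


u = [0, 0, 1, 1, 2, 2]
v = [0, 0, 0, 1, 1, 1]
scores = {name: nmi(u, v, name) for name in NORMALIZERS}
```

Because min ≤ geometric ≤ arithmetic ≤ max for any two non-negative entropies, the same pair of clusterings gets its highest score under "min" and its lowest under "max", which is why scores computed with different normalizers are hard to compare directly.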
Add averaging option to AMI and NMI
Leave current behavior unchanged
@amueller

Needs tests, otherwise great!

@amueller

Member

amueller commented May 24, 2018

test failures are flake8. you should run flake8 in your editor.

@amueller

Member

amueller commented May 24, 2018

oh this is related I just saw #8645

aryamccarthy added some commits May 24, 2018


Update docs from AMI, NMI changes (#1)
* Correct the NMI and AMI descriptions in docs

* Update docstrings due to averaging changes

- V-measure
- Homogeneity
- Completeness
- NMI
- AMI
@aryamccarthy

Contributor

aryamccarthy commented May 25, 2018

Looks ready to squash and merge—the fix is implemented and the tests passed. @amueller ?

@jnothman

Not so fast ;)

@@ -1185,7 +1179,7 @@ following equation, from Vinh, Epps, and Bailey, (2009). In this equation,
 Using the expected value, the adjusted mutual information can then be
 calculated using a similar form to that of the adjusted Rand index:
-.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\max(H(U), H(V)) - E[\text{MI}]}
+.. math:: \text{AMI} = \frac{\text{MI} - E[\text{MI}]}{\text{mean}(H(U), H(V)) - E[\text{MI}]}

@jnothman

jnothman May 26, 2018

Member

The fact that mean is configurable and varies in the literature should be discussed here, perhaps with some notes on when one is more appropriate than another
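For reference, the configurable denominator can be written out explicitly. This is a sketch of the notation only; the symbol M is an assumption introduced here and does not appear in the docs diff above.

```latex
\text{AMI}(U, V) =
  \frac{\text{MI}(U, V) - E[\text{MI}(U, V)]}
       {M\bigl(H(U), H(V)\bigr) - E[\text{MI}(U, V)]},
\qquad
M(a, b) \in \left\{ \min(a, b),\ \sqrt{ab},\ \tfrac{a + b}{2},\ \max(a, b) \right\}
```

Since \min(a, b) \le \sqrt{ab} \le \tfrac{a + b}{2} \le \max(a, b), a larger choice of M gives a smaller AMI whenever MI exceeds its expected value and the denominator is positive.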

@jnothman

Member

jnothman commented May 26, 2018

Please add an entry to the change log at doc/whats_new/v0.20.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

aryamccarthy added some commits May 27, 2018

Update documentation and remove nose tests (#2)
* Update v0.20.rst

* Update test_supervised.py

* Update clustering.rst
@aryamccarthy

Contributor

aryamccarthy commented May 27, 2018

Mission accomplished :)

@jnothman

I think this is a bit confusing. The normalising constant is always the max of some elementwise mean of U and V.

sqrt and sum don't make sense as names of means: call them geometric and arithmetic

@aryamccarthy

Contributor

aryamccarthy commented May 27, 2018

@jnothman

Member

jnothman commented May 28, 2018

@aryamccarthy

Contributor

aryamccarthy commented Jun 5, 2018

Checking again—is this ready to pull?

@jnothman

Not yet looked at tests

@aryamccarthy

Contributor

aryamccarthy commented Jun 9, 2018

@jnothman or @amueller, is this ready?

@jnothman

Yes, I think this looks good now.

@aryamccarthy

Contributor

aryamccarthy commented Jun 9, 2018

Ah, then pull away! ;)

@jnothman

Member

jnothman commented Jun 9, 2018

We require two approvals before merge (sorry!)

Hopefully @amueller can give this another glance.

@aryamccarthy

Contributor

aryamccarthy commented Jul 11, 2018

@jnothman

Member

jnothman commented Jul 12, 2018

@qinhanmin2014

Member

qinhanmin2014 commented Jul 12, 2018

Apologies for the delay and thanks @aryamccarthy for your great work.
I'll mark this as 0.20 to help you attract reviewers. For me, I can only promise to give a review after the release.

@qinhanmin2014 qinhanmin2014 added this to the 0.20 milestone Jul 12, 2018

@amueller

There are probably lots of deprecation warnings in the tests now. Can you please either catch them or explicitly pass the new parameter? (Do we have a standard procedure for this, btw? Is it documented?)

normalized_mutual_info_score,
adjusted_mutual_info_score,
]
means = {"min", "geometric", "arithmetic", "max"}

@amueller

amueller Jul 12, 2018

Member

feel kinda weird about calling these means.

@aryamccarthy

aryamccarthy Jul 12, 2018

Contributor

I can switch to generalized_means if you prefer, but it has to be clear that this is a specific class of aggregations. Product, for instance, wouldn't work. Let me know and I'll ship all changes in one PR update.
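One way to see why these four belong together and a plain product does not: min, geometric, arithmetic, and max are all power (generalized) means, for exponents -inf, 0, 1, and +inf. The sketch below is illustrative only; `generalized_mean` is a hypothetical helper, not something in this PR.

```python
from math import exp, inf, log


def generalized_mean(a, b, p):
    """Power mean of two positive numbers with exponent p."""
    if p == -inf:
        return min(a, b)
    if p == inf:
        return max(a, b)
    if p == 0:
        # The limit of the power mean as p -> 0 is the geometric mean.
        return exp((log(a) + log(b)) / 2)
    return ((a ** p + b ** p) / 2) ** (1 / p)


a, b = 2.0, 8.0
means = [generalized_mean(a, b, p) for p in (-inf, 0, 1, inf)]
product = a * b  # 16.0, which lies outside [min(a, b), max(a, b)]
```

Every power mean lies between min(a, b) and max(a, b). Since mutual information never exceeds the smaller of the two entropies, any of these normalizers keeps the ratio at or below 1, whereas a product gives no such guarantee.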

@amueller

Member

amueller commented Jul 15, 2018

did you check that you're catching all deprecation warnings?

@aryamccarthy

Contributor

aryamccarthy commented Jul 15, 2018

With this?

import warnings

with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=PendingDeprecationWarning)

Also someone pushed in a way that introduces merge conflicts in the whats-new file, so tests aren't passing.
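On the warning filter quoted above, one detail worth checking is the category: a filter only matches the given class and its subclasses, and `PendingDeprecationWarning`, `DeprecationWarning`, and `FutureWarning` are siblings under `Warning`. A small standard-library sketch (the `emit` helper is hypothetical):

```python
import warnings


def emit(category):
    """Hypothetical stand-in for library code that raises a warning."""
    warnings.warn("default value will change", category)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything not filtered below
    warnings.filterwarnings("ignore", category=PendingDeprecationWarning)
    emit(PendingDeprecationWarning)  # suppressed by the filter
    emit(DeprecationWarning)         # not a subclass, so still recorded
    emit(FutureWarning)              # not a subclass, so still recorded

recorded = [w.category for w in caught]
```

So the filter has to name whichever category the library code actually raises.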

@amueller

Member

amueller commented Jul 15, 2018

can you merge master to fix the conflict?

And I think we're using

      with ignore_warnings(category=DeprecationWarning):

right now.

@amueller amueller added the Blocker label Jul 16, 2018

@jorisvandenbossche jorisvandenbossche added this to PRs tagged in scikit-learn 0.20 Jul 16, 2018

@amueller amueller moved this from PRs tagged to Blockers in scikit-learn 0.20 Jul 16, 2018

@amueller

Member

amueller commented Jul 16, 2018

fixed the conflict

@aryamccarthy

Contributor

aryamccarthy commented Jul 16, 2018

For my future knowledge, what's the command to do that?

@amueller

Member

amueller commented Jul 16, 2018

I merged master into it and fixed the merge conflict and pushed into your branch.
So assuming you're on your branch and have the main repo as upstream remote

git pull upstream master
# fix merge conflict, commit
git push origin master

fyi you shouldn't send PRs from your master branch, you should ideally create a feature branch.

@amueller

Member

amueller commented Jul 16, 2018

can you please also add a test that there are deprecation warnings? Also see the updated docs at http://scikit-learn.org/dev/developers/contributing.html#change-the-default-value-of-a-parameter
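A rough sketch of what such a test can look like, using only the standard library. This is one common shape for changing a default, not necessarily what this PR ends up doing; the `score` function, the `"warn"` sentinel, and the `warns_future` helper are all assumptions made for illustration.

```python
import warnings


def score(values, average_method="warn"):
    """Hypothetical metric whose default averaging is being changed."""
    if average_method == "warn":
        warnings.warn(
            "The default average_method will change in a future release; "
            "pass it explicitly to silence this warning.",
            FutureWarning,
        )
        average_method = "max"  # keep the old behaviour for now
    reducer = {"max": max, "min": min}[average_method]
    return reducer(values)


def warns_future(func, *args, **kwargs):
    """Return True if calling func emits at least one FutureWarning."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return any(issubclass(w.category, FutureWarning) for w in caught)
```

A test would then assert that `warns_future(score, [1, 2])` is true and that passing `average_method` explicitly produces no warning.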

@aryamccarthy

Contributor

aryamccarthy commented Jul 16, 2018

I've written (but not committed) that. My concern is catching all of the FutureWarnings. I'd have to infect the entire test_supervised.py file with ignore_warnings(category=FutureWarning) context managers.
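One way to keep this from spreading through every test body is to apply the suppression once per test function with a decorator. The sketch below is a standard-library stand-in, not scikit-learn's `ignore_warnings`; the `silence` decorator and `check_scores` function are hypothetical names.

```python
import functools
import warnings


def silence(category):
    """Hypothetical decorator: run the wrapped function with `category` ignored."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with warnings.catch_warnings():
                # The filter is local to this call and undone on exit.
                warnings.simplefilter("ignore", category=category)
                return func(*args, **kwargs)
        return wrapper
    return decorator


@silence(FutureWarning)
def check_scores():
    # Stand-in for a test body that calls a metric with the old default.
    warnings.warn("default will change", FutureWarning)
    return "ok"
```

Passing the new parameter explicitly in each call, as suggested earlier in the thread, avoids the need for suppression altogether.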

@massich

Contributor

massich commented Jul 17, 2018

LGTM

+1 to merge

@GaelVaroquaux

Member

GaelVaroquaux commented Jul 17, 2018

LGTM. Merging.

@GaelVaroquaux GaelVaroquaux merged commit 52b6a66 into scikit-learn:master Jul 17, 2018

7 checks passed

ci/circleci: deploy - Your tests passed on CircleCI!
ci/circleci: python2 - Your tests passed on CircleCI!
ci/circleci: python3 - Your tests passed on CircleCI!
codecov/patch - 98.48% of diff hit (target 95.36%)
codecov/project - 95.37% (+<.01%) compared to 726fa36
continuous-integration/appveyor/pr - AppVeyor build succeeded
continuous-integration/travis-ci/pr - The Travis CI build passed

scikit-learn 0.20 automation moved this from Blockers to Done Jul 17, 2018

@jnothman

Member

jnothman commented Jul 18, 2018
