Skip to content

FIX Fix adjusted_mutual_info_score numerical issue#31065

Merged
lesteve merged 12 commits into
scikit-learn:mainfrom
glevv:adj-mi-bugfix
Apr 3, 2025
Merged

FIX Fix adjusted_mutual_info_score numerical issue#31065
lesteve merged 12 commits into
scikit-learn:mainfrom
glevv:adj-mi-bugfix

Conversation

@glevv

@glevv glevv commented Mar 24, 2025

Copy link
Copy Markdown
Contributor

Reference Issues/PRs

fixes #30950

What does this implement/fix? Explain your changes.

  • Make clipping on the values inside adjusted_mutual_info_score symmetric, i.e. both denominator and nominator are now clipped;
  • Add shortcut to calculations, to return 0 when there is no split in the data;
  • Add non-regression test.

Any other comments?

I'm not sure this is the best way forward

@github-actions

github-actions Bot commented Mar 24, 2025

Copy link
Copy Markdown

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 0b40829. Link to the linter CI: here

@glevv glevv marked this pull request as ready for review March 26, 2025 17:07

@lesteve lesteve left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks a lot for the investigation and the fix!

Comment thread sklearn/metrics/cluster/_supervised.py Outdated
Comment thread sklearn/metrics/cluster/_supervised.py
@glevv

glevv commented Mar 27, 2025

Copy link
Copy Markdown
Contributor Author

Is it possible to change this construction

if denominator < 0:
    denominator = min(denominator, -np.finfo("float64").eps)
else:
    denominator = max(denominator, np.finfo("float64").eps)

to
denominator = max(normalizer - emi, np.finfo("float64").eps)

Since normalizer should always be >= than emi?

Comment thread sklearn/metrics/cluster/_supervised.py Outdated

@lesteve lesteve left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the fix!

@lesteve

lesteve commented Mar 31, 2025

Copy link
Copy Markdown
Member

@virchan if you have some spare bandwitdth and you want to have a second look at this PR, this would be super useful 😉. This seems to fix a real bug in a not too controversial manner.

@virchan virchan left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR, @glevv.

The proposed fix makes sense to me. I just have a few minor nitpicks:

Comment thread sklearn/metrics/cluster/_supervised.py Outdated
Comment thread sklearn/metrics/cluster/tests/test_supervised.py Outdated
@glevv

glevv commented Apr 1, 2025

Copy link
Copy Markdown
Contributor Author

I had to change some tests to incorporate the changes, so I'm re-requesting review

@glevv glevv requested review from lesteve and virchan April 1, 2025 06:56

@virchan virchan left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @glevv!

@lesteve lesteve changed the title FIX Adjusted Mutual Info Score returns wrong score in some cases FIX Fix adjusted_mutual_info_score numerical issue Apr 3, 2025
@lesteve

lesteve commented Apr 3, 2025

Copy link
Copy Markdown
Member

Thanks @glevv for the fix and @virchan for the review 🙏!

I tweaked the changelog a bit and enabled auto-merge so this PR will be merged when CI is green!

@lesteve lesteve enabled auto-merge (squash) April 3, 2025 14:24
@lesteve lesteve merged commit 75cb7c3 into scikit-learn:main Apr 3, 2025
@glevv glevv deleted the adj-mi-bugfix branch April 25, 2025 19:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Potential Problem in the Computation of Adjusted Mutual Info Score

3 participants