Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX better handle limit cases in normalized_mutual_info_score #22635

Merged
merged 8 commits into from Mar 2, 2022

Conversation

jeremiedbb
Copy link
Member

Fixes #13836

by better handling of limit cases, i.e when 1 or both labelling are constant, i.e. there's a single cluster.

@jeremiedbb jeremiedbb removed the cython label Feb 28, 2022
@jeremiedbb jeremiedbb changed the title FIX better handle limit cases in normalize_mutual_info_score FIX better handle limit cases in normalized_mutual_info_score Feb 28, 2022
@jeremiedbb jeremiedbb removed the cython label Feb 28, 2022
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR!

With the explicit handling of constant clusters. I think we can remove:

# Avoid 0.0 / 0.0 when either entropy is zero.
normalizer = max(normalizer, np.finfo("float64").eps)

sklearn/metrics/cluster/_supervised.py Show resolved Hide resolved
doc/whats_new/v1.1.rst Outdated Show resolved Hide resolved
@jeremiedbb
Copy link
Member Author

jeremiedbb commented Mar 1, 2022

With the explicit handling of constant clusters. I think we can remove:
normalizer = max(normalizer, np.finfo("float64").eps)

I agree. Now the only way to have an entropy < eps would be to have a single one in an array of approx 10^17 zeros which is a big unrealistic array :)
I removed it

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jeremiedbb jeremiedbb added this to the 1.1 milestone Mar 2, 2022
@glemaitre glemaitre merged commit 020ee76 into scikit-learn:main Mar 2, 2022
@glemaitre
Copy link
Member

Thanks @jeremiedbb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Odd (incorrect) behavior with normalized_mutual_info_score
3 participants