FIX better handle limit cases in normalized_mutual_info_score #22635

jeremiedbb · 2022-02-28T15:48:26Z

Better handling of limit cases, i.e. when one or both labelings are constant (there is a single cluster).
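To illustrate why a constant labeling is a limit case, here is a minimal pure-Python sketch (not the scikit-learn implementation; the helper names `entropy` and `mutual_info` are defined here only for illustration, and natural logarithms are assumed). A constant labeling has zero entropy and zero mutual information with any other labeling, so a geometric-mean normalizer is also zero and the ratio becomes 0.0 / 0.0.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a labeling."""
    n = len(labels)
    return sum((c / n) * math.log(n / c) for c in Counter(labels).values())


def mutual_info(labels_a, labels_b):
    """Mutual information (in nats) between two labelings of equal length."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


constant = [0, 0, 0, 0]  # a single cluster
split = [0, 0, 1, 1]     # two clusters

h_constant, h_split = entropy(constant), entropy(split)
mi = mutual_info(constant, split)
# Geometric mean of the two entropies, one possible normalizer.
geometric_normalizer = math.sqrt(h_constant * h_split)

print(h_constant, mi, geometric_normalizer)
```

All three printed values are zero, while `h_split` is log(2), so an arithmetic-mean normalizer would stay positive in this particular case.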

thomasjpfan

Thanks for the PR!

With the explicit handling of constant clusters, I think we can remove:

scikit-learn/sklearn/metrics/cluster/_supervised.py

Lines 1038 to 1039 in 9ced5ec

    
    # Avoid 0.0 / 0.0 when either entropy is zero.
    normalizer = max(normalizer, np.finfo("float64").eps)

sklearn/metrics/cluster/_supervised.py

doc/whats_new/v1.1.rst

jeremiedbb · 2022-03-01T20:04:16Z

With the explicit handling of constant clusters, I think we can remove:
normalizer = max(normalizer, np.finfo("float64").eps)

I agree. Now the only way to get an entropy < eps would be to have a single 1 in an array of approximately 10^17 zeros, which is an unrealistically big array :)
I removed it.
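As a rough check of that order of magnitude, the entropy of such a labeling can be evaluated in closed form without allocating the array. This is only a sketch: the helper `single_outlier_entropy` is hypothetical, natural logarithms are assumed, and `sys.float_info.epsilon` is used as a stand-in for `np.finfo("float64").eps` (both are the float64 machine epsilon).

```python
import math
import sys


def single_outlier_entropy(n):
    """Entropy (in nats) of a labeling made of n - 1 zeros and a single one."""
    p = 1.0 / n
    # log1p keeps the (1 - p) * log(1 - p) term accurate for tiny p.
    return -p * math.log(p) - (1.0 - p) * math.log1p(-p)


eps = sys.float_info.epsilon  # float64 machine epsilon

for exponent in (16, 17, 18):
    h = single_outlier_entropy(10**exponent)
    print(f"n = 1e{exponent}: entropy = {h:.3e}, below eps: {h < eps}")
```

Under these assumptions the entropy is still above eps for 10^16 elements and falls below it by 10^18, so the crossover sits in the 10^17 range.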

thomasjpfan

LGTM

glemaitre · 2022-03-02T13:25:24Z

Thanks @jeremiedbb

handle limit cases in normalize_mutual_info

1dd299c

github-actions bot added the module:metrics and cython labels Feb 28, 2022

jeremiedbb added 3 commits February 28, 2022 17:01

change log entry

d2dcc2e

Merge branch 'master' into fix-normalize-mi-geometric

400f9bd

cln

db6332e

jeremiedbb removed the cython label Feb 28, 2022

github-actions bot added the cython label Feb 28, 2022

jeremiedbb changed the title ~~FIX better handle limit cases in normalize_mutual_info_score~~ FIX better handle limit cases in normalized_mutual_info_score Feb 28, 2022

jeremiedbb removed the cython label Feb 28, 2022

thomasjpfan reviewed Mar 1, 2022

sklearn/metrics/cluster/_supervised.py (resolved)

doc/whats_new/v1.1.rst (outdated, resolved)

jeremiedbb added 4 commits March 1, 2022 20:24

cln what's new

39a2037

remove unnecessary 0 division

4cf2375

wtf

c04a6a2

Merge branch 'master' into fix-normalize-mi-geometric

d5855c3

thomasjpfan approved these changes Mar 1, 2022

jeremiedbb added the Waiting for Reviewer label Mar 2, 2022

jeremiedbb added this to the 1.1 milestone Mar 2, 2022

glemaitre approved these changes Mar 2, 2022

glemaitre merged commit 020ee76 into scikit-learn:main Mar 2, 2022

This was referenced Mar 13, 2022

Incorrect implementation of homogeneity, completeness and v-measure #22185

Closed

replace np.log with math.log #22187

Closed

mathijs02 pushed a commit to mathijs02/scikit-learn that referenced this pull request Dec 27, 2022

FIX better handle limit cases in normalized_mutual_info_score (scikit…

9b5c0d8

…-learn#22635)
