[Bug] API normalized_mutual_info_score may violate Symmetry. #114

Closed
Z712023 opened this issue Jan 18, 2024 · 1 comment
Labels
bug Something isn't working enhancement New feature or request

Comments

@Z712023
Collaborator

Z712023 commented Jan 18, 2024

Description

The output of the API normalized_mutual_info_score may violate symmetry, although the official documentation states:
"This metric is furthermore symmetric: switching label_true with label_pred will return the same score value." https://scikit-learn.org/stable/modules/generated/sklearn.metrics.normalized_mutual_info_score.html

Reproduce

  1. Run the code as follows:

from sklearn.preprocessing import LabelEncoder
from sklearn.metrics.cluster import normalized_mutual_info_score

a = "guest"
b = "user"
c = "admin"

src = [a, c, b, b, c, a, c, a, c, c]
tar = [a, b, a, a, b, c, b, b, c, a]
test_nums = 100
for i in range(test_nums):
    # Fit one encoder on the labels of both columns so they share a mapping.
    le = LabelEncoder()
    src_list = list(set(src))
    tar_list = list(set(tar))
    fit_list = tar_list + src_list
    le.fit(fit_list)
    src_col = le.transform(src)
    tar_col = le.transform(tar)
    # Score both argument orders; a symmetric metric should return the same value.
    test1 = normalized_mutual_info_score(src_col, tar_col, average_method='geometric')
    test2 = normalized_mutual_info_score(tar_col, src_col, average_method='geometric')
    print(f"iter:{i}: test1:{test1} test2:{test2}")
    print(src_col, tar_col)
    print(tar_col, src_col)
    # Exact equality check: this is the assertion that fails.
    assert test2 == test1

  2. I tried other values for average_method, but the assertion still failed.

Expected behavior

normalized_mutual_info_score should keep its symmetry property: swapping the two label arrays should return the same score. We may need to implement the metric ourselves.
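One possible workaround that avoids rewriting the metric is to average the score over both argument orders. This is only a sketch: `symmetrized` and `toy_metric` are hypothetical names made up for illustration, and `toy_metric` is a stand-in so the snippet runs without scikit-learn. Because floating-point addition is commutative, the averaged result is bit-identical whichever way the arguments are passed.

```python
from typing import Callable


def symmetrized(metric: Callable[..., float]) -> Callable[..., float]:
    """Wrap a two-argument metric so that swapping its arguments gives the same float."""

    def wrapper(x, y, **kwargs):
        # x + y == y + x holds exactly for floats, so averaging both
        # argument orders removes any order-dependent rounding difference.
        return (metric(x, y, **kwargs) + metric(y, x, **kwargs)) / 2

    return wrapper


def toy_metric(x, y):
    # Hypothetical, deliberately asymmetric stand-in for a real metric.
    return sum(x) * 0.1 + sum(y) * 0.3


sym = symmetrized(toy_metric)
x, y = [1, 2], [4]
print(toy_metric(x, y) == toy_metric(y, x))  # False
print(sym(x, y) == sym(y, x))                # True
```

With scikit-learn installed, the same wrapper could presumably be applied as `symmetrized(normalized_mutual_info_score)`, passing `average_method` through as a keyword argument. The cost is that the metric is evaluated twice per call.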

Context

  • Operating System and version: Mac OS 12.5.1
  • sdgx==0.1.4.dev0
  • scikit-learn==1.3.2
@Z712023 Z712023 added the bug Something isn't working label Jan 18, 2024
@MooooCat MooooCat changed the title API normalized_mutual_info_score may violate Symmetry. [Bug] API normalized_mutual_info_score may violate Symmetry. Jan 18, 2024
@MooooCat MooooCat added enhancement New feature or request help wanted Extra attention is needed labels Jan 18, 2024
@Z712023
Collaborator Author

Z712023 commented Jan 19, 2024

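I traced this problem back and found that the difference comes from the output of the sum at https://github.com/scikit-learn/scikit-learn/blob/3f89022fa04d293152f1d32fbc2a5bdaaf2df364/sklearn/metrics/cluster/_supervised.py#L900
I think the cause is floating-point arithmetic in Python: float addition is not associative, so adding the same terms in a different order (presumably what happens when the two arguments are swapped) can change the last few bits of the result. The output of normalized_mutual_info_score is therefore only accurate to within a certain precision, and the two scores should be compared with a tolerance rather than with ==.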
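As a minimal illustration of that explanation, the snippet below uses only the standard library to show that float addition depends on grouping, and that a tolerance-based comparison accepts the two results. The tolerance `1e-9` is an assumed value chosen for illustration, not something taken from scikit-learn.

```python
import math

# The same three terms, added with different grouping.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left == right)                            # False
print(math.isclose(left, right, rel_tol=1e-9))  # True

# Applied to the reproduction script, the exact check
#     assert test2 == test1
# could be relaxed to
#     assert math.isclose(test1, test2, rel_tol=1e-9)
```

For arrays of scores, `numpy.isclose` or `numpy.allclose` would serve the same purpose.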

@Z712023 Z712023 removed the help wanted Extra attention is needed label Jan 19, 2024
@Z712023 Z712023 closed this as completed Jan 19, 2024