numerical stability bug in adjusted_mutual_info_score with numpy master (1.20 dev) #19165

Closed
ogrisel opened this issue Jan 13, 2021 · 5 comments · Fixed by #19179

@ogrisel
Member

ogrisel commented Jan 13, 2021

Reproducer that works with numpy 1.19.5:

>>> import numpy as np
>>> from sklearn.metrics.cluster import adjusted_mutual_info_score
>>> n_samples = 1000
>>> labels_a = np.ones(n_samples, dtype=int)
>>> labels_b = np.arange(n_samples, dtype=int)
>>> adjusted_mutual_info_score(labels_a, labels_b, average_method="min")
0.0

and that fails on the [scipy-dev] build in the test test_exactly_zero_info_score with the numpy nightly build:

https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=25088&view=logs&j=dfe99b15-50db-5d7b-b1e9-4105c42527cf&t=ef785ae2-496b-5b02-9f0e-07a6c3ab3081

It returns 2.2207031250001146 instead of 0.0...
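A minimal sketch of why this particular input is fragile. It is written in pure Python (stdlib only) and is not scikit-learn's implementation; the helper names are hypothetical. It evaluates the pieces of AMI = (MI - E[MI]) / (avg(H(U), H(V)) - E[MI]) for the reproducer labels. A constant labeling has zero entropy and zero mutual information with any other labeling, and E[MI] is zero as well. So with average_method="min" the normalizer is min(H(U), H(V)) = 0, and both the numerator and the denominator are exactly 0. Here E[MI] is brute-forced over all permutations on a tiny case only, whereas a real implementation would evaluate a closed-form sum numerically. Any rounding error in that numerically computed E[MI] then feeds both sides of a 0/0 ratio, which is one plausible way to end up with a non-zero score like the one above.

```python
import math
from collections import Counter
from itertools import permutations


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return sum(-(c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_info(labels_u, labels_v):
    """Mutual information (natural log) computed from contingency counts."""
    n = len(labels_u)
    count_u, count_v = Counter(labels_u), Counter(labels_v)
    mi = 0.0
    for (u, v), n_uv in Counter(zip(labels_u, labels_v)).items():
        mi += (n_uv / n) * math.log(n * n_uv / (count_u[u] * count_v[v]))
    return mi


def expected_mutual_info_bruteforce(labels_u, labels_v):
    """E[MI] under the permutation model, by enumerating every permutation.

    Hypothetical helper for illustration: only feasible for tiny inputs (n!).
    """
    perms = list(permutations(labels_v))
    return sum(mutual_info(labels_u, p) for p in perms) / len(perms)


# Reproducer labels: one constant labeling, one with each sample in its own cluster.
n_samples = 1000
labels_a = [1] * n_samples
labels_b = list(range(n_samples))

h_a = entropy(labels_a)  # a single cluster carries no information
h_b = entropy(labels_b)  # equals log(n_samples) up to rounding
mi = mutual_info(labels_a, labels_b)
print(h_a, h_b, mi)

# E[MI] is also exactly 0 when one labeling is constant; checked on a tiny case.
emi_small = expected_mutual_info_bruteforce([1] * 6, list(range(6)))
print(emi_small)

# With average_method="min" the normalizer is min(h_a, h_b) = 0, so
# AMI = (mi - emi) / (min(h_a, h_b) - emi) is 0 / 0 in exact arithmetic.
numerator = mi - emi_small
denominator = min(h_a, h_b) - emi_small
print(numerator, denominator)
```

Because the exact answer is an indeterminate 0/0, the returned score depends entirely on how the implementation guards this degenerate case and on the last bits of the computed E[MI].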

@glemaitre
Member

> It returns 2.2207031250001146 instead of 0.0...

Ouch....

glemaitre changed the title from "numerical stability bug in adjusted_mutual_info_score with numpy master (1.20 dev)" to "numerical stability bug in adjusted_mutual_info_score with numpy master (1.21 dev)" on Jan 14, 2021
@glemaitre
Member

Uhm it is weird:

In [2]: >>> import numpy as np
   ...: >>> from sklearn.metrics.cluster import adjusted_mutual_info_score
   ...: >>> n_samples = 1000
   ...: >>> labels_a = np.ones(n_samples, dtype=int)
   ...: >>> labels_b = np.arange(n_samples, dtype=int)
   ...: >>> adjusted_mutual_info_score(labels_a, labels_b, average_method="min")
Out[2]: 0.0

In [3]: np.__version__
Out[3]: '1.21.0.dev0+417.gfde2e536a'

In [4]: import sklearn

In [5]: sklearn.__version__
Out[5]: '1.0.dev0'

I cannot reproduce.

glemaitre changed the title from "numerical stability bug in adjusted_mutual_info_score with numpy master (1.21 dev)" back to "numerical stability bug in adjusted_mutual_info_score with numpy master (1.20 dev)" on Jan 14, 2021
@glemaitre
Member

Isn't it SciPy then?

@alfaro96
Member

alfaro96 commented Jan 14, 2021

I cannot reproduce it either, with:

System:
    python: 3.9.1 (default, Dec 11 2020, 14:22:09)  [GCC 8.3.0]
executable: /usr/local/bin/python
   machine: Linux-4.19.121-linuxkit-x86_64-with-glibc2.28

Python dependencies:
          pip: 20.3.3
   setuptools: 51.0.0
      sklearn: 1.0.dev0
        numpy: 1.21.0.dev0+417.gfde2e536a
        scipy: 1.7.0.dev0+5e9eb01
       Cython: 0.29.21
       pandas: 1.2.0
   matplotlib: 3.3.3
       joblib: 1.0.0
threadpoolctl: 2.1.0

Built with OpenMP: True

@glemaitre
Member

Could not reproduce with the following versions (the same as on the CI, apart from pandas):

Package                       Version                    Location
----------------------------- -------------------------- -----------------------------------------------
alabaster                     0.7.12
apipkg                        1.5
attrs                         20.3.0
Babel                         2.9.0
certifi                       2020.12.5
chardet                       4.0.0
codecov                       2.1.11
coverage                      5.3.1
Cython                        3.0a6
docutils                      0.16
execnet                       1.7.1
idna                          2.10
imagesize                     1.2.0
iniconfig                     1.1.1
Jinja2                        2.11.2
joblib                        1.1.0.dev0
MarkupSafe                    1.1.1
numpy                         1.21.0.dev0+417.gfde2e536a
numpydoc                      1.1.0
packaging                     20.8
pandas                        1.3.0.dev0+433.g25110a92b
Pillow                        8.2.0.dev0
pip                           20.3.3
pluggy                        0.13.1
py                            1.10.0
Pygments                      2.7.4
pyparsing                     2.4.7
pytest                        6.2.1
pytest-cov                    2.10.1
pytest-forked                 1.3.0
pytest-xdist                  2.2.0
python-dateutil               2.8.1
pytz                          2020.5
requests                      2.25.1
scikit-learn                  1.0.dev0                   /home/glemaitre/Documents/packages/scikit-learn
scipy                         1.7.0.dev0+5e9eb01
setuptools                    51.1.2.post20210112
six                           1.15.0
snowballstemmer               2.0.0
Sphinx                        3.4.3
sphinxcontrib-applehelp       1.0.2
sphinxcontrib-devhelp         1.0.2
sphinxcontrib-htmlhelp        1.0.3
sphinxcontrib-jsmath          1.0.1
sphinxcontrib-qthelp          1.0.3
sphinxcontrib-serializinghtml 1.1.4
threadpoolctl                 2.1.0
toml                          0.10.2
urllib3                       1.26.2
wheel                         0.36.2
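When a bug reproduces on CI but not locally, diffing the two package listings is a quick way to find the odd one out. A minimal stdlib sketch, assuming both environments are dumped in the `pip list` format shown above. The helper names are hypothetical, and the CI listing below is made up for illustration (its pandas version is not taken from the actual CI logs).

```python
def parse_pip_list(text):
    """Parse `pip list`-style output into a {package: version} dict.

    Header and separator lines are skipped; an optional third column
    (the editable-install location) is ignored.
    """
    versions = {}
    for line in text.strip().splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0] == "Package" or set(parts[0]) == {"-"}:
            continue
        versions[parts[0].lower()] = parts[1]
    return versions


def diff_environments(local, ci):
    """Return {package: (local_version, ci_version)} for every mismatch."""
    diffs = {}
    for name in sorted(set(local) | set(ci)):
        if local.get(name) != ci.get(name):
            diffs[name] = (local.get(name), ci.get(name))
    return diffs


# Excerpt of the local listing above.
local_env = parse_pip_list("""
Package       Version
------------- --------------------------
numpy         1.21.0.dev0+417.gfde2e536a
pandas        1.3.0.dev0+433.g25110a92b
scipy         1.7.0.dev0+5e9eb01
""")

# Hypothetical CI listing: the pandas version here is invented for the example.
ci_env = parse_pip_list("""
Package       Version
------------- --------------------------
numpy         1.21.0.dev0+417.gfde2e536a
pandas        1.2.0
scipy         1.7.0.dev0+5e9eb01
""")

for name, (local_v, ci_v) in diff_environments(local_env, ci_env).items():
    print(f"{name}: local={local_v} ci={ci_v}")
```

Packages present in only one environment show up with `None` on the other side, so missing dependencies are reported as well as version mismatches.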
