Different inertia values even with random_state set #23886

Open
pzsette opened this issue Jul 12, 2022 · 2 comments

pzsette commented Jul 12, 2022

Describe the bug

I'm trying to make my algorithm, which involves KMeans, reproducible, but even if I set a random_state value I still get different inertia values between runs.
I'm using the page dataset (page.txt) with its first line, which holds the dataset size, removed.

Steps/Code to Reproduce

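from sklearn.cluster import KMeans
import numpy as np

# Load the dataset (first line, the dataset size, already removed)
points = np.genfromtxt('page.txt')

# Fit the same model ten times with a fixed seed and a single initialization
for _ in range(10):
    kmeans = KMeans(n_clusters=30, n_init=1, random_state=42)
    kmeans.fit(points)

    centroids = kmeans.cluster_centers_
    # Print the inertia with 20 decimals to expose last-bit differences
    print('{0:.20f}'.format(kmeans.inertia_))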

Expected Results

I expect the same value to be printed ten times.

Actual Results

1008436173.14048004150390625000
1008436173.14048004150390625000
1008436173.14047992229461669922
1008436173.14048004150390625000
1008436173.14047992229461669922
1008436173.14048004150390625000
1008436173.14047992229461669922
1008436173.14048004150390625000
1008436173.14048004150390625000
1008436173.14048004150390625000
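
For reference, only two distinct values appear in the output above. The following is a minimal sketch, using only the standard library (Python 3.9+ for `math.ulp` and `math.nextafter`), of how one might check how far apart they are. The variable names are illustrative and not part of the original report.

```python
import math
import sys

# The two distinct inertia values printed above, parsed back into floats.
larger = float("1008436173.14048004150390625000")
smaller = float("1008436173.14047992229461669922")

# Absolute gap between the two values.
gap = larger - smaller

# Spacing between adjacent doubles at this magnitude (one unit in the last place).
spacing = math.ulp(larger)

# Gap relative to the magnitude of the values.
relative_gap = gap / larger

print("absolute gap      :", gap)
print("ulp at this value :", spacing)
print("adjacent doubles  :", math.nextafter(smaller, math.inf) == larger)
print("relative gap      :", relative_gap)
print("machine epsilon   :", sys.float_info.epsilon)
```

If the gap equals the ulp, the two results are neighbouring double-precision numbers, i.e. they differ only in the last bit of the mantissa.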

Versions

System:
    python: 3.9.12 (v3.9.12:b28265d7e6, Mar 23 2022, 18:22:40)  [Clang 13.0.0 (clang-1300.0.29.30)]
executable: /Users/user/PycharmProjects/MDE/venv/bin/python
   machine: macOS-12.4-arm64-arm-64bit
Python dependencies:
      sklearn: 1.1.0
          pip: 21.3.1
   setuptools: 60.2.0
        numpy: 1.22.4
        scipy: 1.8.0
       Cython: None
       pandas: 1.4.2
   matplotlib: 3.5.2
       joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True
pzsette added the Bug and Needs Triage labels Jul 12, 2022
glemaitre (Member) commented

ping @jeremiedbb

jeremiedbb (Member) commented

The relative difference between these inertiae is below machine precision (~2e-16). It may be caused by rounding errors that vary from run to run, because the inertia computation is multi-threaded. However, the clustering should be the same, i.e. the centers and labels should be identical. I ran your example on my machine and that is the case.
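
To illustrate the kind of rounding effect described above, here is a small sketch in plain Python. It is not scikit-learn's actual inertia code: `chunked_sum` is a hypothetical helper that mimics threads each summing a contiguous chunk before the partial sums are combined. The point is only that floating-point addition is not associative, so a different grouping of the same terms can change the last bit of the result.

```python
import math


def chunked_sum(values, split_points):
    """Sum `values` chunk by chunk, then add up the partial sums.

    `split_points` are the indices where one chunk ends and the next begins.
    Hypothetical helper, for illustration only.
    """
    partial_sums = []
    start = 0
    for end in list(split_points) + [len(values)]:
        partial = 0.0
        for value in values[start:end]:
            partial += value  # plain left-to-right float addition
        partial_sums.append(partial)
        start = end

    total = 0.0
    for partial in partial_sums:
        total += partial
    return total


values = [0.1, 0.2, 0.3]

# Same numbers, two different groupings of the additions.
sum_split_late = chunked_sum(values, [2])   # (0.1 + 0.2) + 0.3
sum_split_early = chunked_sum(values, [1])  # 0.1 + (0.2 + 0.3)

print(repr(sum_split_late), repr(sum_split_early))
print("bitwise equal:", sum_split_late == sum_split_early)
print("close enough :", math.isclose(sum_split_late, sum_split_early, rel_tol=1e-15))
```

For this reason, comparing inertia values across runs with a tolerance (for example `math.isclose` or `numpy.isclose`) is usually more meaningful than comparing all printed digits.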
