Different inertia values even with random_state set #23886

pzsette · 2022-07-12T15:58:50Z

Describe the bug

I'm trying to make my algorithm, which involves KMeans, reproducible, but even when I set random_state I get different inertia values from one run to the next.
I'm using this page dataset with its first line (the dataset size) removed.

Steps/Code to Reproduce

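from sklearn.cluster import KMeans
import numpy as np

# page.txt is the dataset with its first line (the dataset size) removed
points = np.genfromtxt('page.txt')
for _ in range(10):
    # Same seed and a single initialization on every iteration
    kmeans = KMeans(n_clusters=30, n_init=1, random_state=42)
    kmeans.fit(points)

    # Print the inertia with 20 decimals to expose last-bit differences
    print('{0:.20f}'.format(kmeans.inertia_))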

Expected Results

I expect the same value to be printed ten times.

Actual Results

1008436173.14048004150390625000
1008436173.14048004150390625000
1008436173.14047992229461669922
1008436173.14048004150390625000
1008436173.14047992229461669922
1008436173.14048004150390625000
1008436173.14047992229461669922
1008436173.14048004150390625000
1008436173.14048004150390625000
1008436173.14048004150390625000

Versions

System:
    python: 3.9.12 (v3.9.12:b28265d7e6, Mar 23 2022, 18:22:40)  [Clang 13.0.0 (clang-1300.0.29.30)]
executable: /Users/user/PycharmProjects/MDE/venv/bin/python
   machine: macOS-12.4-arm64-arm-64bit
Python dependencies:
      sklearn: 1.1.0
          pip: 21.3.1
   setuptools: 60.2.0
        numpy: 1.22.4
        scipy: 1.8.0
       Cython: None
       pandas: 1.4.2
   matplotlib: 3.5.2
       joblib: 1.1.0
threadpoolctl: 3.1.0
Built with OpenMP: True


glemaitre · 2022-07-14T15:27:08Z

ping @jeremiedbb

jeremiedbb · 2022-07-19T13:02:54Z

The relative difference between these inertia values is below machine precision (~2e-16). It is probably caused by rounding errors that vary from run to run because the inertia computation is multi-threaded, so partial sums may be combined in a different order. However, the clustering itself should be the same, i.e. the centers and labels should be identical. I checked with your example on my machine and that is the case.
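
As a rough illustration of the point above, here is a minimal sketch using only the Python standard library. It takes the two distinct values printed in the report, compares their relative difference with float64 machine epsilon, and shows with a textbook example that the order of floating-point additions can change the last bit. The variable names and the `rel_tol=1e-12` tolerance are my own choices for illustration, not anything scikit-learn prescribes.

```python
import math
import sys

# The two distinct inertia values printed in the report above.
inertia_a = 1008436173.14048004150390625000
inertia_b = 1008436173.14047992229461669922

# float64 machine epsilon, about 2.22e-16.
eps = sys.float_info.epsilon
rel_diff = abs(inertia_a - inertia_b) / max(inertia_a, inertia_b)

print(f"relative difference: {rel_diff:.3e}")
print("below machine epsilon:", rel_diff < eps)

# Floating-point addition is not associative: grouping the same three
# numbers differently gives results that differ in the last bit.
left_first = (0.1 + 0.2) + 0.3
right_first = 0.1 + (0.2 + 0.3)
print("same sum regardless of grouping:", left_first == right_first)

# For reproducibility checks, compare inertia with a tolerance rather
# than with exact equality (the tolerance here is an arbitrary choice).
same_inertia = math.isclose(inertia_a, inertia_b, rel_tol=1e-12)
print("inertia equal within tolerance:", same_inertia)
```

If this reasoning holds, a reproducibility test should compare `cluster_centers_` and `labels_` between runs and treat `inertia_` as equal only up to a small relative tolerance.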

pzsette added the Bug and Needs Triage labels · Jul 12, 2022

jeremiedbb added the module:cluster label and removed the Bug and Needs Triage labels · Jul 19, 2022

slowkow mentioned this issue Jan 30, 2023

Results not reproducible slowkow/harmonypy#24

Closed
