Description
Dear all,
I see a difference in the behaviour of sklearn.cluster.KMeans between init='random' and init='k-means++' when n_jobs=-1 (or any value other than 1). With init='random', n_jobs=-1 and n_clusters>1, not all CPUs are used; I monitored this with htop. With init='k-means++' this does not happen. Interestingly, this happens only on Linux (tested on Red Hat and Ubuntu; the Linux entry in the Versions section is the Ubuntu machine). On my Windows machine, monitored with Task Manager, the behaviour is not observable.
Steps/Code to Reproduce
```python
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from tqdm import tqdm  # progress bar, to follow the behaviour as n_clusters grows

# 60000 samples, 48 features, 8 centers
X, _ = make_blobs(n_samples=60000, n_features=48, centers=8)

# init='random': all cores busy (checked with htop) only for i=1; for i > 1 only one core
for i in tqdm(range(1, 10)):
    model = KMeans(n_clusters=i, n_jobs=-1, n_init=200, max_iter=500, init='random').fit(X)

# init='k-means++': all cores busy for every i
for i in tqdm(range(1, 10)):
    model = KMeans(n_clusters=i, n_jobs=-1, n_init=200, max_iter=500, init='k-means++').fit(X)
```
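For context on why all cores are expected to be busy: the 0.20 `KMeans` docstring describes `n_jobs` as working by computing each of the `n_init` runs in parallel, so with `n_init=200` the choice of `init` should not change how many runs are available to spread over the cores. The sketch below only illustrates that scheme with NumPy; it is not scikit-learn's code. `lloyd_run` and `parallel_kmeans` are hypothetical helpers, the runs are dispatched with `concurrent.futures` threads purely to keep the sketch self-contained (scikit-learn 0.20 dispatches them through joblib), and the run with the lowest inertia is kept.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def lloyd_run(X, n_clusters, seed, max_iter=300):
    """One k-means run (hypothetical helper) with a 'random' init:
    n_clusters distinct samples are picked as the starting centers."""
    rng = np.random.RandomState(seed)
    centers = X[rng.choice(len(X), n_clusters, replace=False)].copy()
    for _ in range(max_iter):
        # Squared Euclidean distance from every sample to every center,
        # shape (n_samples, n_clusters)
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Move each center to the mean of its samples; keep it if the cluster is empty
        new_centers = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else centers[k]
            for k in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment and inertia (sum of squared distances to the closest center)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    inertia = d2[np.arange(len(X)), labels].sum()
    return inertia, labels, centers


def parallel_kmeans(X, n_clusters, n_init=10, n_jobs=4, random_state=0):
    """Run n_init independent restarts concurrently (hypothetical helper)
    and return (inertia, labels, centers) of the best one."""
    rng = np.random.RandomState(random_state)
    # One seed per restart, so every run starts from a different random init
    seeds = rng.randint(np.iinfo(np.int32).max, size=n_init)
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        results = list(pool.map(lambda s: lloyd_run(X, n_clusters, s), seeds))
    # Keep the restart with the lowest inertia
    return min(results, key=lambda r: r[0])
```

Each restart is independent, which is what makes the `n_init` loop an obvious unit of parallel work in the first place.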
Expected Results
No difference in core usage between init='random' and init='k-means++'.
Actual Results
With init='random', all cores are used only when n_clusters=1; for n_clusters>1 only one core is used. With init='k-means++', all cores are used for every value of n_clusters.
Versions
Windows:
System:
python: 3.6.5 |Anaconda, Inc.| (default, Mar 29 2018, 13:32:41) [MSC v.1900 64 bit (AMD64)]
executable: C:\ProgramData\Miniconda3\pythonw.exe
machine: Windows-7-6.1.7601-SP1
BLAS:
macros:
lib_dirs:
cblas_libs: cblas
Python deps:
pip: 18.1
setuptools: 39.0.1
sklearn: 0.20.2
numpy: 1.15.4
scipy: 1.1.0
Cython: 0.28.2
pandas: 0.23.4
Linux:
System:
python: 3.6.4 |Anaconda, Inc.| (default, Jan 16 2018, 18:10:19) [GCC 7.2.0]
executable: /cluster/programs/miniconda/envs/miniconda-36/bin/python
machine: Linux-4.4.0-87-generic-x86_64-with-debian-stretch-sid
BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /cluster/programs/miniconda/envs/miniconda-36/lib
cblas_libs: mkl_rt, pthread
Python deps:
pip: 18.0
setuptools: 38.4.0
sklearn: 0.20.2
numpy: 1.14.2
scipy: 1.1.0
Cython: 0.27.3
pandas: 0.23.4