[MRG] EHN: Change default value of n_init to 1 in KMeans #11530

murielgrobler · 2018-07-15T17:35:09Z

closes #9729 Change KMeans n_init default value to 1.

What does this implement/fix? Explain your changes.

Creates a warning message when the user does not specify a value for n_init. The warning tells the user that the default value will change from 10 to 1 in 0.22.

Any other comments?

Thanks for helping us contribute at SciPy 2018!

amueller · 2018-07-15T20:23:48Z

sklearn/cluster/k_means_.py


    max_iter : int, optional, default 300
        Maximum number of iterations of the k-means algorithm to run.

    verbose : boolean, optional
        Verbosity mode.
-
+n_init=10


this doesn't seem to belong here.

amueller · 2018-07-15T20:24:55Z

sklearn/cluster/k_means_.py

@@ -235,14 +235,15 @@ def k_means(X, n_clusters, sample_weight=None, init='k-means++',
    n_init : int, optional, default: 10
        Number of time the k-means algorithm will be run with different
        centroid seeds. The final results will be the best output of
-        n_init consecutive runs in terms of inertia.
+        n_init consecutive runs in terms of inertia. Note the defaul value will
+        be changed to 1 in 0.22.


Not sure if we want to use the .. versionchanged : syntax here.

glemaitre

Couple of changes

glemaitre · 2018-07-15T21:47:49Z

sklearn/cluster/k_means_.py

@@ -235,14 +235,15 @@ def k_means(X, n_clusters, sample_weight=None, init='k-means++',
    n_init : int, optional, default: 10
        Number of time the k-means algorithm will be run with different
        centroid seeds. The final results will be the best output of
-        n_init consecutive runs in terms of inertia.
+        n_init consecutive runs in terms of inertia. Note the defaul value will
+        be changed to 1 in 0.22.


I am in favor of using the .. versionchanged : syntax
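
For illustration, a docstring using the Sphinx `.. versionchanged::` directive in place of the free-text note could look like the sketch below. The function name `k_means_stub` is hypothetical and the wording of the note is an assumption, not the text that ended up in scikit-learn.

```python
def k_means_stub(X, n_clusters, n_init="warn"):
    """Hypothetical stub that only illustrates the docstring layout.

    Parameters
    ----------
    n_init : int, optional, default: 10
        Number of times the k-means algorithm will be run with different
        centroid seeds. The final result will be the best output of
        n_init consecutive runs in terms of inertia.

        .. versionchanged:: 0.22
            The default value of n_init will change from 10 to 1.
    """
    # No clustering happens here; only the docstring matters for this sketch.
    raise NotImplementedError("docstring illustration only")
```

Sphinx renders the directive as a highlighted "Changed in version 0.22" note, which makes the planned change easier to spot than a sentence buried in the parameter description.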

glemaitre · 2018-07-15T21:49:59Z

sklearn/cluster/k_means_.py

@@ -920,6 +926,10 @@ def __init__(self, n_clusters=8, init='k-means++', n_init=10,
        self.copy_x = copy_x
        self.n_jobs = n_jobs
        self.algorithm = algorithm
+        if n_init == 'warn':


The validation and raising should be done in fit.

You can check this PR:
https://github.com/scikit-learn/scikit-learn/pull/11469/files#diff-e6faf37b13574bc591afbf0536128735R858

It is not merged yet, but it shows how to do that in more detail.

ugh how did I miss this :-/
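
A minimal sketch of the pattern requested in this thread, assuming a `'warn'` sentinel as in the diff: `__init__` only stores the parameter, and `fit` resolves the sentinel and emits the warning. `TinyKMeans`, the message text, and the `n_init_used_` attribute are hypothetical and are not scikit-learn code.

```python
import warnings


class TinyKMeans:
    """Hypothetical stand-in for an estimator; not scikit-learn's KMeans."""

    def __init__(self, n_clusters=8, n_init="warn"):
        # __init__ only stores parameters: no validation and no warnings here.
        self.n_clusters = n_clusters
        self.n_init = n_init

    def fit(self, X):
        # Resolve the sentinel into a local variable so self.n_init keeps
        # exactly what the user passed.
        n_init = self.n_init
        if n_init == "warn":
            warnings.warn(
                "The default value of n_init will change from 10 to 1 in "
                "0.22. Set n_init explicitly to silence this warning.",
                FutureWarning,
            )
            n_init = 10
        if not isinstance(n_init, int) or n_init <= 0:
            raise ValueError(
                f"n_init should be a positive integer, got {n_init!r}"
            )
        # Hypothetical attribute recording the value actually used.
        self.n_init_used_ = n_init
        return self
```

Keeping `__init__` free of logic means the stored parameter is still `'warn'` after construction, so the warning appears only when `fit` is called with the default left unset.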

glemaitre · 2018-07-15T21:50:31Z

sklearn/cluster/tests/test_k_means.py

+
+
+def test_deprecation_warnings():
+    # Test that warnings are raised. Will be removed in 0.22


Start the comment with `FIXME: remove this test in 0.22`.

glemaitre · 2018-07-15T21:51:00Z

sklearn/cluster/tests/test_k_means.py

@@ -987,3 +988,16 @@ def test_iter_attribute():
    estimator = KMeans(algorithm="elkan", max_iter=1)
    estimator.fit(np.random.rand(10, 10))
    assert estimator.n_iter_ == 1
+
+
+def test_deprecation_warnings():


it would be better to name it test_change_n_init_future_warning

glemaitre · 2018-07-15T21:51:12Z

sklearn/cluster/tests/test_k_means.py

+def test_deprecation_warnings():
+    # Test that warnings are raised. Will be removed in 0.22
+
+    # When n_init is specified (no warning)


No need for this comment

glemaitre · 2018-07-15T21:51:24Z

sklearn/cluster/tests/test_k_means.py

+    km = KMeans(n_init=1)
+    assert_no_warnings(km.fit, X)
+
+    # When n_init is not specified (warns)


no need for this comment

glemaitre

2 nitpicks

glemaitre · 2018-07-16T03:31:25Z

sklearn/cluster/k_means_.py

@@ -921,6 +928,7 @@ def __init__(self, n_clusters=8, init='k-means++', n_init=10,
        self.n_jobs = n_jobs
        self.algorithm = algorithm

+


This change is not useful, and I don't think it is PEP8-compliant.

glemaitre · 2018-07-16T03:33:40Z

sklearn/cluster/tests/test_k_means.py

@@ -987,3 +988,14 @@ def test_iter_attribute():
    estimator = KMeans(algorithm="elkan", max_iter=1)
    estimator.fit(np.random.rand(10, 10))
    assert estimator.n_iter_ == 1
+
+
+def test_change_n_init_future_warning():


I did not think about it at first.
Can you check the k_means function as well?
Using only the class will not check that the warning is raised in the function itself, since self.n_init will already have been set by then.
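
The point of the comment above can be sketched with toy stand-ins: when the class resolves the default before delegating, the function's own warning branch is never reached through the class, so the function needs a direct test. `toy_k_means`, `ToyKMeans`, and `count_future_warnings` are hypothetical names used only for this illustration.

```python
import warnings

MESSAGE = "The default value of n_init will change from 10 to 1 in 0.22."


def toy_k_means(X, n_clusters, n_init="warn"):
    """Hypothetical function-level entry point with its own warning."""
    if n_init == "warn":
        warnings.warn(MESSAGE, FutureWarning)
        n_init = 10
    return n_init


class ToyKMeans:
    """Hypothetical class that resolves the default, then delegates."""

    def __init__(self, n_clusters=8, n_init="warn"):
        self.n_clusters = n_clusters
        self.n_init = n_init

    def fit(self, X):
        n_init = self.n_init
        if n_init == "warn":
            warnings.warn(MESSAGE, FutureWarning)
            n_init = 10
        # The function always receives a concrete value here, so its
        # warning branch is never exercised through the class.
        self.n_init_used_ = toy_k_means(X, self.n_clusters, n_init=n_init)
        return self


def count_future_warnings(func, *args, **kwargs):
    """Call func and count the FutureWarnings it emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func(*args, **kwargs)
    return sum(issubclass(w.category, FutureWarning) for w in caught)


def test_change_n_init_future_warning():
    # FIXME: remove this test in 0.22
    X = [[0.0], [1.0]]
    # Class: explicit value is silent, default warns exactly once.
    assert count_future_warnings(ToyKMeans(n_init=1).fit, X) == 0
    assert count_future_warnings(ToyKMeans().fit, X) == 1
    # Function: must be checked directly, not only through the class.
    assert count_future_warnings(toy_k_means, X, 2, n_init=1) == 0
    assert count_future_warnings(toy_k_means, X, 2) == 1
```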

glemaitre · 2018-07-16T03:34:23Z

Please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:.

agramfort · 2018-07-17T15:05:27Z

you also need to update all the tests to avoid the warnings in the doc.

murielgrobler · 2018-07-18T14:17:37Z

Thanks so much for the feedback! I'm going to finish it up soon!

GaelVaroquaux · 2018-07-24T13:16:20Z

As mentioned on the original issue, 1 seems too small. 5 would be good.

cmarmo · 2022-05-06T21:41:06Z

Closing as superseded by #23038.

murielgrobler added 2 commits July 14, 2018 16:18

Added future warning for k_means that default n_init will change from 10 to 1 in 0.22

4b64ab3

Added the test for generating an error message when user does not specify initial values. All tests passed.

a4a03d5

amueller reviewed Jul 15, 2018

glemaitre changed the title ~~Fix/#9729~~ Change default value of n_init to 1 in KMeans Jul 15, 2018

glemaitre requested changes Jul 15, 2018

Processed PR feedback

a070370

glemaitre requested changes Jul 16, 2018

glemaitre changed the title ~~Change default value of n_init to 1 in KMeans~~ EHN: Change default value of n_init to 1 in KMeans Jul 16, 2018

glemaitre changed the title ~~EHN: Change default value of n_init to 1 in KMeans~~ [MRG] EHN: Change default value of n_init to 1 in KMeans Jul 16, 2018

amueller added the "Needs Decision" (requires decision) label Aug 5, 2019

github-actions bot added the module:cluster label Mar 2, 2020

Base automatically changed from master to main January 22, 2021 10:50

Micky774 mentioned this pull request Apr 3, 2022

EHN Change default value of n_init in cluster.KMeans and cluster.k_means #23038

Merged

thomasjpfan added the "Stalled" and "Superseded" (PR has been replaced by a newer PR) labels Apr 3, 2022

cmarmo removed the "Stalled" and "Needs Decision" (requires decision) labels May 6, 2022

cmarmo closed this May 6, 2022
