Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG] EHN: Change default value of n_init to 1 in KMeans #11530

Closed
wants to merge 3 commits into from

Conversation

murielgrobler
Copy link
Contributor

@murielgrobler murielgrobler commented Jul 15, 2018

closes #9729 Change KMeans n_init default value to 1.

What does this implement/fix? Explain your changes.

Creates a warning message when the user does not specify a value for n_init. The warning message tells the user that the default value will change from 10 to 1 in 0.22

Any other comments?

Thanks for helping us contribute at SciPy 2018!


max_iter : int, optional, default 300
Maximum number of iterations of the k-means algorithm to run.

verbose : boolean, optional
Verbosity mode.

n_init=10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this doesn't seem to belong here.

@@ -235,14 +235,15 @@ def k_means(X, n_clusters, sample_weight=None, init='k-means++',
n_init : int, optional, default: 10
Number of time the k-means algorithm will be run with different
centroid seeds. The final results will be the best output of
n_init consecutive runs in terms of inertia.
n_init consecutive runs in terms of inertia. Note the defaul value will
be changed to 1 in 0.22.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we want to use the .. versionchanged : syntax here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of using the .. versionchanged : syntax

@glemaitre glemaitre changed the title Fix/#9729 Change default value of n_init to 1 in KMeans Jul 15, 2018
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of changes

@@ -235,14 +235,15 @@ def k_means(X, n_clusters, sample_weight=None, init='k-means++',
n_init : int, optional, default: 10
Number of time the k-means algorithm will be run with different
centroid seeds. The final results will be the best output of
n_init consecutive runs in terms of inertia.
n_init consecutive runs in terms of inertia. Note the defaul value will
be changed to 1 in 0.22.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am in favor of using the .. versionchanged : syntax

@@ -920,6 +926,10 @@ def __init__(self, n_clusters=8, init='k-means++', n_init=10,
self.copy_x = copy_x
self.n_jobs = n_jobs
self.algorithm = algorithm
if n_init == 'warn':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The validation and raising should be done in fit.

You can check this PR:
https://github.com/scikit-learn/scikit-learn/pull/11469/files#diff-e6faf37b13574bc591afbf0536128735R858

which is not merged yet but shows how to do that in more details.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ugh how did I miss this :-/



def test_deprecation_warnings():
# Test that warnings are raised. Will be removed in 0.22
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Start by using `FIXME: remove this test in 0.22

@@ -987,3 +988,16 @@ def test_iter_attribute():
estimator = KMeans(algorithm="elkan", max_iter=1)
estimator.fit(np.random.rand(10, 10))
assert estimator.n_iter_ == 1


def test_deprecation_warnings():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better to name it test_change_n_init_future_warning

def test_deprecation_warnings():
# Test that warnings are raised. Will be removed in 0.22

# When n_init is specified (no warning)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for this comment

km = KMeans(n_init=1)
assert_no_warnings(km.fit, X)

# When n_init is not specified (warns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for this comment

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 nitpicks

@@ -921,6 +928,7 @@ def __init__(self, n_clusters=8, init='k-means++', n_init=10,
self.n_jobs = n_jobs
self.algorithm = algorithm


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is not useful and should not be PEP8 I think

@@ -987,3 +988,14 @@ def test_iter_attribute():
estimator = KMeans(algorithm="elkan", max_iter=1)
estimator.fit(np.random.rand(10, 10))
assert estimator.n_iter_ == 1


def test_change_n_init_future_warning():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not think about it at first.
Can you check kmeans function as well.
Using only the class will not check for the raising in the function itself since self.n_init will be set in the meanwhile.

@glemaitre
Copy link
Member

Please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@glemaitre glemaitre changed the title Change default value of n_init to 1 in KMeans EHN: Change default value of n_init to 1 in KMeans Jul 16, 2018
@glemaitre glemaitre changed the title EHN: Change default value of n_init to 1 in KMeans [MRG] EHN: Change default value of n_init to 1 in KMeans Jul 16, 2018
@agramfort
Copy link
Member

you also need to update all the tests to avoid the warnings in the doc.

@murielgrobler
Copy link
Contributor Author

Thanks so much for the feedback! I'm going to finish it up soon!

@GaelVaroquaux
Copy link
Member

As mentioned on the original issue, 1 seems too small. 5 would be good.

@amueller amueller added the Needs Decision Requires decision label Aug 5, 2019
Base automatically changed from master to main January 22, 2021 10:50
@thomasjpfan thomasjpfan added Stalled Superseded PR has been replace by a newer PR labels Apr 3, 2022
@cmarmo cmarmo removed Stalled Needs Decision Requires decision labels May 6, 2022
@cmarmo
Copy link
Member

cmarmo commented May 6, 2022

Closing as superseded by #23038.

@cmarmo cmarmo closed this May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:cluster Superseded PR has been replace by a newer PR
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change KMeans n_init default value to 1.
7 participants