-
-
Notifications
You must be signed in to change notification settings - Fork 25.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] EHN: Change default value of n_init to 1 in KMeans #11530
Conversation
from 10 to 1 in 0.22
…cify initial values. All tests passed.
sklearn/cluster/k_means_.py
Outdated
|
||
max_iter : int, optional, default 300 | ||
Maximum number of iterations of the k-means algorithm to run. | ||
|
||
verbose : boolean, optional | ||
Verbosity mode. | ||
|
||
n_init=10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this doesn't seem to belong here.
sklearn/cluster/k_means_.py
Outdated
@@ -235,14 +235,15 @@ def k_means(X, n_clusters, sample_weight=None, init='k-means++', | |||
n_init : int, optional, default: 10 | |||
Number of time the k-means algorithm will be run with different | |||
centroid seeds. The final results will be the best output of | |||
n_init consecutive runs in terms of inertia. | |||
n_init consecutive runs in terms of inertia. Note the defaul value will | |||
be changed to 1 in 0.22. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if we want to use the .. versionchanged :
syntax here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favor of using the .. versionchanged :
syntax
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of changes
sklearn/cluster/k_means_.py
Outdated
@@ -235,14 +235,15 @@ def k_means(X, n_clusters, sample_weight=None, init='k-means++', | |||
n_init : int, optional, default: 10 | |||
Number of time the k-means algorithm will be run with different | |||
centroid seeds. The final results will be the best output of | |||
n_init consecutive runs in terms of inertia. | |||
n_init consecutive runs in terms of inertia. Note the defaul value will | |||
be changed to 1 in 0.22. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am in favor of using the .. versionchanged :
syntax
sklearn/cluster/k_means_.py
Outdated
@@ -920,6 +926,10 @@ def __init__(self, n_clusters=8, init='k-means++', n_init=10, | |||
self.copy_x = copy_x | |||
self.n_jobs = n_jobs | |||
self.algorithm = algorithm | |||
if n_init == 'warn': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The validation and raising should be done in fit
.
You can check this PR:
https://github.com/scikit-learn/scikit-learn/pull/11469/files#diff-e6faf37b13574bc591afbf0536128735R858
which is not merged yet but shows how to do that in more details.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ugh how did I miss this :-/
|
||
|
||
def test_deprecation_warnings(): | ||
# Test that warnings are raised. Will be removed in 0.22 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Start by using `FIXME: remove this test in 0.22
@@ -987,3 +988,16 @@ def test_iter_attribute(): | |||
estimator = KMeans(algorithm="elkan", max_iter=1) | |||
estimator.fit(np.random.rand(10, 10)) | |||
assert estimator.n_iter_ == 1 | |||
|
|||
|
|||
def test_deprecation_warnings(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better to name it test_change_n_init_future_warning
def test_deprecation_warnings(): | ||
# Test that warnings are raised. Will be removed in 0.22 | ||
|
||
# When n_init is specified (no warning) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for this comment
km = KMeans(n_init=1) | ||
assert_no_warnings(km.fit, X) | ||
|
||
# When n_init is not specified (warns) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for this comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 nitpicks
@@ -921,6 +928,7 @@ def __init__(self, n_clusters=8, init='k-means++', n_init=10, | |||
self.n_jobs = n_jobs | |||
self.algorithm = algorithm | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change is not useful and should not be PEP8 I think
@@ -987,3 +988,14 @@ def test_iter_attribute(): | |||
estimator = KMeans(algorithm="elkan", max_iter=1) | |||
estimator.fit(np.random.rand(10, 10)) | |||
assert estimator.n_iter_ == 1 | |||
|
|||
|
|||
def test_change_n_init_future_warning(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not think about it at first.
Can you check kmeans
function as well.
Using only the class will not check for the raising in the function itself since self.n_init
will be set in the meanwhile.
Please add an entry to the change log at doc/whats_new/v0.20.rst under bug fixes. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user: |
you also need to update all the tests to avoid the warnings in the doc. |
Thanks so much for the feedback! I'm going to finish it up soon! |
As mentioned on the original issue, 1 seems too small. 5 would be good. |
Closing as superseded by #23038. |
closes #9729 Change KMeans n_init default value to 1.
What does this implement/fix? Explain your changes.
Creates a warning message when the user does not specify a value for n_init. The warning message tells the user that the default value will change from 10 to 1 in 0.22
Any other comments?
Thanks for helping us contribute at SciPy 2018!