New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] add seeds when n_jobs=1 and use seed as random_state #9288
Conversation
This seems reasonable except insofar as KMeans with a fixed random state might have been returning the same model for a long time. I'm not sure it's worth breaking users' clusterings, |
sklearn/cluster/k_means_.py
Outdated
@@ -338,12 +338,13 @@ def k_means(X, n_clusters, init='k-means++', precompute_distances='auto', | |||
if n_jobs == 1: | |||
# For a single thread, less memory is needed if we just store one set | |||
# of the best results (as opposed to one set per run per thread). | |||
for it in range(n_init): | |||
seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move the seeds
assignment outside of the if
clause since it used both for n_job == 1
and n_jobs != 1
.
@jnothman Even though creating seeds for n_jobs =1, seeds will be the same with the fixed random_state which might return the same model. But the model might be not the same as the model generated by the current method. |
some duplication with #9785 |
This looks good to me. It can take the test from the other PR and I'd say it's almost good to go. @jnothman you still worried about backward compatibility here? |
I think it's a good fix. |
@bryanyang0528 would you have time to address the comments, and rebase on the latest master here? |
@adrinjalali no problem. Thanks! |
Any chance you can add a test? |
@jnothman no problem. |
@bryanyang0528 tests failing :) |
@adrinjalali I'm not sure why tests failed only on p.s. I notice that recent PRs in sklearn are failed in these tests steps either. |
Merge with master should fix the issue. |
@bryanyang0528 please avoid force pushing. The errors are not related to you, you can ignore the ones which fail to create the environment. |
or merge master as @thomasjpfan suggests. |
@adrinjalali @thomasjpfan Thank you for suggestions. |
@adrinjalali Thank you for help, all tests passed. What should I do for next step? |
This LGTM, ping @jnothman since I know he had some reservations about this solution. To me this is a fix, and therefore I wouldn't mind the change.
I'm happy with the fix. |
I'm happy with the fix. Please also note the change at the top of that file under Changed Models |
slight phrasing
thanks! |
Thanks! |
Reference Issue
Fixes #9287, Fixes #9784
What does this implement/fix? Explain your changes.
no matter how many n_jobs, the random_state in kmeans_single should be the same.
So I added
seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init
if n_jobs=1and use
seed
instead original random_state in the for loop.Any other comments?
I didn't revise test cases for this change yet. I'll update them if you think this change is good.