-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
[MRG] add seeds when n_jobs=1 and use seed as random_state #9288
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This seems reasonable except insofar as KMeans with a fixed random state might have been returning the same model for a long time. I'm not sure it's worth breaking users' clusterings, |
sklearn/cluster/k_means_.py
Outdated
@@ -338,12 +338,13 @@ def k_means(X, n_clusters, init='k-means++', precompute_distances='auto', | |||
if n_jobs == 1: | |||
# For a single thread, less memory is needed if we just store one set | |||
# of the best results (as opposed to one set per run per thread). | |||
for it in range(n_init): | |||
seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can move the seeds
assignment outside of the if
clause since it used both for n_job == 1
and n_jobs != 1
.
@jnothman Even though creating seeds for n_jobs =1, seeds will be the same with the fixed random_state which might return the same model. But the model might be not the same as the model generated by the current method. |
some duplication with #9785 |
This looks good to me. It can take the test from the other PR and I'd say it's almost good to go. @jnothman you still worried about backward compatibility here? |
I think it's a good fix. |
@bryanyang0528 would you have time to address the comments, and rebase on the latest master here? |
@adrinjalali no problem. Thanks! |
dda3fff
to
f3f97be
Compare
Any chance you can add a test? |
@jnothman no problem. |
@bryanyang0528 tests failing :) |
00a3461
to
3b95ebd
Compare
3b95ebd
to
a7a834b
Compare
@adrinjalali I'm not sure why tests failed only on p.s. I notice that recent PRs in sklearn are failed in these tests steps either. |
Merge with master should fix the issue. |
@bryanyang0528 please avoid force pushing. The errors are not related to you, you can ignore the ones which fail to create the environment. |
or merge master as @thomasjpfan suggests. |
@adrinjalali @thomasjpfan Thank you for suggestions. |
@adrinjalali Thank you for help, all tests passed. What should I do for next step? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This LGTM, ping @jnothman since I know he had some reservations about this solution. To me this is a fix, and therefore I wouldn't mind the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
happy to wait for @jnothman but looks good to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as a bug-fix, thanks @bryanyang0528
I'm happy with the fix. |
I'm happy with the fix. Please also note the change at the top of that file under Changed Models |
slight phrasing
thanks! |
Thanks! |
Reference Issue
Fixes #9287, Fixes #9784
What does this implement/fix? Explain your changes.
no matter how many n_jobs, the random_state in kmeans_single should be the same.
So I added
seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init
if n_jobs=1and use
seed
instead original random_state in the for loop.Any other comments?
I didn't revise test cases for this change yet. I'll update them if you think this change is good.