[MRG] add seeds when n_jobs=1 and use seed as random_state #9288

Merged
merged 5 commits into from Aug 16, 2019

Conversation

bryanyang0528
Contributor

@bryanyang0528 bryanyang0528 commented Jul 6, 2017

Reference Issue

Fixes #9287, Fixes #9784

What does this implement/fix? Explain your changes.

Regardless of the value of n_jobs, the random_state used in kmeans_single should be the same.
So I added seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init) for the n_jobs=1 case,
and use each seed instead of the original random_state in the for loop.
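
A minimal sketch of the idea described above, not the actual scikit-learn code. `run_single` and `run_all` are hypothetical names; `run_single` stands in for kmeans_single and simply returns the first draw from a generator built from its seed.

```python
import numpy as np


def run_single(seed):
    # Hypothetical stand-in for kmeans_single: the "result" is just the
    # first draw from a RandomState built from the given seed.
    return np.random.RandomState(seed).random_sample()


def run_all(random_state, n_init):
    # Draw one integer seed per initialisation up front.
    seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init)
    # Each run depends only on its own seed, not on how much of the
    # shared random_state earlier runs have consumed.
    return [run_single(seed) for seed in seeds]


first = run_all(np.random.RandomState(0), n_init=3)
second = run_all(np.random.RandomState(0), n_init=3)
```

Because every run receives its own seed, the list of results depends only on the initial random_state and n_init, not on how the runs are scheduled.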

Any other comments?

I haven't revised the test cases for this change yet. I'll update them if you think this change is good.

@jnothman
Member

@jnothman jnothman commented Jul 6, 2017

This seems reasonable, except insofar as KMeans with a fixed random state might have been returning the same model for a long time. I'm not sure it's worth breaking users' clusterings.

@@ -338,12 +338,13 @@ def k_means(X, n_clusters, init='k-means++', precompute_distances='auto',
if n_jobs == 1:
# For a single thread, less memory is needed if we just store one set
# of the best results (as opposed to one set per run per thread).
for it in range(n_init):
seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init)
Member

@lesteve lesteve Jul 6, 2017

You can move the seeds assignment outside of the if clause, since it is used both for n_jobs == 1 and n_jobs != 1.
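
One way this suggestion could look, as a rough sketch rather than the code in this PR. The names `run_single` and `best_of_n_init` are hypothetical, and a thread pool from the standard library is used here only as a stand-in for the real parallel backend.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def run_single(seed):
    # Hypothetical stand-in for one k-means initialisation.
    return np.random.RandomState(seed).random_sample()


def best_of_n_init(random_state, n_init, n_jobs):
    # Seeds are drawn once, before branching on n_jobs.
    seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init)
    if n_jobs == 1:
        results = [run_single(seed) for seed in seeds]
    else:
        # Assumption: a thread pool replaces the real parallel backend.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            results = list(pool.map(run_single, seeds))
    # Pretend the smallest value marks the "best" run.
    return min(results)


sequential = best_of_n_init(np.random.RandomState(42), n_init=4, n_jobs=1)
parallel = best_of_n_init(np.random.RandomState(42), n_init=4, n_jobs=2)
```

With the seeds hoisted above the branch, both code paths consume the same seeds in the same order, so `sequential` and `parallel` agree.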

@bryanyang0528
Contributor Author

@bryanyang0528 bryanyang0528 commented Jul 6, 2017

@jnothman Even when seeds are created for n_jobs=1, the seeds will be the same for a fixed random_state, so it should still return the same model from run to run. But that model might not be the same as the one generated by the current method.
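
To illustrate the trade-off, here is a toy comparison under stated assumptions: `old_style` and `new_style` are hypothetical names, and a single random draw stands in for a full k-means run. It is not taken from the scikit-learn source.

```python
import numpy as np


def old_style(random_state, n_init):
    # Assumed previous n_jobs=1 behaviour: every run draws from the
    # shared random_state object directly.
    return [random_state.random_sample() for _ in range(n_init)]


def new_style(random_state, n_init):
    # Proposed behaviour: one integer seed per run, and each run builds
    # its own RandomState from that seed.
    seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init)
    return [np.random.RandomState(s).random_sample() for s in seeds]


old_a = old_style(np.random.RandomState(0), 3)
old_b = old_style(np.random.RandomState(0), 3)
new_a = new_style(np.random.RandomState(0), 3)
new_b = new_style(np.random.RandomState(0), 3)
```

Each scheme is reproducible on its own (`old_a` equals `old_b`, `new_a` equals `new_b`), but the two schemes produce different random streams, which is the backward-compatibility concern raised above.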

@amueller
Member

@amueller amueller commented Aug 5, 2019

some duplication with #9785

@adrinjalali
Member

@adrinjalali adrinjalali commented Aug 6, 2019

This looks good to me. It can take the test from the other PR and I'd say it's almost good to go.

@jnothman you still worried about backward compatibility here?

@amueller
Member

@amueller amueller commented Aug 6, 2019

I think it's a good fix.

@adrinjalali
Member

@adrinjalali adrinjalali commented Aug 6, 2019

@bryanyang0528 would you have time to address the comments, and rebase on the latest master here?

@bryanyang0528
Contributor Author

@bryanyang0528 bryanyang0528 commented Aug 9, 2019

@adrinjalali no problem. Thanks!

@jnothman
Member

@jnothman jnothman commented Aug 11, 2019

Any chance you can add a test?

@bryanyang0528
Contributor Author

@bryanyang0528 bryanyang0528 commented Aug 11, 2019

@jnothman no problem.

@adrinjalali
Member

@adrinjalali adrinjalali commented Aug 12, 2019

@bryanyang0528 tests failing :)

@bryanyang0528 bryanyang0528 reopened this Aug 12, 2019
@bryanyang0528
Contributor Author

@bryanyang0528 bryanyang0528 commented Aug 12, 2019

@adrinjalali I'm not sure why the tests failed only on py35_conda_openblas and pylatest_conda_mkl_pandas. Also, the error No module named 'sklearn.__check_build._check_build' occurred in circleci:doc. Are there any suggestions or hints for figuring out these issues?

P.S. I notice that recent PRs in sklearn are also failing in these test steps.

@thomasjpfan
Member

@thomasjpfan thomasjpfan commented Aug 12, 2019

Merging with master should fix the issue.

@adrinjalali
Member

@adrinjalali adrinjalali commented Aug 12, 2019

@bryanyang0528 please avoid force pushing. The errors are not related to your changes; you can ignore the ones which fail to create the environment.

@adrinjalali
Member

@adrinjalali adrinjalali commented Aug 12, 2019

or merge master as @thomasjpfan suggests.

@bryanyang0528
Contributor Author

@bryanyang0528 bryanyang0528 commented Aug 12, 2019

@adrinjalali @thomasjpfan Thank you for the suggestions.

@bryanyang0528
Contributor Author

@bryanyang0528 bryanyang0528 commented Aug 13, 2019

@adrinjalali Thank you for the help; all tests pass now. What should I do as the next step?

Member

@adrinjalali adrinjalali left a comment

This LGTM, ping @jnothman since I know he had some reservations about this solution. To me this is a fix, and therefore I wouldn't mind the change.

@amueller amueller changed the title [WIP] add seeds when n_jobs=1 and use seed as random_state [MRG] add seeds when n_jobs=1 and use seed as random_state Aug 13, 2019
Member

@amueller amueller left a comment

happy to wait for @jnothman but looks good to me.

Member

@NicolasHug NicolasHug left a comment

LGTM as a bug-fix, thanks @bryanyang0528

@jnothman
Member

@jnothman jnothman commented Aug 14, 2019

I'm happy with the fix.
Please add an entry to the change log at doc/whats_new/v0.22.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

Please also note the change at the top of that file under Changed Models.

bryanyang0528 and others added 2 commits Aug 15, 2019
@amueller
Member

@amueller amueller commented Aug 15, 2019

thanks!

@bryanyang0528
Contributor Author

@bryanyang0528 bryanyang0528 commented Aug 16, 2019

Thanks!

@amueller amueller merged commit e8f2708 into scikit-learn:master Aug 16, 2019
17 checks passed
ariapoy referenced this issue in ntucllab/libact Aug 2, 2021
7 participants