[MRG] add seeds when n_jobs=1 and use seed as random_state #9288

Merged
merged 5 commits into from Aug 16, 2019


@bryanyang0528
Contributor

bryanyang0528 commented Jul 6, 2017

Reference Issue

Fixes #9287, Fixes #9784

What does this implement/fix? Explain your changes.

No matter what n_jobs is set to, the random_state passed to kmeans_single should be the same.
So I added seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init) for the n_jobs=1 case
and use each seed instead of the original random_state in the for loop.
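
A minimal sketch of the seeding pattern described above, not the actual scikit-learn code. `toy_single_run` and `run_all_inits` are hypothetical stand-ins for `kmeans_single` and the `n_init` loop; only the `randint(np.iinfo(np.int32).max, size=n_init)` call is taken from the PR description.

```python
import numpy as np


def toy_single_run(X, random_state):
    """Hypothetical stand-in for one k-means run: pick one row of X as a 'center'."""
    rng = np.random.RandomState(random_state)
    return X[rng.randint(X.shape[0])]


def run_all_inits(X, n_init, random_state):
    """Draw one integer seed per initialisation, then start each run from its own seed."""
    rng = np.random.RandomState(random_state)
    # Seeds are drawn once, up front, from the parent random state.
    seeds = rng.randint(np.iinfo(np.int32).max, size=n_init)
    # Each run gets its own seed instead of sharing the parent random state.
    return [toy_single_run(X, seed) for seed in seeds]


X = np.arange(20, dtype=float).reshape(10, 2)
first = run_all_inits(X, n_init=4, random_state=0)
second = run_all_inits(X, n_init=4, random_state=0)
```

Because every run depends only on its own seed, the result of run `i` no longer depends on how much randomness the earlier runs consumed.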

Any other comments?

I didn't revise test cases for this change yet. I'll update them if you think this change is good.

@jnothman

Member

jnothman commented Jul 6, 2017

This seems reasonable except insofar as KMeans with a fixed random state might have been returning the same model for a long time. I'm not sure it's worth breaking users' clusterings,

@@ -338,12 +338,13 @@ def k_means(X, n_clusters, init='k-means++', precompute_distances='auto',
    if n_jobs == 1:
        # For a single thread, less memory is needed if we just store one set
        # of the best results (as opposed to one set per run per thread).
        for it in range(n_init):
        seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init)

@lesteve

lesteve Jul 6, 2017

Member

You can move the seeds assignment outside of the if clause, since it is used for both n_jobs == 1 and n_jobs != 1.
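
A rough sketch of that structure, under stated assumptions: `fake_kmeans_single` and `toy_k_means` are hypothetical names, and a standard-library thread pool stands in for whatever parallel backend the real code uses. The point is only that `seeds` is assigned once, before the branch, and both branches consume it.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fake_kmeans_single(seed):
    """Hypothetical stand-in for one run: return a pseudo 'inertia' for this seed."""
    return float(np.random.RandomState(seed).uniform())


def toy_k_means(n_init, n_jobs, random_state):
    rng = np.random.RandomState(random_state)
    # Assigned once, outside the if clause, so both branches use the same seeds.
    seeds = rng.randint(np.iinfo(np.int32).max, size=n_init)
    if n_jobs == 1:
        results = [fake_kmeans_single(seed) for seed in seeds]
    else:
        # Assumption: a thread pool replaces the real parallel backend here.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            results = list(pool.map(fake_kmeans_single, seeds))
    # Keep the best (lowest) value, as a k-means driver would keep the best run.
    return min(results)


sequential = toy_k_means(n_init=5, n_jobs=1, random_state=0)
parallel = toy_k_means(n_init=5, n_jobs=3, random_state=0)
```

With the seeds hoisted, the sequential and parallel paths evaluate exactly the same set of runs, so the selected result does not depend on `n_jobs`.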

@bryanyang0528

Contributor Author

bryanyang0528 commented Jul 6, 2017

@jnothman Even when seeds are created for n_jobs=1, they will be the same for a fixed random_state, so repeated fits will still return the same model. But that model might not be the same as the one generated by the current method.
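
To make the trade-off concrete, here is a toy comparison (a sketch, not the scikit-learn implementation). `draw_old` and `draw_new` are hypothetical helpers: the first shares one random state across all runs, which is an assumption about how the previous sequential path behaved, and the second derives a seed per run.

```python
import numpy as np


def draw_old(n_init, random_state):
    """Assumed old sequential behaviour: every run consumes from one shared RNG."""
    rng = np.random.RandomState(random_state)
    return [float(rng.uniform()) for _ in range(n_init)]


def draw_new(n_init, random_state):
    """Proposed behaviour: draw per-run seeds first, then give each run its own RNG."""
    rng = np.random.RandomState(random_state)
    seeds = rng.randint(np.iinfo(np.int32).max, size=n_init)
    return [float(np.random.RandomState(seed).uniform()) for seed in seeds]


old_a, old_b = draw_old(3, 42), draw_old(3, 42)
new_a, new_b = draw_new(3, 42), draw_new(3, 42)
```

Both variants are reproducible for a fixed random_state, but they produce different random streams, which is why a model fitted before the change may differ from one fitted after it.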

@amueller

Member

amueller commented Aug 5, 2019

some duplication with #9785

@adrinjalali

Member

adrinjalali commented Aug 6, 2019

This looks good to me. It can take the test from the other PR and I'd say it's almost good to go.

@jnothman you still worried about backward compatibility here?

@amueller

Member

amueller commented Aug 6, 2019

I think it's a good fix.

@adrinjalali

Member

adrinjalali commented Aug 6, 2019

@bryanyang0528 would you have time to address the comments, and rebase on the latest master here?

@bryanyang0528

Contributor Author

bryanyang0528 commented Aug 9, 2019

@adrinjalali no problem. Thanks!

@bryanyang0528 bryanyang0528 force-pushed the bryanyang0528:consist_n_jobs branch from dda3fff to f3f97be Aug 9, 2019
@jnothman

Member

jnothman commented Aug 11, 2019

Any chance you can add a test?

@bryanyang0528

Contributor Author

bryanyang0528 commented Aug 11, 2019

@jnothman no problem.

@adrinjalali

Member

adrinjalali commented Aug 12, 2019

@bryanyang0528 tests failing :)

@bryanyang0528 bryanyang0528 force-pushed the bryanyang0528:consist_n_jobs branch from 00a3461 to 3b95ebd Aug 12, 2019
@bryanyang0528 bryanyang0528 force-pushed the bryanyang0528:consist_n_jobs branch from 3b95ebd to a7a834b Aug 12, 2019
@bryanyang0528 bryanyang0528 reopened this Aug 12, 2019
@bryanyang0528

Contributor Author

bryanyang0528 commented Aug 12, 2019

@adrinjalali I'm not sure why the tests failed only on py35_conda_openblas and pylatest_conda_mkl_pandas. Also, "No module named 'sklearn.__check_build._check_build'" occurred in circleci:doc. Are there any suggestions or hints for figuring out these issues?

P.S. I notice that recent PRs in sklearn are also failing in these test steps.

@thomasjpfan

Member

thomasjpfan commented Aug 12, 2019

Merge with master should fix the issue.

@adrinjalali

Member

adrinjalali commented Aug 12, 2019

@bryanyang0528 please avoid force pushing. The errors are not related to your changes; you can ignore the ones which fail to create the environment.

@adrinjalali

Member

adrinjalali commented Aug 12, 2019

or merge master as @thomasjpfan suggests.

@bryanyang0528

Contributor Author

bryanyang0528 commented Aug 12, 2019

@adrinjalali @thomasjpfan Thank you for the suggestions.

@bryanyang0528

Contributor Author

bryanyang0528 commented Aug 13, 2019

@adrinjalali Thank you for the help; all tests have passed. What should I do as the next step?

Member

adrinjalali left a comment

This LGTM, ping @jnothman since I know he had some reservations about this solution. To me this is a fix, and therefore I wouldn't mind the change.

@amueller amueller changed the title from "[WIP] add seeds when n_jobs=1 and use seed as random_state" to "[MRG] add seeds when n_jobs=1 and use seed as random_state" Aug 13, 2019
Member

amueller left a comment

happy to wait for @jnothman but looks good to me.

Contributor

NicolasHug left a comment

LGTM as a bug-fix, thanks @bryanyang0528

@jnothman

Member

jnothman commented Aug 14, 2019

I'm happy with the fix.
Please add an entry to the change log at doc/whats_new/v0.22.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:.

Please also note the change at the top of that file under Changed Models.

bryanyang0528 and others added 2 commits Aug 15, 2019
slight phrasing
@amueller

Member

amueller commented Aug 15, 2019

thanks!

@bryanyang0528

Contributor Author

bryanyang0528 commented Aug 16, 2019

Thanks!

@amueller amueller merged commit e8f2708 into scikit-learn:master Aug 16, 2019
17 checks passed
LGTM analysis: C/C++ No code changes detected
LGTM analysis: JavaScript No code changes detected
LGTM analysis: Python No new or fixed alerts
ci/circleci: deploy Your tests passed on CircleCI!
ci/circleci: doc Your tests passed on CircleCI!
ci/circleci: doc-min-dependencies Your tests passed on CircleCI!
ci/circleci: lint Your tests passed on CircleCI!
codecov/patch 100% of diff hit (target 96.87%)
codecov/project 96.88% (+0.01%) compared to 3eacf94
scikit-learn.scikit-learn Build #20190815.30 succeeded
scikit-learn.scikit-learn (Linux py35_conda_openblas) Linux py35_conda_openblas succeeded
scikit-learn.scikit-learn (Linux py35_ubuntu_atlas) Linux py35_ubuntu_atlas succeeded
scikit-learn.scikit-learn (Linux pylatest_conda_mkl_pandas) Linux pylatest_conda_mkl_pandas succeeded
scikit-learn.scikit-learn (Linux32 py35_ubuntu_atlas_32bit) Linux32 py35_ubuntu_atlas_32bit succeeded
scikit-learn.scikit-learn (Windows py35_pip_openblas_32bit) Windows py35_pip_openblas_32bit succeeded
scikit-learn.scikit-learn (Windows py37_conda_mkl) Windows py37_conda_mkl succeeded
scikit-learn.scikit-learn (macOS pylatest_conda_mkl) macOS pylatest_conda_mkl succeeded
7 participants