[MRG] add seeds when n_jobs=1 and use seed as random_state #9288

bryanyang0528 · 2017-07-06T11:03:14Z

Reference Issue

What does this implement/fix? Explain your changes.

Regardless of the value of n_jobs, the random_state passed to kmeans_single should be the same.
So I added seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init) for the n_jobs=1 case,
and each seed is now used in the for loop instead of the original random_state.
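
A minimal sketch of the idea described above, using a toy stand-in rather than scikit-learn's actual code. The names `toy_single_run`, `shared_state_runs` and `seeded_runs` are hypothetical; only the `randint(np.iinfo(np.int32).max, size=n_init)` call is taken from this description.

```python
import numpy as np


def toy_single_run(random_state):
    """Hypothetical stand-in for one k-means initialisation run."""
    if isinstance(random_state, (int, np.integer)):
        random_state = np.random.RandomState(random_state)
    return int(random_state.randint(1000))


def shared_state_runs(n_init, random_state):
    # Behaviour before the change (n_jobs == 1): every run draws from the
    # same RandomState object, so each result depends on the runs before it.
    rng = np.random.RandomState(random_state)
    return [toy_single_run(rng) for _ in range(n_init)]


def seeded_runs(n_init, random_state, order=None):
    # Proposed behaviour: draw one integer seed per run up front, then give
    # each run its own seed, so results do not depend on execution order.
    rng = np.random.RandomState(random_state)
    seeds = rng.randint(np.iinfo(np.int32).max, size=n_init)
    order = range(n_init) if order is None else order
    results = {}
    for i in order:
        results[i] = toy_single_run(seeds[i])
    return [results[i] for i in range(n_init)]
```

With per-run seeds, executing the runs in any order (as a parallel backend might) gives the same per-run results, which is what makes the sequential and parallel paths comparable.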

Any other comments?

I haven't revised the test cases for this change yet. I'll update them if you think this change is good.

jnothman · 2017-07-06T12:05:26Z

This seems reasonable except insofar as KMeans with a fixed random state might have been returning the same model for a long time. I'm not sure it's worth breaking users' clusterings.

lesteve · 2017-07-06T12:08:42Z

sklearn/cluster/k_means_.py

@@ -338,12 +338,13 @@ def k_means(X, n_clusters, init='k-means++', precompute_distances='auto',
    if n_jobs == 1:
        # For a single thread, less memory is needed if we just store one set
        # of the best results (as opposed to one set per run per thread).
-        for it in range(n_init):
+        seeds = random_state.randint(np.iinfo(np.int32).max, size=n_init)


You can move the seeds assignment outside of the if clause since it is used for both n_jobs == 1 and n_jobs != 1.
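
The suggestion above could look roughly like the following sketch. It is not the scikit-learn source: `toy_single_run` and `toy_k_means` are hypothetical, and `concurrent.futures` stands in for whatever parallel backend the real function uses. Negative `n_jobs` values are not handled here.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def toy_single_run(X, seed):
    """Hypothetical stand-in for one run: pick a random row and score it."""
    rng = np.random.RandomState(seed)
    center = X[rng.randint(X.shape[0])]
    inertia = float(((X - center) ** 2).sum())
    return inertia, center


def toy_k_means(X, n_init=10, n_jobs=1, random_state=0):
    rng = np.random.RandomState(random_state)
    # Seeds are drawn once, before branching on n_jobs, so both code paths
    # hand exactly the same seed to each run.
    seeds = rng.randint(np.iinfo(np.int32).max, size=n_init)
    if n_jobs == 1:
        # Sequential path: keep only the best result seen so far.
        best = None
        for seed in seeds:
            result = toy_single_run(X, seed)
            if best is None or result[0] < best[0]:
                best = result
    else:
        # Parallel path: run everything, then pick the best.
        with ThreadPoolExecutor(max_workers=n_jobs) as pool:
            results = list(pool.map(lambda s: toy_single_run(X, s), seeds))
        best = min(results, key=lambda r: r[0])
    return best
```

Under these assumptions, `toy_k_means(X, n_jobs=1, random_state=0)` and `toy_k_means(X, n_jobs=2, random_state=0)` select the same best run, which is the property the PR is after.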

bryanyang0528 · 2017-07-06T15:57:00Z

@jnothman Even when seeds are created for n_jobs=1, they will be the same for a fixed random_state, so the returned model stays reproducible. However, that model might not be the same as the one generated by the current method.

amueller · 2019-08-05T19:19:37Z

some duplication with #9785

adrinjalali · 2019-08-06T14:36:55Z

This looks good to me. It can take the test from the other PR and I'd say it's almost good to go.

@jnothman you still worried about backward compatibility here?

amueller · 2019-08-06T15:03:52Z

I think it's a good fix.

adrinjalali · 2019-08-06T15:22:08Z

@bryanyang0528 would you have time to address the comments, and rebase on the latest master here?

bryanyang0528 · 2019-08-09T02:31:22Z

@adrinjalali no problem. Thanks!

jnothman · 2019-08-11T10:23:31Z

Any chance you can add a test?

bryanyang0528 · 2019-08-11T11:50:06Z

@jnothman no problem.

adrinjalali · 2019-08-12T09:02:55Z

@bryanyang0528 tests failing :)

bryanyang0528 · 2019-08-12T12:40:39Z

@adrinjalali I'm not sure why the tests failed only on py35_conda_openblas and pylatest_conda_mkl_pandas. Also, "No module named 'sklearn.__check_build._check_build'" occurred in circleci:doc. Are there any suggestions or hints for figuring out these issues?

P.S. I noticed that other recent PRs in sklearn are also failing at these test steps.

thomasjpfan · 2019-08-12T13:12:05Z

Merge with master should fix the issue.

adrinjalali · 2019-08-12T13:13:43Z

@bryanyang0528 please avoid force pushing. The errors are not related to you, you can ignore the ones which fail to create the environment.

adrinjalali · 2019-08-12T13:14:30Z

or merge master as @thomasjpfan suggests.

bryanyang0528 · 2019-08-12T14:07:10Z

@adrinjalali @thomasjpfan Thank you for the suggestions.

bryanyang0528 · 2019-08-13T01:49:20Z

@adrinjalali Thank you for the help; all tests passed. What should I do next?

adrinjalali

This LGTM, ping @jnothman since I know he had some reservations about this solution. To me this is a fix, and therefore I wouldn't mind the change.

amueller

happy to wait for @jnothman but looks good to me.

NicolasHug

LGTM as a bug-fix, thanks @bryanyang0528

jnothman · 2019-08-14T21:37:14Z

I'm happy with the fix.

jnothman · 2019-08-14T21:38:15Z

I'm happy with the fix.
Please add an entry to the change log at doc/whats_new/v0.22.rst. Like the other entries there, please reference this pull request with :pr: and credit yourself (and other contributors if applicable) with :user:

Please also note the change at the top of that file under Changed Models

amueller · 2019-08-15T16:21:41Z

thanks!

bryanyang0528 · 2019-08-16T00:41:24Z

Thanks!

lesteve reviewed Jul 6, 2017

lesteve mentioned this pull request Sep 20, 2017

KMeans gives slightly different result for n_jobs=1 vs. n_jobs > 1 #9784

Closed

amueller mentioned this pull request Aug 5, 2019

[WIP] Fix KMeans n_init runs seed #9785

Closed

amueller added the Waiting for Reviewer label Aug 5, 2019

bryanyang0528 force-pushed the consist_n_jobs branch from dda3fff to f3f97be Compare August 9, 2019 03:46

bryanyang0528 force-pushed the consist_n_jobs branch from 00a3461 to 3b95ebd Compare August 12, 2019 11:17

bryanyang0528 closed this Aug 12, 2019

bryanyang0528 force-pushed the consist_n_jobs branch from 3b95ebd to a7a834b Compare August 12, 2019 11:28

bryanyang0528 added 2 commits August 12, 2019 19:30

use seed even if n_jobs=1

2848aba

add test case for diff n_jobs

1ca7e97

bryanyang0528 reopened this Aug 12, 2019

Merge remote-tracking branch 'upstream/master'

27b3738

adrinjalali approved these changes Aug 13, 2019

amueller changed the title ~~[WIP] add seeds when n_jobs=1 and use seed as random_state~~ [MRG] add seeds when n_jobs=1 and use seed as random_state Aug 13, 2019

amueller approved these changes Aug 13, 2019

NicolasHug approved these changes Aug 13, 2019

amueller removed the Waiting for Reviewer label Aug 15, 2019

bryanyang0528 and others added 2 commits August 16, 2019 00:16

updated doc

aebff53

Update v0.22.rst

b685401

slight phrasing

amueller merged commit e8f2708 into scikit-learn:master Aug 16, 2019

jeromedockes mentioned this pull request Oct 14, 2019

test_load_uniform_ball_cloud will fail with sklearn 0.22 nilearn/nilearn#2173

Closed

ariapoy referenced this pull request in ntucllab/libact Aug 2, 2021

Upgrade to sklearn==0.22.

0b61fa4

ariapoy mentioned this pull request Aug 4, 2021

[MRG] Upgrade to support newest scikit-learn version ntucllab/libact#188

Merged
