Add a seed parameter for repeatability #14

ensonario · 2017-11-13T17:33:47Z

It would be great to add a seed parameter for repeatability, the way scikit-learn does.

lmcinnes · 2017-11-13T17:48:17Z

I agree, that would make a lot of sense. In principle it shouldn't be too hard, but there are some quirks given how I am currently handling random number generation. It is certainly on my list of things to do (which is unfortunately long).

ensonario · 2017-11-13T18:03:08Z

It might be a good idea to publish a roadmap so that the community can contribute!

lmcinnes · 2017-11-13T18:24:14Z

Sounds like a good plan -- any suggestions for where and how best to do that?

ensonario · 2017-11-13T19:30:09Z

Well, an issue on GitHub will do. People can add comments, and you can update the issue after each release.

lmcinnes · 2017-11-15T01:21:52Z

Basic random seed support is now in place via the random_seed parameter, which takes an int. Ideally things would work a little differently, following standard sklearn practice, with a random_state that supports more input types (e.g. numpy random states), but working out the best way to do that will take a little thought.

Edit: and it doesn't actually achieve the desired result :-( Not sure why. It should at least provide slightly more consistency.
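For anyone wondering what "supports more input types" could look like: scikit-learn ships sklearn.utils.check_random_state for this purpose. The sketch below is a hand-written analogue using only numpy, not the code in this library. The name normalize_random_state is hypothetical, and unlike sklearn's helper it returns a fresh unseeded generator for None rather than numpy's global one.

```python
import numbers

import numpy as np


def normalize_random_state(seed=None):
    """Hypothetical helper: coerce None, an int, or a RandomState
    into a numpy RandomState instance."""
    if seed is None:
        # No seed given: fresh generator seeded from OS entropy (not repeatable).
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        # Integer seed: build a new generator from it (repeatable).
        return np.random.RandomState(int(seed))
    if isinstance(seed, np.random.RandomState):
        # Already a generator: pass it through unchanged.
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a RandomState")


# The same integer seed yields the same draws.
first = normalize_random_state(42).randint(0, 100, size=5)
second = normalize_random_state(42).randint(0, 100, size=5)
```

With something like this at the top of fit, the rest of the code only ever has to deal with one kind of object.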

lmcinnes · 2017-11-15T01:33:46Z

Okay, that helps more. I have a nagging feeling there will be more minor things like the eigenvector solver to track down if I want to truly eliminate variability.

lmcinnes · 2017-11-16T03:48:00Z

Well, that was a lot more work than I intended. For the record (since others may face this, and so that I will remember in the future): the issue is that numba very cleverly swaps out np.random calls for something lower level (to avoid round trips back to Python, I presume), and this does not (or may not) play nicely with setting a random seed for numpy. Once I worked out what was going on and rewrote everything to deal with it, the problem resolved itself nicely and we now get something repeatable. I believe setting random_state now works and should provide consistent embeddings (given a consistent random state).
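To illustrate the pitfall: the Numba documentation notes that jitted code keeps its own generator state, so calling np.random.seed from ordinary Python does not seed it. One workaround is to do the seeding inside a jitted function. This is a minimal sketch of that idea and not necessarily how the rewrite here handles it. The names seed_jit_rng and draw_uniform are made up for the example, and the ImportError fallback is only there so the snippet still runs where numba is not installed.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Fallback so the sketch runs without numba; in that case everything
    # shares numpy's ordinary global generator and the pitfall does not arise.
    def njit(func):
        return func


@njit
def seed_jit_rng(seed):
    # Inside jitted code this seeds numba's own generator state.
    np.random.seed(seed)


@njit
def draw_uniform(n):
    # Draw n uniform samples using np.random from inside jitted code.
    out = np.empty(n)
    for i in range(n):
        out[i] = np.random.random()
    return out


seed_jit_rng(42)
run_a = draw_uniform(5)
seed_jit_rng(42)
run_b = draw_uniform(5)
# run_a and run_b hold the same values because the seed was set
# in the same place the random numbers are drawn.
```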

lmcinnes added the enhancement label Nov 13, 2017

lmcinnes added a commit that referenced this issue Nov 15, 2017

Basic support for user set random seed per issue #14

70bf1c2

lmcinnes added a commit that referenced this issue Nov 15, 2017

Fix eigvector solve initialisation to help with issue #14

3ab97e5

lmcinnes added a commit that referenced this issue Nov 16, 2017

Random seed now works; clean up (issue #14)

11937d2

lmcinnes added a commit that referenced this issue Nov 16, 2017

Update docs (issue #14)

f7f7520

lmcinnes closed this as completed Nov 20, 2017
