
Add a seed parameter for repeatability #14

Closed
ensonario opened this issue Nov 13, 2017 · 7 comments

Comments

@ensonario

It would be great to add a seed parameter for repeatability, the way scikit-learn does it.
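For context, the scikit-learn convention is that an estimator accepts a `random_state` argument and draws all of its randomness from one generator built from it, so the same value gives the same output. The sketch below shows that pattern on a toy class. `ToyEmbedder` and its "fit" (random initial coordinates plus a random projection) are hypothetical stand-ins, not UMAP's API, and this version only handles an integer or `None`.

```python
import numpy as np


class ToyEmbedder:
    """Hypothetical estimator illustrating the scikit-learn random_state convention."""

    def __init__(self, n_components=2, random_state=None):
        # Store the parameter untouched, as scikit-learn estimators do;
        # the generator is only built when fitting.
        self.n_components = n_components
        self.random_state = random_state

    def fit_transform(self, X):
        X = np.asarray(X, dtype=float)
        # Every random draw below comes from this single generator, so a fixed
        # integer seed makes the whole call repeatable (None is not repeatable).
        rng = np.random.RandomState(self.random_state)
        # Stand-in for a real algorithm: random initial coordinates,
        # nudged by a random linear projection of the data.
        init = rng.uniform(-10.0, 10.0, size=(X.shape[0], self.n_components))
        projection = rng.normal(size=(X.shape[1], self.n_components))
        return init + 0.01 * X @ projection


X = np.arange(20.0).reshape(10, 2)
a = ToyEmbedder(random_state=42).fit_transform(X)
b = ToyEmbedder(random_state=42).fit_transform(X)
print(np.array_equal(a, b))  # True: same seed, same embedding
```

The key design point is that no code path touches NumPy's global generator, so the result cannot be perturbed by other code seeding or consuming `np.random`.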

@lmcinnes
Owner

I agree, that would make a lot of sense. Ideally it isn't too hard, but has some quirks given how I am currently handling random number generation. It is certainly on my list of things to do (which is unfortunately long).

@ensonario
Author

ensonario commented Nov 13, 2017

It might be a good idea to publish a roadmap; that way the community will be able to contribute!

@lmcinnes
Owner

Sounds like a good plan -- any suggestions for where and how best to do that?

@ensonario
Author

Well, an issue on GitHub will do. People can add comments, and you can update the issue after each release.

@lmcinnes
Owner

lmcinnes commented Nov 15, 2017

Basic random seed support is now in place via the random_seed parameter, which takes an int. Ideally it would work as in standard sklearn, with a random_state parameter that supports more input types (e.g. numpy random states), but that will take a little thought as to the best way to do it.

Edit: and it doesn't actually achieve the desired result :-( Not sure why. It should at least provide slightly more consistency, though.
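The "more input types" idea usually comes down to one normalizing helper that every entry point calls. Below is a minimal sketch modeled on scikit-learn's `check_random_state`; the name `as_random_state` is hypothetical. One deliberate difference: scikit-learn maps `None` to NumPy's global generator, whereas this sketch returns a fresh, OS-seeded one.

```python
import numbers

import numpy as np


def as_random_state(seed):
    """Turn None, an int, or a RandomState into a np.random.RandomState.

    Hypothetical helper modeled on scikit-learn's check_random_state.
    """
    if seed is None:
        # No seed requested: a fresh generator seeded from the OS (not repeatable).
        return np.random.RandomState()
    if isinstance(seed, numbers.Integral):
        # Integer seed (including NumPy integer types): a new generator,
        # so repeated calls with the same int produce the same stream.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # Caller-supplied generator: use it as-is so its state keeps advancing.
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a numpy.random.RandomState")


shared = np.random.RandomState(0)
print(as_random_state(shared) is shared)  # True
print(as_random_state(5).randint(1000) == as_random_state(5).randint(1000))  # True
```

Passing a `RandomState` instance through unchanged lets a caller share one stream across several fits, while an int gives each call its own reproducible stream.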

@lmcinnes
Owner

Okay, that helps more. I have a nagging feeling there will be more minor things like the eigenvector solver to track down if I want to truly eliminate variability.
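An iterative eigensolver is a typical hidden source of randomness: SciPy's ARPACK wrapper `eigsh` uses a random starting vector unless `v0` is supplied, and the starting vector can change, for example, the signs of the returned eigenvectors. Below is a sketch, assuming a SciPy-based solver, that derives `v0` from the same `random_state`. `leading_eigenvectors` is a hypothetical helper, and the matrix is made up for the demo.

```python
import numpy as np
from scipy.sparse.linalg import eigsh


def leading_eigenvectors(A, k, random_state):
    """Top-k eigenpairs of a symmetric matrix with a reproducible ARPACK start."""
    rng = np.random.RandomState(random_state)
    # Without v0, ARPACK generates its own starting vector, outside the
    # control of random_state; seeding it here pins down the result.
    v0 = rng.uniform(-1.0, 1.0, size=A.shape[0])
    return eigsh(A, k=k, which="LM", v0=v0)


# Symmetric test matrix with well-separated eigenvalues (roughly 1..30).
noise = np.random.RandomState(1).normal(size=(30, 30))
A = np.diag(np.arange(1.0, 31.0)) + 0.01 * (noise + noise.T)

vals1, vecs1 = leading_eigenvectors(A, 3, random_state=42)
vals2, vecs2 = leading_eigenvectors(A, 3, random_state=42)
print(np.allclose(vecs1, vecs2))  # True
```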

lmcinnes added a commit that referenced this issue Nov 16, 2017
@lmcinnes
Owner

Well, that was a lot more work than I intended. For the record (since others may face this, and so I will remember in the future): the problem is that numba very cleverly swaps out np.random calls for something lower level (to avoid round trips back to Python, I presume), and this does (may?) not play nice with setting a random seed for numpy. Once I worked out what was going on and rewrote everything to deal with it, the problem resolved itself nicely and we get something repeatable. I believe setting random_state now works and should provide consistent embeddings (given a consistent random state).
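Numba's documentation confirms the behavior described above: calling `numpy.random.seed()` from ordinary Python seeds NumPy's generator, not the separate generator Numba uses inside compiled code. One way around this, sketched here on the assumption that the rewrite threads generator state explicitly (the comment does not say exactly how), is to keep a small state array. The array is seeded once from `random_state` and passed into every compiled function. The helper names and the choice of L'Ecuyer's taus88 combined Tausworthe generator are illustrative. The code is plain Python so it runs without Numba and would need minor adjustments to compile under `@numba.njit`.

```python
import numpy as np

MASK32 = 0xFFFFFFFF


def tau_rand_int(state):
    """Advance a taus88 combined Tausworthe generator held in `state`.

    `state` is a caller-owned int64 array of three 32-bit words; it is updated
    in place and a 32-bit integer is returned. No global RNG is touched.
    """
    s0, s1, s2 = (int(x) for x in state)
    s0 = (((s0 & 4294967294) << 12) & MASK32) ^ ((((s0 << 13) & MASK32) ^ s0) >> 19)
    s1 = (((s1 & 4294967288) << 4) & MASK32) ^ ((((s1 << 2) & MASK32) ^ s1) >> 25)
    s2 = (((s2 & 4294967280) << 17) & MASK32) ^ ((((s2 << 3) & MASK32) ^ s2) >> 11)
    state[0], state[1], state[2] = s0, s1, s2
    return s0 ^ s1 ^ s2


def make_rng_state(random_state):
    """Derive the three seed words from a NumPy RandomState (done once, in Python)."""
    # taus88 needs seeds above 1, 7 and 15 respectively; 128 clears all three.
    return random_state.randint(128, 2**31 - 1, size=3, dtype=np.int64)


def sample_negatives(n_vertices, n_samples, rng_state):
    """Hypothetical stand-in for a jitted inner loop drawing vertex indices."""
    return [tau_rand_int(rng_state) % n_vertices for _ in range(n_samples)]


state = make_rng_state(np.random.RandomState(42))
print(sample_negatives(100, 5, state))
```

Because the only seeding happens in ordinary Python via `random_state`, two runs with the same seed walk through identical sequences regardless of which random number generator the compiled code would otherwise use.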


2 participants