New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG+1] Reference glossary in random_state docstring entries in datasets module #10732
[MRG+1] Reference glossary in random_state docstring entries in datasets module #10732
Conversation
See the comments at #10614. We've decided we need to leave a little more information in, particularly informing users that an int can be used for determinism/reproducibility |
Thanks for pointing that out. I added a note about reproducibility to the |
@jnothman, just pinging you about this. Does that latest commit strike the balance between including info on reproducibility and offloading details to the glossary? Thanks for your help. |
I'd really like another core dev's input.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall. Thanks @richford for the PR :)
sklearn/datasets/base.py
Outdated
If RandomState instance, random_state is the random number generator; | ||
If None, the random number generator is the RandomState instance used | ||
by `np.random`. | ||
random_state : int, RandomState instance or None (default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default
-> default = 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong here but isn't the default None
, which uses the global random state from numpy.random
? Which is distinct from random_state=0
which would set the seed to zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here it is 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Step 1. Remove foot from mouth.
Step 2. Implement the obvious and reasonable fixes that the devs told you to.
Thanks @qinhanmin2014 and @jnothman for pointing that out. Sorry for my confusion. The latest commits fix it.
sklearn/datasets/olivetti_faces.py
Outdated
If RandomState instance, random_state is the random number generator; | ||
If None, the random number generator is the RandomState instance used | ||
by `np.random`. | ||
random_state : int, RandomState instance or None (default) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default
-> default = 0
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong here but isn't the default None
, which uses the global random state from numpy.random
? Which is distinct from random_state=0
which would set the seed to zero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in latest commit. Thanks for pointing that out. Apologies for the confusion.
sklearn/datasets/svmlight_format.py
Outdated
@@ -446,8 +446,8 @@ def dump_svmlight_file(X, y, f, zero_based=True, comment=None, query_id=None, | |||
|
|||
Xval = check_array(X, accept_sparse='csr') | |||
if Xval.shape[0] != yval.shape[0]: | |||
raise ValueError("X.shape[0] and y.shape[0] should be the same, got" | |||
" %r and %r instead." % (Xval.shape[0], yval.shape[0])) | |||
raise ValueError("X.shape[0] and y.shape[0] should be the same, got " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please avoid unrelevant changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flake 8 change reverted in latest commits.
sklearn/datasets/base.py
Outdated
If RandomState instance, random_state is the random number generator; | ||
If None, the random number generator is the RandomState instance used | ||
by `np.random`. | ||
random_state : int, RandomState instance or None (default = 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default = 0
-> default=0
as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
sklearn/datasets/olivetti_faces.py
Outdated
If RandomState instance, random_state is the random number generator; | ||
If None, the random number generator is the RandomState instance used | ||
by `np.random`. | ||
random_state : int, RandomState instance or None (default = 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
default = 0 -> default=0 as before
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this PR @richford .
sklearn/datasets/base.py
Outdated
random_state : int, RandomState instance or None (default=0) | ||
Determines random number generation for dataset shuffling. Pass an int | ||
for reproducible output across multiple function calls. | ||
See :term:`random_state <Glossary>`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be,
:term:`Glossary <random_state>`.
I think, as this link doesn't currently render correctly
(applies to all changes)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Thanks @rth.
Thanks for the thorough reviews @rth, @jnothman, and @qinhanmin2014. Is there anything more you'd like me to do on my end in order to merge this PR? |
Thank you. |
Reference Issues/PRs
Partially addresses #10548
What does this implement/fix? Explain your changes.
This PR updates the
random_state
docstring descriptions in allsklearn.datasets
docstrings. The descriptions now refer to the glossary and, in some cases, provide some detail on what exactly the random number generation is used for.