Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[MRG+1] Reference glossary in random_state docstring entries in datasets module #10732

Merged
merged 7 commits into from Mar 24, 2018

Conversation

richford
Copy link
Contributor

Reference Issues/PRs

Partially addresses #10548

What does this implement/fix? Explain your changes.

This PR updates the random_state docstring descriptions in all sklearn.datasets docstrings. The descriptions now refer to the glossary and, in some cases, provide some detail on what exactly the random number generation is used for.

@richford richford changed the title Reference glossary in random_state docstring entries in datasets module [MRG] Reference glossary in random_state docstring entries in datasets module Feb 28, 2018
@jnothman
Copy link
Member

jnothman commented Mar 1, 2018

See the comments at #10614. We've decided we need to leave a little more information in, particularly informing users that an int can be used for determinism/reproducibility

@richford
Copy link
Contributor Author

richford commented Mar 1, 2018

Thanks for pointing that out. I added a note about reproducibility to the random_state entries of each docstring.

@richford
Copy link
Contributor Author

richford commented Mar 7, 2018

@jnothman, just pinging you about this. Does that latest commit strike the balance between including info on reproducibility and offloading details to the glossary? Thanks for your help.

@jnothman
Copy link
Member

jnothman commented Mar 7, 2018 via email

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall. Thanks @richford for the PR :)

If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
random_state : int, RandomState instance or None (default)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default -> default = 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong here but isn't the default None, which uses the global random state from numpy.random? Which is distinct from random_state=0 which would set the seed to zero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here it is 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Step 1. Remove foot from mouth.
Step 2. Implement the obvious and reasonable fixes that the devs told you to.

Thanks @qinhanmin2014 and @jnothman for pointing that out. Sorry for my confusion. The latest commits fix it.

If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
random_state : int, RandomState instance or None (default)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default -> default = 0

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong here but isn't the default None, which uses the global random state from numpy.random? Which is distinct from random_state=0 which would set the seed to zero.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in latest commit. Thanks for pointing that out. Apologies for the confusion.

@@ -446,8 +446,8 @@ def dump_svmlight_file(X, y, f, zero_based=True, comment=None, query_id=None,

Xval = check_array(X, accept_sparse='csr')
if Xval.shape[0] != yval.shape[0]:
raise ValueError("X.shape[0] and y.shape[0] should be the same, got"
" %r and %r instead." % (Xval.shape[0], yval.shape[0]))
raise ValueError("X.shape[0] and y.shape[0] should be the same, got "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please avoid unrelevant changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flake 8 change reverted in latest commits.

If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
random_state : int, RandomState instance or None (default = 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default = 0 -> default=0 as before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

If RandomState instance, random_state is the random number generator;
If None, the random number generator is the RandomState instance used
by `np.random`.
random_state : int, RandomState instance or None (default = 0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default = 0 -> default=0 as before

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@qinhanmin2014 qinhanmin2014 changed the title [MRG] Reference glossary in random_state docstring entries in datasets module [MRG+1] Reference glossary in random_state docstring entries in datasets module Mar 15, 2018
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR @richford .

random_state : int, RandomState instance or None (default=0)
Determines random number generation for dataset shuffling. Pass an int
for reproducible output across multiple function calls.
See :term:`random_state <Glossary>`.
Copy link
Member

@rth rth Mar 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be,

:term:`Glossary <random_state>`.

I think, as this link doesn't currently render correctly

(applies to all changes)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Thanks @rth.

@richford
Copy link
Contributor Author

Thanks for the thorough reviews @rth, @jnothman, and @qinhanmin2014. Is there anything more you'd like me to do on my end in order to merge this PR?

@jnothman jnothman merged commit 3b037b0 into scikit-learn:master Mar 24, 2018
@jnothman
Copy link
Member

Thank you.

@richford richford deleted the random-state-datasets-docs branch March 25, 2018 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants