Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

train_test_split documentation: add example for stratify param #18735

Merged
merged 8 commits into from Nov 10, 2020

Conversation

kirisakow
Copy link
Contributor

@kirisakow kirisakow commented Nov 2, 2020

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

Provided an understandable example to the train_test_split function documentation, namely concerning the stratify parameter. Inspired by this reply on StackOverflow.

@kirisakow kirisakow changed the title train_test_split: add example for stratify param train_test_split documentation: add example for stratify param Nov 2, 2020
@NicolasHug
Copy link
Member

Thanks for the PR @kirisakow , we usually don't include these details in the docstring, but rather in the user guide.

In this case I think we can just add a "For more details on stratification, see the <User Guide -- Link here> " and link to https://scikit-learn.org/dev/modules/cross_validation.html#cross-validation-iterators-with-stratification-based-on-class-labels

We may also add a note that train_test_split behaves exactly like SSS in this subsection: https://scikit-learn.org/dev/modules/cross_validation.html#stratified-shuffle-split

@kirisakow
Copy link
Contributor Author

Thank you for explanation and counter-proposal @NicolasHug. I was actually afraid my suggestion would meet a snobbish reaction and would be dismissed

@cmarmo
Copy link
Member

cmarmo commented Nov 9, 2020

Hi @kirisakow, thanks for your pull request.
There are linting issues in your code. In the lint details you can see the message

sklearn/model_selection/_split.py:2121:80: E501 line too long (81 > 79 characters)

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @kirisakow , minor suggestion on top of @cmarmo 's comment. Otherwise LGTM

sklearn/model_selection/_split.py Outdated Show resolved Hide resolved
@kirisakow
Copy link
Contributor Author

@cmarmo, @NicolasHug, done!

Please, accept my apologies for my clumsiness.

kirisakow and others added 2 commits November 10, 2020 16:42
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
@NicolasHug NicolasHug merged commit 35b3195 into scikit-learn:master Nov 10, 2020
@NicolasHug
Copy link
Member

Thanks @kirisakow !

@kirisakow
Copy link
Contributor Author

Thank YOU, both of you, for your kind support!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants