train_test_split documentation: add example for `stratify` param #18735

kirisakow · 2020-11-02T20:08:25Z

Reference Issues/PRs

None

What does this implement/fix? Explain your changes.

Added an easy-to-follow example to the train_test_split function documentation, specifically for the stratify parameter. Inspired by this reply on StackOverflow.

Source (of inspiration): https://stackoverflow.com/a/38889389/4883320
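For readers following along, here is a minimal sketch of what the stratify parameter does. The toy data and variable names are made up for illustration; this is not the example that was added to the docstring in this PR.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 100 samples, 80 of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# Passing stratify=y asks the split to keep the class proportions of y
# (here 80/20) in both the training and the test subsets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Share of class 1 in each subset; both should match the 20% in y.
print("train:", y_train.mean(), "test:", y_test.mean())
```

Without stratify, the class shares in the two subsets can drift away from the original 80/20 ratio, which matters most for small or imbalanced datasets.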

NicolasHug · 2020-11-03T08:54:05Z

Thanks for the PR, @kirisakow. We usually don't include these details in the docstring, but rather in the user guide.

In this case I think we can just add a "For more details on stratification, see the <User Guide -- Link here> " and link to https://scikit-learn.org/dev/modules/cross_validation.html#cross-validation-iterators-with-stratification-based-on-class-labels

We may also add a note that train_test_split behaves exactly like SSS (StratifiedShuffleSplit) in this subsection: https://scikit-learn.org/dev/modules/cross_validation.html#stratified-shuffle-split
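As a rough illustration of that relationship, the sketch below draws one stratified split with train_test_split and one with StratifiedShuffleSplit on the same hypothetical toy data, then compares the class counts in the test subsets. It only checks the class proportions; it is not a statement about how either function is implemented.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit, train_test_split

# Hypothetical toy data: 100 samples, 80 of class 0 and 20 of class 1.
X = np.arange(200).reshape(100, 2)
y = np.array([0] * 80 + [1] * 20)

# One stratified split through train_test_split.
_, _, _, y_test_tts = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# One stratified split through StratifiedShuffleSplit with the same settings.
sss = StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)
train_idx, test_idx = next(sss.split(X, y))
y_test_sss = y[test_idx]

# Per-class counts in each test subset.
tts_counts = np.bincount(y_test_tts)
sss_counts = np.bincount(y_test_sss)
print(tts_counts, sss_counts)
```

Both test subsets hold 25 samples with the same 80/20 class ratio as y, which is the behaviour the proposed user guide note would describe.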

kirisakow · 2020-11-08T20:33:06Z

Thank you for the explanation and the counter-proposal, @NicolasHug. I was actually afraid my suggestion would meet a snobbish reaction and be dismissed.

cmarmo · 2020-11-09T09:55:15Z

Hi @kirisakow, thanks for your pull request.
There are linting issues in your code. In the lint details you can see the message:

sklearn/model_selection/_split.py:2121:80: E501 line too long (81 > 79 characters)

NicolasHug

Thanks @kirisakow, minor suggestion on top of @cmarmo's comment. Otherwise LGTM.

sklearn/model_selection/_split.py

Fix line length

kirisakow · 2020-11-10T04:31:19Z

@cmarmo, @NicolasHug, done!

Please accept my apologies for my clumsiness.

sklearn/model_selection/_split.py

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

NicolasHug · 2020-11-10T16:11:15Z

Thanks @kirisakow !

kirisakow · 2020-11-10T16:17:18Z

Thank YOU, both of you, for your kind support!

train_test_split: add example for stratify param

f6171a8

Source (of inspiration): https://stackoverflow.com/a/38889389/4883320

github-actions bot added the module:model_selection label Nov 2, 2020

kirisakow changed the title ~~train_test_split: add example for stratify param~~ train_test_split documentation: add example for stratify param Nov 2, 2020

kirisakow added 3 commits November 8, 2020 20:41

Add stratification anchor

ded4246

Add ref to stratification anchor

3cc6a44

Typo fixed

2d824c5

NicolasHug reviewed Nov 9, 2020

sklearn/model_selection/_split.py

kirisakow added 2 commits November 10, 2020 05:09

Put back accidentally deleted existing line

794edbc

Update _split.py

1e3d1bf

Fix line length

NicolasHug reviewed Nov 10, 2020

sklearn/model_selection/_split.py

NicolasHug reviewed Nov 10, 2020

sklearn/model_selection/_split.py

kirisakow and others added 2 commits November 10, 2020 16:42

Remove trailing blank spaces

09a4ecb

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

Remove trailing blank spaces

5a8cf46

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>

NicolasHug merged commit 35b3195 into scikit-learn:master Nov 10, 2020
