[MRG+3] edit train/test_size default behavior #7459

Merged
merged 26 commits into scikit-learn:master from nelson-liu:edit_train_test_split_api on Jun 14, 2017

Conversation

7 participants
nelson-liu (Contributor) commented Sep 20, 2016

Reference Issue

Fixes #5948, #4618

What does this implement/fix? Explain your changes.

Changes the default behavior of train_size and test_size in the splitters.
The defaults for both parameters are now None:

  • If train_size and test_size are both None, train_size is set to 0.9 and test_size is set to 0.1
  • If only one of train_size or test_size is set, the value of the unset parameter is the complement (n_samples - set_parameter)
  • If both are explicitly set, check that each is at most 1 (for fractions) or n_samples (for counts), and respect that the user wants to subsample the dataset
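
The three rules above can be sketched roughly like this (a hypothetical `resolve_sizes` helper for illustration, not the PR's actual code; floats stand for fractions and ints for absolute sample counts, mirroring how the splitters accept both):

```python
def resolve_sizes(n_samples, train_size=None, test_size=None):
    if train_size is None and test_size is None:
        # Rule 1: both unset -> fall back to a 0.9 / 0.1 split.
        return 0.9, 0.1
    if test_size is None:
        # Rule 2: the unset parameter is the complement of the set one.
        test_size = (1.0 - train_size if isinstance(train_size, float)
                     else n_samples - train_size)
    elif train_size is None:
        train_size = (1.0 - test_size if isinstance(test_size, float)
                      else n_samples - test_size)
    else:
        # Rule 3: both set -> validate each bound, but allow the sum to be
        # smaller than the dataset (deliberate subsampling).
        for size in (train_size, test_size):
            if size > (1.0 if isinstance(size, float) else n_samples):
                raise ValueError("train_size/test_size larger than dataset")
    return train_size, test_size
```
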

Any other comments?

I'm aware that the cross_validation module is being deprecated; does this mean that I shouldn't add these changes to it and only to model_selection?

TODO

  • Edit documentation to clearly explain these new changes


jnothman (Member) commented Sep 20, 2016

Please don't modify cross_validation.



nelson-liu (Contributor) commented Sep 20, 2016

Please don't modify cross_validation.

ok, thanks.


lesteve (Member) commented Sep 20, 2016

This needs a rebase on master to fix the conflicts.


nelson-liu (Contributor) commented Sep 20, 2016

This needs a rebase on master to fix the conflicts.

Done, thanks for letting me know. Didn't notice / realize someone else was working on the file as well :)


@nelson-liu nelson-liu changed the title from [WIP] edit train/test_size default behavior to [MRG] edit train/test_size default behavior Sep 27, 2016

nelson-liu (Contributor) commented Sep 27, 2016

This is ready to be looked at, though it's still missing a whatsnew entry (I'd like to verify the current behavior is correct before writing it). Among the changes discussed earlier: GroupShuffleSplit has test_size set to a default of 0.2; this PR changes it to 0.1 to match all the other splitters.


amueller (Member) commented Sep 30, 2016

I expect the docs of train_test_split also need a change. And you can't change the default behavior of GroupShuffleSplit without a deprecation. We could have done that in 0.18, but now it's too late :-/


nelson-liu (Contributor) commented Sep 30, 2016

Right, sorry for omitting that, @amueller. I've actually never deprecated a change to a function's default parameter before; would you mind pointing me to an example or explaining how to do it? It seems to be missing from the contributing docs as well, so I'd be happy to add it there.


amueller (Member) commented Sep 30, 2016

I'm not sure we should actually change the default from 0.2 to 0.1 in that case.

For reference, the deprecation documentation is here:
http://scikit-learn.org/dev/developers/contributing.html#deprecation


amueller (Member) commented Sep 30, 2016

I'm always kinda confused by the fact that the "Contributing" link on the help page doesn't go to contributing, but to developer....


nelson-liu (Contributor) commented Sep 30, 2016

hmm, would

            warnings.warn("The default value of the train_size parameter "
                          "was changed from 0.2 to None, which changes the "
                          "default (train_size, test_size) "
                          "from (0.2, 0.8) to (0.1, 0.9) "
                          "in version 0.19", DeprecationWarning)

be clear? I feel like there's a better way to word it...


amueller (Member) commented Sep 30, 2016

Usually we just say "The default value of (...) will change from ... to ... in version 0.20", and the None is really just a weird implementation detail.


nelson-liu (Contributor) commented Oct 4, 2016

@amueller we also need a deprecation for train_test_split, right? Considering the defaults will change from 0.25/0.75 to 0.1/0.9 in version 0.21.

Also, by "the docs of train_test_split need a change", do you mean the docstrings or the examples / tutorial material?


nelson-liu (Contributor) commented Oct 4, 2016

Also, I edited the default parameters taken by the __init__ of GroupShuffleSplit to test_size=None and train_size=None, allowing me to throw the deprecation warning and then accordingly set the previous defaults (test_size=0.2 and train_size=0.8). Should I just not have touched this?


amueller (Member) commented Oct 5, 2016

We should not change the defaults of train_test_split. We should use the same behavior, though. And I meant the docstrings need to document the new way train_size and test_size interact.


jnothman (Member) commented Oct 6, 2016

There are a couple of errors in your issue description:

  • if train_size and test_size are both None, train_size is set to 0.1 and test_size is set to 0.9

you mean the opposite

  • If both are explicitly set, check that they are equal to or less than 0 and respect that the user wants to subsample the dataset

I think you mean less than n_samples or less than 1.

Now that model_selection is released we need to be aware of backwards compatibility issues. For now, if a user sets train_size without setting test_size (and train_size+test_size > 1 or n_samples) the best we can do is warn that the behaviour will change in a future release, IMO.


nelson-liu (Contributor) commented Oct 7, 2016

We should not change the defaults of train_test_split. We should use the same behavior, though. And I meant the docstrings need to document the new way train_size and test_size interact.

Makes sense, I've reverted the change and updated the description accordingly.

There are a couple of errors in your issue description:

Indeed, thanks for catching that.

Now that model_selection is released we need to be aware of backwards compatibility issues. For now, if a user sets train_size without setting test_size (and train_size+test_size > 1 or n_samples) the best we can do is warn that the behaviour will change in a future release, IMO.

To clarify --- we should not implement the changes to the behavior when setting train_size without test_size, but throw a warning instead saying that it will change in future release (sort of like a deprecation)? Also, how is there a case that train_size+test_size > 1 or n_samples? Sorry if these questions have obvious answers.


jnothman (Member) commented Oct 8, 2016

I'm not now sure about the details of my comment. Point is that all behaviours need to remain identical, except perhaps for a warning or extra user-specified parameters/values handled. I need to take a proper look at what's changing here to give more precise feedback but there's a huge load of issues for me to skim right now...


jnothman (Member) commented Nov 24, 2016

I hope to look at this soon, @nelson-liu. In the meantime, resolving merge conflicts with master wouldn't hurt.


@amueller amueller added this to the 0.19 milestone Nov 30, 2016

amueller (Member) commented Nov 30, 2016

@jnothman feel free to hassle me about this, too ;)


lesteve (Member) commented Dec 1, 2016

Rebased on master to fix conflicts.


nelson-liu (Contributor) commented Dec 1, 2016

Thanks @lesteve! Not sure if this is actually ready to be looked at; I don't think I've addressed @jnothman's latest comments wrt keeping behaviors identical...


@nelson-liu nelson-liu changed the title from [MRG] edit train/test_size default behavior to [WIP] edit train/test_size default behavior Dec 1, 2016

@jnothman

Unfortunately, we need to retain the behaviour that train_test_split(train_size=.5) will sample 10% (or whatever the default is) for test and 50% for train. This behaviour can change in version 0.21. We achieve this by setting test_size to some otherwise inappropriate sentinel by default, such as 'default', which behaves exactly like the current default value (0.1). In 0.21, it can change to None and behave like you specify. In the case that test_size='default' and train_size is a number, we warn that behaviour will change to one where test_size will always complement train_size unless both are specified or both are unspecified.
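
The sentinel approach can be sketched as follows (a hypothetical `check_test_size` helper with assumed warning text, not the PR's actual code; the real change wires this logic into the splitters):

```python
import warnings

def check_test_size(test_size, train_size):
    # 'default' is an otherwise-inappropriate sentinel that stands in for
    # the old default value (0.1 for the shuffle splitters).
    if test_size == 'default':
        if train_size is not None:
            warnings.warn(
                "From version 0.21, test_size will always complement "
                "train_size unless both are specified.",
                FutureWarning)
        test_size = 0.1  # until 0.21, behave exactly like the old default
    return test_size
```

In 0.21 the sentinel can be swapped for None with the complementing behavior, without ever silently changing what existing code does in between.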

nelson-liu (Contributor) commented Dec 25, 2016
@jnothman - let me know if I understood what to do correctly. The idea is that the current behavior shouldn't change, and thus this PR should just add the functions that detect when a user invokes behavior that will change in the future (test_size at its default, unspecified value; train_size some specified value) and warn them that the behavior of the code will change in 0.21. Then when 0.20dev comes along, we'll actually implement the new behavior?

If this is the case, I've reverted everything to ensure that it maintains the current behavior and just warns the user appropriately.


nelson-liu (Contributor) commented Dec 25, 2016
Another thing I was thinking about wrt this PR: I set the default value to 'default', but should test_size='default' or the actual value appear in __repr__?

For example, in the current PR, printing a GroupShuffleSplit object will show test_size=0.2 (as opposed to test_size='default'), while printing a StratifiedShuffleSplit object will show test_size='default' (as opposed to test_size=0.1).


nelson-liu (Contributor) commented Dec 25, 2016
I also reverted any changes I made to any default values --- this PR doesn't touch those at all (e.g. GroupShuffleSplit "default" test size is still 0.2, and StratifiedShuffleSplit "default" test size is still 0.1)


nelson-liu (Contributor) commented Dec 25, 2016
hmm, the circleci error looks unrelated?


@jnothman jnothman changed the title from [MRG+1] edit train/test_size default behavior to [MRG+3] edit train/test_size default behavior Jun 13, 2017

jnothman (Member) commented Jun 13, 2017
Sufficient what's new:

In version 0.21, the default behavior of splitters that use the test_size and train_size parameters will change, such that specifying train_size alone will cause test_size to be the remainder.


codecov bot commented Jun 14, 2017

Codecov Report

Merging #7459 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #7459      +/-   ##
==========================================
+ Coverage   95.48%   95.48%   +<.01%     
==========================================
  Files         342      342              
  Lines       61013    61033      +20     
==========================================
+ Hits        58259    58279      +20     
  Misses       2754     2754
Impacted Files Coverage Δ
sklearn/model_selection/_split.py 98.65% <100%> (+0.04%) ⬆️
sklearn/model_selection/tests/test_split.py 96% <100%> (+0.03%) ⬆️



@jnothman jnothman merged commit 1f781e6 into scikit-learn:master Jun 14, 2017

jnothman (Member) commented Jun 14, 2017

Thanks @nelson-liu

nelson-liu (Contributor) commented Jun 14, 2017

sorry for letting this languish @jnothman , and thanks for taking it into your own hands to patch it up.


@nelson-liu nelson-liu deleted the nelson-liu:edit_train_test_split_api branch Jun 14, 2017


dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

dmohns added a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

eliot1785 commented Aug 16, 2017

So what actually changed here, in practice? If the user did nothing they would have 0.75/0.25 before, and it looks like that is still the case.


nelson-liu (Contributor) commented Aug 16, 2017

So what actually changed here, in practice?

No breaking behavior has been implemented in version 0.19 (hence the FutureWarning). In version 0.21, if you specify train_size without specifying test_size, it will automatically set test_size = 1 - train_size.

If the user did nothing they would have 0.75/0.25 before, and it looks like that is still the case.

Yes, that is correct.
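
Concretely, the before/after for a splitter whose default test_size is 0.1 looks like this (illustration only, a hypothetical helper rather than sklearn API):

```python
# What fraction of the data ends up in train / test / unused when only
# train_size is given to a splitter whose default test_size is 0.1.
def effective_split(train_size):
    old_test = 0.1             # 0.19 behavior: the default test_size still applies
    new_test = 1 - train_size  # 0.21 behavior: complement of train_size
    old = (train_size, old_test, round(1 - train_size - old_test, 10))
    new = (train_size, new_test, 0.0)  # nothing left unused
    return old, new
```

For train_size=0.5, the old behavior gives (0.5, 0.1, 0.4), silently leaving 40% of the data unused, while the new behavior gives (0.5, 0.5, 0.0).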


eliot1785 commented Aug 16, 2017

Right, so what I'm confused by is it seems like that's already the behavior. From the doc: "If None, the value is set to the complement of the train size."


nelson-liu (Contributor) commented Aug 16, 2017

ah, test_size is not None by default (it's now a sentinel string "default", and was the float 0.1 in version 0.18). If you explicitly pass None to it, then it will complement the train size. This is the case in both 0.18 and 0.19.

Does that help clarify?


eliot1785 commented Aug 16, 2017

That helps, although I would have thought the default was the float 0.25 since the docs also say, "By default, the value is set to 0.25." Was it 0.1 only in version 0.18?



nelson-liu (Contributor) commented Aug 16, 2017

Also, note that the defaults are different for different splitting functions/classes (e.g. train_test_split has a default of 0.25 instead of 0.1).

edit: note that the train_test_split default test_size is 0.25 only if train_size and test_size are both None (which is their default value, unlike the CV splitters).
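
Pulling together the defaults mentioned in this thread (a hypothetical lookup table for reference, not sklearn code; ShuffleSplit is listed on the assumption it shares the 0.1 splitter default discussed above, and values are 0.19-era, so re-check against the version you actually run):

```python
# test_size defaults as described in this thread; train_test_split's
# applies only when train_size and test_size are both left as None.
DEFAULT_TEST_SIZE = {
    "train_test_split": 0.25,
    "GroupShuffleSplit": 0.2,
    "StratifiedShuffleSplit": 0.1,
    "ShuffleSplit": 0.1,
}
```
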


eliot1785 commented Aug 16, 2017

But you're saying it was 0.1 for one version?


nelson-liu (Contributor) commented Aug 16, 2017

The default test_size for BaseSplitter has been 0.1 for at least 0.17 and 0.18 (and possibly longer).

As far as I can remember, the default test_size for train_test_split has been 0.25 (it could have been changed from 0.1, though -- you can probably find out by looking at the release history).

In addition: I'm not sure what objects you're referring to by "it" --- the default behavior of the Splitter classes and train_test_split are different (and this PR touches both, so unsure which you're referring to). Hopefully I answered your question though.


eliot1785 commented Aug 17, 2017

Ok thanks. I just wanted to make sure the defaults for train_test_split hadn't changed recently because if so it would affect my models. It doesn't sound like it, so I'm satisfied. I guess this is a good example of the wisdom of always specifying parameters rather than relying on defaults.


paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

AishwaryaRK added a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
