[MRG+3] edit train/test_size default behavior #7459
Conversation
Force-pushed from 83a7a05 to 7ea9a8e
Please don't modify cross_validation.
ok, thanks.
This needs a rebase on master to fix the conflicts.
Force-pushed from f77bdc6 to faa3124
Done, thanks for letting me know. I didn't realize someone else was working on the file as well :)
This is ready to be looked at, though it's missing a whatsnew entry (I'd like to verify the current behavior is correct before writing it). Among the changes discussed earlier,
I expect the docs of
Right, sorry for omitting that @amueller. I've actually never "deprecated" by changing the default parameter of a function before; would you mind pointing me to an example or explaining how to do it? It seems to be missing in the contributing docs as well, so I'd be happy to add it there.
I'm not sure we should actually change the default from 0.2 to 0.1 in that case. For reference, the deprecation documentation is here:
I'm always kinda confused by the fact that the "Contributing" link on the help page doesn't go to contributing, but to developer...
hmm, would that be clear? I feel like there's a better way to word it...
Usually we just say "The default value of (..) will change from .. to .. in version 0.20" and the
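For concreteness, a minimal sketch of that pattern, using a hypothetical function `frobnicate` (the name, parameter, and version numbers here are illustrative, not actual scikit-learn code):

```python
import warnings

def frobnicate(data, ratio=None):
    """Hypothetical function whose default ratio is scheduled to change.

    ratio : float, default None
        If None, currently falls back to 0.1.
    """
    if ratio is None:
        # Warn only when the caller relies on the default, so callers
        # who pass the parameter explicitly are not bothered.
        warnings.warn("The default value of ratio will change from "
                      "0.1 to 0.2 in version 0.20. Specify ratio "
                      "explicitly to silence this warning.",
                      FutureWarning)
        ratio = 0.1
    return data[:int(len(data) * ratio)]
```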
@amueller we also need a deprecation for
Also, I edited the default parameters taken by the
We should not change the defaults of
There are a couple of errors in your issue description:
- you mean the opposite
- I think you mean less than

Now that
Makes sense, I've reverted the change and updated the description accordingly.
Indeed, thanks for catching that.
To clarify: we should not implement the changes to the behavior when setting train_size without test_size, but instead throw a warning saying that it will change in a future release (sort of like a deprecation)? Also, how is there a case that
I'm now not sure about the details of my comment. The point is that all behaviours need to remain identical, except perhaps for a warning or extra user-specified parameters/values being handled. I need to take a proper look at what's changing here to give more precise feedback, but there's a huge load of issues for me to skim right now...
I hope to look at this soon, @nelson-liu. In the meantime, resolving merge conflicts with master wouldn't hurt.
@jnothman feel free to hassle me about this, too ;)
Force-pushed from 0c53fda to 898ab03
Rebased on master to fix conflicts.
Unfortunately, we need to retain the behaviour that
test_size : float, int, or None, default None
    If float, should be between 0.0 and 1.0 and represent the proportion
    of the dataset to include in the test split. If int, represents the
    absolute number of test samples. If None, and `train_size` is None,
jnothman
Dec 8, 2016
Member
double backticks
    If float, should be between 0.0 and 1.0 and represent the proportion
    of the dataset to include in the test split. If int, represents the
    absolute number of test samples. If None, and `train_size` is None,
    the value is set to 0.1. If None and `train_size` is not None, the
jnothman
Dec 8, 2016
Member
double backticks here too
                     random_state=None):
    if test_size is None and train_size is None:
        warnings.warn("The default value of the test_size parameter "
                      "will change from 0.1 to 0.2 in version 0.21.",
jnothman
Dec 8, 2016
Member
0.2 is already the default, is it not??
@jnothman - let me know if I understood what to do correctly. The idea is that the current behavior shouldn't change, and thus this PR should just add the functions that detect when a user invokes behavior that will change in the future. If this is the case: I've reverted everything to ensure that it maintains the current behavior and just warns the user appropriately.
Sufficient what's new: In version 0.21, the default behavior of splitters that use the
Codecov Report

@@            Coverage Diff             @@
##           master    #7459      +/-   ##
==========================================
+ Coverage   95.48%   95.48%   +<.01%
==========================================
  Files         342      342
  Lines       61013    61033      +20
==========================================
+ Hits        58259    58279      +20
  Misses       2754     2754

Continue to review full report at Codecov.
Thanks @nelson-liu
Sorry for letting this languish @jnothman, and thanks for taking it into your own hands to patch it up.
No worries! I'm glad it is in and we can stop having people frustrated by it.
So what actually changed here, in practice? If the user did nothing they would have 0.75/0.25 before, and it looks like that is still the case.
No breaking behavior has been implemented in this version. Yes, that is correct.
Right, so what I'm confused by is that it seems like that's already the behavior. From the doc: "If None, the value is set to the complement of the train size."
Ah. Does that help clarify?
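For illustration, a quick check of the complement rule being discussed (the data and numbers are mine, not from the thread; the printed values assume a recent scikit-learn where `train_test_split` falls back to a 0.25 test fraction):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)
y = np.arange(50)

# Neither size given: test_size falls back to 0.25, train gets the rest.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
print(len(X_train), len(X_test))  # 37 13

# Only train_size given: test_size is set to the complement.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.6, random_state=0)
print(len(X_train), len(X_test))  # 30 20
```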
That helps, although I would have thought the default was the float 0.25, since the docs also say, "By default, the value is set to 0.25." Was it 0.1 only in version 0.18?
The defaults and their documentation were inconsistent in 0.18, IIRC.
Also, note that the defaults are different for different splitting functions/classes. edit: note that the
But you're saying it was 0.1 for one version?
The default... As far as I can remember, the default... In addition, I'm not sure what objects you're referring to by "it": the default behavior of the Splitter classes and
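To make the difference between splitters concrete, a small comparison one can run against a recent scikit-learn (the fallback values assumed here are 0.1 for `ShuffleSplit` and 0.25 for `train_test_split`):

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit, train_test_split

X = np.zeros((100, 2))

# ShuffleSplit falls back to a 0.1 test fraction when no sizes are given...
ss = ShuffleSplit(n_splits=1, random_state=0)
train_idx, test_idx = next(ss.split(X))
print(len(train_idx), len(test_idx))  # 90 10

# ...while train_test_split falls back to a 0.25 test fraction.
X_train, X_test = train_test_split(X, random_state=0)
print(len(X_train), len(X_test))  # 75 25
```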
OK, thanks. I just wanted to make sure the defaults for train_test_split hadn't changed recently, because if so it would affect my models. It doesn't sound like it, so I'm satisfied. I guess this is a good example of the wisdom of always specifying parameters rather than relying on defaults.
Reference Issue
Fixes #5948, #4618
What does this implement/fix? Explain your changes.
Changes the default behavior of `train_size` and `test_size` in splitters. The defaults for both parameters are now `None`:

- If `train_size` and `test_size` are both `None`, `train_size` is set to `0.9` and `test_size` is set to `0.1`.
- If only one of `train_size` or `test_size` is set, the value of the unset parameter is `n_samples - set_parameter` (the complement).
- If both are set, allow their sum to be less than `n_samples` and respect that the user wants to subsample the dataset (see the sketch after this list).
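As referenced above, a pure-Python sketch of the proposed resolution logic (float fractions only; the helper name is mine, and this illustrates the proposal rather than the behavior that ultimately shipped):

```python
def resolve_sizes(train_size=None, test_size=None):
    """Resolve the proposed train/test size defaults (float fractions)."""
    if train_size is None and test_size is None:
        # Proposed new defaults when neither is given.
        return 0.9, 0.1
    if test_size is None:
        # Complement of the explicitly set parameter.
        return train_size, 1.0 - train_size
    if train_size is None:
        return 1.0 - test_size, test_size
    # Both given: keep as-is; a sum below 1.0 means the user wants
    # to subsample the dataset.
    return train_size, test_size

print(resolve_sizes())                # (0.9, 0.1)
print(resolve_sizes(train_size=0.6))  # (0.6, 0.4)
print(resolve_sizes(0.5, 0.2))        # (0.5, 0.2) -> subsampling
```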
Any other comments?
I'm aware that the `cross_validation` module is being deprecated; does this mean that I shouldn't add these changes to it and only to `model_selection`?

TODO