[MRG+1] Repeated K-Fold and Repeated Stratified K-Fold #8120

Merged
merged 13 commits into from Mar 4, 2017

4 participants
@neerajgangwar
Contributor

neerajgangwar commented Dec 27, 2016

Reference Issue

Fixes #7948

What does this implement/fix? Explain your changes.

Implements RepeatedKFold and RepeatedStratifiedKFold

Any other comments?

For previous discussion on this, please refer to #7960

Member

jnothman commented Dec 27, 2016

Do you intend to close #7960?

@neerajgangwar neerajgangwar changed the title from Feature/repeated splits to [MRG] Repeated K-Fold and Stratified K-Fold Dec 27, 2016

@neerajgangwar neerajgangwar changed the title from [MRG] Repeated K-Fold and Stratified K-Fold to [MRG] Repeated K-Fold and Repeated Stratified K-Fold Dec 27, 2016

Contributor

neerajgangwar commented Dec 27, 2016

Yes.

doc/modules/cross_validation.rst
>>> from sklearn.model_selection import RepeatedKFold, KFold
>>> X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
>>> random_states = [12883823, 28827347]
>>> rkf = RepeatedKFold(KFold(n_splits=2), n_repeats=2, random_states=random_states)

@jnothman

jnothman Dec 27, 2016

Member

We want a solution that can accept a single random_state. It can generate random states for each KFold instance.

I also don't get why you should accept a KFold instance to the constructor of RepeatedKFold.

@neerajgangwar

neerajgangwar Dec 27, 2016

Contributor

Yes, it would be better to accept n_splits to the constructor of RepeatedKFold and create an instance of KFold inside. Likewise for RepeatedStratifiedKFold.

I'll make the changes.
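
The interface agreed on in this thread (one `random_state`, with `n_splits` and `n_repeats` passed straight to the constructor) can be sketched as follows. This assumes a scikit-learn release that ships `RepeatedKFold`; the seed value is only illustrative.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])

# One seed drives every repetition; the splitter derives the
# per-repetition shuffles internally.
rkf = RepeatedKFold(n_splits=2, n_repeats=2, random_state=12883823)

splits = [(train.tolist(), test.tolist()) for train, test in rkf.split(X)]
for train, test in splits:
    print(train, test)

# n_splits folds per repetition, n_repeats repetitions in total
print(len(splits))
```

Every sample index appears exactly once per fold pair, and the caller never has to manage a list of seeds.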

sklearn/model_selection/_split.py
if n_repeats <= 1:
    raise ValueError("Number of repetitions must be greater than 1.")
rng = check_random_state(random_state)

@jnothman

jnothman Dec 27, 2016

Member

we are not certain it's the best design, but currently all the splitters do this in split, not in __init__

@neerajgangwar

neerajgangwar Dec 28, 2016

Contributor

Is there any way to achieve this other than initializing self.random_states = [] in __init__ and generating the random states when split is called for the first time? Would the code at lines 946-950 then move into split, inside an if condition?

Member

jnothman commented Dec 28, 2016

Contributor

neerajgangwar commented Dec 29, 2016

Do you mean initialize RandomState in each split call with check_random_state and generate random states? In this case, if initial random_state is int, it will work fine as check_random_state will return RandomState with the same initial seed on every call. But if it's None, it will return RandomState with different seed on every call and if it's RandomState, it'll return the same object. In both of these cases, split will produce different splits on different calls. To generate same splits on different split calls, initial state needs to be stored somewhere probably?

Or are you referring to some other way?
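
The three cases described in this comment can be checked directly. A small sketch, assuming `sklearn.utils.check_random_state` and NumPy's legacy `RandomState` API; the variable names are illustrative. One nuance: with `None`, `check_random_state` returns NumPy's global `RandomState` singleton rather than a freshly seeded object, but the practical effect described above is the same, because that shared state advances between calls.

```python
import numpy as np
from sklearn.utils import check_random_state

# int seed: a fresh RandomState seeded identically on every call,
# so repeated calls yield the same stream.
a = check_random_state(42).randint(1000, size=3)
b = check_random_state(42).randint(1000, size=3)
same_for_int = bool(np.array_equal(a, b))

# RandomState instance: returned as-is (no copy), so its internal
# state keeps advancing across calls.
rs = np.random.RandomState(42)
returned_same_object = check_random_state(rs) is rs

# None: the global NumPy RandomState singleton, whose state also
# advances between calls.
global_a = check_random_state(None)
global_b = check_random_state(None)
none_is_singleton = global_a is global_b

print(same_for_int, returned_same_object, none_is_singleton)
```

Only the integer case gives identical splits on repeated `split` calls without storing extra state.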

Member

jnothman commented Dec 29, 2016

doc/modules/cross_validation.rst
[1 2] [0 3]
[0 3] [1 2]

@jnothman

jnothman Dec 29, 2016

Member

I think you should just mention RepeatedStratifiedKFold here and under StratifiedKFold. Also in the "see also"s of relevant classes.

doc/modules/cross_validation.rst
@@ -409,6 +432,30 @@ two slightly unbalanced classes::
[0 1 3 4 5 8 9] [2 6 7]
[0 1 2 4 5 6 7] [3 8 9]
Repeated Stratified K-Fold

@jnothman

jnothman Dec 29, 2016

Member

I.e. I think this is overkill

sklearn/model_selection/_split.py
@@ -913,6 +915,238 @@ def get_n_splits(self, X, y, groups):
        return int(comb(len(np.unique(groups)), self.n_groups, exact=True))
class _RepeatedSplits(with_metaclass(ABCMeta)):
    """Repeated splits for K-Fold and Stratified K-Fold

@jnothman

jnothman Dec 29, 2016

Member

"for an arbitrary randomized CV splitter"

sklearn/model_selection/_split.py
class _RepeatedSplits(with_metaclass(ABCMeta)):
    """Repeated splits for K-Fold and Stratified K-Fold
    Repeats splits for cross-validators n times.

@jnothman

jnothman Dec 29, 2016

Member

with different randomization

sklearn/model_selection/_split.py
        return self._repeated_splits.get_n_repeats()
class RepeatedStratifiedKFold(with_metaclass(ABCMeta)):

@jnothman

jnothman Dec 29, 2016

Member

Why is this an ABC?

@neerajgangwar

neerajgangwar Dec 29, 2016

Contributor

Removing.

sklearn/model_selection/_split.py
def __init__(self, cv, n_repeats=5, random_state=None):
    if not isinstance(cv, (KFold, StratifiedKFold)):
        raise ValueError(
            "cv must be an instance of KFold or StratifiedKFold.")

@jnothman

jnothman Dec 29, 2016

Member

why?

@neerajgangwar

neerajgangwar Dec 29, 2016

Contributor

I think only KFold and StratifiedKFold use random_state. That's why I added this check. Should I remove it? And also, is there a way to check if cv is an instance of cross-validator with randomized split functionality?

@jnothman

jnothman Dec 29, 2016

Member

Yes, remove it.

@jnothman

jnothman Dec 29, 2016

Member

It's a private class, it doesn't require such validation.

sklearn/model_selection/_split.py
    for train_index, test_index in cv.split(X, y, groups):
        yield train_index, test_index
def get_n_repeats(self):

@jnothman

jnothman Dec 29, 2016

Member

I don't think we need this. n_repeats is already an attribute.

@neerajgangwar

neerajgangwar Dec 29, 2016

Contributor

Removing from all classes.

sklearn/model_selection/_split.py
test : ndarray
    The testing set indices for that split.
"""
cv = self.cv

@jnothman

jnothman Dec 29, 2016

Member

I wonder if instead we should be constructing new CV objects in here (i.e. in split). Thus _RepeatedSplits.__init__ would take a constructor for cv rather than cv.

@neerajgangwar

neerajgangwar Dec 29, 2016

Contributor

It might be nice as it would remove the dependency of KFold from RepeatedKFold in terms of parameters. I am thinking something like def __init__(self, cv, n_repeats=5, random_state=None, **cvargs):. It will be called as _RepeatedSplits(KFold, n_repeats, random_state, n_splits=n_splits). This is what you meant, right?

I have one doubt though. Since shuffle should always be True and random_state will be generated inside split function, would it be okay to just mention that user should not pass these arguments and one random_state that is passed does not correspond to random_state parameter of KFold?

@jnothman

jnothman Dec 29, 2016

Member

You can have **cvargs in the _RepeatedSplits class while still only allowing specified named args in RepeatedKFold
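
A rough sketch of the layering suggested here: a private helper takes the splitter class plus `**cvargs`, while the public class exposes only named arguments. The class names `RepeatedSplitsSketch` and `RepeatedKFoldSketch` are hypothetical stand-ins, not the code that was merged, and the error message is illustrative.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils import check_random_state


class RepeatedSplitsSketch:
    """Hypothetical stand-in for the private _RepeatedSplits helper."""

    def __init__(self, cv, n_repeats=10, random_state=None, **cvargs):
        # The helper owns shuffle and random_state, so callers must not pass them.
        if any(key in cvargs for key in ("random_state", "shuffle")):
            raise ValueError("cvargs must not contain random_state or shuffle.")
        self.cv = cv
        self.n_repeats = n_repeats
        self.random_state = random_state
        self.cvargs = cvargs

    def split(self, X, y=None, groups=None):
        rng = check_random_state(self.random_state)
        for _ in range(self.n_repeats):
            # A new splitter per repetition; rng advances as each one shuffles.
            cv = self.cv(random_state=rng, shuffle=True, **self.cvargs)
            for train_index, test_index in cv.split(X, y, groups):
                yield train_index, test_index


class RepeatedKFoldSketch(RepeatedSplitsSketch):
    """Public wrapper that exposes only named arguments."""

    def __init__(self, n_splits=5, n_repeats=10, random_state=None):
        super().__init__(KFold, n_repeats, random_state, n_splits=n_splits)


X = np.arange(12).reshape(6, 2)
folds = list(RepeatedKFoldSketch(n_splits=3, n_repeats=2, random_state=0).split(X))
print(len(folds))  # 3 folds x 2 repetitions
```

Because the wrapper's signature has no `**kwargs`, users cannot reach `shuffle` or the inner `random_state` at all; the guard in the helper only protects internal callers.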

sklearn/model_selection/tests/test_split.py
train, test = next(splits)
assert_array_equal(train, [0, 1, 2])
assert_array_equal(test, [3, 4])

@jnothman

jnothman Dec 29, 2016

Member

please also check that a second call to split produces the same sets.

@jnothman

jnothman Jan 1, 2017

Member

add a comment to explain why this is repeated.
perhaps use a loop or a helper function to avoid duplicated code.

@jnothman

jnothman Jan 1, 2017

Member

you also don't check here that the iterator is exhausted after 4 elements
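
The review requests in this thread (a second call to `split` reproduces the first, and the iterator stops after four items) could be exercised along these lines. This is a sketch against the public `RepeatedKFold`, not the test that was committed; `collect_splits` is a hypothetical helper and the seed is arbitrary.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold


def collect_splits(splitter, X):
    """Materialise every (train, test) pair as plain lists for comparison."""
    return [(train.tolist(), test.tolist()) for train, test in splitter.split(X)]


X = np.arange(10).reshape(5, 2)
rkf = RepeatedKFold(n_splits=2, n_repeats=2, random_state=7)

# With an integer seed, a second call to split should reproduce the first.
first = collect_splits(rkf, X)
second = collect_splits(rkf, X)

# Drain the generator by hand to confirm it stops after
# n_splits * n_repeats = 4 items.
splits = rkf.split(X)
for _ in range(4):
    next(splits)
try:
    next(splits)
    exhausted = False
except StopIteration:
    exhausted = True

print(first == second, exhausted)
```

Using a helper for the comparison avoids repeating the same assertions for each call, which is the duplication the reviewer asked to remove.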

sklearn/model_selection/tests/test_split.py
def test_repeated_stratified_kfold_errors():
    # n_repeats is not integer or <= 1
    assert_raises(ValueError, RepeatedStratifiedKFold, n_repeats=1)

@jnothman

jnothman Dec 29, 2016

Member

do this in a loop together with the RepeatedKFold case.

Doc changes, inheriting RepeatedKFold, RepeatedStratifiedKFold from _RepeatedSplits and other review changes
@jnothman

This is looking much better, thanks!

sklearn/model_selection/_split.py
See also
--------
RepeatedStratifiedKFold: Repeats Stratified K-Fold n times.

@jnothman

jnothman Jan 1, 2017

Member

please remove blank line

Contributor

neerajgangwar commented Jan 1, 2017

Thanks @jnothman for the review. And a very happy new year :)

Member

jnothman commented Jan 1, 2017

LGTM!

@jnothman jnothman changed the title from [MRG] Repeated K-Fold and Repeated Stratified K-Fold to [MRG+1] Repeated K-Fold and Repeated Stratified K-Fold Jan 1, 2017

    raise ValueError("Number of repetitions must be of Integral type.")
if n_repeats <= 1:
    raise ValueError("Number of repetitions must be greater than 1.")

@tguillemot

tguillemot Jan 18, 2017

Contributor

Never check values in __init__. Move it to split.

@neerajgangwar

neerajgangwar Jan 19, 2017

Contributor

Shouldn't an error be raised at construction time if there is a discrepancy in the parameters passed? _BaseKFold also checks values in __init__.

@tguillemot

tguillemot Jan 19, 2017

Contributor

In sklearn, estimators never validate parameters in __init__ because of set_params, but these classes are not estimators, so I imagine the rule does not apply here.

@jnothman As I'm not 100% sure can you confirm that ?

@jnothman

jnothman Jan 20, 2017

Member

For now, at least, CV splitters are a bit special in this regard. Checking in __init__ is consistent with other splitters.
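
To make the outcome of this exchange concrete: the checks quoted in the diff run when the splitter is constructed, so a bad `n_repeats` fails immediately rather than on the first `split` call. A dependency-free sketch; `validate_n_repeats` and `EagerSplitter` are hypothetical names, and the `<= 1` bound simply mirrors the snippet under review at this point in the thread.

```python
import numbers


def validate_n_repeats(n_repeats):
    """Hypothetical helper mirroring the checks quoted in this thread."""
    if not isinstance(n_repeats, numbers.Integral):
        raise ValueError("Number of repetitions must be of Integral type.")
    if n_repeats <= 1:
        raise ValueError("Number of repetitions must be greater than 1.")
    return n_repeats


class EagerSplitter:
    """Toy splitter that fails at construction time, like the other CV splitters."""

    def __init__(self, n_repeats=5):
        self.n_repeats = validate_n_repeats(n_repeats)


splitter = EagerSplitter(3)
print(splitter.n_repeats)
```

Estimators defer such checks because `set_params` can change attributes after construction; the splitters discussed here have no such mechanism, which is why eager validation is acceptable for them.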

@tguillemot

tguillemot Jan 20, 2017

Contributor

ok thx @jnothman

**cvargs : additional params
    Constructor parameters for cv. Must not contain random_state
    and shuffle.

@tguillemot

tguillemot Jan 18, 2017

Contributor

Not an obligation as _RepeatedSplits is private but can you raise an error in split to check that ?

sklearn/model_selection/_split.py
rng = check_random_state(self.random_state)
for idx in range(n_repeats):
    random_state = rng.randint(np.iinfo(np.int32).max)

@tguillemot

tguillemot Jan 18, 2017

Contributor

Maybe I'm missing something, but why not send rng directly as random_state?

@neerajgangwar

neerajgangwar Jan 19, 2017

Contributor

Integer random state generated by rng is sent as random_state. Do you have any other way in mind?

@tguillemot

tguillemot Jan 19, 2017

Contributor

Sorry I was not clear.
Can you remove the line random_state = rng.randint(np.iinfo(np.int32).max) and replace random_state with rng later?

@neerajgangwar

neerajgangwar Jan 19, 2017

Contributor

Do you mean after creating the object for cv? If yes, how would it make a difference?

@tguillemot

tguillemot Jan 19, 2017

Contributor

I mean : cv = self.cv(random_state=rng, shuffle=True, **self.cvargs)

@tguillemot

tguillemot Jan 19, 2017

Contributor

Your code is similar to the code following :

In [1]: from sklearn.utils import check_random_state

In [2]: class Foo:
   ...:     def __init__(self, random_state):
   ...:         self.rng = check_random_state(random_state)
   ...:         
   ...:     def fit(self):
   ...:         print(self.rng.randint(1000))
   ...:         

In [3]: rng = check_random_state(0)

In [4]: f1 = Foo(rng)

In [5]: f2 = Foo(rng)

In [6]: f1.fit()
684

In [7]: f1.fit()
559

In [8]: f2.fit()
629

In [9]: f2.fit()
192

rng is an object and it will be modified every time you call cv.split. So for me it is not necessary to generate a specific random_state at each iteration. Maybe I am missing something?

@tguillemot

tguillemot Jan 19, 2017

Contributor

check_random_state does not create a copy of rng if rng is a random_state.

@neerajgangwar

neerajgangwar Jan 19, 2017

Contributor

I can't find any case which we'll miss by this approach. But I think current implementation keeps the use of random_state clean. I am not really sure.

@jnothman thoughts?

@jnothman

jnothman Jan 20, 2017

Member

I think passing rng directly should be okay. If it's not okay, we need to be able to construct a test case that proves so!

@neerajgangwar

neerajgangwar Jan 20, 2017

Contributor

I am not able to find any such testcase. So making the changes. Thanks!

neerajgangwar added some commits Jan 20, 2017

@tguillemot

LGTM.

Circle seems unrelated.

sklearn/model_selection/_split.py
Parameters
----------
n_splits : int, default=3

@amueller

amueller Feb 19, 2017

Member

This is consistent with the other estimators but seems pretty useless in practice. I would make 10 times 10-fold the default. Why would you want to do 3x5 instead of 10 fold?

@neerajgangwar

neerajgangwar Feb 24, 2017

Contributor

Changing it to 5x10 (n_splits x n_repeats) by default. Is that fine?

@amueller

amueller Mar 4, 2017

Member

yeah

@amueller

Looks good apart from minor changes, in particular default parameters. How do others feel about adding this to check_cv? We could have a mxn syntax to have m repetitions of n-fold with automatically detecting stratified vs not, so you could do cv="10x10". Not entirely sure about that though.

---------------
:class:`RepeatedKFold` repeats K-Fold n times. It can be used when one
requires to run :class:`KFold` n times, producing different splits in

@amueller

amueller Feb 19, 2017

Member

I would say "it can be used to run KFold multiple times to increase the fidelity of the estimate"? Or can we say to decrease the variance? Is that accurate?
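
For readers wondering what "running KFold multiple times" looks like in practice, here is a small comparison sketch. The synthetic dataset, the `LogisticRegression` model, and the seeds are assumptions chosen only for illustration. The code shows that repetition yields more scores to average over; it does not by itself demonstrate how much the variance of the estimate changes.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, RepeatedKFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000)

# A single shuffled 5-fold run: five scores from one partition of the data.
single_cv = KFold(n_splits=5, shuffle=True, random_state=0)
single = cross_val_score(model, X, y, cv=single_cv)

# Five folds repeated ten times: fifty scores from ten different partitions.
repeated_cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=0)
repeated = cross_val_score(model, X, y, cv=repeated_cv)

print(len(single), np.mean(single))
print(len(repeated), np.mean(repeated))
```

The mean over the repeated run depends less on any one particular partition, at the cost of ten times as many model fits.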

sklearn/model_selection/_split.py
Parameters
----------
cv : callable
    Constructor of cross-validator.

@amueller

amueller Feb 19, 2017

Member

Isn't this the cross validation class itself? That seems more natural than passing the __init__ method.

@neerajgangwar

neerajgangwar Feb 24, 2017

Contributor

We are not passing the __init__ method. It's called as RepeatedSplits(KFold, n_repeats, random_state, n_splits=n_splits). I am changing description to "Cross-validator class.".

sklearn/model_selection/_split.py
cv : callable
    Constructor of cross-validator.
n_repeats : int, default=5

@amueller

amueller Feb 19, 2017

Member

I would probably do 10x10 by default, or maybe 5 times 10 fold. This is something you use when you care about accuracy but not necessarily time.

@neerajgangwar

neerajgangwar Feb 24, 2017

Contributor

Changing default values of n_splits to 5 and n_repeats to 10.
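
With those defaults, a default-constructed splitter produces 5 x 10 train/test pairs. A quick check, assuming the installed scikit-learn release still uses the defaults announced in this comment:

```python
from sklearn.model_selection import RepeatedKFold, RepeatedStratifiedKFold

# Default construction: n_splits=5 folds, repeated n_repeats=10 times.
total_kfold = RepeatedKFold().get_n_splits()
total_stratified = RepeatedStratifiedKFold().get_n_splits()
print(total_kfold, total_stratified)
```

`get_n_splits` reports the total across all repetitions, so this is also the number of model fits a default cross-validation run will perform.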

if n_repeats <= 1:
    raise ValueError("Number of repetitions must be greater than 1.")
if any(key in cvargs for key in ('random_state', 'shuffle')):

@amueller

amueller Feb 19, 2017

Member

if set(cvargs).intersection({'random_state', 'shuffle'})? Though not really shorter :-/

@neerajgangwar

neerajgangwar Feb 24, 2017

Contributor

Keeping the same as both are of equal length. :P
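
The two spellings discussed in this thread behave the same for dictionary keys. A dependency-free sketch; the function names and sample inputs are illustrative only:

```python
RESERVED = ("random_state", "shuffle")


def has_reserved_any(cvargs):
    """Form kept in the pull request: generator expression with any()."""
    return any(key in cvargs for key in RESERVED)


def has_reserved_set(cvargs):
    """Suggested alternative: a non-empty set intersection is truthy."""
    return bool(set(cvargs).intersection(RESERVED))


examples = [
    {"n_splits": 3},
    {"shuffle": True},
    {"n_splits": 3, "random_state": 0},
    {},
]
results = [(has_reserved_any(c), has_reserved_set(c)) for c in examples]
print(results)
```

The `any()` form short-circuits on the first match and builds no intermediate set, which is a reasonable argument for keeping it when the two are equally long.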

rng = check_random_state(self.random_state)
for idx in range(n_repeats):
    cv = self.cv(random_state=rng, shuffle=True,

@amueller

amueller Feb 19, 2017

Member

Do we maybe want to raise nice errors if these arguments are not present?

@neerajgangwar

neerajgangwar Feb 24, 2017

Contributor

I didn't get you. Which arguments?

@amueller

amueller Mar 4, 2017

Member

Honestly I have no idea what I meant.... ?

Member

amueller commented Mar 4, 2017

LGTM. Can you add an entry to whatsnew.rst?

Contributor

neerajgangwar commented Mar 4, 2017

@amueller Conflict in doc/whats_new.rst. Help?

Member

amueller commented Mar 4, 2017

Fixed it (which you could have also done locally ;) omg this github feature is amazing !!

Member

amueller commented Mar 4, 2017

It's customary to add your username to the entry, but you don't have to.

codecov bot commented Mar 4, 2017

Codecov Report

Merging #8120 into master will increase coverage by <.01%.
The diff coverage is 98.61%.

@@            Coverage Diff             @@
##           master    #8120      +/-   ##
==========================================
+ Coverage   95.48%   95.48%   +<.01%     
==========================================
  Files         342      342              
  Lines       60913    60985      +72     
==========================================
+ Hits        58160    58231      +71     
- Misses       2753     2754       +1
Impacted Files                                 Coverage Δ
sklearn/model_selection/tests/test_split.py    95.96% <100%> (+0.25%)
sklearn/model_selection/__init__.py            100% <100%> (ø)
sklearn/model_selection/_split.py              98.6% <96%> (-0.17%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 75c892c...338997e. Read the comment docs.

Contributor

neerajgangwar commented Mar 4, 2017

@amueller Added my name :)

@amueller amueller merged commit af1796e into scikit-learn:master Mar 4, 2017

1 of 3 checks passed

continuous-integration/appveyor/pr Waiting for AppVeyor build to complete
continuous-integration/travis-ci/pr The Travis CI build is in progress
ci/circleci Your tests passed on CircleCI!

@neerajgangwar neerajgangwar deleted the neerajgangwar:feature/repeated-splits branch Mar 5, 2017

@Przemo10 Przemo10 referenced this pull request Mar 17, 2017

Closed

update fork (#1) #8606

herilalaina added a commit to herilalaina/scikit-learn that referenced this pull request Mar 26, 2017

[MRG+1] Repeated K-Fold and Repeated Stratified K-Fold (#8120)
* Add _RepeatedSplits and RepeatedKFold class

* Add RepeatedStratifiedKFold and doc for repeated cvs

* Change default value of n_repeats

* Change input parameters of repeated cv constructor to n_splits, n_repeats, random_state

* Generate random states in split function rather than store it beforehand

* Doc changes, inheriting RepeatedKFold, RepeatedStratifiedKFold from _RepeatedSplits and other review changes

* Remove blank line, put testcases for deterministic split in loop and add StopIteration check in testcase

* Using rng directly as random_state param to create cv instance and added a check for cvargs

* Fix pep8 warnings

* Changing default values for n_splits and n_repeats and add entry in changelog

* Adding name to the feature

* Missing space
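
The commit message above describes a constructor taking `n_splits`, `n_repeats`, and `random_state`, with the randomness drawn inside the split call rather than stored beforehand. As a rough illustration of that idea only, here is a standard-library sketch. The helper name `repeated_kfold_indices`, its default values, and its use of `random.Random` are assumptions made for this example; it is not the code merged in this pull request and does not reproduce scikit-learn's actual splits.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=10, random_state=None):
    """Yield (train, test) index lists for n_repeats shuffled K-Fold passes.

    Hypothetical stand-alone helper; the defaults are illustrative only.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    if n_repeats < 1:
        raise ValueError("n_repeats must be at least 1")

    # The generator is created inside the call, so nothing random is stored
    # on an object beforehand and the same integer seed replays the same splits.
    rng = random.Random(random_state)

    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)  # a fresh permutation for every repetition

        # Spread any remainder over the first folds so sizes differ by at most 1.
        base, extra = divmod(n_samples, n_splits)
        start = 0
        for fold in range(n_splits):
            size = base + (1 if fold < extra else 0)
            test = sorted(indices[start:start + size])
            train = sorted(indices[:start] + indices[start + size:])
            yield train, test
            start += size


# Two repetitions of 2-fold splitting over four samples give four splits.
splits = list(repeated_kfold_indices(4, n_splits=2, n_repeats=2, random_state=0))
```

Within one repetition the test folds partition the sample indices, and a different shuffle is drawn for each repetition from the same generator. Passing `random_state=None` in this sketch seeds from system entropy, so the splits then vary between calls.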

massich added a commit to massich/scikit-learn that referenced this pull request Apr 26, 2017

Sundrique added a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017

NelleV added a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017

paulha added a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017

maskani-moh added a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
