[MRG+1] Repeated K-Fold and Repeated Stratified K-Fold #8120
changed the title from [MRG] Repeated K-Fold and Stratified K-Fold (Dec 27, 2016)
You don't need to store the random states; just generate them from the initial random state in split.
On 28 December 2016 at 17:06, Neeraj Gangwar wrote:
> In sklearn/model_selection/_split.py:
>
> +    random_state : None, int or RandomState, default=None
> +        Random state to be used to generate random state for each
> +        repetition.
> +    """
> +    def __init__(self, cv, n_repeats=5, random_state=None):
> +        if not isinstance(cv, (KFold, StratifiedKFold)):
> +            raise ValueError(
> +                "cv must be an instance of KFold or StratifiedKFold.")
> +
> +        if not isinstance(n_repeats, (np.integer, numbers.Integral)):
> +            raise ValueError("Number of repetitions must be of Integral type.")
> +
> +        if n_repeats <= 1:
> +            raise ValueError("Number of repetitions must be greater than 1.")
> +
> +        rng = check_random_state(random_state)
>
> Is there any other way to achieve this other than initializing self.random_states =  in __init__ and generate random states when split is called for the first time? The code 946-950 will move to split inside an if condition?
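For illustration, the suggestion to generate the random states inside split could look roughly like the sketch below. This is not the code from this PR: `RepeatedKFoldSketch` is a hypothetical stand-in that wraps only `KFold`, and it assumes that passing one `RandomState` instance to each `KFold(shuffle=True, ...)` is an acceptable way to derive the per-repetition shuffles.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.utils import check_random_state


class RepeatedKFoldSketch:
    """Hypothetical stand-in for illustration, not the class from this PR."""

    def __init__(self, n_splits=5, n_repeats=5, random_state=None):
        # Only the parameters are stored; no random states are created here.
        self.n_splits = n_splits
        self.n_repeats = n_repeats
        self.random_state = random_state

    def split(self, X, y=None, groups=None):
        # The generator is rebuilt from the stored parameter on every call.
        rng = check_random_state(self.random_state)
        for _ in range(self.n_repeats):
            # Every repetition draws its shuffle from the same generator,
            # so the generator state advances between repetitions.
            cv = KFold(n_splits=self.n_splits, shuffle=True, random_state=rng)
            for train, test in cv.split(X, y, groups):
                yield train, test


X = np.arange(20).reshape(10, 2)
rkf = RepeatedKFoldSketch(n_splits=2, n_repeats=3, random_state=0)
first = [test.tolist() for _, test in rkf.split(X)]
second = [test.tolist() for _, test in rkf.split(X)]
```

With an int `random_state`, `first` and `second` contain the same folds, because each call to split starts from a freshly seeded generator. With `None` or a shared `RandomState` instance, two calls would generally differ.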
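Do you mean initialize RandomState in each split call with check_random_state and generate random states? In this case, if initial random_state is int, it will work fine as check_random_state will return RandomState with the same initial seed on every call. But if it's None, it will return RandomState with different seed on every call and if it's RandomState, it'll return the same object. In both of these cases, split will produce different splits on different calls. To generate same splits on different split calls, initial state needs to be stored somewhere probably? Or are you referring to some other way?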
Currently KFold with shuffle=True will generate different splits on different calls to split, will it not? This should behave the same.
On 29 December 2016 at 14:54, Neeraj Gangwar wrote:
> Do you mean initialize RandomState in each split call with check_random_state and generate random states? In this case, if initial random_state is int, it will work fine as check_random_state will return RandomState with the same initial seed on every call. But if it's None, it will return RandomState with different seed on every call and if it's RandomState, it'll return the same object. In both of these cases, split will produce different splits on different calls. To generate same splits on different split calls, initial state needs to be stored somewhere probably? Or are you referring to some other way?
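The three cases discussed above can be checked directly against `sklearn.utils.check_random_state`. The sketch below is a small illustration, not part of the PR. One detail worth noting: for `None` the function returns NumPy's global `RandomState` instance rather than a newly seeded one. The practical effect is the same as described in the comment, since that state is not reset between calls.

```python
import numpy as np
from sklearn.utils import check_random_state

# int: a fresh RandomState with the same seed on every call,
# so the first draw is identical each time.
draw_a = check_random_state(42).randint(1000000)
draw_b = check_random_state(42).randint(1000000)
int_is_repeatable = draw_a == draw_b

# RandomState instance: returned unchanged, so its state keeps advancing
# across calls instead of restarting.
shared = np.random.RandomState(42)
same_object = check_random_state(shared) is shared
draw_c = check_random_state(shared).randint(1000000)  # first draw of seed 42
draw_d = check_random_state(shared).randint(1000000)  # next draw in the stream

# None: NumPy's global RandomState, shared by every call.
global_is_shared = check_random_state(None) is check_random_state(None)
```

Only the int case restarts from a known seed on each call, which is why split is repeatable across calls in that case and not in the other two.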
Looks good apart from minor changes, in particular default parameters. How do others feel about adding this to check_cv? We could have a mxn syntax to have m repetitions of n-fold with automatically detecting stratified vs not, so you could do cv="10x10". Not entirely sure about that though.
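To make the mxn idea concrete, here is a rough sketch of how such a string could be interpreted. `parse_repeated_cv` is a hypothetical helper, not an existing scikit-learn function, and check_cv does not accept this syntax. The sketch assumes the `RepeatedKFold` and `RepeatedStratifiedKFold` classes from this PR are importable from `sklearn.model_selection`, and it assumes stratification would be chosen the same way check_cv chooses between KFold and StratifiedKFold (a classifier with a binary or multiclass target).

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold, RepeatedStratifiedKFold
from sklearn.utils.multiclass import type_of_target


def parse_repeated_cv(spec, y=None, classifier=False, random_state=None):
    """Hypothetical helper: turn "mxn" into m repetitions of n-fold CV."""
    m, sep, n = spec.partition("x")
    if not sep or not m.isdigit() or not n.isdigit():
        raise ValueError("expected a string like '10x10', got %r" % spec)
    n_repeats, n_splits = int(m), int(n)

    # Assumed rule: stratify only for classifiers with a binary or
    # multiclass target.
    stratify = (classifier and y is not None
                and type_of_target(y) in ("binary", "multiclass"))
    cls = RepeatedStratifiedKFold if stratify else RepeatedKFold
    return cls(n_splits=n_splits, n_repeats=n_repeats,
               random_state=random_state)


y = np.array([0, 1, 0, 1, 0, 1])
plain_cv = parse_repeated_cv("3x2")
stratified_cv = parse_repeated_cv("3x2", y=y, classifier=True)
```

Here "3x2" means 3 repetitions of 2-fold, so `plain_cv.get_n_splits()` is 6.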
@@            Coverage Diff             @@
##           master    #8120      +/-   ##
==========================================
+ Coverage   95.48%   95.48%   +<.01%
==========================================
  Files         342      342
  Lines       60913    60985      +72
==========================================
+ Hits        58160    58231      +71
- Misses       2753     2754       +1