
data-independent CV iterators #2904


Description

@mblondel

In many situations you don't have a separate test set, so you would like to use cross-validation both for evaluation and for hyper-parameter tuning. This requires nested cross-validation:

for train, test in cv1:
    # Find the best hyper-parameters for this split
    for inner_train, val in cv2:
        [...]
    # Retrain using the best hyper-parameters
    [...]
# Return the best score for each split

This is very difficult to implement generically with our current API because CV iterators are tied to a particular dataset. For example, once you build cv = KFold(n_samples), cv only works with a dataset of that exact size, so the inner CV cannot be constructed up front: it has to be rebuilt for every outer training split.
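For concreteness, here is a minimal sketch of the problem with the current sklearn.cross_validation.KFold (keyword names from memory, so they may differ slightly). Because the inner iterator's indices depend on the size of each outer training split, it must be reconstructed inside the loop, so generic nested-CV code would have to accept a CV factory rather than a CV object:

import numpy as np
from sklearn.cross_validation import KFold  # current, data-tied API

X = np.random.randn(100, 5)
y = np.random.randn(100)

outer_cv = KFold(len(y), n_folds=5)          # indices are fixed to n=100
for train, test in outer_cv:
    # The inner CV must know len(train), which differs from len(y),
    # so a single pre-built iterator cannot be reused here.
    inner_cv = KFold(len(train), n_folds=3)
    for inner_train, val in inner_cv:
        # inner_train and val index into `train`, not into the full data
        X_tr, y_tr = X[train][inner_train], y[train][inner_train]
        X_val, y_val = X[train][val], y[train][val]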

Ideally, we would need something closer to the estimator API: constructor parameters for the data-independent options (n_folds, shuffle, random_state, train/test proportion, etc.) and a run method that takes y as argument (y is needed for stratified schemes). It would look something like this:

# deprecated usage
for train, test in KFold(n, n_folds):
    print(train, test)

# new usage
for train, test in KFold(n_folds).run(y):
    print(train, test)
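To show what this buys us, here is a rough sketch of a generic nested-CV helper written against the proposed run(y) API. The helper name, the scoring, and the way the best parameters are picked are illustrative only, not part of the proposal:

from sklearn.base import clone

def nested_cv_scores(estimator, param_grid, X, y, outer_cv, inner_cv):
    """Return one test score per outer split, tuning parameters on the inner CV."""
    scores = []
    for train, test in outer_cv.run(y):              # proposed data-independent API
        # Find the best hyper-parameters on this outer training split
        best_params, best_score = None, -float("inf")
        for params in param_grid:                    # param_grid: list of dicts
            inner_scores = []
            for inner_train, val in inner_cv.run(y[train]):
                est = clone(estimator).set_params(**params)
                est.fit(X[train][inner_train], y[train][inner_train])
                inner_scores.append(est.score(X[train][val], y[train][val]))
            mean_score = sum(inner_scores) / len(inner_scores)
            if mean_score > best_score:
                best_params, best_score = params, mean_score
        # Retrain on the full outer training split with the best parameters
        best_est = clone(estimator).set_params(**best_params)
        best_est.fit(X[train], y[train])
        scores.append(best_est.score(X[test], y[test]))
    return scores

Because outer_cv and inner_cv carry no dataset size, they can be passed in as plain objects and reused on splits of any size, which is exactly what the current API cannot do.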
