
Cross-validation #49

Open
goodfeli opened this issue Mar 1, 2012 · 7 comments

Comments

@goodfeli
Contributor

goodfeli commented Mar 1, 2012

Sprint assignees:

  • Caglar
  • Raoul
@goodfeli
Contributor Author

goodfeli commented Mar 1, 2012

The first thing to do is to add support to monitor.py for monitoring both (a stochastic sample of) the training set and a validation set. This implies:

  • figuring out an easy way to specify the extended parameters of monitoring (how many training set batches, how many samples per batch, how many validation set batches, and how many samples per batch); see the sketch after this list.
  • Stopping criteria need to support these new multi-channel monitors.
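For illustration, the extended monitoring parameters could be grouped into something like the following. This is a hedged sketch only; all names here are hypothetical and not part of the actual monitor.py API.

```python
# Hypothetical grouping of the extended monitoring parameters
# (not pylearn2's actual API).
from collections import namedtuple

MonitoringConfig = namedtuple(
    "MonitoringConfig",
    ["train_batches",     # number of (stochastically sampled) training batches
     "train_batch_size",  # examples per training batch
     "valid_batches",     # number of validation batches
     "valid_batch_size"]  # examples per validation batch
)

# Example: monitor 10 training batches and 5 validation batches,
# 100 examples each.
config = MonitoringConfig(train_batches=10, train_batch_size=100,
                          valid_batches=5, valid_batch_size=100)
```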

@goodfeli
Contributor Author

goodfeli commented Mar 1, 2012

The second thing would be an "outer loop" class that runs a TrainingAlgorithm multiple times with different values of the hyperparameters. Random search is probably easiest, followed by grid search (look at the itertools module to make looping simple).
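As a rough sketch of that outer loop (not tied to the actual TrainingAlgorithm interface; the parameter names below are made up), grid search with itertools.product and random search could look like this:

```python
# Hedged sketch of grid and random hyperparameter search.
import itertools
import random

param_grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [32, 128],
}

def grid_search(param_grid):
    """Yield every combination of hyperparameter values."""
    names = list(param_grid)
    for values in itertools.product(*(param_grid[n] for n in names)):
        yield dict(zip(names, values))

def random_search(param_grid, n_trials, seed=0):
    """Yield n_trials independently sampled hyperparameter settings."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {n: rng.choice(v) for n, v in param_grid.items()}

for params in grid_search(param_grid):
    # In the real outer loop, this is where a TrainingAlgorithm
    # would be configured and run with these hyperparameters.
    print(params)
```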

@goodfeli
Contributor Author

goodfeli commented Mar 1, 2012

The third thing would be adding support for splitting a single dataset into training and validation folds. This may have to be fairly specific to individual dataset classes, but it could be made more general by a well-thought-out addition to the dataset API.
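As a rough sketch of such a split, assuming the dataset's design matrix is available as a NumPy array (the helper below is hypothetical, not part of the dataset API):

```python
# Hedged sketch: generic k-fold index splitting over a design matrix.
import numpy as np

def kfold_indices(n_examples, n_folds, rng=None):
    """Yield (train_idx, valid_idx) index arrays for each fold."""
    rng = np.random.RandomState(0) if rng is None else rng
    perm = rng.permutation(n_examples)
    folds = np.array_split(perm, n_folds)
    for i in range(n_folds):
        valid_idx = folds[i]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != i])
        yield train_idx, valid_idx

X = np.arange(20).reshape(10, 2)  # toy design matrix
for train_idx, valid_idx in kfold_indices(len(X), n_folds=5):
    X_train, X_valid = X[train_idx], X[valid_idx]
```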

@dwf
Contributor

dwf commented Mar 26, 2012

@caglar Can you update this ticket with the current status?

@caglar
Contributor

caglar commented Mar 26, 2012

Well, my cross-validation class is almost finished. HoldoutCrossValidation seems to work, but KFoldCrossValidation still has a few issues. I haven't pushed my latest local changes to the remote yet, but you can have a look at the current state of the cross-validation code here:

https://github.com/caglar/pylearn/blob/feature/crossval_support/pylearn2/crossval/crossval.py

I'm also trying to add a new reset_params() method to the model and its child classes, to avoid unnecessary object creation at each fold.

I had edited dense_design_matrix and added a merge_datasets method, but I've since decided to use the numpy.concatenate function instead. I still have to edit some other files that I don't remember right now, but I'll let you know about the status tomorrow.
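For illustration, merging two design-matrix-style datasets with numpy.concatenate could look like the sketch below; the X/y arrays stand in for dataset attributes and are assumptions, not the actual call sites in dense_design_matrix.

```python
# Hedged sketch: merging two datasets along the example axis.
import numpy as np

X1, y1 = np.zeros((100, 5)), np.zeros((100, 1))  # first dataset (toy)
X2, y2 = np.ones((50, 5)), np.ones((50, 1))      # second dataset (toy)

# Stack the design matrices and targets along axis 0 (examples).
X_merged = np.concatenate([X1, X2], axis=0)
y_merged = np.concatenate([y1, y2], axis=0)
```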

I think Raul completed the hyper-parameter search using my cross-validation class, but he is still working on the multi-monitor stuff; I'm not sure about the latest status of his monitor work. Raul talked with Ian, who told him that multi-monitor support is a separate job needed for the early-stopping criteria.

I'll push my code here after refactoring it, commenting it, and fixing a few points.

@chandiar
Contributor

Hi,

I finished the random and grid search implementation that makes use of Caglar's cross-validation class. You can take a look at my code here: http://tinyurl.com/89qs73p

I tested the search algorithms with the holdout cross-validation and had no problems with it. Once the k-fold cross-validation is fixed, I will test it with the search algorithms.

As for the first part of this ticket (adding support to monitor.py for monitoring more than one dataset at the same time), Ian told me that this feature is not necessary for implementing cross-validation, but that it would be good to have in case early stopping is implemented in the near future. So once the cross-validation ticket is finished, I will work on adding support for monitoring more than one dataset at the same time.

However, the first part of the ticket also asks for stopping criteria that support multi-channel monitors, and I have implemented that. I will push my code by Friday, after adding some comments and making sure everything works fine with the latest updates in pylearn2.
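As a rough sketch of what a channel-aware stopping criterion could look like (the channel-dictionary interface below is an assumption, not the actual pylearn2 Monitor or termination-criterion API):

```python
# Hedged sketch: early-stopping criterion driven by a named monitoring channel.
class ChannelStoppingCriterion(object):
    def __init__(self, channel_name, patience):
        self.channel_name = channel_name  # e.g. "valid_objective"
        self.patience = patience          # epochs without improvement allowed
        self.best = float("inf")
        self.bad_epochs = 0

    def continue_learning(self, channels):
        """channels: dict mapping channel names to their latest value."""
        value = channels[self.channel_name]
        if value < self.best:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs < self.patience

criterion = ChannelStoppingCriterion("valid_objective", patience=5)
keep_going = criterion.continue_learning({"train_objective": 0.9,
                                          "valid_objective": 1.1})
```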

@lamblin
Member

lamblin commented Jun 18, 2013

Apparently, @chandiar's changes never got merged into Pylearn2.
Should we try to bring that back to life?
