
Cross-validation #49

Open
goodfeli opened this issue Mar 1, 2012 · 7 comments

Comments

@goodfeli
Contributor

goodfeli commented Mar 1, 2012

Sprint assignees:

  • Caglar
  • Raoul
@goodfeli
Contributor Author

goodfeli commented Mar 1, 2012

The first thing to do is to add support to monitor.py for monitoring both (a stochastic sample of) the training set and a validation set. This implies:

  • figuring out an easy way to specify the extended parameters of monitoring (how many training set batches, how many samples per batch, how many validation set batches, and how many samples per batch); see the sketch after this list.
  • Stopping criteria need to support these new multi-channel monitors.
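For illustration, the extended monitoring parameters could be grouped into something like the following. This is a hedged sketch only; all names here are hypothetical and not part of the actual monitor.py API.

```python
# Hypothetical grouping of the extended monitoring parameters
# (not pylearn2's actual API).
from collections import namedtuple

MonitoringConfig = namedtuple(
    "MonitoringConfig",
    ["train_batches",     # number of (stochastically sampled) training batches
     "train_batch_size",  # examples per training batch
     "valid_batches",     # number of validation batches
     "valid_batch_size"]  # examples per validation batch
)

# Example: monitor 10 training batches and 5 validation batches,
# 100 examples each.
config = MonitoringConfig(train_batches=10, train_batch_size=100,
                          valid_batches=5, valid_batch_size=100)
```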

@goodfeli
Contributor Author

goodfeli commented Mar 1, 2012

The second thing would be an "outer loop" class that runs a TrainingAlgorithm multiple times with different values of the hyperparameters. Random search is probably easiest, followed by grid search (look at the itertools module to make looping simple).
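As a rough sketch of that outer loop (not tied to the actual TrainingAlgorithm interface; the parameter names below are made up), grid search with itertools.product and random search could look like this:

```python
# Hedged sketch of grid and random hyperparameter search.
import itertools
import random

param_grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [32, 128],
}

def grid_search(param_grid):
    """Yield every combination of hyperparameter values."""
    names = list(param_grid)
    for values in itertools.product(*(param_grid[n] for n in names)):
        yield dict(zip(names, values))

def random_search(param_grid, n_trials, seed=0):
    """Yield n_trials independently sampled hyperparameter settings."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {n: rng.choice(v) for n, v in param_grid.items()}

for params in grid_search(param_grid):
    # In the real outer loop, this is where a TrainingAlgorithm
    # would be configured and run with these hyperparameters.
    print(params)
```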

@goodfeli
Contributor Author

goodfeli commented Mar 1, 2012

The third thing would be adding support for splitting a single dataset into training and validation folds. This may have to be fairly specific to individual dataset classes, but it could be made more general by a well-thought-out addition to the dataset API.
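As a rough sketch of such a split, assuming the dataset's design matrix is available as a NumPy array (the helper below is hypothetical, not part of the dataset API):

```python
# Hedged sketch: generic k-fold index splitting over a design matrix.
import numpy as np

def kfold_indices(n_examples, n_folds, rng=None):
    """Yield (train_idx, valid_idx) index arrays for each fold."""
    rng = np.random.RandomState(0) if rng is None else rng
    perm = rng.permutation(n_examples)
    folds = np.array_split(perm, n_folds)
    for i in range(n_folds):
        valid_idx = folds[i]
        train_idx = np.concatenate(
            [folds[j] for j in range(n_folds) if j != i])
        yield train_idx, valid_idx

X = np.arange(20).reshape(10, 2)  # toy design matrix
for train_idx, valid_idx in kfold_indices(len(X), n_folds=5):
    X_train, X_valid = X[train_idx], X[valid_idx]
```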

@dwf
Contributor

dwf commented Mar 26, 2012

@caglar Can you update this ticket with the current status?

@caglar
Contributor

caglar commented Mar 26, 2012

Well, my cross-validation class is almost finished. HoldoutCrossValidation seems to work, but KFoldCrossValidation still has a few issues. I haven't pushed my latest local changes to the remote yet, but you can have a look at the current state of the cross-validation code here:

https://github.com/caglar/pylearn/blob/feature/crossval_support/pylearn2/crossval/crossval.py

I'm also trying to add a new reset_params() method to the model and its child classes, to avoid unnecessary object creation at each fold.

I had edited dense_design_matrix and added a merge_datasets method, but I've since decided to use the numpy.concatenate function instead. I still have to edit some other files that I don't remember right now, but I'll let you know about the status tomorrow.
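For illustration, merging two design-matrix-style datasets with numpy.concatenate could look like the sketch below; the X/y arrays stand in for dataset attributes and are assumptions, not the actual call sites in dense_design_matrix.

```python
# Hedged sketch: merging two datasets along the example axis.
import numpy as np

X1, y1 = np.zeros((100, 5)), np.zeros((100, 1))  # first dataset (toy)
X2, y2 = np.ones((50, 5)), np.ones((50, 1))      # second dataset (toy)

# Stack the design matrices and targets along axis 0 (examples).
X_merged = np.concatenate([X1, X2], axis=0)
y_merged = np.concatenate([y1, y2], axis=0)
```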

I think Raul completed the hyper-parameter search using my cross-validation class, but he is still working on the multi-monitor stuff; I'm not sure about the latest status of his monitor work. Raul talked with Ian, who told him that multi-monitor support is a separate job needed for the early-stopping criteria.

I'll push my code here after refactoring it, commenting it, and fixing a few points.

@chandiar
Contributor

Hi,

I finished the random and grid search implementation that makes use of Caglar's cross-validation class. You can take a look at my code here: http://tinyurl.com/89qs73p

I tested the search algorithms with the holdout cross-validation and had no problems with it. Once the k-fold cross-validation is fixed, I will test it with the search algorithms.

As for the first part of this ticket (adding support to monitor.py for monitoring more than one dataset at the same time), Ian told me that this feature is not necessary for implementing cross-validation, but that it would be good to have in case early stopping is implemented in the near future. So once the cross-validation ticket is finished, I will work on adding support for monitoring more than one dataset at the same time.

However, the first part of the ticket also asks for stopping criteria that support multi-channel monitors, and I have implemented that. I will push my code by Friday, after adding some comments and making sure everything works fine with the latest updates in pylearn2.
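As a rough sketch of what a channel-aware stopping criterion could look like (the channel-dictionary interface below is an assumption, not the actual pylearn2 Monitor or termination-criterion API):

```python
# Hedged sketch: early-stopping criterion driven by a named monitoring channel.
class ChannelStoppingCriterion(object):
    def __init__(self, channel_name, patience):
        self.channel_name = channel_name  # e.g. "valid_objective"
        self.patience = patience          # epochs without improvement allowed
        self.best = float("inf")
        self.bad_epochs = 0

    def continue_learning(self, channels):
        """channels: dict mapping channel names to their latest value."""
        value = channels[self.channel_name]
        if value < self.best:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs < self.patience

criterion = ChannelStoppingCriterion("valid_objective", patience=5)
keep_going = criterion.continue_learning({"train_objective": 0.9,
                                          "valid_objective": 1.1})
```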

@lamblin
Member

lamblin commented Jun 18, 2013

Apparently, @chandiar's changes never got merged into Pylearn2.
Should we try to bring that back to life?
