Resume training from checkpoints #33

Closed
gwern opened this issue Mar 13, 2016 · 5 comments

@gwern

gwern commented Mar 13, 2016

One feature in char-rnn is the ability to take a .t7 checkpoint and resume training from it. This is very helpful for recovering from crashes, OOMs, system shutdowns (say, from a cat stepping on one's laptop power button several days into training), accidental Ctrl-C presses, etc. It can also be useful for training on one dataset and then continuing on another.

cmd:option('-init_from', '', 'initialize network parameters from checkpoint at this path')

(Yes, I know I can use char-rnn, but I like torch-rnn's efficiency and this seems to be the main feature I'm missing from char-rnn at the moment.)

@jcjohnson
Owner

This shouldn't be too hard to add - I should be able to implement this next week some time.

@gwern
Author

gwern commented Mar 13, 2016

I didn't think it would be; I figured you understandably just hadn't gotten around to it yet since no one had complained. (I thought about trying to copy over the char-rnn init code and adapting it myself, but I'd probably screw it up.)

@jcjohnson
Owner

It should actually be a lot simpler than char-rnn, since there aren't any clones or weight sharing to worry about.

@jcjohnson
Owner

Implemented in 5641d23 with a 9 line diff =)
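
For readers who want the gist of resume-from-checkpoint, here is a minimal sketch in plain Lua. It is not the diff from 5641d23. The function name `init_training`, the checkpoint layout (`model` and `i` fields), and the stand-in loader are all assumptions made for illustration. With Torch, `torch.load` could plausibly be passed in as the loader.

```lua
-- Hypothetical sketch of resume-from-checkpoint logic; NOT the code from 5641d23.
-- The loader and model constructor are injected so this runs in plain Lua.
local function init_training(opt, load_checkpoint, new_model)
  if opt.init_from ~= '' then
    -- Assumed checkpoint layout: { model = <network>, i = <iteration count> }
    local checkpoint = load_checkpoint(opt.init_from)
    return checkpoint.model, checkpoint.i or 0
  end
  -- No checkpoint given: start from a fresh model at iteration 0.
  return new_model(), 0
end

-- Stand-ins for a real checkpoint store and model constructor (both hypothetical).
local saved = { ['cv/checkpoint_1000.t7'] = { model = 'trained-model', i = 1000 } }
local function fake_load(path)
  return assert(saved[path], 'no checkpoint at ' .. path)
end
local function fresh()
  return 'fresh-model'
end

-- Resume: the model and iteration counter both come from the checkpoint.
local model, start_i = init_training({ init_from = 'cv/checkpoint_1000.t7' }, fake_load, fresh)
print(model, start_i)
```

When switching to a different dataset, one might load only the model and reset the iteration counter to 0. That is a design choice this sketch leaves out.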

@gwern
Author

gwern commented Mar 18, 2016

I've given it a spin, and it looks like it's working. Thanks.
