Resume training from checkpoints #33
One feature in char-rnn is the ability to take a .t7 checkpoint and resume training from it. This is very helpful for recovering from crashes, OOMs, system shutdowns (say, from a cat stepping on one's laptop power button several days into training), accidental C-c's, etc. It can also be useful for training on one dataset and then continuing on another.

cmd:option('-init_from', '', 'initialize network parameters from checkpoint at this path')

(Yes, I know I can use char-rnn, but I like torch-rnn's efficiency, and this seems to be the main feature I'm missing from char-rnn at the moment.)

Comments

This shouldn't be too hard to add; I should be able to implement it some time next week.

I didn't think it would be, and that you understandably just hadn't gotten around to it yet since no one had complained. (I thought about trying to copy over the char-rnn init code and adapting it myself, but I'd probably screw it up.)

It should actually be a lot simpler than char-rnn, since there aren't any clones or weight sharing to worry about.

Implemented in 5641d23 with a 9-line diff =)

I've given it a spin, and it looks like it's working. Thanks.
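For context, the `-init_from` behavior being requested can be sketched roughly as follows in Torch/Lua. This is a minimal illustration, not torch-rnn's actual implementation: the checkpoint field name (`checkpoint.model`) and the `build_model` helper are assumptions made up for the example; only `torch.CmdLine`, `torch.load`, and `torch.save` are real Torch APIs.

```lua
require 'torch'

-- Hypothetical sketch of an -init_from option. The checkpoint layout
-- (checkpoint.model) and build_model() are illustrative assumptions,
-- not torch-rnn's real checkpoint format.
local cmd = torch.CmdLine()
cmd:option('-init_from', '', 'initialize network parameters from checkpoint at this path')
local opt = cmd:parse(arg)

local model
if opt.init_from ~= '' then
  -- torch.load deserializes a .t7 file previously written with torch.save,
  -- so training can pick up from the saved network instead of starting fresh
  local checkpoint = torch.load(opt.init_from)
  model = checkpoint.model
else
  -- normal path: construct a new model from scratch
  model = build_model()  -- placeholder for the usual construction code
end
```

Because the whole model object is serialized and reloaded, no re-wiring of clones or shared weights is needed, which is why the actual fix could be so small.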