Resume training from checkpoints #33

gwern · 2016-03-13T00:35:00Z

One feature in char-rnn is the ability to take a .t7 checkpoint and resume training from it. This is very helpful for recovering from crashes, OOMs, system shutdowns (say, from a cat stepping on one's laptop power button several days into training), accidental Ctrl-Cs, etc. It can also be useful for training on one dataset and then continuing training on another.

cmd:option('-init_from', '', 'initialize network parameters from checkpoint at this path')

(Yes, I know I can use char-rnn, but I like torch-rnn's efficiency, and this seems to be the main char-rnn feature I'm missing in torch-rnn at the moment.)

jcjohnson · 2016-03-13T01:24:50Z

This shouldn't be too hard to add; I should be able to implement it sometime next week.

gwern · 2016-03-13T01:28:41Z

I didn't think it would be; I figured you just hadn't gotten around to it yet, understandably, since no one had asked for it. (I thought about copying over the char-rnn init code and adapting it myself, but I'd probably screw it up.)

jcjohnson · 2016-03-13T01:48:30Z

It should actually be a lot simpler than char-rnn, since there aren't any clones or weight sharing to worry about.

jcjohnson · 2016-03-16T04:37:10Z

Implemented in 5641d23 with a 9 line diff =)

gwern · 2016-03-18T19:49:26Z

I've given it a spin, and it looks like it's working. Thanks.

jcjohnson closed this as completed Mar 16, 2016
