Save only the best checkpoint #1475

bernardohenz · 2018-07-24T20:43:31Z

Instead of using EarlyStop to avoid overfitting, I would like to save the model at the point where it had the best (lowest) validation loss. In other words, I would like to keep only the best checkpoint (based on the val/dev set).

I tried to use a wrapper for tf.train.Saver (this one: https://github.com/vonclites/checkmate), but couldn't make it work with DeepSpeech. Is there an easy way to do that (maybe using the MonitoredTrainingSession, as you are doing)?
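To make the requested behaviour concrete, here is a minimal, framework-agnostic sketch of "save only when the validation loss improves". It is not DeepSpeech code and not the checkmate wrapper: the class name `BestCheckpointKeeper` and the `save_fn` callback are hypothetical, and in a real training loop `save_fn` would be whatever call actually writes a checkpoint.

```python
import math
from typing import Callable, Optional


class BestCheckpointKeeper:
    """Tracks the lowest validation loss seen so far and saves only on improvement."""

    def __init__(self, save_fn: Callable[[int], None]) -> None:
        # save_fn is a hypothetical callback that writes a checkpoint for an epoch.
        self.save_fn = save_fn
        self.best_loss = math.inf
        self.best_epoch: Optional[int] = None

    def update(self, epoch: int, val_loss: float) -> bool:
        """Save and return True if val_loss is strictly lower than the best so far."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.save_fn(epoch)
            return True
        return False


# Usage with made-up validation losses; "saving" just records the epoch number.
saved = []
keeper = BestCheckpointKeeper(save_fn=saved.append)
for epoch, loss in enumerate([0.9, 0.7, 0.8, 0.6, 0.65]):
    keeper.update(epoch, loss)
```

With this approach training can run to completion, and the last checkpoint written is always the one with the lowest validation loss seen.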

reuben · 2018-07-24T21:29:42Z

I looked into this briefly but couldn't find a clean way to implement it with MonitoredTrainingSession (which is IMO a terrible API). I ended up just writing a hack that works, but isn't really code we can land. I'm attaching the patch.

save_best_val.patch.txt

bernardohenz · 2018-07-25T14:15:55Z

Thanks @reuben, I had to change the name of the MonitoredTrainingSession to train_session for this to work. Now it is working perfectly fine.

lissyx · 2018-10-02T12:29:54Z

@bernardohenz What's the status here? Is the issue fixed, or do you have a workaround? Should we close this?

bernardohenz · 2018-10-02T12:35:22Z

@lissyx yes, the patch from @reuben worked just fine.

reuben · 2018-10-02T13:11:30Z

We should have a proper solution for this in-tree. This would be too much work with the current training setup, but would probably be very simple if we used TF Eager, for example. Reopening so we don't forget.

reuben · 2018-11-17T16:05:18Z

Updated version of the patch is here: https://gist.github.com/reuben/dcc2deaf85568591e34ce363bc3bac2a

tilmankamp · 2019-04-02T13:57:43Z

#1988 added this feature. It checkpoints the epoch with the best validation (dev) loss and supports different ways to resume; see the load command-line parameter.

lock · 2019-05-02T13:59:34Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

bernardohenz closed this as completed Oct 2, 2018

reuben reopened this Oct 2, 2018

tilmankamp mentioned this issue Mar 28, 2019

Automatically checkpointing model with best dev-loss #1987

Closed

tilmankamp closed this as completed Apr 2, 2019

lock bot locked and limited conversation to collaborators May 2, 2019
