Save only the best checkpoint #1475
Comments
I looked into this briefly but couldn't find a clean way to implement it with MonitoredTrainingSession (which is IMO a terrible API). I ended up just writing a hack that works, but isn't really code we can land. I'm attaching the patch.
Thanks @reuben, I had to change the name of the MonitoredTrainingSession to
@bernardohenz What's the status here: is the issue fixed, do you have a workaround? Should we close this?
We should have a proper solution for this in-tree. This would be too much work with the current training setup, but would probably be very simple if we used TF Eager, for example. Reopening so we don't forget. |
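To illustrate why an eager-style setup makes this simple: the training loop becomes explicit, so checkpointing only the best-validating epoch is a plain comparison inside the loop. Below is a minimal, framework-agnostic sketch of that idea; `dev_losses` and `save_checkpoint` are hypothetical stand-ins (not DeepSpeech or TensorFlow APIs) for per-epoch validation and an actual saver call.

```python
def run_training(dev_losses, save_checkpoint):
    """Toy eager-style loop. dev_losses stands in for computing the
    dev-set loss each epoch; save_checkpoint stands in for e.g. a
    saver.save() call. Both are hypothetical, for illustration only."""
    best = float("inf")
    for epoch, dev_loss in enumerate(dev_losses):
        # ... one epoch of training would happen here ...
        if dev_loss < best:  # checkpoint only when validation improves
            best = dev_loss
            save_checkpoint(epoch)
    return best
```

For example, `run_training([0.9, 0.6, 0.7, 0.4], save)` would trigger saves at epochs 0, 1, and 3, and skip epoch 2 because its dev loss regressed.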
Updated version of the patch is here: https://gist.github.com/reuben/dcc2deaf85568591e34ce363bc3bac2a |
#1988 added this feature. It checkpoints the best validating epoch and supports different ways to resume - see
Instead of using EarlyStop to avoid overfitting, I would like to save the model at the time it had the best (lowest) validation loss. In other words, I would like to keep track of only the best checkpoint (based on the val/dev set).
I tried to use a wrapper for tf.train.Saver (this one: https://github.com/vonclites/checkmate), but couldn't make it work with DeepSpeech. Is there an easy way to do that (maybe using the MonitoredTrainingSession as you are using)?
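The core of what a wrapper like checkmate provides can be sketched as a small stand-alone helper: track the best validation loss seen so far and invoke the saver only when it improves. In this sketch `save_fn` is a hypothetical callback standing in for a real `tf.train.Saver.save` call; none of the names below come from DeepSpeech or checkmate.

```python
class BestCheckpointKeeper:
    """Saves a checkpoint only when validation loss improves on the
    best value seen so far. save_fn(step, loss) is a hypothetical
    callback that would wrap the actual saver."""

    def __init__(self, save_fn):
        self.save_fn = save_fn
        self.best_loss = float("inf")

    def update(self, val_loss, step):
        # Trigger a save only on a new best validation loss.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.save_fn(step, val_loss)
            return True
        return False
```

Called after each validation pass with losses 0.9, 0.7, 0.8, 0.5, this would save at steps 0, 1, and 3, leaving only improving checkpoints on disk (the real saver would additionally delete or overwrite the superseded files).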