Bug fixes and unit testing for Callbacks #242
Conversation
Looks good, left some comments
folder from which to resume training state.
Expects saved states in the form: (all but model optional)
model.pt, optimizer.pt, scheduler.pt, regularizer.pt
All state files present will be loaded.
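A minimal sketch of that loading behavior, assuming PyTorch state dicts; `resume_from_dir` is a hypothetical helper name, and the real loader may differ:

```python
from pathlib import Path
import torch

def resume_from_dir(save_dir, model, optimizer=None, scheduler=None, regularizer=None):
    """Load whichever of model.pt, optimizer.pt, scheduler.pt and
    regularizer.pt are present in save_dir; only model.pt is required."""
    save_dir = Path(save_dir)
    # model.pt must exist; everything else is optional
    model.load_state_dict(torch.load(save_dir / "model.pt"))
    for name, obj in (("optimizer", optimizer),
                      ("scheduler", scheduler),
                      ("regularizer", regularizer)):
        path = save_dir / f"{name}.pt"
        if obj is not None and path.exists():
            obj.load_state_dict(torch.load(path))
```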
I'm wondering whether we should:
i) try to load best_model.pt (if monitor_metric is not None this will exist)
ii) if that fails, load model.pt (which will be the model from the latest save_interval)
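A rough sketch of that fallback order, assuming the folder layout quoted above; `load_checkpoint` is a hypothetical name, not an existing function:

```python
from pathlib import Path
import torch

def load_checkpoint(save_dir, model):
    """Prefer best_model.pt (exists when monitor_metric is set);
    otherwise fall back to model.pt from the latest save_interval."""
    save_dir = Path(save_dir)
    best = save_dir / "best_model.pt"
    path = best if best.exists() else save_dir / "model.pt"
    model.load_state_dict(torch.load(path))
    return path  # so the caller can log which checkpoint was used
```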
That isn't currently the way I have things set up: at each save_interval I create a new folder that saves the state, and the same goes for best_model. Do you think it would be better to continually overwrite the save in one folder?
I think so - to avoid blowing up memory?
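For illustration, a sketch of the overwrite-in-place scheme agreed on here; `save_training_state` is a hypothetical name, not the actual API:

```python
from pathlib import Path
import torch

def save_training_state(save_dir, model, optimizer=None):
    """Write the current state into one fixed folder, overwriting any
    previous save so disk usage stays constant across save_intervals."""
    save_dir = Path(save_dir)
    save_dir.mkdir(parents=True, exist_ok=True)
    torch.save(model.state_dict(), save_dir / "model.pt")
    if optimizer is not None:
        torch.save(optimizer.state_dict(), save_dir / "optimizer.pt")
```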
Done
Awesome, thank you @dhpitt!
Simple change: the model now loads its checkpoint in `Trainer.train()` instead of `Trainer.__init__()`.
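A hedged sketch of what that change looks like, not the library's actual code; `resume_state` here stands in for the real loading helper:

```python
class Trainer:
    def __init__(self, model, resume_from_dir=None):
        self.model = model
        self.resume_from_dir = resume_from_dir  # stored, not loaded here

    def train(self, train_loader):
        # checkpoint loading now happens at train time, so constructing
        # a Trainer no longer touches the disk
        if self.resume_from_dir is not None:
            self.resume_state(self.resume_from_dir)
        # ... training loop ...

    def resume_state(self, save_dir):
        ...  # load model.pt (and any optional states) from save_dir
```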