[SGD] Callback API for SGD+Tune #11316
@@ -678,6 +715,7 @@ def step(self):
train_stats = self.trainer.train(max_retries=10, profile=True)
validation_stats = self.trainer.validate(profile=True)
stats = merge_dicts(train_stats, validation_stats)
self._iter += 1
Is this state supposed to be saved upon checkpoint?
Ah yes good catch. Added it to save_checkpoint and load_checkpoint.
Actually, now that I see what you're doing here: I think we already provide self.training_iteration in Trainable, so you don't need to save or manage this state yourself.
Ah, got it. Changed it to use self.training_iteration and removed the duplicate metadata checkpointing.
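For readers following the thread, here is a rough sketch of why the extra counter is redundant. It does not use Ray itself: `MiniTrainable` and `CallbackTrainable` are hypothetical stand-ins, and the sketch assumes the base class increments its counter after each `step()` and saves it alongside the subclass state. Only the `training_iteration`, `step`, `save_checkpoint`, and `load_checkpoint` names come from the discussion above.

```python
class MiniTrainable:
    """Hypothetical stand-in for a Tune-style base class (not Ray's code)."""

    def __init__(self):
        self._iteration = 0

    @property
    def training_iteration(self):
        # Number of train() calls that have completed so far.
        return self._iteration

    def train(self):
        result = self.step()
        self._iteration += 1
        result["training_iteration"] = self._iteration
        return result

    def save(self):
        # The base class persists its own counter next to the subclass state.
        return {"iteration": self._iteration, "state": self.save_checkpoint()}

    def restore(self, checkpoint):
        self._iteration = checkpoint["iteration"]
        self.load_checkpoint(checkpoint["state"])

    def step(self):
        raise NotImplementedError

    def save_checkpoint(self):
        return {}

    def load_checkpoint(self, state):
        pass


class CallbackTrainable(MiniTrainable):
    """Reads the inherited counter instead of keeping a private self._iter."""

    def __init__(self):
        super().__init__()
        self.seen = []  # iteration index observed inside each step()

    def step(self):
        self.seen.append(self.training_iteration)
        return {"loss": 1.0 / (self.training_iteration + 1)}


original = CallbackTrainable()
original.train()
original.train()

restored = CallbackTrainable()
restored.restore(original.save())
restored.train()  # continues from the restored count without extra bookkeeping
```

Because the counter lives in the base class, the subclass has nothing iteration-related to add to `save_checkpoint` or `load_checkpoint`, which is the simplification the review asked for.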
Nice; seems like a nice small change. Left a couple of questions/comments. Have you shown this to users?
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
@richardliaw Yes, I ran it by Richard (Dendra) and he said he really liked it.
Left a few last comments requesting changes.
Co-authored-by: Richard Liaw <rliaw@berkeley.edu>
Nice! Thanks for cleaning this up.
Documentation updates to come later.
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.