[SGD] Callback API for SGD+Tune #11316

amogkam · 2020-10-09T18:37:11Z

Documentation updates to come later.

Why are these changes needed?

Related issue number

Checks

I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

python/ray/util/sgd/torch/examples/tune_example.py

python/ray/util/sgd/torch/torch_trainer.py

richardliaw · 2020-10-10T04:08:16Z

python/ray/util/sgd/torch/torch_trainer.py

@@ -678,6 +715,7 @@ def step(self):
        train_stats = self.trainer.train(max_retries=10, profile=True)
        validation_stats = self.trainer.validate(profile=True)
        stats = merge_dicts(train_stats, validation_stats)
+        self._iter += 1


Is this state supposed to be saved upon checkpoint?

Ah, yes, good catch. Added it to save_checkpoint and load_checkpoint.

Actually, now that I know what you're doing here: I think we already provide self.training_iteration in Trainable, so you don't need to save or manage this state yourself.

Ah, got it. Changed it to use self.training_iteration and removed the duplicate metadata checkpointing.
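The reviewer's point is that the base class already owns the iteration counter and persists it with every checkpoint, so a subclass that keeps its own _iter counter duplicates that state. A minimal sketch of the pattern, assuming a simplified stand-in for Tune's Trainable: MiniTrainable and SGDTrainableSketch are hypothetical names, not Ray APIs, and the real ray.tune.Trainable does considerably more (logging, timing, checkpoint directories).

```python
class MiniTrainable:
    """Hypothetical stand-in for a Tune-style Trainable (NOT ray.tune.Trainable).

    It only mimics the behavior discussed in the review: the base class owns
    the iteration counter and saves/restores it with every checkpoint.
    """

    def __init__(self):
        self._iteration = 0

    @property
    def training_iteration(self):
        # Number of completed step() calls, maintained by the base class.
        return self._iteration

    def train(self):
        result = self.step()
        self._iteration += 1
        result["training_iteration"] = self._iteration
        return result

    def save(self):
        # Base-class metadata (the counter) is stored next to subclass state.
        return {"iteration": self._iteration, "state": self.save_checkpoint()}

    def restore(self, checkpoint):
        self._iteration = checkpoint["iteration"]
        self.load_checkpoint(checkpoint["state"])

    # Hooks that subclasses override.
    def step(self):
        raise NotImplementedError

    def save_checkpoint(self):
        return {}

    def load_checkpoint(self, state):
        pass


class SGDTrainableSketch(MiniTrainable):
    """Hypothetical subclass that reads training_iteration instead of a private _iter."""

    def __init__(self):
        super().__init__()
        self.weight = 0.0

    def step(self):
        # Pretend training step; no private iteration counter is kept here.
        self.weight += 0.1
        return {"weight": self.weight, "seen_iteration": self.training_iteration}

    def save_checkpoint(self):
        # Only model state is saved; the counter is handled by the base class.
        return {"weight": self.weight}

    def load_checkpoint(self, state):
        self.weight = state["weight"]
```

Calling train() twice and then restoring the result of save() into a fresh SGDTrainableSketch gives training_iteration == 2 without the subclass ever touching a counter, which is why the separate counter and its extra checkpoint entry could be dropped.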

richardliaw

Nice; this seems like a nice, small change. Left a couple of questions/comments. Have you shown this to users?

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

amogkam · 2020-10-12T17:49:56Z

@richardliaw Yes, I ran it by Richard (Dendra), and he said he really liked it.

python/ray/util/sgd/tests/test_torch.py

python/ray/util/sgd/torch/torch_trainer.py

richardliaw

Left a few last comments requesting changes.

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

python/ray/util/sgd/torch/torch_trainer.py

amogkam · 2020-10-15T19:18:53Z

Updated docstrings

richardliaw

nice! thanks for cleaning this up.

amogkam added 2 commits October 9, 2020 11:35

callback api

5d99fc4

formatting

ca55c05

amogkam requested a review from richardliaw October 9, 2020 18:37

amogkam assigned richardliaw Oct 9, 2020

richardliaw reviewed Oct 10, 2020

python/ray/util/sgd/torch/examples/tune_example.py (outdated, resolved)

richardliaw reviewed Oct 10, 2020

python/ray/util/sgd/torch/examples/tune_example.py (outdated, resolved)

richardliaw reviewed Oct 10, 2020

python/ray/util/sgd/torch/torch_trainer.py (outdated, resolved)

richardliaw reviewed Oct 10, 2020

python/ray/util/sgd/torch/torch_trainer.py (outdated, resolved)

richardliaw reviewed Oct 10, 2020

amogkam requested a review from krfricke October 12, 2020 17:13

amogkam and others added 3 commits October 12, 2020 10:14

Update python/ray/util/sgd/torch/torch_trainer.py

d94e018

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

addressing comments

f60913d

formatting

ecf4764

amogkam requested a review from richardliaw October 12, 2020 18:04

amogkam added 2 commits October 12, 2020 17:00

fix tests

1d76487

rename

0377028

richardliaw reviewed Oct 13, 2020

python/ray/util/sgd/tests/test_torch.py (outdated, resolved)

richardliaw reviewed Oct 13, 2020

python/ray/util/sgd/torch/torch_trainer.py (outdated, resolved)

richardliaw reviewed Oct 13, 2020

python/ray/util/sgd/torch/torch_trainer.py (outdated, resolved)

richardliaw requested changes Oct 13, 2020

richardliaw added the @author-action-required label Oct 13, 2020

amogkam and others added 5 commits October 15, 2020 09:20

Update python/ray/util/sgd/tests/test_torch.py

77b6433

Co-authored-by: Richard Liaw <rliaw@berkeley.edu>

addressing comments

6a82a38

lint

98f3724

revert checkpointing

cd9bbc9

formatting

3f37456

amogkam requested a review from richardliaw October 15, 2020 17:24

amogkam removed the @author-action-required label Oct 15, 2020

richardliaw reviewed Oct 15, 2020

python/ray/util/sgd/torch/torch_trainer.py (resolved)

Update python/ray/util/sgd/torch/torch_trainer.py

fbf86c1

richardliaw reviewed Oct 15, 2020

python/ray/util/sgd/torch/torch_trainer.py (outdated, resolved)

Update python/ray/util/sgd/torch/torch_trainer.py

5df48cf

richardliaw reviewed Oct 15, 2020

python/ray/util/sgd/torch/torch_trainer.py (resolved)

Update python/ray/util/sgd/torch/torch_trainer.py

ad58ad4

richardliaw reviewed Oct 15, 2020

python/ray/util/sgd/torch/torch_trainer.py (outdated, resolved)

richardliaw and others added 2 commits October 15, 2020 12:06

Update python/ray/util/sgd/torch/torch_trainer.py

a06c7b4

more docstring updates

488dfc4

richardliaw approved these changes Oct 15, 2020

richardliaw merged commit 38eb614 into ray-project:master Oct 15, 2020
