[dist_optim] add distributed functional Adam optimizer #50624
Conversation
Add TorchScript compatible Adam functional optimizer to distributed optimizer [ghstack-poisoned]
# Define a TorchScript compatible Functional Adam Optimizer
# where we use these optimizer in a functional way.
# Instead of using the `param.grad` when updating parameters,
# we explicitly let the user pass gradients to the `step` function
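The pattern this comment describes can be sketched as follows. This is a hypothetical minimal class for illustration only (not the actual PR code), and the update rule is simplified to plain SGD for brevity; the point is that gradients are passed to `step()` explicitly rather than read from `param.grad`.

```python
import torch

# Hypothetical minimal sketch of a "functional" optimizer: gradients are
# passed explicitly to step() instead of being read from param.grad.
# The update rule is simplified to plain SGD for brevity.
class FunctionalSketchOptimizer:
    def __init__(self, params, lr=0.1):
        self.param_group = {'params': list(params)}
        self.lr = lr

    def step(self, gradients):
        with torch.no_grad():
            for param, gradient in zip(self.param_group['params'], gradients):
                if gradient is not None:
                    param.add_(gradient, alpha=-self.lr)

p = torch.ones(3)
opt = FunctionalSketchOptimizer([p])
opt.step([torch.ones(3)])   # explicit gradient; param.grad is never touched
print(p)  # tensor([0.9000, 0.9000, 0.9000])
```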
Just to clarify: according to the comments below, "user" here means the DistributedOptimizer API, not the RPC application user, right? The optimizer call should remain the same for the RPC user?
Yes, that's right. The DistributedOptimizer API actually passes those grads to the step function; let me update the comment to clarify.
    f"Gradients length: {len(gradients)}"
)

for param, gradient in zip(self.param_group['params'], gradients):
General question: it looks like the similar code in torch/optim/adam.py uses `for p in group['params']` and then accesses the grad with `p.grad`. I'm assuming we can't do this here since we need the grads explicitly, because dist autograd doesn't populate `p.grad`?
Yeah, in a distributed autograd context we don't populate `p.grad`; instead we call `dist_autograd.get_gradients(autograd_ctx_id)` to get the gradients locally.
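For illustration: `dist_autograd.get_gradients(ctx_id)` returns a dict mapping parameter tensors to gradient tensors. Since real `dist_autograd` needs an initialized RPC worker, the sketch below fakes that dict locally and just shows how such a dict can be turned into the positional gradients list a functional `step(gradients)` expects.

```python
import torch

# Hedged sketch: dist_autograd.get_gradients(ctx_id) returns a
# Dict[Tensor, Tensor] mapping parameter -> gradient. We fake that dict
# locally (real usage requires an initialized RPC worker) and build the
# positional gradients list that step(gradients) would consume.
params = [torch.zeros(2), torch.ones(2)]
grads_dict = {params[0]: torch.full((2,), 0.5)}  # second param got no gradient

# Tensors hash by identity, so looking up a parameter in the dict works.
gradients = [grads_dict.get(p) for p in params]
print(gradients[1])  # None: params without gradients map to None
```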
LGTM overall, although I mostly compared the changes to `torch/optim/adam.py` and `torch/distributed/optim/functional_adagrad.py` and checked for parity. I don't have context on the changes in `torch/optim/functional.py`, so please get someone to look at that.
@pritamdamania87 Would be great if you get a chance to take a look at these changes as well.
# update the steps for each param group update
state['step'] += 1
# record the step after step update
state_steps.append(state['step'].item())
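The step-counter bookkeeping quoted above can be demonstrated in isolation. In this sketch the per-parameter state keeps `step` as a tensor (a TorchScript-friendly representation), increments it in place, and extracts the Python number with `.item()` for the functional update:

```python
import torch

# Minimal illustration of the quoted step-counter logic: 'step' is stored
# as a Tensor, incremented in place each optimizer step, and .item()
# extracts the plain Python number appended to state_steps.
state = {'step': torch.tensor(0.0)}
state_steps = []

for _ in range(3):  # simulate three optimizer steps
    state['step'] += 1
    state_steps.append(state['step'].item())

print(state_steps)  # [1.0, 2.0, 3.0]
```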
I'm guessing all the logic up until the point where we call `F.adam` is aiming to emulate `torch/optim/adam.py`, but is there any automated way to guarantee this? Could we dedupe the similar parts into helper functions and call those helper functions here? Alternatively, are we guaranteed that the dist optimizer tests will raise an error if the implementations diverge at all?
Yes, they are indeed similar, but they differ across optimizers since each optimizer needs different states (and this differs from the original adam.py as well, because of TorchScript limitations on some syntax), so it's hard to generalize across them. That said, I think we can guarantee the implementation will not diverge in the functional part, since we share the computation, and `test_optim` has good coverage of it. On the state-management side, do you think we should introduce some sort of flag to enable/disable the TorchScript support and compare the results in the test?
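One way to guard against the divergence discussed here is a parity check against a hand-computed reference. The sketch below is hypothetical test code, not from the PR: it runs a single step of the public `torch.optim.Adam` and compares against Adam's bias-corrected update formula computed by hand; the same pattern could compare the functional and regular implementations.

```python
import torch

# Hypothetical parity test: one torch.optim.Adam step vs. a hand-computed
# single-step Adam update (bias-corrected, matching torch.optim.Adam's math).
torch.manual_seed(0)
p = torch.randn(4, requires_grad=True)
p_ref = p.detach().clone()
g = torch.randn(4)
p.grad = g.clone()

lr, b1, b2, eps = 1e-3, 0.9, 0.999, 1e-8
torch.optim.Adam([p], lr=lr, betas=(b1, b2), eps=eps).step()

# Manual first step: m and v start at zero, then apply bias correction.
m = (1 - b1) * g
v = (1 - b2) * g * g
denom = (v / (1 - b2)).sqrt() + eps
expected = p_ref - lr * (m / (1 - b1)) / denom

assert torch.allclose(p.detach(), expected, atol=1e-6)
```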
@@ -199,65 +198,12 @@ def test_dist_optim(self):
        self.assertEqual(new_w1, module1.get_w())
        self.assertEqual(new_w2, module2.get_w())

    @dist_init()
    def test_dist_optim(self):
        self._test_dist_optim_base(optim.SGD, lr=0.05)
Can we just move this to `test_dist_optim_functional`?
Ah, never mind; this is the regular optimizer, not the TorchScripted one.
Add TorchScript compatible Adam functional optimizer to distributed optimizer Differential Revision: [D25932770](https://our.internmc.facebook.com/intern/diff/D25932770) [ghstack-poisoned]
Stack from ghstack:
Add TorchScript compatible Adam functional optimizer to distributed optimizer
Differential Revision: D25932770