[feat][fix] ShardedDDP deferred init #558
Conversation
Force-pushed from 30ea553 to 1d4ddbe
ah nice! this is a nice feature :) Just for my own understanding: if we defer the model transfer for very large models, will this come with any noticeable speed benefit by sharding on CPU -> then moving to GPU?
Good question, but I don't think it would change much. The idea here is more that it gives a little extra flexibility for this step: you can move your model to the device at any time, as long as it is on the right device by the time the step runs.
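For illustration, a minimal sketch of that flexibility, assuming a process group is already initialized and `model` is a plain CPU-resident nn.Module (the names below are illustrative, not taken from the PR):

import torch
from fairscale.optim.oss import OSS
from fairscale.nn.data_parallel import ShardedDataParallel

optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=1e-3)
ddp_model = ShardedDataParallel(model, optimizer)

# With deferred init, the move to the training device can happen any time
# before the first forward/backward/step.
ddp_model.to(torch.device("cuda"))

outputs = ddp_model(batch.to("cuda"))  # `batch` is an illustrative input tensor
outputs.sum().backward()
optimizer.step()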
def forward(self, *inputs: Any, **kwargs: Any) -> Any:
    """
    Module forward pass, handles any DDP-specific work in the background. Primes the
    backward pass for gradient reduction to the proper ranks.
    """

    # Optionally check whether the trainable parameters have changed
    # Deferred initialization, or change detection
    needs_setup = len(self._grad_hooks) == 0
This is the real PR change: defer the initialization so that we use the device at the time of the first forward pass, which is assumed to be the correct one.
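Not the exact fairscale source, just a hedged sketch of the lazy-setup pattern being described: the device-dependent state is built on the first forward pass, keyed off the absence of gradient hooks (`_setup_backward_hooks` is the existing setup method; the rest is illustrative):

def forward(self, *inputs: Any, **kwargs: Any) -> Any:
    # First forward pass: no hooks attached yet, so build the device-dependent
    # state (hooks, buckets) using the device the parameters currently live on.
    if len(self._grad_hooks) == 0:
        self._setup_backward_hooks()

    return self.module(*inputs, **kwargs)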
@@ -478,6 +483,10 @@ def _setup_backward_hooks(self) -> None:
        This makes the gradient reduction automatic whenever there's a backward pass
        """

        # Detach possible pre-existing hooks
        while len(self._grad_hooks) > 0:
As recommended by the docs, detach the old hooks before attaching the new ones. I've never seen anything go wrong with the current version, but better to follow the rules..
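A hedged sketch of that detach-then-reattach pattern, relying on the RemovableHandle objects returned by register_hook (`_grad_hooks` is assumed to hold those handles; `_make_reduction_hook` is a hypothetical per-parameter callback factory):

# Detach possible pre-existing hooks before re-attaching
while len(self._grad_hooks) > 0:
    self._grad_hooks.pop().remove()

# Attach fresh hooks on the trainable parameters
for param in filter(lambda p: p.requires_grad, self.module.parameters()):
    self._grad_hooks.append(param.register_hook(self._make_reduction_hook(param)))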
Force-pushed from 2f4ce67 to 7ca6597
linting..
LGTM
optimizer = OSS(params=model.parameters(), optim=torch.optim.SGD, lr=1e-3, momentum=0.99)
ddp_model = ShardedDataParallel(model, optimizer)

# Move the model to another device post-construction
@min-xu-ai this is the sanity check to make sure that this feature stays in
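For context, a hedged continuation of the snippet above showing roughly how such a sanity check can exercise the feature: move the wrapped model post-construction, then run a step so the buckets and hooks get created on the new device (device and shapes are illustrative):

ddp_model.to(torch.device("cuda"))

inputs = torch.randn(8, 32, device="cuda")
ddp_model(inputs).sum().backward()
optimizer.step()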
What does this PR do?
Another fix for https://fb.workplace.com/groups/778897096280386/permalink/915100272660067/ (plus an incoming Lightning fix, but this one in fairscale could be useful to others). Make sure that it's possible to change the model device after it has been wrapped by ShardedDDP. Without this PR the buckets would end up on the wrong device; not completely a "bug", I presume, because the .to() call on ShardedDDP had an assert and would have reported that this was not supported.
PR review
Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in Github issues there's a high chance it will not be merged.
cc @ananthsub @SeanNaren
Did you have fun?
Make sure you had fun coding 🙃