
Allow DDP to wrap multi-GPU modules #19271

Closed
wants to merge 1 commit

Conversation

mrshenli (Contributor) commented:

Summary: allow DDP to take multi-gpu models

Differential Revision: D14822375
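
For context, a rough sketch of the usage this change enables. The module, device placement, and names below are illustrative, not taken from this PR's diff; the sketch assumes the process group has already been initialized and that device_ids is left unset when the module itself already spans multiple GPUs.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel

# Illustrative multi-GPU module: each half lives on its own device.
class TwoGpuNet(nn.Module):
    def __init__(self, dev0, dev1):
        super(TwoGpuNet, self).__init__()
        self.dev0, self.dev1 = dev0, dev1
        self.fc1 = nn.Linear(8, 8).to(dev0)
        self.fc2 = nn.Linear(8, 4).to(dev1)

    def forward(self, x):
        x = torch.relu(self.fc1(x.to(self.dev0)))
        return self.fc2(x.to(self.dev1))

# Assumes torch.distributed.init_process_group(...) already ran in this
# process; since the module spans devices, no device_ids are passed.
model = TwoGpuNet(torch.device('cuda:0'), torch.device('cuda:1'))
ddp_model = DistributedDataParallel(model)
```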

@mrshenli changed the title from "Allow DDP to wrap multi-GPU modules" to "[WIP][Don't Review] Allow DDP to wrap multi-GPU modules" on Apr 15, 2019
@mrshenli changed the title from "[WIP][Don't Review] Allow DDP to wrap multi-GPU modules" to "Allow DDP to wrap multi-GPU modules" on Apr 16, 2019
@pietern (Contributor) left a comment:

Some minor points. Looking good overall. Glad that we'll be able to support multi-device modules here!

model = QuadraGpuNet(gpus)

ddp_model = DistributedDataParallel(
copy.deepcopy(model),
pietern (Contributor):

No need for deepcopy here?

mrshenli (Contributor, Author):

Don't we need to make sure that model and ddp_model operate on independent params so that we can compare?

pietern (Contributor):

Never mind, it's needed because of the numerical equivalence testing.
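
A minimal illustration of the point being settled here (toy module, not the actual test): copy.deepcopy gives the reference model and the DDP-wrapped model independent parameter storage, so comparing their parameters afterwards is a meaningful equivalence check.

```python
import copy
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
reference = copy.deepcopy(model)  # independent storage, identical initial values

# Mutating one copy leaves the other untouched; without the deepcopy the
# two handles would alias the same tensors and the later comparison
# between model and ddp_model would be vacuous.
with torch.no_grad():
    model.weight.add_(1.0)

print(torch.equal(model.weight, reference.weight))  # False
```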

self.assertEqual(len(gpus), 4, "expecting 4 gpus per process")
gpus = gpus[:4]
gpu_strs = list(map(lambda i: torch.device('cuda:' + str(i)), gpus))
self._test_gloo_backend(gpus, gpu_strs, True)
pietern (Contributor):

These two can be factored into another helper that calls _test_gloo_backend. At the top level it's good to have them be separate tests so that we see the ones that get skipped.

self.assertEqual(len(gpus), 4, "expecting 4 gpus per process")
gpus = gpus[:4]
gpu_strs = list(map(lambda i: torch.device('cuda:' + str(i)), gpus))
self._test_nccl_backend(gpus, gpu_strs, True)
pietern (Contributor):

These two can be factored into another helper that calls _test_nccl_backend.
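
One way the factoring suggested in the last two comments could look; this is a sketch only, and everything except _test_gloo_backend / _test_nccl_backend is hypothetical rather than part of this PR's diff.

```python
import torch

class _Sketch(object):
    def _test_gloo_backend(self, gpus, devices, multi_device):
        pass  # stand-in for the real test runner

    def _test_nccl_backend(self, gpus, devices, multi_device):
        pass  # stand-in for the real test runner

    # Shared helper: trim to four GPUs, build torch.device objects, and
    # dispatch to the backend-specific runner.
    def _test_multi_device(self, runner, gpus):
        gpus = gpus[:4]
        devices = [torch.device('cuda:' + str(i)) for i in gpus]
        runner(gpus, devices, True)

    # The top-level tests stay separate so skipped ones remain visible.
    def test_gloo_backend_4gpu_module(self):
        gpus = list(range(4))  # placeholder for the suite's GPU selection
        self._test_multi_device(self._test_gloo_backend, gpus)

    def test_nccl_backend_4gpu_module(self):
        gpus = list(range(4))  # placeholder for the suite's GPU selection
        self._test_multi_device(self._test_nccl_backend, gpus)
```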

@@ -153,11 +153,22 @@ class DistributedDataParallel(Module):

Args:
module (Module): module to be parallelized
device_ids (list of int or torch.device): CUDA devices (default: all devices)
output_device (int or torch.device): device location of output (default: device_ids[0])
device_ids (list of int or torch.device): CUDA devices. This should
pietern (Contributor):

Extra whitespace -- is this intentional?

mrshenli (Contributor, Author):

No, not intentional. I will edit, thanks!

@pietern added the labels "oncall: distributed" (add this issue/PR to the distributed oncall triage queue) and "triaged" (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Apr 17, 2019
@pietern added this to the 1.1 milestone on Apr 17, 2019
Summary:
Pull Request resolved: pytorch#19271

allow DDP to take multi-gpu models

Reviewed By: pietern

Differential Revision: D14822375

fbshipit-source-id: 8c8bcd4526643be5fa44134620d58fcf2c197238
@facebook-github-bot (Contributor) commented:

This pull request has been merged in 6732358.

zhangguanheng66 pushed a commit to zhangguanheng66/pytorch that referenced this pull request May 6, 2019
Summary:
Pull Request resolved: pytorch#19271

allow DDP to take multi-gpu models

Reviewed By: pietern

Differential Revision: D14822375

fbshipit-source-id: 1eebfaa33371766d3129f0ac6f63a573332b2f1c