[Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device (#49711)

Summary:
Pull Request resolved: #49711

`torch.cuda.synchronize` uses the current device by default. Explicitly specify this device for better readability.
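
As a minimal sketch (not part of this commit): the two forms are equivalent when `device` refers to the current CUDA device; passing it explicitly just makes the target device visible at the call site. The `device` variable below is an assumption for illustration, standing in for whatever device the hook operates on.

import torch

if torch.cuda.is_available():
    # Assumption for illustration: `device` is the device the hook works on,
    # e.g. the current device index.
    device = torch.cuda.current_device()

    # Implicit form: synchronizes the *current* device.
    torch.cuda.synchronize()

    # Explicit form: same behavior when `device` is the current device,
    # but the intended device is stated in the code.
    torch.cuda.synchronize(device)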

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 119017654

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl

buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D25672267

fbshipit-source-id: 62a2266727a2ea76175f3c438daf20951091c771
Yi Wang authored and facebook-github-bot committed Dec 23, 2020
1 parent ee27104 commit 88c33ff
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py
@@ -304,7 +304,7 @@ def decompress(fut):
         for p, q, tensor in zip(ps, qs, high_rank_tensors):
             torch.matmul(p, q.t(), out=tensor)
         if torch.cuda.is_available():
-            torch.cuda.synchronize()
+            torch.cuda.synchronize(device)
 
         if state.use_error_feedback:
             # Memorize the local errors.
@@ -494,7 +494,7 @@ def decompress(fut):
             # Memorize the local errors.
             state.error_dict[bucket_index] = input_tensor_cp - input_tensor
         if torch.cuda.is_available():
-            torch.cuda.synchronize()
+            torch.cuda.synchronize(device)
         if not state.warm_start:
             state.p_memory_dict.clear()
             state.q_memory_dict.clear()
