Move allgather_coalesced implementation from Python to C++ #29059
Conversation
This pull request was exported from Phabricator. Differential Revision: D18277097
#29059 caused a broken build due to an unimplemented function in the MPI backend. Fixed here.
Force-pushed from 36b882e to efd623b.
This pull request was exported from Phabricator. Differential Revision: D18277097
Force-pushed from efd623b to 0542a6c.
This pull request was exported from Phabricator. Differential Revision: D18277097
Force-pushed from 0542a6c to b21acc4.
I should have checked CI before approving #28857.
It's all green now, so it should be good to go.
Summary:
Pull Request resolved: pytorch#29059

This is a resubmit of reverted diff D18209289 (PR pytorch#28857).

Test Plan:
buck test caffe2/test:c10d
buck test caffe2/test:distributed_gloo

Reviewed By: pietern

Differential Revision: D18277097

fbshipit-source-id: 3e16c4c5f71e5c051ffef280e021bd253caf127c
Force-pushed from b21acc4 to 557c40b.
This pull request was exported from Phabricator. Differential Revision: D18277097
This pull request has been merged in 23695ab.
@@ -1309,31 +1309,37 @@ def test_allgather_stress_cuda(self):
     def test_allgather_coalesced_checks(self):
         store = c10d.FileStore(self.file_name, self.world_size)
         pg = c10d.ProcessGroupGloo(store, self.rank, self.world_size, self.opts())
-        dummy_input = [torch.Tensor([1])]
+        dummy_input = [torch.zeros([1], dtype=torch.float32)]
Sorry -- why are these not translated exactly? torch.Tensor([1]) is torch.ones([1]), not zeros, right? Also, same with the line below: why did that change from -1 to 0?
This only tests error handling, so the underlying values here should not be important (all_gather_coalesced never copies anything in this function). I am happy to change it back if you prefer.
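As a side note for readers of this thread, a quick Python illustration of the constructor semantics in question (this snippet is illustrative and not part of the diff):

    import torch

    # torch.Tensor([1]) treats the list as data, so it holds the value 1.0 --
    # equivalent to torch.ones(1), not a zero tensor.
    a = torch.Tensor([1])
    b = torch.ones(1)
    assert torch.equal(a, b)

    # The replacement line constructs an explicit zero tensor; since the test
    # only exercises argument validation, the stored value is never read.
    c = torch.zeros([1], dtype=torch.float32)
    assert c.item() == 0.0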
@@ -203,6 +203,20 @@ inline void assertCPU(
   }
 }

+inline void assertSameDevice(
+    std::function<void(const std::string&)> fn,
+    const at::ArrayRef<at::Tensor>& tensors) {
Don't we have a TensorList for this? (Also, I wouldn't expect a const reference to it; it's trivial to copy.)
Ah, a good point. I was trying to be consistent with the other functions in this module, which mostly take a const reference to ArrayRef rather than TensorList (TensorList = ArrayRef<Tensor>).
Actually, I just need to verify tensors in a vector, so I might just accept a const ref to a vector.
Would you prefer that?
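For illustration, here is a minimal sketch of the check with TensorList passed by value, as the reviewer suggests. Only the signature appears in the diff above; the body here is an assumption, not the PR's actual implementation:

    #include <functional>
    #include <string>

    #include <ATen/ATen.h>

    // Hypothetical body for illustration only; the diff shows just the signature.
    // at::TensorList is an alias for at::ArrayRef<at::Tensor>, a cheap
    // non-owning view, so passing it by value is idiomatic.
    inline void assertSameDevice(
        std::function<void(const std::string&)> fn,
        at::TensorList tensors) {
      if (tensors.empty()) {
        return;
      }
      auto device = tensors[0].device();
      for (const auto& tensor : tensors) {
        if (tensor.device() != device) {
          fn("tensors should be on the same device");
          return;
        }
      }
    }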
Summary:
Pull Request resolved: #29059
Resubmit of reverted PR #28857.
Differential Revision: D18277097
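For context on what this PR moves into C++, a hedged sketch of how the coalesced allgather is driven from Python; the all_gather_coalesced binding name and argument order are assumptions based on the torch.distributed API of this period, not something confirmed by the diff:

    import torch
    import torch.distributed as dist

    # Assumes a process group has already been initialized, e.g. with the
    # gloo backend, before this function runs on each rank.
    def run(rank, world_size):
        input_tensors = [torch.ones(2) * rank, torch.ones(3) * rank]
        output_lists = [
            [torch.zeros(2), torch.zeros(3)] for _ in range(world_size)
        ]
        dist.all_gather_coalesced(output_lists, input_tensors)
        # After the call, output_lists[r] holds rank r's input tensors,
        # gathered in one coalesced operation instead of one call per tensor.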