
scatter_object_list API for c10d #43930

Closed · wants to merge 9 commits

Conversation

@rohan-varma (Member) commented on Sep 1, 2020:

Stack from ghstack:

Closes #23232. As part of addressing #23232, this PR adds support for `scatter_object_list`, an API that scatters arbitrary picklable objects from the src rank to all the other ranks.

The implementation follows a similar approach to #42189. The result of the `scatter` is stored as the first element of `scatter_object_output_list`, and the src rank is expected to provide an input list, `scatter_object_input_list`, containing the objects to scatter.

Note that this API requires 1 broadcast and 2 scatters: we must first communicate the maximum object size to be scattered, which only the src rank knows, and then communicate the serialized objects themselves as well as their true sizes.
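
To make the communication pattern concrete, here is a rough sketch of the broadcast-plus-two-scatters sequence using only public `torch.distributed` primitives; this is an illustration, not the PR's actual implementation (which uses internal helpers such as `_object_to_tensor`), and the padding/layout details are assumptions.

```python
import pickle

import torch
import torch.distributed as dist

def scatter_object_sketch(output_list, input_list, src=0, group=None):
    my_rank = dist.get_rank()
    if my_rank == src:
        # Serialize each object and record its true size in bytes.
        blobs = [pickle.dumps(obj) for obj in input_list]
        sizes = [len(b) for b in blobs]
        max_size = torch.tensor([max(sizes)], dtype=torch.long)
    else:
        max_size = torch.tensor([0], dtype=torch.long)

    # (1) Broadcast the maximum object size, which only src knows.
    dist.broadcast(max_size, src=src, group=group)

    # (2) Scatter the serialized objects, each padded to max_size bytes.
    scatter_list = None
    if my_rank == src:
        scatter_list = []
        for b in blobs:
            padded = torch.zeros(max_size.item(), dtype=torch.uint8)
            padded[: len(b)] = torch.tensor(list(b), dtype=torch.uint8)
            scatter_list.append(padded)
    output_tensor = torch.empty(max_size.item(), dtype=torch.uint8)
    dist.scatter(output_tensor, scatter_list=scatter_list, src=src, group=group)

    # (3) Scatter the true sizes so each rank can trim the padding.
    size_list = None
    if my_rank == src:
        size_list = [torch.tensor([s], dtype=torch.long) for s in sizes]
    obj_size = torch.tensor([0], dtype=torch.long)
    dist.scatter(obj_size, scatter_list=size_list, src=src, group=group)

    # Deserialize back to an object, dropping the padding bytes.
    output_list[0] = pickle.loads(bytes(output_tensor[: obj_size.item()].tolist()))
```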

Note that the API is designed to match the tensor-based collectives except that it does not support async_op; for now, it is a blocking call. If we see demand for async_op, we will have to make more progress on merging work/future to support it.

It only works with Gloo, because NCCL does not support scatter.
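
For reference, end-to-end usage looks roughly like the following; the process-group setup is standard boilerplate assumed here (two processes launched with the usual env:// rendezvous), not part of this PR.

```python
import torch.distributed as dist

# Requires the Gloo backend, since NCCL does not implement scatter.
dist.init_process_group("gloo", init_method="env://")

if dist.get_rank() == 0:
    # One picklable object per rank in the group.
    scatter_object_input_list = [{"lr": 0.1}, ["a", "b"]]
else:
    scatter_object_input_list = None

scatter_object_output_list = [None]
dist.scatter_object_list(scatter_object_output_list, scatter_object_input_list, src=0)
# With world size 2: rank 0 now holds {"lr": 0.1}, rank 1 holds ["a", "b"].
```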

Differential Revision: [D23430686](https://our.internmc.facebook.com/intern/diff/D23430686/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23430686/)!

dr-ci bot commented on Sep 1, 2020:

💊 CI failures summary and remediations

As of commit 370522c (more details on the Dr. CI page):


  • 3/3 failures introduced in this PR

🕵️ 3 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_build (1/3)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Dec 04 21:51:33 /var/lib/jenkins/workspace/caffe2/utils/eigen_utils.h:113:25: error: 'RowVectorX' in namespace 'Eigen' does not name a template type 
Dec 04 21:51:33  using ERVecXU8 = Eigen::RowVectorX<uint8_t>; 
Dec 04 21:51:33                          ^ 
Dec 04 21:51:33 make[2]: *** [caffe2/CMakeFiles/torch_cpu.dir/utils/math/elementwise.cc.o] Error 1 
Dec 04 21:51:33 caffe2/CMakeFiles/torch_cpu.dir/build.make:9475: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/utils/math/elementwise.cc.o' failed 
Dec 04 21:51:33 In file included from /var/lib/jenkins/workspace/caffe2/utils/math/reduce.cc:18:0: 
Dec 04 21:51:33 /var/lib/jenkins/workspace/caffe2/utils/eigen_utils.h:110:24: error: 'RowVector' in namespace 'Eigen' does not name a template type 
Dec 04 21:51:33  using ERVecXt = Eigen::RowVector<T, Eigen::Dynamic>; 
Dec 04 21:51:33                         ^ 
Dec 04 21:51:33 /var/lib/jenkins/workspace/caffe2/utils/eigen_utils.h:113:25: error: 'RowVectorX' in namespace 'Eigen' does not name a template type 
Dec 04 21:51:33  using ERVecXU8 = Eigen::RowVectorX<uint8_t>; 
Dec 04 21:51:33                          ^ 
Dec 04 21:51:33 caffe2/CMakeFiles/torch_cpu.dir/build.make:9499: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/utils/math/reduce.cc.o' failed 
Dec 04 21:51:33 make[2]: *** [caffe2/CMakeFiles/torch_cpu.dir/utils/math/reduce.cc.o] Error 1 
Dec 04 21:51:33 CMakeFiles/Makefile2:8261: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/all' failed 
Dec 04 21:51:33 make[1]: *** [caffe2/CMakeFiles/torch_cpu.dir/all] Error 2 
Dec 04 21:51:33 make: *** [all] Error 2 
Dec 04 21:51:33 Makefile:138: recipe for target 'all' failed 
Dec 04 21:51:33 -- Building version 1.8.0a0 
THON=True -DBUILD_TEST=True -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/var/lib/jenkins/workspace/torch -DCMAKE_PREFIX_PATH=/opt/conda/lib/python3.6/site-packages -DNUMPY_INCLUDE_DIR=/opt/conda/lib/python3.6/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/opt/conda/bin/python -DPYTHON_INCLUDE_DIR=/opt/conda/include/python3.6m -DPYTHON_LIBRARY=/opt/conda/lib/libpython3.6m.so.1.0 -DTORCH_BUILD_VERSION=1.8.0a0 -DUSE_LLVM=/opt/llvm -DUSE_NUMPY=True -DWERROR=1 /var/lib/jenkins/workspace 

See CircleCI build pytorch_xla_linux_bionic_py3_6_clang9_build (2/3)

Step: "Build"

Dec 04 21:56:23 /var/lib/jenkins/workspace/caffe2/utils/eigen_utils.h:113:25: error: no template named 'RowVectorX' in namespace 'Eigen' 
Dec 04 21:56:23 using ERVecXU8 = Eigen::RowVectorX<uint8_t>; 
Dec 04 21:56:23                  ~~~~~~~^ 
Dec 04 21:56:23 2 errors generated. 
Dec 04 21:56:23 make[2]: *** [caffe2/CMakeFiles/torch_cpu.dir/utils/math/elementwise.cc.o] Error 1 
Dec 04 21:56:23 caffe2/CMakeFiles/torch_cpu.dir/build.make:5270: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/utils/math/elementwise.cc.o' failed 
Dec 04 21:56:27 In file included from /var/lib/jenkins/workspace/caffe2/utils/math/reduce.cc:18: 
Dec 04 21:56:27 /var/lib/jenkins/workspace/caffe2/utils/eigen_utils.h:110:24: error: no template named 'RowVector' in namespace 'Eigen' 
Dec 04 21:56:27 using ERVecXt = Eigen::RowVector<T, Eigen::Dynamic>; 
Dec 04 21:56:27                 ~~~~~~~^ 
Dec 04 21:56:27 /var/lib/jenkins/workspace/caffe2/utils/eigen_utils.h:113:25: error: no template named 'RowVectorX' in namespace 'Eigen' 
Dec 04 21:56:27 using ERVecXU8 = Eigen::RowVectorX<uint8_t>; 
Dec 04 21:56:27                  ~~~~~~~^ 
Dec 04 21:56:27 2 errors generated. 
Dec 04 21:56:27 make[2]: *** [caffe2/CMakeFiles/torch_cpu.dir/utils/math/reduce.cc.o] Error 1 
Dec 04 21:56:27 caffe2/CMakeFiles/torch_cpu.dir/build.make:5283: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/utils/math/reduce.cc.o' failed 
Dec 04 21:56:27 make[1]: *** [caffe2/CMakeFiles/torch_cpu.dir/all] Error 2 
Dec 04 21:56:27 CMakeFiles/Makefile2:7627: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/all' failed 
Dec 04 21:56:27 make: *** [all] Error 2 
Dec 04 21:56:27 Makefile:159: recipe for target 'all' failed 
Dec 04 21:56:27 -- Building version 1.8.0a0 

See CircleCI build pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build (3/3)

Step: "Build"

Dec 04 21:54:37 /var/lib/jenkins/workspace/caffe2/utils/eigen_utils.h:113:25: error: 'RowVectorX' in namespace 'Eigen' does not name a template type 
Dec 04 21:54:37  using ERVecXU8 = Eigen::RowVectorX<uint8_t>; 
Dec 04 21:54:37                          ^ 
Dec 04 21:54:37 make[2]: *** [caffe2/CMakeFiles/torch_cpu.dir/utils/math/elementwise.cc.o] Error 1 
Dec 04 21:54:37 caffe2/CMakeFiles/torch_cpu.dir/build.make:9475: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/utils/math/elementwise.cc.o' failed 
Dec 04 21:54:40 In file included from /var/lib/jenkins/workspace/caffe2/utils/math/reduce.cc:18:0: 
Dec 04 21:54:40 /var/lib/jenkins/workspace/caffe2/utils/eigen_utils.h:110:24: error: 'RowVector' in namespace 'Eigen' does not name a template type 
Dec 04 21:54:40  using ERVecXt = Eigen::RowVector<T, Eigen::Dynamic>; 
Dec 04 21:54:40                         ^ 
Dec 04 21:54:40 /var/lib/jenkins/workspace/caffe2/utils/eigen_utils.h:113:25: error: 'RowVectorX' in namespace 'Eigen' does not name a template type 
Dec 04 21:54:40  using ERVecXU8 = Eigen::RowVectorX<uint8_t>; 
Dec 04 21:54:40                          ^ 
Dec 04 21:54:40 make[2]: *** [caffe2/CMakeFiles/torch_cpu.dir/utils/math/reduce.cc.o] Error 1 
Dec 04 21:54:40 caffe2/CMakeFiles/torch_cpu.dir/build.make:9499: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/utils/math/reduce.cc.o' failed 
Dec 04 21:54:40 make[1]: *** [caffe2/CMakeFiles/torch_cpu.dir/all] Error 2 
Dec 04 21:54:40 CMakeFiles/Makefile2:9984: recipe for target 'caffe2/CMakeFiles/torch_cpu.dir/all' failed 
Dec 04 21:54:40 Makefile:138: recipe for target 'all' failed 
Dec 04 21:54:40 make: *** [all] Error 2 
Dec 04 21:54:40 -- Building version 1.8.0a0 
YPE=Release -DCMAKE_INSTALL_PREFIX=/var/lib/jenkins/workspace/torch -DCMAKE_PREFIX_PATH=/opt/conda/lib/python3.6/site-packages -DCUDA_NVCC_EXECUTABLE=/opt/cache/lib/nvcc -DNUMPY_INCLUDE_DIR=/opt/conda/lib/python3.6/site-packages/numpy/core/include -DPYTHON_EXECUTABLE=/opt/conda/bin/python -DPYTHON_INCLUDE_DIR=/opt/conda/include/python3.6m -DPYTHON_LIBRARY=/opt/conda/lib/libpython3.6m.so.1.0 -DTORCH_BUILD_VERSION=1.8.0a0 -DUSE_LLVM=/opt/llvm -DUSE_NUMPY=True -DWERROR=1 /var/lib/jenkins/workspace 


# Deserialize back to object
scatter_object_output_list[0] = _tensor_to_object(output_tensor, obj_tensor_size)
@rohan-varma (Member, Author) commented:

Currently we support only one object being scattered onto each rank, which is similar to the scatter tensor collective, but this could be extended so that a rank can receive multiple objects as well.
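
For intuition, the trimming that `_tensor_to_object` performs can be pictured roughly as follows; this is a sketch, since the real helper is internal to `torch.distributed`:

```python
import pickle

import torch

def tensor_to_object_sketch(tensor: torch.Tensor, size: int):
    # The received buffer is padded out to the max object size, so slice
    # it down to the object's true size before unpickling.
    return pickle.loads(bytes(tensor[:size].tolist()))
```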

@rohan-varma (Member, Author) commented:

@mrshenli Just pinging this diff since it is the last remaining object-based collectives API that we would like to support.

@mrshenli (Contributor) left a comment:

LGTM! Left some minor comments.

element will store the object scattered to this rank.
scatter_object_input_list (List[Any]): List of input objects to scatter.
Each object must be picklable. Only objects on the ``src`` rank will
be scattered, and the argument can be ``None`` for non-src ranks.
@mrshenli (Contributor) commented:

indent
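
That is, the continuation lines of the argument description should be indented one level, along the lines of:

```
scatter_object_input_list (List[Any]): List of input objects to scatter.
    Each object must be picklable. Only objects on the ``src`` rank will
    be scattered, and the argument can be ``None`` for non-src ranks.
```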

:func:`scatter_object_list` uses ``pickle`` module implicitly, which
is known to be insecure. It is possible to construct malicious pickle
data which will execute arbitrary code during unpickling. Only call this
function with data you trust.
@mrshenli (Contributor) commented:

shall we add an example section to the docstring?

@mrshenli (Contributor) commented:

Please ignore this; I saw it is added in the next PR.

"Expected argument scatter_object_output_list to be a list of size at least 1."
)

my_rank = get_rank()
@mrshenli (Contributor) commented:

do we need to pass in the group arg here?

@rohan-varma (Member, Author) replied:

Good catch; looks like I missed this in the other APIs as well. Will add them, plus tests that would have caught this.

@rohan-varma (Member, Author) replied:

Although, I guess the implementation is not buggy, since we pass the group to all the underlying comm APIs.
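
For context, the distinction under discussion: `get_rank()` returns the rank in the default (global) process group, while `get_rank(group=...)` returns the rank within that subgroup. A minimal illustration, with a hypothetical subgroup:

```python
import torch.distributed as dist

# Hypothetical subgroup containing global ranks 2 and 3.
group = dist.new_group(ranks=[2, 3])

global_rank = dist.get_rank()             # e.g. 3 in the default group
group_rank = dist.get_rank(group=group)   # e.g. 1 within the subgroup
```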

# Scatter per-object sizes to trim tensors when deserializing back to object
scatter(
@mrshenli (Contributor) commented:

(Not asking for a revision, and I am not sure whether this option would be faster or slower.) Another option is to pack this into the broadcast call so that we save one comm op, at the cost of communicating more data.

@rohan-varma (Member, Author) replied:

Sounds good; we can run a benchmark and send a follow-up PR if there is indeed a perf benefit.
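
A sketch of the packing idea, assuming `src`, `group`, and the per-object byte counts `sizes` from the surrounding implementation: broadcast one `world_size + 1` element tensor holding `[max_size, size_0, ..., size_{n-1}]`, so the separate scatter of true sizes is no longer needed.

```python
import torch
import torch.distributed as dist

world_size = dist.get_world_size(group=group)
if dist.get_rank() == src:
    # sizes[i] is the serialized byte length of the object destined for rank i.
    packed = torch.tensor([max(sizes)] + sizes, dtype=torch.long)
else:
    packed = torch.zeros(world_size + 1, dtype=torch.long)

# One broadcast now carries the padding size and every true size,
# trading one fewer comm op for world_size extra longs of payload.
dist.broadcast(packed, src=src, group=group)
max_size = packed[0].item()
my_true_size = packed[1 + dist.get_rank()].item()
```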

if self.rank == src_rank
else [None for _ in collectives_object_test_list]
)
scatter_list = scatter_list[: int(os.environ["WORLD_SIZE"])]
@mrshenli (Contributor) commented:

can this be dist.get_world_size()?
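
For reference, the suggested change would read:

```python
import torch.distributed as dist

# Use the process-group API rather than the WORLD_SIZE environment variable.
scatter_list = scatter_list[: dist.get_world_size()]
```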

rohan-varma added a commit that referenced this pull request Dec 4, 2020
Pull Request resolved: #43930

ghstack-source-id: 117904065

Differential Revision: [D23430686](https://our.internmc.facebook.com/intern/diff/D23430686/)
@rohan-varma (Member, Author) commented:

CI error is unrelated:

Dec 04 21:51:33 /var/lib/jenkins/workspace/caffe2/utils/eigen_utils.h:113:25: error: 'RowVectorX' in namespace 'Eigen' does not name a template type
Dec 04 21:51:33  using ERVecXU8 = Eigen::RowVectorX<uint8_t>;

@facebook-github-bot (Contributor) commented:

This pull request has been merged in 02d89f9.

@facebook-github-bot deleted the gh/rohan-varma/161/head branch on December 8, 2020.