Add torch::cuda::nccl::all2all #45900
Closed
Conversation
Use it from ProcessGroupNCCL. `torch/csrc/cuda/nccl.cpp` is compiled as part of the `torch_cuda` library, so calling this function from ProcessGroupNCCL.cpp avoids linking a second instance of libnccl.a into `torch_python`. Fixes #42517 [ghstack-poisoned]
malfet requested review from mingzhe09088, mrshenli, pietern, pritamdamania87, rohan-varma and zhaojuanmao as code owners — October 6, 2020 15:02
malfet added a commit that referenced this pull request — Oct 6, 2020
Use it from ProcessGroupNCCL. `torch/csrc/cuda/nccl.cpp` is compiled as part of the `torch_cuda` library, so calling this function from ProcessGroupNCCL.cpp avoids linking a second instance of libnccl.a into `torch_python`. Fixes #42517 ghstack-source-id: 1c5f90f544d06f88f4cc2132bd1587cbdc946911 Pull Request resolved: #45900
facebook-github-bot added the oncall: distributed label (Add this issue/PR to distributed oncall triage queue) — Oct 6, 2020
mingzhe09088 reviewed — Oct 6, 2020
walterddr approved these changes — Oct 7, 2020
This was referenced Dec 4, 2020
mingzhe09088 pushed a commit to mingzhe09088/pytorch-1 that referenced this pull request — Dec 4, 2020
Summary: Pull Request resolved: pytorch#45900

Use `torch::cuda::nccl::all2all` from `ProcessGroupNCCL.cpp`. Fixes pytorch#42517

Here is a NCCL dependency graph:
```
libnccl.a --> libtorch_cuda.so ---> libtorch_python.so
    |                                       ^
    |                                       |
    --------> libc10d.a --------------------
```
When a static library is linked into a dynamic library or an executable, the linker removes all unused/duplicate symbols from that library unless the `-whole-archive` option is used. Before pytorch#42514, all NCCL calls made from `ProcessGroupNCCL.cpp` were also made from `torch/csrc/cuda/nccl.cpp`, which is compiled as part of `libtorch_cuda.so`. But adding `ncclSend`/`ncclRecv` to `ProcessGroupNCCL.cpp` forced the linker to embed those into `libtorch_python.so`, which also resulted in linking other dependent symbols into the library.

This PR adds the `nccl[Send|Recv]` calls to `torch_cuda.so` by implementing `all2all` in `torch_cuda`, and thus avoids double-linking the static library. A more involved, but less error-prone, solution would be to use the wrappers exported in the `torch::cuda::nccl` namespace instead of making direct NCCL API calls.

Test Plan: Imported from OSS

Reviewed By: mingzhe09088

Differential Revision: D24138011

Pulled By: malfet

fbshipit-source-id: 33305197fc7d8707b7fd3a66b543f7733b9241a1
Stack from ghstack:

Use `torch::cuda::nccl::all2all` from `ProcessGroupNCCL.cpp`.

Fixes #42517

Here is a NCCL dependency graph:

```
libnccl.a --> libtorch_cuda.so ---> libtorch_python.so
    |                                       ^
    |                                       |
    --------> libc10d.a --------------------
```

When a static library is linked into a dynamic library or an executable, the linker removes all unused/duplicate symbols from that library unless the `-whole-archive` option is used. Before #42514, all NCCL calls made from `ProcessGroupNCCL.cpp` were also made from `torch/csrc/cuda/nccl.cpp`, which is compiled as part of `libtorch_cuda.so`. But adding `ncclSend`/`ncclRecv` to `ProcessGroupNCCL.cpp` forced the linker to embed those into `libtorch_python.so`, which also resulted in linking other dependent symbols into the library.

This PR adds the `nccl[Send|Recv]` calls to `torch_cuda.so` by implementing `all2all` in `torch_cuda`, and thus avoids double-linking the static library.

A more involved, but less error-prone, solution would be to use the wrappers exported in the `torch::cuda::nccl` namespace instead of making direct NCCL API calls.

Differential Revision: D24138011