
Add torch::cuda::nccl::all2all #45900

Closed · wants to merge 1 commit

Conversation


malfet commented Oct 6, 2020

Stack from ghstack:

Use `torch::cuda::nccl::all2all` from `ProcessGroupNCCL.cpp`

Fixes #42517

Here is the NCCL dependency graph:

```
libnccl.a --> libtorch_cuda.so ---> libtorch_python.so
    |                                   ^
    |                                   |
    --------> libc10d.a -----------------
```

When a static library is linked into a dynamic library or an executable, the linker removes all unused/duplicate symbols from that library, unless the `-whole-archive` option is used. Before #42514, all NCCL calls made from `ProcessGroupNCCL.cpp` were also made from `torch/csrc/cuda/nccl.cpp`, which is compiled as part of `libtorch_cuda.so`.
But adding `ncclSend`/`ncclRecv` to `ProcessGroupNCCL.cpp` forced the linker to embed those symbols into `libtorch_python.so`, which also resulted in linking other dependent symbols into that library.

This PR adds the `ncclSend`/`ncclRecv` calls to `torch_cuda.so` by implementing `all2all` in `torch_cuda`, and thus avoids linking the static library twice.

A more involved, but more thorough, solution would be to use the wrappers exported in the `torch::cuda::nccl` namespace instead of making direct NCCL API calls.
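
For reference, here is a minimal sketch of an all-to-all built from NCCL point-to-point primitives, in the spirit of the `all2all` that this PR moves into `libtorch_cuda.so`. It is illustrative only: the function name, flat buffer layout, and omitted error handling are assumptions, not the exact code added to `torch/csrc/cuda/nccl.cpp`.

```cpp
// Illustrative sketch (not the actual PyTorch implementation).
#include <nccl.h>
#include <cuda_runtime.h>

// Exchanges `count` float elements with every rank. `send` and `recv` are
// device buffers holding nranks * count elements each; chunk `peer` of the
// send buffer goes to rank `peer`, and chunk `peer` of the receive buffer
// is filled by rank `peer`. Error checking is omitted for brevity.
void simple_all2all(
    const float* send,
    float* recv,
    size_t count,
    int nranks,
    ncclComm_t comm,
    cudaStream_t stream) {
  // Group the point-to-point calls so NCCL schedules them as one operation.
  ncclGroupStart();
  for (int peer = 0; peer < nranks; ++peer) {
    ncclSend(send + peer * count, count, ncclFloat, peer, comm, stream);
    ncclRecv(recv + peer * count, count, ncclFloat, peer, comm, stream);
  }
  ncclGroupEnd();
}
```

Because the `ncclSend`/`ncclRecv` references live in a translation unit compiled into `libtorch_cuda.so`, the linker resolves them against `libnccl.a` exactly once.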

Differential Revision: D24138011

Use it from ProcessGroupNCCL

`torch/csrc/cuda/nccl.cpp` is compiled as part of the `torch_cuda` library, so calling this function from ProcessGroupNCCL.cpp avoids linking a second instance of libnccl.a into `torch_python`.

Fixes #42517

[ghstack-poisoned]
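
For context, a hedged sketch of the call-site idea in `ProcessGroupNCCL.cpp`: the wrapper name comes from this PR, but the function signature, argument types, and helper shown here are assumptions for illustration, not the actual PyTorch API.

```cpp
// Hypothetical call-site sketch (assumed signature, not the real API).
#include <nccl.h>
#include <ATen/ATen.h>
#include <c10/cuda/CUDAStream.h>
#include <torch/csrc/cuda/nccl.h>  // declares torch::cuda::nccl::all2all

// Instead of issuing ncclSend/ncclRecv directly (which makes the objects in
// libc10d.a reference those symbols and drags a second copy of libnccl.a
// into libtorch_python.so), call the wrapper that is compiled into
// libtorch_cuda.so, so no raw NCCL point-to-point symbols are referenced here.
void launch_all2all(
    at::Tensor& input,
    at::Tensor& output,
    ncclComm_t comm,
    c10::cuda::CUDAStream& stream) {
  // Assumed call shape; the real wrapper may take extra arguments
  // (e.g., world size or split sizes) and different qualifiers.
  torch::cuda::nccl::all2all(input, output, comm, stream);
}
```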
malfet mentioned this pull request Oct 6, 2020
malfet added a commit that referenced this pull request Oct 6, 2020
Use it from ProcessGroupNCCL


ghstack-source-id: 1c5f90f544d06f88f4cc2132bd1587cbdc946911
Pull Request resolved: #45900
facebook-github-bot added the `oncall: distributed` label Oct 6, 2020
malfet requested a review from ngimel October 6, 2020 15:03
torch/csrc/cuda/comm.cpp (review comment, resolved)
@facebook-github-bot

@malfet merged this pull request in c19b9cd.


facebook-github-bot deleted the gh/malfet/2/head branch October 11, 2020 14:18
mingzhe09088 pushed a commit to mingzhe09088/pytorch-1 that referenced this pull request Dec 4, 2020
Summary:
Pull Request resolved: pytorch#45900

Use `torch::cuda::nccl::all2all` from `ProcessGroupNCCL.cpp`

Fixes pytorch#42517

Here is the NCCL dependency graph:
```
libnccl.a --> libtorch_cuda.so ---> libtorch_python.so
    |                                   ^
    |                                   |
    --------> libc10d.a -----------------
```
When a static library is linked into a dynamic library or an executable, the linker removes all unused/duplicate symbols from that library, unless the `-whole-archive` option is used. Before pytorch#42514, all NCCL calls made from `ProcessGroupNCCL.cpp` were also made from `torch/csrc/cuda/nccl.cpp`, which is compiled as part of `libtorch_cuda.so`.
But adding `ncclSend`/`ncclRecv` to `ProcessGroupNCCL.cpp` forced the linker to embed those symbols into `libtorch_python.so`, which also resulted in linking other dependent symbols into that library.

This PR adds the `ncclSend`/`ncclRecv` calls to `torch_cuda.so` by implementing `all2all` in `torch_cuda`, and thus avoids linking the static library twice.

A more involved, but more thorough, solution would be to use the wrappers exported in the `torch::cuda::nccl` namespace instead of making direct NCCL API calls.

Test Plan: Imported from OSS

Reviewed By: mingzhe09088

Differential Revision: D24138011

Pulled By: malfet

fbshipit-source-id: 33305197fc7d8707b7fd3a66b543f7733b9241a1
Labels: Merged · oncall: distributed
4 participants