
[Release/1.7] Enable NCCL A2A on OSS #48857

Closed
wants to merge 2 commits into from

Commits on Dec 4, 2020

  1. Cleanup nccl.cpp (pytorch#45899)

    Summary:
    Pull Request resolved: pytorch#45899
    
    Use function polymorphism to avoid repeated casts:
    i.e. instead of writing `NCCL_CHECK(from_nccl_result(...))`, add a variant of the function that takes `ncclResult_t` as an input argument.
    Also add a non-pointer variant of `to_nccl_comm` to avoid the `*to_nccl_comm(&comm)` pattern.
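    The overload pattern described above can be sketched as follows. The type definitions here are simplified stand-ins, not the actual PyTorch implementation: the real `ncclResult_t` comes from `nccl.h`, and the torch-side error handling differs in detail.

    ```cpp
    #include <stdexcept>

    // Simplified stand-ins for the real NCCL/PyTorch types (illustration only).
    enum class ncclResult { Success = 0, UnhandledCudaError = 1 };
    using ncclResult_t = int; // raw NCCL return code

    // Hypothetical converter mirroring from_nccl_result.
    inline ncclResult from_nccl_result(ncclResult_t code) {
      return code == 0 ? ncclResult::Success : ncclResult::UnhandledCudaError;
    }

    // Original variant: takes the already-converted torch-side enum.
    inline void NCCL_CHECK(ncclResult status) {
      if (status != ncclResult::Success) {
        throw std::runtime_error("NCCL error");
      }
    }

    // New overload: accepts the raw NCCL code directly, so call sites can
    // write NCCL_CHECK(ncclSomeCall(...)) instead of
    // NCCL_CHECK(from_nccl_result(ncclSomeCall(...))).
    inline void NCCL_CHECK(ncclResult_t status) {
      NCCL_CHECK(from_nccl_result(status));
    }
    ```

    The non-pointer `to_nccl_comm` variant follows the same idea: an overload that takes the communicator by value or reference, so callers no longer need the `*to_nccl_comm(&comm)` double-step.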
    
    Test Plan: Imported from OSS
    
    Reviewed By: walterddr
    
    Differential Revision: D24138012
    
    Pulled By: malfet
    
    fbshipit-source-id: 7f62a03e108cbe455910e86e894afdd1c27e8ff1
    malfet authored and mingzhe0908 committed Dec 4, 2020
    Commit fea103d
  2. Add torch::cuda::nccl::all2all (pytorch#45900)

    Summary:
    Pull Request resolved: pytorch#45900
    
    Use `torch::cuda::nccl::all2all` from `ProcessGroupNCCL.cpp`
    
    Fixes pytorch#42517
    
    Here is a NCCL dependency graph:
    ```
    libnccl.a --> libtorch_cuda.so ---> libtorch_python.so
        |                                   ^
        |                                   |
        --------> libc10d.a -----------------
    ```
    When a static library is linked into a dynamic library or an executable, the linker removes all unused/duplicate symbols from that library, unless the `-whole-archive` option is used. Before pytorch#42514, all NCCL calls made from `ProcessGroupNCCL.cpp` were also made from `torch/csrc/cuda/nccl.cpp`, which is compiled as part of `libtorch_cuda.so`.
    But adding `ncclSend`/`ncclRecv` to `ProcessGroupNCCL.cpp` forced the linker to embed those symbols into `libtorch_python.so`, which also resulted in linking other dependent symbols into that library.
    
    This PR adds the `nccl[Send|Recv]` calls to `libtorch_cuda.so` by implementing `all2all` in `torch_cuda`, and thus avoids linking the static library twice.
    
    A more involved, but less error-prone, solution would be to use wrappers exported in the `torch::cuda::nccl` namespace instead of making direct NCCL API calls.
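    The wrapper approach can be sketched with stubbed-out primitives. All names below are simplified stand-ins: the real `all2all` takes tensors, communicators, and CUDA streams, and the real `ncclSend`/`ncclRecv` live in libnccl.

    ```cpp
    #include <utility>
    #include <vector>

    // Stand-ins for the NCCL point-to-point primitives; recording the issued
    // operations lets us observe the call schedule without a GPU.
    static std::vector<std::pair<int, bool>> g_ops; // (peer rank, is_send)

    inline void ncclSend(int peer) { g_ops.push_back({peer, true}); }
    inline void ncclRecv(int peer) { g_ops.push_back({peer, false}); }

    namespace torch { namespace cuda { namespace nccl {

    // Wrapper compiled into libtorch_cuda.so: this becomes the only place
    // that references ncclSend/ncclRecv, so the static libnccl.a is pulled
    // into a single shared library instead of being duplicated in
    // libtorch_python.so as well.
    void all2all(int nranks) {
      // The real implementation brackets this loop with
      // ncclGroupStart()/ncclGroupEnd() and passes buffers and a stream.
      for (int peer = 0; peer < nranks; ++peer) {
        ncclSend(peer);
        ncclRecv(peer);
      }
    }

    }}} // namespace torch::cuda::nccl
    ```

    `ProcessGroupNCCL.cpp` (linked into `libtorch_python.so`) then calls `torch::cuda::nccl::all2all` rather than touching `ncclSend`/`ncclRecv` directly.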
    
    Test Plan: Imported from OSS
    
    Reviewed By: mingzhe09088
    
    Differential Revision: D24138011
    
    Pulled By: malfet
    
    fbshipit-source-id: 33305197fc7d8707b7fd3a66b543f7733b9241a1
    malfet authored and mingzhe0908 committed Dec 4, 2020
    Commit 445963c