
[pytorch][PR] Record FutureNCCL callback stream on CUDA caching allocator #45318

Closed
wants to merge 1 commit into from

Commits on Oct 22, 2020

  1. [pytorch][PR] Record FutureNCCL callback stream on CUDA caching allocator (#45318)
    
    Summary:
    Pull Request resolved: #45318
    
    When calling `then()` from WorkNCCL, record the input tensors' data pointers on `futureNCCLCallbackStream_` (via the CUDA caching allocator) before the input callback executes.
    
    Note that the recording cannot be added directly to the lambda used by addCallback in ProcessGroupNCCL.hpp. This is because the future value in that context has type pyobject rather than TensorList, and such a type cast would require pybind and introduce a Python dependency, which is not allowed in the c10d library.
    
    I considered creating a util function in a separate file to support this type cast and placing it under the torch/csrc directory, where a Python dependency is allowed. However, torch/csrc already depends on c10d, so this would create a circular dependency.
    
    Finally, a `record_stream_cb_` member is added to FutureNCCL, with a default value of nullptr. A default `record_stream_cb_` implementation is added to `PythonFutureWrapper`, where a Python dependency is allowed.
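
    The `record_stream_cb_` pattern described above can be sketched in plain C++, with no CUDA or pybind dependency. `FakeFuture` and the integer "data pointers" below are hypothetical stand-ins for FutureNCCL and the tensors' data pointers; the point is only the ordering (the optional recording hook runs before the user callback) and the nullptr default that a wrapper such as PythonFutureWrapper can override:

    ```cpp
    #include <functional>
    #include <iostream>
    #include <vector>

    // Hypothetical stand-in for FutureNCCL: holds an optional
    // record-stream callback, defaulting to nullptr.
    struct FakeFuture {
      using RecordStreamCb = std::function<void(const std::vector<int>&)>;

      RecordStreamCb record_stream_cb_ = nullptr;  // default: no recording
      std::vector<int> data_{1, 2, 3};             // stand-in for data pointers

      template <typename Cb>
      void then(Cb cb) {
        // Record the "data pointers" before running the user callback,
        // mirroring the ordering the commit message describes.
        if (record_stream_cb_) {
          record_stream_cb_(data_);
        }
        cb(data_);
      }
    };

    int main() {
      FakeFuture fut;
      int recorded = 0;
      int seen = 0;
      // A wrapper (analogous to PythonFutureWrapper) installs the
      // default recording implementation.
      fut.record_stream_cb_ = [&](const std::vector<int>& ptrs) {
        recorded = static_cast<int>(ptrs.size());
      };
      fut.then([&](const std::vector<int>& ptrs) {
        seen = static_cast<int>(ptrs.size());
      });
      std::cout << recorded << " " << seen << "\n";  // prints "3 3"
      return 0;
    }
    ```

    In the real change the recording would call into the CUDA caching allocator with the callback stream, but that detail is orthogonal to the callback-plumbing shown here.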
    
    In addition, a few lines are reformatted by lint.
    caffe2/torch/csrc/distributed/c10d/init.cpp is only reformatted.
    
    Closes #44203
    
    Test Plan:
    buck test mode/dev-nosan caffe2/test/distributed:c10d -- ProcessGroupNCCLTest
    buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_accumulate_gradients_no_sync_allreduce_with_then_hook
    buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_ddp_comm_hook_allreduce_with_then_hook_nccl
    
    Reviewed By: pritamdamania87
    
    Differential Revision: D23910257
    
    fbshipit-source-id: fe41338e3b5510b9d9a631f57b78134e8c7e1b12
    Yi Wang authored and facebook-github-bot committed Oct 22, 2020
    Commit 7020fea