Comprehensively test NCCL's `get_future()` API #56838

rohan-varma · 2021-04-23T23:07:39Z

🚀 Feature

In `ProcessGroupNCCL`, we added a `get_future()` API to support gradient compression use cases: when implementing a custom gradient compression algorithm, a user can call `get_future()` to schedule additional callbacks.
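For context, here is a minimal sketch of that use case. It assumes a recent PyTorch release (where `GradBucket.buffer()` and Gloo support for `get_future()` on collectives are available) and uses a single-process Gloo group so it runs on CPU; the hook body would be the same under NCCL. `fp16_compress_sketch` and `run_demo` are hypothetical names. The hook follows the same shape as the built-in `fp16_compress_hook` in `torch.distributed.algorithms.ddp_comm_hooks.default_hooks`: compress the bucket, launch an async `all_reduce`, and decompress in a callback chained onto the future returned by `get_future()`.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def fp16_compress_sketch(state, bucket):
    """Hypothetical DDP comm hook: compress a gradient bucket to fp16,
    all-reduce it, then decompress in a callback chained on the future."""
    world_size = dist.get_world_size()
    buffer = bucket.buffer()  # flattened fp32 gradients for this bucket
    compressed = buffer.to(torch.float16).div_(world_size)
    fut = dist.all_reduce(compressed, async_op=True).get_future()

    def decompress(f):
        # The all_reduce future's value is a list holding the reduced tensor.
        buffer.copy_(f.value()[0])
        return buffer

    # DDP waits on the returned future before writing the bucket's gradients.
    return fut.then(decompress)


def run_demo():
    # Assumption: single process, in-memory store, Gloo on CPU.
    dist.init_process_group("gloo", store=dist.HashStore(), rank=0, world_size=1)
    try:
        torch.manual_seed(0)
        model = nn.Linear(8, 4)
        reference = nn.Linear(8, 4)
        reference.load_state_dict(model.state_dict())

        ddp_model = DDP(model)
        ddp_model.register_comm_hook(state=None, hook=fp16_compress_sketch)

        x = torch.randn(16, 8)
        ddp_model(x).sum().backward()
        reference(x).sum().backward()
        # Pair each DDP gradient with the gradient from the plain model.
        return [
            (p.grad.clone(), q.grad.clone())
            for p, q in zip(ddp_model.parameters(), reference.parameters())
        ]
    finally:
        dist.destroy_process_group()
```

With `world_size=1` the all-reduce leaves the values unchanged, so the DDP gradients should match the plain model's gradients up to fp16 rounding; exercising real communication would require launching several ranks.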

However, `get_future()` could be more generally useful: today a future is created for every NCCL collective as well as for the `recv` point-to-point (p2p) op, yet the API does not appear to be tested anywhere. It would be great to add tests that call `get_future()`, enqueue more CUDA operations on the result, and verify that all synchronization happens correctly, to ensure the API works as expected.
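One possible shape for such a test, sketched under assumptions: a single-process group (`world_size=1`, so `all_reduce` leaves the tensor unchanged), using NCCL on GPU 0 when CUDA is available and Gloo on CPU otherwise. `check_get_future_chaining` is a hypothetical helper. It chains two callbacks onto the all-reduce future, each launching further tensor ops, and checks the final value. A comprehensive suite would also need multiple ranks, the other collectives, `recv`, and non-default streams.

```python
import torch
import torch.distributed as dist


def check_get_future_chaining(n=1024):
    """Hypothetical test helper: chain tensor ops onto an all_reduce future."""
    use_nccl = torch.cuda.is_available() and dist.is_nccl_available()
    backend = "nccl" if use_nccl else "gloo"
    device = torch.device("cuda", 0) if use_nccl else torch.device("cpu")
    if use_nccl:
        torch.cuda.set_device(device)

    # Assumption: single process with an in-memory store.
    dist.init_process_group(backend, store=dist.HashStore(), rank=0, world_size=1)
    try:
        t = torch.arange(n, dtype=torch.float32, device=device)
        fut = dist.all_reduce(t, async_op=True).get_future()

        def scale(f):
            # The value is a list holding the reduced tensor. For CUDA
            # futures, PyTorch runs this callback on a stream that waits on
            # the collective, so the multiply is ordered after all_reduce.
            return f.value()[0] * 2

        def shift(f):
            # The previous callback's future holds a single tensor.
            return f.value() + 1

        # wait() makes the current stream wait on the chained work; .cpu()
        # then copies the result to the host.
        result = fut.then(scale).then(shift).wait()
        expected = torch.arange(n, dtype=torch.float32, device=device) * 2 + 1
        return result.cpu(), expected.cpu()
    finally:
        dist.destroy_process_group()
```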

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu @gcramer23


rohan-varma · 2021-05-07T18:03:13Z

@agolynski Do you have any thoughts on this issue? Can it be assigned to you?

rohan-varma added the `oncall: distributed` label (Add this issue/PR to distributed oncall triage queue) Apr 23, 2021

rohan-varma added the `triaged` label (This issue has been looked at a team member, and triaged and prioritized into an appropriate module) May 7, 2021

rohan-varma added the `better-engineering` label (Relatively self-contained tasks for better engineering contributors) Oct 12, 2021

rohan-varma added the `pt_distributed_rampup` label (Ramp up tasks for new developers on PT distributed) Nov 5, 2021
