Have FutureNCCL record streams w/ allocator in addCallback #48496

lw · 2020-11-26T18:56:05Z

Stack from ghstack:

Add support for async callbacks in ivalue::Future #48790
Drop FutureNCCL in favor of vanilla CUDAFuture #49014
Make CUDAFuture remember and restore current device in callback #48789
Remove DataPtr extractor from CUDAFuture #48840
Cache the DataPtrs in CUDAFuture #48788
Split out reusable CUDAFuture from FutureNCCL #48506
Merge common parts of FutureNCCL into at::ivalue::Future #48505
Split FutureNCCL's CUDA-specific parts from generic future logic #48504
Support wider range of types in FutureNCCL #48502
Don't store device indices separately on FutureNCCL #48501
Add multi-GPU support to FutureNCCL #48500
Fix FutureNCCL not recording dataptrs with caching alloc in wait() #48563
Fix FutureNCCL's completed() disagreeing with wait() #48503
Record CUDA events for "follow-up" FutureNCCL inside markCompleted #48499
Use fresh stream from pool for each FutureNCCL callback #48498
Make FutureNCCL record events in current stream #48497
Have FutureNCCL record streams w/ allocator in addCallback #48496
Add some safeguards to FutureNCCL #48562
Remove NCCL dependency from PythonFutureWrapper #48495
Avoid using FutureNCCL before it's ready #48561

This commit is part of a stack that reworks FutureNCCL in order to extract a generic CUDA-aware Future subclass. The stack deliberately breaks up this transition into elementary changes, to make it easier to verify that the behavior is preserved (or to highlight how it gets changed).

There are two ways to add a callback to a Future: `then` and `addCallback` (with the former deferring to the latter). FutureNCCL only "patched" `then`, which caused `addCallback` to be unsupported. By patching `addCallback`, on the other hand, we cover both.

The high-level goal of this change though is to remove all CUDA-specific stuff from `then`, and move it to either `markCompleted` or to a wrapper around the callback. This will take a few more steps to achieve.
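
To illustrate why patching addCallback covers both entry points, here is a minimal sketch in plain C++. It is not the PyTorch implementation: `SimpleFuture`, `PatchedFuture`, and `runDemo` are hypothetical names, the real `then` also returns a new Future (omitted here), and the "prepare" log entry merely stands in for whatever CUDA-specific work the subclass would do around a callback.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a generic future; NOT the real at::ivalue::Future.
class SimpleFuture {
 public:
  virtual ~SimpleFuture() = default;

  // Single registration point: run the callback right away if the future is
  // already completed, otherwise store it until markCompleted() is called.
  virtual void addCallback(std::function<void()> cb) {
    if (completed_) {
      cb();
    } else {
      callbacks_.push_back(std::move(cb));
    }
  }

  // then() defers to addCallback(), so it picks up any override of it.
  // (Simplification: the real then() also returns a child future.)
  void then(std::function<void()> cb) {
    addCallback(std::move(cb));
  }

  // Mark the future as done and run all pending callbacks in order.
  void markCompleted() {
    completed_ = true;
    std::vector<std::function<void()>> pending;
    pending.swap(callbacks_);
    for (auto& cb : pending) {
      cb();
    }
  }

 private:
  bool completed_ = false;
  std::vector<std::function<void()>> callbacks_;
};

// Hypothetical subclass that "patches" addCallback by wrapping the callback.
// The "prepare" entry is a placeholder for subclass-specific setup.
class PatchedFuture : public SimpleFuture {
 public:
  explicit PatchedFuture(std::vector<std::string>& log) : log_(log) {}

  void addCallback(std::function<void()> cb) override {
    SimpleFuture::addCallback([this, cb = std::move(cb)]() {
      log_.push_back("prepare");
      cb();
    });
  }

 private:
  std::vector<std::string>& log_;
};

// Registers one callback through each entry point and returns the log.
std::vector<std::string> runDemo() {
  std::vector<std::string> log;
  PatchedFuture fut(log);
  fut.then([&log] { log.push_back("then"); });
  fut.addCallback([&log] { log.push_back("addCallback"); });
  fut.markCompleted();
  return log;
}
```

In this sketch the wrapper runs before both callbacks, whether they were registered through then or through addCallback. Had the subclass patched only then, a callback added directly via addCallback would have skipped the wrapper.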

Differential Revision: D25177558

mrshenli

LGTM!

dr-ci · 2020-11-29T19:57:47Z

💊 CI failures summary and remediations

As of commit b4bd461 (more details on the Dr. CI page):

2/2 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_bazel_build (1/1)

Step: "Bazel Build"

Dec 09 15:30:16 sccache: error: couldn't connect to server

Dec 09 15:30:16 +++ eval 'extract_trap_cmd ' 
Dec 09 15:30:16 ++++ extract_trap_cmd 
Dec 09 15:30:16 ++++ printf '%s\n' '' 
Dec 09 15:30:16 +++ printf '%s\n' cleanup 
Dec 09 15:30:16 ++ trap -- ' 
Dec 09 15:30:16 cleanup' EXIT 
Dec 09 15:30:16 ++ [[ pytorch-linux-xenial-py3.6-gcc7-bazel-build != *pytorch-win-* ]] 
Dec 09 15:30:16 ++ which sccache 
Dec 09 15:30:16 ++ sccache --stop-server 
Dec 09 15:30:16 Stopping sccache server... 
Dec 09 15:30:16 sccache: error: couldn't connect to server 
Dec 09 15:30:16 sccache: caused by: Connection refused (os error 111) 
Dec 09 15:30:16 ++ true 
Dec 09 15:30:16 ++ rm /var/lib/jenkins/sccache_error.log 
Dec 09 15:30:16 rm: cannot remove '/var/lib/jenkins/sccache_error.log': No such file or directory 
Dec 09 15:30:16 ++ true 
Dec 09 15:30:16 ++ [[ pytorch-linux-xenial-py3.6-gcc7-bazel-build == *rocm* ]] 
Dec 09 15:30:16 ++ SCCACHE_ERROR_LOG=/var/lib/jenkins/sccache_error.log 
Dec 09 15:30:16 ++ SCCACHE_IDLE_TIMEOUT=1200 
Dec 09 15:30:16 ++ RUST_LOG=sccache::server=error 
Dec 09 15:30:16 ++ sccache --start-server

1 failure not recognized by patterns:

Job	Step
docker-pytorch-linux-bionic-cuda10.2-cudnn7-py3.6-clang9	Check if image should be built

This comment was automatically generated by Dr. CI (expand for details).

This comment has been revised 15 times.

facebook-github-bot · 2020-12-10T13:13:00Z

This pull request has been merged in e4267eb.

lw requested review from mingzhe09088, mrshenli, pietern, pritamdamania87, rohan-varma and zhaojuanmao as code owners November 26, 2020 18:56

facebook-github-bot added the cla signed and oncall: distributed labels Nov 26, 2020

mrshenli approved these changes Nov 26, 2020

lw mentioned this pull request Nov 29, 2020

Debug CI failures of #48501 #48553

Closed

This was referenced Nov 29, 2020

Avoid using FutureNCCL before it's ready #48561

Closed

Add some safeguards to FutureNCCL #48562

Closed

Fix FutureNCCL not recording dataptrs with caching alloc in wait() #48563

Closed

lw added 2 commits November 29, 2020 14:33

This was referenced Dec 3, 2020

Cache the DataPtrs in CUDAFuture #48788

Closed

Make CUDAFuture remember and restore current device in callback #48789

Closed

Add support for async callbacks in ivalue::Future #48790

Closed

lw mentioned this pull request Dec 4, 2020

Remove DataPtr extractor from CUDAFuture #48840

Closed

lw mentioned this pull request Dec 8, 2020

Drop FutureNCCL in favor of vanilla CUDAFuture #49014

Closed

facebook-github-bot closed this in e4267eb Dec 10, 2020

facebook-github-bot added the Merged label Dec 10, 2020

facebook-github-bot deleted the gh/lw/85/head branch December 13, 2020 15:17
