Add guard for non-default stream in DDP's autograd engine callback #40115
Conversation
Closes #37790. Closes #37944.

A user may wish to run DDP's forward and backward steps under a non-default CUDA stream, such as one created with `torch.cuda.Stream()` and entered via `with torch.cuda.stream(s)`. In that case the user is responsible for synchronizing this stream with the other streams used in the program (per the documentation at https://pytorch.org/docs/stable/notes/cuda.html#cuda-semantics), but DDP currently has a bug that causes it to fail under non-default streams. If a user does the following:

```
model = DDP(...)
loss = model(input).sum()
loss.backward()
grad = model.module.weight.grad
average = dist.all_reduce(grad)
```

there is a chance that `average` and `grad` will not be equal. This is because the CUDA kernels launched by the `all_reduce` call may run before `loss.backward()`'s kernels have finished. Specifically, DDP copies the allreduced gradients back into the model parameters' gradients in an autograd engine callback, but this callback runs on the default stream. The application could also work around this by synchronizing with the default stream itself, but it should not be expected to, since the application never uses the default stream at all.

This PR fixes the issue by passing the current stream into DDP's callback.

Tested by adding a unit test, `test_DistributedDataParallel_non_default_stream`, which fails without this PR.

Differential Revision: [D22073353](https://our.internmc.facebook.com/intern/diff/D22073353/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook-specific changes or comments; please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D22073353/)!
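The shape of the fix can be sketched without CUDA (hypothetical names below, not PyTorch internals): capture whichever stream is current at the moment the callback is queued, and run the gradient copy-back under that captured stream, so it is ordered after the backward kernels the user enqueued on the same stream.

```python
# Toy sketch of the fix's shape -- hypothetical names, not PyTorch internals.
# Within one stream, queued work runs in order; across streams there is no
# ordering guarantee. Capturing the *current* stream at callback-registration
# time restores the ordering the user expects.

class StreamContext:
    current = "default"  # stands in for the thread's current CUDA stream

def queue_ddp_callback(callbacks):
    captured = StreamContext.current  # the fix: remember the user's stream
    def copy_grads_back():
        # Runs under the captured stream, so it is ordered after the
        # backward kernels the user enqueued on that same stream.
        return f"copy_grads on {captured}"
    callbacks.append(copy_grads_back)

callbacks = []
StreamContext.current = "side"      # user ran backward under a side stream
queue_ddp_callback(callbacks)
StreamContext.current = "default"   # later stream changes do not matter
assert callbacks[0]() == "copy_grads on side"
```

Before the fix, the copy-back behaved as if `captured` were hard-coded to `"default"`, which is what let `all_reduce` race ahead of `loss.backward()`.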
```cpp
const c10::impl::VirtualGuardImpl guard =
    c10::impl::VirtualGuardImpl{deviceType};
const c10::Stream currentStream =
    guard.getStream(replica.contents.device());
```
Does this still work if this is CPU-only DDP? Or is it that, because we need to support CPU DDP, we cannot directly call `getCurrentCUDAStream` from `CUDAStream`?
Yes, we could have done this with `at::cuda::getCurrentCUDAStream` and then set that as the guard below, but that wouldn't work for the CPU use case (it would lead to an error, since CUDA wouldn't have been compiled in). The approach here works for both CPU and GPU: `VirtualGuardImpl` simply dispatches the call to the relevant guard implementation, `CPUGuardImpl` for CPU and `CUDAGuardImpl` for CUDA. The former is effectively a no-op, since there is no concept of streams on CPU. CPU DDP should work fine with this, and I verified that by running the DDP CPU test as well.
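The dispatch idea described above can be illustrated with a small sketch (the class names mirror, but are not, the real c10 classes): a device-generic guard forwards stream queries to a per-backend implementation, so common DDP code never calls CUDA APIs directly and still works in CPU-only builds.

```python
# Illustrative sketch of per-device-type dispatch -- hypothetical classes,
# not the real c10 implementations. Generic code queries the stream through
# VirtualGuardImpl; the CPU backend degenerates to a no-op "default" stream.

class CPUGuardImpl:
    def get_stream(self, device):
        # CPU has no stream concept; this is effectively a no-op.
        return "cpu:default"

class CUDAGuardImpl:
    def __init__(self, current_stream_id):
        self.current_stream_id = current_stream_id
    def get_stream(self, device):
        return f"cuda:{self.current_stream_id}"

class VirtualGuardImpl:
    # This registry plays the role of c10's per-DeviceType dispatch table.
    registry = {"cpu": CPUGuardImpl(), "cuda": CUDAGuardImpl(3)}
    def __init__(self, device_type):
        self.impl = self.registry[device_type]
    def get_stream(self, device):
        return self.impl.get_stream(device)

# The same generic call works for either backend:
assert VirtualGuardImpl("cpu").get_stream(0) == "cpu:default"
assert VirtualGuardImpl("cuda").get_stream(0) == "cuda:3"
```

This is why the patch can use one code path for CPU and GPU replicas instead of guarding a direct `getCurrentCUDAStream` call behind a CUDA-availability check.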
Thanks for fixing!
This pull request has been merged in 9a3e16c.