Fix CUDA sync when switching streams in RPC tests #59297

lw · 2021-06-02T13:00:57Z

Stack from ghstack:

#59377 Move CUDA-related stuff of TP agent to separate file
#59376 Make CUDA serde support for TP agent pluggable
#59375 Merge TensorPipe's CPU and CUDA channel registry
#59299 Remove LazyStreamContext (2 out of 2)
#59298 Remove LazyStreamContext (1 out of 2)
#59297 Fix CUDA sync when switching streams in RPC tests
#58753 Fix race condition in TP agent
#56863 Ensure async_execution works with CUDAFuture
#57355 Avoid re-doing CUDA stream sync in OwnerRRef

PyTorch requires users to manually record tensors with the CUDA caching allocator when switching streams. We weren't doing it.

Also, the usage of an Event can be simplified by using s1.wait(s2).
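
For readers unfamiliar with the two points above, here is a minimal sketch of the pattern, not the actual test code from this PR. It assumes the Python API, where the relevant calls are `torch.Tensor.record_stream` and `torch.cuda.Stream.wait_stream` (the `s1.wait(s2)` above is read here as shorthand for the latter). The function name `add_on_side_stream` and the tensor size are made up for illustration, and the function returns `None` when PyTorch or a CUDA device is unavailable.

```python
try:
    import torch
except ImportError:  # PyTorch may not be installed where this sketch is read
    torch = None


def add_on_side_stream(n: int = 1024):
    """Hypothetical helper: compute x + 1 on a side stream.

    Returns True if the result is as expected, or None without CUDA.
    """
    if torch is None or not torch.cuda.is_available():
        return None

    main = torch.cuda.current_stream()
    side = torch.cuda.Stream()

    x = torch.ones(n, device="cuda")  # memory for x is tied to `main`

    # Make `side` wait for work already queued on `main`.
    # This replaces the longer form: record an Event on `main`,
    # then have `side` wait on that Event.
    side.wait_stream(main)

    with torch.cuda.stream(side):
        y = x + 1  # memory for y is tied to `side`
        # x is now used on a stream other than the one it was allocated on,
        # so tell the caching allocator about it.
        x.record_stream(side)

    # Switch back: `main` waits for `side`, and y is recorded on `main`.
    main.wait_stream(side)
    y.record_stream(main)

    return bool((y == 2).all().item())
```

Without the `record_stream` calls, the caching allocator could hand a freed block back out while a kernel on the other stream is still reading it, which is the kind of missing sync this PR describes.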

Differential Revision: D28832902


facebook-github-bot · 2021-06-02T13:01:03Z

💊 CI failures summary and remediations

As of commit 2dd6e82 (more details on the Dr. CI page):

2/2 failures possibly* introduced in this PR
- 1/2 non-scanned failure(s)

1 failure not recognized by patterns:

Job	Step	Action
pytorch_linux_bionic_py3_8_gcc9_coverage_test1	Run tests	🔁 rerun



mrshenli

LGTM!


facebook-github-bot · 2021-06-04T13:54:19Z

This pull request has been merged in 3e7396f.

Summary:
Pull Request resolved: pytorch#59297

PyTorch requires users to manually record tensors with the CUDA caching allocator when switching streams. We weren't doing it.

Also, the usage of an Event can be simplified by using `s1.wait(s2)`.

ghstack-source-id: 130583777
Test Plan: CI
Reviewed By: mrshenli
Differential Revision: D28832902
fbshipit-source-id: cd4f40ff811fa1b0042deedda2456e22f33b92bd

lw requested review from H-Huang, mrshenli, pritamdamania87, rohan-varma, wayi1 and zhaojuanmao as code owners June 2, 2021 13:00

facebook-github-bot added the "oncall: distributed" and "cla signed" labels Jun 2, 2021

lw added 2 commits June 3, 2021 02:25

mrshenli approved these changes Jun 3, 2021


This was referenced Jun 3, 2021

Merge TensorPipe's CPU and CUDA channel registry #59375

Closed

Make CUDA serde support for TP agent pluggable #59376

Closed

Move CUDA-related stuff of TP agent to separate file #59377

Closed

lw added 4 commits June 3, 2021 08:04

facebook-github-bot closed this in 3e7396f Jun 4, 2021

facebook-github-bot added the Merged label Jun 4, 2021

facebook-github-bot deleted the gh/lw/202/head branch June 7, 2021 14:17
