Add default args for CUDA stream and events #52679

nikithamalgifb · 2021-02-23T17:26:42Z

Stack from ghstack:

#52679 Add default args for CUDA stream and events

[ghstack-poisoned]

ghstack-source-id: e398af5 Pull Request resolved: #52679

facebook-github-bot · 2021-02-23T17:26:52Z

💊 CI failures summary and remediations

As of commit a66d5c9 (more details on the Dr. CI page):

5/5 failures possibly* introduced in this PR
- 1/5 non-scanned failure(s)

🕵️ 4 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_xla_linux_bionic_py3_6_clang9_build (1/4)

Step: "(Optional) Merge target branch"

Automatic merge failed; fix conflicts and then commit the result.

CONFLICT (add/add): Merge conflict in .circleci/generate_config_yml.py
Auto-merging .circleci/generate_config_yml.py
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/docker_definitions.py
Auto-merging .circleci/cimodel/data/simple/docker_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc5_4_build (2/4)

Step: "(Optional) Merge target branch"

Automatic merge failed; fix conflicts and then commit the result.

CONFLICT (add/add): Merge conflict in .circleci/generate_config_yml.py
Auto-merging .circleci/generate_config_yml.py
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/docker_definitions.py
Auto-merging .circleci/cimodel/data/simple/docker_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (3/4)

Step: "Run tests"

Feb 26 08:59:56 [E request_callback_no_python.cpp:656] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future

Feb 26 08:59:56 At:
Feb 26 08:59:56   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize
Feb 26 08:59:56   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize
Feb 26 08:59:56 
Feb 26 08:59:56 [E request_callback_no_python.cpp:656] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Feb 26 08:59:56 
Feb 26 08:59:56 At:
Feb 26 08:59:56   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize
Feb 26 08:59:56   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize
Feb 26 08:59:56 
Feb 26 08:59:56 [E request_callback_no_python.cpp:656] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Feb 26 08:59:56 
Feb 26 08:59:56 At:
Feb 26 08:59:56   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize
Feb 26 08:59:56   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize
Feb 26 08:59:56 
Feb 26 08:59:56 ok (1.228s)
Feb 26 08:59:57   test_return_future_remote (__main__.ProcessGroupRpcTestWithSpawn) ... RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend.
Feb 26 08:59:57 RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend.
Feb 26 08:59:57 RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend.
Feb 26 08:59:57 RPC was initialized with the PROCESS_GROUP backend which is deprecated and slated to be removed and superseded by the TENSORPIPE backend. It is recommended to migrate to the TENSORPIPE backend.

pytorch_linux_xenial_py3_6_gcc5_4_build (4/4)

Step: "(Optional) Merge target branch"

Automatic merge failed; fix conflicts and then commit the result.

CONFLICT (add/add): Merge conflict in .circleci/generate_config_yml.py
Auto-merging .circleci/generate_config_yml.py
CONFLICT (add/add): Merge conflict in .circleci/config.yml
Auto-merging .circleci/config.yml
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/windows_build_definitions.py
Auto-merging .circleci/cimodel/data/windows_build_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/simple/docker_definitions.py
Auto-merging .circleci/cimodel/data/simple/docker_definitions.py
CONFLICT (add/add): Merge conflict in .circleci/cimodel/data/pytorch_build_data.py
Auto-merging .circleci/cimodel/data/pytorch_build_data.py
Automatic merge failed; fix conflicts and then commit the result.


Exited with code exit status 1

ci.pytorch.org: 1 failed

Failed: pr/pytorch-linux-bionic-rocm4.0.1-py3.6

This comment was automatically generated by Dr. CI (expand for details).

SplitInfinity

Just some nits

SplitInfinity · 2021-02-24T01:24:26Z

test/jit/test_cuda.py

                def forward(self):
-                    device_index = torch.cuda._current_device()
-                    s = torch.jit.cuda.Stream(device_index, 0)
+                    s = torch.jit.cuda.Stream(priority=0)


Suggested change

- s = torch.jit.cuda.Stream(priority=0)
+ s = torch.jit.cuda.Stream()

SplitInfinity · 2021-02-24T01:24:36Z

test/jit/test_cuda.py

            device_index = torch.cuda._current_device()
            s0 = torch.cuda.current_stream(device_index)
-            s1 = torch.jit.cuda.Stream(device_index, 0)
+            s1 = torch.jit.cuda.Stream(priority=0)


Suggested change

- s1 = torch.jit.cuda.Stream(priority=0)
+ s1 = torch.jit.cuda.Stream()

SplitInfinity · 2021-02-24T01:24:46Z

test/jit/test_cuda.py

        def test_event_synchronize() -> float:
-            device_index = torch.cuda._current_device()
-            s = torch.jit.cuda.Stream(device_index, 0)
+            s = torch.jit.cuda.Stream(priority=0)


Suggested change

- s = torch.jit.cuda.Stream(priority=0)
+ s = torch.jit.cuda.Stream()

SplitInfinity · 2021-02-24T01:24:56Z

test/jit/test_cuda.py

        def test_stream_synchronize() -> float:
            device_index = torch.cuda._current_device()
-            s = torch.jit.cuda.Stream(device_index, 0)
+            s = torch.jit.cuda.Stream(priority=0)


Suggested change

- s = torch.jit.cuda.Stream(priority=0)
+ s = torch.jit.cuda.Stream()

SplitInfinity · 2021-02-24T01:37:02Z

test/jit/test_cuda.py

+        def event_default_args_1() -> bool:
+            e = torch.jit.cuda.Event(blocking=True)
+            return e is not None
+
+        @torch.jit.script
+        def event_default_args_2() -> bool:
+            e = torch.jit.cuda.Event(enable_timing=True)
+            return e is not None
+
+        @torch.jit.script
+        def event_default_args_3() -> bool:
+            e = torch.jit.cuda.Event(interprocess=True)
+            return e is not None
+
+        @torch.jit.script
+        def event_default_args_4() -> bool:
+            e = torch.jit.cuda.Event(interprocess=True, blocking=True)
+            return e is not None
+
+        @torch.jit.script
+        def event_default_args_5() -> bool:
+            e = torch.jit.cuda.Event(enable_timing=True, blocking=True)


Consider using better test names here, and testing that the arguments and default arguments are being respected. That is, the event in event_default_args_3 should not be blocking, and we should be able to verify that (I think?).

Well, to verify blocking we would ideally need sleep support in JIT, and since that isn't available, it becomes much harder to verify that blocking was truly disabled. The same is true of interprocess. Some cases, such as event_default_args_5, are already covered by an existing test (test_event_wait), so I didn't add a redundant test for them.

SplitInfinity · 2021-02-24T01:38:11Z

test/jit/test_cuda.py

            device_index = torch.cuda._current_device()
            current_stream = torch.cuda.current_stream(device_index)
-            user_stream = torch.jit.cuda.Stream(device_index, 0)
+            user_stream = torch.jit.cuda.Stream(priority=0)


Suggested change

- user_stream = torch.jit.cuda.Stream(priority=0)
+ user_stream = torch.jit.cuda.Stream()

SplitInfinity · 2021-02-24T01:38:26Z

test/jit/test_cuda.py

            current_stream = torch.cuda.current_stream(device_index)
            default_stream = torch.cuda.default_stream(device_index)
-            user_stream = torch.jit.cuda.Stream(device_index, 0)
+            user_stream = torch.jit.cuda.Stream(priority=0)


Suggested change

- user_stream = torch.jit.cuda.Stream(priority=0)
+ user_stream = torch.jit.cuda.Stream()

SplitInfinity · 2021-02-24T01:38:34Z

test/jit/test_cuda.py

        def test_simple_stream():
            device_index = torch.cuda._current_device()
-            s = torch.jit.cuda.Stream(device_index, 0)
+            s = torch.jit.cuda.Stream(priority=0)


Suggested change

- s = torch.jit.cuda.Stream(priority=0)
+ s = torch.jit.cuda.Stream()
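
The repeated suggestion above rests on one idea: once a constructor has default arguments, passing a value equal to the default is redundant. The following is a minimal sketch of that idea, not code from the PR. The `Stream` class here is a hypothetical pure-Python stand-in (not `torch.jit.cuda.Stream`), its defaults are assumed to mirror eager `torch.cuda.Stream(device=None, priority=0)`, and `redundant_kwargs` is an illustrative helper name.

```python
import inspect


class Stream:
    """Hypothetical stand-in for a stream class; NOT torch.jit.cuda.Stream."""

    def __init__(self, device=None, priority=0):
        # Assumed defaults, mirroring eager torch.cuda.Stream(device=None, priority=0).
        self.device = device
        self.priority = priority


def redundant_kwargs(cls, **kwargs):
    """Return the keyword arguments whose values equal the constructor defaults."""
    params = inspect.signature(cls.__init__).parameters
    return sorted(
        name
        for name, value in kwargs.items()
        if name in params and params[name].default == value
    )


# Passing priority=0 repeats the default, so Stream() builds the same object state.
explicit = Stream(priority=0)
implicit = Stream()
print(redundant_kwargs(Stream, priority=0))
print(explicit.priority == implicit.priority)
```

Under these assumptions, the helper flags `priority=0` as redundant, which is why the shorter `Stream()` form reads better in the tests.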

ghstack-source-id: e398af5 Pull Request resolved: #52679

[ghstack-poisoned]

ghstack-source-id: 675a341 Pull Request resolved: #52679

Co-authored-by: SplitInfinity <meghanl@fb.com>

SplitInfinity

I have a few questions about the tests and how they make sure the default arguments work as expected.

As you alluded to in your comment, there isn't much we can do to really test these properties of events. Maybe it's better to have no test rather than a test that doesn't actually test anything?

SplitInfinity · 2021-02-25T20:03:48Z

test/jit/test_cuda.py

+            e.record(s0)
+            e.wait(s)
+            A = torch.rand(1000, 1000, device="cuda")
+            with torch.jit.cuda.stream(s):
+                B = torch.mm(A, A)
+            s.synchronize()
+            e.record(s0)


What does this test? I thought we would want to test that timing is disabled for this event (since that is the default argument). Would trying to call elapsed_time throw an exception that we could check for using self.assertRaisesRegex?

I tried this, but elapsed_time didn't throw any exception, so I tried to verify blocking=True instead. It is difficult, though, to prove that the event waits on stream s here.
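
For reference, the test shape the reviewer proposed (checking for an exception with `assertRaisesRegex`) could look like the sketch below. It only illustrates the pattern: `FakeEvent` is a hypothetical pure-Python stand-in, not `torch.jit.cuda.Event`, and its behavior of raising when timing is disabled is an assumption made for this example. As noted in the reply above, the real `elapsed_time` did not throw in the scripted test.

```python
import unittest


class FakeEvent:
    """Hypothetical stand-in for an event; NOT torch.jit.cuda.Event."""

    def __init__(self, enable_timing=False, blocking=False, interprocess=False):
        self.enable_timing = enable_timing
        self.blocking = blocking
        self.interprocess = interprocess

    def elapsed_time(self, end_event):
        # Assumed behavior for this sketch only: refuse to report a duration
        # unless both events were created with timing enabled.
        if not (self.enable_timing and end_event.enable_timing):
            raise RuntimeError("timing is disabled for this event")
        return 0.0


class DefaultArgsTest(unittest.TestCase):
    def test_timing_disabled_by_default(self):
        # Events built with no arguments should keep enable_timing=False.
        start, end = FakeEvent(), FakeEvent()
        with self.assertRaisesRegex(RuntimeError, "timing is disabled"):
            start.elapsed_time(end)

    def test_timing_enabled_explicitly(self):
        start, end = FakeEvent(enable_timing=True), FakeEvent(enable_timing=True)
        self.assertEqual(start.elapsed_time(end), 0.0)


def run_suite():
    """Load and run the two tests above, returning the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(DefaultArgsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_suite()
```

A test like this only has value when the object under test really does raise, which is the property that could not be established for the scripted event in this thread.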

SplitInfinity · 2021-02-25T20:05:28Z

test/jit/test_cuda.py

+            e_tik = torch.jit.cuda.Event(enable_timing=True)
+            e_tok = torch.jit.cuda.Event(enable_timing=True)
+            s0 = torch.cuda.current_stream(0)
+            e_tik.record(s0)
+            A = torch.rand(1000, 1000, device="cuda")
+            with torch.jit.cuda.stream(s0):
+                B = torch.mm(A, A)
+            s0.synchronize()
+            e_tok.record(s0)
+            e_tok.synchronize()
+            return e_tik.elapsed_time(e_tok)


What does this test? I thought we would want to test that these events are nonblocking (since that is the default argument).

It's difficult to prove that they are non-blocking because sleep is not available. So I instead checked that enable_timing=True works.

SplitInfinity · 2021-02-25T21:00:30Z

test/jit/test_cuda.py

+            e = torch.jit.cuda.Event(interprocess=True)
+            s1 = torch.jit.cuda.Stream()
+            s2 = torch.jit.cuda.Stream()
+            A = torch.rand(1000, 1000, device="cuda")
+            with torch.jit.cuda.stream(s1):
+                B = torch.mm(A, A)
+                s1.record_event(e)
+
+            with torch.jit.cuda.stream(s2):
+                C = torch.mm(A, A)
+                s2.record_event(e)
+            s1.synchronize()
+            s2.synchronize()
+


Same question here - I thought we would want to test that the event is nonblocking and has timing disabled.

I couldn't verify that the event is nonblocking and has timing disabled, so this test instead checks that interprocess was enabled. With interprocess=True, the event can be shared between two processes, so in this test I record the event on two streams.

nikithamalgifb · 2021-02-26T05:09:03Z

I have a few questions about the tests and how they make sure the default arguments work as expected.

As you alluded to in your comment, there isn't much we can do to really test these properties of events. Maybe it's better to have no test rather than a test that doesn't actually test anything?

I agree, we can probably eliminate the tests. My only objective in introducing the tests was to make sure that, even with default arguments, the event object was getting created. But I guess that doesn't add much value here. Wdyt?

SplitInfinity · 2021-02-26T05:17:46Z

Yeah, let's just remove the (new) tests.

[ghstack-poisoned]

ghstack-source-id: 60b1ee6 Pull Request resolved: #52679

nikithamalgifb · 2021-02-26T05:39:18Z

Yeah, let's just remove the (new) tests.

Done.

SplitInfinity

Make sure to address merge conflicts before merging.

nikithamalgifb · 2021-03-02T23:43:44Z

Closing this PR. The change was merged through another PR, #53025.

Add default args for CUDA stream and events

5b109b9

[ghstack-poisoned]

facebook-github-bot added the cla signed label Feb 23, 2021

nikithamalgifb added a commit that referenced this pull request Feb 23, 2021

Add default args for CUDA stream and events

b512237

ghstack-source-id: e398af5 Pull Request resolved: #52679

facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Feb 23, 2021

nikithamalgifb requested a review from SplitInfinity February 23, 2021 17:29

SplitInfinity reviewed Feb 24, 2021

nikithamalgifb added a commit that referenced this pull request Feb 24, 2021

Add default args for CUDA stream and events

ea0725d

ghstack-source-id: e398af5 Pull Request resolved: #52679

Update on "Add default args for CUDA stream and events"

95fb99c

[ghstack-poisoned]

nikithamalgifb added a commit that referenced this pull request Feb 24, 2021

Add default args for CUDA stream and events

15d8f04

ghstack-source-id: 675a341 Pull Request resolved: #52679

Update test/jit/test_cuda.py

20f8c64

Co-authored-by: SplitInfinity <meghanl@fb.com>

SplitInfinity reviewed Feb 25, 2021

Update on "Add default args for CUDA stream and events"

a66d5c9

[ghstack-poisoned]

nikithamalgifb added a commit that referenced this pull request Feb 26, 2021

Add default args for CUDA stream and events

d7287e1

ghstack-source-id: 60b1ee6 Pull Request resolved: #52679

nikithamalgifb requested a review from SplitInfinity February 26, 2021 16:41

SplitInfinity approved these changes Feb 26, 2021

nikithamalgifb mentioned this pull request Mar 1, 2021

Add default arguments to cuda stream and events #53025

Closed

nikithamalgifb closed this Mar 2, 2021

facebook-github-bot deleted the gh/nikithamalgifb/8/head branch April 2, 2021 14:17
