[RPC Framework] Forbid process group backend in RPC #56854
Conversation
To resolve #51670, forbid the process group RPC backend. This avoids the need to check the current backend in the torch script `remote_module_template`. Differential Revision: [D27984658](https://our.internmc.facebook.com/intern/diff/D27984658/) [ghstack-poisoned]
💊 Dr. CI bot: As of commit 36f1a3f, 6 new CI failures were recognized by patterns and do not appear to be due to upstream breakages, including a failure in the "Run mypy" step.
Pull Request resolved: #56854 To resolve #51670, forbid the process group RPC backend. This avoids the need to check the current backend in the torch script `remote_module_template`. ghstack-source-id: 127345847 Differential Revision: [D27984658](https://our.internmc.facebook.com/intern/diff/D27984658/)
Yep, the plan is to keep the PG RPC backend in v1.9. If the goal is to make RemoteModule only work with the TensorPipe backend, I think it is fine.
Yes, I understand that this PR does not completely deprecate the PG backend for RPC. This is just the minimum work to support a follow-up PR. I plan to discuss with @lw and create a separate PR to completely remove the PG backend.
I might be missing something, but IIUC this PR will throw an error whenever the PG backend is used. If this is what you would like to avoid, is it acceptable to let RemoteModule error out when the backend is PG?
Curious, how does RemoteModule know whether the device map is provided? Does it read it from the RPC agent state?
Yes, per discussion with @pritamdamania87, we plan to have RemoteModule error out on the PG backend.
I agree with @mrshenli and @rohan-varma; we should not forbid the PG backend in 1.9. If some component requires the RPC agent to support CUDA tensors, we can add runtime checks for this in that component. If it helps, I think we could consider adding a new method for this.
Thanks for the suggestion! I am not sure it's worthwhile to add a new method here temporarily, only for version 1.9. If we don't want to forbid the PG backend before 1.9, we can probably bypass this issue for now. I created #56943, which always moves the forward output back to CPU for now, and added some TODOs to fix this after 1.9.
Same thought, but none of the existing getters expose it. @lw: could you provide a pointer?
@mrshenli @pritamdamania87 Doesn't dist autograd also need to retrieve the device maps from the RPC agent in order to store them for use in the backward pass? How does it do so? If there's already some code to do that, we should reuse it. If not, I'm fine with adding that method.
This will break the promise we made in #55615.
@mrshenli Thanks for referring to this issue! I don't think there is a need to forbid the PG backend at this time. Now I realize that I don't really need to check the RPC backend in RemoteModule. Since the TensorPipe backend can only send GPU tensors over the wire when a device map is set, all I need to check in RemoteModule is whether a device map is set. We can close this PR once @lw's comment #56854 (comment) is addressed.
Yes, I added support for this in #44859.
Thanks for the pointer, @lw! So it seems that all we need now is to expose a Python API for the C++ getter.
Expose a Python API to get the device map and unblock RemoteModule work. See: #56854 (comment) Additionally, add a const decorator for the C++ getter. #Original PR issue: #51670 Differential Revision: [D28070160](https://our.internmc.facebook.com/intern/diff/D28070160/) [ghstack-poisoned]
Pull Request resolved: #57179 Expose a Python API to get the device map and unblock RemoteModule work. See: #56854 (comment) Additionally, add a const decorator for the C++ getter. #Original PR issue: #51670 ghstack-source-id: 127684266 Differential Revision: [D28070160](https://our.internmc.facebook.com/intern/diff/D28070160/)
Summary: Pull Request resolved: #57179 Expose a Python API to get the device map and unblock RemoteModule work. See: #56854 (comment) Additionally, add a const decorator for the C++ getter. #Original PR issue: #51670 ghstack-source-id: 127684266 Test Plan: waitforbuildbot Reviewed By: mrshenli Differential Revision: D28070160 fbshipit-source-id: 624d14552d82b99487f72e16428fa75c7a47f61f
Abandoning this PR. There is no need to forbid the Process Group RPC backend. Whether to allow sending GPU tensors over the wire is now determined by whether a device map on the remote worker is set.
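The resolution above reduces to a single predicate: a non-empty per-destination device map means CUDA tensors may cross the wire; an empty one means outputs must be moved to CPU first. The function name below is illustrative, standing in for a check built on the Python device-map accessor added in #57179:

```python
def gpu_tensors_allowed(device_map_for_dest: dict) -> bool:
    # TensorPipe only transfers CUDA tensors to a destination worker
    # when a device map for that worker was configured. The Process
    # Group backend never has one, so it falls out of this check
    # naturally, with no need to forbid the backend itself.
    return bool(device_map_for_dest)
```

This is why the backend-type check became unnecessary: the device map alone carries the information RemoteModule needs.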
Stack from ghstack:
To resolve #51670, forbid the process group RPC backend. This avoids the need to check the current backend in the torch script `remote_module_template`. Otherwise, we need to check the current RPC backend to determine whether we want to move the forward output back to CPU.
Differential Revision: D27984658