
[wip][ci-all] fix processgroupnccl profiling #48664

Closed
wants to merge 11 commits

Conversation

rohan-varma (Member)

Differential Revision: D25250227

[ghstack-poisoned]

Fixes #{issue number}

@dr-ci

dr-ci bot commented Dec 1, 2020

💊 CI failures summary and remediations

As of commit bfc4f77 (more details on the Dr. CI page):



🕵️ 5 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 07 23:16:34 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 07 23:16:34 At: 
Dec 07 23:16:34   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:16:34   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:16:34  
Dec 07 23:16:34 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Dec 07 23:16:34  
Dec 07 23:16:34 At: 
Dec 07 23:16:34   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:16:34   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:16:34  
Dec 07 23:16:34 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Dec 07 23:16:34  
Dec 07 23:16:34 At: 
Dec 07 23:16:34   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:16:34   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:16:34  
Dec 07 23:16:34 [W tensorpipe_agent.cpp:547] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:16:35 ok (1.636s) 
Dec 07 23:16:36   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:547] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:16:36 [W tensorpipe_agent.cpp:547] RPC agent for worker3 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:16:36 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown) 
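
The repeated `Can not pickle torch.futures.Future` errors above are raised while the RPC layer serializes a response whose value is a `torch.futures.Future`; the surrounding `test_return_future_remote` lines suggest these are expected errors from negative tests rather than new breakage. A minimal sketch of the pattern that triggers this serialization error (hypothetical worker names and setup, not the actual test code):

```python
# Hypothetical minimal repro of the serialization error (not the actual
# test code): returning a torch.futures.Future from an RPC target forces
# the RPC layer to pickle it when building the response, which raises
# "RuntimeError: Can not pickle torch.futures.Future" on the callee.
import os
import torch
import torch.distributed.rpc as rpc

def returns_future():
    fut = torch.futures.Future()
    fut.set_result(torch.ones(2))
    return fut  # the Future itself becomes the response payload

def run(rank, world_size=2):
    # Launch this on two processes, e.g. with torch.multiprocessing.spawn.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29500")
    rpc.init_rpc(f"worker{rank}", rank=rank, world_size=world_size)
    if rank == 0:
        try:
            rpc.rpc_sync("worker1", returns_future)
        except RuntimeError as err:
            print("remote serialization failed:", err)
    rpc.shutdown()
```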

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX_test (2/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 07 23:44:58 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 07 23:44:58 At: 
Dec 07 23:44:58   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:44:58   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:44:58  
Dec 07 23:44:58 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Dec 07 23:44:58  
Dec 07 23:44:58 At: 
Dec 07 23:44:58   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:44:58   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:44:58  
Dec 07 23:44:58 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Dec 07 23:44:58  
Dec 07 23:44:58 At: 
Dec 07 23:44:58   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:44:58   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:44:58  
Dec 07 23:44:58 [W tensorpipe_agent.cpp:547] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:44:58 [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:44:58 /opt/conda/lib/python3.6/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /var/lib/jenkins/workspace/c10/cuda/CUDAFunctions.cpp:104.) 
Dec 07 23:44:58   return torch._C._cuda_getDeviceCount() > 0 
Dec 07 23:44:58 [W tensorpipe_agent.cpp:547] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
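
The `CUDA initialization: Found no NVIDIA driver` warning above is emitted the first time `torch.cuda.is_available()` (via `torch._C._cuda_getDeviceCount()`) runs on the nogpu CI image. A small illustrative guard for GPU-only test paths on such a machine (not the CI's actual mechanism):

```python
# Illustrative guard for GPU-only tests on a machine without an NVIDIA
# driver: torch.cuda.is_available() returns False there (after emitting
# the warning above) and the test is skipped instead of failing.
import unittest
import torch

class MaybeCudaTest(unittest.TestCase):
    @unittest.skipUnless(torch.cuda.is_available(), "needs a CUDA device and driver")
    def test_on_gpu(self):
        x = torch.ones(4, device="cuda")
        self.assertEqual(x.sum().item(), 4.0)

if __name__ == "__main__":
    unittest.main()
```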

See CircleCI build pytorch_parallelnative_linux_xenial_py3_6_gcc5_4_test (3/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 07 23:20:32 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 07 23:20:32 At: 
Dec 07 23:20:32   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:20:32   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:20:32  
Dec 07 23:20:32 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Dec 07 23:20:32  
Dec 07 23:20:32 At: 
Dec 07 23:20:32   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:20:32   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:20:32  
Dec 07 23:20:32 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Dec 07 23:20:32  
Dec 07 23:20:32 At: 
Dec 07 23:20:32   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:20:32   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:20:32  
Dec 07 23:20:32 [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:20:32 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:20:32 ok (1.836s) 
Dec 07 23:20:34   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:547] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:20:34 ok (1.836s) 

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_nogpu_NO_AVX2_test (4/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 07 23:42:07 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 07 23:42:07 At: 
Dec 07 23:42:07   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:42:07   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:42:07  
Dec 07 23:42:07 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Dec 07 23:42:07  
Dec 07 23:42:07 At: 
Dec 07 23:42:07   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:42:07   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:42:07  
Dec 07 23:42:07 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Dec 07 23:42:07  
Dec 07 23:42:07 At: 
Dec 07 23:42:07   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:42:07   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:42:07  
Dec 07 23:42:07 [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:42:07 [W tensorpipe_agent.cpp:547] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:42:07 /opt/conda/lib/python3.6/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /var/lib/jenkins/workspace/c10/cuda/CUDAFunctions.cpp:104.) 
Dec 07 23:42:07   return torch._C._cuda_getDeviceCount() > 0 
Dec 07 23:42:07 /opt/conda/lib/python3.6/site-packages/torch/cuda/__init__.py:52: UserWarning: CUDA initialization: Found no NVIDIA driver on your system. Please check that you have an NVIDIA GPU and installed a driver from http://www.nvidia.com/Download/index.aspx (Triggered internally at  /var/lib/jenkins/workspace/c10/cuda/CUDAFunctions.cpp:104.) 

See CircleCI build pytorch_paralleltbb_linux_xenial_py3_6_gcc5_4_test (5/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun)

Dec 07 23:28:26 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Dec 07 23:28:26 At: 
Dec 07 23:28:26   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:28:26   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:28:26  
Dec 07 23:28:26 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Dec 07 23:28:26  
Dec 07 23:28:26 At: 
Dec 07 23:28:26   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:28:26   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:28:26  
Dec 07 23:28:26 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future 
Dec 07 23:28:26  
Dec 07 23:28:26 At: 
Dec 07 23:28:26   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(120): serialize 
Dec 07 23:28:26   /opt/conda/lib/python3.6/site-packages/torch/distributed/rpc/internal.py(172): serialize 
Dec 07 23:28:26  
Dec 07 23:28:26 [W tensorpipe_agent.cpp:547] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:28:26 [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:28:26 ok (1.735s) 
Dec 07 23:28:28   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown) 
Dec 07 23:28:28 [W tensorpipe_agent.cpp:547] RPC agent for worker0 encountered error when reading incoming request from worker1: EOF: end of file (this is expected to happen during shutdown) 

1 job timed out:

  • pytorch_linux_xenial_cuda10_2_cudnn7_py3_multigpu_test

❄️ 5 failures tentatively classified as flaky, but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (1/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Dec 07 23:36:02 RuntimeError: Process 0 terminated or timed out after 500.07820081710815 seconds
Dec 07 23:36:02 ====================================================================== 
Dec 07 23:36:02 ERROR [500.098s]: test_grad_layout_1devicemodule_1replicaperprocess (__main__.DistributedDataParallelTest) 
Dec 07 23:36:02 ---------------------------------------------------------------------- 
Dec 07 23:36:02 Traceback (most recent call last): 
Dec 07 23:36:02   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 278, in wrapper 
Dec 07 23:36:02     self._join_processes(fn) 
Dec 07 23:36:02   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 395, in _join_processes 
Dec 07 23:36:02     self._check_return_codes(elapsed_time) 
Dec 07 23:36:02   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 436, in _check_return_codes 
Dec 07 23:36:02     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time)) 
Dec 07 23:36:02 RuntimeError: Process 0 terminated or timed out after 500.07820081710815 seconds 
Dec 07 23:36:02  
Dec 07 23:36:02 ====================================================================== 
Dec 07 23:36:02 FAIL [5.002s]: test_default_store_timeout_nccl (__main__.TimeoutTest) 
Dec 07 23:36:02 ---------------------------------------------------------------------- 
Dec 07 23:36:02 Traceback (most recent call last): 
Dec 07 23:36:02   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_utils.py", line 1404, in wrapper 
Dec 07 23:36:02     return func(*args, **kwargs) 
Dec 07 23:36:02   File "distributed/test_c10d.py", line 635, in test_default_store_timeout_nccl 
Dec 07 23:36:02     self._test_default_store_timeout("nccl") 
Dec 07 23:36:02   File "distributed/test_c10d.py", line 620, in _test_default_store_timeout 

See CircleCI build pytorch_linux_xenial_cuda11_1_cudnn8_py3_gcc7_test (2/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Dec 08 01:32:19 RuntimeError: Process 0 terminated or timed out after 400.03903794288635 seconds
Dec 08 01:32:19 ====================================================================== 
Dec 08 01:32:19 ERROR [400.065s]: test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) 
Dec 08 01:32:19 ---------------------------------------------------------------------- 
Dec 08 01:32:19 Traceback (most recent call last): 
Dec 08 01:32:19   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 278, in wrapper 
Dec 08 01:32:19     self._join_processes(fn) 
Dec 08 01:32:19   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 395, in _join_processes 
Dec 08 01:32:19     self._check_return_codes(elapsed_time) 
Dec 08 01:32:19   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 436, in _check_return_codes 
Dec 08 01:32:19     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time)) 
Dec 08 01:32:19 RuntimeError: Process 0 terminated or timed out after 400.03903794288635 seconds 
Dec 08 01:32:19  
Dec 08 01:32:19 ---------------------------------------------------------------------- 
Dec 08 01:32:19 Ran 163 tests in 6403.538s 
Dec 08 01:32:19  
Dec 08 01:32:19 FAILED (errors=4, skipped=105) 
Dec 08 01:32:19  
Dec 08 01:32:19 Generating XML reports... 
Dec 08 01:32:19 Generated XML report: test-reports/dist-nccl/TEST-TestDistBackendWithSpawn-20201207234535.xml 
Dec 08 01:32:19 Traceback (most recent call last): 
Dec 08 01:32:19   File "test/run_test.py", line 874, in <module> 

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 (3/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Dec 08 01:31:38 RuntimeError: Process 0 terminated or timed out after 400.0543022155762 seconds
Dec 08 01:31:38 ====================================================================== 
Dec 08 01:31:38 ERROR [400.080s]: test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) 
Dec 08 01:31:38 ---------------------------------------------------------------------- 
Dec 08 01:31:38 Traceback (most recent call last): 
Dec 08 01:31:38   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 278, in wrapper 
Dec 08 01:31:38     self._join_processes(fn) 
Dec 08 01:31:38   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 395, in _join_processes 
Dec 08 01:31:38     self._check_return_codes(elapsed_time) 
Dec 08 01:31:38   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 436, in _check_return_codes 
Dec 08 01:31:38     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time)) 
Dec 08 01:31:38 RuntimeError: Process 0 terminated or timed out after 400.0543022155762 seconds 
Dec 08 01:31:38  
Dec 08 01:31:38 ---------------------------------------------------------------------- 
Dec 08 01:31:38 Ran 163 tests in 6395.893s 
Dec 08 01:31:38  
Dec 08 01:31:38 FAILED (errors=4, skipped=105) 
Dec 08 01:31:38  
Dec 08 01:31:38 Generating XML reports... 
Dec 08 01:31:38 Generated XML report: test-reports/dist-nccl/TEST-TestDistBackendWithSpawn-20201207234502.xml 
Dec 08 01:31:38 Traceback (most recent call last): 
Dec 08 01:31:38   File "test/run_test.py", line 874, in <module> 

See CircleCI build pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_test (4/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Dec 08 01:27:33 RuntimeError: Process 0 terminated or timed out after 400.05966448783875 seconds
Dec 08 01:27:33 ====================================================================== 
Dec 08 01:27:33 ERROR [400.085s]: test_ddp_uneven_inputs (__main__.TestDistBackendWithSpawn) 
Dec 08 01:27:33 ---------------------------------------------------------------------- 
Dec 08 01:27:33 Traceback (most recent call last): 
Dec 08 01:27:33   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 278, in wrapper 
Dec 08 01:27:33     self._join_processes(fn) 
Dec 08 01:27:33   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 395, in _join_processes 
Dec 08 01:27:33     self._check_return_codes(elapsed_time) 
Dec 08 01:27:33   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 436, in _check_return_codes 
Dec 08 01:27:33     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time)) 
Dec 08 01:27:33 RuntimeError: Process 0 terminated or timed out after 400.05966448783875 seconds 
Dec 08 01:27:33  
Dec 08 01:27:33 ---------------------------------------------------------------------- 
Dec 08 01:27:33 Ran 163 tests in 6406.710s 
Dec 08 01:27:33  
Dec 08 01:27:33 FAILED (errors=4, skipped=105) 
Dec 08 01:27:33  
Dec 08 01:27:33 Generating XML reports... 
Dec 08 01:27:33 Generated XML report: test-reports/dist-nccl/TEST-TestDistBackendWithSpawn-20201207234046.xml 
Dec 08 01:27:33 Traceback (most recent call last): 
Dec 08 01:27:33   File "test/run_test.py", line 874, in <module> 

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_slow_test (5/5)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) ❄️

Dec 08 02:12:08 unknown file: Failure
Dec 08 02:12:00 done blocking all streams 
Dec 08 02:12:00 Starting to sleep 
Dec 08 02:12:00 done sleeping 
Dec 08 02:12:08 Rank 0 done with sleep  
Dec 08 02:12:08 RANK 0 calling syncStreams  
Dec 08 02:12:08 0 is int  
Dec 08 02:12:08  blocking stream for device 0 
Dec 08 02:12:08 1 is int  
Dec 08 02:12:08  blocking stream for device 1 
Dec 08 02:12:08 done blocking all streams 
Dec 08 02:12:08 unknown file: Failure 
Dec 08 02:12:08 C++ exception with description "NCCL communicator was aborted." thrown in the test body. 
Dec 08 02:12:08 [  FAILED  ] ProcessGroupNCCLErrorsTest.testNCCLErrorsBlocking (22881 ms) 
Dec 08 02:12:08 [ RUN      ] ProcessGroupNCCLErrorsTest.testNCCLTimedoutErrorsBlocking 
Dec 08 02:12:16 Rank 0 done with sleep  
Dec 08 02:12:16 RANK 0 calling syncStreams  
Dec 08 02:12:16 0 is int  
Dec 08 02:12:16  blocking stream for device 0 
Dec 08 02:12:16 1 is int  
Dec 08 02:12:16  blocking stream for device 1 
Dec 08 02:12:16 done blocking all streams 
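
The `NCCL communicator was aborted` failure above comes from the C++ `ProcessGroupNCCLErrorsTest` suite, which exercises blocking-wait error handling. A rough Python-side sketch of the same error path (hypothetical setup, not the actual gtest code): with `NCCL_BLOCKING_WAIT=1`, a collective that the peer never joins fails inside `wait()` once the process group timeout elapses instead of hanging the rank forever.

```python
# Hypothetical Python-side sketch of the blocking-wait NCCL error path
# (the failing test itself is C++ gtest code): with NCCL_BLOCKING_WAIT=1,
# a collective the peer never joins fails inside wait() after the
# process group timeout, rather than blocking indefinitely.
import os
from datetime import timedelta
import torch
import torch.distributed as dist

def run(rank, world_size=2):
    # Launch one process per GPU, e.g. with torch.multiprocessing.spawn;
    # MASTER_ADDR/MASTER_PORT are placeholder rendezvous settings.
    os.environ.setdefault("MASTER_ADDR", "localhost")
    os.environ.setdefault("MASTER_PORT", "29501")
    os.environ["NCCL_BLOCKING_WAIT"] = "1"
    dist.init_process_group(
        "nccl", rank=rank, world_size=world_size, timeout=timedelta(seconds=10)
    )
    tensor = torch.ones(1, device=f"cuda:{rank}")
    if rank == 0:
        # Only rank 0 issues the collective, so it can never complete;
        # blocking wait surfaces the timeout/abort as a RuntimeError here.
        work = dist.all_reduce(tensor, async_op=True)
        work.wait()
```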

Extra GitHub checks: 1 failed


ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your pull requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 69 times.

@facebook-github-bot (Contributor) left a comment


@rohan-varma has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@rohan-varma rohan-varma removed the request for review from albanD December 4, 2020 21:58
@rohan-varma rohan-varma closed this Dec 9, 2020
@facebook-github-bot facebook-github-bot deleted the ci-all/rohan/nccl_prof_fix branch January 27, 2021 18:26
Labels: cla signed, oncall: distributed