
pmeier (Collaborator) commented Apr 28, 2021

No description provided.

facebook-github-bot (Contributor) commented Apr 28, 2021

💊 CI failures summary and remediations

As of commit 361e065 (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



❄️ 2 failures tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test1 (1/2)

Step: "Run tests" ❄️

Apr 28 19:31:05 RuntimeError: Process 0 terminated or timed out after 110.04786825180054 seconds
Apr 28 19:31:05 ======================================================================
Apr 28 19:31:05 ERROR [110.069s]: test_nccl_high_priority_stream (__main__.TestDistBackendWithFork)
Apr 28 19:31:05 ----------------------------------------------------------------------
Apr 28 19:31:05 Traceback (most recent call last):
Apr 28 19:31:05   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 374, in wrapper
Apr 28 19:31:05     self._join_processes(fn)
Apr 28 19:31:05   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 566, in _join_processes
Apr 28 19:31:05     self._check_return_codes(elapsed_time)
Apr 28 19:31:05   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 614, in _check_return_codes
Apr 28 19:31:05     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time))
Apr 28 19:31:05 RuntimeError: Process 0 terminated or timed out after 110.04786825180054 seconds
Apr 28 19:31:05 
Apr 28 19:31:05 ----------------------------------------------------------------------
Apr 28 19:31:05 Ran 196 tests in 466.983s
Apr 28 19:31:05 
Apr 28 19:31:05 FAILED (errors=3, skipped=118)
Apr 28 19:31:05 
Apr 28 19:31:05 Generating XML reports...
Apr 28 19:31:05 Generated XML report: test-reports/dist-nccl/distributed.test_distributed_fork/TEST-TestDistBackendWithFork-20210428192318.xml
Apr 28 19:31:06 Traceback (most recent call last):
Apr 28 19:31:06   File "test/run_test.py", line 1156, in <module>

See CircleCI build pytorch_linux_xenial_cuda10_2_cudnn7_py3_gcc7_test2 (2/2)

Step: "Run tests" ❄️

Apr 28 19:23:16 RuntimeError: Process 0 terminated or timed out after 110.05276894569397 seconds
Apr 28 19:23:16 ======================================================================
Apr 28 19:23:16 ERROR [110.076s]: test_nccl_high_priority_stream (__main__.TestDistBackendWithSpawn)
Apr 28 19:23:16 ----------------------------------------------------------------------
Apr 28 19:23:16 Traceback (most recent call last):
Apr 28 19:23:16   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 374, in wrapper
Apr 28 19:23:16     self._join_processes(fn)
Apr 28 19:23:16   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 566, in _join_processes
Apr 28 19:23:16     self._check_return_codes(elapsed_time)
Apr 28 19:23:16   File "/opt/conda/lib/python3.6/site-packages/torch/testing/_internal/common_distributed.py", line 614, in _check_return_codes
Apr 28 19:23:16     raise RuntimeError('Process {} terminated or timed out after {} seconds'.format(i, elapsed_time))
Apr 28 19:23:16 RuntimeError: Process 0 terminated or timed out after 110.05276894569397 seconds
Apr 28 19:23:16 
Apr 28 19:23:16 ----------------------------------------------------------------------
Apr 28 19:23:16 Ran 196 tests in 656.247s
Apr 28 19:23:16 
Apr 28 19:23:16 FAILED (errors=4, skipped=117)
Apr 28 19:23:16 
Apr 28 19:23:16 Generating XML reports...
Apr 28 19:23:16 Generated XML report: test-reports/dist-nccl/distributed.test_distributed_spawn/TEST-TestDistBackendWithSpawn-20210428191220.xml
Apr 28 19:23:17 Traceback (most recent call last):
Apr 28 19:23:17   File "test/run_test.py", line 1156, in <module>

This comment was automatically generated by Dr. CI.


mruberry (Collaborator) left a comment

Cool!

facebook-github-bot (Contributor)

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor)

@mruberry merged this pull request in 5c68072.

krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
…pytorch#57162)

Summary: Pull Request resolved: pytorch#57162

Reviewed By: ngimel

Differential Revision: D28141902

Pulled By: mruberry

fbshipit-source-id: fd35e73e10167e3e44da4daf6582183bc4a0de7f