DISABLED test_barrier_timeout_group (__main__.TestDistBackendWithFork) #50699

@mrshenli

https://app.circleci.com/pipelines/github/pytorch/pytorch/260892/workflows/49f4393a-1751-4d51-9d0f-88a63100cc6f/jobs/10204991/steps

Jan 16 02:11:53 ======================================================================
Jan 16 02:11:53 ERROR [1.255s]: test_barrier_timeout_group (__main__.TestDistBackendWithFork)
Jan 16 02:11:53 ----------------------------------------------------------------------
Jan 16 02:11:53 Traceback (most recent call last):
Jan 16 02:11:53   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 300, in wrapper
Jan 16 02:11:53     self._join_processes(fn)
Jan 16 02:11:53   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 417, in _join_processes
Jan 16 02:11:53     self._check_return_codes(elapsed_time)
Jan 16 02:11:53   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 453, in _check_return_codes
Jan 16 02:11:53     raise RuntimeError(error)
Jan 16 02:11:53 RuntimeError: Processes 1 exited with error code 10
Jan 16 02:11:53 
Jan 16 02:11:53 ----------------------------------------------------------------------
Jan 16 02:11:28   test_barrier_timeout_group (__main__.TestDistBackendWithFork) ... ERROR:root:Caught exception: 
Jan 16 02:11:28 Traceback (most recent call last):
Jan 16 02:11:28   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 303, in wrapper
Jan 16 02:11:28     fn()
Jan 16 02:11:28   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 79, in wrapper
Jan 16 02:11:28     return func(*args, **kwargs)
Jan 16 02:11:28   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 555, in test_barrier_timeout_group
Jan 16 02:11:28     self._test_barrier_timeout(group_id, timeout)
Jan 16 02:11:28   File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 520, in _test_barrier_timeout
Jan 16 02:11:28     self.assertGreaterEqual(time.time(), expected_time)
Jan 16 02:11:28   File "/Users/distiller/workspace/miniconda3/lib/python3.7/unittest/case.py", line 1257, in assertGreaterEqual
Jan 16 02:11:28     self.fail(self._formatMessage(msg, standardMsg))
Jan 16 02:11:28   File "/Users/distiller/workspace/miniconda3/lib/python3.7/unittest/case.py", line 693, in fail
Jan 16 02:11:28     raise self.failureException(msg)
Jan 16 02:11:28 AssertionError: 1610763088.719677 not greater than or equal to 1610763088.744849
Jan 16 02:11:28 exiting process with exit code: 10
Jan 16 02:11:28 Process 1 terminated with exit code 10, terminating remaining processes.
Jan 16 02:11:28 ERROR (1.255s)
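
The assertion that fails above compares two `time.time()` readings, and the observed value is about 25 ms short of `expected_time` (1610763088.719677 vs. 1610763088.744849). The root cause is not established in this report. As a minimal sketch only, assuming the test computes a deadline and then checks that the timed-out call returned no earlier than that deadline, a check based on `time.monotonic()` with a small tolerance might look like the following. The names `wait_with_timeout`, `check_timeout_elapsed`, and `tolerance_s` are hypothetical and are not part of the PyTorch test suite.

```python
import time


def wait_with_timeout(timeout_s):
    """Hypothetical stand-in for a blocking call that gives up after timeout_s seconds."""
    time.sleep(timeout_s)
    raise RuntimeError("timed out")


def check_timeout_elapsed(blocking_call, timeout_s, tolerance_s=0.05):
    """Run blocking_call and verify it timed out no earlier than timeout_s - tolerance_s.

    Uses time.monotonic() so that wall-clock adjustments cannot affect the
    measured duration. tolerance_s is an assumed slack value, not a number
    taken from the PyTorch tests.
    """
    start = time.monotonic()
    try:
        blocking_call(timeout_s)
    except RuntimeError:
        pass  # the timeout is the expected outcome
    else:
        raise AssertionError("call returned without timing out")
    elapsed = time.monotonic() - start
    assert elapsed >= timeout_s - tolerance_s, (
        f"timed out after {elapsed:.6f}s, expected at least "
        f"{timeout_s - tolerance_s:.6f}s"
    )
    return elapsed


if __name__ == "__main__":
    print(f"elapsed: {check_timeout_elapsed(wait_with_timeout, 0.1):.3f}s")
```

Whether a tolerance is appropriate here depends on why the barrier returned early, so this is an illustration of the comparison rather than a proposed fix.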

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu

Labels

module: flaky-tests (Problem is a flaky test in CI)
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)