Closed
Labels
module: flaky-tests (Problem is a flaky test in CI)
oncall: distributed (Add this issue/PR to distributed oncall triage queue)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
Description
Jan 16 02:11:53 ======================================================================
Jan 16 02:11:53 ERROR [1.255s]: test_barrier_timeout_group (__main__.TestDistBackendWithFork)
Jan 16 02:11:53 ----------------------------------------------------------------------
Jan 16 02:11:53 Traceback (most recent call last):
Jan 16 02:11:53 File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 300, in wrapper
Jan 16 02:11:53 self._join_processes(fn)
Jan 16 02:11:53 File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 417, in _join_processes
Jan 16 02:11:53 self._check_return_codes(elapsed_time)
Jan 16 02:11:53 File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 453, in _check_return_codes
Jan 16 02:11:53 raise RuntimeError(error)
Jan 16 02:11:53 RuntimeError: Processes 1 exited with error code 10
Jan 16 02:11:53
Jan 16 02:11:53 ----------------------------------------------------------------------
Jan 16 02:11:28 test_barrier_timeout_group (__main__.TestDistBackendWithFork) ... ERROR:root:Caught exception:
Jan 16 02:11:28 Traceback (most recent call last):
Jan 16 02:11:28 File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 303, in wrapper
Jan 16 02:11:28 fn()
Jan 16 02:11:28 File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/common_distributed.py", line 79, in wrapper
Jan 16 02:11:28 return func(*args, **kwargs)
Jan 16 02:11:28 File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 555, in test_barrier_timeout_group
Jan 16 02:11:28 self._test_barrier_timeout(group_id, timeout)
Jan 16 02:11:28 File "/Users/distiller/workspace/miniconda3/lib/python3.7/site-packages/torch/testing/_internal/distributed/distributed_test.py", line 520, in _test_barrier_timeout
Jan 16 02:11:28 self.assertGreaterEqual(time.time(), expected_time)
Jan 16 02:11:28 File "/Users/distiller/workspace/miniconda3/lib/python3.7/unittest/case.py", line 1257, in assertGreaterEqual
Jan 16 02:11:28 self.fail(self._formatMessage(msg, standardMsg))
Jan 16 02:11:28 File "/Users/distiller/workspace/miniconda3/lib/python3.7/unittest/case.py", line 693, in fail
Jan 16 02:11:28 raise self.failureException(msg)
Jan 16 02:11:28 AssertionError: 1610763088.719677 not greater than or equal to 1610763088.744849
Jan 16 02:11:28 exiting process with exit code: 10
Jan 16 02:11:28 Process 1 terminated with exit code 10, terminating remaining processes.
Jan 16 02:11:28 ERROR (1.255s)
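For reference, the two timestamps in the AssertionError differ by roughly 25 ms, i.e. the `time.time()` reading at the assertion was slightly earlier than `expected_time`. The snippet below is only a sketch: it recomputes that gap from the logged values and shows one possible shape for a more tolerant check. The helper `returned_no_earlier_than`, its `tolerance_s` default, and the use of `time.monotonic` are assumptions for illustration, not the code in `distributed_test.py` and not a confirmed fix.

```python
import time

# Values copied verbatim from the AssertionError in the log above.
actual = 1610763088.719677    # time.time() reading at the assertion
expected = 1610763088.744849  # expected_time the test compared against

# How far short of expected_time the reading was, in milliseconds.
shortfall_ms = (expected - actual) * 1000
print(f"time.time() was {shortfall_ms:.1f} ms short of expected_time")


def returned_no_earlier_than(start, timeout_s, tolerance_s=0.05, clock=time.monotonic):
    """Hypothetical helper (not part of PyTorch).

    Returns True if at least (timeout_s - tolerance_s) seconds have elapsed
    since `start`, measured with `clock`. The tolerance value is an
    illustrative assumption.
    """
    return clock() - start >= timeout_s - tolerance_s
```

Whether a tolerance is appropriate here depends on what the test is meant to guarantee; the sketch only makes the size of the miss explicit.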
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @agolynski @SciPioneer @H-Huang @mrzzd @cbalioglu