Closed
Labels
high priority
module: flaky-tests (Problem is a flaky test in CI)
module: rpc (Related to RPC, distributed autograd, RRef, and distributed optimizer)
triage review
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
Jun 16 20:20:53 ======================================================================
Jun 16 20:20:53 ERROR [2.049s]: test_forward_async (__main__.RemoteModuleTestWithSpawn)
Jun 16 20:20:53 ----------------------------------------------------------------------
Jun 16 20:20:53 Traceback (most recent call last):
Jun 16 20:20:53 File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 204, in wrapper
Jun 16 20:20:53 self._join_processes(fn)
Jun 16 20:20:53 File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 306, in _join_processes
Jun 16 20:20:53 self._check_return_codes(elapsed_time)
Jun 16 20:20:53 File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 339, in _check_return_codes
Jun 16 20:20:53 raise RuntimeError(error)
Jun 16 20:20:53 RuntimeError: Processes 0 exited with error code 10
Jun 16 20:20:47 test_forward_async (__main__.RemoteModuleTestWithSpawn) ... ERROR:root:Caught exception:
Jun 16 20:20:47 Traceback (most recent call last):
Jun 16 20:20:47 File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 207, in wrapper
Jun 16 20:20:47 fn()
Jun 16 20:20:47 File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/dist_utils.py", line 93, in new_test_method
Jun 16 20:20:47 return_value = old_test_method(self, *arg, **kwargs)
Jun 16 20:20:47 File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/distributed/nn/api/remote_module_test.py", line 127, in test_forward_async
Jun 16 20:20:47 ret = ret_fut.wait()
Jun 16 20:20:47 File "/opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 168, in _handle_exception
Jun 16 20:20:47 raise result.exception_type(result.msg)
Jun 16 20:20:47 AttributeError: On WorkerInfo(id=1, name=worker1):
Jun 16 20:20:47 AttributeError("Can't get attribute '_remote_forward' on <module 'torch.distributed.nn.jit.templates.instantiated._remote_module_non_sriptable' from '/opt/conda/lib/python3.8/site-packages/torch/distributed/nn/jit/templates/instantiated/_remote_module_non_sriptable.py'> Default RPC pickler does not serialize\n function code. Ensure that UDFs are defined on both caller and\n callee modules.")
Jun 16 20:20:47 Traceback (most recent call last):
Jun 16 20:20:47 File "/opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 154, in _run_function
Jun 16 20:20:47 raise python_udf
Jun 16 20:20:47 AttributeError: Can't get attribute '_remote_forward' on <module 'torch.distributed.nn.jit.templates.instantiated._remote_module_non_sriptable' from '/opt/conda/lib/python3.8/site-packages/torch/distributed/nn/jit/templates/instantiated/_remote_module_non_sriptable.py'> Default RPC pickler does not serialize
Jun 16 20:20:47 function code. Ensure that UDFs are defined on both caller and
Jun 16 20:20:47 callee modules.
Jun 16 20:20:47
Jun 16 20:20:47 exiting process with exit code: 10
Jun 16 20:20:47 Writing /opt/conda/lib/python3.8/site-packages/torch/distributed/nn/jit/templates/instantiated/_remote_module_non_sriptable.py
Jun 16 20:20:47 Removing /opt/conda/lib/python3.8/site-packages/torch/distributed/nn/jit/templates/instantiated/_remote_module_non_sriptable.py
Jun 16 20:20:47 Process 0 terminated with exit code 10, terminating remaining processes.
Jun 16 20:20:47 ERROR (2.049s)
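The error text above says the default RPC pickler does not serialize function code, so the callee has to resolve `_remote_forward` by name in its own copy of the module. A minimal sketch of that class of failure follows, using only the standard-library `pickle` rather than the RPC pickler itself. The module name `fake_instantiated_module` is hypothetical, and whether the "Writing"/"Removing" lines in the log correspond to this situation is an assumption, not something the log confirms.

```python
import pickle
import sys
import types

MODULE_NAME = "fake_instantiated_module"  # hypothetical name, for illustration only


def demonstrate_by_reference_pickling():
    # Caller side: a module that defines _remote_forward.
    caller_mod = types.ModuleType(MODULE_NAME)
    exec("def _remote_forward(x):\n    return x + 1\n", caller_mod.__dict__)
    sys.modules[MODULE_NAME] = caller_mod

    # pickle records only the module name and the function name, not the code.
    payload = pickle.dumps(caller_mod._remote_forward)

    # Callee side (simulated): the module exists under the same name,
    # but the attribute is not defined in it.
    sys.modules[MODULE_NAME] = types.ModuleType(MODULE_NAME)
    try:
        pickle.loads(payload)
        error_message = None
    except AttributeError as exc:
        error_message = str(exc)
    finally:
        # Leave sys.modules as we found it.
        del sys.modules[MODULE_NAME]
    return payload, error_message


payload, error_message = demonstrate_by_reference_pickling()
print(error_message)
```

In this sketch, unpickling fails with an `AttributeError` that names `_remote_forward`, because the lookup by name finds the module but not the function.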
cc @ezyang @gchanan @zou3519 @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @xush6528 @jjlilley @osalpekar