
DISABLED test_forward_async (__main__.RemoteModuleTestWithSpawn) #40120

@mrshenli


https://app.circleci.com/pipelines/github/pytorch/pytorch/181824/workflows/d531675b-933d-424a-ab93-ad9b739a81e8/jobs/5875082/steps

Jun 16 20:20:53 ======================================================================
Jun 16 20:20:53 ERROR [2.049s]: test_forward_async (__main__.RemoteModuleTestWithSpawn)
Jun 16 20:20:53 ----------------------------------------------------------------------
Jun 16 20:20:53 Traceback (most recent call last):
Jun 16 20:20:53   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 204, in wrapper
Jun 16 20:20:53     self._join_processes(fn)
Jun 16 20:20:53   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 306, in _join_processes
Jun 16 20:20:53     self._check_return_codes(elapsed_time)
Jun 16 20:20:53   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 339, in _check_return_codes
Jun 16 20:20:53     raise RuntimeError(error)
Jun 16 20:20:53 RuntimeError: Processes 0 exited with error code 10
Jun 16 20:20:47   test_forward_async (__main__.RemoteModuleTestWithSpawn) ... ERROR:root:Caught exception: 
Jun 16 20:20:47 Traceback (most recent call last):
Jun 16 20:20:47   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/common_distributed.py", line 207, in wrapper
Jun 16 20:20:47     fn()
Jun 16 20:20:47   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/dist_utils.py", line 93, in new_test_method
Jun 16 20:20:47     return_value = old_test_method(self, *arg, **kwargs)
Jun 16 20:20:47   File "/opt/conda/lib/python3.8/site-packages/torch/testing/_internal/distributed/nn/api/remote_module_test.py", line 127, in test_forward_async
Jun 16 20:20:47     ret = ret_fut.wait()
Jun 16 20:20:47   File "/opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 168, in _handle_exception
Jun 16 20:20:47     raise result.exception_type(result.msg)
Jun 16 20:20:47 AttributeError: On WorkerInfo(id=1, name=worker1):
Jun 16 20:20:47 AttributeError("Can't get attribute '_remote_forward' on <module 'torch.distributed.nn.jit.templates.instantiated._remote_module_non_sriptable' from '/opt/conda/lib/python3.8/site-packages/torch/distributed/nn/jit/templates/instantiated/_remote_module_non_sriptable.py'> Default RPC pickler does not serialize\n            function code. Ensure that UDFs are defined on both caller and\n            callee modules.")
Jun 16 20:20:47 Traceback (most recent call last):
Jun 16 20:20:47   File "/opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py", line 154, in _run_function
Jun 16 20:20:47     raise python_udf
Jun 16 20:20:47 AttributeError: Can't get attribute '_remote_forward' on <module 'torch.distributed.nn.jit.templates.instantiated._remote_module_non_sriptable' from '/opt/conda/lib/python3.8/site-packages/torch/distributed/nn/jit/templates/instantiated/_remote_module_non_sriptable.py'> Default RPC pickler does not serialize
Jun 16 20:20:47             function code. Ensure that UDFs are defined on both caller and
Jun 16 20:20:47             callee modules.
Jun 16 20:20:47 
Jun 16 20:20:47 exiting process with exit code: 10
Jun 16 20:20:47 Writing /opt/conda/lib/python3.8/site-packages/torch/distributed/nn/jit/templates/instantiated/_remote_module_non_sriptable.py
Jun 16 20:20:47 Removing /opt/conda/lib/python3.8/site-packages/torch/distributed/nn/jit/templates/instantiated/_remote_module_non_sriptable.py
Jun 16 20:20:47 Process 0 terminated with exit code 10, terminating remaining processes.
Jun 16 20:20:47 ERROR (2.049s)
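The worker1 error says the default RPC pickler "does not serialize function code", so the callee has to resolve `_remote_forward` by name inside its own copy of the instantiated template module. The sketch below shows that by-reference behavior using only the standard-library `pickle` module. It is an illustration of plain `pickle` semantics, not of PyTorch's internal RPC pickler, and the module name `hypothetical_generated_module` is a made-up stand-in for the generated `_remote_module_non_sriptable` file. It does not claim to reproduce the CI failure itself.

```python
import pickle
import sys
import types

# Hypothetical stand-in for a module generated at runtime (like the
# instantiated remote-module template in the log). Not a PyTorch name.
MOD_NAME = "hypothetical_generated_module"


def install_module() -> types.ModuleType:
    """Create a module at runtime, define a UDF in it, and register it."""
    mod = types.ModuleType(MOD_NAME)
    # exec() into the module's namespace so the function's __module__ is
    # MOD_NAME and its __qualname__ is plain "_remote_forward".
    exec("def _remote_forward(x):\n    return x * 2\n", mod.__dict__)
    sys.modules[MOD_NAME] = mod
    return mod


def demo():
    mod = install_module()
    # "Caller" side: pickle records only the module name and qualified
    # name of the function, not its bytecode.
    payload = pickle.dumps(mod._remote_forward)

    # "Callee" whose module defines the attribute: name lookup succeeds.
    ok_result = pickle.loads(payload)(21)

    # "Callee" whose copy of the module lacks the attribute: name lookup
    # fails with an AttributeError, the same kind of error worker1 raised.
    del mod._remote_forward
    error = None
    try:
        pickle.loads(payload)
    except AttributeError as exc:
        error = exc
    finally:
        sys.modules.pop(MOD_NAME, None)
    return ok_result, error


if __name__ == "__main__":
    result, err = demo()
    print(result)
    print(type(err).__name__, err)
```

Because only the name travels over the wire, anything that leaves the callee's module without the function at lookup time would produce this error, regardless of whether the caller's copy was correct.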

cc @ezyang @gchanan @zou3519 @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @xush6528 @jjlilley @osalpekar

Labels: high priority, module: flaky-tests, module: rpc, triage review, triaged
