Completely remove FutureMessage from RRef Implementations #50004

Closed
mrshenli wants to merge 6 commits

mrshenli commented Jan 3, 2021

Stack from ghstack:

Differential Revision: D25750602


facebook-github-bot commented Jan 3, 2021

💊 CI failures summary and remediations

As of commit 748210f (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 1/3 non-CircleCI failure(s)

🕵️ 2 new failures recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (1/2)

Step: "Run tests"

Jan 07 01:00:26 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Jan 07 01:00:26 At:
Jan 07 01:00:26   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Jan 07 01:00:26   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Jan 07 01:00:26 
Jan 07 01:00:26 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Jan 07 01:00:26 
Jan 07 01:00:26 At:
Jan 07 01:00:26   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Jan 07 01:00:26   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Jan 07 01:00:26 
Jan 07 01:00:26 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Jan 07 01:00:26 
Jan 07 01:00:26 At:
Jan 07 01:00:26   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Jan 07 01:00:26   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Jan 07 01:00:26 
Jan 07 01:00:26 [W tensorpipe_agent.cpp:547] RPC agent for worker1 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)
Jan 07 01:00:26 [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)
Jan 07 01:00:27 ok (2.658s)
Jan 07 01:00:29   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)
Jan 07 01:00:29 [W tensorpipe_agent.cpp:547] RPC agent for worker3 encountered error when reading incoming request from worker2: EOF: end of file (this is expected to happen during shutdown)

See CircleCI build pytorch_windows_vs2019_py36_cuda10.1_test1 (2/2)

Step: "Test"

RuntimeError: test_nn failed!
  test_ReflectionPad1d (__main__.TestNN) ... ok (0.108s)
  test_ReflectionPad1d_alert_nondeterministic_cuda (__main__.TestNN) ... ok (0.007s)
  test_ReflectionPad1d_cuda (__main__.TestNN) ... ok (0.020s)
  test_ReflectionPad2d (__main__.TestNN) ... ok (1.118s)
  test_ReflectionPad2d_alert_nondeterministic_cuda (__main__.TestNN) ... ok (0.014s)
  test_ReflectionPad2d_cuda (__main__.TestNN) ... Traceback (most recent call last):
  File "run_test.py", line 910, in <module>
    main()
  File "run_test.py", line 889, in main
    raise RuntimeError(err_message)
RuntimeError: test_nn failed!

(base) circleci@PACKER-5FD865C5 C:\Users\circleci\project\test>if ERRORLEVEL 1 exit /b 1 
+ cleanup
+ retcode=1
+ set +x


Exited with code exit status 1


1 job timed out:

  • pytorch_linux_bionic_py3_8_gcc9_coverage_test1

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

This comment has been revised 30 times.

mrshenli added a commit that referenced this pull request Jan 3, 2021
ghstack-source-id: 35c3828add2a153c5acfd5b651c98890afab5eb8
Pull Request resolved: #50004
@facebook-github-bot

@mrshenli merged this pull request in 2d5f57c.

@facebook-github-bot facebook-github-bot deleted the gh/mrshenli/272/head branch January 11, 2021 15:17
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 14, 2021

Summary: Pull Request resolved: pytorch#50004

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25750602

Pulled By: mrshenli

fbshipit-source-id: 06854a77f4fb5cc4c34a1ede843301157ebf7309
Labels: cla signed, Merged, oncall: distributed