Replace FutureMessage with ivalue::Future in RpcAgent retry logic #49995

Closed
wants to merge 10 commits into from

@mrshenli mrshenli commented Jan 1, 2021

Stack from ghstack:

Differential Revision: D25745301

facebook-github-bot commented Jan 1, 2021

💊 CI failures summary and remediations

As of commit fac5a65 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 2/3 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_bionic_py3_8_gcc9_coverage_test1 (1/1)

Step: "Run tests"

Jan 07 00:59:54 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Jan 07 00:59:54 At:
Jan 07 00:59:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Jan 07 00:59:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Jan 07 00:59:54 
Jan 07 00:59:54 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Jan 07 00:59:54 
Jan 07 00:59:54 At:
Jan 07 00:59:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Jan 07 00:59:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Jan 07 00:59:54 
Jan 07 00:59:54 [E request_callback_no_python.cpp:636] Received error while processing request type 258: RuntimeError: Can not pickle torch.futures.Future
Jan 07 00:59:54 
Jan 07 00:59:54 At:
Jan 07 00:59:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(120): serialize
Jan 07 00:59:54   /opt/conda/lib/python3.8/site-packages/torch/distributed/rpc/internal.py(172): serialize
Jan 07 00:59:54 
Jan 07 00:59:54 [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)
Jan 07 00:59:54 [W tensorpipe_agent.cpp:547] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)
Jan 07 00:59:55 ok (2.767s)
Jan 07 00:59:56   test_return_future_remote (__main__.TensorPipeRpcTestWithSpawn) ... [W tensorpipe_agent.cpp:547] RPC agent for worker2 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)
Jan 07 00:59:57 [W tensorpipe_agent.cpp:547] RPC agent for worker3 encountered error when reading incoming request from worker0: EOF: end of file (this is expected to happen during shutdown)

1 job timed out:

  • pytorch_linux_bionic_py3_8_gcc9_coverage_test1

ci.pytorch.org: 1 failed


This comment was automatically generated by Dr. CI.

This comment has been revised 68 times.

mrshenli added a commit that referenced this pull request Jan 2, 2021
ghstack-source-id: 9c7912236dca602f0495e05914573701a37629e7
Pull Request resolved: #49995
@mrshenli mrshenli left a comment

This PR has internal changes

@facebook-github-bot

@mrshenli merged this pull request in d730c7e.

@facebook-github-bot facebook-github-bot deleted the gh/mrshenli/271/head branch January 11, 2021 15:17
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 14, 2021
…torch#49995)

Summary: Pull Request resolved: pytorch#49995

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D25745301

Pulled By: mrshenli

fbshipit-source-id: b5e3a7e0b377496924847d8d70d61de32e2d87f4
Labels: cla signed, Merged, oncall: distributed

3 participants