Closed
Labels
module: rpc (Related to RPC, distributed autograd, RRef, and distributed optimizer)
module: tensorpipe (Related to Tensorpipe RPC Agent)
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Description
🚀 Feature
https://github.com/pytorch/pytorch/pull/38590/files adds support for RRef timeouts, but the necessary error handling has not yet been added to the TensorPipe agent. The test test_rref_timeout is disabled for TensorPipe in that PR, so we should add the necessary support to the TensorPipe agent, enable the test, and verify that RRef timeouts work correctly.
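
For anyone picking this up, here is a rough, standard-library-only sketch of the kind of behavior a timeout test could check: creating a remote reference with a short timeout fails, and the failure is recorded so that a later fetch also raises. Everything in it (RemoteRefSketch, confirm_creation, slow_value, demo, the error message) is hypothetical and made up for illustration; it is not the PyTorch RPC API and not the code from the PR.

```python
import concurrent.futures
import time


class RemoteRefSketch:
    """Hypothetical stand-in for a remote reference; NOT PyTorch API."""

    def __init__(self, executor, fn, args=(), timeout=None):
        # Start the "remote" creation in the background.
        self._future = executor.submit(fn, *args)
        self._timeout = timeout
        self._error = None

    def confirm_creation(self):
        """Wait for creation; on timeout, record the error on the reference."""
        try:
            self._future.result(timeout=self._timeout)
        except concurrent.futures.TimeoutError as exc:
            self._error = RuntimeError("RRef creation timed out")
            raise self._error from exc

    def to_here(self):
        """Fetch the value, re-raising a previously recorded creation error."""
        if self._error is not None:
            raise self._error
        return self._future.result()


def slow_value(seconds):
    # Simulates a remote function that takes longer than the timeout.
    time.sleep(seconds)
    return seconds


def demo():
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        # 10 ms timeout against a 200 ms task, so creation should time out.
        ref = RemoteRefSketch(pool, slow_value, args=(0.2,), timeout=0.01)
        errors = []
        for call in (ref.confirm_creation, ref.to_here):
            try:
                call()
            except RuntimeError as err:
                errors.append(str(err))
        return errors
```

In this sketch both calls raise: the first because the wait times out, the second because the error was stored on the reference. The agent-side work requested here is, as I understand it, about making that kind of error propagation happen in the TensorPipe agent.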
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @xush6528 @jjlilley @osalpekar @jiayisuse @lw @beauby