Description
🚀 Feature
The TensorPipe RPC agent will support a result_placement argument, which allows the receiver to send the computed result to a worker that is not necessarily the sender. We must therefore ensure that the sender still receives confirmation of message success or failure.
For example, if CPU0 sends a message to CPU1 and indicates that the result should be sent to GPU0, then GPU0 must confirm to CPU0 that the result was received. Because the confirmation takes an extra hop, such messages would likely need longer timeouts.
Furthermore, if the result must be placed on multiple devices, the original sender must ensure that each machine receiving the result confirms successful completion.
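
To make the requirement concrete, here is a minimal sketch of the sender-side bookkeeping, written in plain Python with no RPC framework involved. The `PendingSend` class, its `confirm` and `wait` methods, and the destination names are all hypothetical and are not part of the TensorPipe agent or `torch.distributed.rpc`; they only illustrate one way the sender could wait for every result destination to report success or failure within a timeout.

```python
import threading


class PendingSend:
    """Hypothetical sender-side record of one outstanding message.

    Tracks which result destinations still owe the sender a confirmation.
    """

    def __init__(self, destinations):
        self._lock = threading.Lock()
        self._pending = set(destinations)   # destinations not yet heard from
        self._failures = {}                 # destination -> error description
        self._done = threading.Event()
        if not self._pending:
            self._done.set()

    def confirm(self, destination, error=None):
        """Called when a destination reports success (error=None) or failure."""
        with self._lock:
            if destination not in self._pending:
                return  # unknown destination or duplicate confirmation
            self._pending.discard(destination)
            if error is not None:
                self._failures[destination] = error
            if not self._pending:
                self._done.set()

    def wait(self, timeout):
        """Block until every destination has confirmed, or raise."""
        finished = self._done.wait(timeout)
        with self._lock:
            if not finished:
                missing = sorted(self._pending)
                raise TimeoutError(f"no confirmation from: {missing}")
            if self._failures:
                raise RuntimeError(f"result delivery failed: {self._failures}")


# Usage: CPU0 sent a message whose result is placed on GPU0 and GPU1.
pending = PendingSend(["GPU0", "GPU1"])
pending.confirm("GPU0")
pending.confirm("GPU1")
pending.wait(timeout=1.0)  # returns once both destinations have confirmed
```

In this sketch the timeout passed to `wait` covers the whole path, including the extra hop from the receiver to each result destination, which is why it would likely need to be longer than the timeout for a message whose result returns directly to the sender.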
cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @xush6528 @jjlilley @osalpekar