TensorPipeRpcAgent "set_device_map" should implicitly use the same order for the backward pass. #44170

pritamdamania87 · 2020-09-04T02:27:42Z

🚀 Feature

TensorPipeRpcAgent should implicitly use the reverse device mapping specified in "set_device_map" for the backward pass.

Motivation

In the current implementation, user's need to specify the forward and backward pass mapping themselves as follows to run the backward pass appropriately:

        # Forward pass.
        options.set_device_map(dst, {self.rank: (self.rank + 1) % self.world_size})
        # Backward pass.
        reverse_rank = (self.rank - 1 + self.world_size) % self.world_size
        options.set_device_map(worker_name(reverse_rank), {self.rank: reverse_rank})

This isn't a good user experience since it would be hard for users to predict which all nodes are involved in a backward pass and how to set up the backward pass device mapping. Instead, it would be ideal if the distributed autograd engine remembers the forward mapping (as part of the send-recv nodes) and uses the inverse of that during the backward pass.

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @xush6528 @jjlilley @osalpekar @jiayisuse @lw @beauby

The text was updated successfully, but these errors were encountered:

lw · 2020-09-07T14:01:47Z

Interesting issue... Just to clarify: if the forward pass goes from A to B, then the backward pass consists in an RPC request initiated by B towards A, right?

We (actually, @mrshenli) did implement the logic that makes responses use the reverse map of the request, but we didn't think that there may be new requests that we want to behave like responses.

One quick fix we could imagine for 1.7 is the following: if the user specified a device mapping from A to B and another one from B to A, we check that one is the inverse of the other and fail if it's not the case. If the user specified only one of them, then we set the other one to its inverse. If the user didn't specify any, then we use the identity mapping for both directions.

This is a rather strict requirement but I think it's fine if the first iteration is restrictive as we can always relax it later on without breaking backwards compatibility.

TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)! [ghstack-poisoned]

…nsorPipe." TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)! [ghstack-poisoned]

Pull Request resolved: #44859 TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 ghstack-source-id: 112255543 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)!

…nsorPipe." TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)! [ghstack-poisoned]

Pull Request resolved: #44859 TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 ghstack-source-id: 112351599 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)!

…nsorPipe." TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)! [ghstack-poisoned]

Pull Request resolved: #44859 TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 ghstack-source-id: 112417506 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)!

…nsorPipe." TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)! [ghstack-poisoned]

Pull Request resolved: #44859 TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 ghstack-source-id: 112516251 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)!

Pull Request resolved: #44859 TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 ghstack-source-id: 119112245 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)!

…nsorPipe." TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)! [ghstack-poisoned]

Pull Request resolved: #44859 TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 ghstack-source-id: 119950842 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)!

…nsorPipe." TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: #44170 Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/) **NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)! [ghstack-poisoned]

pritamdamania87 added module: rpc Related to RPC, distributed autograd, RRef, and distributed optimizer module: tensorpipe Related to Tensorpipe RPC Agent labels Sep 4, 2020

pritamdamania87 assigned mrshenli Sep 4, 2020

zhangguanheng66 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Sep 7, 2020

pritamdamania87 assigned pritamdamania87 and unassigned mrshenli Sep 17, 2020

pritamdamania87 mentioned this issue Sep 17, 2020

Support device map for distributed autograd while using TensorPipe. #44859

Closed

facebook-github-bot closed this as completed in 40eea6d Jan 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TensorPipeRpcAgent "set_device_map" should implicitly use the same order for the backward pass. #44170

TensorPipeRpcAgent "set_device_map" should implicitly use the same order for the backward pass. #44170

pritamdamania87 commented Sep 4, 2020 •

edited by pytorch-probot bot

lw commented Sep 7, 2020

TensorPipeRpcAgent "set_device_map" should implicitly use the same order for the backward pass. #44170

TensorPipeRpcAgent "set_device_map" should implicitly use the same order for the backward pass. #44170

Comments

pritamdamania87 commented Sep 4, 2020 • edited by pytorch-probot bot

🚀 Feature

Motivation

lw commented Sep 7, 2020

pritamdamania87 commented Sep 4, 2020 •

edited by pytorch-probot bot