
TensorPipeRpcAgent "set_device_map" should implicitly use the same order for the backward pass. #44170

Closed
pritamdamania87 opened this issue Sep 4, 2020 · 1 comment
Assignees
Labels
module: rpc Related to RPC, distributed autograd, RRef, and distributed optimizer module: tensorpipe Related to Tensorpipe RPC Agent triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Comments

@pritamdamania87
Contributor

pritamdamania87 commented Sep 4, 2020

🚀 Feature

TensorPipeRpcAgent should implicitly use the reverse device mapping specified in "set_device_map" for the backward pass.

Motivation

In the current implementation, users need to specify both the forward and backward pass mappings themselves, as follows, to run the backward pass correctly:

        # Forward pass.
        options.set_device_map(dst, {self.rank: (self.rank + 1) % self.world_size})
        # Backward pass.
        reverse_rank = (self.rank - 1 + self.world_size) % self.world_size
        options.set_device_map(worker_name(reverse_rank), {self.rank: reverse_rank})

This isn't a good user experience, since it is hard for users to predict which nodes will be involved in a backward pass and how to set up the backward-pass device mapping. Instead, it would be ideal if the distributed autograd engine remembered the forward mapping (as part of the send-recv nodes) and used its inverse during the backward pass.
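The requested behavior can be illustrated with a small sketch (plain Python, not the actual PyTorch implementation): the engine records the forward device map at send time and derives the backward map by inverting it. `invert_device_map` is a hypothetical helper name.

```python
# Hypothetical sketch, not the actual PyTorch implementation: derive the
# backward-pass device map by inverting the forward one recorded at send time.

def invert_device_map(forward_map):
    """Invert a {source_device: target_device} map for use in the backward pass.

    The map must be one-to-one; otherwise the inverse is ambiguous.
    """
    inverse = {}
    for src, dst in forward_map.items():
        if dst in inverse:
            raise ValueError(f"device map is not invertible: device {dst} is mapped to twice")
        inverse[dst] = src
    return inverse

# Forward pass sends cuda:0 -> cuda:1 and cuda:2 -> cuda:3;
# gradients travel back cuda:1 -> cuda:0 and cuda:3 -> cuda:2.
print(invert_device_map({0: 1, 2: 3}))  # {1: 0, 3: 2}
```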

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @rohan-varma @xush6528 @jjlilley @osalpekar @jiayisuse @lw @beauby

@pritamdamania87 pritamdamania87 added module: rpc Related to RPC, distributed autograd, RRef, and distributed optimizer module: tensorpipe Related to Tensorpipe RPC Agent labels Sep 4, 2020
@lw
Contributor

lw commented Sep 7, 2020

Interesting issue... Just to clarify: if the forward pass goes from A to B, then the backward pass consists of an RPC request initiated by B towards A, right?

We (actually, @mrshenli) did implement the logic that makes responses use the reverse map of the request, but we didn't think that there may be new requests that we want to behave like responses.

One quick fix we could imagine for 1.7 is the following: if the user specified a device mapping from A to B and another one from B to A, we check that one is the inverse of the other and fail if it's not the case. If the user specified only one of them, then we set the other one to its inverse. If the user didn't specify any, then we use the identity mapping for both directions.

This is a rather strict requirement but I think it's fine if the first iteration is restrictive as we can always relax it later on without breaking backwards compatibility.
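The quick fix described above can be sketched in plain Python (hypothetical names, not the actual TensorPipe code): reconcile the two user-supplied maps, filling in a missing direction with the inverse of the other, and failing if both are given but are not inverses.

```python
# Hypothetical sketch of the proposed 1.7 quick fix; resolve_device_maps and
# its arguments are illustrative names, not real TensorPipe APIs.

def _invert(device_map):
    """Invert a one-to-one {source_device: target_device} map."""
    inverse = {}
    for src, dst in device_map.items():
        if dst in inverse:
            raise ValueError("device map is not invertible")
        inverse[dst] = src
    return inverse

def resolve_device_maps(a_to_b=None, b_to_a=None):
    """Return the (a_to_b, b_to_a) pair under the proposed rules:

    - Both specified: they must be inverses of each other, else fail.
    - Only one specified: the other is set to its inverse.
    - Neither specified: use the identity mapping (empty map) both ways.
    """
    if a_to_b is None and b_to_a is None:
        return {}, {}
    if a_to_b is None:
        return _invert(b_to_a), b_to_a
    if b_to_a is None:
        return a_to_b, _invert(a_to_b)
    if _invert(a_to_b) != b_to_a:
        raise ValueError("forward and backward device maps are not inverses")
    return a_to_b, b_to_a

print(resolve_device_maps({0: 1}))  # ({0: 1}, {1: 0})
```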

@zhangguanheng66 zhangguanheng66 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Sep 7, 2020
pritamdamania87 pushed a commit that referenced this issue Sep 17, 2020
TensorPipe's `set_device_map` option was applied during the forward
pass. However, if we ran the backward pass for the graph we would not
automatically pick up the reverse device mapping.

As a result, users had to specify both forward and backward device mapping
which is very tedious to do.

In this PR, I've added this functionality such that TensorPipe automatically
picks up the reverse device mapping during the backward pass. This is done by
storing the appropriate device mapping in the "recv" autograd function for
distributed autograd.

Pull Request resolved: #44859

Closes #44170

Differential Revision: [D23751975](https://our.internmc.facebook.com/intern/diff/D23751975/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D23751975/)!
