
[Gradient Compression] Warm-start of PowerSGD #49451

Closed
wants to merge 8 commits

Conversation

wayi1
Contributor

@wayi1 wayi1 commented Dec 16, 2020

Stack from ghstack:

Reuse the low-rank tensors P(s) and Q(s) from the previous iteration if possible.

This can give a better compression performance in terms of both accuracy and speed.

Also add a unit test for batched PowerSGD to test_c10d.py.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: D25583086
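To illustrate the warm-start idea described above: PowerSGD approximates each gradient matrix by a low-rank product P Qᵀ computed with one step of power iteration. Rather than re-initializing Q randomly on every iteration, warm start feeds the previous iteration's Q back in, so the subspace estimate keeps improving as long as consecutive gradients are similar. The following standalone NumPy sketch is illustrative only (the function name and setup are hypothetical, not the actual hook code, which also handles error feedback and the allreduce of the compressed tensors):

```python
import numpy as np

def power_iter_compress(grad, rank, q_prev=None, rng=None):
    """One PowerSGD-style compression step: approximate the gradient
    matrix `grad` (n x m) by P @ Q.T with P (n x rank) and Q (m x rank).
    Warm start: pass the previous iteration's Q instead of drawing a
    fresh random one, so the subspace estimate keeps improving while
    consecutive gradients stay similar."""
    n, m = grad.shape
    if q_prev is None:  # cold start: random projection matrix
        rng = rng or np.random.default_rng(0)
        q = rng.standard_normal((m, rank))
    else:               # warm start: reuse last iteration's Q
        q = q_prev
    p = grad @ q                 # project onto the current subspace
    p, _ = np.linalg.qr(p)       # orthogonalize P
    q = grad.T @ p               # update Q against the orthonormal P
    return p, q                  # decompressed gradient is p @ q.T

# Gradients in adjacent SGD iterations are often close; simulate that.
rng = np.random.default_rng(1)
base = rng.standard_normal((64, 32))
q, errs = None, []
for step in range(5):
    grad = base + 0.001 * rng.standard_normal((64, 32))
    p, q = power_iter_compress(grad, rank=4, q_prev=q)
    errs.append(np.linalg.norm(grad - p @ q.T) / np.linalg.norm(grad))
# errs[0] is the cold-start error; warm-started steps do no worse.
```

Warm start also saves the cost of re-randomizing and re-converging Q each iteration, which is where the speed benefit mentioned above comes from.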

@facebook-github-bot facebook-github-bot added cla signed oncall: distributed Add this issue/PR to distributed oncall triage queue labels Dec 16, 2020
wayi1 pushed a commit that referenced this pull request Dec 16, 2020
ghstack-source-id: 1186954
Pull Request resolved: #49451
@facebook-github-bot
Contributor

facebook-github-bot commented Dec 16, 2020

💊 CI failures summary and remediations

As of commit 390dd4e (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

wayi1 pushed a commit that referenced this pull request Dec 16, 2020
Pull Request resolved: #49451

ghstack-source-id: 118736193
wayi1 pushed a commit that referenced this pull request Dec 16, 2020
Pull Request resolved: #49451

ghstack-source-id: 118751780
wayi1 pushed a commit that referenced this pull request Dec 16, 2020
Pull Request resolved: #49451

ghstack-source-id: 118758002
wayi1 pushed a commit that referenced this pull request Dec 18, 2020
Pull Request resolved: #49451

ghstack-source-id: 118948215
wayi1 pushed a commit that referenced this pull request Dec 21, 2020
Pull Request resolved: #49451

ghstack-source-id: 118984540
Member

@rohan-varma rohan-varma left a comment


LGTM. I suppose that we'll follow up with information regarding data around the accuracy/speed tradeoffs for all of these configurations? Would be great to get some data around how the different flags trade off.

@wayi1
Contributor Author

wayi1 commented Dec 21, 2020

> LGTM. I suppose that we'll follow up with information regarding data around the accuracy/speed tradeoffs for all of these configurations? Would be great to get some data around how the different flags trade off.

I have collected experimental results from only one experiment so far, and still need to experiment on more training cases before drawing a conclusion.

@facebook-github-bot
Contributor

This pull request has been merged in c348fae.

@facebook-github-bot facebook-github-bot deleted the gh/SciPioneer/38/head branch December 25, 2020 15:16
hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 6, 2021
Summary:
Pull Request resolved: pytorch#49451

Reuse the low-rank tensors P(s) and Q(s) from the previous iteration if possible.

This can give a better compression performance in terms of both accuracy and speed.

Also add a unit test for batched PowerSGD to test_c10d.py.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202
ghstack-source-id: 119014132

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D25583086

fbshipit-source-id: a757df3c4cfcc0ead4647f7de2f43198f1e063ee
Labels
cla signed Merged oncall: distributed Add this issue/PR to distributed oncall triage queue
3 participants