[Gradient Compression] Warm-start of PowerSGD #49451

wayi1 · 2020-12-16T08:27:02Z

Stack from ghstack:

[Gradient Compression] Directly let world_size = group_to_use.size() #49715
[Gradient Compression] Explicitly restrict the scope of torch.cuda.synchronize to the current device #49711
[Gradient Compression] Change wait() to value() in some callbacks of PowerSGD communication hook #49709
[Gradient Compression] Warm-start of PowerSGD #49451

Reuse the low-rank tensors P(s) and Q(s) from the previous iteration if possible.

This can improve compression performance in terms of both accuracy and speed.
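To illustrate the warm-start idea, here is a minimal single-worker sketch in plain Python, so it runs without PyTorch. It follows the rank-r PowerSGD update (P = orth(M Q), Q = M^T P, M ≈ P Q^T). The class `PowerSGDSketch`, its helpers, and its `warm_start` argument are hypothetical and are not the hook's actual code. The all-reduces of P and Q that the real hook performs across workers are marked only as comments. With `warm_start=True`, Q is carried over from the previous call instead of being resampled at random.

```python
import math
import random
from typing import List, Optional

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def orthogonalize(p: Matrix, eps: float = 1e-8) -> Matrix:
    """Gram-Schmidt on the columns of p (returns a new matrix)."""
    basis: Matrix = []
    for col in transpose(p):
        v = col[:]
        for q in basis:
            dot = sum(x * y for x, y in zip(v, q))
            v = [x - dot * y for x, y in zip(v, q)]
        norm = math.sqrt(sum(x * x for x in v))
        basis.append([x / norm for x in v] if norm > eps else v)
    return transpose(basis)


def frob_error(a: Matrix, b: Matrix) -> float:
    """Frobenius norm of a - b."""
    return math.sqrt(sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)))


class PowerSGDSketch:
    """Hypothetical rank-r PowerSGD compressor with optional warm start."""

    def __init__(self, rank: int, warm_start: bool = True, seed: int = 0):
        self.rank = rank
        self.warm_start = warm_start
        self.rng = random.Random(seed)
        self.q: Optional[Matrix] = None  # low-rank Q kept between iterations

    def compress(self, m: Matrix) -> Matrix:
        # Cold start (or first call): sample a fresh random Q.
        if self.q is None or not self.warm_start:
            self.q = [[self.rng.gauss(0.0, 1.0) for _ in range(self.rank)]
                      for _ in range(len(m[0]))]
        p = orthogonalize(matmul(m, self.q))  # P = orth(M Q); all-reduce P here
        self.q = matmul(transpose(m), p)      # Q = M^T P;     all-reduce Q here
        return matmul(p, transpose(self.q))   # M is approximated by P Q^T
```

Each warm-started call effectively multiplies the previous P by M M^T, so on a gradient that changes slowly between iterations the sketch behaves like power iteration and the rank-r approximation tends to improve; with `warm_start=False` every call starts again from a random Q.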

Also add a unit test for batched PowerSGD to test_c10d.py.
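For context on what the batched test covers, the sketch below shows one way a batched variant can compress all gradients in a bucket at once: concatenate them, zero-pad to a square matrix of side ceil(sqrt(n)), compress that single matrix, then slice the result back. This padding scheme is an assumption about how the batched hook works, and `to_square_matrix` / `from_square_matrix` are hypothetical helpers.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def to_square_matrix(grads: Sequence[Sequence[float]]) -> Matrix:
    """Concatenate flattened gradients and zero-pad them into a side x side matrix."""
    flat = [float(x) for g in grads for x in g]
    side = math.ceil(math.sqrt(len(flat)))
    flat.extend([0.0] * (side * side - len(flat)))  # assumed zero-padding scheme
    return [flat[i * side:(i + 1) * side] for i in range(side)]


def from_square_matrix(mat: Matrix, lengths: Sequence[int]) -> List[List[float]]:
    """Drop the padding and split the matrix back into per-gradient vectors."""
    flat = [x for row in mat for x in row]
    out, offset = [], 0
    for n in lengths:
        out.append(flat[offset:offset + n])
        offset += n
    return out


grads = [[1.0, 2.0, 3.0], [4.0, 5.0]]
mat = to_square_matrix(grads)  # 3 x 3; the last four entries are padding
# ... the single square matrix would be compressed here as one PowerSGD matrix ...
restored = from_square_matrix(mat, [len(g) for g in grads])
```

Compressing one matrix per bucket means fewer, larger operations, but the square reshape ignores each layer's own matrix shape, so it plausibly trades some accuracy for speed.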

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: D25583086


facebook-github-bot · 2020-12-16T09:14:07Z

💊 CI failures summary and remediations

As of commit 390dd4e (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚



test/distributed/test_c10d.py


rohan-varma

LGTM. I assume we'll follow up with data on the accuracy/speed tradeoffs for all of these configurations? It would be great to see how the different flags trade off against each other.

torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py

wayi1 · 2020-12-21T21:49:57Z


I have collected results from only one experiment so far; I still need to run more training cases before drawing a conclusion.


facebook-github-bot · 2020-12-22T11:10:07Z

This pull request has been merged in c348fae.

Summary:
Pull Request resolved: pytorch#49451

Reuse the low-rank tensors P(s) and Q(s) from the previous iteration if possible. This can give a better compression performance in terms of both accuracy and speed.

Also add a unit test for batched PowerSGD to test_c10d.py.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression pytorch#47202
ghstack-source-id: 119014132

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed:c10d -- test_powerSGD_ddp_comm_hook_nccl
buck test mode/dev-nosan caffe2/test/distributed:distributed_nccl_fork -- test_DistributedDataParallel_powerSGD_ddp_comm_hook

Reviewed By: rohan-varma

Differential Revision: D25583086

fbshipit-source-id: a757df3c4cfcc0ead4647f7de2f43198f1e063ee

wayi1 requested review from mingzhe09088, mrshenli, pritamdamania87, rohan-varma and zhaojuanmao as code owners December 16, 2020 08:27

facebook-github-bot added the cla signed and oncall: distributed labels Dec 16, 2020

rohan-varma reviewed Dec 18, 2020


test/distributed/test_c10d.py (outdated, resolved)

wayi1 requested a review from rohan-varma December 18, 2020 23:28

rohan-varma approved these changes Dec 21, 2020


torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py (resolved)

torch/distributed/algorithms/ddp_comm_hooks/powerSGD_hook.py (resolved)

facebook-github-bot closed this in c348fae Dec 22, 2020

facebook-github-bot added the Merged label Dec 22, 2020

facebook-github-bot deleted the gh/SciPioneer/38/head branch December 25, 2020 15:16
