[Gradient Compression] Add a random generator to PowerSGD state for initializing low-rank matrix Q #48507
Conversation
Codecov Report
@@                Coverage Diff                 @@
##   gh/SciPioneer/27/base    #48507      +/-  ##
===================================================
- Coverage                80.91%   80.77%   -0.14%
===================================================
  Files                     1855     1855
  Lines                   200241   200166      -75
===================================================
- Hits                    162023   161691     -332
- Misses                   38218    38475     +257
…state for initializing low-rank matrix Q" Previously the random seed is the length of input tensor, which is not guaranteed to be the different for different batches. Now initialize a random generator in PowerSGD state, and use this generator to create a random seed to randomize the low-rank tensor Q at every step. Therefore, the initial tensor Q should be the same across all the replicas at the same step, but different at different steps. 'torch.manual_seed' is used in the same way as https://github.com/epfml/powersgd/blob/master/gradient_reducers.py#L675 Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202 Differential Revision: [D25191589](https://our.internmc.facebook.com/intern/diff/D25191589/) [ghstack-poisoned]
…nitializing low-rank matrix Q Pull Request resolved: #48507 Previously the random seed is the length of input tensor, which is not guaranteed to be the different for different batches. Now initialize a random generator in PowerSGD state, and use this generator to create a random seed to randomize the low-rank tensor Q at every step. Therefore, the initial tensor Q should be the same across all the replicas at the same step, but different at different steps. 'torch.manual_seed' is used in the same way as https://github.com/epfml/powersgd/blob/master/gradient_reducers.py#L675 Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202 ghstack-source-id: 117402238 Differential Revision: [D25191589](https://our.internmc.facebook.com/intern/diff/D25191589/)
Looks good, but can we add some tests/log outputs that verify that the random states are generated as expected?
Also, in general it would be good to add some basic unit testing around the invariant properties this comm. hook should provide. Could we do that in follow-up PRs?
Yeah, we can do that in a separate PR. Currently this is still a proof of concept, and I have verified the invariant property via logs.
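For reference, here is a minimal sketch of the kind of unit test being discussed. This is not the actual PyTorch test, and the helper `make_q` is a hypothetical name; it only illustrates the invariant: replicas whose state generators start from the same seed produce identical Q at the same step, but different Q across steps.

```python
import torch

def make_q(rng: torch.Generator, shape):
    # Draw a fresh seed from the state's generator, then use it to
    # randomize Q -- mirroring how the hook reseeds at every step.
    seed = torch.randint(1_000_000_000, (1,), generator=rng).item()
    torch.manual_seed(seed)
    return torch.randn(shape)

# Simulate two replicas whose PowerSGD states share the same initial seed.
rng_a = torch.Generator().manual_seed(42)
rng_b = torch.Generator().manual_seed(42)

q_a_step0 = make_q(rng_a, (128, 4))
q_b_step0 = make_q(rng_b, (128, 4))
q_a_step1 = make_q(rng_a, (128, 4))

assert torch.equal(q_a_step0, q_b_step0)      # same step -> identical Q
assert not torch.equal(q_a_step0, q_a_step1)  # different step -> different Q
```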
…state for initializing low-rank matrix Q" Previously the random seed is the length of input tensor, which is not guaranteed to be the different for different batches. Now initialize a random generator in PowerSGD state, and use this generator to create a random seed to randomize the low-rank tensor Q at every step. Therefore, the initial tensor Q should be the same across all the replicas at the same step, but different at different steps. 'torch.manual_seed' is used in the same way as https://github.com/epfml/powersgd/blob/master/gradient_reducers.py#L675 Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202 Differential Revision: [D25191589](https://our.internmc.facebook.com/intern/diff/D25191589/) [ghstack-poisoned]
…state for initializing low-rank matrix Q" Previously the random seed is the length of input tensor, which is not guaranteed to be the different for different batches. Now initialize a random generator in PowerSGD state, and use this generator to create a random seed to randomize the low-rank tensor Q at every step. Therefore, the initial tensor Q should be the same across all the replicas at the same step, but different at different steps. 'torch.manual_seed' is used in the same way as https://github.com/epfml/powersgd/blob/master/gradient_reducers.py#L675 Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202 Differential Revision: [D25191589](https://our.internmc.facebook.com/intern/diff/D25191589/) [ghstack-poisoned]
…nitializing low-rank matrix Q Pull Request resolved: #48507 Previously the random seed is the length of input tensor, which is not guaranteed to be the different for different batches. Now initialize a random generator in PowerSGD state, and use this generator to create a random seed to randomize the low-rank tensor Q at every step. Therefore, the initial tensor Q should be the same across all the replicas at the same step, but different at different steps. 'torch.manual_seed' is used in the same way as https://github.com/epfml/powersgd/blob/master/gradient_reducers.py#L675 Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202 ghstack-source-id: 117484255 Differential Revision: [D25191589](https://our.internmc.facebook.com/intern/diff/D25191589/)
💊 CI failures summary and remediations. As of commit 973be1c (more details on the Dr. CI page): ✅ None of the CI failures appear to be your fault 💚
🚧 1 ongoing upstream failure: this was probably caused by an upstream breakage that is not fixed yet.
This pull request has been merged in ddb6594.
Stack from ghstack:
Previously, the random seed was the length of the input tensor, which is not guaranteed to differ across batches. Now a random generator is initialized in the PowerSGD state, and this generator is used to create a random seed that randomizes the low-rank tensor Q at every step.
Therefore, the initial tensor Q is the same across all replicas at the same step, but different at different steps.
`torch.manual_seed` is used in the same way as https://github.com/epfml/powersgd/blob/master/gradient_reducers.py#L675
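As a rough illustration of the mechanism described above (a hedged sketch, not the actual `PowerSGDState` implementation; the class and method names below are assumptions):

```python
import torch

class PowerSGDStateSketch:
    def __init__(self, random_seed: int = 0):
        # Every replica constructs the generator with the same seed, so the
        # sequence of per-step seeds it yields agrees across replicas.
        self.rng = torch.Generator().manual_seed(random_seed)

    def init_q(self, rows: int, rank: int, device=None, dtype=torch.float32):
        # Draw a step-specific seed from the state's generator instead of
        # using the input tensor's length, which can repeat across batches.
        seed = torch.randint(1_000_000_000, (1,), generator=self.rng).item()
        torch.manual_seed(seed)
        return torch.randn(rows, rank, device=device, dtype=dtype)
```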
Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
Differential Revision: D25191589