
[Gradient Compression] Add a random generator to PowerSGD state for initializing low-rank matrix Q #48507

Closed
wants to merge 4 commits into gh/SciPioneer/27/base from gh/SciPioneer/27/head
Conversation

wayi1
Contributor

@wayi1 wayi1 commented Nov 27, 2020

Stack from ghstack:

Previously, the random seed was the length of the input tensor, which is not guaranteed to be different across batches. Now a random generator is initialized in the PowerSGD state, and this generator is used to create a random seed that randomizes the low-rank tensor Q at every step.

Therefore, the initial tensor Q is the same across all replicas at the same step, but different at different steps.

`torch.manual_seed` is used in the same way as in https://github.com/epfml/powersgd/blob/master/gradient_reducers.py#L675

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: D25191589
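
For context, a minimal sketch of the seeding scheme described above. The class constructor arguments and the helper `_init_low_rank_q` are illustrative assumptions, not the exact PR diff:

```python
# Minimal sketch of the seeding scheme described above; illustrative, not the PR's exact diff.
import numpy as np
import torch


class PowerSGDState:
    def __init__(self, matrix_approximation_rank=1, random_seed=0):
        self.matrix_approximation_rank = matrix_approximation_rank
        # Every replica constructs this generator from the same `random_seed`,
        # so all replicas draw an identical sequence of per-step seeds.
        self.rng = np.random.RandomState(random_seed)


def _init_low_rank_q(state, n, device, dtype):
    # Hypothetical helper: draw a fresh seed from the shared generator, then
    # seed torch's global RNG before sampling Q. The result is identical
    # across replicas at the same step, but different at different steps.
    torch.manual_seed(state.rng.randint(1_000_000_000))
    return torch.randn(n, state.matrix_approximation_rank, device=device, dtype=dtype)
```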

@facebook-github-bot facebook-github-bot added the cla signed and oncall: distributed (Add this issue/PR to distributed oncall triage queue) labels Nov 27, 2020
wayi1 pushed a commit that referenced this pull request Nov 27, 2020
@wayi1 wayi1 self-assigned this Nov 27, 2020
@codecov

codecov bot commented Nov 27, 2020

Codecov Report

Merging #48507 (563914d) into gh/SciPioneer/27/base (5bb2a87) will decrease coverage by 0.14%.
The diff coverage is 40.00%.

@@                    Coverage Diff                    @@
##           gh/SciPioneer/27/base   #48507      +/-   ##
=========================================================
- Coverage                  80.91%   80.77%   -0.14%     
=========================================================
  Files                       1855     1855              
  Lines                     200241   200166      -75     
=========================================================
- Hits                      162023   161691     -332     
- Misses                     38218    38475     +257     

wayi1 pushed a commit that referenced this pull request Nov 27, 2020
Member

@rohan-varma rohan-varma left a comment


Looks good, but can we add some tests/log outputs that verify that the random states are generated as expected?

Also, in general it would be good to add some basic unit testing around the invariant properties this comm hook should provide. Could we do that in follow-up PRs?

@wayi1
Contributor Author

wayi1 commented Nov 30, 2020

> Looks good, but can we add some tests/log outputs that verify that the random states are generated as expected?
>
> Also, in general it would be good to add some basic unit testing around the invariant properties this comm hook should provide. Could we do that in follow-up PRs?

Yeah, we can do that in a separate PR. Currently it's still at the proof-of-concept stage, and I have verified the invariant property via logs.
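
For instance, a single-process sketch of that invariant check might look like the following (the two generators stand in for two replicas; this is illustrative, not the PR's actual test):

```python
# Illustrative single-process check of the invariant; not the PR's actual test.
import numpy as np
import torch


def make_q(rng, n, rank):
    # Same per-step seeding scheme as the hook sketch above.
    torch.manual_seed(rng.randint(1_000_000_000))
    return torch.randn(n, rank)


# Two "replicas" whose generators start from the same seed.
rng_a, rng_b = np.random.RandomState(0), np.random.RandomState(0)

q_a_step0, q_b_step0 = make_q(rng_a, 128, 4), make_q(rng_b, 128, 4)
q_a_step1 = make_q(rng_a, 128, 4)

assert torch.equal(q_a_step0, q_b_step0)      # same across replicas at the same step
assert not torch.equal(q_a_step0, q_a_step1)  # different at different steps
```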

wayi1 pushed a commit that referenced this pull request Nov 30, 2020
@dr-ci

dr-ci bot commented Nov 30, 2020

💊 CI failures summary and remediations

As of commit 973be1c (more details on the Dr. CI page):

None of the CI failures appear to be your fault 💚

🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet:

@facebook-github-bot
Contributor

This pull request has been merged in ddb6594.

@facebook-github-bot facebook-github-bot deleted the gh/SciPioneer/27/head branch December 4, 2020 15:17