[Gradient Compression] Add an index field to GradBucket for PowerSGD #48757

wayi1 · 2020-12-03T00:40:13Z

Stack from ghstack:

#48757 [Gradient Compression] Add an index field to GradBucket for PowerSGD

Add an index field to GradBucket, so that error_dict is keyed by this index instead of the hash code of the input tensor.

However, the buckets can sometimes be rebuilt in the forward pass. In this case, the shape of the bucket with a given index will not match its shape in the previous iteration, and hence the error tensor will be re-initialized as a zero tensor of the new shape.
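The keying and re-initialization behavior described above can be sketched roughly as follows. This is an illustrative toy, not the code in this PR: `ToyBucket` and `get_error` are hypothetical names, plain Python lists stand in for tensors, and list length stands in for tensor shape.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyBucket:
    """Hypothetical stand-in for a gradient bucket; not the real GradBucket API."""
    index: int
    grads: List[float]


def get_error(error_dict: Dict[int, List[float]], bucket: ToyBucket) -> List[float]:
    """Return the error-feedback buffer for this bucket, keyed by bucket index.

    If no buffer exists yet, or the stored buffer no longer matches the
    bucket's size (e.g. the buckets were rebuilt), start over from zeros.
    """
    error = error_dict.get(bucket.index)
    if error is None or len(error) != len(bucket.grads):
        error = [0.0] * len(bucket.grads)
        error_dict[bucket.index] = error
    return error


error_dict: Dict[int, List[float]] = {}

# First iteration: a zero buffer is created for index 0.
first = ToyBucket(index=0, grads=[0.5, -1.0, 2.0])
get_error(error_dict, first)[0] = 0.25  # pretend compression left a residual

# Same index and size on the next iteration: the residual is preserved.
same = get_error(error_dict, ToyBucket(index=0, grads=[0.1, 0.2, 0.3]))

# Buckets rebuilt: index 0 now has a different size, so the buffer is reset.
rebuilt = get_error(error_dict, ToyBucket(index=0, grads=[1.0, 1.0]))
```

Keying by index rather than by a hash of the input tensor means the lookup does not depend on the tensor object's identity; the size check covers the rebuilt-bucket case, at the cost of discarding the accumulated error for that bucket.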

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: D25288496


Add an index field to GradBucket, so that error_dict is keyed by this index instead of the hash code of the input tensor. However, the buckets can sometimes be rebuilt in the forward pass. In this case, the shape of the bucket with a given index will not match its shape in the previous iteration, and hence the error tensor will be re-initialized as a zero tensor of the new shape.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: [D25288496](https://our.internmc.facebook.com/intern/diff/D25288496/)

ghstack-source-id: 117719173

Pull Request resolved: #48757

dr-ci · 2020-12-03T02:30:31Z

💊 CI failures summary and remediations

As of commit c05592d (more details on the Dr. CI page):

✅ None of the CI failures appear to be your fault 💚

1/1 broken upstream at merge base 55b9373 since Dec 04

🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet:

pytorch_xla_linux_bionic_py3_6_clang9_test since Dec 04



Pull Request resolved: #48757

Add an index field to GradBucket, so that error_dict is keyed by this index instead of the hash code of the input tensor. The replacement itself will be done in a separate diff, as the definition of this new method could not be recognized in the OSS version.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

ghstack-source-id: 117911015

Differential Revision: [D25288496](https://our.internmc.facebook.com/intern/diff/D25288496/)

rohan-varma

LGTM, please ensure CI is green

torch/csrc/distributed/c10d/init.cpp


facebook-github-bot · 2020-12-05T11:13:13Z

This pull request has been merged in 7439bc4.

wayi1 requested review from mingzhe09088, mrshenli, pritamdamania87, rohan-varma and zhaojuanmao as code owners December 3, 2020 00:40

facebook-github-bot added the cla signed and oncall: distributed labels Dec 3, 2020

wayi1 mentioned this pull request Dec 3, 2020

[Gradient Compression] Error feedback for PowerSGD (still need to fix the key in error_dict) #48670

Closed

wayi1 mentioned this pull request Dec 4, 2020

[Gradient Compression] Replace the key of error_dict in PowerSGD state with bucket index #48867

Closed

rohan-varma approved these changes Dec 4, 2020


facebook-github-bot closed this in 7439bc4 Dec 5, 2020

facebook-github-bot added the Merged label Dec 5, 2020

facebook-github-bot deleted the gh/SciPioneer/29/head branch December 8, 2020 15:23
