
[Gradient Compression] Add an index field to GradBucket for PowerSGD #48757

Closed
wants to merge 6 commits into from

wayi1 (Contributor) commented Dec 3, 2020

Stack from ghstack:

Add an index field to GradBucket, so that error_dict is keyed by this index instead of the hash code of the input tensor.
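The following minimal plain-Python sketch shows why a stable bucket index makes a more dependable dictionary key than something derived from the tensor object itself. It assumes that the same logical bucket can arrive as a different object in a later iteration. `FakeBucket` and `run_two_iterations` are hypothetical names, not the real `GradBucket` binding. Python's default object hashing and equality are identity-based, so a lookup keyed on the object misses, while a lookup keyed on the index hits.

```python
from typing import Dict, List, Tuple


class FakeBucket:
    """Hypothetical stand-in for a gradient bucket (not the real GradBucket).

    It does not override __hash__ or __eq__, so hashing and equality are
    based on object identity.
    """

    def __init__(self, index: int, grads: List[float]) -> None:
        self.index = index
        self.grads = grads


def run_two_iterations() -> Tuple[bool, bool]:
    errors_by_object: Dict[FakeBucket, List[float]] = {}
    errors_by_index: Dict[int, List[float]] = {}

    # Iteration 1: store an error residual for bucket 0 under both keys.
    first = FakeBucket(index=0, grads=[0.5, -1.0])
    errors_by_object[first] = [0.1, 0.2]
    errors_by_index[first.index] = [0.1, 0.2]

    # Iteration 2: the same logical bucket arrives as a new object.
    second = FakeBucket(index=0, grads=[0.4, -0.9])
    found_by_object = second in errors_by_object      # identity-based key: miss
    found_by_index = second.index in errors_by_index  # stable index: hit
    return found_by_object, found_by_index


print(run_two_iterations())  # (False, True)
```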

However, the buckets can sometimes be rebuilt in the forward pass. In that case, the bucket with a given index may no longer have the same shape as in the previous iteration, so its error tensor is re-initialized as a zero tensor of the new shape.
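The re-initialization rule can be sketched as a small lookup helper. This illustrates the rule under stated assumptions and is not the PowerSGD hook's actual code: `get_error` is a hypothetical name, error buffers are modeled as Python lists, and list length stands in for tensor shape.

```python
from typing import Dict, List


def get_error(error_dict: Dict[int, List[float]], bucket_index: int,
              bucket_size: int) -> List[float]:
    """Return the error buffer stored for `bucket_index`.

    If there is no buffer yet, or the bucket was rebuilt so its size no
    longer matches, store and return a fresh zero buffer of the new size.
    """
    error = error_dict.get(bucket_index)
    if error is None or len(error) != bucket_size:
        error = [0.0] * bucket_size
        error_dict[bucket_index] = error
    return error


error_dict: Dict[int, List[float]] = {}
residual = get_error(error_dict, 0, 4)   # first use: zeros of size 4
residual[:] = [0.25, -0.5, 0.0, 1.0]     # pretend a residual was recorded
print(get_error(error_dict, 0, 4))       # [0.25, -0.5, 0.0, 1.0]
print(get_error(error_dict, 0, 3))       # bucket rebuilt: [0.0, 0.0, 0.0]
```

Keying by index keeps the lookup stable across iterations, and the size check is what discards a stale residual after a rebuild.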

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: D25288496

facebook-github-bot added the "cla signed" and "oncall: distributed" labels Dec 3, 2020
wayi1 pushed a commit that referenced this pull request Dec 3, 2020
ghstack-source-id: 117719173
Pull Request resolved: #48757
dr-ci bot commented Dec 3, 2020

💊 CI failures summary and remediations

As of commit c05592d (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



🚧 1 ongoing upstream failure:

These were probably caused by upstream breakages that are not fixed yet:



wayi1 pushed a commit that referenced this pull request Dec 3, 2020
Pull Request resolved: #48757
ghstack-source-id: 117748471
wayi1 pushed a commit that referenced this pull request Dec 4, 2020
Pull Request resolved: #48757

Add an index field to GradBucket, so that error_dict can be keyed by this index instead of the hash code of the input tensor. The replacement itself will be done in a separate diff, because, for reasons not yet understood, the definition of this new method was not recognized in the OSS version.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202
ghstack-source-id: 117911015

Differential Revision: [D25288496](https://our.internmc.facebook.com/intern/diff/D25288496/)
rohan-varma (Member) left a comment

LGTM, please ensure CI is green

Review comment on torch/csrc/distributed/c10d/init.cpp (outdated, resolved)
wayi1 pushed a commit that referenced this pull request Dec 4, 2020
Pull Request resolved: #48757
ghstack-source-id: 117916685
wayi1 pushed a commit that referenced this pull request Dec 5, 2020
Pull Request resolved: #48757
ghstack-source-id: 117939208
facebook-github-bot (Contributor) commented

This pull request has been merged in 7439bc4.

@facebook-github-bot facebook-github-bot deleted the gh/SciPioneer/29/head branch December 8, 2020 15:23
Labels: cla signed, Merged, oncall: distributed

3 participants