Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[Gradient Compression] Simplify the implementation of error feedback …
…and warm-start (#50981) Summary: Pull Request resolved: #50981 Since vanilla allreduce will to be applied in the first few iterations, bucket rebuilding process will not affect caching per-variable tensors. Previously the cached tensors used for error feedback and warm-up need to be rebuilt later, because their corresponding input tensors' shape will be changed after the bucket rebuild process. Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202 ghstack-source-id: 120617971 Test Plan: real run Reviewed By: rohan-varma Differential Revision: D26034418 fbshipit-source-id: e8744431c7f3142d75b77b60110e6861c2ff5c14
- Loading branch information
1 parent
00d4ec8
commit b619d37
Showing
1 changed file
with
9 additions
and
16 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters