[Gradient Compression] Simplify the implementation of warm-start
Since PowerSGD will not be applied in the first few iterations (vanilla allreduce runs during warm-start), the bucket rebuilding process will not affect the caching of per-variable tensors. Previously, the cached tensors used for error feedback had to be rebuilt later, because the shapes of their corresponding input tensors change after the bucket rebuild process.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: [D26034418](https://our.internmc.facebook.com/intern/diff/D26034418/)
ghstack-source-id: 120257256
Pull Request resolved: #50981
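The ordering argument can be illustrated with a small pure-Python simulation. This is a hypothetical sketch, not the PyTorch hook: `WarmStartState`, `start_compression_iter`, `hook`, and `lossy_compress` are illustrative names; rounding stands in for low-rank compression; and the sketch assumes the bucket rebuild happens once, right after iteration 0. Per-bucket error-feedback buffers are only allocated once compression starts, so they are sized for the post-rebuild layout and never need to be rebuilt.

```python
from typing import Dict, List


class WarmStartState:
    """Hypothetical hook state; names are illustrative, not the PyTorch API."""

    def __init__(self, start_compression_iter: int = 2) -> None:
        self.start_compression_iter = start_compression_iter
        self.iter = 0
        # Per-bucket error-feedback buffers, keyed by bucket index.
        self.error_dict: Dict[int, List[float]] = {}


def lossy_compress(values: List[float]) -> List[float]:
    # Hypothetical stand-in for PowerSGD's low-rank compression.
    return [round(v, 1) for v in values]


def hook(state: WarmStartState, bucket_index: int, grad: List[float]) -> List[float]:
    if state.iter < state.start_compression_iter:
        # Warm-start: uncompressed path (vanilla allreduce in a real hook);
        # nothing is cached, so a bucket rebuild here costs nothing.
        return list(grad)
    # Buffers are allocated lazily on the first compressed iteration.
    error = state.error_dict.setdefault(bucket_index, [0.0] * len(grad))
    if len(error) != len(grad):
        raise ValueError("cached buffer was sized for a pre-rebuild bucket")
    compensated = [g + e for g, e in zip(grad, error)]
    compressed = lossy_compress(compensated)
    # Error feedback: remember what compression dropped this iteration.
    state.error_dict[bucket_index] = [c - q for c, q in zip(compensated, compressed)]
    return compressed


def run(num_iters: int, start_compression_iter: int) -> WarmStartState:
    state = WarmStartState(start_compression_iter)
    # Simulated rebuild: the same 6 gradient elements are regrouped
    # from [3, 3] into [4, 2] buckets after iteration 0.
    for it in range(num_iters):
        layout = [3, 3] if it == 0 else [4, 2]
        for idx, size in enumerate(layout):
            hook(state, idx, [0.123] * size)
        state.iter += 1
    return state
```

With `run(4, start_compression_iter=2)` the buffers end up with lengths 4 and 2, matching the rebuilt buckets. With `start_compression_iter=0`, buffers are cached during iteration 0 and the size check fails at iteration 1, which is the case that previously forced the cached tensors to be rebuilt.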