Update on "[Gradient Compression] Implement the original layerwise PowerSGD"

The existing implementation applies PowerSGD to a single batch of flattened tensors, which is a coarse-grained compression; that hook is now renamed `batched_powerSGD_hook`. This change implements the original algorithm from the paper, which applies PowerSGD to each per-parameter tensor individually, i.e., a layerwise, fine-grained compression. Although this layerwise implementation is slower, it is expected to achieve higher accuracy, especially when the shapes of the per-parameter tensors cannot be aligned.

Also adds a test in distributed_test.py.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: [D25511543](https://our.internmc.facebook.com/intern/diff/D25511543/)

[ghstack-poisoned]
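To illustrate the difference between the two variants, here is a minimal NumPy sketch of one power-iteration step of PowerSGD low-rank compression, applied per-parameter (the layerwise variant this commit describes). The function names `powersgd_compress` and `layerwise_compress` are illustrative, not the actual hook implementations; the real hooks also handle error feedback, all-reduce of the factors, and warm-started `Q` across iterations, which are omitted here.

```python
import numpy as np

def powersgd_compress(matrix, rank=1, rng=None):
    """One power-iteration step of rank-`rank` PowerSGD compression.

    Returns factors P (m x r) and Q (n x r) such that matrix ~= P @ Q.T.
    Only P and Q need to be communicated, not the full m x n gradient.
    """
    rng = rng or np.random.default_rng(0)
    m, n = matrix.shape
    q = rng.standard_normal((n, rank))  # random init (warm-started in practice)
    p = matrix @ q                      # power iteration: project onto column space
    p, _ = np.linalg.qr(p)              # orthogonalize P for a stable basis
    q = matrix.T @ p                    # best Q given the orthonormal P
    return p, q

def layerwise_compress(grads, rank=1):
    """Layerwise variant: compress each per-parameter gradient separately,
    rather than concatenating all gradients into one flattened batch."""
    return [powersgd_compress(g.reshape(g.shape[0], -1), rank) for g in grads]
```

Because each per-parameter tensor keeps its own shape, a low-rank structure in one layer's gradient is not diluted by flattening it together with other layers, which is why the layerwise variant can reach higher accuracy than the batched one.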