[Gradient Compression] Allow PowerSGD to run vanilla allreduce for the first K iterations #50973

This extends the original PowerSGD method to a hybrid approach: vanilla allreduce for the first K iterations, then PowerSGD. The hybrid can further improve accuracy, at the cost of a lower speedup. This change also adds more comments on the fields in `PowerSGDState`.

Original PR issue: Investigate Applying PowerSGD to Communication Hook for Gradient Compression #47202

Differential Revision: [D26031478](https://our.internmc.facebook.com/intern/diff/D26031478/)

[ghstack-poisoned]
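To make the hybrid schedule concrete, here is a minimal pure-Python simulation, assuming rank-1 PowerSGD with error feedback and a shared warm-started `Q` vector. It does not use PyTorch. `HybridState`, `hybrid_step`, and `start_iter` are hypothetical names for illustration only (released PyTorch versions expose a comparable knob on `PowerSGDState` as `start_powerSGD_iter`). Workers are simulated as a list of gradient matrices, and "allreduce" is simulated as an element-wise mean.

```python
import math
import random


def mean_allreduce(vectors):
    # Simulated allreduce: element-wise mean over all workers' vectors.
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]


def matvec(m, v):
    # Computes m @ v for a list-of-rows matrix m.
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def mat_t_vec(m, v):
    # Computes m^T @ v.
    return [sum(m[i][j] * v[i] for i in range(len(m))) for j in range(len(m[0]))]


class HybridState:
    """Hypothetical state: vanilla allreduce until start_iter, then rank-1 PowerSGD."""

    def __init__(self, num_workers, cols, start_iter, seed=0):
        self.start_iter = start_iter
        self.iter = 0
        rng = random.Random(seed)
        # Shared Q vector, reused across iterations (warm start).
        self.q = [rng.gauss(0.0, 1.0) for _ in range(cols)]
        # Per-worker error-feedback residuals; unused during the allreduce phase.
        self.errors = [None] * num_workers


def hybrid_step(state, grads):
    """grads: one rows x cols matrix per worker. Returns (averaged gradient, mode)."""
    rows, cols = len(grads[0]), len(grads[0][0])
    if state.iter < state.start_iter:
        # Phase 1: exact (uncompressed) allreduce of the full gradient.
        state.iter += 1
        flat = mean_allreduce([[x for row in g for x in row] for g in grads])
        return [flat[i * cols:(i + 1) * cols] for i in range(rows)], "allreduce"

    # Phase 2: rank-1 PowerSGD. Add each worker's residual from the previous compressed step.
    ms = []
    for w, g in enumerate(grads):
        err = state.errors[w]
        ms.append(g if err is None else
                  [[g[i][j] + err[i][j] for j in range(cols)] for i in range(rows)])

    # P = M Q, allreduced, then orthogonalized (for rank 1: normalized).
    p = mean_allreduce([matvec(m, state.q) for m in ms])
    norm = math.sqrt(sum(x * x for x in p)) or 1.0
    p = [x / norm for x in p]

    # Q = M^T P, allreduced; the low-rank approximation is P Q^T.
    q = mean_allreduce([mat_t_vec(m, p) for m in ms])
    approx = [[p[i] * q[j] for j in range(cols)] for i in range(rows)]

    # Store what the compression dropped, so it is fed back next iteration.
    for w, m in enumerate(ms):
        state.errors[w] = [[m[i][j] - approx[i][j] for j in range(cols)] for i in range(rows)]
    state.q = q
    state.iter += 1
    return approx, "powersgd"
```

With `start_iter=2`, the first two calls return the exact mean gradient and later calls return the compressed rank-1 estimate. The design choice mirrors the trade-off stated above: the early, uncompressed steps avoid compression error when training is most sensitive, while the later steps save bandwidth by sending only the two small vectors `p` and `q` instead of the full matrix.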




Commits on Jan 23, 2021

Commits on Jan 25, 2021