Commit
[pytorch][PR][Gradient Compression] Reduce the peak memory of fp16 compression provided by ddp comm hook

The peak memory usage of ddp comm hook has increased due to an extra copy of gradient tensors. To reduce the memory usage, decompress the fp16 tensor in place of the tensor stored in the gradient bucket.

#Closes: #45968

Differential Revision: [D24178118](https://our.internmc.facebook.com/intern/diff/D24178118/)

[ghstack-poisoned]
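The in-place decompression described above can be illustrated with a minimal sketch. This is not the actual PyTorch comm-hook code; it uses NumPy, and the names `fp16_compress` and `decompress_into_bucket` are hypothetical, chosen only to show how writing the decompressed values back into the existing fp32 bucket buffer avoids allocating a second full-size tensor:

```python
import numpy as np

def fp16_compress(grad):
    # Simulate fp16 compression: halve the payload by casting to float16.
    return grad.astype(np.float16)

def decompress_into_bucket(bucket, compressed):
    # Decompress into the existing fp32 bucket buffer instead of
    # allocating a fresh fp32 tensor (the extra copy that raised
    # peak memory). np.copyto casts fp16 -> fp32 during the copy.
    np.copyto(bucket, compressed)
    return bucket

bucket = np.array([0.5, -1.25, 3.0], dtype=np.float32)
compressed = fp16_compress(bucket)
out = decompress_into_bucket(bucket, compressed)
assert out is bucket  # same buffer: no new fp32 allocation
```

In the real hook the same idea applies: the allreduce result in fp16 is copied back into the bucket's gradient tensor rather than materialized as a separate fp32 copy.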