[pytorch][PR][Gradient Compression] Reduce the peak memory of fp16 compression provided by ddp comm hook #46078
Conversation
…mpression provided by ddp comm hook The peak memory usage of the DDP comm hook has increased due to an extra copy of the gradient tensors. To reduce the memory usage, decompress the fp16 tensor in place, into the tensor stored in the gradient bucket. Closes #45968. Differential Revision: [D24178118](https://our.internmc.facebook.com/intern/diff/D24178118/) [ghstack-poisoned]
💊 CI failures summary: as of commit 1150400, Dr. CI reports no failures — all checks look good so far.
Codecov Report
    @@            Coverage Diff             @@
    ##   gh/SciPioneer/12/base   #46078   +/- ##
    =============================================
      Coverage      68.27%      68.28%
    =============================================
      Files            410         410
      Lines          53306       53306
    =============================================
    + Hits           36397       36398        +1
    + Misses         16909       16908        -1
Continue to review full report at Codecov.
This pull request has been merged in ee3d3e6.
Stack from ghstack:
The peak memory usage of the DDP comm hook has increased due to an extra copy of the gradient tensors. To reduce the memory usage, decompress the fp16 tensor in place, into the tensor stored in the gradient bucket.
Closes #45968
Differential Revision: D24178118
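The core idea can be sketched with plain tensors. A minimal illustration (not the PR's actual hook, which also performs an async allreduce on the compressed tensor): after the fp16 allreduce result comes back, `copy_` writes the decompressed values directly into the bucket's existing fp32 tensor, rather than materializing a second fp32 tensor via `.float()`, so peak memory holds only the fp16 copy, not an extra fp32 one. The function name and the skipped-allreduce placeholder are illustrative assumptions.

```python
import torch

def fp16_compress_and_decompress_inplace(bucket_tensor: torch.Tensor) -> torch.Tensor:
    # Compress: cast the bucket's fp32 gradients to fp16. This half-precision
    # copy is needed anyway, since communication happens in fp16.
    compressed = bucket_tensor.to(torch.float16)

    # ... in the real hook, an async allreduce on `compressed` happens here ...

    # Decompress IN PLACE: copy the fp16 values back into the bucket's own
    # fp32 tensor, instead of allocating a new fp32 tensor with
    # `compressed.float()`. This avoids the extra full-size copy that raised
    # peak memory.
    bucket_tensor.copy_(compressed)
    return bucket_tensor
```

Because the returned tensor is the bucket's own storage, DDP sees the reduced gradients without any additional allocation at decompression time.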