[pytorch][PR][Gradient Compression] Reduce the peak memory of fp16 compression provided by ddp comm hook #46078

Closed
wants to merge 2 commits into from

@wayi1 wayi1 commented Oct 9, 2020

Stack from ghstack:

The peak memory usage of the DDP comm hook has increased because of an extra copy of the gradient tensors. To reduce memory usage, decompress the fp16 tensor in place, writing the result into the tensor already stored in the gradient bucket instead of allocating a new one.
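
A minimal sketch of the difference, not the code from this PR. The function names `decompress_with_copy` and `decompress_in_place` are hypothetical, and plain tensors stand in for the bucket's gradient tensor and for the fp16 result of the allreduce; no process group is involved. It only illustrates that `Tensor.copy_` casts fp16 back to fp32 while reusing the existing storage.

```python
import torch


def decompress_with_copy(bucket_tensor, compressed):
    # Casting with .to() allocates a second fp32 tensor next to the
    # one the bucket already holds, which raises peak memory.
    return compressed.to(bucket_tensor.dtype)


def decompress_in_place(bucket_tensor, compressed):
    # copy_() converts fp16 -> fp32 while writing into the bucket
    # tensor's existing storage, so no extra fp32 buffer is created.
    bucket_tensor.copy_(compressed)
    return bucket_tensor


# Stand-in for the flattened fp32 gradient held by a bucket.
bucket_tensor = torch.arange(8, dtype=torch.float32)
# Stand-in for the fp16 tensor that would come back from allreduce.
compressed = bucket_tensor.to(torch.float16)

out_copy = decompress_with_copy(bucket_tensor, compressed)
out_inplace = decompress_in_place(bucket_tensor, compressed)
```

Comparing `data_ptr()` on the two outputs shows which one shares storage with the bucket tensor.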

#Closes: #45968

Differential Revision: D24178118

…mpression provided by ddp comm hook

[ghstack-poisoned]
@facebook-github-bot facebook-github-bot added the oncall: distributed label Oct 9, 2020
wayi1 pushed a commit that referenced this pull request Oct 9, 2020
…mpression provided by ddp comm hook

ghstack-source-id: 113935840
Pull Request resolved: #46078
@wayi1 wayi1 self-assigned this Oct 9, 2020
dr-ci bot commented Oct 9, 2020

💊 CI failures summary and remediations

As of commit 1150400 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


… of fp16 compression provided by ddp comm hook"

[ghstack-poisoned]
wayi1 pushed a commit that referenced this pull request Oct 9, 2020
…mpression provided by ddp comm hook

Pull Request resolved: #46078

ghstack-source-id: 113996453

Differential Revision: [D24178118](https://our.internmc.facebook.com/intern/diff/D24178118/)
codecov bot commented Oct 10, 2020

Codecov Report

Merging #46078 into gh/SciPioneer/12/base will increase coverage by 0.00%.
The diff coverage is n/a.

@@                  Coverage Diff                   @@
##           gh/SciPioneer/12/base   #46078   +/-   ##
======================================================
  Coverage                  68.27%   68.28%           
======================================================
  Files                        410      410           
  Lines                      53306    53306           
======================================================
+ Hits                       36397    36398    +1     
+ Misses                     16909    16908    -1     

Impacted Files | Coverage Δ
torch/testing/_internal/expecttest.py | 78.57% <0.00%> (+1.02%) ⬆️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 40828b6...1150400.

@facebook-github-bot

This pull request has been merged in ee3d3e6.

@facebook-github-bot facebook-github-bot deleted the gh/SciPioneer/12/head branch October 16, 2020 14:21
Labels
Merged, oncall: distributed
3 participants