Only populate grad accumulator to var mapping for find_unused_parameters=True in DDP #45942

Closed
wants to merge 5 commits into from

rohan-varma
Member

@rohan-varma rohan-varma commented Oct 7, 2020

Stack from ghstack:

We only need this mapping to traverse the autograd graph when
find_unused_parameters=True. Before this change, the mapping was populated and
kept in memory unconditionally, costing sizeof(pointer) * number of grad
accumulators of extra memory.
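
As a rough illustration of the idea (not the actual reducer code, which this page does not show), the conditional bookkeeping can be sketched in plain Python. The class `ReducerSketch`, the stand-in `GradAccumulator`, and the field name `grad_acc_to_param_index` are all hypothetical names chosen for this example; only the `find_unused_parameters` flag comes from the description above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


class GradAccumulator:
    """Hypothetical stand-in for an autograd gradient accumulator node."""


@dataclass
class ReducerSketch:
    """Toy model of the conditional bookkeeping; not the real DDP reducer."""

    num_params: int
    find_unused_parameters: bool = False
    grad_accumulators: List[GradAccumulator] = field(default_factory=list)
    # Hypothetical name for the accumulator -> parameter index mapping.
    grad_acc_to_param_index: Dict[GradAccumulator, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for index in range(self.num_params):
            acc = GradAccumulator()
            self.grad_accumulators.append(acc)
            # The mapping is only consulted when the autograd graph is
            # traversed to find unused parameters, so skip it otherwise.
            if self.find_unused_parameters:
                self.grad_acc_to_param_index[acc] = index


default_reducer = ReducerSketch(num_params=3)
tracking_reducer = ReducerSketch(num_params=3, find_unused_parameters=True)
print(len(default_reducer.grad_acc_to_param_index))   # mapping left empty
print(len(tracking_reducer.grad_acc_to_param_index))  # one entry per accumulator
```

In this sketch the default configuration never pays for the mapping, while the `find_unused_parameters=True` configuration keeps one entry per grad accumulator.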

Also renames the variable to something more meaningful.

Differential Revision: D24154407

@facebook-github-bot facebook-github-bot added the oncall: distributed Add this issue/PR to distributed oncall triage queue label Oct 7, 2020
rohan-varma added a commit that referenced this pull request Oct 7, 2020
ghstack-source-id: 113723598
Pull Request resolved: #45942
@codecov

codecov bot commented Oct 9, 2020

Codecov Report

❗ No coverage uploaded for pull request base (gh/rohan-varma/181/base@b1374ed).
The diff coverage is n/a.

@@                    Coverage Diff                     @@
##             gh/rohan-varma/181/base   #45942   +/-   ##
==========================================================
  Coverage                           ?   68.28%           
==========================================================
  Files                              ?      410           
  Lines                              ?    53609           
  Branches                           ?        0           
==========================================================
  Hits                               ?    36608           
  Misses                             ?    17001           
  Partials                           ?        0           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update b1374ed...1e29d71.

@facebook-github-bot
Contributor

This pull request has been merged in f739875.

@facebook-github-bot facebook-github-bot deleted the gh/rohan-varma/181/head branch October 17, 2020 14:15
3 participants