[FSDP] Add `keep_low_precision_grads` support when CPU offloading #86495

awgu · 2022-10-07T21:25:04Z

Stack from ghstack:

[FSDP] Change backward_prefetch default to BACKWARD_PRE #86513
[FSDP] Add _low_prec prefix to param and reduce dtype varnames #86512
[FSDP] Add keep_low_precision_grads support when CPU offloading #86495
[FSDP] Add initial summon_full_params(with_grads=True) #85738
[FSDP] Add use_orig_params #84911

When CPU offloading, FSDP uses `_cpu_grad`, not `_saved_grad_shard`. This adds support for `keep_low_precision_grads` for that case.
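
For context, here is a minimal sketch of the user-facing configuration this case corresponds to: mixed precision with `keep_low_precision_grads=True` combined with CPU offloading. It is not taken from this PR's diff. The `float16` dtypes and the `wrap_with_fsdp` helper are illustrative assumptions, and the helper assumes a process group has already been initialized.

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import (
    CPUOffload,
    FullyShardedDataParallel as FSDP,
    MixedPrecision,
)

# Run parameters and gradient reduction in float16 (an illustrative choice).
# keep_low_precision_grads=True asks FSDP to leave gradients in the reduced
# dtype instead of upcasting them to full precision after backward.
mixed_precision = MixedPrecision(
    param_dtype=torch.float16,
    reduce_dtype=torch.float16,
    keep_low_precision_grads=True,
)

# Offload the sharded parameters (and with them the gradients) to CPU.
cpu_offload = CPUOffload(offload_params=True)


def wrap_with_fsdp(module: nn.Module) -> FSDP:
    """Hypothetical helper: wrap `module` with the two configs above.

    Assumes torch.distributed.init_process_group(...) has already been
    called and that this rank has a CUDA device available.
    """
    return FSDP(
        module,
        mixed_precision=mixed_precision,
        cpu_offload=cpu_offload,
    )
```

Only the two config objects are built at import time. The wrapping itself is left inside the helper because constructing `FSDP` requires an initialized distributed environment.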

[ghstack-poisoned]

pytorch-bot · 2022-10-07T21:25:06Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86495

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 5 Pending

As of commit 5fcc277:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 7f7cd5b25a32452a396adeb434ae58d31b579812 Pull Request resolved: #86495

…loading" When CPU offloading, FSDP uses `_cpu_grad`, not `_saved_grad_shard`. This adds support for `keep_low_precision_grads` for that case. [ghstack-poisoned]

ghstack-source-id: 5211bbd049442bc53d7a3bd2db846b294627db72 Pull Request resolved: #86495

test/distributed/fsdp/test_fsdp_mixed_precision.py

torch/distributed/fsdp/flat_param.py

rohan-varma

LGTM

…loading" When CPU offloading, FSDP uses `_cpu_grad`, not `_saved_grad_shard`. This adds support for `keep_low_precision_grads` for that case. [ghstack-poisoned]

awgu · 2022-10-08T00:17:07Z

@pytorchbot merge

pytorchmergebot · 2022-10-08T00:20:10Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

github-actions · 2022-10-08T03:27:19Z

Hey @awgu.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

…6495) (#86495) Summary: When CPU offloading, FSDP uses `_cpu_grad`, not `_saved_grad_shard`. This adds support for `keep_low_precision_grads` for that case. Pull Request resolved: #86495 Approved by: https://github.com/rohan-varma Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/af9c6bc851cfc8fba9e4c71830b783cb34d92a05 Reviewed By: seemethere Differential Revision: D40217972 Pulled By: seemethere fbshipit-source-id: 4d5b595d9246dba4237bea7ca6c27b8cebf1beff

[FSDP] Add keep_low_precision_grads support when CPU offloading

e753729

[ghstack-poisoned]

awgu requested review from mrshenli, pritamdamania87, zhaojuanmao, rohan-varma, H-Huang, kwen2501 and mingzhe09088 as code owners October 7, 2022 21:25

pytorch-bot bot added release notes: distributed (fsdp) release notes category labels Oct 7, 2022

awgu added a commit that referenced this pull request Oct 7, 2022

[FSDP] Add keep_low_precision_grads support when CPU offloading

30e2f84

ghstack-source-id: 7f7cd5b25a32452a396adeb434ae58d31b579812 Pull Request resolved: #86495

Update on "[FSDP] Add keep_low_precision_grads support when CPU off…

e8a28b8

…loading" When CPU offloading, FSDP uses `_cpu_grad`, not `_saved_grad_shard`. This adds support for `keep_low_precision_grads` for that case. [ghstack-poisoned]

awgu added a commit that referenced this pull request Oct 7, 2022

[FSDP] Add keep_low_precision_grads support when CPU offloading

f0a24d5

ghstack-source-id: 5211bbd049442bc53d7a3bd2db846b294627db72 Pull Request resolved: #86495

rohan-varma reviewed Oct 7, 2022


rohan-varma self-requested a review October 7, 2022 23:24

rohan-varma approved these changes Oct 7, 2022


pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request) Oct 7, 2022

Update on "[FSDP] Add keep_low_precision_grads support when CPU off…

5fcc277

…loading" When CPU offloading, FSDP uses `_cpu_grad`, not `_saved_grad_shard`. This adds support for `keep_low_precision_grads` for that case. [ghstack-poisoned]

awgu mentioned this pull request Oct 8, 2022

[FSDP] Add _low_prec prefix to param and reduce dtype varnames #86512

Closed

awgu mentioned this pull request Oct 8, 2022

[FSDP] Change backward_prefetch default to BACKWARD_PRE #86513

Closed

pytorchmergebot added the Merged label Oct 8, 2022

pytorchmergebot closed this in af9c6bc Oct 8, 2022

facebook-github-bot deleted the gh/awgu/115/head branch June 8, 2023 15:21
