[FSDP] Fix input grad propagation when using param mixed precision #90921
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/90921
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit cd938ce.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…recision" For parameter mixed precision, we cast the inputs to the low precision parameter dtype. If the input has tensors that require gradient, then we must cast them in place in order for them to receive a gradient. Otherwise, the tensor that resulted from the out-of-place cast receives the gradient and is not in scope to the user. To preserve BC as much as possible, this PR only does the in-place cast if the tensor requires gradient. [ghstack-poisoned]
LGTM
LGTM, thanks for the fix!
```python
with torch.no_grad():
    return (_apply_to_tensors(cast_fn, args), _apply_to_tensors(cast_fn, kwargs))
return x.to(dtype)
```
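For context, a rough self-contained reconstruction of the helper the excerpt comes from; `_apply_to_tensors`, `cast_fn`, `args`, `kwargs`, and `dtype` appear in the diff above, while the function name `_cast_fp_inputs_to_dtype` and the simplified container handling are assumptions for illustration:

```python
import torch
from typing import Any, Callable, Dict, Tuple


def _apply_to_tensors(fn: Callable, container: Any) -> Any:
    """Apply ``fn`` to every tensor in a (possibly nested) container.

    Simplified sketch: only plain lists, tuples, and dicts are handled here.
    """
    if torch.is_tensor(container):
        return fn(container)
    if isinstance(container, (list, tuple)):
        return type(container)(_apply_to_tensors(fn, x) for x in container)
    if isinstance(container, dict):
        return {k: _apply_to_tensors(fn, v) for k, v in container.items()}
    return container


def _cast_fp_inputs_to_dtype(
    dtype: torch.dtype, args: Tuple[Any, ...], kwargs: Dict[str, Any]
) -> Tuple[Any, Any]:
    """Cast floating-point forward inputs to ``dtype`` (e.g. torch.float16)."""

    def cast_fn(x: torch.Tensor) -> torch.Tensor:
        if not torch.is_floating_point(x):
            return x
        return x.to(dtype)

    # Before this PR, the line below ran under ``torch.no_grad()``, which
    # detached the low-precision copies from the user's input tensors.
    return _apply_to_tensors(cast_fn, args), _apply_to_tensors(cast_fn, kwargs)
```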
hmm, I am a bit concerned about unforeseen BC issues, but our testing surface is also quite solid. Did we consider doing this only for inputs that require grad, to err on the safe side?
I think if `x` does not require gradient, then `x.to(dtype)` will not be tracked by autograd. If `x` does require gradient, then `x.to(dtype)` will be tracked by autograd. This should be exactly the behavior we want. In other words, the case distinction should already be handled naturally if I understand correctly.
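A quick standalone check of the behavior described in this comment (assuming a recent PyTorch build; not part of the PR itself):

```python
import torch

a = torch.randn(3, requires_grad=False)
b = torch.randn(3, requires_grad=True)

print(a.to(torch.float16).grad_fn)  # None -- cast is not tracked
print(b.to(torch.float16).grad_fn)  # <ToCopyBackward0 ...> -- cast is tracked
```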
…recision" For parameter mixed precision, we cast the inputs to the low precision parameter dtype. If the input has tensors that require gradient, then we must cast them in place in order for them to receive a gradient. The cast should be tracked by autograd (e.g. with `grad_fn` equal to `ToCopyBackward0`). This removes the `torch.no_grad` context when calling `_apply_to_tensors`. [ghstack-poisoned]
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack:
- #90840 [FSDP][BE] Remove `_module_to_handles`, `HandleConfig`; use term "fqn"; clarify docs

For parameter mixed precision, we cast the inputs to the low-precision parameter dtype. If the input has tensors that require gradient, then we must cast them in place in order for them to receive a gradient. The cast should be tracked by autograd (e.g. with `grad_fn` equal to `ToCopyBackward0`). This removes the `torch.no_grad` context when calling `_apply_to_tensors`.