[FSDP] Propagate requires_grad attribute to unsharded params #109892
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/109892
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 168c5d2 with merge base 92de1d3. This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D49517155
This makes sense to me!
[FSDP] Propagate requires_grad attribute to unsharded params (#109892) force-pushed from 2a94937 to 9bc05e0 (compare)
Force-pushed from 9bc05e0 to 168c5d2 (compare)
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 job has failed, the first few of them being: trunk / linux-focal-rocm5.6-py3.8 / test (default, 2, 3, linux.rocm.gpu). Details for Dev Infra team: raised by workflow job.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…checkpoints

Summary: Pull Request resolved: #620

EMA can be configured to exclude frozen (`requires_grad=False`) parameters and buffers, reducing memory use and checkpoint size. However, `FULL_STATE_DICT` FSDP + EMA checkpoints construct an inner `EMAState` after unsharding the FSDP parameters, and this inner `EMAState` uses the default `include_frozen` and `include_buffers` settings, so checkpoints contain frozen parameters and buffers regardless of the configured settings.

This change propagates the `include_frozen` and `include_buffers` settings to the inner `EMAState` when gathering `FULL_STATE_DICT` FSDP EMA state. For frozen parameters it only takes effect together with a parallel fix to PyTorch FSDP that propagates `requires_grad` across parameter sharding/unsharding: pytorch/pytorch#109892.

Reviewed By: daveboat

Differential Revision: D49517178

fbshipit-source-id: 0fe159dcec9ec1f2c456ae2ee7798681e7536249
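A minimal sketch of the pattern this fix applies is below. Only the names `EMAState`, `include_frozen`, and `include_buffers` come from the commit message; the stand-in class, its `save_from` method, and the `gather_full_ema_state` helper are hypothetical, since the library's real API is not shown in this thread.

```python
# Hypothetical sketch of the fix described above. Only EMAState,
# include_frozen, and include_buffers are named in the commit message;
# this stand-in class and helper are illustrative, not the real API.
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.distributed.fsdp import StateDictType


class EMAState:
    """Stand-in EMA container honoring include_frozen/include_buffers."""

    def __init__(self, include_frozen=True, include_buffers=True):
        self.include_frozen = include_frozen
        self.include_buffers = include_buffers
        self._state = {}

    def save_from(self, model):
        # Skip frozen parameters unless configured to include them.
        for name, param in model.named_parameters():
            if self.include_frozen or param.requires_grad:
                self._state[name] = param.detach().clone()
        if self.include_buffers:
            for name, buf in model.named_buffers():
                self._state[name] = buf.detach().clone()

    def state_dict(self):
        return dict(self._state)


def gather_full_ema_state(outer, fsdp_model):
    """Gather FULL_STATE_DICT EMA state, propagating the outer settings
    to the inner EMAState instead of letting them default (the bug)."""
    with FSDP.state_dict_type(fsdp_model, StateDictType.FULL_STATE_DICT):
        inner = EMAState(
            include_frozen=outer.include_frozen,    # previously defaulted
            include_buffers=outer.include_buffers,  # previously defaulted
        )
        inner.save_from(fsdp_model)
        return inner.state_dict()
```

The `param.requires_grad` check inside `save_from` is why the parallel FSDP fix matters: without pytorch/pytorch#109892, unsharded parameters reported `requires_grad=True`, so frozen parameters could never be filtered out of the gathered state.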
[FSDP] Propagate requires_grad attribute to unsharded params (#109892)

Summary: This preserves `requires_grad` in the case where all parameters within a `FlatParameter` have the same `requires_grad` value. Currently, unsharded parameters have `requires_grad=True` in some cases where the `FlatParameter` and all original parameters have `requires_grad=False`. This could be extended to support `FlatParameter`s with a mix of `requires_grad` states by extending `ParamInfo` to capture `requires_grad` for each parameter.

Test Plan: test added

Differential Revision: D49517155

Pull Request resolved: pytorch#109892

Approved by: https://github.com/awgu
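A minimal single-process sketch of the behavior being fixed (not the PR's actual test; it assumes a CPU `gloo` process group, which suffices for a toy check, whereas real FSDP runs typically use NCCL on GPUs):

```python
# Minimal sketch: freeze every original parameter, wrap in FSDP, then
# inspect the unsharded views. Before this PR those views could report
# requires_grad=True even though the FlatParameter and all original
# parameters had requires_grad=False; with it, they report False.
import os

import torch.distributed as dist
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)  # toy CPU setup

model = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
for param in model.parameters():
    param.requires_grad = False  # all original parameters frozen

fsdp_model = FSDP(model)  # one FlatParameter with requires_grad=False

# summon_full_params materializes the unsharded original parameters.
with FSDP.summon_full_params(fsdp_model):
    for name, param in fsdp_model.named_parameters():
        print(name, param.requires_grad)  # expected: False everywhere

dist.destroy_process_group()
```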