[FSDP] Support unfreezing params for reshard-only hook #104186

awgu · 2023-06-26T11:57:37Z

Stack from ghstack (oldest at bottom):

-> [FSDP] Support unfreezing params for reshard-only hook #104186

This fixes #104148 (unfreezing parameters after n steps).

This fixes a bug where the post-backward hook state was not deleted properly in the requires_grad=False case.
This makes the already_resharded value correct for SHARD_GRAD_OP.
This generalizes _clear_grads_if_needed() to _reset_flat_param_grad_info_if_needed() so that it additionally propagates the original parameters' requires_grad to the flat parameter.
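
For readers unfamiliar with the flat-parameter design, the requires_grad propagation described above can be pictured with a small plain-Python model. This is only a sketch under stated assumptions, not the FSDP implementation: `OrigParam`, `FlatParam`, `propagate_requires_grad`, and `train` are hypothetical stand-ins, and the rule that the flat parameter requires gradients whenever any original parameter does is an assumption about what "propagating" means here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrigParam:
    """Hypothetical stand-in for an original (unflattened) parameter."""
    name: str
    requires_grad: bool = True


@dataclass
class FlatParam:
    """Hypothetical stand-in for the flat parameter backing the originals."""
    orig_params: List[OrigParam] = field(default_factory=list)
    requires_grad: bool = True


def propagate_requires_grad(flat_param: FlatParam) -> None:
    # Assumed rule: the flat parameter needs gradients if any original does.
    flat_param.requires_grad = any(p.requires_grad for p in flat_param.orig_params)


def train(flat_param: FlatParam, num_steps: int, unfreeze_at: int) -> List[bool]:
    """Record flat_param.requires_grad at each step, unfreezing at `unfreeze_at`."""
    history = []
    for step in range(num_steps):
        if step == unfreeze_at:
            # The user unfreezes the original parameters mid-training.
            for p in flat_param.orig_params:
                p.requires_grad = True
        # Re-sync the flat parameter at the start of every iteration.
        propagate_requires_grad(flat_param)
        history.append(flat_param.requires_grad)
    return history


frozen = [OrigParam("weight", requires_grad=False), OrigParam("bias", requires_grad=False)]
flat = FlatParam(orig_params=frozen, requires_grad=False)
history = train(flat, num_steps=4, unfreeze_at=2)
```

In this model, the flat parameter stays frozen for the first two steps and picks up the change as soon as the originals are unfrozen, which is the scenario from #104148.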

pytorch-bot · 2023-06-26T11:57:40Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104186

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 2b5c8f1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: acd123f95b2c3fd90555aeef08274292c0c30f9d Pull Request resolved: #104186

ghstack-source-id: 487b4b9611d6a7f307c03682b463ed48ed35c0b9 Pull Request resolved: #104186

speediedan · 2023-06-27T23:45:35Z

Thanks for the timely and adroit (as always!) fix/enhancement @awgu! 🎉 🚀

rohan-varma

thanks for the quick fix!

rohan-varma · 2023-06-27T19:32:47Z

test/distributed/fsdp/test_fsdp_fine_tune.py

-                for param in seq[i * 2].parameters(recurse=False):
-                    param.requires_grad = False
-        return seq
+                for param in seq[i * 2].parameters(recurse=True):


Isn't recurse=True the default for .parameters()?

Yep. This recurse=True is unnecessary. I will leave it as is to avoid re-triggering CI.

rohan-varma · 2023-06-28T00:22:33Z

torch/distributed/fsdp/flat_param.py

+    def _reset_flat_param_grad_info_if_needed(self):
+        """
+        When ``use_orig_params=True``:
+        (1) sets the underlying ``flat_param.grad`` to ``None`` if *all* of the


Does this mean we need a unit test to ensure that flat_param.grad is set to None appropriately?

This is the existing behavior, which we already have unit tests for in test_fsdp_use_orig_params.py.
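
As a rough illustration of the gradient-clearing behavior discussed in this thread, here is a plain-Python sketch. The quoted docstring is truncated, so the condition used below (all original parameters have no gradient) is an assumption, and `Param`, `FlatParam`, and `clear_flat_grad_if_all_none` are hypothetical names rather than the FSDP code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Param:
    """Hypothetical stand-in for an original parameter and its gradient."""
    grad: Optional[List[float]] = None


@dataclass
class FlatParam:
    """Hypothetical stand-in for the flat parameter and its gradient."""
    orig_params: List[Param] = field(default_factory=list)
    grad: Optional[List[float]] = None


def clear_flat_grad_if_all_none(flat_param: FlatParam) -> None:
    # Assumed condition: drop the flat gradient only when every original
    # parameter's gradient is None (e.g. after zero_grad(set_to_none=True)).
    if all(p.grad is None for p in flat_param.orig_params):
        flat_param.grad = None


# Case 1: every original gradient was cleared, so the flat gradient is dropped.
cleared = FlatParam(orig_params=[Param(), Param()], grad=[0.1, 0.2])
clear_flat_grad_if_all_none(cleared)

# Case 2: one original gradient remains, so the flat gradient is kept.
partial = FlatParam(orig_params=[Param(grad=[0.1]), Param()], grad=[0.1, 0.0])
clear_flat_grad_if_all_none(partial)
```

The existing tests mentioned above in test_fsdp_use_orig_params.py are the authoritative check; this sketch only shows the shape of the condition.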

awgu · 2023-06-28T00:39:38Z

@pytorchbot merge

pytorchmergebot · 2023-06-28T00:41:33Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2023-06-28T01:01:49Z

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / win-vs2019-cuda11.8-py3 / build

Details for Dev Infra team

Raised by workflow job

awgu · 2023-06-28T11:02:15Z

@pytorchbot merge

pytorchmergebot · 2023-06-28T11:04:52Z

Merge started

[FSDP] Support unfreezing params for reshard-only hook

ce510a6

awgu requested review from mrshenli, zhaojuanmao, rohan-varma, H-Huang, kwen2501, wanchaol, fegin, fduwjj, kiukchung and d4l3k as code owners June 26, 2023 11:57

pytorch-bot bot added the release notes: distributed (fsdp) release notes category label Jun 26, 2023

awgu added a commit that referenced this pull request Jun 26, 2023

[FSDP] Support unfreezing params for reshard-only hook

1db9014

ghstack-source-id: acd123f95b2c3fd90555aeef08274292c0c30f9d Pull Request resolved: #104186

awgu marked this pull request as draft June 26, 2023 11:58

Update on "[FSDP] Support unfreezing params for reshard-only hook"

2b5c8f1

awgu added a commit that referenced this pull request Jun 26, 2023

[FSDP] Support unfreezing params for reshard-only hook

b8f1b57

ghstack-source-id: 487b4b9611d6a7f307c03682b463ed48ed35c0b9 Pull Request resolved: #104186

awgu added the topic: improvements topic category label Jun 26, 2023

awgu marked this pull request as ready for review June 26, 2023 14:48

rohan-varma approved these changes Jun 28, 2023

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 28, 2023

pytorchmergebot added the merging label Jun 28, 2023

pytorchmergebot removed the merging label Jun 28, 2023

rohan-varma approved these changes Jun 28, 2023

fegin approved these changes Jun 28, 2023

pytorchmergebot added the merging label Jun 28, 2023

pytorchmergebot added Merged and removed merging labels Jun 28, 2023

pytorchmergebot closed this in 9db8ad7 Jun 28, 2023

facebook-github-bot deleted the gh/awgu/403/head branch July 1, 2023 14:16
