Skip to content

Conversation

soulitzer
Copy link
Contributor

@soulitzer soulitzer commented Jun 2, 2023

@soulitzer soulitzer requested a review from albanD as a code owner June 2, 2023 18:30
@pytorch-bot
Copy link

pytorch-bot bot commented Jun 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/102859

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8e8edde:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oups, nice catch

Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oups, nice catch

@soulitzer soulitzer added release notes: autograd release notes category topic: bug fixes topic category ciflow/trunk Trigger trunk jobs on your pull request labels Jun 2, 2023
@facebook-github-bot facebook-github-bot deleted the gh/soulitzer/212/head branch June 8, 2023 18:50
pytorchmergebot pushed a commit that referenced this pull request Jun 8, 2023
This PR makes a first attempt at improving FSDP's fine-tuning support by adding hooks to reshard frozen parameters in the backward pass.
- Without this, frozen parameters involved in gradient computation are kept as unsharded through the entire backward pass.
- The approach is to register a multi-grad ~~post~~-hook on the _input_ activations to the FSDP module, where the hook performs the resharding after all gradients for the FSDP module must have been computed (meaning that we are safe to reshard).

~~This PR relies on adding a "multi-grad post-hook" that differs from the existing "multi-grad hook" from `register_multi_grad_hook()`. I find that with `register_multi_grad_hook()`, sometimes the unit test counting the number of times `_post_backward_reshard()` is called fails (due to it not being called).~~ This was resolved in #102859.
Pull Request resolved: #101982
Approved by: https://github.com/rohan-varma
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged release notes: autograd release notes category topic: bug fixes topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants