
Multihooks should not keep tensor alive in closure #102859

Closed · wants to merge 1 commit
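
For context on the bug being fixed, here is a hedged, minimal sketch of the general pitfall (not the actual patch): if a gradient hook closes over the tensor it watches, that tensor stays alive for as long as the hook is registered; capturing a weak reference instead avoids extending the tensor's lifetime. `make_leaky_hook` and `make_safe_hook` are hypothetical names used only for illustration:

```python
import weakref
import torch

def make_leaky_hook(t):
    # Anti-pattern: closing over `t` keeps the tensor alive for as
    # long as the hook is registered, even after the caller drops it.
    def hook(grad):
        print("grad ready for tensor of shape", tuple(t.shape))
    return hook

def make_safe_hook(t):
    # Capture only a weak reference so the tensor can be freed once
    # the caller's last strong reference goes away.
    t_ref = weakref.ref(t)
    def hook(grad):
        t = t_ref()
        if t is not None:
            print("grad ready for tensor of shape", tuple(t.shape))
    return hook

x = torch.randn(3, requires_grad=True)
handle = x.register_hook(make_safe_hook(x))
(x * 2).sum().backward()
handle.remove()
```

The PR title suggests the same idea applied inside the multi-grad-hook machinery (`torch.autograd.graph.register_multi_grad_hook`), so that registering a multi-grad hook does not keep the watched tensors alive through a closure.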

Conversation

@soulitzer requested a review from albanD as a code owner on June 2, 2023 18:30
@pytorch-bot commented Jun 2, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/102859

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 8e8edde:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@albanD (Collaborator) left a comment

Oups, nice catch


@soulitzer added the release notes: autograd, topic: bug fixes, and ciflow/trunk labels on Jun 2, 2023
alimoezzi pushed a commit to alimoezzi/pytorch that referenced this pull request Jun 3, 2023
@facebook-github-bot deleted the gh/soulitzer/212/head branch on June 8, 2023 18:50
pytorchmergebot pushed a commit that referenced this pull request Jun 8, 2023
This PR makes a first attempt at improving FSDP's fine-tuning support by adding hooks to reshard frozen parameters in the backward pass.
- Without this, frozen parameters involved in gradient computation are kept unsharded for the entire backward pass.
- The approach is to register a multi-grad ~~post~~-hook on the _input_ activations to the FSDP module, where the hook performs the resharding once all gradients for the FSDP module have been computed (at which point it is safe to reshard).

~~This PR relies on adding a "multi-grad post-hook" that differs from the existing "multi-grad hook" from `register_multi_grad_hook()`. I find that with `register_multi_grad_hook()`, sometimes the unit test counting the number of times `_post_backward_reshard()` is called fails (due to it not being called).~~ This was resolved in #102859.
Pull Request resolved: #101982
Approved by: https://github.com/rohan-varma
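
As an illustration of the resharding-hook approach described in the commit message above, here is a minimal sketch. It is not the FSDP implementation: `_reshard_after_backward` and `reshard_fn` are hypothetical names, and only `torch.autograd.graph.register_multi_grad_hook` is the real API (the multi-grad hook whose closure behavior #102859 fixes):

```python
import torch
from torch.autograd.graph import register_multi_grad_hook

def _reshard_after_backward(module_inputs, reshard_fn):
    # Hook the gradients of the module's *input* activations: they are
    # produced only after every gradient inside the module has been
    # computed, so by the time the multi-grad hook fires it is safe
    # to reshard the frozen parameters.
    tensors = [t for t in module_inputs
               if isinstance(t, torch.Tensor) and t.requires_grad]

    def hook(grads):
        reshard_fn()

    return register_multi_grad_hook(tensors, hook)

# Usage sketch with a stand-in "reshard" action:
lin = torch.nn.Linear(4, 4)
x = torch.randn(2, 4, requires_grad=True)
handle = _reshard_after_backward((x,), lambda: print("reshard now"))
lin(x).sum().backward()
handle.remove()
```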