[FSDP] enable autograd in forward prefetching #116792

weifengpy · 2024-01-04T21:56:07Z

problem
When prefetching for the next forward, the current forward may be decorated
with @torch.no_grad. In that case param.grad_fn stays None during prefetching,
so _post_backward_hook never gets triggered.

repro
pytest test/distributed/fsdp/test_fsdp_freezing_weights.py

solution
This PR enables autograd during prefetching (_use_unsharded_views), so that
param.grad_fn is properly assigned for the next forward.

A longer-term fix would be to move _use_unsharded_views out of
_prefetch_handle and into _pre_forward_unshard.

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225


pytorch-bot · 2024-01-04T21:56:09Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/116792

✅ No Failures

As of commit e67952f with merge base 43fb1b6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

awgu

LGTM! Thanks for the quick fix and unit testing!

facebook-github-bot · 2024-01-04T23:23:18Z

@weifengpy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

weifengpy · 2024-01-05T18:41:54Z

@pytorchmergebot merge

pytorchmergebot · 2024-01-05T18:43:57Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Pull Request resolved: pytorch#116792
Approved by: https://github.com/awgu

Co-authored-by: Wei (Will) Feng <134637289+weifengpy@users.noreply.github.com> resolved: #116792

weifengpy and others added 4 commits January 4, 2024 12:53

[FSDP] add find_unused_parameters=True for DDP parity test

faff288

address linter

2b9ed59

Merge pull request #22 from weifengpy/fwd_prefetch_enable_grad_weif

e67952f

[FSDP] enable autograd in forward prefetching

pytorch-bot bot added the release notes: distributed (fsdp) release notes category label Jan 4, 2024

github-actions bot added oncall: distributed Add this issue/PR to distributed oncall triage queue ciflow/inductor labels Jan 4, 2024

weifengpy requested a review from awgu January 4, 2024 21:56

awgu approved these changes Jan 4, 2024

weifengpy added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 4, 2024

pytorchmergebot added the merging label Jan 5, 2024

pytorchmergebot closed this in ebedce2 Jan 5, 2024

pytorchmergebot added Merged and removed merging labels Jan 5, 2024

awgu mentioned this pull request Jan 12, 2024

[docs] start a new FSDP notes doc #117323

Closed

mvpatel2000 mentioned this pull request Jan 14, 2024

FSDP + DTensor Loss Flatlines Randomly #117471

Closed

atalman added this to the 2.2.1 milestone Jan 17, 2024

mvpatel2000 mentioned this pull request Feb 6, 2024

[v2.2.1] Release Tracker #119295

Closed

Skylion007 mentioned this pull request Feb 12, 2024

[FSDP] enable autograd in forward prefetching (#116792) #119688

Merged

atalman pushed a commit that referenced this pull request Feb 14, 2024

[FSDP] enable autograd in forward prefetching (#116792) (#119688)

bbfcfb0

Co-authored-by: Wei (Will) Feng <134637289+weifengpy@users.noreply.github.com> resolved: #116792
