[FSDP] fix: fix for fsdp exec order pre fwd record #110138

Edwiv · 2023-09-27T04:12:02Z

When sharding_strategy is set to SHARD_GRAD_OP and forward_prefetch=True, running validation directly (without a preceding training iteration) leaves self.is_first_iter always True, because training=False and the iter+1 step is never executed. In addition, the first handle to enter record_pre_forward is assigned a _pre_forward_order_index of 0. When that handle enters record_pre_forward again, the if condition at line 166 evaluates to False for it, although the expected result is True because _pre_forward_order_index has in fact been assigned a value. As a result, the first handle is added to handles_pre_forward_order repeatedly, which leads to an incorrect prefetching order.
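
For illustration, here is a minimal sketch of the failure mode described above. It is a toy model and not the PyTorch source: ToyHandle, ToyExecOrderData, the use_truthiness_check flag, and run_eval_forwards are hypothetical names. It assumes the condition tested the truthiness of _pre_forward_order_index, so that an index of 0 reads as "not yet assigned", and it shows an explicit `is not None` comparison as one possible corrected check.

```python
from typing import List, Optional


class ToyHandle:
    """Hypothetical stand-in for an FSDP handle; only tracks its recorded index."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._pre_forward_order_index: Optional[int] = None


class ToyExecOrderData:
    """Hypothetical stand-in for the object that records pre-forward order."""

    def __init__(self, use_truthiness_check: bool) -> None:
        # In eval-only runs the description says this flag never flips to False.
        self.is_first_iter = True
        self.handles_pre_forward_order: List[ToyHandle] = []
        self.use_truthiness_check = use_truthiness_check

    def record_pre_forward(self, handle: ToyHandle) -> None:
        index = handle._pre_forward_order_index
        if self.use_truthiness_check:
            # Assumed buggy form: index 0 is falsy, so it looks unassigned.
            already_recorded = bool(index)
        else:
            # Assumed corrected form: only None means "not yet assigned".
            already_recorded = index is not None
        if not self.is_first_iter or already_recorded:
            return
        handle._pre_forward_order_index = len(self.handles_pre_forward_order)
        self.handles_pre_forward_order.append(handle)


def run_eval_forwards(use_truthiness_check: bool, num_forwards: int = 3) -> List[str]:
    """Simulate repeated eval-mode forward passes over two handles."""
    data = ToyExecOrderData(use_truthiness_check)
    handles = [ToyHandle("a"), ToyHandle("b")]
    for _ in range(num_forwards):
        for handle in handles:
            data.record_pre_forward(handle)
    return [h.name for h in data.handles_pre_forward_order]
```

Under these assumptions, run_eval_forwards(True) records the first handle a second time, while run_eval_forwards(False) records each handle exactly once.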

pytorch-bot · 2023-09-27T04:12:06Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/110138

⏳ No Failures, 2 Pending

As of commit 51ca211 with merge base 33d8f5f:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

linux-foundation-easycla · 2023-09-27T04:12:08Z

The committers listed above are authorized under a signed CLA.

✅ login: Edwiv / name: Edwiv (95a8bea, 559081e, 51ca211)

awgu · 2023-09-27T12:29:44Z

@Edwiv could you sign the CLA when you get the chance?

Edwiv · 2023-09-28T09:25:00Z

done

awgu · 2023-09-28T13:05:49Z

@pytorchbot merge

pytorchmergebot · 2023-09-28T13:07:52Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

[FSDP] fix: fix for fsdp exec order pre fwd record

95a8bea

Edwiv requested review from mrshenli, zhaojuanmao, rohan-varma, H-Huang, awgu, kwen2501, wanchaol, fegin, fduwjj, kiukchung, d4l3k and wz337 as code owners September 27, 2023 04:12

pytorch-bot bot added the release notes: distributed (fsdp) release notes category label Sep 27, 2023

pytorchbot added the open source label Sep 27, 2023

awgu approved these changes Sep 27, 2023

awgu added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 27, 2023

Edwiv added 2 commits September 27, 2023 12:59

[FSDP] fix: fix for fsdp exec order pre fwd

559081e

Merge remote-tracking branch 'origin/zyj/fix/fsdp_exec_order' into zyj/fix/fsdp_exec_order

51ca211

pytorchmergebot added the merging label Sep 28, 2023

pytorchmergebot added Merged and removed merging labels Sep 28, 2023

pytorchmergebot closed this in 7f57373 Sep 28, 2023
