[FSDP] `summon_full_params()` in computation stream #86836
Conversation
[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/86836. Note: Links to docs will display an error until the docs builds have been completed. ✅ No failures as of commit c777cde. This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 2bc70f8 Pull Request resolved: pytorch#86836
ghstack-source-id: 9463308 Pull Request resolved: pytorch/pytorch#86836
LGTM, but curious why we need this change.
free_unsharded_flat_params = [handle.needs_unshard() for handle in self._handles]
self._unshard(self._handles)
self._streams["computation"].wait_stream(self._streams["unshard"])
# No need to call `wait_stream()` since we unshard in the computation stream
Curious, why would we want to move this to the computation stream?
This allows us to use caching allocator blocks from the computation stream for these all-gathers, which should help avoid over-allocating blocks to the unshard stream.
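For context, here is a minimal sketch of the two patterns being compared. It is not the FSDP internals: the helper names and the `repeat()` stand-in for the real all-gather are invented for illustration, and it assumes a CUDA device is available. The point is that caching-allocator blocks are associated with the stream they were allocated on, so outputs allocated on a side "unshard" stream cannot be freely reused by the computation stream, and each stream grows its own pool.

```python
import torch

assert torch.cuda.is_available(), "this sketch assumes a CUDA device"

computation = torch.cuda.current_stream()
unshard = torch.cuda.Stream()

def gather_on_unshard_stream(shard: torch.Tensor) -> torch.Tensor:
    # Roughly the old pattern: run the "all-gather" on the unshard stream and
    # make the computation stream wait on it afterward. The output block is
    # allocated in the unshard stream's pool.
    unshard.wait_stream(computation)   # shard was produced on the computation stream
    with torch.cuda.stream(unshard):
        full = shard.repeat(4)         # stand-in for the all-gather output
    computation.wait_stream(unshard)
    full.record_stream(computation)    # tensor is consumed on a different stream
    return full

def gather_on_computation_stream(shard: torch.Tensor) -> torch.Tensor:
    # Roughly the new pattern: run the "all-gather" directly on the computation
    # stream, so blocks already cached for that stream can be reused and no
    # cross-stream wait is needed.
    return shard.repeat(4)

shard = torch.randn(1 << 20, device="cuda")
out_old = gather_on_unshard_stream(shard)
out_new = gather_on_computation_stream(shard)
torch.cuda.synchronize()
```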
This should help with memory usage. [ghstack-poisoned]
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Hey @awgu.
This should help with memory usage. In particular, this allows FSDP to use caching allocator blocks from the computation stream for the `summon_full_params()` all-gathers, which should help avoid over-allocating blocks to the unshard stream. Pull Request resolved: pytorch#86836 Approved by: https://github.com/rohan-varma
Stack from ghstack:
- #86836 [FSDP] `summon_full_params()` in computation stream
- #87308 [FSDP][2/N] Fix grad zero vs. `None` edge case
- #87314 [FSDP][1/N] Update `summon_full_params(with_grads)` `None` gradient

This should help with memory usage. In particular, this allows FSDP to use caching allocator blocks from the computation stream for the `summon_full_params()` all-gathers, which should help avoid over-allocating blocks to the unshard stream.
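For readers less familiar with the API in question, a hedged usage sketch of the context manager this PR touches. The inspection helper is invented; it assumes the default process group is already initialized and that `model` is FSDP-wrapped.

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def inspect_full_params(model: FSDP) -> None:
    # Inside the context, the sharded parameters are all-gathered so each rank
    # sees the full (unsharded) parameters; with this PR, those all-gathers run
    # in the computation stream rather than a separate unshard stream.
    with FSDP.summon_full_params(model, writeback=False):
        for name, param in model.named_parameters():
            print(name, tuple(param.shape))
```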