[FSDP][3/N] Unify `fully_shard` auto wrap #104408

awgu · 2023-06-29T12:52:09Z

Stack from ghstack (oldest at bottom):

This moves `fully_shard` to use `_auto_wrap()` just like `FullyShardedDataParallel`. This means that `fully_shard` goes through the `_init_param_handle_from_module()` path (i.e. 1 `fully_shard` per "wrap"), removing the need for `_init_param_handles_from_module()` (which was 1 `fully_shard` for all "wraps" of a given policy). `_auto_wrap()` simply calls `fully_shard` on target submodules.

This includes several important fixes:

- We should register the pre/post-forward hooks on the module regardless of whether it has managed parameters.
- We can permit `_module_handles` to return `[]` in the composable path (for when the module has no managed parameters).
- We should unify the paths for `_get_buffers_and_dtypes_for_computation()` (previously, the composable path was buggy in some cases).
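
To make the "1 `fully_shard` per wrap" idea concrete, here is a minimal sketch in plain Python. It is a toy model and not the FSDP implementation: `ToyModule`, `ToyBlock`, `toy_fully_shard`, and `toy_auto_wrap` are hypothetical names, and the children-before-parents traversal and the final call on the root are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ToyModule:
    """Hypothetical stand-in for a module: parameter names plus children."""
    name: str
    params: List[str] = field(default_factory=list)
    children: List["ToyModule"] = field(default_factory=list)
    handle: Optional[List[str]] = None  # params managed by this wrap, if any
    hooks_registered: bool = False


class ToyBlock(ToyModule):
    """Marker subclass that the toy policy targets."""


def all_params(module: ToyModule) -> List[str]:
    """Collect parameter names from the module and its whole subtree."""
    out = list(module.params)
    for child in module.children:
        out.extend(all_params(child))
    return out


def toy_fully_shard(module: ToyModule, claimed: Set[str]) -> None:
    """One call per wrap: manage every parameter not claimed by a nested wrap."""
    managed = [p for p in all_params(module) if p not in claimed]
    claimed.update(managed)
    module.handle = managed  # may be [] if nothing is left to manage
    module.hooks_registered = True  # registered even when `managed` is empty


def toy_auto_wrap(
    module: ToyModule, targets: Tuple[type, ...], claimed: Set[str]
) -> None:
    """Assumed traversal: visit children first, then wrap matching modules."""
    for child in module.children:
        toy_auto_wrap(child, targets, claimed)
    if isinstance(module, targets):
        toy_fully_shard(module, claimed)


block0 = ToyBlock("block0", params=["block0.w"])
norm = ToyBlock("norm")  # a target with no parameters of its own
root = ToyModule("root", params=["root.bias"], children=[block0, norm])

claimed: Set[str] = set()
toy_auto_wrap(root, (ToyBlock,), claimed)
toy_fully_shard(root, claimed)  # the root picks up whatever is left over
```

In this toy, each target submodule gets its own call and its own (possibly empty) set of managed parameters, which is the shape of the unified path described above. The parameter-less `norm` still ends up with hooks registered and an empty handle list.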

[ghstack-poisoned]

pytorch-bot · 2023-06-29T12:52:12Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104408

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit c501068:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

rohan-varma

awesome, thanks for unifying the code paths!

rohan-varma · 2023-07-07T17:43:05Z

torch/distributed/fsdp/_common_utils.py

+        # A valid FSDP state may have no managed parameters and hence no
+        # handles, meaning no entry in `_fully_sharded_module_to_handles`
+        if len(state._handles) == 0:
+            return []


Did we add a test for this, and a test to ensure that a composable FSDP module that manages no params is still marked as FSDP-managed?

awgu · 2023-07-07

I think the `_has_fsdp_params()` check is still valid. Before, the composable path would raise an error when it did not need to, which is why I had to add this case.

In other words, this is covered by the existing tests. `summon_full_params()` on a module with `fully_shard()` applied but no managed parameters would error otherwise.
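
The guard discussed in this thread can be sketched with plain Python objects. This is an illustration under stated assumptions, not the code in `_common_utils.py`: `ToyState` and `toy_module_handles` are hypothetical, modules are represented by string names, and only the `len(state._handles) == 0` check is taken from the diff.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyState:
    """Hypothetical stand-in for an FSDP state object."""
    _handles: List[str] = field(default_factory=list)
    _fully_sharded_module_to_handles: Dict[str, List[str]] = field(
        default_factory=dict
    )


def toy_module_handles(state: ToyState, module: str) -> List[str]:
    # Same guard as in the diff: a valid state may manage no parameters,
    # in which case the module has no entry in the mapping.
    if len(state._handles) == 0:
        return []
    return state._fully_sharded_module_to_handles[module]


# A state that manages nothing, and one that manages a single handle.
empty_state = ToyState()
full_state = ToyState(
    _handles=["h0"],
    _fully_sharded_module_to_handles={"linear": ["h0"]},
)

no_handles = toy_module_handles(empty_state, "norm")
one_handle = toy_module_handles(full_state, "linear")
```

Without the early return, the lookup for `"norm"` in this toy would raise a `KeyError`. A caller that iterates over the returned handles simply does nothing for the parameter-less module.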

[FSDP][3/N] Unify fully_shard auto wrap

960ce5c

[ghstack-poisoned]

awgu mentioned this pull request Jun 28, 2023

[FSDP][1/N] Move wrapper ModuleWrapPolicy to new path #104346

Closed

awgu mentioned this pull request Jun 29, 2023

[FSDP] Annotate modules for fully_shard #104363

Closed

pytorch-bot bot added the release notes: distributed (fsdp) release notes category label Jun 29, 2023

awgu mentioned this pull request Jun 29, 2023

[FSDP][2/N][Easy] Prepare _auto_wrap for fully_shard #104407

Closed

awgu added a commit that referenced this pull request Jun 29, 2023

[FSDP][3/N] Unify fully_shard auto wrap

022eca5

ghstack-source-id: 9e0cf806b4bc63ef5bb5361c6a4c1cb33cd80c7c Pull Request resolved: #104408

awgu mentioned this pull request Jun 29, 2023

[FSDP][4/N] Remove _get_fully_sharded_module_to_states #104409

Closed

awgu added a commit to awgu/pytorch that referenced this pull request Jun 29, 2023

[FSDP][3/N] Unify fully_shard auto wrap

12c30a0

ghstack-source-id: dede963dea8977e420b32c5b10b0587688c9f693 Pull Request resolved: pytorch#104408

awgu added a commit to awgu/pytorch that referenced this pull request Jun 29, 2023

[FSDP][3/N] Unify fully_shard auto wrap

f43bafb

ghstack-source-id: 457c67dc90ed19b789b827fd99a06b4bcd5951b6 Pull Request resolved: pytorch#104408

awgu mentioned this pull request Jun 29, 2023

[FSDP][5/N] Unblock ignored_states + auto wrap (for now) #104418

Closed

awgu added the topic: not user facing topic category label Jun 29, 2023

awgu mentioned this pull request Jun 29, 2023

[FSDP][6/N] Check valid param freezing for ModuleWrapPolicy #104427

Closed

awgu added a commit to awgu/pytorch that referenced this pull request Jun 29, 2023

[FSDP][3/N] Unify fully_shard auto wrap

d1c5559

ghstack-source-id: 457c67dc90ed19b789b827fd99a06b4bcd5951b6 Pull Request resolved: pytorch#104408

awgu added a commit to awgu/pytorch that referenced this pull request Jun 30, 2023

[FSDP][3/N] Unify fully_shard auto wrap

2c51374

ghstack-source-id: f811cfa160eb549249eda792e376dc0277053373 Pull Request resolved: pytorch#104408

awgu added a commit to awgu/pytorch that referenced this pull request Jun 30, 2023

[FSDP][3/N] Unify fully_shard auto wrap

27e4cf6

ghstack-source-id: 141ba57ed15da65b7052e81f706b82925a376d33 Pull Request resolved: pytorch#104408

awgu added 2 commits June 30, 2023 00:43

awgu added a commit to awgu/pytorch that referenced this pull request Jun 30, 2023

[FSDP][3/N] Unify fully_shard auto wrap

5b80be5

ghstack-source-id: 5185a43b01c99015c96e04ea13a87f2449f39022 Pull Request resolved: pytorch#104408

awgu added a commit to awgu/pytorch that referenced this pull request Jul 5, 2023

[FSDP][3/N] Unify fully_shard auto wrap

8b6ccfe

ghstack-source-id: 4ad898a9777d75b08b32a786fcaab38388900ae9 Pull Request resolved: pytorch#104408

awgu added a commit to awgu/pytorch that referenced this pull request Jul 5, 2023

[FSDP][3/N] Unify fully_shard auto wrap

316aa2f

ghstack-source-id: 4ad898a9777d75b08b32a786fcaab38388900ae9 Pull Request resolved: pytorch#104408

awgu marked this pull request as ready for review July 5, 2023 15:36

awgu requested a review from mrshenli as a code owner July 5, 2023 15:36

awgu requested review from zhaojuanmao, rohan-varma, H-Huang, kwen2501, wanchaol, fegin, fduwjj, yhcharles, kiukchung and d4l3k as code owners July 5, 2023 15:36

This was referenced Jul 6, 2023

SetVariable in dynamo #103205

Closed

[WIP] Living branch / PR for FSDP development #103711

Closed

Migrate tuple(handle) -> handle #104488

Closed

voznesenskym and others added 3 commits July 6, 2023 20:25

rohan-varma approved these changes Jul 7, 2023

awgu added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 7, 2023

pytorchmergebot added the Merged label Jul 8, 2023

pytorchmergebot closed this in d9be036 Jul 8, 2023

facebook-github-bot deleted the gh/awgu/412/head branch July 11, 2023 14:16

awgu mentioned this pull request Jul 18, 2023

[FSDP] Revisit mixed-precision casting logic #105499

Open

3 tasks

awgu mentioned this pull request Aug 3, 2023

[RFC][FSDP] fully_shard(policy=...) + summon_full_params is wrong #104277

Closed
