[FSDP][5/N] Unblock `ignored_states` + auto wrap (for now) #104418

awgu · 2023-06-29T15:58:31Z

Stack from ghstack (oldest at bottom):

The "for now" is because one issue remains: when `ignored_states` is passed as parameters rather than modules, we do not recover the ignored modules, so FSDP still wraps those modules as empty shells (no managed parameters), which is not ideal. As far as I know, this is not a blocking issue.
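The "empty shell" behavior described above can be pictured with a small toy model. This is only a sketch in plain Python, not FSDP code: `ToyModule`, `toy_auto_wrap`, and `build_model` are hypothetical names invented for illustration, and the wrapping rule (wrap every module bottom-up) is an assumed simplification of a real auto wrap policy.

```python
from dataclasses import dataclass, field


@dataclass
class ToyModule:
    """Hypothetical stand-in for nn.Module: a name, own parameter names, children."""
    name: str
    params: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def all_params(self):
        # Own parameters first, then those of every descendant.
        out = list(self.params)
        for child in self.children:
            out.extend(child.all_params())
        return out


def toy_auto_wrap(root, ignored_params, ignored_modules=frozenset()):
    """Wrap every module bottom-up; return {module name: managed parameter names}."""
    wrapped = {}
    claimed = set(ignored_params)  # ignored, or already managed by a nested wrapper

    def visit(module):
        if module.name in ignored_modules:
            return  # an ignored module (and its subtree) is never wrapped
        for child in module.children:
            visit(child)
        managed = [p for p in module.all_params() if p not in claimed]
        claimed.update(managed)
        wrapped[module.name] = managed

    visit(root)
    return wrapped


def build_model():
    return ToyModule("root", ["root.w"], [
        ToyModule("block", ["block.w"]),
        ToyModule("frozen", ["frozen.w"]),
    ])


# Parameter path: only the parameter is known to be ignored, so the module
# "frozen" is still wrapped, with nothing left to manage (an empty shell).
by_param = toy_auto_wrap(build_model(), ignored_params={"frozen.w"})

# Module path: the module itself is known to be ignored, so it is skipped.
by_module = toy_auto_wrap(
    build_model(), ignored_params={"frozen.w"}, ignored_modules={"frozen"}
)

print(by_param)
print(by_module)
```

In the first call the `frozen` entry exists but manages no parameters, which mirrors the non-ideal case from the description; in the second call it is absent altogether.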

[ghstack-poisoned]

pytorch-bot · 2023-06-29T15:58:35Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104418

✅ No Failures

As of commit 8b25684:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

rohan-varma · 2023-07-06T22:30:32Z

torch/distributed/fsdp/fully_sharded_data_parallel.py

@@ -417,6 +417,7 @@ def __init__(
                "forward_prefetch": forward_prefetch,
                "limit_all_gathers": limit_all_gathers,
                "use_orig_params": use_orig_params,
+                "ignored_states": self._ignored_params,


Why do we need to propagate `ignored_states` but not `ignored_modules`?

Good point. Maybe we should also propagate `ignored_modules`. This might be another bug.

Not propagating `ignored_modules` does not break correctness today because we only use `ignored_modules` for two things: computing the ignored parameters and deciding which modules to skip during auto wrapping.

Neither of those requires the nested FSDP instances to have `ignored_modules`. However, we should probably still propagate it in case we use `ignored_modules` for something else in the future.

I see. In general, what do you think about having both `ignored_states` and `ignored_modules`? This seems like it could confuse users. Shall we just consolidate to `ignored_states`?
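As a rough illustration of why nested instances might need the ignored set forwarded to them, here is a plain-Python sketch. It is not FSDP code: `ToyWrapper` and `wrap_nested` are hypothetical, and the assumption that a nested wrapper manages every non-ignored parameter it is handed is a simplification made for this example.

```python
class ToyWrapper:
    """Hypothetical wrapper, not FSDP: manages every parameter that is not ignored."""

    def __init__(self, param_names, ignored_states=()):
        ignored = set(ignored_states)
        self.managed = [p for p in param_names if p not in ignored]


def wrap_nested(submodules, root_ignored, propagate):
    """Build one nested wrapper per submodule and report what each one manages."""
    # Keyword arguments the root forwards to nested wrappers, loosely analogous
    # to the dict that the diff above extends with "ignored_states".
    kwargs = {"ignored_states": root_ignored} if propagate else {}
    return {
        name: ToyWrapper(params, **kwargs).managed
        for name, params in submodules.items()
    }


submodules = {"encoder": ["encoder.w", "encoder.frozen_bias"]}
root_ignored = {"encoder.frozen_bias"}

with_propagation = wrap_nested(submodules, root_ignored, propagate=True)
without_propagation = wrap_nested(submodules, root_ignored, propagate=False)

print(with_propagation)     # nested wrapper skips the ignored parameter
print(without_propagation)  # nested wrapper also manages the ignored parameter
```

Under these assumptions, dropping the forwarded argument lets the nested wrapper pick up a parameter the root intended to ignore, which is the kind of mismatch that propagating the setting avoids.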

rohan-varma · 2023-07-06T22:31:48Z

test/distributed/fsdp/test_fsdp_ignored_modules.py

@@ -173,29 +165,32 @@ def _test_ignored_modules_transformer(
            CUDAInitMode.CUDA_BEFORE,
            deterministic=True,
        )
+        if use_auto_wrap:
+            nonwrapped_model.output_proj.weight = nn.Parameter(


Why do we need this reassignment?

See the comment above about unsharing the weight.

I can duplicate the comment here if it helps.

awgu · 2023-07-08T12:38:18Z

@pytorchbot merge

pytorchmergebot · 2023-07-08T12:39:58Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

[FSDP][5/N] Unblock ignored_states + auto wrap (for now)

42fe599

[ghstack-poisoned]

awgu mentioned this pull request Jun 29, 2023

[FSDP][1/N] Move wrapper ModuleWrapPolicy to new path #104346

Closed

awgu mentioned this pull request Jun 29, 2023

[FSDP] Annotate modules for fully_shard #104363

Closed

pytorch-bot bot added the release notes: distributed (fsdp) release notes category label Jun 29, 2023

This was referenced Jun 29, 2023

[FSDP][3/N] Unify fully_shard auto wrap #104408

Closed

[FSDP][2/N][Easy] Prepare _auto_wrap for fully_shard #104407

Closed

[FSDP][4/N] Remove _get_fully_sharded_module_to_states #104409

Closed

awgu added a commit that referenced this pull request Jun 29, 2023

[FSDP][5/N] Unblock ignored_states + auto wrap (for now)

b5a5182

ghstack-source-id: 38b3b336679415d39fcafd4c8c4c4d707e3816c5 Pull Request resolved: #104418

awgu mentioned this pull request Jun 29, 2023

[FSDP] Fix ignored_states + auto wrap #104275

Closed

awgu added a commit that referenced this pull request Jun 29, 2023

[FSDP][5/N] Unblock ignored_states + auto wrap (for now)

707a1fb

ghstack-source-id: 3dece821cc9b5b6fac09519f56414fbcf4ca3874 Pull Request resolved: #104418

awgu added the topic: bug fixes topic category label Jun 29, 2023

awgu added a commit that referenced this pull request Jun 29, 2023

[FSDP][5/N] Unblock ignored_states + auto wrap (for now)

2b1f3af

ghstack-source-id: 7a743d525ebfeee9362719c13dcef95993ecb0fa Pull Request resolved: #104418

awgu mentioned this pull request Jun 29, 2023

[FSDP][6/N] Check valid param freezing for ModuleWrapPolicy #104427

Closed

awgu added a commit to awgu/pytorch that referenced this pull request Jun 29, 2023

[FSDP][5/N] Unblock ignored_states + auto wrap (for now)

394f454

ghstack-source-id: 7a743d525ebfeee9362719c13dcef95993ecb0fa Pull Request resolved: pytorch#104418

awgu added a commit to awgu/pytorch that referenced this pull request Jun 30, 2023

[FSDP][5/N] Unblock ignored_states + auto wrap (for now)

3634579

ghstack-source-id: 60be363bf4780ea201280664d0fe0ede7e7faaa5 Pull Request resolved: pytorch#104418

awgu added a commit to awgu/pytorch that referenced this pull request Jun 30, 2023

[FSDP][5/N] Unblock ignored_states + auto wrap (for now)

ed5176d

ghstack-source-id: 28c95ac2e70c519db0e32cd6b3147ede9aed54cd Pull Request resolved: pytorch#104418

awgu added 2 commits June 30, 2023 00:43

awgu added a commit to awgu/pytorch that referenced this pull request Jun 30, 2023

[FSDP][5/N] Unblock ignored_states + auto wrap (for now)

4b3aec0

ghstack-source-id: eee579391a4b12b8558a922a2a17cf93f9289b90 Pull Request resolved: pytorch#104418

awgu added a commit to awgu/pytorch that referenced this pull request Jul 5, 2023

[FSDP][5/N] Unblock ignored_states + auto wrap (for now)

2261b54

ghstack-source-id: 5d23b076f1edb0ecb0b441f14fb7732e2be2d7e1 Pull Request resolved: pytorch#104418

awgu added a commit to awgu/pytorch that referenced this pull request Jul 5, 2023

[FSDP][5/N] Unblock ignored_states + auto wrap (for now)

a472449

ghstack-source-id: 5d23b076f1edb0ecb0b441f14fb7732e2be2d7e1 Pull Request resolved: pytorch#104418

awgu marked this pull request as ready for review July 5, 2023 15:36

awgu requested a review from mrshenli as a code owner July 5, 2023 15:36

awgu requested review from zhaojuanmao, rohan-varma, H-Huang, kwen2501, wanchaol, fegin, fduwjj, kiukchung and d4l3k as code owners July 5, 2023 15:36

This was referenced Jul 6, 2023

SetVariable in dynamo #103205

Closed

[WIP] Living branch / PR for FSDP development #103711

Closed

Migrate tuple(handle) -> handle #104488

Closed

rohan-varma reviewed Jul 6, 2023

awgu added 2 commits July 7, 2023 12:49

rohan-varma approved these changes Jul 7, 2023

awgu added the ciflow/trunk Trigger trunk jobs on your pull request label Jul 7, 2023

pytorchmergebot added the merging label Jul 8, 2023

pytorchmergebot added Merged and removed merging labels Jul 8, 2023

pytorchmergebot closed this in e600505 Jul 8, 2023

facebook-github-bot deleted the gh/awgu/414/head branch July 11, 2023 14:16
