[FSDP][2/N] `_summon_full_params` -> `_unshard_params` #92297
Conversation
[ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92297
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit c8420cc.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 998f0f505d552707385478dc6802cefd64049968
Pull Request resolved: pytorch#92297
**Overview**

This PR stack will add support for unsharding FSDP's sharded parameters for `fully_shard`. This PR takes the first step by doing some internal refactoring.

- The existing API for wrapper FSDP is the static method `summon_full_params()`, which calls into the helper `_summon_full_params()`.
- This PR refactors:
  - `summon_full_params()` core logic to `_unshard_params()`
  - `_summon_full_params()` to `_unshard_params_recurse()`, which has a `recurse: bool` argument
  - the previous `_unshard_params()` to `_unshard_fsdp_state_params()`, which applies to a single FSDP state

**Details**

- This PR introduces `_get_fsdp_states_with_modules()` and `_get_root_fsdp_states_with_modules()`, which additionally return the modules along with the FSDP states. The modules are needed for handling `FlatParameter` registration.
  - We may be able to remove this if we clean up the `use_orig_params=True` vs. `False` code paths because for `True`, the `FlatParameter` is not registered, meaning that it does not need to be de-registered.
  - Since `fully_shard` requires `use_orig_params=True`, we may not need `_get_fsdp_states_with_modules()` and `_get_root_fsdp_states_with_modules()`; however, I prefer to make the separation of FSDP state and module explicit for now for clarity.

**Follow-Ups**

- `writeback=True` together with `rank0_only=True` raises an error. The previous explanation was:

  > is not supported, as model parameter shapes will be different across ranks, and writing to them can lead to inconsistencies across ranks when the context is exited.

  I am not exactly sure what the different model parameter shapes refers to. However, I believe that we can support `writeback=True` and `rank0_only=True` by broadcasting the `FlatParameter` from rank 0 in the `finally`, writing back, and freeing. This should not increase the peak memory since rank 0 already holds the unsharded `FlatParameter` in GPU memory before writing back and nonzero ranks do not have any other unsharded `FlatParameter`s in GPU memory.

[ghstack-poisoned]
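For orientation, here is a minimal structural sketch (plain Python, not the real FSDP code) of how the refactored helpers compose after this PR: `_unshard_params_recurse()` selects the FSDP states, `_unshard_params()` stacks the per-state context managers, and `_unshard_fsdp_state_params()` handles a single FSDP state. The toy `states_with_modules` data and the print statements are illustrative stand-ins for what `_get_fsdp_states_with_modules()` would return.

```python
import contextlib

@contextlib.contextmanager
def _unshard_fsdp_state_params(module, state):
    # Unshard the FlatParameter owned by a *single* FSDP state; reshard on exit.
    print(f"unshard {state} on {module}")
    try:
        yield
    finally:
        print(f"reshard {state} on {module}")

@contextlib.contextmanager
def _unshard_params(module, states_with_modules):
    # Core logic formerly inside summon_full_params(): stack the per-state
    # context managers so all requested states are unsharded at once.
    with contextlib.ExitStack() as stack:
        for state, submodule in states_with_modules:
            stack.enter_context(_unshard_fsdp_state_params(submodule, state))
        yield

@contextlib.contextmanager
def _unshard_params_recurse(module, recurse, states_with_modules):
    # Formerly _summon_full_params(); `recurse` decides whether to include
    # nested FSDP states or only the one rooted at `module`.
    selected = states_with_modules if recurse else states_with_modules[:1]
    with _unshard_params(module, selected):
        yield

if __name__ == "__main__":
    # Toy usage: two "FSDP states" on a fake module tree.
    with _unshard_params_recurse("root", recurse=True,
                                 states_with_modules=[("state0", "root"), ("state1", "child")]):
        pass
```

Using `contextlib.ExitStack` here mirrors how one unshard context can be entered per FSDP state, with resharding guaranteed in reverse order on exit.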
Shall we add unit tests for the `summon_full_params` composable path?
"to them can lead to inconsistencies across ranks when the " | ||
"context is exited." | ||
) | ||
# TODO: Rank 0 can broadcast the `FlatParameter` to allow all ranks to |
Could we file an issue for this? Would it work for `use_orig_params=True` as well?
I think it should work for both `use_orig_params=True` and `False`. I will file an issue.
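As a rough sketch of that follow-up (hedged, not what FSDP currently does): in the `finally` of the `rank0_only=True` path, rank 0 could broadcast its unsharded `FlatParameter` so every rank writes back the same values before freeing. `writeback_fn` and `free_fn` below are hypothetical placeholders for FSDP's internal writeback/free steps, and an initialized process group is assumed.

```python
import torch
import torch.distributed as dist

def _broadcast_writeback_from_rank0(flat_param: torch.Tensor, writeback_fn, free_fn) -> None:
    # Rank 0 holds the (possibly modified) unsharded FlatParameter after the
    # user exits the unshard context; nonzero ranks materialize it only here.
    dist.broadcast(flat_param, src=0)
    writeback_fn(flat_param)  # write unsharded values back into the sharded FlatParameter
    free_fn(flat_param)       # free the unsharded storage afterward
```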
```python
    if recurse:
        with contextlib.ExitStack() as stack:
            # TODO (awgu): The traversal function does not traverse through
            # incompatible composable APIs. Verify if this is the desired
```
Could you elaborate? What's an example of this?
```
fully_shard(
    Module(
        replicate(
            Submodule(
                fully_shard(Subsubmodule),
                Subsubmodule,
            ),
            Submodule,
        )
    )
)
```

Because the traversal utils do not go through incompatible composable APIs (here, `replicate`), calling `_unshard_params` on the root `Module` will not unshard the parameters of the fully sharded `Subsubmodule`.
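A self-contained toy sketch of that behavior (not FSDP's actual traversal utilities; `Node` and the `api` markers are made up for illustration): the walk from the root stops at any subtree managed by an incompatible composable API, so the nested `fully_shard` state is never collected.

```python
class Node:
    def __init__(self, name, api=None, children=()):
        self.name, self.api, self.children = name, api, list(children)

def collect_fsdp_states(node, states):
    # Stop at subtrees managed by an incompatible composable API (here "replicate"),
    # mirroring the example above.
    if node.api == "replicate":
        return
    if node.api == "fully_shard":
        states.append(node.name)
    for child in node.children:
        collect_fsdp_states(child, states)

subsub = Node("Subsubmodule", api="fully_shard")
sub = Node("Submodule", api="replicate", children=[subsub])
root = Node("Module", api="fully_shard", children=[sub])

found = []
collect_fsdp_states(root, found)
print(found)  # ['Module'] -- the nested fully_shard state on Subsubmodule is never reached
```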
Yes, this has not been added yet. (I have a local [4/N] commit that does add a frontend for that path, but I did not open a PR for it since we have not finalized what the API should look like.) I will add tests when we include that.
…n-dev-setup

* origin: (898 commits)
  - Move dynamo.optimizations.distributed to backends (pytorch#93408)
  - Remove cuda 11.6 from nightly (pytorch#93979)
  - Refactor dynamo register_backend/BACKENDS (pytorch#93389)
  - Remove cuda 11.6 from CI replace with 11.7 (pytorch#93406)
  - [Dynamo] Rename `GuardBuilder.guarded_code` -> `check_fn_manager` (pytorch#93934)
  - Revert "Remove CUDA 11.6 from nightly builds (pytorch#93404)"
  - Revert "[inductor] fix crash issue when input is a view tensor (pytorch#90150)"
  - Basic Validation for FSDP `state_dict` transformations of modules with persistent buffers (pytorch#93396)
  - Merge Inductor perf smoke test with other inductor CI tests (pytorch#93395)
  - [inductor] Don't import torchvision (pytorch#93027)
  - [FSDP][3/N] Refactor `summon_full_params` unit tests (pytorch#92298)
  - [FSDP][2/N] `_summon_full_params` -> `_unshard_params` (pytorch#92297)
  - Remove CUDA 11.6 from nightly builds (pytorch#93404)
  - Mark buffers that reuse other buffers (pytorch#93329)
  - Refactor to allow reuse of SchedulerNode.allocate (pytorch#93328)
  - retire sparse_mask_helper (pytorch#91714)
  - update fbgemm third party (pytorch#93907)
  - [inductor] fix crash issue when input is a view tensor (pytorch#90150)
  - [Inductor] add config for weight prepacking (pytorch#93811)
  - Check for none for NNModuleVariable.__module__ (pytorch#93326)
  - ...
Stack from ghstack:

- #92298 [FSDP][3/N] Refactor `summon_full_params` unit tests
- #92297 [FSDP][2/N] `_summon_full_params` -> `_unshard_params` (this PR)