-
Notifications
You must be signed in to change notification settings - Fork 21.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FSDP][5/N] Unblock ignored_states
+ auto wrap (for now)
#104418
Conversation
[ghstack-poisoned]
🔗 Helpful Links🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104418
Note: Links to docs will display an error until the docs builds have been completed. ✅ No FailuresAs of commit 8b25684: This comment was automatically generated by Dr. CI and updates every 15 minutes. |
ghstack-source-id: 38b3b336679415d39fcafd4c8c4c4d707e3816c5 Pull Request resolved: #104418
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
ghstack-source-id: 3dece821cc9b5b6fac09519f56414fbcf4ca3874 Pull Request resolved: #104418
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
ghstack-source-id: 7a743d525ebfeee9362719c13dcef95993ecb0fa Pull Request resolved: #104418
ghstack-source-id: 7a743d525ebfeee9362719c13dcef95993ecb0fa Pull Request resolved: pytorch#104418
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
ghstack-source-id: 60be363bf4780ea201280664d0fe0ede7e7faaa5 Pull Request resolved: pytorch#104418
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
ghstack-source-id: 28c95ac2e70c519db0e32cd6b3147ede9aed54cd Pull Request resolved: pytorch#104418
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
ghstack-source-id: eee579391a4b12b8558a922a2a17cf93f9289b90 Pull Request resolved: pytorch#104418
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
ghstack-source-id: 5d23b076f1edb0ecb0b441f14fb7732e2be2d7e1 Pull Request resolved: pytorch#104418
ghstack-source-id: 5d23b076f1edb0ecb0b441f14fb7732e2be2d7e1 Pull Request resolved: pytorch#104418
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
@@ -417,6 +417,7 @@ def __init__( | |||
"forward_prefetch": forward_prefetch, | |||
"limit_all_gathers": limit_all_gathers, | |||
"use_orig_params": use_orig_params, | |||
"ignored_states": self._ignored_params, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need to propagate ignored_states, but not ignored_modules?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Maybe we should also propagate ignored_modules
. This might be another bug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason not propagating ignored_modules
does not break correctness today is because we only use ignored_modules
to compute the ignored parameters and to compute which modules to ignore for auto wrapping.
For those two functionalities, we do not need the nested FSDP instances to have ignored_modules
. However, we should probably still propagate it in case we use ignored_modules
for something else in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. In general, what do you think about having both ignored_states and ignored_modules? this seems like it can get confusing to users, shall we just consolidate to ignored_states?
@@ -173,29 +165,32 @@ def _test_ignored_modules_transformer( | |||
CUDAInitMode.CUDA_BEFORE, | |||
deterministic=True, | |||
) | |||
if use_auto_wrap: | |||
nonwrapped_model.output_proj.weight = nn.Parameter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this reassignment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment above about unsharing the weight.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can duplicate the comment here if it helps.
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
The "for now" is because we still have the issue that when using the parameter `ignored_states` path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know. [ghstack-poisoned]
@pytorchbot merge |
Merge startedYour change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team |
Stack from ghstack (oldest at bottom):
ModuleWrapPolicy
#104427ignored_states
+ auto wrap (for now) #104418_get_fully_sharded_module_to_states
#104409fully_shard
auto wrap #104408_auto_wrap
forfully_shard
#104407ModuleWrapPolicy
to new path #104346The "for now" is because we still have the issue that when using the parameter
ignored_states
path, we do not recover the ignored modules, so FSDP still wraps those as empty shells (no managed parameters), which is not ideal. This is not a blocking issue as far as I know.