[FSDP] Enable mixed hybrid/non-hybrid sharding strategies #90846
Conversation
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/90846. Note: links to docs will display an error until the doc builds have completed. ✅ No failures as of commit c814a10. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
For hybrid sharding strategies, we only need to enforce the same process groups across the instances that use a hybrid sharding strategy, not across all instances. We can even mix and match the two different hybrid sharding strategies. This PR relaxes the validation to support this.
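The relaxed check can be illustrated with a small self-contained sketch. This is not the actual FSDP implementation: the `ShardingStrategy` stand-in, `validate_hybrid_pgs`, and the tuple-of-groups representation are all hypothetical, chosen only to show the idea that non-hybrid instances are exempt from the process-group check while the two hybrid strategies may be mixed.

```python
from enum import Enum, auto

class ShardingStrategy(Enum):
    """Hypothetical stand-in for torch.distributed.fsdp.ShardingStrategy."""
    FULL_SHARD = auto()
    SHARD_GRAD_OP = auto()
    NO_SHARD = auto()
    HYBRID_SHARD = auto()
    _HYBRID_SHARD_ZERO2 = auto()

# Both hybrid strategies count as "hybrid" for process-group validation
HYBRID_STRATEGIES = {
    ShardingStrategy.HYBRID_SHARD,
    ShardingStrategy._HYBRID_SHARD_ZERO2,
}

def validate_hybrid_pgs(instances):
    """Require that every *hybrid* instance uses the same
    (intra-node, inter-node) process-group pair. Non-hybrid
    instances are skipped, and the two hybrid strategies may mix."""
    hybrid = [pgs for strategy, pgs in instances if strategy in HYBRID_STRATEGIES]
    for pgs in hybrid[1:]:
        if pgs != hybrid[0]:
            raise ValueError(
                "All FSDP instances using a hybrid sharding strategy "
                "must share the same process groups"
            )

# Mixing the two hybrid strategies is fine as long as the groups match,
# and non-hybrid instances are not constrained at all:
validate_hybrid_pgs([
    (ShardingStrategy.HYBRID_SHARD, ("intra_pg", "inter_pg")),
    (ShardingStrategy._HYBRID_SHARD_ZERO2, ("intra_pg", "inter_pg")),
    (ShardingStrategy.FULL_SHARD, None),  # skipped by the check
])
```

Under the stricter pre-PR validation, the sketch would instead compare process groups across every instance, which is what this change relaxes.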
Apologies for the preemptive stamp - coming back for a full review :)
import sys
from collections import Counter
Could we have the formatting changes separated out into a different PR? Mixing formatting changes with logical changes makes it harder for reviewers to identify the critical parts of the PR to review.
Sorry about that. I separated it out. Eventually, we should try to have all developers use ufmt so that we can run ufmt on files right before pushing. Converging on a single formatter removes the uncertainty about how code should be formatted, and we chose ufmt since PyTorch recommends it.
Curious about how we will encourage this practice. lintrunner is today's automated tool that can run pre-commit and is enforced by PyTorch CI. Should we work with dev infra / CI folks to add ufmt to lintrunner? Without automation, enforcement of a linting standard is prone to breaking.
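For concreteness, lintrunner is driven by `[[linter]]` entries in a `.lintrunner.toml` file, so adding ufmt would mean adding an entry along these lines. This is an illustrative sketch only: the adapter script path, include patterns, and placeholder syntax shown here are assumptions, not PyTorch's actual configuration.

```toml
[[linter]]
code = 'UFMT'
include_patterns = [
    # hypothetical: limit to FSDP files at first
    'torch/distributed/fsdp/**/*.py',
]
command = [
    'python3',
    # hypothetical adapter script wrapping `ufmt`
    'tools/linter/adapters/ufmt_linter.py',
    '--',
    '@{{PATHSFILE}}',
]
is_formatter = true
```

With an entry like this, `lintrunner -a` could apply the formatting automatically, which addresses the enforcement concern above.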
ghstack-source-id: 54419fe Pull Request resolved: pytorch#90846
I will fix the test failures tomorrow.
ghstack-source-id: b14bebe Pull Request resolved: pytorch#90846
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA: 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack:
- #90874 [FSDP][5/N] Add manual "wrapping" support for `fully_shard`
- #90862 [FSDP][3/N] Move `fsdp_modules(root_only=True)` -> `_get_fsdp_root_states()`
- #90861 [FSDP][2/N] Move `fsdp_modules(root_only=False)` -> `_get_fsdp_states()`
- #90864 [FSDP][Easy] Rename `entry` -> `fsdp_module` to be more descriptive
- #90860 [FSDP][1/N] Add `_get_fsdp_states()`
- #90859 [FSDP][Easy] Use `run_subtests` for hybrid shard test
- #90840 [FSDP][BE] Remove `_module_to_handles`, `HandleConfig`; use term "fqn"; clarify docs

In the context of hybrid sharding strategies, we only need to enforce the same process groups among the instances using a hybrid sharding strategy, not all instances. We can even mix and match the two different hybrid sharding strategies. This PR relaxes the validation to support this.