[FSDP][optim_state_dict][1/N] Restructure _optim_state_dict to prepare the support of use_orig_param #89898
Conversation
…e the support of use_orig_param [ghstack-poisoned]
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89898
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Failure as of commit 18d5617.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
…t to prepare the support of use_orig_param" [ghstack-poisoned]
…t to prepare the support of use_orig_param" **Motivation:** Restructure some APIs in _optim_state_dict.py to allow better future extension, mostly for supporting use_orig_params. NO logic change in this PR. [ghstack-poisoned]
Thanks for this reorganization!
:class:`list`.

def _map_param_id_to_optim_keys(
    optim_state_dict: Dict[str, Any],
    group: Optional[dist.ProcessGroup],
General comment: While refactoring the optimizer state dict, it could be worthwhile to keep an eye out for process group usage (mainly looking for any accidental default process group usage). This is in preparation for HSDP, where every process group usage should now map to the intra-node process group from HSDP.
cc: @rohan-varma
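To make the concern concrete, here is a minimal, purely hypothetical sketch (not code from this PR) of the pattern being asked for: each optimizer state dict helper takes an explicit `group` argument and threads it through every collective, so nothing silently falls back to the default world group and HSDP can later substitute its intra-node process group. The helper name and structure below are assumptions for illustration.

```python
from typing import Any, Dict, List, Optional

import torch.distributed as dist

# Hypothetical helper illustrating explicit process-group threading.
def _gather_optim_state(
    local_state: Dict[str, Any],
    group: Optional[dist.ProcessGroup] = None,
) -> List[Any]:
    # Resolve the group once at the boundary; every collective below passes
    # `group` explicitly, so HSDP could later hand in its intra-node group.
    group = group if group is not None else dist.group.WORLD
    world_size = dist.get_world_size(group)
    gathered: List[Any] = [None for _ in range(world_size)]
    dist.all_gather_object(gathered, local_state, group=group)
    return gathered
```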
return r0_param_id_to_optim_state_key, optim_state_key_to_param_id
def _process_param_groups(
nit: I am wondering, should we call this `_unflatten_param_groups` to be more specific?
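For illustration only, a hypothetical sketch of what an `_unflatten_param_groups`-style helper could do: rewrite each saved param group so that flat-parameter IDs are replaced by the fully qualified names of the original parameters. The helper name, the `flat_param_id_to_fqns` mapping, and the structure are assumptions for this sketch, not the actual implementation in _optim_state_dict.py.

```python
from typing import Any, Dict, List

# Illustrative sketch: expand flat-parameter IDs in each param group into the
# original (unflattened) parameter names they were constructed from.
def _unflatten_param_groups(
    param_groups: List[Dict[str, Any]],
    flat_param_id_to_fqns: Dict[int, List[str]],
) -> List[Dict[str, Any]]:
    unflat_groups: List[Dict[str, Any]] = []
    for group in param_groups:
        # Copy hyperparameters (lr, weight_decay, ...) unchanged.
        unflat_group = {k: v for k, v in group.items() if k != "params"}
        # Replace every flat-parameter ID with its original parameter names.
        unflat_group["params"] = [
            fqn
            for flat_id in group["params"]
            for fqn in flat_param_id_to_fqns[flat_id]
        ]
        unflat_groups.append(unflat_group)
    return unflat_groups
```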
@pytorchbot merge -f "The failing test is unrelated"
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…e the support of use_orig_param (pytorch#89898)
**Motivation:** Restructure some APIs in _optim_state_dict.py to allow better future extension, mostly for supporting use_orig_params. NO logic change in this PR.
Pull Request resolved: pytorch#89898
Approved by: https://github.com/awgu
Stack from ghstack (oldest at bottom):
Motivation:
Restructure some APIs in _optim_state_dict.py to allow better future extension, mostly for supporting use_orig_params. NO logic change in this PR.
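As context, a rough usage sketch of the public FSDP optimizer state dict path that these internal helpers back. It assumes `torch.distributed` is already initialized on each rank and that this PyTorch build exposes the `use_orig_params` flag on FSDP; the training loop is elided. The `use_orig_params=True` case is what the restructuring prepares to support.

```python
import torch
import torch.nn as nn
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

# Sketch under the assumptions above, not part of this PR.
def checkpoint_optim_state(model: nn.Module):
    fsdp_model = FSDP(model.cuda(), use_orig_params=True)
    optim = torch.optim.Adam(fsdp_model.parameters(), lr=1e-3)
    # ... run some training steps here ...
    # Consolidate the sharded optimizer state into a single, unflattened
    # state dict on rank 0.
    full_osd = FSDP.full_optim_state_dict(fsdp_model, optim)
    return full_osd
```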