Only make a shallow copy when loading optimizer state_dict #106082
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/106082
Note: Links to docs will display an error until the docs builds have been completed.
✅ 2 Unrelated Failures. As of commit 8aea828: UNSTABLE - the following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This should also save memory when loading from a checkpoint.
SGTM! Very good catch!
The only thing we still deep copy is the param_groups, which is much lighter weight. This should also save memory when loading from a checkpoint. The deepcopy was introduced in ecfcf39, but module.py had only a shallow copy at that point, so it did not actually bring parity.
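The idea above can be sketched as follows. This is a simplified, hypothetical illustration, not PyTorch's actual implementation: plain dicts and lists stand in for tensors, and `load_state_dict` here only demonstrates the copy semantics — the lightweight `param_groups` is deep-copied, while the large per-parameter state is referenced shallowly.

```python
import copy

def load_state_dict(state_dict):
    """Hypothetical sketch of the new copy semantics when loading an
    optimizer state_dict."""
    # Deep copy only the lightweight param_groups (lr, hyperparams, ids).
    param_groups = copy.deepcopy(state_dict["param_groups"])
    # Shallow copy of the state mapping: the large per-parameter buffers
    # (e.g. momentum tensors) are shared, not duplicated, saving memory.
    state = dict(state_dict["state"])
    return {"state": state, "param_groups": param_groups}

checkpoint = {
    "state": {0: {"momentum_buffer": [0.1, 0.2]}},  # stands in for a tensor
    "param_groups": [{"lr": 0.01, "params": [0]}],
}
loaded = load_state_dict(checkpoint)

# The large per-parameter buffers are shared (no extra memory)...
assert loaded["state"][0] is checkpoint["state"][0]
# ...while param_groups is an independent copy, so mutating it does not
# write back into the checkpoint.
assert loaded["param_groups"] is not checkpoint["param_groups"]
```

This mirrors the trade-off described in the discussion: the per-parameter state dominates checkpoint size, so sharing it shallowly is where the memory saving comes from, while deep-copying `param_groups` keeps the loaded hyperparameters independent of the source dict.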
ghstack-source-id: cee33dcf6d531cb942fb9cde6947943dd7d709b2 Pull Request resolved: #106082
Ok!
ghstack-source-id: 8028f48dbef35e145614fe14b0557c8f5fc0e736 Pull Request resolved: #106082
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: Command
Details for Dev Infra team: Raised by workflow job
Incorporates an XLA fix, which is why I'm updating the pin to pytorch/xla@ca5eab8.
ghstack-source-id: 12751888e2adc1663544cd94cdd145721b1ec926 Pull Request resolved: #106082
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Stack from ghstack (oldest at bottom):