[BE][docs]Improve and update checkpoint documentation #96862
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/96862
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Failures
As of commit e36eb85: NEW FAILURES - The following jobs have failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Updates:
- recommend user to use non-reentrant, mention that reentrant will be deprecated in the future
- merges all the warnings into a single list of non-reentrant improvements over reentrant
- adds an additional entry to the list about allowing backward inside checkpointed region
# 6. During recompute, we see that in the original graph, gx has already
# cleared x and y since backward is run at (3) without retain_graph=True
# We save x and w, however.
# 7. Continue with returning
purely cosmetic change
torch/utils/checkpoint.py
Outdated
use ``use_reentrant=False`` (non-reentrant variant). If you have a use case
that requires the reentrant variant, please file an issue.
The non-reentrant variant offers several improvements over the reentrant
This feels quite weird here, tbh.
This is supposed to be the doc explaining how to use this function, not a discussion of how things have evolved.
Also, for this part, I think we should just say that we recommend use_reentrant=False and punt the discussion of the tradeoffs to a note here, or even somewhere else.
I feel like this doc should be much, much simpler now that we have a working impl. Not more complex!
Updated. Kept the note here for now, though we can definitely move it elsewhere later.
I feel like the simplicity would come closer to when we deprecate reentrant. Right now, I don't think we should change too much, since reentrant is still the default?
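For readers following the thread, here is a minimal sketch of what opting in to the non-reentrant variant under discussion might look like. It is not taken from the PR: the function `block` and the tensors `x` and `w` are hypothetical stand-ins. It assumes a PyTorch version in which `torch.utils.checkpoint.checkpoint` accepts the `use_reentrant` keyword, and it only checks that gradients through the checkpointed call match an ordinary forward/backward pass.

```python
import torch
from torch.utils.checkpoint import checkpoint


def block(x, w):
    # Hypothetical function standing in for an expensive sub-network.
    return torch.tanh(x @ w)


torch.manual_seed(0)
x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(3, 3, requires_grad=True)

# Reference: ordinary forward and backward, keeping all activations.
block(x, w).sum().backward()
ref_x, ref_w = x.grad.clone(), w.grad.clone()
x.grad = None
w.grad = None

# Checkpointed: intermediate activations inside `block` are not kept;
# `block` is re-run during backward to recompute them.
out = checkpoint(block, x, w, use_reentrant=False)
out.sum().backward()

# Both paths should produce the same gradients.
grads_match = torch.allclose(x.grad, ref_x) and torch.allclose(w.grad, ref_w)
```

Passing `use_reentrant` explicitly keeps the example independent of whichever variant happens to be the default in a given release, which is the point the reply above raises.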
Thanks for the update. This looks good!
@pytorchbot merge -f "Unrelated failures"
Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Updates:
- ~recommend user to use non-reentrant, mention that reentrant will be deprecated in the future~
- merges all the warnings into a single list of non-reentrant improvements over reentrant
- adds an additional entry to the list about allowing backward inside checkpointed region

Pull Request resolved: pytorch/pytorch#96862
Approved by: https://github.com/albanD
Stack from ghstack (oldest at bottom):
Updates:
recommend user to use non-reentrant, mention that reentrant will be deprecated in the future