Automatic model checkpointing for pytorch-lightning training #10935
Is this a public API?
Yes. I will add it to `__all__`.
added.
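For context, a minimal sketch of what listing a name in `__all__` changes for star-imports. The module `demo_pkg` and the functions other than `load_checkpoint` are hypothetical and only illustrate the mechanism; they are not the real contents of the module in this PR.

```python
import sys
import types

# Hypothetical module source; only the name in __all__ is exported by `import *`.
source = '''
__all__ = ["load_checkpoint"]

def load_checkpoint():
    return "checkpoint"

def _internal_helper():
    return "private"

def undocumented():
    return "not exported"
'''

# Build the module in memory and register it so it can be imported by name.
demo = types.ModuleType("demo_pkg")
exec(source, demo.__dict__)
sys.modules["demo_pkg"] = demo

# Star-import into a fresh namespace and list what actually came through.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)
```

Names left out of `__all__` stay reachable through explicit attribute access; the list only controls star-imports and signals which names are intended to be public.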
How do I use this API? Any example?
This is a demo notebook: https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#notebook/2173893049403456
After the model checkpoint is logged, you can call the `load_checkpoint` API to get the checkpointed model.
can you add an example in the docstring?
Can you add a `Returns` section in the docstring?
The `Returns` section and example code have been added.
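As an illustration of the docstring shape discussed here, a sketch assuming a Google-style layout with `Args`, `Returns`, and `Example` sections. The function `pick_best_checkpoint` is a hypothetical helper invented for this sketch; it is not the docstring or the API added in this PR.

```python
def pick_best_checkpoint(scores, mode="min"):
    """Pick the epoch whose monitored metric is best.

    Hypothetical helper used only to illustrate the docstring layout.

    Args:
        scores: Mapping of epoch number to the monitored metric value.
        mode: "min" if lower is better, "max" if higher is better.

    Returns:
        The epoch number with the best metric value.

    Example:
        >>> pick_best_checkpoint({0: 0.9, 1: 0.4, 2: 0.6})
        1
    """
    chooser = min if mode == "min" else max
    # Compare epochs by their metric value rather than by epoch number.
    return chooser(scores, key=scores.get)


best_min = pick_best_checkpoint({0: 0.9, 1: 0.4, 2: 0.6})
best_max = pick_best_checkpoint({0: 0.9, 1: 0.4, 2: 0.6}, mode="max")
```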
I think we can skip this try/except block because we are using lazy loading.
It seems lazy loading doesn't address it.
If we don't swallow the exception here, it is raised when the `mlflow.pytorch` module is initialized. Even a lazily loaded module has to run its initialization code when it is first loaded.
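A minimal sketch of the behaviour described above, using the standard library's `importlib.util.LazyLoader` as a stand-in (an assumption; the project's own lazy-loading helper may differ in detail). The module names and `missing_optional_dep` are hypothetical placeholders for an optional dependency that is not installed.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def lazy_import(name, path):
    """Return a module whose body runs only on first attribute access."""
    spec = importlib.util.spec_from_file_location(name, path)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # with LazyLoader this only defers execution
    return module


# Hypothetical module bodies; `missing_optional_dep` is not installed.
UNGUARDED = "import missing_optional_dep\nVALUE = 1\n"
GUARDED = (
    "try:\n"
    "    import missing_optional_dep\n"
    "except ImportError:\n"
    "    missing_optional_dep = None\n"
    "VALUE = 1\n"
)

with tempfile.TemporaryDirectory() as tmp:
    unguarded_path = Path(tmp) / "unguarded_demo.py"
    guarded_path = Path(tmp) / "guarded_demo.py"
    unguarded_path.write_text(UNGUARDED)
    guarded_path.write_text(GUARDED)

    # Creating the lazy modules does not raise: nothing has executed yet.
    unguarded = lazy_import("unguarded_demo", unguarded_path)
    guarded = lazy_import("guarded_demo", guarded_path)

    # First attribute access runs the module body, so the failure surfaces here.
    try:
        unguarded.VALUE
        unguarded_error = None
    except ImportError as exc:
        unguarded_error = exc

    # The guarded body swallows the ImportError and finishes initializing.
    guarded_value = guarded.VALUE
```

Lazy loading only moves the failure from import time to first use, which is why the guard inside the module body is still needed.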
You are right, I forgot that this is called the `pytorch` module. In fact, this autologging part should go to `lightning` or `pytorch_lightning`, but that's independent of this PR.