1586 Add greater_or_equal option to Checkpoint handler #1597
Signed-off-by: Nic Ma <nma@nvidia.com>
Hi @Nic-Ma , thanks a lot for the PR ! Seems like |
Hi @vfdev-5 , Yes, I didn't change the default behavior, just added an option for the case that Thanks. |
@Nic-Ma I think I misunderstood the PR.
I think it was asked previously and we added that such that latest equally scored model is stored... Let me check that |
OK, caught up: what we did was make it possible to save the latest model under the same filename. |
Thanks for your confirmation, and this is a really useful feature, especially in FL, for example: Thanks. |
Just some comments to fix the implementation
Thanks for the explanation! Yes, true, this makes a lot of sense. I wonder whether we should make that the default behaviour now? Users who need backward compatibility (BC) could use |
But I think for regular training on a fixed dataset, maybe the earlier model with the same metrics is better, because the later model may be overfitting? That's why we usually use Thanks. |
Yes, it could also prevent saving an overfitted model. Let's keep it False by default, I agree. |
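To make the behaviour discussed above concrete, here is a minimal, self-contained sketch of the comparison the option toggles. It is not Ignite's actual implementation: `should_save` and `keep_best` are hypothetical helpers written for illustration, and the only names taken from the thread are greater_or_equal, the saved list, and the idea that the new priority is compared against the lowest saved entry.

```python
import operator
from typing import List, Tuple


def should_save(
    saved: List[Tuple[float, str]],
    priority: float,
    n_saved: int,
    greater_or_equal: bool = False,
) -> bool:
    """Hypothetical helper: decide whether a new checkpoint should be stored.

    `saved` is kept sorted by ascending priority, so saved[0] is the entry
    that would be evicted next.
    """
    if len(saved) < n_saved:
        return True  # still room, always save
    # Strict ">" keeps the earlier checkpoint on a tie (the default);
    # ">=" replaces it with the later one.
    compare = operator.ge if greater_or_equal else operator.gt
    return compare(priority, saved[0][0])


def keep_best(scores, n_saved: int = 1, greater_or_equal: bool = False):
    """Hypothetical driver: simulate one checkpoint decision per epoch."""
    saved: List[Tuple[float, str]] = []
    for epoch, score in enumerate(scores, start=1):
        if should_save(saved, score, n_saved, greater_or_equal):
            if len(saved) == n_saved:
                saved.pop(0)  # evict the lowest-priority checkpoint
            saved.append((score, f"epoch_{epoch}"))
            saved.sort(key=lambda item: item[0])
    return [name for _, name in saved]


scores = [0.8, 0.9, 0.9]
print(keep_best(scores))                         # earlier tie wins
print(keep_best(scores, greater_or_equal=True))  # later tie wins
```

With the default, the epoch 2 checkpoint survives the tie at 0.9, which matches the overfitting argument above; with greater_or_equal enabled, epoch 3 replaces it, which is the behaviour requested for the federated learning case.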
LGTM! Thanks a lot @Nic-Ma !
I can't see the CI errors, could you please help me figure it out? Thanks. |
@Nic-Ma it is not an error. I just updated the PR to the latest master, which cancelled the CircleCI tests, and GitHub shows a cancelled run as a failure with a red cross. |
Oh, I forgot about adding
@Nic-Ma could you please send a follow-up PR with that ? |
Most projects use |
@ydcjeff thanks for the details ! In the provided link, actually I couldn't find any new args added, but I think it may seem reasonable to use |
@vfdev-5 Yeah, the example may belong to the category of internal behaviour changes. |
Hi @vfdev-5 , This is a missing feature for our project, could you please help add this MR to your 0.4.4 bug fix release? Thanks. |
Hi @Nic-Ma, sure ! |
Hi @Nic-Ma , actually, just checked but this PR is already present in v0.4.3: https://pytorch.org/ignite/handlers.html#ignite.handlers.Checkpoint |
Cool, thanks!! |
Signed-off-by: Nic Ma <nma@nvidia.com>
Fixes #1586
Description:
This PR adds a greater_or_equal option to the Checkpoint handler that controls whether a checkpoint is saved when the new priority is equal to that of _saved[0].
Check list: