1586 Add greater_or_equal option to Checkpoint handler #1597

Nic-Ma · 2021-02-01T10:02:46Z

Signed-off-by: Nic Ma nma@nvidia.com

Description:
This PR adds a greater_or_equal option to the Checkpoint handler, which controls whether to save a checkpoint when the new priority is equal to that of _saved[0].
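
For readers who want the comparison spelled out, here is a minimal sketch of the decision described above. It is a simplified stand-in, not the code in ignite/handlers/checkpoint.py: the function name `should_save` and the plain list of priorities are assumptions made for illustration, and the list is assumed to be sorted ascending so that its first element plays the role of _saved[0].

```python
from typing import List, Optional


def should_save(
    priority: float,
    saved: List[float],
    n_saved: Optional[int],
    greater_or_equal: bool = False,
) -> bool:
    """Decide whether a new checkpoint should be kept (hypothetical helper).

    `saved` holds the priorities of the checkpoints kept so far, sorted in
    ascending order, so saved[0] is the lowest retained priority.
    """
    # While there is still room (or no limit at all), always save.
    if n_saved is None or len(saved) < n_saved:
        return True
    # Otherwise compare against the lowest retained priority.
    if greater_or_equal:
        return priority >= saved[0]
    return priority > saved[0]


# A tie is rejected by default and accepted with greater_or_equal=True.
print(should_save(0.9, [0.9], n_saved=1))
print(should_save(0.9, [0.9], n_saved=1, greater_or_equal=True))
```

With the default setting a tied score leaves the existing checkpoint in place; with greater_or_equal=True the tie counts as good enough to replace it.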

Check list:

New tests are added (if a new feature is added)
New doc strings: description and/or example code are in RST format
Documentation is updated (if required)

Signed-off-by: Nic Ma <nma@nvidia.com>

vfdev-5 · 2021-02-01T10:11:54Z

Hi @Nic-Ma , thanks a lot for the PR !
I see your point about score_function and this option : #1586 (comment)

It seems that greater_or_equal will only be used together with score_function, right? If we use a Checkpoint that defines its priority by the iteration, then greater_or_equal=False won't be helpful...

Nic-Ma · 2021-02-01T10:18:06Z

Hi @vfdev-5 ,

Yes, I didn't change the default behavior; I just added an option for the case where the score equals the previous _saved[0] and you want to save the latest checkpoint even though the score is the same.
If the iteration is used as the priority, the priority can never equal the previous one, right?

Thanks.

vfdev-5 · 2021-02-01T10:25:07Z

@Nic-Ma I think I misunderstood the PR.

> Yes, I didn't change the default behavior; I just added an option for the case where the score equals the previous _saved[0] and you want to save the latest checkpoint even though the score is the same.
> If the iteration is used as the priority, the priority can never equal the previous one, right?

I think this was requested before, and we added something so that the latest equally scored model is stored... Let me check that.

vfdev-5 · 2021-02-01T10:36:58Z

OK, caught up: what we did was to make it possible to save the latest model with the same filename.

Nic-Ma · 2021-02-01T10:41:04Z

Thanks for the confirmation. This is a really useful feature, especially in FL (federated learning). For example:
During FL training, we keep the training running for a long time and add more and more data day by day, so a later model should be better even if it has the same metrics.

Thanks.

vfdev-5

Just some comments to fix the implementation

ignite/handlers/checkpoint.py

Signed-off-by: Nic Ma <nma@nvidia.com>

vfdev-5 · 2021-02-01T10:43:27Z

Thanks for the explanation! Yes, true that this makes a lot of sense. I wonder if we should not set that as a default behaviour now ? And for users who'd like to have BC, they could use greater_or_equal=False...

Nic-Ma · 2021-02-01T10:45:49Z

> Thanks for the explanation! Yes, true that this makes a lot of sense. I wonder if we should not set that as a default behaviour now ? And for users who'd like to have BC, they could use greater_or_equal=False...

But I think for regular training on a fixed dataset, the earlier model with the same metrics may be better, because the later model may be overfitting. That's why we usually use early stopping, right?
So I prefer to default to False, which will not break any previous ignite behavior.

Thanks.

vfdev-5 · 2021-02-01T10:48:52Z

Yes, it could also prevent saving an overfitted model. Let's keep it False by default, I agree.
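
To make the trade-off discussed above concrete, the following sketch simulates a run whose validation score plateaus and reports which epoch's checkpoint survives when only one is kept. It is a plain-Python simulation under stated assumptions, not ignite code: `retained_epoch` and the score values are made up for illustration.

```python
from typing import Optional, Sequence


def retained_epoch(scores: Sequence[float], greater_or_equal: bool = False) -> Optional[int]:
    """Return the 1-based epoch whose checkpoint is kept when n_saved is 1.

    Hypothetical helper: it only mimics the "replace on > or >=" rule.
    """
    best_epoch: Optional[int] = None
    best_score: Optional[float] = None
    for epoch, score in enumerate(scores, start=1):
        if best_score is None:
            replace = True  # nothing saved yet
        elif greater_or_equal:
            replace = score >= best_score  # ties replace the older checkpoint
        else:
            replace = score > best_score  # ties keep the older checkpoint
        if replace:
            best_epoch, best_score = epoch, score
    return best_epoch


# Illustrative scores: the metric stops improving after epoch 2.
scores = [0.80, 0.85, 0.85, 0.85]
print(retained_epoch(scores))                         # 2
print(retained_epoch(scores, greater_or_equal=True))  # 4
```

With the default (False) the earliest of the tied checkpoints is kept, which matches the fixed-dataset argument; with True the latest one is kept, which matches the FL argument.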

Signed-off-by: Nic Ma <nma@nvidia.com>

ignite/handlers/checkpoint.py

Signed-off-by: Nic Ma <nma@nvidia.com>

vfdev-5

LGTM! Thanks a lot @Nic-Ma !

Nic-Ma · 2021-02-01T11:02:11Z

I can't see the CI errors, could you please help me figure it out?

Thanks.

vfdev-5 · 2021-02-01T11:05:03Z

> I can't see the CI errors, could you please help me figure it out?
>
> Thanks.

@Nic-Ma it is not an error, I just updated the PR to the latest master and thus cancelled Circle CI tests, but Github interprets this as a failure with a red cross.

Signed-off-by: Nic Ma <nma@nvidia.com>

vfdev-5 · 2021-02-01T13:46:50Z

Oh, I forgot about adding versionadded to the docs.
At the end of the docs, we can add

.. versionadded:: 0.4.3
    Added ``greater_or_equal`` parameter.

@Nic-Ma could you please send a follow-up PR with that ?

ydcjeff · 2021-02-01T14:19:45Z

Most projects use versionadded for new classes, functions, etc. and use versionchanged for new args / behaviours and bug fixes.
Example: https://docs.python.org/3/library/os.html#os.environb
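
As an illustration of the convention described above, here is a small sketch of a docstring that uses both Sphinx directives. `ExampleHandler` is a hypothetical class, not part of ignite, and the 0.1.0 version number is a placeholder; only the 0.4.3 entry mirrors what was proposed in this thread.

```python
class ExampleHandler:
    """Hypothetical handler used only to illustrate the two directives.

    Args:
        greater_or_equal: if True, a new priority equal to the lowest saved
            one also triggers a save. Default, False.

    .. versionadded:: 0.1.0
        Placeholder version marking when the class itself was introduced.

    .. versionchanged:: 0.4.3
        Added ``greater_or_equal`` parameter.
    """

    def __init__(self, greater_or_equal: bool = False) -> None:
        self.greater_or_equal = greater_or_equal


# The directives are plain docstring text until Sphinx renders them.
print(ExampleHandler.__doc__)
```

Here versionadded marks the introduction of the object itself, while versionchanged records a later addition to its signature.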

vfdev-5 · 2021-02-01T14:23:20Z

@ydcjeff thanks for the details ! In the provided link, actually I couldn't find any new args added, but I think it may seem reasonable to use versionchanged for that.

Nic-Ma · 2021-02-01T14:25:34Z

Hi @vfdev-5 @ydcjeff ,

Sure, I submitted PR #1600 and changed to versionchanged.
Could you please help review it?

Thanks.

ydcjeff · 2021-02-01T14:25:44Z

@vfdev-5 Yeah, the example probably falls under a change of internal behaviour.

Nic-Ma · 2021-02-25T00:38:03Z

Hi @vfdev-5 ,

This is a missing feature for our project, could you please help add this MR to your 0.4.4 bug fix release?

Thanks.

vfdev-5 · 2021-02-25T00:39:36Z

Hi @Nic-Ma, sure !

vfdev-5 · 2021-03-01T08:44:31Z

Hi @Nic-Ma , actually, just checked but this PR is already present in v0.4.3: https://pytorch.org/ignite/handlers.html#ignite.handlers.Checkpoint
Please, let me know if I'm missing something...

Nic-Ma · 2021-03-01T09:16:15Z

Cool, thanks!!

[NVIDIA] add greater_or_equal option to checkpoint handler

a8c0a06

Signed-off-by: Nic Ma <nma@nvidia.com>

Nic-Ma mentioned this pull request Feb 1, 2021

Support options to compare in CheckpointHander #1586

Closed

vfdev-5 reviewed Feb 1, 2021

ignite/handlers/checkpoint.py (3 review comments, outdated and resolved)

[NVIDIA] fix mypy errors

330c540

Signed-off-by: Nic Ma <nma@nvidia.com>

[NVIDIA] update doc-string

0ad512a

Signed-off-by: Nic Ma <nma@nvidia.com>

vfdev-5 reviewed Feb 1, 2021

ignite/handlers/checkpoint.py (1 review comment, outdated and resolved)

vfdev-5 reviewed Feb 1, 2021

ignite/handlers/checkpoint.py (1 review comment, outdated and resolved)

Nic-Ma and others added 2 commits February 1, 2021 18:58

[NVIDIA] update according to comments

b29fed6

Signed-off-by: Nic Ma <nma@nvidia.com>

Merge branch 'master' into add-compare-fn

e3864aa

vfdev-5 approved these changes Feb 1, 2021


[NVIDIA] fix typo

762a8dd

Signed-off-by: Nic Ma <nma@nvidia.com>

vfdev-5 merged commit 0981e37 into pytorch:master Feb 1, 2021

Nic-Ma deleted the add-compare-fn branch February 1, 2021 12:05

vfdev-5 mentioned this pull request Feb 25, 2021

[v0.4.4] Release Tracker #1701

Closed
