[Feature] Support EarlyStoppingHook #739

nijkah · 2022-11-17T14:01:19Z

Motivation

Closes #356

Modification

Added a new variable stop_training in EpochBasedTrainLoop and IterBasedTrainLoop.
Edited run method as

    def run(self) -> torch.nn.Module:
        """Launch training."""
        while self._epoch < self._max_epochs and not self.stop_training:
            self.run_epoch()

**
EarlyStoppingHook itself does not save the best checkpoint. Should we support it?

I referred to some logics from

mmengine/hooks/early_stopping_hook.py

codecov · 2022-11-19T10:38:55Z

Codecov Report

❗ No coverage uploaded for pull request base (main@25dfe41). Click here to learn what that means.
Patch has no changes to coverable lines.

❗ Current head fe6ff30 differs from pull request most recent head 046d7be. Consider uploading reports for the commit 046d7be to get more accurate results

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #739   +/-   ##
=======================================
  Coverage        ?   76.71%           
=======================================
  Files           ?      139           
  Lines           ?    10921           
  Branches        ?     2184           
=======================================
  Hits            ?     8378           
  Misses          ?     2186           
  Partials        ?      357

Flag	Coverage Δ
unittests	`76.71% <0.00%> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

nijkah · 2022-11-19T11:13:44Z

Hi @HAOCHENYE, I have a little question.
In the UT with initializing Runner, the error raises when the experiment_name of Runner isn't specified. Also, it happens when the experiment_name is duplicated with the other UTs although there is a logic for cleaning its temporary directory.

Is it intended (the developers should consider) or unintended behavior?

HAOCHENYE · 2022-11-20T13:01:51Z

Hi @HAOCHENYE, I have a little question. In the UT with initializing Runner, the error raises when the experiment_name of Runner isn't specified. Also, it happens when the experiment_name is duplicated with the other UTs although there is a logic for cleaning its temporary directory.

Is it intended (the developers should consider) or unintended behavior?

Usually, if the user does not specify the experiment name, Runner will automatically generate the experiment name and each Runner instance will have unique experiment names with different timestamp suffix. However, if the time interval between Runner's creation is so short that the timestamp suffixes are identical, it could lead to Runners with duplicate experiment names, which is not allowed in current design.

MMEngine has multiple global variances such as MessageHub, Visualizer and DefaultScope, and all these variables are marked with the experiment name of Runner. Global variables are very dangerous, therefore these variables must have unique names.

As for the unit tests will often raise an error about the duplicated experiment name, it is caused by the irregularity of the current unit testing, and we plan to refactor the unit tests, and provide a more general function to build Runner in unit tests.

nijkah · 2022-11-21T01:44:02Z

One more question! I am thinking about which experiments we should consider for no improvement.

Current implementation retains top scores ignoring the recent inferior results.
But it may be more intuitive to retain latest scores to check the trend of training.

Which one will be better?

HAOCHENYE · 2022-11-21T19:29:50Z

One more question! I am thinking about which experiments we should consider for no improvement.

Current implementation retains top scores ignoring the recent inferior results. But it may be more intuitive to retain latest scores to check the trend of training.

Which one will be better?

Additional notes： We could also borrow the implementation in

https://github.com/Lightning-AI/lightning/blob/9a4e8a8c528d6475ab33c46a0a84e27273cc10bd/src/pytorch_lightning/callbacks/early_stopping.py#L38

Back to the topic, it could make more sense to me to directly compare the latest metric with the previous best Nth metrics and finally update the Nth metrics with the latest one, rather than first update the Nth metrics, and then compare them 🤣. what do you think about it?

nijkah · 2022-11-22T04:05:54Z

I think your provided link looks neat. I'll rewrite the code especially for stopping logic and __init__ parameters.

zhouzaida · 2023-02-23T11:23:46Z

Hi @nijkah , thanks for your contribution. We are planning to merge this PR today and release a new version 0.6.0. The feature to resume can be supported in the future if users require it. In addition, this PR can be merged after resolving the conflict.

@zhouzaida I apologize for not finishing the rest work. I think it is okay to merge right now. I'll try to stay tuned to the future work for this.

Hi @nijkah , thanks for your contribution again 👍.

mmengine/hooks/early_stopping_hook.py

tests/test_hooks/test_early_stopping_hook.py

mmengine/hooks/early_stopping_hook.py

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

mmengine/hooks/early_stopping_hook.py

tests/test_hooks/test_early_stopping_hook.py

mmengine/hooks/early_stopping_hook.py

[Feature] EarlyStoppingHook

753bf2c

nijkah requested review from zhouzaida, HAOCHENYE and RangiLyu as code owners November 17, 2022 14:01

mm-assistant bot assigned ice-tong Nov 17, 2022

delete redundant line

833d5fc

HAOCHENYE added the planned feature label Nov 17, 2022

HAOCHENYE reviewed Nov 18, 2022

View reviewed changes

mmengine/hooks/early_stopping_hook.py Show resolved Hide resolved

mmengine/hooks/early_stopping_hook.py Outdated Show resolved Hide resolved

mmengine/hooks/early_stopping_hook.py Outdated Show resolved Hide resolved

Assert stop_training and rename tests

8e2bd76

nijkah requested review from HAOCHENYE and removed request for RangiLyu and zhouzaida November 18, 2022 11:43

HAOCHENYE reviewed Nov 18, 2022

View reviewed changes

mmengine/hooks/early_stopping_hook.py Outdated Show resolved Hide resolved

HAOCHENYE reviewed Nov 18, 2022

View reviewed changes

mmengine/hooks/early_stopping_hook.py Outdated Show resolved Hide resolved

HAOCHENYE reviewed Nov 18, 2022

View reviewed changes

mmengine/hooks/early_stopping_hook.py Outdated Show resolved Hide resolved

nijkah added 2 commits November 18, 2022 12:08

Fix UT

3e07e10

rename metric to monitor

b20112b

Fix UT

67dc666

nijkah force-pushed the feature/earlystop branch from 9867724 to 67dc666 Compare November 19, 2022 10:48

Fix UT

1808b79

HAOCHENYE added this to the 0.5.0 milestone Nov 20, 2022

edit docstring on patience

1107993

This was referenced Nov 25, 2022

Calling for volunteers for developing cool features! 🚀 #731

Closed

[Attention] 超级视客营 MMEngine 🚀🚀🚀 #732

Closed

lint

bccd43c

zhouzaida reviewed Feb 23, 2023

View reviewed changes

mmengine/hooks/early_stopping_hook.py Outdated Show resolved Hide resolved

zhouzaida reviewed Feb 23, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

zhouzaida reviewed Feb 23, 2023

View reviewed changes

mmengine/hooks/early_stopping_hook.py Outdated Show resolved Hide resolved

nijkah and others added 2 commits February 23, 2023 22:07

Apply suggestions from code review

bb6f31a

Co-authored-by: Zaida Zhou <58739961+zhouzaida@users.noreply.github.com>

delete save_last

4b40655

zhouzaida reviewed Mar 5, 2023

View reviewed changes

mmengine/hooks/early_stopping_hook.py Outdated Show resolved Hide resolved

zhouzaida reviewed Mar 5, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

infer rule more robust

1ade543

zhouzaida previously approved these changes Mar 5, 2023

View reviewed changes

zhouzaida modified the milestones: 0.6.0, 0.7.0 Mar 5, 2023

HAOCHENYE reviewed Mar 6, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

HAOCHENYE reviewed Mar 6, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

HAOCHENYE reviewed Mar 6, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

HAOCHENYE reviewed Mar 6, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

HAOCHENYE reviewed Mar 6, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

HAOCHENYE reviewed Mar 6, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

HAOCHENYE reviewed Mar 6, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

HAOCHENYE reviewed Mar 6, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

HAOCHENYE reviewed Mar 6, 2023

View reviewed changes

tests/test_hooks/test_early_stopping_hook.py Outdated Show resolved Hide resolved

refine unit test

946a8a4

HAOCHENYE dismissed zhouzaida’s stale review via 946a8a4 March 6, 2023 04:56

zhouzaida reviewed Mar 6, 2023

View reviewed changes

mmengine/hooks/early_stopping_hook.py Outdated Show resolved Hide resolved

Update mmengine/hooks/early_stopping_hook.py

046d7be

zhouzaida approved these changes Mar 6, 2023

View reviewed changes

zhouzaida merged commit b3430e4 into open-mmlab:main Mar 6, 2023

eugene123tw mentioned this pull request Nov 10, 2023

Migrate default detection model to OTXv2 openvinotoolkit/training_extensions#2598

Merged

8 tasks

1dmesh mentioned this pull request Feb 14, 2024

[Feature] Early Stopping, Validation Loss #1491

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feature] Support EarlyStoppingHook #739

[Feature] Support EarlyStoppingHook #739

nijkah commented Nov 17, 2022 •

edited

codecov bot commented Nov 19, 2022 •

edited

nijkah commented Nov 19, 2022

HAOCHENYE commented Nov 20, 2022

nijkah commented Nov 21, 2022

HAOCHENYE commented Nov 21, 2022

nijkah commented Nov 22, 2022

zhouzaida commented Feb 23, 2023

[Feature] Support EarlyStoppingHook #739

[Feature] Support EarlyStoppingHook #739

Conversation

nijkah commented Nov 17, 2022 • edited

Motivation

Modification

codecov bot commented Nov 19, 2022 • edited

Codecov Report

nijkah commented Nov 19, 2022

HAOCHENYE commented Nov 20, 2022

nijkah commented Nov 21, 2022

HAOCHENYE commented Nov 21, 2022

nijkah commented Nov 22, 2022

zhouzaida commented Feb 23, 2023

nijkah commented Nov 17, 2022 •

edited

codecov bot commented Nov 19, 2022 •

edited