Fix validation loss logging #1494

mfernezir · 2022-04-20T20:00:51Z

Motivation

The validation loss is currently not logged when the workflow includes both train and val modes.

The problem is described in this issue:
#1396

The iteration-based runner calls the model's val_step here:
https://github.com/open-mmlab/mmcv/blob/44edcdd91f8940a58c9680c609febc757a50a040/mmcv/runner/iter_based_runner.py#L77

The model's val_step returns the loss under the same keys as the training step, so the validation values are lost in all subsequent log output.

The training log in master

2022-04-19 19:32:16,444 - mmseg - INFO - workflow: [('train', 1), ('val', 1)], max: 60000 iters
2022-04-19 19:32:16,444 - mmseg - INFO - Checkpoints will be saved to /home/SENSETIME/zhengmiao/openmmlab/mmsegmentation/work_dir/test_val by HardDiskBackend.
2022-04-19 19:32:28,897 - mmseg - INFO - Iter [50/60000]        lr: 9.993e-04, eta: 1:42:42, time: 0.103, data_time: 0.031, memory: 1826, decode.loss_ce: 8.7741, decode.acc_seg: 46.9911, loss: 8.7741
2022-04-19 19:32:38,829 - mmseg - INFO - Iter [100/60000]       lr: 9.987e-04, eta: 1:39:56, time: 0.097, data_time: 0.025, memory: 1826, decode.loss_ce: 7.8357, decode.acc_seg: 55.5759, loss: 7.8357
2022-04-19 19:32:48,881 - mmseg - INFO - Iter [150/60000]       lr: 9.980e-04, eta: 1:40:39, time: 0.103, data_time: 0.031, memory: 1826, decode.loss_ce: 6.8078, decode.acc_seg: 55.5736, loss: 6.8078
2022-04-19 19:32:58,964 - mmseg - INFO - Iter [200/60000]       lr: 9.973e-04, eta: 1:39:12, time: 0.095, data_time: 0.024, memory: 1826, decode.loss_ce: 7.4038, decode.acc_seg: 55.5497, loss: 7.4038
2022-04-19 19:33:10,215 - mmseg - INFO - Iter [250/60000]       lr: 9.966e-04, eta: 1:41:36, time: 0.112, data_time: 0.040, memory: 1826, decode.loss_ce: 7.9811, decode.acc_seg: 50.0213, loss: 7.9811
2022-04-19 19:33:21,737 - mmseg - INFO - Iter [300/60000]       lr: 9.960e-04, eta: 1:44:22, time: 0.119, data_time: 0.047, memory: 1826, decode.loss_ce: 7.5471, decode.acc_seg: 56.7233, loss: 7.5471

The training log in this PR

2022-04-19 19:38:37,786 - mmseg - INFO - workflow: [('train', 1), ('val', 1)], max: 60000 iters
2022-04-19 19:38:37,786 - mmseg - INFO - Checkpoints will be saved to /home/SENSETIME/zhengmiao/openmmlab/mmsegmentation/work_dir/test_val by HardDiskBackend.
2022-04-19 19:38:48,518 - mmseg - INFO - Iter [50/60000]        lr: 9.993e-04, eta: 1:24:30, time: 0.085, data_time: 0.007, memory: 1826, decode.loss_ce: 8.6380, decode.acc_seg: 47.9639, loss: 8.6380, decode.loss_ce_val: 9.4497, decode.acc_seg_val: 42.7901, loss_val: 9.4497
2022-04-19 19:38:56,737 - mmseg - INFO - Iter [100/60000]       lr: 9.987e-04, eta: 1:22:30, time: 0.081, data_time: 0.005, memory: 1826, decode.loss_ce: 8.6980, decode.acc_seg: 50.3345, loss: 8.6980, decode.loss_ce_val: 7.5011, decode.acc_seg_val: 52.2022, loss_val: 7.5011
2022-04-19 19:39:05,216 - mmseg - INFO - Iter [150/60000]       lr: 9.980e-04, eta: 1:22:42, time: 0.083, data_time: 0.010, memory: 1826, decode.loss_ce: 8.0604, decode.acc_seg: 50.3131, loss: 8.0604, decode.loss_ce_val: 9.1770, decode.acc_seg_val: 45.5095, loss_val: 9.1770
2022-04-19 19:39:14,397 - mmseg - INFO - Iter [200/60000]       lr: 9.973e-04, eta: 1:25:00, time: 0.092, data_time: 0.021, memory: 1826, decode.loss_ce: 8.1406, decode.acc_seg: 50.2319, loss: 8.1406, decode.loss_ce_val: 8.5317, decode.acc_seg_val: 50.3491, loss_val: 8.5317
2022-04-19 19:39:23,989 - mmseg - INFO - Iter [250/60000]       lr: 9.966e-04, eta: 1:26:10, time: 0.092, data_time: 0.021, memory: 1826, decode.loss_ce: 7.0185, decode.acc_seg: 61.1673, loss: 7.0185, decode.loss_ce_val: 6.7485, decode.acc_seg_val: 65.5584, loss_val: 6.7485
2022-04-19 19:39:34,398 - mmseg - INFO - Iter [300/60000]       lr: 9.960e-04, eta: 1:29:11, time: 0.105, data_time: 0.033, memory: 1826, decode.loss_ce: 7.7831, decode.acc_seg: 52.2874, loss: 7.7831, decode.loss_ce_val: 8.1309, decode.acc_seg_val: 52.5872, loss_val: 8.1309

Modification

val_step is modified to append a _val suffix to the key of every validation loss, which distinguishes validation losses from those of the training step. This allows both training and validation losses to be logged.
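A minimal sketch of the renaming idea, not the actual diff of this PR: `add_val_suffix` is a hypothetical helper name, and the plain dict stands in for the logged variables that val_step returns. The sample values are copied from the Iter 50 line of the log above.

```python
from typing import Dict


def add_val_suffix(log_vars: Dict[str, float], suffix: str = "_val") -> Dict[str, float]:
    """Return a copy of log_vars with the suffix appended to every key.

    Hypothetical helper for illustration; the input dict is left unchanged.
    """
    return {f"{name}{suffix}": value for name, value in log_vars.items()}


# Validation values taken from the Iter 50 line of the log in this PR.
val_log_vars = {
    "decode.loss_ce": 9.4497,
    "decode.acc_seg": 42.7901,
    "loss": 9.4497,
}

renamed = add_val_suffix(val_log_vars)
print(renamed)
```

Because the renamed keys no longer coincide with the training keys, both sets of values can appear side by side in one log line.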

CLAassistant · 2022-04-20T20:04:49Z

All committers have signed the CLA.

codecov · 2022-04-21T01:26:27Z

Codecov Report

Merging #1494 (533f9ec) into master (618d3c3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1494   +/-   ##
=======================================
  Coverage   90.30%   90.31%           
=======================================
  Files         140      140           
  Lines        8335     8339    +4     
  Branches     1400     1401    +1     
=======================================
+ Hits         7527     7531    +4     
  Misses        570      570           
  Partials      238      238

Flag	Coverage Δ
unittests	`90.31% <100.00%> (+<0.01%)`	⬆️

Impacted Files	Coverage Δ
mmseg/models/segmentors/base.py	`59.20% <100.00%> (+1.34%)`	⬆️

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 618d3c3...533f9ec. Read the comment docs.

jason102811 · 2023-03-29T04:32:59Z

Dear mfernezir,
First of all, thank you for your significant PR to the MMSeg project. We appreciate the personal time you have put into improving this open-source project, and we believe many developers will benefit from your PR.
We look forward to continuing to work with you. OpenMMLab has established a contributors' organization called MMSIG, which provides contributors with open-source certificates, a recognition system, and exclusive rewards. You can contact us by adding our WeChat (if you have WeChat): openmmlabwx (please include the note "mmsig+GitHub id"), or join our Discord: https://discord.gg/qH9fysxPDW. We sincerely hope you will join us!
Best regards! @mfernezir

Fix validation loss logging

533f9ec

MeowZheng requested a review from Junjun2016 April 21, 2022 02:15

MeowZheng approved these changes Apr 21, 2022

MengzhangLI approved these changes Apr 21, 2022

Junjun2016 approved these changes Apr 21, 2022

MeowZheng mentioned this pull request Apr 22, 2022

Support to record validation loss open-mmlab/mmengine#189

Open

MeowZheng merged commit 7553fbe into open-mmlab:master Apr 22, 2022

MengzhangLI mentioned this pull request May 21, 2022

some question about loss #1604

Closed

ZhimingNJ pushed a commit to AetrexTechnology/mmsegmentation that referenced this pull request Jun 29, 2022

Fix validation loss logging (open-mmlab#1494)

9e17a0d

This was referenced Jan 24, 2023

namespaced validation logs to prevent overwriting open-mmlab/mmdetection#9662

Closed

Fix validation loss logging open-mmlab/mmdetection#9663

Merged
