Fix validation loss logging #1494

Merged
merged 1 commit into from Apr 22, 2022

Conversation

mfernezir
Contributor

@mfernezir mfernezir commented Apr 20, 2022

Motivation

Validation loss is currently not logged when the workflow includes both train and val modes.

The problem is described in this issue:
#1396
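
For context, the affected setting is the one the logs below print as `workflow: [('train', 1), ('val', 1)]`. A minimal sketch of the corresponding config entry, assuming the usual convention that each tuple is a mode name followed by how many iterations the runner spends in that mode before switching:

```python
# Config fragment (sketch): alternate one training iteration
# with one validation iteration.
workflow = [('train', 1), ('val', 1)]
```

With only `('train', 1)` in the list, `val_step` is never called by the runner, so the problem described here does not appear.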

The iteration-based runner calls the model's val_step here:
https://github.com/open-mmlab/mmcv/blob/44edcdd91f8940a58c9680c609febc757a50a040/mmcv/runner/iter_based_runner.py#L77

The model's val_step returns its losses under the same keys as in training mode, so the validation values cannot be told apart from the training values and are lost in all output logs later on.
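
The key collision can be illustrated with a toy example. This is only a sketch: `history` and `record` are hypothetical stand-ins for a runner's log storage, not mmcv's actual implementation, and the two values are taken from the Iter 50 line of the PR log further down.

```python
from collections import defaultdict

# Toy stand-in for a runner's log history (hypothetical, not mmcv code).
history = defaultdict(list)


def record(log_vars):
    """Append each logged value to the history kept under its key."""
    for name, value in log_vars.items():
        history[name].append(value)


train_log_vars = {"loss": 8.6380}
val_log_vars = {"loss": 9.4497}  # same key as in training mode

record(train_log_vars)
record(val_log_vars)

# Both values now sit under the single key "loss"; nothing marks which
# one came from validation.
print(dict(history))
```

Once the values share a key, any later step that reads from the history has no way to report a separate validation loss.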

The training log in master

2022-04-19 19:32:16,444 - mmseg - INFO - workflow: [('train', 1), ('val', 1)], max: 60000 iters
2022-04-19 19:32:16,444 - mmseg - INFO - Checkpoints will be saved to /home/SENSETIME/zhengmiao/openmmlab/mmsegmentation/work_dir/test_val by HardDiskBackend.
2022-04-19 19:32:28,897 - mmseg - INFO - Iter [50/60000]        lr: 9.993e-04, eta: 1:42:42, time: 0.103, data_time: 0.031, memory: 1826, decode.loss_ce: 8.7741, decode.acc_seg: 46.9911, loss: 8.7741
2022-04-19 19:32:38,829 - mmseg - INFO - Iter [100/60000]       lr: 9.987e-04, eta: 1:39:56, time: 0.097, data_time: 0.025, memory: 1826, decode.loss_ce: 7.8357, decode.acc_seg: 55.5759, loss: 7.8357
2022-04-19 19:32:48,881 - mmseg - INFO - Iter [150/60000]       lr: 9.980e-04, eta: 1:40:39, time: 0.103, data_time: 0.031, memory: 1826, decode.loss_ce: 6.8078, decode.acc_seg: 55.5736, loss: 6.8078
2022-04-19 19:32:58,964 - mmseg - INFO - Iter [200/60000]       lr: 9.973e-04, eta: 1:39:12, time: 0.095, data_time: 0.024, memory: 1826, decode.loss_ce: 7.4038, decode.acc_seg: 55.5497, loss: 7.4038
2022-04-19 19:33:10,215 - mmseg - INFO - Iter [250/60000]       lr: 9.966e-04, eta: 1:41:36, time: 0.112, data_time: 0.040, memory: 1826, decode.loss_ce: 7.9811, decode.acc_seg: 50.0213, loss: 7.9811
2022-04-19 19:33:21,737 - mmseg - INFO - Iter [300/60000]       lr: 9.960e-04, eta: 1:44:22, time: 0.119, data_time: 0.047, memory: 1826, decode.loss_ce: 7.5471, decode.acc_seg: 56.7233, loss: 7.5471

The training log in this PR

2022-04-19 19:38:37,786 - mmseg - INFO - workflow: [('train', 1), ('val', 1)], max: 60000 iters
2022-04-19 19:38:37,786 - mmseg - INFO - Checkpoints will be saved to /home/SENSETIME/zhengmiao/openmmlab/mmsegmentation/work_dir/test_val by HardDiskBackend.
2022-04-19 19:38:48,518 - mmseg - INFO - Iter [50/60000]        lr: 9.993e-04, eta: 1:24:30, time: 0.085, data_time: 0.007, memory: 1826, decode.loss_ce: 8.6380, decode.acc_seg: 47.9639, loss: 8.6380, decode.loss_ce_val: 9.4497, decode.acc_seg_val: 42.7901, loss_val: 9.4497
2022-04-19 19:38:56,737 - mmseg - INFO - Iter [100/60000]       lr: 9.987e-04, eta: 1:22:30, time: 0.081, data_time: 0.005, memory: 1826, decode.loss_ce: 8.6980, decode.acc_seg: 50.3345, loss: 8.6980, decode.loss_ce_val: 7.5011, decode.acc_seg_val: 52.2022, loss_val: 7.5011
2022-04-19 19:39:05,216 - mmseg - INFO - Iter [150/60000]       lr: 9.980e-04, eta: 1:22:42, time: 0.083, data_time: 0.010, memory: 1826, decode.loss_ce: 8.0604, decode.acc_seg: 50.3131, loss: 8.0604, decode.loss_ce_val: 9.1770, decode.acc_seg_val: 45.5095, loss_val: 9.1770
2022-04-19 19:39:14,397 - mmseg - INFO - Iter [200/60000]       lr: 9.973e-04, eta: 1:25:00, time: 0.092, data_time: 0.021, memory: 1826, decode.loss_ce: 8.1406, decode.acc_seg: 50.2319, loss: 8.1406, decode.loss_ce_val: 8.5317, decode.acc_seg_val: 50.3491, loss_val: 8.5317
2022-04-19 19:39:23,989 - mmseg - INFO - Iter [250/60000]       lr: 9.966e-04, eta: 1:26:10, time: 0.092, data_time: 0.021, memory: 1826, decode.loss_ce: 7.0185, decode.acc_seg: 61.1673, loss: 7.0185, decode.loss_ce_val: 6.7485, decode.acc_seg_val: 65.5584, loss_val: 6.7485
2022-04-19 19:39:34,398 - mmseg - INFO - Iter [300/60000]       lr: 9.960e-04, eta: 1:29:11, time: 0.105, data_time: 0.033, memory: 1826, decode.loss_ce: 7.7831, decode.acc_seg: 52.2874, loss: 7.7831, decode.loss_ce_val: 8.1309, decode.acc_seg_val: 52.5872, loss_val: 8.1309

Modification

val_step is modified to add a _val suffix to every variable it logs (the losses, and metrics such as decode.acc_seg, as the log above shows), so that validation values are distinguished from those of the training step. This enables both training and validation losses to be logged.
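
The renaming can be sketched as follows. This is a minimal illustration and not the actual diff: `add_val_suffix` is a hypothetical helper name, and the dictionaries reuse the Iter 50 values from the log in this PR.

```python
def add_val_suffix(log_vars, suffix="_val"):
    """Return a copy of log_vars with the suffix appended to every key."""
    return {f"{name}{suffix}": value for name, value in log_vars.items()}


train_log_vars = {
    "decode.loss_ce": 8.6380,
    "decode.acc_seg": 47.9639,
    "loss": 8.6380,
}
val_log_vars = {
    "decode.loss_ce": 9.4497,
    "decode.acc_seg": 42.7901,
    "loss": 9.4497,
}

# After renaming, the two dictionaries no longer share any key, so they
# can be merged into one log line without overwriting each other.
merged = {**train_log_vars, **add_val_suffix(val_log_vars)}
for name, value in merged.items():
    print(f"{name}: {value:.4f}")
```

The merged keys follow the order seen in the new log lines: the three training entries first, then `decode.loss_ce_val`, `decode.acc_seg_val`, and `loss_val`.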

@CLAassistant

CLAassistant commented Apr 20, 2022

CLA assistant check
All committers have signed the CLA.

@codecov

codecov bot commented Apr 21, 2022

Codecov Report

Merging #1494 (533f9ec) into master (618d3c3) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master    #1494   +/-   ##
=======================================
  Coverage   90.30%   90.31%           
=======================================
  Files         140      140           
  Lines        8335     8339    +4     
  Branches     1400     1401    +1     
=======================================
+ Hits         7527     7531    +4     
  Misses        570      570           
  Partials      238      238           
Flag Coverage Δ
unittests 90.31% <100.00%> (+<0.01%) ⬆️

Impacted Files Coverage Δ
mmseg/models/segmentors/base.py 59.20% <100.00%> (+1.34%) ⬆️

@MeowZheng MeowZheng merged commit 7553fbe into open-mmlab:master Apr 22, 2022
ZhimingNJ pushed a commit to AetrexTechnology/mmsegmentation that referenced this pull request Jun 29, 2022
aravind-h-v pushed a commit to aravind-h-v/mmsegmentation that referenced this pull request Mar 27, 2023
* Add parameter safe_serialization to DiffusionPipeline.save_pretrained

* Add option safe_serialization on ModelMixin.save_pretrained

* Add test test_save_safe_serialization

* Black

* Re-trigger the CI

* Fix doc-builder

* Validate files are saved as safetensor in test_save_safe_serialization
@jason102811

Dear mfernezir,
First of all, thank you for your significant PR to the MMSeg project. We appreciate the personal time you have spent helping to improve this open-source project, and we believe that many developers will benefit from your PR.
We look forward to continuing to work with you. OpenMMLab has established a dedicated contributors' organization, MMSIG, which offers contributors open-source certificates, a recognition system, and exclusive rewards. You can contact us by adding our WeChat, openmmlabwx (please include the note "mmsig + GitHub id"), or by joining our Discord: https://discord.gg/qH9fysxPDW. We sincerely hope you will join us!
Best regards! @mfernezir
