Modify the inference interface to adapt to standalone and distributed… #96

wang-hua-2019 · 2023-03-23T12:00:43Z

… inference

Thank you for your contribution to the MindOCR repo.
Before submitting this PR, please make sure:

You have read the Contributing Guidelines on pull requests
Your code builds clean without any errors or warnings
You are using approved terminology
You have added unit tests

Motivation

(Write your motivation for proposed changes here.)

Test Plan

(How should this PR be tested? Do you require special setup to run the test or repro the fixed bug?)

Related Issues and PRs

(Is this PR part of a group of changes? Link the other relevant PRs and Issues here. Use https://help.github.com/en/articles/closing-issues-using-keywords for help on GitHub syntax)

SamitHuang · 2023-03-23T12:14:18Z

tools/train.py

-                is_train=False)
+            cfg.eval.dataset,
+            cfg.eval.loader,
+            num_shards=device_num,


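If this is changed to distributed eval, the evaluation results from the multiple devices need to be aggregated. Note that metrics such as F1-score and accuracy will not be correct if they are simply averaged across devices; the predictions or the metric states need to be updated in a distributed manner.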

The aggregation method has been modified.
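To illustrate the reviewer's point, here is a minimal sketch (not the code from this PR) of why averaging per-device F1 scores differs from computing F1 over aggregated counts. The helper `f1_from_counts` and the per-device counts are hypothetical, chosen only for illustration; in a real multi-device run the summation would presumably be done with a collective all-reduce rather than a local `sum`.

```python
def f1_from_counts(tp, fp, fn):
    """Compute F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical (tp, fp, fn) counts produced independently on two devices.
shard_counts = [(90, 10, 10), (1, 0, 9)]

# Naive approach: compute F1 on each device, then average the scores.
mean_of_f1 = sum(f1_from_counts(*c) for c in shard_counts) / len(shard_counts)

# Aggregated approach: sum the raw counts first, then compute F1 once.
total_tp, total_fp, total_fn = (sum(col) for col in zip(*shard_counts))
global_f1 = f1_from_counts(total_tp, total_fp, total_fn)

print(f"mean of per-device F1: {mean_of_f1:.4f}")
print(f"F1 from summed counts: {global_f1:.4f}")
```

The two numbers disagree whenever the devices see shards of different difficulty, which is why the metric state, rather than the final score, is the quantity to reduce across devices.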

SamitHuang · 2023-03-23T12:43:41Z

mindocr/utils/callbacks.py

+        epoch_time = time.time() - self.epoch_start_time
+        per_step_time = epoch_time * 1000 / cb_params.batch_num
+        fps = 1000 * self.batch_size / per_step_time
+        msg = 'epoch: [%s/%s] loss: %.6f, epoch time: %.3f s, per step time: %.3f ms, fps: %.2f' % (


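This loss only reflects the average loss of the last batch, not the average loss over the entire training data. I suggest reusing the training loss already computed in EvalSaveCallback and printing the epoch training loss in EvalSaveCallback's on_train_epoch_end.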

This loss callback is not part of my changes here. Could it be revised uniformly in a follow-up?
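As a rough sketch of the difference discussed above, the following accumulates the loss at every step and reports the epoch mean, instead of the last batch's value. `EpochLossMeter` and the step losses are hypothetical names and data for illustration only; this is not the EvalSaveCallback implementation.

```python
class EpochLossMeter:
    """Accumulates per-step losses so the epoch average can be reported."""

    def __init__(self):
        self.reset()

    def reset(self):
        # Call at the start of every epoch.
        self.total = 0.0
        self.steps = 0

    def update(self, loss):
        # Call once per training step with that step's batch-mean loss.
        self.total += float(loss)
        self.steps += 1

    @property
    def average(self):
        return self.total / self.steps if self.steps else 0.0


# Hypothetical batch-mean losses observed during one epoch.
step_losses = [4.0, 2.0, 1.0, 1.0]

meter = EpochLossMeter()
for loss in step_losses:
    meter.update(loss)

last_batch_loss = step_losses[-1]  # what the reviewed log line would report
epoch_loss = meter.average         # mean over all steps in the epoch

print(f"last batch loss: {last_batch_loss:.6f}, epoch loss: {epoch_loss:.6f}")
```

A meter like this would be updated in a per-step hook and read and reset in the epoch-end hook, so the logged value describes the whole epoch.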

…d inference

HaoyangLee assigned wang-hua-2019 Mar 23, 2023

HaoyangLee requested a review from SamitHuang March 23, 2023 12:08

SamitHuang reviewed Mar 23, 2023

SamitHuang mentioned this pull request Mar 24, 2023

add resume train and loss info and save checkpoint #85

Closed

4 tasks

wang-hua-2019 force-pushed the main branch 2 times, most recently from 88b8c28 to f360339 Compare March 24, 2023 11:11

Modify the inference interface to adapt to standalone and distribute…

52ac8f7

…d inference

wang-hua-2019 force-pushed the main branch from f360339 to 52ac8f7 Compare March 24, 2023 11:26

kingcong merged commit dc1a583 into mindspore-lab:main Mar 25, 2023
