CRNN on a custom dataset: data-dependent loss overflow, loss: 65504 #610

Open
panxua opened this issue Nov 15, 2023 · 1 comment

panxua commented Nov 15, 2023

Symptom:
The loss overflows, and the overflow is tied to specific data samples.
Screenshot:
[image: loss overflow 1115]
Status: resolved
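Reasons:

  1. When "label length > max_text_len", data processing silently empties the label, with no warning.
  2. When "label length + repeat markers > pred_seq_len", CTCLoss overflows, also with no warning. (The repeat markers are the blanks CTC needs between consecutive identical characters.)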
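Reason 2 can be reproduced without any framework. The following is a minimal pure-Python sketch, not MindOCR or MindSpore code: `min_ctc_steps` and `ctc_nll_uniform` are hypothetical helper names, and the uniform per-step distribution is an assumption made only to keep the example small. It shows that the CTC loss is infinite whenever the number of prediction steps is smaller than the label length plus the number of adjacent repeated characters. The reported value 65504 is the largest finite float16 number, which would be consistent with an infinite loss being clamped under half precision, though the issue does not confirm this.

```python
import math

def min_ctc_steps(label: str) -> int:
    """Fewest time steps CTC needs for `label`: one per character,
    plus one blank between each pair of identical neighbours."""
    repeats = sum(1 for a, b in zip(label, label[1:]) if a == b)
    return len(label) + repeats

def ctc_nll_uniform(label: str, num_steps: int, num_classes: int) -> float:
    """CTC negative log-likelihood (forward algorithm) when every step
    predicts a uniform distribution over `num_classes`, blank included.
    Returns math.inf when no alignment of `label` fits in `num_steps`."""
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    # Extended label: blank, c1, blank, c2, ..., blank (None stands for blank).
    ext = [None]
    for ch in label:
        ext += [ch, None]
    p = 1.0 / num_classes
    alpha = [0.0] * len(ext)
    alpha[0] = p
    if len(ext) > 1:
        alpha[1] = p
    for _ in range(1, num_steps):
        new = [0.0] * len(ext)
        for s in range(len(ext)):
            total = alpha[s]
            if s >= 1:
                total += alpha[s - 1]
            # Skipping a blank is only allowed between different characters.
            if s >= 2 and ext[s] is not None and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new[s] = total * p
        alpha = new
    likelihood = alpha[-1] + (alpha[-2] if len(ext) > 1 else 0.0)
    return -math.log(likelihood) if likelihood > 0 else math.inf

print(min_ctc_steps("aab"))          # steps needed for "aab"
print(ctc_nll_uniform("aab", 3, 5))  # too few steps
print(ctc_nll_uniform("aab", 4, 5))  # just enough steps
```

With three steps the label "aab" has no valid alignment, so the loss is infinite; with four steps exactly one alignment ("a", blank, "a", "b") exists and the loss is finite.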

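Detailed explanation: link
Solution:
Compute the maximum label length over the dataset and set seq_max_len accordingly;
compute the maximum of "label length + repeat markers" and set pred_seq_len accordingly;
then update the width in img_shape for training, evaluation, and prediction so that it satisfies 4 x pred_seq_len.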
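The statistics described in the solution could be gathered along these lines. This is only a sketch under stated assumptions: `suggest_lengths` and `count_adjacent_repeats` are hypothetical helpers, the labels are assumed to be already loaded as plain strings, and the factor of 4 between image width and pred_seq_len is taken from this issue rather than derived from the model.

```python
from typing import Dict, Iterable

def count_adjacent_repeats(label: str) -> int:
    """Number of positions where a character equals its predecessor."""
    return sum(1 for a, b in zip(label, label[1:]) if a == b)

def suggest_lengths(labels: Iterable[str], downsample: int = 4) -> Dict[str, int]:
    """Derive length settings from the labels of a dataset.

    downsample: assumed ratio between image width and prediction
    sequence length (4 in the issue above).
    """
    labels = list(labels)
    if not labels:
        raise ValueError("no labels given")
    max_label_len = max(len(t) for t in labels)
    pred_seq_len = max(len(t) + count_adjacent_repeats(t) for t in labels)
    return {
        "max_label_len": max_label_len,        # lower bound for the max text length
        "pred_seq_len": pred_seq_len,          # lower bound for the prediction length
        "img_width": downsample * pred_seq_len,  # width to use in img_shape
    }

stats = suggest_lengths(["hello", "aab", "zz"])
print(stats)
```

Here "hello" is five characters long but needs six prediction steps because of the doubled "l", so the suggested width is 24.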
Suggestion:
Warn users about this in training_recognition_custom_dataset:
https://github.com/mindspore-lab/mindocr/blob/main/docs/en/tutorials/training_recognition_custom_dataset.md
https://github.com/mindspore-lab/mindocr/blob/main/docs/cn/tutorials/training_recognition_custom_dataset.md

Collaborator

zhtmike commented Nov 15, 2023

Hello, we provide two additional options that address the problems you mentioned. For reason 1, add filter_max_len: True to your config file to filter out the problematic samples. For reason 2, add both filter_max_len: True and extra_count_if_repeat: True. For details, see configs/rec/svtr/svtr_tiny.yaml. :)
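Conceptually, the filtering described in this reply amounts to something like the sketch below. It is not MindOCR's implementation: `filter_samples` is a hypothetical function, its `extra_count_if_repeat` parameter only borrows the option's name for readability, and comparing against a single `max_len` limit is an assumption.

```python
from typing import List, Tuple

Sample = Tuple[str, str]  # (image path, label); hypothetical sample layout

def filter_samples(samples: List[Sample], max_len: int,
                   extra_count_if_repeat: bool = False) -> List[Sample]:
    """Keep only samples whose (optionally repeat-adjusted) label length
    does not exceed `max_len`."""
    kept = []
    for path, label in samples:
        length = len(label)
        if extra_count_if_repeat:
            # Each adjacent repeated character costs one extra CTC step.
            length += sum(1 for a, b in zip(label, label[1:]) if a == b)
        if length <= max_len:
            kept.append((path, label))
    return kept

samples = [("a.png", "hello"), ("b.png", "abcde"), ("c.png", "abcdef")]
print(filter_samples(samples, max_len=5))
print(filter_samples(samples, max_len=5, extra_count_if_repeat=True))
```

Without the repeat adjustment, "hello" passes a limit of 5; with it, "hello" counts as 6 and is dropped.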
