Fix error during loss reduce in callback #469

Merged
merged 3 commits into from
Jul 1, 2023

Conversation

@zhtmike (Collaborator) commented Jun 30, 2023

A recent change to callback.py causes RuntimeError: Couldn't get correct hccl hcom with group hccl_world_group on MindSpore 1.10 on OpenI. It seems ops.ReduceSum must be compiled first. This PR fixes that.

This change also reports the loss value more accurately when the loss output is fp16.
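To illustrate why the precision used during a reduce matters for fp16 losses, here is a minimal sketch in plain Python. It is not the code from this PR: it emulates half-precision rounding with the standard library's struct module, and the helper names (to_fp16, mean_fp16, mean_upcast) are hypothetical. The input size is deliberately exaggerated to make the effect visible.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def mean_fp16(losses):
    """Average the losses, rounding the running sum to fp16 after every add."""
    total = 0.0
    for v in losses:
        total = to_fp16(total + to_fp16(v))
    return to_fp16(total / len(losses))


def mean_upcast(losses):
    """Upcast each fp16 loss first, then accumulate in higher precision."""
    return sum(to_fp16(v) for v in losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical data: many identical loss values of about 0.1.
    losses = [0.1] * 10000
    print("fp16 accumulation:  ", mean_fp16(losses))
    print("upcast accumulation:", mean_upcast(losses))
```

With this input the fp16 running sum stalls once it reaches 256, where neighbouring fp16 values are 0.25 apart and adding roughly 0.1 no longer changes it, so the reported mean ends up far below 0.1. Accumulating after an upcast avoids that.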


Review thread on mindocr/utils/misc.py (outdated, resolved)
Co-authored-by: Rustam Khadipash <16683750+hadipash@users.noreply.github.com>
@SamitHuang SamitHuang merged commit 0ef002f into mindspore-lab:main Jul 1, 2023
@zhtmike zhtmike deleted the reduce_fix branch July 3, 2023 06:48
3 participants