The current code can lead to inconsistent WER reporting between training and inference when fuse_loss_wer is used in Transducer model training, i.e., model.joint.fuse_loss_wer=True and model.joint.fused_batch_size > 1. In this setting, only the WER of the last sub-mini-batch is accumulated during the validation stage.
Thank you very much for raising this! We have fixed it in this PR: #8587
It occurred due to a large refactor and unification of ASR metrics, intended to make them simpler to extend in the long run.
The patch will be in the next NeMo release, and we have added a release note on the 1.23 release page (https://github.com/NVIDIA/NeMo/releases/tag/v1.23.0) so that users are aware and can obtain correct metrics during evaluation by using the speech-to-text eval script (or by disabling fused batch explicitly).
Describe the bug
As shown in https://github.com/NVIDIA/NeMo/blob/main/nemo/collections/asr/metrics/wer.py#L349-L350, the new scores and words are assigned to the object, and the previous scores and words are dropped.
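The effect can be illustrated with a minimal, hypothetical metric class (a sketch, not the actual NeMo WER implementation): assigning with `=` in the update step keeps only the last sub-mini-batch, while `+=` accumulates across all sub-mini-batches.

```python
# Minimal sketch (hypothetical; not NeMo's WER class): a running WER
# accumulator that is updated once per sub-mini-batch.

class RunningWER:
    def __init__(self):
        self.scores = 0  # total edit-distance errors seen so far
        self.words = 0   # total reference words seen so far

    def update_buggy(self, scores, words):
        # Bug: '=' overwrites the running totals, so only the
        # last sub-mini-batch contributes to the final WER.
        self.scores = scores
        self.words = words

    def update_fixed(self, scores, words):
        # Fix: '+=' accumulates over every sub-mini-batch.
        self.scores += scores
        self.words += words

    def compute(self):
        return self.scores / self.words if self.words else 0.0


buggy, fixed = RunningWER(), RunningWER()
# Two sub-mini-batches: (errors, words) = (2, 10) then (1, 10)
for s, w in [(2, 10), (1, 10)]:
    buggy.update_buggy(s, w)
    fixed.update_fixed(s, w)

print(buggy.compute())  # 0.1  -- only the last sub-batch counted
print(fixed.compute())  # 0.15 -- correct: 3 errors over 20 words
```

With fused_batch_size == 1 there is a single sub-mini-batch per step, so both versions agree, which is why the discrepancy only appears when fused_batch_size > 1.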