I believe I've seen a few changes to RelPos and InputNorm go through recently. Could this be related?
Maybe, though it was fine in inference. The InputNorm change doesn't even affect the mode used by the model, and I couldn't find an issue in the RelPos change... We'll see what the bisect says anyway; it should be done by today.
Describe the bug
I am not sure if it is specific to the Conformer-Transducer, but I noticed that the current develop commit fails to converge: the WER gets stuck at 100% and the loss gets stuck at around 106 after a few epochs.
I have started bisecting the issue. 7c63cf2c436805ad368292666a7bee1debd9fa46 is good, 0a91bc09fbafd32e25bd299534178da9a0d23a6d is bad. This is using the same environment in both cases, with the default config.
Expected behaviour
The model should converge.
To Reproduce
python3 train.py hparams/conformer_transducer.yaml --data_folder /corpus/LibriSpeech/ --precision=fp16
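For anyone who wants to repeat the bisect, here is a minimal sketch using `git bisect`. The two commit hashes are the ones given in this report. The helper name `start_bisect` and the variable names are hypothetical and only there for illustration, and how many epochs are needed before a commit can be judged good or bad is an assumption left to the person running it.

```shell
#!/usr/bin/env bash
# Commit hashes taken from this report.
GOOD_COMMIT=7c63cf2c436805ad368292666a7bee1debd9fa46
BAD_COMMIT=0a91bc09fbafd32e25bd299534178da9a0d23a6d

# Hypothetical helper: start a bisect session between a known-bad and a
# known-good commit. Must be run from inside the repository checkout.
start_bisect() {
    git bisect start "$1" "$2"
}

# After each checkout that git proposes, rerun the training command for a
# few epochs and mark the result:
#   git bisect good   # loss keeps decreasing, WER drops below 100%
#   git bisect bad    # loss and WER stay stuck
# Once git reports the first bad commit, restore the original HEAD with:
#   git bisect reset
```

From the repository root, the session would be started with `start_bisect "$BAD_COMMIT" "$GOOD_COMMIT"`. Keeping the environment and config identical at every step, as in the report, keeps the code change as the only variable.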
Environment Details
No response
Relevant Log Output
Bad training:
Good training:
Additional Context
No response