Local attention with unidirectional LSTM not converging #41
It seems a very related question was also asked in #42.
> Let this smallest network with the highest time reduction, high batch size, less/no SpecAugment etc. train like that for as long as needed, before increasing anything. This small network should first get some half-way good score. Only when you see that, the pretraining can increase the depth and other things, but only carefully and slowly (such that the network does not totally break again).
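The staged pretraining idea above can be sketched as a small schedule function. This is a hypothetical illustration, not the project's actual config; the stage values (layer counts, time reductions, when SpecAugment turns on) are assumptions chosen to show the shape of the schedule.

```python
def pretrain_stage(stage_idx):
    """Return (num_lstm_layers, time_reduction, use_specaugment) for a
    pretraining stage index. Hypothetical values: start with the smallest
    encoder, highest time reduction, and no SpecAugment; grow depth and
    reduce time reduction slowly; enable SpecAugment last."""
    stages = [
        (2, 6, False),  # smallest net, highest time reduction, no SpecAugment
        (3, 6, False),  # grow depth slowly first
        (4, 3, False),  # then reduce time reduction
        (6, 3, True),   # full depth, SpecAugment enabled last
    ]
    # Clamp so training beyond the last stage keeps the final network.
    return stages[min(stage_idx, len(stages) - 1)]
```

The key point mirrored here is that each stage only changes one or two things at a time, and the first stage is held as long as needed before any growth.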
Thanks @albertz for suggesting these.
> All the models are with global attention.
> Seems like decreasing the learning rate helps. What other combination do you suggest to try next?
> All the things I already wrote (here), but basically everything else as well.
> Lowering the learning rate
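Lowering the learning rate when the dev score stops improving is commonly done with a Newbob-style rule. A minimal sketch, assuming a scalar dev score where lower is better and an assumed decay factor of 0.7 (both are illustrative, not the project's actual settings):

```python
def maybe_lower_lr(lr, dev_score, best_score, factor=0.7):
    """Newbob-style learning-rate control (hypothetical values):
    if the dev score did not improve over the best seen so far,
    multiply the learning rate by `factor`, else keep it."""
    if dev_score >= best_score:  # no improvement (lower score = better)
        return lr * factor
    return lr
```

In practice one would also track `best_score` across epochs and optionally stop decaying below some minimum learning rate.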
Hi @albertz,
I tried different learning rates, but the model still does not seem to converge after changing the bidirectional LSTM to a unidirectional LSTM in the local attention setup.
Can you suggest something? What else should I try?
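For context on why this setup is harder: local attention restricts each decoder step to a window of encoder frames, and with a unidirectional LSTM those frames only summarize left context, which can make the window placement harder to learn than with a bidirectional encoder. A minimal NumPy sketch of the windowed mask (window size and centering are assumptions for illustration):

```python
import numpy as np

def local_attention_mask(center_pos, num_frames, window=5):
    """Boolean mask over encoder frames: True = attendable.
    Local attention only allows a window of 2*window+1 frames around an
    (assumed) predicted center position; global attention would instead
    return an all-True mask over all frames."""
    mask = np.zeros(num_frames, dtype=bool)
    lo = max(0, center_pos - window)
    hi = min(num_frames, center_pos + window + 1)
    mask[lo:hi] = True
    return mask
```

If the predicted center is off early in training, the correct frames may fall outside the window entirely, which is one plausible reason this combination converges less reliably than global attention.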