Token Accuracy drops drastically for Tedlium database training #73
Comments
Your training seems to break down, as you are getting super large objective values. It might help to go back to the original values, or even set num_sequence=10 and lower frame_num_limit, starting from the iteration in which training breaks down. Can you try this and let us know what you find?

Florian Metze
http://www.cs.cmu.edu/directory/florian-metze
I set the values back to the original and the outcome was the same.

Deepak Vinayak Kadetotad
My suggestion is to decrease frame_num_limit to a smaller number. This reduces the gradient deltas in each step and may improve training stability. I normally set it to 25000.
These steps helped me a lot and I'm now not encountering any NaNs or huge values (-1e+30) for Obj(log[Pzx]) in training anymore:

(1) Check for utterances where there are more targets than frames to align, especially if your setup is similar to the v2 scripts with frame subsampling. I added this check in src/netbin/train-ctc-parallel.cc, after the check for too many frames in lines 152-155:
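The actual patch is not reproduced in the comment; the following is a minimal, self-contained C++ sketch of such a check. The names (`MinCtcFrames`, `TooFewFrames`, `subsample_factor`) and the toy numbers are illustrative only and are not taken from the EESEN sources, which apply this logic to Kaldi-style feature matrices and label vectors inside train-ctc-parallel.cc.

```cpp
#include <iostream>
#include <vector>

// Minimum number of frames a CTC alignment needs for a label sequence:
// one frame per label, plus one extra (blank) frame between identical
// adjacent labels.
int MinCtcFrames(const std::vector<int> &labels) {
  int min_frames = static_cast<int>(labels.size());
  for (size_t i = 1; i < labels.size(); ++i)
    if (labels[i] == labels[i - 1]) ++min_frames;
  return min_frames;
}

// True if the (subsampled) utterance is too short to align its targets
// and should be skipped instead of being fed to CTC training.
bool TooFewFrames(int num_frames, int subsample_factor,
                  const std::vector<int> &labels) {
  int effective_frames = num_frames / subsample_factor;
  return effective_frames < MinCtcFrames(labels);
}

int main() {
  std::vector<int> labels = {3, 7, 7, 12};    // toy label sequence
  int num_frames = 10, subsample_factor = 3;  // 10 frames -> 3 after subsampling
  if (TooFewFrames(num_frames, subsample_factor, labels))
    std::cerr << "Skipping utterance: " << labels.size()
              << " targets, but only " << num_frames / subsample_factor
              << " frames after subsampling\n";
  return 0;
}
```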
This will also output problematic sequences and give you an idea whether your setup is OK. E.g. for my particular setup (German), subsampling by 3 was too much and made a significant fraction of the sequences too short (>50%); 2 was much better.

(2) In the current Eesen version, only the LSTM-layer class does gradient clipping. I found it helpful to also add gradient clipping to the AffineLayer class, which is usually used at the end of the network.

(3) Try out my eesen branch with adaptive learning rates (Adagrad, RMSProp). I'm achieving good results with RMSProp. It effectively scales the learning rate individually for each parameter of the network during the updates, which results in better/faster convergence and more stable training. I've also added the code for (2) to that branch.
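Neither the gradient-clipping change nor the adaptive-update code from that branch is shown in this thread; the sketch below only illustrates the two ideas on plain C++ vectors. The function names, the clipping threshold, and the RMSProp hyper-parameters are assumptions for illustration; EESEN's actual layer classes operate on GPU matrices rather than std::vector.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Element-wise gradient clipping, as done in the LSTM layer and as
// suggested here for the affine (output) layer: bound every gradient
// entry to [-clip, clip] before the parameter update.
void ClipGradient(std::vector<float> &grad, float clip) {
  for (float &g : grad)
    g = std::max(-clip, std::min(clip, g));
}

// RMSProp-style update: keep a running average of squared gradients and
// scale the step size per parameter, which tends to stabilize training
// when gradient magnitudes differ a lot across parameters.
void RmsPropUpdate(std::vector<float> &param, const std::vector<float> &grad,
                   std::vector<float> &sq_avg, float lr,
                   float decay = 0.9f, float eps = 1e-8f) {
  for (std::size_t i = 0; i < param.size(); ++i) {
    sq_avg[i] = decay * sq_avg[i] + (1.0f - decay) * grad[i] * grad[i];
    param[i] -= lr * grad[i] / (std::sqrt(sq_avg[i]) + eps);
  }
}

int main() {
  std::vector<float> param(4, 0.0f), grad = {50.0f, -0.1f, 3.0f, -200.0f};
  std::vector<float> sq_avg(4, 0.0f);
  ClipGradient(grad, 5.0f);                   // e.g. clip gradients to [-5, 5]
  RmsPropUpdate(param, grad, sq_avg, 4e-5f);  // toy learning rate
  return 0;
}
```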
Benjamin, thanks for all of this - I have wanted to look at this for a long time, but am swamped with other things right now. Improving stability and looking at your other improvements is on the list of things to do.

Regarding the above: I had added that same check at the script level (train_ctc_parallel_x3a.sh), just before doing the sub-sampling, and it had also worked. I am simply dropping utterances that are too short at that point. I think this may not have made it into the release version of the scripts either, but that is also something I want to work on before Christmas. By default, I'd rather handle this kind of thing at the script level than in the C code, because it is more transparent for the user. We will see what works best. Anyway, thanks for all your contributions!
Hi,
In the 14th iteration of training, the token accuracy drops drastically:
```
VLOG1 After 62000 sequences (33.1789Hr): Obj(log[Pzx]) = -8.45854 TokenAcc = 97.4986%
VLOG1 After 63000 sequences (34.1167Hr): Obj(log[Pzx]) = -8.56525 TokenAcc = 97.4942%
VLOG1 After 64000 sequences (35.0689Hr): Obj(log[Pzx]) = -8.71485 TokenAcc = 97.4921%
VLOG1 After 65000 sequences (36.0363Hr): Obj(log[Pzx]) = -8.6e+29 TokenAcc = 19.0128%
VLOG1 After 66000 sequences (37.0192Hr): Obj(log[Pzx]) = -1e+30 TokenAcc = 4.44885%
```
The only change I made was in train_ctc_parallel_x3.sh, where I increased:

```
num_sequence=20 valid_num_sequence=40 frame_num_limit=4000000
```

Is this what is causing the error?