Is the TIMIT fbank result OK? And how to add features such as delta-delta? #59
Comments
Hi, my guess is that you will need to reduce the number of parameters in the model: l=5 and c=320 are good settings for Switchboard and TEDLIUM, with hundreds of hours of training data, but not for TIMIT, with just a few. The difference in the ark files shows this (somewhat). Each TIMIT speaker has much less audio than a TEDLIUM speaker, so the sum and sum-of-squares of the data for a speaker are much smaller (which is what I think you're showing). Finally, during decoding, you can see that the network likes to output "outmoded" and "journalese" for some reason. Presumably you are still using the TEDLIUM language model? Do you know someone who is familiar with the Kaldi TIMIT recipe? I think you need to adapt the Eesen recipe a bit more for it to give good results; the Kaldi TIMIT recipe would probably be a good starting point to see what is being done. Florian
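The point about per-speaker sums can be illustrated with a small sketch. It assumes the ark files in question hold mean/variance normalization statistics stored as running sums, sums of squares, and a frame count; the function names and toy data are hypothetical and are not taken from Eesen or Kaldi.

```python
def cmvn_stats(frames):
    """Accumulate per-dimension sum, sum of squares, and frame count."""
    dim = len(frames[0])
    total = [0.0] * dim
    total_sq = [0.0] * dim
    for frame in frames:
        for d, value in enumerate(frame):
            total[d] += value
            total_sq[d] += value * value
    return total, total_sq, len(frames)


def mean_and_variance(total, total_sq, count):
    """Turn accumulated statistics into per-dimension mean and variance."""
    mean = [s / count for s in total]
    var = [sq / count - m * m for sq, m in zip(total_sq, mean)]
    return mean, var


# Toy data (made up): the same two 2-dimensional frames, repeated.
short_speaker = [[1.0, 2.0], [3.0, 4.0]] * 5    # 10 frames, a "TIMIT-like" speaker
long_speaker = [[1.0, 2.0], [3.0, 4.0]] * 500   # 1000 frames, a "TEDLIUM-like" speaker

short_stats = cmvn_stats(short_speaker)
long_stats = cmvn_stats(long_speaker)

print("short speaker sums:", short_stats[0], "frames:", short_stats[2])
print("long speaker sums: ", long_stats[0], "frames:", long_stats[2])
print("normalized stats match:",
      mean_and_variance(*short_stats) == mean_and_variance(*long_stats))
```

The raw sums differ by the ratio of frame counts, while the mean and variance derived from them are the same, so smaller numbers in the statistics file are expected for speakers with little audio and are not by themselves a sign of a bug.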
Florian Metze http://www.cs.cmu.edu/directory/florian-metze
Hi @fmetze, thanks for your suggestion, I will try it.
A word-based language model built on TIMIT is relatively weak.
@yajiemiao do you mean testing the phones Eesen recognized, not the words?
yep
@yajiemiao OK, thanks very much.
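Scoring at the phone level, as suggested here, amounts to computing an edit distance between reference and hypothesis phone sequences. Below is a minimal sketch under that assumption; the function names and the example phone strings are made up for illustration, and this is not the scoring script used by Eesen or Kaldi.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phone_error_rate(ref, hyp):
    """Edit distance normalized by the reference length."""
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical phone sequences, not taken from TIMIT transcripts.
ref = "sh iy hh ae d".split()
hyp = "sh iy ae d d".split()
print("PER:", phone_error_rate(ref, hyp))
```

In practice a TIMIT setup would also map the phone set to a reduced set before scoring; that step is left out of this sketch.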
@fmetze
We have not run such experiments. I think there is some work on how to build uni-directional LSTMs that work for speech (mainly stacking future frames rather than relying on the RNN to learn them), or on decomposing the sentence-level BiLSTM into a series of shorter BiLSTMs that one can evaluate quickly, but we have not implemented any of this in Eesen. Would be a great feature, though ;-)
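As a rough illustration of "stacking future frames", the sketch below appends a fixed number of upcoming frames to each input frame so that a uni-directional model sees some right context. It is not implemented in Eesen, as noted above; the function name, the padding choice (repeating the last frame), and the toy data are assumptions.

```python
def stack_future_frames(frames, right_context):
    """Concatenate each frame with the next `right_context` frames.

    Frames past the end of the utterance are replaced by the last frame.
    """
    last = len(frames) - 1
    stacked = []
    for t in range(len(frames)):
        row = []
        for offset in range(right_context + 1):
            row.extend(frames[min(t + offset, last)])
        stacked.append(row)
    return stacked


# Toy 1-dimensional features (made up).
frames = [[1.0], [2.0], [3.0]]
stacked = stack_future_frames(frames, right_context=2)
for row in stacked:
    print(row)
```

The cost is a larger input dimension and a lookahead delay of `right_context` frames, which is the trade-off against a full bi-directional pass over the sentence.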
Florian Metze http://www.cs.cmu.edu/directory/florian-metze
In general, CTC highly depends on BiLSTM for reasonable performance. If you refer to http://www.cs.cmu.edu/~ymiao/pub/icassp2016_ctc.pdf, on Switchboard, uni-directional models perform >15% worse than bi-directional models with the same number of model parameters.
@yajiemiao @zhangjiulong can you please share the example tested with the TIMIT dataset?
I just converted the TIMIT format to STM format and ran it using the TEDLIUM scripts.
@Aasimrafique, were you able to convert the TIMIT format to STM format as instructed by @zhangjiulong? If so, could you please share exactly how you did it? Thanks.
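The actual conversion was not posted in this thread. As a minimal sketch of what such a step might look like, the code below turns the contents of one TIMIT `.WRD` file (lines of `start_sample end_sample word`) into a single STM-style line of the form `file channel speaker begin end transcript`. The utterance and speaker identifiers, the sample values, and the choice to omit the optional STM label field are all assumptions; the TEDLIUM scripts may expect additional fields.

```python
TIMIT_SAMPLE_RATE = 16000  # TIMIT audio is sampled at 16 kHz


def wrd_to_stm_line(wrd_text, utt_id, speaker, channel="1"):
    """Build one STM-style line from the text of a TIMIT .WRD file."""
    words = []
    begin = end = None
    for line in wrd_text.strip().splitlines():
        start_sample, end_sample, word = line.split()
        if begin is None:
            begin = int(start_sample)  # start of the first word
        end = int(end_sample)          # end of the last word seen so far
        words.append(word)
    start_sec = begin / TIMIT_SAMPLE_RATE
    end_sec = end / TIMIT_SAMPLE_RATE
    return "{} {} {} {:.2f} {:.2f} {}".format(
        utt_id, channel, speaker, start_sec, end_sec, " ".join(words))


# Made-up sample offsets and identifiers, for illustration only.
example_wrd = "3200 8000 she\n8000 16000 had\n"
stm_line = wrd_to_stm_line(example_wrd, utt_id="fadg0_sa1", speaker="fadg0")
print(stm_line)
```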
Unfortunately, as mentioned in Wikipedia:
@riebling I forgot to add "scripts" at the end. I do have access to the TIMIT dataset; what I meant to ask was whether the TIMIT test scripts could be shared.
Oops, my misunderstanding. My best guess is that, at least here at CMU, there is no TIMIT Eesen experiment to share. The only person who seems to have tried this (aside from Yajie, who is no longer with us) is @zhangjiulong. Florian suggests that people try adapting the Kaldi TIMIT experiment; this does not imply that he has done so or therefore has any scripts to share.
Hi, I tested the TIMIT data using Eesen, but the results are not good, as follows:
training process
testing process
I checked the ark files of the TIMIT and TEDLIUM data and found some differences, but I do not know where they come from.
The TEDLIUM ark file looks like this:
And the TIMIT one looks like this:
But the scripts are the same as the TEDLIUM ones (I just modified the existing code); the diff looks like this:
And the scripts look like this:
Is there something wrong?
And what do "out-moded" and "journalese" mean?
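On the delta-delta part of the question in the title: Kaldi ships an `add-deltas` tool for this, though how it would be wired into this recipe is not covered in the thread. The sketch below only shows the idea, using the standard regression formula for deltas and applying it twice to get delta-deltas. The function names, the window size, and the edge handling (repeating the first and last frames) are assumptions, and the result is not guaranteed to match any toolkit's output exactly.

```python
def deltas(frames, window=2):
    """Regression-based time derivative of each feature dimension."""
    num_frames = len(frames)
    dim = len(frames[0])
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(num_frames):
        row = []
        for d in range(dim):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, num_frames - 1)][d]  # clamp at the end
                behind = frames[max(t - n, 0)][d]              # clamp at the start
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out


def add_delta_delta(frames, window=2):
    """Append deltas and delta-deltas to every frame (triples the dimension)."""
    d1 = deltas(frames, window)
    d2 = deltas(d1, window)
    return [f + a + b for f, a, b in zip(frames, d1, d2)]


# Toy 1-dimensional ramp (made up): slope 1, so the middle delta is 1 and delta-delta 0.
ramp = [[float(t)] for t in range(7)]
out = add_delta_delta(ramp)
print(out[3])
```

Note that appending deltas triples the input dimension, so the network's input layer size has to change accordingly.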