Toward a stable version #35
At least, we should add docstrings.
I agree. Type (class) and shape are essential information for everyone.
Frankly, I don't have that much experience, so if @kan-bayashi initiates this, I'll follow and add/modify/enhance the documentation.
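For example, a docstring that records both the type (class) and the shape of each argument might look like this; the function name and shapes here are hypothetical:

```python
def encode(self, xs, ilens):
    """Encode a batch of acoustic feature sequences.

    :param list xs: list of input feature arrays, each a
        numpy.ndarray of shape (T_i, D)
    :param numpy.ndarray ilens: input lengths per utterance, shape (B,)
    :return: padded encoder hidden states of shape (B, T_max, H)
    :rtype: torch.Tensor
    """
```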
The implementation of the end detection is finished (#46). The performance decreased (really) slightly, and end detection is quite effective considering that we no longer have to tune the maxlenratio parameter. The comparison covers two settings (the detailed numbers were attached in the original comment):
- Manual setting (maxlenratio=0.8)
- Automatic with end detection (maxlenratio=0.0)
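For reference, the kind of end-detection criterion described here can be sketched as follows. This is a minimal illustration, not necessarily the exact logic merged in #46; the default threshold values are assumptions:

```python
def end_detect(ended_hyps, step, m=3, d_end=-10.0):
    """Heuristic end detection for beam search: stop decoding when, for
    the last m output lengths, the best hypothesis ending at that length
    scores far below the globally best ended hypothesis.

    ended_hyps: list of dicts with 'score' (log prob) and 'yseq' (tokens)
    step: current output length in the beam search
    """
    if not ended_hyps:
        return False
    best = max(h['score'] for h in ended_hyps)
    count = 0
    for length in range(step - m + 1, step + 1):
        same_len = [h['score'] for h in ended_hyps if len(h['yseq']) == length]
        if same_len and max(same_len) - best < d_end:
            count += 1
    return count == m
```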
@ShigekiKarita I'm thinking of implementing the LM integration. This would be done by modifying an existing chainer ptb recipe to train an LSTMLM (https://github.com/chainer/chainer/blob/master/examples/ptb/train_ptb.py), and then integrating the LSTMLM with our main E2E model. Can I ask you to make a pytorch version of the training part later? Once you make the LSTMLM training part, I can implement the pytorch integration part. If you agree, I'll start the chainer-based implementation. If you think we should instead implement the LSTMLM training part in a more seamless way for both pytorch and chainer, rather than the separate approach above, I'm happy to do so and would like to discuss it with you further.
@sw005320 Sounds good. I like the separate approach because I'll be a little bit away for a few weeks around Jan 1st, but I will keep watching and discussing with you. You can find a PTB example in pytorch here: https://github.com/pytorch/examples/tree/master/word_language_model
Which approach would be easier for you to port for the pytorch-backend LSTMLM training?
I prefer the manual training loop because this trainer uses device operations inside it, unlike e2e_asr_train.py (instead of the trainer, ...).
Thanks. |
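For context, a minimal pytorch LSTMLM plus a manual training loop along the lines of the linked word_language_model example might look like this; all class names, sizes, and hyperparameters below are hypothetical:

```python
import torch
import torch.nn as nn

class LSTMLM(nn.Module):
    """Minimal LSTM language model: embed tokens, run an LSTM, and
    project the hidden states back to vocabulary logits."""

    def __init__(self, n_vocab, n_units=650, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(n_vocab, n_units)
        self.lstm = nn.LSTM(n_units, n_units, n_layers, batch_first=True)
        self.out = nn.Linear(n_units, n_vocab)

    def forward(self, x, state=None):
        # x: (B, T) token ids -> logits: (B, T, n_vocab)
        h = self.embed(x)
        h, state = self.lstm(h, state)
        return self.out(h), state

# one step of a manual training loop; as discussed above, device
# placement and hidden-state handling would live directly in this loop
n_vocab = 10000
model = LSTMLM(n_vocab)
opt = torch.optim.SGD(model.parameters(), lr=1.0)
x = torch.randint(0, n_vocab, (16, 35))  # (B, T) input tokens
t = torch.randint(0, n_vocab, (16, 35))  # (B, T) next-token targets
logits, _ = model(x)
loss = nn.functional.cross_entropy(logits.reshape(-1, n_vocab), t.reshape(-1))
opt.zero_grad()
loss.backward()
opt.step()
```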
@ShigekiKarita, @takaaki-hori and I discussed the possibility of implementing attention/CTC joint decoding, but it seems that warp_ctc does not provide enough of an interface to compute CTC scores efficiently during decoding. @takaaki-hori will explain it in a bit more detail, but we may consider implementing rescoring rather than joint decoding.
@sw005320, @ShigekiKarita, I added attention/CTC joint decoding and tested it with Voxforge and WSJ.
Great, Hori-san. I'll review it. BTW, I'm also about to finish the LM integration and am preparing to commit it (CER 5.9 -> 5.3, WER 18.0 -> 14.7 on the WSJ task).
Guys, by combining the LSTMLM and joint attention/CTC decoding, we finally get CER 5.3 -> 3.8 and WER 14.7 -> 9.3 on the WSJ task!!! The nice thing is that we don't have to set min/max length or penalty (all set to 0.0), while we might need to tune the CTC and LM weights (0.3 and 1.0, respectively; see #76).
@sw005320 Great result! I will do it.
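For reference, the score combination behind these numbers can be sketched as a log-linear interpolation of the attention, CTC, and LM scores during beam search. This is only a schematic; the 0.3/1.0 weights are taken from the comment above and would normally be tuned per task:

```python
# schematic per-hypothesis scoring inside the beam search
ctc_weight = 0.3  # interpolation weight for the CTC prefix score
lm_weight = 1.0   # shallow-fusion weight for the LSTMLM score

def hyp_score(att_logp, ctc_logp, lm_logp):
    """Log-domain combination of attention, CTC, and LM partial scores."""
    return ((1.0 - ctc_weight) * att_logp
            + ctc_weight * ctc_logp
            + lm_weight * lm_logp)
```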
I just added the fisher_swbd recipe; the results will be added later. I also finished Librispeech experiments with pytorch, and we got 7.7% WER for the clean conditions, which is not bad. Next, I'll work on making a language model training script for pytorch. Then, we'll have some more improvements in the Librispeech and fisher_swbd recipes, as in the WSJ case.
The results for TEDLIUM with CTC joint decoding and LM rescoring are as follows: on the dev set, CER 12.6 -> 10.8 and WER 24.8 -> 19.8.
It seems that #85 solves randomness issues in the pytorch backend.
Updated CSJ recipe results (#91).
task: vggblmsp -> +ctc joint -> ++lm rescoring (the per-task numbers were attached in the original comment)
I think we have almost finished all our targets except for VGG (@ShigekiKarita, is this still difficult?). After the refactoring in #102, we can move on to the next development plan.
Yes, it is still difficult. I hope someone else also takes a look at the connection part between VGG and BLSTMP.
@ShigekiKarita, did you confirm that the problem is in the connection part? You mean VGG and BLSTMP themselves are correct?
Yes. For BLSTMP, the experimental results are equal, as seen in #9. For VGG, the chainer/pytorch implementations show equal activations and gradients when all the parameters are initialized with constants (w[:]=1 and b[:]=0, i.e., without random values); see #47 and https://github.com/ShigekiKarita/espnet/blob/2a44f292d44c9e23c6ac8e24ea7eb9e2c64b0cb8/test/test_vgg.py. Hence, the remaining suspect is the connection between VGG and BLSTMP.
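A sketch of the kind of equivalence check described here (the helper names are hypothetical; the idea is to remove random initialization so that activations and gradients can be compared directly against a reference implementation):

```python
import numpy as np
import torch

def constify(module):
    """Set every weight to 1 and every bias to 0 so that two
    implementations can be compared without random-init differences."""
    with torch.no_grad():
        for name, p in module.named_parameters():
            p.fill_(1.0 if 'weight' in name else 0.0)

def check_against_reference(net, ref_forward, x):
    """Compare forward activations (and expose gradients) against a
    reference implementation given the same constant parameters."""
    constify(net)
    xt = torch.tensor(x, dtype=torch.float32, requires_grad=True)
    yt = net(xt)
    np.testing.assert_allclose(yt.detach().numpy(), ref_forward(x), rtol=1e-5)
    yt.sum().backward()
    return xt.grad  # compare this against the reference gradient as well
```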
To use the current version, we have to update warp-ctc (#105):

```sh
cd tools
rm -r warp-ctc
make warp-ctc
```
Is the pytorch version of the LM integration a priority for this stable version? Many thanks for the great repo.
@kan-bayashi is considering it. It is not a super high priority for now, but we're working on it anyway.
I think we have fixed many issues, and we can tag a version 1.0 (or 0.1) as a stable version.
Toward that, we need to finish:
- changing _ilens_to_index to np.cumsum (see the sketch after this comment)
If you have any action items, please add them to this issue.
Then, we can move on to more research-related implementation.
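A guess at what this action item involves, based only on the names: converting per-utterance input lengths into split boundaries is a one-liner with np.cumsum. The shapes below are hypothetical:

```python
import numpy as np

ilens = np.array([120, 95, 143])     # per-utterance input lengths
bounds = np.cumsum(ilens[:-1])       # split boundaries: [120, 215]
batch = np.zeros((ilens.sum(), 83))  # concatenated features, (sum(T_i), D)
utts = np.split(batch, bounds, axis=0)
assert [u.shape[0] for u in utts] == list(ilens)
```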