
Toward a stable version #35

Closed
12 of 14 tasks
sw005320 opened this issue Dec 25, 2017 · 26 comments

@sw005320
Contributor

sw005320 commented Dec 25, 2017

I think we have fixed many issues, and we can now tag version 1.0 (or 0.1) as a stable version.
Toward that, we need to finish:

  • VGG2L for pytorch by @ShigekiKarita
  • AN4 recipe by me
  • AMI recipe
  • swbd recipe
  • fisher_swbd recipe
  • LM integration @sw005320
  • Attention/CTC joint decoding @takaaki-hori
  • End detection
  • Documentation by @sw005320 @kan-bayashi
  • Modify L.embed to avoid the randomness @takaaki-hori
  • Add WER scoring
  • label smoothing by @takaaki-hori
  • replace _ilens_to_index with np.cumsum
  • refactor main training and recognition to be independent of pytorch and chainer backends.

If you have any action items, please add them in this issue.
Then, we can move to more research-related implementation.

@kan-bayashi
Member

At least, we should add docstrings for src/nets.

@ShigekiKarita
Member

I agree. Type (class) and shape are essential information for everyone.
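
To make that concrete, a docstring in that style could look like the following. This is a hypothetical sketch for an attention-style forward method; the actual argument names and shapes in src/nets may differ.

def forward(self, enc_hs, dec_z, att_prev):
    """Compute the attention context vector.

    Args:
        enc_hs (list of Variable/Tensor): Encoder hidden states,
            each of shape (T_i, eprojs).
        dec_z (Variable/Tensor): Decoder hidden state of shape (B, dunits).
        att_prev (Variable/Tensor or None): Previous attention weights of
            shape (B, T_max), or None at the first decoding step.

    Returns:
        Variable/Tensor: Context vector of shape (B, eprojs).
        Variable/Tensor: Attention weights of shape (B, T_max).
    """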

@sw005320
Contributor Author

Frankly, I don't have much experience with this, so if @kan-bayashi initiates it, I'll follow and add/modify/enhance the documentation.
Also, we should host a webpage somewhere.
Do you have any ideas (e.g., just using GitHub's website hosting service)?

@sw005320
Contributor Author

sw005320 commented Dec 27, 2017

The implementation of end detection is finished (#46).

The performance decreased only very slightly, which is a good trade-off given that we no longer have to tune the maxlenratio parameter.
We can make this the default in the future (with maxlenratio=0.0, end detection takes over).

Manual setting (maxlenratio=0.8)

$ grep Avg exp/tr_it_a03_pt_enddetect/decode_*_it_beam20_eacc.best_p0_len0.0-0.8/result.txt
exp/tr_it_a03_pt_enddetect/decode_dt_it_beam20_eacc.best_p0_len0.0-0.8/result.txt:| Sum/Avg               | 1080   78951 | 84.2    7.3    8.5    3.7   19.4   99.1 |
exp/tr_it_a03_pt_enddetect/decode_et_it_beam20_eacc.best_p0_len0.0-0.8/result.txt:| Sum/Avg               | 1050   77586 | 84.2    7.1    8.7    3.5   19.3   98.9 |

Automatic with end detection (maxlenratio=0.0)

$ grep Avg exp/tr_it_a03_pt_enddetect/decode_*_it_beam20_eacc.best_p0_len0.0-0.0/result.txt
exp/tr_it_a03_pt_enddetect/decode_dt_it_beam20_eacc.best_p0_len0.0-0.0/result.txt:| Sum/Avg               | 1080   78951 | 84.3    7.3    8.5    3.8   19.5   99.1 |
exp/tr_it_a03_pt_enddetect/decode_et_it_beam20_eacc.best_p0_len0.0-0.0/result.txt:| Sum/Avg               | 1050   77586 | 84.2    7.1    8.7    3.5   19.3   98.9 |
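
For reference, the end-detection rule is roughly: stop the beam search once, for several consecutive hypothesis lengths, the best hypothesis ending at that length scores far below the overall best ended hypothesis. Below is a minimal sketch under that interpretation; the names and the threshold are illustrative and not necessarily those used in #46.

import numpy as np

def end_detect(ended_hyps, step, m_window=3, d_end=np.log(1e-10)):
    # ended_hyps: list of dicts with 'yseq' (label sequence) and 'score'
    # step: current decoding step, i.e. the longest hypothesis length so far
    if not ended_hyps:
        return False
    best = max(h['score'] for h in ended_hyps)
    count = 0
    for m in range(m_window):
        # best score among hypotheses that ended exactly at length (step - m)
        scores = [h['score'] for h in ended_hyps if len(h['yseq']) == step - m]
        if scores and max(scores) - best < d_end:
            count += 1
    return count == m_window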

@sw005320
Contributor Author

sw005320 commented Dec 27, 2017

@ShigekiKarita I'm thinking of implementing the LM integration. The plan is to modify the existing chainer ptb recipe to train an LSTMLM (https://github.com/chainer/chainer/blob/master/examples/ptb/train_ptb.py), and then integrate the LSTMLM with our main E2E model. Can I ask you to make a pytorch version of the training part later? Once you make the LSTMLM training part, I can implement the pytorch integration part. If you agree, I'll start the chainer-based implementation. If you think we should instead implement the LSTMLM training part in a more seamless way shared between pytorch and chainer, rather than the separate approach above, I'm happy to do so and would like to discuss it with you further.

@ShigekiKarita
Member

@sw005320 It sounds nice. I like the separate approach because I'll be away for a few weeks around Jan 1st, but I will keep watching and discussing with you.

You can also find a PTB example in pytorch here: https://github.com/pytorch/examples/tree/master/word_language_model

@sw005320
Contributor Author

sw005320 commented Dec 30, 2017

Which is easier for you to port to the pytorch backend for LSTMLM training?
(chainer trainer based) https://github.com/chainer/chainer/blob/master/examples/ptb/train_ptb.py
or
(manual training loop) https://github.com/chainer/chainer/blob/master/examples/ptb/train_ptb_custom_loop.py

@ShigekiKarita
Member

I prefer the manual training loop, because the chainer trainer performs device operations inside itself, unlike e2e_asr_train.py (where model.__call__ handles them instead of the trainer).
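
For what it is worth, here is a rough sketch of what the manual-loop port could look like on the pytorch side. The dimensions, hyperparameters, and the toy data are placeholders so the snippet is self-contained; they are not taken from the actual script.

import torch
import torch.nn as nn

class LSTMLM(nn.Module):
    def __init__(self, n_vocab, n_units=650):
        super(LSTMLM, self).__init__()
        self.embed = nn.Embedding(n_vocab, n_units)
        self.lstm = nn.LSTM(n_units, n_units, num_layers=2, batch_first=True)
        self.out = nn.Linear(n_units, n_vocab)

    def forward(self, x, state=None):
        h = self.embed(x)
        h, state = self.lstm(h, state)
        return self.out(h), state

n_vocab = 10000
model = LSTMLM(n_vocab)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
criterion = nn.CrossEntropyLoss()

# toy batches of token ids, just to make the sketch runnable
data = torch.randint(0, n_vocab, (32, 36))
batches = [(data[:, :-1], data[:, 1:])]

for epoch in range(10):
    state = None
    for x, t in batches:
        optimizer.zero_grad()
        y, state = model(x, state)
        loss = criterion(y.reshape(-1, n_vocab), t.reshape(-1))
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)
        optimizer.step()
        # detach the LSTM state so backprop does not span batches
        state = (state[0].detach(), state[1].detach())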

@sw005320
Contributor Author

Thanks, that matches my expectation. I'll work on it.

@sw005320
Contributor Author

sw005320 commented Jan 4, 2018

@ShigekiKarita, @takaaki-hori and I discussed the possibility of implementing attention/CTC joint decoding, but it seems that warp_ctc does not provide enough of an interface to compute CTC scores efficiently during decoding. @takaaki-hori will explain it in a bit more detail, but we may consider implementing re-scoring rather than joint decoding.

@takaaki-hori

@sw005320 , @ShigekiKarita , I added attention/CTC joint decoding and tested it with Voxforge and WSJ.
I got some CER reduction (14.7 -> 12.5 in Voxforge and 5.9 -> 5.5 in WSJ), using the decoding options "--minlenratio 0.0 --maxlenratio 0.0 --ctc-weight 0.3".
Can you take a look at the code and try it with other tasks? To test it, first check out the "joint-decoding" branch and add the "--ctc-weight" option in run.sh, as in egs/wsj/asr1/run.sh.
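
For anyone reviewing, the core of the joint decoding is that each beam-search expansion is scored by interpolating the attention decoder log-probability with an incremental CTC prefix log-probability, plus an optional LM term. A minimal sketch of that combination (the actual code is in the joint-decoding branch; the default weights below just mirror the options mentioned in this thread):

def joint_score(att_logp, ctc_logp, lm_logp=0.0, ctc_weight=0.3, lm_weight=1.0):
    # att_logp: log p_att(c | y, X) from the attention decoder for candidate label c
    # ctc_logp: incremental CTC prefix log-probability of appending c to y
    # lm_logp:  log p_lm(c | y) from the external LSTMLM (0.0 if no LM is used)
    return (1.0 - ctc_weight) * att_logp + ctc_weight * ctc_logp + lm_weight * lm_logp

The decoder adds this quantity to the running hypothesis score and keeps the top beam expansions at each step.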

@sw005320
Contributor Author

Great, Hori-san. I'll review it. BTW, I'm also about to finish the LM integration and am preparing to commit it (CER 5.9 -> 5.3, WER 18.0 -> 14.7 in the WSJ task).

@sw005320
Contributor Author

Guys, by combining the LSTMLM and joint attention/CTC decoding, we finally get CER 5.3 -> 3.8, WER 14.7 -> 9.3 in the WSJ task!!! The nice thing is that we don't have to set the min/max length ratios and the penalty (all set to 0.0), while we might still need to tune the CTC and LM weights (0.3 and 1.0, respectively; see #76).
@kan-bayashi, can you play with the LSTMLM and joint decoding on the TEDLIUM recipe? You can train the LSTMLM on the LM text data by referring to tools/kaldi/egs/tedlium/s5_r2/local/ted_train_lm.sh and simply using

gunzip -c db/TEDLIUM_release2/LM/*.en.gz | sed 's/ <\/s>//g' | local/join_suffix.py | gzip -c  > ${dir}/data/text/train.txt.gz

@kan-bayashi
Member

@sw005320 Great result! I will do it.

@sw005320
Contributor Author

I just added the fisher_swbd recipe. The results will be added later. Also, I finished Librispeech experiments with pytorch, and we got 7.7% WER for the clean condition. This is not bad. I'll work on making a language model training script for pytorch. Then, we'll get some more improvements in the Librispeech and fisher_swbd recipes, as in the WSJ case.

@kan-bayashi
Member

The results of TEDLIUM with CTC joint decoding and LM rescoring are as follows:

exp/train_trim_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_dev_beam20_eacc.best_p0.1_len0.0-0.0_ctcw0.3_rnnlm1.0/result.txt:| Sum/Avg | 507 95429 | 91.8 4.2 4.0 2.7 10.8 89.3 |
exp/train_trim_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_test_beam20_eacc.best_p0.1_len0.0-0.0_ctcw0.3_rnnlm1.0/result.txt:| Sum/Avg | 1155 145066 | 92.2 3.7 4.1 2.4 10.1 85.3 |
exp/train_trim_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_dev_beam20_eacc.best_p0.1_len0.0-0.0_ctcw0.3_rnnlm1.0/result.wrd.txt:| Sum/Avg | 507 17783 | 83.2 13.7 3.1 3.0 19.8 89.3 |
exp/train_trim_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_test_beam20_eacc.best_p0.1_len0.0-0.0_ctcw0.3_rnnlm1.0/result.wrd.txt:| Sum/Avg | 1155 27500 | 84.0 12.3 3.7 2.6 18.6 85.3 |

for dev set, CER 12.6 -> 10.8, WER 24.8 -> 19.8
for test set, CER 11.9 -> 10.1, WER 23.4 -> 18.6

@sw005320
Contributor Author

sw005320 commented Feb 3, 2018

It seems that #85 solves randomness issues in the pytorch backend.

@kan-bayashi
Member

kan-bayashi commented Feb 10, 2018

Updated CSJ recipe results (#91).

# Deep VGGBLSTMP (elayers=6) with chainer backend
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| Sum/Avg | 1272 43897 | 91.4 6.4 2.3 1.6 10.2 67.6 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| Sum/Avg | 1292 43623 | 93.7 5.1 1.3 1.2 7.5 65.2 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| Sum/Avg | 1385 28225 | 93.6 5.0 1.4 1.6 8.0 47.9 |

# Deep VGGBLSTMP (elayers=6) with chainer backend + CTC joint decoding
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| Sum/Avg | 1272 43897 | 91.6 6.0 2.3 1.4 9.7 66.5 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| Sum/Avg | 1292 43623 | 94.1 4.6 1.3 1.0 6.9 64.5 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| Sum/Avg | 1385 28225 | 93.9 4.7 1.4 1.4 7.5 47.7 |

# Deep VGGBLSTMP (elayers=6) with chainer backend + CTC joint decoding + LM rescoring
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| Sum/Avg | 1272 43897 | 92.5 5.3 2.2 1.3 8.8 63.4 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| Sum/Avg | 1292 43623 | 94.7 4.1 1.2 0.9 6.2 60.7 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| Sum/Avg | 1385 28225 | 94.3 4.2 1.5 1.2 7.0 45.2 |

task: vggblstmp -> + ctc joint decoding -> ++ lm rescoring
eval1 : 10.2 -> 9.7 -> 8.8
eval2: 7.5 -> 6.9 -> 6.2
eval3: 8.0 -> 7.5 -> 7.0

@sw005320
Contributor Author

I think we have almost finished all our targets except for VGG (@ShigekiKarita, is this still difficult?). After the refactoring in #102, we can move on to the next development plan.

@ShigekiKarita
Member

Yes, it is still difficult. I hope that someone else also takes a look at the connection part between VGG and BLSTMP.

@sw005320
Contributor Author

@ShigekiKarita, have you confirmed that the problem is in the connection part? You mean that VGG and BLSTMP themselves are correct?

@ShigekiKarita
Member

Yes. For BLSTMP, the experimental results are equal, as seen in #9. For VGG, the chainer/pytorch implementations show equal activations and gradients when all the parameters are initialized with constants w[:]=1 and b[:]=0 (without random values), see #47 and https://github.com/ShigekiKarita/espnet/blob/2a44f292d44c9e23c6ac8e24ea7eb9e2c64b0cb8/test/test_vgg.py

Hence, the last remaining part is the connection between VGG and BLSTMP.
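
For anyone who wants to help track this down, the comparison technique is to remove randomness by setting every weight to a constant and every bias to zero in both backends, feeding the same input, and comparing activations and gradients element-wise. A minimal pytorch-side sketch of that constant initialization (the chainer side is analogous; the linked test_vgg.py does the actual comparison):

import torch

def set_constant_params(module):
    # deterministic initialization: weights -> 1.0, biases -> 0.0
    with torch.no_grad():
        for name, p in module.named_parameters():
            if 'bias' in name:
                p.zero_()
            else:
                p.fill_(1.0)

# usage: apply this to the VGG front-end and the BLSTMP encoder in each backend,
# run the same utterance through both models, and compare outputs and gradients,
# e.g. with numpy.testing.assert_allclose.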

@kan-bayashi
Member

To use the current version, we have to update warp-ctc (#105).
Please try the following commands to update:

cd tools
rm -r warp-ctc
make warp-ctc

@geniki

geniki commented Feb 21, 2018

Is the pytorch version of the LM integration a priority for this stable version?

Many thanks for the great repo.

@sw005320
Contributor Author

@kan-bayashi is considering it. It is not a super high priority for now, but we're working on it anyway.

@sw005320
Contributor Author

@geniki We have finished the pytorch LM integration (#114).

We have finished most of the action items, so I'll close this issue.
