
Toward a stable version #35

Closed
12 of 14 tasks
sw005320 opened this issue Dec 25, 2017 · 26 comments

@sw005320
Contributor

sw005320 commented Dec 25, 2017

I think we have fixed many issues, and we can now tag version 1.0 (or 0.1) as a stable version.
Toward that, we need to finish:

  • VGG2L for pytorch by @ShigekiKarita
  • AN4 recipe by me
  • AMI recipe
  • swbd recipe
  • fisher_swbd recipe
  • LM integration @sw005320
  • Attention/CTC joint decoding @takaaki-hori
  • End detection
  • Documentation by @sw005320 @kan-bayashi
  • Modify L.embed to avoid the randomness @takaaki-hori
  • Add WER scoring
  • label smoothing by @takaaki-hori
  • replace _ilens_to_index with np.cumsum
  • refactor main training and recognition to be independent of pytorch and chainer backends.

If you have any action items, please add them in this issue.
Then, we can move to more research-related implementation.

@kan-bayashi
Member

At least, we should add docstrings for src/nets.

@ShigekiKarita
Member

I agree. Type (class) and shape are essential information for everyone.
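
To make that concrete, a docstring in that style could look like the following. This is a hypothetical sketch for an attention-style forward method; the actual argument names and shapes in src/nets may differ.

def forward(self, enc_hs, dec_z, att_prev):
    """Compute the attention context vector.

    Args:
        enc_hs (list of Variable/Tensor): Encoder hidden states,
            each of shape (T_i, eprojs).
        dec_z (Variable/Tensor): Decoder hidden state of shape (B, dunits).
        att_prev (Variable/Tensor or None): Previous attention weights of
            shape (B, T_max), or None at the first decoding step.

    Returns:
        Variable/Tensor: Context vector of shape (B, eprojs).
        Variable/Tensor: Attention weights of shape (B, T_max).
    """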

@sw005320
Contributor Author

Frankly, I don't have much experience with this, so if @kan-bayashi initiates it, I'll follow and add/modify/enhance the documentation.
Also, we should host a webpage somewhere.
Do you have any ideas (e.g., just using GitHub's website hosting service)?

@sw005320
Contributor Author

sw005320 commented Dec 27, 2017

The implementation of end detection is finished (#46).

The performance decreased only very slightly, which is a good trade-off given that we no longer have to tune the maxlenratio parameter.
We can make this the default in the future (with maxlenratio=0.0, end detection takes over).

Manual setting (maxlenratio=0.8)

$ grep Avg exp/tr_it_a03_pt_enddetect/decode_*_it_beam20_eacc.best_p0_len0.0-0.8/result.txt
exp/tr_it_a03_pt_enddetect/decode_dt_it_beam20_eacc.best_p0_len0.0-0.8/result.txt:| Sum/Avg               | 1080   78951 | 84.2    7.3    8.5    3.7   19.4   99.1 |
exp/tr_it_a03_pt_enddetect/decode_et_it_beam20_eacc.best_p0_len0.0-0.8/result.txt:| Sum/Avg               | 1050   77586 | 84.2    7.1    8.7    3.5   19.3   98.9 |

Automatic with end detection (maxlenratio=0.0)

$ grep Avg exp/tr_it_a03_pt_enddetect/decode_*_it_beam20_eacc.best_p0_len0.0-0.0/result.txt
exp/tr_it_a03_pt_enddetect/decode_dt_it_beam20_eacc.best_p0_len0.0-0.0/result.txt:| Sum/Avg               | 1080   78951 | 84.3    7.3    8.5    3.8   19.5   99.1 |
exp/tr_it_a03_pt_enddetect/decode_et_it_beam20_eacc.best_p0_len0.0-0.0/result.txt:| Sum/Avg               | 1050   77586 | 84.2    7.1    8.7    3.5   19.3   98.9 |
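
For reference, the end-detection rule is roughly: stop the beam search once, for several consecutive hypothesis lengths, the best hypothesis ending at that length scores far below the overall best ended hypothesis. Below is a minimal sketch under that interpretation; the names and the threshold are illustrative and not necessarily those used in #46.

import numpy as np

def end_detect(ended_hyps, step, m_window=3, d_end=np.log(1e-10)):
    # ended_hyps: list of dicts with 'yseq' (label sequence) and 'score'
    # step: current decoding step, i.e. the longest hypothesis length so far
    if not ended_hyps:
        return False
    best = max(h['score'] for h in ended_hyps)
    count = 0
    for m in range(m_window):
        # best score among hypotheses that ended exactly at length (step - m)
        scores = [h['score'] for h in ended_hyps if len(h['yseq']) == step - m]
        if scores and max(scores) - best < d_end:
            count += 1
    return count == m_window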

@sw005320
Contributor Author

sw005320 commented Dec 27, 2017

@ShigekiKarita I'm thinking of implementing the LM integration. The plan is to modify the existing chainer ptb recipe to train an LSTMLM (https://github.com/chainer/chainer/blob/master/examples/ptb/train_ptb.py), and then integrate the LSTMLM with our main E2E model. Can I ask you to make a pytorch version of the training part later? Once you make the LSTMLM training part, I can implement the pytorch integration part. If you agree, I'll start the chainer-based implementation. If you think we should instead implement the LSTMLM training part in a more seamless way shared between pytorch and chainer, rather than the separate approach above, I'm happy to do so and would like to discuss it with you further.

@ShigekiKarita
Member

@sw005320 It sounds nice. I like the separate approach because I'll be away for a few weeks around Jan 1st, but I will keep watching and discussing with you.

You can also find a PTB example in pytorch here: https://github.com/pytorch/examples/tree/master/word_language_model

@sw005320
Contributor Author

sw005320 commented Dec 30, 2017

Which is easier for you to port to the pytorch backend for LSTMLM training?
(chainer trainer based) https://github.com/chainer/chainer/blob/master/examples/ptb/train_ptb.py
or
(manual training loop) https://github.com/chainer/chainer/blob/master/examples/ptb/train_ptb_custom_loop.py

@ShigekiKarita
Member

I prefer the manual training loop, because the chainer trainer performs device operations inside itself, unlike e2e_asr_train.py (where model.__call__ handles them instead of the trainer).
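
For what it is worth, here is a rough sketch of what the manual-loop port could look like on the pytorch side. The dimensions, hyperparameters, and the toy data are placeholders so the snippet is self-contained; they are not taken from the actual script.

import torch
import torch.nn as nn

class LSTMLM(nn.Module):
    def __init__(self, n_vocab, n_units=650):
        super(LSTMLM, self).__init__()
        self.embed = nn.Embedding(n_vocab, n_units)
        self.lstm = nn.LSTM(n_units, n_units, num_layers=2, batch_first=True)
        self.out = nn.Linear(n_units, n_vocab)

    def forward(self, x, state=None):
        h = self.embed(x)
        h, state = self.lstm(h, state)
        return self.out(h), state

n_vocab = 10000
model = LSTMLM(n_vocab)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
criterion = nn.CrossEntropyLoss()

# toy batches of token ids, just to make the sketch runnable
data = torch.randint(0, n_vocab, (32, 36))
batches = [(data[:, :-1], data[:, 1:])]

for epoch in range(10):
    state = None
    for x, t in batches:
        optimizer.zero_grad()
        y, state = model(x, state)
        loss = criterion(y.reshape(-1, n_vocab), t.reshape(-1))
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), 5.0)
        optimizer.step()
        # detach the LSTM state so backprop does not span batches
        state = (state[0].detach(), state[1].detach())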

@sw005320
Contributor Author

Thanks, that matches my expectation. I'll work on it.

@sw005320
Contributor Author

sw005320 commented Jan 4, 2018

@ShigekiKarita, @takaaki-hori and I discussed the possibility of implementing attention/CTC joint decoding, but it seems that warp_ctc does not provide enough of an interface to compute CTC scores efficiently during decoding. @takaaki-hori will explain it in a bit more detail, but we may consider implementing re-scoring rather than joint decoding.

@takaaki-hori

@sw005320 , @ShigekiKarita , I added attention/CTC joint decoding and tested it with Voxforge and WSJ.
I got some CER reduction (14.7 -> 12.5 in Voxforge and 5.9 -> 5.5 in WSJ), using the decoding options "--minlenratio 0.0 --maxlenratio 0.0 --ctc-weight 0.3".
Can you take a look at the code and try it with other tasks? To test it, first check out the "joint-decoding" branch and add the "--ctc-weight" option in run.sh, as in egs/wsj/asr1/run.sh.
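
For anyone reviewing, the core of the joint decoding is that each beam-search expansion is scored by interpolating the attention decoder log-probability with an incremental CTC prefix log-probability, plus an optional LM term. A minimal sketch of that combination (the actual code is in the joint-decoding branch; the default weights below just mirror the options mentioned in this thread):

def joint_score(att_logp, ctc_logp, lm_logp=0.0, ctc_weight=0.3, lm_weight=1.0):
    # att_logp: log p_att(c | y, X) from the attention decoder for candidate label c
    # ctc_logp: incremental CTC prefix log-probability of appending c to y
    # lm_logp:  log p_lm(c | y) from the external LSTMLM (0.0 if no LM is used)
    return (1.0 - ctc_weight) * att_logp + ctc_weight * ctc_logp + lm_weight * lm_logp

The decoder adds this quantity to the running hypothesis score and keeps the top beam expansions at each step.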

@sw005320
Contributor Author

Great, Hori-san. I'll review it. BTW, I'm also about to finish the LM integration and am preparing to commit it (CER 5.9 -> 5.3, WER 18.0 -> 14.7 in the WSJ task).

@sw005320
Contributor Author

Guys, by combining the LSTMLM and joint attention/CTC decoding, we finally get CER 5.3 -> 3.8, WER 14.7 -> 9.3 in the WSJ task!!! The nice thing is that we don't have to set the min/max length ratios and the penalty (all set to 0.0), while we might still need to tune the CTC and LM weights (0.3 and 1.0, respectively; see #76).
@kan-bayashi, can you play with the LSTMLM and joint decoding on the TEDLIUM recipe? You can train the LSTMLM on the LM text data by referring to tools/kaldi/egs/tedlium/s5_r2/local/ted_train_lm.sh and simply using

gunzip -c db/TEDLIUM_release2/LM/*.en.gz | sed 's/ <\/s>//g' | local/join_suffix.py | gzip -c  > ${dir}/data/text/train.txt.gz

@kan-bayashi
Member

@sw005320 Great result! I will do it.

@sw005320
Contributor Author

I just added the fisher_swbd recipe. The results will be added later. Also, I finished Librispeech experiments with pytorch, and we got 7.7% WER for the clean condition. This is not bad. I'll work on making a language model training script for pytorch. Then, we'll get some more improvements in the Librispeech and fisher_swbd recipes, as in the WSJ case.

@kan-bayashi
Member

The results of TEDLIUM with CTC joint decoding and LM rescoring are as follows:

exp/train_trim_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_dev_beam20_eacc.best_p0.1_len0.0-0.0_ctcw0.3_rnnlm1.0/result.txt:| Sum/Avg | 507 95429 | 91.8 4.2 4.0 2.7 10.8 89.3 |
exp/train_trim_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_test_beam20_eacc.best_p0.1_len0.0-0.0_ctcw0.3_rnnlm1.0/result.txt:| Sum/Avg | 1155 145066 | 92.2 3.7 4.1 2.4 10.1 85.3 |
exp/train_trim_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_dev_beam20_eacc.best_p0.1_len0.0-0.0_ctcw0.3_rnnlm1.0/result.wrd.txt:| Sum/Avg | 507 17783 | 83.2 13.7 3.1 3.0 19.8 89.3 |
exp/train_trim_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_test_beam20_eacc.best_p0.1_len0.0-0.0_ctcw0.3_rnnlm1.0/result.wrd.txt:| Sum/Avg | 1155 27500 | 84.0 12.3 3.7 2.6 18.6 85.3 |

for dev set, CER 12.6 -> 10.8, WER 24.8 -> 19.8
for test set, CER 11.9 -> 10.1, WER 23.4 -> 18.6

@sw005320
Contributor Author

sw005320 commented Feb 3, 2018

It seems that #85 solves randomness issues in the pytorch backend.

@kan-bayashi
Member

kan-bayashi commented Feb 10, 2018

Updated CSJ recipe results (#91).

# Deep VGGBLSTMP (elayers=6) with chainer backend
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| Sum/Avg | 1272 43897 | 91.4 6.4 2.3 1.6 10.2 67.6 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| Sum/Avg | 1292 43623 | 93.7 5.1 1.3 1.2 7.5 65.2 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.1_len0.1-0.5/result.txt:| Sum/Avg | 1385 28225 | 93.6 5.0 1.4 1.6 8.0 47.9 |

# Deep VGGBLSTMP (elayers=6) with chainer backend + CTC joint decoding
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| Sum/Avg | 1272 43897 | 91.6 6.0 2.3 1.4 9.7 66.5 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| Sum/Avg | 1292 43623 | 94.1 4.6 1.3 1.0 6.9 64.5 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.0/result.txt:| Sum/Avg | 1385 28225 | 93.9 4.7 1.4 1.4 7.5 47.7 |

# Deep VGGBLSTMP (elayers=6) with chainer backend + CTC joint decoding + LM rescoring
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval1_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| Sum/Avg | 1272 43897 | 92.5 5.3 2.2 1.3 8.8 63.4 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval2_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| Sum/Avg | 1292 43623 | 94.7 4.1 1.2 0.9 6.2 60.7 |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| SPKR | # Snt # Wrd | Corr Sub Del Ins Err S.Err |
exp/train_nodup_vggblstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150/decode_eval3_beam20_eacc.best_p0.0_len0.0-0.0_ctcw0.3_rnnlm0.3/result.txt:| Sum/Avg | 1385 28225 | 94.3 4.2 1.5 1.2 7.0 45.2 |

task: vggblstmp -> + ctc joint decoding -> ++ lm rescoring
eval1 : 10.2 -> 9.7 -> 8.8
eval2: 7.5 -> 6.9 -> 6.2
eval3: 8.0 -> 7.5 -> 7.0

@sw005320
Contributor Author

I think we have almost finished all our targets except for VGG (@ShigekiKarita, is this still difficult?). After the refactoring in #102, we can move on to the next development plan.

@ShigekiKarita
Member

Yes, it is still difficult. I hope that someone else also takes a look at the connection part between VGG and BLSTMP.

@sw005320
Contributor Author

@ShigekiKarita, have you confirmed that the problem is in the connection part? You mean that VGG and BLSTMP themselves are correct?

@ShigekiKarita
Member

Yes. For BLSTMP, the experimental results are equal, as seen in #9. For VGG, the chainer/pytorch implementations show equal activations and gradients when all the parameters are initialized with constants w[:]=1 and b[:]=0 (without random values), see #47 and https://github.com/ShigekiKarita/espnet/blob/2a44f292d44c9e23c6ac8e24ea7eb9e2c64b0cb8/test/test_vgg.py

Hence, the last remaining part is the connection between VGG and BLSTMP.
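
For anyone who wants to help track this down, the comparison technique is to remove randomness by setting every weight to a constant and every bias to zero in both backends, feeding the same input, and comparing activations and gradients element-wise. A minimal pytorch-side sketch of that constant initialization (the chainer side is analogous; the linked test_vgg.py does the actual comparison):

import torch

def set_constant_params(module):
    # deterministic initialization: weights -> 1.0, biases -> 0.0
    with torch.no_grad():
        for name, p in module.named_parameters():
            if 'bias' in name:
                p.zero_()
            else:
                p.fill_(1.0)

# usage: apply this to the VGG front-end and the BLSTMP encoder in each backend,
# run the same utterance through both models, and compare outputs and gradients,
# e.g. with numpy.testing.assert_allclose.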

@kan-bayashi
Member

To use the current version, we have to update warp-ctc (#105).
Please try the following commands to update:

cd tools
rm -r warp-ctc
make warp-ctc

@geniki

geniki commented Feb 21, 2018

Is the pytorch version of the LM integration a priority for this stable version?

Many thanks for the great repo.

@sw005320
Contributor Author

@kan-bayashi is considering it. It is not a super high priority for now, but we're working on it anyway.

@sw005320
Contributor Author

@geniki We have finished the pytorch LM integration (#114).

We have finished most of the action items, so I'll close this issue.
