Espnet2 transducer v2 #4032
Conversation
This pull request is now in conflict :(
@b-flo Great! I will work on training a model using the code. I will post the results later.
Codecov Report
```diff
@@            Coverage Diff             @@
##           master    #4032      +/-   ##
==========================================
+ Coverage   81.61%   82.30%    +0.68%
==========================================
  Files         458      478       +20
  Lines       39894    41536     +1642
==========================================
+ Hits        32561    34185     +1624
- Misses       7333     7351       +18
```
Thank you very much! Btw, the following should be changed in comparison to your previous training:
Outside of that (and bugs), everything should work as intended!
@b-flo
The implementation is equivalent except for the model initialization, which relies on the ESPnet1 one here. If you comment out l.435 in espnet2/tasks/asr_transducer.py, it should be equivalent to your previous run. Could you also test, please? Btw, what about CER/WER with this model?

Edit: For information, on Voxforge I observed the following:

- without initialization: CER / WER
- with ESPnet1 initialization: CER / WER

Performance with ESPnet2 init (…). However, I'm a bit confused by the difference in terms of loss here. In my experiments, the losses are in the same range despite performance variation.
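I'm not certain which scheme l.435 actually applies, but the ESPnet1-style initialization discussed here is in the fan-in-scaled (LeCun) family. A generic sketch in plain Python, purely as an illustration of the idea (not the actual ESPnet code):

```python
import math
import random

def lecun_uniform(fan_out, fan_in, seed=0):
    """Fan-in-scaled uniform init: W_ij ~ U(-b, b) with b = sqrt(3 / fan_in),
    which keeps the variance of each weight around 1 / fan_in."""
    rng = random.Random(seed)
    bound = math.sqrt(3.0 / fan_in)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)]
            for _ in range(fan_out)]

# Example: a 4 x 256 projection weight.
W = lecun_uniform(4, 256)
bound = math.sqrt(3.0 / 256)
assert all(-bound <= w <= bound for row in W for w in row)
```

Disabling such an init (as suggested above) leaves the framework's default initialization in place, which alone can explain CER/WER gaps of this kind.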
This pull request is now in conflict :(
@b-flo After the training is completely finished, I will post the final WER results.
Initialization is one of the parts I want to address, alongside training techniques. I'll work on that after adding custom architectures; for now, I'll add back the
Without init, the performance is on par with ESPnet2 Transducer v1. For test-clean and test-other, the WERs are 3.1 and 7.2, respectively.
I added back the old version; everything should be the same as before except for an optional parameter of …. I kept …
@b-flo, I'm asking @pyf98 to review this PR, but he is busy these days. While we wait for his review, I'll list a couple of high-level comments.
We're talking about v1, right? If so, I reverted it to what it was before this PR, no worries! No changes outside some files moved and a
Sorry in advance for the lengthy comments below.

Last week I started training the Librispeech recipe with this PR. I used the auxiliary CTC loss and small weight decay from your Librispeech-100 training config. The model training went well, without any issues. I uploaded the pretrained model, training images, etc. here. Currently I just uploaded this model to my personal HF hub; after the PR is merged, I can upload it to the HF/espnet hub.

Note: the pretrained model that I uploaded for transducer v1 as part of #4327 is actually from the above model training. Basically, I copied the pretrained model weights from v2 to v1. I ensured that both models were making identical computations, which explains why the dev/test scores are also identical. However, this process was a bit challenging because of the changes in the names of encoder layers from v1 to v2.

To maintain consistency with other ASR configs, would it be possible to retain the same layer names as v1? This would facilitate the use of other pretrained models using:

I noticed that the
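The weight-copying step described above can be scripted as a key remap over the checkpoint's parameter dict. The rename rules below are hypothetical examples, not the real v1/v2 layer names:

```python
import re

def remap_state_dict(state_dict, rules):
    """Rename checkpoint keys by applying each (pattern, replacement)
    regex rule in order; parameter values are carried over untouched."""
    remapped = {}
    for key, value in state_dict.items():
        for pattern, repl in rules:
            key = re.sub(pattern, repl, key)
        remapped[key] = value
    return remapped

# Hypothetical v2 -> v1 renames (illustrative only).
rules = [
    (r"^encoder\.encoders\.", "encoder.layers."),
    (r"\.self_attn\.", ".attention."),
]

v2_ckpt = {
    "encoder.encoders.0.self_attn.linear_q.weight": "W_q",
    "decoder.embed.weight": "E",
}
v1_ckpt = remap_state_dict(v2_ckpt, rules)
assert "encoder.layers.0.attention.linear_q.weight" in v1_ckpt
```

With real checkpoints the values would be tensors and the remapped dict would be passed to the target model's `load_state_dict`; keeping the layer names identical across versions would make this step unnecessary.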
Yes, that's intended! I don't recall the full discussion but we came to the conclusion with Hirofumi and Mingkun (warp-transducer author) that the bias in the decoder linear projection was redundant information. However, it shouldn't make a difference in practice.

Edit: Oh, I didn't set it to False in the first version of ESPnet2, got it. Either is fine for me!
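The redundancy is easy to see: the joint network sums the encoder and decoder linear projections, so two bias vectors collapse into a single one. A tiny numeric check in plain Python (the matrices and vectors are illustrative values, not model weights):

```python
def affine(x, W, b):
    # y_i = sum_j W[i][j] * x[j] + b[i]
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

# Toy joint pre-activation: z = W_enc @ h + b_enc + W_dec @ u + b_dec.
h, u = [0.5, -1.0], [2.0, 0.25]
W_enc, W_dec = [[1.0, 2.0], [0.0, 1.0]], [[0.5, 0.5], [1.0, -1.0]]
b_enc, b_dec = [0.125, 0.25], [0.375, -0.5]

# Biases on both projections ...
with_both = [e + d for e, d in zip(affine(h, W_enc, b_enc),
                                   affine(u, W_dec, b_dec))]
# ... behave exactly like one merged bias on the encoder side,
# with bias=False on the decoder projection.
b_merged = [be + bd for be, bd in zip(b_enc, b_dec)]
no_dec_bias = [e + d for e, d in zip(affine(h, W_enc, b_merged),
                                     affine(u, W_dec, [0.0, 0.0]))]
assert with_both == no_dec_bias
```

This is why dropping the decoder bias shouldn't change what the model can express, only the parameter count.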
This pull request is now in conflict :(
@csukuangfj Thanks a lot for taking the time to review!! I did not mention it, but this PR is somewhat dead in its current form. I'm currently re-working this version with streaming and deployment in mind. Some parts from this PR remain but others may be removed or heavily changed on my side. Btw, I took Icefall as a reference for the new version (i.e.: for streaming + some Conformer tricks). Could I kindly ask you or another Icefall member to help review the new version when it's available?
Yes, we are glad to. Is there any PR about your new version?
There is none for now. Most of the stuff for v1 is done and seems to work on my side (v2 is extending to other *-former architectures), but I did not finish testing and debugging. I should open a PR this week or the week after; I'll ping you then if that's okay!
I'm closing this PR and opening a new one. Sorry about the delay; other things and experiments took priority.
Hi,
This PR is a draft for the new version of Transducer models in ESPnet2, separated from the main ASR task (CTC+Att). It's working, but please note that:
Performance should be on par with or better than previously. I also found out what caused the performance degradation for the Voxforge model (mainly due to initialization and some small training differences). It may be worth extending the investigation, though!
@jeon30c Would it be possible for you to re-train a Librispeech model with this version to compare performance, please?
@sw005320 Do you know if we have other models to compare? I'm not sure who already used the first version.
Also, after we are set on the task and model definition, I would like to at least make the encoder and decoder fully customizable (similar to the custom model in ESPnet1). Mainly, the changes would be:
After that:
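As a rough sketch of what a fully customizable encoder/decoder definition could look like in a training config (hypothetical keys, loosely modeled on the ESPnet1 custom-architecture style; not a committed interface):

```yaml
# Hypothetical per-block encoder definition (illustrative only).
encoder_conf:
  main_conf:
    pos_enc_type: rel_pos
  body_conf:
    - block_type: conv2d        # subsampling front-end
      output_size: 256
    - block_type: conformer     # repeated main body
      hidden_size: 256
      linear_size: 1024
      heads: 4
      num_blocks: 12
decoder_conf:
  block_type: rnn
  hidden_size: 512
  num_layers: 1
```

The point of such a layout is that each block carries its own type and hyperparameters, so mixing block families (conv, Conformer, RNN, etc.) only requires editing the config, not the task code.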