
Add E-Branchformer configs and models in ASR recipes #4837

Merged: 18 commits merged into espnet:master on Jan 4, 2023

Conversation

@pyf98 (Collaborator) commented Dec 27, 2022

Following #4833, this PR continues adding E-Branchformer results to ESPnet2 ASR recipes. The updated recipes and results (including those from the previous PR) are summarized below. Joint CTC/attention training and decoding are performed unless otherwise specified. For reference, the most widely used Conformer config in ESPnet has 12 layers with 2048 linear units in the feed-forward module (an 8-times expansion).
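For orientation, a minimal sketch of what the encoder section of such a training config looks like, assuming ESPnet2's `encoder:`/`encoder_conf:` convention. The values mirror the 12-block, 1024/1024 E-Branchformer rows in the tables below, but the exact keys and remaining fields should be checked against the YAML configs added in this PR:

```yaml
# Hedged sketch, not copied from this PR: an ESPnet2-style encoder section
# for the E-Branchformer setting used in most tables below.
encoder: e_branchformer
encoder_conf:
    output_size: 256          # assumed attention dimension
    attention_heads: 4        # assumed
    num_blocks: 12            # "#blocks" column
    cgmlp_linear_units: 1024  # "mlp linear" column (cgMLP branch)
    linear_units: 1024        # "ffn linear" column (feed-forward module)
```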

aidatatang_200zh: CER, with Transformer LM

| encoder | #blocks | mlp linear | ffn linear | #params | dev | test |
|---|---|---|---|---|---|---|
| Conformer | 12 | NA | 2048 | 45.98M | 3.6 | 4.3 |
| E-Branchformer | 12 | 1024 | 1024 | 37.66M | 3.6 | 4.2 |

aishell: CER, without LM

| encoder | #blocks | mlp linear | ffn linear | #params | dev | test |
|---|---|---|---|---|---|---|
| Conformer | 12 | NA | 2048 | 46.25M | 4.3 | 4.6 |
| Branchformer | 24 | 2048 | NA | 45.43M | 4.1 | 4.4 |
| E-Branchformer | 12 | 1024 | 1024 | 37.88M | 4.2 | 4.5 |

chime4: WER on beamformit_5mics, with Transformer LM

  • NOTE: The 12-layer E-Branchformer and 15-layer Conformer (around 35M params) require a smaller learning rate to converge, which leads to worse performance than the two smaller models (less than 31M params).
| encoder | #blocks | mlp linear | ffn linear | #params | dt05_real | dt05_simu | et05_real | et05_simu |
|---|---|---|---|---|---|---|---|---|
| Conformer | 12 | NA | 1024 | 30.43M | 7.8 | 9.5 | 12.5 | 14.8 |
| Conformer | 12 | NA | 2048 | 43.04M | 7.3 | 9.1 | 12.0 | 13.6 |
| E-Branchformer | 10 | 1024 | 1024 | 30.79M | 6.8 | 8.4 | 10.8 | 13.0 |

gigaspeech: WER, without LM

#4882

librispeech_100: WER, without LM

| encoder | #blocks | mlp linear | ffn linear | #params | dev_clean | dev_other | test_clean | test_other |
|---|---|---|---|---|---|---|---|---|
| Conformer | 12 | NA | 1024 | 34.23M | 6.4 | 17.5 | 6.5 | 17.5 |
| Conformer | 15 | NA | 1024 | 39.00M | 6.3 | 17.0 | 6.6 | 17.2 |
| Conformer | 12 | NA | 2048 | 46.84M | 6.3 | 16.9 | 6.6 | 17.1 |
| E-Branchformer | 12 | 1024 | 1024 | 38.47M | 6.1 | 16.7 | 6.3 | 17.0 |

librispeech_100: WER, CTC only, beam size 1, without LM

| encoder | #blocks | mlp linear | ffn linear | #params | dev_clean | dev_other | test_clean | test_other |
|---|---|---|---|---|---|---|---|---|
| Conformer | 15 | NA | 1024 | 26.96M | 9.4 | 22.5 | 9.9 | 23.1 |
| Conformer | 12 | NA | 2048 | 34.80M | 9.6 | 23.0 | 9.9 | 23.8 |
| E-Branchformer | 12 | 1024 | 1024 | 26.43M | 9.2 | 22.4 | 9.6 | 23.1 |
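For the CTC-only rows, "beam size 1" amounts to greedy decoding: take the argmax token per frame, collapse repeats, and drop blanks. A minimal sketch of the standard CTC collapse rule (the function name and blank id are illustrative, not ESPnet's API):

```python
def ctc_greedy_decode(frame_ids, blank=0):
    """Collapse consecutive repeats, then remove blanks (standard CTC rule).

    frame_ids: per-frame argmax token ids from the CTC output layer.
    blank: id of the CTC blank symbol (0 here for illustration).
    """
    out = []
    prev = None
    for t in frame_ids:
        # Emit a token only when it differs from the previous frame
        # and is not the blank symbol.
        if t != prev and t != blank:
            out.append(t)
        prev = t
    return out

# e.g. frame-level ids [0, 3, 3, 0, 3, 5, 5, 0] -> tokens [3, 3, 5]
```

Note that the blank between the two 3s is what allows a genuinely repeated token to survive the collapse.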

swbd: WER, without LM

| encoder | #blocks | mlp linear | ffn linear | #params | callhm | swbd | overall |
|---|---|---|---|---|---|---|---|
| Conformer | 12 | NA | 1024 | 31.92M | 14.0 | 7.8 | 10.9 |
| Conformer | 15 | NA | 1024 | 36.69M | 13.5 | 7.4 | 10.4 |
| Conformer | 12 | NA | 2048 | 44.53M | 13.8 | 7.5 | 10.7 |
| E-Branchformer | 12 | 1024 | 1024 | 36.16M | 13.4 | 7.3 | 10.4 |

tedlium2: WER, without LM

| encoder | #blocks | mlp linear | ffn linear | #params | dev | test |
|---|---|---|---|---|---|---|
| Conformer | 12 | NA | 1024 | 30.76M | 7.8 | 7.6 |
| Conformer | 15 | NA | 1024 | 35.53M | 7.5 | 7.6 |
| Conformer | 12 | NA | 2048 | 43.37M | 7.5 | 7.5 |
| E-Branchformer | 12 | 1024 | 1024 | 35.01M | 7.3 | 7.1 |

tedlium2: WER, CTC only, beam size 1, without LM

| encoder | #blocks | mlp linear | ffn linear | #params | dev | test |
|---|---|---|---|---|---|---|
| Conformer | 15 | NA | 1024 | 25.80M | 9.1 | 9.0 |
| Conformer | 12 | NA | 2048 | 33.64M | 8.9 | 8.5 |
| E-Branchformer | 12 | 1024 | 1024 | 25.28M | 8.7 | 8.3 |

voxforge: CER, without LM

| encoder | #blocks | mlp linear | ffn linear | #params | dt_it | et_it |
|---|---|---|---|---|---|---|
| Conformer | 15 | NA | 1024 | 35.18M | 9.0 | 8.1 |
| Conformer | 12 | NA | 2048 | 43.02M | 8.9 | 8.0 |
| E-Branchformer | 12 | 1024 | 1024 | 34.65M | 8.8 | 8.0 |

wsj: WER, with Transformer LM

| encoder | #blocks | mlp linear | ffn linear | #params | dev93 | eval92 |
|---|---|---|---|---|---|---|
| Conformer | 12 | NA | 1024 | 30.43M | 6.7 | 5.0 |
| Conformer | 15 | NA | 1024 | 35.20M | 6.5 | 4.1 |
| Conformer | 12 | NA | 2048 | 43.04M | 6.8 | 4.0 |
| E-Branchformer | 12 | 1024 | 1024 | 34.67M | 6.5 | 4.3 |

@codecov bot commented Dec 27, 2022

Codecov Report

Merging #4837 (dde4525) into master (7669cce) will not change coverage.
The diff coverage is n/a.

```
@@           Coverage Diff           @@
##           master    #4837   +/-   ##
=======================================
  Coverage   79.18%   79.18%
=======================================
  Files         557      557
  Lines       49279    49279
=======================================
  Hits        39020    39020
  Misses      10259    10259
```

| Flag | Coverage Δ |
|---|---|
| test_integration_espnet1 | 66.39% <ø> (ø) |
| test_integration_espnet2 | 49.33% <ø> (ø) |
| test_python | 67.99% <ø> (ø) |
| test_utils | 23.34% <ø> (ø) |

Flags with carried forward coverage won't be shown.


@sw005320 added the Recipe and ASR (Automatic speech recognition) labels Dec 27, 2022
@sw005320 added this to the v.202301 milestone Dec 27, 2022
@pyf98 (Collaborator, Author) commented Jan 4, 2023

Hi @sw005320, can we merge this PR now (before SLT) so that people can start using E-Branchformer? I will continue to add more in follow-up PRs.

@sw005320 (Contributor) commented Jan 4, 2023

OK, I'll start to review.
Please also remove [WIP] from the title, then.

@pyf98 changed the title from "[WIP] Add E-Branchformer configs and models in ASR recipes" to "Add E-Branchformer configs and models in ASR recipes" Jan 4, 2023
@pyf98 (Collaborator, Author) commented Jan 4, 2023

Thanks @sw005320

@sw005320 (Contributor) left a review comment


LGTM.
I have minor comments.

egs2/swbd/asr1/conf/train_asr.yaml (comment resolved)
@sw005320 added the auto-merge (Enable auto-merge) label Jan 4, 2023
@sw005320 (Contributor) commented Jan 4, 2023

Thanks a lot!
After the CI check, I'll merge this PR.

@mergify mergify bot merged commit d708807 into espnet:master Jan 4, 2023
@pyf98 deleted the ebf branch January 4, 2023 20:57
Labels: ASR (Automatic speech recognition), auto-merge (Enable auto-merge), ESPnet2, README, Recipe