
[WIP] Branchformer Encoder in ESPnet2 #4400

Merged: 33 commits merged into espnet:master from pyf98:icml22 on Jul 28, 2022

Conversation

@pyf98 (Collaborator) commented May 27, 2022

Hi, this PR adds the Branchformer encoder (Peng et al., ICML 2022) to ESPnet2 and releases some of the trained models. We have also achieved better Conformer baselines on some recipes, which will be updated as well.

The PR is in progress. I have cleaned up and merged my previous code. Now I am training models using the latest code (and configs, where they exist).

The paper can be found here: https://proceedings.mlr.press/v162/peng22a.html or on arXiv: https://arxiv.org/abs/2207.02971

Please cite the following paper if you find our work helpful.


@InProceedings{pmlr-v162-peng22a,
  title     = {Branchformer: Parallel {MLP}-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding},
  author    = {Peng, Yifan and Dalmia, Siddharth and Lane, Ian and Watanabe, Shinji},
  booktitle = {Proceedings of the 39th International Conference on Machine Learning},
  pages     = {17627--17643},
  year      = {2022},
  editor    = {Chaudhuri, Kamalika and Jegelka, Stefanie and Song, Le and Szepesvari, Csaba and Niu, Gang and Sabato, Sivan},
  volume    = {162},
  series    = {Proceedings of Machine Learning Research},
  month     = {17--23 Jul},
  publisher = {PMLR},
  pdf       = {https://proceedings.mlr.press/v162/peng22a/peng22a.pdf},
  url       = {https://proceedings.mlr.press/v162/peng22a.html},
  abstract  = {Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible, interpretable and customizable encoder alternative, Branchformer, with parallel branches for modeling various ranged dependencies in end-to-end speech processing. In each encoder layer, one branch employs self-attention or its variant to capture long-range dependencies, while the other branch utilizes an MLP module with convolutional gating (cgMLP) to extract local relationships. We conduct experiments on several speech recognition and spoken language understanding benchmarks. Results show that our model outperforms both Transformer and cgMLP. It also matches with or outperforms state-of-the-art results achieved by Conformer. Furthermore, we show various strategies to reduce computation thanks to the two-branch architecture, including the ability to have variable inference complexity in a single trained model. The weights learned for merging branches indicate how local and global dependencies are utilized in different layers, which benefits model designing.}
}
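As a rough illustration of the two-branch idea from the abstract, here is a minimal NumPy sketch of the branch-merging step only. This is not the ESPnet API (the real implementation lives in espnet2/asr/encoder/branchformer_encoder.py and uses torch modules); the function and variable names here are hypothetical, and the branch outputs are stand-in arrays rather than real attention/cgMLP computations.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def merge_branches(x_attn, x_cgmlp, logits):
    """Weighted merge of the global (attention) and local (cgMLP) branch outputs.

    logits: two learned scalars; softmax turns them into merge weights.
    Inspecting these weights per layer is what makes the architecture
    interpretable: they show how much each layer relies on global vs.
    local context.
    """
    w = softmax(np.asarray(logits, dtype=float))
    return w[0] * x_attn + w[1] * x_cgmlp

# Stand-ins for the two branch outputs of one encoder layer (time x dim):
x_attn = np.ones((4, 8))    # pretend self-attention branch output
x_cgmlp = np.zeros((4, 8))  # pretend cgMLP branch output

# Equal logits give 50/50 weights, so every element of y is 0.5 here.
y = merge_branches(x_attn, x_cgmlp, logits=[0.0, 0.0])
```

The two-branch layout is also what enables the computation-reduction strategies mentioned in the abstract: because each branch produces a full output, a branch can be down-weighted or dropped at inference time within a single trained model.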

Updated recipes:

  • (ASR) Aishell
    • New Conformer baseline
    • Branchformer
    • Branchformer with fast_selfattn
  • (ASR) Switchboard 300h
    • Our Conformer baseline
    • Branchformer
  • (ASR) LibriSpeech 960h
    • Branchformer using the latest config
  • (SLU) Google Speech Commands
    • Branchformer
  • (SLU) SLURP
    • New Conformer baseline
    • Branchformer
  • (SLU) SLURP entity
    • New Conformer baseline
    • Branchformer
  • (MT) IWSLT'14 De-En
    • Branchformer

@siddalmia (Contributor) left a comment:

LGTM

@sw005320 sw005320 added this to the v.202206 milestone May 28, 2022
@sw005320 sw005320 added the New Features and ASR (Automatic speech recognition) labels May 28, 2022
@codecov codecov bot commented May 29, 2022

Codecov Report

Merging #4400 (03a6354) into master (f2778f7) will increase coverage by 0.07%.
The diff coverage is 91.26%.

@@            Coverage Diff             @@
##           master    #4400      +/-   ##
==========================================
+ Coverage   82.40%   82.47%   +0.07%     
==========================================
  Files         481      484       +3     
  Lines       41238    41570     +332     
==========================================
+ Hits        33982    34285     +303     
- Misses       7256     7285      +29     
| Flag | Coverage Δ |
| --- | --- |
| test_integration_espnet1 | 66.38% <ø> (ø) |
| test_integration_espnet2 | 48.70% <12.38%> (-0.48%) ⬇️ |
| test_python | 69.57% <91.26%> (+0.17%) ⬆️ |
| test_utils | 23.30% <ø> (ø) |

Flags with carried forward coverage won't be shown.

| Impacted Files | Coverage Δ |
| --- | --- |
| espnet2/tasks/mt.py | 0.00% <0.00%> (ø) |
| espnet2/asr/layers/fastformer.py | 85.96% <85.96%> (ø) |
| espnet2/asr/layers/cgmlp.py | 86.27% <86.27%> (ø) |
| espnet2/asr/encoder/branchformer_encoder.py | 94.14% <94.14%> (ø) |
| espnet2/tasks/asr.py | 91.76% <100.00%> (+0.04%) ⬆️ |


@sw005320 (Contributor) commented Jun 8, 2022

@pengchengguo, can you review this PR, especially for espnet2/asr/encoder/branchformer_encoder.py?

One discussion point is whether we should put the various blocks in espnet2/asr/encoder/branchformer_encoder.py or under https://github.com/espnet/espnet/tree/master/espnet/nets/pytorch_backend
I'm fine with either option, but I'd like your input on it.

@pengchengguo (Collaborator) commented:

No problem! I will do it this week.

@pengchengguo (Collaborator) commented:

Hi @sw005320 and @pyf98,

I prefer the current structure. Since most users are moving to ESPnet2, we could make it more self-contained. However, the branchformer_encoder.py file is not as concise as the others. Maybe we could move the definitions of the various modules to a new directory like espnet2/asr/modules and keep only the definitions of BranchformerEncoderLayer and BranchformerEncoder.

@pyf98 (Collaborator, Author) commented Jul 8, 2022

Thank you for your comments. @sw005320, should I move the definitions of these modules into a separate directory?

@sw005320 (Contributor) commented Jul 10, 2022

I think it is a good idea to move them to a separate directory.
We can have two options.

Please consider the two options and select the more appropriate one.

@pyf98 (Collaborator, Author) commented Jul 16, 2022

Hi @sw005320, I moved the cgmlp and fastformer definitions to a separate directory, layers, in espnet2/asr.

Now there is an error in the doc build. What might be the cause of this error? Thanks.

@pyf98 (Collaborator, Author) commented Jul 16, 2022

Hi @sw005320, it seems the doc error comes from a third-party library, Python-Markdown, which is used by sphinx_markdown_tables.

Python-Markdown released a new version (3.4/3.4.1) on July 15th, which modified the __init__ method in markdown/extensions/tables.py. It now requires a positional argument, config. However, sphinx_markdown_tables does not pass this argument:
https://github.com/ryanfox/sphinx-markdown-tables/blob/master/sphinx_markdown_tables/__init__.py#L24
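A common workaround for this kind of upstream breakage is to pin the dependency below the breaking release until the intermediate library catches up. This is only a sketch of that approach, not necessarily the exact fix adopted in the repository (the real constraint may live in setup.py or a different requirements file):

```
# requirements constraint (illustrative): keep Python-Markdown below 3.4,
# the release that changed markdown/extensions/tables.py,
# so sphinx_markdown_tables keeps working until it is updated.
markdown<3.4
```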

@sw005320 (Contributor) commented:

I see. Thanks for the information.
Then, can you get around this by setting a version restriction for them? Sorry for the extra work.

@pyf98 (Collaborator, Author) commented Jul 23, 2022

Hi @sw005320, I made a new PR to fix the doc issue. Once that is fixed, can we merge this PR?

@pyf98 (Collaborator, Author) commented Jul 25, 2022

Hi, can we re-run the CI tests for this PR?

@pyf98 (Collaborator, Author) commented Jul 26, 2022

I think this PR is ready for review.

@sw005320 (Contributor) commented:

LGTM!
Thanks for your great PR!

@sw005320 sw005320 merged commit 92ea573 into espnet:master Jul 28, 2022
@pyf98 pyf98 deleted the icml22 branch July 28, 2022 02:00
Labels
ASR (Automatic speech recognition), ESPnet2, New Features, README

4 participants