Libri100 recipe for standalone Transducer #4698

Open: b-flo wants to merge 19 commits into master

Conversation

b-flo
Member

@b-flo b-flo commented Oct 7, 2022

Add scripts and configs for streaming and offline Transducer models. The second model is still training; once it finishes, I'll add results to the README and upload both models to HF.
I also added some minor fixes to the PR; they won't impact previous versions in any way.

I created a separate asr_transducer1 directory here because we need some separation between the two versions of Transducer in ESPnet2, but I don't have a strong opinion on the design. A separate directory makes sense because we defined this version as a new "task", but we could also keep asr1 and use prefixes for the standalone version.

@b-flo b-flo added this to the v.202211 milestone Oct 7, 2022
@b-flo b-flo added Recipe RNNT (RNN) transducer related issue and removed README labels Oct 7, 2022
@mergify mergify bot added the README label Oct 7, 2022
@codecov

codecov bot commented Oct 7, 2022

Codecov Report

Merging #4698 (ce88312) into master (3297e10) will increase coverage by 0.63%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4698      +/-   ##
==========================================
+ Coverage   74.36%   74.99%   +0.63%     
==========================================
  Files         654      655       +1     
  Lines       58347    58546     +199     
==========================================
+ Hits        43391    43909     +518     
+ Misses      14956    14637     -319     
| Flag | Coverage Δ |
|---|---|
| test_integration_espnet1 | 66.24% <ø> (-0.05%) ⬇️ |
| test_integration_espnet2 | 47.65% <ø> (+0.70%) ⬆️ |
| test_python | 65.28% <ø> (+0.09%) ⬆️ |
| test_utils | 23.27% <ø> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.

see 15 files with indirect coverage changes


@b-flo
Member Author

b-flo commented Oct 16, 2022

@ftshijt @sw005320 If this design is okay, the PR can be merged. Otherwise I'll make the requested changes.

I'll update HF links in another PR, I'm currently running some additional experiments.

--ngpu 1 \
--nj 32 \
--inference_nj 32 \
--nbpe 500 \
Member Author


The BPE size is 500, versus 5000 for the CTC-Att baseline model. I guess results could be improved further with a larger BPE size, but I don't have enough resources for that.

@sw005320
Contributor

> @ftshijt @sw005320 If this design is okay, the PR can be merged. Otherwise I'll make the requested changes.
>
> I'll update HF links in another PR, I'm currently running some additional experiments.

Sorry, I'll do some reviews after the ICASSP deadline. Can you wait for a week or so?

@mergify
Contributor

mergify bot commented Oct 21, 2022

This pull request is now in conflict :(

@mergify mergify bot added the conflicts label Oct 21, 2022
@mergify mergify bot removed the conflicts label Oct 21, 2022
@kan-bayashi kan-bayashi modified the milestones: v.202211, v.202301 Dec 11, 2022
@kan-bayashi kan-bayashi removed this from the v.202301 milestone Feb 1, 2023
@kan-bayashi kan-bayashi added this to the v.202303 milestone Feb 1, 2023
@b-flo
Member Author

b-flo commented Feb 10, 2023

It's more stable on my side so I'll start training different model architectures and settings before opening the corresponding PR. I'll use this branch (and message) to keep track of the best models.

| Model | Mode | Num. params. | BPE | dev_clean (%WER) | dev_other (%WER) | test_clean (%WER) | test_other (%WER) |
|---|---|---|---|---|---|---|---|
| E-Branchformer/Transformer CTC-Att | offline | 38.47M | 5000 | 6.1 | 16.7 | 6.3 | 17.0 |
| Conformer/RNN (old) | offline | 30.56M | 500 | 5.9 | 17.6 | 6.4 | 17.9 |
| Conformer/RNN (new) | offline | 30.53M | 500 | 5.8 | 16.9 | 6.0 | 17.0 |
| E-Branchformer/RNN (new) | offline | 29.12M | 500 | 5.7 | 16.8 | 6.0 | 17.1 |
| E-Branchformer/MEGA (tmp) | offline | 40.66M | 500 | 5.7 | 16.5 | 6.0 | 16.7 |
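
For anyone comparing the rows above, a minimal sketch of how the relative WER change between two models could be computed. The WER values are copied from the table; the names `RESULTS`, `relative_wer_reduction`, and `compare` are hypothetical helpers for illustration only and are not part of ESPnet or of this PR. Relative reduction is used here as a common way to read such tables, not as a metric reported in the thread.

```python
# WER figures (%) copied from the results table in this thread.
# All names below are illustrative; none of this is ESPnet code.
RESULTS = {
    "E-Branchformer/Transformer CTC-Att": {
        "dev_clean": 6.1, "dev_other": 16.7, "test_clean": 6.3, "test_other": 17.0,
    },
    "Conformer/RNN (old)": {
        "dev_clean": 5.9, "dev_other": 17.6, "test_clean": 6.4, "test_other": 17.9,
    },
    "Conformer/RNN (new)": {
        "dev_clean": 5.8, "dev_other": 16.9, "test_clean": 6.0, "test_other": 17.0,
    },
    "E-Branchformer/RNN (new)": {
        "dev_clean": 5.7, "dev_other": 16.8, "test_clean": 6.0, "test_other": 17.1,
    },
    "E-Branchformer/MEGA (tmp)": {
        "dev_clean": 5.7, "dev_other": 16.5, "test_clean": 6.0, "test_other": 16.7,
    },
}


def relative_wer_reduction(baseline: float, candidate: float) -> float:
    """Relative WER reduction in percent; positive means the candidate is better."""
    return 100.0 * (baseline - candidate) / baseline


def compare(baseline_name: str, candidate_name: str, results=RESULTS) -> dict:
    """Per-test-set relative WER reduction of the candidate over the baseline."""
    base = results[baseline_name]
    cand = results[candidate_name]
    return {split: relative_wer_reduction(base[split], cand[split]) for split in base}


if __name__ == "__main__":
    baseline = "Conformer/RNN (old)"
    for name in RESULTS:
        if name == baseline:
            continue
        deltas = compare(baseline, name)
        summary = ", ".join(f"{split}: {value:+.1f}%" for split, value in deltas.items())
        print(f"{name} vs {baseline} -> {summary}")
```

Note that the models differ in parameter count and, for the CTC-Att baseline, in BPE size, so these percentages only summarize the table and do not isolate the effect of any single change.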

@kan-bayashi kan-bayashi modified the milestones: v.202303, v.202307 May 1, 2023
@b-flo
Member Author

b-flo commented Jun 22, 2023

@sw005320 If we are okay with the design (i.e., adding an asr_transducer1 directory for standalone Transducer recipes), I think the PR can be merged.

Pre-trained model links will be added after I finish the experiments with RWKV, since I need to re-train the models anyway. For now, users at least have some examples for the standalone version and for Librispeech-100 (which was requested before).

@kan-bayashi kan-bayashi modified the milestones: v.202307, v.202312 Aug 3, 2023
@kan-bayashi kan-bayashi modified the milestones: v.202310, v.202312 Oct 25, 2023
@kan-bayashi kan-bayashi modified the milestones: v.202312, v.202405 Feb 6, 2024
Labels
ESPnet2 README Recipe RNNT (RNN) transducer related issue

3 participants