Add transducer conformer configuration to commonvoice recipe #5503

Merged: 3 commits merged into espnet:master on Oct 25, 2023

zuazo (Contributor) commented on Oct 25, 2023

What?

This PR introduces a new transducer conformer configuration for the CommonVoice recipe:

  • Uses the Conformer architecture for the encoder (based on the existing train_asr_conformer5.yaml).
  • Incorporates the LSTM-based transducer decoder.

We tested the model on an NVIDIA A100 GPU, but smaller GPUs can probably be used.

Scores

We have tested the model only on Basque (eu).

Common Voice 14 test set:

| Model | Common Voice 14 CER (%) | Common Voice 14 WER (%) |
|---|---|---|
| train_asr_conformer5.yaml | 1.6 | 8.0 |
| train_asr_transducer_conformer5.yaml | 1.7 | 8.0 |

AhoMyTTS (a private difficult dataset):

| Model | AhoMyTTS CER (%) | AhoMyTTS WER (%) |
|---|---|---|
| train_asr_conformer5.yaml | 6.0 | 22.21 |
| train_asr_transducer_conformer5.yaml | 6.3 | 21.46 |
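To put "similar" into numbers, the Python sketch below computes the absolute and relative change of each error rate when moving from train_asr_conformer5.yaml to train_asr_transducer_conformer5.yaml. It only reuses the scores reported above; the dictionary layout and the helper names `compare` and `summarize` are illustrative and are not part of ESPnet.

```python
from __future__ import annotations

# Scores reported in this PR (percentages; lower is better for CER and WER).
SCORES = {
    "Common Voice 14": {
        "train_asr_conformer5.yaml": {"CER": 1.6, "WER": 8.0},
        "train_asr_transducer_conformer5.yaml": {"CER": 1.7, "WER": 8.0},
    },
    "AhoMyTTS": {
        "train_asr_conformer5.yaml": {"CER": 6.0, "WER": 22.21},
        "train_asr_transducer_conformer5.yaml": {"CER": 6.3, "WER": 21.46},
    },
}

BASELINE = "train_asr_conformer5.yaml"
CANDIDATE = "train_asr_transducer_conformer5.yaml"


def compare(baseline: float, candidate: float) -> tuple[float, float]:
    """Return (absolute delta, relative delta in %) of candidate vs. baseline.

    Negative values mean the candidate has a lower (better) error rate.
    """
    delta = candidate - baseline
    return delta, 100.0 * delta / baseline


def summarize(scores: dict = SCORES) -> list[tuple[str, str, float, float]]:
    """Compare the two configurations for every dataset and metric."""
    rows = []
    for dataset, models in scores.items():
        for metric in ("CER", "WER"):
            delta, rel = compare(models[BASELINE][metric], models[CANDIDATE][metric])
            rows.append((dataset, metric, round(delta, 2), round(rel, 2)))
    return rows


if __name__ == "__main__":
    for dataset, metric, delta, rel in summarize():
        print(f"{dataset:<16} {metric}  {delta:+.2f} abs  {rel:+.2f}% rel")
```

On these numbers, the transducer is slightly worse in CER on both sets (+0.1 and +0.3 points absolute), matches the WER on Common Voice 14, and is about 3.4% relatively better in WER on AhoMyTTS.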

Why?

The scores are similar to those of the existing Conformer5 configuration, but I thought you might be interested in including it, as it offers users an alternative way to train ASR models on the CommonVoice dataset.

The trained model is available at: https://huggingface.co/espnet/zuazo_commonvoice_asr_train_asr_transducer_conformer5_raw_eu_bpe150_sp

Feedback and suggestions are welcome!

- Added a transducer conformer training configuration, `train_asr_transducer_conformer5.yaml`.
- Added a transducer decoding configuration, `decode_transducer.yaml`.
- Trained and tested the model on an NVIDIA A100 GPU.

mergify bot added the ESPnet2 label on Oct 25, 2023
codecov bot commented on Oct 25, 2023

Codecov Report

Merging #5503 (1885236) into master (76b318e) will increase coverage by 0.00%.
Report is 3 commits behind head on master.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master    #5503   +/-   ##
=======================================
  Coverage   75.40%   75.40%           
=======================================
  Files         709      709           
  Lines       65361    65361           
=======================================
+ Hits        49287    49288    +1     
+ Misses      16074    16073    -1     

| Flag | Coverage Δ |
|---|---|
| test_configuration_espnet2 | ∅ <ø> (∅) |
| test_integration_espnet1 | 65.67% <ø> (ø) |
| test_integration_espnet2 | 48.68% <ø> (+<0.01%) ⬆️ |
| test_python_espnet1 | 19.14% <ø> (ø) |
| test_python_espnet2 | 51.47% <ø> (ø) |
| test_utils | 23.10% <ø> (ø) |

Flags with carried forward coverage won't be shown.

see 1 file with indirect coverage changes

sw005320 added the Recipe, ASR (automatic speech recognition), and RNNT ((RNN) transducer related issue) labels on Oct 25, 2023
sw005320 added this to the v.202312 milestone on Oct 25, 2023
sw005320 (Contributor) commented:

Thanks!
Yes, this is very valuable!
Could you add the results and model link to README.md in the recipe directory?

mergify bot added the README label on Oct 25, 2023

zuazo (Contributor, Author) commented on Oct 25, 2023

Sure! I added the results of both the Conformer5 and the Transducer-Conformer5 models to the README.

sw005320 merged commit c002f05 into espnet:master on Oct 25, 2023
25 checks passed
Labels: ASR, ESPnet2, README, Recipe, RNNT
2 participants