Add transducer conformer configuration to commonvoice recipe #5503

zuazo · 2023-10-25T13:00:15Z

What?

This PR introduces a new transducer conformer configuration for the CommonVoice recipe:

Uses a Conformer encoder, based on the existing `train_asr_conformer5.yaml` configuration.
Uses an LSTM-based transducer decoder.
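For readers less familiar with transducers, here is a minimal sketch of greedy transducer decoding. It shows how the encoder output, the LSTM prediction network and a joint network interact. It is not ESPnet's implementation: the function names (`greedy_transducer_decode`, `predictor`, `joint`) and the blank-id convention are hypothetical, and the recipe's decoding configuration may well use beam search rather than greedy search.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical types for illustration only (not ESPnet's API).
# A "state" is whatever the prediction network (an LSTM in this PR) carries
# between steps.
State = object
Predictor = Callable[[int, State], Tuple[Sequence[float], State]]
Joint = Callable[[Sequence[float], Sequence[float]], Sequence[float]]


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (first one on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def greedy_transducer_decode(
    encoder_out: Sequence[Sequence[float]],
    predictor: Predictor,
    joint: Joint,
    init_state: State,
    blank_id: int = 0,
    max_symbols_per_frame: int = 5,
) -> List[int]:
    """Greedy transducer decoding over one utterance.

    For each encoder frame, keep emitting the best non-blank token until the
    joint network predicts blank (or the per-frame cap is reached), then move
    on to the next frame.
    """
    hyp: List[int] = []
    # Prime the prediction network with the blank token (assumed convention).
    dec_out, state = predictor(blank_id, init_state)
    for frame in encoder_out:
        for _ in range(max_symbols_per_frame):
            token = argmax(joint(frame, dec_out))
            if token == blank_id:
                break  # blank: advance to the next encoder frame
            hyp.append(token)
            # Only non-blank emissions update the prediction network.
            dec_out, state = predictor(token, state)
    return hyp
```

Looping over the joint network until it predicts blank is what lets a transducer emit zero, one, or several tokens per encoder frame; the `max_symbols_per_frame` cap is a common safeguard against runaway emissions.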

We trained and tested the model on an NVIDIA A100 GPU, but smaller GPUs can probably be used.

Scores

We have only tested the model on Basque (eu).

Common Voice 14 test set:

| Model | CER (%) | WER (%) |
|---|---|---|
| `train_asr_conformer5.yaml` | 1.6 | 8.0 |
| `train_asr_transducer_conformer5.yaml` | 1.7 | 8.0 |

AhoMyTTS (a private, challenging dataset):

| Model | CER (%) | WER (%) |
|---|---|---|
| `train_asr_conformer5.yaml` | 6.0 | 22.21 |
| `train_asr_transducer_conformer5.yaml` | 6.3 | 21.46 |
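As a reminder of what CER and WER measure, here is a minimal sketch of corpus-level error rates computed from Levenshtein edit distance. The helper names (`edit_distance`, `error_rate`) are hypothetical, and the recipe's own scoring may use a different tool and may treat spaces differently for CER, so its numbers need not match this sketch exactly. Here, as an assumption, spaces are counted as characters.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference unit
                curr[j - 1] + 1,         # insertion of a hypothesis unit
                prev[j - 1] + (r != h),  # substitution (or match if equal)
            )
        prev = curr
    return prev[-1]


def error_rate(refs: Sequence[str], hyps: Sequence[str], unit: str = "word") -> float:
    """Corpus-level error rate in percent: total edits / total reference units.

    unit="word" splits on whitespace; any other value scores characters,
    counting spaces as characters (an assumption of this sketch).
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be aligned one-to-one")
    tokenize = str.split if unit == "word" else list
    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return 100.0 * edits / total
```

For instance, scoring the hypothesis `"kaixo munduan"` against the reference `"kaixo mundua"` gives one substituted word out of two (WER 50.0), but only one inserted character out of twelve (CER of about 8.33), which is why CER is usually much lower than WER, as in the tables.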

Why?

The scores are similar to those of the existing Conformer5 configuration, but you may be interested in including it to offer users an alternative way to train ASR models on the CommonVoice dataset.
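To make "similar" concrete, the snippet below computes the absolute and relative differences between the two configurations from the numbers in the score tables. On AhoMyTTS, for example, the transducer lowers WER by 0.75 points (about 3.4% relative) while CER rises by 0.3 points (about 5% relative). The `relative_change` helper is a hypothetical name used only for this illustration.

```python
# Scores copied from the tables above, in percent:
# (train_asr_conformer5.yaml, train_asr_transducer_conformer5.yaml)
scores = {
    ("Common Voice 14", "CER"): (1.6, 1.7),
    ("Common Voice 14", "WER"): (8.0, 8.0),
    ("AhoMyTTS", "CER"): (6.0, 6.3),
    ("AhoMyTTS", "WER"): (22.21, 21.46),
}


def relative_change(baseline: float, new: float) -> float:
    """Relative change of `new` w.r.t. `baseline`, in percent (negative = fewer errors)."""
    return 100.0 * (new - baseline) / baseline


for (dataset, metric), (conformer, transducer) in scores.items():
    print(
        f"{dataset} {metric}: {transducer - conformer:+.2f} absolute, "
        f"{relative_change(conformer, transducer):+.1f}% relative"
    )
```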

The model: https://huggingface.co/espnet/zuazo_commonvoice_asr_train_asr_transducer_conformer5_raw_eu_bpe150_sp

Feedback and suggestions are welcome!

- Added a transducer Conformer training configuration, `train_asr_transducer_conformer5.yaml`.
- Added a transducer decoding configuration, `decode_transducer.yaml`.
- Trained and tested the model on an NVIDIA A100 GPU.

codecov · 2023-10-25T13:23:50Z

Codecov Report

Merging #5503 (1885236) into master (76b318e) will increase coverage by 0.00%.
Report is 3 commits behind head on master.
The diff coverage is n/a.

```diff
@@           Coverage Diff           @@
##           master    #5503   +/-   ##
=======================================
  Coverage   75.40%   75.40%           
=======================================
  Files         709      709           
  Lines       65361    65361           
=======================================
+ Hits        49287    49288    +1     
+ Misses      16074    16073    -1
```

| Flag | Coverage Δ |
|---|---|
| test_configuration_espnet2 | `∅ <ø> (∅)` |
| test_integration_espnet1 | `65.67% <ø> (ø)` |
| test_integration_espnet2 | `48.68% <ø> (+<0.01%)` ⬆️ |
| test_python_espnet1 | `19.14% <ø> (ø)` |
| test_python_espnet2 | `51.47% <ø> (ø)` |
| test_utils | `23.10% <ø> (ø)` |

Flags with carried forward coverage won't be shown.

see 1 file with indirect coverage changes


sw005320 · 2023-10-25T13:40:11Z

Thanks!
Yes, this is very valuable!
Could you add the results and model link to README.md in the recipe directory?

zuazo · 2023-10-25T15:35:54Z

Sure! I added the results of both the Conformer5 and the Transducer-Conformer5 to the README.

mergify bot added the ESPnet2 label Oct 25, 2023

sw005320 added the Recipe, ASR (automatic speech recognition), and RNNT ((RNN) transducer related issue) labels Oct 25, 2023

sw005320 added this to the v.202312 milestone Oct 25, 2023

zuazo added 2 commits October 25, 2023 17:30

Add transducer conformer5 results to the README (7a649bd)

Add (previous) conformer5 results to the README (1885236)

mergify bot added the README label Oct 25, 2023

sw005320 merged commit c002f05 into espnet:master Oct 25, 2023
25 checks passed
