Add transducer conformer configuration to commonvoice recipe #5503
What?
This PR introduces a new transducer conformer configuration for the CommonVoice recipe:
train_asr_conformer5.yaml
provided). We tested the model on an NVIDIA A100 GPU, but smaller GPUs can probably be used.
Scores
We have tested the model with the Basque language (eu) only.
Common Voice 14 test set:
train_asr_conformer5.yaml
train_asr_transducer_conformer5.yaml
AhoMyTTS (a private difficult dataset):
train_asr_conformer5.yaml
train_asr_transducer_conformer5.yaml
Why?
The scores are similar to those of the provided Conformer5 configuration, but I thought you might be interested in including it, since it offers users an alternative way to train ASR models on the CommonVoice dataset.
The model: https://huggingface.co/espnet/zuazo_commonvoice_asr_train_asr_transducer_conformer5_raw_eu_bpe150_sp
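For anyone who wants to try the uploaded model, here is a minimal sketch of how it could be loaded for inference. It assumes that `espnet`, `espnet_model_zoo`, and `soundfile` are installed, that the model works with the standard `espnet2.bin.asr_inference.Speech2Text` interface, and that the input is 16 kHz mono audio. The `transcribe` helper and the `MODEL_TAG` constant are hypothetical names used only for illustration; they are not part of this PR.

```python
# Hypothetical helper, not part of the PR: a thin wrapper around ESPnet inference.
# The tag is taken from the Hugging Face URL given above.
MODEL_TAG = "espnet/zuazo_commonvoice_asr_train_asr_transducer_conformer5_raw_eu_bpe150_sp"


def transcribe(wav_path: str, model_tag: str = MODEL_TAG) -> str:
    """Return the best hypothesis for one audio file (assumed 16 kHz mono)."""
    # Imported here so this module can be loaded without ESPnet installed.
    import soundfile as sf
    from espnet2.bin.asr_inference import Speech2Text

    # Downloads and caches the model on first use (needs espnet_model_zoo).
    speech2text = Speech2Text.from_pretrained(model_tag)

    # The sampling rate is read but not checked; resample beforehand if needed.
    speech, _rate = sf.read(wav_path)

    # Each n-best entry is (text, tokens, token_ids, hypothesis); keep the text.
    text, *_ = speech2text(speech)[0]
    return text
```

Calling `transcribe("sample.wav")` on a Basque recording should return the decoded text. Decoding options such as beam size are left at their defaults here.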
Feedback and suggestions are welcome!