Integrating Pruned Fast RNNT with Transducer + new recipe for mTEDx dataset #1465
Open. Anwarvic wants to merge 36 commits into speechbrain:develop from Anwarvic:pruned_fast_rnnt
Commits
608e1de added script to prepare mTEDx dataset
26dddf5 added an asr transducer training file for mTEDx recipe
1a6fbbe added jointer network to be used with the pruned loss of Fast RNNT
e9e6241 added pruned-loss to the losses script
af2c3b3 created simple beam searcher for the pruned loss; just the same as Tr…
9e6e1ee added a recipe for creating a tokenizer on mTEDx dataset
a2073fa added a recipe for creating an RNN language model on mTEDx-French dat…
22ec024 added a recipe for creating an RNN language model on mTEDx-French dat…
b74a424 added yaml file for training ASR transducer on mTEDx
61ccae8 added yaml file for training ASR transducer on mTEDx
0ca6e7a Merge remote-tracking branch 'upstream/develop' into 'pruned_fast_rnnt'
78c9008 added README file for mTEDx recipe
f9b9e03 Merge branch 'speechbrain:develop' into pruned_fast_rnnt
900c261 Merge branch 'speechbrain:develop' into pruned_fast_rnnt
4e38371 Merge branch 'speechbrain:develop' into pruned_fast_rnnt
2d60e5e updated Transducer recipes + added README
bde66d5 updated Transducer recipes + added README
112b688 added CTC recipes
c5cbe1f updated files with latest updates
2974d3a Merge branch 'pruned_fast_rnnt' of https://github.com/Anwarvic/speech…
eb37ab2 updated scripts with latest updates
cddef0a fixed pre-commit erorrs
6b2e8f5 fixed pre-commit erorrs
0ea78d7 added recipes yaml files to tests/recipes.csv
3960a4e fixed the un-used dnn_neurons variable in train_wav2vec.yaml file
f76f0a1 pre-commit passed successfully
9f36769 updated transducer configs in the other dataset recipes to match the …
405bcee updated transducer configs in the other dataset recipes to match the …
82d450b added needed README files for mTEDx recipes
1bf6988 changed use_torchaudio flag in Transducer recipes README all across d…
b71c2d7 fixed wrong pths in tests/recipes.csv
9fd2564 added CTC models to CTC README of mTEDx recipe
dd41227 fixed merged issues in tests/recipes.csv
209a60a minor changes in README file
f22f144 removed unused variables in conf files in mTEDx recipes
b2aedf5 fixed the naming issue for transducer recipe
@@ -0,0 +1,89 @@
# mTEDx ASR with CTC models.
This folder contains the scripts to train a wav2vec-based system using mTEDx.
You can train either a single-language wav2vec model or a multilingual
wav2vec model. Before running this recipe, make sure to read this
[README](../../README.md) file first.

**Note:**\
The wav2vec model used in this recipe is pre-trained on the French language.
To use another language, don't forget to change `wav2vec2_hub`
in the `train_wav2vec.yaml` YAML file.
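
As an alternative to editing the file, SpeechBrain recipes generally accept command-line overrides of YAML keys. The following is a minimal sketch, assuming `wav2vec2_hub` is a top-level key in `train_wav2vec.yaml`; the model ID is a hypothetical placeholder, not a model tested in this recipe:

```bash
# Hypothetical placeholder: set this to the Hugging Face ID of a wav2vec 2.0
# model pre-trained on your target language.
WAV2VEC2_HUB="your-org/your-wav2vec2-model"

# Override the YAML value at launch time instead of editing the file.
python train.py hparams/train_wav2vec.yaml --wav2vec2_hub "$WAV2VEC2_HUB"
```

An override like this leaves the YAML file untouched, which may make it easier to compare runs across languages.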

# How to run

To train a single-language wav2vec model, run:
```bash
$ python train.py hparams/train_wav2vec.yaml
```

To train a multilingual wav2vec model, run:
```bash
$ python train.py hparams/train_xlsr.yaml
```

# Results

<table>
  <thead>
    <tr>
      <th>Release</th>
      <th>hyperparams file</th>
      <th>Val. CER</th>
      <th>Val. WER</th>
      <th colspan=4 style="text-align:center">Test WER</th>
      <th>Model link</th>
      <th>GPUs</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>2022-08-10</td>
      <td>train_wav2vec.yaml</td>
      <td>GS: 4.49</td>
      <td>GS: 10.66</td>
      <td>GS: es-> -</td>
      <td>GS: fr-> 12.59</td>
      <td>GS: pt-> -</td>
      <td>GS: it-> -</td>
      <td>Not Available</td>
      <td>4xV100 32GB</td>
    </tr>
    <tr>
      <td>2022-08-10</td>
      <td>train_xlsr.yaml</td>
      <td>GS(avg.): 5.87</td>
      <td>GS(avg.): 15.24</td>
      <td>GS: es-> 14.72</td>
      <td>GS: fr-> 17.72</td>
      <td>GS: pt-> 17.11</td>
      <td>GS: it-> 17.87</td>
      <td>Not Available</td>
      <td>4xV100 32GB</td>
    </tr>
  </tbody>
</table>

# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/

# **Citing SpeechBrain**
Please cite SpeechBrain if you use it for your research or business.

```bibtex
@misc{speechbrain,
  title={{SpeechBrain}: A General-Purpose Speech Toolkit},
  author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv},
  primaryClass={eess.AS},
  note={arXiv:2106.04624}
}
```
Shouldn't this be `fast-rnnt`? I get a 404 on PyPI when checking for `FastRNNT`.