Integrating Pruned Fast RNNT with Transducer + new recipe for mTEDx dataset #1465

Open · wants to merge 36 commits into base: develop

Commits (36)
608e1de
added script to prepare mTEDx dataset
Jun 22, 2022
26dddf5
added an asr transducer training file for mTEDx recipe
Jun 22, 2022
1a6fbbe
added jointer network to be used with the pruned loss of Fast RNNT
Jun 22, 2022
e9e6241
added pruned-loss to the losses script
Jun 22, 2022
af2c3b3
created simple beam searcher for the pruned loss; just the same as Tr…
Jun 22, 2022
9e6e1ee
added a recipe for creating a tokenizer on mTEDx dataset
Jun 22, 2022
a2073fa
added a recipe for creating an RNN language model on mTEDx-French dat…
Jun 22, 2022
22ec024
added a recipe for creating an RNN language model on mTEDx-French dat…
Jun 22, 2022
b74a424
added yaml file for training ASR transducer on mTEDx
Jun 22, 2022
61ccae8
added yaml file for training ASR transducer on mTEDx
Jun 22, 2022
0ca6e7a
Merge remote-tracking branch 'upstream/develop' into 'pruned_fast_rnnt'
Jul 29, 2022
78c9008
added README file for mTEDx recipe
Anwarvic Aug 2, 2022
f9b9e03
Merge branch 'speechbrain:develop' into pruned_fast_rnnt
Anwarvic Aug 12, 2022
900c261
Merge branch 'speechbrain:develop' into pruned_fast_rnnt
Anwarvic Sep 16, 2022
4e38371
Merge branch 'speechbrain:develop' into pruned_fast_rnnt
Anwarvic Sep 18, 2022
2d60e5e
updated Transducer recipes + added README
Anwarvic Sep 19, 2022
bde66d5
updated Transducer recipes + added README
Anwarvic Sep 19, 2022
112b688
added CTC recipes
Anwarvic Sep 19, 2022
c5cbe1f
updated files with latest updates
Anwarvic Sep 19, 2022
2974d3a
Merge branch 'pruned_fast_rnnt' of https://github.com/Anwarvic/speech…
Anwarvic Sep 19, 2022
eb37ab2
updated scripts with latest updates
Anwarvic Sep 19, 2022
cddef0a
fixed pre-commit erorrs
Anwarvic Sep 19, 2022
6b2e8f5
fixed pre-commit erorrs
Anwarvic Sep 19, 2022
0ea78d7
added recipes yaml files to tests/recipes.csv
Anwarvic Sep 19, 2022
3960a4e
fixed the un-used dnn_neurons variable in train_wav2vec.yaml file
Anwarvic Sep 19, 2022
f76f0a1
pre-commit passed successfully
Anwarvic Sep 19, 2022
9f36769
updated transducer configs in the other dataset recipes to match the …
Anwarvic Sep 19, 2022
405bcee
updated transducer configs in the other dataset recipes to match the …
Anwarvic Sep 19, 2022
82d450b
added needed README files for mTEDx recipes
Anwarvic Sep 25, 2022
1bf6988
changed use_torchaudio flag in Transducer recipes README all across d…
Anwarvic Sep 25, 2022
b71c2d7
fixed wrong pths in tests/recipes.csv
Anwarvic Sep 25, 2022
9fd2564
added CTC models to CTC README of mTEDx recipe
Anwarvic Sep 26, 2022
dd41227
fixed merged issues in tests/recipes.csv
Anwarvic Sep 26, 2022
209a60a
minor changes in README file
Anwarvic Sep 26, 2022
f22f144
removed unused variables in conf files in mTEDx recipes
Anwarvic Sep 26, 2022
b2aedf5
fixed the naming issue for transducer recipe
Anwarvic Sep 26, 2022
23 changes: 19 additions & 4 deletions recipes/CommonVoice/ASR/transducer/README.md
@@ -2,14 +2,29 @@
This folder contains scripts necessary to run an ASR experiment with the CommonVoice dataset: [CommonVoice Homepage](https://commonvoice.mozilla.org/)

# Extra-Dependencies
This recipe support two implementation of Transducer loss, see `use_torchaudio` arg in Yaml file:
1- Transducer loss from torchaudio (if torchaudio version >= 0.10.0) (Default)
2- Speechbrain Implementation using Numba lib. (this allow you to have a direct access in python to the Transducer loss implementation)
Note: Before running this recipe, make sure numba is installed. Otherwise, run:
This recipe supports three implementations of the transducer loss; see the
`framework` arg in the yaml file:
1. Transducer loss from torchaudio (this requires torchaudio version >= 0.10.0)
(Default).
2. SpeechBrain implementation using Numba. To use it, please set
`framework=speechbrain` in the yaml file. This version is implemented within
SpeechBrain and allows you to directly access the python code of the
transducer loss (and directly modify it if needed).
3. FastRNNT (pruned / unpruned) loss function.
   - To use the unpruned loss function, please set `framework=fastrnnt`.
   - To use the pruned loss function, please replace the whole `transducer_cost`
     yaml variable.

If you are planning to use the SpeechBrain RNNT loss function, install `numba`:
```
pip install numba
```

If you are planning to use the FastRNNT loss function, install `FastRNNT`:
```
pip install FastRNNT
```

> **Collaborator review comment:** Shouldn't this be `fast-rnnt`? I get a 404 on PyPI when checking for FastRNNT.

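Concretely, the backend is picked by the `transducer_cost` entry in the hparams file. A minimal sketch using the `framework` values listed above (matching the yaml change shown in this PR):

```yaml
# Sketch: choosing the transducer loss backend in hparams/train_fr.yaml.
# Valid values per this README: torchaudio (default), speechbrain, fastrnnt.
transducer_cost: !name:speechbrain.nnet.losses.transducer_loss
    framework: fastrnnt
    blank_index: !ref <blank_index>
```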
# How to run
python train.py hparams/{hparam_file}.py

2 changes: 1 addition & 1 deletion recipes/CommonVoice/ASR/transducer/hparams/train_fr.yaml
@@ -183,7 +183,7 @@ log_softmax: !new:speechbrain.nnet.activations.Softmax
apply_log: True

transducer_cost: !name:speechbrain.nnet.losses.transducer_loss
use_torchaudio: True
framework: torchaudio
Anwarvic marked this conversation as resolved.
blank_index: !ref <blank_index>

# for MTL
24 changes: 19 additions & 5 deletions recipes/LibriSpeech/ASR/transducer/README.md
@@ -4,15 +4,29 @@ Before running this recipe, make sure numba is installed (pip install numba)
You can download LibriSpeech at http://www.openslr.org/12

# Extra-Dependencies
This recipe supports two implementations of the transducer loss, see `use_torchaudio` arg in the yaml file:
1. Transducer loss from torchaudio (this requires torchaudio version >= 0.10.0) (Default).
2. Speechbrain implementation using Numba. To use it, please set `use_torchaudio=False` in the yaml file. This version is implemented within SpeechBrain and allows you to directly access the python code of the transducer loss (and directly modify it if needed).

Note: Before running this recipe, make sure numba is installed. Otherwise, run:
This recipe supports three implementations of the transducer loss; see the
`framework` arg in the yaml file:
1. Transducer loss from torchaudio (this requires torchaudio version >= 0.10.0)
(Default).
2. SpeechBrain implementation using Numba. To use it, please set
`framework=speechbrain` in the yaml file. This version is implemented within
SpeechBrain and allows you to directly access the python code of the
transducer loss (and directly modify it if needed).
3. FastRNNT (pruned / unpruned) loss function.
   - To use the unpruned loss function, please set `framework=fastrnnt`.
   - To use the pruned loss function, please replace the whole `transducer_cost`
     yaml variable.

If you are planning to use the SpeechBrain RNNT loss function, install `numba`:
```
pip install numba
```

If you are planning to use the FastRNNT loss function, install `FastRNNT`:
```
pip install FastRNNT
```
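The torchaudio backend above requires torchaudio >= 0.10.0. A minimal sketch of that version gate (the helper name is ours, for illustration only; it does not exist in the recipe):

```python
def torchaudio_supports_transducer(version: str) -> bool:
    """Per this README, the torchaudio transducer loss needs torchaudio >= 0.10.0."""
    # Compare (major, minor) tuples; ignore patch/build suffixes like "0.10.0+cu113".
    major, minor = (int(x) for x in version.split(".")[:2])
    return (major, minor) >= (0, 10)

print(torchaudio_supports_transducer("0.9.1"))   # → False: fall back to speechbrain/fastrnnt
print(torchaudio_supports_transducer("0.10.0"))  # → True: framework=torchaudio works
print(torchaudio_supports_transducer("2.1.0"))   # → True
```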

# How to run it
python train.py train/train.yaml

2 changes: 1 addition & 1 deletion recipes/LibriSpeech/ASR/transducer/hparams/train.yaml
@@ -190,7 +190,7 @@ log_softmax: !new:speechbrain.nnet.activations.Softmax

transducer_cost: !name:speechbrain.nnet.losses.transducer_loss
blank_index: !ref <blank_index>
use_torchaudio: True
framework: torchaudio

# This is the RNNLM that is used according to the Huggingface repository
# NB: It has to match the pre-trained RNNLM!!
23 changes: 19 additions & 4 deletions recipes/TIMIT/ASR/transducer/README.md
@@ -4,14 +4,29 @@ TIMIT is a speech dataset available from LDC: https://catalog.ldc.upenn.edu/LDC9


# Extra-Dependencies
This recipe support two implementation of Transducer loss, see `use_torchaudio` arg in Yaml file:
1- Transducer loss from torchaudio (if torchaudio version >= 0.10.0) (Default)
2- Speechbrain Implementation using Numba lib. (this allow you to have a direct access in python to the Transducer loss implementation)
Note: Before running this recipe, make sure numba is installed. Otherwise, run:
This recipe supports three implementations of the transducer loss; see the
`framework` arg in the yaml file:
1. Transducer loss from torchaudio (this requires torchaudio version >= 0.10.0)
(Default).
2. SpeechBrain implementation using Numba. To use it, please set
`framework=speechbrain` in the yaml file. This version is implemented within
SpeechBrain and allows you to directly access the python code of the
transducer loss (and directly modify it if needed).
3. FastRNNT (pruned / unpruned) loss function.
   - To use the unpruned loss function, please set `framework=fastrnnt`.
   - To use the pruned loss function, please replace the whole `transducer_cost`
     yaml variable.

If you are planning to use the SpeechBrain RNNT loss function, install `numba`:
```
pip install numba
```

If you are planning to use the FastRNNT loss function, install `FastRNNT`:
```
pip install FastRNNT
```

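For reference, in the TIMIT recipe the switch lives in the `compute_cost` entry of the hparams file. A minimal sketch with the `framework` values listed above (mirroring the yaml change in this PR):

```yaml
# Sketch: selecting the transducer loss backend in the TIMIT hparams.
# Valid values per this README: torchaudio (default), speechbrain, fastrnnt.
compute_cost: !name:speechbrain.nnet.losses.transducer_loss
    framework: speechbrain   # the speechbrain backend requires numba
    blank_index: !ref <blank_index>
```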
# How to run
Update the path to the dataset in the yaml config file and run the following.
```
4 changes: 2 additions & 2 deletions recipes/TIMIT/ASR/transducer/hparams/train.yaml
@@ -150,7 +150,7 @@ output: !new:speechbrain.nnet.linear.Linear
# apply_log: True

compute_cost: !name:speechbrain.nnet.losses.transducer_loss
use_torchaudio: True
framework: torchaudio
blank_index: !ref <blank_index>

model: !new:torch.nn.ModuleList [[
@@ -216,7 +216,7 @@ train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger

transducer_stats: !name:speechbrain.utils.metric_stats.MetricStats
metric: !name:speechbrain.nnet.losses.transducer_loss
use_torchaudio: True
framework: torchaudio
blank_index: !ref <blank_index>
reduction: none

4 changes: 2 additions & 2 deletions recipes/TIMIT/ASR/transducer/hparams/train_wav2vec.yaml
@@ -133,7 +133,7 @@ output: !new:speechbrain.nnet.linear.Linear
# apply_log: True

compute_cost: !name:speechbrain.nnet.losses.transducer_loss
use_torchaudio: True
framework: torchaudio
blank_index: !ref <blank_index>

model: !new:torch.nn.ModuleList [[
@@ -205,7 +205,7 @@ train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger

transducer_stats: !name:speechbrain.utils.metric_stats.MetricStats
metric: !name:speechbrain.nnet.losses.transducer_loss
use_torchaudio: True
framework: torchaudio
blank_index: !ref <blank_index>
reduction: none

89 changes: 89 additions & 0 deletions recipes/mTEDx/ASR/CTC/README.md
@@ -0,0 +1,89 @@
# mTEDx ASR with CTC models.
This folder contains the scripts to train a wav2vec-based system on mTEDx.
You can train either a single-language or a multilingual wav2vec model.
Before running this recipe, make sure to read this
[README](../../README.md) file first.

**Note:**\
The wav2vec model used in this recipe is pre-trained on French. In order to
use another language, don't forget to change the `wav2vec2_hub` variable in
the `train_wav2vec.yaml` file.

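A hedged sketch of that override (the HuggingFace model id below is illustrative, not necessarily the one this recipe ships with):

```yaml
# Hypothetical example: point the recipe at a different pre-trained encoder
# by editing wav2vec2_hub in train_wav2vec.yaml. The model id shown here is
# an assumption for illustration; substitute one matching your target language.
wav2vec2_hub: facebook/wav2vec2-large-xlsr-53
```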

# How to run

To train a single-language wav2vec model, run:
```bash
$ python train.py hparams/train_wav2vec.yaml
```

To train a multilingual wav2vec model, run:
```bash
$ python train.py hparams/train_xlsr.yaml
```

# Results

<table>
<thead>
<tr>
<th>Release</th>
<th>hyperparams file</th>
<th>Val. CER</th>
<th>Val. WER</th>
<th colspan=4 style="text-align:center">Test WER</th>
<th>Model link</th>
<th>GPUs</th>
</tr>
</thead>
<tbody>
<tr>
<td>2022-08-10</td>
<td>train_wav2vec.yaml</td>
<td>GS: 4.49</td>
<td>GS: 10.66</td>
<td>GS: es-> -</td>
<td>GS: fr-> 12.59</td>
<td>GS: pt-> -</td>
<td>GS: it-> -</td>
<td>Not Available</td>
<td>4xV100 32GB</td>
</tr>
<tr>
<td>2022-08-10</td>
<td>train_xlsr.yaml</td>
<td>GS(avg.): 5.87</td>
<td>GS(avg.): 15.24</td>
<td>GS: es-> 14.72</td>
<td>GS: fr-> 17.72</td>
<td>GS: pt-> 17.11</td>
<td>GS: it-> 17.87</td>
<td>Not Available</td>
<td>4xV100 32GB</td>
</tr>
</tbody>
</table>




# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/


# **Citing SpeechBrain**
Please, cite SpeechBrain if you use it for your research or business.

```bibtex
@misc{speechbrain,
title={{SpeechBrain}: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2106.04624}
}
```