Integrating Pruned Fast RNNT with Transducer + new recipe for mTEDx dataset #1465

Open · wants to merge 36 commits into base: develop

Commits (36)
608e1de
added script to prepare mTEDx dataset
Jun 22, 2022
26dddf5
added an asr transducer training file for mTEDx recipe
Jun 22, 2022
1a6fbbe
added jointer network to be used with the pruned loss of Fast RNNT
Jun 22, 2022
e9e6241
added pruned-loss to the losses script
Jun 22, 2022
af2c3b3
created simple beam searcher for the pruned loss; just the same as Tr…
Jun 22, 2022
9e6e1ee
added a recipe for creating a tokenizer on mTEDx dataset
Jun 22, 2022
a2073fa
added a recipe for creating an RNN language model on mTEDx-French dat…
Jun 22, 2022
22ec024
added a recipe for creating an RNN language model on mTEDx-French dat…
Jun 22, 2022
b74a424
added yaml file for training ASR transducer on mTEDx
Jun 22, 2022
61ccae8
added yaml file for training ASR transducer on mTEDx
Jun 22, 2022
0ca6e7a
Merge remote-tracking branch 'upstream/develop' into 'pruned_fast_rnnt'
Jul 29, 2022
78c9008
added README file for mTEDx recipe
Anwarvic Aug 2, 2022
f9b9e03
Merge branch 'speechbrain:develop' into pruned_fast_rnnt
Anwarvic Aug 12, 2022
900c261
Merge branch 'speechbrain:develop' into pruned_fast_rnnt
Anwarvic Sep 16, 2022
4e38371
Merge branch 'speechbrain:develop' into pruned_fast_rnnt
Anwarvic Sep 18, 2022
2d60e5e
updated Transducer recipes + added README
Anwarvic Sep 19, 2022
bde66d5
updated Transducer recipes + added README
Anwarvic Sep 19, 2022
112b688
added CTC recipes
Anwarvic Sep 19, 2022
c5cbe1f
updated files with latest updates
Anwarvic Sep 19, 2022
2974d3a
Merge branch 'pruned_fast_rnnt' of https://github.com/Anwarvic/speech…
Anwarvic Sep 19, 2022
eb37ab2
updated scripts with latest updates
Anwarvic Sep 19, 2022
cddef0a
fixed pre-commit erorrs
Anwarvic Sep 19, 2022
6b2e8f5
fixed pre-commit erorrs
Anwarvic Sep 19, 2022
0ea78d7
added recipes yaml files to tests/recipes.csv
Anwarvic Sep 19, 2022
3960a4e
fixed the un-used dnn_neurons variable in train_wav2vec.yaml file
Anwarvic Sep 19, 2022
f76f0a1
pre-commit passed successfully
Anwarvic Sep 19, 2022
9f36769
updated transducer configs in the other dataset recipes to match the …
Anwarvic Sep 19, 2022
405bcee
updated transducer configs in the other dataset recipes to match the …
Anwarvic Sep 19, 2022
82d450b
added needed README files for mTEDx recipes
Anwarvic Sep 25, 2022
1bf6988
changed use_torchaudio flag in Transducer recipes README all across d…
Anwarvic Sep 25, 2022
b71c2d7
fixed wrong pths in tests/recipes.csv
Anwarvic Sep 25, 2022
9fd2564
added CTC models to CTC README of mTEDx recipe
Anwarvic Sep 26, 2022
dd41227
fixed merged issues in tests/recipes.csv
Anwarvic Sep 26, 2022
209a60a
minor changes in README file
Anwarvic Sep 26, 2022
f22f144
removed unused variables in conf files in mTEDx recipes
Anwarvic Sep 26, 2022
b2aedf5
fixed the naming issue for transducer recipe
Anwarvic Sep 26, 2022
23 changes: 19 additions & 4 deletions recipes/CommonVoice/ASR/transducer/README.md
@@ -2,14 +2,29 @@
This folder contains scripts necessary to run an ASR experiment with the CommonVoice dataset: [CommonVoice Homepage](https://commonvoice.mozilla.org/)

# Extra-Dependencies
This recipe support two implementation of Transducer loss, see `use_torchaudio` arg in Yaml file:
1- Transducer loss from torchaudio (if torchaudio version >= 0.10.0) (Default)
2- Speechbrain Implementation using Numba lib. (this allow you to have a direct access in python to the Transducer loss implementation)
Note: Before running this recipe, make sure numba is installed. Otherwise, run:
This recipe supports three implementations of the transducer loss; see the
`framework` arg in the yaml file:
1. Transducer loss from torchaudio (this requires torchaudio version >= 0.10.0)
(Default).
2. SpeechBrain implementation using Numba. To use it, please set
`framework=speechbrain` in the yaml file. This version is implemented within
SpeechBrain and allows you to directly access the python code of the
transducer loss (and directly modify it if needed).
3. FastRNNT (pruned / unpruned) loss function.
   - To use the unpruned loss function, please set `framework=fastrnnt`.
   - To use the pruned loss function, please replace the whole `transducer_cost`
     yaml variable.

If you are planning to use the SpeechBrain RNNT loss function, install `numba`:
```
pip install numba
```

If you are planning to use the FastRNNT loss function, install `FastRNNT`:
```
pip install FastRNNT
```

> **Collaborator review comment:** Shouldn't this be `fast-rnnt`? I get a 404 on PyPI when checking for FastRNNT.

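Concretely, the backend is picked by the `transducer_cost` entry in the hparams file. A minimal sketch using the `framework` values listed above (matching the yaml change shown in this PR):

```yaml
# Sketch: choosing the transducer loss backend in hparams/train_fr.yaml.
# Valid values per this README: torchaudio (default), speechbrain, fastrnnt.
transducer_cost: !name:speechbrain.nnet.losses.transducer_loss
    framework: fastrnnt
    blank_index: !ref <blank_index>
```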
# How to run
python train.py hparams/{hparam_file}.py

2 changes: 1 addition & 1 deletion recipes/CommonVoice/ASR/transducer/hparams/train_fr.yaml
@@ -183,7 +183,7 @@ log_softmax: !new:speechbrain.nnet.activations.Softmax
apply_log: True

transducer_cost: !name:speechbrain.nnet.losses.transducer_loss
use_torchaudio: True
framework: torchaudio
Anwarvic marked this conversation as resolved.
blank_index: !ref <blank_index>

# for MTL
24 changes: 19 additions & 5 deletions recipes/LibriSpeech/ASR/transducer/README.md
@@ -4,15 +4,29 @@ Before running this recipe, make sure numba is installed (pip install numba)
You can download LibriSpeech at http://www.openslr.org/12

# Extra-Dependencies
This recipe supports two implementations of the transducer loss, see `use_torchaudio` arg in the yaml file:
1. Transducer loss from torchaudio (this requires torchaudio version >= 0.10.0) (Default).
2. Speechbrain implementation using Numba. To use it, please set `use_torchaudio=False` in the yaml file. This version is implemented within SpeechBrain and allows you to directly access the python code of the transducer loss (and directly modify it if needed).

Note: Before running this recipe, make sure numba is installed. Otherwise, run:
This recipe supports three implementations of the transducer loss; see the
`framework` arg in the yaml file:
1. Transducer loss from torchaudio (this requires torchaudio version >= 0.10.0)
(Default).
2. SpeechBrain implementation using Numba. To use it, please set
`framework=speechbrain` in the yaml file. This version is implemented within
SpeechBrain and allows you to directly access the python code of the
transducer loss (and directly modify it if needed).
3. FastRNNT (pruned / unpruned) loss function.
   - To use the unpruned loss function, please set `framework=fastrnnt`.
   - To use the pruned loss function, please replace the whole `transducer_cost`
     yaml variable.

If you are planning to use the SpeechBrain RNNT loss function, install `numba`:
```
pip install numba
```

If you are planning to use the FastRNNT loss function, install `FastRNNT`:
```
pip install FastRNNT
```
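The torchaudio backend above requires torchaudio >= 0.10.0. A minimal sketch of that version gate (the helper name is ours, for illustration only; it does not exist in the recipe):

```python
def torchaudio_supports_transducer(version: str) -> bool:
    """Per this README, the torchaudio transducer loss needs torchaudio >= 0.10.0."""
    # Compare (major, minor) tuples; ignore patch/build suffixes like "0.10.0+cu113".
    major, minor = (int(x) for x in version.split(".")[:2])
    return (major, minor) >= (0, 10)

print(torchaudio_supports_transducer("0.9.1"))   # → False: fall back to speechbrain/fastrnnt
print(torchaudio_supports_transducer("0.10.0"))  # → True: framework=torchaudio works
print(torchaudio_supports_transducer("2.1.0"))   # → True
```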

# How to run it
python train.py train/train.yaml

2 changes: 1 addition & 1 deletion recipes/LibriSpeech/ASR/transducer/hparams/train.yaml
@@ -190,7 +190,7 @@ log_softmax: !new:speechbrain.nnet.activations.Softmax

transducer_cost: !name:speechbrain.nnet.losses.transducer_loss
blank_index: !ref <blank_index>
use_torchaudio: True
framework: torchaudio

# This is the RNNLM that is used according to the Huggingface repository
# NB: It has to match the pre-trained RNNLM!!
23 changes: 19 additions & 4 deletions recipes/TIMIT/ASR/transducer/README.md
@@ -4,14 +4,29 @@ TIMIT is a speech dataset available from LDC: https://catalog.ldc.upenn.edu/LDC9


# Extra-Dependencies
This recipe support two implementation of Transducer loss, see `use_torchaudio` arg in Yaml file:
1- Transducer loss from torchaudio (if torchaudio version >= 0.10.0) (Default)
2- Speechbrain Implementation using Numba lib. (this allow you to have a direct access in python to the Transducer loss implementation)
Note: Before running this recipe, make sure numba is installed. Otherwise, run:
This recipe supports three implementations of the transducer loss; see the
`framework` arg in the yaml file:
1. Transducer loss from torchaudio (this requires torchaudio version >= 0.10.0)
(Default).
2. SpeechBrain implementation using Numba. To use it, please set
`framework=speechbrain` in the yaml file. This version is implemented within
SpeechBrain and allows you to directly access the python code of the
transducer loss (and directly modify it if needed).
3. FastRNNT (pruned / unpruned) loss function.
   - To use the unpruned loss function, please set `framework=fastrnnt`.
   - To use the pruned loss function, please replace the whole `transducer_cost`
     yaml variable.

If you are planning to use the SpeechBrain RNNT loss function, install `numba`:
```
pip install numba
```

If you are planning to use the FastRNNT loss function, install `FastRNNT`:
```
pip install FastRNNT
```

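For reference, in the TIMIT recipe the switch lives in the `compute_cost` entry of the hparams file. A minimal sketch with the `framework` values listed above (mirroring the yaml change in this PR):

```yaml
# Sketch: selecting the transducer loss backend in the TIMIT hparams.
# Valid values per this README: torchaudio (default), speechbrain, fastrnnt.
compute_cost: !name:speechbrain.nnet.losses.transducer_loss
    framework: speechbrain   # the speechbrain backend requires numba
    blank_index: !ref <blank_index>
```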
# How to run
Update the path to the dataset in the yaml config file and run the following.
```
4 changes: 2 additions & 2 deletions recipes/TIMIT/ASR/transducer/hparams/train.yaml
@@ -150,7 +150,7 @@ output: !new:speechbrain.nnet.linear.Linear
# apply_log: True

compute_cost: !name:speechbrain.nnet.losses.transducer_loss
use_torchaudio: True
framework: torchaudio
blank_index: !ref <blank_index>

model: !new:torch.nn.ModuleList [[
@@ -216,7 +216,7 @@ train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger

transducer_stats: !name:speechbrain.utils.metric_stats.MetricStats
metric: !name:speechbrain.nnet.losses.transducer_loss
use_torchaudio: True
framework: torchaudio
blank_index: !ref <blank_index>
reduction: none

4 changes: 2 additions & 2 deletions recipes/TIMIT/ASR/transducer/hparams/train_wav2vec.yaml
@@ -133,7 +133,7 @@ output: !new:speechbrain.nnet.linear.Linear
# apply_log: True

compute_cost: !name:speechbrain.nnet.losses.transducer_loss
use_torchaudio: True
framework: torchaudio
blank_index: !ref <blank_index>

model: !new:torch.nn.ModuleList [[
@@ -205,7 +205,7 @@ train_logger: !new:speechbrain.utils.train_logger.FileTrainLogger

transducer_stats: !name:speechbrain.utils.metric_stats.MetricStats
metric: !name:speechbrain.nnet.losses.transducer_loss
use_torchaudio: True
framework: torchaudio
blank_index: !ref <blank_index>
reduction: none

89 changes: 89 additions & 0 deletions recipes/mTEDx/ASR/CTC/README.md
@@ -0,0 +1,89 @@
# mTEDx ASR with CTC models.
This folder contains the scripts to train a wav2vec-based system on mTEDx.
You can train either a single-language or a multilingual wav2vec model.
Before running this recipe, make sure to read this
[README](../../README.md) file first.

**Note:**\
The wav2vec model used in this recipe is pre-trained on French. In order to
use another language, don't forget to change the `wav2vec2_hub` variable in
the `train_wav2vec.yaml` file.

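A hedged sketch of that override (the HuggingFace model id below is illustrative, not necessarily the one this recipe ships with):

```yaml
# Hypothetical example: point the recipe at a different pre-trained encoder
# by editing wav2vec2_hub in train_wav2vec.yaml. The model id shown here is
# an assumption for illustration; substitute one matching your target language.
wav2vec2_hub: facebook/wav2vec2-large-xlsr-53
```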

# How to run

To train a single-language wav2vec model, run:
```bash
$ python train.py hparams/train_wav2vec.yaml
```

To train a multilingual wav2vec model, run:
```bash
$ python train.py hparams/train_xlsr.yaml
```

# Results

<table>
<thead>
<tr>
<th>Release</th>
<th>hyperparams file</th>
<th>Val. CER</th>
<th>Val. WER</th>
<th colspan=4 style="text-align:center">Test WER</th>
<th>Model link</th>
<th>GPUs</th>
</tr>
</thead>
<tbody>
<tr>
<td>2022-08-10</td>
<td>train_wav2vec.yaml</td>
<td>GS: 4.49</td>
<td>GS: 10.66</td>
<td>GS: es-> -</td>
<td>GS: fr-> 12.59</td>
<td>GS: pt-> -</td>
<td>GS: it-> -</td>
<td>Not Available</td>
<td>4xV100 32GB</td>
</tr>
<tr>
<td>2022-08-10</td>
<td>train_xlsr.yaml</td>
<td>GS(avg.): 5.87</td>
<td>GS(avg.): 15.24</td>
<td>GS: es-> 14.72</td>
<td>GS: fr-> 17.72</td>
<td>GS: pt-> 17.11</td>
<td>GS: it-> 17.87</td>
<td>Not Available</td>
<td>4xV100 32GB</td>
</tr>
</tbody>
</table>




# **About SpeechBrain**
- Website: https://speechbrain.github.io/
- Code: https://github.com/speechbrain/speechbrain/
- HuggingFace: https://huggingface.co/speechbrain/


# **Citing SpeechBrain**
Please, cite SpeechBrain if you use it for your research or business.

```bibtex
@misc{speechbrain,
title={{SpeechBrain}: A General-Purpose Speech Toolkit},
author={Mirco Ravanelli and Titouan Parcollet and Peter Plantinga and Aku Rouhe and Samuele Cornell and Loren Lugosch and Cem Subakan and Nauman Dawalatabad and Abdelwahab Heba and Jianyuan Zhong and Ju-Chieh Chou and Sung-Lin Yeh and Szu-Wei Fu and Chien-Feng Liao and Elena Rastorgueva and François Grondin and William Aris and Hwidong Na and Yan Gao and Renato De Mori and Yoshua Bengio},
year={2021},
eprint={2106.04624},
archivePrefix={arXiv},
primaryClass={eess.AS},
note={arXiv:2106.04624}
}
```