
Integrating Pruned Fast RNNT with Transducer + new recipe for mTEDx dataset #1465

Open · wants to merge 36 commits into base: develop

Conversation


@Anwarvic Anwarvic commented Jun 22, 2022

Hi all,

In this pull request, I'm proposing two things:

  • A new (ASR, Tokenizer, LM) recipe for the mTEDx dataset, which is used mainly for speech recognition & speech translation.
  • An integration of the Fast RNNT pruned loss into the transducer recipe.

The following steps show how to use the new changes and should make reviewing and merging this PR as easy as possible:

  1. Download the mTEDx data. Check this README file.

  2. Install Fast-RNNT:

    pip install fast_rnnt
  3. Train a tokenizer & language model using the data. For reproducibility, download the tokenizer (from here) and RNNLM (from here).

  4. Now, open the ./speechbrain/recipes/mTEDx/ASR/Transducer/hparams/train_pruned.yaml file and set these variables (example values below):

    pretrained_tokenizer_path:
    pretrained_lm_path:
    data_folder:
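For illustration, these might look as follows (hypothetical paths; point them at your own downloads and data):

```yaml
# Hypothetical example values, for illustration only.
pretrained_tokenizer_path: /path/to/downloaded/tokenizer
pretrained_lm_path: /path/to/downloaded/rnnlm
data_folder: /path/to/mTEDx
```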

Now, to train the Transducer on mTEDx-French using the pruned-loss function, run the following command:

python ./speechbrain/recipes/mTEDx/ASR/Transducer/train.py \
           ./speechbrain/recipes/mTEDx/ASR/Transducer/hparams/train_pruned.yaml

The resulting model should be found in ./results/mTEDx_fr/CASCADE/CRDNN_pruned

EDIT (18/09/2022)

Starting from here, I will mention the most recent updates to this PR:

  • Added README files for the mTEDx recipes.
  • Changed the signature of the transducer_loss function to accept a framework argument instead of use_torchaudio.
  • Changed the use_torchaudio variable to framework in the transducer YAML files (see the sketch below).
  • Added a warmup mechanism for Fast RNNT as suggested here.
  • Added mTEDx CTC recipes.
  • More documentation & comments.
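For illustration, the YAML-side change amounts to something like this (a sketch; the exact keys in each recipe may differ):

```yaml
# Before this PR:
# use_torchaudio: True

# After this PR:
framework: torchaudio  # one of: speechbrain | torchaudio | fast_rnnt
```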

@mravanelli added the enhancement (New feature or request) label Jun 22, 2022
@mravanelli (Collaborator)

Thank you @Anwarvic, could you please report the current results here? In particular, it would be interesting to see the performance/speed comparison between the torchaudio loss and FastRNNT (at least with your current implementation).

@Anwarvic (Author)

Hi @mravanelli, the following is a comparison between torchaudio.rnnt_loss and fast_rnnt.pruned_loss with different prune ranges (5, 40, and 115), using my current implementation on the mTEDx-Fr dataset, with respect to:

  • Training loss (plot)
  • Validation loss (plot)
  • Validation CER, Character Error Rate (plot)

All four models were trained with the same architecture and the same hyper-parameters. The only difference was the batch size, as shown in the following table:

| func | batch_size |
| --- | --- |
| torchaudio | 8 |
| pruned_5 | 20 |
| pruned_40 | 16 |
| pruned_115 | 8 |

Also, I don't know if it's fair to compare the pruned_loss with the torchaudio.rnnt_loss since the former is a summation of two losses as seen in:
https://github.com/Anwarvic/speechbrain/blob/61ccae88f444379f38d74ed5855250d6978adf70/speechbrain/nnet/losses.py#L195
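For context, here is roughly how that summation is computed with the fast_rnnt API (a minimal sketch following the fast_rnnt README; the tensor names and the `joiner` call are illustrative assumptions, not the exact code in this PR):

```python
import fast_rnnt

# Assumed shapes:
#   encoder_out: (B, T, C)      acoustic encoder output
#   decoder_out: (B, S + 1, C)  prediction-network output
#   symbols:     (B, S)         target token IDs
#   boundary:    (B, 4)         rows of [0, 0, S, T] per utterance

# 1) Trivial-joiner ("simple") loss; its gradients indicate which
#    (frame, symbol) regions matter for each utterance.
simple_loss, (px_grad, py_grad) = fast_rnnt.rnnt_loss_smoothed(
    lm=decoder_out, am=encoder_out, symbols=symbols,
    termination_symbol=blank_id,
    lm_only_scale=0.25, am_only_scale=0.0,
    boundary=boundary, reduction="sum", return_grad=True,
)

# 2) From those gradients, keep only a window of `prune_range` symbols per frame.
ranges = fast_rnnt.get_rnnt_prune_ranges(
    px_grad=px_grad, py_grad=py_grad, boundary=boundary, s_range=prune_range,
)
am_pruned, lm_pruned = fast_rnnt.do_rnnt_pruning(
    am=encoder_out, lm=decoder_out, ranges=ranges,
)

# 3) The full joiner network is applied only inside the pruned windows.
logits = joiner(am_pruned, lm_pruned)
pruned_loss = fast_rnnt.rnnt_loss_pruned(
    logits=logits, symbols=symbols, ranges=ranges,
    termination_symbol=blank_id, boundary=boundary, reduction="sum",
)

# The loss linked above is a weighted sum of these two terms.
loss = 0.5 * simple_loss + pruned_loss
```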

@danpovey

We generally give the pruned loss a scale of zero in the loss function during a warmup period, to give time for the simple loss to learn the alignment. The length of the warmup period depends on the learning rate schedule, normally it would be about the same amount of time as it takes for the simple loss, when normalized by the number of frames, to go below 0.5 or so.

@mravanelli (Collaborator)

Thank you @danpovey! @Anwarvic, did you try the warmup thing?

@Anwarvic (Author)

Yes, @mravanelli! I'm just waiting for a few more epochs before reporting the results.

@Anwarvic (Author)

Anwarvic commented Jun 27, 2022

Hi @danpovey @mravanelli, the model's CER & WER didn't get any better even after using warmup. I trained two models on the same dataset with the same hyper-parameters (prune_range=5); one uses warmup (same implementation as here) while the other doesn't (same implementation as the one in this PR). Here is a comparison of their training losses (plot):

Neither model's WER got any better; it stayed stuck at 99.99.

@danpovey

What warmup schedule did you use, i.e. how many batches does the warmup last? The setup of ours where warmup takes 3000 batches used a reworked conformer model that takes a "warmup" option, and a different optimizer, and it converges initially about 10 times faster than the conventional conformer. You will likely need to warm up for way longer if you are using the regular conformer and the normal learning-rate schedule for transformers. You should notice a kind of knee in the loss function where it starts to learn the alignment and starts to decrease fairly rapidly. (I.e. in your normal model training). That should tell you how long the warmup needs to be, approximately.

@Anwarvic (Author)

@danpovey, really appreciate your quick responses. And sorry about that. I should've provided more details.

What warmup schedule did you use, i.e. how many batches does the warmup last?

The same one as implemented here, which can be summarized in the following equation:

$$ \text{loss} = 0.5 \cdot \text{simple loss} + \text{pruned scale} \cdot \text{pruned loss} $$
Where

$$ \text{pruned scale} = \begin{cases} 0, &\quad\text{if curr epoch} < \text{warmup epochs}\\ 0.1, &\quad\text{if warmup epochs} \le \text{curr epoch} < 2 \cdot \text{warmup epochs}\\ 1.0, &\quad\text{if curr epoch} \ge 2 \cdot \text{warmup epochs} \end{cases} $$

In this implementation, I used $\text{warmup epochs} = 2$, which is around $3000$ steps.
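In code, that schedule is just a step function (a sketch of the same logic; the helper name is mine, and simple_loss/pruned_loss come from the loss computation above):

```python
def pruned_loss_scale(curr_epoch: int, warmup_epochs: int = 2) -> float:
    """Step schedule for the pruned-loss weight described above."""
    if curr_epoch < warmup_epochs:
        return 0.0
    if curr_epoch < 2 * warmup_epochs:
        return 0.1
    return 1.0

loss = 0.5 * simple_loss + pruned_loss_scale(curr_epoch) * pruned_loss
```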

You will likely need to warm up for way longer if you are using the regular conformer and the normal learning-rate schedule for transformers.

In my implementation, I'm not using conformers. I'm using a CRDNN encoder, which is basically a combination of CNNs, RNNs, and DNNs, with the torch.optim.Adadelta optimizer at a constant learning rate of $1$. Do you think this still applies?

You should notice a kind of knee in the loss function where it starts to learn the alignment and starts to decrease fairly rapidly.

Based on my understanding, the plan now is as follows:

  1. Train the un-warmed model until convergence (until there is a knee in the training-loss curve).
  2. Use that as the number of warmup steps.

Am I correct?

@danpovey

Instead of "train the un-warmed model..", what I meant is, presumably you had some baseline model (without the pruned RNN-T loss) that was converging. I meant that model. But I would say set the warmup period to, say, 8 epochs, based on the CER curve for your baseline model. Your baseline model seems to be getting to an OK point in 4 or 5 epochs; but you are using larger batch sizes with the pruned model, and to some extent the initial convergence depends on the number of batches and not just the amount of data seen, so I'm suggesting to increase the number of warmup epochs because of that. In a setup with more data you wouldn't need to warm up for so long.

BTW, the lm_scale part of the un-pruned loss inflates the loss values so you can't exactly compare the losses with the baseline. If you could split apart the pruned loss, though, and display that separately, that should be comparable to the baseline RNN-T loss.
When we display losses we normalize by the total number of frames, which makes them a little easier to interpret.
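For example, with the (B, 4) boundary tensor used by fast_rnnt (whose last column holds each utterance's frame count), the normalization is simply (a sketch, continuing from the earlier snippet):

```python
num_frames = boundary[:, 3].sum()  # total acoustic frames in the batch
displayed_loss = (0.5 * simple_loss + pruned_loss) / num_frames
```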

@danpovey

By the way, I had a look at the Adadelta paper at some point and could not make much sense of it; it seems to me an extremely strange update rule. I am a little concerned that it may interact badly with a setup where some parts of the loss function are only introduced after a delay. That is, it's not clear to me that it would behave in a reasonable way if your gradients are set to zero for some initial period for some of the parameters. You could perhaps have a look at the update and try to convince yourself one way or another. But Adagrad may be safer here; you'd have to choose a learning rate schedule. (Presumably the transformer one, with warmup period, is not necessary here since this is not a transformer model.)

@anautsch (Collaborator) left a comment

Hi @Anwarvic, hope you are doing well. I have only minor comments. @TParcollet worked more closely with you on this one. Are more contributions pending?

recipes/mTEDx/ASR/Transducer/hparams/train_pruned.yaml (review thread, resolved)
recipes/mTEDx/LM/RNNLM/hparams/train.yaml (review thread, resolved)
@Anwarvic (Author)

Hi, @anautsch! Yes, there are some that I was intending to add but didn't have the time. I think I can add them to this PR by this weekend inshallah.

@anautsch (Collaborator) left a comment

Hi @Anwarvic, thank you for the updates!
(I appreciate the edit note in your original post :)

It's more comments & curiosity from my end.

There were some warnings/errors in testing.

Please run `pre-commit run --all-files` to apply the linter fixes automatically.

From the testing log:

=========================== short test summary info ============================
FAILED tests/consistency/test_docstrings.py::test_recipe_list - AssertionErro...
FAILED tests/consistency/test_recipe.py::test_recipe_list - AssertionError: a...
Checking ./speechbrain/nnet/transducer/transducer_joint.py...
	ERROR: The class Fast_RNNT_Joiner(nn.Module) in ./speechbrain/nnet/transducer/transducer_joint.py has no docstring. Please write it. For more info, see tests/consistency/DOCSTRINGS.md
----------------------------- Captured stdout call -----------------------------
	ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train_pruned.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/ASR/CTC/hparams/train_wav2vec.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train_unpruned.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/LM/RNNLM/hparams/train.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/Tokenizer/hparams/1K_unigram_subword_bpe.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train_wav2vec_pruned.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/ASR/CTC/hparams/train_xlsr.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md

(Doc, unit & integration tests were skipped; they'll still have to be run after attending to the consistency checks.)

Are you satisfied with the (pre)trained models?

recipes/mTEDx/ASR/CTC/hparams/train_wav2vec.yaml (review thread, resolved)
recipes/mTEDx/LM/RNNLM/hparams/train.yaml (review thread, resolved)
@Anwarvic (Author)

Hi @anautsch, I think all of the above problems are resolved now.

@anautsch (Collaborator) left a comment

Thanks for the updates @Anwarvic !

There's more roadwork ahead :)

The Tokenizer recipe has no README.

It might be an upper-case issue here, 'Transducer' vs 'transducer' (?):

	ERROR: The file recipes/mTEDx/ASR/transducer/train.py listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/train.py listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/train.py listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/train_wav2vec.py listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train.yaml listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train_pruned.yaml listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train_unpruned.yaml listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train_wav2vec_pruned.yaml listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!

...

	ERROR: The recipe recipe0147 does not contain a Script_file. Please add it!
	ERROR: The recipe recipe0147 does not contain a Hparam_file. Please add it!
	ERROR: The recipe recipe0147 does not contain a Readme_file. Please add it!
	ERROR: The recipe recipe0148 does not contain a Script_file. Please add it!
	ERROR: The recipe recipe0148 does not contain a Hparam_file. Please add it!
	ERROR: The recipe recipe0148 does not contain a Readme_file. Please add it!
	ERROR: The recipe recipe0149 does not contain a Script_file. Please add it!
	ERROR: The recipe recipe0149 does not contain a Hparam_file. Please add it!
	ERROR: The recipe recipe0149 does not contain a Readme_file. Please add it!
	ERROR: The recipe recipe0150 does not contain a Script_file. Please add it!
	ERROR: The recipe recipe0150 does not contain a Hparam_file. Please add it!
	ERROR: The recipe recipe0150 does not contain a Readme_file. Please add it!

...

E               FileNotFoundError: [Errno 2] No such file or directory: 'recipes/mTEDx/ASR/transducer/README.md'
E       FileNotFoundError: [Errno 2] No such file or directory: 'recipes/mTEDx/ASR/transducer/hparams/train.yaml'

Please remove unused variables (or use them):

Checking recipes/mTEDx/Tokenizer/hparams/1K_unigram_subword_bpe.yaml...
	ERROR: variable "train_log" not used in recipes/mTEDx/Tokenizer/train.py!
Checking recipes/mTEDx/LM/RNNLM/hparams/train.yaml...
	ERROR: variable "blank_index" not used in recipes/mTEDx/LM/RNNLM/train.py!

speechbrain/nnet/losses.py (review thread)
recipes/mTEDx/ASR/CTC/README.md (review thread, resolved)
recipes/mTEDx/ASR/Transducer/README.md (review thread, resolved)
@Anwarvic (Author)

@anautsch, sorry for the late reply. I have resolved all the issues mentioned above. Please feel free to get back to me if other issues occur.

@anautsch (Collaborator)

Hi @Anwarvic, there were errors in the consistency checks.

tests/consistency/test_docstrings.py .                                   [ 16%]
tests/consistency/test_recipe.py FFFF                                    [ 83%]
tests/consistency/test_yaml.py F                                         [100%]

Please run `pytest tests/consistency` for details.
Check out Mirco's README for cues.

Some recipe pointers are not set correctly (case-sensitive paths, perhaps), and there are unused variables.
(Nothing big but still relevant.)

@Anwarvic (Author)

@anautsch, passed all tests!

@anautsch (Collaborator) left a comment

@Anwarvic there's an integration test error.

tests/integration/ASR_Transducer/example_asr_transducer_experiment.py F  [ 18%]

It's about the interface change you put in:

E       TypeError: transducer_loss() got an unexpected keyword argument 'use_torchaudio'

This implies that everyone who is relying on use_torchaudio will have to change all their code when this PR goes through as well. Idk - this might be better for a major version change. @TParcollet @mravanelli any thoughts?

Situation:
@Anwarvic noticed that more frameworks than torchaudio can be used here, and thus proposed changing the argument to:

'speechbrain' | 'torchaudio' | 'fast_rnnt' (default: torchaudio).

We talked about the impact of these changes here, since @Anwarvic already proposed the corresponding updates to all recipes:
https://github.com/speechbrain/speechbrain/pull/1465/files#r979478055

Yet, this remains an interface change. How about this: should both (non-mandatory) options be allowed until a major version eventually drops the use_torchaudio flag?
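One way to do that (a hypothetical sketch, not code from this PR) is to keep use_torchaudio as a deprecated alias that maps onto framework and warns:

```python
import warnings

def transducer_loss(logits, targets, input_lens, target_lens, blank_index,
                    framework="torchaudio", use_torchaudio=None,
                    reduction="mean"):
    """Transducer loss dispatching to one of several backends.

    `use_torchaudio` is kept as a deprecated alias until the next
    major version drops it.
    """
    if use_torchaudio is not None:
        warnings.warn(
            "`use_torchaudio` is deprecated; pass `framework` instead.",
            DeprecationWarning,
        )
        framework = "torchaudio" if use_torchaudio else "speechbrain"
    if framework not in ("speechbrain", "torchaudio", "fast_rnnt"):
        raise ValueError(f"Unsupported framework: {framework}")
    # ... dispatch to the selected backend as before ...
```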

A collaborator commented on this hunk in a recipe YAML:

```yaml
# Data files
data_folder: !PLACEHOLDER
langs:
- !PLACEHOLDER
```

fr ?

@Adel-Moumen (Collaborator)

Hello @Anwarvic,

Sorry for the very late reply.

We are still looking forward to merging your PR, as it is a very nice contribution. Unfortunately, since many PRs have landed in develop, I'll need to ask you to update your upstream branch to comply with develop and to update your PR accordingly. You can refer to https://colab.research.google.com/drive/1IEPfKRuvJRSjoxu22GZhb3czfVHsAy0s?usp=sharing to guide you through this process.

Could you please confirm whether you are still okay with finishing this PR? Sorry again.

Best regards,
Adel

@Anwarvic (Author)

Hi @Adel-Moumen,

Thank you for the update and no worries about the delay. I appreciate your feedback and not giving up on my PR.

I'm still committed to finishing this PR, and I'll ensure it doesn't conflict with the develop branch, as you kindly pointed out.

If you don't mind, I will do this by the end of next week (26/01/2024) as I have things on my plate at the moment. Would that be ok?

Thanks!

@Adel-Moumen (Collaborator)

(quoting @Anwarvic's message above)

Would be wonderful. Please ping me if you need some help in the process. :-)

A collaborator commented on this hunk in a README:

```
pip install numba
```

If you are planning to use FastRNNT loss function, install `FastRNNT`:
```
pip install FastRNNT
```

Shouldn't this be fast-rnnt? I get a 404 on PyPI when checking for FastRNNT.
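(For what it's worth, pip normalizes hyphens and underscores in project names under PEP 503, so the install command used earlier in this thread works spelled either way, while FastRNNT normalizes to a different, nonexistent name:)

```
pip install fast_rnnt   # equivalent to: pip install fast-rnnt (PEP 503 normalization)
```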

Labels: enhancement (New feature or request)
Projects: none yet
Linked issues: none
6 participants