
Integrating Pruned Fast RNNT with Transducer + new recipe for mTEDx dataset #1465

Open · wants to merge 36 commits into base: develop

Conversation


@Anwarvic Anwarvic commented Jun 22, 2022

Hi all,

In this pull request, I'm proposing two things:

  • A new (ASR, Tokenizer, LM) recipe for the mTEDx dataset, which is used mainly for speech recognition & speech translation.
  • An integration of the Fast RNNT pruned loss into the transducer recipe.

The following steps show how to use the new changes and should make reviewing and merging this PR as easy as possible:

  1. Download the mTEDx data. Check this README file.

  2. Install Fast-RNNT:

    pip install fast_rnnt
  3. Train a tokenizer & language model using the data. For reproducibility, download the tokenizer (from here) and RNNLM (from here).

  4. Now, open the ./speechbrain/recipes/mTEDx/ASR/Transducer/hparams/train_pruned.yaml file and set these variables (example values below):

    pretrained_tokenizer_path:
    pretrained_lm_path:
    data_folder:
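For illustration, these might look as follows (hypothetical paths; point them at your own downloads and data):

```yaml
# Hypothetical example values, for illustration only.
pretrained_tokenizer_path: /path/to/downloaded/tokenizer
pretrained_lm_path: /path/to/downloaded/rnnlm
data_folder: /path/to/mTEDx
```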

Now, to train the Transducer on mTEDx-French using the pruned-loss function, run the following command:

python ./speechbrain/recipes/mTEDx/ASR/Transducer/train.py \
           ./speechbrain/recipes/mTEDx/ASR/Transducer/hparams/train_pruned.yaml

The resulting model should be found in ./results/mTEDx_fr/CASCADE/CRDNN_pruned

EDIT (18/09/2022)

Starting from here, I will mention the most recent updates to this PR:

  • Added README files for the mTEDx recipes.
  • Changed the signature of the transducer_loss function to accept a framework argument instead of use_torchaudio.
  • Changed the use_torchaudio variable to framework in the transducer YAML files (see the sketch below).
  • Added a warmup mechanism for Fast RNNT as suggested here.
  • Added mTEDx CTC recipes.
  • More documentation & comments.
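For illustration, the YAML-side change amounts to something like this (a sketch; the exact keys in each recipe may differ):

```yaml
# Before this PR:
# use_torchaudio: True

# After this PR:
framework: torchaudio  # one of: speechbrain | torchaudio | fast_rnnt
```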

@mravanelli added the enhancement (New feature or request) label Jun 22, 2022
@mravanelli (Collaborator)

Thank you @Anwarvic, could you please report the current results here? In particular, it would be interesting to see the performance/speed comparison between the torchaudio loss and FastRNNT (at least with your current implementation).

@Anwarvic (Author)

Hi @mravanelli, the following is a comparison between torchaudio.rnnt_loss and fast_rnnt.pruned_loss with different prune ranges (5, 40, and 115), using my current implementation on the mTEDx-Fr dataset, with respect to:

  • Training loss (plot)
  • Validation loss (plot)
  • Validation CER, Character Error Rate (plot)

All four models were trained with the same architecture and the same hyper-parameters. The only difference was the batch size, as shown in the following table:

| func | batch_size |
| --- | --- |
| torchaudio | 8 |
| pruned_5 | 20 |
| pruned_40 | 16 |
| pruned_115 | 8 |

Also, I don't know if it's fair to compare the pruned_loss with the torchaudio.rnnt_loss since the former is a summation of two losses as seen in:
https://github.com/Anwarvic/speechbrain/blob/61ccae88f444379f38d74ed5855250d6978adf70/speechbrain/nnet/losses.py#L195
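For context, here is roughly how that summation is computed with the fast_rnnt API (a minimal sketch following the fast_rnnt README; the tensor names and the `joiner` call are illustrative assumptions, not the exact code in this PR):

```python
import fast_rnnt

# Assumed shapes:
#   encoder_out: (B, T, C)      acoustic encoder output
#   decoder_out: (B, S + 1, C)  prediction-network output
#   symbols:     (B, S)         target token IDs
#   boundary:    (B, 4)         rows of [0, 0, S, T] per utterance

# 1) Trivial-joiner ("simple") loss; its gradients indicate which
#    (frame, symbol) regions matter for each utterance.
simple_loss, (px_grad, py_grad) = fast_rnnt.rnnt_loss_smoothed(
    lm=decoder_out, am=encoder_out, symbols=symbols,
    termination_symbol=blank_id,
    lm_only_scale=0.25, am_only_scale=0.0,
    boundary=boundary, reduction="sum", return_grad=True,
)

# 2) From those gradients, keep only a window of `prune_range` symbols per frame.
ranges = fast_rnnt.get_rnnt_prune_ranges(
    px_grad=px_grad, py_grad=py_grad, boundary=boundary, s_range=prune_range,
)
am_pruned, lm_pruned = fast_rnnt.do_rnnt_pruning(
    am=encoder_out, lm=decoder_out, ranges=ranges,
)

# 3) The full joiner network is applied only inside the pruned windows.
logits = joiner(am_pruned, lm_pruned)
pruned_loss = fast_rnnt.rnnt_loss_pruned(
    logits=logits, symbols=symbols, ranges=ranges,
    termination_symbol=blank_id, boundary=boundary, reduction="sum",
)

# The loss linked above is a weighted sum of these two terms.
loss = 0.5 * simple_loss + pruned_loss
```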

@danpovey

We generally give the pruned loss a scale of zero in the loss function during a warmup period, to give time for the simple loss to learn the alignment. The length of the warmup period depends on the learning rate schedule, normally it would be about the same amount of time as it takes for the simple loss, when normalized by the number of frames, to go below 0.5 or so.

@mravanelli (Collaborator)

Thank you @danpovey! @Anwarvic, did you try the warmup thing?

@Anwarvic (Author)

Yes, @mravanelli! I'm just waiting for a few more epochs before reporting the results.

@Anwarvic (Author)

Anwarvic commented Jun 27, 2022

Hi @danpovey @mravanelli, the model's CER & WER didn't get any better even after using warmup. I trained two models on the same dataset with the same hyper-parameters (prune_range=5); one uses warmup (same implementation as here) while the other doesn't (same implementation as the one in this PR). Here is a comparison of their training losses (plot):

Neither model's WER got any better; it stayed stuck at 99.99.

@danpovey

What warmup schedule did you use, i.e. how many batches does the warmup last? The setup of ours where warmup takes 3000 batches used a reworked conformer model that takes a "warmup" option, and a different optimizer, and it converges initially about 10 times faster than the conventional conformer. You will likely need to warm up for way longer if you are using the regular conformer and the normal learning-rate schedule for transformers. You should notice a kind of knee in the loss function where it starts to learn the alignment and starts to decrease fairly rapidly. (I.e. in your normal model training). That should tell you how long the warmup needs to be, approximately.

@Anwarvic (Author)

@danpovey, really appreciate your quick responses. And sorry about that. I should've provided more details.

What warmup schedule did you use, i.e. how many batches does the warmup last?

The same one as implemented here, which can be summarized in the following equation:

$$ \text{loss} = 0.5 \cdot \text{simple loss} + \text{pruned scale} \cdot \text{pruned loss} $$
Where

$$ \text{pruned scale} = \begin{cases} 0, &\quad\text{if curr epoch} < \text{warmup epochs}\\ 0.1, &\quad\text{if warmup epochs} \le \text{curr epoch} < 2 \cdot \text{warmup epochs}\\ 1.0, &\quad\text{if curr epoch} \ge 2 \cdot \text{warmup epochs} \end{cases} $$

In this implementation, I used $\text{warmup epochs} = 2$, which is around $3000$ steps.
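In code, that schedule is just a step function (a sketch of the same logic; the helper name is mine, and simple_loss/pruned_loss come from the loss computation above):

```python
def pruned_loss_scale(curr_epoch: int, warmup_epochs: int = 2) -> float:
    """Step schedule for the pruned-loss weight described above."""
    if curr_epoch < warmup_epochs:
        return 0.0
    if curr_epoch < 2 * warmup_epochs:
        return 0.1
    return 1.0

loss = 0.5 * simple_loss + pruned_loss_scale(curr_epoch) * pruned_loss
```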

You will likely need to warm up for way longer if you are using the regular conformer and the normal learning-rate schedule for transformers.

In my implementation, I'm not using conformers. I'm using a CRDNN encoder, which is basically a combination of CNNs, RNNs, and DNNs, with the torch.optim.Adadelta optimizer at a constant learning rate of $1$. Do you think this still applies?

You should notice a kind of knee in the loss function where it starts to learn the alignment and starts to decrease fairly rapidly.

Based on my understanding, the plan now is as follows:

  1. Train the un-warmed model until convergence (until there is a knee in the training-loss curve).
  2. Use that as the number of warmup steps.

Am I correct?

@danpovey

Instead of "train the un-warmed model..", what I meant is, presumably you had some baseline model (without the pruned RNN-T loss) that was converging. I meant that model. But I would say set the warmup period to, say, 8 epochs, based on the CER curve for your baseline model. Your baseline model seems to be getting to an OK point in 4 or 5 epochs; but you are using larger batch sizes with the pruned model, and to some extent the initial convergence depends on the number of batches and not just the amount of data seen, so I'm suggesting to increase the number of warmup epochs because of that. In a setup with more data you wouldn't need to warm up for so long.

BTW, the lm_scale part of the un-pruned loss inflates the loss values so you can't exactly compare the losses with the baseline. If you could split apart the pruned loss, though, and display that separately, that should be comparable to the baseline RNN-T loss.
When we display losses we normalize by the total number of frames, which makes them a little easier to interpret.
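For example, with the (B, 4) boundary tensor used by fast_rnnt (whose last column holds each utterance's frame count), the normalization is simply (a sketch, continuing from the earlier snippet):

```python
num_frames = boundary[:, 3].sum()  # total acoustic frames in the batch
displayed_loss = (0.5 * simple_loss + pruned_loss) / num_frames
```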

@danpovey

By the way, I had a look at the Adadelta paper at some point and could not make much sense of it; it seems to me an extremely strange update rule. I am a little concerned that it may interact badly with a setup where some parts of the loss function are only introduced after a delay. That is, it's not clear to me that it would behave in a reasonable way if your gradients are set to zero for some initial period for some of the parameters. You could perhaps have a look at the update and try to convince yourself one way or another. But Adagrad may be safer here; you'd have to choose a learning rate schedule. (Presumably the transformer one, with warmup period, is not necessary here since this is not a transformer model.)

@anautsch (Collaborator) left a comment

Hi @Anwarvic, hope you are doing well. I have only minor comments. @TParcollet worked more closely with you on this one. Are more contributions pending?

recipes/mTEDx/ASR/Transducer/hparams/train_pruned.yaml (review thread, resolved)
recipes/mTEDx/LM/RNNLM/hparams/train.yaml (review thread, resolved)
@Anwarvic (Author)

Hi, @anautsch! Yes, there are some that I was intending to add but didn't have the time. I think I can add them to this PR by this weekend inshallah.

@anautsch (Collaborator) left a comment

Hi @Anwarvic, thank you for the updates!
(I appreciate the edit note in your original post :)

It's more comments & curiosity from my end.

There were some warnings/errors in testing.

Please run `pre-commit run --all-files` to apply the linter fixes automatically.

From the testing log:

=========================== short test summary info ============================
FAILED tests/consistency/test_docstrings.py::test_recipe_list - AssertionErro...
FAILED tests/consistency/test_recipe.py::test_recipe_list - AssertionError: a...
Checking ./speechbrain/nnet/transducer/transducer_joint.py...
	ERROR: The class Fast_RNNT_Joiner(nn.Module) in ./speechbrain/nnet/transducer/transducer_joint.py has no docstring. Please write it. For more info, see tests/consistency/DOCSTRINGS.md
----------------------------- Captured stdout call -----------------------------
	ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train_pruned.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/ASR/CTC/hparams/train_wav2vec.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train_unpruned.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/LM/RNNLM/hparams/train.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/Tokenizer/hparams/1K_unigram_subword_bpe.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train_wav2vec_pruned.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md
	ERROR: The file recipes/mTEDx/ASR/CTC/hparams/train_xlsr.yaml is not listed in tests/recipes.csv. Please add it.                 For more info see tests/consistency/README.md

(Doc, unit & integration tests were skipped; they'll still have to be run after attending to the consistency checks.)

Are you satisfied with the (pre)trained models?

recipes/mTEDx/ASR/CTC/hparams/train_wav2vec.yaml (review thread, resolved)
recipes/mTEDx/LM/RNNLM/hparams/train.yaml (review thread, resolved)
@Anwarvic (Author)

Hi @anautsch, I think all of the above problems are resolved now.

@anautsch (Collaborator) left a comment

Thanks for the updates @Anwarvic !

There's more roadwork ahead :)

The Tokenizer recipe has no README.

It might be an upper-case issue here, 'Transducer' vs 'transducer' (?):

	ERROR: The file recipes/mTEDx/ASR/transducer/train.py listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/train.py listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/train.py listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/train_wav2vec.py listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train.yaml listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train_pruned.yaml listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train_unpruned.yaml listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train_wav2vec_pruned.yaml listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!
	ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!

...

	ERROR: The recipe recipe0147 does not contain a Script_file. Please add it!
	ERROR: The recipe recipe0147 does not contain a Hparam_file. Please add it!
	ERROR: The recipe recipe0147 does not contain a Readme_file. Please add it!
	ERROR: The recipe recipe0148 does not contain a Script_file. Please add it!
	ERROR: The recipe recipe0148 does not contain a Hparam_file. Please add it!
	ERROR: The recipe recipe0148 does not contain a Readme_file. Please add it!
	ERROR: The recipe recipe0149 does not contain a Script_file. Please add it!
	ERROR: The recipe recipe0149 does not contain a Hparam_file. Please add it!
	ERROR: The recipe recipe0149 does not contain a Readme_file. Please add it!
	ERROR: The recipe recipe0150 does not contain a Script_file. Please add it!
	ERROR: The recipe recipe0150 does not contain a Hparam_file. Please add it!
	ERROR: The recipe recipe0150 does not contain a Readme_file. Please add it!

...

E               FileNotFoundError: [Errno 2] No such file or directory: 'recipes/mTEDx/ASR/transducer/README.md'
E       FileNotFoundError: [Errno 2] No such file or directory: 'recipes/mTEDx/ASR/transducer/hparams/train.yaml'

Please remove unused variables (or use them):

Checking recipes/mTEDx/Tokenizer/hparams/1K_unigram_subword_bpe.yaml...
	ERROR: variable "train_log" not used in recipes/mTEDx/Tokenizer/train.py!
Checking recipes/mTEDx/LM/RNNLM/hparams/train.yaml...
	ERROR: variable "blank_index" not used in recipes/mTEDx/LM/RNNLM/train.py!

speechbrain/nnet/losses.py (review thread)
recipes/mTEDx/ASR/CTC/README.md (review thread, resolved)
recipes/mTEDx/ASR/Transducer/README.md (review thread, resolved)
@Anwarvic (Author)

@anautsch, sorry for the late reply. I have resolved all the issues mentioned above. Please feel free to get back to me if other issues occur.

@anautsch (Collaborator)

Hi @Anwarvic, there were errors in the consistency checks.

tests/consistency/test_docstrings.py .                                   [ 16%]
tests/consistency/test_recipe.py FFFF                                    [ 83%]
tests/consistency/test_yaml.py F                                         [100%]

Please run `pytest tests/consistency` for details.
Check out Mirco's README for cues.

Some recipe pointers are not set correctly (case-sensitive paths, perhaps), and there are unused variables.
(Nothing big but still relevant.)

@Anwarvic (Author)

@anautsch, passed all tests!

@anautsch (Collaborator) left a comment

@Anwarvic there's an integration test error.

tests/integration/ASR_Transducer/example_asr_transducer_experiment.py F  [ 18%]

It's about the interface change you put in:

E       TypeError: transducer_loss() got an unexpected keyword argument 'use_torchaudio'

This implies that everyone who is relying on use_torchaudio will have to change all their code when this PR goes through as well. Idk - this might be better for a major version change. @TParcollet @mravanelli any thoughts?

Situation:
@Anwarvic noticed that more frameworks than torchaudio can be used here, and thus proposed changing the argument to:

'speechbrain' | 'torchaudio' | 'fast_rnnt' (default: torchaudio).

We talked about the impact of these changes here, since @Anwarvic already proposed the corresponding updates to all recipes:
https://github.com/speechbrain/speechbrain/pull/1465/files#r979478055

Yet, this remains an interface change. How about this: should both (non-mandatory) options be allowed until a major version eventually drops the use_torchaudio flag?
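One way to do that (a hypothetical sketch, not code from this PR) is to keep use_torchaudio as a deprecated alias that maps onto framework and warns:

```python
import warnings

def transducer_loss(logits, targets, input_lens, target_lens, blank_index,
                    framework="torchaudio", use_torchaudio=None,
                    reduction="mean"):
    """Transducer loss dispatching to one of several backends.

    `use_torchaudio` is kept as a deprecated alias until the next
    major version drops it.
    """
    if use_torchaudio is not None:
        warnings.warn(
            "`use_torchaudio` is deprecated; pass `framework` instead.",
            DeprecationWarning,
        )
        framework = "torchaudio" if use_torchaudio else "speechbrain"
    if framework not in ("speechbrain", "torchaudio", "fast_rnnt"):
        raise ValueError(f"Unsupported framework: {framework}")
    # ... dispatch to the selected backend as before ...
```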

A collaborator commented on this hunk in a recipe YAML:

```yaml
# Data files
data_folder: !PLACEHOLDER
langs:
- !PLACEHOLDER
```

fr ?

@Adel-Moumen (Collaborator)

Hello @Anwarvic,

Sorry for the very late reply.

We are still looking forward to merging your PR, as it is a very nice contribution. Unfortunately, since many PRs have landed in develop, I'll need to ask you to update your upstream branch to comply with develop and to update your PR accordingly. You can refer to https://colab.research.google.com/drive/1IEPfKRuvJRSjoxu22GZhb3czfVHsAy0s?usp=sharing to guide you through this process.

Could you please confirm whether you are still okay with finishing this PR? Sorry again.

Best regards,
Adel

@Anwarvic (Author)

Hi @Adel-Moumen,

Thank you for the update and no worries about the delay. I appreciate your feedback and not giving up on my PR.

I'm still committed to finishing this PR, and I'll ensure it doesn't conflict with the develop branch, as you kindly pointed out.

If you don't mind, I will do this by the end of next week (26/01/2024) as I have things on my plate at the moment. Would that be ok?

Thanks!

@Adel-Moumen (Collaborator)

(quoting @Anwarvic's message above)

Would be wonderful. Please ping me if you need some help in the process. :-)

A collaborator commented on this hunk in a README:

```
pip install numba
```

If you are planning to use FastRNNT loss function, install `FastRNNT`:
```
pip install FastRNNT
```

Shouldn't this be fast-rnnt? I get a 404 on PyPI when checking for FastRNNT.
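(For what it's worth, pip normalizes hyphens and underscores in project names under PEP 503, so the install command used earlier in this thread works spelled either way, while FastRNNT normalizes to a different, nonexistent name:)

```
pip install fast_rnnt   # equivalent to: pip install fast-rnnt (PEP 503 normalization)
```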

Labels: enhancement (New feature or request)
Projects: none yet
Linked issues: none
6 participants