Integrating Pruned Fast RNNT with Transducer + new recipe for mTEDx dataset #1465
base: develop
Conversation
…ansducerBeamSearcher with one overridden function
Thank you @Anwarvic, could you please report there the current results? In particular, it is interesting to see the performance/speed comparison between the torchaudio loss and FastRNNT (at least with your current implementation).
Hi @mravanelli, the following is a comparison between the
All these four models were trained using the same architecture with the same hyper-parameters. The only difference was the batch sizes as shown in the following table:
Also, I don't know if it's fair to compare the
We generally give the pruned loss a scale of zero in the loss function during a warmup period, to give time for the simple loss to learn the alignment. The length of the warmup period depends on the learning rate schedule; normally it would be about the same amount of time as it takes for the simple loss, when normalized by the number of frames, to go below 0.5 or so.
Yes, @mravanelli! I'm just waiting for a few more epochs before reporting the results.
Hi @danpovey @mravanelli, the model's CER & WER didn't get any better even after using warmup. I trained two models on the same dataset using the same hyper-parameters. Both models' WER didn't improve, always stuck at 99.99.
What warmup schedule did you use, i.e. how many batches does the warmup last? The setup of ours where warmup takes 3000 batches used a reworked conformer model that takes a "warmup" option, and a different optimizer, and it converges initially about 10 times faster than the conventional conformer. You will likely need to warm up for way longer if you are using the regular conformer and the normal learning-rate schedule for transformers. You should notice a kind of knee in the loss function where it starts to learn the alignment and starts to decrease fairly rapidly (i.e. in your normal model training). That should tell you how long the warmup needs to be, approximately.
@danpovey, really appreciate your quick responses. And sorry about that. I should've provided more details.
The same one as implemented here, which can be summarized in the following equation:

$$\text{loss} = 0.5 \cdot \text{simple loss} + \text{pruned scale} \cdot \text{pruned loss}$$

In this implementation, I used $\text{warmup epochs} = 2$, which is around
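The warmup idea described above (pruned-loss scale held at zero, then ramped up) could be sketched as follows. The function names and the linear ramp length are assumptions for illustration, not the actual k2/icefall schedule:

```python
def pruned_loss_scale(step: int, warmup_steps: int, max_scale: float = 1.0) -> float:
    """Scale for the pruned loss: zero during warmup, then a linear ramp.

    Ramping over a second warmup-length period is an assumption for
    illustration; the real recipes use their own schedules.
    """
    if step < warmup_steps:
        return 0.0
    progress = (step - warmup_steps) / warmup_steps
    return max_scale * min(1.0, progress)


def combined_loss(simple_loss: float, pruned_loss: float,
                  step: int, warmup_steps: int) -> float:
    # loss = 0.5 * simple_loss + pruned_scale * pruned_loss
    return 0.5 * simple_loss + pruned_loss_scale(step, warmup_steps) * pruned_loss
```

During warmup only the simple loss contributes, so the model can learn alignments before the pruned loss is switched on.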
In my implementation, I'm not using conformers. I'm using a CRDNN encoder, which is basically a combination of CNNs, RNNs, and DNNs. I'm using
Based on my understanding, the plan now is as follows:
Am I correct?
Instead of "train the un-warmed model..", what I meant is, presumably you had some baseline model (without the pruned RNN-T loss) that was converging. I meant that model. But I would say set the warmup period to, say, 8 epochs, based on the CER curve for your baseline model. Your baseline model seems to be getting to an OK point in 4 or 5 epochs; but you are using larger batch sizes with the pruned model, and to some extent the initial convergence depends on the number of batches and not just the amount of data seen, so I'm suggesting to increase the number of warmup epochs because of that. In a setup with more data you wouldn't need to warm up for so long. BTW, the
By the way, I had a look at the adadelta paper at some point and could not make much sense of it; it seems to me an extremely strange update rule. I am a little concerned that it may interact badly with a setup where some parts of the loss function are only introduced after a delay. That is, it's not clear to me that it would behave in a reasonable way if your gradients are set to zero for some initial period for some of the parameters. You could perhaps have a look at the update and try to convince yourself one way or another. But adagrad may be safer here; you'd have to choose a learning rate schedule. (Presumably the transformer one, with warmup period, is not necessary here since this is not a transformer model.)
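For reference, the Adadelta update being discussed can be written out in a few lines. This is a plain-Python sketch of Zeiler's 2012 formulation, not SpeechBrain's or PyTorch's optimizer code, to illustrate the concern about delayed gradients:

```python
import math

def adadelta_step(param, grad, state, rho=0.95, eps=1e-6):
    # One Adadelta update (Zeiler, 2012). If grad == 0 throughout a long
    # warmup, both running averages stay at (or decay toward) zero, and
    # the first nonzero gradient then produces a step whose size is
    # roughly sqrt(eps / (1 - rho)), almost independent of the
    # gradient's actual magnitude.
    state["avg_sq_grad"] = rho * state["avg_sq_grad"] + (1 - rho) * grad ** 2
    rms_delta = math.sqrt(state["avg_sq_delta"] + eps)
    rms_grad = math.sqrt(state["avg_sq_grad"] + eps)
    delta = -(rms_delta / rms_grad) * grad
    state["avg_sq_delta"] = rho * state["avg_sq_delta"] + (1 - rho) * delta ** 2
    return param + delta, state
```

With zero gradients the parameter is untouched, but the accumulator state carries no useful scale information into the post-warmup phase, which is the interaction being questioned.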
Hi @Anwarvic hope you are doing well. I've minor comments only. @TParcollet worked more closely with you on this one. Are more contributions pending?
Hi, @anautsch! Yes, there are some that I was intending to add but didn't have the time. I think I can add them to this PR by this weekend inshallah.
Hi @Anwarvic, thank you for the updates!
(Appreciating the edit note in your original post :)
It's more comments & curiosity from my end.
There were some warnings/errors in testing.
Please run `pre-commit run --all-files` to let it fix linters automatically.
From the testing log:
```
=========================== short test summary info ============================
FAILED tests/consistency/test_docstrings.py::test_recipe_list - AssertionErro...
FAILED tests/consistency/test_recipe.py::test_recipe_list - AssertionError: a...
Checking ./speechbrain/nnet/transducer/transducer_joint.py...
ERROR: The class Fast_RNNT_Joiner(nn.Module) in ./speechbrain/nnet/transducer/transducer_joint.py has no docstring. Please write it. For more info, see tests/consistency/DOCSTRINGS.md
----------------------------- Captured stdout call -----------------------------
ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train_pruned.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/mTEDx/ASR/CTC/hparams/train_wav2vec.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train_unpruned.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/mTEDx/LM/RNNLM/hparams/train.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/mTEDx/Tokenizer/hparams/1K_unigram_subword_bpe.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/mTEDx/ASR/Transducer/hparams/train_wav2vec_pruned.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/mTEDx/ASR/CTC/hparams/train_xlsr.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
```
(It skipped doc, unit & integration tests; they'll still have to be run after addressing the consistency checks.)
Are you satisfied with the (pre)trained models?
Hi @anautsch, I think all the above problems are resolved now.
…new signature for transducer_loss() function
Thanks for the updates @Anwarvic!
There's more roadwork ahead :)
The Tokenizer recipe has no README.
It might be an upper-case issue here with 'Transducer' (?):
```
ERROR: The file recipes/mTEDx/ASR/transducer/train.py listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/train.py listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/train.py listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/train_wav2vec.py listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train.yaml listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train_pruned.yaml listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train_unpruned.yaml listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/hparams/train_wav2vec_pruned.yaml listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!
ERROR: The file recipes/mTEDx/ASR/transducer/README.md listed in tests/recipes.csv does not exist!
...
ERROR: The recipe recipe0147 does not contain a Script_file. Please add it!
ERROR: The recipe recipe0147 does not contain a Hparam_file. Please add it!
ERROR: The recipe recipe0147 does not contain a Readme_file. Please add it!
ERROR: The recipe recipe0148 does not contain a Script_file. Please add it!
ERROR: The recipe recipe0148 does not contain a Hparam_file. Please add it!
ERROR: The recipe recipe0148 does not contain a Readme_file. Please add it!
ERROR: The recipe recipe0149 does not contain a Script_file. Please add it!
ERROR: The recipe recipe0149 does not contain a Hparam_file. Please add it!
ERROR: The recipe recipe0149 does not contain a Readme_file. Please add it!
ERROR: The recipe recipe0150 does not contain a Script_file. Please add it!
ERROR: The recipe recipe0150 does not contain a Hparam_file. Please add it!
ERROR: The recipe recipe0150 does not contain a Readme_file. Please add it!
...
E FileNotFoundError: [Errno 2] No such file or directory: 'recipes/mTEDx/ASR/transducer/README.md'
E FileNotFoundError: [Errno 2] No such file or directory: 'recipes/mTEDx/ASR/transducer/hparams/train.yaml'
```
Please remove unused variables (or use them)
```
Checking recipes/mTEDx/Tokenizer/hparams/1K_unigram_subword_bpe.yaml...
ERROR: variable "train_log" not used in recipes/mTEDx/Tokenizer/train.py!
Checking recipes/mTEDx/LM/RNNLM/hparams/train.yaml...
ERROR: variable "blank_index" not used in recipes/mTEDx/LM/RNNLM/train.py!
```
…ifferent datasets
@anautsch, sorry for the late reply. I have resolved all issues mentioned above. Please feel free to get back to me once other issues occur.
Hi @Anwarvic, there were errors with the consistency checking.
Please run the checks: some recipe pointers are not set correctly (a case-sensitive path, perhaps), and there are unused variables.
@anautsch, passed all tests!
@Anwarvic there's an integration test error:
`tests/integration/ASR_Transducer/example_asr_transducer_experiment.py F [ 18%]`
It's about the interface change you put in:
`E TypeError: transducer_loss() got an unexpected keyword argument 'use_torchaudio'`
This implies that everyone who is relying on `use_torchaudio` will have to change all their code when this PR goes through as well. Idk, this might be better for a major version change. @TParcollet @mravanelli any thoughts?
Situation:
@Anwarvic came across that there is more than one framework possible to use here, thus proposed a change to:
speechbrain/speechbrain/nnet/losses.py
Line 56 in b2aedf5
'speechbrain' | 'torchaudio' | 'fast_rnnt' (default: torchaudio).
We talked about the impact of these changes here, since @Anwarvic already proposed depending updates to all recipes:
https://github.com/speechbrain/speechbrain/pull/1465/files#r979478055
Yet, this remains somewhat an interface change. How about this: should both (non-mandatory) options be allowed until a major version eventually drops the `use_torchaudio` flag?
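One way to allow both options until a major release could be a small compatibility shim. This is a sketch under assumed names; the real `transducer_loss` takes tensors and lengths, while this stub only shows the flag handling:

```python
import warnings

VALID_FRAMEWORKS = ("speechbrain", "torchaudio", "fast_rnnt")

def resolve_framework(framework="torchaudio", use_torchaudio=None):
    # Hypothetical compatibility shim: accept the legacy boolean flag,
    # warn that it is deprecated, and map it onto the new argument.
    if use_torchaudio is not None:
        warnings.warn(
            "`use_torchaudio` is deprecated; pass `framework=...` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        framework = "torchaudio" if use_torchaudio else "speechbrain"
    if framework not in VALID_FRAMEWORKS:
        raise ValueError(
            f"framework must be one of {VALID_FRAMEWORKS}, got {framework!r}"
        )
    return framework
```

Old recipes passing `use_torchaudio` would keep working with a `DeprecationWarning` until the flag is removed in a major version.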
```yaml
# Data files
data_folder: !PLACEHOLDER
langs:
  - !PLACEHOLDER
```
`fr`?
Hello @Anwarvic,

Sorry for the very late reply. We are still looking forward to merging your PR as it is a very nice contribution. Unfortunately, due to the release of many PRs in

Could you please confirm with me if you are still okay with finishing this PR? Sorry again.

Best regards,
Hi @Adel-Moumen,

Thank you for the update, and no worries about the delay. I appreciate your feedback and not giving up on my PR. I'm still committed to finishing this PR, and I'll ensure it doesn't conflict with

If you don't mind, I will do this by the end of next week (26/01/2024) as I have things on my plate at the moment. Would that be ok? Thanks!
Would be wonderful. Please ping me if you need some help in the process. :-)
```
pip install numba
```

If you are planning to use the FastRNNT loss function, install `FastRNNT`:

```
pip install FastRNNT
```
Shouldn't this be `fast-rnnt`? I get a 404 on PyPI when checking for `FastRNNT`.
Hi all,
In this pull request, I'm proposing basically two things:
The following steps are a way to use the new changes and make merging this PR as easy as possible:
Download the mTEDx data. Check this README file:
Install Fast-RNNT:
Train a tokenizer & language model using the data. For reproducibility, download the tokenizer (from here) and RNNLM (from here).
Now, open the `./speechbrain/recipes/mTEDx/ASR/Transducer/hparams/train_pruned.yaml` file and set these variables. Then, to train the Transducer on mTEDx-French using the pruned loss function, run the following command:
The resulting model should be found in `./results/mTEDx_fr/CASCADE/CRDNN_pruned`.
EDIT (18/09/2022)
Starting from here, I will mention the most recent updates to this PR:
- Changed the `transducer_loss` function to accept a `framework` argument instead of `use_torchaudio`.
- Renamed the `use_torchaudio` variable to `framework` in the transducer YAML files.
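As a rough illustration of the YAML rename (the surrounding keys and the chosen default are assumptions, not an excerpt from the actual recipe files):

```yaml
# Before (hypothetical excerpt from a transducer hparams file):
# use_torchaudio: True

# After the rename:
framework: torchaudio   # one of: speechbrain | torchaudio | fast_rnnt
```

Existing recipes would switch the boolean flag for the string-valued `framework` key, matching the new `transducer_loss` signature.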