
Switchboard Recipe #1460

Merged
merged 45 commits into speechbrain:develop on Nov 21, 2022

Conversation

dwgnr
Contributor

@dwgnr dwgnr commented Jun 21, 2022

Hey everybody,

I made a recipe for the Switchboard corpus.
The data preparation steps mostly follow Kaldi's s5c recipe.

The recipe includes the following models:

ASR

  • CTC: Wav2Vec2 Encoder + CTC Decoder (adapted from the CommonVoice recipes)
  • seq2seq: CRDNN encoder + GRU Decoder + Attention (adapted from the LibriSpeech recipe)
    • Note: Unlike the LibriSpeech recipe, this system does not include any LM. In fact, every LM I tried (pretrained, finetuned, or trained from scratch) made the performance much worse
  • transformer: Transformer model + LM (adapted from the LibriSpeech recipe)

LM

  • There are two hparams files for finetuning existing LibriSpeech LMs on Switchboard and Fisher data, one for an RNNLM and the other for a Transformer LM

Tokenizer

  • Basic Sentencepiece Tokenizer training on Switchboard and Fisher data

Performance
The model performance is as follows:

| Model                        | Swbd WER | Callhome WER | Eval2000 WER |
|------------------------------|----------|--------------|--------------|
| CTC                          | 21.35    | 28.32        | 24.91        |
| seq2seq                      | 25.37    | 36.87        | 29.33        |
| Transformer (LibriSpeech LM) | 22.00    | 30.12        | 26.14        |
| Transformer (Finetuned LM)   | 21.11    | 29.43        | 25.36        |

As you can see, the performance is currently comparable to Kaldi's chain systems without i-vectors.
However, the models need some refinement to be on par with the best available Kaldi systems (WER should be around 18 on the full eval2000 test set).

If you have any suggestions for improvements, I'd be happy to implement them.

I can also provide the trained models in case you are interested (I might need some help with this whole Huggingface thing though).

Best,
Dominik

PS: Thanks for all the great work you've done here! :)

@TParcollet
Collaborator

Hi! Thanks for this huge contribution. Please start by solving all the tests, and then we will proceed with a proper review :-)

You can install the pre-commit and pre-push hooks quite easily following the doc!

Also, do you know how your numbers compare to ESPnet for instance? We can easily discuss ways of improving the performance.

@mravanelli
Collaborator

Thank you for this contribution. One tip: you can also run the tests locally with `pytest`, `tests/.run-linters.sh` (for style issues), and `.run-unittests.sh` (for unit tests). You can fix most of the style issues in Python files with `black` (see https://speechbrain.readthedocs.io/en/latest/contributing.html)

@mravanelli mravanelli added the enhancement New feature or request label Jun 21, 2022
@mravanelli mravanelli requested a review from anautsch June 21, 2022 20:40
@mravanelli
Collaborator

I would also suggest merging the latest version of the development branch here. We just merged some additional checks and tests that are needed in this case as well.

@dwgnr
Contributor Author

dwgnr commented Jun 22, 2022

Hi,

I merged with the latest development branch, ran the linters and made the appropriate changes, so the formatting should be ok now.

Unfortunately, my local unittests kept crashing in tests/unittests/test_augment.py and tests/unittests/test_ctc_segmentation.py with this error:

ImportError while importing test module '/nfs/scratch/staff/wagnerdo/speechbrain-fork-2/tests/unittests/test_ctc_segmentation.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
tests/unittests/test_ctc_segmentation.py:1: in <module>
    from speechbrain.pretrained import EncoderDecoderASR
speechbrain/__init__.py:4: in <module>
    from .core import Stage, Brain, create_experiment_directory, parse_arguments
speechbrain/core.py:35: in <module>
    from speechbrain.utils.distributed import run_on_main
speechbrain/utils/__init__.py:11: in <module>
    from . import *  # noqa
speechbrain/utils/profiling.py:11: in <module>
    from torch.autograd.profiler_util import (  # pytorch v1.10.1
E   ModuleNotFoundError: No module named 'torch.autograd.profiler_util'

I'm not sure whether it's a local issue with my environment or a broader problem.
In any case, it doesn't look like it has anything to do with the Switchboard recipe in particular.

@TParcollet: ESPNet reports WERs of 15.6 (Callhome), 8.4 (Swbd), and 12.0 (eval2000) for their latest transformer model (see https://github.com/espnet/espnet/blob/master/egs2/swbd/asr1/RESULTS.md), so there is definitely room for improvement.

@mravanelli
Collaborator

Thank you @dwgnr! It looks like the performance is not too good at the moment. As far as I remember, Switchboard recordings are sampled at 8 kHz (and not 16 kHz like the other recipes). Did you take this into account? Also, I think a significant improvement can be achieved by adding a language model, right?

@dwgnr
Contributor Author

dwgnr commented Jun 22, 2022

Hi @mravanelli,
thanks for kicking off the checks once again.

Yes, Switchboard is sampled at 8kHz and I did indeed consider this in various ways.
For the CTC recipe, everything is simply upsampled to 16 kHz.
This worked much better than passing the 8 kHz data to the pretrained wav2vec model.
The WERs with 8 kHz data are 35.70 (Swbd), 52.30 (Callhome), and 43.05 (eval2000).

The seq2seq model is trained on 8 kHz audio. I also tried upsampling to 16 kHz but it didn't make much difference.

For the transformer model, the audio data is currently upsampled to 16 kHz, but I haven't tried training it on 8 kHz yet.
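The upsampling itself is conceptually simple; here is a naive pure-Python sketch of 8 kHz to 16 kHz conversion by linear interpolation (illustration only; in practice you would use a proper resampler such as torchaudio's `Resample`, which applies a low-pass interpolation filter):

```python
def upsample_2x(samples):
    """Naive 2x upsampling (8 kHz -> 16 kHz) by linear interpolation.
    Illustration only; a real resampler uses a low-pass interpolation filter."""
    if len(samples) < 2:
        return list(samples)
    out = []
    for a, b in zip(samples, samples[1:]):
        out.append(a)
        out.append((a + b) / 2.0)  # midpoint between neighboring samples
    out.append(samples[-1])
    return out

print(upsample_2x([0.0, 1.0, 0.0]))  # [0.0, 0.5, 1.0, 0.5, 0.0]
```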

I fully agree, adding a decent language model should definitely lead to improvements.
Unfortunately, the opposite is the case at the moment :/

I tried a lot of different things with RNNLMs on the seq2seq model (e.g. using the pretrained LibriSpeech LM, training a simple LSTM LM from scratch, finetuning the pretrained LibriSpeech LM), and I was careful to use the correct tokenizer for each LM.
However, the performance always got worse compared to the variant without an LM.

Are there any known issues with the S2SRNNBeamSearchLM decoder class?
Or maybe some parameters just work for LibriSpeech but not so well for Switchboard?

When looking through the detailed results, I noticed that with the LM in place, the decoder seems to be very eager to output more than just a single word, which leads to a lot of insertions.

For example with LM:

================================================================================
en_4938_A_1206, %WER 100.00 [ 1 / 1, 1 ins, 0 del, 0 sub ]
UHHUH ; <eps>
  =   ;   I
UHHUH ;   U
================================================================================
en_6189_B_1892, %WER 300.00 [ 3 / 1, 3 ins, 0 del, 0 sub ]
IT ; <eps> ; <eps> ; <eps>
=  ;   I   ;   I   ;   I
IT ;   IT  ;   IT  ;   IT
==============================================================================
en_6489_A_2236, %WER 500.00 [ 5 / 1, 5 ins, 0 del, 0 sub ]
OH ; <eps> ; <eps> ; <eps> ; <eps> ; <eps>
=  ;   I   ;   I   ;   I   ;   I   ;   I
OH ;   OH  ;   OH  ;   OH  ;   OH  ;   OH

This phenomenon is less pronounced for the decoder without LM:

================================================================================
en_4938_A_1206, %WER 0.00 [ 0 / 1, 0 ins, 0 del, 0 sub ]
uhhuh
  =
uhhuh
================================================================================
en_6189_B_1892, %WER 100.00 [ 1 / 1, 1 ins, 0 del, 0 sub ]
it ; <eps>
=  ;   I
it ;   it
================================================================================
en_6489_A_2236, %WER 0.00 [ 0 / 1, 0 ins, 0 del, 0 sub ]
oh
=
oh
================================================================================

There are actually quite a lot of very short utterances in the corpus, so these errors quickly accumulate.
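For reference, the ins/del/sub counts in these reports come from a standard Levenshtein word alignment; a minimal sketch of the counting (not SpeechBrain's actual edit-distance code):

```python
def wer_counts(ref, hyp):
    """Minimal Levenshtein word alignment.
    Returns (errors, substitutions, insertions, deletions)."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = best (errors, subs, ins, dels) aligning ref[:i] with hyp[:j]
    dp = [[None] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        dp[i][0] = (i, 0, 0, i)  # delete every reference word
    for j in range(1, m + 1):
        dp[0][j] = (j, 0, j, 0)  # insert every hypothesis word
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            e, s, ins, d = dp[i - 1][j - 1]
            match_or_sub = (e, s, ins, d) if ref[i - 1] == hyp[j - 1] \
                else (e + 1, s + 1, ins, d)
            e, s, ins, d = dp[i][j - 1]
            insertion = (e + 1, s, ins + 1, d)
            e, s, ins, d = dp[i - 1][j]
            deletion = (e + 1, s, ins, d + 1)
            dp[i][j] = min(match_or_sub, insertion, deletion)
    return dp[n][m]

# "IT" recognized as "IT IT IT IT": 3 insertions, i.e. %WER 300 on 1 ref word
print(wer_counts(["it"], ["it", "it", "it", "it"]))  # (3, 0, 3, 0)
```

On a one-word reference, a single spurious word already means 100% WER, which is why these short utterances dominate the error rate.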

The problem also seems to invert for long utterances.

For example with LM:

================================================================================
en_6183_B_1770, %WER 20.51 [ 8 / 39, 0 ins, 3 del, 5 sub ]
BUT ; I ; REMEMBER ; WHEN ; I ; FIRST ; MET ; YOU ; THOUGH ; AND ; YOU ; WERE ; REAL ; POSITIVE ; ABOUT ; THE ; WHOLE ; THING ;  AND  ; I ; THOUGHT ; SHE ; WANTS  ; TO ; BE ; A ; DIETICIAN ;  SHE  ;   IS  ; CRAZY ;  THERE  ;   IS  ; SO ; MANY ; OTHER ; THINGS ; SHE ; COULD ; DO
 =  ; = ;    =     ;  =   ; = ;   =   ;  =  ;  =  ;   =    ;  =  ;  =  ;  =   ;  =   ;    =     ;   =   ;  =  ;   =   ;   =   ;   D   ; = ;    =    ;  =  ;   S    ; =  ; =  ; = ;     S     ;   S   ;   D   ;   =   ;    S    ;   D   ; =  ;  =   ;   =   ;   =    ;  S  ;   =   ; =
BUT ; I ; REMEMBER ; WHEN ; I ; FIRST ; MET ; YOU ; THOUGH ; AND ; YOU ; WERE ; REAL ; POSITIVE ; ABOUT ; THE ; WHOLE ; THING ; <eps> ; I ; THOUGHT ; SHE ; WANTED ; TO ; BE ; A ; DIETITIAN ; SHE'S ; <eps> ; CRAZY ; THERE'S ; <eps> ; SO ; MANY ; OTHER ; THINGS ; YOU ; COULD ; DO
================================================================================

And without LM:

en_6183_B_1770, %WER 148.72 [ 58 / 39, 47 ins, 1 del, 10 sub ]
but ; i ; remember ; when ; i ; first ;  met  ; you ; though ; and ; you ; were ; real ; positive ; about ; the ; whole ; thing ;  and  ; i ; thought ; <eps> ; she ; wants ; to ; be ; a ; dietician  ; she ;   is  ; crazy ; there ;    is   ; so ; many ; other ; things ; she ; could ; do ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps> ; <eps>
 =  ; = ;    =     ;  =   ; = ;   =   ;   S   ;  S  ;   S    ;  =  ;  =  ;  =   ;  =   ;    =     ;   =   ;  =  ;   =   ;   =   ;   D   ; = ;    =    ;   I   ;  =  ;   =   ; =  ; =  ; = ;     S      ;  S  ;   S   ;   =   ;   S   ;    S    ; =  ;  =   ;   =   ;   =    ;  S  ;   S   ; =  ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I   ;   I
but ; i ; remember ; when ; i ; first ; moved ;  to ;  utah  ; and ; you ; were ; real ; positive ; about ; the ; whole ; thing ; <eps> ; i ; thought ;  well ; she ; wants ; to ; be ; a ; technician ; and ; she's ; crazy ;  and  ; there's ; so ; many ; other ; things ; you ;  can  ; do ;   do  ;  you  ;   do  ;   do  ;  you  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do  ;   do

Do you have any ideas what could cause these weird insertions?
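To illustrate my suspicion (a toy sketch, not the actual S2SRNNBeamSearchLM code): with shallow fusion, an over-weighted LM can outvote the acoustic model's preference to emit end-of-sentence, which would produce exactly this kind of trailing insertion:

```python
import math

def fuse(logp_am, logp_lm, lm_weight):
    # shallow fusion: acoustic log-prob plus weighted LM log-prob per token
    return {t: logp_am[t] + lm_weight * logp_lm[t] for t in logp_am}

# toy next-token scores after a short utterance: the acoustic model wants
# to stop, while the LM strongly prefers continuing with another word
logp_am = {"</s>": math.log(0.6), "it": math.log(0.4)}
logp_lm = {"</s>": math.log(0.1), "it": math.log(0.9)}

best = lambda scores: max(scores, key=scores.get)
print(best(fuse(logp_am, logp_lm, 0.0)))  # </s>  (no LM: decoding stops)
print(best(fuse(logp_am, logp_lm, 1.0)))  # it    (LM keeps inserting words)
```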

@mravanelli
Collaborator

As for the RNNLM and TransformerLM recipes, these are the same ones we use for LibriSpeech (where we noticed quite a big improvement when using the LM). As for the decoding part, one thing that might be important is tuning the beam searcher's hyperparameters a bit. In my experience, this is really important. Fortunately, you don't have to rerun training to tune these hyperparameters, only the final decoding part (ideally on the validation set, not the test set).
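The decode-only tuning could be as simple as the following sketch, where `evaluate_wer` is a hypothetical stand-in for rerunning just the recipe's decoding stage with overridden hyperparameters (it is not a real SpeechBrain function):

```python
def tune_beamsearch(evaluate_wer, lm_weights, beam_sizes):
    """Grid search over beam-search hyperparameters on the validation set."""
    best = None
    for lm_w in lm_weights:
        for beam in beam_sizes:
            wer = evaluate_wer(lm_weight=lm_w, beam_size=beam)
            if best is None or wer < best[0]:
                best = (wer, lm_w, beam)
    return best  # (best_wer, best_lm_weight, best_beam_size)

# toy stand-in that pretends WER is minimized at lm_weight=0.3, beam_size=40
toy = lambda lm_weight, beam_size: abs(lm_weight - 0.3) + abs(beam_size - 40) / 100
print(tune_beamsearch(toy, [0.0, 0.3, 0.6], [20, 40, 60]))  # (0.0, 0.3, 40)
```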

@anautsch
Collaborator

anautsch commented Jul 1, 2022

@dwgnr thank you for this contribution!
It all looks very clean, but there are pre-commit check issues with the YAML files. Please also check for errors from the consistency checks.

ERROR: The file recipes/Switchboard/LM/hparams/transformer_small.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/Switchboard/Tokenizer/hparams/2K_unigram_subword_bpe.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/Switchboard/ASR/transformer/hparams/transformer_small_lm_after_cleanup.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/Switchboard/ASR/transformer/hparams/transformer_finetuned_LM.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/Switchboard/LM/hparams/transformer.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/Switchboard/ASR/transformer/hparams/transformer_small_lm.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md
ERROR: The file recipes/Switchboard/ASR/seq2seq/hparams/train_BPE_1000_own_tokenizer.yaml is not listed in tests/recipes.csv. Please add it. For more info see tests/consistency/README.md

ERROR: variable "ctc_weight_decode" not used in recipes/Switchboard/ASR/seq2seq/train.py!

As mentioned above, before the final commit, please use !PLACEHOLDER for data_folders.

Btw - do you still get E ModuleNotFoundError: No module named 'torch.autograd.profiler_util' ?
(It might be a pytorch version issue.)

@anautsch
Collaborator

anautsch commented Oct 7, 2022

@Adel-Moumen @TParcollet what's your take - should we merge without having been able to debug/test recipes, since Switchboard is with restricted access?

(From my perspective, we should merge, and if someone faces issues, we would welcome their PR. There are few other pragmatic means to wrap this PR up, and there are ample contributions in here already.)

@dwgnr
Contributor Author

dwgnr commented Oct 7, 2022

In case it helps your decision:
I'm happy to help fix any bugs related to the recipe that someone finds after the merge 🙂

@anautsch
Collaborator

Hi @dwgnr, getting back to this. As there are no responses thus far, I'll proceed with this my way. I consider your line of work complete; we just face some technical issues here, so let's tackle them.

Please let me resolve some side issues (i.e. please do not push meanwhile):

  • reverting (also correct but different) formatting of PR-unconcerned files
  • integration testing of proposed model(s) w/o database-depending dataio & prepare scripts
  • the recipes.csv file just causes merge conflicts across all PRs (effectively putting contributors in competition)

For the latter, I still have to make up my mind about what a viable step forward would be. Perhaps I should work this out in another, new PR and then come back here. (Meanwhile, I'm working on your PR's integration in the then-new recipe integration testing PR.)
We lack some general recipe testing (e.g. functionality check when refactorings are coming up).

For the former, I looked into different options and will push one solution in a bit:

  • rebase -> we will not break the git tree
  • copy/paste from a more recent develop status -> git tree consistency is core
  • seek highest top-level folders with reformattings-only below -> take earlier commit ID -> checkout (recursively) that folder at that commit ID

So that I do not need to go over that again, please do not re/format &/or push to this branch ;-)

@dwgnr
Contributor Author

dwgnr commented Oct 11, 2022

Hi @anautsch, I won't do anything to the branch, I promise ;-)

Let me know if it gets too messy with all the reformatting issues.
I could simply open a new PR. Maybe that's cleaner and less complicated?

@anautsch
Collaborator

@dwgnr 🤣

look at the stats of your PR:
+4,802 −1
=> this is your work shining ;-)

From my side, the PR is good as it stands; its environment needs to be brought up to speed, and then some aligning things need to happen. That's all the patience I'm asking for.

@Adel-Moumen
Collaborator

Hello,

The three models have been merged in the HF SB hub.

We are now waiting for this PR to be merged so that we can then switch the visibility of the models to 'public' on the HF hub :-)

Many thanks, @dwgnr for your massive work! 😎

@mravanelli
Collaborator

Hi @dwgnr,
thank you for your contribution. Could you please fix the conflict?

@dwgnr
Contributor Author

dwgnr commented Nov 2, 2022

Hi @mravanelli, fixed it.

@mravanelli
Collaborator

Thank you! @anautsch do you think everything is ready to be merged?

@anautsch
Collaborator

anautsch commented Nov 3, 2022

Hi @mravanelli - the PR lgtm as said before.
Since I do not have access to the same data, I couldn't test everything, so I went for another strategy: the recipe testing PR. (After that, I wanted to merge this one using the tools from there, knowing these recipes are sanity-checked and join a healthy base.) I still need to figure out the test debug flags, as in here (for example):
https://github.com/speechbrain/speechbrain/blob/15819a16f903523d384e3231731fc83e30bce8cc/tests/recipes/AISHELL-1.csv

@dwgnr I'll attend to this error first; it's coming up across all PRs right now (I suspect some dependency updated and broke things). Then let's get this done already. I didn't think about simply working out the test debug flags before, sorry; that would already have satisfied what I need.

@mravanelli
Collaborator

@dwgnr, could you please merge here the latest version of the develop branch?

@dwgnr
Contributor Author

dwgnr commented Nov 3, 2022

Hi @mravanelli, I merged the latest version and it auto-closed the PR. Was it supposed to do that?

@dwgnr dwgnr reopened this Nov 3, 2022
Collaborator

@anautsch anautsch left a comment


Hi @dwgnr the same happened with me in another PR; don't worry, re-opening is just fine.
(happens.)

@anautsch
Collaborator

Hi @dwgnr thank you for updating the latest merge conflict.

Please take a look at the test_debug_flags in
https://github.com/anautsch/speechbrain/blob/refactor-recipe-testing/tests/recipes/CommonVoice.csv
or
https://github.com/anautsch/speechbrain/blob/refactor-recipe-testing/tests/recipes/LibriSpeech.csv

Is there a 'minimal' set of test flags for your scripts, so we can test them without having the actual data available?

It's using minimal data available at: https://github.com/speechbrain/speechbrain/tree/develop/tests/samples

If you could provide the test_debug_flags in addition to each added recipe, that would help me.

The PR #1600 demands more time from me than I expected. In there, the testing is made database-specific and available for all recipes, in the sense that we also know the workflows are still functional (think: dependencies may update and their interfaces could break over time).

@dwgnr
Contributor Author

dwgnr commented Nov 17, 2022

Hi @anautsch, no problem. I'll see if I can find a minimal set of test_debug_flags for the recipes.

@anautsch
Collaborator

Hope this command helps you to just run the checks for your recipes

python -c 'from speechbrain.utils.recipe_tests import run_recipe_tests; print("TEST FAILED!") if not(run_recipe_tests(filters_fields=["Dataset"], filters=["Switchboard"], do_checks=False)) else print("TEST PASSED")'

@dwgnr
Contributor Author

dwgnr commented Nov 17, 2022

Hi @anautsch, I managed to get the tests running under the new recipe testing environment 🎉
It would require some small adjustments in #1600 as well as some changes in this PR.

In #1600, tests/samples/annotation/ASR_train.csv would need two extra columns called channel and words and to make the LM training work, the dev and train corpus files would need to be CSVs instead of plaintext.

I attached the files below:
Switchboard.csv
LM_dev.csv
LM_train.csv
ASR_train.csv

I pushed the changes related to the recipes here. Hope that's fine.

@anautsch
Collaborator

Hi @dwgnr - that's a great help, thank you!

Your recipe should not need to change much; adequate dummy data just needs to be there, which I think is what you are referring to.
The changes you brought forth will impact the other PR; that's good!

Right now, there is no merge conflict with develop. As for the other PR, I'll need to update it to include recipes that were released meanwhile, just like this one. The changes you made look like what I needed to do with other recipes, too. I'll need to get back to this when wrapping up the other PR. Meanwhile:

lgtm

@anautsch anautsch merged commit 5c338c4 into speechbrain:develop Nov 21, 2022
@dwgnr
Contributor Author

dwgnr commented Nov 21, 2022

Hi @anautsch, thanks for merging! This was a lot of fun :-)
Let me know if users encounter any problems with the recipe.

@Adel-Moumen
Collaborator

I made the models public on the Hugging Face hub. Thanks a lot again for your work!! :-)

Labels: enhancement (New feature or request)
6 participants