Dear authors,
Thanks for the great work. I have two questions about the paper.
In Section 4.1 about the experimental setup, it's written: For both AVSR and AVST, we use an English AV-HuBERT large pre-trained model [3], which is trained on the combination of LRS3-TED [8] and the English portion of VoxCeleb2 [27]. We follow [3] for fine-tuning hyper-parameters, except that we fine-tune our bilingual models to 30K updates and our multilingual AVSR model to 90K updates.
I would like to ask: how many warmup_steps, hold_steps, and decay_steps did you use, and how many freeze_finetune_updates did you set? The original configuration file for the large model uses 60k updates, so these hyperparameters may need to be adjusted if max_updates is reduced to 30k.
My second question is about punctuation removal and lowercasing before calculating WER. I also noticed some special tokens in the dictionary, e.g. the music token ♪. Which tokens did you remove, and how?
I'm looking forward to your reply. Thank you in advance :)
Best regards,
Zhengyang
For your reference, the following are the answers to your questions:
how many warmup_steps?
10,000 steps
how many hold_steps?
always 0
how many decay_steps?
20,000 steps
And how many freeze_finetune_updates did you set?
The non-English models used 4,000 steps out of 30,000 total updates, and the English model used 24,000 steps out of 90,000 total updates.
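For anyone adapting the config, here is a minimal sketch of how these values could be expressed as overrides on the large-model fine-tuning YAML. The field names follow fairseq's tri_stage LR scheduler and the AV-HuBERT fine-tuning configs; the exact nesting and all remaining fields should be taken from the original configuration file, so treat this as an illustration rather than the authors' exact setup.

```yaml
# Sketch of the relevant overrides for a 30K-update bilingual fine-tuning run.
# Only the fields discussed above are shown; everything else comes from the
# original large-model fine-tuning config.
optimization:
  max_update: 30000        # 90,000 for the 90K-update run

lr_scheduler:
  _name: tri_stage
  warmup_steps: 10000      # linear warm-up for the first 10K updates
  hold_steps: 0            # no constant-LR phase
  decay_steps: 20000       # decay over the remaining 20K updates

model:
  freeze_finetune_updates: 4000   # 24,000 for the 90K-update English model
```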
My second question is about punctuation removal and lowercasing before calculating WER. I also noticed some special tokens in the dictionary, e.g. the music token ♪. Which tokens did you remove, and how?
Yes, we used Fairseq's WerScorer, which removes punctuation and lowercases the text.
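For readers who want to check the scoring outside fairseq, here is a minimal Python sketch of the same normalization idea: lowercase everything, strip punctuation and symbol characters (which also drops tokens like ♪), then compute word-level edit distance. It approximates what WerScorer does rather than reproducing the exact fairseq code, and the function names are just for illustration.

```python
# Approximate the normalization applied before WER: lowercase, drop
# punctuation/symbol characters, then compute word-level Levenshtein distance.
import unicodedata


def normalize(text: str) -> str:
    text = text.lower()
    # Drop every character whose Unicode category is punctuation (P*) or
    # symbol (S*); this removes ',', '.', '!' as well as tokens like '♪'.
    kept = [c for c in text if unicodedata.category(c)[0] not in ("P", "S")]
    return " ".join("".join(kept).split())


def word_error_rate(ref: str, hyp: str) -> float:
    r, h = normalize(ref).split(), normalize(hyp).split()
    # Standard single-row Levenshtein DP over words.
    d = list(range(len(h) + 1))
    for i, rw in enumerate(r, 1):
        prev, d[0] = d[0], i
        for j, hw in enumerate(h, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (rw != hw))
    return d[len(h)] / max(len(r), 1)


print(word_error_rate("Hello, world! ♪", "hello world"))  # -> 0.0
```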
I hope I answered all of your questions. I'm gonna close this for now, but feel free to re-open it when needed.