
Whisper fine tuning event 2022 - script modification

This is the final setup used to train the best fine-tuned Whisper model in the Hugging Face Whisper fine-tuning event 2022.

DeepSpeed

The first modification was to use DeepSpeed to fit a larger batch_size without resorting to gradient_accumulation_steps.

To make it run inside Docker, I used the guide from Zihao's blog post.
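
A minimal sketch of how DeepSpeed ZeRO can be enabled through the Hugging Face Trainer; the batch size and config values below are illustrative assumptions, not the exact settings used for the event.

```python
# Minimal sketch: DeepSpeed ZeRO stage 2 via the Hugging Face Trainer.
# Values are illustrative; the real script used its own config.
from transformers import Seq2SeqTrainingArguments

ds_config = {
    "zero_optimization": {"stage": 2},          # shard optimizer states and gradients
    "fp16": {"enabled": "auto"},                # follow the Trainer's fp16 setting
    "train_micro_batch_size_per_gpu": "auto",   # filled in from per_device_train_batch_size
    "gradient_accumulation_steps": "auto",
}

training_args = Seq2SeqTrainingArguments(
    output_dir="./whisper-finetuned",
    per_device_train_batch_size=64,   # larger batch fits thanks to ZeRO sharding
    gradient_accumulation_steps=1,    # no gradient accumulation needed
    fp16=True,
    deepspeed=ds_config,              # Trainer initializes DeepSpeed from this config
)
```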

Concatenation of input dataset

This idea came from Bayar. Whisper expects 30-second inputs, but Common Voice samples typically contain only 3-5 seconds of audio each. By concatenating consecutive audio clips and their transcripts into fewer, longer samples, each training example carries denser supervision: training runs faster and the model learns more from every sample. A sketch of this packing step is shown below.
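
A minimal sketch of packing short Common Voice samples into roughly 30-second examples; the field names ("audio", "sentence") follow the Common Voice layout on the Hub, and the packing strategy is an illustrative assumption rather than the exact event script.

```python
# Illustrative packing of short samples into <= 30-second chunks.
import numpy as np

MAX_SECONDS = 30.0

def pack_samples(batch, sampling_rate=16_000):
    max_len = int(MAX_SECONDS * sampling_rate)
    packed_audio, packed_text = [], []
    cur_audio, cur_text, cur_len = [], [], 0

    for audio, sentence in zip(batch["audio"], batch["sentence"]):
        samples = audio["array"]
        # Flush the current chunk if adding this clip would exceed 30 seconds.
        if cur_audio and cur_len + len(samples) > max_len:
            packed_audio.append(np.concatenate(cur_audio))
            packed_text.append(" ".join(cur_text))
            cur_audio, cur_text, cur_len = [], [], 0
        cur_audio.append(samples)
        cur_text.append(sentence)
        cur_len += len(samples)

    if cur_audio:
        packed_audio.append(np.concatenate(cur_audio))
        packed_text.append(" ".join(cur_text))

    return {"audio": packed_audio, "sentence": packed_text}

# Applied with a batched map from the datasets library, e.g.:
# packed = common_voice.map(pack_samples, batched=True, batch_size=64,
#                           remove_columns=common_voice.column_names)
```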

Other ideas

Based on the details of how the Large-v2 model was trained in the Whisper paper, I have a few ideas to try in the next steps.

Thanks for the Whisper fine-tuning event 2022.
