This repository contains code for fine-tuning the Whisper speech-to-text model. It utilizes Weights & Biases (wandb) for logging metrics and storing models. Key features include:
- Stochastic depth implementation for improved model generalization
- Correct implementation of SpecAugment for robust audio data augmentation
- Checkpointing functionality to save and resume training progress, crucial for handling long-running experiments and potential interruptions
- Integration with Weights & Biases (wandb) for comprehensive experiment tracking and model versioning
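As a rough illustration of the stochastic-depth idea (a minimal sketch, not the repository's actual implementation), a residual layer can be randomly skipped during training so that only the identity path remains:

```python
import random

def residual_with_stochastic_depth(layer_fn, x, drop_prob, training=True):
    """Apply a residual layer, randomly skipping it during training.

    With probability drop_prob the layer's transformation is dropped and
    only the identity (skip) path is kept; at evaluation time the layer
    always runs. This is the core idea of stochastic depth.
    """
    if training and random.random() < drop_prob:
        return x  # skip the layer entirely; only the residual path remains
    return x + layer_fn(x)

# drop_prob=0.0 never skips; drop_prob=1.0 always skips during training
out_kept = residual_with_stochastic_depth(lambda v: 2 * v, 3.0, drop_prob=0.0)
out_dropped = residual_with_stochastic_depth(lambda v: 2 * v, 3.0, drop_prob=1.0)
```

Real implementations usually also rescale the surviving layer's output (or the evaluation-time output) so that the expected activations match between training and inference.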
- Clone the repository:

  ```bash
  git clone https://github.com/i4ds/whisper-finetune.git
  cd whisper-finetune
  ```

- Create and activate a virtual environment (strongly recommended), e.g. with venv or Anaconda.

- Install the package in editable mode:

  ```bash
  pip install -e .
  ```
For data preparation, please have a look at https://github.com/i4Ds/whisper-prep. The data is passed to the model as a 🤗 Dataset.
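The exact column names depend on how the dataset was prepared with whisper-prep; as an illustration only (the names below are an assumption, not the repository's required schema), one example in a speech 🤗 Dataset typically pairs a raw waveform with its transcript:

```python
# Hypothetical layout of a single example in the 🤗 Dataset
# (column names are illustrative; whisper-prep defines the real schema).
example = {
    "audio": {
        "array": [0.0, 0.01, -0.02],  # raw waveform samples
        "sampling_rate": 16000,       # Whisper expects 16 kHz audio
    },
    "sentence": "an example transcript",
}
```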
- Create a configuration file (see the examples in `configs/*.yaml`).

- Run the fine-tuning script:

  ```bash
  python src/whisper_finetune/scripts/finetune.py --config configs/large-cv-srg-sg-corpus.yaml
  ```

Modify the YAML files in the `configs/` directory to customize your fine-tuning process. Refer to the existing configuration files for examples of the available options.
The starting point for this repository was the excellent repository by Jumon: https://github.com/jumon/whisper-finetuning.
We welcome contributions! Please feel free to submit a Pull Request.
If you encounter any problems, please file an issue along with a detailed description.
- Vincenzo Timmel (vincenzo.timmel@fhnw.ch)
- Claudio Paonessa (claudio.paonessa@fhnw.ch)
This project is licensed under the MIT License - see the LICENSE file for details.