This repository hosts a preliminary fine-tuning pipeline for BEATs ("Audio Pre-Training with Acoustic Tokenizers"), a model developed by Microsoft; you can find the official repository here. The pipeline is a work in progress: it currently targets fine-tuning on the ESC-50 dataset, with support for custom datasets planned next.
The initial implementation of this pipeline was developed by the Norwegian Institute for Nature Research (NINA) using Lightning AI, and their work is available here.
UPDATES:
- Docker image built with:
  - Python 3.10 (slim image based on Debian bullseye)
  - PyTorch 2.0
  - Torchaudio 2.0
  - Lightning 2.0
- Docker image optimized to the minimum requirements (fewer dependencies)
- Addressing warnings raised during training (still in progress)
PENDING:
- Prototypical network training orchestration via a config file, similar to the fine-tuning case
- Bash scripting for data preparation
- Fine-tuning evaluation metrics on the ESC-50 dataset
- Fine-tuning on a custom dataset
NOTE: fine-tuning/retraining the Tokenizer is NOT on the agenda at the moment. This pipeline is designed only for training the feature extractor and the prototypical network.
To get started, follow these steps:

- Clone this repository:

```
git clone https://github.com/fede6590/BEATs-train.git
```

- Download the following files:

After downloading, extract the contents of the ESC-50 dataset ZIP file inside the `data` folder. The folder structure within the `data` directory should be as follows:
- data/
  - BEATs/
    - BEATs_iter3_plus_AS2M.pt
  - ESC-50-master/
    - audio/
    - meta/
    - ...
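The bash scripting for data preparation is still pending (see the list above), but the layout can be sketched as follows. The commands are illustrative only; the actual download URLs are the ones linked above.

```shell
# Create the expected layout (run from the repository root).
mkdir -p data/BEATs
# The pretrained checkpoint goes here: data/BEATs/BEATs_iter3_plus_AS2M.pt
# Extract the ESC-50 ZIP into data/ so that data/ESC-50-master/ appears:
#   unzip ESC-50-master.zip -d data/
ls data
```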
To build the Docker image, use the following command:

```
docker build -t beats -f Dockerfile .
```
IMPORTANT: `fine_tune/config.yaml` contains all the customizable parameters for training.
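Since the trainer is invoked with the Lightning CLI `fit --config` pattern, the file presumably follows the standard Lightning CLI YAML layout. A hypothetical sketch, with illustrative key names only (consult `fine_tune/config.yaml` for the real parameters):

```yaml
# Illustrative sketch only -- the actual keys live in fine_tune/config.yaml.
trainer:
  max_epochs: 20      # standard Lightning trainer options
  accelerator: auto
model:
  # model hyperparameters (e.g. learning rate, pretrained checkpoint path)
data:
  # dataset options (e.g. batch size, paths under /data)
```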
For fine-tuning BEATs on your dataset, use the following commands:

- with available GPU(s):

```
docker run -v "$PWD":/app \
  -v "data":/data \
  --gpus all \
  beats \
  python fine_tune/trainer.py fit --config fine_tune/config.yaml
```

- without GPU:

```
docker run -v "$PWD":/app \
  -v "data":/data \
  beats \
  python fine_tune/trainer.py fit --config fine_tune/config.yaml
```
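Because `trainer.py fit --config …` is the standard Lightning CLI invocation, individual config values can presumably also be overridden on the command line (the exact option paths depend on the keys defined in `fine_tune/config.yaml`; `--trainer.max_epochs` below is illustrative):

```
docker run -v "$PWD":/app \
  -v "data":/data \
  beats \
  python fine_tune/trainer.py fit --config fine_tune/config.yaml --trainer.max_epochs 5
```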
To train the prototypical network, first create a miniESC50 dataset:

- with available GPU(s):

```
docker run -v "$PWD":/app \
  -v "data":/data \
  --gpus all \
  beats \
  python data_utils/miniESC50.py
```

- without GPU:

```
docker run -v "$PWD":/app \
  -v "data":/data \
  beats \
  python data_utils/miniESC50.py
```
Then, start the training:

- with available GPU(s):

```
docker run -v "$PWD":/app \
  -v "data":/data \
  --gpus all \
  beats \
  python prototypicalbeats/trainer.py fit --data miniESC50DataModule
```

- without GPU:

```
docker run -v "$PWD":/app \
  -v "data":/data \
  beats \
  python prototypicalbeats/trainer.py fit --data miniESC50DataModule
```