[Paper] Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition
Modify the variables of each dataset in configs/dataset_config.py:
- Move your audio files to the wavdir directory.
- Create a meta_csv_file with the columns "name" (file names) and "label" (emotion labels) for each dataset. The pretraining datasets do not need the "label" column.
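A minimal sketch of writing and reading a meta CSV in the layout described above, using Python's standard csv module. The file names, labels, and helper names (write_meta_csv, read_meta_csv) are hypothetical. The label-to-index mapping shows one common way a classifier consumes string labels; it is an assumption, not necessarily what the repository does internally.

```python
import csv
import os
import tempfile


def write_meta_csv(csv_path, names, labels=None):
    """Write a meta CSV: a 'name' column, plus a 'label' column if labels are given.

    Pretraining datasets pass labels=None, so no 'label' column is written.
    """
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        if labels is None:
            writer.writerow(["name"])
            writer.writerows([n] for n in names)
        else:
            if len(names) != len(labels):
                raise ValueError("names and labels must have the same length")
            writer.writerow(["name", "label"])
            writer.writerows(zip(names, labels))


def read_meta_csv(csv_path):
    """Read a meta CSV and build a label -> class-index map (empty if unlabeled)."""
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    labels = sorted({row["label"] for row in rows if "label" in row})
    return rows, {label: i for i, label in enumerate(labels)}


if __name__ == "__main__":
    # Hypothetical file names and labels, for illustration only.
    names = ["utt_0001.wav", "utt_0002.wav", "utt_0003.wav"]
    labels = ["neutral", "angry", "happy"]
    tmp = tempfile.mkdtemp()
    write_meta_csv(os.path.join(tmp, "finetune_meta.csv"), names, labels)
    write_meta_csv(os.path.join(tmp, "pretrain_meta.csv"), names)  # no label column
    rows, label_to_idx = read_meta_csv(os.path.join(tmp, "finetune_meta.csv"))
    print(label_to_idx)  # {'angry': 0, 'happy': 1, 'neutral': 2}
```

Sorting the label set keeps the class indices stable across runs, regardless of the row order in the CSV.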
Extracting WavLM features in advance greatly speeds up pretraining. Use extract_feature/WavLM/extract_wavlm.py to extract the features of the pretraining data beforehand.
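The speed-up comes from running the expensive WavLM forward pass once per file rather than once per epoch. The sketch below illustrates that caching idea with a stand-in extractor (fake_wavlm) and pickle files; it is not what extract_wavlm.py does internally, and the storage format and cache layout are assumptions.

```python
import os
import pickle
import tempfile


def cached_features(audio_path, cache_dir, extract_fn):
    """Return features for audio_path, running extract_fn at most once per file.

    Features are pickled under cache_dir (hypothetical layout: <stem>.pkl);
    later calls, e.g. in later epochs, load them from disk instead of
    rerunning the expensive extractor. Files sharing a stem would collide.
    """
    os.makedirs(cache_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(audio_path))[0]
    cache_path = os.path.join(cache_dir, stem + ".pkl")
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    features = extract_fn(audio_path)
    with open(cache_path, "wb") as f:
        pickle.dump(features, f)
    return features


if __name__ == "__main__":
    calls = []

    def fake_wavlm(path):
        # Placeholder for a real (expensive) WavLM forward pass.
        calls.append(path)
        return [0.1, 0.2, 0.3]

    cache = tempfile.mkdtemp()
    for epoch in range(3):
        feats = cached_features("utt_0001.wav", cache, fake_wavlm)
    print(len(calls))  # 1: the extractor ran only during the first epoch
```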
Specify training hyperparameters on the command line or modify them in configs/train_config.py. Please also specify path_to_wavlm on the command line or in configs/model_config.py. If you want to use the command-line method, please refer to the get_args function in configs/__init__.py.
python pretrain.py -M Vesper-4
python pretrain.py -M Vesper-12
python pretrain.py -M Vesper-12 -b 32 -g 0,1 -l 0.0005 --model_path_to_wavlm PATH_to_WavLM/WavLM-Large.pt
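A minimal sketch of how command-line flags can override config-file defaults, using argparse and a dataclass. Only the flags that appear in the commands above are modeled; the TrainConfig fields, their default values, and the dest names are hypothetical stand-ins for whatever configs/train_config.py, configs/model_config.py, and get_args actually define.

```python
import argparse
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class TrainConfig:
    # Hypothetical defaults standing in for configs/train_config.py and
    # configs/model_config.py; the real files may use other names and values.
    model: str = "Vesper-12"
    batch_size: int = 16
    gpus: str = "0"
    lr: float = 0.001
    path_to_wavlm: Optional[str] = None


def parse_overrides(argv):
    """Parse the given flags; anything omitted keeps the config default."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-M", dest="model")
    parser.add_argument("-b", dest="batch_size", type=int)
    parser.add_argument("-g", dest="gpus")
    parser.add_argument("-l", dest="lr", type=float)
    parser.add_argument("--model_path_to_wavlm", dest="path_to_wavlm")
    args = parser.parse_args(argv)
    # argparse leaves omitted flags as None, so keep only the ones passed.
    overrides = {k: v for k, v in vars(args).items() if v is not None}
    return replace(TrainConfig(), **overrides)


if __name__ == "__main__":
    cfg = parse_overrides(["-M", "Vesper-12", "-b", "32", "-g", "0,1", "-l", "0.0005",
                           "--model_path_to_wavlm", "PATH_to_WavLM/WavLM-Large.pt"])
    gpu_ids = [int(g) for g in cfg.gpus.split(",")]
    print(cfg.batch_size, gpu_ids)  # 32 [0, 1]
```

Filtering out None values is what lets the config file supply everything not given on the command line, so "python pretrain.py -M Vesper-4" alone would fall back to the file's defaults for every other setting.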
Specify fine-tuning hyperparameters on the command line or modify them in configs/train_config.py. Please also specify path_to_vesper on the command line or in configs/model_config.py.
python finetune.py -M Vesper-12 -d iemocap
python finetune.py -M Vesper-12 -d iemocap -g 0 -b 32 -l 0.0007 --model_path_to_vesper PATH_to_EXP_DIRECTORY/checkpoint/model_best.pt
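If you launch fine-tuning from a script, a hypothetical helper like the one below can assemble the same argument list as the example command, pointing --model_path_to_vesper at a pretraining run's checkpoint/model_best.pt. The helper name is an assumption, and its defaults are simply copied from the example command; it only builds the list and does not run anything.

```python
import sys
from pathlib import Path


def finetune_command(exp_dir, dataset="iemocap", model="Vesper-12",
                     gpus="0", batch_size=32, lr=0.0007):
    """Build (but do not run) a finetune.py argument list (hypothetical helper).

    Assumes the pretraining run saved its best weights to
    <exp_dir>/checkpoint/model_best.pt, matching the example command.
    """
    ckpt = Path(exp_dir) / "checkpoint" / "model_best.pt"
    return [sys.executable, "finetune.py",
            "-M", model, "-d", dataset, "-g", gpus,
            "-b", str(batch_size), "-l", str(lr),
            "--model_path_to_vesper", str(ckpt)]


if __name__ == "__main__":
    print(finetune_command("PATH_to_EXP_DIRECTORY"))
```

You could then pass the list to subprocess.run(cmd, check=True) from the repository root.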