This repo provides an easy, PyTorch-based way to evaluate generative video-to-audio (V2A) models.
This section walks you through evaluating a generative V2A model. The following steps are required:
First, install the conda environment:
conda env create -f conda_env_cu12.1.yaml
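The name you need for `conda activate` afterwards lives inside the yaml. A minimal stdlib-only helper (our sketch, not part of the repo; it assumes the file has a top-level `name:` line, which conda env yamls conventionally do) can pull it out without requiring PyYAML:

```python
def conda_env_name(yaml_path: str):
    """Return the value of the top-level "name:" line in a conda env yaml,
    or None if no such line exists. Stdlib-only on purpose, since nothing
    may be installed yet."""
    with open(yaml_path) as fh:
        for line in fh:
            if line.startswith("name:"):
                return line.split(":", 1)[1].strip()
    return None
```

For example, `conda activate "$(python -c 'import sys; ...')"` style usage, or simply open the yaml and read the `name:` field yourself.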
In addition, AudioTools (by Descript) is needed for audio processing and the PaSST model for computing metrics. Install them with:
pip install git+https://github.com/descriptinc/audiotools
pip install git+https://github.com/kkoutini/passt_hear21@0.0.19#egg=hear21passt
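A quick way to confirm the installs resolved, without importing anything heavy, is to probe for the packages via the stdlib. This is a sketch: `audiotools` and `hear21passt` are the importable names the pip installs above register, and `torch` is assumed to come from the conda environment.

```python
import importlib.util

def check_deps(modules):
    """Map each module name to whether it is importable in this environment."""
    return {m: importlib.util.find_spec(m) is not None for m in modules}

if __name__ == "__main__":
    for module, ok in check_deps(["torch", "audiotools", "hear21passt"]).items():
        print(f"{module}: {'ok' if ok else 'MISSING'}")
```

Any `MISSING` line means the corresponding install step above did not take effect in the active environment.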
This evaluation pipeline uses the Synchformer model to assess audio-visual synchronization. Download the Synchformer checkpoints by running:
bash ./checkpoints/download_synchformer_checkpoints.sh
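To confirm the download landed, you can list what sits under the checkpoints directory. A sketch, with assumptions: the `./checkpoints` location is taken from the script's path, while the `*.pt` pattern is a guess at the checkpoint extension (adjust it to whatever the script actually saves).

```python
from pathlib import Path

def find_checkpoints(directory, pattern="*.pt"):
    """Return checkpoint files under `directory` matching `pattern`, sorted."""
    return sorted(Path(directory).glob(pattern))

if __name__ == "__main__":
    ckpts = find_checkpoints("./checkpoints")
    if not ckpts:
        print("No checkpoints found; did the download script run?")
    for ckpt in ckpts:
        print(ckpt)
```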
Finally, use run_evaluations.ipynb to run the pipeline; all the required steps are described in the notebook.