# Evaluate generative multimodal audio models

Use this notebook to evaluate generative multimodal audio models. It uses the following metrics:

- Fréchet Audio Distance (FAD)
- Kullback-Leibler Divergence (KLD)
- ImageBind score (IB)
- Synchronisation Error (SE)
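
As a rough illustration of what KLD measures, the sketch below computes the KL divergence between two discrete probability distributions. In audio evaluation these are commonly the class probabilities that a pretrained classifier assigns to a ground-truth clip and to the matching generated clip, but that pairing is an assumption here, and `kl_divergence` is a hypothetical helper rather than this repository's implementation.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) for two discrete distributions of equal length.

    Hypothetical helper for illustration only. `eps` guards against
    division by zero when q assigns no mass to a class that p uses.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        # Terms with p_i == 0 contribute nothing by convention.
        if p_i > 0.0:
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


# Identical distributions give 0; the value grows as they diverge.
same = kl_divergence([0.5, 0.5], [0.5, 0.5])
different = kl_divergence([0.9, 0.1], [0.5, 0.5])
```

Lower values mean the generated audio is classified more like the ground truth.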

## Setup
Before running the notebook, you need:

1. Generated audio samples
1. Ground-truth (GT) audio for the videos that the samples were generated from
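
Before running the metrics, it can help to check that the two directories line up. The sketch below assumes that generated and GT files are `.wav` files paired by file name, which the notebook does not state. `unmatched_samples` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path


def unmatched_samples(sample_dir, gt_dir, suffix=".wav"):
    """Return (stems only in sample_dir, stems only in gt_dir).

    Assumes files are paired by stem, e.g. `clip_001.wav` in both
    directories. This pairing convention is an assumption.
    """
    sample_stems = {p.stem for p in Path(sample_dir).glob(f"*{suffix}")}
    gt_stems = {p.stem for p in Path(gt_dir).glob(f"*{suffix}")}
    return sorted(sample_stems - gt_stems), sorted(gt_stems - sample_stems)
```

Two empty lists mean every generated sample has a GT counterpart and vice versa.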

## Configuration

In [15]:
from pathlib import Path

from utils.utils import dataclass_from_dict
from configs.evaluation_cfg import EvaluationCfg
from metrics.evaluation_metrics import EvaluationMetrics

%reload_ext autoreload
%autoreload 2

In [17]:
sample_dir = Path("/home/hdd/data/greatesthits/evaluation/23-12-20T00-45-15/generated_samples_23-12-20T09-17-40")
# init evaluation config with default KLD and FAD metrics
evaluation_cfg = dataclass_from_dict(EvaluationCfg, {"sample_directory": sample_dir, "pipeline": {"fad": {}, "kld": {}}})
assert isinstance(evaluation_cfg, EvaluationCfg)


Evaluation pipeline:
sample_directory: /home/hdd/data/greatesthits/evaluation/23-12-20T00-45-15/generated_samples_23-12-20T09-17-40
gt_directory: /home/hdd/data/greatesthits/evaluation/GT
result_directory: /home/hdd/data/greatesthits/evaluation/23-12-20T00-45-15/generated_samples_23-12-20T09-17-40
PipelineCfg(fad=FADCfg(model_name='vggish', sample_rate=16000, use_pca=False, use_activation=False, dtype='float32', embeddings_fn='vggish_embeddings.npy'), kld=KLDCfg(pretrained_length=10, batch_size=10, num_workers=10, duration=2.0))
verbose: False
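
The output above shows that empty dicts such as `{"fad": {}}` expand into config objects filled with default values. One way such a conversion could work is sketched below. `from_dict` and the trimmed-down `FADCfg` and `PipelineCfg` are illustrative stand-ins, and the real `dataclass_from_dict` may be implemented differently.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class FADCfg:
    # Defaults copied from the printed config; other fields omitted.
    model_name: str = "vggish"
    sample_rate: int = 16000


@dataclass
class PipelineCfg:
    fad: FADCfg


def from_dict(cls, data):
    """Recursively build dataclass `cls` from a (possibly nested) dict."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # let the dataclass default apply
        value = data[f.name]
        # Nested dicts become nested dataclasses when the field type is one.
        if is_dataclass(f.type) and isinstance(value, dict):
            value = from_dict(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


default_cfg = from_dict(PipelineCfg, {"fad": {}})
custom_cfg = from_dict(PipelineCfg, {"fad": {"sample_rate": 48000}})
```

In this sketch, an empty dict selects a metric with its defaults, and any key inside it overrides a single field.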


## Metrics
The `EvaluationMetrics` class is initialised with an *EvaluationCfg* instance, which defines the evaluation pipeline.

In [23]:
metrics = EvaluationMetrics(evaluation_cfg)
print(f"FAD: {metrics.run_fad()}")
print(f"KLD: {metrics.run_kld()}")

Embeddings found in sample directory (/home/hdd/data/greatesthits/evaluation/23-12-20T00-45-15/generated_samples_23-12-20T09-17-40/vggish_embeddings.npy)
Embeddings found in gt directory (/home/hdd/data/greatesthits/evaluation/GT/vggish_embeddings.npy)


Using cache found in /home/ilpo/.cache/torch/hub/harritaylor_torchvggish_master


FAD: 2.864387990917186
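
For context on the FAD value, the metric is generally defined as the Fréchet distance between two multivariate Gaussians, one fitted to the embeddings of the GT set and one to the embeddings of the generated set. The NumPy sketch below implements that formula, assuming each embedding array has shape `(n_frames, dim)` with `dim >= 2`. `frechet_distance` is a hypothetical name and not necessarily how `run_fad` computes its result.

```python
import numpy as np


def frechet_distance(emb_a, emb_b):
    """Fréchet distance between Gaussians fitted to two embedding sets.

    ||mu_a - mu_b||^2 + tr(cov_a) + tr(cov_b) - 2 * tr(sqrt(cov_a @ cov_b))
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # The trace of the matrix square root equals the sum of the square
    # roots of the eigenvalues of the product, which are real and
    # non-negative for covariance matrices (up to numerical noise).
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


# Synthetic stand-ins for embeddings; real ones would come from the
# cached embedding files mentioned in the log output.
rng = np.random.default_rng(0)
gt_emb = rng.normal(size=(200, 4))
shifted_emb = gt_emb + 3.0  # same covariance, mean shifted by 3 per dimension

fad_same = frechet_distance(gt_emb, gt_emb)
fad_shifted = frechet_distance(gt_emb, shifted_emb)
```

Identical sets give a distance near zero. Shifting only the mean adds the squared length of the shift, which here is 4 × 3² = 36.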
