# Evaluate generative multimodal audio models

Use this notebook to evaluate generative multimodal audio models. It uses the following metrics:

- Frechet Audio Distance (FAD)
- Kullback-Leibler Divergence (KLD)
- ImageBind score (IB)
- Synchronisation Error (SE)
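
To make the KLD metric concrete, here is a minimal sketch of a Kullback-Leibler divergence between two discrete distributions. It assumes the inputs are class-probability vectors (for example, produced by some audio classifier for a ground-truth clip and a generated clip). The function name `kl_divergence`, the `eps` guard, and the direction KL(GT || generated) are illustrative assumptions, not necessarily what this repository's KLD implementation does.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two discrete distributions of equal length.

    Hypothetical helper for illustration; not part of the evaluation package.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of classes")
    # Terms with p_i == 0 contribute nothing; eps guards against q_i == 0.
    return sum(pi * math.log(pi / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


# Toy class probabilities (made-up numbers, two classes).
gt_probs = [0.5, 0.5]
gen_probs = [0.9, 0.1]
kld = kl_divergence(gt_probs, gen_probs)
print(f"KLD: {kld:.4f}")
```

The value is zero when both distributions match and grows as the generated distribution drifts away from the ground-truth one.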

**Note:** GPU resources are sometimes not freed until the Jupyter kernel is restarted. If you encounter a CUDA out-of-memory error, restart the kernel and run the notebook again.
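
Before resorting to a restart, it may be worth trying to release cached GPU memory from within the kernel. The sketch below assumes the metrics run on PyTorch. The helper name `free_gpu_memory` is hypothetical, and this will not help if live Python references still hold the tensors.

```python
import gc


def free_gpu_memory():
    """Try to release cached CUDA memory; return True if a CUDA cache was cleared.

    Hypothetical helper: it only frees memory that is no longer referenced.
    """
    gc.collect()  # drop unreachable Python objects that may hold tensors
    try:
        import torch
    except ImportError:
        return False  # PyTorch is not installed, nothing to clear
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached blocks to the CUDA driver
        return True
    return False


cleared = free_gpu_memory()
print(f"CUDA cache cleared: {cleared}")
```

If memory is still exhausted after this, restarting the kernel remains the reliable fix.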

## Setup
You must have:

1. Generated audio samples
1. Ground-truth (GT) audio for the videos that were used to generate the samples
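
A quick sanity check before running the metrics is to confirm that every generated sample has a ground-truth counterpart. The sketch below assumes files are paired by file stem and share one extension. Both the pairing rule and the default `.wav` suffix are assumptions, and `find_unpaired` is a hypothetical helper, so adjust them to match your data layout.

```python
import tempfile
from pathlib import Path


def find_unpaired(sample_dir, gt_dir, suffix=".wav"):
    """Return (stems missing from GT, stems missing from samples), both sorted."""
    sample_stems = {p.stem for p in Path(sample_dir).glob(f"*{suffix}")}
    gt_stems = {p.stem for p in Path(gt_dir).glob(f"*{suffix}")}
    return sorted(sample_stems - gt_stems), sorted(gt_stems - sample_stems)


# Demo on a throwaway directory tree with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    samples, gt = Path(tmp, "samples"), Path(tmp, "GT")
    samples.mkdir()
    gt.mkdir()
    for name in ("clip_a.wav", "clip_b.wav"):
        (samples / name).touch()
    (gt / "clip_a.wav").touch()
    missing_gt, missing_samples = find_unpaired(samples, gt)

print("Samples without GT:", missing_gt)
print("GT without samples:", missing_samples)
```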

## Configuration

In [1]:
from pathlib import Path
from pprint import pprint

from eval_utils.utils import dataclass_from_dict
from configs.evaluation_cfg import EvaluationCfg, PipelineCfg
from metrics.evaluation_metrics import EvaluationMetrics

%reload_ext autoreload
%autoreload 2



In [2]:
sample_dir = Path("/home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa")
# init evaluation config with only the InSync metric enabled (FAD and KLD are left unset)
evaluation_cfg = dataclass_from_dict(EvaluationCfg, {"sample_directory": sample_dir, "verbose": True, "pipeline": {"insync": {}}})
assert isinstance(evaluation_cfg, EvaluationCfg)
assert isinstance(evaluation_cfg.pipeline, PipelineCfg)


Evaluation pipeline:
sample_directory: /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa
gt_directory: /home/hdd/data/greatesthits/evaluation/GT
result_directory: /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa
PipelineCfg(fad=None, kld=None, insync=InSyncCfg(exp_name='24-01-04T16-39-21', device='cuda:0', vfps=25, afps=16000, input_size=256, ckpt_parent_path='./checkpoints/sync_models'))
verbose: True
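
The configuration cell turns a nested dict into nested config dataclasses. The following is a minimal sketch of how such a conversion could work. It is not the actual `eval_utils.utils.dataclass_from_dict`, whose behaviour may differ, and the `Demo*` classes and `demo_from_dict` are hypothetical stand-ins for illustration.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Optional, get_args, get_type_hints


@dataclass
class DemoInSyncCfg:  # hypothetical stand-in for a metric config
    device: str = "cpu"
    vfps: int = 25


@dataclass
class DemoPipelineCfg:  # hypothetical stand-in for the pipeline config
    insync: Optional[DemoInSyncCfg] = None


@dataclass
class DemoEvaluationCfg:  # hypothetical stand-in for the top-level config
    sample_directory: str
    verbose: bool = False
    pipeline: Optional[DemoPipelineCfg] = None


def demo_from_dict(cls, data):
    """Recursively build dataclass `cls` from a (possibly nested) dict."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # fall back to the dataclass default
        value = data[f.name]
        # Unwrap Optional[X] to X; plain types have no type arguments.
        candidates = [t for t in get_args(hints[f.name]) if t is not type(None)]
        candidates = candidates or [hints[f.name]]
        target = next((t for t in candidates if is_dataclass(t)), None)
        if target is not None and isinstance(value, dict):
            value = demo_from_dict(target, value)
        kwargs[f.name] = value
    return cls(**kwargs)


cfg = demo_from_dict(
    DemoEvaluationCfg,
    {"sample_directory": "samples", "verbose": True, "pipeline": {"insync": {}}},
)
print(cfg)
```

In this sketch an empty dict such as `{"insync": {}}` enables a metric with its default settings, while an omitted key leaves it as `None`, which mirrors the `fad=None, kld=None` output above.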


## Metrics
The metrics class is initialised with an *EvaluationCfg* instance, which defines the evaluation pipeline.

In [4]:
metrics = EvaluationMetrics(evaluation_cfg)
# metrics.run_all()
# print(f"FAD: {metrics.run_fad()}")
# print(f"KLD: {metrics.run_kld()}")
print(f"InSync: {metrics.run_insync()}")

Reencoding directory generated_samples_24-02-28T14-47-57_jepa to 25 fps, 16000 afps, 256 input size: 100%|██████████| 40/40 [00:05<00:00,  7.28it/s]


Reencoded samples


TypeError: cross_entropy_loss(): argument 'target' (position 2) must be Tensor, not dict

In [None]:
metrics.export_results()