# Evaluate generative multimodal audio models

Use this notebook to evaluate generative multimodal audio models. It supports the following metrics:

- Frechet Audio Distance (FAD)
- Kullback-Leibler Divergence (KLD)
- ImageBind score (IB)
- Synchronisation Error (SE)
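The notebook does not show how the metrics are computed. As an illustration only, KLD-style metrics commonly compare the class-probability distributions that a pretrained audio classifier assigns to a ground-truth clip and to the corresponding generated clip, averaged over clip pairs. The sketch below assumes that setup; `kl_divergence` and `mean_kld` are hypothetical helpers, and both the direction KL(ground truth ‖ generated) and the `eps` smoothing are assumptions rather than this repository's implementation.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-10) -> float:
    """KL(p || q) in nats for two discrete distributions of equal length.

    Terms with p_i == 0 contribute nothing; eps guards against log(0) when q_i == 0.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_kld(gt_probs: Sequence[Sequence[float]], gen_probs: Sequence[Sequence[float]]) -> float:
    """Average KL(ground truth || generated) over paired clips (hypothetical helper)."""
    if len(gt_probs) != len(gen_probs) or not gt_probs:
        raise ValueError("need the same, non-zero number of ground-truth and generated clips")
    return sum(kl_divergence(p, q) for p, q in zip(gt_probs, gen_probs)) / len(gt_probs)


# Identical distributions give 0; diverging ones give a positive value.
print(mean_kld([[0.7, 0.2, 0.1]], [[0.7, 0.2, 0.1]]))  # 0.0
print(mean_kld([[1.0, 0.0]], [[0.5, 0.5]]))  # approximately log(2)
```

KL divergence is asymmetric, so an actual implementation has to fix one direction; check the repository code before comparing numbers across tools.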

❗️**Note**❗️ GPU memory is sometimes not released until the Jupyter kernel is restarted. If you encounter a CUDA out-of-memory error, restart the kernel and run the notebook again.

## Setup
Before running the notebook, you need:

1. Videos with generated audio
2. Ground-truth videos
3. An environment set up according to [README.md](README.md)
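Before running the evaluation, it can help to confirm that each directory actually contains videos. A minimal sketch, assuming the videos are `.mp4` files stored directly in each directory (as in this notebook's outputs); `count_videos` is a hypothetical helper, not part of the repository.

```python
import tempfile
from pathlib import Path


def count_videos(directory: Path, suffix: str = ".mp4") -> int:
    """Count files ending in `suffix` directly inside `directory` (not recursive)."""
    if not directory.is_dir():
        raise FileNotFoundError(f"Not a directory: {directory}")
    return sum(1 for p in directory.iterdir() if p.is_file() and p.suffix == suffix)


# Demo on a throwaway directory: "clip.mp4.mp4" counts because its last suffix is ".mp4".
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    for name in ("clip.mp4", "clip.mp4.mp4", "notes.txt"):
        (tmp_dir / name).touch()
    demo_count = count_videos(tmp_dir)
print(demo_count)  # 2
```

Once *sample_dirs* and *gt_dir* are defined, a loop such as `for d in sample_dirs: print(d, count_videos(d))` flags empty or mistyped paths early.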

## Helpers and Imports

```python
from pathlib import Path
from pprint import pprint

from eval_utils.utils import dataclass_from_dict
from configs.evaluation_cfg import EvaluationCfg, PipelineCfg
from metrics.evaluation_metrics import EvaluationMetrics
from metrics.evaluation_metrics_combiner import EvaluationMetricsCombiner

%reload_ext autoreload
%autoreload 2
```

```python
def get_evaluation_config(eval_cfg_dict: dict) -> EvaluationCfg:
    """Build an EvaluationCfg (including its nested PipelineCfg) from a plain dict."""
    evaluation_cfg = dataclass_from_dict(EvaluationCfg, eval_cfg_dict)
    assert isinstance(evaluation_cfg, EvaluationCfg)
    assert isinstance(evaluation_cfg.pipeline, PipelineCfg)
    return evaluation_cfg


def get_calculated_evaluation_metrics(
    evaluation_cfg: EvaluationCfg, force_recalculate: bool = False
) -> EvaluationMetrics:
    """Run every metric in the pipeline for one sample directory and export the results."""
    print("Evaluating", evaluation_cfg.sample_directory.as_posix())
    evaluation_metrics = EvaluationMetrics(evaluation_cfg)
    evaluation_metrics.run_all(force_recalculate)
    evaluation_metrics.export_results()
    print("Evaluation done\n")
    return evaluation_metrics
```
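`dataclass_from_dict` comes from the repository's `eval_utils.utils`, and its implementation is not shown here. The sketch below is a hypothetical reimplementation that illustrates the general idea: nested dicts are converted recursively into nested dataclasses, which is how a pipeline dict such as `{"insync": {}}` can turn into a `PipelineCfg` whose unused metrics stay `None` and whose selected metric gets its default parameters. The `*Demo` dataclasses and their fields are illustrative stand-ins, not the real config classes.

```python
from dataclasses import dataclass, field, fields, is_dataclass
from pathlib import Path
from typing import Optional, Union, get_args, get_origin, get_type_hints


def dataclass_from_dict_sketch(cls, data: dict):
    """Recursively build dataclass `cls` from `data`; nested dicts become nested dataclasses."""
    hints = get_type_hints(cls)
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # keep the field's default
        value = data[f.name]
        field_type = hints[f.name]
        # Unwrap Optional[X] (i.e. Union[X, None]) to X.
        if get_origin(field_type) is Union:
            non_none = [a for a in get_args(field_type) if a is not type(None)]
            if len(non_none) == 1:
                field_type = non_none[0]
        if is_dataclass(field_type) and isinstance(value, dict):
            value = dataclass_from_dict_sketch(field_type, value)
        elif field_type is Path and isinstance(value, str):
            value = Path(value)
        kwargs[f.name] = value
    return cls(**kwargs)


@dataclass
class InSyncDemo:
    device: str = "cuda:0"
    vfps: int = 25


@dataclass
class FadDemo:
    use_pca: bool = False


@dataclass
class PipelineDemo:
    fad: Optional[FadDemo] = None
    insync: Optional[InSyncDemo] = None


@dataclass
class EvaluationDemo:
    sample_directory: Path
    pipeline: PipelineDemo = field(default_factory=PipelineDemo)
    verbose: bool = False


cfg = dataclass_from_dict_sketch(
    EvaluationDemo, {"sample_directory": "samples", "pipeline": {"insync": {}}}
)
print(cfg)  # fad stays None; insync is built with its default parameters
```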

## Define configurations

<p align="center">❗️**NOTE**❗️</p>
<p align="center">Only modify the following cell with your arguments and paths. <span style="color:red">Do not modify any other cell unless stated otherwise.</span></p>
<p align="center">❗️**NOTE**❗️</p>

1. Define the paths to the videos with model-generated audio (*sample_dirs*).
2. Define the path to the ground-truth videos (*gt_dir*).
3. Define the evaluation pipeline (*pipeline_cfg_dict*):
    - Choose the metrics to compute (`fad`, `kld`, `insync`).
    - Set the parameters of individual metrics (see `./configs/evaluation_cfg.py` for details).
    - Examples:
        - Only InSync with default parameters: `{"insync": {}}`
        - All metrics with default parameters: `{"fad": {}, "kld": {}, "insync": {}}`
        - Only FAD, computed with PCA: `{"fad": {"use_pca": True}}`
4. Set the verbosity (*is_verbose*).
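It can be useful to check *pipeline_cfg_dict* before building the configurations. A minimal pre-check is sketched below; `pipeline_problems` is a hypothetical helper, and its set of known metric names is taken from the list above, so extend it if your version of `./configs/evaluation_cfg.py` defines more metrics.

```python
KNOWN_METRICS = {"fad", "kld", "insync"}  # assumption: taken from the list above


def pipeline_problems(pipeline_cfg_dict: dict) -> list[str]:
    """Return a list of problems found in the pipeline dict (empty if it looks valid)."""
    if not pipeline_cfg_dict:
        return ["Pipeline is not defined or it is empty."]
    problems = []
    for name, params in pipeline_cfg_dict.items():
        if name not in KNOWN_METRICS:
            problems.append(f"Unknown metric {name!r}; expected one of {sorted(KNOWN_METRICS)}")
        if not isinstance(params, dict):
            problems.append(f"Parameters for {name!r} must be a dict, got {type(params).__name__}")
    return problems


# The examples from the list above all pass; a misspelt metric name does not.
for example in ({"insync": {}}, {"fad": {}, "kld": {}, "insync": {}}, {"fad": {"use_pca": True}}):
    assert pipeline_problems(example) == []
print(pipeline_problems({"insnyc": {}}))
```

In the notebook, `assert not pipeline_problems(pipeline_cfg_dict), pipeline_problems(pipeline_cfg_dict)` would stop early with a readable message.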

```python
# Directories containing the videos with model-generated audio
base_dir = Path("/home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55")
sample_dirs = [
    base_dir / "generated_samples_24-02-28T14-47-57_jepa",
    base_dir / "generated_samples_24-02-28T14-47-57_jepa_1",
    base_dir / "generated_samples_24-02-28T14-47-57_jepa_2",
]
# Ground-truth videos. Path("") is equivalent to Path("."), i.e. the current
# directory; replace it with the ground-truth directory when FAD or KLD is enabled.
gt_dir = Path("")
# Metrics to compute and their parameters
pipeline_cfg_dict = {"insync": {}}
is_verbose = True
```
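When many runs share a parent directory, listing each path by hand is error-prone. As a sketch, assuming every run directory name starts with `generated_samples_` (as the paths above do), the directories could be collected with `Path.glob`; `find_sample_dirs` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def find_sample_dirs(base_dir: Path, pattern: str = "generated_samples_*") -> list[Path]:
    """Return the sub-directories of base_dir whose names match pattern, sorted by name."""
    return sorted(p for p in base_dir.glob(pattern) if p.is_dir())


# Demo on a throwaway directory: files and non-matching directories are skipped.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    for name in ("generated_samples_a", "generated_samples_a_1", "other"):
        (base / name).mkdir()
    (base / "generated_samples_notes.txt").touch()
    demo_names = [p.name for p in find_sample_dirs(base)]
print(demo_names)  # ['generated_samples_a', 'generated_samples_a_1']
```

In the configuration cell this would become `sample_dirs = find_sample_dirs(base_dir)`; print the result first, since the pattern may also match runs you do not want to evaluate.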

## Initialise configurations

```python
# An empty dict is falsy, so this rejects both a missing and an empty pipeline.
assert pipeline_cfg_dict, "Pipeline is not defined or it is empty."
evaluation_cfgs = [
    get_evaluation_config(
        {
            "sample_directory": sample_dir,
            "gt_directory": gt_dir,
            "pipeline": pipeline_cfg_dict,
            "verbose": is_verbose,
        }
    )
    for sample_dir in sample_dirs
]
```

```text
Evaluation pipeline:
sample_directory: /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa
gt_directory: .
result_directory: /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa
PipelineCfg(fad=None, kld=None, insync=InSyncCfg(exp_name='24-01-04T16-39-21', device='cuda:0', vfps=25, afps=16000, input_size=256, ckpt_parent_path='./checkpoints/sync_models'))
verbose: True

Evaluation pipeline:
sample_directory: /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_1
gt_directory: .
result_directory: /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_1
PipelineCfg(fad=None, kld=None, insync=InSyncCfg(exp_name='24-01-04T16-39-21', device='cuda:0', vfps=25, afps=16000, input_size=256, ckpt_parent_path='./checkpoints/sync_models'))
verbose: True

Evaluation pipeline:
sample_directory: /
```

## Metrics
Each *EvaluationMetrics* object is initialised with an *EvaluationCfg*, which defines the evaluation pipeline, and computes the metrics for a single sample directory (one *EvaluationCfg* entry). The *EvaluationMetricsCombiner* class then combines the metrics of all sample directories for plotting.

```python
metrics = [
    get_calculated_evaluation_metrics(evaluation_cfg)
    for evaluation_cfg in evaluation_cfgs
]
```

```text
Evaluating /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa
Reencoding directory generated_samples_24-02-28T14-47-57_jepa to 25 fps, 16000 afps, 256 input size: 100%|██████████| 40/40 [00:11<00:00,  3.39it/s]
Reencoded samples
Results exported to /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa/results_24-03-12T12-01-11.yaml
Evaluation done

Evaluating /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_1
Reencoding directory generated_samples_24-02-28T14-47-57_jepa_1 to 25 fps, 16000 afps, 256 input size: 100%|██████████| 40/40 [00:11<00:00,  3.40it/s]
Reencoded samples
Results exported to /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_1/results_24-03-12T12-01-34.yaml
Evaluation done

Evaluating /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_2
Reencoding directory generated_samples_24-02-28T14-47-57_jepa_2 to 25 fps, 16000 afps, 256 input size: 100%|██████████| 40/40 [00:11<00:00,  3.37it/s]
Reencoded samples
Results exported to /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_2/results_24-03-12T12-01-56.yaml
Evaluation done
```
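Each run writes a timestamped `results_<timestamp>.yaml` file into its sample directory, as the log above shows. Because the timestamp format (`YY-MM-DDTHH-MM-SS`) is zero-padded and ordered from year down to second, sorting the file names as plain text also sorts them chronologically. A minimal sketch; `latest_results_file` is a hypothetical helper, and it assumes all results files in a directory follow that same naming pattern.

```python
import tempfile
from pathlib import Path
from typing import Optional


def latest_results_file(sample_dir: Path) -> Optional[Path]:
    """Return the newest results_*.yaml in sample_dir by name, or None if there is none."""
    candidates = sorted(sample_dir.glob("results_*.yaml"))
    return candidates[-1] if candidates else None


# Demo on a throwaway directory with three timestamped results files.
with tempfile.TemporaryDirectory() as tmp:
    d = Path(tmp)
    for stamp in ("24-03-12T12-01-11", "24-03-12T12-01-56", "24-03-11T09-00-00"):
        (d / f"results_{stamp}.yaml").touch()
    demo_latest = latest_results_file(d).name
print(demo_latest)  # results_24-03-12T12-01-56.yaml
```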

```python
combined_results = EvaluationMetricsCombiner(metrics)
pprint(combined_results.combine())
```

```text
{'insync': (['generated_samples_24-02-28T14-47-57_jepa',
   'generated_samples_24-02-28T14-47-57_jepa_1',
   'generated_samples_24-02-28T14-47-57_jepa_2'],
  [0.05, 0.05, 0.05]),
 'insync_per_video': (['generated_samples_24-02-28T14-47-57_jepa/2015-09-23-16-13-51-173_denoised_298.mp4.mp4',
   'generated_samples_24-02-28T14-47-57_jepa/2015-09-24-15-43-24-742_denoised_548.mp4.mp4',
   'generated_samples_24-02-28T14-47-57_jepa/2015-02-21-17-34-22_denoised_228.mp4.mp4',
   'generated_samples_24-02-28T14-47-57_jepa/2015-03-20-01-33-24_denoised_341.mp4.mp4',
   'generated_samples_24-02-28T14-47-57_jepa/2015-03-30-01-48-12_denoised_169.mp4.mp4',
   'generated_samples_24-02-28T14-47-57_jepa/2015-09-30-20-27-11-36_denoised_449.mp4.mp4',
   'generated_samples_24-02-28T14-47-57_jepa/2015-09-24-14-41-06-753_denoised_146.mp4.mp4',
   'generated_samples_24-02-28T14-47-57_jepa/2015-09-18-04-24-32-123_denoised_109.mp4.mp4',
   'generated_samples_24-02-28T14-47-57_jepa/2015-03-31-02-11-31_denoised_187.
```
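The output above suggests that `combine()` maps each metric name to a pair of lists: the names (sample directories, or `directory/video` for per-video scores) and the corresponding values. Assuming that structure, the entries of one metric can be ranked as sketched below. `rank_directories` is a hypothetical helper, the input is a toy dictionary with made-up values, and whether higher or lower is better depends on the metric, so the sort direction is a parameter.

```python
def rank_directories(
    combined: dict, metric: str, higher_is_better: bool = True
) -> list[tuple[str, float]]:
    """Sort the (name, value) pairs of one metric from a combine()-style dict, best first."""
    names, values = combined[metric]
    if len(names) != len(values):
        raise ValueError(f"{metric!r}: {len(names)} names but {len(values)} values")
    return sorted(zip(names, values), key=lambda item: item[1], reverse=higher_is_better)


# Toy input with the same shape as the combine() output (values are made up).
toy_combined = {"insync": (["run_a", "run_b", "run_c"], [0.10, 0.30, 0.20])}
print(rank_directories(toy_combined, "insync"))
# [('run_b', 0.3), ('run_c', 0.2), ('run_a', 0.1)]
```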

## Plotting
Plot the combined results. Optionally, set the directory where the plots are saved.

```python
# Directory where the plots are saved (optional). If plot_dir is None, the plots
# are not saved; they are returned as matplotlib figures and displayed in the
# notebook, from which you can save them manually.
plot_dir = None
combined_results.plot(plot_dir)
```

```text
{'insync': PosixPath('insync.png'),
 'insync_per_video': PosixPath('insync_per_video.png')}
```