# Evaluate generative multimodal audio models

Use this notebook to evaluate generative multimodal audio models. It computes the following metrics:

- Fréchet Audio Distance (FAD)
- Kullback-Leibler Divergence (KLD)
- ImageBind score (IB)
- Synchronisation Error (SE)
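
For reference, the first two metrics have standard definitions, sketched below. The symbols are assumptions introduced here and are not names from this repository: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of embeddings of the reference and generated audio, and $P$, $Q$ are discrete distributions (for example, class probabilities predicted for reference and generated audio). The implementation used by this notebook may differ in details such as the embedding model or the direction of the divergence.

```latex
\mathrm{FAD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)

D_{\mathrm{KL}}(P \,\Vert\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}
```

For both quantities, lower values indicate that the generated audio is closer to the reference.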

**Note:** GPU resources are sometimes not freed until the Jupyter kernel is restarted. If you encounter a CUDA out-of-memory error, restart the kernel and run the notebook again.
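
Before restarting the kernel, it may be worth trying to release cached GPU memory by hand. The helper below is a minimal sketch under the assumption that the metrics run on PyTorch; *free_gpu_memory* is a hypothetical name and is not part of the evaluation code.

```python
import gc


def free_gpu_memory() -> bool:
    """Try to release cached GPU memory without restarting the kernel.

    Returns True if a CUDA cache was emptied, False otherwise.
    Assumption: the metrics run on PyTorch. This helper is hypothetical.
    """
    # Collect unreachable Python objects first so their tensors can be freed.
    gc.collect()
    try:
        import torch  # imported lazily so the helper also works without PyTorch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    # Return cached, currently unused memory blocks to the GPU driver.
    torch.cuda.empty_cache()
    return True
```

This only helps when nothing still references the tensors (for example, after deleting the variables that hold the models). If memory is still held afterwards, restarting the kernel remains the reliable fix.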

## Setup
You must have:

1. Generated audio samples
1. Ground-truth (GT) audio for the videos that were used to generate the samples
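
A quick sanity check of both inputs can save a long run on an empty or mistyped directory. The sketch below counts media files in a directory; *count_files* is a hypothetical helper, and the file suffixes are an assumption that should be adjusted to your data.

```python
from pathlib import Path


def count_files(directory: Path, suffixes: tuple = (".mp4", ".wav")) -> int:
    """Count files with the given suffixes directly inside `directory`.

    The default suffixes are an assumption; adjust them to your data.
    """
    if not directory.is_dir():
        raise FileNotFoundError(f"Directory does not exist: {directory}")
    return sum(
        1
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in suffixes
    )
```

Calling it on each sample directory and on the GT directory before starting the evaluation shows whether the expected number of files is present.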

## Helpers and Imports

In [18]:
from pathlib import Path
from pprint import pprint

from eval_utils.utils import dataclass_from_dict
from configs.evaluation_cfg import EvaluationCfg, PipelineCfg
from metrics.evaluation_metrics import EvaluationMetrics
from metrics.evaluation_metrics_combiner import EvaluationMetricsCombiner

%reload_ext autoreload
%autoreload 2

In [14]:
def get_evaluation_config(eval_cfg_dict: dict) -> EvaluationCfg:
    """Build an EvaluationCfg (with a nested PipelineCfg) from a plain dict."""
    evaluation_cfg = dataclass_from_dict(EvaluationCfg, eval_cfg_dict)
    # isinstance is the idiomatic type check and also accepts subclasses.
    assert isinstance(evaluation_cfg, EvaluationCfg)
    assert isinstance(evaluation_cfg.pipeline, PipelineCfg)
    return evaluation_cfg

def get_calculated_evaluation_metrics(evaluation_cfg: EvaluationCfg, force_recalculate: bool = False) -> EvaluationMetrics:
    """Run all configured metrics for one sample directory and export the results."""
    print("Evaluating", evaluation_cfg.sample_directory.as_posix())
    evaluation_metrics = EvaluationMetrics(evaluation_cfg)
    assert isinstance(evaluation_metrics, EvaluationMetrics)
    evaluation_metrics.run_all(force_recalculate)
    evaluation_metrics.export_results()
    print("Evaluation done\n")
    return evaluation_metrics

## Define configurations

<p align="center">❗️<b>NOTE</b>❗️</p>
<p align="center">Modify only the following cell with your own arguments and paths. <span style="color:red">Do not modify any other cell unless stated otherwise.</span></p>
<p align="center">❗️<b>NOTE</b>❗️</p>

1. Define the paths to the videos with model-generated audio (*sample_dirs*)
2. Define the path to ground truth videos (*gt_dir*)
3. Define the evaluation pipeline (*pipeline_cfg_dict*)
    - Define the metrics to be used (fad, kld, insync)
    - Define the parameters for individual metrics (see ./configs/evaluation_cfg.py for more details)
    - Examples:
        - Only the insync metric with default parameters: {"insync": {}}
        - All metrics with default parameters: {"fad": {}, "kld": {}, "insync": {}}
        - Only FAD, calculated using PCA: {"fad": {"use_pca": True}}
4. Define verbosity (*is_verbose*)
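
The shape of the pipeline dictionary described in the list above can be checked before any metric runs. The following is a minimal sketch: *check_pipeline_cfg* and *KNOWN_METRICS* are hypothetical names, and the set of metric keys is taken from the list above rather than from ./configs/evaluation_cfg.py, which remains the authoritative source.

```python
KNOWN_METRICS = {"fad", "kld", "insync"}  # metric keys named in this notebook


def check_pipeline_cfg(pipeline_cfg: dict) -> dict:
    """Validate the shape of a pipeline configuration dictionary."""
    if not pipeline_cfg:
        raise ValueError("Pipeline is not defined or it is empty.")
    unknown = set(pipeline_cfg) - KNOWN_METRICS
    if unknown:
        raise ValueError(f"Unknown metrics: {sorted(unknown)}")
    for name, params in pipeline_cfg.items():
        # Each metric maps to a dict of parameters; {} means defaults.
        if not isinstance(params, dict):
            raise TypeError(
                f"Parameters for '{name}' must be a dict, got {type(params).__name__}"
            )
    return pipeline_cfg


check_pipeline_cfg({"insync": {}})
check_pipeline_cfg({"fad": {"use_pca": True}, "kld": {}})
```

The check only covers metric names and the dict-of-dicts structure. Whether an individual parameter such as use_pca is accepted is still decided by the configuration dataclasses.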

In [3]:
sample_dirs = [
    Path(
        "/home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa"
    ),
    Path(
        "/home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_1"
    ),
    Path(
        "/home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_2"
    ),
]
gt_dir = Path("")
pipeline_cfg_dict = {"insync": {}}
is_verbose = True

## Initialise configurations

In [15]:
# A truthiness check rejects both None and an empty dict, matching the message.
assert pipeline_cfg_dict, "Pipeline is not defined or it is empty."
evaluation_cfgs = [
    get_evaluation_config(
        {
            "sample_directory": sample_dir,
            "gt_directory": gt_dir,
            "pipeline": pipeline_cfg_dict,
            "verbose": is_verbose,
        }
    )
    for sample_dir in sample_dirs
]

Evaluation pipeline:
sample_directory: /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa
gt_directory: .
result_directory: /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa
PipelineCfg(fad=None, kld=None, insync=InSyncCfg(exp_name='24-01-04T16-39-21', device='cuda:0', vfps=25, afps=16000, input_size=256, ckpt_parent_path='./checkpoints/sync_models'))
verbose: True

Evaluation pipeline:
sample_directory: /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_1
gt_directory: .
result_directory: /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_1
PipelineCfg(fad=None, kld=None, insync=InSyncCfg(exp_name='24-01-04T16-39-21', device='cuda:0', vfps=25, afps=16000, input_size=256, ckpt_parent_path='./checkpoints/sync_models'))
verbose: True

Evaluation pipeline:
sample_directory: /

## Metrics
The *EvaluationMetrics* class is initialised with an *EvaluationCfg* instance, which defines the evaluation pipeline, and calculates the metrics for a single sample directory (one *EvaluationCfg* entry). The *EvaluationMetricsCombiner* class combines the metrics from all sample directories for plotting.
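
To illustrate what combining per-directory results for plotting can look like, here is a minimal sketch that regroups results by metric. It is not the *EvaluationMetricsCombiner* implementation: *combine_results*, the input layout, and the numbers are all hypothetical.

```python
def combine_results(results_by_dir: dict) -> dict:
    """Regroup {directory: {metric: value}} into {metric: {directory: value}}."""
    combined: dict = {}
    for directory, metrics in results_by_dir.items():
        for metric, value in metrics.items():
            combined.setdefault(metric, {})[directory] = value
    return combined


# Placeholder numbers for illustration only, not real results.
example = {
    "run_a": {"fad": 1.0, "kld": 2.0},
    "run_b": {"fad": 3.0},
}
print(combine_results(example))
# → {'fad': {'run_a': 1.0, 'run_b': 3.0}, 'kld': {'run_a': 2.0}}
```

Grouping by metric first makes it straightforward to draw one plot per metric with one entry per sample directory.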

In [19]:
metrics = [
    get_calculated_evaluation_metrics(evaluation_cfg)
    for evaluation_cfg in evaluation_cfgs
]
combined_results = EvaluationMetricsCombiner(metrics)
combined_results.combine()

Evaluating /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa


Reencoding directory generated_samples_24-02-28T14-47-57_jepa to 25 fps, 16000 afps, 256 input size: 100%|██████████| 40/40 [00:11<00:00,  3.35it/s]


Reencoded samples
Results exported to /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa/results_24-03-12T11-19-55.yaml
Evaluation done

Evaluating /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_1


Reencoding directory generated_samples_24-02-28T14-47-57_jepa_1 to 25 fps, 16000 afps, 256 input size: 100%|██████████| 40/40 [00:12<00:00,  3.33it/s]


Reencoded samples
Results exported to /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_1/results_24-03-12T11-20-17.yaml
Evaluation done

Evaluating /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_2


Reencoding directory generated_samples_24-02-28T14-47-57_jepa_2 to 25 fps, 16000 afps, 256 input size: 100%|██████████| 40/40 [00:11<00:00,  3.35it/s]


Reencoded samples
Results exported to /home/hdd/ilpo/evaluation_data/synchronisonix/24-02-27T16-46-55/generated_samples_24-02-28T14-47-57_jepa_2/results_24-03-12T11-20-39.yaml
Evaluation done



## Plotting
Plot the combined results. You can optionally define an output directory for the plots in the cell below.

In [20]:
# Define the output directory for the plots (optional).
# If it is not defined, the plots are not saved but returned as matplotlib figures.
plot_dir = None
combined_results.plot(plot_dir)

AttributeError: 'dict' object has no attribute 'plot'
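
The *AttributeError* above means that *combined_results* referred to a plain dict when the plotting cell ran. One way this can happen, offered here as an assumption rather than a diagnosis, is that the name was rebound to the return value of a method that returns a dict. The sketch below reproduces that failure mode with a hypothetical stand-in class; *FakeCombiner* and its methods are illustrative and do not reflect the real *EvaluationMetricsCombiner* API.

```python
class FakeCombiner:
    """Hypothetical stand-in; not the real EvaluationMetricsCombiner."""

    def __init__(self, results: list):
        self.results = results
        self.combined: dict = {}

    def combine(self) -> dict:
        # Returns a plain dict, which has no `plot` method.
        self.combined = {"n_runs": len(self.results)}
        return self.combined

    def plot(self, plot_dir=None) -> str:
        return f"plotted {self.combined['n_runs']} runs"


combiner = FakeCombiner([{}, {}, {}])
rebound = combiner.combine()  # `rebound` is a dict, not the combiner

try:
    rebound.plot(None)
except AttributeError as err:
    error_message = str(err)  # 'dict' object has no attribute 'plot'

# Keeping the combiner object under its own name avoids the problem.
plot_result = combiner.plot(None)
```

If the real cause is similar, checking type(combined_results) right before the plotting call, or restarting the kernel and rerunning all cells in order, should confirm it.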