Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations.
This repository contains scripts to evaluate musical source separation (MSS) models using embedding-based metrics (specifically MERT-v1-95M) and to correlate these metrics with subjective listener ratings. It supports evaluation on two datasets: a generative singing voice separation dataset ('gensvs') and a MUSDB18-based 'bake-off' dataset.
The script calc_embmse.py calculates various objective metrics for separated audio files.
Metrics:
- MERT-v1-95M-MSE: Mean Squared Error in the MERT embedding space.
- FADMERT-v1-95M: An intrusive variant of Fréchet Audio Distance using MERT embeddings.
- SDR, SI-SDR, SI-SIR, SI-SAR: Signal-to-Distortion Ratio and Scale-Invariant Signal-to-Distortion/Interference/Artifact Ratios.
- WAV-MSE: Mean Squared Error in the waveform domain.
- SPEC-MSE: Mean Squared Error in the magnitude spectrogram domain.
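For orientation, two of the waveform-domain metrics listed above can be sketched in a few lines of NumPy. This is an illustrative sketch and not the code used in calc_embmse.py: the function names `wav_mse` and `si_sdr` and the `eps` parameter are hypothetical, and SI-SDR is written in its common projection-based form.

```python
import numpy as np


def wav_mse(estimate, reference):
    """Mean squared error between two equal-length waveforms."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    return float(np.mean((estimate - reference) ** 2))


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB (projection-based definition)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Project the estimate onto the reference to find the optimal scaling.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return float(10.0 * np.log10(ratio))


# Synthetic signals (made up for illustration): a scaled reference plus noise.
rng = np.random.default_rng(0)
ref = rng.standard_normal(24000)
est = 0.5 * ref + 0.05 * rng.standard_normal(24000)
print(wav_mse(est, ref), si_sdr(est, ref))
```

Rescaling the estimate changes the waveform MSE but leaves SI-SDR essentially unchanged, which is the reason both are reported side by side.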
Configuration: Open the script and adjust the global variables at the top:
- DATASET: Set to 'gensvs' or 'bake_off'.
- SEP_PATH: Path to separated audio files.
- TGT_PATH: Path to reference/target audio files.
- EMBEDDING: Embedding model to use (default: 'MERT-v1-95M').
- WORKERS: Number of workers for audio loading.
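As a rough illustration, the configuration block might look like the following for a bake-off run. The variable names come from this README, and the paths are the bake-off defaults given under Placement; the `WORKERS` value and the `RESULTS_DIR` helper are assumptions made for this sketch and may not match the script.

```python
# Hypothetical example of the global settings at the top of calc_embmse.py.
DATASET = "bake_off"  # 'gensvs' or 'bake_off'
SEP_PATH = "./audio/MSS_bake_off_eval_audio/10s_audio"  # separated audio
TGT_PATH = "./audio/musdb18_test_10s"  # reference/target audio
EMBEDDING = "MERT-v1-95M"  # embedding model (documented default)
WORKERS = 4  # assumed value; tune to the number of CPU cores

# Guard against typos in the dataset name.
if DATASET not in ("gensvs", "bake_off"):
    raise ValueError(f"Unknown DATASET: {DATASET!r}")

# Illustrative only: the README states results go to emb_mse_results_<dataset>/.
RESULTS_DIR = f"emb_mse_results_{DATASET}"
print(RESULTS_DIR)
```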
Usage:
python calc_embmse.py
Results will be saved in emb_mse_results_gensvs/ or emb_mse_results_bake_off/, depending on the dataset configuration.
The script corr_metrics_and_ratings.py calculates the correlation (Pearson, Spearman, Kendall's Tau) between the objective metrics calculated by calc_embmse.py and subjective listener ratings provided in a CSV file.
Configuration:
- DATASET: Set to 'gensvs' or 'bake_off' to match the evaluation step.
Usage:
python corr_metrics_and_ratings.py
The script loads listener responses (default: third_party/bake_off/raw_listener_responses_w_violations.csv) and correlates them with the metric results found in the output directory.
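The three correlation coefficients can be computed with scipy.stats, as in the minimal sketch below. The `correlate` helper and the toy numbers are made up for illustration; how corr_metrics_and_ratings.py aggregates listener responses before correlating is not shown here.

```python
from scipy.stats import kendalltau, pearsonr, spearmanr


def correlate(metric_values, ratings):
    """Return Pearson, Spearman and Kendall correlations for paired values."""
    r, _ = pearsonr(metric_values, ratings)
    rho, _ = spearmanr(metric_values, ratings)
    tau, _ = kendalltau(metric_values, ratings)
    return {"pearson": float(r), "spearman": float(rho), "kendall": float(tau)}


# Toy data (not real results): an error-type metric and listener ratings.
metric = [0.9, 0.7, 0.5, 0.3, 0.1]
rating = [20.0, 35.0, 60.0, 70.0, 95.0]
result = correlate(metric, rating)
print(result)
```

For error-type metrics such as an MSE, where lower is better, a strong agreement with higher-is-better ratings shows up as a correlation close to -1.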
The embedding metrics in this repository rely on the gensvs package.
Install it via pip:
pip install gensvs
(For more details, visit the gensvs PyPI page.)
Ensure your Python environment also has the following packages installed:
- numpy
- pandas
- scipy
- torch, torchaudio
- torchmetrics
- soundfile
- tqdm
- nussl
- transformers
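A quick way to check the environment before running the scripts is sketched below. It assumes that each package's import name matches the name listed above; `missing_packages` is a hypothetical helper, not part of this repository.

```python
import importlib.util

# Packages named in this README (import names assumed equal to package names).
REQUIRED = [
    "gensvs", "numpy", "pandas", "scipy", "torch", "torchaudio",
    "torchmetrics", "soundfile", "tqdm", "nussl", "transformers",
]


def missing_packages(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing packages:", missing_packages(REQUIRED) or "none")
```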
You will need the separated audio files and ground truth references to run the evaluation.
- Separated Audio (Bake-Off Models): The audio generated by the models evaluated in the 2025 Bake-Off paper can be downloaded from Zenodo: https://doi.org/10.5281/zenodo.15843081
- Separated Audio (IRM1 / SiSEC18): The IRM1 (Oracle) and REP1 audio is available as part of the SiSEC18 estimates: https://zenodo.org/records/1256003
- Reference Audio: Requires the MUSDB18-HQ or standard MUSDB18 test set (reference files).
Placement:
- Separated files should go to SEP_PATH (default: ./audio/MSS_bake_off_eval_audio/10s_audio).
- Reference files should go to TGT_PATH (default: ./audio/musdb18_test_10s).
- Full Dataset: The complete Generative Singing Voice Separation dataset is available on Zenodo:
https://zenodo.org/records/15911723
Placement:
- Unzip contents into audio/gensvs_eval_audio (structure should typically be flat model folders + a target folder).
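The expected layout (one folder per model plus a target folder) can be sanity-checked with a short script. This is a sketch under the assumption that the reference folder is literally named `target`; `list_model_folders` is a hypothetical helper, and the demo builds a throwaway directory instead of touching real data.

```python
import tempfile
from pathlib import Path


def list_model_folders(root, target_name="target"):
    """Return model folder names under root, requiring a target folder."""
    root = Path(root)
    if not (root / target_name).is_dir():
        raise FileNotFoundError(f"No '{target_name}' folder found in {root}")
    return sorted(
        p.name for p in root.iterdir() if p.is_dir() and p.name != target_name
    )


# Demo on a temporary directory that mimics the flat layout.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("target", "model_a", "model_b"):
        (Path(tmp) / name).mkdir()
    models = list_model_folders(tmp)
    print(models)
```

In practice the function would be pointed at audio/gensvs_eval_audio after unzipping.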
If running corr_metrics_and_ratings.py, you need the listener response CSV. A simplified version is included in this repository at third_party/bake_off/raw_listener_responses_w_violations.csv.
- MERT-v1-95M: The script uses the MERT-v1-95M model for embeddings. This will typically be downloaded automatically via the transformers library on the first run.
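To pre-fetch the model or inspect its embeddings outside the evaluation script, the checkpoint can be loaded directly with transformers, as sketched below. Several details are assumptions: the Hub id `m-a-p/MERT-v1-95M`, the choice of the last hidden layer, and the time-averaging are illustrative and may differ from what the gensvs package does internally. The helper names `load_mert` and `embed` are hypothetical.

```python
MODEL_ID = "m-a-p/MERT-v1-95M"  # assumed Hugging Face Hub id


def load_mert(model_id=MODEL_ID):
    """Download (on first call) and return the MERT model and its processor."""
    from transformers import AutoModel, Wav2Vec2FeatureExtractor

    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    processor = Wav2Vec2FeatureExtractor.from_pretrained(
        model_id, trust_remote_code=True
    )
    return model.eval(), processor


def embed(waveform, model, processor):
    """Return one time-averaged embedding vector for a mono waveform.

    The waveform is expected as a 1-D array already resampled to
    processor.sampling_rate.
    """
    import torch

    inputs = processor(
        waveform, sampling_rate=processor.sampling_rate, return_tensors="pt"
    )
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # Stack to (layers, time, dim), then average the last layer over time.
    hidden = torch.stack(outputs.hidden_states).squeeze(1)
    return hidden[-1].mean(dim=0)
```

Calling `load_mert()` once with network access populates the local Hugging Face cache, so later runs can reuse the downloaded weights.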
If you use the datasets or code evaluated in this work, please consider citing the following papers:
Reference to DAGA paper will be added soon!
@INPROCEEDINGS{11230934,
author={Bereuter, Paul A. and Stahl, Benjamin and Plumbley, Mark D. and Sontacchi, Alois},
booktitle={2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
title={Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models},
year={2025},
volume={},
number={},
pages={1-5},
keywords={Measurement;Degradation;Training;Time-frequency analysis;Correlation;Limiting;Computational modeling;Conferences;Reliability;Software development management},
doi={10.1109/WASPAA66052.2025.11230934}}

@INPROCEEDINGS{11230942,
author={Jaffe, Noah and Burgoyne, John Ashley},
booktitle={2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
title={Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception},
year={2025},
volume={},
number={},
pages={1-5},
keywords={Measurement;Source separation;Statistical analysis;Instruments;Energy measurement;Interference;Reproducibility of results;Recording;Reliability;Signal to noise ratio},
doi={10.1109/WASPAA66052.2025.11230942}}

See also third_party/bake_off/README.md for details on the listener study data (DOI: 10.5281/zenodo.15843081).