mert-emb-eval

Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations.

This repository contains scripts to evaluate musical source separation (MSS) models with embedding-based metrics (specifically MERT-v1-95M) and to correlate these metrics with subjective listener ratings. It supports evaluation on two datasets: a generative singing voice separation (SVS) dataset ('gensvs') and a MUSDB18-based 'bake-off' dataset.

Key Scripts

1. `calc_embmse.py`

This script computes several objective metrics for each separated audio file by comparing it with its reference (target) signal.

Metrics:

MERT-v1-95M-MSE: Mean Squared Error in the MERT embedding space.
FADMERT-v1-95M: An intrusive variant of Frechet Audio Distance using MERT embeddings.
SDR, SI-SDR, SI-SIR, SI-SAR: Signal-to-Distortion Ratio and Scale-Invariant Signal-to-Distortion/Interference/Artifact Ratios.
WAV-MSE: Mean Squared Error in the waveform domain.
SPEC-MSE: Mean Squared Error in the magnitude spectrogram domain.
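For orientation, the embedding-space and waveform metrics listed above can be sketched in NumPy/SciPy. This is a minimal sketch, not the implementation in calc_embmse.py or the gensvs package. It assumes the embeddings are already extracted as (frames, dims) arrays for the separated and the reference signal. It also treats the intrusive FAD as a Fréchet distance between Gaussians fitted to the frame embeddings of one separated signal and its reference, which is an assumption. The function names and the STFT size are hypothetical.

```python
import numpy as np
from scipy import linalg, signal


def embedding_mse(emb_sep: np.ndarray, emb_ref: np.ndarray) -> float:
    """MSE between time-aligned embeddings of shape (frames, dims)."""
    # Assumption: both signals are embedded with the same model and hop size,
    # so frames can be compared one-to-one (truncate to the shorter one).
    n = min(len(emb_sep), len(emb_ref))
    return float(np.mean((emb_sep[:n] - emb_ref[:n]) ** 2))


def frechet_distance(emb_sep: np.ndarray, emb_ref: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two sets of frame embeddings."""
    mu1, mu2 = emb_sep.mean(axis=0), emb_ref.mean(axis=0)
    cov1 = np.cov(emb_sep, rowvar=False)
    cov2 = np.cov(emb_ref, rowvar=False)
    # sqrtm may return tiny imaginary parts caused by numerical error.
    covmean = np.real(linalg.sqrtm(cov1 @ cov2))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1 + cov2 - 2.0 * covmean))


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-12) -> float:
    """Scale-invariant SDR in dB for single-channel signals of equal length."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target component.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return float(10 * np.log10((np.dot(target, target) + eps) / (np.dot(noise, noise) + eps)))


def wav_mse(estimate: np.ndarray, reference: np.ndarray) -> float:
    """MSE in the waveform domain."""
    return float(np.mean((estimate - reference) ** 2))


def spec_mse(estimate: np.ndarray, reference: np.ndarray, n_fft: int = 2048) -> float:
    """MSE between magnitude spectrograms (hypothetical STFT settings)."""
    _, _, z_est = signal.stft(estimate, nperseg=n_fft)
    _, _, z_ref = signal.stft(reference, nperseg=n_fft)
    return float(np.mean((np.abs(z_est) - np.abs(z_ref)) ** 2))
```

SI-SIR and SI-SAR are omitted here because they additionally require the interfering sources to decompose the error term.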

Configuration: Open the script and adjust the global variables at the top:

DATASET: Set to 'gensvs' or 'bake_off'.
SEP_PATH: Path to separated audio files.
TGT_PATH: Path to reference/target audio files.
EMBEDDING: Embedding model to use (default: 'MERT-v1-95M').
WORKERS: Number of workers for audio loading.

Usage:

python calc_embmse.py

Results will be saved in emb_mse_results_gensvs/ or emb_mse_results_bake_off/ depending on the dataset configuration.
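As an illustration of how these settings relate to the output location, here is a hypothetical version of the global-variable block together with two small helpers. The bake-off path defaults are taken from the Audio Data Downloads section. The helper names (results_dir, check_config) and the WORKERS value are illustrative and not part of calc_embmse.py.

```python
from pathlib import Path

# Hypothetical global-variable block mirroring the options described above.
DATASET = "bake_off"  # 'gensvs' or 'bake_off'
SEP_PATH = "./audio/MSS_bake_off_eval_audio/10s_audio"
TGT_PATH = "./audio/musdb18_test_10s"
EMBEDDING = "MERT-v1-95M"
WORKERS = 4  # assumption: pick according to the available CPU cores

VALID_DATASETS = ("gensvs", "bake_off")


def results_dir(dataset: str) -> Path:
    """Output folder named emb_mse_results_<dataset>, as described above."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"DATASET must be one of {VALID_DATASETS}, got {dataset!r}")
    return Path(f"emb_mse_results_{dataset}")


def check_config(sep_path: str, tgt_path: str) -> list[str]:
    """Return a list of problems with the configured paths (empty if none)."""
    problems = []
    for name, path in (("SEP_PATH", sep_path), ("TGT_PATH", tgt_path)):
        if not Path(path).is_dir():
            problems.append(f"{name} does not exist: {path}")
    return problems


if __name__ == "__main__":
    print(results_dir(DATASET))
    for problem in check_config(SEP_PATH, TGT_PATH):
        print(problem)
```

Checking the paths before starting avoids loading the MERT model only to fail on a missing audio folder.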

2. `corr_metrics_and_ratings.py`

This script calculates the correlation (Pearson, Spearman, Kendall's Tau) between the objective metrics calculated by calc_embmse.py and subjective listener ratings provided in a CSV file.

Configuration:

DATASET: Set to 'gensvs' or 'bake_off' to match the evaluation step.

Usage:

python corr_metrics_and_ratings.py

The script loads listener responses (default: third_party/bake_off/raw_listener_responses_w_violations.csv) and correlates them with the metric results found in the output directory.
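The correlation step can be sketched with pandas and scipy.stats. This sketch assumes a per-stimulus table of metric values and a raw listener-response table with one row per rating. The column names (stimulus, rating, and the metric column) are hypothetical and will differ from the actual CSV and result files. Ratings are averaged per stimulus before correlating, which is one common choice and not necessarily the one the script makes.

```python
import pandas as pd
from scipy import stats


def correlate_metric_with_ratings(
    metrics: pd.DataFrame,
    responses: pd.DataFrame,
    metric_col: str,
    key: str = "stimulus",
    rating_col: str = "rating",
) -> dict[str, float]:
    """Pearson, Spearman and Kendall correlations between one metric and mean ratings."""
    # Average all listener ratings per stimulus (mean opinion score).
    mos = responses.groupby(key, as_index=False)[rating_col].mean()
    merged = metrics.merge(mos, on=key, how="inner").dropna(subset=[metric_col, rating_col])
    x = merged[metric_col].to_numpy()
    y = merged[rating_col].to_numpy()
    return {
        "pearson": float(stats.pearsonr(x, y)[0]),
        "spearman": float(stats.spearmanr(x, y)[0]),
        "kendall": float(stats.kendalltau(x, y)[0]),
    }
```

For error-type metrics such as MERT-v1-95M-MSE, one would expect a strong negative correlation with the ratings, since a lower error should correspond to a better perceived quality.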

Installation

The embedding metrics in this repository rely on the gensvs package.

Install it via pip:

pip install gensvs

(For more details, visit the gensvs PyPI page.)

Other Dependencies

Ensure your Python environment also has the following packages installed:

numpy
pandas
scipy
torch, torchaudio
torchmetrics
soundfile
tqdm
nussl
transformers
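A quick way to verify the environment is to check that each package (including gensvs) can be found without importing it. This is a minimal sketch; it assumes the import names match the package names listed above, which we believe holds for these packages.

```python
import importlib.util

# Import names for gensvs and the packages listed above;
# "torch, torchaudio" are two separate packages.
REQUIRED = [
    "gensvs", "numpy", "pandas", "scipy", "torch", "torchaudio",
    "torchmetrics", "soundfile", "tqdm", "nussl", "transformers",
]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```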

Audio Data Downloads

You will need the separated audio files and ground truth references to run the evaluation.

1. Bake-Off Dataset (`DATASET = 'bake_off'`)

Separated Audio (Bake-Off Models): The audio generated by the models evaluated in the 2025 Bake-Off paper can be downloaded from Zenodo:
https://doi.org/10.5281/zenodo.15843081
Separated Audio (IRM1 / SiSEC18): The IRM1 (oracle) and REP1 audio files are available as part of the SiSEC18 estimates:
https://zenodo.org/records/1256003
Reference Audio: The reference stems from the MUSDB18-HQ or standard MUSDB18 test set are required.

Placement:

Separated files should go to SEP_PATH (default: ./audio/MSS_bake_off_eval_audio/10s_audio).
Reference files should go to TGT_PATH (default: ./audio/musdb18_test_10s).

2. GenSVS Dataset (`DATASET = 'gensvs'`)

Full Dataset: The complete Generative Singing Voice Separation dataset is available on Zenodo:
https://zenodo.org/records/15911723

Placement:

Unzip the contents into audio/gensvs_eval_audio. The resulting structure should typically consist of one flat folder per model plus a target folder.
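Before running the evaluation, it can help to confirm that every separated file has a matching reference. The sketch below assumes the flat layout described above (root/<model>/<file>.wav next to root/<target>/<file>.wav, with identical file names). The target folder name "target" and the function name are placeholders, not names taken from the dataset or the scripts.

```python
from pathlib import Path


def pair_separated_with_targets(root, target_dir_name="target", pattern="*.wav"):
    """Map each model folder under `root` to (separated, reference) file pairs.

    Assumes a flat layout with matching file names; "target" is a placeholder.
    Returns (pairs, unmatched), where unmatched lists separated files without a reference.
    """
    root = Path(root)
    target_dir = root / target_dir_name
    targets = {p.name: p for p in target_dir.glob(pattern)}
    pairs, unmatched = {}, []
    for model_dir in sorted(p for p in root.iterdir() if p.is_dir() and p != target_dir):
        pairs[model_dir.name] = []
        for sep in sorted(model_dir.glob(pattern)):
            if sep.name in targets:
                pairs[model_dir.name].append((sep, targets[sep.name]))
            else:
                unmatched.append(sep)
    return pairs, unmatched
```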

Listener Ratings

To run corr_metrics_and_ratings.py, you need the listener response CSV. A simplified version is included in this repository at third_party/bake_off/raw_listener_responses_w_violations.csv.

Models

MERT-v1-95M: The script uses the MERT-v1-95M model for embeddings. This will typically be downloaded automatically via the transformers library on the first run.
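For reference, frame-level MERT embeddings can be obtained with transformers following the usage pattern on the public MERT-v1-95M model card (Hugging Face Hub ID m-a-p/MERT-v1-95M, 24 kHz mono input). This is a sketch under that assumption; which layer(s) calc_embmse.py or gensvs actually uses, and how, is not stated here. The function names are hypothetical.

```python
from functools import lru_cache

import numpy as np

MERT_MODEL_ID = "m-a-p/MERT-v1-95M"  # Hugging Face Hub identifier
MERT_SAMPLE_RATE = 24_000  # MERT-v1 expects 24 kHz mono input


@lru_cache(maxsize=1)
def _load_mert():
    """Load processor and model once; downloaded on the first call."""
    from transformers import AutoModel, Wav2Vec2FeatureExtractor

    processor = Wav2Vec2FeatureExtractor.from_pretrained(MERT_MODEL_ID, trust_remote_code=True)
    model = AutoModel.from_pretrained(MERT_MODEL_ID, trust_remote_code=True).eval()
    return processor, model


def extract_mert_embeddings(audio: np.ndarray, sample_rate: int) -> np.ndarray:
    """Return hidden states of shape (layers, frames, dims) for a 1-D mono signal."""
    # Heavy imports stay local so this module can be imported without them.
    import torch
    import torchaudio.functional as AF

    processor, model = _load_mert()
    wav = torch.as_tensor(audio, dtype=torch.float32)
    if sample_rate != processor.sampling_rate:
        wav = AF.resample(wav, sample_rate, processor.sampling_rate)
    inputs = processor(wav.numpy(), sampling_rate=processor.sampling_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # hidden_states: one (batch, frames, dims) tensor per layer, including the input embeddings.
    return torch.stack(outputs.hidden_states).squeeze(1).numpy()
```

Caching the loaded model avoids reloading it for every file, which matters when evaluating many separated excerpts.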

References

If you use the datasets or code from this work, please consider citing the following papers:

DAGA 2026 paper

Reference to DAGA paper will be added soon!

GenSVS Dataset

@INPROCEEDINGS{11230934,
  author={Bereuter, Paul A. and Stahl, Benjamin and Plumbley, Mark D. and Sontacchi, Alois},
  booktitle={2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)}, 
  title={Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models}, 
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Measurement;Degradation;Training;Time-frequency analysis;Correlation;Limiting;Computational modeling;Conferences;Reliability;Software development management},
  doi={10.1109/WASPAA66052.2025.11230934}}

2025 Bake-Off Dataset

@INPROCEEDINGS{11230942,
  author={Jaffe, Noah and Burgoyne, John Ashley},
  booktitle={2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)}, 
  title={Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception}, 
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Measurement;Source separation;Statistical analysis;Instruments;Energy measurement;Interference;Reproducibility of results;Recording;Reliability;Signal to noise ratio},
  doi={10.1109/WASPAA66052.2025.11230942}}

See also third_party/bake_off/README.md for details on the listener study data (DOI: 10.5281/zenodo.15843081).
