pablebe/mert-emb-eval

mert-emb-eval

Embedding-Based Intrusive Evaluation Metrics for Musical Source Separation Using MERT Representations.

This repository contains scripts to evaluate musical source separation (MSS) models using embedding-based metrics (specifically MERT-v1-95M) and to correlate these metrics with subjective listener ratings. It supports evaluation on two datasets: a generative singing voice separation (SVS) dataset ('gensvs') and a MUSDB18-based bake-off dataset ('bake_off').

Key Scripts

1. calc_embmse.py

This script computes objective metrics for separated audio files, each measured against the corresponding reference signal.

Metrics:

  • MERT-v1-95M-MSE: Mean Squared Error in the MERT embedding space.
  • FAD_MERT-v1-95M: An intrusive variant of the Fréchet Audio Distance computed on MERT embeddings.
  • SDR, SI-SDR, SI-SIR, SI-SAR: Signal-to-Distortion Ratio and Scale-Invariant Signal-to-Distortion/Interference/Artifact Ratios.
  • WAV-MSE: Mean Squared Error in the waveform domain.
  • SPEC-MSE: Mean Squared Error in the magnitude spectrogram domain.
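To make the definitions above concrete, here is a minimal NumPy sketch of three of the listed quantities. It is not the code used in calc_embmse.py: the function names are hypothetical, signals are assumed to be mono 1-D arrays of equal length, SI-SDR is computed without a zero-mean step, and the embedding MSE simply averages over all embedding elements, which may differ from the aggregation the script uses.

```python
import numpy as np


def wav_mse(estimate, reference):
    """Mean squared error between two waveforms of equal length."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    return float(np.mean((estimate - reference) ** 2))


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB (assumption: no zero-mean normalisation)."""
    estimate = np.asarray(estimate, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    ratio = (np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps)
    return float(10.0 * np.log10(ratio))


def embedding_mse(emb_estimate, emb_reference):
    """MSE between two embedding arrays of identical shape, e.g. (frames, dims)."""
    emb_estimate = np.asarray(emb_estimate, dtype=float)
    emb_reference = np.asarray(emb_reference, dtype=float)
    if emb_estimate.shape != emb_reference.shape:
        raise ValueError("Embeddings must have the same shape")
    return float(np.mean((emb_estimate - emb_reference) ** 2))
```

Because the estimate is projected onto the reference first, rescaling the estimate leaves si_sdr unchanged, whereas wav_mse is sensitive to gain differences.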

Configuration: Open the script and adjust the global variables at the top:

  • DATASET: Set to 'gensvs' or 'bake_off'.
  • SEP_PATH: Path to separated audio files.
  • TGT_PATH: Path to reference/target audio files.
  • EMBEDDING: Embedding model to use (default: 'MERT-v1-95M').
  • WORKERS: Number of workers for audio loading.
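As an illustration, the block of globals might look like the sketch below. The two paths are the bake-off defaults this README gives for separated and reference audio; the WORKERS value and the validate_config helper are assumptions added for the example and are not part of the repository.

```python
from pathlib import Path

# Values mirror the defaults named in this README, except WORKERS.
DATASET = "bake_off"  # or "gensvs"
SEP_PATH = Path("./audio/MSS_bake_off_eval_audio/10s_audio")
TGT_PATH = Path("./audio/musdb18_test_10s")
EMBEDDING = "MERT-v1-95M"
WORKERS = 4  # assumed value; the README does not state a default

VALID_DATASETS = ("gensvs", "bake_off")


def validate_config(dataset, workers):
    """Hypothetical sanity check for the two settings that are easy to mistype."""
    if dataset not in VALID_DATASETS:
        raise ValueError(f"DATASET must be one of {VALID_DATASETS}, got {dataset!r}")
    if workers < 1:
        raise ValueError("WORKERS must be a positive integer")


validate_config(DATASET, WORKERS)
```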

Usage:

python calc_embmse.py

Results will be saved in emb_mse_results_gensvs/ or emb_mse_results_bake_off/ depending on the dataset configuration.
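To check what a run produced, a small helper such as the following can list the contents of the output folder. The folder naming follows the sentence above; the helper name and the root argument are hypothetical, and no assumption is made about the format of the result files.

```python
from pathlib import Path


def list_results(dataset, root="."):
    """Return sorted file names found in emb_mse_results_<dataset>/ under root."""
    folder = Path(root) / f"emb_mse_results_{dataset}"
    if not folder.is_dir():
        # Nothing has been written yet for this dataset setting.
        return []
    return sorted(p.name for p in folder.iterdir() if p.is_file())
```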

2. corr_metrics_and_ratings.py

This script calculates the correlation (Pearson, Spearman, Kendall's Tau) between the objective metrics calculated by calc_embmse.py and subjective listener ratings provided in a CSV file.

Configuration:

  • DATASET: Set to 'gensvs' or 'bake_off' to match the evaluation step.

Usage:

python corr_metrics_and_ratings.py

The script loads listener responses (default: third_party/bake_off/raw_listener_responses_w_violations.csv) and correlates them with the metric results found in the output directory.
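The three correlation coefficients can be computed with scipy.stats, as in the sketch below. The correlate helper is hypothetical and the numbers are toy values for illustration only; they are not taken from either dataset, and the script's own handling of the CSV (grouping, filtering, aggregation) is not reproduced here.

```python
from scipy.stats import kendalltau, pearsonr, spearmanr


def correlate(metric_values, ratings):
    """Return Pearson, Spearman and Kendall coefficients for paired samples."""
    pearson_r, _ = pearsonr(metric_values, ratings)
    spearman_rho, _ = spearmanr(metric_values, ratings)
    kendall_tau, _ = kendalltau(metric_values, ratings)
    return {
        "pearson": float(pearson_r),
        "spearman": float(spearman_rho),
        "kendall": float(kendall_tau),
    }


# Toy numbers for illustration only: an error-type metric and listener ratings.
toy_metric = [0.10, 0.20, 0.30, 0.40, 0.50]
toy_rating = [90.0, 70.0, 65.0, 40.0, 10.0]
result = correlate(toy_metric, toy_rating)
```

For error-type metrics such as the MSE variants, a strongly negative coefficient indicates agreement with listeners, since a lower error should go with a higher rating.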

Installation

The embedding metrics in this repository rely on the gensvs package.

Install it via pip:

pip install gensvs

(For more details, visit the gensvs PyPI page.)

Other Dependencies

Ensure your Python environment also has the following packages installed:

  • numpy
  • pandas
  • scipy
  • torch, torchaudio
  • torchmetrics
  • soundfile
  • tqdm
  • nussl
  • transformers
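A quick way to see which of these packages are missing from the active environment is sketched below. It assumes that each import name equals the package name listed above (including gensvs), which is an assumption rather than something this README states.

```python
from importlib.util import find_spec

# Import names are assumed to match the package names listed in this README.
REQUIRED = [
    "gensvs", "numpy", "pandas", "scipy", "torch", "torchaudio",
    "torchmetrics", "soundfile", "tqdm", "nussl", "transformers",
]


def missing_packages(names):
    """Return the names that cannot be located as importable top-level modules."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing packages:", ", ".join(missing) if missing else "none")
```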

Audio Data Downloads

You will need the separated audio files and ground truth references to run the evaluation.

1. Bake-Off Dataset (DATASET = 'bake_off')

  • Separated Audio (Bake-Off Models): The audio generated by the models evaluated in the 2025 Bake-Off paper can be downloaded from Zenodo:
    https://doi.org/10.5281/zenodo.15843081
  • Separated Audio (IRM1 / SiSEC18): The IRM1 (Oracle) and REP1 audio is available as part of the SiSEC18 estimates:
    https://zenodo.org/records/1256003
  • Reference Audio: The reference files come from the test set of MUSDB18-HQ or of the standard MUSDB18 dataset.

Placement:

  • Separated files should go to SEP_PATH (default: ./audio/MSS_bake_off_eval_audio/10s_audio).
  • Reference files should go to TGT_PATH (default: ./audio/musdb18_test_10s).
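Before starting a long run, it can help to confirm that both folders exist and actually contain audio. The helper below is a hypothetical sketch; it assumes the files use the .wav suffix and may sit in nested subfolders, neither of which is stated in this README.

```python
from pathlib import Path


def count_audio_files(folder, suffix=".wav"):
    """Count files with the given suffix anywhere below folder."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"Expected audio folder not found: {folder}")
    return sum(1 for p in folder.rglob(f"*{suffix}") if p.is_file())
```

For example, count_audio_files("./audio/musdb18_test_10s") should return a non-zero count once the reference files are in place.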

2. GenSVS Dataset (DATASET = 'gensvs')

Placement:

  • Unzip the contents into audio/gensvs_eval_audio. The expected structure is typically one flat folder per model plus a target folder.

Listener Ratings

If running corr_metrics_and_ratings.py, you need the listener response CSV. A simplified version is included in this repository at third_party/bake_off/raw_listener_responses_w_violations.csv.

Models

  • MERT-v1-95M: The script uses the MERT-v1-95M model for embeddings. This will typically be downloaded automatically via the transformers library on the first run.
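For readers who want to inspect the embeddings directly, the sketch below follows the loading pattern from the model's Hugging Face page. It is an assumption that the evaluation script resolves to the same Hub identifier; the helper names are hypothetical, and the choice of the last hidden layer is illustrative, not necessarily the layer used for the metrics. The waveform is assumed to be mono and already resampled to the rate the feature extractor expects.

```python
MODEL_ID = "m-a-p/MERT-v1-95M"  # Hugging Face Hub identifier (assumed)


def load_mert(model_id=MODEL_ID):
    """Load MERT and its feature extractor; weights are downloaded on first call."""
    # Imported lazily so that defining this helper does not need the heavy packages.
    from transformers import AutoModel, Wav2Vec2FeatureExtractor

    model = AutoModel.from_pretrained(model_id, trust_remote_code=True)
    processor = Wav2Vec2FeatureExtractor.from_pretrained(model_id, trust_remote_code=True)
    model.eval()
    return model, processor


def embed(model, processor, waveform):
    """Return last-layer hidden states, shape (frames, dims), for a mono waveform."""
    import torch

    inputs = processor(waveform, sampling_rate=processor.sampling_rate, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_hidden_states=True)
    # hidden_states is a tuple with one tensor per layer; take the last one.
    return outputs.hidden_states[-1].squeeze(0)
```

Calling load_mert() once and reusing the returned objects avoids reloading the weights for every file.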

References

If you use the datasets or code evaluated in this work, please consider citing the following papers:

DAGA 2026 paper

Reference to DAGA paper will be added soon!

GenSVS Dataset

@INPROCEEDINGS{11230934,
  author={Bereuter, Paul A. and Stahl, Benjamin and Plumbley, Mark D. and Sontacchi, Alois},
  booktitle={2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)}, 
  title={Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models}, 
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Measurement;Degradation;Training;Time-frequency analysis;Correlation;Limiting;Computational modeling;Conferences;Reliability;Software development management},
  doi={10.1109/WASPAA66052.2025.11230934}}

2025 Bake-Off Dataset

@INPROCEEDINGS{11230942,
  author={Jaffe, Noah and Burgoyne, John Ashley},
  booktitle={2025 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)}, 
  title={Musical Source Separation Bake-Off: Comparing Objective Metrics with Human Perception}, 
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Measurement;Source separation;Statistical analysis;Instruments;Energy measurement;Interference;Reproducibility of results;Recording;Reliability;Signal to noise ratio},
  doi={10.1109/WASPAA66052.2025.11230942}}

See also third_party/bake_off/README.md for details on the listener study data (DOI: 10.5281/zenodo.15843081).
