Benno Weck*1, Ilaria Manco*2,3, Emmanouil Benetos2, Elio Quinton3, George Fazekas2, Dmitry Bogdanov1
1 UPF, 2 QMUL, 3 UMG
This repository contains code and data for the paper MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models (ISMIR 2024).
[TODO]
The dataset is available to download from Zenodo:

```bash
wget -P data https://zenodo.org/record/12709974/files/muchomusic.csv
```
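Once downloaded, the CSV can be inspected with pandas. This is a minimal sketch, assuming the file was saved to `data/` as above; the exact columns are whatever the released CSV contains:

```python
import pandas as pd

# Load the benchmark questions from the downloaded CSV
df = pd.read_csv("data/muchomusic.csv")

# Inspect the available columns and a few example rows
print(df.columns.tolist())
print(df.head())
```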
Alternatively, you can access it from the Hugging Face Hub using the 🤗 Datasets library:
```python
from datasets import load_dataset

MuchoMusic = load_dataset("mulab-mir/muchomusic")
```
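As a quick sanity check, you can list the splits and look at one example. The split names and exact fields below are not assumed here; the sketch simply prints whatever the hosted dataset defines:

```python
from datasets import load_dataset

MuchoMusic = load_dataset("mulab-mir/muchomusic")

# load_dataset returns a DatasetDict: print each split's name, size,
# and the fields of its first example
for split_name, split in MuchoMusic.items():
    print(split_name, len(split))
    print(split[0])
```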
To use this code, we recommend creating a new Python 3 virtual environment:

```bash
python -m venv venv
source venv/bin/activate
```
Then, clone the repository and install the dependencies:

```bash
git clone https://github.com/mulab-mir/muchomusic.git
cd muchomusic
pip install -r requirements.txt
```
This codebase has been tested with Python 3.11.5.
```
muchomusic
├── data
│   └── muchomusic.csv
├── dataset_creation        # code to generate and validate the dataset
├── muchomusic_eval         # evaluation code
│   ├── configs             # config files for the evaluation experiments
│   └── ...
├── evaluate.py             # script to run the evaluation
└── prepare_prompts.py      # script to prepare the benchmark prompts
```
Inputs to the benchmark should be given as a JSON object with the following format:
```json
{
    "id": 415600,
    "prompt": "Question: What rhythm pattern do the digital drums follow? Options: (A) Four on the floor. (B) Off-beat syncopation. (C) Scat singing. (D) E-guitar playing a simple melody. The correct answer is: ",
    "answers": [
        "Pop music",
        "Reggae",
        "Latin rock",
        "Ska"
    ],
    "answer_orders": [
        3,
        0,
        2,
        1
    ],
    "dataset": "sdd",
    "genre": "Reggae",
    "reasoning": [
        "genre and style"
    ],
    "knowledge": [],
    "audio_path": "data/sdd/audio/00/415600.2min.mp3",
    "model_output": "A"
}
```
To generate this, first run:

```bash
python prepare_prompts.py --output_path <path_to_json_file>
```
Then obtain the model predictions from each (audio, text) pair formed by `prompt` and the corresponding audio at `audio_path`, and populate `model_output` accordingly.
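As a sketch of this step (not part of the codebase), the prediction loop could look like the following. Here `run_model` is a hypothetical placeholder for the audio-language model under evaluation, `prompts.json` stands for the file produced by `prepare_prompts.py`, and the file is assumed to hold a list of objects in the format shown above:

```python
import json

def run_model(audio_path: str, prompt: str) -> str:
    # Placeholder: replace with a call to the audio-language model being evaluated.
    return "A"

# Load the prompts generated by prepare_prompts.py
with open("prompts.json") as f:
    examples = json.load(f)

# Query the model on each (audio, text) pair and store its answer
for example in examples:
    example["model_output"] = run_model(example["audio_path"], example["prompt"])

# Save the completed file for evaluation with evaluate.py
with open("prompts.json", "w") as f:
    json.dump(examples, f, indent=2)
```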
Finally, run the evaluation:

```bash
python evaluate.py --output_dir <path_to_results_dir>
```

The results will be stored in `<path_to_results_dir>`.
If you use the code in this repo, please consider citing our work:
```bibtex
@inproceedings{weck2024muchomusic,
    title = {MuChoMusic: Evaluating Music Understanding in Multimodal Audio-Language Models},
    author = {Weck, Benno and Manco, Ilaria and Benetos, Emmanouil and Quinton, Elio and Fazekas, György and Bogdanov, Dmitry},
    booktitle = {Proceedings of the 25th International Society for Music Information Retrieval Conference (ISMIR)},
    year = {2024}
}
```
This repository is released under the MIT License. Please see the LICENSE file for more details. The dataset is released under the CC BY-SA 4.0 license.
If you have any questions, please get in touch: benno.weck01@estudiant.upf.edu, i.manco@qmul.ac.uk.
If you find a problem when using the code, you can also open an issue.