This is the official repository for the paper "Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning".
If you run into any problems running the code, please open an issue and I will help you as much as I can.
To promote transparency and reproducibility in research, I retrained a similar model after the internship using only publicly available datasets, following the same methodology described in the paper.
**Note that this is NOT the official checkpoint and has NO relation to Sony. Its performance is similar to that of the official checkpoint.**
https://huggingface.co/ldzhangyx/instruct-MusicGen/blob/main/finetuned.ckpt
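The link above opens the file's page on Hugging Face rather than the file itself. As a minimal sketch, the snippet below rewrites such a page URL (`/blob/`) into the Hub's direct-download form (`/resolve/`). The helper name `blob_to_resolve_url` is hypothetical and not part of this repository.

```python
from urllib.parse import urlsplit, urlunsplit


def blob_to_resolve_url(page_url):
    """Turn a Hugging Face file-page URL (/blob/) into a direct-download URL (/resolve/)."""
    parts = urlsplit(page_url)
    segments = parts.path.split("/")
    # Expected path layout: /<user>/<repo>/blob/<revision>/<file...>
    if len(segments) < 6 or segments[3] != "blob":
        raise ValueError(f"not a Hugging Face file-page URL: {page_url}")
    segments[3] = "resolve"
    return urlunsplit(parts._replace(path="/".join(segments)))


page_url = "https://huggingface.co/ldzhangyx/instruct-MusicGen/blob/main/finetuned.ckpt"
download_url = blob_to_resolve_url(page_url)
print(download_url)
```

The resulting URL can then be passed to a downloader of your choice, for example `urllib.request.urlretrieve(download_url, "finetuned.ckpt")`. The local file name here is only an assumption; place the checkpoint wherever your configuration expects it.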
https://bit.ly/instruct-musicgen
# clone project
git clone https://github.com/ldzhangyx/instruct-MusicGen/
cd instruct-MusicGen
# [OPTIONAL] create conda environment
conda create -n myenv python=3.11.7
conda activate myenv
# install pytorch according to instructions
# https://pytorch.org/get-started/
# install requirements
pip install -r requirements.txt

# clone project
git clone https://github.com/ldzhangyx/instruct-MusicGen/
cd instruct-MusicGen
# create conda environment and install dependencies
conda env create -f environment.yaml -n myenv
# activate conda environment
conda activate myenv

Train model with default configuration
# train on CPU
python src/train.py trainer=cpu
# train on GPU
python src/train.py trainer=gpu

You may need to change essential parameters in config/config.yaml to fit your own dataset.
You can override any parameter from the command line like this:
python src/train.py trainer.max_epochs=50 data.batch_size=4

python src/data/slakh_datamodule.py

For the add, remove, and extract operations, please change the parameters in both test_step() in src/models/instructmusicgenadapter_module.py and __getitem__() in src/data/slakh_datamodule.py.
Currently this must be done manually, but we will provide a script to automate the process soon.
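As a rough illustration of the command-line overrides mentioned earlier (`trainer.max_epochs=50 data.batch_size=4`), the sketch below shows how dotted `key=value` arguments can map onto a nested configuration. It is plain Python with a hypothetical `apply_overrides` helper and made-up default values. It is not the project's actual configuration code, which may parse values differently.

```python
import ast
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted key=value overrides applied."""
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        # Walk (or create) the nested dictionaries named by the dotted path.
        for name in parents:
            node = node.setdefault(name, {})
        # Interpret numbers and other literals; keep anything else as a string.
        try:
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            value = raw_value
        node[leaf] = value
    return result


# Hypothetical defaults, for illustration only.
defaults = {"trainer": {"max_epochs": 10}, "data": {"batch_size": 8}}
updated = apply_overrides(defaults, ["trainer.max_epochs=50", "data.batch_size=4"])
print(updated)
```

Each override touches only the key it names, so other settings keep their values from the configuration file.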
python src/eval.py

Please make sure the generated music files are in the corresponding locations.
python evaluation/utils.py # to generate a csv file for CLAP calculation
python evaluation/main.py

After preparing the checkpoint and the input audio file, you can generate audio via
python src/inference.py

@article{zhang2024instruct,
title={Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning},
author={Zhang, Yixiao and Ikemiya, Yukara and Choi, Woosung and Murata, Naoki and Mart{\'\i}nez-Ram{\'\i}rez, Marco A and Lin, Liwei and Xia, Gus and Liao, Wei-Hsiang and Mitsufuji, Yuki and Dixon, Simon},
journal={arXiv preprint arXiv:2405.18386},
year={2024}
}