Zihan Zhang1,*,
Xize Cheng1,*,
Zhennan Jiang2,*,
Dongjie Fu1,
Jingyuan Chen1,
Zhou Zhao1,
Tao Jin1,†
1Zhejiang University
2CASIA
†Corresponding author.
*Equal contribution
ICLR 2026
For the MUSIC dataset, please refer to the script under dataset/music.
For the VGGSound dataset, please refer to the script under dataset/vggsound.
Clone the repository and set up the environment:
git clone https://github.com/mars-sep/ImageBind.git
cd ImageBind
pip install .
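To confirm the editable install succeeded before moving on, you can check that the package is importable (assuming the distribution exposes an `imagebind` module, as the upstream repo does):

```python
import importlib.util

def installed(pkg: str) -> bool:
    """Return True if `pkg` is importable in the current environment."""
    return importlib.util.find_spec(pkg) is not None

# Should print True after the `pip install .` step above.
print("imagebind available:", installed("imagebind"))
```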
git clone https://github.com/mars-sep/MARS-Sep.git
cd MARS-Sep/
conda create -n marssep python=3.10
conda activate marssep
pip install -r requirements.txt

Train on VGGSound:

python train.py \
-o exp/vggsound/marssep \
-c conf/mars.yaml \
-t data/vggsound/train.csv \
-v data/vggsound/val.csv \
--batch_size 128 \
--workers 20 \
--emb_dim 1024 \
--train_mode image text audio \
--is_feature \
--feature_mode imagebind

Evaluate on MUSIC and VGGSound:

OMP_NUM_THREADS=1 python evaluate.py -o exp/vggsound/marssep/ -c conf/mars.yaml -l exp/vggsound/marssep/eval_MUSIC_VGGS.txt -t data/MUSIC/solo/test.csv -t2 data/vggsound/test-good-no-music.csv --no-pit --prompt_ens --audio_source ./MUSIC-aq.npy

Evaluate on VGGSound-Clean+ and VGGSound:

OMP_NUM_THREADS=1 python evaluate.py -o exp/vggsound/marssep/ -c conf/mars.yaml -l exp/vggsound/marssep/eval_VGGS_VGGSN.txt -t data/vggsound/test-good.csv -t2 data/vggsound/test-no-music.csv --no-pit --prompt_ens --audio_source ./VGGSOUND-aq.npy

Run text-queried inference on a single mixture:

OMP_NUM_THREADS=1 python infer3.py -o exp/vggsound/marssep/ -i "demo/audio/hvCj8Dk0Su4.wav" --text_query "playing bagpipes" -f "exp/vggsound/marssep/hvCj8Dk0Su4/playing bagpipes.wav"

If you find our work useful for your research, please feel free to cite our paper:
@misc{zhang2025marssepmultimodalalignedreinforcedsound,
title={MARS-Sep: Multimodal-Aligned Reinforced Sound Separation},
author={Zihan Zhang and Xize Cheng and Zhennan Jiang and Dongjie Fu and Jingyuan Chen and Zhou Zhao and Tao Jin},
year={2025},
eprint={2510.10509},
archivePrefix={arXiv},
primaryClass={cs.SD},
url={https://arxiv.org/abs/2510.10509},
}
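As a reference point for interpreting the evaluation logs: source-separation quality is commonly reported as SI-SDR (scale-invariant signal-to-distortion ratio, in dB, higher is better). This NumPy sketch of the standard SI-SDR formula is illustrative only, not the repository's evaluation code:

```python
import numpy as np

def si_sdr(est: np.ndarray, ref: np.ndarray) -> float:
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    ref = ref - ref.mean()
    est = est - est.mean()
    # Project the estimate onto the reference to isolate the target component;
    # everything orthogonal to the reference counts as distortion.
    s_target = (np.dot(est, ref) / np.dot(ref, ref)) * ref
    e_noise = est - s_target
    return 10.0 * np.log10(np.dot(s_target, s_target) / np.dot(e_noise, e_noise))
```

Because of the projection step, rescaling the estimate leaves the score unchanged, which is what makes the metric scale-invariant.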