
iSeparate

This repository is an attempt to reimplement, reproduce, and unify various deep-learning-based methods for Music Source Separation.

This project was started as part of the requirements for the course Media Computing in Practice at the University of Tokyo, under the guidance of Yusuke Matsui sensei.

This is a work in progress: current results are decent but not as good as those reported in the papers, so please take them with a grain of salt. Work will continue on improving the quality of separation.

Currently implemented methods:

  • D3Net: "Densely connected multidilated convolutional networks for dense prediction tasks" (CVPR 2021, Takahashi et al., Sony). Official code: link
  • Demucs v2: "Music Source Separation in the Waveform Domain" (arXiv 2021, Defossez et al., Facebook, INRIA). Official code: link

Getting Started

For Linux users:

Install the libsndfile and soundstretch libraries using your package manager, for example:

  sudo apt-get install libsndfile1 soundstretch

For Windows and Linux users:

If you use anaconda or miniconda, you can quickly create an environment using the provided environment YAML files.

For GPU machines:

conda env create --name <envname> --file=environment-cuda.yml

For CPU only machines:

conda env create --name <envname> --file=environment-cpu.yml

After creating the environment, activate it as follows:

conda activate <envname>

For Mac users:

To do

Separate using a pre-trained model

Create your own Karaoke tracks!

Currently, the D3Net vocals model has been uploaded to Hugging Face, and you can run vocals/accompaniment separation using that model with the separate.py script. Invoke the separation as follows:

python separate.py \
                -c configs/d3net/eval.yaml \
                -i path/to/song.wav

Currently, only .wav files are supported on Windows. You can use the following command to convert an .mp3 file to a .wav file within the conda environment created above:

ffmpeg -i song.mp3 song.wav

You can use .mp3 files directly on Linux, without conversion.

Dataset Preparation and Training

If you would like to train the models yourself, please follow the procedure below.

Dataset Preparation

iSeparate currently supports the MUSDB18 dataset, which is distributed in the Native Instruments STEMS format. Since decoded .wav files are easier to work with, you can run prepare_dataset.py to decode the dataset.

If you would like to download a small 7-second version of the dataset for testing the code, run

python prepare_dataset.py \
                        --root data/MUSDB18-sample \
                        --wav-root data/MUSDB18-sample-wav \
                        --filelists-dir filelists/musdb-sample \
                        --download-sample \
                        --keep-wav-only \
                        --make-symlink

If you would like to download the full dataset for training, run

python prepare_dataset.py \
                        --root data/MUSDB18 \
                        --wav-root data/MUSDB18-wav \
                        --filelists-dir filelists/musdb \
                        --keep-wav-only \
                        --make-symlink

The prepare_dataset.py script downloads the data in STEMS format to the directory specified by --root and then extracts the .wav files into the directory specified by --wav-root. If you want to delete the STEMS files and keep only the .wav files, use the --keep-wav-only option. The --make-symlink option creates a symbolic link from the .wav directory to the data/MUSDB18-wav directory. Alternatively, you can edit the config files in the configs directory to point to your dataset directory.
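
As a rough illustration only, pointing a config at the decoded dataset might look like the excerpt below; the key names here are hypothetical, so check the actual files under the configs directory for the real structure.

  # Hypothetical config excerpt -- key names are illustrative, not the project's actual schema
  data:
    wav_root: data/MUSDB18-wav          # decoded .wav dataset produced by prepare_dataset.py
    filelists_dir: filelists/musdb      # filelists generated during dataset preparation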

Training

Nvidia GPUs are required for training. These models require quite a lot of VRAM; you can change the batch_size parameter in the configs to suit your hardware.
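
For example, lowering the batch size in a training config could look like the snippet below; only batch_size is named by this README, so the surrounding nesting is an assumption.

  # Illustrative only -- the structure around batch_size may differ in the real configs
  train:
    batch_size: 4   # reduce this if you run out of GPU memory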

Add the --debug flag at the end if you just want to do a debug run (trains on one batch and one validation pass, and then cleans up after itself).

To train on a single GPU:

python train.py --config-file configs/<method>/<config-name.yaml>

To train on multiple GPUs with DistributedDataParallel:

python -m torch.distributed.run \
               --nproc_per_node=4 train.py \
               --config-file configs/<method>/<config-name.yaml>

Extending and Contributing

If you would like to add a new method and train it on the MUSDB18 dataset, follow these steps:

  • create a model package: models/awesome-method
    • implement your model
    • add the separate.py file and implement the load_models and separate functions (see the sketch after this list)
    • add the model to model_switcher.py
  • create and/or add your custom loss functions to losses/loss_switcher.py
  • create config files following the examples in the configs directory
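
As a rough illustration, a new method's separate.py could be shaped like the sketch below. This is only a sketch under assumed signatures: the real interface for load_models and separate is defined by the existing model packages, so mirror those rather than this.

  # models/awesome-method/separate.py -- illustrative sketch only; copy the actual
  # load_models/separate signatures from an existing model package.
  import torch

  from .model import AwesomeMethod  # hypothetical model class implemented in this package


  def load_models(checkpoint_path, device="cpu"):
      # Assumed signature: load the trained weights needed for separation.
      model = AwesomeMethod()
      state = torch.load(checkpoint_path, map_location=device)
      model.load_state_dict(state["model"])  # checkpoint key name is an assumption
      model.to(device).eval()
      return {"vocals": model}


  def separate(models, mixture):
      # Assumed signature: mixture is a (channels, samples) waveform tensor.
      # Returns a dict mapping source names to estimated waveforms.
      with torch.no_grad():
          vocals = models["vocals"](mixture.unsqueeze(0)).squeeze(0)
      accompaniment = mixture - vocals  # simple residual estimate of the accompaniment
      return {"vocals": vocals, "accompaniment": accompaniment}

Once the package is registered in model_switcher.py and a config is added, training and separation should run through the same train.py and separate.py entry points described above.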