SING: Symbol-to-Instrument Neural Generator
SING is a deep-learning-based musical note synthesizer that can be trained on the NSynth dataset. Despite being 32 times faster to train and 2,500 times faster for inference, SING produces audio with significantly improved perceptual quality compared to the NSynth WaveNet-like autoencoder [1], as measured by Mean Opinion Scores based on human evaluations.
The architecture and the results obtained are detailed in our paper SING: Symbol-to-Instrument Neural Generator. SING combines an LSTM-based sequence generator with a convolutional decoder.
SING works with Python 3.6 and newer. To use SING, you must have a reasonably recent version of the following packages installed:
- pytorch (needs to be >= 0.4.1 as we use torch.stft)
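A quick way to confirm that an installed dependency exposes what SING needs is to probe it from Python. The sketch below is not part of SING: `module_provides` is a hypothetical helper name, and it only checks that a module imports and has a given attribute (here `torch.stft`, the function the requirement above mentions). It does not compare version numbers.

```python
import importlib


def module_provides(module_name, attribute):
    """Return True if `module_name` can be imported and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package is not installed in the current environment.
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    if module_provides("torch", "stft"):
        import torch
        print("torch", torch.__version__, "provides torch.stft")
    else:
        print("torch is missing or does not provide torch.stft")
```

The same helper can be pointed at any other dependency listed in requirements.txt.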
If you have anaconda installed, you can run from the root of this repository:
conda env update
conda activate sing
This will create a sing environment with all the dependencies installed.
Alternatively, you can use pip to install those:
pip3 install -r requirements.txt
SING can optionally be installed using the usual
although this is not required.
Obtaining the NSynth dataset
If you want to train SING from scratch, you will need a copy of the NSynth dataset. To download it, you can use the following commands (WARNING: NSynth is 30GB, so this will take a bit of time):
mkdir data && cd data &&\
    wget http://download.magenta.tensorflow.org/datasets/nsynth/nsynth-train.jsonwav.tar.gz &&\
    tar xf nsynth-train.jsonwav.tar.gz
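On systems without wget or tar, the same download can be approximated from Python using only the standard library. This is a minimal sketch, not a script shipped with SING: the function names `fetch_nsynth` and `extract_archive` are hypothetical, and the only values taken from this README are the URL and the data directory.

```python
import os
import tarfile
import urllib.request

# URL taken from the shell commands above.
NSYNTH_URL = ("http://download.magenta.tensorflow.org/datasets/nsynth/"
              "nsynth-train.jsonwav.tar.gz")


def extract_archive(archive_path, destination):
    """Extract a (possibly gzip-compressed) tar archive into `destination`."""
    os.makedirs(destination, exist_ok=True)
    with tarfile.open(archive_path) as archive:
        if hasattr(tarfile, "data_filter"):
            # Recent Python versions support safer extraction filters.
            archive.extractall(destination, filter="data")
        else:
            archive.extractall(destination)


def fetch_nsynth(destination="data"):
    """Download the NSynth training archive (about 30GB) and unpack it."""
    os.makedirs(destination, exist_ok=True)
    archive_path = os.path.join(destination, os.path.basename(NSYNTH_URL))
    if not os.path.exists(archive_path):
        urllib.request.urlretrieve(NSYNTH_URL, archive_path)
    extract_archive(archive_path, destination)
    return destination
```

Calling `fetch_nsynth()` from the root of the repository mirrors the shell commands above. The sketch skips the download whenever the archive file already exists, so a partially downloaded file has to be deleted by hand before retrying.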
Once SING is installed, or from the root of this repository, you can use a family of commands, detailed below, of the form
python3 -m sing.*
For either training or generation, use the --cuda flag for GPU acceleration and the --parallel flag to use all available GPUs. Depending on the memory and number of GPUs available, consider tweaking the batch size using the --batch-size flag. The default is 64, but 256 was used in the paper.
If you already have the NSynth dataset downloaded somewhere, run
python3 -m sing.train [--cuda [--parallel]] --data PATH_TO_NSYNTH \
    --output PATH_TO_SING_MODEL [--checkpoint PATH_TO_CHECKPOINTS]
PATH_TO_NSYNTH is by default set to
The final model will be saved at PATH_TO_SING_MODEL (default is models/sing.th). If you want to save checkpoints after each epoch, or to resume a previously interrupted training, use the --checkpoint PATH_TO_CHECKPOINTS option shown in the command above.
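When training is launched from a script (for instance on a cluster), it can help to assemble the command programmatically. The sketch below is an illustration, not part of SING: `build_train_command` is a hypothetical helper, it uses only the flags named in this README, and it assumes that --batch-size takes a space-separated value in the same way --data and --output do.

```python
def build_train_command(data=None, output=None, checkpoint=None,
                        batch_size=None, cuda=False, parallel=False):
    """Assemble the argument list for `python3 -m sing.train`."""
    command = ["python3", "-m", "sing.train"]
    if cuda:
        command.append("--cuda")
        if parallel:
            # The usage line nests --parallel inside --cuda, so it is
            # only added when --cuda is also requested.
            command.append("--parallel")
    if batch_size is not None:
        command += ["--batch-size", str(batch_size)]
    if data is not None:
        command += ["--data", data]
    if output is not None:
        command += ["--output", output]
    if checkpoint is not None:
        command += ["--checkpoint", checkpoint]
    return command


if __name__ == "__main__":
    # Placeholders are the ones used in this README, not real paths.
    print(" ".join(build_train_command(
        data="PATH_TO_NSYNTH", output="PATH_TO_SING_MODEL",
        batch_size=256, cuda=True, parallel=True)))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the root of the repository. Options left as `None` are omitted, so SING's own defaults apply.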
For generation, you do not need the NSynth dataset but you should have a trained SING model.
python3 -m sing.generate [--cuda [--parallel]] \
    --model PATH_TO_SING_MODEL PATH_TO_ITEM_LIST
PATH_TO_ITEM_LIST should be a file with one dataset item name per line.
Alternatively, you can download a pretrained model using
python3 -m sing.generate [--cuda [--parallel]] --dl PATH_TO_ITEM_LIST
By default, the model will be downloaded under models/sing.th but a different path can be provided using the
The pretrained model can also be downloaded directly here.
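The generation workflow described above (write an item list, then call sing.generate with either a local model or --dl) can be sketched as follows. Both helper names are hypothetical and the item names in the usage example are placeholders, not real NSynth entries. Real names come from the NSynth metadata.

```python
def write_item_list(path, names):
    """Write one dataset item name per line."""
    with open(path, "w") as handle:
        handle.write("\n".join(names))
    return path


def build_generate_command(item_list, model=None, cuda=False, parallel=False):
    """Assemble `python3 -m sing.generate`; use --dl when no model is given."""
    command = ["python3", "-m", "sing.generate"]
    if cuda:
        command.append("--cuda")
        if parallel:
            # --parallel is only meaningful together with --cuda.
            command.append("--parallel")
    if model is None:
        # No local model: ask SING to download the pretrained one.
        command.append("--dl")
    else:
        command += ["--model", model]
    command.append(item_list)
    return command


if __name__ == "__main__":
    # Placeholder item names, for illustration only.
    print(" ".join(build_generate_command("items.txt", cuda=True)))
```

For example, `write_item_list("items.txt", ["placeholder_item_a", "placeholder_item_b"])` followed by `subprocess.run(build_generate_command("items.txt"), check=True)` would download the pretrained model and generate audio for the listed items.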
To reproduce the results of Table 1 in our paper, simply run
# For the L1 spectral loss
python3 -m sing.train [--cuda [--parallel]] --l1
# For the L1 spectral loss without time embeddings
python3 -m sing.train [--cuda [--parallel]] --l1 --time-dim=0
# For the Wav loss
python3 -m sing.train [--cuda [--parallel]] --wav
To reproduce the audio samples used for the human evaluations, simply run from the root of the git repository
python3 -m sing.generate [--cuda [--parallel]] --dl nsynth_100_test.txt
nsynth_100_test.txt has been generated using the following code:
from sing import nsynth
from sing.fondation.datasets import RandomSubset

dset = nsynth.get_nsynth_metadata()
train, valid, test = nsynth.make_datasets(dset)
evaluation = RandomSubset(test, 100)
open("nsynth_100_test.txt", "w").write("\n".join(
    evaluation[i].metadata['name'] for i in range(len(evaluation))))
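The general idea of drawing a fixed-size random evaluation subset can be illustrated without SING installed. The sketch below is a stand-in written for this README, not the implementation of RandomSubset: `sample_names`, its `seed` parameter, and the placeholder item names are all hypothetical.

```python
import random


def sample_names(names, count, seed=0):
    """Pick `count` distinct names from `names`, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(list(names), count)


if __name__ == "__main__":
    # Placeholder names standing in for the items of a test split.
    test_split = ["placeholder_item_%04d" % index for index in range(1000)]
    subset = sample_names(test_split, 100)
    print(len(subset), "items selected")
```

Using a private `random.Random` instance keeps the selection independent of any other use of the global random state, so the same seed always yields the same list.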
We thank the Magenta team for their inspiring work on NSynth.
For convenience, we have included a copy of the metadata of the NSynth dataset in this repository. The dataset has been released by Google Inc under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
SING is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license, as found in the LICENSE file.
[1] Jesse Engel, Cinjon Resnick, Adam Roberts, Sander Dieleman, Douglas Eck, Karen Simonyan, and Mohammad Norouzi. Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders. 2017.