## Dataset Generation

In this notebook we generate and save datasets for training and validating deep learning models. Additionally, we create a small audio dataset for evaluation.

In [1]:
import spiegelib as spgl

Load Dexed and set both the note length and the render length to one second. For this experiment we aren’t concerned with the release of the sound, but you can set the render length longer than the note length to capture the release portion of the signal. Also reload the previously saved configuration.

In [2]:
synth = spgl.synth.SynthVST("/Library/Audio/Plug-Ins/VST/Dexed.vst", note_length_secs=1.0, render_length_secs=1.0)
synth.load_state('./synth_params/dexed_simple_fm.json')

#### MFCC Dataset

Generate training and testing datasets using Mel-frequency Cepstral Coefficients (MFCC) feature extraction. The DatasetGenerator class works by generating random patches from the synthesizer, running audio feature extraction on the resulting sound, and then saving the audio features and parameter values. Audio features and parameter values are saved in separate .npy files.
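
The generate, extract, and save steps described above can be sketched roughly as follows. This is a toy illustration and not spiegelib's implementation: `random_patch`, `render_patch`, `extract_features`, and `generate` are hypothetical stand-ins, the "synth" is a plain sine wave, and results are kept in lists rather than written to .npy files.

```python
import math
import random

def random_patch(num_params, rng):
    """Draw one random parameter setting, each value in [0, 1)."""
    return [rng.random() for _ in range(num_params)]

def render_patch(patch, num_samples=1000, sample_rate=1000):
    """Hypothetical synth stand-in: a sine whose pitch and level follow the patch."""
    freq = 50.0 + 200.0 * patch[0]
    gain = patch[1]
    return [gain * math.sin(2.0 * math.pi * freq * n / sample_rate)
            for n in range(num_samples)]

def extract_features(audio, frame_size=100):
    """Hypothetical feature extractor stand-in: RMS level of each frame."""
    frames = [audio[i:i + frame_size] for i in range(0, len(audio), frame_size)]
    return [math.sqrt(sum(x * x for x in frame) / len(frame)) for frame in frames]

def generate(num_examples, num_params=2, seed=0):
    """Collect (features, patch) pairs; a real generator would save them to disk."""
    rng = random.Random(seed)
    feature_rows, patch_rows = [], []
    for _ in range(num_examples):
        patch = random_patch(num_params, rng)
        audio = render_patch(patch)
        feature_rows.append(extract_features(audio))
        patch_rows.append(patch)
    return feature_rows, patch_rows

feature_rows, patch_rows = generate(5)
print(len(feature_rows), len(feature_rows[0]), len(patch_rows[0]))
```

Each entry of `feature_rows` is paired by index with the patch in `patch_rows` that produced it, which is the pairing a model needs in order to learn a mapping from features back to parameters.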

We set the time_major argument to True so that the orientation of the output is (time_slices, features), as opposed to (features, time_slices), which is the default. This is how TensorFlow models expect the data to be oriented.
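
To make the two orientations concrete, here is a small sketch using plain nested lists. The values are made up, the shapes are far smaller than real MFCC output, and `shape` and `to_time_major` are hypothetical helpers; with NumPy arrays the same conversion is a transpose.

```python
def shape(matrix):
    """Return (rows, columns) of a rectangular nested list."""
    return (len(matrix), len(matrix[0]))

def to_time_major(feature_major):
    """Transpose (features, time_slices) into (time_slices, features)."""
    return [list(column) for column in zip(*feature_major)]

# Made-up example: 3 features measured over 4 time slices.
feature_major = [
    [0.1, 0.2, 0.3, 0.4],   # feature 0 over time
    [1.1, 1.2, 1.3, 1.4],   # feature 1 over time
    [2.1, 2.2, 2.3, 2.4],   # feature 2 over time
]
time_major = to_time_major(feature_major)

print(shape(feature_major))  # (3, 4)
print(shape(time_major))     # (4, 3)
print(time_major[0])         # [0.1, 1.1, 2.1]
```

In the time-major layout each row holds every feature for one time slice, so stepping through the rows steps through time.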

Normalization settings used for the training dataset are saved as a .pkl file. These settings are used to ensure future data is normalized in the same way.
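
The idea of fitting normalization on the training data and reusing the same settings later can be sketched as below. This is an illustration only: `fit_normalizer` and `normalize` are hypothetical helpers using simple mean and standard deviation scaling, which may differ from what spiegelib does, and the file name is a placeholder.

```python
import os
import pickle
import statistics
import tempfile

def fit_normalizer(values):
    """Compute scaling settings from the training data only."""
    std = statistics.pstdev(values) or 1.0  # fall back to 1.0 for constant data
    return {"mean": statistics.mean(values), "std": std}

def normalize(values, settings):
    """Scale values with previously fitted settings."""
    return [(v - settings["mean"]) / settings["std"] for v in values]

train = [2.0, 4.0, 6.0, 8.0]
settings = fit_normalizer(train)

# Persist the fitted settings, then reload them as a later session would.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "normalizers_example.pkl")
    with open(path, "wb") as f:
        pickle.dump(settings, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

# New data is scaled with the training statistics, not its own.
new_data = [5.0, 10.0]
print(normalize(new_data, restored))
```

Scaling new data with the stored training statistics keeps it on the same scale the model saw during training.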

The total size of this dataset is about 140MB.

In [3]:
# Mel-frequency Cepstral Coefficients audio feature extractor.
features = spgl.features.MFCC(num_mfccs=13, time_major=True, hop_size=1024)

# Set up the generator for MFCC output, then generate 50000 training examples and 10000 testing examples
generator = spgl.DatasetGenerator(synth, features,
                                  output_folder="./data_simple_FM_mfcc",
                                  normalize=True)
generator.generate(50000, file_prefix="train_")
generator.generate(10000, file_prefix="test_")
generator.save_normalizers('normalizers.pkl')

Generating Dataset: 100%|██████████| 50000/50000 [17:05<00:00, 48.74it/s] 

Fitting normalizers and normalizing data



Fitting Normalizers: 100%|██████████| 13/13 [00:01<00:00,  8.76it/s]
Generating Dataset: 100%|██████████| 10000/10000 [03:03<00:00, 54.57it/s]


#### STFT Dataset

Generate training and testing datasets using the magnitude of the short-time Fourier transform (STFT). These datasets will be used to train the convolutional neural network.

The total size of the resulting dataset is about 10.8GB.
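
As a rough sanity check on the dataset sizes, the feature array dimensions can be estimated from the extraction settings. The sketch below assumes a 44.1 kHz sample rate, centered framing that yields 1 + num_samples // hop_size time slices, and 32-bit float storage. None of these are stated in this notebook, so treat the result as an estimate. The `num_frames` and `estimate_bytes` helpers are hypothetical.

```python
SAMPLE_RATE = 44100            # assumption: not stated in this notebook
BYTES_PER_VALUE = 4            # assumption: features stored as 32-bit floats
NUM_EXAMPLES = 50000 + 10000   # training plus testing examples

def num_frames(duration_secs, hop_size, sample_rate=SAMPLE_RATE):
    """Number of time slices under the assumed centered framing."""
    return 1 + int(duration_secs * sample_rate) // hop_size

def estimate_bytes(values_per_frame, hop_size, duration_secs=1.0):
    """Approximate size of the feature data for all examples."""
    frames = num_frames(duration_secs, hop_size)
    return NUM_EXAMPLES * frames * values_per_frame * BYTES_PER_VALUE

# 13 MFCCs per slice with hop_size=1024; fft_size=512 gives 257 magnitude bins.
mfcc_bytes = estimate_bytes(values_per_frame=13, hop_size=1024)
stft_bytes = estimate_bytes(values_per_frame=512 // 2 + 1, hop_size=256)

print(f"MFCC: {mfcc_bytes / 1e6:.0f} MB")   # MFCC: 137 MB
print(f"STFT: {stft_bytes / 1e9:.1f} GB")   # STFT: 10.7 GB
```

Under these assumptions the estimates land close to the sizes quoted for the MFCC and STFT datasets; the saved parameter values add a small amount on top.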

In [4]:
# Magnitude STFT output feature extraction
features = spgl.features.STFT(fft_size=512, hop_size=256, output='magnitude', time_major=True)

# Set up the generator and create the dataset
generator = spgl.DatasetGenerator(synth, features, output_folder="./data_simple_FM_stft", normalize=True)
generator.generate(50000, file_prefix="train_")
generator.generate(10000, file_prefix="test_")
generator.save_normalizers('normalizers.pkl')

Generating Dataset: 100%|██████████| 50000/50000 [11:01<00:00, 75.62it/s]

Fitting normalizers and normalizing data



Generating Dataset: 100%|██████████| 10000/10000 [02:13<00:00, 75.09it/s]


#### Evaluation Dataset

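Create a small audio set for evaluation. We set the save_audio argument to True in the DatasetGenerator constructor so that audio WAV files are saved in addition to the features and parameter values. The generator reuses the STFT feature extractor defined in the previous cell and renders 25 examples.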

In [5]:
eval_generator = spgl.DatasetGenerator(synth, features,
                                       output_folder='./evaluation',
                                       save_audio=True)
eval_generator.generate(25)

Generating Dataset: 100%|██████████| 25/25 [00:00<00:00, 64.07it/s]
