# Multi-Instrument Timbre Transfer using Omnizart and DDSP

This notebook illustrates how we combined these two pieces of software to
create a multi-instrument timbre transfer system.

This is a local version of the code that runs only on the CPU, so it takes
some time to run. Using a GPU led to issues with TensorFlow on our local
machine. You may try to use a GPU by commenting out lines 7 to 10 (the four
lines under `# Disable GPU`) in the cell just below. In parallel, we also ran
this code on the Scitas Izar cluster, where we did not have those issues.
Unfortunately, the synthesizer software needed to resynthesize the MIDI files
generated by Omnizart cannot be installed on the cluster. The workflow there
was therefore rather impractical: it involved downloading the MIDI files,
synthesizing them, and then uploading the resulting audio files back to the
cluster.

In [1]:
from omni_transcribe import transcribe, synth
from ddsp_timbre_transfer import timbre_transfer
from utils import combine_wavs, convert_wav
import scipy.io.wavfile as wave

# Disable GPU
import tensorflow as tf
gpus = tf.config.list_physical_devices(device_type='GPU')
if gpus: tf.config.experimental.set_memory_growth(gpus[0], True)  # the list is empty on CPU-only machines
tf.config.set_visible_devices([], 'GPU')  # hide all GPUs so that TensorFlow runs on the CPU
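
Another common way to keep TensorFlow on the CPU is to hide the GPUs at the CUDA level through the `CUDA_VISIBLE_DEVICES` environment variable. The following is a minimal sketch of that alternative, not what this notebook does; it assumes the assignment runs before TensorFlow is imported anywhere in the process.

```python
import os

# Assumption: this runs before the first `import tensorflow` in the process.
# The value "-1" matches no CUDA device, so none is exposed to TensorFlow.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
```

With this variant no TensorFlow configuration call is needed, but it affects every CUDA-based library started from the same process.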

First, we use Omnizart to perform source separation and transcribe the different
instruments into separate MIDI files.

In [2]:
# Transcribe to midi
filename = "test.wav"
midis = transcribe(filename)

2022-06-19 15:17:52 Loading model...
2022-06-19 15:18:00 Extracting feature...
2022-06-19 15:18:10 Predicting...
2022-06-19 15:18:26 Inferring notes....
2022-06-19 15:18:28 MIDI file has been written to ./test_0.mid.
2022-06-19 15:18:28 MIDI file has been written to ./test_1.mid.
2022-06-19 15:18:28 MIDI file has been written to ./test_2.mid.
2022-06-19 15:18:28 MIDI file has been written to ./test_3.mid.
2022-06-19 15:18:28 Transcription finished


At this point, the MIDI files must be resynthesized with a software synthesizer
because DDSP takes WAV files as input, not MIDI files.

In [3]:
# Synthesize midi to wav files
for midi in midis:
    synth(midi)

Output file as: ./test_0_synth.wav
Synthesizing MIDI...
Synthesize finished
Output file as: ./test_1_synth.wav
Synthesizing MIDI...
Synthesize finished
Output file as: ./test_2_synth.wav
Synthesizing MIDI...
Synthesize finished
Output file as: ./test_3_synth.wav
Synthesizing MIDI...
Synthesize finished
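
The log above suggests that each `name.mid` file is rendered to `name_synth.wav` in the same directory. Assuming that naming pattern holds, the output paths can be derived with `pathlib` instead of string slicing. The helper `synth_wav_path` is hypothetical and not part of the project's code.

```python
from pathlib import Path


def synth_wav_path(midi_path):
    """Return the assumed path of the WAV file synthesized from midi_path."""
    midi = Path(midi_path)
    # Keep the directory, drop the ".mid" suffix, and append "_synth.wav".
    return midi.with_name(midi.stem + "_synth.wav")
```

Unlike slicing off a leading `./`, this also works for MIDI files located in other directories.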


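Next, we need to re-encode the WAV files so that they are compatible with the
DDSP code. Synthesized files that contain no audio samples are skipped, and each
remaining file is converted and saved under a new name with a `_16` suffix.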

In [4]:
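# Synthesized file names, e.g. "./test_0.mid" -> "test_0_synth.wav"
# (midi[2:] drops the leading "./")
wavs = [midi[2:].replace(".mid", "") + "_synth.wav" for midi in midis]
# Skip tracks whose synthesized audio contains no samples
valid_wavs = [wav for wav in wavs if wave.read(wav)[1].size > 0]
wavs_16bit = [wav.replace(".wav", "_16.wav") for wav in valid_wavs]

# Write a converted copy of each valid file
for src, dst in zip(valid_wavs, wavs_16bit):
    convert_wav(src, dst)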
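
The notebook does not show what `convert_wav` does internally. To illustrate what such a re-encoding step can look like, here is a minimal standard-library sketch that rewrites a 32-bit integer PCM file as 16-bit PCM. The input format, the function name `convert_to_pcm16`, and the truncation strategy are all assumptions made for this example; the project's helper may work differently.

```python
import struct
import wave


def convert_to_pcm16(src_path, dst_path):
    """Re-encode a 32-bit integer PCM WAV file as 16-bit PCM (hypothetical helper)."""
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        if params.sampwidth != 4:
            raise ValueError("this sketch only handles 32-bit integer PCM input")
        frames = src.readframes(params.nframes)

    # WAV samples are little-endian, one value per sample per channel.
    count = len(frames) // 4
    samples = struct.unpack("<%di" % count, frames)
    # Keep the 16 most significant bits of every sample.
    samples_16 = [s >> 16 for s in samples]

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(2)  # 2 bytes per sample = 16 bit
        dst.setframerate(params.framerate)
        dst.writeframes(struct.pack("<%dh" % count, *samples_16))
```

Truncating to the top 16 bits is the simplest choice; a production converter would typically add dithering and support more input formats.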

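At this stage, DDSP timbre transfer is applied to each of the separated
instruments. The target instruments are assigned to the tracks in turn, cycling
through the list if there are more tracks than instruments.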

In [5]:
# Available instruments: {"Violin", "Flute", "Flute2", "Trumpet", "Tenor_Saxophone"}
all_instruments = ["Violin", "Flute", "Trumpet", "Tenor_Saxophone"]
# Cycle through the instrument list so that every track gets a target timbre
instruments = [all_instruments[i % len(all_instruments)] for i in range(len(wavs_16bit))]

# Perform timbre transfer on each (track, instrument) pair
results = [timbre_transfer(wav, instrument) for wav, instrument in zip(wavs_16bit, instruments)]


Extracting audio features...
Audio features took 165.1 seconds
Loading dataset statistics from /home/nicolas/workspace/ma/ma4/ddspzart/workspace/ddsp_pretrained/solo_violin/dataset_statistics.pkl
===Trained model===
Time Steps 1000
Samples 64000
Hop Size 64

===Resynthesis===
Time Steps 8250
Samples 528000

Restoring model took 21.4 seconds
Prediction took 27.1 seconds

Extracting audio features...
Audio features took 133.8 seconds
Loading dataset statistics from /home/nicolas/workspace/ma/ma4/ddspzart/workspace/ddsp_pretrained/solo_flute/dataset_statistics.pkl
===Trained model===
Time Steps 1000
Samples 64000
Hop Size 64

===Resynthesis===
Time Steps 6969
Samples 446016

Restoring model took 16.8 seconds
Prediction took 16.6 seconds

Extracting audio features...


Finally, these audio files, each with a transferred timbre, are combined back
into a single audio file, resulting in a timbre-transferred multi-instrument
track.

In [None]:
# Combine wav files into one
out_path = "result.wav"
combine_wavs(results, out_path)
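
The implementation of `combine_wavs` is not shown in this notebook. As a rough illustration of the underlying idea, the sketch below mixes several tracks by summing them sample by sample. It assumes that all tracks share the same sample rate and hold floating-point samples in the range [-1.0, 1.0]; the function `mix_tracks` is hypothetical.

```python
from itertools import zip_longest


def mix_tracks(tracks):
    """Sum tracks sample by sample, padding shorter tracks with silence."""
    mixed = [sum(samples) for samples in zip_longest(*tracks, fillvalue=0.0)]
    peak = max((abs(s) for s in mixed), default=0.0)
    # Rescale only when the sum would clip outside [-1.0, 1.0].
    if peak > 1.0:
        mixed = [s / peak for s in mixed]
    return mixed
```

Peak normalization keeps the mix from clipping at the cost of lowering the overall level when many tracks overlap.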