# Example source separation of a trumpet with Open-Unmix

We provide an example of separating a trumpet from a mixture. The result shows that the model learns a meaningful task. However, the model does not seem to be complex enough to separate the trumpet from other wind instruments. This task is harder than, for example, separating voice from drums.
We observe even lower-quality output for French horn or guitar because their sound is similar to that of other instruments in the considered ensembles. Moreover, they have longer pauses, which leads to a more imbalanced dataset.
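
The imbalance caused by long pauses can be made concrete by measuring how much of a target stem is silent. The following is a minimal sketch and not part of the original experiments: the helper `silent_frame_fraction`, the frame length, and the -60 dB threshold are all assumptions chosen for illustration, and the stem is synthetic.

```python
import numpy as np


def silent_frame_fraction(audio, frame_length=4096, threshold_db=-60.0):
    """Fraction of non-overlapping frames whose RMS level is below threshold_db."""
    audio = np.asarray(audio, dtype=np.float64)
    n_frames = len(audio) // frame_length
    if n_frames == 0:
        raise ValueError("audio is shorter than one frame")
    # Drop the incomplete last frame and split the rest into equal frames.
    frames = audio[: n_frames * frame_length].reshape(n_frames, frame_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    # Clamp the RMS so that log10 is defined for digital silence.
    level_db = 20.0 * np.log10(np.maximum(rms, 1e-12))
    return float(np.mean(level_db < threshold_db))


# Synthetic stem: one second of a 440 Hz tone followed by three seconds of silence.
sr = 44100
t = np.arange(sr) / sr
stem = np.concatenate([0.5 * np.sin(2 * np.pi * 440.0 * t), np.zeros(3 * sr)])
print(f"silent fraction: {silent_frame_fraction(stem):.2f}")
```

A stem with a high silent fraction contributes many training excerpts in which the target is empty, which is one plausible reading of the imbalance described above.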


In [3]:
#import scipy.signal
import pickle
from dataloader_slakh import SlakhDataset
import openunmix
import torch.optim as optim
import torch.nn as nn
import time
import os
import torch
import norbert
import sklearn.preprocessing
import random
import warnings
import matplotlib.pyplot as plt
import tqdm
import numpy as np
from IPython.display import Audio, display
print('Start')

warnings.simplefilter(action='ignore', category=FutureWarning)


%load_ext autoreload
%autoreload 2


Start


ImportError: cannot import name 'benedict' from 'benedict' (/home/nicolas/workspace/ma/ma4/ddspzart/.venv/lib/python3.8/site-packages/benedict/__init__.py)

Initialize the dataset


In [None]:
%cd ../open-unmix-pytorch/
%pwd


## Set up the model


In [None]:
SAVE_PATH = "../source_separation/data/checkpoints/"
CKPT_FILE = "exp4_trumpet.pt"
DATASET_STAT_FILE = "exp4_dataset_statistics.pickle"
%pwd


In [None]:
with open(SAVE_PATH+DATASET_STAT_FILE, 'rb') as targets_file:
    stat = pickle.load(targets_file)
    mean, scale = stat
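
The notebook does not show how the pickled statistics were produced. As a rough sketch, assuming they are per-frequency-bin means and standard deviations of magnitude spectrogram frames, they could be computed as follows. The helper `spectrogram_statistics`, the relative floor, and the random stand-in data are hypothetical.

```python
import numpy as np


def spectrogram_statistics(magnitudes, rel_floor=1e-4):
    """Per-bin mean and scale of a (n_frames, n_bins) magnitude spectrogram."""
    magnitudes = np.asarray(magnitudes, dtype=np.float64)
    bin_mean = magnitudes.mean(axis=0)
    bin_std = magnitudes.std(axis=0)
    # Floor very small deviations so that dividing by the scale stays stable.
    bin_scale = np.maximum(bin_std, rel_floor * bin_std.max())
    return bin_mean, bin_scale


# Random stand-in for real spectrogram frames (1000 frames, 2049 bins).
rng = np.random.default_rng(0)
fake_magnitudes = np.abs(rng.normal(size=(1000, 2049)))
demo_mean, demo_scale = spectrogram_statistics(fake_magnitudes)

# Standardizing with these statistics gives roughly zero mean and unit variance per bin.
standardized = (fake_magnitudes - demo_mean) / demo_scale
print(demo_mean.shape, demo_scale.shape)
```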


In [None]:
use_cuda = torch.cuda.is_available()
torch.manual_seed(42)

device = torch.device("cuda" if use_cuda else "cpu")
kwargs = {'num_workers': 1, 'pin_memory': True} if use_cuda else {}
# Override the device selected above: this demo runs on the CPU.
device = "cpu"

unmix = openunmix.model.OpenUnmix(
    input_mean=mean,
    input_scale=scale,
    nb_channels=1,
    hidden_size=512,
    max_bin=512,
    nb_bins=2048+1
).to(device)

optimizer = optim.RMSprop(unmix.parameters(), lr=0.005)
criterion = torch.nn.MSELoss()


## Demonstrate the model


In [None]:
# Testing: here, we use just one sample.
# We do not provide the whole dataset here. For training, the whole Slakh dataset was used.
test_dataset = SlakhDataset(split='test', seq_duration=5.0)
test_sampler = torch.utils.data.DataLoader(test_dataset, batch_size=8)

track = test_dataset[0]


In [None]:
# Filter the instrument (here Trumpet)
test_dataset.target = 'Trumpet'
test_dataset.filter_target()


In [None]:
# Load the trained model (here the trumpet source separation)
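# map_location keeps the load working when the checkpoint was saved on a GPU but is loaded on the CPU
checkpoint = torch.load(SAVE_PATH+CKPT_FILE, map_location=device)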
unmix.load_state_dict(checkpoint['model_state_dict'])
optimizer.load_state_dict(checkpoint['optimizer_state_dict'])
epoch = checkpoint['epoch']
loss = checkpoint['loss']


In [None]:
# separate with custom trained models

audio_torch = track[0][None, ...].float().to(device)

# Here, we apply only one source separation model to demonstrate that Open-Unmix can, in general, also be applied to wind instruments.
# However, a more thorough optimization of the neural network architecture would be needed to achieve better results.
target_models = {"Trumpet": unmix}
own_separator = openunmix.model.Separator(
    target_models, nb_channels=1).to(device)
y = track[1][None, ...].float().to(device).squeeze()

y_hat = own_separator.forward(audio_torch).clone().detach().squeeze()

display(Audio(track[0], rate=44100))
display(Audio(y.cpu().numpy(), rate=44100))
display(Audio(y_hat.cpu().numpy(), rate=44100))
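
Listening can be complemented by a simple number. The sketch below computes a plain signal-to-distortion ratio between a reference and an estimate. It is an illustrative assumption and not the metric used in this project; in particular, it is not the BSS Eval SDR. The helper name `sdr_db` and the synthetic signals are hypothetical.

```python
import numpy as np


def sdr_db(reference, estimate, eps=1e-12):
    """Plain signal-to-distortion ratio in dB between two equally shaped signals."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    if reference.shape != estimate.shape:
        raise ValueError("reference and estimate must have the same shape")
    signal_energy = np.sum(reference ** 2)
    error_energy = np.sum((reference - estimate) ** 2)
    # eps avoids division by zero and log(0) for silent or perfectly matched signals.
    return float(10.0 * np.log10((signal_energy + eps) / (error_energy + eps)))


# Synthetic check: a clean tone against the same tone with additive noise.
t = np.arange(44100) / 44100
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.1 * np.random.default_rng(0).normal(size=clean.shape)
print(f"SDR: {sdr_db(clean, noisy):.1f} dB")
```

With the tensors from the cell above, the call would be `sdr_db(y.cpu().numpy(), y_hat.cpu().numpy())`, provided both arrays are one-dimensional and equally long.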


In [None]:
from audio2numpy import open_audio
horn_extracted, sr = open_audio(
    "../source_separation/data/audio/01_horn_standard_not_augmented_extracted_05_loss.wav")
horn_isolated_gt, sr = open_audio(
    "../source_separation/data/audio/01_horn_standard_not_augmented_isolated.wav")
horn_with_others, sr = open_audio(
    "../source_separation/data/audio/01_horn_standard_not_augmented_original.wav")

print("Horn separated")
display(Audio(horn_with_others, rate=sr))
display(Audio(horn_isolated_gt, rate=sr))
display(Audio(horn_extracted, rate=sr))


As another example, we provide a French horn extraction. As mentioned above, the model is not able to extract the horn. However, one can hear that the melody voice (here the trumpet) is removed.


We provide testing functionality in the notebook "evaluate_openunmix_separator.ipynb". Here, we do not provide a thorough discussion of the training process, even though it required time and effort, including the use of the EPFL cluster.
However, we think that such a discussion would not deliver key findings for the CM class.
