Sascha Spors,
Professorship Signal Theory and Digital Signal Processing,
Institute of Communications Engineering (INT),
Faculty of Computer Science and Electrical Engineering (IEF),
University of Rostock,
Germany

# Data Driven Audio Signal Processing - A Tutorial with Computational Examples

Winter Semester 2021/22 (Master Course #24512)

- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture
- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise

Feel free to contact lecturer frank.schultz@uni-rostock.de

# Exercise 5: SVD Matrix on Multitrack Audio 

## Objectives


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import pyloudnorm as pyln
import shutil
from scipy.io import wavfile
from numpy.linalg import cond, matrix_rank
from scipy.linalg import svd, norm, diagsvd, inv, pinv, null_space

# some 'global' variables that should appear here at the top for convenience
target_lufs = -32
flag_channel = 'Mono'  # 'Left', 'Right' or 'Mono'

### Data Preparation

This notebook is most fun if we have multitrack audio to work with, ideally on our own computer and, even better, music that we recorded ourselves ;-)

We can get some nice multitrack audio material from [https://www.cambridge-mt.com/ms/mtk/](https://www.cambridge-mt.com/ms/mtk/) .

This database supports the 2nd edition of Mike Senior's book "[The Mixing Secrets For The Small Studio](https://cambridge-mt.com/ms/main/)".

The database is provided for educational purposes, and that is precisely how we will use it. Our mixing approach may differ slightly from what was initially intended, but we will indeed do a mixing job here, using the SVD.

So, if we need some multitrack material, we can download one of the zip files. Below we use 4 examples that turned out to be very illustrative.

To create a matrix whose columns represent the multitrack files (either the left channel, the right channel, or both summed to mono), the code below needs wav files of equal length (I was too lazy to program this nicely).
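
If the stems are not of equal length, one possible workaround is to zero-pad the shorter ones before stacking. The following is a minimal sketch under that assumption; `stack_equal_length` is a hypothetical helper and not part of the notebook code, and appending silence at the end is only one of several reasonable choices.

```python
import numpy as np


def stack_equal_length(tracks):
    """Zero-pad 1D tracks to the longest length and stack them as columns."""
    n_max = max(len(x) for x in tracks)
    A = np.zeros((n_max, len(tracks)))
    for col, x in enumerate(tracks):
        A[:len(x), col] = x  # samples beyond len(x) stay zero (silence)
    return A


# three toy 'tracks' of different length
tracks = [np.ones(5), np.arange(3.0), np.array([0.5, -0.5])]
A_demo = stack_equal_length(tracks)
print(A_demo.shape)
```

Each column of `A_demo` then plays the role of one single-channel stem, just like the columns of the matrix `A` built further below.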

A reasonable workflow is to use digital audio workstation (DAW) software, such as [Reaper](https://www.reaper.fm/purchase.php) (please support these guys, the non-commercial license is very cheap).

In Reaper we would:
- import all multitrack files from a certain zip into an empty project
- put them on separate tracks
- choose an appropriate time selection
- do some simple level mixing according to taste
- select all tracks
- open the `render to file` dialog
    - choose Source: `Selected tracks (stems)`
    - choose Bounds: `Time selection`
    - Directory: create a `multi_track` folder as a subfolder of the folder where the raw wav files are stored
    - File name: `multi_track`
    - make sure that the original sampling rate is used
    - Format: `Wav`
    - WAV bit depth: `32 Bit FP` (uses more storage, but then we don't need to worry about sample magnitudes larger than 1 if some mixing was performed)
    - check `Dry Run (no output)`
    - if this simulated rendering yields the expected channels in the expected number of files with the naming convention `multi_track-xxx.wav`
    - we are ready to render using `Render xxx files`
- it may be a good idea to save this project for further reference, e.g. to check what different mixes do to the SVD data...
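
After rendering, it can be worth verifying that all stems really share one sampling rate and one length before building the matrix. The sketch below is one way to do this; `check_stems` is a hypothetical helper, and the three-digit numbering in the demo file names is only an assumption about what `xxx` looks like.

```python
import os
import tempfile
from fnmatch import fnmatch

import numpy as np
from scipy.io import wavfile


def check_stems(folder, pattern='multi_track-*.wav'):
    """Return (fs, n_samples, names) if all matching stems agree, else raise."""
    names = sorted(f for f in os.listdir(folder) if fnmatch(f, pattern))
    if not names:
        raise FileNotFoundError('no files matching ' + pattern)
    props = set()
    for name in names:
        fs, data = wavfile.read(os.path.join(folder, name))
        props.add((fs, data.shape[0]))
    if len(props) != 1:
        raise ValueError('stems differ in sampling rate or length: %s' % props)
    fs, n_samples = props.pop()
    return fs, n_samples, names


# self-contained demo with two synthetic stereo stems in a temporary folder
with tempfile.TemporaryDirectory() as folder:
    for k in (1, 2):
        noise = np.random.default_rng(k).uniform(-0.1, 0.1, (4800, 2))
        wavfile.write(os.path.join(folder, 'multi_track-%03d.wav' % k),
                      48000, noise.astype(np.float32))
    fs_demo, n_demo, names_demo = check_stems(folder)
print(fs_demo, n_demo, names_demo)
```

For a real project, the call would be `check_stems(path + 'multi_track/')` with `path` as chosen in the next cell.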
 

In [None]:
# choose one of the nice multitrack projects
# (the last assignment to path below takes effect):

path = 'audio_ex05/cfx_Mathematician_Full/'
# c:fx, 'Mathematician', https://soundcloud.com/c-fx
# multitrack for educational use only:
# https://mtkdata.cambridgemusictechnology.co.uk/MTK005/cfx_Mathematician.zip

path = 'audio_ex05/MaurizioPagnuttiSextet_AllTheGinIsGone/'
# Maurizio Pagnutti Sextet, 'All The Gin Is Gone', https://www.artesuono.it/album.aspx?id=76&p=0
# for educational use only:
# https://multitracks.cambridge-mt.com/MaurizioPagnuttiSextet_AllTheGinIsGone.zip

path = 'audio_ex05/cryonicPAX_Excessive/'
# cryonicPAX, 'Excessive'
# multitrack for educational use only:
# https://mtkdata.cambridgemusictechnology.co.uk/MTK015/cryonicPAX_Excessive.zip

path = 'audio_ex05/Fin_Echoes/'
# FIN, 'Echoes'
# multitrack for educational use only:
# https://multitracks.cambridge-mt.com/Fin_Echoes.zip

In [None]:
# set up path names
pathr = path + 'multi_track/'
pathu = path + 'left_singular_vectors/'
patha = path + 'reduced_rank_mixdown/'

# clear/del path and re-create new to start fresh
try:
    shutil.rmtree(pathu)
except OSError as e:
    print("Error: %s : %s" % (pathu, e.strerror))
os.mkdir(pathu)
try:
    shutil.rmtree(patha)
except OSError as e:
    print("Error: %s : %s" % (patha, e.strerror))
os.mkdir(patha)

In [None]:
# read in multitrack wavs and store in a matrix A

files = sorted(os.listdir(pathr))
print(files)
flag_conc = True

for i in files:
    if i[-4:] == '.wav':  # consider only multi_track-xxx.wav
        if i[0:12] == 'multi_track-':
            fs, tmp = wavfile.read(pathr+i)
            if flag_channel == 'Mono':  # we assume that the files are stereo
                # make mono and (xxx, 1) dimension
                x = np.expand_dims((tmp[:, 0] + tmp[:, 1]) / 2, 1)
            elif flag_channel == 'Left':
                # take left channel and (xxx, 1) dimension
                x = np.expand_dims(tmp[:, 0], 1)
            elif flag_channel == 'Right':
                # take right channel and (xxx, 1) dimension
                x = np.expand_dims(tmp[:, 1], 1)
            else:
                print('!!! check flag_channel !!!')
                break
            print(i, x.shape, x.dtype)
            # non-elegant way to stack all stems (single channel tracks) into matrix
            if flag_conc:
                A, flag_conc = x, False
            else:
                A = np.concatenate((A, x), axis=1)
# since we know the sampling frequency (assuming that all multitracks share it),
# we can instantiate a LUFS meter for later use
lufs_meter = pyln.Meter(fs)

print('\nmulti track matrix A', A.shape, A.dtype)

In [None]:
# SVD of A and some checks

[U, s, Vh] = svd(A, full_matrices=False)
print('U shape: ', U.shape)
print('Vh shape: ', Vh.shape)
print('no of sing vals:', s.size)
print('sing vals: ', s)
print('condition number: ', cond(A), s[0] / s[-1])

print(norm(U @ np.diag(s) @ Vh - A, 'fro'))
print(norm(U @ np.diag(s) @ Vh - A, 2))
print(norm(U @ np.diag(s) @ Vh - A, 'nuc'))
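
The three norms printed above should all be numerically close to zero for the full reconstruction. For a rank-reduced reconstruction, the error is given by the discarded singular values. The following sketch illustrates this on a small random matrix; all names with the suffix `_toy` are illustrative and not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
A_toy = rng.standard_normal((1000, 4))  # 1000 'samples', 4 'tracks'

U_toy, s_toy, Vh_toy = np.linalg.svd(A_toy, full_matrices=False)

r = 2  # keep the two largest singular values
A_r = U_toy[:, :r] @ np.diag(s_toy[:r]) @ Vh_toy[:r, :]

err_fro = np.linalg.norm(A_toy - A_r, 'fro')
err_2 = np.linalg.norm(A_toy - A_r, 2)
err_nuc = np.linalg.norm(A_toy - A_r, 'nuc')

# Frobenius error vs. root of the sum of squared discarded sing vals
print(err_fro, np.sqrt(np.sum(s_toy[r:]**2)))
# spectral error vs. largest discarded sing val
print(err_2, s_toy[r])
# nuclear error vs. sum of discarded sing vals
print(err_nuc, np.sum(s_toy[r:]))
```

Each printed pair should agree up to rounding, which is why the decay of `s` tells us how well a low-rank mix can approximate the full multitrack matrix.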

In [None]:
# apply sing vals to the column space -> we plot the time signals
# according to their sing vals strength
Us = U  @ np.diag(s)
N = Us.shape[0]
t = np.arange(N) / fs

In [None]:
# plot column space signals, i.e. left singular vecs weighted with sing vals
# create wav files of these signals to listen to (loudness normalization using BS1770 for convenience)

# the above handling indicates:
# - relative weighting of the signals in terms of sing vals can be seen in the plots
# - listening to the quality of the sing vector signals comes with same loudness

nr = 3  # if we have more than 18 tracks we should
nc = 6  # choose other nr, nc to fit into the subplot grid
fig, axs = plt.subplots(nr, nc, figsize=(16, 10))
cnt = 0
for r in range(nr):
    for c in range(nc):
        axs[r, c].plot(t, Us[:, cnt])
        axs[r, c].set_ylim(-1.5, 1.5)
        # we start plotting/wav writing with index 1
        # in order to match rank 1...R wavfiles
        axs[r, c].set_title(
            r'$\sigma$={0:3.2f} $\cdot$ U[{1:d}]'.format(s[cnt], cnt+1))

        lufs = lufs_meter.integrated_loudness(Us[:, cnt])  # calc 1770 loudness
        tmp = Us[:, cnt] * 10**((target_lufs - lufs) /
                                20)  # adapt to target_lufs
        if np.max(np.abs(tmp)) > 1.:  # clipping might occur
            print('!!! wav file clipping !!!, decrease target_lufs to:')
            print(target_lufs - 20*np.log10(np.max(np.abs(tmp))))
        lufs = lufs_meter.integrated_loudness(
            tmp)  # check if we got target_lufs
        print(lufs)
        # write wav of col space signals, for convenient listening files have equal lufs
        wavfile.write(pathu+'left_singular_vector_'+str(cnt+1)+'.wav', fs, tmp)
        cnt += 1
        if cnt == Us.shape[1]:  # all col space sing vecs are processed
            break
    if cnt == Us.shape[1]:  # also leave the outer loop over the rows
        break
plt.savefig(path+'plot_left_singular_vectors.png')
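
The gain computation used in the cell above can be isolated in a small sketch. It uses a plain RMS level instead of the BS.1770 loudness from `pyloudnorm`, purely to stay free of extra dependencies, so the numbers are not LUFS values. `rms_level_db` and `normalize_to_level` are hypothetical helper names.

```python
import numpy as np


def rms_level_db(x):
    """RMS level in dB re full scale 1.0 (a simple stand-in, not BS.1770)."""
    return 10 * np.log10(np.mean(x**2))


def normalize_to_level(x, target_db):
    """Scale x to target_db and return (y, safe_target_db).

    safe_target_db equals target_db if y does not clip, otherwise it is the
    highest target level for which the peak stays at |y| <= 1.
    """
    y = x * 10**((target_db - rms_level_db(x)) / 20)
    peak_db = 20 * np.log10(np.max(np.abs(y)))
    safe_target_db = target_db - max(peak_db, 0.0)
    return y, safe_target_db


fs_demo = 48000
t_demo = np.arange(fs_demo) / fs_demo
x_demo = 0.01 * np.sin(2 * np.pi * 440 * t_demo)  # quiet 440 Hz sine
y_demo, safe_db = normalize_to_level(x_demo, target_db=-32)
print(rms_level_db(y_demo), safe_db)
```

The same logic explains the clipping warning in the notebook: if the normalized peak exceeds 1, lowering the target level by the peak overshoot in dB brings the peak back to full scale.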

In [None]:
# check the polarity and level of the mixing weights:
# the unity-gain mixing vector is expressed in terms of the V space vectors,
# i.e. as inner products of the rows of Vh with the all-ones vector
mixing_weight = Vh @ np.ones((s.size, 1))
ui = np.arange(mixing_weight.size) + 1
level = 10*np.log10(np.abs(mixing_weight)**2)  # dB
level_pos = np.copy(level)
level_pos[mixing_weight <= 0] = 0
level_neg = np.copy(level)
level_neg[mixing_weight > 0] = 0
plt.figure(figsize=(6, 4))
plt.stem(ui, level_pos, basefmt='white', linefmt='C0',
         markerfmt='C0o', label='positive polarity')
plt.stem(ui, level_neg, basefmt='white', linefmt='C1',
         markerfmt='C1o', label='negative polarity')
plt.xticks(ui)
plt.legend()
plt.xlabel('U column index')
plt.ylabel('mixing weight in dB')
plt.grid(True)
plt.savefig(path+'mixing_weights.png')
# print(mixing_weight)
# print(20*np.log10(np.abs(mixing_weight)))
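
To see why these weights matter, consider the following sketch on a small random matrix. It checks that summing all tracks with unity gain equals the SVD factors acting on the mixing weights, and that the polarity of a weight depends on the sign convention of the singular vector pair. All `_toy` and `_flip` names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A_toy = rng.standard_normal((500, 3))  # 500 'samples', 3 'tracks'
U_toy, s_toy, Vh_toy = np.linalg.svd(A_toy, full_matrices=False)

ones = np.ones((A_toy.shape[1], 1))  # unity gain for every track
w_toy = Vh_toy @ ones                # inner products of rows of Vh with ones

mix_direct = A_toy @ ones                 # plain sum of all tracks
mix_svd = U_toy @ np.diag(s_toy) @ w_toy  # same mix via the SVD factors
print(np.allclose(mix_direct, mix_svd))

# flipping the sign of a singular vector pair leaves A unchanged,
# but flips the polarity of the corresponding mixing weight
U_flip, Vh_flip = U_toy.copy(), Vh_toy.copy()
U_flip[:, 0] *= -1
Vh_flip[0, :] *= -1
print(np.allclose(U_flip @ np.diag(s_toy) @ Vh_flip, A_toy))
print((Vh_flip @ ones)[0, 0], w_toy[0, 0])
```

So the polarity shown in the stem plot is only meaningful together with the corresponding column of `U`; the product of both is what ends up in the mix.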

In [None]:
# reconstruct A from SVD with rank 1...R
# and mixdown to a mono signal
# loudness normalization of mixdown
for r in range(s.size):
    # print(r)
    s_reduced = np.copy(s)
    s_reduced[r+1:] = 0
    # print(s_reduced)
    tmp = U @ np.diag(s_reduced) @ mixing_weight
    lufs = lufs_meter.integrated_loudness(tmp)  # calc 1770 loudness
    tmp *= 10**((target_lufs - lufs)/20)  # adapt to target_lufs
    if np.max(np.abs(tmp)) > 1.:  # clipping might occur
        print('!!! wav file clipping !!!, decrease target_lufs to:')
        print(target_lufs - 20*np.log10(np.max(np.abs(tmp))))
    lufs = lufs_meter.integrated_loudness(tmp)  # check if we got target_lufs
    print(lufs)
    # write wav of rank reduced mixdown, for convenient listening files have equal lufs
    wavfile.write(patha+'rank_'+str(r+1)+'.wav', fs, tmp)
# this lets us listen to what the rank-reduced SVD factorization of matrix A
# sounds like when it acts on an equal-gain mixing vector
# (i.e. all columns of matrix A get unity gain)
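
The loop above rebuilds a diagonal matrix for every rank. As an alternative sketch (not the notebook's method), the rank 1...R mixdowns can be obtained as a cumulative sum over the contributions of the individual singular triplets. The example uses a small random matrix, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
A_toy = rng.standard_normal((500, 4))  # 500 'samples', 4 'tracks'
U_toy, s_toy, Vh_toy = np.linalg.svd(A_toy, full_matrices=False)
w_toy = Vh_toy @ np.ones(A_toy.shape[1])  # mixing weights, shape (4,)

# column k holds the contribution of the k-th singular triplet to the mix
contrib = U_toy * (s_toy * w_toy)  # broadcasting scales each column of U
# column r holds the rank-(r+1) mixdown
mixdowns = np.cumsum(contrib, axis=1)

# cross-check one rank against the zeroed-singular-value formulation
r = 1
s_reduced = s_toy.copy()
s_reduced[r+1:] = 0
ref = U_toy @ np.diag(s_reduced) @ w_toy
print(np.allclose(mixdowns[:, r], ref))

# the full-rank mixdown equals the plain sum of all tracks
print(np.allclose(mixdowns[:, -1], A_toy.sum(axis=1)))
```

This view makes explicit that each additional rank adds exactly one weighted left singular vector to the mix.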

## Copyright

- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)
- feel free to use the notebooks for your own purposes
- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)
- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year.