Sascha Spors,
Professorship Signal Theory and Digital Signal Processing,
Institute of Communications Engineering (INT),
Faculty of Computer Science and Electrical Engineering (IEF),
University of Rostock,
Germany

# Data Driven Audio Signal Processing - A Tutorial with Computational Examples

Winter Semester 2022/23 (Master Course #24512)

- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture
- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise

Feel free to contact the lecturer: frank.schultz@uni-rostock.de

# Exercise 5: SVD Matrix on Multitrack Audio 

## Objectives

TBD

In [None]:
# note: we will delete (the notebook's own) and save files on the computer,
# so we should check what
# wavfile.write, wavfile.read, rmtree, mkdir and plt.savefig are doing
# in the cells below. I tested the notebook on two machines and it did what it
# is intended to do (i.e. deleting and re-creating the left_singular_vectors folder,
# deleting and re-creating the reduced_rank_mixdown folder, writing *.wav files into them,
# writing *.png graphic files of output plots into the chosen folder 'path').
# So, harmless stuff actually. Anyway, I am not responsible for data loss on
# other machines if something goes wrong. So, please, check what you are doing
# on your machine.

import matplotlib.pyplot as plt
import numpy as np
import os
import pyloudnorm as pyln
import shutil
from scipy.io import wavfile
from numpy.linalg import cond, matrix_rank
from scipy.linalg import svd, norm, diagsvd, inv, pinv, null_space

In [None]:
# some 'global' variables that should appear here at top for convenience
target_lufs = -32  # target loudness
flag_channel = 'Mono'  # 'Left', 'Right' or 'Mono'

## Data Preparation

This notebook is most fun if we have multi-track audio to work with, ideally on our own computer. Maybe we even have some multi-track music we recorded ourselves ;-)

We can get some nice multi-track audio material from [https://www.cambridge-mt.com/ms/mtk/](https://www.cambridge-mt.com/ms/mtk/).

This database accompanies the 2nd edition of Mike Senior's book "[Mixing Secrets for the Small Studio](https://cambridge-mt.com/ms/main/)".

The database is provided for educational purposes, and we will use it precisely this way... maybe with a slightly different mixing approach than initially intended, but we will be doing a **mixing job** here... **using the SVD**.

So, if we need some multi-track material, we can download one of the zip files. Below we use four examples that turned out to be very illustrative.

To create a matrix whose columns represent the multi-track files (either the left channel, the right channel, or both summed to a mono channel), the code below requires *.wav files of equal length (I was too lazy to program this more robustly).
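
If rendering equal-length files is not an option, one could instead trim the tracks in Python. A minimal sketch, assuming each track is already available as a 1D NumPy array (e.g. one channel returned by `wavfile.read`); the helper `stack_tracks` is hypothetical and not used by the notebook. It truncates all tracks to the shortest one, which discards the tails of the longer tracks; zero-padding to the longest track would be the alternative.

```python
import numpy as np


def stack_tracks(tracks):
    """Stack 1D track signals as the columns of a matrix.

    Hypothetical helper (not part of the original notebook): all tracks are
    truncated to the length of the shortest one, so that np.column_stack
    receives columns of equal length.
    """
    min_len = min(len(x) for x in tracks)
    return np.column_stack([np.asarray(x)[:min_len] for x in tracks])


# usage with three dummy tracks of different length
tracks = [np.ones(5), 2 * np.ones(7), 3 * np.ones(6)]
X_demo = stack_tracks(tracks)
print(X_demo.shape)  # (5, 3)
```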

A reasonable workflow is to use digital audio workstation (DAW) software, such as [Reaper](https://www.reaper.fm/purchase.php) (please support the developers, the non-commercial license is very cheap).

In Reaper we would do:
- drag all multi-track files from a certain zip into an empty project
- place them on separate tracks
- choose an appropriate time selection
- do some simple level mixing according to our own taste
- select all tracks
- open the `render to file` dialog
    - choose Source: `Selected tracks (stems)`
    - choose Bounds: `Time selection`
    - Directory: create a `multi_track` subfolder inside the folder where the raw wav files are stored (the code below expects it at `path + 'multi_track/'`)
    - File name: `multi_track`
    - make sure that the original sampling rate is used
    - Format: `Wav`
    - WAV bit depth: `32 Bit FP` (uses more storage, but then we don't need to worry about sample values outside [-1, 1] after mixing)
    - click `Dry Run (no output)`
    - check whether this simulated rendering produces the expected channels in the expected number of files with the naming convention `multi_track-xxx.wav`
    - we are ready to render by clicking `Render xxx files`
- it is probably a good idea to save this Reaper project for further reference, such as checking what different mixes do to the SVD data...

## Choose Multitrack Example

In [None]:
# choose one of these illustrative multitrack projects
# (only the last assignment of path is active; reorder or comment out to switch):

path = 'svd_multitrack_audio/cfx_Mathematician_Full/'
# c:fx, 'Mathematician', https://soundcloud.com/c-fx
# multitrack for educational use only:
# https://mtkdata.cambridgemusictechnology.co.uk/MTK005/cfx_Mathematician.zip

path = 'svd_multitrack_audio/MaurizioPagnuttiSextet_AllTheGinIsGone/'
# Maurizio Pagnutti Sextet, 'All The Gin Is Gone', https://www.artesuono.it/album.aspx?id=76&p=0
# for educational use only:
# https://multitracks.cambridge-mt.com/MaurizioPagnuttiSextet_AllTheGinIsGone.zip

path = 'svd_multitrack_audio/cryonicPAX_Excessive/'
# cryonicPAX, 'Excessive'
# multitrack for educational use only:
# https://mtkdata.cambridgemusictechnology.co.uk/MTK015/cryonicPAX_Excessive.zip

path = 'svd_multitrack_audio/Fin_Echoes/'
# FIN, 'Echoes'
# multitrack for educational use only:
# https://multitracks.cambridge-mt.com/Fin_Echoes.zip

In [None]:
# set up path names
pathr = path + 'multi_track/'
pathu = path + 'left_singular_vectors/'  # i.e. the column space of matrix
patha = path + 'reduced_rank_mixdown/'

# clear/del path and re-create new to start fresh
try:  # for left_singular_vectors/
    shutil.rmtree(pathu)
except OSError as e:
    print("Error: %s : %s" % (pathu, e.strerror))
os.mkdir(pathu)

try:  # for reduced_rank_mixdown/
    shutil.rmtree(patha)
except OSError as e:
    print("Error: %s : %s" % (patha, e.strerror))
os.mkdir(patha)

## Read Audio Files into Matrix X

In [None]:
# read in multitrack wavs and store in a matrix X

files = sorted(os.listdir(pathr))
print(files)
flag_conc = True

for i in files:
    # consider only .wav (this excludes .wav.reapeaks files from Reaper)
    if i[-4:] == '.wav':
        # consider only multi_track-
        if i[0:12] == 'multi_track-':
            # so we read in all stuff with name 'multi_track-xxxxxxxx.wav'
            # if we set up the folder multi_track properly, we only find
            # multi_track-001.wav, multi_track-002.wav, multi_track-003.wav, ...
            fs, tmp = wavfile.read(pathr+i)  # get raw data from file
            # we assume that the files are stereo
            if flag_channel == 'Mono':
                # make mono and make it (xxx, 1) dimension
                x = np.expand_dims((tmp[:, 0] + tmp[:, 1]) / 2, 1)
            elif flag_channel == 'Left':
                # take left channel and make it (xxx, 1) dimension
                x = np.expand_dims(tmp[:, 0], 1)
            elif flag_channel == 'Right':
                # take right channel and make it (xxx, 1) dimension
                x = np.expand_dims(tmp[:, 1], 1)
            else:
                print('!!! unknown flag_channel !!!')
                break
            # if we got here, we can print some log data
            print(i, fs, 'Hz', x.shape, x.dtype)
            # non-elegant way to stack all stems (single channel tracks) into matrix
            if flag_conc:  # very first x into non-existing X
                X, flag_conc = x, False
            else:  # X is already existing and axis=0 is assumed to be const
                # for all concatenated x,
                # so multi_track-001.wav, multi_track-002.wav, multi_track-003.wav, ...
                # must have exactly same length
                X = np.concatenate((X, x), axis=1)
# the scipy warning 'Chunk (non-data) not understood, skipping it.' can be ignored

# since we know sampling frequency (assuming that all multitracks have same),
# we can instantiate a lufs meter (to measure loudness), that is later to be used
lufs_meter = pyln.Meter(fs)

print('\nmulti-track matrix X', X.shape, X.dtype)

## SVD of X

Let us assume a tall, thin matrix $\mathbf{X}$ (with $M$ rows, i.e. audio samples, and $N$ columns, i.e. tracks, $M \gg N$) of full column rank, i.e. rank $r=N$, which is a fair assumption for our multi-track data. It is probably hard to set up a rank-deficient tall, thin matrix from different multi-track audio channels (unless two guitars play exactly the same stuff ;-) ).
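
The rank argument can be checked numerically. A minimal sketch on random data (standing in for real tracks, which is an assumption): a duplicated column, or a column that is merely a scaled copy of another one, drops the rank by one.

```python
import numpy as np
from numpy.linalg import matrix_rank

rng = np.random.default_rng(1)
M_demo, N_demo = 1000, 4  # M samples (rows), N tracks (columns)
X_full = rng.standard_normal((M_demo, N_demo))
print(matrix_rank(X_full))  # 4, i.e. full column rank r = N

# "two guitars play exactly the same stuff": track 3 becomes a copy of track 0
X_dup = X_full.copy()
X_dup[:, 3] = X_dup[:, 0]
print(matrix_rank(X_dup))  # 3, i.e. rank deficient

# a mere gain change does not help either: track 3 = 0.5 * track 0
X_gain = X_full.copy()
X_gain[:, 3] = 0.5 * X_gain[:, 0]
print(matrix_rank(X_gain))  # 3
```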

Full SVD 

$$\mathbf{X}_{M \times N} = \mathbf{U}_{M \times M} \mathbf{S}_{M \times N} (\mathbf{V}_{N \times N})^\mathrm{H}$$

Economy SVD

$$\mathbf{X}_{M \times N} = \mathbf{U}_{M \times r} \mathbf{S}_{r \times r} (\mathbf{V}_{N \times r})^\mathrm{H}$$

with diagonal matrix

$$\mathbf{S}_{r \times r}=
\begin{bmatrix}
\sigma_1 & 0 & \cdots & 0\\
0 & \sigma_2 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & \sigma_r
\end{bmatrix}
$$

containing the singular values $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r > 0$.
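
The shapes of the full and the economy SVD, the ordering of the singular values, and the link to the condition number can be checked with `scipy.linalg.svd`. A minimal sketch on a random tall, thin matrix (an assumption, standing in for the audio matrix); `scipy.linalg.diagsvd` builds the $M \times N$ matrix $\mathbf{S}$ that the full SVD needs.

```python
import numpy as np
from numpy.linalg import cond
from scipy.linalg import svd, diagsvd

rng = np.random.default_rng(2)
M_demo, N_demo = 200, 5  # tall, thin: M samples, N tracks
X_demo = rng.standard_normal((M_demo, N_demo))

# full SVD: U is M x M, Vh is N x N, s holds the min(M, N) singular values
U_f, s_f, Vh_f = svd(X_demo, full_matrices=True)
# economy SVD: U is M x N, Vh is N x N
U_e, s_e, Vh_e = svd(X_demo, full_matrices=False)
print(U_f.shape, U_e.shape, Vh_e.shape)  # (200, 200) (200, 5) (5, 5)

# singular values come sorted in non-increasing order
print(np.all(np.diff(s_e) <= 0))  # True

# both variants reconstruct X; the full one needs the M x N matrix S
S_full = diagsvd(s_f, M_demo, N_demo)
print(np.allclose(U_f @ S_full @ Vh_f, X_demo))        # True
print(np.allclose(U_e @ np.diag(s_e) @ Vh_e, X_demo))  # True

# 2-norm condition number = largest / smallest singular value
print(np.isclose(cond(X_demo), s_e[0] / s_e[-1]))  # True
```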

In [None]:
# SVD of X and some checks
[U, s, Vh] = svd(X, full_matrices=False)
print('U shape: ', U.shape)
print('Vh shape: ', Vh.shape)
print('sing vals: ', s)
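# numerical rank with the same default tolerance as numpy.linalg.matrix_rank
tol = s[0] * max(X.shape) * np.finfo(s.dtype).eps
print(np.sum(s > tol), 'well behaved singular values ==', X.shape[1], 'tracks ?')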
print('condition number: ', cond(X), s[0] / s[-1])

# check SVD synthesis of X vs. stored X
print(norm(U @ np.diag(s) @ Vh - X, 'fro'))
print(norm(U @ np.diag(s) @ Vh - X, 2))
print(norm(U @ np.diag(s) @ Vh - X, 'nuc'))

## Weighted Left Singular Vectors of the Column Space

We set up

$$\mathbf{H}_{M \times r} =  \mathbf{U}_{M \times r} \mathbf{S}_{r \times r}$$

as the matrix that contains the left singular vectors weighted by their corresponding non-zero singular values. We consider the **economy SVD** from above and still assume a tall, thin matrix $\mathbf{X}$ of full column rank, so $r=N$.
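
Since $\mathbf{V}$ is unitary, $\mathbf{V}^\mathrm{H}\mathbf{V} = \mathbf{I}$, and multiplying the economy SVD from the right by $\mathbf{V}$ gives

$$\mathbf{X}\mathbf{V} = \mathbf{U}\mathbf{S}\mathbf{V}^\mathrm{H}\mathbf{V} = \mathbf{U}\mathbf{S} = \mathbf{H},$$

so each column of $\mathbf{H}$ is the projection of all tracks onto one right singular vector. A minimal numerical check on random data (an assumption, not the audio matrix):

```python
import numpy as np
from scipy.linalg import svd

rng = np.random.default_rng(3)
X_demo = rng.standard_normal((300, 4))
U_demo, s_demo, Vh_demo = svd(X_demo, full_matrices=False)

H_via_U = U_demo * s_demo            # scales column i of U by sigma_i, i.e. U @ diag(s)
H_via_X = X_demo @ Vh_demo.conj().T  # projects X onto the right singular vectors, X @ V
print(np.allclose(H_via_U, H_via_X))  # True

# the columns of H are mutually orthogonal with squared norms sigma_i^2
print(np.allclose(H_via_U.T @ H_via_U, np.diag(s_demo**2)))  # True
```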

In the literature on principal component analysis (PCA), the matrix $\mathbf{H}$ is often called
- principal component signals
- principal component **scores**

The matrix of right singular vectors $\mathbf{V}_{N \times r}$ is then referred to as
- principal component coefficients
- principal component **loadings**

So, SVD and PCA are closely linked; in fact, one way to derive the PCA is via the SVD. Note, however, that PCA typically assumes matrices $\mathbf{X}$ whose **columns** are **zero-mean** and very often also have **unit variance**, either by assumption or by enforcing it in preprocessing (e.g. with `scipy.stats.zscore()`, which does both).

Apart from this fundamental difference, the main motivation of the PCA and of our little toy example is about the same: what can the weighted left singular vectors $\mathbf{H}$ that span the column space tell us about our data?
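
A minimal sketch of PCA via the SVD, on random correlated data with non-zero mean (an assumption, not the audio matrix): after standardizing the columns with `scipy.stats.zscore()` (default `ddof=0`), the squared singular values divided by the number of rows $M$ equal the eigenvalues of the correlation matrix, and $\mathbf{U}\mathbf{S}$ and $\mathbf{V}$ act as scores and loadings. Our audio matrix $\mathbf{X}$, in contrast, is used without this standardization.

```python
import numpy as np
from scipy.linalg import svd
from scipy.stats import zscore

rng = np.random.default_rng(4)
A = np.array([[2., 0., 0.], [1., 1., 0.], [0., .5, .2]])  # introduces correlation
X_demo = rng.standard_normal((500, 3)) @ A + 3.  # correlated columns, mean 3
M_demo = X_demo.shape[0]

# PCA preprocessing: zero-mean, unit-variance columns (zscore uses ddof=0)
Xz = zscore(X_demo, axis=0)
U_p, s_p, Vh_p = svd(Xz, full_matrices=False)
scores = U_p * s_p  # principal component scores, H of the standardized data
loadings = Vh_p.T   # principal component loadings, V

# eigenvalues of the correlation matrix (ddof=0 normalization) = sigma_i^2 / M
C = Xz.T @ Xz / M_demo
eigvals = np.linalg.eigvalsh(C)[::-1]  # eigvalsh returns ascending order
print(np.allclose(eigvals, s_p**2 / M_demo))  # True
print(np.allclose(Xz @ loadings, scores))     # True
```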

In [None]:
# weight the left singular (unit) vectors by their corresponding
# singular values
H = U @ np.diag(s)
H.shape

## Plot Column Space Vectors and Render Audio of it
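
The cell below normalizes each signal to `target_lufs` with the gain rule $10^{(L_\mathrm{target} - L)/20}$ and, if the peak exceeds 1, suggests a lower target. A minimal sketch of that arithmetic, using a plain RMS level in dB as a stand-in for the ITU-R BS.1770 loudness that `pyloudnorm` computes (an assumption made only to keep the example free of extra dependencies); `rms_level_db` is a hypothetical helper. For BS.1770, the level shift due to a pure gain is approximately, not necessarily exactly, the gain in dB, because of the loudness gating.

```python
import numpy as np


def rms_level_db(x):
    """RMS level in dB re full scale 1 (hypothetical stand-in, NOT ITU-R BS.1770)."""
    return 10 * np.log10(np.mean(x**2))


rng = np.random.default_rng(5)
x_demo = 0.1 * rng.standard_normal(48000)  # 1 s of noise at 48 kHz
target_db = -3.  # deliberately loud, so that clipping occurs

level = rms_level_db(x_demo)
y_demo = x_demo * 10**((target_db - level) / 20)  # same gain rule as the plotting cell
print(np.isclose(rms_level_db(y_demo), target_db))  # True

# peak overshoot in dB; lowering the target by that amount removes the clipping
peak_db = 20 * np.log10(np.max(np.abs(y_demo)))
print(peak_db > 0)  # True: the normalized noise would clip
safe_target = target_db - max(peak_db, 0.)
y_safe = x_demo * 10**((safe_target - level) / 20)
print(np.max(np.abs(y_safe)) <= 1 + 1e-12)  # True
```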

In [None]:
# here we
# 1. plot the weighted left singular vectors
# 2. create wav files of these signals such that we can listen to them
# (loudness normalization according to ITU-R BS.1770 for convenience)
#
# do not harm your ears!!!
#
# this handling indicates:
# 1. the relative weighting of the signals in terms of their singular values
# can be seen in the plots
# 2. the weighted left singular vector signals in the wav files are scaled
# such that we perceive about the same loudness when listening to all of them

M = H.shape[0]  # number of samples (rows of X)
t = np.arange(M) / fs

nr = 3  # if we have more than 18 tracks we should
nc = 6  # choose other nr, nc to fit into a subplot matrix
fig, axs = plt.subplots(nr, nc, figsize=(16, 10))
for cnt, ax in enumerate(axs.flat):
    # in case we have more subplots than column space signals,
    # hide the unused axes instead of indexing beyond H
    if cnt >= H.shape[1]:
        ax.axis('off')
        continue
    # labels and wav file names start with index 1
    # in order to match the rank 1...r wav files
    ax.plot(t, H[:, cnt])
    ax.set_ylim(-1.5, 1.5)
    ax.set_ylabel(r'$\sigma_{{{0:d}}} \cdot U_{{{0:d}}}$'.format(cnt+1))
    ax.set_xlabel('t / s')
    ax.set_title(r'$\sigma_{{{0:d}}}$={1:3.2f}'.format(cnt+1, s[cnt]))
    ax.grid(True)
    # calc BS.1770 integrated loudness
    lufs = lufs_meter.integrated_loudness(H[:, cnt])
    # adapt to target_lufs
    tmp = H[:, cnt] * 10**((target_lufs - lufs) / 20)
    # get info if clipping occurs
    if np.max(np.abs(tmp)) > 1.:
        print('!!! wav file clipping !!!, decrease target_lufs to:')
        print(target_lufs - 20*np.log10(np.max(np.abs(tmp))))
    # check if we got target_lufs
    lufs = lufs_meter.integrated_loudness(tmp)
    print('left singular vector', cnt+1, lufs, 'LUFS')
    # write wav of column space signals
    # for convenient listening all files exhibit about equal loudness
    wavfile.write(pathu+'left_singular_vector_'+str(cnt+1)+'.wav', fs, tmp)
plt.tight_layout()
plt.savefig(path+'plot_left_singular_vectors.png')

## Apply Equal Mixing Gain 

Let us assume that the multi-track data is already pre-processed (e.g. level-mixed in the DAW), such that we obtain a reasonable mixdown to a mono signal $\mathbf{y}$ when applying equal mixing gains stored in the vector $\mathbf{g} = [g, g, g, ..., g]^\mathrm{T}$.

This linear model is written as

$$\mathbf{y} = \mathbf{X} \mathbf{g}$$

and for simplicity we assume unity gain $g=1$, so we have $\mathbf{g} = [1, 1, 1, ..., 1]^\mathrm{T}$.

With our nice SVD factorization above, we can write

$$\mathbf{y} = \mathbf{X}\mathbf{g} = \mathbf{U} \mathbf{S} \mathbf{V}^\mathrm{H} \mathbf{g}$$

and with the weighted column space matrix $\mathbf{H} = \mathbf{U} \mathbf{S}$ we get

$$\mathbf{y} = \mathbf{X}\mathbf{g} = \mathbf{H} \cdot \mathbf{V}^\mathrm{H} \mathbf{g}$$

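Therefore, the vector $\mathbf{V}^\mathrm{H} \mathbf{g}$ defines the mixing weights for the linear combination of the column space signals; its $i$-th entry is the inner product $\mathbf{v}_i^\mathrm{H} \mathbf{g}$ of the $i$-th right singular vector with the gain vector.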

We should check the polarity and the level of these mixing weights.
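
The identity $\mathbf{X}\mathbf{g} = \mathbf{H}(\mathbf{V}^\mathrm{H}\mathbf{g})$ can be checked numerically. A minimal sketch on random data (an assumption, not the audio matrix). It also illustrates that the SVD does not fix the sign of a pair $\mathbf{u}_i, \mathbf{v}_i$: flipping both leaves $\mathbf{X}$ unchanged but flips the sign of the $i$-th mixing weight, so the polarity of a weight is only meaningful together with the polarity of the corresponding column of $\mathbf{H}$.

```python
import numpy as np
from scipy.linalg import svd

rng = np.random.default_rng(6)
M_demo, N_demo = 400, 5
X_demo = rng.standard_normal((M_demo, N_demo))
g = np.ones((N_demo, 1))  # equal (unity) mixing gains

U_demo, s_demo, Vh_demo = svd(X_demo, full_matrices=False)
H_demo = U_demo * s_demo  # same as U @ diag(s)
w = Vh_demo @ g           # mixing weights, w_i = v_i^H g

y_direct = X_demo @ g  # plain sum of all tracks
y_svd = H_demo @ w     # same mixdown via the weighted left singular vectors
print(np.allclose(y_direct, y_svd))  # True

# V is unitary, so the weights carry the same energy as g: ||w||^2 = ||g||^2 = N
print(np.isclose(np.sum(w**2), N_demo))  # True

# flip the sign of the first pair u_1, v_1: X is unchanged, w_1 changes sign
U_flip, Vh_flip = U_demo.copy(), Vh_demo.copy()
U_flip[:, 0] *= -1
Vh_flip[0, :] *= -1
print(np.allclose((U_flip * s_demo) @ Vh_flip, X_demo))  # True
print(np.isclose((Vh_flip @ g)[0, 0], -w[0, 0]))  # True
```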

In [None]:
# check the polarity and level of the mixing weights
# unity gain as weights in terms of the V space vectors (inner products)
mixing_weight = Vh @ np.ones((s.size, 1))
ui = np.arange(mixing_weight.size) + 1
level = 10*np.log10(np.abs(mixing_weight)**2)  # dB
level_pos = np.copy(level)
level_pos[mixing_weight <= 0] = 0
level_neg = np.copy(level)
level_neg[mixing_weight > 0] = 0

plt.figure(figsize=(6, 3))
plt.plot(ui[np.squeeze(mixing_weight <= 0)],
         level_neg[np.squeeze(mixing_weight <= 0)], 'C0o-', label='polarity -')
plt.plot(ui[np.squeeze(mixing_weight > 0)],
         level_pos[np.squeeze(mixing_weight > 0)], 'C3o-', label='polarity +')

plt.xticks(ui)
plt.legend()
plt.xlabel('U column index')
plt.ylabel('mixing weight in dB')
plt.grid(True)
plt.tight_layout()
plt.savefig(path+'mixing_weights.png')
# print(mixing_weight)
# print(20*np.log10(np.abs(mixing_weight)))

## Low-Rank Approximation of X and Mixdown

The best rank $q \leq r$ approximation of $\mathbf{X}$ (with respect to both the spectral and the Frobenius norm) is given as
$$\mathbf{X}_q = \sum_{i=1}^{q} \sigma_i \mathbf{u}_i \mathbf{v}_i^\mathrm{H},$$
which is the well-known Eckart–Young theorem.
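
The theorem also states the approximation error: $\|\mathbf{X} - \mathbf{X}_q\|_2 = \sigma_{q+1}$ and $\|\mathbf{X} - \mathbf{X}_q\|_\mathrm{F} = \sqrt{\sum_{i=q+1}^{r} \sigma_i^2}$. A minimal numerical check on random data (an assumption, not the audio matrix); note that the 0-based Python index `s_demo[q]` holds $\sigma_{q+1}$.

```python
import numpy as np
from scipy.linalg import svd, norm

rng = np.random.default_rng(7)
X_demo = rng.standard_normal((300, 6))
U_demo, s_demo, Vh_demo = svd(X_demo, full_matrices=False)

q = 2  # target rank
# truncated sum of q rank-1 matrices sigma_i u_i v_i^H
X_q = sum(s_demo[i] * np.outer(U_demo[:, i], Vh_demo[i, :]) for i in range(q))

# spectral norm error equals sigma_{q+1}, i.e. s_demo[q] with 0-based indexing
print(np.isclose(norm(X_demo - X_q, 2), s_demo[q]))  # True
# Frobenius norm error equals the root sum of the discarded sigma_i^2
print(np.isclose(norm(X_demo - X_q, 'fro'), np.sqrt(np.sum(s_demo[q:]**2))))  # True
```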

It is interesting to listen to mixdowns with the same equal mixing gains as above, but now applied to best rank-$q$ approximations with different $q$.

Instead of creating the best rank approximation $\mathbf{X}_q$ straightforwardly with a for loop for the sum above, we could also treat the singular value matrix as
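$$\mathbf{S}^*_{r \times r}=
\begin{bmatrix}
\sigma_1 & 0 & \cdots & 0 & 0 & \cdots & 0\\
0 & \sigma_2 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots\\
0 & 0 & \cdots & \sigma_q & 0 & \cdots & 0\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0\\
\vdots & \vdots & & \vdots & \vdots & \ddots & \vdots\\
0 & 0 & \cdots & 0 & 0 & \cdots & 0
\end{bmatrix},
$$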
thus setting all singular values $\sigma_{>q}=0$, and then re-calculating
$$\mathbf{H}_{M \times r} =  \mathbf{U}_{M \times r} \mathbf{S}^*_{r \times r}$$
and
$$\mathbf{y} = \mathbf{H} \cdot \mathbf{V}^\mathrm{H} \mathbf{g}$$
using the economy SVD as above.
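
A minimal sketch of this equivalence on random data (an assumption, not the audio matrix): zeroing $\sigma_{q+1}, \dots, \sigma_r$ in $\mathbf{S}^*$ yields the same mixdown as applying $\mathbf{g}$ to the truncated sum $\mathbf{X}_q$. In the notebook's loop, the zeroed copy `s_reduced` of the singular values plays the role of $\mathbf{S}^*$.

```python
import numpy as np
from numpy.linalg import matrix_rank
from scipy.linalg import svd

rng = np.random.default_rng(8)
X_demo = rng.standard_normal((300, 6))
g = np.ones((X_demo.shape[1], 1))  # equal (unity) mixing gains
U_demo, s_demo, Vh_demo = svd(X_demo, full_matrices=False)

q = 3
s_star = np.copy(s_demo)
s_star[q:] = 0  # keep sigma_1 ... sigma_q, zero the rest: diagonal of S*
H_star = U_demo * s_star      # H = U S*
y_q = H_star @ (Vh_demo @ g)  # rank-q mixdown y = H V^H g

# same result from the truncated sum X_q, built by slicing instead of zeroing
X_q = U_demo[:, :q] @ np.diag(s_demo[:q]) @ Vh_demo[:q, :]
print(np.allclose(y_q, X_q @ g))  # True
print(matrix_rank(X_q))  # 3
```

Slicing the first $q$ columns of $\mathbf{U}$ and rows of $\mathbf{V}^\mathrm{H}$ avoids multiplications by zero, whereas zeroing singular values keeps all arrays at their original shapes, which is convenient in a loop over $q$.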

In [None]:
# reconstruct X from SVD with rank q = 1...r
# and mixdown to a mono signal
# loudness normalization of mixdown
# unity mixing gains projected onto the right singular vectors (independent of q)
mixing_weight = Vh @ np.ones((s.size, 1))
for q in range(1, s.size + 1):
    s_reduced = np.copy(s)  # get original singular values
    s_reduced[q:] = 0  # keep sigma_1...sigma_q, set the remaining ones to zero
    H = U @ np.diag(s_reduced)
    tmp = H @ mixing_weight  # rank-q mixdown (for q = r this equals X @ g)
    lufs = lufs_meter.integrated_loudness(tmp)  # calc BS.1770 loudness
    tmp *= 10**((target_lufs - lufs)/20)  # adapt to target_lufs
    if np.max(np.abs(tmp)) > 1.:  # clipping might occur
        print('!!! wav file clipping !!!, decrease target_lufs to:')
        print(target_lufs - 20*np.log10(np.max(np.abs(tmp))))
    lufs = lufs_meter.integrated_loudness(tmp)  # check if we got target_lufs
    print('rank', q, 'approx', lufs, 'LUFS')
    # write wav of rank-reduced mixdown; for convenient listening files have equal LUFS
    wavfile.write(patha+'rank_'+str(q)+'.wav', fs, tmp)
# this lets us listen to how the rank-reduced SVD factorization of matrix X,
# acting on the equal mixing gain vector (i.e. all columns of X get unity gain),
# sounds

## Copyright

- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)
- feel free to use the notebooks for your own purposes
- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)
- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year.