Sascha Spors,
Professorship Signal Theory and Digital Signal Processing,
Institute of Communications Engineering (INT),
Faculty of Computer Science and Electrical Engineering (IEF),
University of Rostock,
Germany

# Data Driven Audio Signal Processing - A Tutorial with Computational Examples

Winter Semester 2022/23 (Master Course #24512)

- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture
- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise

Feel free to contact lecturer frank.schultz@uni-rostock.de

# Exercise 8: Ridge Regression 

## Objectives

TBD


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import shutil
from scipy.io import wavfile
from numpy.linalg import cond, matrix_rank
from scipy.linalg import inv, lstsq, pinv, svd, diagsvd, norm
from sklearn.linear_model import LinearRegression, Ridge

In [None]:
# some 'global' variables that should appear here at the top for convenience
flag_channel = 'Mono'  # 'Left', 'Right' or 'Mono'

## Data Preparation

This notebook is most fun if we have multi-track audio to work with, ideally on our own computer. Maybe we even have some multi-track music that we recorded ourselves ;-)

We can get some nice multi-track audio material from [https://www.cambridge-mt.com/ms/mtk/](https://www.cambridge-mt.com/ms/mtk/) .

This database accompanies the 2nd edition of Mike Senior's book "[The Mixing Secrets For The Small Studio](https://cambridge-mt.com/ms/main/)".

The database is provided for educational purposes, and we will use it in precisely this way: for learning linear regression and regularization.

So, if we need some multi-track material, we can download one of the zip files. Below we list four examples that turned out to be very illustrative.

To create a matrix whose columns represent the individual tracks (either the left channel, the right channel, or both channels summed to mono), the code below needs *.wav files of equal length (I was too lazy to program this nicely).
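
If the stems do not have equal length, one simple option is to truncate all of them to the shortest one before stacking them as columns. The following is a minimal sketch under that assumption. The helper name `stack_to_common_length` is hypothetical and not part of this notebook, and synthetic arrays stand in for the wav data.

```python
import numpy as np


def stack_to_common_length(tracks):
    """Stack 1-dim tracks as columns after truncating to the shortest one.

    Hypothetical helper, not part of the original notebook.
    """
    n_min = min(len(t) for t in tracks)  # common length in samples
    return np.stack([np.asarray(t)[:n_min] for t in tracks], axis=1)


# synthetic 'stems' of different lengths as stand-ins for wav data
rng = np.random.default_rng(0)
stems = [rng.standard_normal(n) for n in (1000, 1200, 900)]
X_demo = stack_to_common_length(stems)
print(X_demo.shape)  # (900, 3)
```

Truncation discards the tail of the longer stems. Zero-padding to the longest stem would be the alternative if no audio should be lost.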

A reasonable workflow is to use digital audio workstation (DAW) software such as [Reaper](https://www.reaper.fm/purchase.php) (please support these guys, the non-commercial license is very cheap).

In Reaper we would:
- drag all multi-track files from a certain zip into an empty project
- put each file on a separate track
- choose an appropriate time selection
- do some simple level mixing according to taste
- select all tracks
- open the `render to file` dialog
    - choose Source: `Selected tracks (stems)`
    - choose Bounds: `Time selection`
    - Directory: create a `multi_track` folder as a subfolder of the folder where the raw wav files are stored
    - File name: `multi_track`
    - make sure that the original sampling rate is used
    - Format: `Wav`
    - WAV bit depth: `32 Bit FP` (uses more storage, but then we don't need to pay attention to sample values exceeding ±1 if some mixing was performed)
    - click `Dry Run (no output)`
    - check that this simulated rendering produces the expected channels in the expected number of files with the naming convention `multi_track-xxx.wav`
    - we are ready to render by clicking `Render xxx files`
- maybe it's a good idea to save this Reaper project for further reference

In [None]:
# choose one of the nice multitrack projects:

#path = 'svd_multitrack_audio/cfx_Mathematician_Full/'
# c:fx, 'Mathematician', https://soundcloud.com/c-fx
# multitrack for educational use only:
# https://mtkdata.cambridgemusictechnology.co.uk/MTK005/cfx_Mathematician.zip

#path = 'svd_multitrack_audio/MaurizioPagnuttiSextet_AllTheGinIsGone/'
# Maurizio Pagnutti Sextet, 'All The Gin Is Gone', https://www.artesuono.it/album.aspx?id=76&p=0
# for educational use only:
# https://multitracks.cambridge-mt.com/MaurizioPagnuttiSextet_AllTheGinIsGone.zip

#path = 'svd_multitrack_audio/cryonicPAX_Excessive/'
# cryonicPAX, 'Excessive'
# multitrack for educational use only:
# https://mtkdata.cambridgemusictechnology.co.uk/MTK015/cryonicPAX_Excessive.zip

path = 'svd_multitrack_audio/Fin_Echoes/'
# FIN, 'Echoes'
# multitrack for educational use only:
# https://multitracks.cambridge-mt.com/Fin_Echoes.zip

In [None]:
# read in multitrack wavs and store in a matrix X
# set up path name
pathr = path + 'multi_track/'
files = sorted(os.listdir(pathr))
print(files)
flag_conc = True

for i in files:
    # consider only .wav (this excludes .wav.reapeaks files from Reaper)
    if i[-4:] == '.wav':
        # consider only multi_track-
        if i[0:12] == 'multi_track-':
            # so we read in all stuff with name 'multi_track-xxxxxxxx.wav'
            # if we set up the folder multi_track properly, we only find
            # multi_track-001.wav, multi_track-002.wav, multi_track-003.wav, ...
            fs, tmp = wavfile.read(pathr+i)  # get raw data from file
            # we assume that the files are stereo
            if flag_channel == 'Mono':
                # make mono and make it (xxx, 1) dimension
                x = np.expand_dims((tmp[:, 0] + tmp[:, 1]) / 2, 1)
            elif flag_channel == 'Left':
                # take left channel and make it (xxx, 1) dimension
                x = np.expand_dims(tmp[:, 0], 1)
            elif flag_channel == 'Right':
                # take right channel and make it (xxx, 1) dimension
                x = np.expand_dims(tmp[:, 1], 1)
            else:
                print('!!! unknown flag_channel !!!')
                break
            # if we got here, we can print some log data
            print(i, fs, 'Hz', x.shape, x.dtype)
            # non-elegant way to stack all stems (single channel tracks) into matrix
            if flag_conc:  # very first x into non-existing X
                X, flag_conc = x, False
            else:  # X is already existing and axis=0 is assumed to be const
                # for all concatenated x,
                # so multi_track-001.wav, multi_track-002.wav, multi_track-003.wav, ...
                # must have exactly same length
                X = np.concatenate((X, x), axis=1)
# we can ignore the warning 'Chunk (non-data) not understood, skipping it.'

print('\nmulti-track matrix X', X.shape, X.dtype)

In [None]:
path_tmp = 'ols_ridge_multitrack_audio/'
try:
    shutil.rmtree(path_tmp)
except OSError as e:
    print("Error: %s : %s" % (path_tmp, e.strerror))
os.mkdir(path_tmp)

In [None]:
no_samples = X.shape[0]
no_channels = X.shape[1]
print('no_samples:', no_samples, '\nno_channels:', no_channels)
# set up mixing weights to unity gain for all channels:
mix_weights = np.ones([no_channels, 1])  # this is beta for the standard problem X beta = y

In [None]:
# mix lives in the column space of X:
mix = X @ mix_weights  # this is y for the standard problem X beta = y
# so we expect that least squares solution yields exactly mix_weights
mix_weights_solved, res, rnk, s = lstsq(X, mix)
np.allclose(mix_weights_solved, mix_weights)

In [None]:
# get some very basic audio features
def crest(x):  # only for single dim array
    sq_mean = 1/np.size(x) * np.sum(x**2)
    sq_abs_peak =  np.max(x**2)
    cf = 10 * np.log10(sq_abs_peak / sq_mean)  # crest factor
    std = np.std(x)
    mean = np.mean(x)
    peak = np.max(np.abs(x))
    return sq_mean, sq_abs_peak, cf, std, mean, peak
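
As a quick plausibility check, the crest factor of a full-scale sine wave should be $10\log_{10}(2)\approx 3.01$ dB, since its peak power is 1 and its mean power is 1/2. The sketch below reimplements only the crest factor part under the hypothetical name `crest_factor_db`. The sampling rate and frequency are arbitrary assumptions, chosen so that an integer number of periods fits into the signal.

```python
import numpy as np


def crest_factor_db(x):
    """Ratio of peak power to mean power in dB (hypothetical helper)."""
    return 10 * np.log10(np.max(x**2) / np.mean(x**2))


fs_demo = 48000  # assumed sampling rate in Hz
f_demo = 1000  # assumed sine frequency in Hz, i.e. 48 samples per period
n = np.arange(fs_demo)  # one second, i.e. 1000 full periods
sine = np.sin(2 * np.pi * f_demo / fs_demo * n)
cf_sine = crest_factor_db(sine)
print(f"{cf_sine:4.2f} dB")  # approx. 3.01 dB
```

Music and noise typically have considerably larger crest factors than a sine, so this value is only a reference point for the numbers printed below.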

In [None]:
sq_mean_mix, sq_abs_peak_mix, cf_mix, std_mix, mean_mix, peak_mix = crest(mix)
print('features for mix')
print(f"{'sq_mean_mix:'} {sq_mean_mix:+4.3e}")
print(f"{'sq_abs_peak_mix:'} {sq_abs_peak_mix:+4.3e}")
print(f"{'cf_mix:'} {cf_mix:+4.2f} {'dB'}")
print(f"{'std_mix:'} {std_mix:+4.3e}")
print(f"{'mean_mix:'} {mean_mix:+4.3e}")
print(f"{'peak_mix:'} {peak_mix:+4.3e}")

# create noise with stdev that matches a specific SNR between mix and noise:

# this yields +280 dB SNR
#mean, stdev = 0, std_mix  / 1e14

# mean, stdev = 0, std_mix  / 1e7  # this yields +140 dB SNR
# mean, stdev = 0, std_mix  / 100  # this yields +40 dB SNR
# mean, stdev = 0, std_mix  / 10  # this yields +20 dB SNR
# mean, stdev = 0, std_mix  / 2  # this yields +6 dB SNR
# mean, stdev = 0, std_mix  / 1  # this yields +0 dB SNR

# this yields -6 dB SNR (noise louder than music)
mean, stdev = 0, std_mix  * 2

# this yields -12 dB SNR (noise louder than music)
#mean, stdev = 0, std_mix * 4

rng = np.random.default_rng(1)
noise = rng.normal(mean, stdev, [no_samples, 1])
sq_mean_noise, sq_abs_peak_noise, cf_noise, std_noise, mean_noise, peak_noise = crest(
    noise)
print('\nfeatures for noise')
print(f"{'sq_mean_noise:'} {sq_mean_noise:+4.3e}")
print(f"{'sq_abs_peak_noise:'} {sq_abs_peak_noise:+4.3e}")
print(f"{'cf_noise:'} {cf_noise:+4.2f} {'dB'}")
print(f"{'std_noise:'} {std_noise:+4.3e}")
print(f"{'mean_noise:'} {mean_noise:+4.3e}")
print(f"{'peak_noise:'} {peak_noise:+4.3e}")

SNR = 10*np.log10(sq_mean_mix / sq_mean_noise)
print(f"\n{'SNR'} {SNR:4.2f} {'dB'}")

# standard problem X beta = y
# here: beta-> mix_weights, y-> mixdown

# add noise to the mix that was created above
# mix_with_noise potentially also has a component in the left null space of X
mix_with_noise = np.squeeze(mix) + np.squeeze(noise)

max_peak = np.max(np.abs(np.array([mix_with_noise[:,None], mix])))

wavfile.write(path_tmp+'mix_with_noise_SNR_' +
              str(int(SNR))+'dB.wav', fs, mix_with_noise / max_peak * 10**(-1/20))
wavfile.write(path_tmp+'mix_without_noise_SNR_' +
              str(int(SNR))+'dB.wav', fs, np.squeeze(mix) / max_peak * 10**(-1/20))

# LS solution using lstsq():
mix_weights_solved, res, rnk, s = lstsq(X, mix_with_noise)
# compare as 1-dim arrays to avoid (N,) vs. (N, 1) broadcasting
print('mix_weights_solved == mix_weights:',
      np.allclose(mix_weights_solved, np.squeeze(mix_weights)))

channel_idx = np.arange(no_channels) + 1

plt.figure(figsize=(10, 3))
plt.plot(channel_idx, 20*np.log10(mix_weights),
         'C0o-', label='exact mix weights')
plt.plot(channel_idx, 20*np.log10(mix_weights_solved),
         'C1o:', label='LS mix weights for mix+noise')
plt.xticks(channel_idx)
plt.xlabel('channel')
plt.ylabel('dB')
plt.title(f"{'SNR ='} {SNR:4.2f} {'dB'}")
plt.legend()
plt.grid(True)
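
The noise standard deviations in the comments above follow from the SNR definition: for zero-mean signals, scaling the noise standard deviation relative to that of the signal by $10^{-\mathrm{SNR}/20}$ yields the desired SNR in dB. Below is a minimal sketch with a synthetic signal. `noise_std_for_snr` is a hypothetical helper, and the measured SNR only approximates the target because the noise is random.

```python
import numpy as np


def noise_std_for_snr(signal, snr_db):
    """Noise standard deviation for a target SNR in dB (hypothetical helper).

    Assumes a (nearly) zero-mean signal, so that std**2 approximates
    the mean power.
    """
    return np.std(signal) * 10 ** (-snr_db / 20)


rng = np.random.default_rng(1)
signal = rng.standard_normal(100_000)  # synthetic stand-in for the mix
target_snr_db = -6
noise = rng.normal(0, noise_std_for_snr(signal, target_snr_db), signal.shape)
snr_db = 10 * np.log10(np.mean(signal**2) / np.mean(noise**2))
print(f"target {target_snr_db} dB, measured {snr_db:4.2f} dB")
```

With this helper, a factor of 2 in the standard deviation corresponds to about 6.02 dB, which matches the rounded values used in the comments.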

In [None]:
def my_ridge_regression_via_svd(X, lmb = 0):
    # textbook implementation for a tall/thin, full column rank matrix X
    # we should not use this for real practical applications
    # as this might be numerically non-robust
    [U, s, Vh] = svd(X, full_matrices=False)  # economy SVD
    S = diagsvd(s, X.shape[1], X.shape[1])
    V = Vh.conj().T
    Sli_ridge = inv(S.conj().T @ S + lmb*np.eye(X.shape[1])) @ S.conj().T
    Xli_ridge = V @ Sli_ridge @ U.conj().T  # for lmb=0 this returns the left inverse
    return Xli_ridge
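
The function above can be read as the SVD form of the ridge solution. The following derivation is a sketch, assuming a full column rank $X$ with economy SVD $X = U S V^H$ and $\lambda \geq 0$. The symbols $\hat{\beta}_\lambda$, $u_i$, $v_i$ and $\sigma_i$ are introduced here for illustration. Minimizing $\|X \beta - y\|_2^2 + \lambda \|\beta\|_2^2$ leads to

$$
\hat{\beta}_\lambda = (X^H X + \lambda I)^{-1} X^H y
= V (S^H S + \lambda I)^{-1} S^H U^H y
= \sum_i \frac{\sigma_i}{\sigma_i^2 + \lambda} \, (u_i^H y) \, v_i
$$

where the second step uses $X^H X + \lambda I = V (S^H S + \lambda I) V^H$. For $\lambda = 0$ the factors become $1/\sigma_i$, which is the left inverse. For $\lambda > 0$ directions with small $\sigma_i$ are damped most strongly.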

In [None]:
print(f"\n{'SNR'} {SNR:4.2f} {'dB'}")
ridge_coeff = 0
if ridge_coeff == 0:
    # check sklearn.linear_model functions vs. our own function
    mw1 = my_ridge_regression_via_svd(X, lmb=ridge_coeff) @ mix_with_noise
    print(np.allclose(mw1, mix_weights))  # returns True only for very large SNR, i.e. when y lies (almost) in the column space of X

    model_ridge_regression = Ridge(alpha=ridge_coeff, solver='svd', tol=1e-3)
    model_ridge_regression.fit(X, mix_with_noise)
    mw2 = model_ridge_regression.coef_
    print(np.allclose(mw2, mix_weights))

    model_regression = LinearRegression()
    model_regression.fit(X, mix_with_noise)
    mw3 = model_regression.coef_
    print(np.allclose(mw3, mix_weights))  # returns True only for very large SNR!


In [None]:
ridge_coeff = 10  # 0.5

mw1 = my_ridge_regression_via_svd(X, lmb=ridge_coeff) @ mix_with_noise

# model_ridge_regression = Ridge(alpha=ridge_coeff, solver='svd', tol=1e-3)
# already created above, we just change the ridge coeff
model_ridge_regression.alpha = ridge_coeff
model_ridge_regression.fit(X, mix_with_noise)
mw2 = model_ridge_regression.coef_

print(np.allclose(mw1, mw2))

channel_idx = np.arange(no_channels) + 1

plt.figure(figsize=(10, 10))
plt.subplot(2, 1, 1)
plt.plot(channel_idx, 20*np.log10(mix_weights),
         'C0o-', label='exact mix weights')
plt.plot(channel_idx, 20*np.log10(mix_weights_solved), 'C3o-',
         label='LS mix weights for mix+noise')
plt.plot(channel_idx, 20*np.log10(mix_weights_solved), 'C3o')
plt.plot(channel_idx, 20*np.log10(mw1), 'C1d-', lw=2,
         label='my ridge LS mix weights for mix+noise, coeff='+str(ridge_coeff))
plt.plot(channel_idx, 20*np.log10(mw2), 'C2+-',
         label='ridge LS mix weights for mix+noise, coeff='+str(ridge_coeff))
plt.xticks(channel_idx)
plt.xlabel('channel')
plt.ylabel('dB')
plt.title(f"{'SNR ='} {SNR:4.2f} {'dB'}")
plt.legend()
plt.grid(True)

plt.subplot(2, 1, 2)
plt.plot(channel_idx, mix_weights, 'C0o-', label='exact mix weights')
plt.plot(channel_idx, mix_weights_solved, 'C3o-',
         label='LS mix weights for mix+noise')
plt.plot(channel_idx, mw1, 'C1d-', lw=2,
         label='my ridge LS mix weights for mix+noise, coeff='+str(ridge_coeff))
plt.plot(channel_idx, mw2, 'C2+-',
         label='ridge LS mix weights for mix+noise, coeff='+str(ridge_coeff))
plt.xticks(channel_idx)
plt.xlabel('channel')
plt.ylabel('linear gain = linear mixing weight')
plt.legend()
plt.grid(True)

plt.tight_layout()

print('||mix_weights||_2^2 =', norm(mix_weights, 2)**2)
print('||LS mix_weights||_2^2 =', norm(mix_weights_solved, 2)**2)
print('||ridge LS mix_weights||_2^2 =', norm(mw2, 2)**2)
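
The printed squared norms indicate that ridge regression shrinks the weight vector compared to plain least squares. The sketch below illustrates this with synthetic data (not the multi-track audio) by sweeping the ridge coefficient. `ridge_via_filter_factors` is a hypothetical helper that avoids the explicit matrix inverse by scaling each singular direction with $\sigma_i / (\sigma_i^2 + \lambda)$. The matrix size, noise level and list of coefficients are arbitrary assumptions.

```python
import numpy as np


def ridge_via_filter_factors(X, y, lmb):
    """Ridge solution from the economy SVD without an explicit inverse.

    Hypothetical helper: scales each singular direction by
    sigma / (sigma**2 + lmb).
    """
    U, s, Vh = np.linalg.svd(X, full_matrices=False)
    return Vh.conj().T @ ((s / (s**2 + lmb)) * (U.conj().T @ y))


rng = np.random.default_rng(2)
X_demo = rng.standard_normal((2000, 8))  # synthetic tall/thin matrix
beta_true = np.ones(8)  # unity gain weights
y_demo = X_demo @ beta_true + 2 * rng.standard_normal(2000)  # noisy 'mix'

lambdas = [0, 1, 10, 100, 1000, 10000]
sq_norms = [np.sum(ridge_via_filter_factors(X_demo, y_demo, lmb) ** 2)
            for lmb in lambdas]
for lmb, sq_norm in zip(lambdas, sq_norms):
    print(f"lambda = {lmb:5d}, ||beta||_2^2 = {sq_norm:6.4f}")
```

The squared norm cannot increase with a growing coefficient, because every factor $\sigma_i / (\sigma_i^2 + \lambda)$ decreases. How much shrinkage is useful depends on the SNR and on the conditioning of the matrix.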

## Copyright

- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)
- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)
- feel free to use the notebooks for your own purposes
- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year.