Sascha Spors,
Professorship Signal Theory and Digital Signal Processing,
Institute of Communications Engineering (INT),
Faculty of Computer Science and Electrical Engineering (IEF),
University of Rostock,
Germany

# Data Driven Audio Signal Processing - A Tutorial with Computational Examples

Winter Semester 2021/22 (Master Course #24512)

- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture
- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise

Feel free to contact lecturer frank.schultz@uni-rostock.de

# Exercise 8: Ridge Regression 

## Objectives


In [None]:
import matplotlib.pyplot as plt
import numpy as np
import os
import shutil
from scipy.io import wavfile
from numpy.linalg import cond, matrix_rank
from scipy.linalg import inv, lstsq, pinv, svd, diagsvd, norm
from sklearn.linear_model import LinearRegression, Ridge

# some 'global' variables that should appear here at the top for convenience
flag_channel = 'Mono'  # 'Left', 'Right' or 'Mono'

### Data Preparation

The notebook is most fun if we have multitrack audio to work with, ideally on our own computer, and even better if it is music that we recorded ourselves ;-)

We can get some nice multitrack audio material from [https://www.cambridge-mt.com/ms/mtk/](https://www.cambridge-mt.com/ms/mtk/).

This data base supports Mike Senior's 2nd edition of the book "[The Mixing Secrets For The Small Studio](https://cambridge-mt.com/ms/main/)".

The data base is provided for educational purposes and we will use it precisely this way... maybe with a slightly different mixing approach than initially intended, but we will be doing a mixing job here... using the SVD.

So, if we need some multitrack material, we can download one of the zip files. Below we use 4 examples that worked out very illustratively.

To create a matrix where the columns represent the multitrack files (either the left or right channel or summed to mono) we need equal length wav files to work with the code below (I was too lazy to program this nicely).
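
If the stems do not have equal length, one simple workaround is to trim all of them to the shortest one before stacking. The following is only a minimal sketch of that idea, not part of the notebook's workflow: the helper name `stack_equal_length` and the toy arrays are hypothetical, and trimming discards audio at the end of the longer files.

```python
import numpy as np


def stack_equal_length(tracks):
    """Trim 1-D arrays to the shortest length and stack them as columns.

    Hypothetical helper for illustration; the notebook itself assumes
    that all wav files already have equal length.
    """
    n = min(len(t) for t in tracks)
    return np.column_stack([np.asarray(t)[:n] for t in tracks])


# toy 'stems' with 5, 7 and 6 samples
tracks = [np.arange(5.0), 2 * np.arange(7.0), 3 * np.arange(6.0)]
A_demo = stack_equal_length(tracks)
print(A_demo.shape)  # rows: samples, columns: stems
```

Zero padding to the longest stem would be an alternative if no audio should be lost.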

A reasonable workflow is to use a digital audio workstation (DAW) software, such as [Reaper](https://www.reaper.fm/purchase.php) (please support these guys, the non-commercial license is very cheap).

In Reaper we would do:
- grab all multitrack files from a certain zip into an empty project
- make them separate tracks
- choose an appropriate time selection
- do some simple level mixing according to taste
- select all tracks
- open the `render to file` dialog
    - choose Source: `Selected tracks (stems)`
    - choose Bounds: `Time selection`
    - Directory: create a `multi_track` folder as a subfolder of the folder where the raw wav files are stored
    - File name: `multi_track`
    - make sure that original sampling rate is used
    - Format: `Wav`
    - WAV bit depth: `32 Bit FP` (uses more storage, but then we don't need to pay attention to numbers larger/smaller than 1 if some mixing was performed)
    - check `Dry Run (no output)`
    - verify that this simulated rendering puts the expected channels into the expected number of files with the naming convention `multi_track-xxx.wav`
    - if so, we are ready to render using `Render xxx files`
- maybe it's a good idea to save this project for further reference, such as checking what different mixes do to the SVD data...
 

In [None]:
# choose one of the nice multitrack projects:

#path = 'audio_ex05/cfx_Mathematician_Full/'
# c:fx, 'Mathematician', https://soundcloud.com/c-fx
# multitrack for educational use only:
# https://mtkdata.cambridgemusictechnology.co.uk/MTK005/cfx_Mathematician.zip

#path = 'audio_ex05/MaurizioPagnuttiSextet_AllTheGinIsGone/'
# Maurizio Pagnutti Sextet, 'All The Gin Is Gone', https://www.artesuono.it/album.aspx?id=76&p=0
# for educational use only:
# https://multitracks.cambridge-mt.com/MaurizioPagnuttiSextet_AllTheGinIsGone.zip

path = 'audio_ex05/cryonicPAX_Excessive/'
# cryonicPAX, 'Excessive'
# multitrack for educational use only:
# https://mtkdata.cambridgemusictechnology.co.uk/MTK015/cryonicPAX_Excessive.zip

#path = 'audio_ex05/Fin_Echoes/'
# FIN, 'Echoes'
# multitrack for educational use only:
# https://multitracks.cambridge-mt.com/Fin_Echoes.zip

In [None]:
# set up path name
pathr = path + 'multi_track/'
#  read in multitrack wavs and store in a matrix A
files = sorted(os.listdir(pathr))
#print(files)
flag_conc = True

for i in files:
    if i[-4:] == '.wav':  # consider only multi_track-xxx.wav
        if i[0:12] == 'multi_track-':
            fs, tmp = wavfile.read(pathr+i)
            if flag_channel == 'Mono':  # we assume that the files are stereo
                # make mono and (xxx, 1) dimension
                x = np.expand_dims((tmp[:, 0] + tmp[:, 1]) / 2, 1)
            elif flag_channel == 'Left':
                # take left channel and (xxx, 1) dimension
                x = np.expand_dims(tmp[:, 0], 1)
            elif flag_channel == 'Right':
                # take right channel and (xxx, 1) dimension
                x = np.expand_dims(tmp[:, 1], 1)
            else:
                print('!!! check flag_channel !!!')
                break
            print(i, x.shape, x.dtype)
            # non-elegant way to stack all stems (single channel tracks) into matrix
            if flag_conc:
                A, flag_conc = x, False
            else:
                A = np.concatenate((A, x), axis=1)

print('\nmulti track matrix A', A.shape, A.dtype)

In [None]:
path_tmp = 'audio_ex08/'
# remove the folder from a previous run (if any) and create it afresh
shutil.rmtree(path_tmp, ignore_errors=True)
os.makedirs(path_tmp, exist_ok=True)

In [None]:
no_samples = A.shape[0]
no_channels = A.shape[1]
print('no_samples:', no_samples, '\nno_channels:', no_channels)
# set up mixing weights to unity gain for all channels:
mix_weights = np.ones([no_channels, 1])  # this is x for the standard problem A x = b

In [None]:
# mix is in column space:
mix = A @ mix_weights  # this is b for the standard problem A x = b 
# so we expect that LS solution yields exactly mix_weights
mix_weights_solved, res, rnk, s = lstsq(A, mix)
np.allclose(mix_weights_solved, mix_weights)
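
The check above relies on `A` having full column rank: then `b = A x` lies in the column space and the least squares solution is unique and equal to `x`. A minimal sketch with synthetic data instead of audio illustrates this. The matrix size and the names `A_demo`, `x_true`, `b_demo` are assumptions for this example, and `np.linalg.lstsq` is used here in place of `scipy.linalg.lstsq` only to keep the block dependent on NumPy alone.

```python
import numpy as np

rng = np.random.default_rng(0)
# tall/thin random matrix, full column rank with probability one
A_demo = rng.standard_normal((1000, 4))
x_true = np.ones((4, 1))   # unity gain 'mix weights'
b_demo = A_demo @ x_true   # right-hand side lies in the column space

x_ls, res, rnk, s = np.linalg.lstsq(A_demo, b_demo, rcond=None)
print('rank:', rnk)
print('weights recovered:', np.allclose(x_ls, x_true))
```

With real stems the same holds as long as no stem is a linear combination of the others.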

In [None]:
# get some very basic audio features
def crest(x):  # only for single dim array
    sq_mean = 1/np.size(x) * np.sum(x**2)
    sq_abs_peak =  np.max(x**2)
    cf = 10 * np.log10(sq_abs_peak / sq_mean)  # crest factor
    std = np.std(x)
    mean = np.mean(x)
    peak = np.max(np.abs(x))
    return sq_mean, sq_abs_peak, cf, std, mean, peak
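
To get a feeling for the crest factor, it helps to evaluate it for signals with known values. The sketch below uses a stand-alone helper `crest_factor_db` (a hypothetical name that only reproduces the `cf` computation from `crest()`) on a sine and on a square wave. The sampling rate and frequency are arbitrary choices such that an integer number of periods fits into the signal.

```python
import numpy as np


def crest_factor_db(x):
    """Crest factor in dB: squared peak over mean square."""
    x = np.asarray(x, dtype=float)
    return 10 * np.log10(np.max(x**2) / np.mean(x**2))


fs_demo = 48000
n = np.arange(fs_demo)  # one second of samples
sine = np.sin(2 * np.pi * 1000 * n / fs_demo)  # 1 kHz, 48 samples per period
square = np.where(n % 48 < 24, 1.0, -1.0)      # same period, values +-1

cf_sine = crest_factor_db(sine)
cf_square = crest_factor_db(square)
print(f'sine:   {cf_sine:.2f} dB')
print(f'square: {cf_square:.2f} dB')
```

The sine has a peak of 1 and a mean square of 1/2, hence 10 log10(2), roughly 3 dB. The square wave has equal peak and mean square, hence 0 dB. Music typically exhibits considerably larger crest factors.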

In [None]:
sq_mean_mix, sq_abs_peak_mix, cf_mix, std_mix, mean_mix, peak_mix = crest(mix)
print('features for mix')
print(f"{'sq_mean_mix:'} {sq_mean_mix:+4.3e}")
print(f"{'sq_abs_peak_mix:'} {sq_abs_peak_mix:+4.3e}")
print(f"{'cf_mix:'} {cf_mix:+4.2f} {'dB'}")
print(f"{'std_mix:'} {std_mix:+4.3e}")
print(f"{'mean_mix:'} {mean_mix:+4.3e}")
print(f"{'peak_mix:'} {peak_mix:+4.3e}")

# create noise with stdev that matches a specific SNR between mix and noise:
#mean, stdev = 0, std_mix  / 1e14  # this yields +280 dB SNR
#mean, stdev = 0, std_mix  / 1e7  # this yields +140 dB SNR
#mean, stdev = 0, std_mix  / 100  # this yields +40 dB SNR
#mean, stdev = 0, std_mix  / 10  # this yields +20 dB SNR
#mean, stdev = 0, std_mix  / 2  # this yields +6 dB SNR
mean, stdev = 0, std_mix  / 1  # this yields +0 dB SNR
#mean, stdev = 0, std_mix  * 2  # this yields -6 dB SNR (noise louder than music)
#mean, stdev = 0, std_mix  * 4  # this yields -12 dB SNR (noise louder than music)

rng = np.random.default_rng(1)
noise = rng.normal(mean, stdev, [no_samples, 1])
sq_mean_noise, sq_abs_peak_noise, cf_noise, std_noise, mean_noise, peak_noise = crest(noise)
print('\nfeatures for noise')
print(f"{'sq_mean_noise:'} {sq_mean_noise:+4.3e}")
print(f"{'sq_abs_peak_noise:'} {sq_abs_peak_noise:+4.3e}")
print(f"{'cf_noise:'} {cf_noise:+4.2f} {'dB'}")
print(f"{'std_noise:'} {std_noise:+4.3e}")
print(f"{'mean_noise:'} {mean_noise:+4.3e}")
print(f"{'peak_noise:'} {peak_noise:+4.3e}")

SNR = 10*np.log10(sq_mean_mix / sq_mean_noise)
print(f"\n{'SNR'} {SNR:4.2f} {'dB'}")

# standard problem Ax = b
# here: x-> mix_weights, b-> mixdown

# add noise to the mix that was created above
# mix_with_noise is thus not necessarily in pure column space of A
mix_with_noise = np.squeeze(mix) + np.squeeze(noise)
wavfile.write(path_tmp+'mix_with_noise_SNR_'+str(int(SNR))+'dB.wav', fs, mix_with_noise)

# LS solution using lstsq():
mix_weights_solved, res, rnk, s = lstsq(A, mix_with_noise)
print('mix_weights_solved == mix_weights:',
      np.allclose(mix_weights_solved, mix_weights))

plt.figure(figsize=(10,6))
plt.stem(20*np.log10(mix_weights), basefmt='C0:', linefmt='C0', markerfmt='C0o',
        label='exact mix weights')
plt.stem(20*np.log10(mix_weights_solved), basefmt='C3:', linefmt='C3', markerfmt='C3o',
         label='LS mix weights for mix+noise')
plt.xticks(np.arange(no_channels))
plt.xlabel('channel')
plt.ylabel('dB')
plt.title(f"{'SNR ='} {SNR:4.2f} {'dB'}")
plt.legend()
plt.grid(True)
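
The SNR values given in the comments of the cell above follow from the ratio of standard deviations: dividing the noise standard deviation by a factor d yields 20 log10(d) dB, provided that both signals are (almost) zero-mean, because only then mean square and variance coincide. The following sketch checks this with white noise as a stand-in for the mix. All names (`snr_db`, `signal`, `measured`) are assumptions made for this example.

```python
import numpy as np


def snr_db(signal, noise):
    """SNR in dB computed from the mean square values."""
    return 10 * np.log10(np.mean(signal**2) / np.mean(noise**2))


rng = np.random.default_rng(1)
signal = rng.normal(0.0, 1.0, 100_000)  # zero-mean stand-in for the mix

measured = {}
for divisor in (1, 2, 10, 100):
    noise = rng.normal(0.0, np.std(signal) / divisor, signal.size)
    measured[divisor] = snr_db(signal, noise)
    print(f'stdev / {divisor:3d}: measured {measured[divisor]:6.2f} dB, '
          f'expected {20 * np.log10(divisor):6.2f} dB')
```

The measured values deviate slightly from the expected ones, since the noise is a finite random realization.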

In [None]:
def my_ridge_regression_via_svd(A, lmb = 0):
    # textbook implementation for tall/thin, full col rank A
    # we should not use this for practical applications!
    [U, s, Vh] = svd(A, full_matrices=False)
    S = diagsvd(s, A.shape[1], A.shape[1])
    V = Vh.conj().T
    Sli_ridge = inv(S.conj().T @ S + lmb*np.eye(A.shape[1])) @ S.conj().T
    Ali_ridge = V @ Sli_ridge @ U.conj().T  # for lmb=0 this returns the left inverse
    return Ali_ridge
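
The textbook version above builds and inverts the matrix S^H S + lmb I explicitly. Since this matrix is diagonal, the inverse reduces to the element-wise factors s_i / (s_i^2 + lmb) applied to the projections of b onto the left singular vectors. The following is a minimal sketch of this equivalent form, checked against the normal equations (A^H A + lmb I) x = A^H b. The function name `ridge_via_filter_factors` and the random test data are assumptions for this example.

```python
import numpy as np


def ridge_via_filter_factors(A, b, lmb=0.0):
    """Ridge solution via the economy SVD without an explicit inverse.

    Assumes a tall/thin A; for lmb=0 A must have full column rank.
    """
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    d = s / (s**2 + lmb)  # regularized inverse singular values
    return Vh.conj().T @ (d * (U.conj().T @ b))


rng = np.random.default_rng(0)
A_demo = rng.standard_normal((200, 5))
b_demo = rng.standard_normal(200)
lmb = 10.0

x_svd = ridge_via_filter_factors(A_demo, b_demo, lmb)
# reference: solve the regularized normal equations directly
x_ne = np.linalg.solve(A_demo.T @ A_demo + lmb * np.eye(5), A_demo.T @ b_demo)
print(np.allclose(x_svd, x_ne))
```

For lmb=0 the factors become 1/s_i, i.e. the left inverse, which is why small singular values amplify noise and why lmb>0 tames them.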

In [None]:
print(f"\n{'SNR'} {SNR:4.2f} {'dB'}")
ridge_coeff = 0
if ridge_coeff == 0:
    # check sklearn.linear_model functions vs. our own function
    mw1 = my_ridge_regression_via_svd(A, lmb=ridge_coeff) @ mix_with_noise
    print(np.allclose(mw1, mix_weights))  # returns True only for very large SNR!, i.e. b is (almost) in the column space of A

    model_ridge_regression = Ridge(alpha=ridge_coeff, solver='svd', tol=1e-3)
    model_ridge_regression.fit(A, mix_with_noise)
    mw2 = model_ridge_regression.coef_
    print(np.allclose(mw2, mix_weights))

    model_regression = LinearRegression()
    model_regression.fit(A, mix_with_noise)
    mw3 = model_regression.coef_
    print(np.allclose(mw3, mix_weights))  # returns True only for very large SNR!


In [None]:
ridge_coeff = 10 #0.5

mw1 = my_ridge_regression_via_svd(A, lmb=ridge_coeff) @ mix_with_noise

# model_ridge_regression = Ridge(alpha=ridge_coeff, solver='svd', tol=1e-3)
# already created above, we just change the ridge coeff
model_ridge_regression.alpha = ridge_coeff
model_ridge_regression.fit(A, mix_with_noise)
mw2 = model_ridge_regression.coef_

print(np.allclose(mw1, mw2))

plt.figure(figsize=(10,10))
plt.subplot(2,1,1)
plt.stem(20*np.log10(mix_weights), basefmt='C0:', linefmt='C0', markerfmt='C0o',
         label='exact mix weights')
plt.stem(20*np.log10(mix_weights_solved), basefmt='C3:', linefmt='C3', markerfmt='C3o',
         label='LS mix weights for mix+noise')
plt.plot(20*np.log10(mix_weights_solved), 'C3o', ms=10)
plt.stem(20*np.log10(mw1), basefmt='C1:', linefmt='C1', markerfmt='C1d',
         label='my ridge LS mix weights for mix+noise, coeff='+str(ridge_coeff))
plt.stem(20*np.log10(mw2), basefmt='C2:', linefmt='C2:', markerfmt='C2+',
         label='ridge LS mix weights for mix+noise, coeff='+str(ridge_coeff))
plt.xticks(np.arange(no_channels))
plt.xlabel('channel')
plt.ylabel('dB')
plt.title('SNR = '+str(SNR)+' dB')
plt.legend()
plt.grid(True)

plt.subplot(2,1,2)
plt.plot(mix_weights, 'C0',
         label='exact mix weights')
plt.plot(mix_weights_solved, 'C3',
         label='LS mix weights for mix+noise')
plt.plot(mw1, 'C1', lw=3,
         label='my ridge LS mix weights for mix+noise, coeff='+str(ridge_coeff))
plt.plot(mw2, 'C2',
         label='ridge LS mix weights for mix+noise, coeff='+str(ridge_coeff))
plt.xticks(np.arange(no_channels))
plt.xlabel('channel')
plt.ylabel('linear gain')
plt.legend()
plt.grid(True)

print('||mix_weights||_2^2 =', norm(mix_weights,2)**2)
print('||LS mix_weights||_2^2 =', norm(mix_weights_solved,2)**2)
print('||ridge LS mix_weights||_2^2 =', norm(mw2,2)**2)
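
The squared norms printed above reflect a general property of ridge regression: the larger the ridge coefficient, the smaller the squared 2-norm of the solution. A minimal sketch with synthetic data (unit weights plus noise, similar in spirit to the mixing example, but all names and sizes are assumptions) makes this trend visible for several coefficients.

```python
import numpy as np


def ridge_solution(A, b, lmb):
    """Ridge solution from the regularized normal equations (real-valued A)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lmb * np.eye(n), A.T @ b)


rng = np.random.default_rng(0)
A_demo = rng.standard_normal((500, 6))
# unity gain weights plus additive noise on the right-hand side
b_demo = A_demo @ np.ones(6) + rng.standard_normal(500)

lambdas = [0.0, 1.0, 10.0, 100.0, 1000.0]
sq_norms = [np.sum(ridge_solution(A_demo, b_demo, lmb)**2) for lmb in lambdas]
for lmb, sq_norm in zip(lambdas, sq_norms):
    print(f'lambda = {lmb:7.1f}: ||x||_2^2 = {sq_norm:.4f}')
```

The price for the smaller norm is a bias: for large coefficients all weights are pulled towards zero, away from the exact unity gains.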

## Copyright

- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)
- feel free to use the notebooks for your own purposes
- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)
- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year.