Sascha Spors,
Professorship Signal Theory and Digital Signal Processing,
Institute of Communications Engineering (INT),
Faculty of Computer Science and Electrical Engineering (IEF),
University of Rostock,
Germany

# Data Driven Audio Signal Processing - A Tutorial with Computational Examples

Winter Semester 2022/23 (Master Course #24512)

- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture
- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise

Feel free to contact lecturer frank.schultz@uni-rostock.de

# Trade-Off Between Bias^2 / Variance and Model Complexity for Linear Regression

- we use plain ordinary **least squares** (OLS) based **linear regression**
- we assess **over**-/**underfitting** via bias$^2$/variance of models that are trained on noisy data and predict on the same inputs (note: **train data = test data** here)
- for this toy example we know the real world (noise-free) data, because we know the linear model equation that generates them, so we can interpret the models' performance with some confidence
- in practice the true model is unknown, so we should be cautious about model complexity and over-/underfitting
- a robust model has a good **trade-off between bias$^2$ and variance** (this notebook) and predicts reasonable outcomes for unseen input data (covered in another notebook)

Useful chapters in textbooks:
- [Bishop 2006] Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006, Chapter 3.2
- Kevin P. Murphy, Machine Learning: A Probabilistic Perspective, MIT Press, 2012, 1st ed., Chapter 6.4.4
- Kevin P. Murphy, Probabilistic Machine Learning: An Introduction, MIT Press, 2022, Chapter 4.7.6.3
- Gareth James, Daniela Witten, Trevor Hastie, Rob Tibshirani, An Introduction to Statistical Learning with Applications in R, Springer, 2nd ed., 2021, Chapter 2.2.2
- Trevor Hastie, Robert Tibshirani, Jerome Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Springer, 2nd ed., 2009, Chapter 2.9
- Richard O. Duda, Peter E. Hart, David G. Stork, Pattern Classification, Wiley, 2000, 2nd ed., Chapter 9.3
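
For reference, the quantities that this notebook estimates numerically can be written as follows, paraphrasing (3.45)-(3.47) in [Bishop 2006]. The notation is an assumption made here for readability: $h(x_n)$ denotes the noise-free real world data, $\hat{y}^{(l)}(x_n)$ the prediction of the model trained on the $l$-th of $L$ noisy data sets, and $\bar{y}(x_n)$ the average prediction over these $L$ models, each evaluated at the $N$ sample points.

$$\bar{y}(x_n) = \frac{1}{L}\sum_{l=1}^{L} \hat{y}^{(l)}(x_n)$$

$$\text{bias}^2 = \frac{1}{N}\sum_{n=1}^{N} \left(\bar{y}(x_n) - h(x_n)\right)^2$$

$$\text{variance} = \frac{1}{N}\sum_{n=1}^{N} \frac{1}{L}\sum_{l=1}^{L} \left(\hat{y}^{(l)}(x_n) - \bar{y}(x_n)\right)^2$$

In the code these correspond to `ym`, `bias_squared` and `variance`, with `y` playing the role of $h$ and the columns of `Yhat` holding the $L$ predictions.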

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from statsmodels.api import OLS

## Real World Model and Real World Data

In [None]:
# number of observations / samples:
N = 2**8
# real world model with x as input variable to create 4 features:
x = np.linspace(0, 2*np.pi, N)
X = np.column_stack((np.cos(x),
                     np.sin(2*x),
                     np.cos(5*x),
                     np.cos(6*x)))
# add a bias/intercept column to the design/feature matrix:
X = np.hstack((np.ones((X.shape[0], 1)), X))
hasconst = True
# some nice numbers for the true model parameters beta:
beta = np.array([3, 2, 1, 1/2, 1/4])

In [None]:
# generate 'real world' data with the design matrix of 'real world' model
y = np.dot(X, beta)
plt.figure(figsize=(5, 3))
plt.plot(y, 'k-')
plt.xlabel("independent features' input variable x")
plt.ylabel('dependent variable y')
plt.title('real world data as linear model (x -> 4 features + intercept)')
plt.xlim(0, N)
plt.ylim(-2, 8)
plt.grid(True)
print(X.shape, y.shape)
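
As a quick sanity check, the following sketch fits the true design matrix to the noise-free data and compares the estimated parameters with `beta`. It uses `np.linalg.lstsq` rather than the statsmodels `OLS` class used elsewhere in this notebook (a substitution made here to keep the example dependent on NumPy only), and the names `beta_hat` and `rank` are introduced just for this illustration.

```python
import numpy as np

# same setup as in the cells above
N = 2**8
x = np.linspace(0, 2*np.pi, N)
X = np.column_stack((np.ones(N), np.cos(x), np.sin(2*x),
                     np.cos(5*x), np.cos(6*x)))
beta = np.array([3, 2, 1, 1/2, 1/4])
y = X @ beta  # noise-free 'real world' data

# least-squares solution via NumPy (instead of statsmodels' OLS)
beta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print('rank of X:', rank)
print('estimated beta:', np.round(beta_hat, 6))
print('max abs error:', np.max(np.abs(beta_hat - beta)))
```

Without noise and with the correct features, the least-squares fit should recover the true parameters up to numerical precision.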

## Function for Train / Predict and Calc Bias^2 / Variance 

In [None]:
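def bias_variance_of_model(X, noise_scale=0.5):
    # note: uses the globals y (noise-free data), N (number of samples)
    # and axs (pair of matplotlib axes that must exist before the call)
    # add bias column to the design matrix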
    X = np.hstack((np.ones((X.shape[0], 1)), X))
    hasconst = True
    print('\nshape of model/feature matrix X:',
          X.shape,
          '\nrank of matrix X / # of model parameters:',
          np.linalg.matrix_rank(X))
    # init random number generator to reproduce results
    rng = np.random.default_rng(1234)
    # generate L data sets with added noise
    L = 2**7
    noise = rng.normal(loc=0, scale=noise_scale, size=(N, L))
    Yn = y[:, None] + noise
    # alloc memory for all predictions
    Yhat = np.zeros((N, L))
    # train and predict L models on these L data sets
    for i in range(L):
        model = OLS(Yn[:, i], X, hasconst=hasconst)  # OLS model
        results = model.fit()  # train the model
        Yhat[:, i] = results.predict(X)  # predict outcome
    
    # get average prediction, i.e. mean over the L models
    # which is a numerical eval of the expectation:
    ym = np.mean(Yhat, axis=1)  # (3.45) in [Bishop 2006]
    
    # get integrated squared bias (numerical eval of the expectation):
    # note: y is the real world data
    bias_squared = np.mean((ym - y)**2)  # (3.42), (3.46) in [Bishop 2006]
    
    # get integrated variance (numerical eval of the expectation):
    variance = np.mean(
        np.mean((Yhat - ym[:, None])**2, axis=1),
        axis=0)  # (3.43), (3.47) in [Bishop 2006]

    for i in range(L):
        axs[0].plot(Yn[:, i])
        axs[1].plot(Yhat[:, i])

    axs[1].plot(y, 'k-', label='true model')
    
    # raw strings avoid invalid escape sequences such as '\m' and '\h'
    axs[1].plot(ym, ':', color='gold', label=r'$\mu(\hat{Y})$')
    # one standard deviation around the average prediction
    axs[1].plot(ym + np.std(Yhat, axis=1), '--', lw=0.75,
                color='gold', label=r'$\mu(\hat{Y}) + \sigma(\hat{Y})$')
    axs[1].plot(ym - np.std(Yhat, axis=1), '-.', lw=0.75,
                color='gold', label=r'$\mu(\hat{Y}) - \sigma(\hat{Y})$')
    
    
    axs[1].set_title(r'bias$^2$='+'{:4.3f}'.format(
        bias_squared)+', var='+'{:4.3f}'.format(
        variance)+r', bias$^2$+var='+'{:4.3f}'.format(
        bias_squared+variance))
    for i in range(2):
        axs[i].set_xlim(0, N)
        axs[i].set_ylim(-2, 8)
        axs[i].grid(True)
        axs[i].set_xlabel("independent features' input variable x")
    axs[0].set_ylabel('dependent variable yn = outcome')
    axs[1].set_ylabel('predicted variable yhat')
    axs[1].legend()
    print('bias^2 + variance  = ', bias_squared+variance)
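
The numerical core of the function above can also be sketched without plotting and without global variables. The helper `bias_variance_numpy` below is a hypothetical name introduced for this illustration; it assumes that `X` already contains the intercept column and it replaces the loop over statsmodels `OLS` fits by a single `np.linalg.lstsq` call that solves all `L` least-squares problems at once.

```python
import numpy as np


def bias_variance_numpy(X, y, noise_scale=0.5, L=2**7, seed=1234):
    """Return (bias^2, variance) for OLS fits on L noisy copies of y.

    X is expected to already contain the intercept column.
    """
    rng = np.random.default_rng(seed)
    N = y.shape[0]
    # L noisy data sets, one per column
    Yn = y[:, None] + rng.normal(loc=0, scale=noise_scale, size=(N, L))
    # solve all L least-squares problems at once (one column per data set)
    B, *_ = np.linalg.lstsq(X, Yn, rcond=None)
    Yhat = X @ B  # predictions on the training inputs
    ym = np.mean(Yhat, axis=1)  # average prediction over the L models
    bias_squared = np.mean((ym - y)**2)
    variance = np.mean((Yhat - ym[:, None])**2)
    return bias_squared, variance


# usage with the true model of this notebook
N = 2**8
x = np.linspace(0, 2*np.pi, N)
X_true = np.column_stack((np.ones(N), np.cos(x), np.sin(2*x),
                          np.cos(5*x), np.cos(6*x)))
y = X_true @ np.array([3, 2, 1, 1/2, 1/4])
b2, var = bias_variance_numpy(X_true, y)
print('bias^2 =', b2, ', variance =', var)
# for OLS evaluated on the training inputs, the variance is expected to be
# close to noise_scale**2 * (number of parameters) / N
print('noise_scale^2 * p / N =', 0.5**2 * X_true.shape[1] / N)
```

The last line prints a reference value: for an OLS fit evaluated on its own training inputs, the prediction variance averaged over the samples equals the noise variance times the number of independent model parameters divided by $N$, so the estimated variance should land close to it.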

## Check Models

In [None]:
X = np.copy(x)[:, None]
fig, axs = plt.subplots(1, 2, figsize=(10, 3))
bias_variance_of_model(X)
axs[0].set_title('underfit, too low model complexity, high bias, low var');

In [None]:
X = np.column_stack((np.cos(x), np.sin(x)))
# m < N//2 keeps the number of model parameters (N-1 including intercept)
# below the number of samples N, so that this remains a least-squares
# problem, i.e. one that is solved with the left-inverse
for m in range(2, N//2):
    X = np.column_stack((X, np.sin(m*x), np.cos(m*x)))
fig, axs = plt.subplots(1, 2, figsize=(10, 3))
bias_variance_of_model(X)
axs[0].set_title('overfit, too high model complexity, low bias, high var');
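
The parameter count of this overfitting model can be checked with a few lines. The sketch below rebuilds the design matrix including the intercept column (the column order differs from the cell above, which does not change the column space) and reports its shape and rank. The name `X_over` is introduced only for this illustration.

```python
import numpy as np

N = 2**8
x = np.linspace(0, 2*np.pi, N)
# intercept plus cos/sin pairs for m = 1 ... N//2 - 1
cols = [np.ones(N)]
for m in range(1, N//2):
    cols += [np.cos(m*x), np.sin(m*x)]
X_over = np.column_stack(cols)
rank = np.linalg.matrix_rank(X_over)
print('shape:', X_over.shape, 'rank:', rank)
```

With almost as many independent parameters as samples, the model can follow nearly all of the noise in each data set, which is what drives the variance up.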

In [None]:
X = np.column_stack((np.cos(x),
                     np.sin(2*x),
                     np.cos(5*x),
                     np.cos(6*x)))
fig, axs = plt.subplots(1, 2, figsize=(10, 3))
bias_variance_of_model(X)  # lowest possible bias^2+variance, because we
# know the true model (which in practice never happens!)
# the remaining variance is from the added noise
axs[0].set_title(
    'true model, lowest bias, lowest var');

In [None]:
X = np.column_stack((np.cos(x),
                     np.sin(2*x)))
fig, axs = plt.subplots(1, 2, figsize=(10, 3))
bias_variance_of_model(X)
axs[0].set_title('good bias/var trade-off for unknown true model');
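
To compare the four models side by side without plots, the following sketch recomputes bias$^2$ and variance for each of them on the same noisy data sets. It is a simplified variant of the procedure above under stated assumptions: `np.linalg.lstsq` is used instead of statsmodels `OLS`, the intercept column is added explicitly, and the dictionary names `models` and `results` are introduced only for this illustration.

```python
import numpy as np

N, L, noise_scale = 2**8, 2**7, 0.5
x = np.linspace(0, 2*np.pi, N)
ones = np.ones(N)
# noise-free real world data of this notebook
y = 3 + 2*np.cos(x) + np.sin(2*x) + np.cos(5*x)/2 + np.cos(6*x)/4

# design matrices (intercept included) of the four models checked above
models = {
    'underfit': np.column_stack((ones, x)),
    'overfit': np.column_stack(
        [ones] + [f(m*x) for m in range(1, N//2) for f in (np.cos, np.sin)]),
    'true': np.column_stack((ones, np.cos(x), np.sin(2*x),
                             np.cos(5*x), np.cos(6*x))),
    'trade-off': np.column_stack((ones, np.cos(x), np.sin(2*x))),
}

# the same L noisy data sets are used for all models
rng = np.random.default_rng(1234)
Yn = y[:, None] + rng.normal(loc=0, scale=noise_scale, size=(N, L))

results = {}
for name, X in models.items():
    B, *_ = np.linalg.lstsq(X, Yn, rcond=None)  # fit all L data sets
    Yhat = X @ B  # predictions on the training inputs
    ym = Yhat.mean(axis=1)  # average prediction
    bias2 = np.mean((ym - y)**2)
    var = np.mean((Yhat - ym[:, None])**2)
    results[name] = (bias2, var)
    print(f'{name:>9}: bias^2={bias2:.3f}, var={var:.3f}, '
          f'sum={bias2 + var:.3f}')
```

The printed sums make the trade-off explicit: the underfitting model is dominated by bias$^2$, the overfitting model by variance, and the reduced model with two features accepts some bias in exchange for a low variance.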

## Copyright

- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)
- feel free to use the notebooks for your own purposes
- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)
- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year.