Sascha Spors,
Professorship Signal Theory and Digital Signal Processing,
Institute of Communications Engineering (INT),
Faculty of Computer Science and Electrical Engineering (IEF),
University of Rostock,
Germany

# Data Driven Audio Signal Processing - A Tutorial with Computational Examples

Winter Semester 2023/24 (Master Course #24512)

- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture
- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise

Feel free to contact lecturer frank.schultz@uni-rostock.de

# Binary logistic regression model with hidden layers and a sigmoid output layer

- we use the TensorFlow/Keras API to build and train the model, and scikit-learn functions to create and split the data set

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import tensorflow as tf
import tensorflow.keras as keras
import tensorflow.keras.backend as K

print(
    "TF version",
    tf.__version__,
)

# tf.keras.backend.set_floatx('float64')  # we could use double precision

In [None]:
def predict_class(y):
    # threshold probabilities at 0.5 in place:
    # y is overwritten with 0 / 1, nothing is returned
    y[y < 0.5], y[y >= 0.5] = 0, 1
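
The helper above thresholds in place, so the probabilities passed to it are overwritten. As a minimal sketch of a non-mutating alternative (the name `predict_class_copy` and the sample values are made up for illustration and are not part of the notebook), `np.where` returns a new array and leaves the input untouched:

```python
import numpy as np


def predict_class_copy(y, threshold=0.5):
    """Return hard 0/1 labels without modifying y (hypothetical helper)."""
    return np.where(y >= threshold, 1, 0)


# made-up sigmoid outputs, shaped (M, 1) like the result of model.predict()
y_prob = np.array([[0.1], [0.5], [0.49], [0.93]])
y_hat = predict_class_copy(y_prob)

print("labels       ", y_hat.ravel())
print("probabilities", y_prob.ravel())  # still the original values
```

Keeping the probabilities around can be handy if a threshold other than 0.5 is to be tried later.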

In [None]:
verbose = 1  # print training status (progress bar) during fit

In [None]:
# DATA
M = int(5 / 4 * 80000)  # number of samples per feature
N = 2  # number of features

train_size = 4 / 5  # 80% of data are used for training

# these seeds produce 'nice' two classes each with
# two clusters for chosen M, N and train_size
random_state_idx = 0
random_state = np.array([7, 21, 24, 25, 29, 33, 38])
X, Y = make_classification(
    n_samples=M,
    n_features=N,
    n_informative=N,
    n_redundant=0,
    n_classes=2,
    n_clusters_per_class=2,
    class_sep=1.5,
    flip_y=1e-2,
    random_state=random_state[random_state_idx],
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, train_size=train_size, random_state=None
)
M_train = X_train.shape[0]
M_test = X_test.shape[0]
print("\nM_train", M_train)
print("X train dim", X_train.shape, "Y train dim", Y_train.shape)
print("\nM_test", M_test)
print("X test dim", X_test.shape, "Y test dim", Y_test.shape, "\n")

### Setup of TensorFlow Model

- hyper parameters
- in practice we do hyper parameter tuning, see upcoming exercises

In [None]:
epochs = 2**2
batch_size = 2**3
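
To get a feeling for what these two hyper parameters imply, the following sketch counts the number of gradient updates performed during training. It assumes `M_train = 80000`, which follows from `M` and `train_size` chosen above, and that the last (possibly incomplete) batch of an epoch also counts as one update:

```python
import math

M_train = 80000  # assumed: 4/5 of the M = 100000 samples created above
epochs = 2**2
batch_size = 2**3

# one update per mini-batch, an incomplete last batch is counted as well
steps_per_epoch = math.ceil(M_train / batch_size)
total_updates = steps_per_epoch * epochs

print("updates per epoch:", steps_per_epoch)
print("updates in total: ", total_updates)
```

A larger `batch_size` yields fewer but less noisy updates per epoch, so `epochs` and `batch_size` are usually tuned together.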

- define model architecture based on fully connected layers
- in practice, the number and the dimension of the hidden layers are hyper parameters that should be tuned as well

In [None]:
# some made up models, trainable params for N=2
# feel free to invent a new, optimum model for this classification example

# models too complex?! :
# no_perceptron_in_hl = np.array([64, 64])  # trainable params 4417
# no_perceptron_in_hl = np.array([64, 32, 16, 8, 4, 2])  # trainable params 2985
# no_perceptron_in_hl = np.array([64, 16, 4, 16, 64])  # trainable params 2533
# no_perceptron_in_hl = np.array([64, 4, 2, 4, 64])  # trainable params 859
# no_perceptron_in_hl = np.array([32, 16, 8, 4, 2])  # trainable params 809
# no_perceptron_in_hl = np.array([16, 16, 4, 2])  # trainable params 401
# no_perceptron_in_hl = np.array([16, 8, 4, 2])  # trainable params 233
# no_perceptron_in_hl = np.array([8, 8, 4, 2])  # trainable params 145
# no_perceptron_in_hl = np.array([5, 5, 5])  # trainable params 81
# model complexity reasonable?! :
# no_perceptron_in_hl = np.array([5, 3, 2])  # trainable params 44
# no_perceptron_in_hl = np.array([5, 4])  # trainable params 44
# no_perceptron_in_hl = np.array([5, 3])  # trainable params 37
# no_perceptron_in_hl = np.array([8])  # trainable params 33
# no_perceptron_in_hl = np.array([5, 2])  # trainable params 30
# no_perceptron_in_hl = np.array([5])  # trainable params 21
# no_perceptron_in_hl = np.array([3, 2])  # trainable params 20
no_perceptron_in_hl = np.array([2, 2])  # trainable params 15
# model too simple?! :
# no_perceptron_in_hl = np.array([2])  # trainable params 9 -> train this longer, i.e. with more epochs
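
The parameter counts in the comments above can be checked by hand: a dense layer with `n_in` inputs and `n_out` perceptrons has `n_in * n_out` weights plus `n_out` biases. A minimal sketch, where `count_dense_params` is a hypothetical helper and not part of Keras:

```python
def count_dense_params(n_inputs, hidden_units, n_outputs=1):
    """Count weights and biases of a fully connected network."""
    widths = [n_inputs, *hidden_units, n_outputs]
    return sum(
        n_in * n_out + n_out  # weight matrix + bias vector per layer
        for n_in, n_out in zip(widths[:-1], widths[1:])
    )


# N = 2 features, one sigmoid output perceptron
for hl in ([64, 64], [5, 3, 2], [2, 2], [2]):
    print(hl, "->", count_dense_params(2, hl))
```

The result should agree with the value that `model.summary()` reports as trainable parameters.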

- define and compile model

In [None]:
optimizer = keras.optimizers.SGD()

loss = keras.losses.BinaryCrossentropy(from_logits=False, label_smoothing=0)

metrics = [
    keras.metrics.BinaryCrossentropy(),
    keras.metrics.BinaryAccuracy(),
    keras.metrics.Precision(),  # PPV (FP related)
    keras.metrics.Recall(),  # TPR (FN related)
]

model = keras.Sequential()

model.add(keras.Input(shape=(N,)))  # input layer

for n in no_perceptron_in_hl:  # hidden layers (fully connected=dense)
    model.add(
        keras.layers.Dense(n, activation="tanh")
    )  # relu vs tanh makes a big difference

model.add(
    keras.layers.Dense(
        1,  # output layer, sigmoid for binary classification
        activation="sigmoid",
    )
)

model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

tw = np.sum([K.count_params(w) for w in model.trainable_weights])
print("\ntrainable_weights:", tw, "\n")
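
The model above pairs a sigmoid output with `BinaryCrossentropy(from_logits=False)`, i.e. the loss expects probabilities rather than raw scores. The following plain-Python sketch illustrates what this loss computes as a mean over samples. The labels and probabilities are made up, and the clipping constant `eps` is an assumption added here to avoid `log(0)`:

```python
import math


def sigmoid(z):
    """Map a raw score z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_crossentropy(y_true, y_prob, eps=1e-7):
    """Mean binary cross-entropy for labels in {0, 1} and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


# made-up example: undecided, confident and correct, confident and wrong
print(binary_crossentropy([1, 0], [0.5, 0.5]))  # equals log(2)
print(binary_crossentropy([1], [sigmoid(3.0)]))
print(binary_crossentropy([1], [sigmoid(-3.0)]))
```

Confident but wrong predictions are penalized much more strongly than undecided ones, which is what drives the optimizer towards well separated class probabilities.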

### Train / Fit the Model

In [None]:
model.fit(
    X_train, Y_train, epochs=epochs, batch_size=batch_size, verbose=verbose
)

In [None]:
model.summary()  # prints the summary itself and returns None
print("model weights\n", model.get_weights())

### Performance Measures: Fitted Model on Training Data Set

In [None]:
results = model.evaluate(X_train, Y_train, batch_size=M_train, verbose=False)
Y_train_pred = model.predict(X_train)
predict_class(Y_train_pred)

In [None]:
cost = results[0]
accuracy = results[2]
precision = results[3]
recall = results[4]
F1_score = 2 / (1 / precision + 1 / recall)  # harmonic mean
cm = tf.math.confusion_matrix(
    labels=Y_train, predictions=Y_train_pred, num_classes=2
)

print("binary_crossentropy cost ", cost)
print("precision / PPV (FP related)", precision)
print("recall / TPR (FN related)", recall)
print("accuracy", accuracy)
print(
    "our accuracy:",
    (cm.numpy()[0, 0] + cm.numpy()[1, 1]) / M_train * 100,
    "% are correct predictions",
)
print("F1", F1_score)
print(
    "\nconfusion matrix\nreal0,pred0  real0,pred1\nreal1,pred0  real1,pred1\nin % on train data:"
)
print(cm / M_train * 100)
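
The measures printed above can all be derived from the four entries of the confusion matrix, which helps to cross-check the Keras metrics. A minimal sketch with made-up counts, using the same layout as above (rows: real class, columns: predicted class); `metrics_from_confusion_matrix` is a hypothetical helper:

```python
def metrics_from_confusion_matrix(cm):
    """cm = [[TN, FP], [FN, TP]] -> accuracy, precision, recall, F1."""
    (tn, fp), (fn, tp) = cm
    accuracy = (tn + tp) / (tn + fp + fn + tp)
    precision = tp / (tp + fp)  # PPV, lowered by false positives
    recall = tp / (tp + fn)  # TPR, lowered by false negatives
    f1 = 2 / (1 / precision + 1 / recall)  # harmonic mean
    return accuracy, precision, recall, f1


# made-up counts for 100 samples
cm_demo = [[50, 10], [5, 35]]
acc, ppv, tpr, f1 = metrics_from_confusion_matrix(cm_demo)
print("accuracy", acc, "precision", ppv, "recall", tpr, "F1", f1)
```

This sketch assumes that at least one sample is predicted as class 1 and at least one sample truly belongs to class 1, otherwise precision or recall is undefined.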

### Test the Model

- we check model performance on **unseen** test data

In [None]:
results = model.evaluate(X_test, Y_test, batch_size=M_test, verbose=False)
Y_test_pred = model.predict(X_test)
predict_class(Y_test_pred)

### Performance Measures: Fitted Model on Test Data Set

In [None]:
cost = results[0]
accuracy = results[2]
precision = results[3]
recall = results[4]
F1_score = 2 / (1 / precision + 1 / recall)  # harmonic mean
cm = tf.math.confusion_matrix(
    labels=Y_test, predictions=Y_test_pred, num_classes=2
)

print("binary_crossentropy cost ", cost)
print("precision / PPV (FP related)", precision)
print("recall / TPR (FN related)", recall)
print("accuracy", accuracy)
print(
    "our accuracy:",
    (cm.numpy()[0, 0] + cm.numpy()[1, 1]) / M_test * 100,
    "% are correct predictions",
)
print("F1", F1_score)
print(
    "\nconfusion matrix\nreal0,pred0  real0,pred1\nreal1,pred0  real1,pred1\nin % on test data:"
)
print(cm / M_test * 100)

In [None]:
if N == 2:  # 2D plot of data and (curved) classification boundary
    f1, f2 = np.arange(-6, 6, 0.05), np.arange(-6, 6, 0.05)
    xv, yv = np.meshgrid(f1, f2)
    # create data such that TF can handle it in model.predict():
    Xgrid = np.concatenate(
        (np.reshape(xv, (1, -1)), np.reshape(yv, (1, -1))), axis=0
    ).T

    ygrid = model.predict(Xgrid)
    predict_class(ygrid)
    ygrid = np.reshape(ygrid, (xv.shape[0], xv.shape[1]))

    plt.figure(figsize=(12, 5))

    plt.subplot(1, 2, 1)  # left plot for training data set
    plt.plot(X_train[Y_train == 0, 0], X_train[Y_train == 0, 1], "C0o", ms=1)
    plt.plot(X_train[Y_train == 1, 0], X_train[Y_train == 1, 1], "C1o", ms=1)
    plt.contourf(f1, f2, ygrid, cmap="RdBu_r")
    plt.colorbar()
    plt.axis("equal")
    plt.xlim(-6, 6)
    plt.ylim(-6, 6)
    plt.title("training: " + str(X_train.shape))
    plt.xlabel("feature 1")
    plt.ylabel("feature 2")

    plt.subplot(1, 2, 2)  # right plot for test data set
    plt.plot(X_test[Y_test == 0, 0], X_test[Y_test == 0, 1], "C0o", ms=1)
    plt.plot(X_test[Y_test == 1, 0], X_test[Y_test == 1, 1], "C1o", ms=1)
    plt.contourf(f1, f2, ygrid, cmap="RdBu_r")
    plt.colorbar()
    plt.axis("equal")
    plt.xlim(-6, 6)
    plt.ylim(-6, 6)
    plt.title("test: " + str(X_test.shape))
    plt.xlabel("feature 1")
    plt.ylabel("feature 2")

Nice to do at home:
- instead of using `activation='tanh'` in the dense layers, try `activation='relu'` and observe that it yields a more piecewise linear classification boundary.
- have a look at the classification boundary and estimate how many coefficients a polynomial or spline curve would need to describe it. A good model should need about the same number of parameters to create this boundary; thousands of model parameters are too many for this example. That is why mathematical thinking, rather than just playing around, helps a lot for such tasks.
- for more than two features `N` we cannot conveniently plot the data sets and the boundary anymore. Hence, instead of inspecting the data and the classification visually, we rely heavily on the performance measures, so we should make sure that we fully understand the given numbers.
- How to choose the best number of features? How to find the best model? When is a model fully trained? These are important questions for real applications. We will soon learn about hyper parameter tuning, regularization...
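
For the coefficient estimate suggested in the list above, one possible starting point is to describe the boundary as the zero contour of a polynomial in the two features. Such a polynomial of total degree `d` has `(d + 1)(d + 2) / 2` coefficients. The sketch below only lists these counts; which degree would actually suffice for the boundary seen in the plot is a guess that has to be checked visually:

```python
import math


def n_coeff_bivariate_poly(degree):
    """Number of monomials x1**i * x2**j with i + j <= degree."""
    return math.comb(degree + 2, 2)


for d in range(1, 6):
    print("degree", d, "->", n_coeff_bivariate_poly(d), "coefficients")
```

These counts are of the same order of magnitude as the smaller models in the list of architectures, rather than the thousands of parameters of the larger ones.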

## Copyright

- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)
- feel free to use the notebooks for your own purposes
- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)
- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year.