Sascha Spors,
Professorship Signal Theory and Digital Signal Processing,
Institute of Communications Engineering (INT),
Faculty of Computer Science and Electrical Engineering (IEF),
University of Rostock,
Germany

# Data Driven Audio Signal Processing - A Tutorial with Computational Examples

Winter Semester 2023/24 (Master Course #24512)

- lecture: https://github.com/spatialaudio/data-driven-audio-signal-processing-lecture
- tutorial: https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise

Feel free to contact lecturer frank.schultz@uni-rostock.de

# Multiclass Classification
- One-hot encoding
- Splitting the data set into training and test sets
- **Softmax** activation function at the output layer
- categorical cross-entropy loss
- we use convenience functions from scikit-learn
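
The softmax activation and the categorical cross-entropy loss listed above can be sketched in plain NumPy before handing them over to TensorFlow. This is a minimal illustration and not the Keras implementation: the function names, the clipping constant `eps` and the example logits are assumptions made for this sketch.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax; subtracting the row maximum avoids overflow."""
    z = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=1, keepdims=True)


def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean cross-entropy between one-hot targets and probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return -np.mean(np.sum(y_true * np.log(y_pred), axis=1))


# hypothetical logits for two examples and three classes
z = np.array([[2.0, 1.0, 0.1], [0.0, 0.0, 0.0]])
# one-hot encoded targets: class 0 for the first, class 1 for the second
y_true = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

p = softmax(z)
print("probabilities\n", p)
print("row sums", p.sum(axis=1))
print("loss", categorical_cross_entropy(y_true, p))
```

Each row of `p` sums to one, so it can be read as a probability distribution over the classes. Since the targets are one-hot, only the predicted probability of the true class contributes to the loss of an example.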

## Imports

In [None]:
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, LabelBinarizer
import tensorflow as tf
from tensorflow import keras

print(
    "TF version",
    tf.__version__,
)

tf.keras.backend.set_floatx("float64")  # use double precision instead of the float32 default

verbose = 1  # print training progress

## Data Synthesis, One-Hot Encoding, Train/Test Splitting

In [None]:
nlabels = 3  # number of classes
labels = np.arange(nlabels)  # we encode as integers

m = int(5 / 4 * 80000)  # data examples
nx = 2 * nlabels  # number of features, we set it to 6 here

train_size = 4 / 5  # 80% of data are used for training

X, Y = make_classification(
    n_samples=m,
    n_features=nx,
    n_informative=nx,
    n_redundant=0,
    n_classes=nlabels,
    n_clusters_per_class=1,
    class_sep=1,
    flip_y=1e-2,
    random_state=None,
)
encoder = OneHotEncoder(sparse_output=False)
Y = encoder.fit_transform(Y.reshape(-1, 1))

X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, train_size=train_size, random_state=None
)
m_train = X_train.shape[0]
m_test = X_test.shape[0]
print("m_train", m_train)
print("m_test", m_test)
print("X train dim", X_train.shape, "Y train dim", Y_train.shape)
print("X test dim", X_test.shape, "Y test dim", Y_test.shape, "\n")
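
One-hot encoding of integer labels can also be written by hand, which makes it explicit what the encoder produces. The following is a minimal NumPy sketch with hypothetical labels `y_int`; for integer labels `0 ... nlabels-1` it should give the same column order as `OneHotEncoder`, which sorts the categories.

```python
import numpy as np

nlabels = 3  # number of classes, as above
y_int = np.array([0, 2, 1, 2])  # hypothetical integer labels

# row k of the identity matrix is the one-hot vector of class k
Y_onehot = np.eye(nlabels)[y_int]
print(Y_onehot)

# argmax along the class axis recovers the integer labels
y_back = np.argmax(Y_onehot, axis=1)
print(y_back)
```

The same `argmax` step is what turns a row of softmax probabilities into a predicted class later on.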

## Setup of Model Using Fully Connected Layers

In [None]:
# hyperparameters should be tuned as well; for this toy example, however,
# we fix them to get reasonable computing time and appropriate results
epochs = 10
no_perceptron_in_hl = np.array([2 * nx, 4 * nx, nlabels])
batch_size = 32

# model architecture
optimizer = keras.optimizers.Adam()
loss = keras.losses.CategoricalCrossentropy(
    from_logits=False, label_smoothing=0
)
metrics = [
    keras.metrics.CategoricalCrossentropy(),
    keras.metrics.CategoricalAccuracy(),
]

model = keras.Sequential()
# apply input layer
model.add(keras.Input(shape=(nx,)))
# apply hidden layers
for n in no_perceptron_in_hl:
    model.add(keras.layers.Dense(n, activation="relu"))
# apply output layer with softmax for multi-class classification
model.add(keras.layers.Dense(nlabels, activation="softmax"))
# let TF compile the model architecture, one key step in compiling
# is to set up the forward and backward propagation workflow through the model
model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
model.summary()  # prints the summary itself and returns None
# tw = np.sum([K.count_params(w) for w in model.trainable_weights])
# print('\ntrainable_weights', tw, '\n')
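
The number of trainable parameters reported in the model summary can be cross-checked by hand: a dense layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes the layer sizes used above (`nx = 6`, `nlabels = 3`); the variable names are chosen for this illustration only.

```python
nx = 6  # number of input features, as above
nlabels = 3  # number of classes, as above
# units per dense layer: three hidden layers plus the softmax output layer
units = [2 * nx, 4 * nx, nlabels, nlabels]

n_in = nx
total = 0
for n_out in units:
    n_params = n_in * n_out + n_out  # weight matrix plus bias vector
    print(n_in, "->", n_out, ":", n_params, "parameters")
    total += n_params
    n_in = n_out  # the output of this layer feeds the next one
print("total trainable parameters:", total)
```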

## Training of Model

In [None]:
model.fit(
    X_train,
    Y_train,
    validation_data=(X_test, Y_test),  # the test set doubles as validation data here
    epochs=epochs,
    batch_size=batch_size,
    verbose=verbose,
)
model.summary()  # prints the summary itself and returns None
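
To get a feeling for the training effort, the number of optimizer updates follows directly from the data set size, the batch size and the number of epochs. This is a small sketch that assumes the values set above, i.e. 80% of the 100000 examples are used for training.

```python
import math

m_train = 80000  # training examples: train_size * m = 4/5 * 100000
batch_size = 32
epochs = 10

# one optimizer update per mini-batch; a last partial batch also counts
steps_per_epoch = math.ceil(m_train / batch_size)
total_steps = steps_per_epoch * epochs
print("steps per epoch:", steps_per_epoch)
print("total optimizer steps:", total_steps)
```

The steps per epoch computed here should match the step counter that Keras shows in its progress bar for `verbose = 1`.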

## Evaluation of Model

In [None]:
def print_results(X, Y):
    # https://stackoverflow.com/questions/48908641/how-to-get-a-single-value-from-softmax-instead-of-probability-get-confusion-ma:
    lb = LabelBinarizer()
    lb.fit(labels)

    m = X.shape[0]
    results = model.evaluate(X, Y, batch_size=m, verbose=verbose)
    Y_pred = model.predict(X)
    cm = tf.math.confusion_matrix(
        labels=lb.inverse_transform(Y),
        predictions=lb.inverse_transform(Y_pred),
        num_classes=nlabels,
    )
    print("data entries", m)
    print(
        "Cost",
        results[0],
        "\nCategoricalCrossentropy",
        results[1],
        "\nCategoricalAccuracy",
        results[2],
    )
    print(
        "CategoricalAccuracy from Confusion Matrix = ",
        np.sum(np.diag(cm.numpy())) / m,
    )
    print("Confusion Matrix in %\n", cm / m * 100)


print("\n\nmetrics on train data:")
print_results(X_train, Y_train)

print("\n\nmetrics on !never seen! test data:")
print_results(X_test, Y_test)
# recall: the model should generalize well on never before seen data 
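
The link between the confusion matrix and the categorical accuracy used in `print_results` can be illustrated without TensorFlow. The sketch below is an assumption-laden stand-in for `tf.math.confusion_matrix`, not its implementation: the helper `confusion_matrix_from_one_hot` and the example data are hypothetical, rows denote the true class and columns the predicted class.

```python
import numpy as np


def confusion_matrix_from_one_hot(Y_true, Y_prob):
    """Rows: true class, columns: predicted class (argmax of probabilities)."""
    n_classes = Y_true.shape[1]
    true_idx = np.argmax(Y_true, axis=1)
    pred_idx = np.argmax(Y_prob, axis=1)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true_idx, pred_idx), 1)  # count each (true, predicted) pair
    return cm


# hypothetical one-hot targets and softmax outputs for four examples
Y_true = np.eye(3)[[0, 1, 2, 2]]
Y_prob = np.array(
    [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]]
)
cm = confusion_matrix_from_one_hot(Y_true, Y_prob)
accuracy = np.trace(cm) / Y_true.shape[0]
print(cm)
print("accuracy", accuracy)
```

The main diagonal counts the correctly classified examples, so its sum divided by the number of examples is the accuracy. Off-diagonal entries show which classes get confused with each other.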

## Copyright

- the notebooks are provided as [Open Educational Resources](https://en.wikipedia.org/wiki/Open_educational_resources)
- the text is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/)
- the code of the IPython examples is licensed under the [MIT license](https://opensource.org/licenses/MIT)
- feel free to use the notebooks for your own purposes
- please attribute the work as follows: *Frank Schultz, Data Driven Audio Signal Processing - A Tutorial Featuring Computational Examples, University of Rostock* ideally with relevant file(s), github URL https://github.com/spatialaudio/data-driven-audio-signal-processing-exercise, commit number and/or version tag, year.