# Classification of Bulk Material by Structure-Borne Sound

Different bulk materials create characteristic structure-borne sound when rolling or sliding down a ramp. This example shows how bulk materials can be classified from that sound. The data are audio recordings of different small parts (plastic spheres, nuts and screws) rolling down an aluminium ramp. A deep neural network (DNN) is trained and evaluated as the classifier.

In [None]:
import numpy as np
import soundfile as sf
import tensorflow as tf
import keras
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
import matplotlib.pyplot as plt
%matplotlib inline

## Data Pre-Processing

For the recordings, two microphones were placed below an aluminium ramp in order to pick up the structure-borne sound at two different positions. Each stereo file contains the audio signals from the two microphones for one material and one repetition of the measurement.

First the segmentation parameters and the filenames are defined

In [None]:
segment_size = 512  # length of segments
segment_step = 128  # overlap of segments = segment_size - segment_step

n_classes = 5  # number of classes

fprefix = 'data/07_12_2017_recordings/'  # directory of recordings

# list of filenames and class labels
flist = {
    'plastic_spheres_1.wav': 0,
    'plastic_spheres_2.wav': 0,
    'plastic_spheres_3.wav': 0,
    'plastic_spheres_4.wav': 0,
    'steel_nut_M3_1.wav': 1,
    'steel_nut_M3_2.wav': 1,
    'steel_nut_M3_3.wav': 1,
    'steel_nut_M3_4.wav': 1,
    'steel_nut_M4_1.wav': 2,
    'steel_nut_M4_2.wav': 2,
    'steel_nut_M4_3.wav': 2,
    'steel_nut_M4_4.wav': 2,
    'messing_nut_M4_1.wav': 3,
    'messing_nut_M4_2.wav': 3,
    'messing_nut_M4_3.wav': 3,
    'messing_nut_M4_4.wav': 3,
    'screws_1.wav': 4,
    'screws_2.wav': 4,
    'screws_3.wav': 4,
    'screws_4.wav': 4
}

The recordings are now imported, normalized, segmented and composed into the feature matrix $\mathbf{X}$ and class vector $\mathbf{y}$. The magnitude spectrum of the samples in a segment is used as the feature.

First, a function that handles the data pre-processing is defined, followed by the actual composition of the feature matrix and class vector

In [None]:
def import_data(fname, nclass, segment_size, segment_step):
    # read wav file
    data, _ = sf.read(fname)
    # normalize level
    data = data/np.max(np.abs(data[:]))
    # put both channels into one vector
    data = np.ndarray.flatten(data, order='F')
    # segment and sort into feature matrix
    # number of full-length segments that fit into the signal
    nseg = (len(data) - segment_size) // segment_step + 1
    X = np.array([data[i*segment_step:i*segment_step+segment_size]
                  for i in range(nseg)])
    # construct target vector with one hot encoding
    y = np.zeros((X.shape[0], n_classes), dtype=int)
    y[:, nclass] = 1

    return X, y


X = np.empty((0, segment_size))
y = np.empty((0, n_classes), dtype=int)
for fname, nclass in flist.items():
    Xt, yt = import_data(fprefix+fname, nclass,
                         segment_size=segment_size, segment_step=segment_step)
    X = np.append(X, Xt, axis=0)
    y = np.append(y, yt, axis=0)


# use magnitude spectrum as feature
X = np.abs(np.fft.rfft(X, axis=1))
X = np.float32(X)
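
The segmentation and the spectral feature extraction can be illustrated on a toy signal. The following is a minimal sketch and not the implementation used above: the helper `segment` and the toy values are hypothetical, and NumPy 1.20 or newer is assumed for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segment(signal, size, step):
    """Return all full-length segments of `signal`, one every `step` samples."""
    return sliding_window_view(signal, size)[::step]


signal = np.arange(10, dtype=float)    # toy signal, not a recording
segments = segment(signal, size=4, step=2)
print(segments.shape)                  # (4, 4): four segments of four samples
print(segments[0], segments[1])        # neighbouring segments share two samples

# magnitude spectrum of each segment, as used for the feature matrix
spectrum = np.abs(np.fft.rfft(segments, axis=1))
print(spectrum.shape)                  # (4, 3): size // 2 + 1 bins per segment
```

With `segment_size = 512`, the same computation gives 257 spectral bins per segment, which is the number of input features of the classifier.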

The feature matrix and class vector are split into a training and a test dataset for training and evaluation of the model

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/4, random_state=42)
print('loaded {0:5.0f}/{1:<5.0f} training/test samples'.format(X_train.shape[0], X_test.shape[0]))
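
The effect of `test_size=1/4` can be checked on a small synthetic dataset. The sketch below uses hypothetical toy data and additionally passes `stratify`, an option of `train_test_split` that the cell above does not use, to keep the class proportions equal in both subsets.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_classes = 5
labels = np.repeat(np.arange(n_classes), 4)     # toy data: 4 samples per class
X_toy = rng.normal(size=(labels.size, 3))       # 20 samples with 3 features
y_toy = np.eye(n_classes, dtype=int)[labels]    # one-hot encoded classes

# stratify is an assumption for illustration; it is not used in the notebook
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=1/4, random_state=42, stratify=labels)

print(X_tr.shape, X_te.shape)   # 15 training and 5 test samples
print(y_te.sum(axis=0))         # number of test samples per class
```

Whether stratification matters for the recordings depends on how evenly the segments are distributed over the classes, which this sketch does not examine.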

## Construction and Training of Classifier

The classifier is constructed as a DNN with two fully connected hidden layers of 64 nodes each, in addition to the input and output layers

In [None]:
model = keras.models.Sequential()
model.add(keras.layers.Input(shape=(X_train.shape[1],)))
model.add(keras.layers.Dense(64, activation='relu',
                             kernel_regularizer=keras.regularizers.l2(0.01)))
model.add(keras.layers.Dense(64, activation='relu',
                             kernel_regularizer=keras.regularizers.l2(0.01)))
# softmax with categorical cross-entropy matches the mutually exclusive one-hot classes
model.add(keras.layers.Dense(n_classes, activation='softmax'))
model.compile(loss='categorical_crossentropy',
              optimizer='adam', metrics=['accuracy'])

model.summary()
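
The parameter counts reported in the model summary can be reproduced by hand. This is a minimal sketch, assuming the segment length of 512 samples and the five classes defined earlier; the helper `dense_params` is hypothetical.

```python
segment_size = 512
n_classes = 5

# np.fft.rfft of a real segment yields segment_size // 2 + 1 bins
n_features = segment_size // 2 + 1
layer_sizes = [n_features, 64, 64, n_classes]


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


params = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
print(params, sum(params))   # [16512, 4160, 325] 20997
```

Most parameters sit in the first layer, since its size scales with the number of spectral bins.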

Now the model is trained using the training dataset

In [None]:
history = model.fit(X_train, y_train, epochs=200, batch_size=128)

The training loss is plotted for visualization of the training progress

In [None]:
plt.plot(history.history['loss'])
plt.xlabel('epoch')
plt.ylabel('loss')
plt.title('training loss')
plt.grid()

## Evaluation of Classifier

The trained classifier is evaluated using the test dataset not used for training of the model

In [None]:
# predict the classes for the test dataset
predictions = model.predict(X_test)
predictions = np.argmax(predictions, axis=1)

# derive the true classes from one-hot encoding
true = np.argmax(y_test, axis=1) 

# print the classification report
target_names = ('plastic spheres', 'M3 steel nuts', 'M4 steel nuts', 'M4 brass nuts', 'screws')
print(classification_report(true, predictions, target_names=target_names))
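
The conversion between one-hot vectors and class indices used in the evaluation can be followed on a few hand-made values. The numbers below are hypothetical toy outputs, not results of the trained model.

```python
import numpy as np

n_classes = 5
labels = np.array([0, 3, 4, 1])                   # toy class indices
one_hot = np.eye(n_classes, dtype=int)[labels]    # one-hot encoding

# toy network outputs, one row per sample and one column per class
scores = np.array([[0.70, 0.10, 0.10, 0.05, 0.05],
                   [0.10, 0.10, 0.20, 0.50, 0.10],
                   [0.10, 0.20, 0.40, 0.10, 0.20],
                   [0.20, 0.60, 0.10, 0.05, 0.05]])

predicted = np.argmax(scores, axis=1)   # index of the largest output per sample
true = np.argmax(one_hot, axis=1)       # recovers the original class indices
accuracy = np.mean(predicted == true)
print(predicted, true, accuracy)        # [0 3 2 1] [0 3 4 1] 0.75
```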

## Exercises

* Analyze the code in detail and understand its function. For instance
    * What is the overlap in percent of the data segments extracted from the audio recordings?
    * Why can we use a real-valued FFT to represent the segments in the spectral domain?
    * Why is it common to split the dataset into a training and a test dataset?
    * What is an epoch and a batch?
    * What insights does the loss give on the training?
    * Can you explain the different metrics used in the classification report?
    

* Change the structure of the deep neural network and check the influence of these changes on the model performance. E.g.
    * increase/decrease the width (number of nodes) of single layers
    * add/remove layers
    * change the activation function


* Change the training of the network. E.g.
    * change the number of epochs and the batch size
    * use a different optimizer


* Construct a new model from your insights. What is the best performance you can reach with your model?

## Advanced Exercises

* Compute the confusion matrix. What classes are likely to be confused? Why?
* Redesign the classifier using a convolutional neural network. What performance can you reach?

**Copyright**

This notebook is provided as [Open Educational Resource](https://en.wikipedia.org/wiki/Open_educational_resources). Feel free to use the notebook for your own purposes. The text/data is licensed under [Creative Commons Attribution 4.0](https://creativecommons.org/licenses/by/4.0/), the code of the IPython examples under the [MIT license](https://opensource.org/licenses/MIT).