# Wake word processing

This notebook trains a wake word model using TensorFlow.

## Data Preprocessing

For data preprocessing we use the `python_speech_features` package. We only need to extract MFCC features, and this package keeps that step as simple as possible.

In [1]:
!pip install python_speech_features
!pip install scipy
!pip install numpy
!pip install keras
!pip install tensorflow==1.15
!pip install scikit-learn



## Building the dataset
The dataset consists of wake word and non wake word recordings. Each audio file contains exactly one class, either wake word or non wake word.
MFCC features are extracted from each audio file with the `mfcc` function from the `python_speech_features` module. This block produces feature arrays of varying length, so padding must be applied afterwards to bring all inputs to the same shape. The targets are one-hot encoded: `[0, 1]` represents a non wake word utterance and `[1, 0]` a wake word utterance.
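The label convention described here can be sketched as follows. This is a minimal illustration rather than part of the notebook's pipeline; the helper names `encode_label` and `decode_label` are hypothetical.

```python
import numpy as np

# Class order used in this notebook: index 0 = wake word, index 1 = not wake word.
CLASS_NAMES = ("Wakeword", "Not wake word")

def encode_label(is_wake_word: bool) -> np.ndarray:
    """Return the one-hot target: [1, 0] for wake word, [0, 1] otherwise."""
    label = np.zeros(len(CLASS_NAMES), dtype=int)
    label[0 if is_wake_word else 1] = 1
    return label

def decode_label(one_hot) -> str:
    """Map a one-hot (or probability) vector back to its class name."""
    return CLASS_NAMES[int(np.argmax(one_hot))]

print(encode_label(True), decode_label(encode_label(True)))
print(encode_label(False), decode_label(encode_label(False)))
```

Because decoding uses `argmax`, the same helper works on softmax outputs as well as on exact one-hot vectors.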

In [2]:
import glob
import numpy as np
from python_speech_features import mfcc
import scipy.io.wavfile as wav

ww = "/data/fandy/data-smart-speaker/new-project-haiindi/hai-indi_old/wake-word-16/"
nww = "/data/fandy/data-smart-speaker/new-project-haiindi/hai-indi_old/not-wake-word-16/"

X = []
Y = []   # ww = [1,0] , nww = [0,1]

maxshapeX = 0

for x in glob.glob(ww+"*.wav"):
    sr, frame = wav.read(x)
    feat = mfcc(frame, sr) 
    if feat.shape[0] > 1000:
        continue
    if feat.shape[0] > maxshapeX:
        maxshapeX = feat.shape[0]
        print(maxshapeX)
    X.append( feat )
    Y.append( np.array( [1, 0] ) )
    
for x in glob.glob(nww+"*.wav"):
    sr, frame = wav.read(x)
    feat = mfcc(frame, sr) 
    if feat.shape[0] > 1000:
        continue
    if feat.shape[0] > maxshapeX:
        maxshapeX = feat.shape[0]
        print(maxshapeX)
    X.append( feat )
    Y.append( np.array( [0, 1] ) )
    
Y = np.array(Y)

print(maxshapeX)

68
88
105
150
237
643
815
818
846
846
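
The frame counts printed above depend on clip duration. As a rough sketch, assuming the default `mfcc` settings in `python_speech_features` (25 ms window, 10 ms step) and 16 kHz audio (an assumption suggested by the directory names, not confirmed in this notebook), the number of frames for a clip can be estimated like this. `expected_num_frames` is a hypothetical helper that mirrors the package's framing rule as understood here.

```python
import math

def expected_num_frames(num_samples, sample_rate, winlen=0.025, winstep=0.01):
    """Estimate how many MFCC frames a clip of `num_samples` samples yields."""
    frame_len = int(round(winlen * sample_rate))    # samples per analysis window
    frame_step = int(round(winstep * sample_rate))  # samples between window starts
    if num_samples <= frame_len:
        return 1
    # The last partial window is zero-padded, hence the ceiling.
    return 1 + math.ceil((num_samples - frame_len) / frame_step)

# One second of audio at the assumed 16 kHz sample rate.
print(expected_num_frames(16000, 16000))
```

Under these assumptions, 846 frames corresponds to roughly 8.5 seconds of audio, and the 1000-frame cutoff in the loop to about 10 seconds.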


## Padding

In [3]:
def pad_along_axis(array: np.ndarray, target_length, axis=0):

    pad_size = target_length - array.shape[axis]
    axis_nb = len(array.shape)

    if pad_size < 0:
        return array

    npad = [(0, 0) for x in range(axis_nb)]
    npad[axis] = (0, pad_size)

    b = np.pad(array, pad_width=npad, mode='constant', constant_values=0)

    return b


for i in range(len(X)):
    X[i] = pad_along_axis(X[i], maxshapeX, 0)

X = np.array(X)
print(X.shape)
print(Y.shape)

np.save('X', X)
np.save('Y', Y)

(6258, 846, 13)
(6258, 2)
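
An alternative to padding each array in a loop is to allocate the zero-filled batch once and copy every clip into it, which also makes it easy to keep the original lengths. A minimal sketch on toy data; `pad_batch` is a hypothetical helper, not part of the notebook.

```python
import numpy as np

def pad_batch(features, target_length):
    """Stack variable-length (frames, coeffs) arrays into one zero-padded batch.

    Returns the batch and the original frame count of each item.
    Assumes no item is longer than `target_length`.
    """
    n_coeffs = features[0].shape[1]
    batch = np.zeros((len(features), target_length, n_coeffs), dtype=features[0].dtype)
    lengths = np.array([f.shape[0] for f in features])
    for i, f in enumerate(features):
        batch[i, : f.shape[0]] = f  # frames beyond the clip stay zero
    return batch, lengths

# Toy stand-ins for MFCC arrays with 3 and 5 frames of 13 coefficients.
toy = [np.ones((3, 13)), np.ones((5, 13))]
batch, lengths = pad_batch(toy, target_length=5)
print(batch.shape, lengths)
```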


In [4]:
from keras.models import Sequential  
from keras.layers import Dense, Activation, BatchNormalization, Flatten, Conv1D, MaxPooling1D
from keras.layers import Dropout  
from keras.utils import to_categorical
import numpy as np
from sklearn.model_selection import train_test_split


def create_model_cnn(n_timesteps, n_dim, n_classes):
    model = Sequential()
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu',data_format="channels_last", input_shape=(n_timesteps,n_dim)))
    model.add(Conv1D(filters=64, kernel_size=3, activation='relu'))
    model.add(Dropout(0.2))
    model.add(MaxPooling1D(pool_size=2))
    model.add(Flatten())
    model.add(Dense(100, activation='relu'))
    model.add(Dense(n_classes, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

    return model



epochs = 20

X = np.load('X.npy')
Y = np.load('Y.npy')
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.1, random_state=19)

n_dim = X_train.shape[2]  
n_classes = y_train.shape[1] 
n_timesteps = X_train.shape[1]
model_cnn = create_model_cnn(n_timesteps, n_dim, n_classes)
print("CNN")
hist = model_cnn.fit(X_train, y_train, epochs=epochs, batch_size=4, verbose=2)
model_cnn.save('model-cnn.h5')

Using TensorFlow backend.


Instructions for updating:
If using Keras pass *_constraint arguments to layers.

CNN

Epoch 1/20
 - 17s - loss: 0.4596 - accuracy: 0.8274
Epoch 2/20
 - 17s - loss: 0.3099 - accuracy: 0.8841
Epoch 3/20
 - 16s - loss: 0.2563 - accuracy: 0.9022
Epoch 4/20
 - 16s - loss: 0.2249 - accuracy: 0.9157
Epoch 5/20
 - 17s - loss: 0.1776 - accuracy: 0.9283
Epoch 6/20
 - 16s - loss: 0.1536 - accuracy: 0.9419
Epoch 7/20
 - 16s - loss: 0.1421 - accuracy: 0.9466
Epoch 8/20
 - 17s - loss: 0.1115 - accuracy: 0.9606
Epoch 9/20
 - 16s - loss: 0.1072 - accuracy: 0.9599
Epoch 10/20
 - 16s - loss: 0.0916 - accuracy: 0.9689
Epoch 11/20
 - 17s - loss: 0.1030 - accuracy: 0.9698
Epoch 12/20
 - 17s - loss: 0.0799 - accuracy: 0.9743
Epoch 13/20
 - 16s - loss: 0.0609 - accuracy: 0.9798
Epoch 14/20
 - 17s - loss: 0.0743 - accuracy: 0.9757
Epoch 15/20
 - 17s - loss: 0.1131 - accuracy: 0.9698
Epoch 16/20
 - 16s - loss: 0.0673 - accuracy: 0.9790
Epoch 17/20
 - 17s - loss: 0.0954 - accuracy: 0.9812
Epoch 18/20
 - 16s - 
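
To see how the input shape `(846, 13)` flows through `create_model_cnn`, the layer arithmetic can be worked out by hand. The sketch below assumes the Keras defaults for the layers used above (stride 1 and `'valid'` padding for `Conv1D`, stride equal to `pool_size` for `MaxPooling1D`); `conv1d_out` is a hypothetical helper.

```python
def conv1d_out(length, kernel_size):
    """Output length of a stride-1 Conv1D with 'valid' padding."""
    return length - kernel_size + 1

n_timesteps, n_dim, n_classes = 846, 13, 2

t = conv1d_out(n_timesteps, 3)          # first Conv1D
conv1_params = 3 * n_dim * 64 + 64      # kernel weights + biases
t = conv1d_out(t, 3)                    # second Conv1D
conv2_params = 3 * 64 * 64 + 64
t = t // 2                              # MaxPooling1D(pool_size=2)
flat = t * 64                           # Flatten
dense1_params = flat * 100 + 100
dense2_params = 100 * n_classes + n_classes

total = conv1_params + conv2_params + dense1_params + dense2_params
print(flat, total)
```

Almost all trainable parameters sit in the first dense layer, because the flattened feature map is large.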

In [5]:
print("Evaluate on held-out test data")
print(model_cnn.evaluate(x=X_test, y=y_test))
# model_cnn.save('model-cnn.h5')

Evaluate on held-out test data
[0.9157367080164412, 0.9089456796646118]
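
`evaluate` returns the loss followed by the compiled metrics, here `[loss, accuracy]`. Accuracy alone does not distinguish missed wake words from false triggers. The sketch below uses made-up toy arrays rather than the notebook's data to show how per-class counts could be derived from one-hot targets and predicted probabilities; `wake_word_counts` is a hypothetical helper.

```python
import numpy as np

def wake_word_counts(y_true, y_prob):
    """Count outcomes for the wake word class (index 0)."""
    true_idx = np.argmax(y_true, axis=1)
    pred_idx = np.argmax(y_prob, axis=1)
    return {
        "true_accept": int(np.sum((true_idx == 0) & (pred_idx == 0))),
        "false_reject": int(np.sum((true_idx == 0) & (pred_idx == 1))),
        "false_accept": int(np.sum((true_idx == 1) & (pred_idx == 0))),
        "true_reject": int(np.sum((true_idx == 1) & (pred_idx == 1))),
    }

# Toy targets and probabilities (not real results).
y_true = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
y_prob = np.array([[0.9, 0.1], [0.4, 0.6], [0.7, 0.3], [0.2, 0.8]])
print(wake_word_counts(y_true, y_prob))
```

With the notebook's variables this could be called as `wake_word_counts(y_test, model_cnn.predict(X_test))`.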


In [49]:
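%time
# Note: as a line magic on its own line, `%time` times an empty statement, so the
# figure reported below does not measure the loop. `%%time` on the first line of
# the cell would time the whole cell.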
prob = []

for x in X_test :
    x = np.reshape( x, (1,846,13) )
    pred = model_cnn.predict(x)
    idx = np.argmax(pred)
    prob.append(pred[0][idx])
    kelas = "Wakeword" if idx == 0 else "Not wake word"
    print(f"Class {kelas} with probability {pred[0][idx]}")

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 58.7 µs
Class Wakeword with probability 1.0
Class Wakeword with probability 0.7535844445228577
Class Wakeword with probability 1.0
Class Not wake word with probability 1.0
Class Not wake word with probability 0.9960967898368835
Class Not wake word with probability 1.0
Class Wakeword with probability 0.9999998807907104
Class Wakeword with probability 1.0
Class Wakeword with probability 1.0
Class Wakeword with probability 1.0
Class Wakeword with probability 1.0
Class Not wake word with probability 0.9999911785125732
Class Wakeword with probability 1.0
Class Wakeword with probability 0.999968409538269
Class Wakeword with probability 0.9984637498855591
Class Wakeword with probability 1.0
Class Not wake word with probability 1.0
Class Wakeword with probability 0.9999723434448242
Class Not wake word with probability 0.9999688863754272
Class Wakeword with probability 1.0
Class Not wake word with 
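
Keras `predict` also accepts the whole test array at once, which avoids the per-sample reshape. The sketch below decodes such a batch of softmax outputs in one step. The probabilities are made-up stand-ins for `model_cnn.predict(X_test)`, and `decode_batch` is a hypothetical helper.

```python
import numpy as np

CLASS_NAMES = np.array(["Wakeword", "Not wake word"])

def decode_batch(pred):
    """Return (labels, confidences) for an (n_samples, 2) array of softmax outputs."""
    idx = np.argmax(pred, axis=1)
    conf = pred[np.arange(len(pred)), idx]  # probability of the winning class
    return CLASS_NAMES[idx], conf

# Stand-in probabilities, not real model output.
pred = np.array([[1.0, 0.0], [0.75, 0.25], [0.004, 0.996]])
labels, conf = decode_batch(pred)
for label, p in zip(labels, conf):
    print(f"Class {label} with probability {p}")
```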

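## On-the-fly test
This cell wraps the end-to-end process: load the saved model, extract MFCC features from a single WAV file, pad or truncate them to 846 frames, and classify the result.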

In [51]:
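%time
# Note: as a line magic on its own line, `%time` times an empty statement, so the
# figure reported below does not measure the code in this cell. `%%time` on the
# first line of the cell would time the whole cell.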
import numpy as np
from python_speech_features import mfcc
import scipy.io.wavfile as wav
from keras.models import load_model


def pad_along_axis(array: np.ndarray, target_length, axis=0):

    pad_size = target_length - array.shape[axis]
    axis_nb = len(array.shape)

    if pad_size < 0:
        return array

    npad = [(0, 0) for x in range(axis_nb)]
    npad[axis] = (0, pad_size)

    b = np.pad(array, pad_width=npad, mode='constant', constant_values=0)

    return b

def load_h5model(path):
    return load_model(path)
    

    
model = load_h5model("model-cnn.h5")
x = "/data/fandy/data-smart-speaker/new-project-haiindi/hai-indi_old/wake-word-16/001_HAIINDI_F_22_JAWA_MM_1M_001-01.wav"

sr, frame = wav.read(x)
x = mfcc(frame, sr)

if x.shape != (846, 13):
    # Padding
    if x.shape[0] < 846:
        x = pad_along_axis(x, 846,0)
    elif x.shape[0] > 846:
        x = x[:846,:]

x = np.reshape( x, (1,846,13) )
pred = model.predict(x)
idx = np.argmax(pred)
kelas = "Wakeword" if idx == 0 else "Not wake word"
hasil = {
    'prob' : pred[0][idx],
    'label' : kelas
}
hasil

CPU times: user 0 ns, sys: 0 ns, total: 0 ns
Wall time: 48.2 µs


{'prob': 1.0, 'label': 'Wakeword'}
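
The steps in this cell can be collected into one reusable function. This is a sketch under assumptions: `classify_features` and its `predict_fn` parameter are hypothetical, and a dummy predictor stands in for `model.predict` so the example runs without the trained model.

```python
import numpy as np

N_FRAMES, N_COEFFS = 846, 13  # input shape the model was trained on

def classify_features(feat, predict_fn):
    """Pad or truncate MFCC features to N_FRAMES, then classify them."""
    feat = feat[:N_FRAMES]                      # truncate long clips
    missing = N_FRAMES - feat.shape[0]          # 0 when already long enough
    feat = np.pad(feat, ((0, missing), (0, 0)), mode="constant")  # zero-pad short clips
    pred = predict_fn(feat.reshape(1, N_FRAMES, N_COEFFS))
    idx = int(np.argmax(pred))
    label = "Wakeword" if idx == 0 else "Not wake word"
    return {"prob": float(pred[0][idx]), "label": label}

# Dummy predictor standing in for `model.predict`; it always favors class 0.
def dummy_predict(batch):
    return np.array([[0.9, 0.1]])

print(classify_features(np.ones((100, N_COEFFS)), dummy_predict))  # short clip
print(classify_features(np.ones((900, N_COEFFS)), dummy_predict))  # long clip
```

Passing the predictor as an argument keeps the shape handling testable on its own; in the notebook it would be called with `model.predict`.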

## Convert the model into a TensorFlow Lite model


In [1]:
from keras.models import load_model
model = load_model('model-cnn.h5')
print(model.outputs)
print(model.inputs)
print([node.op.name for node in model.inputs])
print([node.op.name for node in model.outputs])

Using TensorFlow backend.


Instructions for updating:
If using Keras pass *_constraint arguments to layers.


[<tf.Tensor 'dense_2/Softmax:0' shape=(?, 2) dtype=float32>]
[<tf.Tensor 'conv1d_1_input:0' shape=(?, 846, 13) dtype=float32>]
['conv1d_1_input']
['dense_2/Softmax']


In [2]:
from keras import backend as K
import tensorflow as tf

def freeze_session(session, keep_var_names=None, output_names=None, clear_devices=True):
    from tensorflow.python.framework.graph_util import convert_variables_to_constants
    graph = session.graph
    with graph.as_default():
        freeze_var_names = list(set(v.op.name for v in tf.global_variables()).difference(keep_var_names or []))
        output_names = output_names or []
        output_names += [v.op.name for v in tf.global_variables()]
        # Graph -> GraphDef ProtoBuf
        input_graph_def = graph.as_graph_def()
        if clear_devices:
            for node in input_graph_def.node:
                node.device = ""
        frozen_graph = convert_variables_to_constants(session, input_graph_def,
                                                      output_names, freeze_var_names)
        return frozen_graph


frozen_graph = freeze_session(K.get_session(),
                              output_names=[out.op.name for out in model.outputs])

# Save to ./model/tf_model.pb
tf.train.write_graph(frozen_graph, "model", "tf_model.pb", as_text=False)

Instructions for updating:
Use `tf.compat.v1.graph_util.convert_variables_to_constants`
Instructions for updating:
Use `tf.compat.v1.graph_util.extract_sub_graph`
INFO:tensorflow:Froze 39 variables.
INFO:tensorflow:Converted 39 variables to const ops.


'model/tf_model.pb'
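
The frozen graph is an intermediate step. Converting it to a TensorFlow Lite file could look like the sketch below. It assumes TensorFlow 1.15 as installed at the top of the notebook, where `tf.lite.TFLiteConverter.from_frozen_graph` is available, and it uses the input and output node names printed earlier. The function name `convert_to_tflite` and the output path `model/model.tflite` are hypothetical, and the call has not been run against this model.

```python
# Node names and input shape taken from the outputs printed earlier in the notebook.
INPUT_NAME = "conv1d_1_input"
OUTPUT_NAME = "dense_2/Softmax"
INPUT_SHAPES = {INPUT_NAME: [1, 846, 13]}  # fix the unknown batch dimension to 1

def convert_to_tflite(pb_path="model/tf_model.pb", tflite_path="model/model.tflite"):
    """Convert the frozen graph to a TensorFlow Lite flatbuffer (TF 1.x API)."""
    import tensorflow as tf  # imported lazily; assumes TensorFlow 1.15

    converter = tf.lite.TFLiteConverter.from_frozen_graph(
        pb_path,
        input_arrays=[INPUT_NAME],
        output_arrays=[OUTPUT_NAME],
        input_shapes=INPUT_SHAPES,
    )
    tflite_model = converter.convert()
    with open(tflite_path, "wb") as f:
        f.write(tflite_model)
    return tflite_path

# Usage (hypothetical output path): convert_to_tflite()
```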