In [1]:
from utils import generate_graphs, prepare_data, prepare_model, plot_losses, evaluate_model, LossHistory
from keras.callbacks import ModelCheckpoint, TensorBoard
import keras
from sklearn.metrics import classification_report

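Here we load the necessary libraries and the helper functions defined in the utils file. The utils module appears to depend on librosa, so that package needs to be installed for the import to succeed.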

First, we can explore several sets of graphs. 

In [None]:
generate_graphs('0_jackson_0.wav')
generate_graphs('1_jackson_1.wav')

The first set of graphs shows the same speaker, Jackson, saying two different digits. His recording of one looks clearly different from his recording of zero.

Next, we can compare Jackson saying zero to Theo saying zero. Even though they are saying the same digit, there are still differences in the visuals. We want to make sure that our model generalizes to any speaker saying any digit.

In [None]:
generate_graphs('0_jackson_0.wav')
generate_graphs('0_theo_0.wav')
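
This notebook does not show what generate_graphs plots. Assuming the graphs include something like a spectrogram, the idea can be sketched with the standard library alone. The function names, frame size, hop, and the 8 kHz synthetic tone below are all illustrative assumptions, not the contents of utils.

```python
import cmath
import math


def dft_magnitudes(frame):
    """Magnitudes of the non-negative frequency bins of one frame (naive DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        total = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(total))
    return mags


def spectrogram(signal, frame_size=64, hop=32):
    """One magnitude spectrum per overlapping frame; time runs along the outer list."""
    starts = range(0, len(signal) - frame_size + 1, hop)
    return [dft_magnitudes(signal[s:s + frame_size]) for s in starts]


# Synthetic stand-in for a recording: a 1 kHz tone sampled at 8 kHz (assumed values).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]

spec = spectrogram(tone)
# Strongest frequency bin in the first frame, converted back to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Two speakers saying the same digit would produce different patterns of energy across these bins, which is the kind of difference the plots above make visible.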

To start the tuning process, I separate the recordings into training, validation, and test sets.

In [None]:
import random

# Fix the seed so the split is reproducible
random.seed(9001)
X_train, X_val, X_test, y_train, y_val, y_test, input_output = prepare_data('./recordings/')
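
The internals of prepare_data are not shown here, so the following is only a sketch of how a seeded three-way split can work. The 80/10/10 proportions and the function name are assumptions chosen for illustration.

```python
import random


def split_indices(n_items, val_frac=0.1, test_frac=0.1, seed=9001):
    """Shuffle indices with a fixed seed and cut them into train/val/test lists."""
    rng = random.Random(seed)  # local generator, so global random state is untouched
    indices = list(range(n_items))
    rng.shuffle(indices)
    n_test = round(n_items * test_frac)
    n_val = round(n_items * val_frac)
    test = indices[:n_test]
    val = indices[n_test:n_test + n_val]
    train = indices[n_test + n_val:]
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
```

Because the three lists are slices of one shuffled list, no recording can end up in more than one set, and rerunning with the same seed gives the same split.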

Next we can create two of the different architectures that I considered: Feedforward and Convolutional. Note that this convolutional net has no max pooling, dropout, or batch normalization.

In [None]:
FF_model = prepare_model(input_output, modeltype='FF')
CNN_model = prepare_model(input_output, modeltype='CNN', dropout=False, maxpooling=False, batch_n=False)

Because the feedforward network has almost 3 million more parameters to train, I am going to go forward with the CNN architecture.

We can see immediately that without max pooling, the CNN has almost 2 million parameters. To reduce that, let's see how many parameters we have with max pooling.

In [None]:
CNN_base_model = prepare_model(input_output, modeltype='CNN', dropout=False, maxpooling=True, batch_n=False)
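
Most of a small CNN's parameters usually sit in the first dense layer after flattening, which is why pooling has such a large effect on the total. Here is a minimal sketch of that arithmetic. The layer sizes are hypothetical, since the real shapes come from prepare_model in utils and are not shown in this notebook.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Weights plus one bias per filter for a 2D convolution."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return (n_inputs + 1) * n_units


def flatten_size(height, width, channels, pool=1):
    """Number of values left after an optional pool x pool max pooling step."""
    return (height // pool) * (width // pool) * channels


# Hypothetical shapes: a 32x32 feature map with 32 channels feeding Dense(64).
conv = conv2d_params(3, 3, 1, 32)                                # 320
no_pool = dense_params(flatten_size(32, 32, 32), 64)             # 2,097,216
with_pool = dense_params(flatten_size(32, 32, 32, pool=2), 64)   # 524,352
```

A 2x2 pooling step quarters the flattened size, so the dense layer shrinks by roughly a factor of four while the convolution itself stays tiny in comparison.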

With max pooling we're down to ~500,000 parameters, which makes the model much faster to train and less likely to overfit. Now I want to see how performance changes with the dropout and batch normalization layers. First, let's set the performance baseline with the CNN that has neither. I will train all models for 50 epochs with a batch size of 32.

In [None]:
callbacks = [
    ModelCheckpoint(filepath='models/cnn_base_model.h5', monitor='val_loss', save_best_only=True),
    TensorBoard(log_dir='./Graph', histogram_freq=1, write_graph=False, write_images=False),
]
# Keras documents validation_data as a tuple (inputs, targets), not a list
history = CNN_base_model.fit(X_train, y_train, batch_size=32, epochs=50, verbose=2,
                             validation_data=(X_val, y_val), callbacks=callbacks)
plot_losses(history)
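
The ModelCheckpoint callback above is configured with monitor='val_loss' and save_best_only=True, so the file on disk is overwritten only when the validation loss improves. Here is a rough sketch of that bookkeeping in plain Python. The loss values are made up for illustration, and the function is not part of Keras.

```python
def best_checkpoint_epochs(val_losses):
    """Return the 1-based epochs at which a save-best-only checkpoint would be written."""
    best = float('inf')
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strictly better than anything seen so far
            best = loss
            saved.append(epoch)
    return saved


val_losses = [0.90, 0.60, 0.65, 0.50, 0.55]  # made-up validation losses
saved_epochs = best_checkpoint_epochs(val_losses)  # [1, 2, 4]
```

In this example the saved weights come from epoch 4, not the final epoch, which is why the evaluation later loads the .h5 file instead of using the in-memory model.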

The base model ends with quite a high validation accuracy. Now I want to add dropout and batch normalization to see whether they improve prediction.

In [2]:
CNN_model = prepare_model(input_output, modeltype='CNN', dropout=True, maxpooling=True, batch_n=True)
callbacks = [
    ModelCheckpoint(filepath='models/cnn_model.h5', monitor='val_loss', save_best_only=True),
    TensorBoard(log_dir='./Graph', histogram_freq=1, write_graph=False, write_images=False),
]
# Keras documents validation_data as a tuple (inputs, targets), not a list
history = CNN_model.fit(X_train, y_train, batch_size=32, epochs=50, verbose=2,
                        validation_data=(X_val, y_val), callbacks=callbacks)
plot_losses(history)

Model performance does not improve with dropout and batch normalization together, so to isolate the effect of each I will train two more models: one with no batch normalization and one with no dropout.

In [None]:
CNN_nob_model = prepare_model(input_output, modeltype='CNN', dropout=True, maxpooling=True, batch_n=False)
callbacks = [
    ModelCheckpoint(filepath='models/cnn_nob_model.h5', monitor='val_loss', save_best_only=True),
    TensorBoard(log_dir='./Graph', histogram_freq=1, write_graph=False, write_images=False),
]
# Keras documents validation_data as a tuple (inputs, targets), not a list
history = CNN_nob_model.fit(X_train, y_train, batch_size=32, epochs=50, verbose=2,
                            validation_data=(X_val, y_val), callbacks=callbacks)
plot_losses(history)

In [None]:
CNN_nod_model = prepare_model(input_output, modeltype='CNN', dropout=False, maxpooling=True, batch_n=True)
callbacks = [
    ModelCheckpoint(filepath='models/cnn_nod_model.h5', monitor='val_loss', save_best_only=True),
    TensorBoard(log_dir='./Graph', histogram_freq=1, write_graph=False, write_images=False),
]
# Keras documents validation_data as a tuple (inputs, targets), not a list
history = CNN_nod_model.fit(X_train, y_train, batch_size=32, epochs=50, verbose=2,
                            validation_data=(X_val, y_val), callbacks=callbacks)
plot_losses(history)
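
The four CNN runs in this notebook form a 2x2 grid over dropout and batch normalization, all with max pooling enabled. One way to keep such an ablation organized is to generate the configurations in a loop. The sketch below is only an illustration: ablation_configs is a hypothetical helper, while the keyword names and checkpoint paths mirror the cells above.

```python
from itertools import product

# Checkpoint names used in this notebook, keyed by (dropout, batch_n).
MODEL_NAMES = {
    (False, False): 'cnn_base_model',
    (True, True): 'cnn_model',
    (True, False): 'cnn_nob_model',
    (False, True): 'cnn_nod_model',
}


def ablation_configs():
    """Build one settings dict per (dropout, batch_n) combination."""
    configs = []
    for dropout, batch_n in product([False, True], repeat=2):
        configs.append({
            'dropout': dropout,
            'batch_n': batch_n,
            'maxpooling': True,
            'filepath': f"models/{MODEL_NAMES[(dropout, batch_n)]}.h5",
        })
    return configs


configs = ablation_configs()
# Each entry could then drive prepare_model(...) and fit(...) as in the cells above.
```

Keeping the grid in one place makes it harder to mix up which checkpoint file belongs to which combination of layers.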

Neither dropout nor batch normalization improved on the base model, so both were excluded from the final model.

Finally, let's take our base model and see how it performs on the test set.

In [None]:
evaluate_model('models/cnn_base_model.h5', X_test, y_test)
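
What evaluate_model reports is defined in utils and not shown here. Since classification_report is imported at the top, it plausibly prints per-digit precision and recall. As a sketch of what those numbers mean, here is a small pure-Python version run on made-up labels. The function name and the data are illustrative assumptions.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Precision, recall, and support for each label, from paired label lists."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this label, but it was wrong
            fn[truth] += 1  # missed an example of the true label
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        report[label] = {
            'precision': tp[label] / predicted if predicted else 0.0,
            'recall': tp[label] / actual if actual else 0.0,
            'support': actual,
        }
    return report


y_true = [0, 0, 1, 1, 2, 2]  # made-up digit labels
y_pred = [0, 1, 1, 1, 2, 0]
report = per_class_report(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Per-class numbers are useful here because overall accuracy can hide a single digit that the model confuses with another.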