# Machine Learning Model. Audiobooks


## Machine learning algorithm



### STEP 1: Import the relevant libraries

In [1]:
# Import the relevant libraries
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import SGD

### STEP 2: Import the Data in npz format

In [2]:
# store the npz data in a temp variable
tr_npz = np.load('Audiobooks_data_train.npz')


# Extract the inputs using the keyword under which the data was saved
# and cast them to floats
train_inputs = tr_npz['inputs'].astype(np.float64)
# targets must be integer class indices for sparse_categorical_crossentropy
# (so that no explicit one-hot encoding is needed)
train_targets = tr_npz['targets'].astype(np.int64)


# Load the validation data in the temporary variable
val_npz = np.load('Audiobooks_data_validation.npz')
# Load the inputs and the targets in the same line
validation_inputs, validation_targets = val_npz['inputs'].astype(np.float64), val_npz['targets'].astype(np.int64)

# Load the test data in the temporary variable
te_npz = np.load('Audiobooks_data_test.npz')
# Create 2 variables that will contain the test inputs and the test targets
test_inputs, test_targets = te_npz['inputs'].astype(np.float64), te_npz['targets'].astype(np.int64)
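
The loading pattern above relies on the arrays having been saved under the keywords `inputs` and `targets`. As a minimal sketch of that round trip, the snippet below saves two toy arrays with `np.savez`, loads them back, and applies the same casts. The file name `toy_data.npz` and the toy values are hypothetical stand-ins, not the audiobooks data.

```python
import os
import tempfile

import numpy as np

# Hypothetical toy arrays standing in for the audiobooks inputs and targets
toy_inputs = np.array([[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]])
toy_targets = np.array([0.0, 1.0, 1.0])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_data.npz')
    # The keyword names used here become the keys of the loaded archive
    np.savez(path, inputs=toy_inputs, targets=toy_targets)

    with np.load(path) as npz:
        keys = sorted(npz.files)
        loaded_inputs = npz['inputs'].astype(np.float64)
        loaded_targets = npz['targets'].astype(np.int64)

print(keys)
print(loaded_inputs.shape, loaded_inputs.dtype)
print(loaded_targets, loaded_targets.dtype)
```

Listing `npz.files` is a quick way to confirm which keywords a saved archive actually contains before indexing into it.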

### STEP 3: Develop Model
Outline, optimizers, loss, early stopping and training
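
As a rough illustration of what each `Dense` layer in this step computes, the NumPy sketch below applies `activation(dot(input, weight) + bias)` three times with the same layer sizes as the model (10 inputs, two hidden layers of 60 units, 2 outputs). The random weights, the toy batch, and the helper names `relu`, `softmax` and `dense` are assumptions made for illustration; this is not how Keras implements the layer internally.

```python
import numpy as np


def relu(x):
    # Element-wise max(x, 0)
    return np.maximum(x, 0.0)


def softmax(x):
    # Subtract the row-wise max for numerical stability before exponentiating
    shifted = x - x.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def dense(inputs, weights, bias, activation):
    # output = activation(dot(input, weight) + bias)
    return activation(np.dot(inputs, weights) + bias)


rng = np.random.default_rng(0)
input_size, hidden_layer_size, output_size = 10, 60, 2

# Hypothetical batch of 4 samples and randomly initialised parameters
batch = rng.normal(size=(4, input_size))
w1 = rng.normal(scale=0.1, size=(input_size, hidden_layer_size))
b1 = np.zeros(hidden_layer_size)
w2 = rng.normal(scale=0.1, size=(hidden_layer_size, hidden_layer_size))
b2 = np.zeros(hidden_layer_size)
w3 = rng.normal(scale=0.1, size=(hidden_layer_size, output_size))
b3 = np.zeros(output_size)

h1 = dense(batch, w1, b1, relu)      # 1st hidden layer
h2 = dense(h1, w2, b2, relu)         # 2nd hidden layer
probs = dense(h2, w3, b3, softmax)   # output layer: one probability per class

print(probs.shape)
print(probs.sum(axis=1))
```

Each row of `probs` sums to 1, which is what lets the softmax output be read as class probabilities.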

In [9]:
# Set the input and output sizes
input_size = 10
output_size = 2
# Use same hidden layer size for both hidden layers. Not a necessity.
hidden_layer_size = 60
    
    
# Define the model architecture
model = Sequential([
    # Dense implements: output = activation(dot(input, weight) + bias)
    # It takes several arguments, but the most important ones in this case
    # are the layer size (hidden_layer_size) and the activation function
    Dense(hidden_layer_size, activation='relu'), # 1st hidden layer
    Dense(hidden_layer_size, activation='relu'), # 2nd hidden layer
    # the final layer is no different, except that it uses a softmax activation
    Dense(output_size, activation='softmax') # output layer
])


### Choose the optimizer and the loss function

# Define the optimizer, 
# the loss function, 
# and the metrics of interest at each iteration
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

### Training

# batch size
batch_size = 32

# maximum number of training epochs
max_epochs = 100

# set an early stopping mechanism
# let's set patience=2, to be a bit tolerant against random validation loss increases
early_stopping = EarlyStopping(patience=2)

# fit the model
# note that the train and validation data are passed as plain NumPy arrays, not as iterable datasets
H = model.fit(train_inputs, # train inputs
          train_targets, # train targets
          batch_size=batch_size, # batch size
          epochs=max_epochs, # epochs that we will train for (assuming early stopping doesn't kick in)
          # callbacks are objects that Keras calls at set points during training (e.g. the end of each epoch)
          # here the callback checks whether val_loss has stopped improving
          callbacks=[early_stopping], # early stopping
          validation_data=(validation_inputs, validation_targets), # validation data
          verbose = 2 # making sure we get enough information about the training process
          )  

Train on 3579 samples, validate on 447 samples
Epoch 1/100
3579/3579 - 0s - loss: 0.5406 - accuracy: 0.7868 - val_loss: 0.4206 - val_accuracy: 0.8479
Epoch 2/100
3579/3579 - 0s - loss: 0.3729 - accuracy: 0.8720 - val_loss: 0.3309 - val_accuracy: 0.8680
Epoch 3/100
3579/3579 - 0s - loss: 0.3252 - accuracy: 0.8824 - val_loss: 0.3018 - val_accuracy: 0.8904
Epoch 4/100
3579/3579 - 0s - loss: 0.3036 - accuracy: 0.8863 - val_loss: 0.2882 - val_accuracy: 0.8881
Epoch 5/100
3579/3579 - 0s - loss: 0.2902 - accuracy: 0.8908 - val_loss: 0.2800 - val_accuracy: 0.8993
Epoch 6/100
3579/3579 - 0s - loss: 0.2813 - accuracy: 0.8961 - val_loss: 0.2663 - val_accuracy: 0.9016
Epoch 7/100
3579/3579 - 0s - loss: 0.2706 - accuracy: 0.9019 - val_loss: 0.2619 - val_accuracy: 0.8971
Epoch 8/100
3579/3579 - 0s - loss: 0.2652 - accuracy: 0.9008 - val_loss: 0.2587 - val_accuracy: 0.9060
Epoch 9/100
3579/3579 - 0s - loss: 0.2612 - accuracy: 0.9014 - val_loss: 0.2567 - val_accuracy: 0.9016
Epoch 10/100
3579/3579 - 0

<tensorflow.python.keras.callbacks.History at 0x1b9933850b8>
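
To make the role of `patience=2` more concrete, here is a simplified sketch of a patience-based stopping rule in plain Python. It assumes the monitored quantity is the validation loss (the Keras default for `EarlyStopping`) and that any decrease counts as an improvement. The function name `early_stopping_epoch` and the loss values are hypothetical, and the real callback has further options that are not modelled here.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 0-based epoch index at which training would stop,
    or None if the stopping rule never triggers."""
    best_loss = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: reset the counter
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


# Hypothetical validation losses, one per epoch (not taken from the run above)
hypothetical_val_losses = [0.42, 0.33, 0.30, 0.31, 0.32, 0.29]
stop_epoch = early_stopping_epoch(hypothetical_val_losses, patience=2)
print(stop_epoch)
```

With these values the loss fails to improve on 0.30 for two consecutive epochs, so the rule stops before the later, better value of 0.29 is ever reached. A larger `patience` tolerates such temporary increases at the cost of more training time.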

### STEP 4: Test the model


It is important to realize that repeatedly tuning the hyperparameters against the validation results gradually overfits the model to the validation dataset.

The test set is the final check. You should not evaluate on it until you are completely done adjusting your model.

If you adjust your model after testing, you will start overfitting the test dataset, which will defeat its purpose.

In [10]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)



In [11]:
print('\nTest loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))


Test loss: 0.24. Test accuracy: 91.07%
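
For intuition about the two numbers reported above, the sketch below computes a sparse categorical cross-entropy and an accuracy by hand from softmax outputs. The probabilities and targets are hypothetical toy values, and the helper names are made up for illustration. Keras also clips probabilities to avoid `log(0)`, which this sketch leaves out.

```python
import numpy as np


def sparse_categorical_crossentropy(probs, targets):
    # Mean negative log-probability assigned to the true class of each sample
    true_class_probs = probs[np.arange(len(targets)), targets]
    return float(-np.mean(np.log(true_class_probs)))


def accuracy(probs, targets):
    # Fraction of samples whose most probable class matches the target
    return float(np.mean(np.argmax(probs, axis=1) == targets))


# Hypothetical softmax outputs for 4 samples and their integer targets
toy_probs = np.array([[0.9, 0.1],
                      [0.2, 0.8],
                      [0.6, 0.4],
                      [0.3, 0.7]])
toy_targets = np.array([0, 1, 1, 1])

toy_loss = sparse_categorical_crossentropy(toy_probs, toy_targets)
toy_accuracy = accuracy(toy_probs, toy_targets)
print('Loss: {0:.2f}. Accuracy: {1:.2f}%'.format(toy_loss, toy_accuracy * 100.))
```

The third sample is misclassified, so three of four predictions are correct. The targets are used directly as integer indices, which is why they had to be cast to an integer type when the data was loaded.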
