# Audiobook ML classifier
#### Train a model on the data collected and preprocessed in 'PreProcessedAudiobookData'

#### Goal: Create an ML algorithm that predicts whether a customer will be retained
#### Simpler: a classification problem with two classes (outputs): will the customer be retained or not?

### Import relevant libraries

In [20]:
import numpy as np
import tensorflow as tf

### Load data
#### The data was already preprocessed and split into train/validation/test files, so all we have to do is load each file and separate the inputs from the targets

In [16]:
# np.float / np.int were deprecated in NumPy 1.20 and removed in 1.24, so use explicit dtypes
npz = np.load('Audiobooks_data_train.npz')
train_inputs = npz['inputs'].astype(np.float64)
train_targets = npz['targets'].astype(np.int64)

npz = np.load('Audiobooks_data_validation.npz')
validation_inputs = npz['inputs'].astype(np.float64)
validation_targets = npz['targets'].astype(np.int64)

npz = np.load('Audiobooks_data_test.npz')
test_inputs = npz['inputs'].astype(np.float64)
test_targets = npz['targets'].astype(np.int64)
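Since every split is stored the same way, the three load blocks could be folded into one helper. This is a minimal sketch, assuming each `.npz` file holds exactly the `inputs` and `targets` arrays used above. The helper name `load_split` and the dummy file written to a temporary directory are hypothetical; they only exist so the snippet runs on its own without the real Audiobooks files.

```python
import os
import tempfile

import numpy as np


def load_split(path):
    """Load one preprocessed split and return (inputs, targets).

    Hypothetical helper: assumes the file stores 'inputs' and 'targets' arrays.
    Uses float64/int64 instead of the removed np.float/np.int aliases.
    """
    with np.load(path) as npz:  # NpzFile supports the context-manager protocol
        inputs = npz['inputs'].astype(np.float64)
        targets = npz['targets'].astype(np.int64)
    return inputs, targets


# Hypothetical demo: write a tiny dummy split with the same keys, then load it back
tmp_dir = tempfile.mkdtemp()
demo_path = os.path.join(tmp_dir, 'demo_split.npz')
np.savez(demo_path,
         inputs=np.random.rand(5, 10),
         targets=np.array([0, 1, 1, 0, 1]))

demo_inputs, demo_targets = load_split(demo_path)
print(demo_inputs.shape, demo_inputs.dtype)    # (5, 10) float64
print(demo_targets.shape, demo_targets.dtype)  # (5,) int64
```

With the real files, `train_inputs, train_targets = load_split('Audiobooks_data_train.npz')` would replace the first block, and likewise for validation and test.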

### Model
##### Outline the model (hidden/non-hidden layers, depth, width, etc.)
##### State the loss function
##### State the optimization algorithm
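To make the outline concrete, here is a NumPy sketch of the computation the network below performs on one batch: two ReLU hidden layers of width 100, a 2-unit softmax output, and the sparse categorical cross-entropy loss, which takes integer class labels (0 or 1) rather than one-hot vectors. The random weights and the 4-sample batch are placeholders, not the trained model, and the sketch leaves out what Keras adds on top (weight initialization schemes and the Adam updates).

```python
import numpy as np

rng = np.random.default_rng(0)

input_size, hidden_layer_size, output_size = 10, 100, 2

# Placeholder parameters (random, untrained); Keras initializes its own
W1 = rng.normal(scale=0.1, size=(input_size, hidden_layer_size))
b1 = np.zeros(hidden_layer_size)
W2 = rng.normal(scale=0.1, size=(hidden_layer_size, hidden_layer_size))
b2 = np.zeros(hidden_layer_size)
W3 = rng.normal(scale=0.1, size=(hidden_layer_size, output_size))
b3 = np.zeros(output_size)


def relu(x):
    return np.maximum(x, 0.0)


def softmax(z):
    # Subtract the row max for numerical stability; each row then sums to 1
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(x):
    # Each Dense layer computes activation(dot(input, weights) + bias)
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return softmax(h2 @ W3 + b3)


def sparse_categorical_crossentropy(probs, targets):
    # Mean of -log(probability assigned to the true class index)
    return -np.mean(np.log(probs[np.arange(len(targets)), targets]))


x_batch = rng.normal(size=(4, input_size))  # 4 fake samples
y_batch = np.array([0, 1, 1, 0])            # integer class labels

probs = forward(x_batch)
print(probs.shape)        # (4, 2)
print(probs.sum(axis=1))  # each row sums to 1
print(sparse_categorical_crossentropy(probs, y_batch))
```

Training then adjusts `W1`…`b3` to push this loss down; in the notebook, Adam does that inside `.fit()`.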

In [17]:
input_size = 10    # number of input features per sample
output_size = 2    # two classes: retained / not retained
# both hidden layers use the same width (number of units)
hidden_layer_size = 100

#build the model
model = tf.keras.Sequential([
        # no Flatten layer needed: each sample is already a flat vector after preprocessing
        # tf.keras.layers.Dense() computes output = activation(dot(input, weights) + bias)
        tf.keras.layers.Dense(hidden_layer_size, activation = 'relu'),
        tf.keras.layers.Dense(hidden_layer_size, activation = 'relu'),
        # output layer: the same Dense computation, with output_size units (one per class);
        # 'softmax' turns the two outputs into class probabilities that sum to 1
        tf.keras.layers.Dense(output_size, activation = 'softmax')
        ])

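# .compile() sets the optimizer (Adam), the loss, and the metrics to report;
# sparse_categorical_crossentropy expects integer class labels (0/1), not one-hot vectors
model.compile(optimizer = 'adam', loss = 'sparse_categorical_crossentropy', metrics =['accuracy'])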

# hyperparameters: samples per gradient update, and the upper limit on training epochs
BATCH_SIZE = 100
MAX_EPOCHS = 100

# early stopping: a tf.keras.callbacks.EarlyStopping object monitors val_loss by default;
# patience = 2 tolerates up to 2 consecutive epochs without improvement before stopping,
# which allows some noise in val_loss without letting the model overfit for long
early_stopping = tf.keras.callbacks.EarlyStopping(patience = 2)

#train the model
# no manual batching needed: the inputs/targets are NumPy arrays, and .fit() splits them
# into mini-batches of BATCH_SIZE (hence the 36 steps per epoch in the log below)
model.fit(train_inputs, 
          train_targets, 
          batch_size = BATCH_SIZE, 
          epochs = MAX_EPOCHS,
          callbacks = [early_stopping],
          validation_data = (validation_inputs, validation_targets), 
          verbose = 2)

Epoch 1/100
36/36 - 1s - loss: 0.5246 - accuracy: 0.7231 - val_loss: 0.4340 - val_accuracy: 0.7852
Epoch 2/100
36/36 - 0s - loss: 0.4142 - accuracy: 0.7756 - val_loss: 0.3989 - val_accuracy: 0.7808
Epoch 3/100
36/36 - 0s - loss: 0.3825 - accuracy: 0.7944 - val_loss: 0.3791 - val_accuracy: 0.7942
Epoch 4/100
36/36 - 0s - loss: 0.3657 - accuracy: 0.8027 - val_loss: 0.3714 - val_accuracy: 0.7942
Epoch 5/100
36/36 - 0s - loss: 0.3579 - accuracy: 0.8036 - val_loss: 0.3656 - val_accuracy: 0.7919
Epoch 6/100
36/36 - 0s - loss: 0.3539 - accuracy: 0.8013 - val_loss: 0.3679 - val_accuracy: 0.7942
Epoch 7/100
36/36 - 0s - loss: 0.3479 - accuracy: 0.8094 - val_loss: 0.3585 - val_accuracy: 0.8054
Epoch 8/100
36/36 - 0s - loss: 0.3455 - accuracy: 0.8094 - val_loss: 0.3585 - val_accuracy: 0.8009
Epoch 9/100
36/36 - 0s - loss: 0.3355 - accuracy: 0.8201 - val_loss: 0.3705 - val_accuracy: 0.8009


<tensorflow.python.keras.callbacks.History at 0x18ad49c2910>
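The log shows why training ended at epoch 9 instead of running all 100 epochs. Below is a simplified re-implementation of the rule `EarlyStopping` applies with its defaults (`monitor='val_loss'`, `min_delta=0`): count consecutive epochs whose validation loss fails to beat the best value so far, and stop once that count reaches `patience`. This is not Keras' actual source, and the function name `stopping_epoch` is hypothetical. The `val_loss` values are copied from the log above; because they are rounded to four decimals, treating epoch 8 as a non-improvement is an assumption.

```python
def stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strict improvement (min_delta = 0)
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# val_loss per epoch, copied (rounded) from the training log above
logged_val_loss = [0.4340, 0.3989, 0.3791, 0.3714, 0.3656,
                   0.3679, 0.3585, 0.3585, 0.3705]

print(stopping_epoch(logged_val_loss, patience=1))  # 6
print(stopping_epoch(logged_val_loss, patience=2))  # 9
```

With `patience=1` the run would already have stopped at epoch 6 (0.3679 does not beat 0.3656), so the 9-epoch log is consistent with the `patience = 2` described in the training cell's comment. Note that `EarlyStopping` keeps the final epoch's weights unless `restore_best_weights=True` is passed.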

# Testing the model

In [19]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)

print('Test loss:{0:.2f}   Test Accuracy:{1:.2f}%'.format(test_loss, test_accuracy*100))

Test loss:0.34   Test Accuracy:82.37%
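`model.evaluate` reports the same two quantities tracked during training. The NumPy sketch below shows what they mean for this two-class setup, using made-up softmax outputs for 4 customers rather than real model predictions: the predicted class is the index of the larger probability, accuracy is the fraction of predictions that match the target, and the loss is the mean negative log-probability of the true class. In the notebook, `model.predict(test_inputs)` would return the real probability array in place of the toy `probs`.

```python
import numpy as np

# Made-up softmax outputs for 4 customers: column 0 = class 0, column 1 = class 1
probs = np.array([[0.9, 0.1],
                  [0.3, 0.7],
                  [0.6, 0.4],
                  [0.2, 0.8]])
targets = np.array([0, 1, 1, 1])

predicted = np.argmax(probs, axis=1)      # most likely class per customer
accuracy = np.mean(predicted == targets)  # fraction of correct predictions
loss = -np.mean(np.log(probs[np.arange(len(targets)), targets]))

print(predicted)           # [0 1 0 1]
print(accuracy)            # 0.75
print(f'{loss:.4f}')       # sparse categorical cross-entropy on the toy batch
```

The third customer is misclassified (0.4 on the true class), which both lowers accuracy and contributes the largest term to the loss.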
