# Business case

We have data from an audiobook app. Logically, it relates only to the audio versions of books. Each customer in the database has made at least one purchase, which is why they are in the database. We want to create a machine learning algorithm, based on the available data, that can predict whether a customer will buy again.

The main idea is that if a customer has a low probability of coming back, there is no reason to spend any money on advertising to them. If we can focus our efforts ONLY on customers who are likely to convert again, we can save a lot of money. Moreover, the model can help us identify which metrics matter most for a customer to come back.
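In code, this targeting rule can be sketched as keeping only customers whose predicted probability of converting again clears a cut-off. The helper `customers_to_target`, the customer IDs and the 0.5 default threshold are hypothetical, chosen for illustration; in practice the threshold would be set by the business trade-off between advertising cost and lost sales.

```python
import numpy as np

def customers_to_target(customer_ids, convert_probs, threshold=0.5):
    """Return the IDs of customers whose predicted probability of
    converting again is at least `threshold` (hypothetical helper)."""
    customer_ids = np.asarray(customer_ids)
    convert_probs = np.asarray(convert_probs, dtype=np.float64)
    # Boolean mask selecting only the customers worth advertising to
    mask = convert_probs >= threshold
    return customer_ids[mask]

# Hypothetical IDs and probabilities, for illustration only
ids = np.array([101, 102, 103, 104])
probs = np.array([0.78, 0.34, 0.93, 0.10])
print(customers_to_target(ids, probs))       # [101 103]
print(customers_to_target(ids, probs, 0.9))  # [103]
```

Raising the threshold shrinks the audience to the most confident predictions, trading reach for a lower advertising spend.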

There are several variables:

- Customer ID
- Book length in mins_avg (average of all purchases)
- Book length in minutes_sum (sum of all purchases)
- Price Paid_avg (average of all purchases)
- Price paid_sum (sum of all purchases)
- Review (a Boolean variable)
- Review (out of 10)
- Total minutes listened
- Completion (from 0 to 1)
- Support requests (number)
- Last visited minus purchase date (in days)

The target is a Boolean variable (0 or 1). The inputs cover a period of 2 years, and the targets cover the following 6 months. In other words, we predict whether, based on the last 2 years of activity and engagement, a customer will convert in the next 6 months. Six months sounds like a reasonable time frame: if customers don't convert within 6 months, chances are they've gone to a competitor or didn't like the audiobook way of digesting information.
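The notebook does not show how the targets were built, but the 2-years-in, 6-months-out framing can be sketched as a labelling rule. This assumes each customer's purchase dates are available, that a "conversion" means any purchase inside the window, and that 6 months is approximated as 182 days; `convert_label` and the dates are hypothetical.

```python
from datetime import date, timedelta

def convert_label(purchase_dates, cutoff, horizon_days=182):
    """Return 1 if the customer bought anything in the window
    (cutoff, cutoff + horizon_days], else 0.

    Hypothetical helper: 182 days approximates the 6-month target window;
    purchases in the 2 years before `cutoff` would feed the inputs instead.
    """
    window_end = cutoff + timedelta(days=horizon_days)
    return int(any(cutoff < d <= window_end for d in purchase_dates))

cutoff = date(2020, 1, 1)  # hypothetical end of the 2-year input period
print(convert_label([date(2019, 5, 3), date(2020, 3, 15)], cutoff))  # 1
print(convert_label([date(2019, 5, 3)], cutoff))                     # 0
```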

## Create the machine learning algorithm

### Import the relevant libraries

In [1]:
import numpy as np
import tensorflow as tf

### Data

In [2]:
# we reuse the temporary variable data to hold each of the three Audiobooks datasets in turn
data = np.load('Audiobooks_data_train.npz')

# we extract the inputs using the keyword under which we saved them
# (np.float and np.int were removed in NumPy 1.24, so we use explicit dtypes)
train_inputs = data['inputs'].astype(np.float64)
# targets must be integers because sparse_categorical_crossentropy expects
# integer class indices (0 or 1) rather than one-hot encoded vectors
train_targets = data['targets'].astype(np.int64)

# validation data
data = np.load('Audiobooks_data_validation.npz')
validation_inputs, validation_targets = data['inputs'].astype(np.float64), data['targets'].astype(np.int64)

# test data
data = np.load('Audiobooks_data_test.npz')
test_inputs, test_targets = data['inputs'].astype(np.float64), data['targets'].astype(np.int64)
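The `.npz` files are read back by the keyword each array was saved under. A minimal round trip, assuming the files were written with `np.savez` (the preprocessing step is not shown here) and using hypothetical toy arrays and a temporary path, looks like this:

```python
import os
import tempfile
import numpy as np

# Hypothetical toy arrays standing in for the real Audiobooks data
inputs = np.array([[1.0, 2.0], [3.0, 4.0]])
targets = np.array([0, 1])

path = os.path.join(tempfile.mkdtemp(), 'toy_data.npz')
# np.savez stores each array under the keyword used here
np.savez(path, inputs=inputs, targets=targets)

with np.load(path) as data:
    print(sorted(data.files))  # ['inputs', 'targets']
    loaded_inputs = data['inputs'].astype(np.float64)
    loaded_targets = data['targets'].astype(np.int64)

print(loaded_targets.dtype)  # int64
```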

### Model
Outline, optimizers, loss, early stopping and training
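Before building the model with Keras, it may help to see what the outline computes. The sketch below is a plain NumPy forward pass through two ReLU hidden layers of 50 units and a 2-unit softmax output, followed by sparse categorical cross-entropy on integer labels. The weights are random and untrained, and the 10 input features (all variables except Customer ID) are an assumption; this illustrates the arithmetic, not Keras's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    # subtract the row max for numerical stability
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def dense(x, w, b, activation):
    # the Dense layer formula: activation(dot(input, weight) + bias)
    return activation(x @ w + b)

n_features, hidden, n_classes = 10, 50, 2  # 10 input columns assumed
# randomly initialised, untrained weights (illustration only)
w1, b1 = rng.normal(size=(n_features, hidden)), np.zeros(hidden)
w2, b2 = rng.normal(size=(hidden, hidden)), np.zeros(hidden)
w3, b3 = rng.normal(size=(hidden, n_classes)), np.zeros(n_classes)

x = rng.normal(size=(4, n_features))  # 4 hypothetical customers
h1 = dense(x, w1, b1, relu)
h2 = dense(h1, w2, b2, relu)
probs = dense(h2, w3, b3, softmax)    # each row sums to 1

def sparse_categorical_crossentropy(y_true, y_prob):
    # y_true holds integer class indices (0 or 1), not one-hot vectors;
    # the loss is the mean negative log-probability of the true class
    picked = y_prob[np.arange(len(y_true)), y_true]
    return -np.mean(np.log(np.clip(picked, 1e-7, 1.0)))

y = np.array([0, 1, 1, 0])  # hypothetical labels
print(probs.sum(axis=1))
print(sparse_categorical_crossentropy(y, probs))
```

Because the labels are used directly as column indices, no one-hot encoding is needed, which is why the targets are cast to integers.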

In [6]:
# set the output size: the outcome is binary, so there are 2 classes
output_size = 2
# use same hidden layer size for both hidden layers (not necessary)
hidden_layer_size = 50

# define the structure of the model
model = tf.keras.Sequential([
    # tf.keras.layers.Dense is basically implementing: output = activation(dot(input, weight) + bias)
    # most important arguments are hidden_layer_size and activation function
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    # the final layer is activated with softmax
    tf.keras.layers.Dense(output_size, activation='softmax')
])

# we define the optimizer we'd like to use, the loss function,
# and the metrics we are interested in obtaining at each iteration
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# set the batch size
batch_size = 100
# set a maximum number of training epochs
max_epochs = 100

# EarlyStopping monitors val_loss by default; patience=4 makes it tolerant
# of a few noisy epochs without improvement before it stops training
early_stopping = tf.keras.callbacks.EarlyStopping(patience=4)

# fit the model
model.fit(train_inputs,
          train_targets,
          batch_size=batch_size,
          # epochs that we will train for
          epochs=max_epochs,
          # callbacks are objects whose methods are called at given points during training
          # here, EarlyStopping checks at the end of each epoch whether val_loss has
          # failed to improve for 4 consecutive epochs, and stops training if so
          callbacks=[early_stopping],
          validation_data=(validation_inputs, validation_targets),
          verbose=2)

Epoch 1/100
36/36 - 0s - loss: 0.5521 - accuracy: 0.7946 - val_loss: 0.3953 - val_accuracy: 0.8859
Epoch 2/100
36/36 - 0s - loss: 0.3764 - accuracy: 0.8715 - val_loss: 0.3008 - val_accuracy: 0.8993
Epoch 3/100
36/36 - 0s - loss: 0.3286 - accuracy: 0.8793 - val_loss: 0.2774 - val_accuracy: 0.9016
Epoch 4/100
36/36 - 0s - loss: 0.3039 - accuracy: 0.8863 - val_loss: 0.2647 - val_accuracy: 0.9016
Epoch 5/100
36/36 - 0s - loss: 0.2897 - accuracy: 0.8910 - val_loss: 0.2563 - val_accuracy: 0.9060
Epoch 6/100
36/36 - 0s - loss: 0.2785 - accuracy: 0.8966 - val_loss: 0.2527 - val_accuracy: 0.9105
Epoch 7/100
36/36 - 0s - loss: 0.2721 - accuracy: 0.8963 - val_loss: 0.2528 - val_accuracy: 0.9105
Epoch 8/100
36/36 - 0s - loss: 0.2638 - accuracy: 0.9000 - val_loss: 0.2458 - val_accuracy: 0.9060
Epoch 9/100
36/36 - 0s - loss: 0.2606 - accuracy: 0.9003 - val_loss: 0.2475 - val_accuracy: 0.9128
Epoch 10/100
36/36 - 0s - loss: 0.2550 - accuracy: 0.9028 - val_loss: 0.2437 - val_accuracy: 0.9128
Epoch 11/

<tensorflow.python.keras.callbacks.History at 0x7f8a142087f0>
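The patience rule can be sketched in plain Python: a counter resets whenever the validation loss beats the best value so far and grows otherwise, so an epoch that merely matches the best loss also counts against the patience. This is a simplified sketch of the idea, not Keras's source code; `early_stopping_epoch` and the loss values in the usage line are hypothetical. Note also that by default Keras keeps the weights from the last epoch rather than the best one; `restore_best_weights=True` changes that.

```python
def early_stopping_epoch(val_losses, patience=4, min_delta=0.0):
    """Return the 1-based epoch after which training would stop,
    or None if the patience is never exhausted (simplified sketch)."""
    best = float('inf')
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # improvement: remember it and reset the counter
            best = loss
            wait = 0
        else:
            # no improvement (an equal loss also counts)
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Hypothetical validation losses: the best value (0.25) is reached at epoch 3
losses = [0.40, 0.30, 0.25, 0.26, 0.25, 0.27, 0.28, 0.24]
print(early_stopping_epoch(losses, patience=4))  # 7
```

Applied to the first 10 `val_loss` values in the log above, the function returns `None`, since the loss keeps reaching new lows at epochs 8 and 10.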

## Test the model

After training on the training data and validating on the validation data, we test the final prediction power of our model by running it on the test dataset that the algorithm has NEVER seen before.

It is very important to realize that repeatedly tuning the hyperparameters against the validation dataset gradually overfits it.
The test is the absolute final step. We should not test until we are completely done adjusting our model.

In [7]:
# getting the loss (which is returned by default)
# and whatever was specified in the 'metrics' argument when compiling the model
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)



In [12]:
print('\nTest loss: {0:.2f} \nTest accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))


Test loss: 0.24 
Test accuracy: 90.85%


## Obtain the probability for a customer to convert

In [13]:
# predicting the probability of each class using the 'predict' method
model.predict(test_inputs).round(2)

array([[0.22, 0.78],
       [0.07, 0.93],
       [0.66, 0.34],
       [0.15, 0.85],
       [0.15, 0.85],
       [0.  , 1.  ],
       [1.  , 0.  ],
       [0.9 , 0.1 ],
       [0.49, 0.51],
       [1.  , 0.  ],
       [0.14, 0.86],
       [0.  , 1.  ],
       [0.14, 0.86],
       [0.  , 1.  ],
       [1.  , 0.  ],
       [0.  , 1.  ],
       [1.  , 0.  ],
       [0.12, 0.88],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [0.  , 1.  ],
       [0.15, 0.85],
       [0.89, 0.11],
       [0.07, 0.93],
       [0.8 , 0.2 ],
       [0.96, 0.04],
       [0.07, 0.93],
       [0.98, 0.02],
       [0.  , 1.  ],
       [0.17, 0.83],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [1.  , 0.  ],
       [0.  , 1.  ],
       [0.87, 0.13],
       [0.79, 0.21],
       [0.05, 0.95],
       [0.  , 1.  ],
       [0.6 , 0.4 ],
       [0.  , 1.  ],
       [0.99, 0.01],
       [0.89, 0.11],
       [1.  , 0.  ],
       [0.12, 0.88],
       [0.17,

In [14]:
# Alternatively, we can get only the second column
# The main idea is that we are often interested in ONLY ONE of the two outcomes
# In this case we would like to know if the customer will convert again
# Here we round to 2 decimals; rounding to 0 decimals would give only 0s and 1s
model.predict(test_inputs)[:,1].round(2)

array([0.78, 0.93, 0.34, 0.85, 0.85, 1.  , 0.  , 0.1 , 0.51, 0.  , 0.86,
       1.  , 0.86, 1.  , 0.  , 1.  , 0.  , 0.88, 0.  , 0.  , 0.  , 1.  ,
       0.85, 0.11, 0.93, 0.2 , 0.04, 0.93, 0.02, 1.  , 0.83, 0.  , 0.  ,
       0.  , 0.  , 0.  , 1.  , 0.13, 0.21, 0.95, 1.  , 0.4 , 1.  , 0.01,
       0.11, 0.  , 0.88, 0.83, 0.12, 0.95, 0.87, 0.41, 0.06, 0.2 , 1.  ,
       1.  , 0.93, 0.82, 0.85, 0.02, 0.24, 0.09, 0.79, 0.22, 0.89, 0.  ,
       0.08, 0.08, 0.25, 0.04, 1.  , 0.08, 1.  , 0.96, 0.85, 0.03, 0.92,
       0.78, 0.89, 0.  , 0.33, 0.82, 0.  , 1.  , 0.84, 1.  , 0.84, 0.86,
       1.  , 0.07, 0.92, 0.04, 0.  , 0.  , 1.  , 0.92, 0.96, 0.84, 0.  ,
       0.01, 0.99, 0.27, 0.84, 0.85, 0.78, 0.  , 0.83, 1.  , 0.18, 0.01,
       0.84, 0.12, 0.66, 0.21, 1.  , 1.  , 0.  , 0.77, 0.88, 0.99, 0.  ,
       0.78, 0.  , 0.86, 0.89, 1.  , 0.  , 0.17, 0.85, 0.84, 0.07, 1.  ,
       0.78, 1.  , 0.18, 0.  , 0.89, 0.13, 1.  , 0.8 , 0.92, 0.91, 0.9 ,
       0.24, 0.33, 0.85, 0.79, 0.9 , 0.  , 0.23, 0.
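For two classes whose probabilities sum to 1, keeping the second column and thresholding it at 0.5, rounding it to 0 decimals, or taking the argmax across both columns should give the same 0/1 decisions, apart from edge cases right at 0.5. A small check, using the first, third and ninth rows of the rounded output above:

```python
import numpy as np

# rows copied from the rounded predict() output above
probs = np.array([[0.22, 0.78],
                  [0.66, 0.34],
                  [0.49, 0.51]])

p_convert = probs[:, 1]                       # probability of class 1 only
by_argmax = np.argmax(probs, axis=1)          # index of the larger column
by_threshold = (p_convert > 0.5).astype(int)  # explicit 0.5 cut-off
by_rounding = p_convert.round(0).astype(int)  # round to 0 decimals

print(by_argmax)     # [1 0 1]
print(by_threshold)  # [1 0 1]
print(by_rounding)   # [1 0 1]
```

The threshold form is the one to adjust if the business prefers a stricter or looser cut-off than 0.5.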

In [16]:
# a much better approach here would be to use argmax (argument of the maximum),
# which returns the position of the highest value row-wise or column-wise
# here, we want to know which column has the highest probability, therefore we set axis=1
# with only two classes the second column alone is enough, but argmax also works for multiclass classification problems
np.argmax(model.predict(test_inputs), axis=1)

array([1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1,
       1, 0, 1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0,
       0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0,
       0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1,
       1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 0,
       1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0, 1,
       1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0,
       0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0, 0, 1,
       1, 1, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0,
       1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0,
       1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1,
       0, 0, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1,
       0, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0,
       0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0,
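The accuracy reported by `model.evaluate` corresponds to how often this argmax prediction matches the true target (with `sparse_categorical_crossentropy`, Keras resolves the string `'accuracy'` to sparse categorical accuracy). A minimal sketch with a hypothetical helper `accuracy_from_probs`, four rows taken from the output above, and made-up labels:

```python
import numpy as np

def accuracy_from_probs(probs, targets):
    # predicted class = column with the highest probability in each row
    predicted = np.argmax(probs, axis=1)
    # accuracy = share of rows where the predicted class equals the target
    return np.mean(predicted == targets)

probs = np.array([[0.22, 0.78],
                  [0.07, 0.93],
                  [0.66, 0.34],
                  [0.90, 0.10]])
targets = np.array([1, 1, 1, 0])  # hypothetical true labels
print(accuracy_from_probs(probs, targets))  # 0.75
```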

## Save the model

In [17]:
# we save the model using the built-in TensorFlow (Keras) method
# HDF5 is a file format designed for storing large numerical arrays
# the .h5 extension tells Keras to use the HDF5 format (version 5 of HDF);
# newer Keras versions recommend the native .keras format instead
model.save('audiobooks_model.h5')