<a href="https://colab.research.google.com/github/jerome-keli/Deep_Learning/blob/main/Audiobook_Customer_Prediction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Predicting Customer Re-Purchases in an Audiobook App**

This project explores real customer data from an audiobook platform, covering only the audio versions of books. Each entry represents a customer who has made at least one purchase. The aim is to find patterns in their activity and use them to predict whether they are likely to purchase again within the next six months.

The motivation is straightforward: customers with a low probability of returning may not be worth targeting with advertising, while those more likely to buy again present opportunities for efficient marketing and higher returns. Beyond prediction, the model also highlights which factors most strongly influence customer retention, offering valuable guidance for growth strategies.

The dataset is provided as a .csv file and includes metrics such as:


* Book length – both average and total minutes of all purchases
* Price paid – average and total amounts spent
* Review data – whether a review was given and the score out of 10
* Engagement – total minutes listened and completion rate (0–1)
* Support requests – number of help interactions
* Last visit vs. last purchase – time in days since last transaction

Customer ID is included but serves only as an identifier, not a predictive factor.

For modelling, the past two years of activity form the input data, while the target is a binary variable indicating whether the customer made a purchase in the six months following that period. The problem is framed as a binary classification task:
* 0 – customer did not buy again
* 1 – customer made another purchase

By predicting customer re-purchase behavior, the work demonstrates how machine learning can support targeted marketing, reduce wasted spend, and reveal the key drivers of loyalty in the audiobook market.

### Import the libraries and load the data

In [10]:
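import numpy as np
from sklearn import preprocessing
import tensorflow as tf

# Load the CSV as a float array (the path assumes the file sits in Colab's /content directory)
raw_data = np.loadtxt('/content/Audiobooks_data.csv', delimiter=',')
# Drop the first column (customer ID) and the last column (target) to get the inputs
unscaled_inputs_all = raw_data[:, 1:-1]
# The last column is the binary re-purchase target
targets_all = raw_data[:, -1]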

Balancing Dataset

In [11]:
# Count the customers who bought again (target 1)
num_one_targets = int(np.sum(targets_all))
zero_targets_counter = 0
indices_to_remove = []
# Keep the first num_one_targets rows with target 0 and mark the rest for removal
for i in range(targets_all.shape[0]):
    if targets_all[i] == 0:
        zero_targets_counter += 1
        if zero_targets_counter > num_one_targets:
            indices_to_remove.append(i)

# Drop the surplus 0-target rows so both classes have equal priors
unscaled_inputs_equal_priors = np.delete(unscaled_inputs_all, indices_to_remove, axis=0)
targets_equal_priors = np.delete(targets_all, indices_to_remove, axis=0)
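
The loop above keeps every 1-target row and only the first `num_one_targets` 0-target rows. A vectorized sketch of the same idea, run on a small made-up array, might look like this (the names `balance_by_truncation`, `toy_inputs`, and `toy_targets` are illustrative and not part of the notebook):

```python
import numpy as np

def balance_by_truncation(inputs, targets):
    """Keep all 1-target rows and only the first equally many 0-target rows."""
    targets = np.asarray(targets)
    num_ones = int(np.sum(targets == 1))
    zero_positions = np.flatnonzero(targets == 0)
    # Zeros beyond the first `num_ones` are dropped, as in the loop above.
    indices_to_remove = zero_positions[num_ones:]
    balanced_inputs = np.delete(inputs, indices_to_remove, axis=0)
    balanced_targets = np.delete(targets, indices_to_remove, axis=0)
    return balanced_inputs, balanced_targets

# Made-up data: 6 customers, 2 of whom bought again.
toy_inputs = np.arange(12, dtype=float).reshape(6, 2)
toy_targets = np.array([0, 1, 0, 0, 1, 0])

balanced_inputs, balanced_targets = balance_by_truncation(toy_inputs, toy_targets)
print(balanced_targets)         # [0 1 0 1]
print(balanced_targets.mean())  # 0.5
```

A mean of 0.5 on the balanced targets is a quick check that both classes are equally represented.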

Scaling Dataset

In [12]:
# Standardize each feature column to zero mean and unit variance
scaled_inputs = preprocessing.scale(unscaled_inputs_equal_priors)
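
`preprocessing.scale` standardizes each column to zero mean and unit variance. A NumPy-only sketch of that transformation on made-up numbers is shown here (`standardize` and `toy` are illustrative names; columns with zero variance would need extra handling, which this sketch omits):

```python
import numpy as np

def standardize(x):
    """Column-wise (x - mean) / std, using the population standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean(axis=0)) / x.std(axis=0)

# Made-up features on very different scales (e.g. minutes and price).
toy = np.array([[100.0, 5.0],
                [200.0, 7.0],
                [300.0, 9.0]])
scaled = standardize(toy)
print(scaled.mean(axis=0))  # approximately 0 for every column
print(scaled.std(axis=0))   # approximately 1 for every column
```

Note that the notebook computes these statistics over the whole balanced dataset before splitting; computing them on the training split only would be a stricter alternative.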

Shuffling Dataset

In [13]:
# Shuffle a list of row indices, then apply it to inputs and targets alike
shuffled_indices = np.arange(scaled_inputs.shape[0])
np.random.shuffle(shuffled_indices)

shuffled_inputs = scaled_inputs[shuffled_indices]
shuffled_targets = targets_equal_priors[shuffled_indices]
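
Indexing inputs and targets with the same index array is what keeps each row paired with its label. A small sketch with a seeded generator for reproducibility follows (the seed value 42 and the toy arrays are assumptions for illustration; the notebook itself shuffles without a fixed seed):

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, chosen arbitrarily for this sketch

# Made-up data where each target equals its row's first feature.
toy_inputs = np.array([[0.0, 10.0], [1.0, 11.0], [2.0, 12.0], [3.0, 13.0]])
toy_targets = np.array([0, 1, 2, 3])

shuffled_indices = rng.permutation(toy_inputs.shape[0])
shuffled_inputs = toy_inputs[shuffled_indices]
shuffled_targets = toy_targets[shuffled_indices]

# Rows and labels moved together, so the pairing survives the shuffle.
print(np.array_equal(shuffled_inputs[:, 0], shuffled_targets))  # True
```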

Splitting it into Train/Validation/Test

In [14]:
samples_count = shuffled_inputs.shape[0]

# 80% train, 10% validation; the test set takes whatever rows remain
train_samples_count = int(0.8 * samples_count)
validation_samples_count = int(0.1 * samples_count)
test_samples_count = samples_count - train_samples_count - validation_samples_count

# Slice the shuffled arrays into three consecutive, non-overlapping blocks
train_inputs = shuffled_inputs[:train_samples_count]
train_targets = shuffled_targets[:train_samples_count]

validation_inputs = shuffled_inputs[train_samples_count:train_samples_count + validation_samples_count]
validation_targets = shuffled_targets[train_samples_count:train_samples_count + validation_samples_count]

test_inputs = shuffled_inputs[train_samples_count + validation_samples_count:]
test_targets = shuffled_targets[train_samples_count + validation_samples_count:]
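
Because `int()` truncates, the test set absorbs any rows left over after the 80% and 10% cuts. A sketch of the size arithmetic is given here (`split_sizes` is a hypothetical helper, and 1005 is a made-up sample count, not the size of this dataset):

```python
def split_sizes(samples_count, train_fraction=0.8, validation_fraction=0.1):
    """Return (train, validation, test) row counts for the given fractions."""
    train = int(train_fraction * samples_count)
    validation = int(validation_fraction * samples_count)
    # Whatever is left after truncation goes to the test set.
    test = samples_count - train - validation
    return train, validation, test

train_n, validation_n, test_n = split_sizes(1005)
print(train_n, validation_n, test_n)  # 804 100 101
```

The three counts always sum to the original sample count, so no row is dropped or used twice.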

Saving and Reloading the Splits as .npz Files

In [15]:
# Save each split as an .npz archive in the working directory (np.savez appends the .npz extension)
np.savez('Audiobooks_data_train', inputs=train_inputs, targets=train_targets)
np.savez('Audiobooks_data_validation', inputs=validation_inputs, targets=validation_targets)
np.savez('Audiobooks_data_test', inputs=test_inputs, targets=test_targets)

In [16]:
# Reload from the same relative paths used for saving; targets become integer class labels
npz = np.load('Audiobooks_data_train.npz')
train_inputs, train_targets = npz['inputs'].astype(float), npz['targets'].astype(int)

npz = np.load('Audiobooks_data_validation.npz')
validation_inputs, validation_targets = npz['inputs'].astype(float), npz['targets'].astype(int)

npz = np.load('Audiobooks_data_test.npz')
test_inputs, test_targets = npz['inputs'].astype(float), npz['targets'].astype(int)
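
`np.savez` stores several named arrays in one `.npz` archive, and `np.load` returns them by key. A sketch of the round trip using an in-memory buffer instead of a file on disk follows (the buffer and the toy arrays are assumptions made so the example leaves no files behind):

```python
import io
import numpy as np

toy_inputs = np.array([[0.5, -1.2], [1.5, 0.3]])
toy_targets = np.array([0.0, 1.0])  # stored as floats, like targets read from a CSV

buffer = io.BytesIO()
np.savez(buffer, inputs=toy_inputs, targets=toy_targets)
buffer.seek(0)  # rewind before reading

npz = np.load(buffer)
loaded_inputs = npz['inputs'].astype(float)
loaded_targets = npz['targets'].astype(int)  # integer class labels for the sparse loss

print(loaded_targets.dtype.kind)  # i (an integer type)
```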

### Model
This cell outlines the model, sets the optimizer and loss, and trains with early stopping.

In [17]:
input_size = 10  # there are 10 predictors (kept for reference; Keras infers the input shape from the data)
output_size = 2  # there are 2 classes
hidden_layer_size = 50

# Two hidden ReLU layers followed by a softmax output over the two classes
model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])

# Sparse categorical cross-entropy accepts the integer class labels (0 or 1) directly
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

batch_size = 100

max_epochs = 100

# Stop once the validation loss has not improved for 2 consecutive epochs
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)

model.fit(train_inputs,
          train_targets,
          batch_size=batch_size,
          epochs=max_epochs,
          callbacks=[early_stopping],
          validation_data=(validation_inputs, validation_targets),
          verbose=2)

Epoch 1/100
36/36 - 1s - 42ms/step - accuracy: 0.7002 - loss: 0.5771 - val_accuracy: 0.7338 - val_loss: 0.5103
Epoch 2/100
36/36 - 0s - 4ms/step - accuracy: 0.7675 - loss: 0.4643 - val_accuracy: 0.7673 - val_loss: 0.4377
Epoch 3/100
36/36 - 0s - 8ms/step - accuracy: 0.7846 - loss: 0.4140 - val_accuracy: 0.7919 - val_loss: 0.3949
Epoch 4/100
36/36 - 0s - 8ms/step - accuracy: 0.7893 - loss: 0.3907 - val_accuracy: 0.8166 - val_loss: 0.3836
Epoch 5/100
36/36 - 0s - 5ms/step - accuracy: 0.8008 - loss: 0.3789 - val_accuracy: 0.8031 - val_loss: 0.3713
Epoch 6/100
36/36 - 0s - 8ms/step - accuracy: 0.8027 - loss: 0.3675 - val_accuracy: 0.7919 - val_loss: 0.3696
Epoch 7/100
36/36 - 0s - 4ms/step - accuracy: 0.8011 - loss: 0.3665 - val_accuracy: 0.8166 - val_loss: 0.3518
Epoch 8/100
36/36 - 0s - 8ms/step - accuracy: 0.8111 - loss: 0.3565 - val_accuracy: 0.8210 - val_loss: 0.3495
Epoch 9/100
36/36 - 0s - 9ms/step - accuracy: 0.8061 - loss: 0.3528 - val_accuracy: 0.8098 - val_loss: 0.3495
Epoch 10/

<keras.src.callbacks.history.History at 0x7c46c842a990>
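
`EarlyStopping(patience=2)` monitors validation loss by default and halts training once it has gone two consecutive epochs without improving. A simplified plain-Python sketch of that rule is shown here (`stopping_epoch` is a hypothetical helper, the loss values are made up, and Keras' actual callback has further options such as `min_delta` and `restore_best_weights` that are not modelled):

```python
def stopping_epoch(val_losses, patience=2):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float('inf')
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the patience counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

made_up_losses = [0.60, 0.50, 0.45, 0.46, 0.47, 0.40]
print(stopping_epoch(made_up_losses))  # 5
```

In this made-up run the loss would have improved again at epoch 6, which illustrates the trade-off of a small patience value: training stops quickly, but a later improvement can be missed.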

Test the model

In [18]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)

14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - accuracy: 0.8459 - loss: 0.3004
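
The reported accuracy is the share of test customers whose higher-probability class matches the target. A NumPy sketch of that computation on made-up softmax outputs follows (`toy_probabilities` and `toy_targets` are illustrative values, not output from this model):

```python
import numpy as np

# Each row is a made-up softmax output: [P(class 0), P(class 1)].
toy_probabilities = np.array([[0.9, 0.1],
                              [0.3, 0.7],
                              [0.6, 0.4],
                              [0.2, 0.8]])
toy_targets = np.array([0, 1, 1, 1])

# The predicted class is the index of the larger probability in each row.
predicted_classes = np.argmax(toy_probabilities, axis=1)
accuracy = np.mean(predicted_classes == toy_targets)
print(predicted_classes)  # [0 1 0 1]
print(accuracy)           # 0.75
```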
