# Audiobook Customer Retention Prediction Using Artificial Neural Network


## Project Title: Audiobook Customer Retention Prediction

## Problem Statement:

The goal of this project is to develop a machine learning model that predicts whether a customer of an audiobook app will make another purchase within the next 6 months. The dataset contains information on audiobook purchases, including customer engagement metrics over a 2-year period. The objective is to identify customers with a high probability of converting, so the company can focus its advertising more efficiently, and to uncover the key factors influencing customer retention.

## Project Overview:

The Audiobook company aims to optimize its marketing strategy by predicting customer behavior. The primary challenge is to distinguish customers likely to make a future purchase from those who are not. This project involves creating a classification algorithm with two classes: "will buy" (1) and "won't buy" (0). By leveraging customer data such as average book length, total minutes listened, support requests, and other features, the algorithm will provide insights into the key metrics influencing customer retention.

## Business Impact:

1. **Cost Savings:** By focusing marketing efforts on customers with a high likelihood of conversion, the company can maximize return on investment and minimize advertising expenses.

2. **Growth Opportunities:** Efficient targeting of potential customers increases the chances of customer conversion, contributing to business growth and creating new opportunities.

## Data Description:

The dataset includes the following customer-related features: Customer ID, Book length overall (sum of the lengths in minutes of all purchases), Book length avg (average length in minutes of all purchases), Price paid overall (sum of all purchases), Price paid avg (average of all purchases), Review (a Boolean indicating whether the customer left a review), Review out of 10 (the customer's rating out of 10, if a review was left), Total minutes listened, Completion (from 0 to 1), Support requests (number of support requests, covering everything from forgotten passwords to help with using the app), and Last visited minus purchase date (in days). The target variable is a Boolean indicating whether the customer will make another purchase within the next 6 months.

## Objective:

Build a robust machine learning algorithm capable of predicting customer behavior and aiding the Audiobook company in making informed decisions for customer retention.

## Approach:

Part 1. **Data Preprocessing:** Importing the data, splitting it into training, validation, and test sets, and feature scaling.

Part 2. **Building the Artificial Neural Network (ANN):** Initializing the ANN, adding the input layer and the first hidden layer, adding the second hidden layer, and adding the output layer.

Part 3. **Training the Artificial Neural Network (ANN):** Compiling the ANN, and training the ANN on the training set.

Part 4. **Making the Predictions and Evaluating the model:** Predicting the test set results, and making the confusion matrix.

## Expected Outcome:

The developed machine learning model will serve as a valuable tool for the Audiobook company, allowing them to focus resources on retaining customers likely to make future purchases. The project aims to contribute to cost savings, growth opportunities, and enhanced customer relationship management.

### Importing the libraries

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf

In [2]:
tf.__version__

'2.14.0'

## Part 1 - Data Preprocessing

### Importing the dataset

In [3]:
dataset = pd.read_csv('Audiobooks_data.csv')
X = dataset.iloc[:, 1:-1].values
y = dataset.iloc[:, -1].values

In [4]:
print(X)

[[1620.   1620.     19.73 ... 1603.8     5.     92.  ]
 [2160.   2160.      5.33 ...    0.      0.      0.  ]
 [2160.   2160.      5.33 ...    0.      0.    388.  ]
 ...
 [2160.   2160.      6.14 ...    0.      0.      0.  ]
 [1620.   1620.      5.33 ...  615.6     0.     90.  ]
 [1674.   3348.      5.33 ...    0.      0.      0.  ]]


In [5]:
print(y)

[0 0 0 ... 0 0 1]
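
The slicing used above drops the first column (the customer ID, which carries no predictive signal) and holds out the last column as the target. A minimal sketch on a toy frame illustrates this; the column names here are hypothetical and the real CSV's headers may differ.

```python
import pandas as pd

# Toy frame with hypothetical column names; not the real dataset.
toy = pd.DataFrame({
    "customer_id": [101, 102, 103],
    "minutes_listened": [1603.8, 0.0, 615.6],
    "support_requests": [5, 0, 0],
    "target": [0, 0, 1],
})

# Same slicing as in the notebook: skip the ID column, hold out the last column.
X_toy = toy.iloc[:, 1:-1].values
y_toy = toy.iloc[:, -1].values

print(X_toy.shape)  # (3, 2)
print(y_toy)        # [0 0 1]
```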


### Splitting the dataset into the Training set, Validation set and Test set

In [8]:
from sklearn.model_selection import train_test_split
X_train, temp_inputs, y_train, temp_targets = train_test_split(X, y, test_size = 0.2, random_state = 0)

validation_input, X_test, validation_target, y_test = train_test_split( temp_inputs, temp_targets, test_size=0.5, random_state=42)
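
The two calls above produce an 80/10/10 split: the first holds out 20% of the rows, and the second halves that held-out part into validation and test sets. The sketch below checks the resulting sizes on synthetic stand-in data (the array names and the 1000-row size are illustrative assumptions, not the real dataset).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 10 features, binary target.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 10))
y_demo = rng.integers(0, 2, size=1000)

# First split: 80% train, 20% held out.
X_tr, X_tmp, y_tr, y_tmp = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0)
# Second split: halve the held-out part into validation and test.
X_val, X_te, y_val, y_te = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42)

print(len(X_tr), len(X_val), len(X_te))  # 800 100 100
```

With an imbalanced target, passing `stratify=` to `train_test_split` is one option for keeping class proportions similar across the three sets; the notebook does not do this.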



### Feature Scaling

In [14]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
validation_input = sc.transform(validation_input)
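
Note that the scaler is fitted on the training set only and then reused for the validation and test sets, so no information from held-out data leaks into preprocessing. A small sketch on synthetic data (all names and values below are illustrative assumptions) shows the effect:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the training and test features.
rng = np.random.default_rng(1)
train = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
test = rng.normal(loc=50.0, scale=10.0, size=(50, 3))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train only
test_scaled = scaler.transform(test)        # reuses the training statistics

# Training features end up with zero mean and unit variance per column.
print(np.allclose(train_scaled.mean(axis=0), 0.0))  # True
print(np.allclose(train_scaled.std(axis=0), 1.0))   # True
```

The scaled test features are only approximately standardized, which is expected: they are transformed with statistics they did not contribute to.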

## Part 2 - Building the ANN

### Initializing the ANN

In [16]:
ann = tf.keras.models.Sequential()

### Adding the input layer and the first hidden layer

In [17]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

### Adding the second hidden layer

In [18]:
ann.add(tf.keras.layers.Dense(units=6, activation='relu'))

### Adding the output layer

In [19]:
ann.add(tf.keras.layers.Dense(units=1, activation='sigmoid'))
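
To make the architecture concrete, here is a rough NumPy sketch of the forward pass such a network computes: two ReLU layers of 6 units followed by a single sigmoid output. It assumes 10 input features (the ten non-ID columns listed in the Data Description); the weights are random placeholders, not the trained Keras weights.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_features = 10  # assumption: ten feature columns after dropping the ID
layer_sizes = [n_features, 6, 6, 1]

rng = np.random.default_rng(0)
# One (weights, bias) pair per Dense layer; random values stand in for trained ones.
params = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(x):
    """Apply ReLU on hidden layers and sigmoid on the output layer."""
    h = x
    for i, (w, b) in enumerate(params):
        z = h @ w + b
        h = sigmoid(z) if i == len(params) - 1 else relu(z)
    return h

# Total trainable parameters: (10*6 + 6) + (6*6 + 6) + (6*1 + 1).
n_params = sum(w.size + b.size for w, b in params)
print(n_params)  # 115

probs = forward(rng.normal(size=(4, n_features)))
print(probs.shape)  # (4, 1)
```

The sigmoid output lies between 0 and 1 and can be read as the predicted probability of the "will buy" class.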

## Part 3 - Training the ANN

### Compiling the ANN

In [20]:
ann.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])
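
Binary cross-entropy averages `-(y*log(p) + (1-y)*log(1-p))` over the samples, so confident wrong predictions are penalized heavily. The following is a minimal NumPy sketch of that formula, not the Keras implementation itself; the clipping constant `eps` and the example values are assumptions for illustration.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy; clipping avoids log(0)."""
    p = np.clip(p_pred, eps, 1.0 - eps)
    return float(np.mean(-(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
mostly_right = np.array([0.9, 0.1, 0.8, 0.2])  # made-up probabilities
mostly_wrong = np.array([0.1, 0.9, 0.2, 0.8])  # made-up probabilities

loss_right = binary_crossentropy(y_true, mostly_right)
loss_wrong = binary_crossentropy(y_true, mostly_wrong)
print(loss_right < loss_wrong)  # True
```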

### Training the ANN on the Training set

In [24]:
# Stop training early to limit overfitting.
# EarlyStopping monitors val_loss by default; patience=2 stops training once
# val_loss has failed to improve for two consecutive epochs.
# Keras invokes callbacks at fixed points during training, such as the end of each epoch.
early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)

# callbacks is passed as a list, the form documented for fit().
ann.fit(X_train, y_train, batch_size = 32, epochs = 100, callbacks = [early_stopping], validation_data = (validation_input, validation_target), verbose = 2)

Epoch 1/100
353/353 - 1s - loss: 0.2233 - accuracy: 0.9121 - val_loss: 0.2149 - val_accuracy: 0.9126 - 785ms/epoch - 2ms/step
Epoch 2/100
353/353 - 1s - loss: 0.2219 - accuracy: 0.9119 - val_loss: 0.2167 - val_accuracy: 0.9134 - 655ms/epoch - 2ms/step
Epoch 3/100
353/353 - 1s - loss: 0.2222 - accuracy: 0.9123 - val_loss: 0.2142 - val_accuracy: 0.9134 - 647ms/epoch - 2ms/step
Epoch 4/100
353/353 - 1s - loss: 0.2211 - accuracy: 0.9120 - val_loss: 0.2144 - val_accuracy: 0.9141 - 637ms/epoch - 2ms/step
Epoch 5/100
353/353 - 1s - loss: 0.2210 - accuracy: 0.9125 - val_loss: 0.2186 - val_accuracy: 0.9126 - 660ms/epoch - 2ms/step


<keras.src.callbacks.History at 0x7f7d2b78b040>
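
The log above stops at epoch 5 because the best validation loss (0.2142, epoch 3) was not beaten in the two epochs that followed. A simplified sketch of the patience logic reproduces this; it is an illustration only, and the real Keras callback has further options such as `min_delta` and `restore_best_weights`.

```python
def early_stop_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the patience counter.
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# val_loss values copied from the training log above.
logged_val_loss = [0.2149, 0.2167, 0.2142, 0.2144, 0.2186]
stop_epoch = early_stop_epoch(logged_val_loss, patience=2)
print(stop_epoch)  # 5
```

Since `restore_best_weights` is left at its default here, the model keeps the weights from the final epoch rather than those from epoch 3.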

## Part 4 - Making the predictions and evaluating the model

### Predicting the Test set results

In [25]:
y_pred = ann.predict(X_test)
y_pred = (y_pred > 0.5)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

[[0 0]
 [0 1]
 [0 0]
 ...
 [0 0]
 [0 0]
 [0 0]]
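
The 0.5 cutoff used above is a convention, not a requirement. The sketch below, using made-up probabilities and a hypothetical helper `to_labels`, shows how a lower threshold flags more customers as likely buyers, which typically raises recall at the cost of precision.

```python
import numpy as np

def to_labels(probs, threshold):
    """Convert an (n, 1) array of probabilities into a flat array of 0/1 labels."""
    return (probs > threshold).astype(int).ravel()

# Made-up probabilities with the (n, 1) shape that ann.predict returns.
probs = np.array([[0.03], [0.48], [0.51], [0.97]])

print(to_labels(probs, 0.5))  # [0 0 1 1]
print(to_labels(probs, 0.3))  # [0 1 1 1]
```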


### Making the Confusion Matrix

In [32]:
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
test_accuracy = accuracy_score(y_test, y_pred)

[[1162    3]
 [ 145   99]]


In [33]:
print('Test accuracy: {:.2f}%'.format(test_accuracy*100.))

Test accuracy: 89.50%
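
Accuracy can be recomputed directly from the confusion matrix printed above, and it is worth comparing against the trivial baseline of always predicting the majority class. The sketch below copies the matrix values from the output; in scikit-learn's convention rows are actual classes and columns are predicted classes.

```python
import numpy as np

# Confusion matrix copied from the output above.
cm = np.array([[1162, 3],
               [145, 99]])  # rows: actual 0/1, columns: predicted 0/1

accuracy = np.trace(cm) / cm.sum()
baseline = cm.sum(axis=1).max() / cm.sum()  # always predict the majority class

print(f"accuracy: {accuracy:.4f}")  # accuracy: 0.8950
print(f"baseline: {baseline:.4f}")  # baseline: 0.8268
```

On this test set the model improves on the majority-class baseline by roughly seven percentage points, which is a more informative reading than the raw accuracy alone.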


The model reaches a validation accuracy of 91.26% (final epoch) and a test accuracy of 89.50%.

In [34]:
# Calculate precision, recall, and F1 score
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the results
print(f'Precision: {precision:.4f}')
print(f'Recall: {recall:.4f}')
print(f'F1 Score: {f1:.4f}')


Precision: 0.9706
Recall: 0.4057
F1 Score: 0.5723
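
These scores follow directly from the confusion matrix. As a cross-check, the sketch below recomputes them by hand from the four cell counts reported earlier (values copied from the output, with class 1, "will buy", as the positive class).

```python
# Cell counts copied from the confusion matrix output.
tn, fp, fn, tp = 1162, 3, 145, 99

precision = tp / (tp + fp)  # share of predicted buyers who actually bought
recall = tp / (tp + fn)     # share of actual buyers the model found
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.4f}")  # Precision: 0.9706
print(f"Recall: {recall:.4f}")        # Recall: 0.4057
print(f"F1 Score: {f1:.4f}")          # F1 Score: 0.5723
```

Read together, the numbers say that when the model predicts a repeat purchase it is almost always right, but it misses about 59% of the customers who actually buy again (145 of 244).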
