# Scenario
A dataset from an audiobook app contains metrics for customers who have made at least one in-app purchase. The goal is to create a model that predicts whether a customer will buy again based on these metrics. With such a model the company can direct its advertising resources at customers who are likely to keep buying, rather than wasting them on alienated customers who are unlikely to buy anyway. This should improve sales and profitability.

The model is ultimately a predictor of human behaviour and, as a side effect, it will identify which metrics are the most important predictors.

# Resources
A .csv file is supplied. Each row is a customer with a unique ID. 

The first two metrics are Overall Book Length (the sum of the lengths of all purchased books) and Average Book Length (overall length divided by the number of books). The number of books purchased can therefore be derived from these two variables.
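
As a small illustration of that derivation, the sketch below recovers the number of books from the two book-length metrics. The function name `books_purchased` is hypothetical; the first sample value is taken from the first row of the dataset shown later, and the second is invented for illustration.

```python
def books_purchased(overall_length, average_length):
    """Recover the number of books from the two book-length metrics."""
    # average = overall / count, so count = overall / average
    return round(overall_length / average_length)


# A customer with 1620 minutes overall and a 1620-minute average bought one book.
print(books_purchased(1620, 1620.0))
# A hypothetical customer with 3240 minutes overall and the same average bought two.
print(books_purchased(3240, 1620.0))
```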

Overall Price (sum) and Average Price follow the same logic, in USD. Intuitively, price is almost always a good predictor, so we expect one of these to be an important metric.

Next is Review (boolean), which shows whether the customer has engaged with the company at least once by leaving a review. An engaged customer is more likely to return. Review 10/10 is the customer's average review score out of 10, a quantified measure of their attitude towards the content, platform or medium.

Total Minutes Listened is the cumulative playback time across all books. It serves as a measure of engagement: how much time the customer spends in the app. A customer who spends more time in the app is exposed to more content and is therefore more likely to buy again.

Completion shows the fraction of purchased content that was listened to (total minutes listened / total length of books). A shortcoming is that this assumes customers do not relisten to books. One could test for this, or add a metric that tracks replays.
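
Judging from the sample rows shown later, the Completion values in the file correspond to minutes listened divided by overall book length. A minimal sketch of that ratio follows; the function name `completion` is hypothetical.

```python
def completion(minutes_listened, overall_book_length):
    """Fraction of the purchased content that was played back."""
    if overall_book_length <= 0:
        raise ValueError("overall_book_length must be positive")
    return minutes_listened / overall_book_length


# First row of the dataset: 1603.8 minutes listened out of 1620 minutes purchased.
print(round(completion(1603.8, 1620), 2))
```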

Support Requests is the total number of requests the customer has opened. It is a measure of engagement, possibly of an inverse nature: more requests could mean the customer had a bad experience and will leave the platform altogether. Alternatively, the requests could reflect positive interaction that stimulates further engagement.

Last visited minus Purchase date is a measure of engagement (recency). The larger the value, the more recent the last visit is relative to the purchase.

## How was the data gathered?
Data was gathered from the app and covers 2 years of engagement. A further 6 months of data was used to check for customer conversion.

## Targets?
We will use supervised learning with a boolean target: 1 if conversion occurred and 0 if not. A conversion means that the customer purchased another book in the 6 months following the initial 2 years of data.

# Method
## Input data
The inputs should be scaled or standardized. The dataset should also be shuffled, balanced and batched.
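
To make the standardization step concrete, here is a minimal NumPy sketch of a z-score transform, the same kind of transform that `sklearn.preprocessing.scale` applies by default. The function name `standardize` and the toy values are assumptions for illustration only.

```python
import numpy as np


def standardize(features):
    """Rescale each column to zero mean and unit variance (z-scores)."""
    features = np.asarray(features, dtype=float)
    mean = features.mean(axis=0)
    std = features.std(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    std[std == 0] = 1.0
    return (features - mean) / std


# Hypothetical toy inputs: two features on very different scales.
toy = np.array([[1620.0, 19.73],
                [2160.0, 5.33],
                [1080.0, 6.55]])
scaled = standardize(toy)
print(scaled.mean(axis=0))  # close to 0 for every column
print(scaled.std(axis=0))   # close to 1 for every column
```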

## Forward propagation in hidden layers
Each neuron in a layer computes a linear combination of its inputs (a dot product with the weights, plus a bias), and the result is transformed non-linearly by an activation function before being forward propagated into the next layer. This is then repeated for n hidden layers.
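
The forward pass through the hidden layers can be sketched in plain NumPy as follows. This is an illustration only, not how TensorFlow implements it; `forward_hidden`, the random weights and the layer sizes are assumptions.

```python
import numpy as np


def relu(z):
    """Rectified linear unit: negative values are clipped to zero."""
    return np.maximum(z, 0.0)


def forward_hidden(x, layers):
    """Propagate a batch through hidden layers given (weights, bias) pairs."""
    activation = x
    for weights, bias in layers:
        # Linear step: dot product with the weights, plus the bias.
        z = activation @ weights + bias
        # Non-linear step: apply the activation function.
        activation = relu(z)
    return activation


rng = np.random.default_rng(0)
batch = rng.normal(size=(5, 10))                      # 5 samples, 10 features
layers = [(rng.normal(size=(10, 4)), np.zeros(4)),    # first hidden layer, width 4
          (rng.normal(size=(4, 4)), np.zeros(4))]     # second hidden layer, width 4
hidden = forward_hidden(batch, layers)
print(hidden.shape)
```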

## Output data
The output layer has two units, one per class (converted or non-converted), so the output vector is 1x2 in size. The final activation function will be a Softmax function, which turns the two outputs into class probabilities that sum to 1. The predicted class is the one with the higher probability.
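
A short NumPy sketch of the softmax step may help here. It shows how two raw outputs per sample become probabilities, and how a class is then picked with `argmax`. The logits are made-up values.

```python
import numpy as np


def softmax(logits):
    """Convert raw outputs into probabilities that sum to 1 per row."""
    # Subtracting the row maximum keeps the exponentials numerically stable.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


# Hypothetical raw outputs for two customers (columns: non-converted, converted).
logits = np.array([[2.0, 0.5],
                   [-1.0, 1.0]])
probabilities = softmax(logits)
predicted = probabilities.argmax(axis=-1)
print(probabilities)
print(predicted)
```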

## Back propagation
The Adaptive Moment Estimation (Adam) optimiser will be used. The loss will be determined using sparse categorical cross-entropy, since the targets are stored as integer class labels (0 or 1) rather than as 1-hot vectors.
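
For intuition, sparse categorical cross-entropy takes integer class labels and averages the negative log-probability that the model assigns to the true class. The NumPy sketch below is a simplified stand-in for the Keras loss, with made-up probabilities.

```python
import numpy as np


def sparse_categorical_crossentropy(labels, probabilities, eps=1e-7):
    """Mean negative log-probability assigned to the true class."""
    labels = np.asarray(labels, dtype=int)
    # Clip to avoid taking log(0).
    probabilities = np.clip(probabilities, eps, 1.0)
    # Pick, for each row, the probability of that row's true class.
    true_class_prob = probabilities[np.arange(len(labels)), labels]
    return float(-np.log(true_class_prob).mean())


labels = [0, 1, 1]  # integer class labels, not 1-hot vectors
probabilities = np.array([[0.9, 0.1],
                          [0.2, 0.8],
                          [0.5, 0.5]])
loss = sparse_categorical_crossentropy(labels, probabilities)
print(loss)
```

A model that always outputs 0.5 for both classes has a loss of ln 2, roughly 0.693, which is close to the first-epoch loss in the training log further down.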

## Steps
### Import packages
### Helper functions
### Load the data
### Preprocessing
#### Missing Values
#### Standardise or scale the data
#### Shuffle the data
#### Balance the data
#### Shuffle the data again
#### Split into training, validation and test sets
### Outline the model(s)
### Train the model(s)
### Test the model

### Import packages

In [1]:
# import data and table packages
import pandas as pd
import numpy as np

# import graphing tools


# import ML packages
from sklearn import preprocessing
import tensorflow as tf

### Helper functions

In [2]:
# Create the model constructs
def construct(output_length, optimizer='adam', mode='sequential', flatten_input=True,
              list_of_widths=None, list_of_depths=None, input_shape=(28, 28, 1)):
    # avoid mutable default arguments
    list_of_widths = [50] if list_of_widths is None else list_of_widths
    list_of_depths = [2] if list_of_depths is None else list_of_depths
    # dictionary of compiled models keyed by "<width>x<depth>" (avoids shadowing the builtin dict)
    models = {}
    # only the sequential mode is implemented
    if mode.lower() == 'sequential':
        # for every combination of width and depth
        for w in list_of_widths:
            for d in list_of_depths:
                # create a label for the dictionary
                label = f"{w}x{d}"
                print(f"Constructing a {label} model")
                # instantiate the model object
                model = tf.keras.Sequential()
                if flatten_input:
                    # input layer: flatten the tensor into a vector
                    model.add(tf.keras.layers.Flatten(input_shape=input_shape))
                # add d hidden layers of width w
                for _ in range(d):
                    model.add(tf.keras.layers.Dense(w, activation='relu'))
                # add the output layer
                model.add(tf.keras.layers.Dense(output_length, activation='softmax'))
                # choose the optimizer and loss functions
                # note: an optimizer instance passed in is shared by every model;
                # newer Keras versions may require a separate instance per model
                model.compile(optimizer=optimizer, loss='sparse_categorical_crossentropy', metrics=['accuracy'])
                # save in the dictionary
                models[label] = model
    return models

def shuffle(data):
    # save the column names
    cols = data.columns.to_list()
    # create an array of row positions 0..n-1
    shuffled_indices = np.arange(data.values.shape[0])
    # shuffle the positions randomly, in place
    np.random.shuffle(shuffled_indices)
    # reorder the rows and rebuild a DataFrame with the saved column names
    # note: .values converts all columns to a common dtype, so integer columns become floats here
    data = pd.DataFrame(data.values[shuffled_indices],columns=cols)
    return data

# a helper function to cast a value into tf.int64
def cast_to_tf_integer(x):
    # tf.cast leaves x unchanged if it already has the requested dtype,
    # so no type check is needed (type(x) is never tf.int64 itself)
    return tf.cast(x, tf.int64)

# a helper function to cast a value into tf.float32
def cast_to_tf_float(x):
    # tf.cast leaves x unchanged if it already has the requested dtype
    return tf.cast(x, tf.float32)

### Load the data

In [3]:
# load csv file into a dataframe
raw_data = pd.read_csv('Audiobooks_data.csv')
raw_data

Unnamed: 0,ID,Book length (mins)_avg,Book length (mins)_overall,Price_avg,Price_overall,Review,Review 10/10,Completion,Minutes_listened,Support Requests,Last visited minus Purchase date,Targets
0,994,1620.0,1620,19.73,19.73,1,10.0,0.99,1603.8,5,92,0
1,1143,2160.0,2160,5.33,5.33,0,,0.00,0.0,0,0,0
2,2059,2160.0,2160,5.33,5.33,0,,0.00,0.0,0,388,0
3,2882,1620.0,1620,5.96,5.96,0,,0.42,680.4,1,129,0
4,3342,2160.0,2160,5.33,5.33,0,,0.22,475.2,0,361,0
...,...,...,...,...,...,...,...,...,...,...,...,...
14079,28220,1620.0,1620,5.33,5.33,1,9.0,0.61,988.2,0,4,0
14080,28671,1080.0,1080,6.55,6.55,1,6.0,0.29,313.2,0,29,0
14081,31134,2160.0,2160,6.14,6.14,0,,0.00,0.0,0,0,0
14082,32832,1620.0,1620,5.33,5.33,1,8.0,0.38,615.6,0,90,0


In [4]:
raw_data.describe()

Unnamed: 0,ID,Book length (mins)_avg,Book length (mins)_overall,Price_avg,Price_overall,Review,Review 10/10,Completion,Minutes_listened,Support Requests,Last visited minus Purchase date,Targets
count,14084.0,14084.0,14084.0,14084.0,14084.0,14084.0,2468.0,14084.0,14084.0,14084.0,14084.0,14084.0
mean,16772.491551,1591.281685,1678.608634,7.103791,7.543805,0.16075,8.908829,0.125659,189.888983,0.070222,61.935033,0.158833
std,9691.807248,504.340663,654.838599,4.931673,5.560129,0.367313,1.537262,0.241206,371.08401,0.472157,88.207634,0.365533
min,2.0,216.0,216.0,3.86,3.86,0.0,1.0,0.0,0.0,0.0,0.0,0.0
25%,8368.0,1188.0,1188.0,5.33,5.33,0.0,8.0,0.0,0.0,0.0,0.0,0.0
50%,16711.5,1620.0,1620.0,5.95,6.07,0.0,10.0,0.0,0.0,0.0,11.0,0.0
75%,25187.25,2160.0,2160.0,8.0,8.0,0.0,10.0,0.13,194.4,0.0,105.0,0.0
max,33683.0,2160.0,7020.0,130.94,130.94,1.0,10.0,1.0,2160.0,30.0,464.0,1.0


### Preprocessing
#### Missing Values
The dataset contains missing values, specifically in the "Review 10/10" feature. These values must either be filled with a "status quo" value or the feature removed altogether. Let us opt for a "status quo" of 5, the midway point.
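
One way to confirm which columns contain missing values before filling them is to count the NaN entries per column. The sketch below uses a hypothetical four-row frame that mirrors two columns of the dataset, not the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature frame mirroring two columns of the dataset.
sample = pd.DataFrame({
    "Review": [1, 0, 0, 1],
    "Review 10/10": [10.0, np.nan, np.nan, 9.0],
})

# Number of missing entries per column.
missing_per_column = sample.isna().sum()
print(missing_per_column)

# Fill the missing scores with the chosen "status quo" value of 5.
filled = sample.fillna({"Review 10/10": 5})
print(filled)
```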

In [5]:
# fill NaN values with 5
raw_data['Review 10/10'] = raw_data['Review 10/10'].fillna(5)
raw_data

Unnamed: 0,ID,Book length (mins)_avg,Book length (mins)_overall,Price_avg,Price_overall,Review,Review 10/10,Completion,Minutes_listened,Support Requests,Last visited minus Purchase date,Targets
0,994,1620.0,1620,19.73,19.73,1,10.0,0.99,1603.8,5,92,0
1,1143,2160.0,2160,5.33,5.33,0,5.0,0.00,0.0,0,0,0
2,2059,2160.0,2160,5.33,5.33,0,5.0,0.00,0.0,0,388,0
3,2882,1620.0,1620,5.96,5.96,0,5.0,0.42,680.4,1,129,0
4,3342,2160.0,2160,5.33,5.33,0,5.0,0.22,475.2,0,361,0
...,...,...,...,...,...,...,...,...,...,...,...,...
14079,28220,1620.0,1620,5.33,5.33,1,9.0,0.61,988.2,0,4,0
14080,28671,1080.0,1080,6.55,6.55,1,6.0,0.29,313.2,0,29,0
14081,31134,2160.0,2160,6.14,6.14,0,5.0,0.00,0.0,0,0,0
14082,32832,1620.0,1620,5.33,5.33,1,8.0,0.38,615.6,0,90,0


#### Standardise or scale the data
Let us use sklearn to standardise the features

In [6]:
# standardise the features using sklearn
standardized_data = pd.DataFrame(preprocessing.scale(raw_data.iloc[:,1:-1]),columns=raw_data.columns[1:-1])
standardized_data['Targets'] = raw_data['Targets']
standardized_data['ID'] = raw_data['ID']
standardized_data

Unnamed: 0,Book length (mins)_avg,Book length (mins)_overall,Price_avg,Price_overall,Review,Review 10/10,Completion,Minutes_listened,Support Requests,Last visited minus Purchase date,Targets,ID
0,0.056944,-0.089504,2.560319,2.191789,2.284918,2.664739,3.583548,3.810353,10.441350,0.340855,0,994
1,1.127687,0.735156,-0.359686,-0.398171,-0.437653,-0.422996,-0.520980,-0.511732,-0.148730,-0.702175,0,1143
2,1.127687,0.735156,-0.359686,-0.398171,-0.437653,-0.422996,-0.520980,-0.511732,-0.148730,3.696693,0,2059
3,0.056944,-0.089504,-0.231936,-0.284861,-0.437653,-0.422996,1.220335,1.321880,1.969286,0.760335,0,2882
4,1.127687,0.735156,-0.359686,-0.398171,-0.437653,-0.422996,0.391137,0.768886,-0.148730,3.390586,0,3342
...,...,...,...,...,...,...,...,...,...,...,...,...
14079,0.056944,-0.089504,-0.359686,-0.398171,2.284918,2.047192,2.008072,2.151371,-0.148730,-0.656826,0,28220
14080,-1.013799,-0.914164,-0.112297,-0.178744,2.284918,0.194551,0.681356,0.332311,-0.148730,-0.373394,0,28671
14081,1.127687,0.735156,-0.195436,-0.252486,-0.437653,-0.422996,-0.520980,-0.511732,-0.148730,-0.702175,0,31134
14082,0.056944,-0.089504,-0.359686,-0.398171,2.284918,1.429645,1.054495,1.147250,-0.148730,0.318181,0,32832


#### Shuffle the data

In [7]:
shuffled_data = shuffle(standardized_data)
shuffled_data

Unnamed: 0,Book length (mins)_avg,Book length (mins)_overall,Price_avg,Price_overall,Review,Review 10/10,Completion,Minutes_listened,Support Requests,Last visited minus Purchase date,Targets,ID
0,-0.799650,-0.749232,-0.359686,-0.398171,-0.437653,-0.422996,-0.230761,-0.287624,-0.14873,-0.430080,0.0,29399.0
1,1.127687,0.735156,0.181732,0.082050,-0.437653,-0.422996,-0.272221,-0.162473,-0.14873,-0.566128,0.0,25751.0
2,1.127687,0.735156,-0.359686,-0.398171,2.284918,2.664739,2.920190,4.319690,-0.14873,1.814703,0.0,7814.0
3,-0.799650,-0.749232,-0.035241,-0.110398,-0.437653,-0.422996,-0.396601,-0.415686,-0.14873,1.043767,0.0,8686.0
4,-0.799650,-0.749232,-0.359686,-0.398171,-0.437653,-0.422996,-0.520980,-0.511732,-0.14873,-0.702175,0.0,20858.0
...,...,...,...,...,...,...,...,...,...,...,...,...
14079,-1.013799,-0.914164,-0.359686,-0.398171,2.284918,2.664739,0.225298,0.012157,-0.14873,-0.271358,0.0,22955.0
14080,-0.799650,-0.749232,-0.079852,-0.149967,-0.437653,-0.422996,1.013035,0.672839,-0.14873,-0.702175,0.0,22612.0
14081,-2.726987,-2.233620,-0.244102,-0.295652,2.284918,2.664739,-0.520980,-0.511732,-0.14873,-0.702175,0.0,4041.0
14082,1.127687,0.735156,0.613649,0.465149,-0.437653,-0.422996,0.308217,0.652466,-0.14873,1.361211,0.0,7670.0


#### Balance the data
Correct the class distribution in the dataset, aiming for equal proportions. A quick inspection of the data (counting the targets for each class) shows that far more customers did not convert, which skews the distribution. Since the model optimises for loss, this can bias it towards always predicting non-conversion instead of properly trying to make a prediction.

An easy solution is to keep all the converted customers and select the same number of non-converted customers.

In [8]:
# Count the conversion frequencies
conversion_frequencies = shuffled_data.groupby(by='Targets').count().loc[:,'ID']
conversion_frequencies

Targets
0.0    11847
1.0     2237
Name: ID, dtype: int64

In [9]:
#select the same amount of non-converted as converted instances
remove_indices = []
counter = 0
# for every index in the target dataframe
for i in range(len(shuffled_data)):
    # if the index is non-converted
    if shuffled_data['Targets'].iloc[i] == 0:
        # increase the counter
        counter += 1
        # if the non-converted limit is reached start listing the indices to be excluded
        if counter > conversion_frequencies.iloc[1]:
            remove_indices.append(i)
#remove the indices
balanced_data = shuffled_data.drop(remove_indices)
balanced_data

Unnamed: 0,Book length (mins)_avg,Book length (mins)_overall,Price_avg,Price_overall,Review,Review 10/10,Completion,Minutes_listened,Support Requests,Last visited minus Purchase date,Targets,ID
0,-0.799650,-0.749232,-0.359686,-0.398171,-0.437653,-0.422996,-0.230761,-0.287624,-0.148730,-0.430080,0.0,29399.0
1,1.127687,0.735156,0.181732,0.082050,-0.437653,-0.422996,-0.272221,-0.162473,-0.148730,-0.566128,0.0,25751.0
2,1.127687,0.735156,-0.359686,-0.398171,2.284918,2.664739,2.920190,4.319690,-0.148730,1.814703,0.0,7814.0
3,-0.799650,-0.749232,-0.035241,-0.110398,-0.437653,-0.422996,-0.396601,-0.415686,-0.148730,1.043767,0.0,8686.0
4,-0.799650,-0.749232,-0.359686,-0.398171,-0.437653,-0.422996,-0.520980,-0.511732,-0.148730,-0.702175,0.0,20858.0
...,...,...,...,...,...,...,...,...,...,...,...,...
14061,0.164019,2.549408,-0.087963,1.040695,-0.437653,-0.422996,-0.520980,-0.511732,-0.148730,-0.702175,1.0,7223.0
14066,1.127687,0.735156,0.552816,0.411191,-0.437653,-0.422996,-0.520980,-0.511732,-0.148730,0.386204,1.0,15292.0
14068,-0.371353,1.724748,0.613649,2.288912,2.284918,2.664739,-0.520980,-0.511732,-0.148730,-0.702175,1.0,20688.0
14076,-0.157204,6.672708,0.072231,4.013753,-0.437653,-0.422996,-0.520980,-0.511732,-0.148730,1.338537,1.0,25149.0
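
The loop above keeps the first non-converted rows it encounters. A vectorized alternative is to draw a random sample of equal size from each class. This is a different technique (random undersampling), sketched here on toy data; `balance_by_undersampling` and its `seed` parameter are hypothetical.

```python
import pandas as pd


def balance_by_undersampling(data, target_col="Targets", seed=0):
    """Randomly keep as many rows of each class as the smallest class has."""
    minority_size = data[target_col].value_counts().min()
    # Draw minority_size rows from each class.
    parts = [group.sample(n=minority_size, random_state=seed)
             for _, group in data.groupby(target_col)]
    # The result is grouped by class, so it still needs shuffling afterwards.
    return pd.concat(parts).reset_index(drop=True)


# Hypothetical toy data: seven non-converted and three converted customers.
toy = pd.DataFrame({"feature": range(10),
                    "Targets": [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]})
balanced = balance_by_undersampling(toy)
print(balanced["Targets"].value_counts())
```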


#### Shuffle the data again

In [10]:
# reuse the shuffle helper defined earlier instead of repeating its body
re_shuffled_data = shuffle(balanced_data)
re_shuffled_data

Unnamed: 0,Book length (mins)_avg,Book length (mins)_overall,Price_avg,Price_overall,Review,Review 10/10,Completion,Minutes_listened,Support Requests,Last visited minus Purchase date,Targets,ID
0,0.056944,-0.089504,0.181732,0.082050,-0.437653,-0.422996,-0.520980,-0.511732,-0.14873,-0.588803,0.0,20051.0
1,-0.799650,-0.749232,-0.359686,-0.398171,-0.437653,-0.422996,0.474057,0.256638,-0.14873,2.653662,0.0,26739.0
2,-1.870393,-1.573892,0.181732,0.082050,2.284918,2.664739,-0.520980,-0.511732,-0.14873,2.120810,1.0,708.0
3,-2.512839,-2.068688,0.181732,0.082050,-0.437653,-0.422996,-0.520980,-0.511732,-0.14873,-0.520779,0.0,24408.0
4,-0.799650,-0.749232,0.613649,0.465149,-0.437653,-0.422996,-0.520980,-0.511732,-0.14873,-0.702175,0.0,22581.0
...,...,...,...,...,...,...,...,...,...,...,...,...
4469,-0.799650,-0.749232,0.181732,0.082050,-0.437653,-0.422996,-0.520980,-0.511732,-0.14873,1.293188,1.0,24955.0
4470,1.127687,0.735156,-0.244102,-0.295652,-0.437653,-0.422996,-0.520980,-0.511732,-0.14873,-0.702175,0.0,10211.0
4471,1.127687,0.735156,0.613649,0.465149,-0.437653,-0.422996,-0.396601,-0.337103,-0.14873,-0.543453,0.0,1306.0
4472,0.056944,-0.089504,0.181732,0.082050,-0.437653,-0.422996,0.183838,0.230444,-0.14873,1.576620,0.0,2132.0


In [11]:
# check for balance
print("Sum Count Average")
print(np.sum(re_shuffled_data['Targets']), len(re_shuffled_data['Targets']), np.sum(re_shuffled_data['Targets'])/ len(re_shuffled_data['Targets']))

Sum Count Average
2237.0 4474 0.5


In [12]:
# count the balanced dataset
samples_count = re_shuffled_data.shape[0]

# determine the split by ratios
train_set_count = int(0.8*samples_count)
validation_set_count = int(0.1*samples_count)
test_set_count = validation_set_count

### Divide the dataset into training, validation and test sets
We are opting for an 80/10/10 split. Even though we have balanced the whole dataset, each individual set may still be unbalanced, and we should check for this. If the sets are unbalanced, we should shuffle again before taking each set.

In [13]:
# extract the train set
train_set = re_shuffled_data.iloc[:train_set_count]
# check for balance
print("Sum Count Average")
print(np.sum(train_set['Targets']), train_set_count, np.sum(train_set['Targets'])/train_set_count)

Sum Count Average
1791.0 3579 0.5004191114836547


In [14]:
# extract the validation set
validation_set = re_shuffled_data.iloc[train_set_count:train_set_count+validation_set_count]
# check for balance
print("Sum Count Average")
print(np.sum(validation_set['Targets']), validation_set_count, np.sum(validation_set['Targets'])/validation_set_count)

Sum Count Average
228.0 447 0.5100671140939598


In [15]:
# extract the test set
test_set = re_shuffled_data.iloc[train_set_count+validation_set_count:]
# check for balance
print("Sum Count Average")
print(np.sum(test_set['Targets']), test_set_count, np.sum(test_set['Targets'])/test_set_count)

Sum Count Average
218.0 447 0.48769574944071586
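
If the small imbalance between the sets were a concern, a stratified split is one option: it preserves the class proportions in every set. The sketch below uses scikit-learn's `train_test_split` on hypothetical toy data and is an alternative to the slicing above, not what this notebook ran.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical balanced toy data: 100 rows, half in each class.
features = np.arange(200, dtype=float).reshape(100, 2)
targets = np.array([0, 1] * 50)

# First carve off 80% for training, keeping the class proportions.
x_train, x_rest, y_train, y_rest = train_test_split(
    features, targets, train_size=0.8, stratify=targets, random_state=0)
# Then split the remaining 20% evenly into validation and test sets.
x_val, x_test, y_val, y_test = train_test_split(
    x_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0)

print(len(y_train), len(y_val), len(y_test))
print(y_train.mean(), y_val.mean(), y_test.mean())
```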


### Outline the model(s)
#### Model Type
Classification with two outcome classes: converted or non-converted.

#### Model design
##### Hyperparameters
Hidden Layers = 2, 4,
Hidden Layer Width = 4, 12,
Hidden Layer Activation Function = ReLU,
Output Layer Activation Function = Softmax,
Batch Size = 200,
Maximum Number of Epochs = 1000,
Early Stopping on val_loss to prevent overfitting, patience = 2, restore best weights = False

##### Parameters
Input Layer = 10,
Output Layer = 2,
Optimizer = 'Adam',
Learning Rate = 0.001,
Loss = 'sparse_categorical_crossentropy'
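
The early-stopping rule listed under the hyperparameters can be sketched in a few lines. This is a simplified illustration of the patience idea, not Keras's implementation; `stopping_epoch` and the loss values are made up.

```python
def stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            # Validation loss improved: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            # No improvement: stop once this has happened `patience` times in a row.
            wait += 1
            if wait >= patience:
                return epoch
    # The loss kept improving, so training runs for all epochs.
    return len(val_losses)


# Hypothetical validation losses: the best value is reached at epoch 2.
print(stopping_epoch([0.69, 0.65, 0.66, 0.67, 0.60]))
```

With a patience of 2, training stops after two consecutive epochs without improvement, so the later, lower loss at epoch 5 is never reached.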

In [16]:
#Parameters
input_layer_width = 10  # ten input features (ID and Targets are excluded)
output_layer_width = 2
custom_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)

In [17]:
#Hyperparameters
hidden_layer_widths = [4,12]
hidden_layer_depths = [2,4]
BATCH_SIZE = 200
max_epochs = 1000
early_stopping = tf.keras.callbacks.EarlyStopping(
                            monitor='val_loss',
                            min_delta=0,
                            patience=2,
                            verbose=0,
                            mode='min',
                            restore_best_weights=False)

In [18]:
# construct the models
model_dict = construct(optimizer=custom_optimizer,output_length = output_layer_width, mode='sequential', flatten_input=False,list_of_widths=hidden_layer_widths, list_of_depths=hidden_layer_depths)

Constructing a 4x2 model
Constructing a 4x4 model
Constructing a 12x2 model
Constructing a 12x4 model


### Train the models

In [19]:
# for each model in the dictionary
for label in model_dict:
    # fit the model
    print(f"#####################\n{label} model\n#####################")
    model_dict[label].fit(
        train_set.iloc[:,:-2].values,
        train_set.iloc[:,-2:-1].values,
        epochs=max_epochs,
        callbacks=[early_stopping],
        validation_data=(validation_set.iloc[:,:-2].values,validation_set.iloc[:,-2:-1].values),
        batch_size=BATCH_SIZE,
        verbose=2)
    print("\n")

#####################
4x2 model
#####################
Epoch 1/1000
18/18 - 0s - loss: 0.6937 - accuracy: 0.5130 - val_loss: 0.6937 - val_accuracy: 0.5526
Epoch 2/1000
18/18 - 0s - loss: 0.6895 - accuracy: 0.5569 - val_loss: 0.6899 - val_accuracy: 0.5727
Epoch 3/1000
18/18 - 0s - loss: 0.6860 - accuracy: 0.5781 - val_loss: 0.6865 - val_accuracy: 0.5772
Epoch 4/1000
18/18 - 0s - loss: 0.6825 - accuracy: 0.5915 - val_loss: 0.6827 - val_accuracy: 0.5884
Epoch 5/1000
18/18 - 0s - loss: 0.6784 - accuracy: 0.5951 - val_loss: 0.6784 - val_accuracy: 0.5951
Epoch 6/1000
18/18 - 0s - loss: 0.6734 - accuracy: 0.6125 - val_loss: 0.6734 - val_accuracy: 0.5996
Epoch 7/1000
18/18 - 0s - loss: 0.6672 - accuracy: 0.6343 - val_loss: 0.6676 - val_accuracy: 0.6130
Epoch 8/1000
18/18 - 0s - loss: 0.6600 - accuracy: 0.6471 - val_loss: 0.6604 - val_accuracy: 0.6219
Epoch 9/1000
18/18 - 0s - loss: 0.6512 - accuracy: 0.6544 - val_loss: 0.6523 - val_accuracy: 0.6488
Epoch 10/1000
18/18 - 0s - loss: 0.6417 - accu

Epoch 9/1000
18/18 - 0s - loss: 0.5815 - accuracy: 0.7547 - val_loss: 0.5658 - val_accuracy: 0.7562
Epoch 10/1000
18/18 - 0s - loss: 0.5504 - accuracy: 0.7698 - val_loss: 0.5383 - val_accuracy: 0.7808
Epoch 11/1000
18/18 - 0s - loss: 0.5219 - accuracy: 0.7776 - val_loss: 0.5140 - val_accuracy: 0.7785
Epoch 12/1000
18/18 - 0s - loss: 0.4971 - accuracy: 0.7854 - val_loss: 0.4993 - val_accuracy: 0.7673
Epoch 13/1000
18/18 - 0s - loss: 0.4779 - accuracy: 0.7932 - val_loss: 0.4777 - val_accuracy: 0.8009
Epoch 14/1000
18/18 - 0s - loss: 0.4639 - accuracy: 0.7952 - val_loss: 0.4665 - val_accuracy: 0.7942
Epoch 15/1000
18/18 - 0s - loss: 0.4567 - accuracy: 0.7927 - val_loss: 0.4577 - val_accuracy: 0.8009
Epoch 16/1000
18/18 - 0s - loss: 0.4503 - accuracy: 0.7888 - val_loss: 0.4546 - val_accuracy: 0.7785
Epoch 17/1000
18/18 - 0s - loss: 0.4402 - accuracy: 0.7918 - val_loss: 0.4487 - val_accuracy: 0.7830
Epoch 18/1000
18/18 - 0s - loss: 0.4377 - accuracy: 0.7907 - val_loss: 0.4389 - val_accuracy

### Testing the models

In [20]:
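import os

# directory in which the trained weights are saved
working_directory = os.getcwd()
# for each model in the dictionary
for label in model_dict:
    print(f'{label} model')
    # evaluate the model on the held-out test set
    test_loss, test_accuracy = model_dict[label].evaluate(test_set.iloc[:,:-2].values,test_set.iloc[:,-2:-1].values)
    print(working_directory)
    # build the path portably instead of hard-coding a Windows separator
    model_dict[label].save_weights(os.path.join(working_directory, f'{label}_model_trained'))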

4x2 model
C:\Users\pjjvn\Google Drive\Programming\audiobook_business_case
4x4 model
C:\Users\pjjvn\Google Drive\Programming\audiobook_business_case
12x2 model
C:\Users\pjjvn\Google Drive\Programming\audiobook_business_case
12x4 model
C:\Users\pjjvn\Google Drive\Programming\audiobook_business_case


# Insight
The models were trained on a shuffled dataset that was then balanced by excluding "extra" non-converted targets and retaining all converted targets. This was done to obtain an equal distribution of converted and non-converted targets. The data was shuffled again and then split into training, validation and test sets so that each set has a roughly equal distribution.

With the current designs, the models predict approximately 80-83% of customers' behaviour correctly, that is, roughly 8 out of 10 customers.