# Project 1: Creating a Model for an Audiobook App Using Deep Learning

## Objective
Create a neural network model that predicts whether a customer will buy again.

## Summary

The data comes from an audiobook app and describes the purchasing behaviour of its customers. Each customer in the dataset has made at least one purchase. Based on this data, we will create a model that predicts whether a customer will buy again from the audiobook company. The main idea is that the company should not spend its advertising budget on customers who are unlikely to purchase again.

## Data info
##### The column names are not included in the dataset because we want no text in the data while training the model.
##### Each row represents a person.
##### 1st column - Customer ID,
##### 2nd column - Overall Book Length (the sum of the lengths of all purchases, in minutes),
##### 3rd column - Average Book Length (the sum divided by the number of purchases, in minutes),
##### 4th column - Overall price paid,
##### 5th column - Average price paid,
##### 6th column - Review (1 = left a review, 0 = didn't leave a review),
##### 7th column - Review 10/10 (the customer's review score on a scale from 1 to 10),
##### 8th column - Total minutes listened,
##### 9th column - Completion (total minutes listened / Overall Book Length),
##### 10th column - Support Requests (total number of support requests made),
##### 11th column - Last visited minus Purchase date,
##### 12th column - Targets (1 = the customer bought again, 0 = the customer didn't buy again).
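
Because the CSV has no header row, it can help to keep the column layout in code. The sketch below is only an illustration: the labels in `COLUMN_NAMES` are hypothetical names derived from the list above, and the two sample rows are made-up values, not rows from the dataset. It shows the slicing that separates the ten input columns from the customer ID and the target.

```python
import numpy as np

# Hypothetical labels for the 12 columns described above (not part of the CSV).
COLUMN_NAMES = [
    "customer_id", "book_length_overall", "book_length_avg",
    "price_overall", "price_avg", "review", "review_10_10",
    "minutes_listened", "completion", "support_requests",
    "last_visited_minus_purchase_date", "target",
]

# Two made-up rows, used only to demonstrate the slicing.
sample = np.array([
    [1, 1200, 600, 20.0, 10.0, 1, 9, 600, 0.5, 2, 30, 0],
    [2, 900, 900, 8.0, 8.0, 0, 7, 0, 0.0, 0, 0, 1],
], dtype=float)

inputs = sample[:, 1:-1]   # drop the customer ID (first) and the target (last)
targets = sample[:, -1]    # the last column is the target

input_names = COLUMN_NAMES[1:-1]
print(len(input_names), inputs.shape, targets)
```

Dropping the customer ID matters because it is an arbitrary identifier and carries no information about purchasing behaviour.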

# IMPORT RELEVANT LIBRARIES

In [1]:
import numpy as np
from sklearn import preprocessing

# Raw string so the backslash in the Windows path is not treated as an escape character
raw_csv_data = np.loadtxt(r'D:\original.csv', delimiter=',')

unscaled_inputs_all = raw_csv_data[:,1:-1]
targets_all = raw_csv_data[:,-1]

# BALANCING THE DATA

In [2]:
# Count the customers who bought again (target = 1)
num_one_targets = int(np.sum(targets_all))
zero_targets_counter = 0
indices_to_remove = []

# Keep only as many 0-targets as there are 1-targets; mark the surplus for removal
for i in range(targets_all.shape[0]):
    if targets_all[i] == 0:
        zero_targets_counter += 1
        if zero_targets_counter > num_one_targets:
            indices_to_remove.append(i)

unscaled_inputs_equal_priors = np.delete(unscaled_inputs_all, indices_to_remove, axis=0)
targets_equals_priors = np.delete(targets_all, indices_to_remove, axis=0)
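
The same balancing rule can also be written without an explicit loop. The following is a minimal sketch, assuming the rule is "keep every row with target 1 and only the first N rows with target 0, where N is the number of 1-targets"; the helper name `balance_by_undersampling` and the toy arrays are hypothetical and not part of the original notebook.

```python
import numpy as np

def balance_by_undersampling(inputs, targets):
    """Keep all target-1 rows and the first N target-0 rows (N = number of ones)."""
    targets = np.asarray(targets)
    num_ones = int(np.sum(targets == 1))
    zero_positions = np.flatnonzero(targets == 0)
    # Zeros beyond the first `num_ones` are the surplus to drop.
    surplus = zero_positions[num_ones:]
    keep = np.ones(targets.shape[0], dtype=bool)
    keep[surplus] = False
    return inputs[keep], targets[keep]

# Toy data: 6 zeros and 2 ones, so 4 zeros should be dropped.
toy_inputs = np.arange(16).reshape(8, 2)
toy_targets = np.array([0, 0, 1, 0, 0, 1, 0, 0])

balanced_inputs, balanced_targets = balance_by_undersampling(toy_inputs, toy_targets)
print(balanced_targets)  # [0 0 1 1]
```

With equal priors, a model that always predicts the majority class scores only about 50% accuracy, so any accuracy above that reflects something learned from the inputs.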

# STANDARDIZE THE INPUTS

In [3]:
scaled_inputs = preprocessing.scale(unscaled_inputs_equal_priors)
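
To make the standardization step concrete, here is a minimal NumPy sketch of what column-wise standardization computes: each column has its mean subtracted and is divided by its standard deviation. The helper `standardize_columns` and the toy array are hypothetical; the sketch is meant to illustrate the idea behind `preprocessing.scale` with its default settings, not to replace it.

```python
import numpy as np

def standardize_columns(x):
    """Subtract each column's mean and divide by its standard deviation."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)      # population standard deviation (ddof=0)
    std[std == 0] = 1.0      # leave constant columns unscaled instead of dividing by zero
    return (x - mean) / std

# Toy data: two columns on very different scales.
toy = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
scaled = standardize_columns(toy)
print(scaled.mean(axis=0), scaled.std(axis=0))
```

After scaling, columns such as price and minutes listened are on comparable scales, which generally helps gradient-based training.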

# SHUFFLE THE DATA

In [4]:
shuffled_indices = np.arange(scaled_inputs.shape[0])
np.random.shuffle(shuffled_indices)

shuffled_inputs = scaled_inputs[shuffled_indices]
shuffled_targets = targets_equals_priors[shuffled_indices]
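
The shuffle above gives a different order on every run. If reproducible splits are wanted, one option is a seeded generator, sketched below. The helper `shuffle_together` and the `seed` value are assumptions for illustration and do not appear in the original notebook.

```python
import numpy as np

def shuffle_together(inputs, targets, seed=None):
    """Apply one random permutation to both arrays so rows stay aligned."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(inputs.shape[0])
    return inputs[order], targets[order]

# Toy data in which row i of the inputs starts with 2 * targets[i].
toy_inputs = np.arange(10).reshape(5, 2)
toy_targets = np.array([0, 1, 2, 3, 4])

a_inputs, a_targets = shuffle_together(toy_inputs, toy_targets, seed=42)
b_inputs, b_targets = shuffle_together(toy_inputs, toy_targets, seed=42)

# Same seed, same order; inputs and targets remain paired.
print(np.array_equal(a_targets, b_targets))
```

Indexing both arrays with the same permutation is what keeps each input row attached to its target.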

# SPLIT DATA INTO TRAIN, VALIDATION & TEST

In [5]:
samples_count = shuffled_inputs.shape[0]

train_samples_count = int(0.8*samples_count)
validation_samples_count = int(0.1*samples_count)
test_samples_count = samples_count - train_samples_count - validation_samples_count

train_inputs = shuffled_inputs[:train_samples_count]
train_targets = shuffled_targets[:train_samples_count]

validation_inputs = shuffled_inputs[train_samples_count:train_samples_count+validation_samples_count]
validation_targets = shuffled_targets[train_samples_count:train_samples_count+validation_samples_count]

test_inputs = shuffled_inputs[train_samples_count+validation_samples_count:]
test_targets = shuffled_targets[train_samples_count+validation_samples_count:]
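
As a quick check of the 80/10/10 arithmetic, the sketch below recomputes the three subset sizes. The helper `split_counts` is hypothetical, and the sample count 4474 is not stated in the source; it is inferred from the training log in this notebook, which reports 3579 training and 447 validation samples.

```python
def split_counts(samples_count, train_fraction=0.8, validation_fraction=0.1):
    """Return (train, validation, test) sizes; the test set takes the remainder."""
    train = int(train_fraction * samples_count)
    validation = int(validation_fraction * samples_count)
    test = samples_count - train - validation
    return train, validation, test

# 4474 is an inferred value (see the lead-in), not a figure from the dataset description.
train_n, validation_n, test_n = split_counts(4474)
print(train_n, validation_n, test_n)  # 3579 447 448
```

Giving the remainder to the test set ensures that no sample is lost to rounding.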


# SAVE THE THREE DATASETS AS .NPZ FILES

In [6]:
np.savez('Audiobooks_data_train',inputs=train_inputs, targets= train_targets)
np.savez('Audiobooks_data_validation',inputs=validation_inputs, targets= validation_targets)
np.savez('Audiobooks_data_test',inputs=test_inputs, targets= test_targets)
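
The round trip through `.npz` files can be sketched on toy data as follows. The file name `toy_data` and the arrays are hypothetical, and a temporary directory is used so the example leaves no files behind. Note that `np.savez` appends the `.npz` extension when the given name lacks one.

```python
import os
import tempfile

import numpy as np

inputs = np.array([[0.1, 0.2], [0.3, 0.4]])
targets = np.array([0, 1])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_data")   # saved as toy_data.npz
    np.savez(path, inputs=inputs, targets=targets)

    # The keyword names used when saving become the keys when loading.
    with np.load(path + ".npz") as npz:
        loaded_inputs = npz["inputs"].astype(float)
        loaded_targets = npz["targets"].astype(int)

print(loaded_inputs.shape, loaded_targets)
```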

# CREATE THE MODEL

In [7]:
import tensorflow as tf

## DATA

In [8]:
npz = np.load('Audiobooks_data_train.npz')

# np.float and np.int were removed in NumPy 1.24; the built-in float and int work in all versions
train_inputs = npz['inputs'].astype(float)
train_targets = npz['targets'].astype(int)

npz = np.load('Audiobooks_data_validation.npz')
validation_inputs, validation_targets = npz['inputs'].astype(float), npz['targets'].astype(int)

npz = np.load('Audiobooks_data_test.npz')
test_inputs, test_targets = npz['inputs'].astype(float), npz['targets'].astype(int)

# MODEL

## OUTLINE, OPTIMIZERS, LOSS, EARLY STOPPING & TRAINING

In [9]:
input_size = 10
output_size = 2
hidden_layer_size = 50

model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'),
    tf.keras.layers.Dense(output_size, activation='softmax')
])


model.compile(optimizer='adam', loss = 'sparse_categorical_crossentropy', metrics = ['accuracy'])

batch_size = 100

max_epochs = 100

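# Defaults: monitor='val_loss', patience=0, so training stops the first time val_loss fails to improve
early_stopping = tf.keras.callbacks.EarlyStopping()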

model.fit(train_inputs,
          train_targets,
          batch_size = batch_size,
          epochs = max_epochs,
          callbacks =[early_stopping],
          validation_data =(validation_inputs,validation_targets),
          verbose=2)



Train on 3579 samples, validate on 447 samples
Epoch 1/100
3579/3579 - 1s - loss: 0.5892 - accuracy: 0.6672 - val_loss: 0.5071 - val_accuracy: 0.7315
Epoch 2/100
3579/3579 - 0s - loss: 0.4585 - accuracy: 0.7639 - val_loss: 0.4427 - val_accuracy: 0.7651
Epoch 3/100
3579/3579 - 0s - loss: 0.4097 - accuracy: 0.7877 - val_loss: 0.4090 - val_accuracy: 0.7942
Epoch 4/100
3579/3579 - 0s - loss: 0.3847 - accuracy: 0.7949 - val_loss: 0.3946 - val_accuracy: 0.7987
Epoch 5/100
3579/3579 - 0s - loss: 0.3690 - accuracy: 0.8058 - val_loss: 0.3797 - val_accuracy: 0.8143
Epoch 6/100
3579/3579 - 0s - loss: 0.3581 - accuracy: 0.8139 - val_loss: 0.3720 - val_accuracy: 0.8166
Epoch 7/100
3579/3579 - 0s - loss: 0.3516 - accuracy: 0.8072 - val_loss: 0.3726 - val_accuracy: 0.8233


<tensorflow.python.keras.callbacks.History at 0x17971980888>
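
The log above ends at epoch 7 because the validation loss rose from 0.3720 to 0.3726. The sketch below replays the logged `val_loss` values through a simplified early stopping rule. It is a plain Python illustration, not Keras's actual implementation, and the helper `stopping_epoch` is hypothetical. The `patience=2` call is likewise a hypothetical setting that the notebook does not use.

```python
def stopping_epoch(val_losses, patience=0):
    """Return the 1-based epoch at which training would stop, or None.

    Simplified rule: count consecutive epochs without improvement over the
    best validation loss so far, and stop once that count reaches `patience`
    (with patience=0, the first non-improving epoch stops training).
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Validation losses copied from the training log above.
logged_val_losses = [0.5071, 0.4427, 0.4090, 0.3946, 0.3797, 0.3720, 0.3726]

print(stopping_epoch(logged_val_losses))              # 7
print(stopping_epoch(logged_val_losses, patience=2))  # None
```

Under this rule, a larger patience would have let training continue past epoch 7, at the cost of possibly running longer before stopping.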

# TEST THE MODEL

In [10]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)
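
To show what the two returned numbers measure, here is a minimal NumPy sketch of sparse categorical cross-entropy and accuracy computed from softmax outputs. The helper names and the toy probabilities are made up for illustration. The sketch also omits details of the Keras implementation, such as clipping probabilities away from zero.

```python
import numpy as np

def sparse_categorical_crossentropy(probabilities, labels):
    """Mean negative log-probability assigned to the true class."""
    probabilities = np.asarray(probabilities, dtype=float)
    labels = np.asarray(labels, dtype=int)
    true_class_probs = probabilities[np.arange(labels.shape[0]), labels]
    return float(-np.mean(np.log(true_class_probs)))

def accuracy(probabilities, labels):
    """Fraction of rows whose most probable class equals the label."""
    predicted = np.argmax(probabilities, axis=1)
    return float(np.mean(predicted == np.asarray(labels)))

# Made-up softmax outputs for four customers (columns: class 0, class 1).
toy_probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
toy_labels = np.array([0, 1, 1, 1])

toy_loss = sparse_categorical_crossentropy(toy_probs, toy_labels)
toy_accuracy = accuracy(toy_probs, toy_labels)
print(round(toy_loss, 4), toy_accuracy)  # 0.4004 0.75
```

The labels are integer class indices rather than one-hot vectors, which is why the "sparse" variant of the loss fits the targets used here.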

