# Notebook Description

<div style="text-align: justify">
The notebook 'Audiobooks.ipynb' documents my hands-on attempt to build a neural network model that predicts whether a person will repurchase an audiobook. Because my version of the network could not easily beat the baseline, I copied the instructor's code into a new notebook to run it and test its results. Finally, I compare the instructor's results with my own to see how my approach performs.
</div>

In [None]:
import sys
sys.executable  # Show the Python executable path to confirm the correct environment is active

<div style="text-align: justify">
This is the last cell in which I intervene before the instructor's code. The cells below present the instructor's point of view. I have included two different data files in this project to ensure that the code and files match those the instructor used, so that the results are tested without bias. My only change is that I deleted all of the instructor's comments, which do not affect the results.
</div>

# Here Starts the Instructor's Point of View:

In [353]:
import numpy as np
from sklearn import preprocessing

raw_csv_data = np.loadtxt('Audiobooks_data.csv',delimiter=',')

unscaled_inputs_all = raw_csv_data[:,1:-1]

targets_all = raw_csv_data[:,-1]

In [354]:
num_one_targets = int(np.sum(targets_all))

zero_targets_counter = 0

indices_to_remove = []

for i in range(targets_all.shape[0]):
    if targets_all[i] == 0:
        zero_targets_counter += 1
        if zero_targets_counter > num_one_targets:
            indices_to_remove.append(i)
            
unscaled_inputs_equal_priors = np.delete(unscaled_inputs_all, indices_to_remove, axis=0)
targets_equal_priors = np.delete(targets_all, indices_to_remove, axis=0)

In [355]:
scaled_inputs = preprocessing.scale(unscaled_inputs_equal_priors)

In [356]:
shuffled_indices = np.arange(scaled_inputs.shape[0])
np.random.shuffle(shuffled_indices)

shuffled_inputs = scaled_inputs[shuffled_indices]
shuffled_targets = targets_equal_priors[shuffled_indices]

In [357]:
samples_count = shuffled_inputs.shape[0]

train_samples_count = int(0.8 * samples_count)
validation_samples_count = int(0.1 * samples_count)

test_samples_count = samples_count - train_samples_count - validation_samples_count

train_inputs = shuffled_inputs[:train_samples_count]
train_targets = shuffled_targets[:train_samples_count]

validation_inputs = shuffled_inputs[train_samples_count:train_samples_count+validation_samples_count]
validation_targets = shuffled_targets[train_samples_count:train_samples_count+validation_samples_count]

test_inputs = shuffled_inputs[train_samples_count+validation_samples_count:]
test_targets = shuffled_targets[train_samples_count+validation_samples_count:]

print(np.sum(train_targets), train_samples_count, np.sum(train_targets) / train_samples_count)
print(np.sum(validation_targets), validation_samples_count, np.sum(validation_targets) / validation_samples_count)
print(np.sum(test_targets), test_samples_count, np.sum(test_targets) / test_samples_count)

1773.0 3579 0.4953897736797988
234.0 447 0.5234899328859061
230.0 448 0.5133928571428571


In [358]:
np.savez('Audiobooks_data_train', inputs=train_inputs, targets=train_targets)
np.savez('Audiobooks_data_validation', inputs=validation_inputs, targets=validation_targets)
np.savez('Audiobooks_data_test', inputs=test_inputs, targets=test_targets)

In [359]:
import tensorflow as tf

In [360]:
npz = np.load('Audiobooks_data_train.npz')

train_inputs = npz['inputs'].astype(float)

train_targets = npz['targets'].astype(int)

npz = np.load('Audiobooks_data_validation.npz')

validation_inputs, validation_targets = npz['inputs'].astype(float), npz['targets'].astype(int)

npz = np.load('Audiobooks_data_test.npz')

test_inputs, test_targets = npz['inputs'].astype(float), npz['targets'].astype(int)

In [361]:
input_size = 10
output_size = 2

hidden_layer_size = 50
    

model = tf.keras.Sequential([
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    tf.keras.layers.Dense(hidden_layer_size, activation='relu'), 
    
    tf.keras.layers.Dense(output_size, activation='softmax') 
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

batch_size = 100

max_epochs = 100

early_stopping = tf.keras.callbacks.EarlyStopping(patience=2)

model.fit(train_inputs, 
          train_targets, 
          batch_size=batch_size,
          epochs=max_epochs, 
          callbacks=[early_stopping], 
          validation_data=(validation_inputs, validation_targets), 
          verbose = 2 
          )  

Epoch 1/100
36/36 - 1s - 21ms/step - accuracy: 0.6837 - loss: 0.5670 - val_accuracy: 0.7718 - val_loss: 0.4649
Epoch 2/100
36/36 - 0s - 1ms/step - accuracy: 0.7502 - loss: 0.4677 - val_accuracy: 0.8121 - val_loss: 0.3930
Epoch 3/100
36/36 - 0s - 1ms/step - accuracy: 0.7762 - loss: 0.4213 - val_accuracy: 0.8277 - val_loss: 0.3596
Epoch 4/100
36/36 - 0s - 1ms/step - accuracy: 0.7770 - loss: 0.3962 - val_accuracy: 0.8479 - val_loss: 0.3407
Epoch 5/100
36/36 - 0s - 1ms/step - accuracy: 0.7932 - loss: 0.3795 - val_accuracy: 0.8345 - val_loss: 0.3258
Epoch 6/100
36/36 - 0s - 1ms/step - accuracy: 0.7983 - loss: 0.3687 - val_accuracy: 0.8591 - val_loss: 0.3158
Epoch 7/100
36/36 - 0s - 1ms/step - accuracy: 0.8097 - loss: 0.3596 - val_accuracy: 0.8389 - val_loss: 0.3099
Epoch 8/100
36/36 - 0s - 1ms/step - accuracy: 0.8078 - loss: 0.3545 - val_accuracy: 0.8367 - val_loss: 0.3021
Epoch 9/100
36/36 - 0s - 1ms/step - accuracy: 0.8055 - loss: 0.3512 - val_accuracy: 0.8300 - val_loss: 0.3012
Epoch 10/

<keras.src.callbacks.history.History at 0x27728326ba0>

In [362]:
test_loss, test_accuracy = model.evaluate(test_inputs, test_targets)

14/14 ━━━━━━━━━━━━━━━━━━━━ 0s 723us/step - accuracy: 0.8349 - loss: 0.3138


In [363]:
print('\nTest loss: {0:.2f}. Test accuracy: {1:.2f}%'.format(test_loss, test_accuracy*100.))


Test loss: 0.35. Test accuracy: 81.25%


# Here Ends the Instructor's Point of View

# Final Results

<div style="text-align: justify">
I intervene again here to collect the instructor's results over 30 separate runs. To avoid altering the instructor's model, I do not change the code inside the cells above. Instead, I run the notebook up to this cell for each run and append that run's results, then calculate the average accuracy and loss along with their standard deviations.  
    
***Be careful not to run the cell below twice, as it resets the lists to empty ones.***
</div>
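One way to make the initialization safe to re-run is to create the lists only when they do not exist yet. The following is a minimal sketch and not part of the instructor's code; the helper name `init_once` is hypothetical, and it assumes the lists live in a dictionary-like namespace such as the notebook's `globals()`.

```python
def init_once(namespace, *names):
    """Create an empty list for each name only if it is not defined yet."""
    for name in names:
        # setdefault leaves an existing list (and its contents) untouched
        namespace.setdefault(name, [])


# Hypothetical usage: in a notebook cell, pass globals() instead of `demo`.
demo = {}
init_once(demo, "accuracy_list", "loss_list")
demo["accuracy_list"].append(0.81)  # placeholder value, not a real result

# Running the initialization again does not wipe the stored result.
init_once(demo, "accuracy_list", "loss_list")
print(len(demo["accuracy_list"]))  # prints 1
```

With this pattern, accidentally re-running the initialization cell keeps the results already collected. The lists are still lost if the kernel restarts.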

In [13]:
# Do NOT run this cell twice; it resets both lists:
accuracy_list = []
loss_list = []

In [364]:
accuracy_list.append(test_accuracy)
loss_list.append(test_loss)
len(accuracy_list)

30

In [365]:
# Calculate the metrics:
accuracy_avg = np.mean(accuracy_list)
accuracy_std = np.std(accuracy_list)
loss_avg = np.mean(loss_list)
loss_std = np.std(loss_list)

In [367]:
# Print the metrics:
print('Average Test Accuracy:', round(accuracy_avg, 4))
print('Standard Deviation Test Accuracy:', round(accuracy_std, 4))
print('Average Test Loss:', round(loss_avg, 4))
print('Standard Deviation Test Loss:', round(loss_std, 4))

Average Test Accuracy: 0.8089
Standard Deviation Test Accuracy: 0.0162
Average Test Loss: 0.3466
Standard Deviation Test Loss: 0.0193
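
`np.std` computes the population standard deviation by default (`ddof=0`). Since the 30 runs can be viewed as a sample of possible training outcomes, the sample standard deviation (`ddof=1`) and the standard error of the mean may also be worth reporting. The sketch below illustrates the difference under that assumption. It uses the standard-library `statistics` module so that it runs without extra dependencies (`statistics.pstdev` corresponds to `ddof=0`, `statistics.stdev` to `ddof=1`). The function name `summarize_runs` and the input values are hypothetical placeholders, not results from this notebook.

```python
import math
import statistics


def summarize_runs(values):
    """Return mean, population std, sample std and standard error of the mean."""
    values = [float(v) for v in values]
    n = len(values)
    mean = statistics.fmean(values)
    pop_std = statistics.pstdev(values)    # divides by n, like np.std(values)
    sample_std = statistics.stdev(values)  # divides by n - 1, like np.std(values, ddof=1)
    sem = sample_std / math.sqrt(n)        # standard error of the mean
    return {"mean": mean, "pop_std": pop_std, "sample_std": sample_std, "sem": sem}


# Placeholder accuracies for illustration only (not real results).
example = [0.80, 0.82, 0.81, 0.79]
stats = summarize_runs(example)
for key, value in stats.items():
    print(f"{key}: {value:.4f}")
```

For 30 runs the two standard deviations differ only slightly, so the choice mainly matters for how the spread is described rather than for the conclusion.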
