# Introduction

**Defining the Problem**

The aim with the Reuters dataset is to classify each newswire into one of 46 different topics.

The dataset has 8,982 samples in the training set and 2,246 samples in the test set.

Each newswire belongs to exactly one topic, and there are 46 possible topics, so this is a single-label, multiclass classification problem.

**Choosing a measure of success**

We will measure how well the model classifies the newswires by its accuracy, i.e. the fraction of newswires assigned to the correct topic.

If the model achieves an accuracy of at least 75% on the test set, we will consider it successful.
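As a sketch of how accuracy is computed for a single-label problem: the predicted topic is the class with the highest probability, and accuracy is the fraction of samples where it matches the true label. Keras computes this internally when `metrics=['accuracy']` is passed; the `accuracy` helper and the toy probabilities below are hypothetical and only illustrate the formula.

```python
import numpy as np

def accuracy(probabilities, true_labels):
    """Fraction of samples whose highest-probability class equals the true label.

    probabilities: shape (n_samples, n_classes), e.g. softmax outputs
    true_labels: integer class ids, shape (n_samples,)
    """
    predicted = np.argmax(probabilities, axis=1)  # most likely class per sample
    return float(np.mean(predicted == np.asarray(true_labels)))

# Toy example: 3 samples, 4 classes (not Reuters data)
probs = np.array([[0.1, 0.7, 0.1, 0.1],
                  [0.6, 0.2, 0.1, 0.1],
                  [0.2, 0.2, 0.5, 0.1]])
acc = accuracy(probs, [1, 0, 3])  # 2 of the 3 predictions are correct
print(acc, acc >= 0.75)           # compare against the 75% success threshold
```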

**Deciding on an evaluation protocol**

We will use hold-out validation, since we have enough training data to set aside a separate validation set (1,000 of the training samples) and still train on the rest.
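A minimal sketch of the hold-out split used throughout this notebook: the first `n_val` samples become the validation set and the remainder is used for training. The `holdout_split` helper is hypothetical (the notebook slices the arrays inline), and slicing without shuffling assumes the samples are already in random order.

```python
import numpy as np

def holdout_split(x, y, n_val):
    """Reserve the first n_val samples for validation; return (train, validation) pairs."""
    x_val, partial_x_train = x[:n_val], x[n_val:]
    y_val, partial_y_train = y[:n_val], y[n_val:]
    return (partial_x_train, partial_y_train), (x_val, y_val)

# Toy arrays standing in for the vectorized newswires and their labels
x = np.arange(20).reshape(10, 2)
y = np.arange(10)
(train_x, train_y), (val_x, val_y) = holdout_split(x, y, n_val=3)
print(train_x.shape, val_x.shape)  # (7, 2) (3, 2)
```

With the real data and `n_val=1000`, this leaves 7,982 of the 8,982 training samples for fitting.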

# Methodology

**Preparing data**

First, we load the data. Passing `num_words=10000` keeps only the 10,000 most frequently occurring words and discards rarer ones, which keeps the input size manageable.

In [None]:
from tensorflow.keras.datasets import reuters
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(num_words=10000)
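Each newswire is a list of integer word indices, where lower indices correspond to more frequent words. The sketch below illustrates the idea behind `num_words`: any index at or above the limit is replaced by an out-of-vocabulary marker. The `limit_vocabulary` helper is hypothetical, and `oov_char=2` is assumed to mirror the Keras default; it is a simplified illustration, not the loader's actual code.

```python
def limit_vocabulary(sequence, num_words=10000, oov_char=2):
    """Replace every word index >= num_words with an out-of-vocabulary marker.

    Lower indices are more frequent words, so this keeps the num_words most common ones.
    oov_char=2 is an assumption mirroring the Keras default.
    """
    return [w if w < num_words else oov_char for w in sequence]

print(limit_vocabulary([1, 15, 9999, 10000, 52341]))  # [1, 15, 9999, 2, 2]
```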

Next, we vectorize the samples by multi-hot encoding each newswire into a 10,000-dimensional vector of 0s and 1s.

In [None]:
import numpy as np

def vectorize_sequences(sequences, dimension = 10000):
    # Multi-hot encode: one row per newswire, 1.0 at every word index it contains
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

x_train = vectorize_sequences(train_data)
x_test = vectorize_sequences(test_data)
# Integer labels as float arrays (not used below; the model is trained on one-hot labels)
y_train = np.asarray(train_labels).astype('float32')
y_test = np.asarray(test_labels).astype('float32')
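To see what the multi-hot encoding produces, the same function can be run on a toy vocabulary of 6 words. Word order and word counts are discarded: a repeated index still yields a single 1.0.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # Same multi-hot encoding as above: one row per sample, 1.0 at each word index present
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.0
    return results

toy = vectorize_sequences([[0, 3, 3], [1, 5]], dimension=6)
print(toy)
# [[1. 0. 0. 1. 0. 0.]
#  [0. 1. 0. 0. 0. 1.]]
```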

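Lastly, the labels are one-hot encoded.

In [None]:
from tensorflow.keras.utils import to_categorical  # converts integer class ids to one-hot rows
# num_classes=46 fixes the width even if a split happened to lack the highest class id
one_hot_train_labels = to_categorical(train_labels, num_classes=46)
one_hot_test_labels = to_categorical(test_labels, num_classes=46)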
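One-hot encoding turns each integer label into a row of 46 zeros with a single 1.0 at the label's position. A minimal NumPy equivalent, written as a hypothetical `one_hot` helper, picks rows of an identity matrix:

```python
import numpy as np

def one_hot(labels, num_classes=46):
    """Row i is all zeros except a 1.0 at column labels[i]."""
    return np.eye(num_classes, dtype='float32')[np.asarray(labels)]

encoded = one_hot([3, 0, 45])
print(encoded.shape)                    # (3, 46)
print(encoded.argmax(axis=1).tolist())  # [3, 0, 45]
```

When `num_classes` is not given, `to_categorical` infers it as the largest label plus one, which is why passing it explicitly is the safer choice.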

**Developing a model that does better than the baseline**

First, we calculate the accuracy of a baseline model that always predicts the most frequent class in the training set.

In [None]:
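# Count how many training newswires fall into each of the 46 classes
all_labels = np.zeros(46)
for i in range(len(train_labels)):
    all_labels[train_labels[i]] += 1
index = np.argmax(all_labels)  # most frequent class
# Accuracy (%) of always predicting that class on the test set
hits = np.array(test_labels) == index
print(np.sum(hits) / len(test_labels) * 100)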

As the output shows, the baseline model's accuracy is about 36%.
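The counting loop above can also be written with `np.bincount`, which counts occurrences of each non-negative integer. This sketch wraps it in a hypothetical `majority_baseline_accuracy` helper and checks it on toy labels rather than the Reuters data:

```python
import numpy as np

def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training class."""
    most_common = np.bincount(train_labels).argmax()  # per-class counts, then the largest
    return float(np.mean(np.asarray(test_labels) == most_common))

# Toy labels: class 3 is the most frequent in training
train = [3, 3, 4, 1, 3, 4]
test = [3, 4, 3, 3, 0]
print(majority_baseline_accuracy(train, test))  # 0.6
```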

Our initial model is deliberately small: it only needs to beat the baseline model's accuracy.

In [None]:
from tensorflow.keras import models
from tensorflow.keras import layers
# Small model: two hidden layers of 16 units and a softmax output over the 46 topics
model = models.Sequential()
model.add(layers.Dense(16, activation = 'relu', input_shape = (10000,)))
model.add(layers.Dense(16, activation = 'relu'))
model.add(layers.Dense(46, activation = 'softmax'))
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])
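The final `Dense(46, activation = 'softmax')` layer turns 46 raw scores into a probability distribution over the topics, and `categorical_crossentropy` compares that distribution with the one-hot target. The NumPy sketch below reproduces both formulas for a single sample with 3 classes instead of 46; the helper names and the `eps` clipping value are illustrative assumptions, not Keras internals.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into probabilities (subtracting the max avoids overflow)."""
    exp = np.exp(logits - np.max(logits))
    return exp / exp.sum()

def categorical_crossentropy(one_hot_target, probabilities, eps=1e-7):
    """-sum(target * log(p)); for a one-hot target this is -log(p[true class])."""
    probabilities = np.clip(probabilities, eps, 1.0)  # avoid log(0)
    return float(-np.sum(one_hot_target * np.log(probabilities)))

logits = np.array([2.0, 1.0, 0.1])  # toy scores
p = softmax(logits)
target = np.array([1.0, 0.0, 0.0])  # the true class is 0
print(p.sum())                      # ~1.0
print(categorical_crossentropy(target, p))
```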

In [None]:
# Hold out the first 1,000 training samples for validation
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

history = model.fit(partial_x_train, 
                    partial_y_train,
                    epochs = 20,
                    batch_size = 512,
                    validation_data = (x_val, y_val))

In [None]:
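initial_val_results = model.evaluate(x_val, y_val)  # [loss, accuracy] on the held-out validation set

Evaluating on `partial_x_train` instead would report training accuracy (about 96% here), which does not measure how well the model generalizes. The validation accuracy from the cell above is the number to compare against the 36% baseline, and even this small model beats the baseline by a wide margin.

However, with only 16 units per hidden layer (fewer than the 46 output classes), the model has low capacity and its hidden layers may act as an information bottleneck, so we will scale up from here.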

**Scaling up**

In this step, we scale up the model by widening its two hidden layers, to avoid underfitting and reach the best validation performance before overfitting sets in.
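Widening the hidden layers grows the model mostly through the first layer, which connects all 10,000 inputs to every unit. The sketch below counts the trainable parameters (weights plus biases) of the two-hidden-layer architecture used here for each width tried in this notebook; these totals should match what `model.summary()` reports. The helper names are hypothetical.

```python
def dense_param_count(inputs, units):
    """A Dense layer has inputs * units weights plus one bias per unit."""
    return inputs * units + units

def model_param_count(hidden_units, input_dim=10000, num_classes=46):
    """Parameters of Dense(hidden) -> Dense(hidden) -> Dense(num_classes)."""
    return (dense_param_count(input_dim, hidden_units)
            + dense_param_count(hidden_units, hidden_units)
            + dense_param_count(hidden_units, num_classes))

for units in (16, 32, 64, 96):
    print(units, model_param_count(units))
# 16 161070
# 32 322606
# 64 647214
# 96 973870
```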

The `epochs_graphs()` function below plots the validation losses of two models against the epoch number, so that successive models can be compared.

In [None]:
import matplotlib.pyplot as plt

def epochs_graphs(x, y_A, style_A, label_A, y_B, style_B, label_B, title, x_label, y_label):
    """Plot two curves (e.g. the validation losses of two models) against the epoch number."""
    plt.clf()
    plt.plot(x, y_A, style_A, label = label_A)
    plt.plot(x, y_B, style_B, label = label_B)
    plt.title(title)
    plt.xlabel(x_label)
    plt.ylabel(y_label)
    plt.legend()
    plt.show()

First, we create a model with 32 units in each of its two hidden layers and train it for 50 epochs.

In [None]:
model = models.Sequential()
model.add(layers.Dense(32, activation = 'relu', input_shape = (10000,)))
model.add(layers.Dense(32, activation = 'relu'))
model.add(layers.Dense(46, activation = 'softmax'))
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])

In [None]:
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

history_32 = model.fit(partial_x_train, 
                    partial_y_train,
                    epochs = 50,
                    batch_size = 512,
                    validation_data = (x_val, y_val))

Next, we build a model with 64 units per hidden layer and train it for 50 epochs.

In [None]:
#building the model
model = models.Sequential()
model.add(layers.Dense(64, activation = 'relu', input_shape = (10000,)))
model.add(layers.Dense(64, activation = 'relu'))
model.add(layers.Dense(46, activation = 'softmax'))
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])

#training the model
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

history_64 = model.fit(partial_x_train, 
                    partial_y_train,
                    epochs = 50,
                    batch_size = 512,
                    validation_data = (x_val, y_val))

We can now compare the two models by plotting their validation losses.

In [None]:
val_loss_32 = history_32.history['val_loss']
val_loss_64 = history_64.history['val_loss']

#plotting the graph
epochs_graphs(range(1,len(val_loss_64)+1), val_loss_64, 'bo', '64-unit model', val_loss_32, 'b', 
              '32-unit model', 'Validation loss of the 2 models', 'Epochs', 'Val Loss');

The graph above shows that the 64-unit model reaches a lower validation loss than the 32-unit model. The extra capacity is still paying off and overfitting has not yet become the limiting factor, so we can widen the model further.
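"Lower validation loss" can be made precise by comparing the lowest point of each curve, since one curve may be lower early on and higher later. The hypothetical `compare_min_val_loss` helper below takes lists such as `history.history['val_loss']`; the loss values are made up for illustration.

```python
def compare_min_val_loss(curves):
    """Return ({name: lowest validation loss}, name of the curve with the lowest value).

    curves: dict mapping a model name to its per-epoch validation losses.
    """
    minima = {name: min(losses) for name, losses in curves.items()}
    best = min(minima, key=minima.get)
    return minima, best

# Hypothetical loss values for illustration only
minima, best = compare_min_val_loss({
    '32-unit': [1.9, 1.4, 1.1, 1.0, 0.98, 1.0],
    '64-unit': [1.6, 1.1, 0.95, 0.92, 0.97, 1.03],
})
print(minima)  # {'32-unit': 0.98, '64-unit': 0.92}
print(best)    # 64-unit
```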

Next, we build a model with 96 units per hidden layer and train it for 50 epochs.

In [None]:
#building the model
model = models.Sequential()
model.add(layers.Dense(96, activation = 'relu', input_shape = (10000,)))
model.add(layers.Dense(96, activation = 'relu'))
model.add(layers.Dense(46, activation = 'softmax'))
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])

#training the model
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

history_96 = model.fit(partial_x_train, 
                    partial_y_train,
                    epochs = 50,
                    batch_size = 512,
                    validation_data = (x_val, y_val))

We can now compare the 96-unit and 64-unit models by plotting their validation losses.

In [None]:
val_loss_96 = history_96.history['val_loss']

#plotting the graph
epochs_graphs(range(1,len(val_loss_96)+1), val_loss_96, 'bo', '96-unit model', val_loss_64, 'b', 
              '64-unit model', 'Validation loss of the 2 models', 'Epochs', 'Val Loss');

From the graph above, the 96-unit model's validation loss rises slightly above the 64-unit model's after roughly 9 epochs, showing that it starts to overfit sooner.

Despite that, it reaches a lower validation loss within fewer epochs, so we keep 96 units per hidden layer.
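The "after roughly 9 epochs" reading can be computed rather than eyeballed: find the first epoch from which one curve stays above the other for the rest of training. The hypothetical `first_epoch_above` helper below does this on made-up curves where curve A reaches the lower minimum earlier but ends up higher, mirroring the 96-unit versus 64-unit comparison.

```python
def first_epoch_above(curve_a, curve_b):
    """First 1-based epoch from which curve_a > curve_b for every remaining epoch, else None."""
    n = min(len(curve_a), len(curve_b))
    for start in range(n):
        if all(curve_a[i] > curve_b[i] for i in range(start, n)):
            return start + 1
    return None

# Hypothetical validation losses
wider = [1.5, 1.0, 0.90, 1.00, 1.10]     # lower minimum (0.90) at epoch 3
narrower = [1.6, 1.2, 0.95, 0.93, 0.96]  # minimum 0.93 at epoch 4
print(first_epoch_above(wider, narrower))  # 4
```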

**Regularising the model and tuning hyperparameters**

Having found a suitable model size and a rough number of training epochs, we can try to improve the model further through weight regularisation, dropout and hyperparameter tuning.
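Weight regularisation adds a penalty on the size of a layer's weights to the training loss. With a factor of 0.001, `regularizers.L1` adds 0.001 times the sum of absolute weights (which encourages sparse weights), and `regularizers.L2` adds 0.001 times the sum of squared weights (which shrinks large weights). The NumPy sketch below reproduces only these two formulas on a toy weight matrix; the helper names are hypothetical.

```python
import numpy as np

def l1_penalty(weights, factor=0.001):
    """factor * sum(|w|)"""
    return factor * np.sum(np.abs(weights))

def l2_penalty(weights, factor=0.001):
    """factor * sum(w ** 2)"""
    return factor * np.sum(np.square(weights))

w = np.array([[0.5, -2.0], [0.0, 1.5]])  # toy kernel
print(l1_penalty(w))  # 0.001 * (0.5 + 2.0 + 0.0 + 1.5) = 0.004
print(l2_penalty(w))  # 0.001 * (0.25 + 4.0 + 0.0 + 2.25) = 0.0065 (up to float rounding)
```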

*Adding L1 regularisation to the model:*

In [None]:
from tensorflow.keras import regularizers
#building the model
model = models.Sequential()
model.add(layers.Dense(96, kernel_regularizer = regularizers.L1(0.001), activation = 'relu', input_shape = (10000,)))
model.add(layers.Dense(96, activation = 'relu'))
model.add(layers.Dense(46, activation = 'softmax'))
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])

#training the model
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

history_l1 = model.fit(partial_x_train, 
                    partial_y_train,
                    epochs = 50,
                    batch_size = 512,
                    validation_data = (x_val, y_val))

In [None]:
val_loss_l1 = history_l1.history['val_loss']

#plotting the graph
epochs_graphs(range(1,len(val_loss_96)+1), val_loss_96, 'bo', '96-unit model', val_loss_l1, 'b', 
              'L1 model', 'Validation loss of the 2 models', 'Epochs', 'Val Loss');

L1 regularization does not appear to reduce overfitting, so we will not use it.

*Adding L2 regularisation to the model:*

In [None]:
#building the model
model = models.Sequential()
model.add(layers.Dense(96, kernel_regularizer = regularizers.L2(0.001), activation = 'relu', input_shape = (10000,)))
model.add(layers.Dense(96, activation = 'relu'))
model.add(layers.Dense(46, activation = 'softmax'))
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])

#training the model
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

history_l2 = model.fit(partial_x_train, 
                    partial_y_train,
                    epochs = 50,
                    batch_size = 512,
                    validation_data = (x_val, y_val))

In [None]:
val_loss_l2 = history_l2.history['val_loss']
val_loss_96 = history_96.history['val_loss']

#plotting the graph
epochs_graphs(range(1,len(val_loss_96)+1), val_loss_96, 'bo', '96-unit model', val_loss_l2, 'b', 
              'L2 model', 'Validation loss of the 2 models', 'Epochs', 'Val Loss');

L2 regularization greatly reduces overfitting over the 50 epochs, but the unregularized 96-unit model still reaches a lower minimum validation loss, so L2 regularization will not be used.

*Adding dropout to the model:*

We will apply dropout with a rate of 0.5 (50%) after each hidden layer.
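During training, a `Dropout(0.5)` layer randomly zeroes half of its inputs and scales the surviving ones by 1/(1 - 0.5) = 2, so the expected sum of activations is unchanged; at inference it passes inputs through untouched. The NumPy sketch below (a hypothetical `dropout` helper with a fixed seed) illustrates this "inverted dropout" scheme.

```python
import numpy as np

def dropout(activations, rate=0.5, training=True, seed=0):
    """Inverted dropout: zero a fraction `rate` of units, rescale the rest by 1 / (1 - rate)."""
    if not training or rate == 0.0:
        return activations  # inference: identity
    rng = np.random.default_rng(seed)
    keep = rng.random(activations.shape) >= rate  # True for units that survive
    return activations * keep / (1.0 - rate)

x = np.ones((2, 8))
print(dropout(x))                  # entries are 0.0 or 2.0
print(dropout(x, training=False))  # unchanged
```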

In [None]:
#building the model
model = models.Sequential()
model.add(layers.Dense(96, activation = 'relu', input_shape = (10000,)))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(96, activation = 'relu'))
model.add(layers.Dropout(0.5))
model.add(layers.Dense(46, activation = 'softmax'))
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])

#training the model
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

history_drop = model.fit(partial_x_train, 
                    partial_y_train,
                    epochs = 50,
                    batch_size = 512,
                    validation_data = (x_val, y_val))

In [None]:
val_loss_drop = history_drop.history['val_loss']
val_loss_96 = history_96.history['val_loss']

#plotting the graph
epochs_graphs(range(1,len(val_loss_96)+1), val_loss_96, 'bo', '96-unit model', val_loss_drop, 'b', 
              'Dropout model', 'Validation loss of the 2 models', 'Epochs', 'Val Loss');

Dropout also fails to reduce the validation loss enough, so it will not be used either.

# Results

We can now find the optimal number of epochs for the 96-unit model from its validation losses. Because `np.argmin` returns a 0-based index, we add 1 to convert it to an epoch count.

In [None]:
print(np.argmin(val_loss_96) + 1, 'epochs')
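The off-by-one matters because the first entry of `history.history['val_loss']` holds the loss after epoch 1. A toy illustration with made-up losses:

```python
import numpy as np

val_loss = [1.2, 0.95, 0.88, 0.91, 1.0]  # hypothetical per-epoch validation losses
best_index = int(np.argmin(val_loss))    # 0-based position of the minimum
best_epoch = best_index + 1              # epochs are counted from 1
print(best_index, best_epoch)            # 2 3
```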

Using this number, we can train the final model and evaluate it on the test set.

In [None]:
#building the model
model = models.Sequential()
model.add(layers.Dense(96, activation = 'relu', input_shape = (10000,)))
model.add(layers.Dense(96, activation = 'relu'))
model.add(layers.Dense(46, activation = 'softmax'))
model.compile(optimizer = 'rmsprop', loss = 'categorical_crossentropy', metrics = ['accuracy'])

#training the model
x_val = x_train[:1000]
partial_x_train = x_train[1000:]
y_val = one_hot_train_labels[:1000]
partial_y_train = one_hot_train_labels[1000:]

history_final = model.fit(partial_x_train, 
                    partial_y_train,
                    epochs = 5,  # chosen from the 96-unit validation-loss curve above
                    batch_size = 512,
                    validation_data = (x_val, y_val))

results = model.evaluate(x_test, one_hot_test_labels)
print(results)

# Conclusion

On the test set, the final model achieves about 78% accuracy, which exceeds the 75% target set at the start, so we consider it successful.