## Deep Learning with TensorFlow Keras

### Multiclass classification with Reuters dataset 

We will work with the Reuters dataset from the keras package:
https://keras.io/api/datasets/reuters/


In [1]:
# install tensorflow library: pip install tensorflow


### Import the built-in Reuters dataset 


In [3]:
import numpy as np

from tensorflow.keras.datasets import reuters

# num_words restricts the data to the 10 000 most frequently occurring words in the dataset
(train_data, train_labels), (test_data, test_labels) = reuters.load_data(
    num_words=10000)





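Inspect the data. What does it represent?
Each raw example is a list of integers (word indices). Later in this notebook the examples are vectorized into a binary Document Term Matrix (DTM), with one row per newswire and one column per word.

If you have never worked with a Document Term Matrix (DTM) in Natural Language Processing (NLP), investigate what it is.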
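
As a rough illustration of the concept, the sketch below builds a tiny count-based DTM by hand. The three documents and all variable names are made up for this example and are unrelated to the Reuters data.

```python
import numpy as np

# Three made-up, already tokenized documents (not Reuters data).
docs = [
    ["oil", "prices", "rise"],
    ["oil", "exports", "fall"],
    ["wheat", "prices", "fall", "fall"],
]

# The vocabulary is the sorted set of all terms; each term gets one column.
vocab = sorted({term for doc in docs for term in doc})
col = {term: j for j, term in enumerate(vocab)}

# One row per document, one column per term, cells hold term counts.
dtm = np.zeros((len(docs), len(vocab)), dtype=int)
for i, doc in enumerate(docs):
    for term in doc:
        dtm[i, col[term]] += 1

print(vocab)
print(dtm)
```

Each row describes one document and each column one term, so the word order inside a document is lost.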

In [4]:
# inspect the examples/features:

In [31]:
# each example is a list of integers (word indices):
print(train_data[10])


[1, 245, 273, 207, 156, 53, 74, 160, 26, 14, 46, 296, 26, 39, 74, 2979, 3554, 14, 46, 4689, 4329, 86, 61, 3499, 4795, 14, 61, 451, 4329, 17, 12]


Inspect the labels:

In [3]:
# inspect the labels:

How many classes are there?

In [2]:
# hint: np.unique()

Investigate the size of train and test, and the proportion test/train.

In [1]:
# train:


In [5]:
# test:

In [6]:
# train-test proportion:

Decode the train_data[0] newswire back to text using the built-in Keras word index:

In [37]:
word_index = reuters.get_word_index()
reverse_word_index = dict([(value, key) for (key, value) in word_index.items()])
decoded_newswire = " ".join([reverse_word_index.get(i - 3, "?") for i in
    train_data[0]])


In [7]:
# inspect the decoded text:


Note: the word indices in the loaded data are offset by 3 relative to get_word_index(), because 0, 1 and 2 are reserved for "padding", "start of sequence" and "unknown", respectively.

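The offset can be seen on a miniature example. This is a sketch with a hypothetical three-word index in the style of get_word_index(); the words and the encoded sequence are invented, and the default reserved indices of load_data() are assumed.

```python
# Hypothetical miniature word index in the style of get_word_index():
# word -> rank, starting at 1 (not the real Reuters vocabulary).
toy_word_index = {"the": 1, "oil": 2, "price": 3}
toy_reverse = {rank: word for word, rank in toy_word_index.items()}

# An encoded sequence as load_data() would produce it with default settings:
# 1 marks the start of the sequence, 2 marks an unknown word,
# and every real word is stored as rank + 3.
encoded = [1, 4, 5, 2, 6]

# Subtracting 3 undoes the offset; reserved indices fall back to "?".
decoded = " ".join(toy_reverse.get(i - 3, "?") for i in encoded)
print(decoded)  # ? the oil ? price
```
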
### Class distribution

- How are the examples distributed across the classes in this dataset?
- What are the advantages/disadvantages of training a classifier on this training set?

In [8]:
# calculate the class distribution from the train labels

# unique_labels, counts_pr_class = # code here

# percentage of the dataset covered by each class:
# pct_pr_class = (counts_pr_class / len(train_labels)) * 100


# create a plot of the percentage class coverage:
# class labels on the x-axis
# and percentage class coverage on the y-axis

from matplotlib import pyplot as plt
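
The counting step can be tried on toy data first. The sketch below uses a handful of made-up labels (not the Reuters labels) and one possible approach, np.unique with return_counts=True; the variable names are chosen for this example only.

```python
import numpy as np

# Made-up integer labels standing in for train_labels.
toy_labels = np.array([3, 3, 4, 3, 1, 4, 3, 3])

# return_counts=True gives the number of occurrences of each unique label.
toy_unique, toy_counts = np.unique(toy_labels, return_counts=True)
toy_pct = toy_counts / len(toy_labels) * 100

for label, pct in zip(toy_unique, toy_pct):
    print(f"class {label}: {pct:.1f} %")
```

In this toy set, class 3 covers 62.5 % of the examples, which is the kind of imbalance the bar plot should make visible.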


## Prepare the data

Use the given function to vectorize the training and test data (x_train and x_test).

In [42]:
def vectorize_sequences(sequences, dimension=10000):
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results


In [43]:
#x_train = 
#x_test = 
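
To see what the function produces, it can be run on two short made-up sequences with a small dimension. This is only a sketch: the function is repeated so the example runs on its own, and the toy sequences are invented.

```python
import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # Same function as above, repeated so this sketch runs on its own.
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.
    return results

# Two made-up sequences of word indices; index 2 occurs twice in the first.
toy_sequences = [[0, 2, 2, 5], [1, 5]]
toy_matrix = vectorize_sequences(toy_sequences, dimension=6)
print(toy_matrix)
```

The repeated index 2 still yields a single 1, so the encoding records whether a word occurs, not how often.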

### One-hot encode the labels
Use the given function to one-hot encode the labels.

In [44]:
def to_one_hot(labels, dimension=46):
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results

In [45]:
# one-hot encode the test and train labels (home made "control freak" edition):

#y_train = 
#y_test = 

In [46]:
# you can also use the built-in to_categorical() function to do the same thing as in the cell above (pre made edition):

from tensorflow.keras.utils import to_categorical
y_train = to_categorical(train_labels)
y_test = to_categorical(test_labels)
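
A small check of the one-hot idea on made-up labels, using NumPy only. The function is repeated so the sketch runs on its own; the identity-matrix comparison is an illustration added here, not part of the original notebook.

```python
import numpy as np

def to_one_hot(labels, dimension=46):
    # Same function as above, repeated so this sketch runs on its own.
    results = np.zeros((len(labels), dimension))
    for i, label in enumerate(labels):
        results[i, label] = 1.
    return results

toy_labels = np.array([0, 2, 1, 2])  # made-up labels, 3 classes
encoded = to_one_hot(toy_labels, dimension=3)

# Indexing an identity matrix with the labels gives the same rows.
assert np.array_equal(encoded, np.eye(3)[toy_labels])

# argmax along the class axis recovers the integer labels.
recovered = encoded.argmax(axis=1)
print(encoded)
print(recovered)
```

One thing worth keeping in mind: to_categorical() infers the number of classes from the largest label it sees unless num_classes is passed, so passing num_classes=46 explicitly may be the safer choice if a split could lack the highest class.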


### Build the model architecture

Look up and discuss the difference between: 
- Sequential models 
- Functional models 

Finalize the code below so the first two dense layers have 64 neurons and relu activation functions.

What size should the third, and last, dense layer have? 

Set the last layer's activation function to softmax. (WHY?)
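
As a hint for the "why", the sketch below computes softmax by hand with NumPy on made-up raw scores. The helper name and the scores are assumptions of this example; Keras applies the same idea inside the last layer.

```python
import numpy as np

def softmax(logits):
    # Subtract the maximum for numerical stability; the result is unchanged.
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

toy_logits = np.array([2.0, 1.0, 0.1, -1.0])  # made-up raw scores
probs = softmax(toy_logits)
print(probs, probs.sum())
```

All outputs are positive and sum to 1, so they can be read as a probability distribution over the classes.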

In [9]:
# model architecture: 

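# use the Keras bundled with TensorFlow, consistent with the imports above
from tensorflow import keras
from tensorflow.keras import layers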
model = keras.Sequential([
    layers.Dense(),
    layers.Dense(),
    layers.Dense()
])

### Compile model

Finalize the code by specifying the rmsprop optimizer, the categorical crossentropy loss function, and the accuracy metric.

In [50]:
# compile model:

model.compile(optimizer="",
              loss="",
              metrics=[""])

Good to know:

Categorical crossentropy is used in classification. It measures the distance between two probability distributions: the probability distribution output by the network and the true distribution of the labels.

By minimizing this distance, you train the network to output something as close as possible to the true labels.
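
For a single example with one-hot label p and predicted distribution q, the loss is -sum(p_i * log(q_i)). The sketch below evaluates this by hand for two made-up predictions; the helper name and the clipping constant eps are assumptions of this example, not the Keras implementation.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    # Clip to avoid log(0); eps is an assumption of this sketch.
    y_pred = np.clip(y_pred, eps, 1.0)
    return -np.sum(y_true * np.log(y_pred))

y_true = np.array([0.0, 1.0, 0.0])        # one-hot: the true class is 1
good_pred = np.array([0.05, 0.90, 0.05])  # made-up, confident and correct
bad_pred = np.array([0.70, 0.20, 0.10])   # made-up, mostly wrong

good_loss = categorical_crossentropy(y_true, good_pred)
bad_loss = categorical_crossentropy(y_true, bad_pred)
print(good_loss, bad_loss)
```

With a one-hot label only the probability assigned to the true class matters, so the loss shrinks as that probability approaches 1.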

In [51]:
# Set some data aside for validation:

x_val = x_train[:1000]
partial_x_train = x_train[1000:]

y_val = y_train[:1000]
partial_y_train = y_train[1000:]  

### Train the model

Discussion point: hyperparameter tuning.

Self-study for later: Keras Tuner, https://keras.io/keras_tuner/

In [10]:
# fit the model with 20 epochs, batch size 512 and validation data as a tuple of x_val and y_val
# the batch size is chosen as a power of two (512 = 2^9).

history = model.fit(partial_x_train,
                    partial_y_train,
                    epochs=,
                    batch_size=,
                    validation_data=())

### Plot train and validation loss

Discussion: what does the plot mean?

In [11]:

loss = history.history["loss"]
val_loss = history.history["val_loss"]
epochs = range(1, len(loss) + 1)
plt.plot(epochs, loss, "bo", label="Training loss")
plt.plot(epochs, val_loss, "b", label="Validation loss")
plt.title("Training and validation loss")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend()
plt.show()

### Plot train and validation accuracy

Discussion: what does the plot mean?

In [12]:

plt.clf() # clear the previous figure
acc = history.history["accuracy"]
val_acc = history.history["val_accuracy"]
plt.plot(epochs, acc, "bo", label="Training accuracy")
plt.plot(epochs, val_acc, "b", label="Validation accuracy")
plt.title("Training and validation accuracy")
plt.xlabel("Epochs")
plt.ylabel("Accuracy")
plt.legend()
plt.show()

Use the plots above to decide on a suitable number of epochs.
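
One common heuristic, sketched here on made-up numbers, is to pick the epoch with the lowest validation loss, since later epochs tend to overfit. The loss values below are invented for illustration; with a real run, history.history["val_loss"] would take their place.

```python
import numpy as np

# Made-up validation losses for 8 epochs (not results from this notebook).
toy_val_loss = [1.80, 1.30, 1.10, 0.98, 0.95, 0.97, 1.02, 1.10]

# Epochs are counted from 1, list positions from 0.
best_epoch = int(np.argmin(toy_val_loss)) + 1
print(best_epoch)  # 5
```

Reading the value off the plot by eye is equally valid; the point is to stop where validation loss stops improving.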

Set the chosen number of epochs and retrain the model from scratch:

In [13]:
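# fill in the chosen number of epochs:
# note: x_val and y_val are slices of x_train and y_train, so the validation metrics here are not an independent estimate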
model = keras.Sequential()
model.add(layers.Dense(64, activation='relu', input_shape=(10000,)))
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(46, activation='softmax'))
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train,
          y_train,
          epochs=,
          batch_size=512,
          validation_data=(x_val, y_val))

results = model.evaluate(x_test, y_test)


### Inspect the results 

What does the output mean?

In [56]:

results

[0.9458857774734497, 0.7974176406860352]
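
The two numbers returned by evaluate() are the test loss followed by the compiled metric, here accuracy. To judge an accuracy value it helps to compare it with a random baseline. The sketch below estimates such a baseline on made-up, imbalanced labels by shuffling them; the class proportions and the seed are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up imbalanced labels standing in for test_labels.
toy_labels = rng.choice([0, 1, 2], size=1000, p=[0.6, 0.3, 0.1])

# A "classifier" that guesses by shuffling the true labels.
shuffled = rng.permutation(toy_labels)
baseline_acc = float(np.mean(toy_labels == shuffled))
print(baseline_acc)
```

Applied to test_labels, the same idea gives a reference point for the accuracy reported above.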


### Predict on test data

Use the predict method to predict on x_test:

In [14]:

predictions = model.predict()

In [15]:
# investigate the shape of predictions

# one prediction:


In [16]:
# investigate the predictions (e.g. the 0th entry)


In [17]:
# sum one row in predictions, what does this mean?

#hint: np.sum()


In [75]:
# What does this mean?
np.argmax(predictions[0])


3

### What is the accuracy per class in the test set?

Create a multiclass confusion matrix that compares the predictions with the actual values.

Compare with the earlier plot of percentage class coverage.

Comment on the results.


In [19]:
# Convert each row of predicted probabilities to a one-hot vector with a 1 at the most likely class

pred_int = np.zeros_like(predictions)
pred_int[np.arange(len(predictions)), predictions.argmax(1)] = 1
print(pred_int[0])



In [20]:

# Create a confusion matrix to visualize the results. 
# What do they mean? 
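
A minimal NumPy sketch of a confusion matrix and per-class accuracy, using made-up labels for three classes. In the notebook, test_labels and predictions.argmax(axis=1) would presumably play the roles of y_true and y_pred; the variable names here are chosen for this example.

```python
import numpy as np

# Made-up true and predicted integer labels for 3 classes.
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 0, 2, 2])
n_classes = 3

# Rows are true classes, columns are predicted classes.
cm = np.zeros((n_classes, n_classes), dtype=int)
np.add.at(cm, (y_true, y_pred), 1)

# Per-class accuracy (recall): correct predictions / examples of that class.
per_class_acc = cm.diagonal() / cm.sum(axis=1)
print(cm)
print(per_class_acc)
```

The diagonal holds the correct predictions. A class with no examples in the test set would cause a division by zero in the last step, so such rows need separate handling.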
