# Introduction to Neural Networks: Classification with Multiple Categories

Author: Pierre Nugues

## Fisher's Dataset on Irises
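Ronald Fisher, "The use of multiple measurements in taxonomic problems", _Annals of Eugenics_, 1936 (https://onlinelibrary.wiley.com/doi/epdf/10.1111/j.1469-1809.1936.tb02137.x)

The dataset contains 150 observations of three iris species (setosa, versicolor, and virginica, 50 each). Each observation consists of four measurements in centimeters: sepal length, sepal width, petal length, and petal width.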

In [None]:
import numpy as np

X = np.array([[5.1, 3.5, 1.4, 0.2],
     [4.9, 3.0, 1.4, 0.2],
     [4.7, 3.2, 1.3, 0.2],
     [4.6, 3.1, 1.5, 0.2],
     [5., 3.6, 1.4, 0.2],
     [5.4, 3.9, 1.7, 0.4],
     [4.6, 3.4, 1.4, 0.3],
     [5., 3.4, 1.5, 0.2],
     [4.4, 2.9, 1.4, 0.2],
     [4.9, 3.1, 1.5, 0.1],
     [5.4, 3.7, 1.5, 0.2],
     [4.8, 3.4, 1.6, 0.2],
     [4.8, 3., 1.4, 0.1],
     [4.3, 3., 1.1, 0.1],
     [5.8, 4., 1.2, 0.2],
     [5.7, 4.4, 1.5, 0.4],
     [5.4, 3.9, 1.3, 0.4],
     [5.1, 3.5, 1.4, 0.3],
     [5.7, 3.8, 1.7, 0.3],
     [5.1, 3.8, 1.5, 0.3],
     [5.4, 3.4, 1.7, 0.2],
     [5.1, 3.7, 1.5, 0.4],
     [4.6, 3.6, 1., 0.2],
     [5.1, 3.3, 1.7, 0.5],
     [4.8, 3.4, 1.9, 0.2],
     [5., 3., 1.6, 0.2],
     [5., 3.4, 1.6, 0.4],
     [5.2, 3.5, 1.5, 0.2],
     [5.2, 3.4, 1.4, 0.2],
     [4.7, 3.2, 1.6, 0.2],
     [4.8, 3.1, 1.6, 0.2],
     [5.4, 3.4, 1.5, 0.4],
     [5.2, 4.1, 1.5, 0.1],
     [5.5, 4.2, 1.4, 0.2],
     [4.9, 3.1, 1.5, 0.1],
     [5., 3.2, 1.2, 0.2],
     [5.5, 3.5, 1.3, 0.2],
     [4.9, 3.1, 1.5, 0.1],
     [4.4, 3., 1.3, 0.2],
     [5.1, 3.4, 1.5, 0.2],
     [5., 3.5, 1.3, 0.3],
     [4.5, 2.3, 1.3, 0.3],
     [4.4, 3.2, 1.3, 0.2],
     [5., 3.5, 1.6, 0.6],
     [5.1, 3.8, 1.9, 0.4],
     [4.8, 3., 1.4, 0.3],
     [5.1, 3.8, 1.6, 0.2],
     [4.6, 3.2, 1.4, 0.2],
     [5.3, 3.7, 1.5, 0.2],
     [5., 3.3, 1.4, 0.2],
     [7., 3.2, 4.7, 1.4],
     [6.4, 3.2, 4.5, 1.5],
     [6.9, 3.1, 4.9, 1.5],
     [5.5, 2.3, 4., 1.3],
     [6.5, 2.8, 4.6, 1.5],
     [5.7, 2.8, 4.5, 1.3],
     [6.3, 3.3, 4.7, 1.6],
     [4.9, 2.4, 3.3, 1.],
     [6.6, 2.9, 4.6, 1.3],
     [5.2, 2.7, 3.9, 1.4],
     [5., 2., 3.5, 1.],
     [5.9, 3., 4.2, 1.5],
     [6., 2.2, 4., 1.],
     [6.1, 2.9, 4.7, 1.4],
     [5.6, 2.9, 3.6, 1.3],
     [6.7, 3.1, 4.4, 1.4],
     [5.6, 3., 4.5, 1.5],
     [5.8, 2.7, 4.1, 1.],
     [6.2, 2.2, 4.5, 1.5],
     [5.6, 2.5, 3.9, 1.1],
     [5.9, 3.2, 4.8, 1.8],
     [6.1, 2.8, 4., 1.3],
     [6.3, 2.5, 4.9, 1.5],
     [6.1, 2.8, 4.7, 1.2],
     [6.4, 2.9, 4.3, 1.3],
     [6.6, 3., 4.4, 1.4],
     [6.8, 2.8, 4.8, 1.4],
     [6.7, 3., 5., 1.7],
     [6., 2.9, 4.5, 1.5],
     [5.7, 2.6, 3.5, 1.],
     [5.5, 2.4, 3.8, 1.1],
     [5.5, 2.4, 3.7, 1.],
     [5.8, 2.7, 3.9, 1.2],
     [6., 2.7, 5.1, 1.6],
     [5.4, 3., 4.5, 1.5],
     [6., 3.4, 4.5, 1.6],
     [6.7, 3.1, 4.7, 1.5],
     [6.3, 2.3, 4.4, 1.3],
     [5.6, 3., 4.1, 1.3],
     [5.5, 2.5, 4., 1.3],
     [5.5, 2.6, 4.4, 1.2],
     [6.1, 3., 4.6, 1.4],
     [5.8, 2.6, 4., 1.2],
     [5., 2.3, 3.3, 1.],
     [5.6, 2.7, 4.2, 1.3],
     [5.7, 3., 4.2, 1.2],
     [5.7, 2.9, 4.2, 1.3],
     [6.2, 2.9, 4.3, 1.3],
     [5.1, 2.5, 3., 1.1],
     [5.7, 2.8, 4.1, 1.3],
     [6.3, 3.3, 6., 2.5],
     [5.8, 2.7, 5.1, 1.9],
     [7.1, 3., 5.9, 2.1],
     [6.3, 2.9, 5.6, 1.8],
     [6.5, 3., 5.8, 2.2],
     [7.6, 3., 6.6, 2.1],
     [4.9, 2.5, 4.5, 1.7],
     [7.3, 2.9, 6.3, 1.8],
     [6.7, 2.5, 5.8, 1.8],
     [7.2, 3.6, 6.1, 2.5],
     [6.5, 3.2, 5.1, 2.],
     [6.4, 2.7, 5.3, 1.9],
     [6.8, 3., 5.5, 2.1],
     [5.7, 2.5, 5., 2.],
     [5.8, 2.8, 5.1, 2.4],
     [6.4, 3.2, 5.3, 2.3],
     [6.5, 3., 5.5, 1.8],
     [7.7, 3.8, 6.7, 2.2],
     [7.7, 2.6, 6.9, 2.3],
     [6., 2.2, 5., 1.5],
     [6.9, 3.2, 5.7, 2.3],
     [5.6, 2.8, 4.9, 2.],
     [7.7, 2.8, 6.7, 2.],
     [6.3, 2.7, 4.9, 1.8],
     [6.7, 3.3, 5.7, 2.1],
     [7.2, 3.2, 6., 1.8],
     [6.2, 2.8, 4.8, 1.8],
     [6.1, 3., 4.9, 1.8],
     [6.4, 2.8, 5.6, 2.1],
     [7.2, 3., 5.8, 1.6],
     [7.4, 2.8, 6.1, 1.9],
     [7.9, 3.8, 6.4, 2.],
     [6.4, 2.8, 5.6, 2.2],
     [6.3, 2.8, 5.1, 1.5],
     [6.1, 2.6, 5.6, 1.4],
     [7.7, 3., 6.1, 2.3],
     [6.3, 3.4, 5.6, 2.4],
     [6.4, 3.1, 5.5, 1.8],
     [6., 3., 4.8, 1.8],
     [6.9, 3.1, 5.4, 2.1],
     [6.7, 3.1, 5.6, 2.4],
     [6.9, 3.1, 5.1, 2.3],
     [5.8, 2.7, 5.1, 1.9],
     [6.8, 3.2, 5.9, 2.3],
     [6.7, 3.3, 5.7, 2.5],
     [6.7, 3., 5.2, 2.3],
     [6.3, 2.5, 5., 1.9],
     [6.5, 3., 5.2, 2.],
     [6.2, 3.4, 5.4, 2.3],
     [5.9, 3., 5.1, 1.8]])

y = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
     2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
len(y)

## A Model for Multiple Categories

We use the softmax activation and a categorical cross-entropy loss.
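
To make these two terms concrete, here is a minimal NumPy sketch of the softmax function and of the categorical cross-entropy for a single observation. The helper names `softmax` and `cross_entropy` and the example scores are illustrative assumptions, not part of Keras.

```python
import numpy as np

def softmax(z):
    """Map a vector of raw scores to probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtracting the max avoids overflow
    return e / e.sum()

def cross_entropy(y_onehot, y_prob):
    """Categorical cross-entropy between a one-hot target and predicted probabilities."""
    return -np.sum(y_onehot * np.log(y_prob))

scores = np.array([2.0, 1.0, 0.1])   # hypothetical outputs of the last layer
probs = softmax(scores)
target = np.array([1.0, 0.0, 0.0])   # one-hot code of class 0

print(probs, probs.sum())
print(cross_entropy(target, probs))
```

With a one-hot target, the loss reduces to the negative logarithm of the probability assigned to the true class.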

In [None]:
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow.keras import models
from tensorflow.keras import layers

# Seed NumPy (used for the shuffle below) and TensorFlow (weight initialization)
np.random.seed(0)
tf.random.set_seed(0)
model = models.Sequential([
    layers.Dense(10, activation='relu', input_shape=(4,)),
    layers.Dense(3, activation='softmax')])

model.compile(optimizer='nadam', loss='categorical_crossentropy', 
              metrics=['accuracy'])
model.summary()

## Shuffling the Indices

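The observations are sorted by class, so the last rows all belong to class 2. We shuffle the indices so that the validation set, taken from the end of the arrays, contains samples of all classes.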

In [None]:
indices = list(range(len(y)))
np.random.shuffle(indices)
X = X[indices]
y = y[indices]
y[:10]
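
As a quick sanity check, one can count the classes that fall in the last 30 positions after shuffling. The sketch below uses a stand-in label vector with the same layout as `y` (50 observations per class); the names `labels`, `perm`, and `val_counts` are illustrative. A random shuffle makes it very likely, but does not strictly guarantee, that all three classes appear in the slice.

```python
import numpy as np

rng = np.random.default_rng(0)          # local generator, independent of the notebook's seed
labels = np.repeat([0, 1, 2], 50)       # same layout as y: 50 observations per class

# Without shuffling, the validation slice holds a single class
print(np.bincount(labels[120:], minlength=3))

perm = rng.permutation(len(labels))     # shuffled indices
val_counts = np.bincount(labels[perm][120:], minlength=3)
print(val_counts)                       # number of validation samples per class
```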

## Converting to One Hot Codes

The `categorical_crossentropy` loss in Keras expects the $y$ vector to be one-hot encoded.

In [None]:
y[:5]

In [None]:
from tensorflow.keras.utils import to_categorical
Y_cat = to_categorical(y)
Y_cat[:5]
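
For intuition, the one-hot encoding can be reproduced in plain NumPy by indexing the rows of an identity matrix. This is a sketch of the idea, not a description of how `to_categorical` is implemented; the sample labels are made up.

```python
import numpy as np

labels = np.array([2, 0, 1, 1])         # hypothetical class indices
num_classes = 3

# Row k of the identity matrix is the one-hot code of class k
one_hot = np.eye(num_classes)[labels]
print(one_hot)

# argmax along the rows recovers the original indices
print(np.argmax(one_hot, axis=1))
```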

## Fitting the Data

In [None]:
X_train = X[:120]
Y_train_cat = Y_cat[:120]

X_val = X[120:]
y_val = y[120:]
Y_val_cat = Y_cat[120:]

history = model.fit(X_train, Y_train_cat, 
                    epochs=80, batch_size=1, 
                    validation_data=(X_val, Y_val_cat),
                    verbose=0)
model.evaluate(X_val, Y_val_cat)

## Visualizing the Accuracy and the Loss

In [None]:
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(1, len(acc) + 1)
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

plt.clf()
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

## Prediction

The prediction is a vector of probabilities, one per class. The predicted class is the index of the highest probability.

In [None]:
# Run the model once and reuse the result
probs = model.predict(np.array([X[121]]))
print('Observation 121')
print('Prediction probabilities:', probs)
print('Predicted class:', np.argmax(probs))
print('Observed value:', y[121])

We apply the model to more observations from the validation set:

In [None]:
print('The original categories:\n', y[121:126])
print('The encoded categories:\n', Y_cat[121:126])
print('The predicted probabilities:\n', model.predict(X[121:126]))
print('The predicted classes:\n', list(map(np.argmax, model.predict(X[121:126]))))
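
The `list(map(np.argmax, ...))` idiom can also be written as a single `np.argmax` call along the class axis. The probability matrix and the true labels below are made up for illustration; in the notebook, the probabilities would come from `model.predict`.

```python
import numpy as np

# Hypothetical output of a softmax layer: one row per observation
probs = np.array([[0.90, 0.08, 0.02],
                  [0.05, 0.15, 0.80],
                  [0.10, 0.70, 0.20]])

pred_classes = np.argmax(probs, axis=1)   # index of the largest probability in each row
print(pred_classes)                        # [0 2 1]

# Accuracy against hypothetical true labels
y_true = np.array([0, 2, 2])
print(np.mean(pred_classes == y_true))
```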

## Computing the Loss

We compute the loss as in the binary case: it is the mean of the negative log-probabilities that the model assigns to the true classes.

In [None]:
from math import log
X_val_prob = model.predict(X_val)
loss = -sum([log(X_val_prob[i, y_val[i]]) 
             for i in range(len(y_val))])/len(y_val)
loss
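
The same computation can be vectorized with NumPy integer indexing, which picks the probability of the true class in each row. The arrays below are stand-ins for `X_val_prob` and `y_val`, with made-up values.

```python
import numpy as np

# Stand-ins for the predicted probabilities and the true classes
val_prob = np.array([[0.7, 0.2, 0.1],
                     [0.1, 0.8, 0.1],
                     [0.2, 0.3, 0.5]])
val_true = np.array([0, 1, 2])

# Probability assigned to the true class of each observation
p_true = val_prob[np.arange(len(val_true)), val_true]

# Mean negative log-likelihood, i.e. the categorical cross-entropy
mean_loss = -np.mean(np.log(p_true))
print(mean_loss)
```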

## Another Way to Encode the Targets

One-hot encoding the targets can be tedious. With the `sparse_categorical_crossentropy` loss, Keras accepts the class indices directly.

In [None]:
model2 = models.Sequential([
    layers.Dense(10, activation='relu', input_shape=(4,)),
    layers.Dense(3, activation='softmax')])

model2.compile(optimizer='nadam', loss='sparse_categorical_crossentropy',
               metrics=['accuracy'])
model2.summary()

We no longer need a one-hot encoding for $y$. Although the integer encoding is more intuitive, it hides the relation to cross entropy.
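
The relation can be made explicit. Writing $\mathbf{y}$ for the one-hot code of an observation of class $c$ and $\hat{\mathbf{y}}$ for the predicted probabilities (a notation assumed here, not taken from the notebook), the categorical cross-entropy keeps only the term of the true class:

$$
-\sum_{k=0}^{2} y_k \log \hat{y}_k = -\log \hat{y}_c,
$$

since $y_k = 1$ for $k = c$ and $y_k = 0$ otherwise. The sparse loss evaluates the right-hand side directly from the integer $c$.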

In [None]:
X_train = X[:120]
y_train = y[:120]
X_val = X[120:]
y_val = y[120:]
y_val[:10]

We fit the model with the $y$ classes encoded as integers.

In [None]:
history = model2.fit(X_train, y_train, 
                     epochs=80, batch_size=1, 
                     validation_data=(X_val, y_val),
                     verbose=0)
model2.evaluate(X_val, y_val)

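We obtain similar results. They are not necessarily identical to those of the first model, since `model2` starts from different random initial weights.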

In [None]:
print('The original categories:\n', y[121:126])
print('The predicted probabilities:\n', model2.predict(X[121:126]))
print('The predicted classes:\n', list(map(np.argmax, model2.predict(X[121:126]))))