# Multi-label Classification with a Multi-Output Model

In this notebook we perform multi-label classification with a multi-output model using Keras. We show how to use the **Functional API** to build an arbitrary architecture, here a multi-output model.


We use the MNIST dataset for experimentation.

In this problem, each MNIST image has two labels:
- label 1: an integer representing the digit (0 to 9)
- label 2: a Boolean representing parity (True = even, False = odd)

Thus, each label defines its own multiclass problem:
- label 1: 10 classes
- label 2: 2 classes (i.e., binary)
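A small sketch of how both labels are derived from the same digit, using a handful of example digit values (chosen for illustration, not loaded from MNIST):

```python
import numpy as np

# Example digit values for illustration only
digits = np.array([5, 0, 4, 1, 9, 2])

label_1 = digits              # 10-class target: the digit itself
label_2 = (digits % 2 == 0)   # 2-class target: True if even, False if odd

for d, l1, l2 in zip(digits, label_1, label_2):
    print(f"digit={d}  label_1={l1}  label_2={l2} ({'even' if l2 else 'odd'})")
```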

For each image we need to predict two sets of output probabilities: a 10-class probability vector for the digit and a 2-class probability vector for even/odd. Thus, we build a multi-output multiclass classifier, or simply a **multi-output classifier**.

The Sequential model of Keras is a plain stack of layers in which each layer has exactly one input tensor and one output tensor, so it cannot build multi-output Artificial Neural Networks (ANNs). We use the **Functional API** instead. The Functional API can handle models with non-linear topology, models with shared layers, and models with multiple inputs or outputs.

```python
import numpy as np

from sklearn.metrics import confusion_matrix, accuracy_score, classification_report

import tensorflow as tf
from tensorflow import keras
```

## Load the Dataset

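We load the train and test datasets using Keras.

Then, we flatten each 28 × 28 input image into a 1D array of 784 values.

Finally, we hold out the first 5,000 training images as a validation set and scale the pixel values from [0, 255] to [0, 1].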
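The flatten-and-scale step can be illustrated without downloading MNIST. The sketch below uses a synthetic batch of random `uint8` pixels as a stand-in for real images; it shows where the 784 comes from and why the scaled arrays end up as `float64`.

```python
import numpy as np

# Synthetic stand-in for MNIST: 3 random 28x28 images with uint8 pixels in [0, 255]
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(3, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into a 1D vector of 28 * 28 = 784 values
flat = images.reshape(len(images), -1)

# Scale to [0, 1]; true division of a uint8 array produces float64
scaled = flat / 255.0

print(flat.shape, scaled.dtype, scaled.min() >= 0.0, scaled.max() <= 1.0)
```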

```python
mnist = keras.datasets.mnist

(X_train_full, y_train_full), (X_test, y_test) = mnist.load_data()

# Flatten each 28x28 image into a 1D array of 784 features
X_train_full = X_train_full.reshape(-1, 28 * 28)
X_test = X_test.reshape(-1, 28 * 28)

# Hold out the first 5,000 training images for validation and scale pixel values to [0, 1]
X_valid, X_train = X_train_full[:5000] / 255.0, X_train_full[5000:] / 255.0
y_valid, y_train = y_train_full[:5000], y_train_full[5000:]
X_test = X_test / 255.0

print("No. of Training Samples: ", X_train.shape)
print("No. of Training Labels: ", y_train.shape)

print("\nNo. of Validation Samples: ", X_valid.shape)
print("No. of Validation Labels: ", y_valid.shape)

print("\nNo. of Testing Samples: ", X_test.shape)
print("No. of Testing Labels: ", y_test.shape)

print("\nX type: ", X_train.dtype)
print("y type: ", y_train.dtype)
```


```
No. of Training Samples:  (55000, 784)
No. of Training Labels:  (55000,)

No. of Validation Samples:  (5000, 784)
No. of Validation Labels:  (5000,)

No. of Testing Samples:  (10000, 784)
No. of Testing Labels:  (10000,)

X type:  float64
y type:  uint8
```


## Create Binary Labels

Each target (y_train, y_valid, y_test) represents the digit in the underlying image as an integer between 0 and 9. This target is used to predict the digit via 10-class classification.

For predicting whether the digit is even or odd we need to create another target.

```python
# Binarize the target: create a new target indicating whether the digit is even (True) or odd (False)
y_train_binary_1D = (y_train % 2 == 0)
y_test_binary_1D = (y_test % 2 == 0)
y_valid_binary_1D = (y_valid % 2 == 0)

# The binary targets y_train_binary_1D / y_valid_binary_1D / y_test_binary_1D are 1D Boolean arrays:
# each instance has a single class index (False = 0 = odd, True = 1 = even).
# The binary output layer predicts one probability per class, i.e., 2 values per instance,
# so we convert the class indices (sparse labels) to one-hot vectors with
# keras.utils.to_categorical(), turning each 1D target into an N x 2 matrix (N = number of samples).
y_train_binary = keras.utils.to_categorical(y_train_binary_1D)
y_test_binary = keras.utils.to_categorical(y_test_binary_1D)
y_valid_binary = keras.utils.to_categorical(y_valid_binary_1D)
```
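To see what one-hot encoding produces for the Boolean parity targets, here is a plain-NumPy equivalent (an illustration, not the Keras implementation; `keras.utils.to_categorical()` returns `float32` by default, while this sketch yields `float64`). The Booleans are cast to 0/1 and each index selects a row of the 2 × 2 identity matrix, which makes the column convention explicit.

```python
import numpy as np

# Example parity targets: True = even, False = odd (same convention as y % 2 == 0)
parity = np.array([False, True, True, False])

# Row i of the 2x2 identity matrix is the one-hot vector for class i
one_hot = np.eye(2)[parity.astype(int)]

print(one_hot)
# Column 0 is 1.0 for odd digits (False), column 1 is 1.0 for even digits (True)
```

This column order matters later: taking `argmax` over the even/odd output gives 1 for "even" and 0 for "odd".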

## Create The Multi-Output Model using Keras Functional API

Creating the multi-output model is straightforward. The Functional API lets us connect the last hidden layer to two output layers: one for the 10-class digit classification and one for the binary even/odd classification.
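To make the shared-trunk, two-head structure concrete without TensorFlow, the sketch below runs a NumPy forward pass through the same layer sizes with random, untrained weights. The helpers `dense`, `relu`, `softmax`, and `sigmoid` are our own illustrative functions, not Keras APIs. It shows the two output shapes and one consequence of the design: the softmax rows sum to 1, while the two independent sigmoid units of the even/odd head need not.

```python
import numpy as np

rng = np.random.default_rng(42)

def dense(x, n_units):
    """Illustrative fully connected layer with random (untrained) weights."""
    w = rng.normal(scale=0.05, size=(x.shape[1], n_units))
    b = np.zeros(n_units)
    return x @ w + b

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.random((4, 784))               # batch of 4 flattened "images"
h1 = relu(dense(x, 300))               # shared hidden layer 1
h2 = relu(dense(h1, 100))              # shared hidden layer 2
digit_probs = softmax(dense(h2, 10))   # head 1: 10-class softmax, reads h2
parity_probs = sigmoid(dense(h2, 2))   # head 2: two independent sigmoids, also reads h2

print(digit_probs.shape, parity_probs.shape)  # (4, 10) (4, 2)
print(digit_probs.sum(axis=1))                # each row sums to 1 (up to rounding)
print(parity_probs.sum(axis=1))               # rows need not sum to 1
```

In the Keras model, the same wiring is expressed by calling both output layers on `hidden2`.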

```python
# Clear Keras's global state (e.g., models and layer-name counters from earlier runs)
# so that re-running this cell starts fresh instead of accumulating memory.
keras.backend.clear_session()

# Fix the random seeds so that weight initialization and shuffling are reproducible across runs.
np.random.seed(42)
tf.random.set_seed(42)

# Create a Functional model:
# - Input layer (input_): instantiates an input tensor of 784 features for building the model
# - Hidden layers: two Dense layers (300 and 100 units) with the ReLU activation function
# - Output layer 1: Dense layer with 10 units and "softmax" for the 10-class digit prediction
# - Output layer 2: Dense layer with 2 units and "sigmoid" for the even/odd prediction;
#   each unit is an independent sigmoid matched to one column of the one-hot binary target
input_ = keras.Input(shape=(784,))
hidden1 = keras.layers.Dense(300, activation="relu")(input_)
hidden2 = keras.layers.Dense(100, activation="relu")(hidden1)
output1 = keras.layers.Dense(10, activation="softmax")(hidden2)
output2 = keras.layers.Dense(2, activation="sigmoid")(hidden2)

# Create a Model by specifying its input and outputs; both output layers share hidden2
model = keras.models.Model(inputs=[input_], outputs=[output1, output2])

model.summary()
```

```
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_1 (InputLayer)            [(None, 784)]        0                                            
__________________________________________________________________________________________________
dense (Dense)                   (None, 300)          235500      input_1[0][0]                    
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 100)          30100       dense[0][0]                      
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 10)           1010        dense_1[0][0]                    
______________________________________________________________________________________________
```
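The summary above is cut off after `dense_2`, so the even/odd output layer and the parameter totals are missing. Because a Dense layer has one weight per input-unit pair plus one bias per unit, the counts can be reproduced by hand. The sketch below checks the three visible rows and applies the same formula to the 2-unit head; its Keras layer name is not shown, so we simply call it "even/odd head".

```python
def dense_params(n_inputs, n_units):
    # One weight per (input, unit) pair plus one bias per unit
    return n_inputs * n_units + n_units

# (layer, inputs, units) for the model built above
layers = [
    ("dense", 784, 300),
    ("dense_1", 300, 100),
    ("dense_2 (digit head)", 100, 10),
    ("even/odd head", 100, 2),
]

counts = {name: dense_params(n_in, n_out) for name, n_in, n_out in layers}
for name, count in counts.items():
    print(f"{name:<22}{count:>8}")
print(f"{'total':<22}{sum(counts.values()):>8}")
```

Both heads read the same 100-unit hidden layer, so adding the second output costs only 202 extra parameters.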

## Compile and Train the Model

```python
%%time

# Define the optimizer: SGD with learning rate 0.1 and momentum 0.1
optimizer = keras.optimizers.SGD(learning_rate=1e-1, momentum=0.1)

# Compile the model.
# The model has two outputs, so we pass one loss per output as a list, in the order of `outputs`:
# - "sparse_categorical_crossentropy" for the digit head (integer labels 0-9)
# - "binary_crossentropy" for the even/odd head (one-hot N x 2 labels)
# Keras minimizes the sum of the two losses; "accuracy" is reported for each output.
model.compile(loss=["sparse_categorical_crossentropy", "binary_crossentropy"],
              optimizer=optimizer,
              metrics=["accuracy"])

# Create an early-stopping callback: stop when the total validation loss has not improved
# for 10 epochs, then restore the weights from the best epoch
early_stopping_cb = keras.callbacks.EarlyStopping(monitor='val_loss',
                                                  min_delta=0,
                                                  patience=10,
                                                  verbose=1,
                                                  mode='auto',
                                                  restore_best_weights=True)

# Train the model.
# The targets for training and validation are given as lists, one entry per output.
history = model.fit(X_train, [y_train, y_train_binary],
                    batch_size=32,  # 32 is the default batch size
                    epochs=100,
                    verbose=1,
                    validation_data=(X_valid, [y_valid, y_valid_binary]),
                    callbacks=[early_stopping_cb])
```

```
Train on 55000 samples, validate on 5000 samples
Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 00017: early stopping
CPU times: user 2min 38s, sys: 25.2 s, total: 3min 3s
Wall time: 1min 10s
```
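Because `loss` was given as a list, Keras trains on a single scalar: the sum of the per-output losses, each weighted 1.0 unless `loss_weights` is passed to `compile()`. The first entry returned by `model.evaluate()` is this total, followed by the per-output losses and accuracies in output order (`model.metrics_names` lists the exact labels). As a rough check, the sketch below plugs in the values printed by the evaluation cell below. The totals match the sums to within about 2e-4; the small residual presumably comes from batch-wise averaging rather than an extra loss term, since the model has no regularizers.

```python
# Values copied from the "Train Evaluation" / "Test Evaluation" output of this notebook:
# [total_loss, digit_loss, even_odd_loss, digit_accuracy, even_odd_accuracy]
train_evaluation = [0.024959471705039454, 0.014488145, 0.010469379, 0.9962364, 0.9967818]
test_evaluation = [0.10207103988252347, 0.068993315, 0.032915704, 0.9786, 0.9885]

loss_weights = (1.0, 1.0)  # Keras default when loss_weights is not given

for name, ev in [("train", train_evaluation), ("test", test_evaluation)]:
    total, digit_loss, even_odd_loss = ev[:3]
    weighted_sum = loss_weights[0] * digit_loss + loss_weights[1] * even_odd_loss
    print(f"{name}: total={total:.6f}  weighted sum={weighted_sum:.6f}  "
          f"gap={abs(total - weighted_sum):.1e}")
```

Passing, for example, `loss_weights=[1.0, 0.5]` to `compile()` would change this weighting if one task should dominate training.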


## Model Evaluation

The trained model predicts two outputs (output probability matrices):
- An N x 10 matrix for the 10-class classification (digit)
- An N x 2 matrix for the binary classification (even or odd)

We use these two predicted matrices to evaluate our model.
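A toy illustration of how the two probability matrices are turned into class predictions with `np.argmax`, using made-up probabilities for three images (not actual model output). Column 1 of the even/odd matrix means "even" because the one-hot targets were built from `y % 2 == 0`.

```python
import numpy as np

# Made-up predicted probabilities for 3 images (not real model output)
digit_probs = np.array([
    [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01],
    [0.05, 0.70, 0.05, 0.05, 0.05, 0.02, 0.02, 0.02, 0.02, 0.02],
    [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.10, 0.00, 0.90],
])
even_odd_probs = np.array([
    [0.08, 0.95],   # column 0 = odd, column 1 = even
    [0.85, 0.10],
    [0.97, 0.02],
])

digit_pred = np.argmax(digit_probs, axis=1)        # index of the most probable digit
is_even_pred = np.argmax(even_odd_probs, axis=1)   # 1 = even, 0 = odd

print(digit_pred)    # [2 1 9]
print(is_even_pred)  # [1 0 0]

# The two heads are trained separately, so they can disagree;
# in this toy example they agree: 2 is even, 1 and 9 are odd
print(np.array_equal(digit_pred % 2 == 0, is_even_pred == 1))  # True
```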

```python
train_evaluation = model.evaluate(X_train, [y_train, y_train_binary], verbose=0)
test_evaluation = model.evaluate(X_test, [y_test, y_test_binary], verbose=0)

print("Train Evaluation: ", train_evaluation)
print("Test Evaluation: ", test_evaluation)

# model.evaluate() returns [total loss, digit loss, even/odd loss, digit accuracy, even/odd accuracy]
# (model.metrics_names gives the exact labels)
train_loss_multiclass = train_evaluation[1]
train_loss_binary = train_evaluation[2]
train_accuracy_multiclass = train_evaluation[3]
train_accuracy_binary = train_evaluation[4]

test_loss_multiclass = test_evaluation[1]
test_loss_binary = test_evaluation[2]
test_accuracy_multiclass = test_evaluation[3]
test_accuracy_binary = test_evaluation[4]


print("\n******************** Multiclass Classification ********************************************")

print("\nMulticlass Classification - Train Accuracy: ", train_accuracy_multiclass)
print("Multiclass Classification - Test Accuracy: ", test_accuracy_multiclass)

print("\nMulticlass Classification - Train Loss: ", train_loss_multiclass)
print("Multiclass Classification - Test Loss: ", test_loss_multiclass)


# model.predict(X_test) returns a list of two arrays, one per output:
# an N x 10 matrix of digit probabilities and an N x 2 matrix of even/odd probabilities
y_test_predicted = model.predict(X_test)
y_test_predicted_multiclass = np.argmax(y_test_predicted[0], axis=1)  # index of the most probable digit
y_test_predicted_binary = np.argmax(y_test_predicted[1], axis=1)  # 1 = even, 0 = odd

# Note: predict_classes() exists only on Sequential models (and has been removed from newer
# TensorFlow versions), so it cannot be used with this Functional multi-output model:
# y_test_predicted = model.predict_classes(X_test)

y_train_predicted = model.predict(X_train)
y_train_predicted_multiclass = np.argmax(y_train_predicted[0], axis=1)  # index of the most probable digit
y_train_predicted_binary = np.argmax(y_train_predicted[1], axis=1)  # 1 = even, 0 = odd


print("\nTest Confusion Matrix (Multiclass):")
print(confusion_matrix(y_test, y_test_predicted_multiclass))

print("\nClassification Report (Multiclass):")
print(classification_report(y_test, y_test_predicted_multiclass))


print("\n******************** Binary Classification ********************************************")

print("\nBinary Classification - Train Accuracy: ", train_accuracy_binary)
print("Binary Classification - Test Accuracy: ", test_accuracy_binary)

print("\nBinary Classification - Train Loss: ", train_loss_binary)
print("Binary Classification - Test Loss: ", test_loss_binary)


print("\nTest Confusion Matrix (Binary):")
print(confusion_matrix(y_test_binary_1D, y_test_predicted_binary))

print("\nClassification Report (Binary):")
print(classification_report(y_test_binary_1D, y_test_predicted_binary))
```
Train Evaluation:  [0.024959471705039454, 0.014488145, 0.010469379, 0.9962364, 0.9967818]
Test Evaluation:  [0.10207103988252347, 0.068993315, 0.032915704, 0.9786, 0.9885]

******************** Multiclass Classification ********************************************

Multiclass Classification - Train Accuracy:  0.9962364
Multiclass Classification - Test Accuracy:  0.9786

Multiclass Classification - Train Loss:  0.014488145
Multiclass Classification - Test Loss:  0.068993315

Test Confusion Matrix (Multiclass):
[[ 969    0    1    2    0    0    5    0    2    1]
 [   0 1128    0    2    0    1    2    0    2    0]
 [   7    4 1002    5    3    0    2    2    7    0]
 [   0    0    2  996    0    2    0    1    5    4]
 [   0    0    3    0  965    0    4    1    2    7]
 [   2    0    0   16    1  863    2    1    4    3]
 [   5    2    0    1    6    3  939    0    2    0]
 [   2    7    8    4    4    0    0  986    7   10]
 [   1    1    1    2    0    2    4    2  960    1]
 [   2  