# MNIST MLP Classifier


## Multiclass Classifier

This Jupyter notebook builds a simple Multilayer Perceptron (MLP) classifier using Keras for the MNIST dataset. 

## Dataset

https://www.kaggle.com/oddrationale/mnist-in-csv


## Objective

Use the MNIST dataset to build an MLP for classification.

Import standard data processing and visualization libraries.

In [None]:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

TRAIN_DATA_PATH = "/kaggle/input/mnist-in-csv/mnist_train.csv"
TEST_DATA_PATH = "/kaggle/input/mnist-in-csv/mnist_test.csv"

raw_train_data = pd.read_csv(TRAIN_DATA_PATH)
raw_test_data = pd.read_csv(TEST_DATA_PATH)

Gain a basic understanding of the data.

In [None]:
print(f"Train Set Shape: {raw_train_data.shape}, Missing Data: {raw_train_data.isnull().values.any()}")
print(f"Test Set Shape: {raw_test_data.shape}, Missing Data: {raw_test_data.isnull().values.any()}\n")

print(raw_train_data.info())

raw_train_data.head(3)

The train and test sets contain 60,000 and 10,000 entries, respectively, and neither has missing values. The first 3 rows of the train set show that 1 of the 785 columns holds the label and the other 784 hold the pixel values of the MNIST digit. Finally, the only datatype in the dataset is "int64".

Let's next visualize some of the MNIST digits.

In [None]:
# create a temporary DataFrame for visualization purposes
plot_ten_df = raw_train_data.drop("label", axis=1).iloc[0:10, :]
plt.rcParams['figure.figsize'] = [15, 15]

# visualize the first 10 digits in the train set 
for index in range(10):
    plt.subplot(1, 10, index+1)
    # reshape pixel arrangement to 28 x 28
    digit_array = np.asarray(plot_ten_df.iloc[index]).reshape(28, 28)
    plt.imshow(digit_array, cmap="binary")
    plt.title(raw_train_data["label"].iloc[index], fontsize=16)
    plt.axis("off")

As expected, each entry in the dataset maps to a 28 x 28 pixel handwritten digit. The corresponding labels are already encoded and represent the digit number.


At the moment, each pixel is stored as an "int64" with a value in the range [0, 255]. We will train the MLP with Gradient Descent, which is sensitive to feature scales. Therefore, let's separate the pixels from the labels and scale the pixel values to [0, 1] to speed up convergence.

In [None]:
# separate the pixels and the label
# cast int pixels to float
X_train = np.array(raw_train_data.drop("label", axis=1)).astype(float)
y_train = np.array(raw_train_data["label"])

X_test = np.array(raw_test_data.drop("label", axis=1)).astype(float)
y_test = np.array(raw_test_data["label"])

# divide by 255 to scale the pixels to [0, 1]
# reshape arrays to 28 x 28 to match the pixel format (-1 infers the number of samples)
X_train = (X_train / 255).reshape(-1, 28, 28)
X_test = (X_test / 255).reshape(-1, 28, 28)
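
As a quick check of the scaling step, the sketch below applies the same division by 255 to a small synthetic array and confirms that the values land in [0, 1]. The array `fake_pixels` is a hypothetical stand-in for the real pixel columns and is not part of the original notebook. Passing -1 to `reshape` lets NumPy infer the number of samples.

```python
import numpy as np

# hypothetical stand-in for the pixel columns: 5 fake images, 784 pixels each
rng = np.random.default_rng(42)
fake_pixels = rng.integers(0, 256, size=(5, 784)).astype(float)

# same scaling as above; -1 lets NumPy infer the number of images
scaled = (fake_pixels / 255).reshape(-1, 28, 28)

print(scaled.shape)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```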

With the pixels and labels separated and scaled, let's next split the train set to generate a validation set.

In [None]:
from sklearn.model_selection import train_test_split

# random_state=42 for reproducibility
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.15, random_state=42)

# verify the set sizes are as expected
print(f"X_train Shape: {X_train.shape}, y_train Shape: {y_train.shape}")
print(f"X_val Shape: {X_val.shape}, y_val Shape: {y_val.shape}")
print(f"X_test Shape: {X_test.shape}, y_test Shape: {y_test.shape}")

We are now ready to construct the neural network! The Keras API will be used to build the MLP.

In [None]:
import tensorflow as tf

# seed NumPy and TensorFlow with 42 for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# build the MLP architecture
mlp_model = tf.keras.models.Sequential([
                tf.keras.layers.Flatten(input_shape=[28, 28], name="input_layer"),
                tf.keras.layers.Dense(150, activation="relu", name="hidden_layer1"),
                tf.keras.layers.Dense(100, activation="relu", name="hidden_layer2"),
                tf.keras.layers.Dense(50, activation="relu", name="hidden_layer3"),
                tf.keras.layers.Dense(10, activation="softmax", name="output_layer")
])

# compile MLP model
mlp_model.compile(loss="sparse_categorical_crossentropy",
                  optimizer="sgd",
                  metrics=["accuracy"])

# display a breakdown of the MLP model
mlp_model.summary()
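
The `sparse_categorical_crossentropy` loss accepts the integer labels directly, so the labels do not need to be one-hot encoded. The following is a minimal NumPy sketch of that computation, not the Keras implementation. The function names, logits, and labels are illustrative assumptions.

```python
import numpy as np

def softmax(logits):
    # subtract the row max for numerical stability
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def sparse_categorical_crossentropy(y_true, probs):
    # pick the predicted probability of the true class for each sample
    true_class_probs = probs[np.arange(len(y_true)), y_true]
    # average negative log-likelihood over the batch
    return -np.log(true_class_probs).mean()

# made-up logits for 2 samples and 3 classes, with integer labels
logits = np.array([[2.0, 0.5, 0.1],
                   [0.2, 0.1, 3.0]])
labels = np.array([0, 2])

probs = softmax(logits)
loss = sparse_categorical_crossentropy(labels, probs)
print(probs.sum(axis=1))
print(loss)
```

The loss is small when the probability assigned to the true class is close to 1 and grows as that probability shrinks.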

Key features of the MLP:

1) Input layer accepts 28 x 28 arrays to match the MNIST pixel format and flattens each one into 784 features

2) Output layer has 10 neurons as there are 10 possible MNIST digits --> [0-9]
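
The parameter count reported by `mlp_model.summary()` can be reproduced by hand: each Dense layer has (inputs x units) weights plus one bias per unit. A short sketch of that arithmetic, using the layer sizes defined above:

```python
# layer widths: 784 flattened pixels, three hidden layers, 10 outputs
layer_sizes = [28 * 28, 150, 100, 50, 10]

total_params = 0
for n_inputs, n_units in zip(layer_sizes[:-1], layer_sizes[1:]):
    # weights (n_inputs * n_units) plus one bias per unit
    layer_params = n_inputs * n_units + n_units
    print(f"{n_inputs} -> {n_units}: {layer_params:,} parameters")
    total_params += layer_params

print(f"Total: {total_params:,} parameters")
```

The first hidden layer accounts for most of the parameters because it connects to all 784 input pixels.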

The MLP is now ready for training! Let's define an early stopping callback that monitors the cross-entropy loss on the validation set. If the validation loss does not improve for 10 consecutive epochs, training stops because further improvement is unlikely, and restore_best_weights=True restores the weights from the epoch with the lowest validation loss. With this callback, we can set the number of epochs to a large value, since the built-in stopping mechanism reduces the risk of both stopping too early (underfitting) and training too long (overfitting).

In [None]:
# define an early stopping callback, monitoring the validation cross-entropy loss function
val_stop = tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)

# train the MLP
training_progress = mlp_model.fit(X_train, y_train, epochs=1000,
                                  validation_data=(X_val, y_val),
                                  callbacks=[val_stop])
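
To make the patience rule concrete, here is a simplified pure-Python sketch of the stopping logic. It is not the Keras implementation (for instance, it ignores options such as a minimum improvement threshold), and the function name and loss values are made up for illustration.

```python
def early_stopping_epoch(val_losses, patience):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # improvement: remember this epoch and reset the counter
            best_loss = loss
            best_epoch = epoch
            wait = 0
        else:
            # no improvement: stop once patience is exhausted
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    # patience never ran out: training used every epoch
    return len(val_losses) - 1, best_epoch

# hypothetical validation losses; the best value is at epoch index 2
stop_epoch, best_epoch = early_stopping_epoch(
    [0.5, 0.4, 0.3, 0.31, 0.32, 0.33], patience=3)
print(stop_epoch, best_epoch)
```

With restore_best_weights=True, the model keeps the weights from `best_epoch` rather than those from the final epoch.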

Next, let's plot the model training progression.

In [None]:
# define a function to plot the MLP training history
def training_plots(training_progress: dict):
    
    plt.rcParams['figure.figsize'] = [20, 8]
    
    plt.subplot(1, 2, 1)
    plt.plot(training_progress["accuracy"], "g", label="Train Accuracy")
    plt.plot(training_progress["val_accuracy"], "b", label="Validation Accuracy")
    plt.title("MNIST MLP Accuracy Plot", fontsize=16)
    plt.xlabel("Epoch", fontsize=16)
    plt.ylabel("Accuracy", fontsize=16)
    plt.legend(fontsize=16)
    
    plt.subplot(1, 2, 2)
    plt.plot(training_progress["loss"], "g", label="Train Loss")
    plt.plot(training_progress["val_loss"], "b", label="Validation Loss")
    plt.title("MNIST MLP Cross-Entropy Loss Plot", fontsize=16)
    plt.xlabel("Epoch", fontsize=16)
    plt.ylabel("Loss", fontsize=16)
    plt.legend(fontsize=16)
    
training_plots(training_progress.history)

Key Observations:

1) Accuracy on the train set almost reaches 1

2) Accuracy on the validation set is lower than the train set, as expected, but is not significantly worse

3) The callback mechanism worked as intended as val_loss plateaus at ~20 epochs

Let's now evaluate the MLP on the test set.

In [None]:
test_accuracy = mlp_model.evaluate(X_test, y_test, verbose=0)[1]

final_results = pd.DataFrame({"Model": ["Multilayer Perceptron (MLP)"],
                              "Train Accuracy": [0.9985],
                              "Validation Accuracy": [0.9760],
                              "Test Accuracy": [test_accuracy]})

print(final_results.to_string(index=False))
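
The train and validation accuracies in the table are entered by hand. One alternative, sketched here under the assumption that the best epoch is the one with the lowest val_loss, is to read them from the training history. The `history` dict below is a made-up stand-in for `training_progress.history`, not the notebook's actual results.

```python
import numpy as np

# hypothetical values standing in for training_progress.history
history = {
    "accuracy":     [0.90, 0.95, 0.97, 0.98],
    "val_accuracy": [0.92, 0.95, 0.96, 0.96],
    "val_loss":     [0.30, 0.18, 0.12, 0.13],
}

# epoch with the lowest validation loss
best_epoch = int(np.argmin(history["val_loss"]))
train_accuracy = history["accuracy"][best_epoch]
val_accuracy = history["val_accuracy"][best_epoch]
print(best_epoch, train_accuracy, val_accuracy)
```

Note that the logged train accuracy is averaged over the batches of an epoch while the weights are still changing, so it only approximates the accuracy of the restored weights on the train set.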

The test accuracy is quite close to the validation accuracy. This is expected because no hyperparameter tuning was performed, so the validation set was used only for early stopping. Had we tuned hyperparameters, we would have selected the values that perform best on the validation set, making the validation accuracy an optimistic estimate; the test accuracy would then likely be lower than the validation accuracy.

To conclude, the simple MLP achieved ~97.7% accuracy on the MNIST dataset.