# Multi-Layer Perceptron (MLP) - Large & Complex Data Set


## Dataset
In this notebook, we perform image recognition on the **MNIST dataset**, a collection of 70,000 handwritten digits that is a subset of a larger set available from NIST. The dataset is conventionally divided into a training set of 60,000 examples and a test set of 10,000 examples, and each image is labeled with the digit it represents. The digits have been size-normalized and centered in a fixed-size image of 28x28 pixels. Each feature represents one pixel’s intensity, from 0 (white) to 255 (black), so each image has 784 features.
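
To make the 784-feature layout concrete, the following minimal sketch reshapes one flat feature vector back into a 28x28 grid. The vector here is random stand-in data rather than a real MNIST row, and the row-major ordering is an assumption about how the pixels are flattened.

```python
import numpy as np

# Hypothetical stand-in for one MNIST row: 784 pixel intensities in [0, 255].
rng = np.random.default_rng(0)
flat_image = rng.integers(0, 256, size=784).astype("float64")

# A row-major reshape recovers the 28x28 pixel grid (28 * 28 = 784).
image = flat_image.reshape(28, 28)

print("Flat shape: ", flat_image.shape)
print("Image shape:", image.shape)

# Pixel (row, col) of the grid sits at index row * 28 + col of the flat vector.
row, col = 3, 5
print("Same pixel: ", image[row, col] == flat_image[row * 28 + col])
```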

## Approach
The goal is to identify the digits using pattern recognition techniques. Image recognition is the ability of AI to detect, classify, and identify objects in images. Since the dataset contains handwritten digits (0-9), this is a multi-class classification problem.

We use a **Multi-Layer Perceptron (MLP) classifier** for this purpose.

To reduce the training time, we also use a dimensionality reduction technique (**Principal Component Analysis**) to project the features onto a lower-dimensional space.
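
As a rough illustration of this approach, the sketch below chains standardization, PCA, and an MLP in a single Scikit-Learn `Pipeline`. It is not the code used in this notebook: it runs on the small 8x8 `load_digits` dataset bundled with Scikit-Learn so that it finishes quickly, and the hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small stand-in dataset (8x8 digits), not MNIST.
digits = load_digits()
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

# Standardize -> keep 95% of the variance -> classify with an MLP.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA(n_components=0.95)),
    ("mlp", MLPClassifier(hidden_layer_sizes=(100,), max_iter=300, random_state=0)),
])
pipeline.fit(Xd_train, yd_train)

n_kept = pipeline.named_steps["pca"].n_components_
test_accuracy = pipeline.score(Xd_test, yd_test)
print("Components kept:", n_kept, "of", digits.data.shape[1])
print("Test accuracy:  ", test_accuracy)
```

A pipeline fits the scaler and PCA on the training data only and reuses them at prediction time, which mirrors the fit-on-train, transform-both pattern used in the cells that follow.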

In [3]:
import warnings
import time
import numpy as np
import pandas as pd
from scipy.io import loadmat

from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score, classification_report
from sklearn.model_selection import train_test_split

from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

## Load Data and Create Data Matrix (X) and the Label Vector (y)

We load the data from a local file. Alternatively, it can be downloaded from OpenML using Scikit-Learn's `fetch_openml`.

In [11]:
# Load the data.
mnist = loadmat('mnist-original.mat')

# Create the data matrix X and the target vector y.
X = mnist["data"].T.astype('float64')
y = mnist["label"][0].astype('int64')


# Alternative: load the data using Scikit-Learn.
# mnist = fetch_openml('mnist_784', cache=False)

# X = mnist["data"].astype('float64')
# y = mnist["target"].astype('int64')


print("\nNo. of Samples: ", X.shape)
print("No. of Labels: ", y.shape)

print("\nX type: ", X.dtype)
print("y type: ", y.dtype)


No. of Samples:  (70000, 784)
No. of Labels:  (70000,)

X type:  float64
y type:  int64


## Split Data Into Training and Test Sets

We split the dataset into training (80%) and test (20%) subsets.

In [12]:
# Set the seed with random_state for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
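
The effect of `test_size` and `random_state` can be checked without the image data. This small sketch splits a plain index array of the same length as the dataset (an illustrative stand-in) and confirms that repeating the call with the same seed gives the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in for the 70,000 samples: just their row indices.
indices = np.arange(70000)

train_a, test_a = train_test_split(indices, test_size=0.2, random_state=0)
train_b, test_b = train_test_split(indices, test_size=0.2, random_state=0)

print("Training samples:", len(train_a))
print("Test samples:    ", len(test_a))

# The same random_state reproduces exactly the same split.
print("Reproducible:    ", np.array_equal(test_a, test_b))
```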

# Optimization Using Dimensionality Reduction

Since every image in the dataset has 784 features, we can shorten the model training time by reducing the number of features. For this we use Principal Component Analysis (PCA), a dimensionality reduction technique.

Since PCA is affected by the scale of the features, we standardize the data before applying it.
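
The scale sensitivity can be seen on a toy example. The sketch below uses two independent synthetic features (an assumption made purely for illustration), one with a much larger spread than the other, and compares the explained variance ratio with and without standardization.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two independent features: one with unit spread, one with a spread of 1000.
X_toy = np.column_stack([
    rng.normal(0, 1, size=1000),
    rng.normal(0, 1000, size=1000),
])

# Without scaling, the large-scale feature dominates the first component.
raw_ratio = PCA(n_components=2).fit(X_toy).explained_variance_ratio_

# After standardization, both features contribute comparably.
X_toy_scaled = StandardScaler().fit_transform(X_toy)
scaled_ratio = PCA(n_components=2).fit(X_toy_scaled).explained_variance_ratio_

print("Unscaled explained variance ratio:", raw_ratio)
print("Scaled explained variance ratio:  ", scaled_ratio)
```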

In [13]:
# Standardize the data.
scaler = StandardScaler()

# Fit on training set only.
scaler.fit(X_train)

# Apply transform to both the training set and the test set.
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

In [17]:
%%time
# Setting n_components to 0.95 keeps enough components to retain 95% of the variance.
pca = PCA(n_components=0.95)

# Fit on the training set only.
pca.fit(X_train)

# Print the number of principal components selected by PCA.
print("Number of Principal Components: ", pca.n_components_)

# Apply the transformation to both the training and test sets.
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)

Number of Principal Components:  330
Wall time: 5.91 s
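
A fractional `n_components` asks PCA for the smallest number of components whose cumulative explained variance reaches that fraction. The sketch below reproduces the selection by hand on the small `load_digits` dataset (a stand-in chosen so it runs quickly, not the MNIST data used above); `svd_solver="full"` is set explicitly so both fits use the same solver.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Small stand-in dataset, standardized as in the cells above.
X_small = StandardScaler().fit_transform(load_digits().data)

# Fit PCA with all components and accumulate the explained variance ratios.
full_pca = PCA(svd_solver="full").fit(X_small)
cumulative = np.cumsum(full_pca.explained_variance_ratio_)

# Smallest k whose cumulative explained variance is at least 95%.
k_manual = int(np.searchsorted(cumulative, 0.95) + 1)

# Let PCA choose the number of components itself.
pca_95 = PCA(n_components=0.95, svd_solver="full").fit(X_small)

print("Manual count:      ", k_manual)
print("PCA n_components_: ", pca_95.n_components_)
```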


## Experiment 1: MLP

We train the MLP on the PCA-transformed data, so the model uses the 330 principal components rather than all 784 original features.

## MLP Classification

In [None]:
%%time

# Use GridSearchCV for hyperparameter tuning to find the parameters that give the best performance.
from sklearn.model_selection import GridSearchCV

param_grid = {'hidden_layer_sizes': [(100,), (150,), (200,)], 'solver': ['sgd', 'adam'],
              'learning_rate_init': [0.1, 0.01, 0.001], 'alpha': [0.1, 0.01],
              'activation': ['logistic', 'relu']}

clf_mlp = MLPClassifier(early_stopping=True, n_iter_no_change=10, tol=1e-5, max_iter=500, random_state=1)


clf_mlp_cv = GridSearchCV(clf_mlp, param_grid, scoring='accuracy', cv=5, verbose=1, n_jobs=-1)
clf_mlp_cv.fit(X_train_pca, y_train)

params_optimal_mlp = clf_mlp_cv.best_params_

print("Best Score (accuracy): %f" % clf_mlp_cv.best_score_)
print("Optimal Hyperparameter Values: ", params_optimal_mlp)
print("\n")

Fitting 5 folds for each of 72 candidates, totalling 360 fits


[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.
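
The candidate count reported above follows directly from the grid: it is the product of the number of values per hyperparameter, and each candidate is fitted once per fold. A quick way to check this, sketched here with Scikit-Learn's `ParameterGrid` on the same grid:

```python
from sklearn.model_selection import ParameterGrid

# Same grid as in the search above.
param_grid = {
    'hidden_layer_sizes': [(100,), (150,), (200,)],
    'solver': ['sgd', 'adam'],
    'learning_rate_init': [0.1, 0.01, 0.001],
    'alpha': [0.1, 0.01],
    'activation': ['logistic', 'relu'],
}

n_folds = 5
n_candidates = len(ParameterGrid(param_grid))  # 3 * 2 * 3 * 2 * 2
n_fits = n_candidates * n_folds

print("Candidates:", n_candidates)
print("Total fits:", n_fits)
```

Because the number of fits grows multiplicatively with every added hyperparameter value, even a modest grid like this one can take a long time on the full training set.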


In [22]:
%%time

t0 = time.time()
mlp_clf_pca = MLPClassifier(hidden_layer_sizes=(200,), max_iter=200, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate='constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)


mlp_clf_pca.fit(X_train_pca, y_train)

t1 = time.time()

duration_mlp_pca = t1 - t0
print("The PCA+MLP takes {:.1f}s.".format(duration_mlp_pca))

print("No. of Iterations:", mlp_clf_pca.n_iter_ )

y_train_predicted = mlp_clf_pca.predict(X_train_pca)

train_accuracy_mlp = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_mlp)

Iteration 1, loss = 0.49712015
Validation score: 0.935357
Iteration 2, loss = 0.19141485
Validation score: 0.948750
Iteration 3, loss = 0.14316349
Validation score: 0.957679
Iteration 4, loss = 0.11065116
Validation score: 0.964286
Iteration 5, loss = 0.08852195
Validation score: 0.967679
Iteration 6, loss = 0.07122619
Validation score: 0.967857
Iteration 7, loss = 0.06032630
Validation score: 0.969821
Iteration 8, loss = 0.05056308
Validation score: 0.973214
Iteration 9, loss = 0.04944693
Validation score: 0.970893
Iteration 10, loss = 0.04480906
Validation score: 0.972500
Iteration 11, loss = 0.03962610
Validation score: 0.974286
Iteration 12, loss = 0.04125116
Validation score: 0.973750
Iteration 13, loss = 0.03956745
Validation score: 0.972500
Iteration 14, loss = 0.03340966
Validation score: 0.974464
Iteration 15, loss = 0.02984075
Validation score: 0.974107
Iteration 16, loss = 0.03040872
Validation score: 0.974286
Iteration 17, loss = 0.03343423
Validation score: 0.973929
Iterat

## Evaluate the Model on Test Data

In [23]:
%%time
y_test_predicted = mlp_clf_pca.predict(X_test_pca)

accuracy_score_test_mlp_pca = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", accuracy_score_test_mlp_pca)

print("\nTest Confusion Matrix:")
print(confusion_matrix(y_test, y_test_predicted))

print("\nClassification Report:")
print(classification_report(y_test, y_test_predicted))


Test Accuracy:  0.9754285714285714

Test Confusion Matrix:
[[1300    0    1    1    0    1    4    3    2    0]
 [   0 1582   11    1    2    0    0    2    4    2]
 [   7    6 1317    4    2    1    2    4    3    2]
 [   1    1    9 1392    0   11    0    4    6    3]
 [   2    1    7    0 1325    1    8    2    2   14]
 [   2    1    3   12    1 1228   15    4   10    4]
 [   4    1    3    0    3    3 1381    1    1    0]
 [   2    0    8    3    8    1    1 1418    5   15]
 [   3    2    6    6    4    4    4    3 1354    4]
 [   5    2    3    6   20    4    0   12    8 1359]]

Classification Report:
              precision    recall  f1-score   support

           0       0.98      0.99      0.99      1312
           1       0.99      0.99      0.99      1604
           2       0.96      0.98      0.97      1348
           3       0.98      0.98      0.98      1427
           4       0.97      0.97      0.97      1362
           5       0.98      0.96      0.97      1280
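
The per-class figures in the classification report can be recomputed from the confusion matrix printed above, where rows are true labels and columns are predicted labels. The following sketch copies that matrix and derives the overall accuracy, per-class recall, and per-class precision with NumPy.

```python
import numpy as np

# Test confusion matrix copied from the output above
# (rows: true digit, columns: predicted digit).
cm = np.array([
    [1300,    0,    1,    1,    0,    1,    4,    3,    2,    0],
    [   0, 1582,   11,    1,    2,    0,    0,    2,    4,    2],
    [   7,    6, 1317,    4,    2,    1,    2,    4,    3,    2],
    [   1,    1,    9, 1392,    0,   11,    0,    4,    6,    3],
    [   2,    1,    7,    0, 1325,    1,    8,    2,    2,   14],
    [   2,    1,    3,   12,    1, 1228,   15,    4,   10,    4],
    [   4,    1,    3,    0,    3,    3, 1381,    1,    1,    0],
    [   2,    0,    8,    3,    8,    1,    1, 1418,    5,   15],
    [   3,    2,    6,    6,    4,    4,    4,    3, 1354,    4],
    [   5,    2,    3,    6,   20,    4,    0,   12,    8, 1359],
])

support = cm.sum(axis=1)                  # number of true samples per digit
accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
recall = np.diag(cm) / support            # correct / true count, per digit
precision = np.diag(cm) / cm.sum(axis=0)  # correct / predicted count, per digit

print("Accuracy: ", accuracy)
print("Recall:   ", np.round(recall, 2))
print("Precision:", np.round(precision, 2))
```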
      

# Summary of Results

In [27]:
print("Accuracy of Classifier: ", accuracy_score_test_mlp_pca)
print("Running time: ", duration_mlp_pca)

Accuracy of Classifier:  0.9754285714285714
Running time:  54.84120059013367
