# Multi-Layer Perceptron (MLP) - In Search of Optimal Architecture


In this notebook we investigate the following two issues related to the MLP/ANN architecture:

- How many hidden layers should we use?
- How many neurons should we use in each of those layers?

Our goal is to understand how to determine the **optimal or near-optimal architecture** for an MLP.

We use the MNIST dataset to address these two questions.

There are other pertinent questions about MLP architecture that we could have addressed, for example which activation function (logistic/relu/tanh) or which solver (momentum/adam) works best.

However, in this notebook we limit our attention to the number of hidden layers and the number of neurons in each layer.

Based on prior experimentation, we use 'relu' as the activation function and 'adam' as the gradient-based optimizer.

## Dataset: MNIST


We use the MNIST dataset, a set of 70,000 small images of digits handwritten by high school students and employees of the US Census Bureau. Each image is labeled with the digit it represents.

Each image is 28x28 pixels, and each feature represents one pixel’s intensity, from 0 (white) to 255 (black).

Thus, each image has 28 x 28 = 784 features.
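
The relationship between the 784 features and the 28x28 image can be sketched as follows. This is a minimal illustration on a random stand-in vector (`flat_image` is hypothetical, not a real MNIST row), and it assumes the usual row-by-row flattening.

```python
import numpy as np

# Hypothetical stand-in for one row of X: 784 pixel intensities in [0, 255].
rng = np.random.default_rng(0)
flat_image = rng.integers(0, 256, size=784)

# Undo the flattening: 784 values become a 28x28 grid, filled row by row.
image = flat_image.reshape(28, 28)

print(image.shape)  # (28, 28)
print(image.size)   # 784

# Pixel (row r, column c) of the grid is feature r * 28 + c in the flat vector.
r, c = 5, 17
print(image[r, c] == flat_image[r * 28 + c])  # True
```

With real data, the same `reshape(28, 28)` call on a row of `X` gives an array that can be passed to `plt.imshow` for a visual check.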

In [4]:
import warnings
import time
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from scipy.io import loadmat

#from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier

from sklearn.model_selection import train_test_split

## Load Data and Create Data Matrix (X) and the Label Vector (y)

First, load the data and check the shapes of the data matrix and the label vector.

We may load the data from a local folder or download it directly using Scikit-Learn.

In [5]:
# Load the data from the local folder "data"
mnist = loadmat('data/mnist-original.mat')

#Create the data Matrix X and the target vector y
X = mnist["data"].T.astype('float64')
y = mnist["label"][0].astype('int64')

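# Alternatively, load the data using Scikit-Learn
# (uncomment the fetch_openml import above; on recent Scikit-Learn versions,
# passing as_frame=False returns NumPy arrays instead of a pandas DataFrame)
# mnist = fetch_openml('mnist_784', cache=False)

# X = mnist["data"].astype('float64')
# y = mnist["target"].astype('int64')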

print("\nNo. of Samples: ", X.shape)
print("No. of Labels: ", y.shape)


No. of Samples:  (70000, 784)
No. of Labels:  (70000,)


## Scale the Data

In [3]:
X /= 255.0
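
Dividing by 255 maps the pixel intensities from [0, 255] to [0, 1]. The sketch below illustrates this on a toy array (`pixels` and `int_pixels` are made-up names, not notebook variables). It also shows why `X` was cast to `float64` beforehand: NumPy refuses to do an in-place true division on an integer array.

```python
import numpy as np

# Toy stand-in for a few pixel intensities (not real MNIST data).
pixels = np.array([0, 51, 255], dtype='float64')
pixels /= 255.0                      # in-place scaling to [0, 1]
print(pixels.min(), pixels.max())    # 0.0 1.0

# The same in-place operation on an integer array raises a TypeError,
# because the float result cannot be stored back into an integer buffer.
int_pixels = np.array([0, 51, 255])
failed = False
try:
    int_pixels /= 255.0
except TypeError:
    failed = True
print("in-place division on integers failed:", failed)
```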

## Split Data Into Training and Test Sets

In [4]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

print("X_train: ", X_train.shape)
print("X_test: ", X_test.shape)

X_train:  (56000, 784)
X_test:  (14000, 784)
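
The split above is purely random. As an optional variation that the notebook does not use, `train_test_split` accepts a `stratify` argument that preserves the class proportions in both subsets. The sketch below demonstrates it on synthetic balanced data (`X_toy` and `y_toy` are hypothetical).

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1,000 samples in 10 balanced classes (not MNIST).
X_toy = np.arange(1000, dtype='float64').reshape(-1, 1)
y_toy = np.repeat(np.arange(10), 100)

# stratify=y_toy keeps the label distribution identical in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy)

print(X_tr.shape, X_te.shape)   # (800, 1) (200, 1)
print(np.bincount(y_te))        # 20 test samples for each of the 10 classes
```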


## MLP Architecture

We compare the performance (test accuracy) of the following 14 architectures, which vary the number of hidden layers and the number of neurons per layer.

In [3]:
data = [[1, 100],
        [1, 200],
        [1, 500],
        [1, 1000],
        [1, 2000],
        [1, 5000],
        [2, (200, 100)],
        [2, (200, 200)],
        [2, (300, 200)],
        [3, (300, 200, 100)],
        [3, (200, 200, 200)],
        [4, (400, 300, 200, 100)],
        [5, (500, 400, 300, 200, 100)],
        [6, (1000, 500, 400, 300, 200, 100)]]
pd.DataFrame(data, columns=["Hidden Layers", "Neurons"])

| | Hidden Layers | Neurons |
|---|---|---|
| 0 | 1 | 100 |
| 1 | 1 | 200 |
| 2 | 1 | 500 |
| 3 | 1 | 1000 |
| 4 | 1 | 2000 |
| 5 | 1 | 5000 |
| 6 | 2 | (200, 100) |
| 7 | 2 | (200, 200) |
| 8 | 2 | (300, 200) |
| 9 | 3 | (300, 200, 100) |
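
One way to compare these architectures on a common scale is the number of trainable parameters (weights plus biases). The helper below is a minimal sketch, not part of the notebook: `count_parameters` is a hypothetical name, and it assumes fully connected layers with 784 inputs and 10 output units (one per digit).

```python
def count_parameters(hidden_layer_sizes, n_inputs=784, n_outputs=10):
    """Count the weights and biases of a fully connected network."""
    sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    # Each layer has fan_in * fan_out weights and fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(sizes[:-1], sizes[1:]))


for architecture in [(100,), (5000,), (300, 200),
                     (1000, 500, 400, 300, 200, 100)]:
    print(architecture, count_parameters(architecture))
```

Under these assumptions, the single 5000-neuron layer has more than twice as many parameters as the six-layer network, so "wider" and "deeper" are not interchangeable measures of model size.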


## Hidden Layer = 1; Neurons = 100

In [5]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate='constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.47920562
Validation score: 0.918393
Iteration 2, loss = 0.22648602
Validation score: 0.949643
Iteration 3, loss = 0.17103871
Validation score: 0.960179
Iteration 4, loss = 0.13810706
Validation score: 0.965893
Iteration 5, loss = 0.11742718
Validation score: 0.966607
Iteration 6, loss = 0.10230845
Validation score: 0.971429
Iteration 7, loss = 0.09025462
Validation score: 0.971786
Iteration 8, loss = 0.08193190
Validation score: 0.975357
Iteration 9, loss = 0.07368440
Validation score: 0.971607
Iteration 10, loss = 0.06764236
Validation score: 0.976071
Iteration 11, loss = 0.06192520
Validation score: 0.974821
Iteration 12, loss = 0.05745766
Validation score: 0.978036
Iteration 13, loss = 0.05276187
Validation score: 0.976071
Iteration 14, loss = 0.05004085
Validation score: 0.976250
Iteration 15, loss = 0.04673007
Validation score: 0.976071
Iteration 16, loss = 0.04411568
Validation score: 0.977857
Iteration 17, loss = 0.04131536
Validation score: 0.978214
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(100,), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [6]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_onehiddenlayer_100 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_onehiddenlayer_100)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_onehiddenlayer_100 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_onehiddenlayer_100)


Training Accuracy:  0.998

Test Accuracy:  0.9770714285714286
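
The accuracy used throughout is the fraction of predictions that match the labels, computed as the mean of a boolean array. The sketch below shows this on a toy example (`y_true` and `y_pred` are made-up values) and then computes the train/test gap from the two numbers printed above.

```python
import numpy as np

# Toy labels and predictions: 4 of the 5 predictions are correct.
y_true = np.array([3, 1, 4, 1, 5])
y_pred = np.array([3, 1, 4, 0, 5])

# Comparing element-wise gives booleans; their mean is the accuracy.
accuracy = np.mean(y_pred == y_true)
print(accuracy)  # 0.8

# Gap between the training and test accuracy reported above.
train_acc = 0.998
test_acc = 0.9770714285714286
gap = train_acc - test_acc
print(round(gap, 4))  # 0.0209
```

Scikit-Learn's `accuracy_score` and the classifier's own `score` method should give the same number as the `np.mean` expression.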


## Hidden Layer = 1; Neurons = 200

In [7]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(200,), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.40886675
Validation score: 0.932679
Iteration 2, loss = 0.19419106
Validation score: 0.951429
Iteration 3, loss = 0.14385625
Validation score: 0.960000
Iteration 4, loss = 0.11466252
Validation score: 0.964464
Iteration 5, loss = 0.09499003
Validation score: 0.965714
Iteration 6, loss = 0.08248333
Validation score: 0.970357
Iteration 7, loss = 0.07131126
Validation score: 0.970179
Iteration 8, loss = 0.06324839
Validation score: 0.972500
Iteration 9, loss = 0.05644300
Validation score: 0.973929
Iteration 10, loss = 0.05160096
Validation score: 0.973571
Iteration 11, loss = 0.04799070
Validation score: 0.975179
Iteration 12, loss = 0.04425474
Validation score: 0.976786
Iteration 13, loss = 0.04104076
Validation score: 0.975893
Iteration 14, loss = 0.03763602
Validation score: 0.976786
Iteration 15, loss = 0.03603052
Validation score: 0.977321
Iteration 16, loss = 0.03457569
Validation score: 0.976786
Iteration 17, loss = 0.03249957
Validation score: 0.977143
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(200,), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [8]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_onehiddenlayer_200 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_onehiddenlayer_200)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_onehiddenlayer_200 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_onehiddenlayer_200)


Training Accuracy:  0.9981428571428571

Test Accuracy:  0.9808571428571429
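
The experiments in this notebook differ only in `hidden_layer_sizes`, so the fit-and-evaluate pattern could be wrapped in a helper and run in a loop. The sketch below is one possible way to do this, not the notebook's method: `evaluate_architecture` is a hypothetical name, `verbose` is switched off, and the small 8x8 digits dataset bundled with Scikit-Learn stands in for MNIST to keep the example fast.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier


def evaluate_architecture(hidden_layer_sizes, X_train, y_train, X_test, y_test):
    """Fit one MLP with the notebook's hyperparameters; return (train, test) accuracy."""
    clf = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes, max_iter=1000, alpha=0.01,
                        solver='adam', verbose=False, tol=1e-5, random_state=1,
                        learning_rate='constant', learning_rate_init=0.001,
                        activation='relu', early_stopping=True, n_iter_no_change=10)
    clf.fit(X_train, y_train)
    # score() returns the mean accuracy on the given data.
    return clf.score(X_train, y_train), clf.score(X_test, y_test)


# Stand-in data: 1,797 images of 8x8 pixels with intensities from 0 to 16.
X_small, y_small = load_digits(return_X_y=True)
X_small = X_small / 16.0
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_small, y_small, test_size=0.2, random_state=0)

results = {}
for sizes in [(100,), (200,), (200, 100)]:
    results[sizes] = evaluate_architecture(sizes, Xs_train, ys_train, Xs_test, ys_test)
    print(sizes, results[sizes])
```

Passing the notebook's own `X_train`, `y_train`, `X_test`, and `y_test` to the helper would repeat the MNIST experiments, and the resulting dictionary can be turned into a summary table.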


## Hidden Layer = 1; Neurons = 500

In [9]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(500,), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.34828794
Validation score: 0.951964
Iteration 2, loss = 0.15723136
Validation score: 0.971250
Iteration 3, loss = 0.11277818
Validation score: 0.974821
Iteration 4, loss = 0.08861000
Validation score: 0.978393
Iteration 5, loss = 0.07325503
Validation score: 0.980536
Iteration 6, loss = 0.06225991
Validation score: 0.980357
Iteration 7, loss = 0.05350172
Validation score: 0.982857
Iteration 8, loss = 0.04898891
Validation score: 0.981071
Iteration 9, loss = 0.04400116
Validation score: 0.981964
Iteration 10, loss = 0.04021121
Validation score: 0.983571
Iteration 11, loss = 0.03632542
Validation score: 0.983393
Iteration 12, loss = 0.03567402
Validation score: 0.980357
Iteration 13, loss = 0.03490005
Validation score: 0.983571
Iteration 14, loss = 0.03227332
Validation score: 0.983393
Iteration 15, loss = 0.03172489
Validation score: 0.982321
Iteration 16, loss = 0.03074304
Validation score: 0.983571
Iteration 17, loss = 0.03088784
Validation score: 0.984643
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(500,), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [10]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_onehiddenlayer_500 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_onehiddenlayer_500)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_onehiddenlayer_500 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_onehiddenlayer_500)


Training Accuracy:  0.9974464285714286

Test Accuracy:  0.9788571428571429


## Hidden Layer = 1; Neurons = 1000

In [11]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(1000,), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.30947555
Validation score: 0.956250
Iteration 2, loss = 0.13865780
Validation score: 0.965179
Iteration 3, loss = 0.09792018
Validation score: 0.973750
Iteration 4, loss = 0.07638223
Validation score: 0.978750
Iteration 5, loss = 0.06454135
Validation score: 0.976607
Iteration 6, loss = 0.05535012
Validation score: 0.980179
Iteration 7, loss = 0.04874788
Validation score: 0.979643
Iteration 8, loss = 0.04286844
Validation score: 0.977679
Iteration 9, loss = 0.04114738
Validation score: 0.977500
Iteration 10, loss = 0.04031664
Validation score: 0.979643
Iteration 11, loss = 0.03750210
Validation score: 0.978929
Iteration 12, loss = 0.03673160
Validation score: 0.979643
Iteration 13, loss = 0.03591200
Validation score: 0.979643
Iteration 14, loss = 0.03430436
Validation score: 0.980536
Iteration 15, loss = 0.03423258
Validation score: 0.978393
Iteration 16, loss = 0.03442608
Validation score: 0.978929
Iteration 17, loss = 0.03694579
Validation score: 0.980000
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(1000,), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [12]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_onehiddenlayer_1000 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_onehiddenlayer_1000)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_onehiddenlayer_1000 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_onehiddenlayer_1000)


Training Accuracy:  0.9983571428571428

Test Accuracy:  0.9832857142857143


## Hidden Layer = 1; Neurons = 2000

In [13]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(2000,), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.28722117
Validation score: 0.962500
Iteration 2, loss = 0.12345537
Validation score: 0.973571
Iteration 3, loss = 0.08975509
Validation score: 0.975536
Iteration 4, loss = 0.07200716
Validation score: 0.980536
Iteration 5, loss = 0.06105332
Validation score: 0.978214
Iteration 6, loss = 0.05452200
Validation score: 0.977857
Iteration 7, loss = 0.04893191
Validation score: 0.980893
Iteration 8, loss = 0.04568876
Validation score: 0.983214
Iteration 9, loss = 0.04341551
Validation score: 0.983750
Iteration 10, loss = 0.04421476
Validation score: 0.981607
Iteration 11, loss = 0.04220098
Validation score: 0.978571
Iteration 12, loss = 0.04581884
Validation score: 0.979286
Iteration 13, loss = 0.04089177
Validation score: 0.980357
Iteration 14, loss = 0.04065214
Validation score: 0.980179
Iteration 15, loss = 0.03974234
Validation score: 0.980000
Iteration 16, loss = 0.03955291
Validation score: 0.983393
Iteration 17, loss = 0.03967063
Validation score: 0.980536
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(2000,), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [14]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_onehiddenlayer_2000 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_onehiddenlayer_2000)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_onehiddenlayer_2000 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_onehiddenlayer_2000)


Training Accuracy:  0.9957321428571428

Test Accuracy:  0.9785714285714285


## Hidden Layer = 1; Neurons = 5000

In [15]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(5000,), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.25357017
Validation score: 0.967857
Iteration 2, loss = 0.11546254
Validation score: 0.970714
Iteration 3, loss = 0.08745436
Validation score: 0.972679
Iteration 4, loss = 0.07510534
Validation score: 0.974821
Iteration 5, loss = 0.06612306
Validation score: 0.976429
Iteration 6, loss = 0.06064331
Validation score: 0.977500
Iteration 7, loss = 0.05801994
Validation score: 0.978393
Iteration 8, loss = 0.06019180
Validation score: 0.975714
Iteration 9, loss = 0.05840026
Validation score: 0.976964
Iteration 10, loss = 0.05452443
Validation score: 0.978750
Iteration 11, loss = 0.05723096
Validation score: 0.978393
Iteration 12, loss = 0.05264961
Validation score: 0.978393
Iteration 13, loss = 0.05176128
Validation score: 0.981429
Iteration 14, loss = 0.04863848
Validation score: 0.980179
Iteration 15, loss = 0.04661380
Validation score: 0.978929
Iteration 16, loss = 0.04775651
Validation score: 0.980357
Iteration 17, loss = 0.04493079
Validation score: 0.980714
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(5000,), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [16]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_onehiddenlayer_5000 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_onehiddenlayer_5000)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_onehiddenlayer_5000 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_onehiddenlayer_5000)


Training Accuracy:  0.9975357142857143

Test Accuracy:  0.9831428571428571


## Hidden Layers = 2; Neurons = (200, 100)

In [17]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(200,100), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.37100682
Validation score: 0.944464
Iteration 2, loss = 0.15331098
Validation score: 0.964464
Iteration 3, loss = 0.10969901
Validation score: 0.969286
Iteration 4, loss = 0.08368030
Validation score: 0.971964
Iteration 5, loss = 0.06764082
Validation score: 0.975179
Iteration 6, loss = 0.05593642
Validation score: 0.976786
Iteration 7, loss = 0.05019062
Validation score: 0.975179
Iteration 8, loss = 0.04260343
Validation score: 0.978036
Iteration 9, loss = 0.03668269
Validation score: 0.980000
Iteration 10, loss = 0.03416973
Validation score: 0.976786
Iteration 11, loss = 0.03293507
Validation score: 0.978393
Iteration 12, loss = 0.03036566
Validation score: 0.977500
Iteration 13, loss = 0.02924157
Validation score: 0.978750
Iteration 14, loss = 0.02737091
Validation score: 0.979107
Iteration 15, loss = 0.02828488
Validation score: 0.975000
Iteration 16, loss = 0.02816211
Validation score: 0.973929
Iteration 17, loss = 0.02912006
Validation score: 0.978393
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(200, 100), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [18]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_twohiddenlayers_1 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_twohiddenlayers_1)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_twohiddenlayers_1 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_twohiddenlayers_1)


Training Accuracy:  0.9981785714285715

Test Accuracy:  0.9812142857142857


## Hidden Layers = 2; Neurons = (300, 200)

In [19]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(300,200), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.33372837
Validation score: 0.958393
Iteration 2, loss = 0.13647309
Validation score: 0.966429
Iteration 3, loss = 0.09455731
Validation score: 0.972143
Iteration 4, loss = 0.07341847
Validation score: 0.976607
Iteration 5, loss = 0.05972278
Validation score: 0.977679
Iteration 6, loss = 0.05096557
Validation score: 0.978571
Iteration 7, loss = 0.04259430
Validation score: 0.976607
Iteration 8, loss = 0.03850109
Validation score: 0.979821
Iteration 9, loss = 0.03798578
Validation score: 0.980000
Iteration 10, loss = 0.03528988
Validation score: 0.979821
Iteration 11, loss = 0.03419344
Validation score: 0.977857
Iteration 12, loss = 0.03449891
Validation score: 0.983571
Iteration 13, loss = 0.02966569
Validation score: 0.984107
Iteration 14, loss = 0.03349260
Validation score: 0.978393
Iteration 15, loss = 0.03365905
Validation score: 0.982500
Iteration 16, loss = 0.02815348
Validation score: 0.982143
Iteration 17, loss = 0.02699058
Validation score: 0.984107
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(300, 200), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [20]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_twohiddenlayers_2 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_twohiddenlayers_2)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_twohiddenlayers_2 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_twohiddenlayers_2)


Training Accuracy:  0.9987857142857143

Test Accuracy:  0.9842857142857143


## Hidden Layers = 3; Neurons = (300, 200, 100)

In [21]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(300,200,100), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.32578472
Validation score: 0.958571
Iteration 2, loss = 0.12857676
Validation score: 0.971071
Iteration 3, loss = 0.09145311
Validation score: 0.974107
Iteration 4, loss = 0.07035571
Validation score: 0.978214
Iteration 5, loss = 0.05900383
Validation score: 0.980536
Iteration 6, loss = 0.05090066
Validation score: 0.975714
Iteration 7, loss = 0.05141608
Validation score: 0.978393
Iteration 8, loss = 0.04430007
Validation score: 0.979286
Iteration 9, loss = 0.04055200
Validation score: 0.977143
Iteration 10, loss = 0.03681570
Validation score: 0.978393
Iteration 11, loss = 0.03706960
Validation score: 0.977143
Iteration 12, loss = 0.03556225
Validation score: 0.974643
Iteration 13, loss = 0.03687039
Validation score: 0.976964
Iteration 14, loss = 0.03563277
Validation score: 0.970536
Iteration 15, loss = 0.03702560
Validation score: 0.976429
Iteration 16, loss = 0.03450430
Validation score: 0.975536
Validation score did not improve more than tol=0.000010 for 10 co

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(300, 200, 100), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)
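
The log above ends with the early-stopping message after iteration 16. The patience rule behind it can be sketched in plain Python. This is a simplified re-implementation for illustration, not Scikit-Learn's code: `early_stopping_epoch` is a hypothetical name, and the counter semantics (stop once the count of non-improving epochs exceeds `n_iter_no_change`) are an assumption chosen because they reproduce this log.

```python
def early_stopping_epoch(validation_scores, tol=1e-5, n_iter_no_change=10):
    """Return (epoch at which training stops, best validation score)."""
    best = float("-inf")
    no_improvement = 0
    for epoch, score in enumerate(validation_scores, start=1):
        # An epoch counts as "no improvement" unless it beats the best by tol.
        if score < best + tol:
            no_improvement += 1
        else:
            no_improvement = 0
        if score > best:
            best = score
        if no_improvement > n_iter_no_change:
            return epoch, best
    return None, best


# Validation scores copied from the log above (iterations 1 to 16).
scores = [0.958571, 0.971071, 0.974107, 0.978214, 0.980536, 0.975714,
          0.978393, 0.979286, 0.977143, 0.978393, 0.977143, 0.974643,
          0.976964, 0.970536, 0.976429, 0.975536]

stopped_at, best_score = early_stopping_epoch(scores)
print(stopped_at, best_score)  # 16 0.980536
```

The best validation score occurs at iteration 5, and the following eleven iterations fail to improve on it, which triggers the stop.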

In [22]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_twohiddenlayers_3 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_twohiddenlayers_3)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_twohiddenlayers_3 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_twohiddenlayers_3)


Training Accuracy:  0.9919464285714286

Test Accuracy:  0.9772857142857143


## Hidden Layers = 4; Neurons = (400, 300, 200, 100)

In [23]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(400,300,200,100), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.32364003
Validation score: 0.960714
Iteration 2, loss = 0.12928598
Validation score: 0.966429
Iteration 3, loss = 0.09555929
Validation score: 0.973036
Iteration 4, loss = 0.07802366
Validation score: 0.973036
Iteration 5, loss = 0.06514605
Validation score: 0.965179
Iteration 6, loss = 0.05664391
Validation score: 0.972500
Iteration 7, loss = 0.05556226
Validation score: 0.975179
Iteration 8, loss = 0.05511449
Validation score: 0.976429
Iteration 9, loss = 0.04900147
Validation score: 0.971786
Iteration 10, loss = 0.04600217
Validation score: 0.976964
Iteration 11, loss = 0.04895063
Validation score: 0.975536
Iteration 12, loss = 0.04737209
Validation score: 0.979107
Iteration 13, loss = 0.04108417
Validation score: 0.978929
Iteration 14, loss = 0.04750174
Validation score: 0.973214
Iteration 15, loss = 0.04617567
Validation score: 0.979107
Iteration 16, loss = 0.04150203
Validation score: 0.978929
Iteration 17, loss = 0.03970768
Validation score: 0.975893
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(400, 300, 200, 100), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [24]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_twohiddenlayers_4 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_twohiddenlayers_4)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_twohiddenlayers_4 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_twohiddenlayers_4)


Training Accuracy:  0.9956071428571428

Test Accuracy:  0.9785714285714285


## Hidden Layers = 5; Neurons = (500, 400, 300, 200, 100)

In [25]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(500,400,300,200,100), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.32686401
Validation score: 0.960893
Iteration 2, loss = 0.13652660
Validation score: 0.967500
Iteration 3, loss = 0.10428899
Validation score: 0.977321
Iteration 4, loss = 0.08798424
Validation score: 0.974464
Iteration 5, loss = 0.08025828
Validation score: 0.977143
Iteration 6, loss = 0.07131471
Validation score: 0.975536
Iteration 7, loss = 0.06801173
Validation score: 0.977679
Iteration 8, loss = 0.06521738
Validation score: 0.972500
Iteration 9, loss = 0.06195226
Validation score: 0.974821
Iteration 10, loss = 0.06379278
Validation score: 0.976071
Iteration 11, loss = 0.06136027
Validation score: 0.977679
Iteration 12, loss = 0.05853017
Validation score: 0.975714
Iteration 13, loss = 0.05661618
Validation score: 0.978750
Iteration 14, loss = 0.05344887
Validation score: 0.980536
Iteration 15, loss = 0.05478905
Validation score: 0.978571
Iteration 16, loss = 0.05535842
Validation score: 0.978214
Iteration 17, loss = 0.05373343
Validation score: 0.975179
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(500, 400, 300, 200, 100),
              learning_rate='constant', learning_rate_init=0.001, max_iter=1000,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=1, shuffle=True, solver='adam',
              tol=1e-05, validation_fraction=0.1, verbose=True,
              warm_start=False)

In [26]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_twohiddenlayers_5 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_twohiddenlayers_5)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_twohiddenlayers_5 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_twohiddenlayers_5)


Training Accuracy:  0.9975892857142857

Test Accuracy:  0.9827142857142858


## Hidden Layers = 6; Neurons = (1000, 500, 400, 300, 200, 100)

In [27]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(1000,500,400,300,200,100), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.33923566
Validation score: 0.961607
Iteration 2, loss = 0.14722758
Validation score: 0.965893
Iteration 3, loss = 0.11934753
Validation score: 0.971607
Iteration 4, loss = 0.10406398
Validation score: 0.973929
Iteration 5, loss = 0.09670969
Validation score: 0.972500
Iteration 6, loss = 0.09080314
Validation score: 0.975179
Iteration 7, loss = 0.08073741
Validation score: 0.974107
Iteration 8, loss = 0.08058210
Validation score: 0.976071
Iteration 9, loss = 0.07939734
Validation score: 0.978571
Iteration 10, loss = 0.07659585
Validation score: 0.977500
Iteration 11, loss = 0.07390236
Validation score: 0.977321
Iteration 12, loss = 0.07103932
Validation score: 0.979286
Iteration 13, loss = 0.06750114
Validation score: 0.980000
Iteration 14, loss = 0.06460772
Validation score: 0.979107
Iteration 15, loss = 0.06517882
Validation score: 0.981250
Iteration 16, loss = 0.06411654
Validation score: 0.978393
Iteration 17, loss = 0.06271576
Validation score: 0.979107
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(1000, 500, 400, 300, 200, 100),
              learning_rate='constant', learning_rate_init=0.001, max_iter=1000,
              momentum=0.9, n_iter_no_change=10, nesterovs_momentum=True,
              power_t=0.5, random_state=1, shuffle=True, solver='adam',
              tol=1e-05, validation_fraction=0.1, verbose=True,
              warm_start=False)

In [28]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_twohiddenlayers_6 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_twohiddenlayers_6)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_twohiddenlayers_6 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_twohiddenlayers_6)


Training Accuracy:  0.9953392857142858

Test Accuracy:  0.9802857142857143


## Hidden Layers = 2; Neurons = (200, 200)

In [31]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(200,200,), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.36495484
Validation score: 0.949821
Iteration 2, loss = 0.14276405
Validation score: 0.961964
Iteration 3, loss = 0.10075009
Validation score: 0.969286
Iteration 4, loss = 0.07928685
Validation score: 0.971429
Iteration 5, loss = 0.06495611
Validation score: 0.970000
Iteration 6, loss = 0.05641086
Validation score: 0.972143
Iteration 7, loss = 0.04947723
Validation score: 0.969643
Iteration 8, loss = 0.04421112
Validation score: 0.975714
Iteration 9, loss = 0.03980373
Validation score: 0.973571
Iteration 10, loss = 0.03421165
Validation score: 0.975179
Iteration 11, loss = 0.03316332
Validation score: 0.976607
Iteration 12, loss = 0.03145377
Validation score: 0.972679
Iteration 13, loss = 0.03080832
Validation score: 0.976071
Iteration 14, loss = 0.03164143
Validation score: 0.977143
Iteration 15, loss = 0.02985800
Validation score: 0.976607
Iteration 16, loss = 0.02814414
Validation score: 0.975357
Iteration 17, loss = 0.02824576
Validation score: 0.976250
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(200, 200), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [32]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_twohiddenlayers_7 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_twohiddenlayers_7)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_twohiddenlayers_7 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_twohiddenlayers_7)


Training Accuracy:  0.9981785714285715

Test Accuracy:  0.9828571428571429


## Hidden Layers = 3; Neurons = (200, 200, 200)

In [33]:
%%time
mlp_clf = MLPClassifier(hidden_layer_sizes=(200,200,200), max_iter=1000, alpha=0.01,
                    solver='adam', verbose=True, tol=1e-5, random_state=1, 
                    learning_rate = 'constant', learning_rate_init=0.001, activation='relu',
                    early_stopping=True, n_iter_no_change=10)

mlp_clf.fit(X_train, y_train)

Iteration 1, loss = 0.33766184
Validation score: 0.948750
Iteration 2, loss = 0.13272727
Validation score: 0.963750
Iteration 3, loss = 0.09475740
Validation score: 0.966429
Iteration 4, loss = 0.07354296
Validation score: 0.971786
Iteration 5, loss = 0.05974406
Validation score: 0.969643
Iteration 6, loss = 0.05673193
Validation score: 0.963571
Iteration 7, loss = 0.04775784
Validation score: 0.977500
Iteration 8, loss = 0.04071493
Validation score: 0.972679
Iteration 9, loss = 0.04059401
Validation score: 0.975000
Iteration 10, loss = 0.03878671
Validation score: 0.975893
Iteration 11, loss = 0.03583292
Validation score: 0.975357
Iteration 12, loss = 0.03679866
Validation score: 0.973393
Iteration 13, loss = 0.03746589
Validation score: 0.975536
Iteration 14, loss = 0.03279575
Validation score: 0.973929
Iteration 15, loss = 0.03797293
Validation score: 0.974643
Iteration 16, loss = 0.03109364
Validation score: 0.974821
Iteration 17, loss = 0.03298044
Validation score: 0.974286
Iterat

MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
              beta_2=0.999, early_stopping=True, epsilon=1e-08,
              hidden_layer_sizes=(200, 200, 200), learning_rate='constant',
              learning_rate_init=0.001, max_iter=1000, momentum=0.9,
              n_iter_no_change=10, nesterovs_momentum=True, power_t=0.5,
              random_state=1, shuffle=True, solver='adam', tol=1e-05,
              validation_fraction=0.1, verbose=True, warm_start=False)

In [34]:
y_train_predicted = mlp_clf.predict(X_train)

train_accuracy_twohiddenlayers_8 = np.mean(y_train_predicted == y_train)
print("\nTraining Accuracy: ", train_accuracy_twohiddenlayers_8)


y_test_predicted = mlp_clf.predict(X_test)

test_accuracy_twohiddenlayers_8 = np.mean(y_test_predicted == y_test)
print("\nTest Accuracy: ", test_accuracy_twohiddenlayers_8)


Training Accuracy:  0.994125

Test Accuracy:  0.9777857142857143


In [35]:
data = [[1, 100, test_accuracy_onehiddenlayer_100],
        [1, 200, test_accuracy_onehiddenlayer_200],
        [1, 500, test_accuracy_onehiddenlayer_500],
        [1, 1000, test_accuracy_onehiddenlayer_1000],
        [1, 2000, test_accuracy_onehiddenlayer_2000],
        [1, 5000, test_accuracy_onehiddenlayer_5000],
        [2, (200, 100), test_accuracy_twohiddenlayers_1],
        [2, (200, 200), test_accuracy_twohiddenlayers_7],
        [2, (300, 200), test_accuracy_twohiddenlayers_2],
        [3, (300, 200, 100), test_accuracy_twohiddenlayers_3],
        [3, (200, 200, 200), test_accuracy_twohiddenlayers_8],
        [4, (400, 300, 200, 100), test_accuracy_twohiddenlayers_4],
        [5, (500, 400, 300, 200, 100), test_accuracy_twohiddenlayers_5],
        [6, (1000, 500, 400, 300, 200, 100), test_accuracy_twohiddenlayers_6]]
pd.DataFrame(data, columns=["Hidden Layers", "Neurons", "Accuracy"])

|   | Hidden Layers | Neurons | Accuracy |
|---|---|---|---|
| 0 | 1 | 100 | 0.977071 |
| 1 | 1 | 200 | 0.980857 |
| 2 | 1 | 500 | 0.978857 |
| 3 | 1 | 1000 | 0.983286 |
| 4 | 1 | 2000 | 0.978571 |
| 5 | 1 | 5000 | 0.983143 |
| 6 | 2 | (200, 100) | 0.981214 |
| 7 | 2 | (200, 200) | 0.982857 |
| 8 | 2 | (300, 200) | 0.984286 |
| 9 | 3 | (300, 200, 100) | 0.977286 |


## Observation

We observe that the MLP with **two hidden layers containing 300 and 200 neurons**, respectively, performs the best (test accuracy = 0.984286).

However, a single hidden layer with only 200 neurons achieves a similar accuracy (0.980857).

We also notice that the MLP with six hidden layers (test accuracy = 0.980286) does not perform better than the MLP with a single hidden layer of 200 neurons. It starts to show some **overfitting** (training accuracy = 0.995339 vs. test accuracy = 0.980286), and training this heavy architecture takes significantly longer (14 min vs. 4 min).
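One rough way to see why the deeper network is "heavy" is to count its trainable parameters. The sketch below assumes a fully connected network with 784 inputs (one per pixel) and 10 outputs (one per digit class). The helper `count_mlp_parameters` is a hypothetical name introduced here for illustration and is not part of Scikit-Learn.

```python
def count_mlp_parameters(n_inputs, hidden_layer_sizes, n_outputs):
    """Count the weights and biases of a fully connected MLP.

    Each layer contributes fan_in * fan_out weights plus fan_out biases.
    """
    layer_sizes = [n_inputs, *hidden_layer_sizes, n_outputs]
    return sum(
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(layer_sizes[:-1], layer_sizes[1:])
    )


# Assumed dimensions: 28x28 = 784 pixel features, 10 digit classes.
n_inputs, n_outputs = 784, 10

for architecture in [(200,), (300, 200), (1000, 500, 400, 300, 200, 100)]:
    n_params = count_mlp_parameters(n_inputs, architecture, n_outputs)
    print(architecture, n_params)

# (200,) 159010
# (300, 200) 297710
# (1000, 500, 400, 300, 200, 100) 1687510
```

Under these assumptions, the six-layer network has more than ten times as many parameters as the single-layer network with 200 neurons, which is consistent with its longer training time.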

Thus, to find an optimal or near-optimal architecture for an MLP, we should start with simple architectures and then gradually increase their complexity. It is never a good idea to start with a heavy architecture, because heavy architectures are prone to overfitting and expensive to train.
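The simple-to-complex search described above could be written as a single loop instead of one cell per architecture. This is only a minimal sketch. To keep it fast and self-contained, it uses Scikit-Learn's small 8x8 `load_digits` dataset as a stand-in for MNIST. The candidate list and `max_iter=200` are illustrative assumptions, not the settings used in this notebook.

```python
import time

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Stand-in data (8x8 digits) so the sketch runs in seconds; in the notebook,
# the MNIST X_train/X_test/y_train/y_test would be used instead.
X_small, y_small = load_digits(return_X_y=True)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_small, y_small, test_size=0.2, random_state=1)

# Hypothetical candidates, ordered from the simplest to the most complex.
candidate_architectures = [(50,), (100,), (100, 50), (100, 50, 25)]

results = []
for hidden_layer_sizes in candidate_architectures:
    clf = MLPClassifier(hidden_layer_sizes=hidden_layer_sizes, max_iter=200,
                        alpha=0.01, solver='adam', tol=1e-5, random_state=1,
                        learning_rate_init=0.001, activation='relu',
                        early_stopping=True, n_iter_no_change=10)

    # Time the fit so accuracy can be weighed against training cost.
    start = time.perf_counter()
    clf.fit(Xs_train, ys_train)
    fit_seconds = time.perf_counter() - start

    test_accuracy = clf.score(Xs_test, ys_test)
    results.append((hidden_layer_sizes, test_accuracy, fit_seconds))

for hidden_layer_sizes, test_accuracy, fit_seconds in results:
    print(hidden_layer_sizes, round(test_accuracy, 4), round(fit_seconds, 2), "s")
```

Recording the fit time next to the test accuracy makes it easier to stop adding layers once accuracy plateaus while training cost keeps growing.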