# CIFAR10 MLP 2

Because the previous MLP model performed unsatisfactorily (0.5192 val_accuracy), let's try another MLP with fewer layers but more neurons per layer.

## Imports

In [5]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import tensorflow as tf
from tensorflow import keras
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import joblib
from mltoolkit.neural_networks import make_mlp
from mltoolkit.utils import dump_keras_model,dump_arrays,dump_sklearn_model,\
                            get_tf_logdir

In [124]:
def make_mlp(input_X,output_y,hidden_layers,neurons,
             flatten=False,
             hid_activation='relu',
             hid_initializer='glorot_uniform',
             hid_regularizer=None,
             out_activation=None,
             batch_norm=False,
             dropout=False,
             dropout_rate=0.2):
    '''
    Create a Sequential MLP model. All hidden layers will have the same
    number of neurons, activation function, initializer and regularizer.

    Parameters:
    ----------
    input_X: nd-array
        The input dataset X. Used to determine the shape of the input dataset.
    output_y: 1d-array
        The output dataset y. Used to determine the type and number of neurons
        for the output layer.
    hidden_layers: int
        The number of hidden layers for the model.
    neurons: int
        The number of neurons for each hidden layer in the model.
    flatten: bool, Default: False
        Whether to flatten the input dataset X. If input_X is already
        flattened, this should be False.
    hid_activation: str, Default: 'relu'
        Activation function used for all the hidden layers.
    hid_initializer: str, Default: 'glorot_uniform'
        Initializer used for all the hidden layers.
    hid_regularizer: str or keras regularizer, Default: None
        Regularizer used for all the hidden layers.
    out_activation: str, Default: None
        Activation function used for the output layer.
        For a regression model, this should be left as None.
        For a classification model, pass the relevant activation
        function, e.g. 'softmax'.
    batch_norm: bool, Default: False
        Whether to add a batch normalization layer after the input and
        inside each hidden layer (before its activation).
        Cannot be combined with dropout.
    dropout: bool, Default: False
        Whether to add a dropout layer after the input and after each
        hidden layer. Cannot be combined with batch_norm.
    dropout_rate: float, Default: 0.2
        Fraction of units dropped by each dropout layer.
    
    Returns:
    ----------
    keras.Sequential:
        The assembled, uncompiled model.
    '''
    if (batch_norm and dropout):
        raise AssertionError("This function is for generating simple " +
                             "MLP models, batch normalization and dropout " +
                             "cannot be used together, please create the " +
                             "model manually instead")
    
    model = keras.models.Sequential([
        keras.layers.Input(shape=input_X.shape[1:])
    ])
    if flatten:
        model.add(keras.layers.Flatten())
    if batch_norm:
        # Normalize the inputs, then stack Dense -> BatchNorm -> Activation.
        model.add(keras.layers.BatchNormalization())
        for _ in range(hidden_layers):
            model.add(keras.layers.Dense(neurons,
                                         kernel_initializer=hid_initializer,
                                         kernel_regularizer=hid_regularizer))
            model.add(keras.layers.BatchNormalization())
            model.add(keras.layers.Activation(hid_activation))
    elif dropout:
        # Apply dropout to the inputs and after every hidden layer.
        model.add(keras.layers.Dropout(dropout_rate))
        for _ in range(hidden_layers):
            model.add(keras.layers.Dense(neurons,
                                         activation=hid_activation,
                                         kernel_initializer=hid_initializer,
                                         kernel_regularizer=hid_regularizer))
            model.add(keras.layers.Dropout(dropout_rate))
    else:
        # Plain stack of identical hidden layers.
        for _ in range(hidden_layers):
            model.add(keras.layers.Dense(neurons,
                                         activation=hid_activation,
                                         kernel_initializer=hid_initializer,
                                         kernel_regularizer=hid_regularizer))
        
    
    if output_y.dtype == int or output_y.dtype == float:
        model.add(keras.layers.Dense(1,activation=None))
    else:
        model.add(keras.layers.Dense(np.unique(output_y).size,activation=out_activation))
    return model

## Loading the datasets

In [3]:
X_train_trans = joblib.load("Datasets\\X_train_trans.pkl")
X_test_trans = joblib.load("Datasets\\X_test_trans.pkl")
y_train_raw = joblib.load("Datasets\\Raw Data\\y_train_raw.pkl")
y_test_raw = joblib.load("Datasets\\Raw Data\\y_test_raw.pkl")

X_train_trans.shape,X_test_trans.shape,y_train_raw.shape,y_test_raw.shape

((50000, 700), (10000, 700), (50000,), (10000,))

Note that X_train_trans and X_test_trans have both been PCA-transformed down to 700 components to reduce the dimensionality.

## Splitting the Datasets

In [4]:
X_train,X_valid,y_train,y_valid = train_test_split(X_train_trans,y_train_raw,test_size=0.1,stratify=y_train_raw)

## MLP (5 x 400)

As usual, we start with the vanilla model and only then modify it as needed.

mlp_5: Default MLP with 5 hidden layers and 400 neurons each

In [19]:
mlp_5 = make_mlp(X_train,y_train,5,400,
                 hid_activation='elu',out_activation='softmax')

In [20]:
mlp_5.summary()

Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_6 (Dense)             (None, 400)               280400    
                                                                 
 dense_7 (Dense)             (None, 400)               160400    
                                                                 
 dense_8 (Dense)             (None, 400)               160400    
                                                                 
 dense_9 (Dense)             (None, 400)               160400    
                                                                 
 dense_10 (Dense)            (None, 400)               160400    
                                                                 
 dense_11 (Dense)            (None, 10)                4010      
                                                                 
Total params: 926,010
Trainable params: 926,010
Non-tr
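
The parameter counts in the summary can be checked by hand: a Dense layer with `n_in` inputs and `n_out` units holds `n_in * n_out` weights plus `n_out` biases. A minimal sketch of that arithmetic follows; the helper name `dense_params` is hypothetical and not part of the notebook or of Keras.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out

# 700 PCA components -> 5 hidden layers of 400 units -> 10 classes
layer_sizes = [700, 400, 400, 400, 400, 400, 10]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [280400, 160400, 160400, 160400, 160400, 4010]
print(total)      # 926010
```

The first hidden layer alone accounts for roughly 30% of the parameters, because it maps the 700 input components to 400 units.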

In [21]:
mlp_5_logdir = get_tf_logdir("mlp_5")
mlp_5_tfboard = keras.callbacks.TensorBoard(mlp_5_logdir)

In [22]:
mlp_5_monitor = 'val_accuracy'
mlp_5_early = keras.callbacks.EarlyStopping(monitor=mlp_5_monitor,patience=10,restore_best_weights=True)
mlp_5_opt = keras.optimizers.Nadam(learning_rate=0.001)
mlp_5_schedule = keras.callbacks.ReduceLROnPlateau(monitor=mlp_5_monitor,factor=0.5,patience=3)
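
To make the two callbacks above more concrete, here is a simplified pure-Python sketch of the logic they are assumed to follow when monitoring a metric that should increase, such as `val_accuracy`. It is not the Keras implementation: the function names are hypothetical, options such as cooldown and `min_lr` are ignored, and the accuracy values are made up for illustration.

```python
def simulate_reduce_on_plateau(val_accuracies, lr=0.001, factor=0.5,
                               patience=3, min_delta=1e-4):
    """Return the learning rate in effect after each epoch."""
    best, wait, history = float("-inf"), 0, []
    for acc in val_accuracies:
        if acc > best + min_delta:      # counts as an improvement
            best, wait = acc, 0
        else:
            wait += 1
            if wait >= patience:        # plateau detected: shrink the rate
                lr *= factor
                wait = 0
        history.append(lr)
    return history


def simulate_early_stopping(val_accuracies, patience=10):
    """Return (stopped_epoch, best_epoch), 1-indexed; stopped_epoch is None if never triggered."""
    best, best_epoch, wait = float("-inf"), None, 0
    for epoch, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            best, best_epoch, wait = acc, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return None, best_epoch


# Hypothetical validation accuracies, for illustration only.
lrs = simulate_reduce_on_plateau([0.40, 0.45, 0.50, 0.50, 0.49, 0.50, 0.51, 0.51])
print(lrs)  # rate halves after the third epoch without improvement

curve = [0.30 + 0.02 * i for i in range(10)] + [0.45] * 10
print(simulate_early_stopping(curve))  # (20, 10)
```

Under these assumptions, a run whose best epoch is 10 stops after epoch 20 when `patience=10`, and `restore_best_weights=True` would then roll the model back to the epoch-10 weights. Because the scheduler's patience (3) is shorter than the early-stopping patience (10), the learning rate can be halved several times before training is halted.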

In [23]:
mlp_5.compile(optimizer=mlp_5_opt,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [24]:
mlp_5.fit(X_train,y_train,batch_size=500,epochs=200,
          callbacks=[mlp_5_tfboard,mlp_5_early,mlp_5_schedule],
          validation_data=(X_valid,y_valid))

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200


<keras.callbacks.History at 0x263c1e15930>

In [25]:
mlp_5.evaluate(X_test_trans,y_test_raw)



[1.6256675720214844, 0.5386999845504761]

This model performs better than the (20 x 100) architecture from the previous attempt.\
The model is also now seriously overfitting the training dataset, which suggests that the architecture has enough capacity to hold most of the information in our data.
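
One way to put a number on the overfitting is the gap between training and validation accuracy at the best validation epoch. The sketch below assumes a dictionary shaped like the `history` attribute of the object returned by `fit` (keys `accuracy` and `val_accuracy`). The helper name `generalization_gap` is hypothetical, and the numbers are invented for illustration; they are not the values logged in this run.

```python
def generalization_gap(history):
    """Return (best_epoch, train_acc - val_acc) at the best val_accuracy epoch."""
    val_acc = history["val_accuracy"]
    best = max(range(len(val_acc)), key=val_acc.__getitem__)
    return best + 1, history["accuracy"][best] - val_acc[best]


# Hypothetical numbers for illustration only, NOT the values from this notebook.
example_history = {
    "accuracy":     [0.45, 0.58, 0.70, 0.82],
    "val_accuracy": [0.44, 0.52, 0.54, 0.53],
}
epoch, gap = generalization_gap(example_history)
print(epoch, round(gap, 2))  # 3 0.16
```

With a real run, the dictionary would come from something like `mlp_5.fit(...).history`; a large positive gap indicates that the model fits the training set much better than unseen data.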

We will try to regularize the model and see if it actually improves the accuracy.

### Saving model

In [26]:
dump_keras_model(mlp_5,filename="mlp_5.h5",save_weights=False)

## MLP (5 x 400, L1)

Let's try with the simplest L1 regularizer.

In [109]:
mlp_6_regular = keras.regularizers.L1(l1=0.0001)
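
For intuition, an L1 kernel regularizer adds `l1 * sum(|w|)` over a layer's weights to the training loss, which pushes small weights toward zero. A minimal pure-Python sketch of that penalty for a single layer follows; the helper name `l1_penalty` and the toy kernel values are hypothetical.

```python
def l1_penalty(weights, l1=0.0001):
    """l1 * sum(|w|) over one layer's kernel, given as a nested list."""
    return l1 * sum(abs(w) for row in weights for w in row)


# Toy 2x3 kernel with made-up values, for illustration only.
kernel = [[0.5, -1.0, 0.0],
          [2.0, -0.5, 1.0]]
penalty = l1_penalty(kernel)
print(penalty)  # about 0.0005, i.e. 0.0001 * 5.0
```

Since `make_mlp` passes the regularizer as `kernel_regularizer` to the hidden layers only, each of the five hidden layers contributes one such term, while biases and the output layer are not penalized. These terms are part of the model's total loss, so loss values of regularized and unregularized models are not directly comparable.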

In [110]:
mlp_6 = make_mlp(X_train,y_train,5,400,
                 hid_activation='elu',hid_regularizer=mlp_6_regular,out_activation='softmax')

In [111]:
mlp_6.summary()

Model: "sequential_15"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dense_90 (Dense)            (None, 400)               280400    
                                                                 
 dense_91 (Dense)            (None, 400)               160400    
                                                                 
 dense_92 (Dense)            (None, 400)               160400    
                                                                 
 dense_93 (Dense)            (None, 400)               160400    
                                                                 
 dense_94 (Dense)            (None, 400)               160400    
                                                                 
 dense_95 (Dense)            (None, 10)                4010      
                                                                 
Total params: 926,010
Trainable params: 926,010
Non-t

In [112]:
mlp_6_logdir = get_tf_logdir("mlp_6")
mlp_6_tfboard = keras.callbacks.TensorBoard(mlp_6_logdir)

In [113]:
mlp_6_monitor = 'val_accuracy'
mlp_6_early = keras.callbacks.EarlyStopping(monitor=mlp_6_monitor,patience=10,restore_best_weights=True)
mlp_6_opt = keras.optimizers.Nadam(learning_rate=0.005)
mlp_6_schedule = keras.callbacks.ReduceLROnPlateau(monitor=mlp_6_monitor,factor=0.5,patience=3)

In [114]:
mlp_6.compile(optimizer=mlp_6_opt,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [115]:
mlp_6.fit(X_train,y_train,batch_size=500,epochs=200,
          callbacks=[mlp_6_tfboard,mlp_6_early,mlp_6_schedule],
          validation_data=(X_valid,y_valid))

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200


<keras.callbacks.History at 0x263eae13dc0>

In [116]:
mlp_6.evaluate(X_test_trans,y_test_raw)



[1.654964804649353, 0.5511000156402588]

### Saving model

In [117]:
dump_keras_model(mlp_6,filename="mlp_6.h5",save_weights=False)

Now the evaluation accuracy is up to 0.5511.\
But the model is still slightly overfitting.

We will try adding Dropout regularization next.

## MLP (5 x 400, L1, Dropout)

Let's keep the same L1 regularizer and add Dropout (rate = 0.1) to the input and after every hidden layer.

In [118]:
mlp_7_regular = keras.regularizers.L1(l1=0.0001)

In [131]:
mlp_7 = make_mlp(X_train,y_train,5,400,
                 hid_activation='elu',
                 hid_regularizer=mlp_7_regular,
                 out_activation='softmax',
                 dropout=True,dropout_rate=0.1)
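
As a reminder of what each Dropout layer does during training, here is a small pure-Python sketch of inverted dropout: every unit is zeroed with probability `rate`, and the surviving units are scaled by `1 / (1 - rate)` so the expected activation is unchanged. At inference time the layer passes its input through untouched. The function name and the toy activations are hypothetical.

```python
import random


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    keep_scale = 1.0 / (1.0 - rate)
    return [0.0 if rng.random() < rate else a * keep_scale for a in activations]


rng = random.Random(0)       # fixed seed so the sketch is repeatable
activations = [1.0] * 10     # toy activations, for illustration only
dropped = dropout(activations, rate=0.1, rng=rng)
print(dropped)               # each entry is either 0.0 or 1 / 0.9
```

With `dropout_rate=0.1`, on average one in ten units is dropped per layer at each training step, which is a fairly mild setting.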

In [132]:
mlp_7.summary()

Model: "sequential_17"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 dropout_6 (Dropout)         (None, 700)               0         
                                                                 
 dense_102 (Dense)           (None, 400)               280400    
                                                                 
 dropout_7 (Dropout)         (None, 400)               0         
                                                                 
 dense_103 (Dense)           (None, 400)               160400    
                                                                 
 dropout_8 (Dropout)         (None, 400)               0         
                                                                 
 dense_104 (Dense)           (None, 400)               160400    
                                                                 
 dropout_9 (Dropout)         (None, 400)             

In [133]:
mlp_7_logdir = get_tf_logdir("mlp_7")
mlp_7_tfboard = keras.callbacks.TensorBoard(mlp_7_logdir)

In [134]:
mlp_7_monitor = 'val_accuracy'
mlp_7_early = keras.callbacks.EarlyStopping(monitor=mlp_7_monitor,patience=10,restore_best_weights=True)
mlp_7_opt = keras.optimizers.Nadam(learning_rate=0.005)
mlp_7_schedule = keras.callbacks.ReduceLROnPlateau(monitor=mlp_7_monitor,factor=0.5,patience=3)

In [135]:
mlp_7.compile(optimizer=mlp_7_opt,
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

In [136]:
mlp_7.fit(X_train,y_train,batch_size=500,epochs=200,
          callbacks=[mlp_7_tfboard,mlp_7_early,mlp_7_schedule],
          validation_data=(X_valid,y_valid))

Epoch 1/200
Epoch 2/200
Epoch 3/200
Epoch 4/200
Epoch 5/200
Epoch 6/200
Epoch 7/200
Epoch 8/200
Epoch 9/200
Epoch 10/200
Epoch 11/200
Epoch 12/200
Epoch 13/200
Epoch 14/200
Epoch 15/200
Epoch 16/200
Epoch 17/200
Epoch 18/200
Epoch 19/200
Epoch 20/200
Epoch 21/200
Epoch 22/200
Epoch 23/200
Epoch 24/200
Epoch 25/200
Epoch 26/200
Epoch 27/200
Epoch 28/200
Epoch 29/200
Epoch 30/200
Epoch 31/200
Epoch 32/200
Epoch 33/200
Epoch 34/200
Epoch 35/200
Epoch 36/200
Epoch 37/200
Epoch 38/200
Epoch 39/200
Epoch 40/200
Epoch 41/200


<keras.callbacks.History at 0x263bba15810>

In [137]:
mlp_7.evaluate(X_test_trans,y_test_raw)



[1.4153943061828613, 0.5680000185966492]

### Saving model

In [138]:
dump_keras_model(mlp_7,filename="mlp_7.h5",save_weights=False)

Now the evaluation accuracy is up to 0.5680.\
That is only a modest gain over the unregularized model (0.5387), but regularization does help.

## Conclusion

Perhaps we should try other neural network architectures in the future; this seems to be pretty much all we can try with this MLP for now.

Model: Sequential (5 hidden layers with 400 neurons each)\
Activation: ELU\
Initialization: Glorot Uniform\
Regularization: Early Stopping, L1 (a = 0.0001), Dropout (rate = 0.1)\
Optimizer: Nadam\
Loss Function: Sparse Categorical Cross Entropy\
Learning Rate Schedule: Performance Scheduler (factor = 0.5, patience = 3)\
Output Activation: Softmax

Evaluation Accuracy: 0.5680
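
The three test accuracies reported in this notebook can be put side by side with a few lines of Python. The values are the `evaluate` results above rounded to four decimals; the dictionary labels are only descriptive names chosen for this sketch.

```python
# Test accuracies from the three evaluate() calls, rounded to 4 decimals.
results = {
    "mlp_5 (no regularization)": 0.5387,
    "mlp_6 (L1)": 0.5511,
    "mlp_7 (L1 + Dropout)": 0.5680,
}
baseline = results["mlp_5 (no regularization)"]

for name, acc in results.items():
    gain_pp = (acc - baseline) * 100  # gain in percentage points
    print(f"{name:<28} {acc:.4f}  {gain_pp:+.2f} pp")
```

The full regularization setup gains a little under 3 percentage points over the unregularized (5 x 400) model.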