<a href="https://colab.research.google.com/github/imranttsia/Deep-learning/blob/main/Application_of_overfitting_techniques_on_MNIST_DATASET_USING_CNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Introduction
Overfitting, or high variance, occurs when a machine learning model performs noticeably better on its training dataset, the dataset used to “teach” the model, than on data it has not seen. In terms of ‘loss’, overfitting reveals itself when your model has a low error on the training set and a higher error on the testing set. You can identify this visually by plotting your loss and accuracy metrics for both datasets against the epoch and seeing where the two curves start to diverge.

[Figure: Overfitting in CNNs - Loss vs. Epoch Plot]

[Figure: Overfitting in CNNs - Accuracy vs. Epoch Plot]
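
To locate where overfitting sets in without reading it off a plot, one option is to compare the per-epoch training and validation losses directly. The sketch below assumes you have two lists of losses, such as the ones Keras stores under `history.history['loss']` and `history.history['val_loss']` when validation data is passed to `fit`. The helper names and the numbers are hypothetical and serve only as an illustration.

```python
def generalization_gap(train_loss, val_loss):
    """Per-epoch difference between validation loss and training loss."""
    return [v - t for t, v in zip(train_loss, val_loss)]


def best_epoch(val_loss):
    """1-based epoch at which the validation loss is lowest."""
    return min(range(len(val_loss)), key=val_loss.__getitem__) + 1


# Hypothetical loss curves, for illustration only
train_loss = [0.60, 0.35, 0.22, 0.15, 0.10, 0.07]
val_loss = [0.62, 0.40, 0.33, 0.34, 0.38, 0.45]

gaps = generalization_gap(train_loss, val_loss)
print("best epoch:", best_epoch(val_loss))  # best epoch: 3
print("gap at first and last epoch:", gaps[0], gaps[-1])
```

A gap that keeps widening after the best epoch, while the training loss still falls, is the pattern the plots above illustrate.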

Overfitting indicates that your model is too complex for the problem that it is solving: it has too many features in the case of regression models and ensemble learning, too many filters in the case of Convolutional Neural Networks, or too many layers in the case of Deep Learning Models overall. This causes your model to know the example data well, but perform poorly against any new data.

This is annoying, but it can be resolved by tuning your hyperparameters. First, though, let’s make sure our data is divided into well-proportioned sets.

Splitting the Data
For a deep learning model, I recommend having 3 datasets: training, validation, and testing. The validation set should be used to fine-tune your model until you’re satisfied with its performance; only then switch to the testing data to evaluate the best version of your model. First, we’ll import the necessary library:

from sklearn.model_selection import train_test_split
Now let’s talk proportions. My ideal ratio is 70/10/20, meaning ~70% of your data goes to the training set, 10% to the validation set, and 20% to the test set, like so:

# Create the Validation Dataset
Xtrain, Xval, ytrain, yval = train_test_split(train_images, train_labels_final, train_size=0.9, test_size=0.1, random_state=42)

# Create the Test and Final Training Datasets
Xtrain, Xtest, ytrain, ytest = train_test_split(Xtrain, ytrain, train_size=0.78, random_state=42)
You will need to call train_test_split() twice. The first call is made on the initial training set of images and labels to form the validation set. We set random_state so the split is reproducible across runs, test_size so that the validation set is 10% of the training data, and train_size so that the remaining 90% stays in the training set.



The train_size argument can be omitted, since scikit-learn sets it to the complement of test_size by default. The variables Xval and yval refer to our validation images and labels. On the second call, we generate our testing dataset from the newly formed training data, Xtrain and ytrain. We repeat the above, but this time we set the newest training set to 78% of the previous one and assign it to the same variables as before for consistency. Meanwhile, the remaining 22% goes to Xtest for the test images and ytest for the test labels.
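
To see why a 0.9 split followed by a 0.78 split lands close to 70/10/20, the two fractions can be multiplied out. This is a small arithmetic sketch; the function name `overall_split` is hypothetical and not part of scikit-learn.

```python
def overall_split(first_train=0.9, second_train=0.78):
    """Overall (train, val, test) fractions after two successive splits."""
    val = 1.0 - first_train                    # held out by the first call
    train = first_train * second_train         # kept by the second call
    test = first_train * (1.0 - second_train)  # held out by the second call
    return train, val, test


train, val, test = overall_split()
print(f"train={train:.3f}, val={val:.3f}, test={test:.3f}")
# train=0.702, val=0.100, test=0.198
```

The three fractions sum to 1, so every sample ends up in exactly one of the three sets.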

Now we’re ready to begin modeling. Refer to my previous blog to get a deep dive into the initial CNN setup. We will start on the second model assuming our first turned out like the image above. We will use the techniques below:


Regularization
Weight Initialization
Dropout Regularization
Weight Constraints
Other
 

Regularization
Regularization penalizes complex models by adding a term to the loss that grows with the size of the weights, so minimizing the loss also keeps the weights small. This pushes our neural network toward simpler solutions. Here we will use an L2 regularizer, as it is the most common choice and tends to be more stable than an L1 regularizer. We’ll add a regularizer to both convolutional layers below (Hidden Layer 1 and Hidden Layer 2) with a regularization factor of 0.01. Note that this factor is not the learning rate.

In [40]:
import tensorflow as tf
from keras import layers
from keras import models
from keras import regularizers

model = models.Sequential()

# Hidden Layer 1: 32 filters with an L2 penalty (factor 0.01) on the kernel
model.add(layers.Conv2D(32, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.01), input_shape=(28, 28, 1)))
model.add(layers.MaxPooling2D((2, 2)))

# Hidden Layer 2: 64 filters with the same L2 penalty
model.add(layers.Conv2D(64, (3, 3), activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(layers.MaxPooling2D((2, 2)))
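
To make the penalty concrete: an L2 kernel regularizer adds the factor times the sum of squared weights of that layer to the loss. The following is a plain-Python sketch of that formula, not the Keras implementation; the function name `l2_penalty` and the weight values are hypothetical.

```python
def l2_penalty(weights, factor=0.01):
    """L2 penalty: factor * sum of squared weights."""
    return factor * sum(w * w for w in weights)


# Hypothetical weight vectors, for illustration only
small = [0.1, -0.2, 0.05]
large = [1.5, -2.0, 0.8]

print(l2_penalty(small))  # roughly 0.000525
print(l2_penalty(large))  # roughly 0.0689
```

Larger weights are penalized quadratically, which is why the optimizer is nudged toward smaller, smoother solutions.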


Dropout Regularization
Dropout regularization ignores a random subset of units in a layer by setting their outputs to zero during each training step. At inference time, all units are active.

Here we use a rate of 0.4 after the convolutional blocks and 0.2 before the output layer. Treat these as starting points to tune rather than universally ideal values. See below:

In [41]:
from keras.layers import Dropout

model.add(Dropout(0.4))
model.add(layers.Conv2D(256, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(Dropout(0.4))

# Flattening: convert the feature maps to a 1D vector
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(Dropout(0.2))
model.add(layers.Dense(10, activation='softmax'))
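
As a rough picture of what a dropout layer does to its inputs, here is a plain-Python sketch of inverted dropout: during training each value is zeroed with probability `rate`, and the survivors are scaled by 1 / (1 - rate) so the expected sum is unchanged, which is the scaling the Keras `Dropout` layer documents. The function `dropout` and its `seed` argument are hypothetical helpers, not the Keras API.

```python
import random


def dropout(values, rate, training=True, seed=None):
    """Inverted dropout on a list of activations."""
    if not training or rate == 0.0:
        return list(values)  # inference: pass everything through unchanged
    rng = random.Random(seed)
    keep = 1.0 - rate
    # Zero each value with probability `rate`; rescale the ones that survive
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


activations = [1.0] * 10
print(dropout(activations, rate=0.4, seed=0))         # mix of 0.0 and ~1.667
print(dropout(activations, rate=0.4, training=False))  # unchanged
```

Because a different subset is dropped on every step, no single unit can be relied on too heavily, which is what makes dropout act as a regularizer.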

In [42]:

model.summary()

Model: "sequential_3"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d_10 (Conv2D)          (None, 26, 26, 32)        320       
                                                                 
 max_pooling2d_11 (MaxPoolin  (None, 13, 13, 32)       0         
 g2D)                                                            
                                                                 
 conv2d_11 (Conv2D)          (None, 11, 11, 64)        18496     
                                                                 
 max_pooling2d_12 (MaxPoolin  (None, 5, 5, 64)         0         
 g2D)                                                            
                                                                 
 dropout_7 (Dropout)         (None, 5, 5, 64)          0         
                                                                 
 conv2d_12 (Conv2D)          (None, 3, 3, 256)        
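
The output shapes and parameter counts in the summary above can be reproduced by hand. The sketch below assumes 3x3 kernels with stride 1 and no padding (the Keras default, 'valid') and 2x2 max pooling, as in the model above. The helper names are hypothetical.

```python
def conv2d_output(size, kernel=3):
    """Spatial size after a 'valid' convolution with stride 1."""
    return size - kernel + 1


def maxpool_output(size, pool=2):
    """Spatial size after non-overlapping max pooling."""
    return size // pool


def conv2d_params(in_channels, filters, kernel=3):
    """Weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


size = 28
size = conv2d_output(size)   # 26
size = maxpool_output(size)  # 13
size = conv2d_output(size)   # 11
size = maxpool_output(size)  # 5
size = conv2d_output(size)   # 3
print(size)

print(conv2d_params(in_channels=1, filters=32))   # 320
print(conv2d_params(in_channels=32, filters=64))  # 18496
```

The two parameter counts match the conv2d_10 and conv2d_11 rows of the summary.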

In [43]:
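# Note: the model already ends in a 10-way softmax layer (added above), so these
# layers are stacked on top of that output. The accuracy reported below was
# obtained with this cell included; skip it to keep a single classifier head.
model.add(layers.Flatten())
model.add(layers.Dense(64, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))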

In [44]:
from keras.datasets import mnist
from keras.utils import to_categorical
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

In [45]:
train_images.ndim

3

In [46]:
train_images.shape

(60000, 28, 28)

In [48]:
test_images.ndim

3

In [47]:
test_images.shape

(10000, 28, 28)

In [49]:
# Add a channel dimension and scale pixel values to [0, 1]
train_images = train_images.reshape((60000, 28, 28, 1))
train_images = train_images.astype('float32') / 255
test_images = test_images.reshape((10000, 28, 28, 1))
test_images = test_images.astype('float32') / 255

# One-hot encode the labels
train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# Compile and train the model
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.fit(train_images, train_labels, epochs=5, batch_size=64)

Epoch 1/5
Epoch 2/5
Epoch 3/5
Epoch 4/5
Epoch 5/5


<keras.callbacks.History at 0x7f0a703fd7f0>

In [50]:
test_loss, test_acc = model.evaluate(test_images, test_labels)
test_acc



0.9785000085830688