In the previous section we saw overfitting: the gap between training accuracy and test accuracy was large. To reduce overfitting we can apply regularization. In this example I will use L2 regularization and dropout. **Dropout** randomly deactivates a fraction of a layer's units on each training step, so the network cannot rely on any single unit, which improves how well the CNN generalizes. In addition, we use batch normalization for faster training; it also acts as a mild regularizer.
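
To make the dropout idea concrete, here is a minimal plain-Python sketch of "inverted" dropout, the variant in which surviving activations are scaled by `1 / (1 - rate)` so that their expected value is unchanged (Keras's `layers.Dropout` rescales the same way during training). The helper name `inverted_dropout` and the toy activations are assumptions made for illustration; this is not how Keras implements the layer internally.

```python
import random

def inverted_dropout(activations, rate, rng):
    """Zero each activation with probability `rate` and rescale the survivors."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep_prob = 1.0 - rate
    # Survivors are divided by keep_prob so the expected activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0 for a in activations]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
activations = [1.0] * 10_000      # toy activations, all equal to 1.0
dropped = inverted_dropout(activations, rate=0.5, rng=rng)

fraction_zeroed = dropped.count(0.0) / len(dropped)   # close to the rate, 0.5
mean_after = sum(dropped) / len(dropped)              # close to the original mean, 1.0
```

At inference time dropout is switched off and all units are used, which is why the rescaling during training matters.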


In [4]:
import os
# Hide TensorFlow INFO and WARNING log messages; set this before importing tensorflow
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

In [8]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers
from tensorflow.keras.datasets import cifar10

In [9]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

In [14]:
inputs = keras.Input(shape=(32, 32, 3))

# Block 1: conv -> batch norm -> ReLU -> 2x2 max pooling
x = layers.Conv2D(32, 3, padding="same", kernel_regularizer=regularizers.l2(0.01))(inputs)
x = layers.BatchNormalization()(x)
x = keras.activations.relu(x)
x = layers.MaxPooling2D()(x)

# Block 2: same pattern with 64 filters
x = layers.Conv2D(64, 3, padding="same", kernel_regularizer=regularizers.l2(0.01))(x)
x = layers.BatchNormalization()(x)
x = keras.activations.relu(x)
x = layers.MaxPooling2D()(x)

# Block 3: 128 filters, no pooling; flatten for the dense layers
x = layers.Conv2D(128, 3, padding="same", kernel_regularizer=regularizers.l2(0.01))(x)
x = layers.BatchNormalization()(x)
x = keras.activations.relu(x)
x = layers.Flatten()(x)

# Classifier head: the final layer outputs raw logits (no softmax)
x = layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(0.01))(x)
outputs = layers.Dense(10)(x)
model = keras.Model(inputs=inputs, outputs=outputs)
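
To make the `kernel_regularizer=regularizers.l2(0.01)` argument concrete: an L2 regularizer adds `factor * sum(w ** 2)`, computed over a layer's kernel, to the training loss. The sketch below reproduces that arithmetic in plain Python. The `l2_penalty` helper and the 2x2 toy kernel are made up for illustration and are not taken from the model.

```python
def l2_penalty(kernel, factor):
    """Return factor * (sum of squared weights), the term an L2 regularizer adds."""
    return factor * sum(w * w for row in kernel for w in row)

# Hypothetical 2x2 kernel, chosen only to keep the arithmetic easy to follow.
toy_kernel = [[0.5, -1.0], [2.0, 0.0]]

# 0.01 * (0.25 + 1.0 + 4.0 + 0.0), i.e. 0.0525 up to floating-point rounding
penalty = l2_penalty(toy_kernel, factor=0.01)
```

Each regularized layer contributes one such term, and the terms are added to the loss that `fit` reports, so the logged training loss is larger than the cross-entropy alone.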

In [15]:
# model configuration
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=3e-4),
    metrics=["accuracy"]
)
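
The last `Dense(10)` layer has no softmax, so the model outputs raw logits, and `from_logits=True` tells the loss to apply the softmax itself. As a rough sketch of what that computes for a single example (plain Python, with a hypothetical helper name; Keras's own implementation is vectorized and differs in detail):

```python
import math

def sparse_categorical_crossentropy_from_logits(logits, label):
    """Cross-entropy for one example, given raw logits and an integer class label."""
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    # -log(softmax(logits)[label]) simplifies to log_sum_exp - logits[label].
    return log_sum_exp - logits[label]

# Equal logits over 10 classes: the loss equals ln(10), roughly 2.3026.
uniform_loss = sparse_categorical_crossentropy_from_logits([0.0] * 10, label=3)

# A large logit on the correct class drives the loss toward zero.
confident_loss = sparse_categorical_crossentropy_from_logits(
    [0.0] * 3 + [10.0] + [0.0] * 6, label=3
)
```

"Sparse" here means the labels are integer class indices, as in `y_train`, rather than one-hot vectors.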



In [17]:
model.fit(x_train, y_train, batch_size=64, epochs=150, verbose=2)

Epoch 1/150
782/782 - 92s - loss: 1.9993 - accuracy: 0.5174 - 92s/epoch - 118ms/step
Epoch 2/150
782/782 - 95s - loss: 1.2819 - accuracy: 0.6416 - 95s/epoch - 122ms/step
Epoch 3/150
782/782 - 99s - loss: 1.2062 - accuracy: 0.6688 - 99s/epoch - 126ms/step
Epoch 4/150
782/782 - 100s - loss: 1.1753 - accuracy: 0.6815 - 100s/epoch - 128ms/step
Epoch 5/150
782/782 - 104s - loss: 1.1472 - accuracy: 0.6932 - 104s/epoch - 133ms/step
Epoch 6/150
782/782 - 98s - loss: 1.1287 - accuracy: 0.7016 - 98s/epoch - 125ms/step
Epoch 7/150
782/782 - 96s - loss: 1.1078 - accuracy: 0.7108 - 96s/epoch - 123ms/step
Epoch 8/150
782/782 - 89s - loss: 1.0892 - accuracy: 0.7169 - 89s/epoch - 114ms/step
Epoch 9/150
782/782 - 81s - loss: 1.0726 - accuracy: 0.7222 - 81s/epoch - 103ms/step
Epoch 10/150
782/782 - 95s - loss: 1.0507 - accuracy: 0.7300 - 95s/epoch - 121ms/step
Epoch 11/150
782/782 - 103s - loss: 1.0387 - accuracy: 0.7322 - 103s/epoch - 132ms/step
Epoch 12/150
782/782 - 85s - loss: 1.0263 - accuracy: 0.7

<keras.src.callbacks.History at 0x24a437f4750>

In [18]:
# evaluation on the test set
model.evaluate(x_test, y_test, batch_size=64, verbose=2)

157/157 - 5s - loss: 1.8349 - accuracy: 0.5291 - 5s/epoch - 31ms/step


[1.8348898887634277, 0.5291000008583069]

We can see here that the gap between the training accuracy and the test accuracy has decreased with regularization and dropout. The training accuracy could still be pushed higher by training for longer, that is, by increasing the number of epochs.
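
Batch size and epoch count play different roles in `model.fit`: the batch size sets how many weight updates happen per pass over the data, while the number of epochs sets how many passes are made. The sketch below checks this against the logs above, using the CIFAR-10 split sizes (50,000 training and 10,000 test images); the helper name `steps_per_epoch` is an assumption made for illustration.

```python
import math

TRAIN_IMAGES = 50_000   # CIFAR-10 training set size
TEST_IMAGES = 10_000    # CIFAR-10 test set size

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to cover the data once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)

train_steps = steps_per_epoch(TRAIN_IMAGES, 64)   # 782, matching "782/782" in the fit log
test_steps = steps_per_epoch(TEST_IMAGES, 64)     # 157, matching "157/157" in the evaluate log

# Doubling the batch size halves the updates per epoch; it does not add passes over the data.
train_steps_larger_batch = steps_per_epoch(TRAIN_IMAGES, 128)   # 391
```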