In the previous section we saw overfitting, meaning a large gap between training accuracy and test accuracy. To reduce overfitting we can use regularization methods such as L2 weight penalties and dropout. In this example I use both L2 regularization and dropout. **Dropout** randomly deactivates a fraction of a layer's units (and therefore their connections to the next layer) on each training step, so the network cannot rely on any single unit and tends to generalize better. I also use batch normalization, which speeds up training and acts as a mild regularizer.
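To make the dropout idea concrete, here is a minimal sketch of "inverted" dropout in plain Python. The helper name `dropout` and the toy activations are my own illustration, not Keras internals; the scaling of surviving values by `1 / (1 - rate)` follows what the Keras `Dropout` documentation describes.

```python
import random

def dropout(values, rate, training, rng):
    """Inverted dropout on a list of activations (illustrative helper)."""
    if not training or rate == 0.0:
        # At inference time dropout is a no-op
        return list(values)
    keep = 1.0 - rate
    # Zero each activation with probability `rate`;
    # scale the survivors by 1/keep so the expected sum is unchanged
    return [v / keep if rng.random() < keep else 0.0 for v in values]

rng = random.Random(0)          # fixed seed, only for repeatability
activations = [1.0] * 10        # toy activations, not real model outputs

train_out = dropout(activations, rate=0.4, training=True, rng=rng)
test_out = dropout(activations, rate=0.4, training=False, rng=rng)

print(train_out)  # a mix of 0.0 and 1/0.6 values
print(test_out)   # unchanged activations
```

Because dropout is only active during training, the accuracy reported by `fit` is measured on a "handicapped" network, while `evaluate` uses all units.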


In [19]:
import os
# Hide TensorFlow info and warning logs (the variable name is TF_CPP_MIN_LOG_LEVEL)
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

In [20]:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, regularizers
from tensorflow.keras.datasets import cifar10

In [21]:
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0

In [28]:
# Define the CNN model
inputs = keras.Input(shape=(32, 32, 3))

# Block 1: two "same"-padded convolutions, each followed by batch normalization
x = layers.Conv2D(32, 3, activation="relu", padding="same", name="conv1")(inputs)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(64, 3, activation="relu", padding="same", name="conv2")(x)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D()(x)
x = layers.Dropout(0.4)(x)

# Block 2: two "valid"-padded convolutions, so each one shrinks the feature map
x = layers.Conv2D(128, 3, activation="relu", padding="valid", name="conv3")(x)
x = layers.BatchNormalization()(x)
x = layers.Conv2D(128, 3, activation="relu", padding="valid", name="conv4")(x)
x = layers.BatchNormalization()(x)
x = layers.MaxPooling2D()(x)
x = layers.Dropout(0.4)(x)

# Classifier head: L2 regularization only on fc1; fc2 outputs raw logits
x = layers.Flatten()(x)
x = layers.Dropout(0.4)(x)
x = layers.Dense(64, activation="relu", kernel_regularizer=regularizers.l2(0.01), name="fc1")(x)
outputs = layers.Dense(10, name="fc2")(x)
model = keras.Model(inputs=inputs, outputs=outputs)
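As a rough illustration of what `kernel_regularizer=regularizers.l2(.01)` does on `fc1`: Keras documents the L2 penalty as the factor times the sum of squared weights, and that penalty is added to the data loss. The sketch below reproduces the arithmetic in plain Python. The helper `l2_penalty`, the tiny kernel, and the cross-entropy value are made up for illustration and are not taken from the trained model.

```python
def l2_penalty(weights, factor):
    """factor * sum of squared weights (illustrative helper, not a Keras function)."""
    return factor * sum(w * w for row in weights for w in row)

# Hypothetical 2x3 kernel, for illustration only
kernel = [[0.5, -1.0, 2.0],
          [0.0, 1.5, -0.5]]

data_loss = 1.2                              # made-up cross-entropy value
penalty = l2_penalty(kernel, factor=0.01)    # 0.01 * 7.75
total_loss = data_loss + penalty             # what the optimizer would minimize

print(penalty, total_loss)
```

Large weights are penalized quadratically, so the optimizer is pushed toward smaller weights unless a large weight clearly reduces the data loss.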

In [30]:
model.summary()  # summary() prints the table itself; wrapping it in print() also prints "None"

Model: "model_7"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 input_9 (InputLayer)        [(None, 32, 32, 3)]       0         
                                                                 
 conv1 (Conv2D)              (None, 32, 32, 32)        896       
                                                                 
 batch_normalization_31 (Ba  (None, 32, 32, 32)        128       
 tchNormalization)                                               
                                                                 
 conv2 (Conv2D)              (None, 32, 32, 64)        18496     
                                                                 
 batch_normalization_32 (Ba  (None, 32, 32, 64)        256       
 tchNormalization)                                               
                                                                 
 max_pooling2d_16 (MaxPooli  (None, 16, 16, 64)        0   

In [23]:
# Disabled variant: L2 regularization (factor 0.1) on every conv and dense layer
# inputs=keras.Input(shape=(32,32,3))
# x=layers.Conv2D(32,3,padding="same",activation="relu",name="conv1",kernel_regularizer=regularizers.l2(.1))(inputs)
# x=layers.BatchNormalization()(x)
# x=layers.Conv2D(64,3,padding="same",activation="relu",name="conv2",kernel_regularizer=regularizers.l2(.1))(x)
# x=layers.BatchNormalization()(x)
# x=layers.MaxPooling2D()(x)
# x=layers.Dropout(.4)(x)

# x=layers.Conv2D(128,3,padding="valid",activation="relu",name="conv3",kernel_regularizer=regularizers.l2(.1))(x)
# x=layers.BatchNormalization()(x)
# x=layers.Conv2D(128,3,padding="valid",activation="relu",name="conv4",kernel_regularizer=regularizers.l2(.1))(x)
# x=layers.BatchNormalization()(x)
# x=layers.MaxPooling2D()(x)
# x=layers.Dropout(.4)(x)

# x=layers.Flatten()(x)
# x=layers.Dense(100,activation="relu",kernel_regularizer=regularizers.l2(.1),name="fc1")(x)
# x=layers.Dense(64,activation="relu",kernel_regularizer=regularizers.l2(.1),name="fc2")(x)
# x=layers.Dropout(.4)(x)
# outputs=layers.Dense(10,name="output_layer")(x)
# model=keras.Model(inputs=inputs,outputs=outputs)

In [24]:
# inputs = keras.Input(shape=(32, 32, 3))
# x = layers.Conv2D(32, 3,padding="same",kernel_regularizer=regularizers.l2(0.01),)(inputs)
# x = layers.BatchNormalization()(x)
# x = keras.activations.relu(x)
# x = layers.MaxPooling2D()(x)

# x = layers.Conv2D(64, 3,padding="same",kernel_regularizer=regularizers.l2(0.01),)(x)
# x = layers.BatchNormalization()(x)
# x = keras.activations.relu(x)
# x = layers.MaxPooling2D()(x)

# x = layers.Conv2D(128, 3,padding="same",kernel_regularizer=regularizers.l2(.01))(x)
# x = layers.BatchNormalization()(x)
# x = keras.activations.relu(x)
# x = layers.Flatten()(x)

# x = layers.Dense(64, activation="relu",kernel_regularizer=regularizers.l2(0.01))(x)
# x=layers.Dropout(0.5)(x)
# outputs = layers.Dense(10)(x)
# model = keras.Model(inputs=inputs, outputs=outputs)

In [31]:
# model configuration
model.compile(
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=keras.optimizers.Adam(learning_rate=.01),
    metrics=["accuracy"]
)

In [32]:
model.fit(x_train,y_train,batch_size=64,epochs=10,verbose=2)

Epoch 1/10
782/782 - 233s - loss: 3.0906 - accuracy: 0.3565 - 233s/epoch - 298ms/step
Epoch 2/10
782/782 - 234s - loss: 1.8612 - accuracy: 0.5113 - 234s/epoch - 299ms/step
Epoch 3/10
782/782 - 238s - loss: 1.7224 - accuracy: 0.5864 - 238s/epoch - 304ms/step
Epoch 4/10
782/782 - 230s - loss: 1.6329 - accuracy: 0.6148 - 230s/epoch - 294ms/step
Epoch 5/10
782/782 - 227s - loss: 1.5971 - accuracy: 0.6320 - 227s/epoch - 291ms/step
Epoch 6/10
782/782 - 227s - loss: 1.5590 - accuracy: 0.6464 - 227s/epoch - 291ms/step
Epoch 7/10
782/782 - 227s - loss: 1.5389 - accuracy: 0.6535 - 227s/epoch - 290ms/step
Epoch 8/10
782/782 - 225s - loss: 1.5127 - accuracy: 0.6670 - 225s/epoch - 288ms/step
Epoch 9/10
782/782 - 239s - loss: 1.4945 - accuracy: 0.6775 - 239s/epoch - 306ms/step
Epoch 10/10
782/782 - 245s - loss: 1.4851 - accuracy: 0.6838 - 245s/epoch - 313ms/step


<keras.src.callbacks.History at 0x22999b6ba90>

In [27]:
#evaluation 
model.evaluate(x_test,y_test,batch_size=64,verbose=2)

157/157 - 7s - loss: 1.0162 - accuracy: 0.6767 - 7s/epoch - 45ms/step


[1.0161794424057007, 0.6766999959945679]

Here I applied L2 regularization only to the first fully connected layer (fc1), because it has the most parameters. In practice regularization is often applied to all layers, but here applying it to fc1, together with dropout, was enough to avoid overfitting: the training accuracy (0.6838) and the test accuracy (0.6767) are almost the same.
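The parameter counts can be checked by hand. The sketch below assumes 3x3 kernels with biases, the default 2x2 max pooling, and the padding used in the model above; the helper names `conv2d_params` and `dense_params` are my own. The values for `conv1` and `conv2` agree with the summary printed earlier (896 and 18496).

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights plus biases of a square-kernel Conv2D layer."""
    return kernel * kernel * in_ch * out_ch + out_ch

def dense_params(in_units, out_units):
    """Weights plus biases of a Dense layer."""
    return in_units * out_units + out_units

# Spatial size: 32 -> 16 (pool) -> 14 (valid conv3) -> 12 (valid conv4) -> 6 (pool)
flat_units = 6 * 6 * 128  # size of the Flatten output

counts = {
    "conv1": conv2d_params(3, 3, 32),
    "conv2": conv2d_params(3, 32, 64),
    "conv3": conv2d_params(3, 64, 128),
    "conv4": conv2d_params(3, 128, 128),
    "fc1": dense_params(flat_units, 64),
    "fc2": dense_params(64, 10),
}

for name, n in counts.items():
    print(f"{name}: {n}")

largest = max(counts, key=counts.get)
print("largest layer:", largest)  # fc1, with 294976 parameters
```

With roughly twice as many parameters as `conv4`, `fc1` is the natural first place to add a weight penalty.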

We can see that the gap between the training accuracy and the test accuracy has shrunk with regularization and dropout. The training accuracy could likely be pushed higher by training for longer, that is, for more epochs. Increasing the batch size does not lengthen training; it only reduces the number of weight updates per epoch.
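A quick sanity check on how batch size and epochs relate to the amount of training: the `782/782` and `157/157` counters in the logs are the number of batches per pass, which follows from the CIFAR-10 split sizes (50,000 training and 10,000 test images) and `batch_size=64`. The helper `steps_per_epoch` below is just illustrative arithmetic, not a Keras function.

```python
import math

def steps_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (last batch may be partial)."""
    return math.ceil(num_samples / batch_size)

TRAIN_SAMPLES = 50_000  # CIFAR-10 training set size
TEST_SAMPLES = 10_000   # CIFAR-10 test set size

print(steps_per_epoch(TRAIN_SAMPLES, 64))  # 782, as in the training log
print(steps_per_epoch(TEST_SAMPLES, 64))   # 157, as in the evaluation log

# A larger batch size means fewer weight updates per epoch
for batch_size in (64, 128, 256):
    print(batch_size, steps_per_epoch(TRAIN_SAMPLES, batch_size))

# More epochs means more weight updates overall
for epochs in (10, 20):
    print(epochs, epochs * steps_per_epoch(TRAIN_SAMPLES, 64))
```

So the knob for "training longer" is the `epochs` argument of `fit`, not `batch_size`.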