In [10]:
import os

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.datasets import mnist

In [11]:
# physical_devices = tf.config.list_physical_devices("GPU")
# tf.config.experimental.set_memory_growth(physical_devices[0], True)

In [12]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train = x_train.reshape(-1, 28, 28, 1).astype("float32") / 255.0
x_test = x_test.reshape(-1, 28, 28, 1).astype("float32") / 255.0

The basic building block of a CNN is Conv -> BatchNorm -> ReLU. For a network with, say, 10 such layers, repeating the same code for every layer is inconvenient. Instead, we can define the block once as a class and reuse it.

In [13]:
class CNNBlock(layers.Layer):
    def __init__(self, out_channels, kernel_size=3): 
        super(CNNBlock, self).__init__()
        self.conv = layers.Conv2D(out_channels, kernel_size, padding="same")
        self.bn = layers.BatchNormalization()

    def call(self, input_tensor, training=False):      
        x = self.conv(input_tensor)
        x = self.bn(x, training=training)
        x = tf.nn.relu(x)
        return x

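**NOTE**
- The `CNNBlock` class inherits from `keras.layers.Layer`.
- `__init__` is the constructor, where the sub-layers and configuration of the layer are defined. When an object of this class is instantiated, we pass `out_channels` and, optionally, `kernel_size` (default 3).
- `super(CNNBlock, self).__init__()` calls the parent class constructor so that Keras sets up the layer properly.
- `out_channels` is the number of filters, i.e. the depth (number of channels) of the convolution's output.
- `self.conv` defines the convolutional layer and `self.bn` the batch normalization layer.
- `call` defines the forward pass. The `training` flag is forwarded to batch normalization, which behaves differently during training and inference.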
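
As a quick sanity check of the Conv -> BatchNorm -> ReLU pattern described above, the following sketch chains the same three built-in operations on a random batch. The batch size of 4 and the 8 filters are arbitrary values chosen for illustration, not taken from the notebook.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Arbitrary illustrative sizes: a batch of 4 MNIST-shaped images, 8 filters.
x = tf.random.normal((4, 28, 28, 1))

conv = layers.Conv2D(8, 3, padding="same")
bn = layers.BatchNormalization()

# Same order as CNNBlock.call: convolution, batch norm, then ReLU.
y = tf.nn.relu(bn(conv(x), training=False))

# padding="same" keeps the 28x28 spatial size; the channel count becomes 8.
print(y.shape)
```

Because of `padding="same"`, only the channel dimension changes, which is what makes it easy to stack several such blocks.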

In [14]:
class ResBlock(layers.Layer):
    def __init__(self, channels):
        super(ResBlock, self).__init__()
        self.channels = channels
        self.cnn1 = CNNBlock(channels[0], 3)
        self.cnn2 = CNNBlock(channels[1], 3)
        self.cnn3 = CNNBlock(channels[2], 3)
        self.pooling = layers.MaxPooling2D()
        self.identity_mapping = layers.Conv2D(channels[1], 3, padding="same")

    def call(self, input_tensor, training=False):
        x = self.cnn1(input_tensor, training=training)
        x = self.cnn2(x, training=training)
        x = self.cnn3(x + self.identity_mapping(input_tensor), training=training)
        x = self.pooling(x)
        return x

**NOTE**

**ResNet analogy**:
Imagine you're an artist trying to fix a painting that isn't quite what you want it to be. You start with an initial version that you're not entirely satisfied with. Instead of starting from scratch, you make incremental adjustments to improve it. This way, you preserve the parts that are already good while correcting the areas that need improvement.

**Explanation**: the input tensor is like the initial painting. It passes through `cnn1` and `cnn2`, and the result is added to a projection of the original input (`self.identity_mapping`). The convolutional path therefore only has to learn the residual, i.e. the correction that is added to the input, rather than the complete output.



- `channels`: a list of 3 integers giving the number of output filters of the 3 CNN blocks.
- `identity_mapping`: this is the key part of the block.
  - The purpose of `self.identity_mapping` in the `ResBlock` is to provide a shortcut connection. Here it is a `Conv2D` with `channels[1]` filters rather than a pure identity, because the input must have the same number of channels as the output of `cnn2` before the two can be added. The shortcut connection has several advantages.
    - It adds the input to the output of the second CNN block: $output = Input + Residual$. The residual is what the convolutional path contributes on top of the input, so the information in the input is preserved and the network only has to "correct" it.
    - The term "shortcut" comes from the fact that this connection provides a shortcut for the gradient during backpropagation, allowing it to flow more easily through the network. In traditional deep networks, as the number of layers increases, the gradients can diminish (vanishing gradient) or explode (exploding gradient) as they propagate backward during training. This makes deeper networks harder to train. With the shortcut connection, part of the gradient passes through the addition and the single projection convolution instead of the stacked CNN blocks. The addition creates a "skip connection" that lets the gradient flow from the input of `cnn3` back to the block input, bypassing `cnn1` and `cnn2`.
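
The gradient argument above can be made concrete with a toy calculation. This is a scalar sketch under simplifying assumptions, not the convolutional block from the notebook: each "layer" is just multiplication by a small weight, and the values of `w` and `depth` are arbitrary choices for illustration.

```python
# Toy scalar model: every "layer" is f(x) = w * x with a small weight,
# so the local derivative of each layer is w.
w = 0.1
depth = 20

# Plain stack: y = f(f(...f(x))), so dy/dx = w ** depth.
plain_grad = w ** depth

# Residual stack: each layer computes x + f(x), so its local derivative
# is 1 + w and dy/dx = (1 + w) ** depth.
residual_grad = (1 + w) ** depth

print(f"plain: {plain_grad:.3e}")
print(f"residual: {residual_grad:.3f}")
```

In the plain stack the gradient shrinks with every layer, while the `+ x` term of the residual stack keeps each local derivative close to 1, so the gradient does not vanish.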

In [15]:
class ResNet_Like(keras.Model):
    def __init__(self, num_classes=10):
        super(ResNet_Like, self).__init__()
        self.block1 = ResBlock([32, 32, 64])
        self.block2 = ResBlock([128, 128, 256])
        self.block3 = ResBlock([128, 256, 512])
        self.pool = layers.GlobalAveragePooling2D()  #layers.Flatten()
        self.classifier = layers.Dense(num_classes)

    def call(self, input_tensor, training=False):
        x = self.block1(input_tensor, training=training)
        x = self.block2(x, training=training)
        x = self.block3(x, training=training)
        x = self.pool(x, training=training)
        x = self.classifier(x)
        return x
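
    def model(self):
        # Wrap the subclassed model in a functional keras.Model so that
        # layer inputs and outputs can be inspected (e.g. via model.layers).
        x = keras.Input(shape=(28, 28, 1))
        return keras.Model(inputs=[x], outputs=self.call(x))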

**NOTE**
- This model is ResNet-like, not an exact ResNet.
- Global Average Pooling 2D reduces the spatial dimensions of the tensor to a single value per channel by averaging the values across the spatial dimensions. This is often used as an alternative to traditional flattening before the final classification layer.
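
To illustrate the difference between global average pooling and flattening, the sketch below applies both to a random feature map. The shape `(2, 3, 3, 512)` is an assumed example, not the actual output shape of `block3`.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Arbitrary illustrative feature map: batch of 2, 3x3 spatial grid, 512 channels.
features = tf.random.normal((2, 3, 3, 512))

pooled = layers.GlobalAveragePooling2D()(features)
flat = layers.Flatten()(features)

# Global average pooling is a mean over the height and width axes.
manual = tf.reduce_mean(features, axis=[1, 2])

print(pooled.shape)  # one value per channel
print(flat.shape)    # every spatial position kept
```

The pooled tensor has one value per channel, so the following `Dense` layer needs far fewer weights than it would after `Flatten`.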

In [16]:
model = ResNet_Like().model()
base_input = model.layers[0].input
base_output = model.layers[2].output
output = layers.Dense(10)(layers.Flatten()(base_output))
model = keras.Model(base_input, output)

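- `model = ResNet_Like().model()`: creates an instance of `ResNet_Like` and calls its `model` method, which returns a functional `keras.Model` representing the whole architecture.
- `base_input` and `base_output`: `model.layers[0].input` is the input tensor of the model, and `model.layers[2].output` is the output of the layer at index 2. Since index 0 is the input layer, this is the output of the second `ResBlock`, not the final model output.
- `output = layers.Dense(10)(layers.Flatten()(base_output))`: flattens that intermediate feature map and adds a new 10-unit output layer. The final `keras.Model(base_input, output)` is therefore a truncated network with a new classification head.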
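
The indexing used above can be checked on a much smaller functional model. The network and its layer names (`stage1`, `stage2`, `pool`, `classifier`) are hypothetical stand-ins for the ResNet-like model, and the sketch uses `base.input` instead of `base.layers[0].input` as a simplification.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Small stand-in network; the layer names are hypothetical.
inputs = keras.Input(shape=(28, 28, 1))
x = layers.Conv2D(4, 3, padding="same", name="stage1")(inputs)
x = layers.MaxPooling2D(name="stage2")(x)
x = layers.GlobalAveragePooling2D(name="pool")(x)
outputs = layers.Dense(10, name="classifier")(x)
base = keras.Model(inputs, outputs)

# Index 0 is the input layer, so index 2 is the second real layer.
layer_names = [layer.name for layer in base.layers]
print(layer_names)

# Cut the network after that layer and attach a new classification head.
features = base.layers[2].output
new_output = layers.Dense(10)(layers.Flatten()(features))
truncated = keras.Model(base.input, new_output)
print(truncated.output.shape)
```

Printing the layer names before indexing is a cheap way to confirm which layer a given index refers to.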

In [17]:
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)


In [18]:
model.fit(x_train, y_train, batch_size=64, epochs=1, verbose=2)
model.evaluate(x_test, y_test, batch_size=64, verbose=2)
model.save("pretrained")

938/938 - 554s - loss: 0.1052 - accuracy: 0.9671 - 554s/epoch - 591ms/step
157/157 - 20s - loss: 0.0456 - accuracy: 0.9849 - 20s/epoch - 126ms/step
INFO:tensorflow:Assets written to: pretrained\assets


