# Lecture Notes: Residual and Inception Neural Networks

## I. Introduction and Context

*   The Residual Net architecture comes from a paper titled **"Deep Residual Learning for Image Recognition"**.
*   The research and architecture were built by **Microsoft** researchers.
*   The paper was published in **December 2015**, making it about 8 years old, but the architecture is still widely used today.
*   The **Inception model** architecture originates from **Google**.

## II. Plain Networks: The Problem

*   In the ResNet research paper, conventional networks like VGG16, AlexNet, and LeNet are referred to as **Plain Networks**.
*   Plain Networks involve passing input through a **series of layers** (e.g., convolution $\rightarrow$ ReLU $\rightarrow$ convolution $\rightarrow$ ReLU).
*   Going deeper into Plain Networks introduces the **vanishing gradients problem**.
    *   In backpropagation, if derivatives (gradient values) are between 0 and 1, multiplying many small values results in a value very close to zero. This makes updating weights in earlier layers difficult.
*   **Performance Degradation:** Contrary to intuition, simply increasing the depth of a network does not always improve performance.
    *   When compared, a deeper plain network (e.g., 56 layers) resulted in a **larger training error** compared to a smaller network (e.g., 20 layers).
    *   This worse performance holds true even after long training (the paper's CIFAR-10 curves run for roughly 60,000 iterations) and with standard techniques like dropout, ReLU, or batch normalization.
    *   This degradation, where both training and test error get worse as the network gets deeper, was the key motivation for introducing Residual Networks.
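
The vanishing-gradient arithmetic above can be made concrete with a small sketch. It assumes, purely for illustration, that every layer contributes the same local derivative; the function name `gradient_scale` is hypothetical and not from the lecture.

```python
def gradient_scale(local_derivative: float, depth: int) -> float:
    """Product of `depth` identical per-layer derivatives (chain rule)."""
    scale = 1.0
    for _ in range(depth):
        scale *= local_derivative
    return scale


# 0.25 is the largest slope the sigmoid function can have,
# so this is an optimistic case for a sigmoid network.
for depth in (5, 20, 56):
    print(depth, gradient_scale(0.25, depth))
```

The printed factor shrinks geometrically with depth, which is why the earliest layers of a deep plain network receive almost no update signal.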

## III. Residual Networks (ResNets) Core Concepts

### A. The Residual Block and Skip Connection

*   The key concept introduced by ResNets is the **residual block** or **identity block**.
*   This block uses a **skip connection**.
*   **Mechanism:**
    1.  Input $X$ (which is typically the output of a preceding activation function, like ReLU) is passed through a block of layers (e.g., two convolutions with a ReLU in between). This intermediate output is $F(X)$.
    2.  The original input $X$ is directly added to the intermediate output $F(X)$.
    3.  The final output of the residual block is $F(X) + X$.
    4.  This summed output is then passed through the final activation function (e.g., ReLU).
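
The four steps above can be sketched in plain Python. This is a toy version under stated assumptions: small matrix-vector products stand in for the convolutions, and the weights `w1` and `w2` are made-up values chosen only so the arithmetic is easy to follow.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(weights, v):
    """Matrix-vector product; stands in for a convolution in this toy example."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    fx = linear(w2, relu(linear(w1, x)))     # step 1: F(X) = layer -> ReLU -> layer
    summed = [f + a for f, a in zip(fx, x)]  # steps 2-3: F(X) + X, element-wise
    return relu(summed)                      # step 4: final activation


x = [1.0, 2.0]
w1 = [[0.5, 0.0], [0.0, -1.0]]
w2 = [[1.0, 1.0], [0.0, 2.0]]
print(residual_block(x, w1, w2))  # [1.5, 2.0]
```

Note that the skip connection adds values position by position, so `fx` and `x` must have the same length.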

### B. Identity Function Logic

*   The skip connection is also called an **identity connection** or **identity function**.
*   **Justification:** If the layers within the block are not learning anything useful, the learned output $F(X)$ would be close to zero.
*   If $F(X) \approx 0$, then the sum becomes $0 + X = X$.
*   Because $X$ usually comes from a preceding ReLU, its values are already non-negative, so the block's final activation leaves it unchanged: $\text{ReLU}(X) = X$. The block's output is therefore the same as its input.
*   This ensures that if a layer is useless, the **knowledge of the previous layer is still forwarded** to subsequent layers, preventing them from becoming "dead" or stuck on zero values.
*   A key takeaway is that **adding more layers should not worsen performance**, because the skip connection provides a path for the information to flow through.
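
The same reasoning can be written out for gradients. In this short derivation, $z$ (the block's sum before the final activation) and $\mathcal{L}$ (the training loss) are symbols introduced here for illustration; they do not appear in the lecture.

$$
z = F(X) + X
\quad\Longrightarrow\quad
\frac{\partial z}{\partial X} = \frac{\partial F}{\partial X} + I,
\qquad
\frac{\partial \mathcal{L}}{\partial X}
  = \frac{\partial \mathcal{L}}{\partial z}\left(\frac{\partial F}{\partial X} + I\right)
$$

Even when $\partial F / \partial X$ is close to zero, the identity term $I$ passes $\partial \mathcal{L} / \partial z$ back to the earlier layers unchanged.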

### C. Dealing with Dimension Mismatch

*   The skip connection performs element-wise addition (not concatenation) of $F(X)$ and $X$, so the input ($X$) and the output of the convolutional path ($F(X)$) **must be of the same size or dimensions** (e.g., same vector length or same cuboid size).
*   If the input and output dimensions are **not** the same (e.g., $56 \times 56 \times 64$ needs to be added to $28 \times 28 \times 128$), the size of the input $X$ must be modified.
*   The primary methods for matching the size and shape are:
    1.  Adding an **extra convolution block** (e.g., $1 \times 1$ convolution) inside the skip connection. This can reduce spatial dimensions (e.g., using a stride of 2) and change the number of kernels (depth).
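    2.  Using **zero-padding** in the skip connection: the input is subsampled with a stride of 2 and the extra channels are filled with zeros, which adds no parameters.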
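
For the first method, the shape arithmetic can be checked with a small helper. This is a sketch assuming 'same' padding, where the output size is the input size divided by the stride, rounded up; the function name `projection_shortcut_shape` is hypothetical.

```python
import math


def projection_shortcut_shape(input_shape, filters, stride):
    """Output shape of a 1x1 convolution with 'same' padding.

    A 1x1 kernel never looks beyond a single pixel, so only the stride
    changes height and width, and `filters` sets the channel count.
    """
    height, width, _ = input_shape
    return (math.ceil(height / stride), math.ceil(width / stride), filters)


x_shape = (56, 56, 64)     # input X from the example above
fx_shape = (28, 28, 128)   # output F(X) of the main path
shortcut_shape = projection_shortcut_shape(x_shape, filters=128, stride=2)
print(shortcut_shape == fx_shape)  # True
```

With a stride of 2 and 128 kernels, the projected shortcut matches $F(X)$ and the addition becomes valid.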

## IV. ResNet Architecture and Performance

*   ResNets were tested in various depths, including 18, 34, 50, 101, and 152 layers.
*   The ResNet architecture is structured in blocks, often repeating a three-layer combination multiple times (e.g., 3x, 4x, 6x, 3x for ResNet-50).
*   **Results:** When using residual networks, a deeper model (ResNet-34) demonstrated **lower error** compared to a shallower model (ResNet-18), reversing the performance degradation seen in Plain Networks.
*   If new layers added are useful, performance increases; if they are useless, the skip connection ensures the model performance is not worsened.
*   For very large datasets and deep neural networks, ResNets are a common and well-tested choice of architecture.
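
The depth in each model's name can be reproduced by counting weighted layers. The sketch below uses the per-stage block counts published in the ResNet paper; the dictionary and the function name `weighted_layers` are illustrative only.

```python
# name -> (layers per block, blocks per stage)
CONFIGS = {
    "ResNet-18": (2, (2, 2, 2, 2)),
    "ResNet-34": (2, (3, 4, 6, 3)),
    "ResNet-50": (3, (3, 4, 6, 3)),
    "ResNet-101": (3, (3, 4, 23, 3)),
    "ResNet-152": (3, (3, 8, 36, 3)),
}


def weighted_layers(layers_per_block, blocks_per_stage):
    # +1 for the initial 7x7 convolution, +1 for the final fully connected layer
    return 1 + layers_per_block * sum(blocks_per_stage) + 1


for name, (per_block, stages) in CONFIGS.items():
    print(name, weighted_layers(per_block, stages))
```

ResNet-18 and ResNet-34 use two-layer blocks, while the three-layer combination mentioned above applies to ResNet-50 and deeper.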

## V. Implementation Notes

*   Pre-trained ResNet models (e.g., **ResNet50**) are readily available in libraries like `tensorflow.keras.applications`.
*   These pre-trained models are typically trained on the **ImageNet** dataset.
*   A common practice involves **freezing** the weights of the early layers of the pre-trained model and only training the last few layers (dense layers). This is a form of transfer learning.
*   When building ResNets from scratch, the model summary will show **'add' connections**, indicating where the skip connection output is added to the main path.
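
The freezing practice described above might look like the following sketch. The function name `build_transfer_model`, the single-layer head, and the default arguments are assumptions for illustration, not the lecture's code. TensorFlow is imported inside the function only so that the definition can be read and loaded without it.

```python
def build_transfer_model(num_classes, input_shape=(224, 224, 3), weights="imagenet"):
    """Frozen ResNet50 backbone with a small trainable classification head.

    Expects images already passed through
    tf.keras.applications.resnet50.preprocess_input.
    """
    import tensorflow as tf  # lazy import: defining the function needs no TensorFlow

    base = tf.keras.applications.ResNet50(
        weights=weights,       # "imagenet" downloads the pre-trained weights
        include_top=False,     # drop the original 1000-class classifier
        input_shape=input_shape,
    )
    base.trainable = False     # freeze every backbone layer

    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)  # keep BatchNormalization in inference mode
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)
```

Calling `build_transfer_model(num_classes=2)` and then compiling with a categorical cross-entropy loss would train only the final dense layer, since the backbone weights stay fixed.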


<img src="https://i.ibb.co/Z6cTK6d8/image.png">
<a href="https://arxiv.org/pdf/1512.03385">Full paper here</a>

# CODE
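### ResNet-50 implementation

The code below builds ResNet-50. ResNet-101 and ResNet-152 reuse the same bottleneck block and differ only in how many blocks each stage repeats; ResNet-18 and ResNet-34 use a simpler two-layer block instead.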

In [1]:
!python --version
!pip show sklearn

Python 3.12.11

In [16]:
import tensorflow as tf
from tensorflow.keras import layers, models, Sequential
from tensorflow.keras.layers import Rescaling
import numpy as np

In [10]:
# Set up directory and parameters
train_dir = '/content/drive/MyDrive/dag_cot/training_set/training_set/'
img_size = (224, 224)  # ResNet50 standard input size
batch_size = 32  # Adjust batch size as per your GPU capacity

In [11]:
# Data Augmentation layers (applied during training)
data_augmentation = tf.keras.Sequential([
    layers.RandomFlip('horizontal'),  # Random horizontal flip
    layers.RandomRotation(0.1),  # Random rotation
    layers.RandomZoom(0.1),  # Random zoom
])

In [12]:
# Load the training subset using image_dataset_from_directory.
# The same directory supplies both subsets: validation_split holds out 20% for validation.
train_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    labels = 'inferred',
    label_mode = 'categorical',
    validation_split=0.2,  # 80/20 split for training/validation
    subset="training",
    seed=123,  # For reproducibility
    image_size=img_size,
    batch_size=batch_size
)

Found 8006 files belonging to 2 classes.
Using 6405 files for training.


In [6]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [13]:
validation_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    train_dir,
    validation_split=0.2,
    labels = 'inferred',
    label_mode = 'categorical',
    subset="validation",
    seed=123,
    image_size=img_size,
    batch_size=batch_size
)

Found 8006 files belonging to 2 classes.
Using 1601 files for validation.


In [14]:
# Apply augmentation inside the tf.data pipeline (an alternative to placing these layers in the model)
train_dataset = train_dataset.map(lambda x, y: (data_augmentation(x, training=True), y))

In [34]:
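def residual_block(x, filters, kernel_size=3, stride=1):
    """Bottleneck residual block: 1x1 -> 3x3 -> 1x1 convolutions plus a skip connection."""
    shortcut = x

    # 1x1 convolution: reduces channels to `filters`; carries the stride when downsampling
    x = layers.Conv2D(filters, kernel_size=1, strides=stride, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    # 3x3 convolution: spatial filtering at the reduced channel count
    x = layers.Conv2D(filters, kernel_size=kernel_size, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)
    x = layers.Activation('relu')(x)

    # 1x1 convolution: expands channels to 4 * filters (no ReLU until after the addition)
    x = layers.Conv2D(filters * 4, kernel_size=1, strides=1, padding='same')(x)
    x = layers.BatchNormalization()(x)

    # Projection shortcut: needed when the spatial size or channel count changes
    if stride != 1 or shortcut.shape[-1] != filters * 4:
        shortcut = layers.Conv2D(filters * 4, kernel_size=1, strides=stride, padding='same')(shortcut)
        shortcut = layers.BatchNormalization()(shortcut)

    # Skip connection: element-wise F(x) + x, followed by the final ReLU
    x = layers.Add()([x, shortcut])
    x = layers.Activation('relu')(x)
    return x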

In [35]:
# Building ResNet-50
inputs = tf.keras.Input(shape=(224, 224, 3))

# Stem: 7x7 convolution followed by 3x3 max pooling, each with stride 2
x = layers.Conv2D(64, kernel_size=7, strides=2, padding='same')(inputs)
x = layers.BatchNormalization()(x)
x = layers.Activation('relu')(x)
x = layers.MaxPool2D(pool_size=3, strides=2, padding='same')(x)

# Residual stages: 3, 4, 6 and 3 bottleneck blocks

x=residual_block(x,filters=64)
x=residual_block(x,filters=64)
x=residual_block(x,filters=64)


x=residual_block(x,filters=128,stride=2)
x=residual_block(x,filters=128)
x=residual_block(x,filters=128)
x=residual_block(x,filters=128)


x=residual_block(x,filters=256,stride=2)
x=residual_block(x,filters=256)
x=residual_block(x,filters=256)
x=residual_block(x,filters=256)
x=residual_block(x,filters=256)
x=residual_block(x,filters=256)


x=residual_block(x,filters=512,stride=2)
x=residual_block(x,filters=512)
x=residual_block(x,filters=512)

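# Classification head. The dataset loaded above has 2 classes;
# the original ImageNet ResNet-50 uses 1000 output units instead.
num_classes = 2
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(num_classes, activation='softmax')(x)
model = models.Model(inputs, x)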

In [36]:
model.summary()
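
As a sanity check on the summary, the feature-map shapes at the end of each residual stage can be traced by hand. The sketch below assumes the 'same' padding used throughout the model above, so each strided layer divides the spatial size by its stride and rounds up; the function names are hypothetical.

```python
import math


def same_padding_out(size, stride):
    """Spatial size after a 'same'-padded convolution or pooling layer."""
    return math.ceil(size / stride)


def stage_output_shapes(input_size=224):
    size = same_padding_out(input_size, 2)  # 7x7 convolution, stride 2
    size = same_padding_out(size, 2)        # 3x3 max pooling, stride 2

    # (filters, stride of the first block) for the four stages built above
    stages = [(64, 1), (128, 2), (256, 2), (512, 2)]
    shapes = []
    for filters, first_stride in stages:
        size = same_padding_out(size, first_stride)  # later blocks use stride 1
        shapes.append((size, size, filters * 4))     # bottleneck expands channels 4x
    return shapes


print(stage_output_shapes())
# [(56, 56, 256), (28, 28, 512), (14, 14, 1024), (7, 7, 2048)]
```

The last `Add` layer of each stage in the summary should report these shapes, with a leading batch dimension of `None`.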