### Fast RCNN

Fast RCNN is a significant improvement over RCNN: instead of running the CNN separately on every region proposal, it passes the entire image through the CNN once and then applies RoI (Region of Interest) pooling to extract features for each proposal from the shared feature map. This removes the redundant computation that RCNN spends on overlapping regions.

#### Key Steps:

1. **Backbone Network (VGG16)**:
   - A pre-trained VGG16 is used as the backbone to extract feature maps from the entire image. The image is passed once through the convolutional layers, generating feature maps.

2. **RoI Pooling**:
   - RoI Pooling converts the part of the feature map under each proposed region into a fixed size (e.g., 7x7). The fully connected layers need a fixed-length input, so this step is what lets Fast RCNN handle regions of varying sizes.
   - The proposed regions (RoIs) are provided as box coordinates in image space and projected onto the feature map.

3. **Fully Connected Layers**:
   - After RoI pooling, the pooled features are flattened and passed through two fully connected layers to generate high-level features for classification and bounding box regression.

4. **Classification and Bounding Box Regression**:
   - The features are passed through two output heads:
     - A softmax classifier that assigns a class label to the region.
     - A bounding box regressor that refines the region's coordinates.
     
5. **Training**:
   - Fast RCNN is trained using a multi-task loss function that combines classification loss (softmax) and bounding box regression loss.
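
The RoI Pooling step (step 2) can be sketched in NumPy as below. This is a minimal illustration, not the reference implementation: `roi_max_pool` is a hypothetical helper, and the `spatial_scale` of 1/16 is an assumed backbone stride. It quantizes the RoI to feature-map cells, splits it into a 7x7 grid of bins, and takes the maximum inside each bin.

```python
import numpy as np

def roi_max_pool(feature_map, roi, output_size=7, spatial_scale=1.0 / 16):
    """Max-pool one RoI of an (H, W, C) feature map into (output_size, output_size, C).

    roi is [x_min, y_min, x_max, y_max] in image pixels; spatial_scale maps
    image pixels to feature-map cells (1/16 is an assumed backbone stride).
    """
    h, w, channels = feature_map.shape
    # Quantize the RoI to integer feature-map cells.
    x1, y1, x2, y2 = [int(round(c * spatial_scale)) for c in roi]
    x1, y1 = min(max(x1, 0), w - 1), min(max(y1, 0), h - 1)
    x2, y2 = min(max(x2, x1 + 1), w), min(max(y2, y1 + 1), h)

    # Split the region into an output_size x output_size grid of bins.
    ys = np.linspace(y1, y2, output_size + 1)
    xs = np.linspace(x1, x2, output_size + 1)
    out = np.empty((output_size, output_size, channels), dtype=feature_map.dtype)
    for i in range(output_size):
        for j in range(output_size):
            # Each bin covers at least one cell, even when the RoI is smaller than the grid.
            ya = int(np.floor(ys[i]))
            yb = max(int(np.ceil(ys[i + 1])), ya + 1)
            xa = int(np.floor(xs[j]))
            xb = max(int(np.ceil(xs[j + 1])), xa + 1)
            out[i, j] = feature_map[ya:yb, xa:xb].max(axis=(0, 1))
    return out
```

Whatever the size of the input region, the result always has shape `(7, 7, C)`, which is why the flattened vector can feed fully connected layers of fixed width.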

#### Limitations:
- Fast RCNN still relies on an external region proposal algorithm (e.g., Selective Search). That algorithm runs outside the network, is not learned, and becomes the speed bottleneck of the overall pipeline.
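
The multi-task loss from the Training step can be sketched for a single RoI as follows. This is an illustrative NumPy version under stated assumptions: class 0 is background, the regression term uses the smooth L1 function from the Fast R-CNN paper and is skipped for background RoIs, and `fast_rcnn_loss`, `smooth_l1`, and the weight `lam` are names chosen here for illustration.

```python
import numpy as np

def smooth_l1(x):
    """Smooth L1: quadratic for |x| < 1, linear otherwise."""
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x ** 2, x - 0.5)

def fast_rcnn_loss(class_probs, true_class, box_deltas, box_targets, lam=1.0):
    """Classification plus box-regression loss for one RoI.

    class_probs: (num_classes,) softmax output, class 0 assumed to be background.
    box_deltas:  (num_classes, 4) predicted deltas, one row per class.
    box_targets: (4,) regression targets for the ground-truth class.
    """
    # Log loss on the probability assigned to the true class.
    cls_loss = -np.log(class_probs[true_class] + 1e-12)
    if true_class == 0:
        # Background RoIs have no box to regress, so only the class term counts.
        return float(cls_loss)
    # Only the deltas predicted for the true class are penalized.
    loc_loss = smooth_l1(box_deltas[true_class] - box_targets).sum()
    return float(cls_loss + lam * loc_loss)
```

In a real training loop this value would be averaged over a mini-batch of sampled RoIs.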


In [1]:
# Example implementation of Fast RCNN (a simplified sketch of the forward pass)

import tensorflow as tf

# Fast R-CNN model
class FastRCNN(tf.keras.Model):
    def __init__(self, num_classes):
        super(FastRCNN, self).__init__()
        # Feature extractor (backbone)
        self.backbone = tf.keras.applications.VGG16(include_top=False, input_shape=(None, None, 3))
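        # RoI pooling is approximated in call() with tf.image.crop_and_resize, which
        # resamples each region to a fixed 7x7 grid (bilinear sampling, not max pooling)
        self.pool_size = 7
        self.flatten = tf.keras.layers.Flatten()
        self.fc1 = tf.keras.layers.Dense(4096, activation='relu')
        self.fc2 = tf.keras.layers.Dense(4096, activation='relu')
        self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')
        self.bbox_regressor = tf.keras.layers.Dense(num_classes * 4, activation='linear')  # 4 bbox coordinates per class

    def call(self, inputs, rois):
        # rois: (num_rois, 4) as [x_min, y_min, x_max, y_max] in image pixels.
        # All RoIs are assumed to belong to the first image in the batch.
        feature_map = self.backbone(inputs)

        # crop_and_resize expects boxes as [y1, x1, y2, x2], normalized to [0, 1]
        image_shape = tf.cast(tf.shape(inputs), tf.float32)
        height, width = image_shape[1], image_shape[2]
        x1, y1, x2, y2 = tf.unstack(tf.cast(rois, tf.float32), axis=1)
        boxes = tf.stack([y1 / height, x1 / width, y2 / height, x2 / width], axis=1)
        box_indices = tf.zeros_like(x1, dtype=tf.int32)

        # Every RoI becomes a (7, 7, channels) tensor regardless of its original size
        x = tf.image.crop_and_resize(
            feature_map, boxes, box_indices, crop_size=(self.pool_size, self.pool_size))
        x = self.flatten(x)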
        x = self.fc1(x)
        x = self.fc2(x)

        classification = self.classifier(x)
        bbox_regression = self.bbox_regressor(x)

        return classification, bbox_regression

# Example usage
def fast_rcnn_pipeline(image, rois):
    model = FastRCNN(num_classes=21)  # For 20 classes + background (VOC example)
    classification, bbox_regression = model(image, rois)
    return classification, bbox_regression

In [None]:
'''
The following is a test to check if the Fast RCNN model completes 1 forward pass. You can use such tests to check if your model works initially.
'''

import numpy as np

# Create a random image tensor of shape (1, 224, 224, 3) (batch_size=1)
random_image = tf.constant(np.random.random((1, 224, 224, 3)), dtype=tf.float32)

# Example RoI in image pixel coordinates: [x_min, y_min, x_max, y_max]
random_rois = tf.constant([[50, 30, 170, 150]], dtype=tf.int32)  # a single fixed RoI, shape (1, 4)

# Instantiate the Fast RCNN model (example, number of classes = 21)
model = FastRCNN(num_classes=21)

# Forward pass through the Fast RCNN model
classification, bbox_regression = model(random_image, random_rois)

# Assertions to check the output shape
assert classification.shape[0] == random_rois.shape[0], "Classification output mismatch"
assert bbox_regression.shape[0] == random_rois.shape[0], "Bounding box output mismatch"

# Print success message if assertions pass
print("Fast RCNN forward pass successful!")
print("Classification Output Shape:", classification.shape)
print("Bounding Box Regression Output Shape:", bbox_regression.shape)
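
The two outputs printed above still have to be combined into final boxes. The sketch below shows one way to do that in NumPy. It assumes the box head emits four deltas per class in the order `(tx, ty, tw, th)` and uses the centre/size parameterization from the RCNN papers; the model above does not fix that order, so treat it as an assumption. `decode_detections` is a hypothetical helper name.

```python
import numpy as np

def decode_detections(rois, class_probs, box_deltas):
    """Turn head outputs into one labelled box per RoI.

    rois:        (N, 4) boxes as [x_min, y_min, x_max, y_max].
    class_probs: (N, num_classes) softmax scores.
    box_deltas:  (N, num_classes * 4) deltas, assumed ordered (tx, ty, tw, th) per class.
    """
    rois = np.asarray(rois, dtype=np.float64)
    n, num_classes = class_probs.shape
    labels = class_probs.argmax(axis=1)
    scores = class_probs[np.arange(n), labels]
    # Pick the four deltas that belong to each RoI's predicted class.
    deltas = box_deltas.reshape(n, num_classes, 4)[np.arange(n), labels]

    w = rois[:, 2] - rois[:, 0]
    h = rois[:, 3] - rois[:, 1]
    cx = rois[:, 0] + 0.5 * w
    cy = rois[:, 1] + 0.5 * h
    # Shift the centre by a fraction of the box size; scale width and height exponentially.
    new_cx = cx + deltas[:, 0] * w
    new_cy = cy + deltas[:, 1] * h
    new_w = w * np.exp(deltas[:, 2])
    new_h = h * np.exp(deltas[:, 3])
    boxes = np.stack([new_cx - 0.5 * new_w, new_cy - 0.5 * new_h,
                      new_cx + 0.5 * new_w, new_cy + 0.5 * new_h], axis=1)
    return labels, scores, boxes
```

With the tensors from the test cell, a call could look like `decode_detections(random_rois.numpy(), classification.numpy(), bbox_regression.numpy())`. With untrained weights the resulting boxes are meaningless; the point is only the bookkeeping.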

### Faster RCNN

Faster RCNN is an extension of Fast RCNN that integrates a Region Proposal Network (RPN) into the model itself, making region proposal generation a learnable process. This removes the reliance on external region proposal methods like Selective Search and significantly speeds up the detection pipeline.

#### Key Components:

1. **Backbone Network (ResNet50)**:
   - The entire image is first passed through a pre-trained convolutional backbone (e.g., ResNet50) to generate feature maps.

2. **Region Proposal Network (RPN)**:
   - RPN is a small network that slides over the feature maps generated by the backbone.
   - It predicts two things for each anchor:
     - Objectness score: Whether the anchor contains an object (foreground) or background.
     - Bounding box deltas: Adjustments to the anchor coordinates to refine the proposal.
   - RPN outputs a set of refined region proposals (RoIs).

3. **RoI Align**:
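   - The original Faster RCNN reuses RoI Pooling from Fast RCNN. Many later implementations replace it with RoI Align, introduced with Mask RCNN, for more accurate feature extraction.
   - RoI Align fixes the misalignment caused by quantization in RoI Pooling: it keeps the floating-point RoI coordinates unrounded and samples the feature map with bilinear interpolation.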

4. **Classification and Bounding Box Regression**:
   - After RoI Align, the features for each proposal are flattened and passed through two fully connected layers.
   - Similar to Fast RCNN, the features are then used for:
     - **Classification**: Assign a class label to the region.
     - **Bounding Box Regression**: Refine the bounding box coordinates.

5. **Training**:
   - The model is trained with a multi-task loss that combines:
     - **RPN Loss**: Loss for objectness and bounding box regression from RPN.
     - **Fast RCNN Loss**: Loss for classification and bounding box regression of the final detections.
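
The anchors that the RPN scores can be generated as in the sketch below. It assumes the three scales (128, 256, 512) and three aspect ratios (1:2, 1:1, 2:1) described in the Faster R-CNN paper, giving 9 anchors per feature-map cell. The `stride` of 32 is an assumption about the backbone, and `generate_anchors` is a hypothetical helper name.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=32,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (feat_h * feat_w * 9, 4) anchors as [x_min, y_min, x_max, y_max]."""
    # Base shapes: each scale keeps its area while the ratio (height / width) changes.
    shapes = []
    for scale in scales:
        for ratio in ratios:
            w = scale / np.sqrt(ratio)
            h = scale * np.sqrt(ratio)
            shapes.append((w, h))
    shapes = np.array(shapes)                                   # (9, 2)

    # Anchor centres sit at the middle of each feature-map cell, in image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)                                # each (feat_h, feat_w)
    centres = np.stack([cx, cy], axis=-1).reshape(-1, 1, 2)     # (H*W, 1, 2)

    half = shapes[None, :, :] / 2.0                             # (1, 9, 2)
    anchors = np.concatenate([centres - half, centres + half], axis=-1)  # (H*W, 9, 4)
    return anchors.reshape(-1, 4)
```

For a 7x7 feature map this yields 441 anchors, ordered row by row over the cells with the 9 shapes of a cell stored consecutively. Anchors near the border extend outside the image; implementations usually clip or ignore them.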

#### Advantages:
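- **End-to-End Training**: The RPN shares the backbone's convolutional features and can be trained jointly with the rest of the network, so proposals are learned for the detection task instead of coming from a hand-engineered algorithm.
- **Faster**: Since proposals are computed from the shared feature maps rather than by a separate method such as Selective Search, Faster RCNN is much faster than Fast RCNN at inference time.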
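
Training the RPN needs a foreground/background target for every anchor. The sketch below assigns those targets by IoU with the ground-truth boxes, assuming the thresholds described in the Faster R-CNN paper (IoU of at least 0.7 is positive, below 0.3 is negative, everything in between is ignored). `iou_matrix` and `label_anchors` are hypothetical helper names, and at least one ground-truth box is assumed.

```python
import numpy as np

def iou_matrix(anchors, gt_boxes):
    """Pairwise IoU between (A, 4) anchors and (G, 4) ground-truth boxes."""
    a = anchors[:, None, :]
    g = gt_boxes[None, :, :]
    inter_w = np.clip(np.minimum(a[..., 2], g[..., 2]) - np.maximum(a[..., 0], g[..., 0]), 0, None)
    inter_h = np.clip(np.minimum(a[..., 3], g[..., 3]) - np.maximum(a[..., 1], g[..., 1]), 0, None)
    inter = inter_w * inter_h
    area_a = (a[..., 2] - a[..., 0]) * (a[..., 3] - a[..., 1])
    area_g = (g[..., 2] - g[..., 0]) * (g[..., 3] - g[..., 1])
    return inter / (area_a + area_g - inter)

def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (foreground), 0 (background) or -1 (ignored) for each anchor."""
    iou = iou_matrix(anchors, gt_boxes)
    best = iou.max(axis=1)
    labels = np.full(len(anchors), -1, dtype=np.int64)
    labels[best < neg_thresh] = 0
    labels[best >= pos_thresh] = 1
    # Every ground-truth box keeps its best-matching anchor as a positive.
    labels[iou.argmax(axis=0)] = 1
    return labels
```

Anchors labelled -1 contribute nothing to the RPN loss; only the 0 and 1 anchors feed the objectness term, and only the positives feed the box-regression term.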

In [4]:
import tensorflow as tf

# Region Proposal Network (RPN) module
class RPN(tf.keras.layers.Layer):
    def __init__(self, num_anchors):
        super(RPN, self).__init__()
        self.conv = tf.keras.layers.Conv2D(512, (3, 3), padding='same', activation='relu')
        self.cls_layer = tf.keras.layers.Conv2D(num_anchors * 2, (1, 1))  # Binary classification (object/no object)
        self.reg_layer = tf.keras.layers.Conv2D(num_anchors * 4, (1, 1))  # 4 coordinates for each anchor

    def call(self, feature_map):
        x = self.conv(feature_map)
        cls_logits = self.cls_layer(x)
        reg_deltas = self.reg_layer(x)
        return cls_logits, reg_deltas

# Faster R-CNN with RPN and RoI Align (for feature extraction and pooling)
class FasterRCNN(tf.keras.Model):
    def __init__(self, num_classes, num_anchors):
        super(FasterRCNN, self).__init__()
        # Backbone network (feature extractor)
        self.backbone = tf.keras.applications.ResNet50(include_top=False, input_shape=(None, None, 3))
        # Region Proposal Network (RPN)
        self.rpn = RPN(num_anchors)
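        # RoI Align is approximated in call() with tf.image.crop_and_resize,
        # which bilinearly samples each region to a fixed 7x7 grid
        self.pool_size = 7
        self.flatten = tf.keras.layers.Flatten()
        # Fully connected layers for classification and bounding box regression
        self.fc1 = tf.keras.layers.Dense(4096, activation='relu')
        self.fc2 = tf.keras.layers.Dense(4096, activation='relu')
        self.classifier = tf.keras.layers.Dense(num_classes, activation='softmax')
        self.bbox_regressor = tf.keras.layers.Dense(num_classes * 4, activation='linear')

    def call(self, inputs, rois):
        # Simplification: RoIs are passed in rather than decoded from the RPN outputs.
        # rois: (num_rois, 4) as [x_min, y_min, x_max, y_max] in image pixels, all for image 0.
        feature_map = self.backbone(inputs)
        rpn_cls_logits, rpn_bbox_deltas = self.rpn(feature_map)

        # crop_and_resize expects boxes as [y1, x1, y2, x2], normalized to [0, 1]
        image_shape = tf.cast(tf.shape(inputs), tf.float32)
        height, width = image_shape[1], image_shape[2]
        x1, y1, x2, y2 = tf.unstack(tf.cast(rois, tf.float32), axis=1)
        boxes = tf.stack([y1 / height, x1 / width, y2 / height, x2 / width], axis=1)
        box_indices = tf.zeros_like(x1, dtype=tf.int32)

        # Every RoI becomes a (7, 7, channels) tensor regardless of its original size
        x = tf.image.crop_and_resize(
            feature_map, boxes, box_indices, crop_size=(self.pool_size, self.pool_size))
        x = self.flatten(x)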
        x = self.fc1(x)
        x = self.fc2(x)

        classification = self.classifier(x)
        bbox_regression = self.bbox_regressor(x)

        return classification, bbox_regression, rpn_cls_logits, rpn_bbox_deltas

# Example usage
def faster_rcnn_pipeline(image, rois):
    model = FasterRCNN(num_classes=21, num_anchors=9)  # VOC-like dataset with 9 anchors
    classification, bbox_regression, rpn_cls_logits, rpn_bbox_deltas = model(image, rois)
    return classification, bbox_regression, rpn_cls_logits, rpn_bbox_deltas
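
To make the idea behind RoI Align concrete, here is a minimal NumPy sketch for a single RoI. It keeps the RoI coordinates fractional and bilinearly samples one point at the centre of each bin. That single sample per bin is a simplification chosen for brevity; implementations typically average several samples per bin. `bilinear` and `roi_align` are hypothetical helper names, and the `spatial_scale` of 1/16 is an assumed backbone stride.

```python
import numpy as np

def bilinear(feature_map, y, x):
    """Bilinearly interpolate an (H, W, C) feature map at a fractional (y, x)."""
    h, w, _ = feature_map.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = (1 - dx) * feature_map[y0, x0] + dx * feature_map[y0, x1]
    bottom = (1 - dx) * feature_map[y1, x0] + dx * feature_map[y1, x1]
    return (1 - dy) * top + dy * bottom

def roi_align(feature_map, roi, output_size=7, spatial_scale=1.0 / 16):
    """Sample the centre of each bin; roi is [x_min, y_min, x_max, y_max] in image pixels."""
    x1, y1, x2, y2 = [c * spatial_scale for c in roi]  # no rounding of coordinates
    bin_h = (y2 - y1) / output_size
    bin_w = (x2 - x1) / output_size
    out = np.empty((output_size, output_size, feature_map.shape[2]), dtype=np.float64)
    for i in range(output_size):
        for j in range(output_size):
            out[i, j] = bilinear(feature_map,
                                 y1 + (i + 0.5) * bin_h,
                                 x1 + (j + 0.5) * bin_w)
    return out
```

Because no coordinate is rounded, shifting an RoI by a fraction of a feature-map cell changes the output smoothly, which is the property that quantized RoI Pooling lacks.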

In [None]:
'''
The following is a test to check if the Faster RCNN model completes 1 forward pass. You can use such tests to check if your model works initially.
'''

import numpy as np

# Create a random image tensor of shape (1, 224, 224, 3) (batch_size=1)
random_image = tf.constant(np.random.random((1, 224, 224, 3)), dtype=tf.float32)

# Example RoI in image pixel coordinates: [x_min, y_min, x_max, y_max]
random_rois = tf.constant([[50, 30, 170, 150]], dtype=tf.int32)  # a single fixed RoI, shape (1, 4)

# Instantiate the Faster RCNN model (example, number of classes = 21 and anchors = 9)
model = FasterRCNN(num_classes=21, num_anchors=9)

# Forward pass through the Faster RCNN model
classification, bbox_regression, rpn_cls_logits, rpn_bbox_deltas = model(random_image, random_rois)

# Assertions to check output shapes
assert classification.shape[0] == random_rois.shape[0], "Classification output mismatch"
assert bbox_regression.shape[0] == random_rois.shape[0], "Bounding box output mismatch"
assert rpn_cls_logits.shape[0] == random_image.shape[0], "RPN classification logits output mismatch"
assert rpn_bbox_deltas.shape[0] == random_image.shape[0], "RPN bounding box deltas output mismatch"

# Print success message if assertions pass
print("Faster RCNN forward pass successful!")
print("Classification Output Shape:", classification.shape)
print("Bounding Box Regression Output Shape:", bbox_regression.shape)
print("RPN Classification Logits Output Shape:", rpn_cls_logits.shape)
print("RPN Bounding Box Deltas Output Shape:", rpn_bbox_deltas.shape)
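
The RPN classification logits printed above can be turned into objectness scores and ranked, which is the first step toward selecting proposals. The sketch below assumes the channel axis holds one `[background, foreground]` pair per anchor. The `RPN` layer above does not fix that layout, so it is an assumption, and `top_proposals` is a hypothetical helper name.

```python
import numpy as np

def top_proposals(rpn_cls_logits, top_k=5):
    """Return flat anchor indices and foreground scores of the top_k anchors.

    rpn_cls_logits: (1, H, W, num_anchors * 2); the channel layout is assumed to be
    [background, foreground] pairs, one pair per anchor.
    """
    logits = np.asarray(rpn_cls_logits, dtype=np.float64).reshape(-1, 2)  # (H*W*A, 2)
    # Numerically stable softmax over the two classes of each anchor.
    logits = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(logits)
    probs = exp / exp.sum(axis=1, keepdims=True)
    foreground = probs[:, 1]
    # Highest objectness first.
    order = np.argsort(-foreground)[:top_k]
    return order, foreground[order]
```

The returned indices run row-major over (row, column, anchor). A full pipeline would apply the matching entries of `rpn_bbox_deltas` to those anchors and remove near-duplicates before passing the boxes on as RoIs.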