# Faster R-CNN and YOLOv3 Object Detection Assignment

This notebook analyzes the Faster R-CNN and YOLOv3 object detection implementations used in this assignment, walks through their key components, and compares the two models.

## 1. Faster R-CNN - Problem 2: Code Analysis

### Important Components of Faster R-CNN Implementation:

#### **Region Proposal Network (RPN)**
- **Location**: `data/faster_rcnn/model/resnet.py` - function `rpn(base_layers, num_anchors)`
- **Implementation**: The RPN is a small convolutional network that slides over the backbone feature map: a 3×3 convolution followed by 1×1 convolutions that score each anchor at every location to generate region proposals
- **Key Lines**:
  ```python
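  def rpn(base_layers, num_anchors):
      # Shared 3x3 convolution over the backbone feature map
      x = Convolution2D(512, (3, 3), padding='same', activation='relu', kernel_initializer='normal', name='rpn_conv1')(base_layers)
      # One sigmoid objectness score per anchor at every location
      x_class = Convolution2D(num_anchors, (1, 1), activation='sigmoid', kernel_initializer='uniform', name='rpn_out_class')(x)
      # Excerpt: the rest of the function is not shown here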
  ```
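To make the role of `num_anchors` concrete, here is a minimal sketch of how the anchor count is usually derived as the number of scale and aspect-ratio combinations. The `anchor_shapes` helper and the scale and ratio values are assumptions for illustration and are not taken from the repository's configuration.

```python
from itertools import product

def anchor_shapes(scales, ratios):
    """Return (width, height) for every scale/ratio combination."""
    return [(s * rw, s * rh) for s, (rw, rh) in product(scales, ratios)]

# Assumed values for illustration; the repository's config may differ.
scales = [128, 256, 512]
ratios = [(1, 1), (1, 2), (2, 1)]

shapes = anchor_shapes(scales, ratios)
num_anchors = len(shapes)

# rpn_out_class emits one objectness score per anchor at every location,
# so it needs num_anchors channels; a box-regression head would need
# four values per anchor.
print(num_anchors, num_anchors * 4)
```

With three scales and three ratios, every feature-map location is covered by nine anchors.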

#### **ROI Pooling Implementation**
- **Location**: `data/faster_rcnn/model/RoiPoolingConv.py`

ROI pooling is implemented as a custom Keras layer.

**Important Features:**

- Handles both the `channels_first` and `channels_last` image data formats.
- Converts each ROI's coordinates into a cropped region of the feature map.
- Reduces every cropped region to a fixed-size output, using TensorFlow resize operations and max pooling.
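The idea of turning an arbitrarily sized ROI into a fixed-size grid can be sketched in plain Python. This is the classic max-pooling variant on a single-channel feature map, written under the assumption that ROI coordinates are already expressed in feature-map cells; the function name `roi_max_pool` is hypothetical and this is not the code in `RoiPoolingConv.py`.

```python
def roi_max_pool(feature_map, roi, pool_size):
    """Max-pool the region roi=(x, y, w, h) of a 2D feature map into a
    pool_size x pool_size grid. Coordinates are in feature-map cells."""
    x, y, w, h = roi
    pooled = []
    for i in range(pool_size):
        # Row range of the i-th bin (at least one row per bin).
        r0 = y + (i * h) // pool_size
        r1 = max(r0 + 1, y + ((i + 1) * h) // pool_size)
        row = []
        for j in range(pool_size):
            # Column range of the j-th bin (at least one column per bin).
            c0 = x + (j * w) // pool_size
            c1 = max(c0 + 1, x + ((j + 1) * w) // pool_size)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled

# 4x4 toy feature map; pool the whole map down to 2x2.
fmap = [[1, 2, 3, 4],
        [5, 6, 7, 8],
        [9, 10, 11, 12],
        [13, 14, 15, 16]]
print(roi_max_pool(fmap, roi=(0, 0, 4, 4), pool_size=2))
```

Whatever the ROI size, the output grid has the same shape, which is what lets the classifier head use fixed-size dense layers.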

#### **Backbone Network (ResNet)**
- **Location**: `data/faster_rcnn/model/resnet.py`

ResNet-50 serves as the primary feature extractor in the implementation.

**Important Functions:**

- `nn_base()`: builds the base ResNet-50 layers that extract the shared feature map
- `classifier()`: the classification head that processes the pooled ROI features

#### **Loss Functions**
- **Location**: `data/faster_rcnn/model/losses.py`

Specialized loss functions are implemented for:

- RPN classification (`rpn_loss_cls`)
- RPN bounding-box regression (`rpn_loss_regr`)
- The final classifier's classification and regression outputs (`class_loss_cls`, `class_loss_regr`)
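Box-regression losses in Faster R-CNN are commonly a smooth L1 loss, which is quadratic for small errors and linear for large ones. The sketch below shows that formula in plain Python; whether `rpn_loss_regr` and `class_loss_regr` use exactly this form, and the `delta` parameter, are assumptions.

```python
def smooth_l1(pred, target, delta=1.0):
    """Smooth L1 loss averaged over the given box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        diff = abs(p - t)
        # Quadratic near zero, linear for large errors.
        total += 0.5 * diff ** 2 / delta if diff < delta else diff - 0.5 * delta
    return total / len(pred)

# One small error (0.5) and one large error (2.0).
print(smooth_l1([0.5, 2.0], [0.0, 0.0]))
```

The linear tail keeps a few badly localized boxes from dominating the gradient.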

#### **Training Pipeline**
- **Location**: `data/faster_rcnn/train.py`
- **Key Features**:
  - Two-step training in every iteration: the RPN is updated first, then the classifier
  - Data augmentation support (horizontal flips, vertical flips, 90° rotations)
- **Training Strategy**:
  1. Feed forward through network
  2. Generate ROI proposals using RPN
  3. Calculate IoU between proposals and ground truth
  4. Sample positive and negative examples for classifier
  5. Train both RPN and classifier in same iteration
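Steps 3 and 4 above rest on intersection over union (IoU). A minimal sketch, assuming boxes are `(x1, y1, x2, y2)` tuples; the `label_proposals` helper and its `pos_thresh` value are hypothetical and not taken from `train.py`.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def label_proposals(proposals, gt_boxes, pos_thresh=0.5):
    """Mark a proposal positive if it overlaps any ground-truth box enough.
    pos_thresh is an assumed value, not taken from the repository config."""
    return [max(iou(p, g) for g in gt_boxes) >= pos_thresh for p in proposals]

gt = [(10, 10, 50, 50)]
props = [(12, 12, 48, 52), (60, 60, 90, 90)]
print(label_proposals(props, gt))
```

Proposals labeled positive would be sampled as object examples for the classifier, and the rest as background.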

#### **Prediction Pipeline**
- **Location**: `data/faster_rcnn/predict.py`
- **Process**: RPN → ROI extraction → Classification → NMS → Final detections

#### **Annotation Validation and Parser** ⭐ (Enhanced)
- **Location**: `data/faster_rcnn/model/parser.py`

**Important Enhancements:**

- **Automatic coordinate validation**: fixes swapped coordinates (x2 < x1 or y2 < y1) and checks that every coordinate lies inside the image boundaries, clamping those that do not.
- **Invalid box filtering**: discards bounding boxes that have zero area or are too small.
- **Missing image handling**: skips missing images gracefully and reports them.
- **Detailed logging**: reports errors, skipped lines, and fixes.
- **Validation function**:
 
  ```python
  def validate_bbox(x1, y1, x2, y2, width, height):
      # Fixes swapped coordinates
      # Clamps to image bounds
      # Validates minimum size (2×2 pixels)
      # Returns (valid, x1, y1, x2, y2)
      ...  # body omitted in this outline
  ```
- **Results**: Successfully loads 7,880 valid images across 19 classes (18 characters plus the `bg` background class) from the Simpsons dataset
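The steps listed in the `validate_bbox` outline can be sketched as follows. This is one possible implementation written from the outline's comments, not the code in `parser.py`; the `min_size` parameter is an added assumption that defaults to the 2×2 pixel minimum mentioned there.

```python
def validate_bbox(x1, y1, x2, y2, width, height, min_size=2):
    """Return (valid, x1, y1, x2, y2) after repairing the box."""
    # Fix swapped coordinates.
    x1, x2 = min(x1, x2), max(x1, x2)
    y1, y2 = min(y1, y2), max(y1, y2)
    # Clamp to image bounds.
    x1, x2 = max(0, x1), min(width, x2)
    y1, y2 = max(0, y1), min(height, y2)
    # Reject boxes smaller than min_size x min_size pixels.
    valid = (x2 - x1) >= min_size and (y2 - y1) >= min_size
    return valid, x1, y1, x2, y2

# Swapped and partly outside a 100x80 image: repaired and kept.
print(validate_bbox(120, 40, 20, 10, width=100, height=80))
# Only one pixel wide: reported as invalid.
print(validate_bbox(5, 5, 6, 30, width=100, height=80))
```

Returning the repaired coordinates together with the validity flag lets the caller count fixed and rejected boxes separately.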

#### **Annotation Validation Tool** ⭐ (New)
- **Location**: `data/faster_rcnn/validate_annotations.py`
- **Purpose**: Standalone script to validate and fix annotation files before training
- **Features**:
  - Validates bounding box coordinates
  - Checks for missing images
  - Fixes coordinate errors automatically
  - Generates detailed reports
- **Usage**:
  ```bash
  # Validate and create fixed annotation file
  python data/faster_rcnn/validate_annotations.py annotation.txt -o annotation_fixed.txt
  
  # Validate only (report errors without fixing)
  python data/faster_rcnn/validate_annotations.py annotation.txt --no-fix
  ```

## 2. YOLOv3 - Problem 6: Code Analysis

### Important Components of YOLOv3 Implementation:

#### **Darknet Backbone**
- **Location**: `data/yolov3/yolo3/model.py` - function `darknet_body(x)`
- **Implementation**: Custom implementation of Darknet-53 backbone
- **Key Features**:
  ```python
  def darknet_body(x):
      x = DarknetConv2D_BN_Leaky(32, (3,3))(x)
      x = resblock_body(x, 64, 1)
      x = resblock_body(x, 128, 2)
      # ... continues with 52 total convolutional layers
  ```

#### **Multi-Scale Prediction**
- **Location**: `data/yolov3/yolo3/model.py` - function `yolo_body()`
- **Implementation**: Predicts on feature maps at 3 different scales (13×13, 26×26, and 52×52 for a 416×416 input)
- **Key Innovation**: Concatenates features from earlier layers with upsampled features
  ```python
  # Multi-scale detection heads
  route1 = concatenate([ip, x], axis=-1)
  route2 = concatenate([ip2, x_small], axis=-1)
  ```

#### **YOLO Detection Head**
- **Location**: `data/yolov3/yolo3/model.py` - function `make_last_layers()`
- **Implementation**: Each detection head predicts:
  - Bounding box coordinates (4 values)
  - Objectness score (1 value)
  - Class probabilities (N classes)
- **Output size**: each of the 3 scales emits 3 anchors × (5 + N_classes) values per grid cell
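The output sizes follow directly from the input resolution and the class count. A small sketch, assuming the standard YOLOv3 strides of 32, 16, and 8; the helper name `yolo_output_shapes` is hypothetical.

```python
def yolo_output_shapes(input_size, num_classes, anchors_per_scale=3):
    """Return (grid_h, grid_w, channels) for the three detection scales."""
    # 4 box coordinates + 1 objectness score + class probabilities, per anchor.
    channels = anchors_per_scale * (5 + num_classes)
    return [(input_size // stride, input_size // stride, channels)
            for stride in (32, 16, 8)]

# 416x416 input with the 80 COCO classes.
for shape in yolo_output_shapes(416, num_classes=80):
    print(shape)
```

For a dataset with fewer classes, only the channel count changes; the grid sizes depend on the input resolution alone.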

#### **Anchor Box System**
- **Location**: `data/yolov3/model_data/yolo_anchors.txt`
- **Implementation**: Uses pre-defined anchor boxes (9 total: 3 per scale)
- **Anchor dimensions**: Optimized for COCO dataset objects
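A short sketch of reading the anchors and splitting them across scales. The anchor values below are the default YOLOv3 COCO anchors; the single-line file format and the `anchor_mask` grouping (largest anchors on the coarsest grid) are assumptions about this repository, and `parse_anchors` is a hypothetical helper.

```python
def parse_anchors(text):
    """Parse 'w,h, w,h, ...' into a list of (width, height) pairs."""
    values = [float(v) for v in text.split(',')]
    return list(zip(values[0::2], values[1::2]))

# Default YOLOv3 anchors in pixels; yolo_anchors.txt is assumed to hold
# a single line in this format.
line = "10,13,  16,30,  33,23,  30,61,  62,45,  59,119,  116,90,  156,198,  373,326"
anchors = parse_anchors(line)

# Assumed grouping: the largest anchors go to the coarsest (13x13) grid.
anchor_mask = [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
for grid, mask in zip((13, 26, 52), anchor_mask):
    print(grid, [anchors[i] for i in mask])
```

Because these anchors were fitted to COCO objects, they may not match the box sizes of another dataset well.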

#### **Non-Maximum Suppression**
- **Location**: `data/yolov3/yolo3/model.py` - function `yolo_eval()`
- **Implementation**: Filters boxes by score, then applies class-specific, IoU-based NMS
- **Process**: Score threshold → NMS per class (IoU-based) → Final detections
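The filtering process can be sketched as greedy NMS applied separately to each class. This is an illustration in plain Python, not the TensorFlow code in `yolo_eval()`; the function names and the threshold defaults are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap an already kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep

def filter_detections(dets, score_thresh=0.3, iou_thresh=0.45):
    """dets: list of (box, score, class_id). Thresholds are assumed values."""
    kept = []
    for cls in sorted({d[2] for d in dets}):
        cand = [d for d in dets if d[2] == cls and d[1] >= score_thresh]
        idx = nms([d[0] for d in cand], [d[1] for d in cand], iou_thresh)
        kept.extend(cand[i] for i in idx)
    return kept

dets = [((10, 10, 50, 50), 0.9, 0),      # kept
        ((12, 12, 52, 52), 0.8, 0),      # suppressed by the box above
        ((12, 12, 52, 52), 0.7, 1),      # different class, kept
        ((100, 100, 140, 140), 0.2, 0)]  # below the score threshold
print(filter_detections(dets))
```

Running NMS per class means overlapping boxes of different classes do not suppress each other.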

#### **Prediction Pipeline**
- **Location**: `data/yolov3/yolo.py` - class `YOLO`
- **Key Methods**:
  - `detect_image()`: Processes individual images
  - `detect_video()`: Processes video files
  - `letterbox_image()`: Maintains aspect ratio during resizing
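The geometry behind aspect-preserving resizing can be sketched without any image library: scale the image to fit inside the target, then pad the remainder evenly. The `letterbox_geometry` helper is hypothetical and only computes sizes and offsets; it does not reproduce `letterbox_image()` itself.

```python
def letterbox_geometry(image_size, target_size):
    """Return (new_w, new_h, pad_left, pad_top) for an aspect-preserving
    resize of image_size=(w, h) into target_size=(w, h)."""
    iw, ih = image_size
    tw, th = target_size
    # The limiting dimension determines the scale factor.
    scale = min(tw / iw, th / ih)
    nw, nh = int(iw * scale), int(ih * scale)
    # Center the resized image; the rest of the canvas is padding.
    return nw, nh, (tw - nw) // 2, (th - nh) // 2

# A wide 832x416 image placed into a 416x416 network input.
print(letterbox_geometry((832, 416), (416, 416)))
```

The same scale and offsets are needed again after inference to map predicted boxes back to original image coordinates.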

#### **Training Infrastructure**
- **Location**: `data/yolov3/train.py`
- **Features**:
  - Custom data generators for YOLO format
  - Data augmentation (random crops, colors, flips)
  - Learning rate scheduling
  - Transfer learning support from Darknet weights

```python
# Example: Validate and test annotation loading
import os
import sys

# Add paths
sys.path.append('data/faster_rcnn')

from model.parser import get_data

# Load annotations
print("Loading and validating annotations...")
try:
    all_imgs, classes_count, class_mapping = get_data('data/faster_rcnn/annotation_fixed.txt')

    print(f"\n✓ Successfully loaded {len(all_imgs)} images")
    print(f"✓ Found {len(classes_count)} classes")
    print(f"\nTop classes by image count:")

    # Sort by count
    sorted_classes = sorted(classes_count.items(), key=lambda x: x[1], reverse=True)
    for class_name, count in sorted_classes[:10]:
        if class_name != 'bg':
            print(f"  {class_name}: {count} images")

    print("\n✓ All annotations validated and ready for training!")

except Exception as e:
    print(f"Error: {e}")
    print("Make sure annotation_fixed.txt exists in data/faster_rcnn/")
```

Output:

```
Loading and validating annotations...
Parsing annotation files

Annotation parsing summary:
  Total valid images: 7880
  Skipped lines: 0
  Fixed bounding boxes: 0
  Invalid bounding boxes: 0

Training images per class (19 classes):
{'abraham_grampa_simpson': 686,
 'apu_nahasapeemapetilon': 206,
 'bart_simpson': 650,
 'bg': 0,
 'charles_montgomery_burns': 648,
 'chief_wiggum': 208,
 'comic_book_guy': 207,
 'edna_krabappel': 212,
 'homer_simpson': 718,
 'kent_brockman': 213,
 'krusty_the_clown': 427,
 'lisa_simpson': 755,
 'marge_simpson': 629,
 'milhouse_van_houten': 210,
 'moe_szyslak': 403,
 'ned_flanders': 675,
 'nelson_muntz': 219,
 'principal_skinner': 614,
 'sideshow_bob': 200}

✓ Successfully loaded 7880 images
✓ Found 19 classes

Top classes by image count:
  lisa_simpson: 755 images
  homer_simpson: 718 images
  abraham_grampa_simpson: 686 images
  ned_flanders: 675 images
  bart_simpson: 650 images
  charles_montgomery_burns: 648 images
  marge_simpson: 629 images
  principal
```

## 3. Model Comparison

| Aspect | Faster R-CNN | YOLOv3 |
|--------|-------------|---------|
| **Architecture** | Two-stage (RPN + Classifier) | Single-stage |
| **Backbone** | ResNet-50 | Darknet-53 |
| **Prediction Time** | Slower (multiple steps) | Faster (single pass) |
| **Accuracy** | Generally higher | Slightly lower but very competitive |
| **Memory Usage** | Higher (ROI storage) | Lower |
| **Implementation Complexity** | More complex | Simpler |
| **Training** | Two-stage training | End-to-end training |
| **Region Proposals** | Explicit RPN | Implicit anchor-based |

**Key Differences:**
- **Faster R-CNN**: Region proposal → feature extraction → classification cascade; typically gives more precise localization, especially on small objects
- **YOLOv3**: Direct dense prediction with multi-scale feature fusion; faster inference, which makes it better suited to real-time applications

```python
# Import necessary libraries for testing
import os
import sys
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image

# Add project paths
sys.path.append('data/faster_rcnn')
sys.path.append('data/yolov3')

print('Dependencies imported successfully!')
```

Output:

```
Dependencies imported successfully!
```



## 4. Setup and Configuration

### Project Structure:
```
r-cnn-yolo/
├── data/
│   ├── faster_rcnn/     # Faster R-CNN implementation
│   └── yolov3/          # YOLOv3 implementation
├── plots/                # Output results
├── src/                  # Source code
├── reports/              # Logs and analysis
├── main.py               # Main execution script
├── object-detection.ipynb # This notebook
└── requirements.txt      # Dependencies
```