# **Introduction to Computer Vision. Lab 13. Introduction to Artificial Intelligence**

## **Theory:**

**Multi-Class Object Detection and Localization:**
- Multi-class object detection and localization involve identifying multiple objects of different classes within an image and determining their precise locations.
- This task extends single object detection by incorporating the ability to handle multiple instances of different classes simultaneously.

**Convolutional Neural Networks (CNNs):**
- CNNs are highly effective for image-related tasks due to their ability to capture spatial hierarchies through convolutional layers.
- For multi-class object detection and localization, CNN architectures are modified to include outputs for both class probabilities and bounding box coordinates.
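
To make the output layout concrete, here is a minimal sketch in plain Python. It assumes a single flat output vector whose first `num_classes` entries are raw class scores and whose last four entries are the box \((x, y, w, h)\). The helper names `softmax` and `decode_output` and the example numbers are hypothetical, chosen only for illustration.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def decode_output(output, num_classes):
    """Split one flat prediction into (class_index, class_probs, bbox)."""
    if len(output) != num_classes + 4:
        raise ValueError("expected num_classes + 4 values")
    probs = softmax(output[:num_classes])
    bbox = tuple(output[num_classes:])  # (x, y, w, h)
    class_index = max(range(num_classes), key=lambda k: probs[k])
    return class_index, probs, bbox

# Hypothetical prediction for 3 classes: three scores followed by (x, y, w, h)
raw = [0.2, 2.5, -1.0, 40.0, 30.0, 64.0, 128.0]
class_index, probs, bbox = decode_output(raw, num_classes=3)
print(class_index, [round(p, 3) for p in probs], bbox)
```

A network with two separate output heads would produce the class probabilities and the box directly, so no slicing would be needed in that case.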

**Modifications to CNN Architecture:**
1. **Output Layer Adjustments:**
   - The output layer should predict class probabilities for each object in the image and bounding box coordinates \((x, y, w, h)\).
   - This can be achieved by having separate outputs for classification and bounding box regression.

2. **Loss Function:**
   - A custom loss function that combines classification loss (e.g., categorical cross-entropy) and localization loss (e.g., mean squared error) is used to train the model.
   - The classification loss ensures accurate class predictions, while the localization loss ensures precise bounding box predictions.
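
The combined loss for a single object can be worked through by hand. The sketch below is a plain-Python illustration, not the Keras implementation. It assumes box coordinates normalized to \([0, 1]\) so that both terms have a comparable scale, and the `box_weight` parameter is a hypothetical knob for balancing them.

```python
import math

def categorical_cross_entropy(true_one_hot, pred_probs, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted class probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(true_one_hot, pred_probs))

def mean_squared_error(true_box, pred_box):
    """Mean of the squared differences over the four box coordinates."""
    return sum((t - p) ** 2 for t, p in zip(true_box, pred_box)) / len(true_box)

def combined_loss(true_one_hot, pred_probs, true_box, pred_box, box_weight=1.0):
    """Classification loss plus (optionally weighted) localization loss."""
    cls = categorical_cross_entropy(true_one_hot, pred_probs)
    loc = mean_squared_error(true_box, pred_box)
    return cls + box_weight * loc

# Hypothetical example: true class is index 1, boxes are normalized (x, y, w, h)
true_one_hot = [0, 1, 0]
pred_probs = [0.1, 0.8, 0.1]
true_box = (0.20, 0.30, 0.40, 0.50)
pred_box = (0.25, 0.30, 0.35, 0.50)

cls_loss = categorical_cross_entropy(true_one_hot, pred_probs)
loc_loss = mean_squared_error(true_box, pred_box)
total = combined_loss(true_one_hot, pred_probs, true_box, pred_box)
print(f"classification={cls_loss:.4f} localization={loc_loss:.5f} total={total:.4f}")
```

If the boxes are left in pixel units, the squared-error term is typically orders of magnitude larger than the cross-entropy term, which is one reason to normalize coordinates or to weight the two terms.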

**Dataset Requirements:**
- The dataset should contain images with multiple instances of objects from different classes.
- Each image should be annotated with bounding boxes and class labels for all objects present.
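
As an illustration of such annotations, the sketch below uses a hypothetical JSON layout with one entry per object, built around the keys that `load_dataset` in the exercise reads (`file_name`, `category_id`, `bbox`). The file names, ids, and coordinates are invented for the example, and `group_by_image` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Hypothetical annotation file contents: one entry per object, so an image
# with several objects appears several times under the same file_name.
annotations_json = """
{
  "annotations": [
    {"file_name": "street_001.jpg", "category_id": 0, "bbox": [34, 50, 40, 110]},
    {"file_name": "street_001.jpg", "category_id": 0, "bbox": [120, 48, 38, 105]},
    {"file_name": "street_001.jpg", "category_id": 1, "bbox": [200, 90, 90, 60]},
    {"file_name": "street_002.jpg", "category_id": 0, "bbox": [15, 40, 42, 115]}
  ]
}
"""

def group_by_image(annotations):
    """Collect (category_id, bbox) pairs for every image file."""
    grouped = defaultdict(list)
    for ann in annotations["annotations"]:
        grouped[ann["file_name"]].append((ann["category_id"], tuple(ann["bbox"])))
    return dict(grouped)

objects_per_image = group_by_image(json.loads(annotations_json))
for file_name, objects in objects_per_image.items():
    print(file_name, "->", len(objects), "objects")
```

Grouping by image makes it easy to check that the dataset really contains images with several instances, which is what the exercise requires.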

**Training and Evaluation:**
- The model is trained on a dataset with annotated bounding boxes and class labels.
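- Evaluation covers both tasks: classification accuracy for the predicted classes, and localization quality for the predicted bounding boxes. Localization is commonly measured with Intersection over Union (IoU), the area of overlap between the predicted and ground-truth boxes divided by the area of their union.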
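
IoU is straightforward to compute for axis-aligned boxes. The following is a minimal sketch, assuming boxes are given as \((x, y, w, h)\) with \((x, y)\) the top-left corner, which is the convention `draw_bbox` in the exercise code uses. The function name `iou` is hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x, y, w, h) boxes, (x, y) = top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes do not overlap)
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0

# Two 10x10 boxes offset by 5 pixels: overlap area 25, union area 175
print(iou((0, 0, 10, 10), (5, 5, 10, 10)))
```

An IoU of 1 means the boxes coincide and 0 means they do not overlap. A threshold such as 0.5 is often used to decide whether a predicted box counts as a correct localization.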



## **Exercise 1: Use your already built convolutional neural network to perform multi-class object detection and localization**

---
- Specifically, your network must detect and localize every object when an image contains several instances of the same class, objects of two or more different classes, or both.


- To this end, modify your previous algorithm so that it detects and localizes two or more objects of the same class in an image, as well as two or more objects of different classes. You also need to find an appropriate dataset online that contains at least one class appearing more than once per image. Pedestrians, for example, would be a good choice.


- You need to show performance that is significantly better than a random guess.

### Multi-Class Object Detection and Localization using Convolutional Neural Network

In [None]:
import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam
import json
import os
import cv2
from sklearn.model_selection import train_test_split

# Fix random seed for reproducibility
np.random.seed(42)
tf.random.set_seed(42)

# Load dataset
def load_dataset(data_dir):
    images = []
    labels = []
    bboxes = []
    annotations_path = os.path.join(data_dir, 'annotations.json')
    with open(annotations_path, 'r') as f:
        annotations = json.load(f)
    for ann in annotations['annotations']:
        img_path = os.path.join(data_dir, 'images', ann['file_name'])
        image = ''' TO DO '''
        image = ''' TO DO '''
        label = ann['category_id']
        bbox = ann['bbox']
        images.append(image)
        labels.append(label)
        bboxes.append(bbox)
    images = np.array(images)
    labels = np.array(labels)
    bboxes = np.array(bboxes)

    # Assertions to check the dataset loading
    assert images.ndim == 4, f"Expected images to have 4 dimensions, got {images.ndim}"
    assert labels.ndim == 1, f"Expected labels to have 1 dimension, got {labels.ndim}"
    assert bboxes.ndim == 2, f"Expected bboxes to have 2 dimensions, got {bboxes.ndim}"
    
    return images, labels, bboxes

# Load dataset
data_dir = 'path_to_your_dataset'
X, y_labels, y_bboxes = load_dataset(data_dir)

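# Convert labels to one-hot encoding.
# category_id values are not guaranteed to be 0..num_classes-1 (e.g. 1-based ids),
# so map them to contiguous class indices before encoding.
class_ids, y_class_idx = np.unique(y_labels, return_inverse=True)
num_classes = len(class_ids)
y_labels_one_hot = tf.keras.utils.to_categorical(y_class_idx, num_classes=num_classes)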

# Assertions for one-hot encoding
assert y_labels_one_hot.shape[1] == num_classes, "One-hot encoding failed to create the correct number of classes."

# Split the dataset
X_train, X_test, y_labels_train, y_labels_test, y_bboxes_train, y_bboxes_test = train_test_split(
    X, y_labels_one_hot, y_bboxes, test_size=0.2, random_state=42)

# Define the CNN model for multi-class object detection and localization
def build_detection_model(input_shape, num_classes):
    model = ''' TO DO '''
    model.add(''' TO DO ''')
    model.add(''' TO DO ''')
    model.add(''' TO DO ''')
    model.add(''' TO DO ''')
    model.add(''' TO DO ''')
    model.add(''' TO DO ''')
    model.add(''' TO DO ''')  # num_classes for classification + 4 for bbox
    # Assert the output shape is as expected
    assert model.output_shape[1] == num_classes + 4, "Model output shape is incorrect."
    
    return model

# Build the model (it is compiled further down, once the combined loss is defined;
# a list of two losses does not fit a model with a single output)
input_shape = ''' TO DO '''
model = build_detection_model(input_shape, num_classes)

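# Custom loss function: classification loss + localization loss.
# y_true and y_pred both have shape (batch, num_classes + 4):
# the first num_classes columns are the class part, the last 4 are the bbox.
# Note: categorical_crossentropy expects probabilities by default, so the class
# part of y_pred should come from a softmax; if the final layer is linear,
# pass from_logits=True instead.
def custom_loss(y_true, y_pred):
    classification_loss = tf.keras.losses.categorical_crossentropy(y_true[:, :num_classes], y_pred[:, :num_classes])
    localization_loss = tf.keras.losses.mean_squared_error(y_true[:, num_classes:], y_pred[:, num_classes:])
    return classification_loss + localization_loss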

# Train the model
batch_size = 32
epochs = 50
model.compile(optimizer=Adam(learning_rate=0.001), loss=custom_loss)
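# Stack one-hot labels and bboxes into a single target matrix per split
y_train = np.hstack((y_labels_train, y_bboxes_train))
y_test = np.hstack((y_labels_test, y_bboxes_test))
history = model.fit(X_train, y_train, epochs=epochs, batch_size=batch_size,
                    validation_data=(X_test, y_test))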

# Evaluate the model
loss = model.evaluate(X_test, np.hstack((y_labels_test, y_bboxes_test)))
print(f'Loss: {loss}')

# Assert that the final loss is below the threshold
assert loss < 0.1, f"Model loss is too high: {loss}. Expected loss < 0.1"

# Predict on test images
predictions = model.predict(X_test)
y_pred_labels = predictions[:, :num_classes]
y_pred_bboxes = predictions[:, num_classes:]

# Function to draw bounding box on image
def draw_bbox(image, bbox):
    x, y, w, h = bbox
    x1, y1, x2, y2 = int(x), int(y), int(x + w), int(y + h)
    return cv2.rectangle(image, (x1, y1), (x2, y2), (255, 0, 0), 2)

# Visualize predictions
import matplotlib.pyplot as plt

for i in range(min(5, len(X_test))):  # show at most 5 test images
    image = X_test[i].copy()
    true_bbox = y_bboxes_test[i]
    pred_bbox = y_pred_bboxes[i]
    image_true = draw_bbox(image.copy(), true_bbox)
    image_pred = draw_bbox(image.copy(), pred_bbox)
    plt.figure(figsize=(10, 5))
    plt.subplot(1, 2, 1)
    plt.title('True Bounding Box')
    plt.imshow(image_true)
    plt.subplot(1, 2, 2)
    plt.title('Predicted Bounding Box')
    plt.imshow(image_pred)
    plt.show()
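
The exercise asks for performance clearly better than a random guess, which the loss value alone does not show. One possible summary, sketched below in plain Python, reports classification accuracy and mean IoU and compares the accuracy with the chance level `1 / num_classes`. The helper names `box_iou` and `summarize` are hypothetical, and the toy lists only stand in for `y_labels_test`, `y_pred_labels`, `y_bboxes_test` and `y_pred_bboxes`.

```python
def box_iou(box_a, box_b):
    """IoU of two (x, y, w, h) boxes with (x, y) as the top-left corner."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0

def summarize(true_one_hot, pred_scores, true_boxes, pred_boxes):
    """Return (classification accuracy, mean IoU) over paired samples."""
    n = len(true_one_hot)
    correct = 0
    iou_sum = 0.0
    for t, p, tb, pb in zip(true_one_hot, pred_scores, true_boxes, pred_boxes):
        true_cls = max(range(len(t)), key=lambda k: t[k])  # argmax of one-hot label
        pred_cls = max(range(len(p)), key=lambda k: p[k])  # argmax of predicted scores
        correct += int(true_cls == pred_cls)
        iou_sum += box_iou(tb, pb)
    return correct / n, iou_sum / n

# Toy data for two classes (illustrative values only, not real results)
true_one_hot = [[1, 0], [0, 1], [1, 0], [0, 1]]
pred_scores = [[0.9, 0.1], [0.2, 0.8], [0.4, 0.6], [0.3, 0.7]]
true_boxes = [(0, 0, 10, 10), (5, 5, 10, 10), (0, 0, 4, 4), (2, 2, 6, 6)]
pred_boxes = [(0, 0, 10, 10), (5, 5, 10, 10), (10, 10, 4, 4), (2, 2, 6, 3)]

accuracy, mean_iou = summarize(true_one_hot, pred_scores, true_boxes, pred_boxes)
chance = 1 / len(true_one_hot[0])  # accuracy of a uniform random guess
print(f"accuracy={accuracy:.2f} (chance={chance:.2f}), mean IoU={mean_iou:.3f}")
```

The same functions accept rows of NumPy arrays, so they could be called on the test-set predictions directly. The chance level assumes balanced classes; with imbalanced classes, the frequency of the most common class is a fairer baseline.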

## **Conclusion:**

Using a CNN for multi-class object detection and localization involves extending the network to predict class probabilities and bounding box coordinates for multiple objects. Training with a combined loss function lets the model learn both tasks jointly: the classification term drives correct class predictions, and the localization term drives accurate bounding boxes.
