## Introduction
<p style="text-align: justify;">
Welcome to this comprehensive guide on object detection using the Keras-CV library. Object detection is a crucial computer vision task that involves identifying and locating objects within images. 
</p>
<p style="text-align: justify;">
Object detection is a complex computer vision task. Unlike single-output tasks such as image classification, it requires models to predict both bounding box coordinates and class labels, which makes learning harder. Labeled examples for object detection are also scarce, which makes it even more difficult to train accurate and robust models.
</p>
<p style="text-align: justify;">
An image can contain several objects, and each one must be both localized precisely and classified correctly. Doing both at once calls for specialized architectures and careful training strategies.
Keras-CV simplifies building object detection models by providing pre-trained models along with bounding box utilities, augmentation layers, and evaluation metrics.
</p>
<p style="text-align: justify;">
In this notebook, we walk step by step through object detection with Keras-CV, from dataset preparation to model training and evaluation. We use the COCO 2017 dataset as an example, but the techniques can be applied to custom datasets with minimal adjustments.
</p>

## Online Demo
<p style="text-align: justify;">
You can try the online demo at the following address:
<a
href='https://hamiddamadi.ir/app/imageObjectDetection'
target='_blank'
rel="noreferrer"
>
https://hamiddamadi.ir/app/imageObjectDetection.
</a>
</p>

## Prerequisites
<p style="text-align: justify;">
Make sure you have the required libraries installed before running the notebook.
</p>
<p style="text-align: justify;">
This code cell installs the pycocotools package using the pip package manager. 'pycocotools' is a Python API that provides tools for working with the COCO (Common Objects in Context) dataset. This dataset is commonly used for object detection, segmentation, and captioning tasks in computer vision. Installing this package allows you to access and manipulate the COCO dataset in your code.
</p>

In [None]:
!pip install pycocotools

## Import Required Libraries and Modules
This code cell imports several Python libraries and modules:

* **keras_cv**: KerasCV, a library of computer vision building blocks for Keras (models, preprocessing and augmentation layers, bounding box utilities, and metrics).
* **cv2**: The OpenCV library for computer vision tasks.
* **matplotlib.pyplot** (imported as plt): Matplotlib's plotting interface, used here to plot the training history.
* **tensorflow** (imported as tf): TensorFlow, a popular machine learning library.
* **os**: The OS module for interacting with the operating system.
* **numpy** (imported as np): NumPy, a library for numerical operations in Python.
* **pandas** (imported as pd): Pandas, a library for data manipulation and analysis.
* **pycocotools**: The Python API for working with the COCO (Common Objects in Context) dataset.

In [None]:
import keras_cv 
import cv2
import matplotlib.pyplot as plt
import tensorflow as tf
import os
import numpy as np
import pandas as pd
import pycocotools.coco  # Import the submodule explicitly so pycocotools.coco.COCO is available

In [None]:
# Constants
IMAGE_SIZE = (640, 640)
AUTOTUNE = tf.data.AUTOTUNE # Used for configuring parallelism in TensorFlow's data loading pipeline
BBOX_FORMAT = 'xywh' # KerasCV box format; matches COCO's native [x_min, y_min, width, height]
BATCH_SIZE = 4

## Prepare Dataset For KerasCV Models
<p style="text-align: justify;">
This section prepares the dataset in the format that KerasCV models expect. We use the COCO 2017 dataset as an example, but the same steps apply to custom datasets with minimal adjustments.
</p>
<p style="text-align: justify;">
For KerasCV to work with your bounding boxes, you need to convert them into a specific dictionary format. The following requirements outline the necessary structure.

```
bounding_boxes = {
    # num_boxes may be a Ragged dimension
    'boxes': Tensor(shape=[batch, num_boxes, 4]),
    'classes': Tensor(shape=[batch, num_boxes])
}
```
</p>

<p style="text-align: justify;">
KEYS_TO_KEEP: the list of COCO category IDs we want to keep (1: person, 2: bicycle, 3: car, 4: motorcycle). Annotations for any other category are filtered out.
</p>

In [None]:
# Load COCO annotations (adjust the paths accordingly)
coco_annotation_file = "/kaggle/input/coco-2017-dataset/coco2017/annotations/instances_train2017.json"
coco_image_dir_train = "/kaggle/input/coco-2017-dataset/coco2017/train2017"
coco_train = pycocotools.coco.COCO(coco_annotation_file)
KEYS_TO_KEEP = [1, 2, 3, 4]

<p style="text-align: justify;">
KerasCV supports several bounding box formats. For more information on the supported formats, visit
<a
href='https://keras.io/api/keras_cv/bounding_box/formats/'
target='_blank'
rel="noreferrer"
>
https://keras.io/api/keras_cv/bounding_box/formats/.
</a>
</p>
<p style="text-align: justify;">
KerasCV can convert boxes between any two supported formats with keras_cv.bounding_box.convert_format, for example:
</p>

```
def normalize_bbox(inputs):
    image = inputs["image"]
    normalized_bbox = keras_cv.bounding_box.convert_format(
        inputs["objects"]["bbox"],
        images=image,
        source="xywh",
        target=BBOX_FORMAT,
    )
    return normalized_bbox
```

<p style="text-align: justify;">
This function converts the bounding box coordinates from the source format (xywh) to BBOX_FORMAT. The image is passed through the images argument because converting to or from relative formats (such as rel_xyxy) requires the image dimensions.
</p>
<p style="text-align: justify;">
Note that COCO annotations are already in xywh format and BBOX_FORMAT is also 'xywh', so the conversion would be a no-op and we don't use this function.
</p>
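
To make the format names concrete, here is a minimal plain-Python sketch of a few conversions for a single box. It is illustrative only and is not how `keras_cv.bounding_box.convert_format` is implemented; the helper names (`xywh_to_xyxy` and so on) are hypothetical, and the format definitions follow the KerasCV formats page linked above.

```python
# Hypothetical helpers illustrating KerasCV box formats for a single box:
#   "xywh":        [x_min, y_min, width, height]  (COCO's native layout)
#   "xyxy":        [x_min, y_min, x_max, y_max]
#   "center_xywh": [x_center, y_center, width, height]
#   "rel_xyxy":    "xyxy" divided by image width/height, so values lie in [0, 1]

def xywh_to_xyxy(box):
    x, y, w, h = box
    return [x, y, x + w, y + h]


def xyxy_to_xywh(box):
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


def xywh_to_center_xywh(box):
    x, y, w, h = box
    return [x + w / 2, y + h / 2, w, h]


def xywh_to_rel_xyxy(box, image_width, image_height):
    # Relative formats need the image size to normalize the coordinates.
    x1, y1, x2, y2 = xywh_to_xyxy(box)
    return [x1 / image_width, y1 / image_height, x2 / image_width, y2 / image_height]


box = [100, 50, 200, 100]
print(xywh_to_xyxy(box))         # [100, 50, 300, 150]
print(xywh_to_center_xywh(box))  # [200.0, 100.0, 200, 100]
print(xywh_to_rel_xyxy(box, 640, 480))
```

Only the relative conversion needs the image size, which is why convert_format accepts an `images` argument.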
<p style="text-align: justify;">
Now, we define various functions that will be used throughout the notebook for preparing the dataset. These include:
</p>

* **get_annotations_for_image**: This function retrieves the annotations for a specific image ID from the COCO dataset. It filters out annotations based on the classes specified in the KEYS_TO_KEEP list.

* **decode_and_resize**: This function reads an image file and decodes it into an RGB tensor. Despite its name, it does not resize the image; resizing happens later, during augmentation and inference preprocessing.

* **load_coco_dataset**: This function loads the COCO dataset, retrieves annotations for each image, and filters out unwanted classes using the get_annotations_for_image function.

* **load_dataset**: This function loads and preprocesses a single dataset sample. It reads an image file, decodes it, and prepares the bounding box and class label information in the required format.

* **make_dataset**: This function creates a TensorFlow dataset by loading and preprocessing a specified range of images from the COCO dataset. It utilizes the load_coco_dataset and load_dataset functions to prepare the dataset for training.
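
The filtering and remapping performed by get_annotations_for_image can be illustrated in isolation. The sketch below uses hypothetical annotation dicts shaped like the output of pycocotools' `loadAnns` (`bbox` in xywh, plus `category_id`); `filter_and_remap` is a made-up helper that mirrors the logic, not a pycocotools function.

```python
# Hypothetical helper mirroring get_annotations_for_image on plain dicts.
KEYS_TO_KEEP = [1, 2, 3, 4]  # COCO IDs: person, bicycle, car, motorcycle


def filter_and_remap(annotations, keys_to_keep):
    """Keep annotations whose category_id is in keys_to_keep and replace
    each category_id with its position in keys_to_keep."""
    boxes, labels = [], []
    for ann in annotations:
        if ann["category_id"] in keys_to_keep:
            boxes.append(ann["bbox"])
            labels.append(keys_to_keep.index(ann["category_id"]))
    return {"boxes": boxes, "labels": labels}


# Hypothetical annotations for one image; category 18 (dog) is dropped.
anns = [
    {"bbox": [10, 20, 30, 40], "category_id": 3},
    {"bbox": [5, 5, 50, 80], "category_id": 18},
    {"bbox": [60, 10, 25, 70], "category_id": 1},
]
print(filter_and_remap(anns, KEYS_TO_KEEP))
# {'boxes': [[10, 20, 30, 40], [60, 10, 25, 70]], 'labels': [2, 0]}
```

If both returned lists are empty, load_coco_dataset skips the image entirely, so the final datasets can contain fewer images than the requested ID range.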

In [None]:
def get_annotations_for_image(image_id, coco):
    # Get annotations for the specified image ID from COCO dataset
    annotation_ids = coco.getAnnIds(imgIds=image_id)
    annotations = coco.loadAnns(annotation_ids)

    # Placeholder for bounding box and label information
    bounding_boxes = []
    labels = []

    for annotation in annotations:
        bbox = annotation['bbox']  # Format: [x, y, width, height]
        label = annotation['category_id'] 
        if label in KEYS_TO_KEEP:
            converted_label = KEYS_TO_KEEP.index(label)
            bounding_boxes.append(bbox)
            labels.append(converted_label)

    return {'boxes': bounding_boxes, 'labels': labels}


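def decode_and_resize(img_path):
    # Read and decode a JPEG into a uint8 tensor of shape (height, width, 3).
    # Despite the name, no resizing happens here: images keep their original
    # size and are resized later by the augmentation / Resizing layers.
    img = tf.io.read_file(img_path)
    img = tf.image.decode_jpeg(img, channels=3)
    return img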


def load_coco_dataset(image_ids, coco, coco_image_dir):
    images = []
    boxes_list = []
    labels_list = []

    for imgid in image_ids:
        img = coco.loadImgs(imgid)[0]
        img_name = img['file_name'].strip()
        img_path = os.path.join(coco_image_dir, img_name)

        if os.path.isfile(img_path) and os.path.getsize(img_path):
            annotations = get_annotations_for_image(imgid, coco)
            if len(annotations['boxes']) > 0 and len(annotations['labels']) > 0:
                images.append(img_path)
                boxes_list.append(annotations['boxes'])
                labels_list.append(annotations['labels'])
    
    return images, boxes_list, labels_list

def load_dataset(image_path, classes, bbox):
    # Read Image
    image = decode_and_resize(image_path)
    bounding_boxes = {
        "classes": tf.cast(classes, dtype=tf.float32),
        "boxes": bbox,
    }
    return {"images": tf.cast(image, tf.float32), "bounding_boxes": bounding_boxes}

def make_dataset(m, n, coco, coco_image_dir):
    image_ids = coco.getImgIds()[m:n]
    image_paths, bbox, classes = load_coco_dataset(image_ids, coco, coco_image_dir)
    # Each image has a different number of boxes, so boxes and classes are ragged
    bbox = tf.ragged.constant(bbox)
    classes = tf.ragged.constant(classes)
    image_paths = tf.constant(image_paths)

    dataset = tf.data.Dataset.from_tensor_slices((image_paths, classes, bbox))
    # Shuffle individual samples (cheap: only paths and annotations) before batching
    dataset = dataset.shuffle(8 * BATCH_SIZE)
    dataset = dataset.map(load_dataset, num_parallel_calls=AUTOTUNE)
    # Skip individual samples whose image fails to load or decode
    dataset = dataset.apply(tf.data.experimental.ignore_errors())
    # Images have different sizes, so batch them as ragged tensors
    dataset = dataset.ragged_batch(BATCH_SIZE, drop_remainder=True)
    dataset = dataset.prefetch(AUTOTUNE)

    return dataset

Now that we have defined our constants and functions, let's use them to create the training and validation datasets.

* **train_dataset**: This dataset is created with the make_dataset function from the first 500 image IDs of COCO train2017 (indices 0 to 499). It will be used for training the object detection model.

* **val_dataset**: Similarly, the validation dataset is created from the image IDs at indices 500 to 699. These images are not used for training, so this dataset measures the model's performance on unseen data.

Images that contain none of the classes in KEYS_TO_KEEP are skipped, so each dataset ends up with fewer images than its index range suggests. Both datasets are shuffled, batched, and prefetched by make_dataset. You can widen these ranges for better results at the cost of longer training.

In [None]:
train_dataset = make_dataset(0,500, coco_train, coco_image_dir_train)
val_dataset = make_dataset(500,700, coco_train, coco_image_dir_train)

Finally, we refine the class mapping to suit our specific needs for the object detection model.

* **get_class_mapping**: The function retrieves the original class mapping from the COCO dataset. This mapping includes category IDs and their corresponding names.
* **original_dict**: The original class mapping obtained from the COCO dataset.
* **key_mapping**: A mapping is created to convert original keys (category IDs) to a desired range [0, len(KEYS_TO_KEEP)). This is important for aligning the class indices with our reduced set of classes.
* **class_mapping**: The final class mapping is generated by applying the key_mapping to the original dictionary. Only the classes present in KEYS_TO_KEEP are retained, and their IDs are adjusted to start from 0.

In [None]:
def get_class_mapping():
    categories = coco_train.loadCats(coco_train.getCatIds())
    class_mapping = {category['id']: category['name'] for category in categories}
    return class_mapping

original_dict = get_class_mapping()

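# Map original COCO IDs to the contiguous range [0, len(KEYS_TO_KEEP)).
# Follow the order of KEYS_TO_KEEP (not sorted order) so the indices match
# KEYS_TO_KEEP.index(...) used in get_annotations_for_image.
key_mapping = {key: i for i, key in enumerate(KEYS_TO_KEEP)}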

class_mapping = {key_mapping[key]: value for key, value in original_dict.items() if key in KEYS_TO_KEEP}
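
A small plain-Python check of the mapping logic, using a hypothetical subset of COCO categories in place of `coco_train.loadCats(...)`. It shows why the class index should follow the order of KEYS_TO_KEEP: get_annotations_for_image assigns labels with `KEYS_TO_KEEP.index(label)`, so the two mappings only agree when they enumerate the list in the same order. With the sorted list [1, 2, 3, 4] used here, sorted and unsorted orders coincide; for an unsorted list such as [3, 1] they would not. `build_class_mapping` is a hypothetical helper.

```python
# Hypothetical category records standing in for coco_train.loadCats(...).
categories = [
    {"id": 1, "name": "person"},
    {"id": 2, "name": "bicycle"},
    {"id": 3, "name": "car"},
    {"id": 4, "name": "motorcycle"},
    {"id": 18, "name": "dog"},
]


def build_class_mapping(categories, keys_to_keep):
    original = {c["id"]: c["name"] for c in categories}
    # Index by position in keys_to_keep, the same rule as keys_to_keep.index(...)
    key_mapping = {key: i for i, key in enumerate(keys_to_keep)}
    return {key_mapping[k]: name for k, name in original.items() if k in key_mapping}


print(build_class_mapping(categories, [1, 2, 3, 4]))
# {0: 'person', 1: 'bicycle', 2: 'car', 3: 'motorcycle'}
print(build_class_mapping(categories, [3, 1]))
# {1: 'person', 0: 'car'}  -> "car" is class 0, matching [3, 1].index(3)
```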

## Visualization and Data Augmentation

<p style="text-align: justify;">
Here, we define a function visualize_dataset to display a gallery of images with their corresponding bounding boxes. This can be useful for visually inspecting the dataset and verifying the correctness of the preprocessing steps.
</p>
Inputs:

* **inputs**: Dataset input containing images and bounding boxes.
* **rows**: Number of rows in the visualization grid.
* **cols**: Number of columns in the visualization grid.
    
<p style="text-align: justify;">
The function extracts a batch from the dataset (next(iter(inputs.take(1)))). It then retrieves images and bounding boxes from the batch. Finally, the plot_bounding_box_gallery function from keras_cv.visualization is used to visualize the images with bounding boxes, applying the specified parameters.
</p>
<p style="text-align: justify;">
This visualization aids in understanding how the data is processed and prepared for training.
</p>

In [None]:
def visualize_dataset(inputs, rows, cols):
    inputs = next(iter(inputs.take(1)))
    images, bounding_boxes = inputs["images"], inputs["bounding_boxes"]
    keras_cv.visualization.plot_bounding_box_gallery(
        images,
        value_range=(0,255),
        rows=rows,
        cols=cols,
        y_true=bounding_boxes,
        scale=5,
        font_scale=0.7,
        bounding_box_format=BBOX_FORMAT,
        class_mapping=class_mapping,
    )

<p style="text-align: justify;">
This code snippet visualizes a sample from the training dataset. It extracts a batch from the dataset, retrieves images and their corresponding bounding boxes, and then plots them using the plot_bounding_box_gallery function from keras_cv.visualization. Adjust the rows and cols parameters to change the layout of the gallery.
</p>

In [None]:
visualize_dataset(train_dataset, rows=2, cols=2)

<p style="text-align: justify;">
This code snippet demonstrates the application of data augmentation to the training dataset using a list of augmenters. The create_augmenter_fn function is defined to create an augmenter function that applies each augmenter in the list sequentially. The resulting augmenter function is then applied to the training dataset using the map function. Adjust the augmenters and their parameters based on your specific requirements.
</p>

In [None]:
augmenters = [
    keras_cv.layers.RandomFlip(mode="horizontal", bounding_box_format=BBOX_FORMAT),
    keras_cv.layers.JitteredResize(
        target_size=IMAGE_SIZE, scale_factor=(0.75, 1.3), bounding_box_format=BBOX_FORMAT
    ),
]


def create_augmenter_fn(augmenters):
    def augmenter_fn(inputs):
        for augmenter in augmenters:
            inputs = augmenter(inputs)
        return inputs

    return augmenter_fn


augmenter_fn = create_augmenter_fn(augmenters)

train_dataset = train_dataset.map(augmenter_fn, num_parallel_calls=AUTOTUNE)
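
Conceptually, each augmenter transforms the image and its boxes together, and create_augmenter_fn simply chains them. The following plain-Python sketch shows both ideas on a single xywh box: a horizontal flip (mirroring the box across the image's vertical center line) and a scale step standing in for a resize. The helpers are hypothetical illustrations, not KerasCV internals.

```python
# Hypothetical box-only transforms, chained like create_augmenter_fn does.
def hflip_xywh(box, image_width):
    x, y, w, h = box
    # After mirroring, the old right edge (x + w) becomes the new left edge.
    return [image_width - (x + w), y, w, h]


def scale_xywh(box, factor):
    # Resizing an image by `factor` scales every coordinate by the same factor.
    return [v * factor for v in box]


def compose(*fns):
    # Apply each function in order, feeding the output of one into the next.
    def composed(value):
        for fn in fns:
            value = fn(value)
        return value

    return composed


pipeline = compose(lambda b: hflip_xywh(b, 640), lambda b: scale_xywh(b, 0.5))
box = [100, 50, 200, 100]
print(hflip_xywh(box, 640))  # [340, 50, 200, 100]
print(pipeline(box))         # [170.0, 25.0, 100.0, 50.0]
```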

In [None]:
visualize_dataset(train_dataset, rows=2, cols=2)

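For validation, we replace the random JitteredResize with the deterministic keras_cv.layers.Resizing() layer. With pad_to_aspect_ratio=True, it resizes each image to IMAGE_SIZE while preserving its aspect ratio, pads the remainder, and adjusts the bounding boxes accordingly.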

In [None]:
inference_resizing = keras_cv.layers.Resizing(
    IMAGE_SIZE[0], IMAGE_SIZE[1], bounding_box_format=BBOX_FORMAT, pad_to_aspect_ratio=True
)
val_dataset = val_dataset.map(inference_resizing, num_parallel_calls=AUTOTUNE)
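
The arithmetic behind an aspect-ratio-preserving resize can be sketched as follows. This assumes the padding is placed at the bottom and right of the image, so boxes are only scaled and not shifted; check the Resizing layer's documentation if your boxes need shifting as well. `letterbox_params` and `scale_box_xywh` are hypothetical helper names.

```python
# Sketch of letterbox resizing, assuming padding goes on the bottom/right.
def letterbox_params(image_height, image_width, target_height, target_width):
    # Use the largest scale at which the whole image still fits the target.
    scale = min(target_height / image_height, target_width / image_width)
    new_h = round(image_height * scale)
    new_w = round(image_width * scale)
    pad_h = target_height - new_h
    pad_w = target_width - new_w
    return scale, new_h, new_w, pad_h, pad_w


def scale_box_xywh(box, scale):
    # With bottom/right padding, box coordinates only need scaling.
    return [v * scale for v in box]


scale, new_h, new_w, pad_h, pad_w = letterbox_params(480, 1280, 640, 640)
print(scale, new_h, new_w, pad_h, pad_w)           # 0.5 240 640 400 0
print(scale_box_xywh([100, 50, 200, 100], scale))  # [50.0, 25.0, 100.0, 50.0]
```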


In [None]:
visualize_dataset(val_dataset, rows=2, cols=2)

## Creating the Model
<p style="text-align: justify;">
This cell creates a YOLOV8Detector model from the yolo_v8_m_backbone preset, using the bounding box format defined earlier and one output class per entry in the class_mapping dictionary. Adjust the backbone and other parameters based on your model requirements.
</p>

In [None]:
model = keras_cv.models.YOLOV8Detector.from_preset(
    "yolo_v8_m_backbone",
    bounding_box_format=BBOX_FORMAT,
    num_classes=len(class_mapping),
)

<p style="text-align: justify;">
In this cell, the base learning rate is set to 0.005, and an SGD optimizer is created with a momentum of 0.9 and a global clipnorm of 10.0. Clipping the global gradient norm helps keep training stable when the detection losses produce large gradient spikes. You can adjust the learning rate and other parameters based on your specific requirements.
</p>

In [None]:
base_lr = 0.005
optimizer = tf.keras.optimizers.SGD(
    learning_rate=base_lr, momentum=0.9, global_clipnorm=10.0
)
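
Global norm clipping rescales all gradients together whenever their combined L2 norm exceeds the threshold, which preserves their relative directions. The plain-Python sketch below implements the formula documented for `tf.clip_by_global_norm`, `g * clip_norm / max(global_norm, clip_norm)`; representing gradients as nested lists is a simplification for illustration.

```python
import math


def clip_by_global_norm(grads, clip_norm):
    """grads: list of gradient vectors (lists of floats), one per variable."""
    # Global norm: L2 norm over every element of every gradient.
    global_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    # Factor is 1.0 when the norm is already within the limit.
    factor = clip_norm / max(global_norm, clip_norm)
    clipped = [[g * factor for g in grad] for grad in grads]
    return clipped, global_norm


grads = [[3.0, 4.0], [12.0]]
print(clip_by_global_norm(grads, 6.5))   # ([[1.5, 2.0], [6.0]], 13.0)
print(clip_by_global_norm(grads, 20.0))  # ([[3.0, 4.0], [12.0]], 13.0)
```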

<p style="text-align: justify;">
Here, the YOLOv8 model is compiled using binary crossentropy for classification loss and the Complete IoU (CIoU) loss for bounding box regression. The previously defined SGD optimizer is used. You can customize the choice of losses and optimizer based on your specific use case.
</p>

In [None]:
model.compile(
    classification_loss="binary_crossentropy",
    box_loss="ciou",
    optimizer=optimizer,
)
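
For reference, CIoU extends IoU with a penalty on the distance between box centers and a term for aspect-ratio mismatch: `CIoU = IoU - rho^2 / c^2 - alpha * v`, where rho is the distance between the box centers, c is the diagonal of the smallest box enclosing both, and v measures the aspect-ratio difference. The loss is `1 - CIoU`. Below is a plain-Python sketch of that standard formula for two xywh boxes. KerasCV's "ciou" loss works on batched tensors and may differ in numerical details such as the epsilon; the function name `ciou_loss` is hypothetical.

```python
import math


def ciou_loss(box_a, box_b, eps=1e-9):
    """1 - CIoU for two boxes in "xywh" format: [x_min, y_min, width, height]."""
    ax1, ay1, aw, ah = box_a
    bx1, by1, bw, bh = box_b
    ax2, ay2 = ax1 + aw, ay1 + ah
    bx2, by2 = bx1 + bw, by1 + bh

    # Intersection over union
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    iou = inter / (union + eps)

    # Squared distance between box centers
    rho2 = ((ax1 + ax2) / 2 - (bx1 + bx2) / 2) ** 2 + ((ay1 + ay2) / 2 - (by1 + by2) / 2) ** 2
    # Squared diagonal of the smallest enclosing box
    cw = max(ax2, bx2) - min(ax1, bx1)
    ch = max(ay2, by2) - min(ay1, by1)
    c2 = cw ** 2 + ch ** 2 + eps

    # Aspect-ratio consistency term and its trade-off weight
    v = (4 / math.pi ** 2) * (math.atan(bw / (bh + eps)) - math.atan(aw / (ah + eps))) ** 2
    alpha = v / ((1 - iou) + v + eps)

    return 1 - (iou - rho2 / c2 - alpha * v)


print(round(ciou_loss([0, 0, 1, 1], [0, 0, 1, 1]), 6))  # 0.0
print(round(ciou_loss([0, 0, 1, 1], [2, 0, 1, 1]), 6))  # 1.4
```

Unlike a plain IoU loss, which is stuck at 1.0 for every pair of non-overlapping boxes, the center-distance term still provides a signal that pulls the predicted box toward the target.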

## Training and Saving the Model
<p style="text-align: justify;">
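Keras's fit method expects (inputs, targets) tuples, so we unpack the preprocessing dictionary into images and bounding boxes. keras_cv.bounding_box.to_dense converts the ragged boxes into padded dense tensors holding at most 32 boxes per image.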
</p>

In [None]:
def dict_to_tuple(inputs):
    return inputs["images"], keras_cv.bounding_box.to_dense(
        inputs["bounding_boxes"], max_boxes=32
    )


train_dataset = train_dataset.map(dict_to_tuple, num_parallel_calls=AUTOTUNE)
val_dataset = val_dataset.map(dict_to_tuple, num_parallel_calls=AUTOTUNE)
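
To see what the padding does, here is a plain-Python sketch for a single image. It assumes -1 as the padding value, which is how we understand KerasCV marks padded boxes and classes; `to_dense_boxes` is a hypothetical helper, not the KerasCV function. Note the truncation: an image with more than `max_boxes` objects loses the extra annotations, and crowded COCO scenes can exceed 32.

```python
# Hypothetical single-image version of padding ragged boxes to a fixed size.
def to_dense_boxes(boxes, classes, max_boxes, pad_value=-1):
    """Pad (or truncate) one image's boxes and classes to exactly max_boxes."""
    boxes = [list(b) for b in boxes[:max_boxes]]
    classes = list(classes[:max_boxes])
    num_pad = max_boxes - len(classes)
    boxes += [[pad_value] * 4 for _ in range(num_pad)]
    classes += [pad_value] * num_pad
    return {"boxes": boxes, "classes": classes}


print(to_dense_boxes([[10, 20, 30, 40]], [2], max_boxes=3))
# {'boxes': [[10, 20, 30, 40], [-1, -1, -1, -1], [-1, -1, -1, -1]], 'classes': [2, -1, -1]}
```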

<p style="text-align: justify;">
This callback computes COCO-style detection metrics (mean average precision and recall, via keras_cv.metrics.BoxCOCOMetrics) on the validation dataset at the end of every epoch. Set the bounding_box_format parameter to the format used in your dataset. The callback also saves the model whenever the validation mAP improves.
</p>

In [None]:
class EvaluateCOCOMetricsCallback(tf.keras.callbacks.Callback):
    def __init__(self, data, save_path):
        super().__init__()
        self.data = data
        self.metrics = keras_cv.metrics.BoxCOCOMetrics(
            bounding_box_format=BBOX_FORMAT,
            # Very large value: metrics are only computed when result(force=True) is called
            evaluate_freq=1e9,
        )

        self.save_path = save_path
        self.best_map = -1.0

    def on_epoch_end(self, epoch, logs):
        self.metrics.reset_state()
        for batch in self.data:
            images, y_true = batch[0], batch[1]
            y_pred = self.model.predict(images, verbose=0)
            self.metrics.update_state(y_true, y_pred)

        metrics = self.metrics.result(force=True)
        logs.update(metrics)

        current_map = metrics["MaP"]
        if current_map > self.best_map:
            self.best_map = current_map
            self.model.save(self.save_path)  # Save the model when mAP improves

        return logs
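
The mAP reported by the callback follows the COCO protocol, which averages precision over IoU thresholds from 0.5 to 0.95 and over classes, using 101-point interpolation. The core idea can be sketched for one class, one image, and one IoU threshold in plain Python: sort predictions by score, greedily match each one to an unmatched ground-truth box, and integrate the precision/recall curve. This simplified, all-point-interpolated version is for intuition only and will not reproduce BoxCOCOMetrics numbers; all function names are hypothetical.

```python
def iou_xywh(a, b):
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def average_precision(preds, gts, iou_threshold=0.5):
    """preds: list of (score, box_xywh); gts: list of box_xywh (one class)."""
    if not gts:
        return 0.0
    preds = sorted(preds, key=lambda p: p[0], reverse=True)
    matched = [False] * len(gts)
    tp_flags = []
    for _, box in preds:
        # Greedily match to the best still-unmatched ground-truth box.
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts):
            if not matched[j]:
                iou = iou_xywh(box, gt)
                if iou > best_iou:
                    best_iou, best_j = iou, j
        if best_j >= 0 and best_iou >= iou_threshold:
            matched[best_j] = True
            tp_flags.append(1)
        else:
            tp_flags.append(0)

    # Precision and recall after each prediction, in score order.
    precisions, recalls, tp = [], [], 0
    for i, flag in enumerate(tp_flags, start=1):
        tp += flag
        precisions.append(tp / i)
        recalls.append(tp / len(gts))
    # All-point interpolation: make precision non-increasing from the right.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Area under the interpolated precision/recall curve.
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


gt = [[0, 0, 10, 10]]
print(average_precision([(0.9, [0, 0, 10, 10])], gt))                            # 1.0
print(average_precision([(0.9, [50, 50, 10, 10]), (0.8, [0, 0, 10, 10])], gt))  # 0.5
```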

<p style="text-align: justify;">
This cell trains the model with the fit method. The training dataset (train_dataset) is passed to fit together with val_dataset, which is used to report the validation loss after every epoch. Training for roughly 10 to 35 epochs should give good scores; here we use 30. The EvaluateCOCOMetricsCallback is added to the list of callbacks so that COCO metrics are computed on the validation dataset at the end of each epoch and the best model is saved to model.keras.
</p>

In [None]:
history = model.fit(
    train_dataset,
    validation_data=val_dataset,
    epochs=30,
    callbacks=[EvaluateCOCOMetricsCallback(val_dataset, "model.keras")]
)

In [None]:
def visualize_train_history(history):
    """
    Plot the training and validation loss per epoch.

    Parameters:
        history (tf.keras.callbacks.History): Object returned by model.fit().
    """
    plt.figure(figsize=(6, 4))
    # Plot training & validation loss values
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('Model loss')
    plt.xlabel('Epoch')
    plt.ylabel('Loss')
    plt.legend(['Train', 'Validation'], loc='upper left')

    plt.tight_layout()
    plt.show()

visualize_train_history(history)

## Evaluating the Model

In [None]:
# Load COCO annotations (adjust the paths accordingly)
coco_annotation_file_test = "/kaggle/input/coco-2017-dataset/coco2017/annotations/instances_val2017.json"
coco_image_dir_test = "/kaggle/input/coco-2017-dataset/coco2017/val2017"
coco_test = pycocotools.coco.COCO(coco_annotation_file_test)

test_dataset = make_dataset(0,400, coco_test, coco_image_dir_test)
test_dataset = test_dataset.map(inference_resizing, num_parallel_calls=AUTOTUNE)
test_dataset = test_dataset.map(dict_to_tuple, num_parallel_calls=AUTOTUNE)

model.evaluate(test_dataset)

## Inference
<p style="text-align: justify;">
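The model's raw predictions are decoded by a non-max suppression (NMS) layer. You may need to tune MultiClassNonMaxSuppression to get visually clean results: iou_threshold controls how much two predicted boxes may overlap before the lower-scoring one is suppressed, and confidence_threshold discards boxes whose score falls below it.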
</p>

In [None]:
model.prediction_decoder = keras_cv.layers.MultiClassNonMaxSuppression(
    bounding_box_format=BBOX_FORMAT,
    from_logits=True,
    iou_threshold=0.2,
    confidence_threshold=0.6,
)
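
In a typical greedy NMS, candidates below the confidence threshold are discarded first; the remaining boxes are then visited in descending score order, and any box whose IoU with an already-kept box exceeds the IoU threshold is suppressed. A plain-Python sketch is shown below. With from_logits=True, the layer applies a sigmoid to the raw class scores first, as we understand it; the sketch assumes scores are already probabilities. Whether overlap is checked only among boxes of the same class depends on the implementation, so the sketch exposes a hypothetical `per_class` flag. `greedy_nms` and `iou_xywh` are hypothetical helpers.

```python
def iou_xywh(a, b):
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(detections, iou_threshold, confidence_threshold, per_class=True):
    """detections: list of (box_xywh, class_id, score). Returns kept detections."""
    # Drop low-confidence boxes, then visit the rest from highest score down.
    candidates = sorted(
        (d for d in detections if d[2] >= confidence_threshold),
        key=lambda d: d[2],
        reverse=True,
    )
    kept = []
    for det in candidates:
        box, cls, _ = det
        suppressed = any(
            (not per_class or k[1] == cls) and iou_xywh(box, k[0]) > iou_threshold
            for k in kept
        )
        if not suppressed:
            kept.append(det)
    return kept


dets = [
    ([0, 0, 10, 10], 0, 0.9),
    ([1, 1, 10, 10], 0, 0.8),    # overlaps the first box (IoU ~0.68)
    ([1, 1, 10, 10], 1, 0.7),    # same location, different class
    ([50, 50, 10, 10], 0, 0.5),  # below the confidence threshold
]
print(greedy_nms(dets, iou_threshold=0.2, confidence_threshold=0.6))
# [([0, 0, 10, 10], 0, 0.9), ([1, 1, 10, 10], 1, 0.7)]
```

Lowering iou_threshold suppresses more overlapping boxes, and raising confidence_threshold removes more low-scoring ones, which is the trade-off to tune for cleaner visualizations.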

In [None]:
# Load and preprocess image
# image = cv2.imread("/kaggle/input/coco-2017-dataset/coco2017/test2017/000000000001.jpg")
filepath = tf.keras.utils.get_file(origin="https://stackabuse.s3.amazonaws.com/media/object-detection-with-imageai-python-1.jpg")
image = tf.keras.utils.load_img(filepath)
image = np.array(image)
image_batch = inference_resizing([image])
# Run prediction
y_pred = model.predict(image_batch)
print(y_pred)
# y_pred is a dictionary of bounding box tensors:
# {"classes": ..., "boxes": ...}
keras_cv.visualization.plot_bounding_box_gallery(
    image_batch,
    value_range=(0, 255),
    rows=1,
    cols=1,
    y_pred=y_pred,
    scale=5,
    font_scale=0.7,
    bounding_box_format=BBOX_FORMAT,
    class_mapping=class_mapping
)
