# Introduction

Now let's put into practice some of the concepts learned in PyTorch. In this demo, we will write code that detects objects in images.

### Step 1. Importing Required Libraries
First, make sure you have the required libraries installed. You can install them using pip if you haven't already:

In [None]:
pip install torch torchvision matplotlib pillow

In [2]:
import torch
import torchvision
from torchvision import transforms
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

- **torch**: The main PyTorch library for tensor operations and neural network building.
- **torchvision**: A library for computer vision tasks that provides pre-trained models, datasets, and image transformations.
- **transforms**: A module in torchvision for common image transformations.
- **PIL**: The Python Imaging Library (provided by the Pillow package), used here to open and manipulate images.
- **matplotlib**: A plotting library used for visualizing images and drawing bounding boxes.
- **patches**: A module from matplotlib used to create rectangles (bounding boxes).

### Step 2. Loading Pre-Trained Model

In [None]:
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

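- **torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)**: Loads a Faster R-CNN model with a ResNet-50 backbone and a Feature Pyramid Network (FPN). The _pretrained=True_ argument loads weights that were pre-trained on the COCO dataset. Note that torchvision 0.13 and later deprecate _pretrained_ in favor of a _weights_ argument.
- **model.eval()**: Sets the model to evaluation mode, which is necessary for inference. This changes the behavior of layers such as dropout and batch normalization, which act differently during training, and makes the detection model return predictions instead of training losses.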
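
As a sketch of the newer loading style, assuming torchvision 0.13 or later is installed, the same model can be requested through the `weights` argument. The helper name `load_detector` is hypothetical and used only for illustration, and the first call downloads the weights, so it needs network access.

```python
def load_detector():
    """Load Faster R-CNN with COCO weights via the `weights` API.

    Assumption: torchvision >= 0.13, where `pretrained` is deprecated.
    """
    # Imported inside the helper so that defining it has no side effects.
    from torchvision.models.detection import (
        FasterRCNN_ResNet50_FPN_Weights,
        fasterrcnn_resnet50_fpn,
    )

    weights = FasterRCNN_ResNet50_FPN_Weights.DEFAULT  # COCO-trained weights
    detector = fasterrcnn_resnet50_fpn(weights=weights)
    detector.eval()  # Evaluation mode: the model returns detections, not losses
    categories = weights.meta["categories"]  # Class names bundled with the weights
    return detector, categories
```

Calling `model, names = load_detector()` should give a model equivalent to the one loaded with `pretrained=True`, and `names` could stand in for a hand-written list of class names.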

### Step 3. Defining a Function to Get Predictions

In [11]:
def get_prediction(img_path, threshold):
    img = Image.open(img_path).convert('RGB')
    transform = transforms.Compose([
        transforms.ToTensor()
    ])
    img = transform(img)
    with torch.no_grad():  # Inference only, so no gradients are needed
        pred = model([img])
    pred_class = [COCO_INSTANCE_CATEGORY_NAMES[i] for i in pred[0]['labels'].tolist()]  # Class names
    pred_boxes = [[(b[0], b[1]), (b[2], b[3])] for b in pred[0]['boxes'].tolist()]  # Boxes as [(x1, y1), (x2, y2)]
    pred_score = pred[0]['scores'].tolist()  # Confidence scores
    above = [i for i, score in enumerate(pred_score) if score > threshold]
    if not above:  # No prediction passed the threshold
        return [], []
    pred_t = above[-1]  # Index of the last prediction above the threshold
    pred_boxes = pred_boxes[:pred_t+1]
    pred_class = pred_class[:pred_t+1]
    return pred_boxes, pred_class


- **img_path**: Path to the input image.
- **Image.open(img_path).convert('RGB')**: Opens the image using PIL and converts it to three-channel RGB, which also handles grayscale images and images with an alpha channel.
- **transforms.Compose([transforms.ToTensor()])**: Creates a composition of transformations. Here, it converts the PIL image to a float tensor of shape (channels, height, width) with values scaled to the range [0, 1].
- **img = transform(img)**: Applies the transformation to the image.
- **model([img])**: Passes the image through the model to get predictions. The model expects a list of images, hence the [img], and returns one dictionary per image with the keys _boxes_, _labels_, and _scores_.
- **pred_class**: Extracts the class labels of the detected objects and converts them to readable class names using **COCO_INSTANCE_CATEGORY_NAMES** (defined in Step 5).
- **pred_boxes**: Extracts the bounding boxes for the detected objects as pairs of corner points, [(x1, y1), (x2, y2)].
- **pred_score**: Extracts the confidence scores for the detected objects.
- **pred_t**: The index of the last prediction whose confidence score is higher than the given threshold.
- **pred_boxes** and **pred_class** are sliced up to and including **pred_t**. Because the model returns detections sorted by descending score, this keeps only the predictions above the threshold.
- **return pred_boxes, pred_class**: Returns the filtered bounding boxes and class names.
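
The thresholding step can be sketched on its own, without a model. The example below assumes the scores arrive sorted from highest to lowest; the helper name `filter_by_threshold` and the sample detections are hypothetical and exist only for illustration.

```python
def filter_by_threshold(boxes, classes, scores, threshold):
    """Keep the leading detections whose score exceeds `threshold`.

    Assumes `scores` is sorted in descending order.
    """
    # Count how many leading scores pass the threshold.
    keep = 0
    for score in scores:
        if score <= threshold:
            break
        keep += 1
    return boxes[:keep], classes[:keep]


# Hypothetical detections, for illustration only.
boxes = [[(10, 20), (110, 220)], [(30, 40), (90, 100)], [(5, 5), (50, 50)]]
classes = ['bird', 'bird', 'kite']
scores = [0.98, 0.85, 0.40]

kept_boxes, kept_classes = filter_by_threshold(boxes, classes, scores, 0.8)
print(kept_classes)  # ['bird', 'bird']

# A threshold that nothing passes yields empty lists instead of an error.
print(filter_by_threshold(boxes, classes, scores, 0.99))  # ([], [])
```

Counting the leading scores, rather than looking up each score's position, also behaves predictably when two detections share the same score.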

### Step 4. Defining a Function to Plot the Image with Bounding Boxes

In [5]:
def plot_image_with_boxes(img_path, boxes, pred_cls):
    img = Image.open(img_path)
    fig, ax = plt.subplots(1, figsize=(12,9))
    ax.imshow(img)
    for box, cls in zip(boxes, pred_cls):
        rect = patches.Rectangle(box[0], box[1][0] - box[0][0], box[1][1] - box[0][1], linewidth=2, edgecolor='r', facecolor='none')
        ax.add_patch(rect)
        plt.text(box[0][0], box[0][1], cls, fontsize=12, bbox=dict(facecolor='yellow', alpha=0.5))
    plt.show()

- **img_path**: Path to the input image.
- **boxes**: List of bounding boxes.
- **pred_cls**: List of predicted class names.
- **Image.open(img_path)**: Opens the image using PIL.
- **fig, ax = plt.subplots(1, figsize=(12,9))**: Creates a subplot with a specific size for displaying the image.
- **ax.imshow(img)**: Displays the image in the subplot.
- **for box, cls in zip(boxes, pred_cls)**: Iterates through the bounding boxes and class names.
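- **patches.Rectangle**: Creates a rectangle (bounding box) from an anchor point, a width, and a height. The anchor is the box's first corner (x1, y1), and the width and height are computed as x2 - x1 and y2 - y1. The remaining arguments set the line width, a red edge color, and no fill.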
- **ax.add_patch(rect)**: Adds the rectangle to the subplot.
- **plt.text(box[0][0], box[0][1], cls, fontsize=12, bbox=dict(facecolor='yellow', alpha=0.5))**: Adds a text label (class name) near the bounding box with a yellow background.
- **plt.show()**: Displays the plot.
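
The coordinate arithmetic passed to `patches.Rectangle` can be isolated in a small helper. This is a minimal sketch; the name `corners_to_xywh` and the sample box are hypothetical.

```python
def corners_to_xywh(box):
    """Convert [(x1, y1), (x2, y2)] corners to (x, y, width, height)."""
    (x1, y1), (x2, y2) = box
    return x1, y1, x2 - x1, y2 - y1


# Hypothetical box, for illustration only.
box = [(50.0, 30.0), (200.0, 180.0)]
x, y, width, height = corners_to_xywh(box)
print(x, y, width, height)  # 50.0 30.0 150.0 150.0
```

With such a helper, the rectangle could be built as `patches.Rectangle((x, y), width, height, ...)`, which keeps the plotting line shorter.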

### Step 5. COCO Classes

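This list contains the class names for the COCO dataset, which the Faster R-CNN model was trained on. The indices of this list correspond to the class labels predicted by the model. The 'N/A' entries are placeholders for label ids that have no category, so that every name stays at the position matching its label.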

In [6]:
COCO_INSTANCE_CATEGORY_NAMES = [
    '__background__', 'person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus',
    'train', 'truck', 'boat', 'traffic light', 'fire hydrant', 'N/A', 'stop sign',
    'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow', 'elephant',
    'bear', 'zebra', 'giraffe', 'N/A', 'backpack', 'umbrella', 'N/A', 'N/A', 'handbag',
    'tie', 'suitcase', 'frisbee', 'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat',
    'baseball glove', 'skateboard', 'surfboard', 'tennis racket', 'bottle', 'N/A', 'wine glass',
    'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple', 'sandwich', 'orange', 'broccoli',
    'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch', 'potted plant', 'bed', 'N/A',
    'dining table', 'N/A', 'N/A', 'toilet', 'N/A', 'tv', 'laptop', 'mouse', 'remote', 'keyboard',
    'cell phone', 'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'N/A', 'book', 'clock',
    'vase', 'scissors', 'teddy bear', 'hair drier', 'toothbrush'
]
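
To illustrate how a predicted label maps to a name, the sketch below uses a small excerpt of the list written as a dictionary. `CATEGORY_EXCERPT` and `label_to_name` are hypothetical names introduced only for this example; the label ids match the positions in the full list.

```python
# Excerpt of the label-to-name mapping, keyed by position in the full list.
CATEGORY_EXCERPT = {
    0: '__background__',
    1: 'person',
    12: 'N/A',  # Placeholder: no category uses this label id
    16: 'bird',
    58: 'hot dog',
    61: 'cake',
}


def label_to_name(label, names=CATEGORY_EXCERPT):
    """Return the class name for a label id, or 'unknown' if it is not listed."""
    return names.get(label, 'unknown')


print(label_to_name(16))   # bird
print(label_to_name(58))   # hot dog
print(label_to_name(999))  # unknown
```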

### Step 6. Example Usage

Once we have set up and defined the needed functions, the next step is to test them. For this, we will try 3 different images and see how the code behaves:

1. **bird_test.png:** The code predicts it properly.
2. **hotdog_test.png:** The code predicts it properly.
3. **coco_test.png:** The code does not predict it properly. It detects the object as a cake.

In [None]:
img_path = '../images/bird_test.png'  # Replace with your image path
threshold = 0.8  # Confidence threshold
boxes, pred_cls = get_prediction(img_path, threshold)
plot_image_with_boxes(img_path, boxes, pred_cls)

- **img_path**: Path to the input image. Replace '../images/bird_test.png' with the path to your own image.
- **threshold**: Confidence threshold for filtering predictions. Only predictions with a score above this value will be considered.
- **get_prediction(img_path, threshold)**: Calls the function to get the predictions.
- **plot_image_with_boxes(img_path, boxes, pred_cls)**: Calls the function to plot the image with bounding boxes and class labels.
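
To try several images in one go, the prediction function can be called in a loop. The sketch below is illustrative only: `summarize_detections` is a hypothetical helper, the paths for the second and third images assume they sit next to `bird_test.png`, and `fake_predict` is a stand-in with canned results so the example runs without a model. In the notebook, `get_prediction` would be passed instead.

```python
from collections import Counter


def summarize_detections(image_paths, predict_fn, threshold=0.8):
    """Run `predict_fn` on each image and count the detected class names."""
    summary = {}
    for path in image_paths:
        _boxes, classes = predict_fn(path, threshold)
        summary[path] = Counter(classes)
    return summary


def fake_predict(img_path, threshold):
    """Stand-in for get_prediction that returns canned, hypothetical results."""
    canned = {
        '../images/bird_test.png': ['bird'],
        '../images/hotdog_test.png': ['hot dog'],
        '../images/coco_test.png': ['cake'],
    }
    classes = canned.get(img_path, [])
    boxes = [[(0, 0), (1, 1)] for _ in classes]  # Dummy boxes
    return boxes, classes


# Assumed paths: only the first one appears in the example above.
paths = [
    '../images/bird_test.png',
    '../images/hotdog_test.png',
    '../images/coco_test.png',
]
for path, counts in summarize_detections(paths, fake_predict).items():
    print(path, dict(counts))
```

Passing the prediction function as an argument keeps the loop independent of the model, which makes it easy to check the bookkeeping before running real inference.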

# Conclusion

In this demo, we explored one of the many use cases for PyTorch: detecting objects in images with a pre-trained Faster R-CNN model.