## Object detection using PyTorch

This example performs object detection: it classifies and locates the objects within an image. It uses RetinaNet as the default model and runs on a PyTorch backend. The command-line arguments section can be uncommented to allow for more customization.

Available models w/ PyTorch:
* Faster R-CNN w/ ResNet50 backbone (accurate, slower)
* Faster R-CNN w/ MobileNet v3 backbone (faster, less accurate)
* RetinaNet w/ ResNet50 backbone (balances accuracy and speed)

Objects in the input image are labeled with the top predictions among the 90 classes recognized by the COCO dataset. To try it out, change the defaultImage variable to point to your own image.

### Environment configuration

In [1]:
pip install numpy==1.24.1

Note: you may need to restart the kernel to use updated packages.

In [2]:
pip install torch torchvision


In [3]:
pip install opencv-contrib-python


In [4]:
pip install imutils


### Importing packages

In [5]:
# import the necessary packages
from torchvision.models import detection
import numpy as np
import imutils
import argparse
import pickle
import torch
import cv2

The most important import is detection from torchvision.models. The detection module contains PyTorch’s pre-trained object detectors.

### Model preparation - command line arguments

In [6]:
""" # construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()

ap.add_argument("-i", "--image", type=str, default="images/baseball.jpg",
	help="path to the input image")
ap.add_argument("-m", "--model", type=str, default="frcnn-mobilenet",
	choices=["frcnn-resnet", "frcnn-mobilenet", "retinanet"],
	help="name of the object detection model")
ap.add_argument("-l", "--labels", type=str, default="coco_classes.pickle",
	help="path to file containing list of categories in COCO dataset")
ap.add_argument("-c", "--confidence", type=float, default=0.5,
	help="minimum probability to filter weak detections")
args = vars(ap.parse_args()) """

defaultImage = "images/baseball.jpg"
defaultModel = "retinanet"
defaultLabels = "resources/coco_classes.pickle"
defaultConfidence = 0.5


**Note: edit the defaults above for easier configuration without command-line arguments (in a Jupyter Notebook, for example)**

We have a number of command line arguments here, including:

- image: The path to the input image we want to apply object detection to
- model: The type of PyTorch object detector we’ll be using (Faster R-CNN + ResNet, Faster R-CNN + MobileNet, or RetinaNet + ResNet)
- labels: The path to the COCO labels file, containing human readable class labels
- confidence: Minimum predicted probability to filter out weak detections
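In a notebook, `parse_args()` reads the kernel's own command line and typically fails on flags it does not recognize, which is one reason the parser is commented out above. A minimal sketch of a workaround is to pass the argument list explicitly. The wrapper name `parse_detection_args` is hypothetical, and the defaults below are assumed to mirror the notebook defaults rather than the commented-out parser.

```python
import argparse


def parse_detection_args(argv=None):
    """Build the parser described above; argv=None falls back to sys.argv[1:]."""
    ap = argparse.ArgumentParser()
    ap.add_argument("-i", "--image", type=str, default="images/baseball.jpg",
                    help="path to the input image")
    ap.add_argument("-m", "--model", type=str, default="retinanet",
                    choices=["frcnn-resnet", "frcnn-mobilenet", "retinanet"],
                    help="name of the object detection model")
    ap.add_argument("-l", "--labels", type=str,
                    default="resources/coco_classes.pickle",
                    help="path to file containing list of categories in COCO dataset")
    ap.add_argument("-c", "--confidence", type=float, default=0.5,
                    help="minimum probability to filter weak detections")
    return vars(ap.parse_args(argv))


# In a notebook, pass the arguments explicitly instead of reading sys.argv
args = parse_detection_args(["--model", "frcnn-mobilenet", "--confidence", "0.7"])
print(args["model"], args["confidence"], args["image"])
```

Calling `parse_detection_args([])` returns the defaults only, which behaves much like the `default*` variables defined in the cell above.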

### Initialization

In [7]:
# set the device we will be using to run the model
DEVICE = torch.device("cuda" if torch.cuda.is_available() else "cpu")
# load the list of categories in the COCO dataset and then generate a
# set of bounding box colors for each class

# uncomment below for cl arguments
# CLASSES = pickle.loads(open(args["labels"], "rb").read())
with open(defaultLabels, "rb") as f:
	CLASSES = pickle.load(f)
COLORS = np.random.uniform(0, 255, size=(len(CLASSES), 3))

- **DEVICE**: sets the device we’ll be using for inference (GPU if CUDA is available, otherwise CPU).
- **CLASSES**: loads our class labels from disk.
- **COLORS**: initializes a random color for each unique label. We’ll use these colors when drawing predicted bounding boxes and labels on our output image.
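To make the labels file less of a black box, here is a small sketch of how such a pickle could be written and read back, and how one color per class could be generated. The shortened class list is a hypothetical placeholder, not the contents of the real `coco_classes.pickle`, and the standard `random` module stands in for NumPy to keep the example dependency-free.

```python
import pickle
import random
import tempfile
from pathlib import Path

# Hypothetical, shortened label list; the real file holds the full COCO list
sample_classes = ["__background__", "person", "bicycle", "car"]

with tempfile.TemporaryDirectory() as tmp:
    labels_path = Path(tmp) / "coco_classes.pickle"
    # write the list once ...
    with open(labels_path, "wb") as f:
        pickle.dump(sample_classes, f)
    # ... and read it back from disk
    with open(labels_path, "rb") as f:
        classes = pickle.load(f)

# one random color (three values in [0, 255]) per class, mirroring COLORS
random.seed(0)
colors = [tuple(random.uniform(0, 255) for _ in range(3)) for _ in classes]
print(len(classes), len(colors))
```

Because the class index returned by the detector is used to look up both the label and the color, the two lists need the same length and order. Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code.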

### Models dictionary

In [8]:
# initialize a dictionary containing model name and its corresponding 
# torchvision function call
MODELS = {
	"frcnn-resnet": detection.fasterrcnn_resnet50_fpn,
	"frcnn-mobilenet": detection.fasterrcnn_mobilenet_v3_large_320_fpn,
	"retinanet": detection.retinanet_resnet50_fpn
}
""" 
# uncomment below for cl arguments
# load the model and set it to evaluation mode
model = MODELS[args["model"]](pretrained=True, progress=True,
	num_classes=len(CLASSES), pretrained_backbone=True).to(DEVICE)"""

model = MODELS[defaultModel](pretrained=True, progress=True,
	num_classes=len(CLASSES), pretrained_backbone=True).to(DEVICE)
model.eval()

Downloading: "https://download.pytorch.org/models/retinanet_resnet50_fpn_coco-eeacb38b.pth" to C:\Users\far/.cache\torch\hub\checkpoints\retinanet_resnet50_fpn_coco-eeacb38b.pth
100.0%


RetinaNet(
  (backbone): BackboneWithFPN(
    (body): IntermediateLayerGetter(
      (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
      (bn1): FrozenBatchNorm2d(64, eps=0.0)
      (relu): ReLU(inplace=True)
      (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
      (layer1): Sequential(
        (0): Bottleneck(
          (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn1): FrozenBatchNorm2d(64, eps=0.0)
          (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
          (bn2): FrozenBatchNorm2d(64, eps=0.0)
          (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
          (bn3): FrozenBatchNorm2d(256, eps=0.0)
          (relu): ReLU(inplace=True)
          (downsample): Sequential(
            (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
            (1): FrozenBatchNorm2d(256, eps=0.0)


We define a **MODELS** dictionary to map the name of a given object detector to its corresponding torchvision function.

We load the model (downloading the pre-trained weights on first use) and send it to the appropriate DEVICE. We pass in a number of key parameters, including:

- pretrained: Tells PyTorch to load the model architecture with weights pre-trained on the COCO dataset
- progress: When True, displays a download progress bar if the model has not already been downloaded and cached
- num_classes: Total number of unique classes (here, the length of CLASSES)
- pretrained_backbone: Also loads pre-trained weights for the backbone network of the object detector

**Important:** We place the model in evaluation mode with model.eval(), so that it behaves as it should at inference time and returns detections rather than training losses.
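The effect of evaluation mode can be illustrated on a tiny stand-in network. This is a sketch under stated assumptions: the two-layer `net` below is hypothetical and is not one of the detectors, and wrapping inference in `torch.no_grad()` is an optional addition that the notebook itself does not use.

```python
import torch
from torch import nn

# Tiny stand-in network (hypothetical, not a detector) with a dropout layer
net = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5))
x = torch.ones(1, 4)

net.eval()                        # dropout becomes a no-op in evaluation mode
with torch.no_grad():             # skip gradient tracking during inference
    out_a = net(x)
    out_b = net(x)

print(net.training)               # False
print(torch.equal(out_a, out_b))  # True: eval-mode outputs are repeatable
print(out_a.requires_grad)        # False
```

Running inference under `torch.no_grad()` avoids building the autograd graph, so the outputs do not need to be detached before converting them to NumPy arrays.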

### Preparing input image for object detection

In [9]:
# load the image from disk
# uncomment below for cl arguments
# image = cv2.imread(args["image"])

image = cv2.imread(defaultImage)
# resized = imutils.resize(image, width = 450)
# orig = resized.copy()
orig = image.copy()
# convert the image from BGR to RGB channel ordering and change the
# image from channels last to channels first ordering
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image = image.transpose((2, 0, 1))
# add the batch dimension, scale the raw pixel intensities to the
# range [0, 1], and convert the image to a floating point tensor
image = np.expand_dims(image, axis=0)
image = image / 255.0
image = torch.FloatTensor(image)
# send the input to the device and pass it through the network to
# get the detections and predictions
image = image.to(DEVICE)
detections = model(image)[0]

We load our input image from disk and clone it so that we can draw the bounding box predictions on it later in this script.

We preprocess our image by:

- Converting color channel ordering from BGR to RGB (since PyTorch models were trained on RGB-ordered images)
- Swapping color channel ordering from “channels last” (OpenCV and Keras/TensorFlow default) to “channels first” (PyTorch default)
- Adding a batch dimension
- Scaling pixel intensities from the range [0, 255] to [0, 1]
- Converting the image from a NumPy array to a tensor with a floating point data type

The image is then moved to the appropriate device. At that point, we pass the image through the model to obtain our bounding box predictions.
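The array-level preprocessing steps can be checked on a synthetic image without OpenCV or a model. The sketch below assumes a 2x3 all-blue image and uses a channel reversal in place of `cv2.cvtColor`, which gives the same result for a 3-channel BGR image. The final conversion to a tensor is left out.

```python
import numpy as np

# Synthetic 2x3 "image" in OpenCV layout: height x width x channels, BGR, uint8
bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[..., 0] = 255                          # fill the blue channel only

rgb = bgr[..., ::-1]                       # BGR -> RGB channel ordering
chw = rgb.transpose((2, 0, 1))             # channels last -> channels first
batch = np.expand_dims(chw, axis=0)        # add the batch dimension
scaled = batch.astype(np.float32) / 255.0  # [0, 255] -> [0, 1]

print(scaled.shape)                        # (1, 3, 2, 3)
print(scaled[0, :, 0, 0])                  # [0. 0. 1.]
```

The last line shows the red, green, and blue values of the top-left pixel after reordering: blue now sits in the last channel, as an RGB-trained model expects.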

### Looping over bounding boxes

In [10]:
# loop over the detections
for i in range(0, len(detections["boxes"])):
	# extract the confidence (i.e., probability) associated with the
	# prediction
	confidence = detections["scores"][i]
	# filter out weak detections by ensuring the confidence is
	# greater than the minimum confidence

	# uncomment below for cl arguments
	# if confidence > args["confidence"]:
	if confidence > defaultConfidence:
		# extract the index of the class label from the detections,
		# then compute the (x, y)-coordinates of the bounding box
		# for the object
		idx = int(detections["labels"][i])
		box = detections["boxes"][i].detach().cpu().numpy()
		(startX, startY, endX, endY) = box.astype("int")
		# display the prediction to our terminal
		label = "{}: {:.2f}%".format(CLASSES[idx], confidence * 100)
		print("[INFO] {}".format(label))
		# draw the bounding box and label on the image
		cv2.rectangle(orig, (startX, startY), (endX, endY),
			COLORS[idx], 2)
		y = startY - 15 if startY - 15 > 15 else startY + 15
		cv2.putText(orig, label, (startX, y),
			cv2.FONT_HERSHEY_SIMPLEX, 0.5, COLORS[idx], 2)
# show the output image
cv2.imshow("Output", orig)
cv2.waitKey(0)

[INFO] person: 96.76%
[INFO] baseball glove: 91.22%
[INFO] car: 70.09%
[INFO] person: 69.06%
[INFO] person: 62.83%
[INFO] sports ball: 62.51%
[INFO] baseball bat: 57.82%


-1

We loop over all detections from the network. We then grab the confidence (i.e., probability) associated with the detection.

We filter out weak detections that do not meet our minimum confidence test, i.e. "if confidence > 0.5" by default. This helps suppress false-positive detections.

From there, we:

- Extract the index (idx) of the predicted class label
- Obtain the bounding box coordinates and convert them to integers
- Display the prediction to our terminal
- Draw the predicted bounding box and class label on our output image

We wrap up the script by displaying our output image with bounding boxes drawn on it.
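The filtering and label-placement logic can be tried in isolation, without a model or OpenCV. In this sketch the `detections` dictionary is made-up data in the same layout as the model output, with plain Python lists standing in for tensors, and the shortened `classes` list is illustrative only.

```python
# Hypothetical detections in the same layout as the model output,
# with plain Python lists standing in for tensors
detections = {
    "boxes": [[10.6, 40.2, 120.9, 200.1], [5.0, 5.0, 50.0, 20.0]],
    "labels": [1, 3],
    "scores": [0.97, 0.31],
}
classes = ["__background__", "person", "bicycle", "car"]  # shortened, illustrative
min_confidence = 0.5

kept = []
for box, idx, score in zip(detections["boxes"], detections["labels"],
                           detections["scores"]):
    if score <= min_confidence:
        continue  # weak detection, skip it
    # truncate the float coordinates to integer pixel positions
    start_x, start_y, end_x, end_y = (int(v) for v in box)
    # place the label above the box unless that would run off the top
    text_y = start_y - 15 if start_y - 15 > 15 else start_y + 15
    kept.append((f"{classes[idx]}: {score * 100:.2f}%",
                 (start_x, start_y, end_x, end_y), text_y))

for label, box, text_y in kept:
    print(label, box, text_y)
```

Only the first detection survives the 0.5 threshold. Raising `min_confidence` trades missed objects for fewer false positives, so it is worth tuning per image set.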