## Classifying images using PyTorch

This example performs image classification: assigning a single label that describes the overall content of an image. It uses ResNet50 as the classification model and runs on a PyTorch backend. The command-line arguments section can be uncommented to allow more customization.

Input images are classified by reporting the top predictions among the 1000 classes of the ImageNet challenge. To try it on your own image, change the *inputImage* variable.

**TO DO: expand config.py so that all configuration lives in a single place**

### Environment configuration

In [1]:
pip install numpy==1.24.1

Note: you may need to restart the kernel to use updated packages.





In [2]:
pip install torch torchvision

Note: you may need to restart the kernel to use updated packages.





In [3]:
pip install opencv-contrib-python

Note: you may need to restart the kernel to use updated packages.





In [4]:
pip install imutils

Note: you may need to restart the kernel to use updated packages.





#### Import the necessary packages

In [5]:
# import config.py script on project structure
from resources import config
# import PyTorch's pre-trained neural networks
from torchvision import models
import imutils
import numpy as np
import argparse
# access PyTorch API
import torch
# import OpenCV bindings
import cv2

#### Define a function to accept an input image and preprocess it

In [6]:
def preprocess_image(image):
	# swap the color channels from BGR to RGB, resize it, and scale
	# the pixel values to [0, 1] range
	image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
	image = cv2.resize(image, (config.IMAGE_SIZE, config.IMAGE_SIZE))
	image = image.astype("float32") / 255.0
	# subtract ImageNet mean, divide by ImageNet standard deviation,
	# set "channels first" ordering, and add a batch dimension
	image -= config.MEAN
	image /= config.STD
	image = np.transpose(image, (2, 0, 1))
	image = np.expand_dims(image, 0)
	# return the preprocessed image
	return image

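The preprocessing function performs the following steps:

- Swapping from BGR to RGB channel ordering (the pre-trained networks we use here were trained with RGB channel ordering, whereas OpenCV uses BGR ordering by default)
- Resizing the image to fixed dimensions (i.e., 224×224), ignoring aspect ratio
- Converting the image to a floating point data type and scaling the pixel intensities to the range [0, 1]
- Subtracting the ImageNet mean and dividing by the ImageNet standard deviation, per channel
- Moving the channel axis to the front ("channels first" ordering) and adding a batch dimension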
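
The normalization and reordering steps can be tried without OpenCV or the project's config.py. The sketch below is a minimal NumPy-only version run on a synthetic image. *IMAGE_SIZE*, *MEAN*, and *STD* are assumptions standing in for the values in *config* (they are the commonly published ImageNet statistics), *preprocess_array* is a hypothetical name, and the resize step is skipped because the synthetic image is already the target size.

```python
import numpy as np

# Assumed stand-ins for config.IMAGE_SIZE, config.MEAN, and config.STD
IMAGE_SIZE = 224
MEAN = np.array([0.485, 0.456, 0.406], dtype="float32")
STD = np.array([0.229, 0.224, 0.225], dtype="float32")

def preprocess_array(bgr_image):
    # reverse the last axis to go from BGR to RGB channel ordering
    rgb = bgr_image[..., ::-1]
    # scale pixel values to the [0, 1] range
    scaled = rgb.astype("float32") / 255.0
    # normalize each channel (MEAN and STD broadcast over height and width)
    normalized = (scaled - MEAN) / STD
    # height-width-channel -> channel-height-width, then add a batch dimension
    chw = np.transpose(normalized, (2, 0, 1))
    return np.expand_dims(chw, 0)

# synthetic 224x224 BGR image filled with random 8-bit values
rng = np.random.default_rng(0)
fake_bgr = rng.integers(0, 256, size=(IMAGE_SIZE, IMAGE_SIZE, 3), dtype=np.uint8)
batch = preprocess_array(fake_bgr)
print(batch.shape, batch.dtype)
```

The resulting array has the batch-channels-height-width layout that torchvision's classification models expect as input.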

### Model preparation - command line arguments

In [7]:
# uncomment to run using command-line arguments
""" # construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
	help="path to the input image")
ap.add_argument("-m", "--model", type=str, default="vgg16",
	choices=["vgg16", "vgg19", "inception", "densenet", "resnet"],
	help="name of pre-trained network to use")
args = vars(ap.parse_args()) """

# use below if running as a Jupyter Notebook
inputImage = "images/soda.jpg"
pretrainedModel = models.resnet50(pretrained=True)
model = pretrainedModel.to(config.DEVICE)
model.eval()




ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 

- Specifying the *pretrained=True* flag instructs PyTorch to load not only the model architecture definition but also to download the pre-trained ImageNet weights for the model. In torchvision 0.13 and later this flag is deprecated in favor of the *weights* argument.
- Calling *model.eval()* puts the model into evaluation mode, which switches layers such as batch normalization and dropout to their inference behavior. Doing this before making predictions is critical, so don’t forget to do it!
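
To see what evaluation mode changes, the sketch below uses a standalone *torch.nn.Dropout* layer as an illustration. It is not part of the notebook's ResNet50 (which relies on batch normalization rather than dropout), but it makes the train/eval difference easy to observe: in training mode some elements are zeroed and the rest are rescaled, while in evaluation mode the layer passes its input through unchanged.

```python
import torch

torch.manual_seed(0)
layer = torch.nn.Dropout(p=0.5)
x = torch.ones(1, 8)

# training mode: each element is zeroed with probability 0.5,
# and the surviving elements are scaled by 1 / (1 - 0.5) = 2
layer.train()
train_out = layer(x)

# evaluation mode: dropout is a no-op
layer.eval()
eval_out = layer(x)

print(train_out)
print(eval_out)
```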

Now that our model is loaded, we need an input image — let’s take care of that now:

### Input image loading

In [8]:
# load the image from disk, clone it (so we can draw on it later),
# and preprocess it
print("[INFO] loading image...")
# only use below if running with cl arguments
# image = cv2.imread(args["image"])
image = cv2.imread(inputImage)
# resize the original image to a screen-friendly width (aspect ratio is preserved)
resized = imutils.resize(image, width = 450)
orig = resized.copy()
image = preprocess_image(image)
# convert the preprocessed image to a torch tensor and move it to
# the current device
image = torch.from_numpy(image)
image = image.to(config.DEVICE)
# load the ImageNet labels
print("[INFO] loading ImageNet labels...")
imagenetLabels = dict(enumerate(open(config.IN_LABELS)))

[INFO] loading image...
[INFO] loading ImageNet labels...


- The input image is loaded from disk. A resized copy is kept so the prediction can be drawn on it later, and the original is passed to *preprocess_image*.
- The image is converted from a NumPy array to a PyTorch tensor and moved to the device set in *config.DEVICE* (CPU, or GPU if available).
- The ImageNet class labels are loaded from disk into a dictionary keyed by class index.
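
The expression *dict(enumerate(open(...)))* maps each zero-based line number to the raw line, newline included, which is why later cells call *.strip()* on the label. The sketch below shows the same mapping built with a context manager so the file is closed promptly. It uses a temporary file with three made-up labels instead of the real file at *config.IN_LABELS*, whose contents are not shown here.

```python
import os
import tempfile

# write a tiny stand-in labels file (hypothetical contents, one label per line)
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("tench\ngoldfish\ngreat white shark\n")
    labels_path = tmp.name

# map class index -> label, stripping the trailing newline up front
with open(labels_path) as f:
    labels = {idx: line.strip() for idx, line in enumerate(f)}

# clean up the temporary file
os.remove(labels_path)
print(labels)
```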

### Making a prediction

In [9]:
# classify the image and extract the predictions
# only use below if running with cl arguments
# print("[INFO] classifying image with '{}'...".format(args["model"]))
print("[INFO] classifying image with '{}'...".format(pretrainedModel))
# run the forward pass without tracking gradients (inference only)
with torch.no_grad():
	logits = model(image)
probabilities = torch.nn.Softmax(dim=-1)(logits)
sortedProba = torch.argsort(probabilities, dim=-1, descending=True)
# loop over the predictions and display the top-5 predictions and
# corresponding probabilities to our terminal
for (i, idx) in enumerate(sortedProba[0, :5]):
	print("{}. {}: {:.2f}%".format
		(i + 1, imagenetLabels[idx.item()].strip(),
		probabilities[0, idx.item()] * 100))

[INFO] classifying image with 'ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): Bottleneck(
      (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)
      (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (downsample): Sequential(
        (0): Conv2d(64, 256, 

- Calling *model(image)* performs a forward pass of the network; the raw outputs are stored in **logits**.
- We pass the outputs through **Softmax** to obtain predicted probabilities for the 1000 classes.
- The class indices are sorted by probability in descending order (most probable first).
- The top 5 predictions are looped over and displayed.
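
The softmax-then-rank logic can be followed on a small example without loading a model. The sketch below is a NumPy illustration with made-up logits for one image and six classes; the *softmax* helper and the numbers are assumptions for demonstration, not values produced by ResNet50.

```python
import numpy as np

def softmax(logits):
    # subtract the row maximum for numerical stability
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# made-up logits for a batch of one image and six classes
logits = np.array([[2.0, 0.5, -1.0, 3.0, 0.0, 1.0]])
probabilities = softmax(logits)

# sorting the negated probabilities in ascending order
# yields class indices in descending order of probability
ranked = np.argsort(-probabilities, axis=-1)

# display the five most probable classes, most probable first
for rank, idx in enumerate(ranked[0, :5], start=1):
    print("{}. class {}: {:.2f}%".format(rank, idx, probabilities[0, idx] * 100))
```

Softmax preserves the ordering of the logits, so ranking the logits directly would give the same top 5; the conversion is only needed to report percentages.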

### Drawing results on screen

In [10]:
# draw the top prediction on the image and display the image to
# our screen
(label, prob) = (imagenetLabels[probabilities.argmax().item()],
	probabilities.max().item())
cv2.putText(orig, "Label: {}, {:.2f}%".format(label.strip(), prob * 100),
	(10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 255), 2)
cv2.imshow("Classification", orig)
cv2.waitKey(0)

-1