<a href="https://colab.research.google.com/github/hxviet/RPNet/blob/main/rpnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

In this notebook, we're going to implement the deep learning model RPNet [1] with PyTorch to detect the bounding boxes of Chinese car license plates and recognize the characters on them. We're going to use license plate images from the Chinese City Parking Dataset as updated in March 2019 (CCPD2019) [2].

## Demo instructions

RPNet consists of a detection module and a recognition module. I trained a detection module for 5 epochs on nearly 200k images, reaching 63% detection accuracy, and then used that pretrained detection module to train RPNet for 10 epochs, which reached 37% detection accuracy and 21% recognition accuracy. If you only want to test these two models, read all notes and run all cells in the following sections from top to bottom:
1. **Preparing the environment**
2. **Preparing data** &rarr; **Extracting dataset to disk**
3. **Preparing data** &rarr; **Dataset constants**
4. **Preparing data** &rarr; **Creating Dataset objects**
5. **Defining the model**
6. **Demo**

(To see notebook sections, click the **Table of contents** button on Colab's left menu bar.)

## Acknowledgements

A lot of the code in this notebook was inspired by [2] (mostly for model definition) and [3] (mostly for training optimization) as well as the documentation of the libraries imported.

I would like to express my gratitude to Dr. Do Ba Lam, my project advisor, and Mr. Tran The Anh at the School of Information and Communications Technology, Hanoi University of Science and Technology for their valuable advice and suggestions. I would also like to thank my father Hoang Xuan Hieu for sponsoring my Google Drive storage and Google Colab compute units.

Hoang Xuan Viet

# Preparing the environment

## Utilities

In [None]:
!pip install torchinfo

In [None]:
import os, math, torch
from typing import Union, Callable, List, Tuple
from torch.utils.data import Dataset, DataLoader, ConcatDataset
from torchvision.io import read_image
from torch import nn, optim
import matplotlib.pyplot as plt
from torchvision import transforms
import torchvision.transforms.functional as F
from torchvision.utils import draw_bounding_boxes
from torchinfo import summary
from torchvision.ops import roi_pool, box_iou, box_convert
from torch.nn.functional import one_hot

In [None]:
def string_plate_num(t: Union[List[int], Tuple[int, ...], torch.Tensor]) -> str:
    """
    Given a list, tuple, or tensor containing indices of the characters of a Chinese license plate,
    returns the corresponding plate number string.
    The represented Chinese license plate must be a Chinese character,
    followed by an English letter, then followed by English letters or digits.
    """
    char_0 = PROVINCES[t[0]]
    char_1 = LETTERS[t[1]]
    remaining_chars = [LETTERS_AND_DIGITS[e] for e in t[2:]]
    return char_0 + char_1 + ''.join(remaining_chars)

def show_images(images: List[torch.Tensor], labels: List[str]=None, max_ncols: int=4, figsize: Tuple[int, int]=None):
    """
    Shows a grid of images
    """
    ncols = min(len(images), max_ncols)
    nrows = math.ceil(len(images) / ncols)
    if not figsize:
        figsize = (5 * ncols, 5 * nrows)
    fig, axs = plt.subplots(nrows, ncols, squeeze=False, figsize=figsize)
    for i, img in enumerate(images):
        ax = axs[divmod(i, ncols)]
        ax.imshow(F.to_pil_image(img.detach()))
        ax.set(xticklabels=[], yticklabels=[], xticks=[], yticks=[])
        if labels:
            ax.set_title(labels[i], fontsize=24)

## Hardware

This notebook should be run with a CUDA-capable GPU. On Colab's top menu bar, go to **Runtime** &rarr; **Change runtime type**, and then under **Hardware accelerator**, choose **GPU**.

In [None]:
device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')
!nvidia-smi

## The working directory

The working directory should
* be in your Google Drive and
* contain
    * [the dataset archive](https://drive.google.com/drive/folders/1qLjlYAczIsjuCqIJIXo2d8VKbdix93BF?usp=share_link) or a shortcut thereto (Learn how to make a shortcut in your Google Drive [here](https://support.google.com/drive/answer/9700156?hl=en&co=GENIE.Platform%3DDesktop&oco=0).) and
    * a directory named **models** storing trained models' parameters (You can make a copy of [this directory](https://drive.google.com/drive/folders/1TP8ecZtUBOB49D1CeCgMcjyjDBy3ugwJ?usp=share_link), which has the parameters of the models I trained. If you only want to test these models, you can make a shortcut to the directory instead of copying).

In [None]:
from google.colab import drive
drive.mount('/content/drive')

To get the path to the directory you would like to use as the working directory, click the **Files** button on Colab's left menu bar and expand **drive** &rarr; **MyDrive** and any further directories until you reach it. Right-click the directory and click **Copy path**. Then, run the following cell and paste the path when asked for input.

In [None]:
%cd {input('Enter path to desired working directory: ')}

# Preparing data

## Extracting dataset to disk

Because the dataset archive is quite large (about 12 GB), this will take around 10 to 25 minutes.

In [None]:
!tar -xf 'CCPD2019.tar.xz' -C '/content'

## Dataset constants

In [None]:
TEST_SET_NAMES = ('blur', 'challenge', 'db', 'fn', 'rotate', 'tilt', 'weather')
PROVINCES = ("皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O")
LETTERS = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
             'X', 'Y', 'Z', 'O')
LETTERS_AND_DIGITS = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
       'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O')

## Creating `Dataset` objects

In [None]:
class CCPD2019(Dataset):

    def __init__(
            self,
            root: Union[str, os.PathLike],
            img_transform: Callable=None,
            bbox_transform: Callable=None,
            plate_num_transform: Callable=None
    ):
        """
        :param root: path to the dataset (or a subset) directory
        :param img_transform: transform to apply to images
        :param bbox_transform: transform to apply to bounding box labels
        :param plate_num_transform: transform to apply to license plate number labels
        """
        self.img_transform = img_transform
        self.bbox_transform = bbox_transform
        self.plate_num_transform = plate_num_transform
        self.img_paths = []
        if not os.path.isdir(root):
            raise NotADirectoryError(f'Not a directory: {root}')
        print('Creating CCPD2019 object for data at', root, end='\n')
        for dirpath, subdirnames, filenames in os.walk(root):
            self.img_paths.extend([os.path.join(dirpath, fn) for fn in filenames])
            print('\t Added', len(filenames), 'files from', dirpath)
        print('\t Done')

    def __len__(self):
        return len(self.img_paths)

    def __getitem__(self, idx):
        """
        Returns three objects representing an image, the bounding box, and the plate number.

        If no transformation is specified,
        the image is a tensor of size (C, H, W) and dtype `torch.uint8`,
        the bounding box is a tensor in (x_min, y_min, x_max, y_max) format, and
        the plate number is a tensor containing the indices of the 7 characters in the license plate.
        """
        img_path = self.img_paths[idx]
        image = read_image(img_path).to(device)
        annotations = os.path.basename(img_path)
        area, tilt, bbox, vertices, plate_num, brightness, blurriness = annotations.split('-')
        bbox = [[int(i) for i in point.split('&')] for point in bbox.split('_')]
        bbox = torch.tensor(bbox, device=device).flatten()
        plate_num = [int(i) for i in plate_num.split('_')]
        plate_num = torch.tensor(plate_num, device=device)
        if self.img_transform:
            image = self.img_transform(image)
        if self.bbox_transform:
            bbox = self.bbox_transform(bbox)
        if self.plate_num_transform:
            plate_num = self.plate_num_transform(plate_num)
        return image, bbox, plate_num
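
To make the label format concrete, here is a minimal plain-Python sketch of how the annotations packed into a file name can be pulled apart and decoded. The helper names `parse_ccpd_filename` and `decode_plate`, the string copies of the lookup tables, and the sample file name are all assumptions made for illustration; the sketch only mirrors the splitting done in `__getitem__` and is not part of the dataset class.

```python
from typing import List, Tuple

# Hypothetical compact copies of the lookup tables from "Dataset constants",
# written as strings so that indexing behaves like indexing the tuples.
provinces_str = "皖沪津渝冀晋蒙辽吉黑苏浙京闽赣鲁豫鄂湘粤桂琼川贵云藏陕甘青宁新警学O"
letters_str = "ABCDEFGHJKLMNPQRSTUVWXYZO"
letters_and_digits_str = "ABCDEFGHJKLMNPQRSTUVWXYZ0123456789O"

def parse_ccpd_filename(filename: str) -> Tuple[List[int], List[int]]:
    """Returns (bbox as [x_min, y_min, x_max, y_max], plate as 7 character indices)."""
    fields = filename.split('-')
    bbox_field, plate_field = fields[2], fields[4]
    # "x&y_x&y" -> [x_min, y_min, x_max, y_max]
    bbox = [int(v) for point in bbox_field.split('_') for v in point.split('&')]
    # "i_i_i_i_i_i_i" -> 7 character indices
    plate = [int(i) for i in plate_field.split('_')]
    return bbox, plate

def decode_plate(indices: List[int]) -> str:
    """Maps 7 character indices to the plate string, like string_plate_num does."""
    first = provinces_str[indices[0]]
    second = letters_str[indices[1]]
    rest = ''.join(letters_and_digits_str[i] for i in indices[2:])
    return first + second + rest

# Illustrative file name in the 7-field layout that __getitem__ splits on '-'
sample_name = "025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg"
sample_bbox, sample_plate = parse_ccpd_filename(sample_name)
print(sample_bbox, decode_plate(sample_plate))
```

Only the third and fifth fields are needed for training here; the other fields are unpacked in `__getitem__` but not used.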

We're going to apply some transformations to the data before feeding them to the neural network.
* The images are going to be resized to 480x480 and converted to `torch.float32`.
* The bounding box labels are going to be converted to $(c_x, c_y, w, h)$ format, where $c_x$ and $c_y$ are the coordinates of the bounding box center, and $w$ and $h$ are the width and height of the bounding box. All four numbers are between $0$ and $1$.
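* The plate number labels are left as character indices (no `plate_num_transform` is passed below), which is the target format that `nn.CrossEntropyLoss` accepts.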

In [None]:
img_transform = transforms.Compose([
    transforms.Resize((480, 480)),
    transforms.Lambda(lambda img: F.convert_image_dtype(img, torch.float32))
])

bbox_transform = transforms.Compose([
    transforms.Lambda(lambda t: box_convert(t, in_fmt='xyxy', out_fmt='cxcywh')),
    transforms.Lambda(lambda t: t / torch.tensor([720, 1160, 720, 1160], device=t.device))
])
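
As a sanity check on the bounding box transform, the same arithmetic can be sketched in plain Python without torchvision. The function name `xyxy_to_normalized_cxcywh` and the sample box are hypothetical; the 720x1160 image size is simply the divisor used in the cell above.

```python
from typing import Tuple

def xyxy_to_normalized_cxcywh(
    box: Tuple[float, float, float, float],
    img_w: int = 720,   # image width assumed by the divisor above
    img_h: int = 1160,  # image height assumed by the divisor above
) -> Tuple[float, float, float, float]:
    """Converts (x_min, y_min, x_max, y_max) in pixels to (cx, cy, w, h) in [0, 1]."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2   # center x in pixels
    cy = (y_min + y_max) / 2   # center y in pixels
    w = x_max - x_min          # width in pixels
    h = y_max - y_min          # height in pixels
    return cx / img_w, cy / img_h, w / img_w, h / img_h

# hypothetical box, chosen only for illustration
example_box = xyxy_to_normalized_cxcywh((154, 383, 386, 473))
print(example_box)
```

Because the labels are normalized by the original image size, they stay valid after the image itself is resized to 480x480.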

In [None]:
# for demo, this cell can be skipped
train_set = CCPD2019(
    root='/content/CCPD2019/ccpd_base',
    img_transform=img_transform,
    bbox_transform=bbox_transform
)

In [None]:
test_sets = {}
num_test_samples = 0
for set_name in TEST_SET_NAMES:
    test_set = CCPD2019(
        root='/content/CCPD2019/ccpd_' + set_name,
        img_transform=img_transform,
        bbox_transform=bbox_transform
    )
    test_sets[set_name] = test_set
    num_test_samples += len(test_set)
print('Total number of test samples:', num_test_samples)

## Creating `DataLoader` objects

In [None]:
batch_size = 64

In [None]:
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, pin_memory=False)
print('Training set loader size:', len(train_loader), 'batches')

In [None]:
test_loaders = {}
for set_name in test_sets:
    loader = DataLoader(test_sets[set_name], batch_size=batch_size, shuffle=True, pin_memory=False)
    test_loaders[set_name] = loader
    print(f'Test set "{set_name}" loader size: {len(loader)} batches')

## Viewing some samples

Let's look at the first batch of samples from the training set to make sure that we've loaded and transformed the data correctly. We will draw the bounding boxes in red and print the license plate numbers on top of the images. The Chinese characters might not display because they're missing from the font used, but that's okay: if the 6 other characters are correct, it's almost certain that we've handled the data properly.

In [None]:
batch = next(iter(train_loader))
images_with_box, plate_nums = [], []
for i in range(batch_size):
    image = F.convert_image_dtype(batch[0][i], torch.uint8)
    bbox = box_convert(batch[1][i:i+1], 'cxcywh', 'xyxy') * 480
    plate_num = string_plate_num(batch[2][i])
    img_with_box = draw_bounding_boxes(image, bbox, colors=['red'], width=5)
    images_with_box.append(img_with_box)
    plate_nums.append(plate_num)
show_images(images_with_box, plate_nums)

# Defining the model

The following figure [1] describes the architecture of RPNet.

![The architecture of RPNet](https://drive.google.com/uc?export=view&id=1rXhiu4UqT7eFsyafS6Pp_skmWI47cJwQ)

In [None]:
torch.backends.cudnn.benchmark = True  # let cuDNN benchmark convolution algorithms and choose the fastest

## The detection module

In [None]:
class LPDetection(nn.Module):
    """
    The detection module of RPNet
    """

    def __init__(self):
        super(LPDetection, self).__init__()
        feature_map_0 = nn.Sequential(
            nn.Conv2d(in_channels=3, out_channels=48, kernel_size=5, stride=2, padding=2, bias=False),
            nn.BatchNorm2d(num_features=48),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=1),
            nn.ReLU(),
            nn.Dropout(0.5)
        )
        feature_map_1 = nn.Sequential(
            nn.Conv2d(in_channels=48, out_channels=64, kernel_size=5, stride=1, padding=2, bias=False),
            nn.BatchNorm2d(num_features=64),
            nn.MaxPool2d(kernel_size=2, stride=1, padding=1),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        feature_map_2 = nn.Sequential(
            nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, stride=1, padding=2, bias=False),
            nn.BatchNorm2d(num_features=128),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=1),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        feature_map_3 = nn.Sequential(
            nn.Conv2d(in_channels=128, out_channels=160, kernel_size=5, stride=1, padding=2, bias=False),
            nn.BatchNorm2d(num_features=160),
            nn.MaxPool2d(kernel_size=2, stride=1, padding=1),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        feature_map_4 = nn.Sequential(
            nn.Conv2d(in_channels=160, out_channels=192, kernel_size=5, stride=1, padding=2, bias=False),
            nn.BatchNorm2d(num_features=192),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=1),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        feature_map_5 = nn.Sequential(
            nn.Conv2d(in_channels=192, out_channels=192, kernel_size=5, stride=1, padding=2, bias=False),
            nn.BatchNorm2d(num_features=192),
            nn.MaxPool2d(kernel_size=2, stride=1, padding=1),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        feature_map_6 = nn.Sequential(
            nn.Conv2d(in_channels=192, out_channels=192, kernel_size=5, stride=1, padding=2, bias=False),
            nn.BatchNorm2d(num_features=192),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=1),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        feature_map_7 = nn.Sequential(
            nn.Conv2d(in_channels=192, out_channels=192, kernel_size=5, stride=1, padding=2, bias=False),
            nn.BatchNorm2d(num_features=192),
            nn.MaxPool2d(kernel_size=2, stride=1, padding=1),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        feature_map_8 = nn.Sequential(
            nn.Conv2d(in_channels=192, out_channels=192, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(num_features=192),
            nn.MaxPool2d(kernel_size=2, stride=2, padding=1),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        feature_map_9 = nn.Sequential(
            nn.Conv2d(in_channels=192, out_channels=192, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(num_features=192),
            nn.MaxPool2d(kernel_size=2, stride=1, padding=1),
            nn.ReLU(),
            nn.Dropout(0.2)
        )
        self.feature_extractor = nn.Sequential(
            feature_map_0,
            feature_map_1,
            feature_map_2,
            feature_map_3,
            feature_map_4,
            feature_map_5,
            feature_map_6,
            feature_map_7,
            feature_map_8,
            feature_map_9
        )
        self.box_regressor = nn.Sequential(
            nn.Linear(23232, 100),
            nn.ReLU(),
            nn.Linear(100, 100),
            nn.ReLU(),
            nn.Linear(100, 4),
        )
    
    def forward(self, x):
        x = self.feature_extractor(x)
        x = nn.Flatten()(x)
        x = self.box_regressor(x)
        return x # tensor([cx, cy, w, h]) for each sample, with all 4 numbers in [0, 1]

In [None]:
detection_model = LPDetection().to(device)
summary(detection_model, input_size=(batch_size, 3, 480, 480))
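
The `23232` input width of the first linear layer in `box_regressor` is not arbitrary. A rough plain-Python sketch, assuming the usual output size formula for convolution and pooling layers (dilation 1, floor rounding), reproduces it from the layer hyperparameters above. The helper name `layer_output_side` and the `blocks` table are made up for this illustration.

```python
def layer_output_side(n: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a Conv2d/MaxPool2d layer with dilation 1 and floor rounding."""
    return (n + 2 * padding - kernel) // stride + 1

# (conv kernel, conv stride, conv padding, pool stride) for the 10 blocks above;
# every pooling layer uses kernel_size=2 and padding=1
blocks = [
    (5, 2, 2, 2), (5, 1, 2, 1), (5, 1, 2, 2), (5, 1, 2, 1), (5, 1, 2, 2),
    (5, 1, 2, 1), (5, 1, 2, 2), (5, 1, 2, 1), (3, 1, 1, 2), (3, 1, 1, 1),
]

side = 480  # input images are 480x480
sides = []
for conv_k, conv_s, conv_p, pool_s in blocks:
    side = layer_output_side(side, conv_k, conv_s, conv_p)  # convolution
    side = layer_output_side(side, 2, pool_s, 1)            # max pooling
    sides.append(side)

flattened = 192 * side * side  # 192 channels in the last feature map
print(sides)      # side length after each block
print(flattened)  # 23232
```

If the input resolution or any stride or padding changes, the first `nn.Linear` has to be resized accordingly.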

## The whole RPNet

In [None]:
class RPNet(nn.Module):
    """
    An end-to-end neural network for Chinese 7-character license plate detection and recognition
    """

    def __init__(self):
        super(RPNet, self).__init__()
        self.detection_module = LPDetection()
        self.character_classifier_0 = nn.Sequential(
            nn.Linear(53248, 128),
            nn.ReLU(),
            nn.Linear(128, len(PROVINCES)),
        )
        self.character_classifier_1 = nn.Sequential(
            nn.Linear(53248, 128),
            nn.ReLU(),
            nn.Linear(128, len(LETTERS)),
        )
        self.character_classifier_2 = nn.Sequential(
            nn.Linear(53248, 128),
            nn.ReLU(),
            nn.Linear(128, len(LETTERS_AND_DIGITS)),
        )
        self.character_classifier_3 = nn.Sequential(
            nn.Linear(53248, 128),
            nn.ReLU(),
            nn.Linear(128, len(LETTERS_AND_DIGITS)),
        )
        self.character_classifier_4 = nn.Sequential(
            nn.Linear(53248, 128),
            nn.ReLU(),
            nn.Linear(128, len(LETTERS_AND_DIGITS)),
        )
        self.character_classifier_5 = nn.Sequential(
            nn.Linear(53248, 128),
            nn.ReLU(),
            nn.Linear(128, len(LETTERS_AND_DIGITS)),
        )
        self.character_classifier_6 = nn.Sequential(
            nn.Linear(53248, 128),
            nn.ReLU(),
            nn.Linear(128, len(LETTERS_AND_DIGITS)),
        )
    
    def forward(self, x):
        # get feature maps and bounding box
        feature_extractor = self.detection_module.feature_extractor
        x0 = feature_extractor[0](x)
        x1 = feature_extractor[1](x0)
        x2 = feature_extractor[2](x1)
        x3 = feature_extractor[3](x2)
        x4 = feature_extractor[4](x3)
        x5 = feature_extractor[5](x4)
        x6 = feature_extractor[6](x5)
        x7 = feature_extractor[7](x6)
        x8 = feature_extractor[8](x7)
        x9 = feature_extractor[9](x8)
        bbox = self.detection_module.box_regressor(nn.Flatten()(x9))
        # extract RoIs from 2nd, 4th, and 6th feature maps
        bbox_vertices = box_convert(bbox, 'cxcywh', 'xyxy').clamp(min=0, max=1)
        bbox_vertices = list(bbox_vertices.unsqueeze(dim=1))
        roi_1 = roi_pool(input=x1,
                         boxes=bbox_vertices,
                         output_size=(8, 16),
                         spatial_scale=x1.size()[2])
        roi_3 = roi_pool(input=x3,
                         boxes=bbox_vertices,
                         output_size=(8, 16),
                         spatial_scale=x3.size()[2])
        roi_5 = roi_pool(input=x5,
                         boxes=bbox_vertices,
                         output_size=(8, 16),
                         spatial_scale=x5.size()[2])
        rois = nn.Flatten()(torch.cat((roi_1, roi_3, roi_5), dim=1))
        # classify characters
        char_0 = self.character_classifier_0(rois)
        char_1 = self.character_classifier_1(rois)
        char_2 = self.character_classifier_2(rois)
        char_3 = self.character_classifier_3(rois)
        char_4 = self.character_classifier_4(rois)
        char_5 = self.character_classifier_5(rois)
        char_6 = self.character_classifier_6(rois)
        return bbox, (char_0, char_1, char_2, char_3, char_4, char_5, char_6)

In [None]:
rpnet = RPNet().to(device)
summary(rpnet, input_size=(batch_size, 3, 480, 480))
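
Two numbers in `RPNet` are worth checking by hand: the `53248` input width of each character classifier, and the `spatial_scale` passed to `roi_pool`. The sketch below is plain Python under the assumption that the 2nd, 4th, and 6th feature maps have 64, 160, and 192 channels and side lengths 122, 63, and 33 for a 480x480 input (the side lengths are my own calculation from the layer hyperparameters). The names `feature_maps` and `scale_box` and the sample box are hypothetical.

```python
from typing import Dict, Tuple

# name -> (channels, side length) of the feature maps that RoIs are pooled from
feature_maps: Dict[str, Tuple[int, int]] = {'x1': (64, 122), 'x3': (160, 63), 'x5': (192, 33)}
roi_h, roi_w = 8, 16  # output_size passed to roi_pool

# Each pooled RoI has shape (channels, 8, 16); the three are concatenated and flattened.
classifier_in_features = sum(c * roi_h * roi_w for c, _ in feature_maps.values())
print(classifier_in_features)  # 53248

def scale_box(box_xyxy: Tuple[float, float, float, float], side: int) -> Tuple[float, ...]:
    """Maps a box with coordinates in [0, 1] onto a square feature map of the given side."""
    return tuple(v * side for v in box_xyxy)

normalized_box = (0.25, 0.5, 0.75, 0.625)  # hypothetical predicted box in xyxy format
scaled = {name: scale_box(normalized_box, side) for name, (_, side) in feature_maps.items()}
print(scaled)
```

This is why `spatial_scale` is set to the feature map size: the predicted boxes live in $[0, 1]$, so multiplying by the side length places them in feature map coordinates.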

# Training and testing the model

Each model will be trained on the training set for several epochs. After each epoch, if the loss averaged over the training set has decreased, the model's new parameters are saved and the model is evaluated on the test sets.

We're going to use the AdamW optimizer.

Each training epoch will probably take somewhere between 30 and 90 minutes, depending on the GPU Colab allocates to you.

## Pre-training the detection module

As explained in [1], we need to pre-train the detection module before training RPNet end-to-end so that the detection module can give reasonable bounding box predictions.

The loss function for training the detection module is the Smooth L1 Loss between the predicted bounding box and the true bounding box in $(c_x, c_y, w, h)$ format.

In [None]:
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, pin_memory=False)
detection_criterion = nn.SmoothL1Loss()
detection_optimizer = optim.AdamW(detection_model.parameters())
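
For readers who want to see what the Smooth L1 Loss computes, here is a small plain-Python sketch for a single box, assuming the default settings of `nn.SmoothL1Loss` (`beta=1.0`, mean reduction). The function name `smooth_l1` and the sample boxes are made up for illustration.

```python
from typing import Sequence

def smooth_l1(pred: Sequence[float], target: Sequence[float], beta: float = 1.0) -> float:
    """Mean Smooth L1 loss: quadratic for small errors, linear for large ones."""
    losses = []
    for p, t in zip(pred, target):
        diff = abs(p - t)
        if diff < beta:
            losses.append(0.5 * diff ** 2 / beta)  # quadratic zone
        else:
            losses.append(diff - 0.5 * beta)       # linear zone
    return sum(losses) / len(losses)

# hypothetical predicted and true boxes in (cx, cy, w, h) format
box_loss = smooth_l1([0.5, 0.5, 0.2, 0.1], [0.4, 0.6, 0.2, 0.3])
print(box_loss)
```

Since the box coordinates are normalized to $[0, 1]$, the errors are almost always below 1, so in practice the loss behaves like half the mean squared error.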

In [None]:
best_avg_train_loss = float('inf')

for epoch in range(1, 11):
    # train 1 epoch
    print('--------------------')
    print('Training epoch', epoch)
    detection_model.train()
    total_train_loss = 0
    num_samples_trained = 0
    for batch_idx, (images, true_boxes, true_plate_nums) in enumerate(train_loader):
        detection_optimizer.zero_grad(set_to_none=True)
        predicted_boxes = detection_model(images)
        train_loss = detection_criterion(predicted_boxes, true_boxes)
        total_train_loss += train_loss.detach()
        train_loss.backward()
        detection_optimizer.step()
        num_samples_trained += len(images)
        if batch_idx % 1000 == 0 or batch_idx == len(train_loader) - 1:
            avg_train_loss = total_train_loss.item() / num_samples_trained
            print('[{}/{} samples ({:.0f}%)]\tAverage training detection loss: {:.6f}'.format(
                num_samples_trained, len(train_loader.dataset),
                100.0 * num_samples_trained / len(train_loader.dataset), avg_train_loss))
    # save and test model if average training loss improves
    if avg_train_loss < best_avg_train_loss:
        best_avg_train_loss = avg_train_loss
        torch.save(detection_model.state_dict(), 'models/detection_new_code.pth')
        print('\nTesting epoch', epoch)
        detection_model.eval()
        with torch.no_grad():
            overall_test_accuracy = 0
            # evaluate model on each test set
            for test_set_name in TEST_SET_NAMES:
                test_loader = test_loaders[test_set_name]
                test_accuracy = 0
                for images, true_boxes, true_plate_nums in test_loader:
                    predicted_boxes = detection_model(images)
                    true_bbox_vertices = box_convert(true_boxes, 'cxcywh', 'xyxy')
                    predicted_bbox_vertices = box_convert(predicted_boxes, 'cxcywh', 'xyxy')
                    IoUs = box_iou(predicted_bbox_vertices, true_bbox_vertices).diagonal()
                    test_accuracy += (IoUs > 0.7).sum().item()
                overall_test_accuracy += test_accuracy
                test_accuracy /= len(test_loader.dataset)
                print(f'Set "{test_set_name}" detection accuracy: {100.0 * test_accuracy:.2f}%')
            overall_test_accuracy /= num_test_samples
            print(f'Overall test detection accuracy: {100.0 * overall_test_accuracy:.2f}%')
    print()
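
The test loop above counts a detection as correct when the intersection over union (IoU) of the predicted and true boxes exceeds 0.7. As a minimal sketch of that metric for a single pair of boxes, in plain Python rather than `box_iou` (the function name `iou_xyxy` and the sample boxes are hypothetical):

```python
from typing import Sequence

def iou_xyxy(a: Sequence[float], b: Sequence[float]) -> float:
    """Intersection over union of two boxes in (x_min, y_min, x_max, y_max) format."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# hypothetical boxes: the prediction is shifted slightly relative to the truth
true_box = (0.30, 0.40, 0.70, 0.50)
predicted_box = (0.32, 0.41, 0.72, 0.51)
iou = iou_xyxy(predicted_box, true_box)
print(iou, iou > 0.7)  # the second value says whether this counts as a correct detection
```

License plates are thin boxes, so even a small vertical shift lowers the IoU noticeably.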

## Training RPNet end-to-end

We're going to load the pretrained weights into the detection module of RPNet and train RPNet end-to-end. The loss function will be the sum of the detection loss and the recognition loss, where the detection loss is the same Smooth L1 Loss we used to train the detection module and the recognition loss is the sum, over the 7 character positions, of the Cross Entropy Loss between the predicted character scores and the true character indices.

In [None]:
train_loader = DataLoader(train_set, batch_size=batch_size, shuffle=True, pin_memory=False)
rpnet.detection_module.load_state_dict(torch.load('models/detection.pth', map_location=device))
detection_criterion = nn.SmoothL1Loss()
recognition_criterion = nn.CrossEntropyLoss()
rpnet_optimizer = optim.AdamW(rpnet.parameters())
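
To illustrate how the combined loss is put together, here is a plain-Python sketch for a single sample. The notebook itself uses `nn.CrossEntropyLoss` averaged over a batch; the functions `cross_entropy` and `rpnet_loss` below, and the toy inputs, are hypothetical stand-ins.

```python
import math
from typing import List, Sequence

def cross_entropy(logits: Sequence[float], target_index: int) -> float:
    """Cross entropy between softmax(logits) and a one-hot target at target_index."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target_index]

def rpnet_loss(detection_loss: float,
               char_logits: List[Sequence[float]],
               true_indices: Sequence[int]) -> float:
    """Detection loss plus one cross-entropy term per character position."""
    recognition_loss = sum(cross_entropy(l, t) for l, t in zip(char_logits, true_indices))
    return detection_loss + recognition_loss

# toy example: 7 positions with only 2 classes each and uninformative scores
toy_loss = rpnet_loss(0.1, [[0.0, 0.0]] * 7, [0] * 7)
print(toy_loss)
```

With uninformative scores, each position contributes the logarithm of its number of classes, so the recognition terms dominate the total loss early in training.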

In [None]:
best_avg_train_loss = float('inf')

for epoch in range(1, 11):
    # train 1 epoch
    print('--------------------')
    print('Training epoch', epoch)
    rpnet.train()
    total_train_loss = 0
    num_samples_trained = 0
    for batch_idx, (images, true_boxes, true_plate_nums) in enumerate(train_loader):
        rpnet_optimizer.zero_grad(set_to_none=True)
        predicted_boxes, predicted_plate_nums = rpnet(images)
        train_loss = detection_criterion(predicted_boxes, true_boxes)
        for i in range(7):
            train_loss += recognition_criterion(predicted_plate_nums[i], true_plate_nums[:, i])
        total_train_loss += train_loss.detach()
        train_loss.backward()
        rpnet_optimizer.step()
        num_samples_trained += len(images)
        if batch_idx % 1000 == 0 or batch_idx == len(train_loader) - 1:
            avg_train_loss = total_train_loss.item() / num_samples_trained
            print('[{}/{} samples ({:.0f}%)]\tAverage training loss (detection + recognition): {:.6f}'.format(
                num_samples_trained, len(train_loader.dataset),
                100.0 * num_samples_trained / len(train_loader.dataset), avg_train_loss))
    # save and test model if average training loss improves
    if avg_train_loss < best_avg_train_loss:
        best_avg_train_loss = avg_train_loss
        torch.save(rpnet.state_dict(), f'models/rpnet_epoch{epoch}.pth')
        print('\nTesting epoch', epoch)
        rpnet.eval()
        with torch.no_grad():
            overall_detection_accuracy, overall_recognition_accuracy = 0, 0
            # evaluate model on each test set
            for test_set_name in TEST_SET_NAMES:
                test_loader = test_loaders[test_set_name]
                detection_accuracy, recognition_accuracy = 0, 0
                for images, true_boxes, true_plate_nums in test_loader:
                    predicted_boxes, predicted_plate_nums = rpnet(images)
                    true_bbox_vertices = box_convert(true_boxes, 'cxcywh', 'xyxy')
                    predicted_bbox_vertices = box_convert(predicted_boxes, 'cxcywh', 'xyxy')
                    predicted_plate_nums = [t.argmax(dim=1, keepdim=True) for t in predicted_plate_nums]
                    predicted_plate_nums = torch.hstack(predicted_plate_nums)
                    IoUs = box_iou(predicted_bbox_vertices, true_bbox_vertices).diagonal()
                    detection_accuracy += (IoUs > 0.7).sum().item()
                    recog_acc_cond1 = IoUs > 0.6
                    recog_acc_cond2 = predicted_plate_nums.eq(true_plate_nums).sum(dim=1) == 7
                    recognition_accuracy += (recog_acc_cond1 * recog_acc_cond2).sum().item()
                overall_detection_accuracy += detection_accuracy
                overall_recognition_accuracy += recognition_accuracy
                detection_accuracy /= len(test_loader.dataset)
                recognition_accuracy /= len(test_loader.dataset)
                print(f'Set "{test_set_name}" \t detection accuracy: {100.0 * detection_accuracy:.2f}%', end='\t')
                print(f'recognition accuracy: {100.0 * recognition_accuracy:.2f}%')
            overall_detection_accuracy /= num_test_samples
            overall_recognition_accuracy /= num_test_samples
            print(f'Overall test detection accuracy: {100.0 * overall_detection_accuracy:.2f}%')
            print(f'Overall test recognition accuracy: {100.0 * overall_recognition_accuracy:.2f}%')
    print()
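
The recognition criterion used in the test loop above is strict: a plate counts as recognized only if the box IoU exceeds 0.6 and all 7 characters are correct. A minimal plain-Python sketch of that rule for one sample (the function name `plate_is_recognized` and the sample values are hypothetical):

```python
from typing import Sequence

def plate_is_recognized(iou: float,
                        predicted: Sequence[int],
                        true: Sequence[int],
                        iou_threshold: float = 0.6) -> bool:
    """True only if the box overlaps enough and every character index matches."""
    all_chars_match = len(predicted) == len(true) and all(p == t for p, t in zip(predicted, true))
    return iou > iou_threshold and all_chars_match

true_plate = [0, 0, 22, 27, 27, 33, 16]
one_char_off = [0, 0, 22, 27, 27, 33, 17]

print(plate_is_recognized(0.8, true_plate, true_plate))    # good box, all characters right
print(plate_is_recognized(0.8, one_char_off, true_plate))  # a single wrong character fails
print(plate_is_recognized(0.5, true_plate, true_plate))    # right characters, poor box fails
```

This explains why recognition accuracy can never exceed the share of samples whose IoU is above 0.6.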

# Demo

This section lets you test a trained detection module and RPNet on an image with a Chinese car license plate.

There are some cells above you need to run before running those below; check out the **Introduction** section for details if you haven't.

## 1. Load the models

Once you've run the prerequisite cells listed in the **Introduction**, run the following cell to load trained parameters into the detection module and RPNet.

In [None]:
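# map_location lets the checkpoints load even when no GPU is available
detection_model.load_state_dict(torch.load('models/detection.pth', map_location=device))
rpnet.load_state_dict(torch.load('models/rpnet_epoch10.pth', map_location=device))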
detection_model.eval()
rpnet.eval()

## 2. Select an image

Next, either run the following cell to choose a random image from one of the test sets...

In [None]:
rand_test_set_name = TEST_SET_NAMES[torch.randint(len(TEST_SET_NAMES), size=(1,)).item()]
rand_test_set = test_sets[rand_test_set_name]
print(f'Random image chosen from test set "{rand_test_set_name}"')
image, true_bbox, true_plate_num = rand_test_set[torch.randint(len(rand_test_set), size=(1,)).item()]

... or run the following cell to use an image you've uploaded to Colab disk. Make sure that your image contains exactly one 7-character Chinese car license plate. To upload an image to Colab disk, click the **Files** button on Colab's left menu bar, then click the **Upload to session storage** button.

In [None]:
img_path = input('Enter path to image: ')
image = img_transform(read_image(img_path).to(device))

## 3. Get predictions

Then, run the following cell to get the detection module's predicted bounding box.

In [None]:
with torch.no_grad():
    pred_bbox = detection_model(image.unsqueeze(0))
    print('Bounding box in (cx, cy, w, h) format predicted by the detection module: ', end='')
    print([round(i.item(), 4) for i in pred_bbox.squeeze()])
    pred_bbox_vertices = box_convert(pred_bbox, 'cxcywh', 'xyxy') * 480
    converted_img = F.convert_image_dtype(image, torch.uint8)
    img_with_box = draw_bounding_boxes(converted_img, pred_bbox_vertices, colors=['red'], width=5)
    show_images([img_with_box], max_ncols=1, figsize=(10, 10))

Finally, run the cell below to get RPNet's predicted bounding box and plate number.

In [None]:
with torch.no_grad():
    pred_bbox, pred_plate_num = rpnet(image.unsqueeze(0))
    print('Bounding box in (cx, cy, w, h) format predicted by RPNet: ', end='')
    print([round(i.item(), 4) for i in pred_bbox.squeeze()])
    pred_bbox_vertices = box_convert(pred_bbox, 'cxcywh', 'xyxy') * 480
    print('Plate number predicted by RPNet:', string_plate_num([t.argmax() for t in pred_plate_num]))
    converted_img = F.convert_image_dtype(image, torch.uint8)
    img_with_box = draw_bounding_boxes(converted_img, pred_bbox_vertices, colors=['red'], width=5)
    show_images([img_with_box], max_ncols=1, figsize=(10, 10))
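
The two cells above turn the predicted $(c_x, c_y, w, h)$ box back into pixel corners before drawing it. As a plain-Python sketch of that step, assuming the 480x480 resized image (the function name `normalized_cxcywh_to_pixel_xyxy` and the sample prediction are hypothetical):

```python
from typing import Tuple

def normalized_cxcywh_to_pixel_xyxy(
    box: Tuple[float, float, float, float],
    side: int = 480,  # images are resized to 480x480 before entering the network
) -> Tuple[float, float, float, float]:
    """Converts (cx, cy, w, h) in [0, 1] to (x_min, y_min, x_max, y_max) in pixels."""
    cx, cy, w, h = box
    return (
        (cx - w / 2) * side,
        (cy - h / 2) * side,
        (cx + w / 2) * side,
        (cy + h / 2) * side,
    )

# hypothetical prediction, chosen so the arithmetic is easy to follow
pixel_box = normalized_cxcywh_to_pixel_xyxy((0.5, 0.5, 0.5, 0.25))
print(pixel_box)  # (120.0, 180.0, 360.0, 300.0)
```

Multiplying all four corners by the same side length works only because the resized image is square.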

# References

1. Xu, Z. et al. (2018). Towards End-to-End License Plate Detection and Recognition: A Large Dataset and Baseline. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds) Computer Vision – ECCV 2018. ECCV 2018. Lecture Notes in Computer Science, vol 11217. Springer, Cham. https://doi.org/10.1007/978-3-030-01261-8_16

2. https://github.com/detectRecog/CCPD

3. Lin, J. (2022). Optimize PyTorch Performance for Speed and Memory Efficiency. Towards Data Science. https://towardsdatascience.com/optimize-pytorch-performance-for-speed-and-memory-efficiency-2022-84f453916ea6