## AutoPark - Number Plate Detection and Recognition
---
42028 - Assignment 3 - By PlateMates (Group 6)

The dataset used is CCPD (Chinese City Parking Dataset), available here: https://github.com/detectRecog/CCPD

#### Import the necessary modules

```python
%matplotlib inline
import os
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras
from sklearn.metrics import confusion_matrix, classification_report

# import the pre-trained Xception CNN and its related preprocessing method
from tensorflow.keras.applications.xception import Xception, preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
from tensorflow.keras.callbacks import ModelCheckpoint
```

## Dataset Annotations

Annotations are embedded in the file name.

A sample image name is "025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg". Splitting the name on "-" yields seven fields, which are explained below.

- **Area**: The ratio of the license plate area to the entire picture area.

- **Tilt degree**: Horizontal tilt degree and vertical tilt degree.

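- **Bounding box coordinates**: The (x, y) coordinates of the top-left and bottom-right vertices, e.g. "154&383_386&473" in the sample name.

- **Four vertices locations**: The exact (x, y) coordinates of the four vertices of the license plate (LP) in the whole image. The list starts from the bottom-right vertex; in the sample name the order is bottom-right, bottom-left, top-left, top-right.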

- **License plate number**: Each image in CCPD contains exactly one license plate. A valid Chinese license plate consists of seven characters: a province (1 Chinese character), a letter (1 character), and five letters or digits (5 characters). "0_0_22_27_27_33_16" gives the index of each character: the first index refers to `provinces`, the second to `alphabets`, and the remaining five to `ads`. These three arrays are defined as follows. The last entry of each array is the letter O rather than the digit 0. O is used as a "no character" marker because the letter O never appears in Chinese license plate characters.
```
provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂", "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"]
alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
             'X', 'Y', 'Z', 'O']
ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
       'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O']
```

- **Brightness**: The brightness of the license plate region.

- **Blurriness**: The blurriness of the license plate region.
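To make the field layout concrete, the sketch below parses a file name into its seven fields and decodes the plate indices with the three arrays. It assumes the naming convention exactly as described above (seven fields separated by `-`, points separated by `_`, x and y separated by `&`). `parse_xy`, `parse_ccpd_filename` and `decode_plate` are hypothetical helper names, not part of CCPD or of this notebook, and the "O" placeholder is dropped when decoding.

```python
# Character tables as listed above (repeated so this sketch is self-contained).
provinces = ["皖", "沪", "津", "渝", "冀", "晋", "蒙", "辽", "吉", "黑", "苏", "浙", "京", "闽", "赣", "鲁", "豫", "鄂",
             "湘", "粤", "桂", "琼", "川", "贵", "云", "藏", "陕", "甘", "青", "宁", "新", "警", "学", "O"]
alphabets = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W',
             'X', 'Y', 'Z', 'O']
ads = ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
       'Y', 'Z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'O']


def parse_xy(point):
    """Turn "x&y" into an (x, y) tuple of ints (hypothetical helper)."""
    x, y = point.split("&")
    return int(x), int(y)


def parse_ccpd_filename(filename):
    """Split a CCPD image name into its seven annotation fields (hypothetical helper)."""
    stem = filename.rsplit(".", 1)[0]  # drop the ".jpg" extension
    area, tilt, bbox, vertices, plate, brightness, blurriness = stem.split("-")
    (xmin, ymin), (xmax, ymax) = [parse_xy(p) for p in bbox.split("_")]
    return {
        "area": int(area),  # raw value exactly as it appears in the name
        "tilt": tuple(int(t) for t in tilt.split("_")),  # (horizontal, vertical)
        "bbox": (xmin, ymin, xmax, ymax),
        "vertices": [parse_xy(p) for p in vertices.split("_")],  # starts at bottom-right
        "plate_indices": [int(i) for i in plate.split("_")],
        "brightness": int(brightness),
        "blurriness": int(blurriness),
    }


def decode_plate(indices):
    """Map the seven indices to characters, skipping the "O" (no character) marker."""
    chars = [provinces[indices[0]], alphabets[indices[1]]] + [ads[i] for i in indices[2:]]
    return "".join(c for c in chars if c != "O")


sample = "025-95_113-154&383_386&473-386&473_177&454_154&383_363&402-0_0_22_27_27_33_16-37-15.jpg"
fields = parse_ccpd_filename(sample)
print(fields["bbox"])                         # (154, 383, 386, 473)
print(decode_plate(fields["plate_indices"]))  # 皖AY339S
```

For the sample name, the indices 0, 0, 22, 27, 27, 33, 16 map to 皖, A, Y, 3, 3, 9 and S respectively.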

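```python
def load_dataset(directory, target_size):
    """Load every CCPD image in `directory`, resized to `target_size` (height, width).

    Returns the image array, the bounding boxes as [xmin, ymin, xmax, ymax] in the
    ORIGINAL image's pixel coordinates (they are not rescaled to target_size), and
    the plate numbers as index strings such as "0_0_22_27_27_33_16".
    """
    # sort for a reproducible order and skip anything that is not a .jpg image
    files = sorted(f for f in os.listdir(directory) if f.lower().endswith(".jpg"))

    images = []
    boxes = []
    plate_labels = []

    for filename in files:
        # the annotation fields are separated by "-" (see "Dataset Annotations" above)
        fields = filename.split("-")
        # field 2 looks like "154&383_386&473" -> [xmin, ymin, xmax, ymax]
        bounding_box = [int(coord) for point in fields[2].split("_") for coord in point.split("&")]
        # field 4 holds the per-character indices, e.g. "0_0_22_27_27_33_16"
        plate_number = fields[4]

        # load and resize the image, then convert it to a float32 array of shape (H, W, 3)
        image_path = os.path.join(directory, filename)
        img = image.load_img(image_path, target_size=target_size)
        img_arr = image.img_to_array(img)

        images.append(img_arr)
        boxes.append(bounding_box)
        plate_labels.append(plate_number)

    return np.array(images), np.array(boxes), np.array(plate_labels)
```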
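`load_dataset` resizes every image to `target_size`, but the bounding box parsed from the file name stays in the original image's pixel coordinates. If the boxes are meant to be regression targets for the resized images, they can be rescaled per axis, since a plain resize stretches width and height independently. The sketch below is one way to do this; `scale_boxes` is a hypothetical helper, and it assumes the original (width, height) is known, for example from `image.load_img(path).size` (a PIL `Image.size` is ordered (width, height)), while `target_size` follows `load_img`'s (height, width) order.

```python
import numpy as np


def scale_boxes(boxes, original_size, target_size):
    """Rescale [xmin, ymin, xmax, ymax] boxes from the original image to the resized one.

    original_size: (width, height) of the unresized image, e.g. PIL's Image.size.
    target_size:   (height, width), the same order as load_img's target_size.
    """
    boxes = np.asarray(boxes, dtype=np.float32)
    orig_w, orig_h = original_size
    new_h, new_w = target_size
    sx, sy = new_w / orig_w, new_h / orig_h  # independent scale per axis
    return boxes * np.array([sx, sy, sx, sy], dtype=np.float32)


# a box covering a whole 720 x 1160 image maps onto the whole 512 x 512 image
print(scale_boxes([[0, 0, 720, 1160]], original_size=(720, 1160), target_size=(512, 512)))
```

Dividing by the target width and height instead of multiplying by the scale factors would give coordinates in [0, 1], which is independent of the chosen `target_size`.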

```python
# directory for saving model checkpoints during training
checkpoint_dir = "checkpoints/"

base_dir = "dataset/"
train_dir = os.path.join(base_dir, "train")
val_dir = os.path.join(base_dir, "validation")
test_dir = os.path.join(base_dir, "test")

# every image is resized to 512 x 512 (height, width)
target_image_size = (512, 512)

# load data for each set
X_train, y_train_boxes, y_train_labels = load_dataset(train_dir, target_image_size)
X_val, y_val_boxes, y_val_labels = load_dataset(val_dir, target_image_size)
X_test, y_test_boxes, y_test_labels = load_dataset(test_dir, target_image_size)

print(X_train.shape, y_train_boxes.shape, y_train_labels.shape)
```

Output:

```
(700, 512, 512, 3) (700, 4) (700,)
```
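The plate labels come back as index strings (shape `(700,)` for the training set). For a recognition model it is often more convenient to have one integer target per character position. The sketch below assumes every label has exactly seven indices, as described in the annotation section; `encode_plate_labels`, `one_hot_positions` and `CLASSES_PER_POSITION` are hypothetical names, with the class counts taken from the lengths of `provinces` (34), `alphabets` (25) and `ads` (35).

```python
import numpy as np

# Classes per character position: province, letter, then five letters/digits
# (lengths of the provinces, alphabets and ads arrays, including the "O" marker).
CLASSES_PER_POSITION = [34, 25, 35, 35, 35, 35, 35]


def encode_plate_labels(labels):
    """Turn strings like "0_0_22_27_27_33_16" into an (N, 7) integer array.

    Assumes every label has exactly seven indices; ragged labels would fail here.
    """
    return np.array([[int(i) for i in label.split("_")] for label in labels], dtype=np.int64)


def one_hot_positions(encoded):
    """Split an (N, 7) index array into seven one-hot arrays, one per character position."""
    return [np.eye(n, dtype=np.float32)[encoded[:, pos]] for pos, n in enumerate(CLASSES_PER_POSITION)]


encoded = encode_plate_labels(np.array(["0_0_22_27_27_33_16"]))
print(encoded)  # [[ 0  0 22 27 27 33 16]]
print([arr.shape for arr in one_hot_positions(encoded)])
```

Applied to `y_train_labels`, `encode_plate_labels` should give an array of shape `(700, 7)`, and `one_hot_positions` turns it into seven targets that could feed seven softmax outputs. This is one possible target layout, not necessarily the one used for the recognition model.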
