The main objective of the problem is to extract the license plate number from a given image using deep-learning-based and classical algorithms.
- character-recog : used for training character recognition on segmented images.
- lp-recog : used to train end-to-end model and license plate recognition model.
- submission : contains files necessary only for prediction and inference.
The Readme in each directory explains how to use the files it contains.
Run the following code in the terminal to get started locally:
To run in virtual env
python3 -m venv lp_ocr
. lp_ocr/bin/activate
Install dependencies
cd submission
pip3 install -r requirements.txt
- Download the dataset and pre-trained weights from here
- Place the weights in the submission folder.
- Place the dataset_characters.zip file in the character-recog folder and the other 2 dataset zip files in the lp-recog folder.
Our approach is split into 3 sub-tasks, namely:
- Segmenting characters out from the license plate.
- Character recognition of segmented license plate.
- License plate detection from images and videos containing full images of cars.
- For images with poor contrast, such as number plates photographed in poor lighting conditions,
character segmentation and contour detection become much more difficult. To deal with this we
used a technique called CLAHE (Contrast Limited Adaptive Histogram Equalization).
clahe=cv2.createCLAHE(clipLimit=3., tileGridSize=(8,8))
- To deal with noisy images we used the cv2.fastNlMeansDenoisingColored function, which is an
implementation of the Non-local Means Denoising algorithm. This algorithm replaces the value of a
pixel with a weighted average of other pixels whose surrounding patches look similar.
image = cv2.fastNlMeansDenoisingColored(image ,None,10,10,7,21)
- To clean the plate further and extract more information, we took the Value channel of the image in
HSV format and applied adaptive thresholding to reveal the characters on the license plate. We then
applied dilation using the cv2.morphologyEx function to join the broken parts of characters.
thresh = cv2.morphologyEx(thresh, cv2.MORPH_DILATE, kernel)
- For contour detection after image processing, we passed RETR_TREE as an argument to the cv2.findContours
function and kept only those contours whose area is larger than 1/200th of the image area
and no larger than half the image area.
A further filter, requiring a height of at least 1/5th of the image height and a width of at most 1/5th of the image width, was added to remove the inherent noise in the given number plate. Cases like 'O' and 'D', where one contour lies inside another, were also handled.
contours,_=cv2.findContours(iwl_bb,cv2.RETR_TREE,cv2.CHAIN_APPROX_SIMPLE)
cd submission
python3 segmentation.py <path_to_image>
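The contour size rules described above can be sketched in plain Python. This is a minimal illustration, not the code in segmentation.py: it assumes each contour has already been reduced to an `(x, y, width, height)` bounding box (the shape returned by `cv2.boundingRect`), it uses bounding-box area as a stand-in for contour area, and it treats "one contour inside another" as simple box containment. The function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height)


def plausible_character(box: Box, img_w: int, img_h: int) -> bool:
    """Apply the area, height and width rules to one bounding box."""
    _, _, w, h = box
    area = w * h
    img_area = img_w * img_h
    if area <= img_area / 200:  # too small: speckle noise
        return False
    if area > img_area / 2:     # too large: plate border or background
        return False
    if h < img_h / 5:           # too short to be a character
        return False
    if w > img_w / 5:           # too wide to be a single character
        return False
    return True


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer` and is a different box."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return (outer != inner and ox <= ix and oy <= iy
            and ix + iw <= ox + ow and iy + ih <= oy + oh)


def filter_boxes(boxes: List[Box], img_w: int, img_h: int) -> List[Box]:
    """Keep plausible characters, drop nested boxes, sort left to right."""
    kept = [b for b in boxes if plausible_character(b, img_w, img_h)]
    outer_only = [b for b in kept if not any(contains(o, b) for o in kept)]
    return sorted(outer_only, key=lambda b: b[0])
```

Dropping boxes that sit inside another kept box is one way to avoid counting the inner hole of an 'O' or 'D' as a separate character. Sorting by `x` returns the characters in reading order for a single-row plate.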
- Dataset can be found in character-recog/dataset_characters.zip
- The dataset consists mainly of centered and vertically aligned images and does not reflect most real-world segmented license plate images, such as tilted and offset ones. We therefore add such variations as data augmentations on the existing images using the Albumentations library.
A.OneOf([
A.ShiftScaleRotate(always_apply=False, p=0.3, shift_limit=(0.0, 0.0), scale_limit=(0.0, 0.0), rotate_limit=(-10, 10), interpolation=0, border_mode=0, value=(0, 0, 0), mask_value=None),
A.ShiftScaleRotate(always_apply=False, p=0.3, shift_limit=(-0.1, 0.1), scale_limit=(0.0, 0.0), rotate_limit=(0, 0), interpolation=0, border_mode=0, value=(0, 0, 0), mask_value=None),
A.ShiftScaleRotate(always_apply=False, p=0.3, shift_limit=(0.0, 0.0), scale_limit=(-0.1, 0.1), rotate_limit=(0, 0), interpolation=0, border_mode=0, value=(0, 0, 0), mask_value=None),
A.ShiftScaleRotate(always_apply=False, p=0.1, shift_limit=(-0.1, 0.1), scale_limit=(-0.1, 0.1), rotate_limit=(-10, 10), interpolation=0, border_mode=0, value=(0, 0, 0), mask_value=None),
],p=1.0),
- We implemented our deep learning models in PyTorch, as it is the growing industry and academia standard. We initially experimented with custom models built from a few blocks of convolution, batch norm and max pooling, ending in fully connected layers. The results were not satisfactory, so we used ImageNet-pretrained models such as ResNets and EfficientNets. The table below summarises the results we obtained after a reasonable training period for each model.
Model | Pretrained on | Accuracy (val) |
---|---|---|
akbhd | - | 88.75 |
vatch | - | 85.31 |
drklrd | - | 92.64 |
ResNet18 | ImageNet | 98.67 |
ResNet34 | ImageNet | 98.72 |
ResNet50 | ImageNet | 98.53 |
EfficientNet-b0 | ImageNet | 98.63 |
EfficientNet-b1 | ImageNet | 98.67 |
- Ensembling models: After training a sufficient number of models we ensembled combinations of them, but found no improvement on our dataset or on the segmented images obtained from a few real-world license plate images. The best ensemble gave the same accuracy as a single ResNet34 (98.72) but with increased latency.
Ensemble Combo | Accuracy (val) |
---|---|
ResNet34*3 (3 different trained models) | 98.72 |
ResNet18+ResNet34+ResNet50 | 98.61 |
ResNet34+ResNet50+EfficientNet-b1 | 98.66 |
ResNet34+EfficientNet-b0+EfficientNet-b1 | 98.7 |
ResNet18+EfficientNet-b0+EfficientNet-b1 | 98.59 |
ResNet50+EfficientNet-b0+EfficientNet-b1 | 98.59 |
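The text does not say how the model outputs were combined, so the following is only a sketch of two common ensembling rules, assuming each model outputs a probability per class for one character image. Soft voting averages the probabilities; hard voting takes a majority over each model's top class. All names here are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def average_probabilities(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean of each model's class probabilities."""
    n_models = len(per_model)
    n_classes = len(per_model[0])
    return [sum(p[c] for p in per_model) / n_models for c in range(n_classes)]


def soft_vote(per_model: Sequence[Sequence[float]]) -> int:
    """Index of the class with the highest averaged probability."""
    avg = average_probabilities(per_model)
    return max(range(len(avg)), key=avg.__getitem__)


def hard_vote(per_model: Sequence[Sequence[float]]) -> int:
    """Index of the class that most models rank first."""
    votes = [max(range(len(p)), key=p.__getitem__) for p in per_model]
    return Counter(votes).most_common(1)[0][0]
```

The two rules can disagree when one model is much more confident than the others, which is one reason an ensemble of similar models may not beat its best member. Every member still has to run at inference time, which matches the increased latency noted above.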
- We also experimented with Tesseract OCR, which is a pretrained text recognition model.
- Tesseract failed to recognise text when images were blurry and rotated.
- We faced several difficulties installing the Python version of the package, and even once installed it would break frequently. Hence adequate benchmarking of the model could not be performed, though it performed reasonably well.
- Prediction
Run the following commands in the terminal to predict on an image
cd submission
python3 predict.py --folder <path_to_folder> --image <path_to_image>
- Training
The Readme in character-recog folder describes the purpose of various files employed.
To start training run the following command.
cd character-recog
python3 train.py
Note: Please change the necessary paths to dataset and checkpoints
- Our model is far from perfect. It struggles to make accurate predictions on segmented images, as the characters contain a lot of noise and are only roughly in the shape of a character. For example: 6 being predicted as G.
- The model always fails when the license plate is tilted and the segmented images contain adjacent characters.
- Increasing the dataset size and including actual segmented images.
- Or increasing the number of augmentations to cover most real-world scenarios.
- Better ensembling techniques.
- Use an end-to-end model to train on both prediction and segmentation.
- We used yolov5 object detection algorithms to perform license plate detection on images and videos.
- The dataset can be found in lp-recog/license_plate_yolov5pytorch.zip
- We used the Roboflow.ai web application to annotate, pre-process, split and change the label format of the datasets. The data was sourced from several sites in Pascal VOC format (.xml files) and needed to be converted to the yolov5-specific format. We converted the labels into yolov5 format and resized the images to 640x640 pixels.
- We trained the yolov5s for license plate detection. The results can be seen below
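The label conversion mentioned above was done inside Roboflow, so the following is only an illustration of what a Pascal VOC to yolov5 conversion computes: corner coordinates in pixels become a class index followed by a box centre and size normalised by the image dimensions. The function name and the class mapping are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def voc_to_yolo(xml_text: str, class_ids: Dict[str, int]) -> List[str]:
    """Convert one Pascal VOC annotation into yolov5 label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.iter("object"):
        cls = class_ids[obj.find("name").text]
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (
            float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")
        )
        # yolov5 expects: class x_center y_center width height, all in [0, 1]
        x_center = (xmin + xmax) / 2 / img_w
        y_center = (ymin + ymax) / 2 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        lines.append(f"{cls} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}")
    return lines
```

Because the coordinates are normalised, the same label file stays valid after the image is resized to 640x640, provided the resize stretches the image rather than cropping or padding it.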
- Prediction
Run the following commands in the terminal to predict on an image
cd submission
python3 detect.py --source <path_to_folder/image> --weights best.pt --device 'cpu' --save-txt
- Training
The Readme in lp-recog folder describes the purpose of various files employed.
To start training run the following command.
cd lp-recog
sh setup_yolov5.sh
sh train_yolov5.sh
Note: Please change the necessary paths to dataset and checkpoints
- Our model detected all the license plates it was tested on and ran at around 3 fps. Below are some of the results.
We trained an end-to-end object detection model that both segments out the characters and classifies them. The major advantage of doing so is that end-to-end models generally perform better than independently trained models because:
- The parameters are shared between classification and segmentation hence the loss helps improve both the classification and segmentation.
- It reduces points of failure and is easier to tune.
- Models trained independently may not perform as expected, as they are trained on data similar to the output of the previous model but not on the exact outputs.
- Dataset path lp-recog/characters-np_yolov5pytorch.zip
- The dataset was created using Roboflow.ai and converted into yolov5 format. However, the dataset lacks variation and does not reflect real-world images, so we added a few augmentations within the Roboflow web app.
- We trained yolov5s and yolov5l models. Both gave similar results, but yolov5l achieved higher recall and mAP 0.5:0.95.
Model | Precision (val) | Recall (val) | mAP (val) 0.5:0.95 | mAP (val) 0.5 |
---|---|---|---|---|
Yolov5L | 0.9788 | 0.995 | 0.7774 | 0.9951 |
Yolov5S | 0.9874 | 0.9678 | 0.684 | 0.9951 |
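The two mAP columns differ only in how strictly a predicted box must overlap the ground truth. mAP 0.5 counts a detection as correct at an intersection-over-union (IoU) of at least 0.5, while mAP 0.5:0.95 averages over IoU cutoffs from 0.5 to 0.95 in steps of 0.05. The sketch below is not the yolov5 evaluation code; it only shows, with hypothetical helper names, how a slightly offset box passes the loose cutoffs and fails the strict ones.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) corner format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def thresholds_passed(pred: Box, truth: Box) -> List[float]:
    """IoU cutoffs in 0.50, 0.55, ..., 0.95 at which `pred` counts as a match."""
    cutoffs = [t / 100 for t in range(50, 100, 5)]
    score = iou(pred, truth)
    return [c for c in cutoffs if score >= c]
```

A 100x100 box shifted by 10 pixels in both directions has an IoU of about 0.68, so it matches at only 4 of the 10 cutoffs. This is why both models can share the same mAP 0.5 while the model with tighter boxes scores higher on mAP 0.5:0.95.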
- Prediction
Run the following commands in the terminal to predict on an image.
cd submission
python3 detect.py --source <path_to_folder/image> --weights end_to_end.pt --device 'cpu' --save-txt
- Training
The Readme in lp-recog folder describes the purpose of various files employed.
Change the path of TRAIN_YAML file in train_yolov5.sh to the yaml file in the dataset and run the following command.
cd lp-recog
sh setup_yolov5.sh
sh train_yolov5.sh
Note: Please change the necessary paths to checkpoints
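With `--save-txt`, detect.py writes one line per detection in the form `class x_center y_center width height`, with coordinates normalised to the image size. A minimal sketch of turning such a file into a plate string follows. It assumes a single-row plate, so sorting by `x_center` gives reading order, and it assumes a class list ordered as digits followed by letters. The real order comes from the dataset's yaml file, so `CLASS_NAMES` here is hypothetical.

```python
from typing import Sequence

# Hypothetical class order; replace with the `names` list from the dataset yaml.
CLASS_NAMES = list("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def plate_from_labels(label_text: str, class_names: Sequence[str] = CLASS_NAMES) -> str:
    """Assemble a plate string from the contents of a yolov5 label file."""
    detections = []
    for line in label_text.splitlines():
        parts = line.split()
        if len(parts) < 5:  # skip blank or malformed lines
            continue
        class_id = int(parts[0])
        x_center = float(parts[1])
        detections.append((x_center, class_names[class_id]))
    detections.sort()  # left to right
    return "".join(char for _, char in detections)
```

Two-row plates would need the detections grouped by `y_center` first, which this sketch does not attempt.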
- The dataset being very small and not reflecting most of the possible variations causes the model to miss several characters and misclassify certain images.
- Our approach of splitting the task into separate recognition, segmentation and classification models is not robust and can easily be beaten by an end-to-end model.
- Hence we wish to train an end-to-end model as done in subsection 4 but with the following changes:
- Increase the dataset size to reflect real world images.
- Or increase the number of augmentations to cover most of the real world images.
- Use different algorithms such as instance segmentation, as they are more robust and can predict more accurately due to pixel-wise classification, as opposed to a bounding box which may enclose adjacent characters.
- Our license plate detection model has good accuracy on real-world data. We could also try to integrate it into the end-to-end model, making a complete end-to-end pipeline from recognition to segmentation to classification.