# Meter Reader with PaddleOCR
This notebook shows how to create a meter reader with OpenVINO Runtime. We use the pre-trained [PP-OCR](https://github.com/PaddlePaddle/PaddleOCR) model to build an inference pipeline:

1. Configure the screen area of the meter reader.
2. Configure the layout information of the meter reader.
3. Pre-process the image based on the given information.
4. Perform OCR recognition.
5. Structure output information.

<img align='center' src= "https://user-images.githubusercontent.com/83450930/236680983-f23e8728-c7f9-4460-8794-44fac360a4ac.png" alt="drawing" width="1500"/>

In some cases, the screen area in the image is not in a fixed position. A detection model can be used to dynamically provide the screen area information. Please see [PaddleDetection](https://github.com/PaddlePaddle/PaddleDetection) for more details.

## Imports

In [None]:
!pip install -q "pyclipper>=1.2.1" "shapely>=1.7.1"

In [None]:
import os
import cv2
import numpy as np
import sys
import math
from pathlib import Path
import tarfile
import requests
import copy

from openvino.runtime import Core

sys.path.append("../utils")
import notebook_utils as utils
import processing as processing

## PaddleOCR with OpenVINO™

[PaddleOCR](https://github.com/PaddlePaddle/PaddleOCR) is an ultra-light OCR model trained with the PaddlePaddle deep learning framework. It aims to provide multilingual, practical OCR tools.

The PaddleOCR pre-trained model used in the demo is the *"Chinese and English ultra-lightweight PP-OCR model (9.4M)"*. More open source pre-trained models can be downloaded at [PaddleOCR Github](https://github.com/PaddlePaddle/PaddleOCR) or [PaddleOCR Gitee](https://gitee.com/paddlepaddle/PaddleOCR).

A standard PaddleOCR pipeline includes two deep learning models: text detection and text recognition. This notebook needs only the text recognition model. To run it, we first initialize the runtime for inference, then read the network architecture and model weights from the `.pdmodel` and `.pdiparams` files and load them to the CPU.

More details for running PaddleOCR with OpenVINO™ are shown in [405-paddle-ocr-webcam](../405-paddle-ocr-webcam/405-paddle-ocr-webcam.ipynb).

### Download the Model for Text **Recognition**

The pre-trained models used in the demo are downloaded and stored in the "model" folder.

In [None]:
# Define the function to download the text recognition model from PaddleOCR resources.

def run_model_download(model_url, model_file_path):
    """
    Download pre-trained models from PaddleOCR resources

    Parameters:
        model_url: url link to pre-trained models
        model_file_path: file path to store the downloaded model
    """
    model_name = model_url.split("/")[-1]
    
    if model_file_path.is_file(): 
        print("Model already exists")
    else:
        # Download the model from the server, and untar it.
        print("Downloading the pre-trained model... May take a while...")

        # Create a directory.
        os.makedirs("model", exist_ok=True)
        response = requests.get(model_url)
        # Fail early on HTTP errors instead of writing an error page to disk.
        response.raise_for_status()
        with open(f"model/{model_name}", "wb") as model_tar_file:
            model_tar_file.write(response.content)
        print("Model Downloaded")

        # `extractall` returns None, so failures are detected through exceptions.
        try:
            with tarfile.open(f"model/{model_name}") as file:
                file.extractall("model")
            print(f"Model Extracted to {model_file_path}.")
        except tarfile.TarError:
            print("Error Extracting the model. Please check the network.")
            
rec_model_url = "https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar"
rec_model_file_path = Path("model/ch_PP-OCRv3_rec_infer/inference.pdmodel")

run_model_download(rec_model_url, rec_model_file_path)

### Load the Model for Text **Recognition** with Dynamic Shape

The input to the text recognition model consists of cropped text boxes with different image sizes, that is, the input shape is dynamic. Hence:

1. The dynamic input dimensions need to be specified before loading the text recognition model.
2. A dynamic dimension is specified by assigning -1 to it, or by setting its bounds using, for example, `Dimension(1, 512)`.
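
To see why the width dimension has to be dynamic, the sketch below computes the recognition input shape for two boxes taken from `DESIGN_LAYOUT` in the Configuration section. It assumes the same width rule as the `resize_norm_img` helper defined later in this notebook; `input_shape_for_box` is a hypothetical name used only for illustration.

```python
REC_IMG_H = 48  # input height used by `resize_norm_img` in this notebook


def input_shape_for_box(box, batch_size=1):
    """Return the (N, C, H, W) recognition input shape for one layout box."""
    xmin, ymin, xmax, ymax = box
    wh_ratio = (xmax - xmin) / (ymax - ymin)
    # Same width rule as `resize_norm_img`: the width scales with the aspect ratio.
    img_w = int(32 * wh_ratio)
    return (batch_size, 3, REC_IMG_H, img_w)


# Two boxes copied from DESIGN_LAYOUT.
freq_shape = input_shape_for_box([5, 290, 544, 406])     # 'Freq_Set'
total_shape = input_shape_for_box([52, 419, 1256, 741])  # 'Val_Total'

print(freq_shape)   # (1, 3, 48, 148)
print(total_shape)  # (1, 3, 48, 119)
```

Only the last dimension differs between the two inputs, which is why the loading code sets `input_shape[3] = -1`.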

>Note: The text recognition model has a dynamic input shape, and the current OpenVINO 2022.2 release does not support dynamic shapes on iGPU, so you cannot directly switch the device to iGPU for inference in this case. To run on iGPU, first resize the input images for this model to a fixed size.

In [None]:
# Initialize OpenVINO Runtime for text recognition.
core = Core()

# Read the model and corresponding weights from a file.
rec_model = core.read_model(model=rec_model_file_path)

# Assign dynamic shapes to every input layer on the last dimension.
for input_layer in rec_model.inputs:
    input_shape = input_layer.partial_shape
    input_shape[3] = -1
    rec_model.reshape({input_layer: input_shape})

rec_compiled_model = core.compile_model(model=rec_model, device_name="CPU")

# Get input and output nodes.
rec_input_layer = rec_compiled_model.input(0)
rec_output_layer = rec_compiled_model.output(0)

## Configuration and Helper functions

To structure the output, we first configure some parameters (for example, the coordinates of the corners of the screen).

Then, use the following helper functions for preprocessing and postprocessing frames:

1. Preprocessing the input image: use a perspective transformation to normalize skewed images.
2. Preprocessing for text recognition: resize and normalize the cropped box images to a common size (in this notebook, height 48 and a width derived from the largest aspect ratio in the batch) for easy batching in inference.
3. Postprocessing for structured output: correct some recognition errors.

### Configuration

Four items need to be configured:
1. POINTS: the coordinates of the corners of the screen
2. DESIGN_SHAPE: the original shape of the screen
3. DESIGN_LAYOUT: the elements and their positions on the screen, expressed in DESIGN_SHAPE coordinates
4. RESULT_TEMP: a template of the output, which is a dictionary. The keys in the `dict` are the output features.
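
Because these items must agree with each other, a quick consistency check can catch typos before running inference. The following is a minimal sketch, not part of the original notebook: `check_config` is a hypothetical helper, and the values are copied from the configuration cell below.

```python
DESIGN_SHAPE = (1300, 1000)  # (width, height)

RESULT_TEMP = {"Info_Probe": "探头:---", "Freq_Set": "", "Val_Total": "无探头",
               "Val_X": "", "Val_Y": "", "Val_Z": "", "Unit": "A/m", "Field": "常规"}

DESIGN_LAYOUT = {"Info_Probe": [14, 36, 410, 135],
                 "Freq_Set": [5, 290, 544, 406],
                 "Val_Total": [52, 419, 1256, 741],
                 "Val_X": [19, 774, 433, 882],
                 "Val_Y": [433, 773, 874, 884],
                 "Val_Z": [873, 773, 1276, 883],
                 "Unit": [1064, 291, 1295, 403],
                 "Field": [5, 913, 243, 998]}


def check_config(design_shape, design_layout, result_temp):
    """Return a list of problems; an empty list means the configuration is consistent."""
    problems = []
    width, height = design_shape
    # Every feature in the layout should have a slot in the output template.
    if set(design_layout) != set(result_temp):
        problems.append("DESIGN_LAYOUT and RESULT_TEMP have different keys")
    # Every box should be non-empty and lie inside the design shape.
    for name, (xmin, ymin, xmax, ymax) in design_layout.items():
        if not (0 <= xmin < xmax <= width and 0 <= ymin < ymax <= height):
            problems.append(f"{name}: box does not fit inside {design_shape}")
    return problems


print(check_config(DESIGN_SHAPE, DESIGN_LAYOUT, RESULT_TEMP))  # []
```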

In the example image, there are 8 features to output:
1. Info_Probe: Probe information of the power frequency field strength meter. "探头:---" means there is no information about the probe.
2. Freq_Set: The working frequency.
3. Val_Total: The measured value. "无探头" means there is no information about the probe; otherwise, it should be a float value.
4. Val_X: Value from the x axis.
5. Val_Y: Value from the y axis.
6. Val_Z: Value from the z axis.
7. Unit
8. Field: One of conventional, electric field, or magnetic field. The Chinese word "电场" means electric field.
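
Since `Val_Total` holds either a float reading or the "无探头" placeholder, downstream code may want to convert it to a number. The sketch below shows one way to do that. `parse_reading` is a hypothetical helper that is not part of this notebook, and the input strings are made up for illustration.

```python
def parse_reading(text):
    """Return the reading as a float, or None when no numeric value is shown."""
    try:
        return float(text.strip())
    except ValueError:
        # Placeholders such as "无探头" (no probe) are not numbers.
        return None


print(parse_reading(" 12.5 "))  # 12.5
print(parse_reading("无探头"))   # None
```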

<img align='center' src= "https://user-images.githubusercontent.com/83450930/236680146-5751e291-d509-4d71-a2cb-bfbf35609051.jpg" alt="drawing" width="400"/>

In [None]:
# The coordinates of the corners of the screen
POINTS = [[1121, 56],    # left top
          [3242, 183],   # right top
          [3040, 1841],  # right bottom
          [1000, 1543]]  # left bottom

# The size of the screen as (width, height)
DESIGN_SHAPE = (1300, 1000)

# Output template
RESULT_TEMP = {"Info_Probe":"探头:---", 
               "Freq_Set":"", 
               "Val_Total":"无探头", 
               "Val_X":"", 
               "Val_Y":"", 
               "Val_Z":"", 
               "Unit":"A/m", 
               "Field":"常规"}

# Features and their layout information, given as feature_name: [xmin, ymin, xmax, ymax]
DESIGN_LAYOUT = {'Info_Probe':[14, 36, 410, 135],
                 'Freq_Set':[5, 290, 544, 406], 
                 'Val_Total':[52, 419, 1256, 741], 
                 'Val_X':[19, 774, 433, 882], 
                 'Val_Y':[433, 773, 874, 884], 
                 'Val_Z':[873, 773, 1276, 883], 
                 'Unit':[1064, 291, 1295, 403], 
                 'Field':[5, 913, 243, 998]}

### Preprocessing the input image

Use a perspective transformation to normalize skewed images.
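
To illustrate what the corner-to-corner mapping does, the sketch below solves for the eight free coefficients of a 3x3 perspective matrix in plain Python and checks that the four screen corners land on the corners of the design shape. It is an illustration only, not OpenCV's implementation, and `solve_linear`, `perspective_matrix`, and `map_point` are hypothetical names.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def perspective_matrix(src, dst):
    """Return [h11, h12, h13, h21, h22, h23, h31, h32] with h33 fixed to 1."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    return solve_linear(rows, rhs)


def map_point(h, point):
    """Apply the perspective mapping to one (x, y) point."""
    x, y = point
    w = h[6] * x + h[7] * y + 1.0
    return ((h[0] * x + h[1] * y + h[2]) / w,
            (h[3] * x + h[4] * y + h[5]) / w)


# Screen corners and design shape copied from the Configuration section.
POINTS = [[1121, 56], [3242, 183], [3040, 1841], [1000, 1543]]
target_w, target_h = 1300, 1000
corners = [[0, 0], [target_w, 0], [target_w, target_h], [0, target_h]]

h = perspective_matrix(POINTS, corners)
mapped = [map_point(h, p) for p in POINTS]
for source, target in zip(POINTS, mapped):
    print(source, "->", [round(value, 3) for value in target])
```

Each printed target should match the corresponding corner of the design shape up to rounding; every other pixel of the screen is mapped by the same matrix.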

In [None]:
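def pre_processing(img, point_list, target_shape):
    """
    Warp the screen area to a front-facing view with a perspective transformation

    Parameters:
        img: input image
        point_list: coordinates of the screen corners (left top, right top, right bottom, left bottom)
        target_shape: (width, height) of the design shape
    """
    target_w, target_h = target_shape
    pts1 = np.float32(point_list)
    pts2 = np.float32([[0, 0], [target_w, 0], [target_w, target_h], [0, target_h]])

    # Compute the 3x3 perspective matrix and warp the image to the target size.
    M = cv2.getPerspectiveTransform(pts1, pts2)
    img2 = cv2.warpPerspective(img, M, (target_w, target_h))
    return img2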

### Preprocessing Image Functions for Text Recognition

In [None]:
# Preprocess for text recognition.
def resize_norm_img(img, max_wh_ratio):
    """
    Resize input image for text recognition

    Parameters:
        img: bounding box image from text detection 
        max_wh_ratio: value for the resizing for text recognition model
    """
    rec_image_shape = [3, 48, 320]
    imgC, imgH, imgW = rec_image_shape
    assert imgC == img.shape[2]
    character_type = "ch"
    if character_type == "ch":
        imgW = int((32 * max_wh_ratio))
    h, w = img.shape[:2]
    ratio = w / float(h)
    if math.ceil(imgH * ratio) > imgW:
        resized_w = imgW
    else:
        resized_w = int(math.ceil(imgH * ratio))
    resized_image = cv2.resize(img, (resized_w, imgH))
    resized_image = resized_image.astype('float32')
    resized_image = resized_image.transpose((2, 0, 1)) / 255
    resized_image -= 0.5
    resized_image /= 0.5
    padding_im = np.zeros((imgC, imgH, imgW), dtype=np.float32)
    padding_im[:, :, 0:resized_w] = resized_image
    return padding_im


def prep_for_rec(dt_boxes, frame):
    """
    Preprocessing of the detected bounding boxes for text recognition

    Parameters:
        dt_boxes: detected bounding boxes from text detection 
        frame: original input frame 
    """
    ori_im = frame.copy()
    img_crop_list = [] 
    for bno in range(len(dt_boxes)):
        tmp_box = copy.deepcopy(dt_boxes[bno])
        img_crop = processing.get_rotate_crop_image(ori_im, tmp_box)
        img_crop_list.append(img_crop)
        
    img_num = len(img_crop_list)
    # Calculate the aspect ratio of all text bars.
    width_list = []
    for img in img_crop_list:
        width_list.append(img.shape[1] / float(img.shape[0]))
    
    # Sorting can speed up the recognition process.
    indices = np.argsort(np.array(width_list))
    return img_crop_list, img_num, indices


def batch_text_box(img_crop_list, img_num, indices, beg_img_no, batch_num):
    """
    Batch for text recognition

    Parameters:
        img_crop_list: processed detected bounding box images 
        img_num: number of bounding boxes from text detection
        indices: sorting for bounding boxes to speed up text recognition
        beg_img_no: the beginning number of bounding boxes for each batch of text recognition inference
        batch_num: number of images for each batch
    """
    norm_img_batch = []
    max_wh_ratio = 0
    end_img_no = min(img_num, beg_img_no + batch_num)
    for ino in range(beg_img_no, end_img_no):
        h, w = img_crop_list[indices[ino]].shape[0:2]
        wh_ratio = w * 1.0 / h
        max_wh_ratio = max(max_wh_ratio, wh_ratio)
    for ino in range(beg_img_no, end_img_no):
        norm_img = resize_norm_img(img_crop_list[indices[ino]], max_wh_ratio)
        norm_img = norm_img[np.newaxis, :]
        norm_img_batch.append(norm_img)

    norm_img_batch = np.concatenate(norm_img_batch)
    norm_img_batch = norm_img_batch.copy()
    return norm_img_batch

### Post-processing

Post-processing is highly application-specific. The image to be recognized may be blurry or contain halos, so the recognition result may contain errors that can be corrected through post-processing.
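
As one illustration of such a correction, the sketch below normalizes a raw unit string. `normalize_unit` is a hypothetical helper and the input strings are made up. Each substring is tested explicitly, because an expression such as `'kV' or 'kv' in text` is always truthy in Python and would hide the remaining branches.

```python
def normalize_unit(text):
    """Map a raw OCR string to one of the four units used in this example."""
    if "T" in text:
        return "μT"
    if "kV" in text or "kv" in text:
        return "kV/m"
    if "V" in text or "v" in text:
        return "V/m"
    return "A/m"


print(normalize_unit("uT"))    # μT
print(normalize_unit("kVm"))   # kV/m
print(normalize_unit("v/m"))   # V/m
print(normalize_unit("A/m"))   # A/m

# The pitfall: a non-empty string literal is truthy on its own.
print(bool("kV" or "kv" in "A/m"))  # True
```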

The `post_processing` function below fixes some issues that may occur with the example image.

In [None]:
def post_processing(results):
    # `LF` is usually recognized correctly, but the rest of the probe name may be misidentified.
    if 'LF' in results['Info_Probe']:
        results['Info_Probe'] = "探头:LF-01"

    # The target information is the frequency, so the suffix `实时值` (real-time value) is not needed.
    # Simply drop `实时值` and everything after it.
    results['Freq_Set'] = results['Freq_Set'].split('实时值')[0]

    # The target information is the value, so the axis prefix is not needed.
    results['Val_X'] = results['Val_X'].replace("X", "").replace(":", "")
    results['Val_Y'] = results['Val_Y'].replace("Y", "").replace(":", "")
    results['Val_Z'] = results['Val_Z'].replace("Z", "").replace(":", "")

    # `μ` is easily recognized as `u`, and `/` is easily missed.
    unit = results['Unit']
    if 'T' in unit:
        results['Unit'] = "μT"
    elif 'kV' in unit or 'kv' in unit:
        results['Unit'] = "kV/m"
    elif 'v' in unit or 'V' in unit:
        results['Unit'] = "V/m"
    else:
        results['Unit'] = "A/m"

    return results

## Main Function

In [None]:
# Download images
IMG_URL = "https://user-images.githubusercontent.com/83450930/236680146-5751e291-d509-4d71-a2cb-bfbf35609051.jpg"
IMG_FILE_NAME = IMG_URL.split("/")[-1]
utils.download_file(IMG_URL, show_progress=False)

In [None]:
# Read images
img = cv2.imread(IMG_FILE_NAME)

# Apply a perspective transformation to normalize the skewed image.
img = pre_processing(img, POINTS, DESIGN_SHAPE)

# Copy the structured output template.
struct_result = copy.deepcopy(RESULT_TEMP)

# structure recognition begins here
for key in DESIGN_LAYOUT.keys():
    # Crop the image according to the layout information.
    xmin, ymin, xmax, ymax = DESIGN_LAYOUT[key]
    cut_img = img[ymin:ymax, xmin:xmax]
    
    h = ymax - ymin # height of cut_img
    w = xmax - xmin # width of cut_img
    dt_boxes = [np.array([[0,0],[w,0],[w,h],[0,h]],dtype='float32')]
    batch_num = 1
    
    # The image is already cropped to a single field, so no detection model is needed to locate the text.
    # Preprocess the cropped box for recognition.
    img_crop_list, img_num, indices = prep_for_rec(dt_boxes, cut_img)

    # For storing recognition results, include two parts:
    # txts are the recognized text results, scores are the recognition confidence level. 
    rec_res = [['', 0.0]] * img_num
    txts = [] 
    scores = []

    for beg_img_no in range(0, img_num):

        # Recognition starts from here.
        norm_img_batch = batch_text_box(
            img_crop_list, img_num, indices, beg_img_no, batch_num)

        # Run inference for text recognition. 
        rec_results = rec_compiled_model([norm_img_batch])[rec_output_layer]

        # Postprocessing recognition results.
        postprocess_op = processing.build_post_process(processing.postprocess_params)
        rec_result = postprocess_op(rec_results)
        for rno in range(len(rec_result)):
            rec_res[indices[beg_img_no + rno]] = rec_result[rno]   
        if rec_res:
            txts = [rec_res[i][0] for i in range(len(rec_res))] 
            scores = [rec_res[i][1] for i in range(len(rec_res))]
    
    # record the recognition result
    struct_result[key] = txts[0]
    
# Post-processing: fix some errors made in recognition.
post_processing(struct_result)

# Print result
print(struct_result)

## Try it with your meter photos!

For your own photos, you only need to modify the `Configuration` section and the `post_processing` function, then rerun the cells above.