# Handwritten Chinese and Japanese OCR

This tutorial demonstrates optical character recognition (OCR) for handwritten Chinese (simplified) and Japanese text. OCR for the Roman alphabet is covered in [notebook 208](../208-optical-character-recognition). The models used here can recognize only one line of symbols at a time.

The models used in this notebook are [handwritten-japanese-recognition](https://docs.openvinotoolkit.org/latest/omz_models_model_handwritten_japanese_recognition_0001.html) and [handwritten-simplified-chinese](https://docs.openvinotoolkit.org/latest/omz_models_model_handwritten_simplified_chinese_recognition_0001.html). To decode the model output into readable text, the [kondate_nakayosi](https://github.com/openvinotoolkit/open_model_zoo/blob/master/data/dataset_classes/kondate_nakayosi.txt) and [scut_ept](https://github.com/openvinotoolkit/open_model_zoo/blob/master/data/dataset_classes/scut_ept.txt) charlists are used. Both models are from [Open Model Zoo](https://github.com/openvinotoolkit/open_model_zoo/).

## Imports

In [None]:
import cv2
import matplotlib.pyplot as plt
import numpy as np
from collections import namedtuple
from itertools import groupby
from openvino.inference_engine import IECore
from os import path

## Settings

Set up all constants and folders used in this notebook.

In [None]:
# Directories where data will be placed
model_folder = "model"
data_folder = "data"
charlist_folder = f"{data_folder}/charlists"

# Precision used by model
precision = "FP16"

To group the files that belong to each language, define a collection. Here, a `namedtuple` is used.

In [None]:
Language = namedtuple(typename='Language', field_names=['model_name', 'charlist_name', 'demo_image_name'])
chinese_files = Language(model_name="handwritten-simplified-chinese-recognition-0001", charlist_name="chinese_charlist.txt", demo_image_name="handwritten_chinese_test.jpg")
japanese_files = Language(model_name="handwritten-japanese-recognition-0001", charlist_name="japanese_charlist.txt", demo_image_name="handwritten_japanese_test.png")

## Select Language

Select the language by changing one line of code in the cell below.

For Japanese OCR, set ```language = 'japanese'```. For Chinese OCR, set ```language = 'chinese'```.

In [None]:
# Select language by using either language='chinese' or language='japanese'
language = 'japanese'

languages = {
        "chinese": chinese_files,
        "japanese": japanese_files
}

selected_language = languages.get(language)
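
Note that `languages.get(language)` returns `None` for an unrecognized key, so a typo in `language` would only surface later as an `AttributeError`. A minimal sketch of an earlier check follows. The helper `select_language` and the shortened `Language` tuple are hypothetical and not part of the notebook.

```python
from collections import namedtuple

# Shortened stand-in for the notebook's Language tuple (assumption: one field is enough here)
Language = namedtuple("Language", ["model_name"])

languages = {
    "chinese": Language(model_name="handwritten-simplified-chinese-recognition-0001"),
    "japanese": Language(model_name="handwritten-japanese-recognition-0001"),
}


def select_language(name, available):
    """Return the entry for `name`, or raise a descriptive error (hypothetical helper)."""
    try:
        return available[name]
    except KeyError:
        options = ", ".join(sorted(available))
        raise ValueError(f"Unknown language {name!r}; expected one of: {options}") from None


selected = select_language("japanese", languages)
print(selected.model_name)
```

Failing at selection time keeps the error close to the line that caused it.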

## Download Model

The images and charlists are already available, so only the model file is missing. The cell below downloads the model for the selected language.

On the first run, the model is downloaded, which might take up to ten minutes.

This notebook uses `omz_downloader`, a command-line tool from the `openvino-dev` package. It automatically creates a directory structure and downloads the selected model.

In [None]:
path_to_model = f'{model_folder}/intel/{selected_language.model_name}/{precision}/{selected_language.model_name}.bin'
if not path.isfile(path_to_model):
    download_command = f'omz_downloader --name {selected_language.model_name} --output_dir {model_folder} --precision {precision}'
    print(download_command)
    ! $download_command
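
The check above relies on the directory layout used in this notebook: `{model_folder}/intel/{model_name}/{precision}/`. A small sketch of building both model file paths with `pathlib` follows. `model_files` is a hypothetical helper, and the layout is taken from the paths used in this notebook rather than from the `omz_downloader` documentation.

```python
from pathlib import Path


def model_files(model_folder, model_name, precision):
    """Return the expected (.xml, .bin) paths for a downloaded model (hypothetical helper)."""
    base = Path(model_folder) / "intel" / model_name / precision
    return base / f"{model_name}.xml", base / f"{model_name}.bin"


xml_path, bin_path = model_files("model", "handwritten-japanese-recognition-0001", "FP16")

# Download only if either of the two files is missing
needs_download = not (xml_path.is_file() and bin_path.is_file())
print(xml_path.as_posix(), needs_download)
```

Checking both files guards against a download that was interrupted after only one of them was written.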

## Load Network and Execute

When all files are downloaded and the language is selected, read and load the network to run inference. The path to the model is defined based on the selected language.

In [None]:
ie = IECore()

path_to_model = f"{model_folder}/intel/{selected_language.model_name}/{precision}/{selected_language.model_name}.xml"

net = ie.read_network(
    model=path_to_model
)

### Select Device Name

You can run the network on several kinds of devices. By default, this notebook loads the model on the CPU. You can manually choose another device (CPU, GPU, MYRIAD, etc.) or let the engine choose the best available device (AUTO).

To list all available devices, uncomment and run the line ```print(ie.available_devices)```.

In [None]:
# To check available device names run line below
# print(ie.available_devices)

exec_net = ie.load_network(network=net, device_name="CPU")
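
If you would rather not hard-code the device, one option is to pick the first available entry from an ordered preference list. The sketch below is only an illustration: `pick_device` is a hypothetical helper, and the list passed to it stands in for the value of `ie.available_devices`.

```python
def pick_device(available_devices, preferred=("GPU", "CPU")):
    """Return the first preferred device that is available (hypothetical helper)."""
    for device in preferred:
        if device in available_devices:
            return device
    raise RuntimeError(f"None of {list(preferred)} found in {list(available_devices)}")


# Assumption: this list stands in for the value of ie.available_devices
print(pick_device(["CPU"]))  # prints CPU
```

The returned name could then be passed as `device_name` to `ie.load_network`.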

## Fetch Information About Input and Output Layers 

Now that the model is loaded, fetch information about the input and output layers. This allows you to pass the input and read the output correctly.

In [None]:
recognition_output_layer = next(iter(exec_net.outputs))
recognition_input_layer = next(iter(exec_net.input_info))

## Load an Image

The next step is to load the image.

The model expects a single-channel image as input, so the image is read in grayscale.

After loading the input image, calculate the scale ratio between the required input layer height and the current image height. In the cell below, the image is resized by this ratio and then padded to keep the proportions of the letters and to meet the input shape.

In [None]:
# Get the file name of the demo image for the selected model

file_name = selected_language.demo_image_name

# Text recognition models expect the image in grayscale format
# IMPORTANT: this model can read only one line at a time

# Read image
image = cv2.imread(filename=f"{data_folder}/{file_name}", flags=cv2.IMREAD_GRAYSCALE)

# Fetch shape
image_height, _ = image.shape

# B,C,H,W = batch size, number of channels, height, width
_, _, H, W = net.input_info[recognition_input_layer].input_data.shape

# Calculate scale ratio between input shape height and image height to resize image
scale_ratio = H / image_height

# Resize image to meet network expected input sizes
resized_image = cv2.resize(image, None, fx=scale_ratio, fy=scale_ratio, interpolation=cv2.INTER_AREA)

# Pad image to meet input size
resized_image = np.pad(resized_image, ((0, 0), (0, W - resized_image.shape[1])), mode='edge')

# Reshape to network input shape
input_image = resized_image[None, None, :, :]
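
The padding step above assumes that the resized image is not wider than the network input. Otherwise `W - resized_image.shape[1]` is negative and `np.pad` raises an error. The sketch below works through the arithmetic with plain integers. `plan_resize` is a hypothetical helper, the sizes are made up, and the rounding only approximates what `cv2.resize` does.

```python
def plan_resize(image_height, image_width, input_height, input_width):
    """Return (scale_ratio, resized_width, pad_right) for a one-line text image."""
    scale_ratio = input_height / image_height
    # Approximation of the width cv2.resize would produce for fx=scale_ratio
    resized_width = round(image_width * scale_ratio)
    pad_right = input_width - resized_width
    if pad_right < 0:
        raise ValueError(
            f"Resized width {resized_width} exceeds the input width {input_width}"
        )
    return scale_ratio, resized_width, pad_right


# Made-up sizes: a 192 x 2000 image and a network input of 96 x 2000
print(plan_resize(192, 2000, 96, 2000))  # prints (0.5, 1000, 1000)
```

A check like this makes it clear when a line of text is too long for the model and needs to be cropped or split first.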

## Visualise Input Image

After preprocessing, you can display the image to see what it looks like.

In [None]:
plt.figure(figsize=(20, 1))
plt.axis('off')
plt.imshow(resized_image, cmap='gray', vmin=0, vmax=255);

## Prepare Charlist

The model is loaded and the image is ready. The only element left is the charlist. It is already downloaded, but before you use it, you need to add a blank symbol at the beginning of the charlist. Both models expect it.

In [None]:
# Get the charlist used to decode the output, based on the model documentation
used_charlist = selected_language.charlist_name

# For both models, a blank symbol must be added at index 0 of the charlist
blank_char = '~'

with open(f"{charlist_folder}/{used_charlist}", 'r', encoding='utf-8') as charlist:
    letters = blank_char + ''.join(line.strip() for line in charlist)
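
Prepending the blank shifts every charlist entry by one, so index 0 is the blank and the first line of the file maps to index 1. A minimal sketch with an in-memory stand-in for the charlist file follows. It assumes one symbol per line, and the three characters are illustrative only, not taken from the real charlists.

```python
import io

blank_char = "~"

# In-memory stand-in for a charlist file with one symbol per line (illustrative content)
charlist_file = io.StringIO("あ\nい\nう\n")

letters = blank_char + "".join(line.strip() for line in charlist_file)

# Index 0 is the blank; the symbols from the file start at index 1
print(letters[0], letters[1], len(letters))
```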

## Run Inference

Now that everything is ready, run inference. Pass the preprocessed image under the name of the input layer fetched earlier, and use the name of the output layer to read the predictions.

In [None]:
# Run inference on model
predictions = exec_net.infer(inputs={recognition_input_layer: input_image})[recognition_output_layer]

## Process Output Data

The model output has the format W x B x L, where:

* W - output sequence length
* B - batch size
* L - confidence distribution across the symbols supported by the charlist of the selected model (Kondate and Nakayosi for Japanese, SCUT-EPT for Chinese).

To make the output human-readable, select the symbol with the highest probability at each step of the output sequence. This gives a list of indexes. Following the rules of [CTC Decoding](https://towardsdatascience.com/beam-search-decoding-in-ctc-trained-neural-networks-5a889a3d85a7), collapse consecutive duplicate symbols into one and then remove all the blanks.

The last step is getting the symbols from the corresponding indexes in the charlist.

In [None]:
# Remove the unnecessary batch dimension
predictions = np.squeeze(predictions)

# Run argmax to pick the most probable symbol at each step
predictions_indexes = np.argmax(predictions, axis=1)

In [None]:
# Use groupby to collapse consecutive duplicate indexes, as required by CTC greedy decoding
# Keep only the group keys; the grouper objects are not needed
output_text_indexes = np.array([index for index, _ in groupby(predictions_indexes)])

# Remove blank symbols
output_text_indexes = output_text_indexes[output_text_indexes != 0]

# Assign letters to indexes from output array
output_text = [letters[letter_index] for letter_index in output_text_indexes]
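
The decoding steps above can also be followed on a toy example without NumPy. The sketch below is an illustration of greedy CTC decoding, not the notebook's code: `ctc_greedy_decode`, the charlist `"~ab"`, and the scores are all made up.

```python
from itertools import groupby


def ctc_greedy_decode(scores, charlist, blank_index=0):
    """Decode per-step scores into text with greedy CTC rules (illustrative sketch)."""
    # 1. Pick the most probable symbol index at every step
    best = [max(range(len(row)), key=row.__getitem__) for row in scores]
    # 2. Collapse runs of identical indexes into one
    collapsed = [index for index, _ in groupby(best)]
    # 3. Drop blanks and map the remaining indexes to symbols
    return "".join(charlist[index] for index in collapsed if index != blank_index)


# Toy scores for the charlist "~ab": 5 steps, 3 symbols each
toy_scores = [
    [0.1, 0.8, 0.1],    # a
    [0.2, 0.7, 0.1],    # a (repeat, collapsed with the previous step)
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.8, 0.1],    # a (kept, because a blank separates it from the first a)
    [0.1, 0.2, 0.7],    # b
]
print(ctc_greedy_decode(toy_scores, "~ab"))  # prints aab
```

The order of the steps matters: blanks are removed only after duplicates are collapsed, which is what allows a doubled character to survive when a blank sits between its two occurrences.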

## Print Output

Now you have a list of letters predicted by the model. The only thing left to do is to display the image and print the predicted text below it.

In [None]:
plt.figure(figsize=(20, 1))
plt.axis('off')
plt.imshow(resized_image, cmap='gray', vmin=0, vmax=255)

print(''.join(output_text))