Implementing OCR with OpenCV (opencv-python).
To implement OCR (Optical Character Recognition) with OpenCV, we will follow these general steps:
- Preprocess the image: OCR requires a clear, bright, noise-free image, so the first step is to preprocess it, e.g. by denoising, smoothing, enhancing contrast, and binarizing.
- Text localization: before text can be recognized, it must be located. We can use edge detection and the morphological operations provided by OpenCV, such as erosion and dilation, to detect and segment text regions.
- Character segmentation: our OCR task is to recognize individual characters rather than whole words, so we need a character segmentation algorithm to further split the text regions into single characters.
- Feature extraction: once the text regions or characters are segmented, we need to extract their features for recognition. We can use feature extraction algorithms provided by OpenCV, such as SIFT, SURF, or ORB (a minimal ORB sketch follows this list).
- Train the model: once the feature data is prepared, we can train a model with various machine learning algorithms, such as Support Vector Machines (SVM), neural networks, or random forests.
- Recognize text: once the model is trained, we can use it to predict the text in an image and return the result to the user.
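For illustration only, here is a minimal sketch of what the feature-extraction step could look like with ORB. The file name char.png is a placeholder, and the implementation below ultimately delegates recognition to Tesseract instead of hand-trained features:

import cv2 as cv

# Illustrative only: 'char.png' is a placeholder file name
img = cv.imread('char.png', cv.IMREAD_GRAYSCALE)
orb = cv.ORB_create(nfeatures=100)
keypoints, descriptors = orb.detectAndCompute(img, None)
print(len(keypoints), None if descriptors is None else descriptors.shape)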
Flowchart:
flowchart LR
g[Deep Learning] -.-> e
a[Original] --> b[Pre-processed] --> c["Chars (Image)"] --> e["Chars (String)"] --> f[Result]
b --> d[Location] ---> f
a -----> f
Next, I will walk through the specific steps to implement this OCR program, with examples.
Before we start, this is the example picture we will use:
Image preprocessing is a key step in building an efficient and accurate OCR program. Its main purpose is to enhance image quality and make the later stages more reliable.
We use operations such as denoising, smoothing, contrast enhancement, and binarization so that the text stands out clearly. A clear, bright, noise-free image improves the accuracy of the OCR system. Preprocessing also lets the OCR program adapt to different kinds of input: in practice, images may suffer from poor lighting, camera quality, shooting angle, and other factors. Preprocessing mitigates these problems so the program works under varied conditions, and the resulting cleaner image improves the subsequent steps, such as text localization, character segmentation, and feature extraction.
import cv2 as cv


def preprocess_image(img, ksize=3):
    # Convert to grayscale
    img = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
    # Denoise with a median filter
    img = cv.medianBlur(img, ksize=ksize)
    # Binarize with Otsu's threshold
    _, img_bin = cv.threshold(
        img, 0, 255, cv.THRESH_BINARY + cv.THRESH_OTSU)
    # Invert so that text is white on a black background
    img_bin = 255 - img_bin
    return img_bin
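A minimal usage sketch, continuing from the function above and assuming the example picture is saved as example.png (a placeholder name, not from the project):

img = cv.imread('example.png')  # placeholder file name
img_bin = preprocess_image(img)
cv.imshow('Pre-processed', img_bin)
cv.waitKey(0)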
Then we need to split the text regions in the image into single characters. Compared with complete sentences or words, a single letter is far less ambiguous (there are only 26 possibilities in total), which improves overall recognition accuracy. After splitting a sentence into single letters, the problem also reduces to a classification problem, and a classifier can be built for the character set. This lowers both the difficulty of training the model and the computational cost.
Finally, after splitting into single characters, the OCR system can support multiple languages more easily, because most languages are composed of basic characters (our OCR software does not support multiple languages for the time being).
We use the projection method to determine the position of each character and then cut the image accordingly. At the same time, we record the position of each character for later use.
We first compute the horizontal projection and use it to cut the image into rows:
from itertools import product

import numpy as np


def get_h_projection(img):
    r, c = img.shape
    h_projection = np.zeros(img.shape, np.uint8)
    hrowsum = [0] * r
    # Count the white (text) pixels in each row
    for i, j in product(range(r), range(c)):
        if img[i, j] == 255:
            hrowsum[i] += 1
    # Draw the projection histogram for visualization
    for i in range(r):
        for j in range(hrowsum[i]):
            h_projection[i, j] = 255
    cv.imshow('h_projection', h_projection)
    return hrowsum
Then we compute the vertical projection and use it to cut each row into characters (don't forget to add a border around each character so the recognition model can handle it):
def get_v_projection(img):
    r, c = img.shape
    v_projection = np.zeros(img.shape, np.uint8)
    vcolsum = [0] * c
    # Count the white (text) pixels in each column
    for i, j in product(range(r), range(c)):
        if img[i, j] == 255:
            vcolsum[j] += 1
    # Draw the projection histogram for visualization
    for j in range(c):
        for i in range(r - vcolsum[j], r):
            v_projection[i, j] = 255
    cv.imshow('v_projection', v_projection)
    return vcolsum
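The cutting step itself is not shown above, so here is one possible sketch of how the two projections can be turned into the per-row character images (rows) and their recorded positions (p) used in the final step. The helpers below, the zero-run splitting, and the 4-pixel border are assumptions of this write-up, not code from the project:

def split_ranges(sums):
    # Illustrative helper (not from the original project): turn a projection
    # histogram into (start, end) runs of non-zero values
    ranges, start = [], None
    for idx, val in enumerate(sums):
        if val > 0 and start is None:
            start = idx
        elif val == 0 and start is not None:
            ranges.append((start, idx))
            start = None
    if start is not None:
        ranges.append((start, len(sums)))
    return ranges


def segment_chars(img_bin, border=4):
    # Illustrative helper (not from the original project): cut the binary
    # image into rows, then characters; record (bottom, right, height, width)
    # so the drawing code below can recover each bounding box
    rows, p = [], []
    for top, bottom in split_ranges(get_h_projection(img_bin)):
        line = img_bin[top:bottom, :]
        row_imgs, row_pos = [], []
        for left, right in split_ranges(get_v_projection(line)):
            char = line[:, left:right]
            # Pad with background, then invert to black-on-white for Tesseract
            char = cv.copyMakeBorder(char, border, border, border, border,
                                     cv.BORDER_CONSTANT, value=0)
            row_imgs.append(255 - char)
            row_pos.append((bottom, right, bottom - top, right - left))
        rows.append(row_imgs)
        p.append(row_pos)
    return rows, p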
To save time, we skip training a deep learning model ourselves and instead use the open-source engine from Google's Tesseract team.
We call Tesseract's API directly to recognize each character image:
import pytesseract


def recognize_text(char):
    # --psm 10 tells Tesseract to treat the image as a single character
    return pytesseract.image_to_string(char, lang='eng', config='--psm 10').strip()
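Note that pytesseract is only a thin wrapper: it requires the Tesseract binary to be installed separately. If the binary is not on your PATH (common on Windows), point the wrapper at it explicitly; the path below is a placeholder:

import pytesseract

# Placeholder path; adjust to your local Tesseract installation
pytesseract.pytesseract.tesseract_cmd = r'C:\Program Files\Tesseract-OCR\tesseract.exe'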
Finally, we draw each character's position (green box) and recognition result (red text) onto the original picture:
results = []
font_size, font_weight = 1, 2
for i in range(len(rows)):
    for j in range(len(rows[i])):
        text = recognize_text(rows[i][j])
        # p[i][j] stores (bottom, right, height, width) of the character
        cv.rectangle(img, (p[i][j][1] - p[i][j][3], p[i][j][0] - p[i][j][2]),
                     (p[i][j][1], p[i][j][0]), (0, 255, 0), 2)
        cv.putText(img, text, (p[i][j][1] - p[i][j][3] - 4, p[i][j][0] - p[i][j][2] - 1),
                   cv.FONT_HERSHEY_COMPLEX, font_size, (50, 50, 255), font_weight)
        results.append(text)
print(''.join(results))
cv.imshow('Result', img)
cv.waitKey(0)
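For completeness, a sketch of how the pieces fit together. The file name example.png and the segment_chars helper are illustrative assumptions from this write-up, not part of the original code:

img = cv.imread('example.png')    # placeholder file name
img_bin = preprocess_image(img)
rows, p = segment_chars(img_bin)  # illustrative helper sketched above
# ...then run the recognition and drawing loop shown above on rows, p, and img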
The code in this project is released under the MIT License.