<h1> Implementation of a code sequence to detect text fields in photos </h1>

One possible way to detect text fields in photos is keras_ocr. It is a deep learning package for text detection and recognition, but we will use only its text detection part and combine it with our project's own text recognition model.

The team also tried EAST, OpenCV, TensorFlow and Tesseract, but decided to proceed with keras_ocr, so the other trials won't be committed and pushed to the repository.

Additional documentation on keras_ocr can be found here: https://keras-ocr.readthedocs.io/en/latest/index.html

In [None]:
# common imports
import matplotlib.pyplot as plt

#for text field detection
import keras_ocr
import tensorflow as tf

#for image processing
import numpy as np
import cv2
from skimage.filters import threshold_local
from PIL import Image

In [None]:
# keras-ocr text detector
detector = keras_ocr.detection.Detector()

In [None]:
# read image
image = keras_ocr.tools.read('images/product_test_2.jpg')

In [None]:
# get predictions
predictions = detector.detect(images=[image])[0]

In [None]:
# draw boxes around text
showImage = keras_ocr.tools.drawBoxes(image, predictions, (0,255,0), 1)

# show image
plt.imshow(showImage)

Cutting detected text fields into separate images

In [None]:
# cut out one of the detected text boxes for further processing (index 11 assumes at least 12 boxes were detected)
tempImg = keras_ocr.tools.warpBox(image, predictions[11], margin=2)
plt.imshow(tempImg)
plt.show()

#save box
tf.keras.utils.save_img('images/img_before_processing.jpg', tempImg)

**Image processing part (proof of concept)**

The goal of image processing is to make the text more readable for the OCR model. We will use OpenCV for this purpose.
In short, we will:
1. Convert the image to grayscale (a must for OCR, as our model recognises black letters on a white background)
2. Apply thresholding
3. Apply dilation (and maybe erosion) to remove some noise; other filters, such as a bilateral filter, may also help make the text more readable (this step may be skipped)
4. Find contours
5. Find bounding boxes for the contours
6. Crop the image into small pieces (letters) using the bounding boxes

In [None]:
def plot_gray(image):
    plt.figure(figsize=(16,10))
    return plt.imshow(image, cmap='Greys_r')

In [None]:
procImg = cv2.imread('images/img_before_processing.jpg')

# we may need to downscale/upscale (resize) the image here; note that dsize is (width, height)
# procImg = cv2.resize(procImg, (width, height), interpolation=cv2.INTER_AREA)

# convert to grayscale for further processing (cv2.imread loads images in BGR order)
gray = cv2.cvtColor(procImg, cv2.COLOR_BGR2GRAY)
plot_gray(gray)

In [None]:
# apply a 3x3 Gaussian blur to remove noise
# (OpenCV does not support BORDER_WRAP for Gaussian blur, so the default border mode is used)
blurred = cv2.GaussianBlur(gray, (3, 3), 0)
plot_gray(blurred)
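With `ksize=(3, 3)` and `sigmaX=0`, OpenCV derives sigma from the kernel size; to our understanding it then uses the fixed 1-D kernel [0.25, 0.5, 0.25], so the blur is a weighted 3x3 average. The NumPy sketch below applies that kernel by hand to show its effect on a single bright pixel. `blur3x3` is a hypothetical helper, and its edge-replicating border differs from OpenCV's default border handling.

```python
import numpy as np

# 1-D kernel assumed for ksize=3 with sigma 0 (see lead-in)
k1d = np.array([0.25, 0.5, 0.25])
# separable filter: the 2-D kernel is the outer product, [[1,2,1],[2,4,2],[1,2,1]] / 16
k2d = np.outer(k1d, k1d)

def blur3x3(gray):
    """3x3 Gaussian blur of a 2-D array, replicating edge pixels at the border."""
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    # weighted sum of the 9 shifted copies of the image
    for dy in range(3):
        for dx in range(3):
            out += k2d[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

impulse = np.zeros((5, 5))
impulse[2, 2] = 16
print(blur3x3(impulse)[1:4, 1:4])  # centre values 1 2 1 / 2 4 2 / 1 2 1
```

Because the kernel is so small, only very fine noise is smoothed, which keeps thin letter strokes intact.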

In [None]:
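# apply thresholding to convert to black and white

# option 1: local (adaptive) threshold from scikit-image, computed over 21x21 neighbourhoods
T = threshold_local(blurred, 21, offset=5, method="gaussian")
threshImg = (blurred > T).astype("uint8") * 255
plot_gray(threshImg)

# option 2: a single global threshold chosen automatically with Otsu's method
ret, thresh1 = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
plot_gray(thresh1)

# dilation with the default 3x3 kernel; with dark text on a light background it
# grows the white areas, so it thins the letters (erosion would thicken them)
dilated = cv2.dilate(thresh1, None, iterations=1)
plot_gray(dilated)

# check if we need to invert colours so that the text is black and the background white
threshImgInverse = cv2.bitwise_not(thresh1)
plot_gray(threshImgInverse)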
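With `cv2.THRESH_OTSU` the threshold value passed in (0 here) is ignored; OpenCV picks one automatically and returns it as `ret`. Otsu's method tries every grey level t and keeps the one that best separates the pixels into two classes, i.e. it maximises the between-class variance σ_B²(t) = (μ_T·ω(t) - μ(t))² / (ω(t)·(1 - ω(t))), where ω(t) is the share of pixels at or below t, μ(t) = Σ_{i≤t} i·p(i) and μ_T is the mean grey level. A NumPy sketch of that search; `otsu_threshold` is a hypothetical name, and when several levels score equally it returns the lowest one, so its t may differ from OpenCV's while splitting the pixels the same way.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the level t that maximises the between-class variance of
    the pixel groups {<= t} and {> t} in a uint8 image (Otsu's method)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)            # share of pixels at or below each t
    mu = np.cumsum(prob * levels)      # cumulative first moment up to t
    mu_total = mu[-1]                  # mean grey level of the whole image
    denom = omega * (1.0 - omega)
    # levels where one class is empty get a score of 0
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = np.where(denom > 0, (mu_total * omega - mu) ** 2 / denom, 0.0)
    return int(np.argmax(sigma_b))

# synthetic image: dark "ink" (value 40) on light "paper" (value 210)
gray = np.full((20, 20), 210, dtype=np.uint8)
gray[5:15, 5:15] = 40
t = otsu_threshold(gray)
# same rule as THRESH_BINARY: pixels above t become white
binary = np.where(gray > t, 255, 0).astype(np.uint8)
print(t)  # 40
```

Unlike `threshold_local`, this gives one threshold for the whole crop, which works well when the lighting across a text box is even.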

In [None]:
#save processed image as a result
cv2.imwrite('images/img_after_processing.jpg', threshImg)

# save the inverted image as a result (note: threshImgInverse is the inverse of the Otsu result thresh1, not of threshImg)
cv2.imwrite('images/img_after_processing_inverse.jpg', threshImgInverse)

In [None]:
# test of bilateral filter
temp = cv2.bilateralFilter(threshImg, 3, 100, 100, borderType=cv2.BORDER_CONSTANT)
plot_gray(temp)
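Whether dilation or erosion helps depends on the colour of the text: both operate on the white pixels, so on a black-on-white crop `cv2.dilate` thins the letters and `cv2.erode` thickens them. The NumPy sketch below shows this on a 3-pixel-wide black stroke, using a 3x3 neighbourhood like OpenCV's default kernel when `None` is passed. `dilate3x3` and `erode3x3` are hypothetical helpers, not OpenCV functions.

```python
import numpy as np

def _windows3x3(binary):
    """All 9 shifted copies of `binary` covering each pixel's 3x3 neighbourhood."""
    padded = np.pad(binary, 1, mode="edge")
    h, w = binary.shape
    return [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]

def dilate3x3(binary):
    """Dilation: each pixel becomes the maximum of its 3x3 neighbourhood."""
    return np.max(_windows3x3(binary), axis=0)

def erode3x3(binary):
    """Erosion: each pixel becomes the minimum of its 3x3 neighbourhood."""
    return np.min(_windows3x3(binary), axis=0)

# a 3-pixel-wide black stroke on a white background
img = np.full((9, 9), 255, dtype=np.uint8)
img[:, 3:6] = 0
print((dilate3x3(img) == 0).sum(axis=1)[0])  # 1: dilation thinned the stroke
print((erode3x3(img) == 0).sum(axis=1)[0])   # 5: erosion thickened it
```

If the crop is inverted first (white text on black), the roles swap.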

<h3> Structuring the text field detection findings into a reusable function for further image processing </h3>

In [None]:
def detectText(img):
    # get predictions: one array of 4 corner points per detected text box
    predictions = detector.detect(images=[img])[0]

    # draw green 1-pixel boxes around the detected text
    annotated = keras_ocr.tools.drawBoxes(img, predictions, (0, 255, 0), 1)

    # show image
    plt.imshow(annotated)
    plt.show()

    # cut out each detected box as a separate image, show it and collect it
    crops = []
    for i, box in enumerate(predictions):
        temp = keras_ocr.tools.warpBox(img, box, margin=2)
        plt.imshow(temp)
        plt.show()
        # save each image (uncomment to save)
        # tf.keras.utils.save_img('images/temp_text_{}.jpg'.format(i), temp)
        crops.append(temp)
    return crops

In [None]:
# test with an image of a product label from phone camera
imgTest = keras_ocr.tools.read('images/product_test_2.jpg')
detectText(imgTest)

As can be seen, the image (pre)processing could be adjusted for better detection results. Additionally, the team can change the detection thresholds to eliminate cropped images with no recognisable text.
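Besides tuning thresholds, one simple way to drop crops without recognisable text is to filter the detected boxes by size before cutting them out. A minimal NumPy sketch, assuming `predictions` is an (N, 4, 2) array of corners ordered clockwise from the top-left; `filter_boxes` and its 8-pixel minimums are hypothetical values to tune. keras-ocr's `Detector.detect()` also appears to accept threshold keyword arguments such as `detection_threshold` and `size_threshold`; check the keras-ocr documentation for their exact meaning and defaults.

```python
import numpy as np

def filter_boxes(boxes, min_width=8, min_height=8):
    """Keep only boxes at least `min_width` wide and `min_height` tall.

    `boxes` is an (N, 4, 2) array of (x, y) corners ordered
    top-left, top-right, bottom-right, bottom-left (assumption).
    """
    boxes = np.asarray(boxes, dtype=np.float64).reshape(-1, 4, 2)
    # edge lengths along the top and left sides (works for rotated boxes too)
    widths = np.linalg.norm(boxes[:, 1] - boxes[:, 0], axis=1)
    heights = np.linalg.norm(boxes[:, 3] - boxes[:, 0], axis=1)
    keep = (widths >= min_width) & (heights >= min_height)
    return boxes[keep]

boxes = np.array([
    [[0, 0], [50, 0], [50, 20], [0, 20]],   # 50 x 20: a plausible word
    [[60, 0], [65, 0], [65, 3], [60, 3]],   # 5 x 3: probably noise
])
kept = filter_boxes(boxes)
print(len(kept))  # 1
```

In `detectText`, iterating over `filter_boxes(predictions)` instead of all predictions would skip such tiny boxes.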