<a href="https://colab.research.google.com/github/likeshd/ocr_work/blob/main/easyOCR_vs_kerasOCR_vs_paddleOCR_vs_pytesseract_vs_openCV.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Methodology
Image Selection
I have selected three images for comparison, each presenting unique challenges to the OCR tools. The images used are as follows:

Image 1: A photograph of a car’s license plate. The text is clear and high-contrast, making it a relatively straightforward image for OCR. The background is metallic, and the text is well-defined, which should generally pose minimal difficulty for most OCR systems.
Image 2: A cover page of a notebook with the word “Carousel” prominently displayed. This image includes stylized fonts and a patterned background, which adds complexity. The OCR tool must accurately detect the decorative text without being misled by the background patterns.
Image 3: A picture of a broadband router with a label that includes text and a QR code. The text is relatively small and located on a plastic surface with some reflection, making it challenging for OCR systems to read without errors. The presence of the QR code could also add noise to the text detection process.
These images were chosen to evaluate the OCR tools across a range of scenarios, from simple and clear text to more complex, stylized, or low-contrast situations.

Criteria for Comparison
I considered the following criteria in the comparison:

Accuracy: The primary measure of success for each OCR tool, determined by how accurately the tool reads the text in each image.
Speed: The time taken by each tool to process the images and output the text.
Ease of Use: How simple and straightforward it was to implement each tool, considering the installation process, the complexity of the code, and the availability of documentation and support.
Robustness: The ability of each tool to handle various challenges, such as stylized fonts, low contrast, or background noise, without significant degradation in performance.
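
The accuracy criterion can be made measurable with a character error rate (CER). The source does not say how accuracy was scored, so the following is only a minimal sketch of one common approach: the edit distance between the OCR output and a hand-typed reference, divided by the reference length. The function names and the sample strings are illustrative assumptions, not results from this comparison.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def character_error_rate(predicted: str, reference: str) -> float:
    """Edit distance divided by reference length; 0.0 means a perfect match."""
    if not reference:
        return 0.0 if not predicted else 1.0
    return levenshtein(predicted, reference) / len(reference)


# Hypothetical example: one wrong character out of eight
print(character_error_rate("Carouse1", "Carousel"))
```

A lower CER means a more accurate reading, which allows the tools to be ranked per image rather than judged by eye.
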
Tools and Environment
The OCR tools compared in this study are EasyOCR, KerasOCR, Pytesseract, PaddleOCR, and OpenCV. All tools were tested in the same environment to ensure a fair comparison.

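The images were processed in their original form, with no pre-processing applied, to evaluate the raw performance of each OCR tool under typical conditions. The one exception is the Pytesseract cell, which converts the image to grayscale and applies a binary threshold before recognition.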
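
To compare speed under the same conditions, each tool can be wrapped in a callable and timed with the same helper. This is a sketch under assumptions: `time_ocr` is a hypothetical helper that is not part of the notebook, and it reports the best of several runs so that one-off start-up costs weigh less.

```python
import time
from typing import Callable, Dict, List


def time_ocr(run: Callable[[str], object],
             image_paths: List[str],
             repeats: int = 3) -> Dict[str, float]:
    """Return the best wall-clock time in seconds per image for one OCR callable."""
    timings = {}
    for path in image_paths:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(path)  # the OCR call being measured
            best = min(best, time.perf_counter() - start)
        timings[path] = best
    return timings


# Usage with any tool, for example (reader created beforehand):
# timings = time_ocr(lambda p: reader.readtext(p), ["image.jpeg"])
```

Creating the model outside the timed callable keeps model loading out of the measurement, so only the per-image processing time is compared.
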

## EasyOCR

In [None]:
import cv2
import easyocr

# Load the image with OpenCV so the detections can be drawn on it
easy_ocr_image = cv2.imread('image.jpeg')  # replace with your image path

# Create an English reader and run detection + recognition
reader = easyocr.Reader(['en'], gpu=True)
easy_ocr_result = reader.readtext(easy_ocr_image, detail=1, paragraph=False)
print(easy_ocr_result)

# Each result is (corner points, text, confidence)
for (coord, text, prob) in easy_ocr_result:
  (top_left, top_right, bottom_right, bottom_left) = coord
  tx, ty = (int(top_left[0]), int(top_left[1]))
  bx, by = (int(bottom_right[0]), int(bottom_right[1]))
  cv2.rectangle(easy_ocr_image, (tx, ty), (bx, by), (0, 255, 0), 2)
  cv2.putText(easy_ocr_image, text, (tx, ty - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

cv2.imwrite("easy_ocr_result.png", easy_ocr_image)

EasyOCR proved to be a reliable tool for detecting text across all three image types: the notebook cover, the license plate, and the device label. Its versatility makes it a useful solution for a variety of real-world applications, from document scanning to object recognition.
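
With `detail=1`, `readtext` returns a list of `(bbox, text, prob)` tuples, so low-confidence detections can be dropped before the text is used. The helpers below are a minimal sketch; `keep_confident`, `joined_text`, the 0.5 threshold, and the sample values are illustrative assumptions rather than part of EasyOCR or of the results above.

```python
def keep_confident(results, min_prob=0.5):
    """Keep (bbox, text, prob) entries whose confidence is at least min_prob."""
    return [(bbox, text, prob) for (bbox, text, prob) in results if prob >= min_prob]


def joined_text(results, sep=" "):
    """Concatenate the recognized strings in the order they were returned."""
    return sep.join(text for (_, text, _) in results)


# Made-up sample in the shape readtext(detail=1) produces
sample = [
    ([[0, 0], [10, 0], [10, 5], [0, 5]], "Carousel", 0.93),
    ([[0, 8], [4, 8], [4, 12], [0, 12]], "~", 0.12),
]
print(joined_text(keep_confident(sample)))
```
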

## KerasOCR

In [None]:
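import cv2
import keras_ocr
import supervision as sv  # assumption: sv is the supervision package (not imported in the original cell)

pipeline = keras_ocr.pipeline.Pipeline()  # initialize the detection + recognition pipeline

# keras_ocr.tools.read returns images in RGB channel order
images = [
    keras_ocr.tools.read(img) for img in ['image.jpeg']
]

keras_ocr_result = pipeline.recognize([images[0]])
print(keras_ocr_result)

# Each prediction is (text, box), where box holds four corner points
for (text, bbox) in keras_ocr_result[0]:
  print(text)
  print(bbox)
  (topleft, topright, bottomright, bottomleft) = bbox
  tx, ty = (int(topleft[0]), int(topleft[1]))
  bx, by = (int(bottomright[0]), int(bottomright[1]))
  cv2.rectangle(images[0], (tx, ty), (bx, by), (0, 255, 0), 2)
  cv2.putText(images[0], text, (tx, ty - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

# OpenCV expects BGR, so convert from RGB before saving and plotting
annotated_bgr = cv2.cvtColor(images[0], cv2.COLOR_RGB2BGR)
cv2.imwrite("keras_ocr_result.png", annotated_bgr)
sv.plot_image(annotated_bgr)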

KerasOCR demonstrated strong performance across all three images, accurately detecting and recognizing text on the notebook cover, the car license plate, and the broadband router. Whether dealing with stylized fonts, reflective surfaces, or curved labels, KerasOCR proved to be an efficient tool for diverse text recognition tasks.
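
The drawing code above builds each rectangle from the first and third corner points, which assumes the box is roughly upright. For tilted text, such as a curved label, a safer choice is the axis-aligned rectangle that encloses all four corners. The helper below is a sketch with a hypothetical name (`box_to_rect`) and made-up coordinates.

```python
def box_to_rect(box):
    """Return (x_min, y_min, x_max, y_max) enclosing a four-corner box."""
    xs = [float(p[0]) for p in box]
    ys = [float(p[1]) for p in box]
    return int(min(xs)), int(min(ys)), int(max(xs)), int(max(ys))


# Made-up, slightly rotated box
corners = [[10.5, 20.2], [50.9, 18.0], [52.0, 40.7], [11.0, 42.3]]
x0, y0, x1, y1 = box_to_rect(corners)
# cv2.rectangle(image, (x0, y0), (x1, y1), (0, 255, 0), 2)
```

The same helper works for any tool that reports four corner points, since it only indexes `p[0]` and `p[1]`.
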

## PaddleOCR

In [None]:
# !pip install paddlepaddle
# !pip install paddleocr
# !git clone https://github.com/PaddlePaddle/PaddleOCR.git

from paddleocr import PaddleOCR
import numpy as np
import cv2
import supervision as sv  # assumption: sv is the supervision package (not imported in the original cell)

ocr = PaddleOCR(use_angle_cls=True, lang='en')

image_bgr = cv2.imread('image.jpeg')
text_result = ocr.ocr(image_bgr, cls=True)

print(len(text_result))
print(text_result)

# text_result holds one entry per image; the entry is None when no text is found
coordinates = []
for row in text_result[0] or []:
  # row is [corner points, (text, confidence)]
  bbox = [[int(r[0]), int(r[1])] for r in row[0]]
  print(bbox, row[1])
  coordinates.append(bbox)
  # Outline the detected region and label it with the recognized text
  cv2.polylines(image_bgr, [np.array(bbox, dtype=np.int32)], True, (0, 255, 0), 2)
  cv2.putText(image_bgr, row[1][0], tuple(bbox[0]), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 2)

cv2.imwrite("paddle_ocr_result.png", image_bgr)
sv.plot_image(image_bgr)

PaddleOCR is effective for detecting and recognizing text in clear, standard fonts and high-contrast environments. However, it may face challenges with stylized fonts, low contrast, complex backgrounds, and small text. Its performance can vary based on the quality and clarity of the input images.
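
The nested list returned by `ocr.ocr(...)` is awkward to work with directly. As a sketch, it can be flattened into one record per detected line. This assumes the result layout used by the code above (one entry per image, each row shaped `[corner points, (text, confidence)]`); `flatten_paddle_result` is a hypothetical helper and the sample data is made up.

```python
def flatten_paddle_result(result):
    """Turn the nested PaddleOCR result into a flat list of dict records."""
    records = []
    for page in result or []:
        if not page:  # None or empty when no text was found on that image
            continue
        for box, (text, score) in page:
            records.append({
                "text": text,
                "score": float(score),
                "box": [[int(x), int(y)] for x, y in box],
            })
    return records


# Made-up sample in the assumed layout
sample_result = [[
    [[[1.0, 2.0], [3.0, 2.0], [3.0, 4.0], [1.0, 4.0]], ("ABC", 0.9)],
]]
print(flatten_paddle_result(sample_result))
```
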

## Pytesseract

In [None]:
# Import necessary libraries
import cv2
import pytesseract
from pytesseract import Output
import matplotlib.pyplot as plt

# Load the image
image_path = 'image.jpeg'  # Replace with your image path
image = cv2.imread(image_path)

# Convert the image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply thresholding to preprocess the image
_, binary_image = cv2.threshold(gray_image, 150, 255, cv2.THRESH_BINARY_INV)

# Perform text detection using Tesseract
# If you have Tesseract installed in a custom path, specify it using:
# pytesseract.pytesseract.tesseract_cmd = r'path_to_your_tesseract.exe'
detection_data = pytesseract.image_to_data(binary_image, output_type=Output.DICT)

# Draw bounding boxes around detected words, skipping entries without text
n_boxes = len(detection_data['text'])
for i in range(n_boxes):
    if not detection_data['text'][i].strip():
        continue  # page, block, and line entries carry no text
    (x, y, w, h) = (detection_data['left'][i], detection_data['top'][i],
                    detection_data['width'][i], detection_data['height'][i])
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Display the image with bounding boxes (matplotlib expects RGB)
plt.figure(figsize=(10, 10))
plt.imshow(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
plt.axis('off')
plt.show()

# cv2.imwrite expects BGR, so save the image without converting it
cv2.imwrite("pytesseract_result.png", image)

Pytesseract is effective at detecting and recognizing text in clear, high-contrast images, as seen with the license plate in Image 1 and the text on the router in Image 3. However, it struggles with more complex or stylized text, as shown in Image 2, where it fails to accurately detect the decorative text on the notebook cover.
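
`image_to_data` with `Output.DICT` returns parallel lists (`text`, `conf`, `left`, `top`, `width`, `height`), and entries that are not words have empty text and a confidence of -1. The sketch below keeps only confident words; `confident_words`, the threshold of 60, and the sample dictionary are illustrative assumptions.

```python
def confident_words(data, min_conf=60.0):
    """Return (text, conf, (x, y, w, h)) for non-empty words above min_conf."""
    words = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])  # conf may arrive as a string or a number
        if text.strip() and conf >= min_conf:
            box = (data["left"][i], data["top"][i],
                   data["width"][i], data["height"][i])
            words.append((text, conf, box))
    return words


# Made-up sample in the shape Output.DICT produces
sample_data = {
    "text": ["", "KA01", "x"],
    "conf": ["-1", "95.5", "12"],
    "left": [0, 10, 60], "top": [0, 5, 5],
    "width": [100, 40, 8], "height": [30, 20, 20],
}
print(confident_words(sample_data))
```
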

## OpenCV

In [None]:
import cv2
import numpy as np
from skimage import io as sv

# Load the image
image_path = 'image.jpeg'  # replace with your image path
image = cv2.imread(image_path)

# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Apply edge detection
edges = cv2.Canny(gray, 50, 150, apertureSize=3)

# Dilate the edges to close gaps between edge segments
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (5, 5))
dilated = cv2.dilate(edges, kernel, iterations=2)

# Find contours based on edges detected
contours, _ = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

# Filter contours based on size, shape, etc. (assuming text is within rectangular regions)
for contour in contours:
    # Approximate the contour to a polygon
    approx = cv2.approxPolyDP(contour, 0.02 * cv2.arcLength(contour, True), True)

    # Get bounding box for the contour
    x, y, w, h = cv2.boundingRect(approx)

    # Filter out small or large areas which are unlikely to be text
    aspect_ratio = w / float(h)
    if 0.2 < aspect_ratio < 5:  # Typical aspect ratio range for text
        if 100 < w * h < 10000:  # Area constraints to filter out very small/large regions
            # Draw bounding box around detected text area
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

# Convert image from BGR (OpenCV default) to RGB for correct color representation
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Use skimage.io to display the RGB image
sv.imshow(image_rgb)
sv.show()

# cv2.imwrite expects BGR, so save the original BGR image
cv2.imwrite("opencv_result.png", image)

OpenCV is used here only to locate candidate text regions through edge detection and contour filtering; it does not recognize the characters themselves. This approach is most effective where the text has high contrast against a simple, uniform background, such as on license plates or printed text on plain surfaces, especially when the text is in a standard font and well-aligned. Its effectiveness decreases with complex backgrounds, varied fonts, low contrast, or text surrounded by non-text elements like patterns, logos, or images, all of which make it harder to isolate the text regions.