<a href="https://colab.research.google.com/github/rahiakela/computer-vision-research-and-practice/blob/main/opencv-projects-and-guide/ocr-with-opencv-and-tesseract/07_improving_results_with_tesseract_options.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

##Improving OCR Results with Tesseract Options

In [None]:
%%shell

sudo apt install tesseract-ocr
pip install tesseract
pip install pytesseract
pip install Pillow==9.0.0

After the installation finishes, restart the Colab runtime so that the newly installed packages are picked up.

In [1]:
import cv2
import pytesseract
import csv
import numpy as np


from matplotlib import pyplot as plt
from google.colab.patches import cv2_imshow

%matplotlib inline

In [2]:
pytesseract.pytesseract.tesseract_cmd = (r'/usr/bin/tesseract')

Let's download the two sample images used below.

In [None]:
%%shell

wget https://github.com/rahiakela/computer-vision-research-and-practice/raw/main/opencv-projects-and-guide/ocr-with-opencv-and-tesseract/images/text-orient-1.png
wget https://github.com/rahiakela/computer-vision-research-and-practice/raw/main/opencv-projects-and-guide/ocr-with-opencv-and-tesseract/images/text-orient-2.png

In [39]:
!tesseract --help-psm

Page segmentation modes:
  0    Orientation and script detection (OSD) only.
  1    Automatic page segmentation with OSD.
  2    Automatic page segmentation, but no OSD, or OCR.
  3    Fully automatic page segmentation, but no OSD. (Default)
  4    Assume a single column of text of variable sizes.
  5    Assume a single uniform block of vertically aligned text.
  6    Assume a single uniform block of text.
  7    Treat the image as a single text line.
  8    Treat the image as a single word.
  9    Treat the image as a single word in a circle.
 10    Treat the image as a single character.
 11    Sparse text. Find as much text as possible in no particular order.
 12    Sparse text with OSD.
 13    Raw line. Treat the image as a single text line,
       bypassing hacks that are Tesseract-specific.
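
The modes listed above are passed to Tesseract through a config string such as `--psm 6`. As a minimal sketch, the helper below builds that string and rejects values outside the listed range. The names `PSM_MODES` and `build_psm_config` are hypothetical and not part of pytesseract; the descriptions are shortened from the listing above.

```python
# Short labels for the modes printed by `tesseract --help-psm` (abbreviated).
PSM_MODES = {
    0: "OSD only",
    1: "Automatic page segmentation with OSD",
    2: "Automatic page segmentation, no OSD or OCR",
    3: "Fully automatic page segmentation, no OSD (default)",
    4: "Single column of text of variable sizes",
    5: "Single uniform block of vertically aligned text",
    6: "Single uniform block of text",
    7: "Single text line",
    8: "Single word",
    9: "Single word in a circle",
    10: "Single character",
    11: "Sparse text",
    12: "Sparse text with OSD",
    13: "Raw line",
}


def build_psm_config(psm: int = 3, extra: str = "") -> str:
    """Return a Tesseract config string such as '--psm 6'."""
    if psm not in PSM_MODES:
        raise ValueError(f"Unknown PSM {psm}; expected an integer from 0 to 13")
    # `extra` holds any further command-line options, for example '-l eng'.
    return f"--psm {psm} {extra}".strip()


print(build_psm_config(7))
print(build_psm_config(6, "-l eng"))
```

The returned string can be passed as the `config` argument of the pytesseract calls used in this notebook.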


##PSM 0

Orientation and script detection (OSD) examines the input image, but instead of returning the
actual OCR’d text, it reports two things, each with a confidence score:

* The page orientation, in degrees, together with the rotation needed to correct it
* The detected script (writing system), such as Latin

In [24]:
def text_orientation(img_path, options):
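  image = cv2.imread(img_path)

  # OpenCV loads images in BGR channel order; convert to RGB for Tesseract
  image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

  # detect the page orientation and script (no text is recognized in this mode)
  results = pytesseract.image_to_osd(image_rgb, output_type=pytesseract.Output.DICT, config=options)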

  print(f"Page number: {results['page_num']}")
  print(f"Orientation: {results['orientation']}")
  print(f"Rotate: {results['rotate']}")
  print(f"Orientation confidence: {results['orientation_conf']}")
  print(f"Script: {results['script']}")
  print(f"Script confidence: {results['script_conf']}")

In [25]:
text_orientation("text-orient-1.png", options="--psm 0")

Page number: 0
Orientation: 0
Rotate: 0
Orientation confidence: 4.51
Script: Latin
Script confidence: 4.58


In [26]:
text_orientation("text-orient-2.png", options="--psm 0")

Page number: 0
Orientation: 90
Rotate: 270
Orientation confidence: 3.7
Script: Latin
Script confidence: 8.15
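
The two results above suggest a simple rule for deciding whether an image should be rotated before OCR. The sketch below works on a dictionary shaped like the OSD result printed above. The function name `rotation_to_apply` and the `min_conf` threshold of 2.0 are assumptions chosen for illustration, and the `rotate` value is treated here as the correction angle to apply.

```python
def rotation_to_apply(osd: dict, min_conf: float = 2.0) -> int:
    """Return the rotation (in degrees) to apply, or 0 if none is needed.

    `osd` is expected to carry the 'rotate' and 'orientation_conf' keys
    shown in the OSD output above. `min_conf` is a hypothetical threshold
    below which the detected orientation is ignored.
    """
    if osd["orientation_conf"] < min_conf:
        return 0
    return osd["rotate"] % 360


# Values copied from the two OSD results above.
upright = {"orientation": 0, "rotate": 0, "orientation_conf": 4.51}
sideways = {"orientation": 90, "rotate": 270, "orientation_conf": 3.7}

print(rotation_to_apply(upright))                 # 0
print(rotation_to_apply(sideways))                # 270
print(rotation_to_apply(sideways, min_conf=5.0))  # 0
```

A stricter threshold trades missed corrections for fewer wrong rotations, so the right value likely depends on the images being processed.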


##PSM 1

In [42]:
def psm_options(img_path, options=None):
  image = cv2.imread(img_path)

  # OpenCV loads images in BGR channel order; convert to RGB for Tesseract
  image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

  # OCR the image using the given Tesseract options
  results = pytesseract.image_to_string(image_rgb, config=options)
  return results

In [34]:
results = psm_options("text-orient-1.png", options="--psm 1")
results

"In the first part of this tutorial, we'll discuss how\nautoencoders can be used used for image retrieval\nand building image search engines.\n\nFrom there, we'll implement a convolutional autoencoder\nthat we'll then train on our image dataset.\n\x0c"

In [38]:
results = psm_options("text-orient-2.png", options="--psm 1")
results

" \n\nIn the first part of this tutorial, we'll discuss how\nautoencoders can be used used for image retrieval\nand building image search engines.\n\nFrom there, we'll implement a convolutional autoencoder\nthat we'll then train on our image dataset.\n\x0c"

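The two results above differ only in leading whitespace, and both end with a form feed (`\x0c`), which Tesseract emits as a page separator. A minimal sketch of normalizing such output before comparing or storing it follows. The helper name `clean_ocr_text` is hypothetical, and the sample strings are abbreviated from the outputs above.

```python
def clean_ocr_text(text: str) -> str:
    """Drop form feed page separators and surrounding whitespace."""
    return text.replace("\x0c", "").strip()


# Abbreviated versions of the two PSM 1 results above.
upright = "In the first part of this tutorial, we'll discuss how\n\x0c"
rotated = " \n\nIn the first part of this tutorial, we'll discuss how\n\x0c"

print(repr(clean_ocr_text(upright)))
print(clean_ocr_text(upright) == clean_ocr_text(rotated))  # True
```

After cleaning, the text recognized from the upright and rotated images compares equal.
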
##PSM 3

In [40]:
results = psm_options("text-orient-1.png", options="--psm 3")
results

"In the first part of this tutorial, we'll discuss how\nautoencoders can be used used for image retrieval\nand building image search engines.\n\nFrom there, we'll implement a convolutional autoencoder\nthat we'll then train on our image dataset.\n\x0c"

In [41]:
results = psm_options("text-orient-2.png", options="--psm 3")
results

" \n\nIn the first part of this tutorial, we'll discuss how\nautoencoders can be used used for image retrieval\nand building image search engines.\n\nFrom there, we'll implement a convolutional autoencoder\nthat we'll then train on our image dataset.\n\x0c"

PSM 3 is Tesseract's default mode, so omitting the `--psm` option gives the same results:

In [43]:
results = psm_options("text-orient-1.png")
results

"In the first part of this tutorial, we'll discuss how\nautoencoders can be used used for image retrieval\nand building image search engines.\n\nFrom there, we'll implement a convolutional autoencoder\nthat we'll then train on our image dataset.\n\x0c"

In [44]:
results = psm_options("text-orient-2.png")
results

" \n\nIn the first part of this tutorial, we'll discuss how\nautoencoders can be used used for image retrieval\nand building image search engines.\n\nFrom there, we'll implement a convolutional autoencoder\nthat we'll then train on our image dataset.\n\x0c"

##PSM 4