# Text Detection and Extraction using OpenCV and OCR

Source: GeeksforGeeks, [original article](https://www.geeksforgeeks.org/text-detection-and-extraction-using-opencv-and-ocr/amp/)


OpenCV (Open Source Computer Vision Library) is a library of programming functions mainly aimed at real-time computer vision. In Python, OpenCV can be used to process an image and apply operations such as resizing, pixel manipulation, and object detection. In this article, we will learn how to use contours to detect the blocks of text in an image, run OCR on each block, and save the recognized text to a text file.

Required Installations:

- OpenCV: [tutorial](https://www.geeksforgeeks.org/opencv-python-tutorial/amp/)
- Tesseract: [releases](https://github.com/tesseract-ocr/tesseract/releases), [Windows installer](https://github.com/UB-Mannheim/tesseract/wiki)

__OpenCV package__ is used to read an image and perform certain image processing techniques. __Python-tesseract__ is a wrapper for Google’s Tesseract-OCR Engine which is used to recognize text from images.


```
pip install opencv-python
pip install pytesseract
```

In [1]:
# Import required packages 
import cv2 
import pytesseract 
print(cv2.__version__)


In [2]:
# Set the path to the Tesseract-OCR executable installed on your system (Windows example)
pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
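
Hard-coding the Windows path only works on machines where Tesseract is installed at that exact location. Below is a minimal sketch of a more portable lookup, assuming the executable is called `tesseract` and may be on the `PATH`. The helper name `find_tesseract` is hypothetical and is not part of pytesseract.

```python
import shutil

# Default Windows install location used in this article.
DEFAULT_TESSERACT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

def find_tesseract(fallback=DEFAULT_TESSERACT):
    """Return the tesseract executable found on PATH, else the fallback path."""
    found = shutil.which("tesseract")
    return found if found is not None else fallback

tesseract_path = find_tesseract()
print(tesseract_path)
# The result can then be assigned to pytesseract.pytesseract.tesseract_cmd.
```

If Tesseract is already on the `PATH`, pytesseract can usually find it without this assignment at all.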

In [3]:
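# Read the image from which text needs to be extracted
img = cv2.imread(".\\Images\\opencv-ocr-sample.jpg")
# cv2.imread returns None (no exception) if the file cannot be read
assert img is not None, "image not found or unreadable"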

### Preprocessing the image

The image is first converted from the BGR color space to grayscale and stored in a variable.

In [4]:
# Convert the image to gray scale 
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY) 
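
For 8-bit images, `cv2.COLOR_BGR2GRAY` computes a weighted sum of the three channels, Y = 0.299 R + 0.587 G + 0.114 B. The NumPy sketch below applies that formula to a tiny synthetic image to show why the channel order matters. It is an illustration, not OpenCV's actual implementation, and its rounding may differ from OpenCV's by one gray level. `bgr_to_gray` is a hypothetical helper name.

```python
import numpy as np

def bgr_to_gray(img_bgr):
    """Weighted-sum grayscale conversion for an (H, W, 3) BGR uint8 image."""
    # Weights are listed in B, G, R order to match OpenCV's channel layout.
    weights = np.array([0.114, 0.587, 0.299])
    return np.rint(img_bgr @ weights).astype(np.uint8)

# One row with a pure blue, a pure green and a pure red pixel (BGR order).
sample = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
gray_sample = bgr_to_gray(sample)
print(gray_sample)  # green comes out brightest, blue darkest
```

Passing an RGB image to a BGR conversion swaps the red and blue weights, which changes the gray levels of colored text.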

A threshold is applied to the converted image using the `cv2.threshold` function.

There are 3 types of thresholding (see [Threshold technique](https://www.geeksforgeeks.org/python-thresholding-techniques-using-opencv-set-1-simple-thresholding/amp/)):

- Simple Thresholding
- Adaptive Thresholding
- Otsu’s Binarization

In [5]:
# Performing OTSU threshold 
ret, thresh1 = cv2.threshold(gray, 0, 255, cv2.THRESH_OTSU | cv2.THRESH_BINARY_INV) 
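
With `cv2.THRESH_OTSU`, the threshold argument (`0` above) is ignored and the computed threshold is returned as `ret`. `cv2.THRESH_BINARY_INV` then turns dark text white on a black background. To give a feel for what Otsu's method does, here is a rough NumPy sketch that picks the threshold maximizing the between-class variance of the histogram. It is a simplified illustration on a synthetic image, not OpenCV's code, and `otsu_threshold` is a hypothetical name.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximises between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    w0 = np.cumsum(prob)                 # fraction of pixels <= t
    w1 = 1.0 - w0                        # fraction of pixels > t
    cum_mean = np.cumsum(prob * levels)  # cumulative intensity mass up to t
    total_mean = cum_mean[-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        between = (total_mean * w0 - cum_mean) ** 2 / (w0 * w1)
    # Thresholds that leave one class empty produce 0/0; treat them as 0.
    between = np.nan_to_num(between, nan=0.0, posinf=0.0, neginf=0.0)
    return int(np.argmax(between))

# Synthetic "page": bright background (200) with a dark text stripe (50).
page = np.full((4, 6), 200, dtype=np.uint8)
page[1:3, 1:5] = 50

t = otsu_threshold(page)
# Inverted binary: pixels above the threshold become 0, the rest 255.
binary_inv = np.where(page > t, 0, 255).astype(np.uint8)
print(t)
print(binary_inv)
```

The inverted result has white text pixels on a black background, which is the form the later contour step expects.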

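### Getting a rectangular structuring element

`cv2.getStructuringElement` defines a structuring element (kernel) of a given shape and size. OpenCV provides rectangular (`cv2.MORPH_RECT`), elliptical (`cv2.MORPH_ELLIPSE`), and cross-shaped (`cv2.MORPH_CROSS`) elements.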

In [6]:
# Specify structure shape and kernel size.  
# Kernel size increases or decreases the area  
# of the rectangle to be detected. 
# A smaller value like (10, 10) will detect  
# each word instead of a sentence. 
rect_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (18, 18)) 

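Dilation expands the white regions of the thresholded image, so neighboring characters and words merge into a single block. This lets each group of text be detected as one region instead of many small ones.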

In [7]:
# Applying dilation on the threshold image 
dilation = cv2.dilate(thresh1, rect_kernel, iterations = 1) 
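
The effect of the kernel size can be seen on a toy example. The sketch below re-implements dilation with a rectangular kernel in plain NumPy (each output pixel is the maximum over the kernel window) and shows that a narrow kernel leaves two "characters" separate while a wider one merges them. `dilate_rect` is a hypothetical helper that assumes odd kernel sizes. It only approximates what `cv2.dilate` does and is not its implementation.

```python
import numpy as np

def dilate_rect(binary, kernel_h, kernel_w):
    """Dilate a 2-D image with a kernel_h x kernel_w rectangle of ones.

    Assumes odd kernel sizes so the kernel is centred on each pixel.
    """
    pad_h, pad_w = kernel_h // 2, kernel_w // 2
    padded = np.pad(binary, ((pad_h, pad_h), (pad_w, pad_w)), mode="constant")
    height, width = binary.shape
    out = np.zeros_like(binary)
    # Each output pixel is the maximum over the kernel window.
    for dy in range(kernel_h):
        for dx in range(kernel_w):
            out = np.maximum(out, padded[dy:dy + height, dx:dx + width])
    return out

# Two "characters" four pixels apart on a single row.
row = np.zeros((1, 9), dtype=np.uint8)
row[0, 2] = row[0, 6] = 255

narrow = dilate_rect(row, 1, 3)  # blobs stay separate
wide = dilate_rect(row, 1, 5)    # blobs merge into one
print(narrow)
print(wide)
```

This is the same trade-off described in the kernel comments: a smaller kernel tends to isolate words, while a larger one joins them into lines or paragraphs.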

### Finding Contours

`cv2.findContours` is used to find contours in the dilated image.

In OpenCV 4.x this function returns the contours and their hierarchy (OpenCV 3.x returns the image as an additional first value). `contours` is a Python sequence of all the contours in the image. Each contour is a NumPy array of (x, y) coordinates of boundary points of the object. `cv2.RETR_EXTERNAL` keeps only the outermost contours, and `cv2.CHAIN_APPROX_NONE` stores every boundary point. Contours are typically used to find white objects on a black background, which is why the image was thresholded and dilated first: the blocks of text become white blobs whose boundaries the contour search can trace.

In [8]:
# Finding contours 
contours, hierarchy = cv2.findContours(dilation, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
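
Each contour returned by `cv2.findContours` is an integer array of shape `(N, 1, 2)` holding `(x, y)` points. As a rough illustration of how a bounding box follows from those points, the sketch below builds a hand-made contour and computes a box in the `(x, y, w, h)` form used by `cv2.boundingRect`. `bounding_rect` is a hypothetical NumPy stand-in, not the OpenCV function, and the contour values are made up.

```python
import numpy as np

def bounding_rect(contour):
    """Axis-aligned bounding box (x, y, w, h) of an (N, 1, 2) contour."""
    points = contour.reshape(-1, 2)
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    # +1 because both the first and the last pixel belong to the box.
    return int(x_min), int(y_min), int(x_max - x_min + 1), int(y_max - y_min + 1)

# Hand-made contour: the four corners of a text block.
contour = np.array([[[10, 20]], [[30, 20]], [[30, 40]], [[10, 40]]], dtype=np.int32)
print(contour.shape)
print(bounding_rect(contour))
```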

In [9]:
# Creating a copy of image 
im2 = img.copy() 

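A text file is opened in write mode with nothing written to it, which creates the file or empties it if it already exists. The text produced by the OCR step is appended to this file later. The `Output` directory must already exist.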

In [10]:
# Create the output file, or empty it if it already exists
with open(".\\Output\\recognized.txt", "w") as file:
    file.write("")
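
The difference between the write mode used here and the append mode used later can be checked with a small standalone sketch. It writes to a temporary directory, not the article's `Output` folder, and the file contents are made up for illustration.

```python
import os
import tempfile

# Use a temporary directory so the sketch does not touch real output files.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "recognized.txt")

    with open(path, "w") as f:   # "w" creates the file or empties it
        f.write("stale text from an earlier run\n")

    with open(path, "w") as f:   # opening with "w" again discards the old content
        pass

    with open(path, "a") as f:   # "a" adds to the end of the file
        f.write("first block\n")
    with open(path, "a") as f:
        f.write("second block\n")

    with open(path) as f:
        contents = f.read()

print(contents)
```

Without the initial write-mode open, text from an earlier run would stay in the file and new results would be appended after it.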

### Applying OCR

- Loop through each contour and get the x and y coordinates, width, and height of its bounding box using the function `cv2.boundingRect()`.
- Then draw a rectangle on the image using the function `cv2.rectangle()` with the obtained x and y coordinates, width, and height.

    - `cv2.rectangle()` takes 5 parameters here: the input image, the starting coordinates of the rectangle (x, y), the ending coordinates of the rectangle (x+w, y+h), the boundary color as a BGR value, and the thickness of the boundary.

- Crop the rectangular region and then pass it to Tesseract to extract the text from the image.
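
One detail in these steps that is easy to get wrong is the coordinate order: drawing functions take points as `(x, y)`, while NumPy slicing takes rows (y) first. A minimal sketch on a synthetic image, with a made-up bounding box, shows how the two relate:

```python
import numpy as np

# Synthetic 100-row by 200-column BGR image standing in for `img`.
image = np.zeros((100, 200, 3), dtype=np.uint8)

# Hypothetical bounding box in (x, y, w, h) form.
x, y, w, h = 40, 10, 120, 30

# Corner points in the (x, y) order that drawing functions expect.
top_left = (x, y)
bottom_right = (x + w, y + h)

# NumPy indexing is [rows, columns], so y comes before x when cropping.
cropped = image[y:y + h, x:x + w]
print(top_left, bottom_right)
print(cropped.shape)  # (height, width, channels)
```

The slice is a view into the source array, so anything drawn on the same image also shows up in a crop taken from it.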



In [16]:
# Loop through the identified contours.
# Each rectangular region is cropped and passed to
# pytesseract, and the extracted text is appended to the text file.

# Tesseract options: default OCR engine mode (--oem 3),
# treat the crop as a single uniform block of text (--psm 6)
custom_config = r'--oem 3 --psm 6'

for cnt in contours:

    x, y, w, h = cv2.boundingRect(cnt)

    # Draw a rectangle on the copied image
    cv2.rectangle(im2, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Crop the text block from the original image so the
    # drawn rectangle is not part of the OCR input
    cropped = img[y:y + h, x:x + w]

    # Apply OCR on the cropped image
    text = pytesseract.image_to_string(cropped, config=custom_config)

    # Append the text to the file; the with block closes it automatically
    with open(".\\Output\\recognized.txt", "a") as file:
        file.write(text)
        file.write("\n")
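
The loop writes text in whatever order `cv2.findContours` returned the contours, which is not necessarily reading order. Below is a minimal sketch of one way to reorder the regions, assuming a simple single-column layout. The box values are made up for illustration.

```python
# Hypothetical bounding boxes in (x, y, w, h) form, listed bottom-up.
boxes = [(20, 300, 400, 40), (20, 160, 380, 40), (20, 20, 420, 40)]

# Sort top-to-bottom, then left-to-right for boxes that start on the same row.
ordered = sorted(boxes, key=lambda box: (box[1], box[0]))
print(ordered)
```

In the loop itself, the same idea could be applied by iterating over `sorted(contours, key=lambda c: cv2.boundingRect(c)[1])`. Multi-column pages would need a more careful grouping than this.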