fivedots0/Text-Restoration

Text Recovery with OpenCV and Pytesseract

This repository provides a simple implementation of text recovery from images using OpenCV and Pytesseract. The main functionality is divided into three functions:

1. preprocess_image(image): This function takes an input image and performs various preprocessing steps to enhance the readability of the text. The steps include:

   - Converting the image to grayscale
   - Applying adaptive thresholding to binarize the image
   - Denoising the image using morphological operations
   - Dilating the image to enhance text regions

2. extract_text(image, config): This function uses Pytesseract, a Python wrapper for the Tesseract optical character recognition (OCR) engine, to extract text from the preprocessed image. The config parameter is a string of extra Tesseract options, such as the language and the page segmentation mode.

3. text_recovery_with_ocr(image_path): This is the main function that combines the preprocessing and text extraction steps. It takes the path to the input image, loads and preprocesses it, and then returns the text extracted by Pytesseract.

About

Text restoration is the process of extracting and cleaning up text from images or scanned documents. With OpenCV and Pytesseract, the workflow has three stages: preprocess the image with OpenCV (for example, grayscaling and deskewing), apply Pytesseract OCR to extract the text, and postprocess the text to clean up any remaining issues.
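The repository description does not say what the postprocessing stage involves, so the following is only one plausible sketch of it. The function name `postprocess_text` and the specific clean-up rules (dropping the form-feed character that Tesseract emits as a page separator, rejoining words hyphenated across line breaks, collapsing repeated spaces and blank lines) are assumptions for illustration.

```python
import re


def postprocess_text(raw: str) -> str:
    """Hypothetical clean-up of raw OCR output."""
    # Tesseract terminates each page with a form-feed character.
    text = raw.replace("\x0c", "")
    # Rejoin words that were hyphenated across a line break ("wor-\nld").
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Collapse runs of spaces and tabs, and trim each line.
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    # Keep at most one blank line in a row.
    cleaned: list[str] = []
    for line in lines:
        if line or (cleaned and cleaned[-1]):
            cleaned.append(line)
    return "\n".join(cleaned).strip()
```

Note that the hyphen rule also merges genuinely hyphenated compounds that happen to break at the end of a line, so it trades a little accuracy for readability.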
