GitHub - fivedots0/Text-Restoration: Text restoration is the process of extracting and cleaning up text from images or scanned documents. Here's a brief overview of how to do this using OpenCV and Pytesseract: Use OpenCV for image preprocessing like grayscaling and deskewing. Apply Pytesseract OCR to extract text. Postprocess the text to clean up any remaining issues.

Text Recovery with OpenCV and Pytesseract

This repository provides a simple implementation of text recovery from images using OpenCV and Pytesseract. The main functionality is divided into three functions:

1. preprocess_image(image): This function takes an input image and applies several preprocessing steps to make the text easier to read:

   - Converting the image to grayscale
   - Applying adaptive thresholding to binarize the image
   - Denoising the image using morphological operations
   - Dilating the image to enhance text regions

2. extract_text(image, config): This function uses Pytesseract, a Python wrapper for the Tesseract optical character recognition (OCR) engine, to extract text from the preprocessed image. The config parameter lets you pass additional Tesseract options, such as the language and the page segmentation mode.
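As a rough illustration of what the config parameter can carry, the sketch below builds a Tesseract option string and passes it to pytesseract.image_to_string. The helper build_tesseract_config and its default values are hypothetical and are not part of the repository. The flags themselves (-l, --oem, --psm) are standard Tesseract command-line options.

```python
def build_tesseract_config(psm=6, oem=3, lang=None):
    """Assemble a Tesseract option string (hypothetical helper for this example).

    psm:  page segmentation mode, 0-13 (6 = assume a single uniform block of text)
    oem:  OCR engine mode, 0-3 (3 = default, based on what is available)
    lang: optional language code such as "eng"
    """
    if not 0 <= psm <= 13:
        raise ValueError(f"psm must be between 0 and 13, got {psm}")
    if not 0 <= oem <= 3:
        raise ValueError(f"oem must be between 0 and 3, got {oem}")
    parts = [f"--oem {oem}", f"--psm {psm}"]
    if lang:
        parts.insert(0, f"-l {lang}")
    return " ".join(parts)


def extract_text(image, config=""):
    """Run Tesseract on a preprocessed image and return the recognized text."""
    # Imported inside the function so the config helper above can be used
    # without pytesseract installed.
    import pytesseract

    return pytesseract.image_to_string(image, config=config)
```

A call such as extract_text(image, build_tesseract_config(psm=6, lang="eng")) would then request English recognition on a single block of text. It requires the Tesseract binary to be installed in addition to the pytesseract package.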

3. text_recovery_with_ocr(image_path): This is the main function that combines the preprocessing and text extraction steps. It takes the path to the input image, preprocesses it, and then extracts the text using Pytesseract.

Repository files: README.md, ocr.py, tes3.png (3 commits).
