Text Extraction from Image

Overview

This Python script uses the Tesseract OCR (Optical Character Recognition) engine, through the pytesseract wrapper, together with the Pillow (PIL) library to extract text from images. Before running OCR, it preprocesses each image to improve accuracy: it converts the image to grayscale, increases its contrast, removes noise with a median filter, and inverts the colors when needed.

Prerequisites

Install the Tesseract OCR engine on your system. It is available from the Tesseract OCR project. The pytesseract package is only a Python wrapper and does not include the engine itself.
Install the required Python libraries with pip:
```
pip install pytesseract
pip install Pillow
```
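
Before running the script, it can help to confirm that everything is in place. The sketch below uses a hypothetical helper, check_prerequisites, which is not part of the original script. It assumes the engine is invoked as tesseract unless you pass another command name or a full path. It only checks that the packages are importable and that the executable can be found, not that OCR actually works.

```python
import importlib.util
import shutil


def check_prerequisites(tesseract_cmd: str = "tesseract") -> dict:
    """Report which OCR prerequisites are available on this machine.

    tesseract_cmd is resolved with shutil.which, so it can be a bare
    command name found on PATH or a full path to the executable.
    """
    return {
        # find_spec returns None when a top-level package cannot be imported
        "pytesseract": importlib.util.find_spec("pytesseract") is not None,
        # Pillow is imported under the package name "PIL"
        "Pillow": importlib.util.find_spec("PIL") is not None,
        # The Tesseract engine is a separate program, not a pip package
        "tesseract": shutil.which(tesseract_cmd) is not None,
    }
```

For example, check_prerequisites('C:/Program Files/Tesseract-OCR/tesseract.exe') reports whether that exact executable exists, in addition to the two Python packages.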

Setup

In the script, set the path to the Tesseract executable (tesseract_cmd) and to the Tesseract data folder (TESSDATA_PREFIX). The paths below are for a typical Windows installation; adjust them to match your system.

```
import os
import pytesseract
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
os.environ['TESSDATA_PREFIX'] = 'C:/Program Files/Tesseract-OCR/tessdata'
```
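
The hard-coded paths above only exist on Windows machines where Tesseract was installed to that folder. A minimal sketch of a more portable setup follows, assuming a hypothetical helper named configure_tesseract that is not in the original script. It applies the explicit paths only when they exist; otherwise it leaves pytesseract's current setting alone (by default, the command tesseract looked up on PATH).

```python
import os
from pathlib import Path


def configure_tesseract(cmd_path: str, tessdata_dir: str) -> bool:
    """Point pytesseract at an explicit Tesseract install if it exists.

    Returns True if the paths were applied. Returns False, changing
    nothing, if either path is missing; pytesseract then keeps its
    current command (by default "tesseract", looked up on PATH).
    """
    if not (Path(cmd_path).is_file() and Path(tessdata_dir).is_dir()):
        return False
    import pytesseract  # imported here so the path check works without it

    pytesseract.pytesseract.tesseract_cmd = cmd_path
    # Folder holding the *.traineddata language files
    os.environ["TESSDATA_PREFIX"] = tessdata_dir
    return True
```

For example, call configure_tesseract('C:/Program Files/Tesseract-OCR/tesseract.exe', 'C:/Program Files/Tesseract-OCR/tessdata') at the top of the script. On systems where those folders do not exist, it returns False and OCR falls back to the tesseract command on PATH.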

Usage

Call the extract_text_from_image function with the path to the image you want to process.

```
image_path = 'path/to/your/image.png'
result_text = extract_text_from_image(image_path)
print(result_text)
```
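
To process several images at once, the function can be called in a loop. The sketch below assumes extract_text_from_image takes a file path and returns a string, as in the example above. The helper extract_text_from_folder and the set of image extensions are illustrative choices, not part of the original script.

```python
from pathlib import Path
from typing import Callable, Dict

# File extensions treated as images (an assumption; adjust as needed)
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def extract_text_from_folder(
    folder: str, extract: Callable[[str], str]
) -> Dict[str, str]:
    """Run `extract` on every image file in `folder`.

    Returns a dict mapping each file name to the text `extract` produced.
    `extract` is expected to be the script's extract_text_from_image, or
    any function that takes a path and returns text.
    """
    results = {}
    for path in sorted(Path(folder).iterdir()):
        # Skip subfolders and files whose extension is not an image type
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            results[path.name] = extract(str(path))
    return results
```

For example, extract_text_from_folder('path/to/images', extract_text_from_image) returns a dictionary of file names and their extracted text. Passing the function in as an argument keeps the helper independent of how the script is imported.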

Image Processing Steps

1. Open the image using PIL.
2. Convert the image to grayscale.
3. Enhance the image contrast using autocontrast.
4. Remove noise using a median filter.
5. Invert the image colors if needed for optimal OCR.
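
Steps 2 to 5 map directly onto Pillow calls. The following is a minimal sketch of what they might look like, not the script's actual code. The name preprocess_for_ocr is hypothetical, the 3x3 median filter size is an assumption, and the automatic inversion check (invert when the mean brightness is below 128, i.e. the image is mostly dark) is one simple interpretation of "if needed".

```python
from typing import Optional

from PIL import Image, ImageFilter, ImageOps, ImageStat


def preprocess_for_ocr(image: Image.Image, invert: Optional[bool] = None) -> Image.Image:
    """Apply grayscale, autocontrast, median filter and optional inversion.

    invert=None inverts automatically when the image looks like light text
    on a dark background; True or False forces the choice.
    """
    # Step 2: convert to single-channel 8-bit grayscale ("L" mode)
    gray = image.convert("L")
    # Step 3: stretch the histogram so the darkest pixel maps to 0 and the lightest to 255
    gray = ImageOps.autocontrast(gray)
    # Step 4: a 3x3 median filter removes isolated salt-and-pepper noise
    gray = gray.filter(ImageFilter.MedianFilter(size=3))
    # Step 5: Tesseract generally works best on dark text over a light background
    if invert is None:
        invert = ImageStat.Stat(gray).mean[0] < 128
    if invert:
        gray = ImageOps.invert(gray)
    return gray
```

Step 1 is left to the caller: open the file with Image.open(path), pass the image to preprocess_for_ocr, and hand the result to pytesseract.image_to_string.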

Note

Ensure the paths to the Tesseract executable and its tessdata folder are correct. Adjust the OCR language setting for your use case (pytesseract's image_to_string accepts a lang argument such as 'eng'); the corresponding .traineddata file must be present in the tessdata folder. Depending on the characteristics of your images, experiment with additional image processing techniques to improve OCR results.
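
Tesseract cannot use a language whose data file is not installed. Several languages can be combined with a plus sign (for example, eng+deu), and each one is loaded from a file named after its code with the .traineddata extension in the tessdata folder. The hypothetical helper below, missing_languages, is not part of the original script; it checks for those files before OCR is run.

```python
from pathlib import Path
from typing import List


def missing_languages(lang: str, tessdata_dir: str) -> List[str]:
    """Return the language codes in `lang` that have no .traineddata file.

    `lang` uses Tesseract's syntax, e.g. "eng" or "eng+deu".
    """
    folder = Path(tessdata_dir)
    # Split on "+" and ignore empty pieces such as a trailing "+"
    codes = [code for code in lang.split("+") if code]
    return [code for code in codes if not (folder / f"{code}.traineddata").is_file()]
```

If the returned list is empty, the same string can be passed to OCR, e.g. pytesseract.image_to_string(image, lang='eng+deu').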

Contributing

Contributions are welcome! Feel free to open an issue or submit a pull request with any improvements or bug fixes.
