This OCR (Optical Character Recognition) project uses OpenCV for image preprocessing and Tesseract for text recognition. OCR extracts text from images, converting scanned documents, photographs, or handwritten pages into machine-readable text.
- Document Digitization: OCR allows for the conversion of physical documents into digital formats, making them searchable and editable.
- Text Extraction from Images: Extract text content from images, such as photographs, screenshots, or scanned pages.
- Data Entry Automation: Automate data entry tasks by extracting text information from documents or forms.
- Document Classification: Classify documents based on their content by analyzing extracted text.
- Accessibility: Improve accessibility by converting text from images into readable text for visually impaired individuals.
The original image is converted to grayscale using OpenCV. This simplifies the image and reduces the number of channels.
Gaussian blur is applied to the grayscale image to reduce noise and create a smoother image, enhancing OCR accuracy.
Adaptive thresholding is used to segment the image into foreground (text) and background. It adapts to varying lighting conditions.
Morphological operations, such as closing and opening, are applied to clean up the image by removing small noise and filling gaps.
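The image is inverted to have white text on a black background. White-on-black is what OpenCV's contour and morphological operations treat as foreground. Note that Tesseract 4 and later work best with dark text on a light background, so the image may need to be inverted back before recognition.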
Canny edge detection is employed to highlight edges in the image, improving text segmentation.
Histogram equalization is applied to enhance the contrast of the image, making the text stand out.
Contour detection is used to perform layout analysis and identify text regions. Skew correction is applied to correct any rotational skew in the document.
Horizontal and vertical lines are removed using morphological operations, eliminating interference from grid lines or table borders in documents.
- Clone the Repository:
git clone https://github.com/namam2398/ocr-project.git
- Install Dependencies:
pip install -r requirements.txt
- Run the OCR:
python main.py
- Modify Preprocessing Steps: Adjust the preprocessing steps in ocr_module/image_preprocessing.py to suit your document characteristics.
- Fine-Tune OCR Parameters: Experiment with Tesseract configuration options in ocr_module/text_recognition.py for optimal text recognition.
- Handle Different Languages: Set the language parameter in main.py and ocr_module/text_recognition.py for multilingual support.
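As a rough illustration of the OCR parameters and language setting mentioned above, the sketch below assumes the project calls Tesseract through the `pytesseract` package. The function names are hypothetical and are not taken from ocr_module/text_recognition.py.

```python
def build_tesseract_config(psm=6, oem=3):
    """Build a Tesseract option string from a page segmentation and engine mode."""
    # --psm 6 treats the image as a single uniform block of text;
    # --oem 3 lets Tesseract pick the default available engine.
    return f"--oem {oem} --psm {psm}"


def recognize_text(image, lang="eng", psm=6, oem=3):
    """Run Tesseract on an image (NumPy array or PIL image) and return the text."""
    # Imported here so the config helper works without Tesseract installed.
    import pytesseract

    return pytesseract.image_to_string(
        image, lang=lang, config=build_tesseract_config(psm, oem)
    )


# Example usage (requires the matching language data to be installed):
# text = recognize_text(preprocessed_image, lang="eng+deu", psm=6)
```

Languages are combined with `+`, as in `eng+deu`. Sparse or scattered text often reads better with a different page segmentation mode such as `--psm 11`, so it is worth experimenting per document type.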
Feel free to extend and adapt this project based on your specific use cases and requirements. Contributions and improvements are welcome!