This project extracts text from images using Python and Tesseract OCR (Optical Character Recognition).
It works on images of printed or handwritten text and either returns the detected text as plain text or saves it to a file.
- Reads text from images (JPG, PNG, BMP, etc.)
- Supports printed and handwritten text
- Easy integration with Python scripts
- Option to save extracted text to a .txt file
- Lightweight and open-source
- Python 3.8+
- PyTesseract (Python wrapper for Google Tesseract OCR)
- Pillow (PIL) for image processing
Windows:
- Download the installer from: 👉 https://github.com/UB-Mannheim/tesseract/wiki
- Install it and note the installation path (e.g., C:\Program Files\Tesseract-OCR\tesseract.exe)
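The installation path matters because PyTesseract has to know where the Tesseract executable lives when it is not on `PATH`. The following is a minimal sketch of one way to handle that, not necessarily what `read_text.py` does: `find_tesseract` and `configure_pytesseract` are hypothetical helper names, and the fallback is simply the example path shown above.

```python
import shutil
from pathlib import Path

# Example path from the install step above; adjust it to your actual install location.
EXAMPLE_WINDOWS_PATH = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(fallback=EXAMPLE_WINDOWS_PATH):
    """Return the tesseract executable path, or None if it cannot be found."""
    on_path = shutil.which("tesseract")  # is it already on PATH?
    if on_path:
        return on_path
    if Path(fallback).is_file():
        return str(fallback)
    return None


def configure_pytesseract(fallback=EXAMPLE_WINDOWS_PATH):
    """Point pytesseract at the located executable and return its path."""
    import pytesseract  # imported lazily so find_tesseract() works on its own

    path = find_tesseract(fallback)
    if path is None:
        raise FileNotFoundError("Tesseract executable not found; check the install path.")
    # pytesseract reads this module-level attribute when it launches Tesseract.
    pytesseract.pytesseract.tesseract_cmd = path
    return path
```

Calling `configure_pytesseract()` once at the top of a script should be enough. On macOS and Linux the executable is normally on `PATH` already, so the fallback is never consulted.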
macOS:
brew install tesseract
Linux (Debian/Ubuntu):
sudo apt update
sudo apt install tesseract-ocr
Python packages:
pip install pytesseract pillow
Project structure:
📁 image-text-reader/
├── images/
│ ├── sample1.jpg
│ ├── sample2.png
├── read_text.py
└── README.md
python read_text.py
🧠 Output will appear in the terminal, and a .txt file will be created for each image:
sample1_output.txt
sample2_output.txt
Output (Console):
Hello World!
This text was extracted using PyTesseract OCR.
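The source of `read_text.py` is not reproduced here, so the following is only a sketch of how a script with this behavior could be written. It assumes the images live in `images/`, that output files go to the current directory, and that only the listed extensions count as images. The names `ocr_image`, `process_folder`, and `IMAGE_SUFFIXES` are hypothetical.

```python
from pathlib import Path

# Assumed set of extensions; the feature list mentions JPG, PNG, and BMP.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def ocr_image(image_path):
    """Return the text Tesseract detects in one image."""
    # Imported lazily so the file handling below can be exercised without Tesseract.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image)


def process_folder(folder="images", output_dir=".", extract=ocr_image):
    """OCR every image in `folder` and write <name>_output.txt files to `output_dir`."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path in sorted(Path(folder).iterdir()):
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip anything that is not an image
        text = extract(image_path)
        print(text)  # console output
        out_path = output_dir / f"{image_path.stem}_output.txt"
        out_path.write_text(text, encoding="utf-8")
        written.append(out_path)
    return written
```

Running `process_folder()` from the project root would then print each image's text and create `sample1_output.txt` and `sample2_output.txt`. The `extract` parameter exists only so the OCR step can be swapped out, for example with a preprocessing variant.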
For better accuracy:
- Use clear, high-resolution images.
- Preprocess images (grayscale, thresholding, noise removal).
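The preprocessing steps named above can be sketched with Pillow alone. This is an illustrative pipeline, not a tuned one: the `preprocess` name, the fixed threshold of 128, and the 3x3 median filter are assumptions, and noise removal is applied before thresholding because that ordering tends to give cleaner binary images.

```python
from PIL import Image, ImageFilter


def preprocess(image: Image.Image, threshold: int = 128) -> Image.Image:
    """Return a denoised black-and-white copy of `image` for OCR."""
    gray = image.convert("L")  # grayscale: one 8-bit channel
    denoised = gray.filter(ImageFilter.MedianFilter(size=3))  # remove speckle noise
    # Thresholding: pixels at or above `threshold` become white, the rest black.
    return denoised.point(lambda value: 255 if value >= threshold else 0)
```

The result can be passed straight to `pytesseract.image_to_string`. A fixed threshold works poorly on unevenly lit photos, so treat the value as a starting point to adjust per image set.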
You can detect text in different languages:
pytesseract.image_to_string(image, lang='eng+hin')
(Make sure the language data is installed for Tesseract.)
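One way to catch missing language data early is to compare the requested codes against what Tesseract reports as installed. The sketch below assumes a PyTesseract version that provides `pytesseract.get_languages`; `build_lang` and `installed_languages` are hypothetical helper names.

```python
def build_lang(requested, available):
    """Join language codes with '+', failing early if any are not installed."""
    missing = [code for code in requested if code not in available]
    if missing:
        raise ValueError(f"Tesseract language data not installed: {', '.join(missing)}")
    return "+".join(requested)


def installed_languages():
    """Return the language codes reported by the local Tesseract install."""
    import pytesseract  # imported lazily so build_lang() can be used without Tesseract

    return pytesseract.get_languages(config="")
```

A call such as `pytesseract.image_to_string(image, lang=build_lang(["eng", "hin"], installed_languages()))` then raises a clear error instead of failing inside Tesseract. On Debian/Ubuntu, language data is typically packaged as `tesseract-ocr-<code>`, for example `tesseract-ocr-hin`.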
This project is licensed under the MIT License.