A powerful Python-based OCR tool supporting multiple engines for handling challenging images with noise, poor lighting, and complex backgrounds.

Multiple OCR Engines:
- PaddleOCR - Best for noisy/grainy images
- EasyOCR - Excellent with challenging backgrounds
- Surya OCR - Modern, handles noise well
- Tesseract - Fast, good for clean images

Advanced Capabilities:
- HEIC/HEIF image support (auto-conversion)
- Confidence scores for all engines
- Batch processing for multiple images
- JSON export with detailed results
- Processing time metrics
- Error handling and recovery

Performance Optimizations (v2.1):
- Singleton pattern: 10-100x faster batch processing
- Lazy loading: Only load engines when needed
- GPU auto-detection: Automatic CUDA support
- Progress bars: Visual feedback with tqdm
- Quiet mode: Minimal output for automation

Easy Deployment:
- Docker support (works on all platforms)
- Simple helper scripts
- Flexible configuration

docker build -t python-advanced-ocr .

# Copy your image to images/ directory
cp /path/to/photo.jpg images/
# Run OCR with PaddleOCR (best for noisy images)
./run.sh images/photo.jpg paddleocr
# Or with all engines
./run.sh images/photo.jpg all
# Save results to JSON
./run.sh images/photo.jpg paddleocr images/results.json

# Process all images in images/ directory
./batch_ocr.sh paddleocr
# Results saved to output/batch_results.json

docker run --rm \
  -v "$(pwd)/images:/images" \
  python-advanced-ocr \
  --engine paddleocr \
  --input /images/photo.jpg

docker run --rm \
  -v "$(pwd)/images:/images" \
  -v "$(pwd)/output:/output" \
  python-advanced-ocr \
  --engine paddleocr \
  --input-dir /images \
  --output-dir /output

Note: Modern Docker uses docker compose (with a space), not docker-compose (with a hyphen).
# Single image (edit docker-compose.yml first to set your image path)
docker compose run ocr-single
# Batch processing
docker compose run ocr-batch
# Or use the simpler helper scripts instead (recommended):
./run.sh images/photo.jpg paddleocr
./batch_ocr.sh paddleocr

# PaddleOCR
pip install paddleocr paddlepaddle opencv-python Pillow numpy
# EasyOCR
pip install easyocr opencv-python Pillow numpy
# Surya OCR
pip install surya-ocr
# Tesseract: install the tesseract-ocr system package first
# Ubuntu/Debian: sudo apt-get install tesseract-ocr
# macOS: brew install tesseract
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
pip install pytesseract Pillow

python3 ocr_tool.py --engine paddleocr --input photo.jpg
python3 ocr_tool.py --engine all --input photo.jpg --output results.json
python3 ocr_tool.py --engine paddleocr --input-dir ./images/ --output-dir ./results/

| Engine | Speed | Accuracy (Clean) | Accuracy (Noisy) | Resource Usage |
|---|---|---|---|---|
| PaddleOCR | Medium | 96% | 92% | Medium |
| EasyOCR | Slow | 95% | 90% | High |
| Surya | Medium | 94% | 88% | Medium |
| Tesseract | Very Fast | 90% | 60% | Low |
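
The figures in the table are general guidance; to see how the engines compare on your own images, you can rank the entries of a results file written with the `all` engine. The sketch below assumes the JSON layout shown in this README (`engines`, `confidence`, `success`). The `rank_engines` helper and the sample values are hypothetical, made up for illustration, and are not part of `ocr_tool.py`.

```python
import json


def rank_engines(results: dict) -> list:
    """Return (engine, confidence) pairs for successful runs, best first.

    Hypothetical helper: assumes the JSON layout documented in this README.
    """
    ranked = [
        (name, entry["confidence"])
        for name, entry in results.get("engines", {}).items()
        if entry.get("success")
    ]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Made-up sample in the documented shape; these are not real measurements.
sample = json.loads("""
{
  "image": "photo.jpg",
  "engines": {
    "tesseract": {"confidence": 0.61, "success": true},
    "paddleocr": {"confidence": 0.92, "success": true},
    "easyocr":   {"confidence": 0.0,  "success": false}
  }
}
""")

ranking = rank_engines(sample)
for engine, confidence in ranking:
    print(f"{engine}: {confidence:.2f}")
```

For a real run, replace the inline sample with `json.load(open("images/comparison.json"))` after running `./run.sh images/photo.jpg all images/comparison.json`.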

./run.sh images/solar_panel.heic paddleocr
./run.sh images/document.jpg easyocr
./batch_ocr.sh all
./run.sh images/photo.jpg all images/comparison.json

usage: ocr_tool.py [-h] [--version] [--engine {paddleocr,easyocr,surya,tesseract,all}]
                   [--input INPUT] [--input-dir INPUT_DIR]
                   [--output OUTPUT] [--output-dir OUTPUT_DIR]
                   [--verbose] [--quiet]

Advanced OCR Tool v2.1 - Performance Optimized

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --engine {paddleocr,easyocr,surya,tesseract,all}
                        OCR engine to use (default: paddleocr)
  --input INPUT         Input image file
  --input-dir INPUT_DIR
                        Input directory for batch processing
  --output OUTPUT       Output JSON file
  --output-dir OUTPUT_DIR
                        Output directory for batch processing
  --verbose, -v         Verbose output (default)
  --quiet, -q           Quiet mode (minimal output)
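
If you drive the tool from another Python program, it can help to assemble the command line in one place. The following is a minimal sketch that uses only the flags listed in the usage text above. The `build_ocr_command` helper is hypothetical (it does not ship with this project), and it assumes `python3` is on your PATH and that you run it from the directory containing `ocr_tool.py`.

```python
from typing import List, Optional

# Engine names copied from the --engine choices in the usage text.
ENGINES = ("paddleocr", "easyocr", "surya", "tesseract", "all")


def build_ocr_command(
    input_path: str,
    engine: str = "paddleocr",
    output: Optional[str] = None,
    quiet: bool = False,
) -> List[str]:
    """Build an argv list for a single-image run (hypothetical helper)."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    cmd = ["python3", "ocr_tool.py", "--engine", engine, "--input", input_path]
    if output is not None:
        cmd += ["--output", output]
    if quiet:
        cmd.append("--quiet")
    return cmd


cmd = build_ocr_command("photo.jpg", engine="tesseract",
                        output="results.json", quiet=True)
print(" ".join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with file names that contain spaces.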

python-advanced-ocr/
├── ocr_tool.py           # Main OCR tool
├── Dockerfile            # Docker configuration
├── docker-compose.yml    # Docker Compose configuration
├── run.sh                # Helper script for single images
├── batch_ocr.sh          # Helper script for batch processing
├── requirements.txt      # Python dependencies
├── images/               # Place your images here
├── output/               # Batch processing results
└── README.md             # This file

Solution: Use Docker (recommended)
docker build -t python-advanced-ocr .
./run.sh images/photo.jpg paddleocr

Solution: Install pillow-heif
pip install pillow-heif

Solution: Use PaddleOCR instead of Tesseract
./run.sh images/noisy_image.jpg paddleocr

Solution: Process images one at a time or use Tesseract (lower memory usage)
./run.sh images/photo.jpg tesseract

{
  "image": "photo.jpg",
  "image_path": "/path/to/photo.jpg",
  "engines": {
    "paddleocr": {
      "engine": "PaddleOCR",
      "text": "Extracted text here...",
      "confidence": 0.9234,
      "lines": 15,
      "processing_time": 2.34,
      "success": true
    }
  }
}

Following the official PaddleOCR recommendation, engines are initialized once and reused for all subsequent images:
Before (v1):
- Each image: Initialize engine → Process → Destroy
- 100 images: 100 initializations (very slow!)
After (v2.1):
- First image: Initialize engine → Process
- Next 99 images: Process only (10-100x faster!)
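
The initialize-once behavior described above can be sketched as a small registry that creates an engine on first request and returns the cached instance afterwards. This is an illustration of the pattern, not the code in `ocr_tool.py`: the `EngineRegistry` class and the fake factory are hypothetical, and a real factory would construct the actual OCR engine object.

```python
import threading
from typing import Any, Callable, Dict


class EngineRegistry:
    """Creates each engine at most once and reuses it (hypothetical sketch)."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], Any]] = {}
        self._instances: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        # Registering is cheap: the factory is stored, not called.
        self._factories[name] = factory

    def get(self, name: str) -> Any:
        # Lazy loading: the expensive factory runs only on the first request.
        with self._lock:
            if name not in self._instances:
                self._instances[name] = self._factories[name]()
            return self._instances[name]


# Stand-in for an expensive model load; records every time it runs.
init_calls = []


def fake_engine_factory() -> dict:
    init_calls.append("paddleocr")
    return {"name": "paddleocr"}


registry = EngineRegistry()
registry.register("paddleocr", fake_engine_factory)

# Simulate a 100-image batch: the engine is requested once per image.
engines_used = [registry.get("paddleocr") for _ in range(100)]
print(f"initializations: {len(init_calls)}")
```

Engines that are registered but never requested are never constructed, which is the lazy-loading half of the optimization.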
Automatically detects and uses CUDA if available:
# No configuration needed - just works!
python3 ocr_tool.py --engine paddleocr --input photo.jpg
# Output: GPU detected: NVIDIA GeForce RTX 3080

Perfect for scripts and automation:
# Only show final results, no progress output
python3 ocr_tool.py --quiet --engine paddleocr --input photo.jpg --output results.json

Contributions welcome! Please feel free to submit issues or pull requests.

MIT License