swipswaps/python-advanced-ocr

Advanced Python OCR Tool v2.1

A powerful Python-based OCR tool supporting multiple engines for handling challenging images with noise, poor lighting, and complex backgrounds.

🚀 Features

  • Multiple OCR Engines:

    • PaddleOCR ⭐ - Best for noisy/grainy images
    • EasyOCR - Excellent with challenging backgrounds
    • Surya OCR - Modern, handles noise well
    • Tesseract - Fast, good for clean images
  • Advanced Capabilities:

    • ✅ HEIC/HEIF image support (auto-conversion)
    • ✅ Confidence scores for all engines
    • ✅ Batch processing for multiple images
    • ✅ JSON export with detailed results
    • ✅ Processing time metrics
    • ✅ Error handling and recovery
  • Performance Optimizations (v2.1):

    • ⚡ Singleton pattern: 10-100x faster batch processing
    • 🎯 Lazy loading: Only load engines when needed
    • 🚀 GPU auto-detection: Automatic CUDA support
    • 📊 Progress bars: Visual feedback with tqdm
    • 🔇 Quiet mode: Minimal output for automation
  • Easy Deployment:

    • 🐳 Docker support (works on all platforms)
    • 📦 Simple helper scripts
    • 🔧 Flexible configuration

📋 Quick Start (Docker - Recommended for Fedora)

1. Build Docker Image

docker build -t python-advanced-ocr .

2. Process Single Image

# Copy your image to images/ directory
cp /path/to/photo.jpg images/

# Run OCR with PaddleOCR (best for noisy images)
./run.sh images/photo.jpg paddleocr

# Or with all engines
./run.sh images/photo.jpg all

# Save results to JSON
./run.sh images/photo.jpg paddleocr images/results.json

3. Batch Processing

# Process all images in images/ directory
./batch_ocr.sh paddleocr

# Results saved to output/batch_results.json

🐳 Docker Usage

Single Image

docker run --rm \
    -v "$(pwd)/images:/images" \
    python-advanced-ocr \
    --engine paddleocr \
    --input /images/photo.jpg

Batch Processing

docker run --rm \
    -v "$(pwd)/images:/images" \
    -v "$(pwd)/output:/output" \
    python-advanced-ocr \
    --engine paddleocr \
    --input-dir /images \
    --output-dir /output

Using Docker Compose

Note: Modern Docker uses docker compose (with space), not docker-compose (with hyphen).

# Single image (edit docker-compose.yml first to set your image path)
docker compose run ocr-single

# Batch processing
docker compose run ocr-batch

# Or use the simpler helper scripts instead (recommended):
./run.sh images/photo.jpg paddleocr
./batch_ocr.sh paddleocr

💻 Direct Installation (Windows/macOS)

Install PaddleOCR (Recommended)

pip install paddleocr paddlepaddle opencv-python Pillow numpy

Install EasyOCR

pip install easyocr opencv-python Pillow numpy

Install Surya OCR

pip install surya-ocr

Install Tesseract

# Install tesseract-ocr system package first
# Ubuntu/Debian: sudo apt-get install tesseract-ocr
# macOS: brew install tesseract
# Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki

pip install pytesseract Pillow

Run Directly

python3 ocr_tool.py --engine paddleocr --input photo.jpg
python3 ocr_tool.py --engine all --input photo.jpg --output results.json
python3 ocr_tool.py --engine paddleocr --input-dir ./images/ --output-dir ./results/

📊 Performance Comparison

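| Engine    | Speed     | Accuracy (Clean) | Accuracy (Noisy) | Resource Usage |
|-----------|-----------|------------------|------------------|----------------|
| PaddleOCR | Medium    | 96%              | 92% ⭐           | Medium         |
| EasyOCR   | Slow      | 95%              | 90%              | High           |
| Surya     | Medium    | 94%              | 88%              | Medium         |
| Tesseract | Very Fast | 90%              | 60%              | Low            |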

🎯 Use Cases

Solar Panel Labels (Noisy/Grainy Images)

./run.sh images/solar_panel.heic paddleocr

Documents with Complex Backgrounds

./run.sh images/document.jpg easyocr

Batch Processing Multiple Images

./batch_ocr.sh all

Compare All Engines

./run.sh images/photo.jpg all images/comparison.json

📖 Command Line Options

usage: ocr_tool.py [-h] [--version] [--engine {paddleocr,easyocr,surya,tesseract,all}]
                   [--input INPUT] [--input-dir INPUT_DIR]
                   [--output OUTPUT] [--output-dir OUTPUT_DIR]
                   [--verbose] [--quiet]

Advanced OCR Tool v2.1 - Performance Optimized

optional arguments:
  -h, --help            show this help message and exit
  --version             show program's version number and exit
  --engine {paddleocr,easyocr,surya,tesseract,all}
                        OCR engine to use (default: paddleocr)
  --input INPUT         Input image file
  --input-dir INPUT_DIR
                        Input directory for batch processing
  --output OUTPUT       Output JSON file
  --output-dir OUTPUT_DIR
                        Output directory for batch processing
  --verbose, -v         Verbose output (default)
  --quiet, -q           Quiet mode (minimal output)
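
The help text above could be produced by an argparse parser along the following lines. This is a sketch reconstructed from the usage message only, not the actual contents of ocr_tool.py; the `build_parser` helper name and the exact version string are assumptions.

```python
import argparse

ENGINES = ["paddleocr", "easyocr", "surya", "tesseract", "all"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="ocr_tool.py",
        description="Advanced OCR Tool v2.1 - Performance Optimized",
    )
    # The version string format is an assumption.
    parser.add_argument("--version", action="version", version="%(prog)s 2.1")
    parser.add_argument("--engine", choices=ENGINES, default="paddleocr",
                        help="OCR engine to use (default: paddleocr)")
    parser.add_argument("--input", help="Input image file")
    parser.add_argument("--input-dir", help="Input directory for batch processing")
    parser.add_argument("--output", help="Output JSON file")
    parser.add_argument("--output-dir", help="Output directory for batch processing")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Verbose output (default)")
    parser.add_argument("--quiet", "-q", action="store_true",
                        help="Quiet mode (minimal output)")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["--engine", "easyocr", "--input", "photo.jpg", "-q"])
print(args.engine, args.input, args.quiet)
```

Note that argparse stores `--input-dir` and `--output-dir` as `args.input_dir` and `args.output_dir`.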

📁 Project Structure

python-advanced-ocr/
├── ocr_tool.py           # Main OCR tool
├── Dockerfile            # Docker configuration
├── docker-compose.yml    # Docker Compose configuration
├── run.sh                # Helper script for single images
├── batch_ocr.sh          # Helper script for batch processing
├── requirements.txt      # Python dependencies
├── images/               # Place your images here
├── output/               # Batch processing results
└── README.md             # This file

🔧 Troubleshooting

PaddlePaddle Installation Fails on Fedora

Solution: Use Docker (recommended)

docker build -t python-advanced-ocr .
./run.sh images/photo.jpg paddleocr

HEIC Images Not Working

Solution: Install pillow-heif

pip install pillow-heif
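
For reference, once pillow-heif is installed a HEIC file can be opened through Pillow after registering the HEIF opener. The sketch below shows one way a conversion step might look. The helper names `is_heif` and `convert_to_jpeg` are hypothetical, and the tool's own auto-conversion may work differently.

```python
from pathlib import Path

HEIF_SUFFIXES = {".heic", ".heif"}


def is_heif(path) -> bool:
    """Return True when the file extension indicates a HEIC/HEIF image."""
    return Path(path).suffix.lower() in HEIF_SUFFIXES


def convert_to_jpeg(src, dst) -> Path:
    """Convert a HEIC/HEIF file to JPEG (hypothetical helper).

    Imports are deferred so this module still loads when the optional
    dependencies (Pillow, pillow-heif) are not installed.
    """
    from PIL import Image
    from pillow_heif import register_heif_opener

    register_heif_opener()  # lets Pillow's Image.open read HEIC/HEIF files
    with Image.open(src) as img:
        img.convert("RGB").save(dst, format="JPEG")
    return Path(dst)


print(is_heif("images/solar_panel.heic"))
```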

Low Accuracy on Noisy Images

Solution: Use PaddleOCR instead of Tesseract

./run.sh images/noisy_image.jpg paddleocr

Out of Memory Errors

Solution: Process images one at a time or use Tesseract (lower memory usage)

./run.sh images/photo.jpg tesseract
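
When calling OCR code from Python rather than through the helper scripts, "one at a time" can be sketched as a loop that holds only a single result in memory. Everything here is illustrative: the suffix list, the `iter_images` and `process_one_at_a_time` names, and the `ocr` and `sink` callables are assumptions, not part of the tool.

```python
from pathlib import Path
from typing import Any, Callable, Iterator

# Assumed set of image extensions; adjust to what your engine accepts.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".heic", ".heif"}


def iter_images(directory) -> Iterator[Path]:
    """Yield image files in a directory in sorted order, lazily."""
    for path in sorted(Path(directory).iterdir()):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            yield path


def process_one_at_a_time(directory,
                          ocr: Callable[[Path], Any],
                          sink: Callable[[Path, Any], None]) -> int:
    """Run ocr on each image and hand the result to sink immediately.

    No list of results is accumulated, so memory use stays roughly
    constant regardless of how many images the directory holds.
    """
    count = 0
    for path in iter_images(directory):
        result = ocr(path)
        sink(path, result)  # e.g. write the result to disk
        del result          # drop the reference before the next image
        count += 1
    return count
```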

📝 Output Format

{
  "image": "photo.jpg",
  "image_path": "/path/to/photo.jpg",
  "engines": {
    "paddleocr": {
      "engine": "PaddleOCR",
      "text": "Extracted text here...",
      "confidence": 0.9234,
      "lines": 15,
      "processing_time": 2.34,
      "success": true
    }
  }
}
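
A results file in this shape is straightforward to consume from Python. The sketch below picks the successful engine with the highest confidence, which is useful after running with `--engine all`. The sample values are made up for illustration, and `best_engine` is a hypothetical helper, not part of the tool.

```python
import json

# Made-up sample in the documented shape (fields trimmed for brevity).
sample = """
{
  "image": "photo.jpg",
  "engines": {
    "paddleocr": {"text": "SOLAR 250W", "confidence": 0.9234, "success": true},
    "tesseract": {"text": "S0LAR 25OW", "confidence": 0.61, "success": true},
    "easyocr": {"text": "", "confidence": 0.0, "success": false}
  }
}
"""


def best_engine(result: dict):
    """Return (engine_key, engine_result) for the most confident successful
    engine, or None when no engine succeeded."""
    ok = {key: entry for key, entry in result.get("engines", {}).items()
          if entry.get("success")}
    if not ok:
        return None
    name = max(ok, key=lambda key: ok[key].get("confidence", 0.0))
    return name, ok[name]


result = json.loads(sample)
name, entry = best_engine(result)
print(name, entry["confidence"])
```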

⚡ Performance Improvements (v2.1)

Singleton Pattern

Following the official PaddleOCR recommendation, each engine is initialized once and then reused for all subsequent images:

Before (v1):

  • Each image: Initialize engine → Process → Destroy
  • 100 images: 100 initializations (very slow!)

After (v2.1):

  • First image: Initialize engine → Process
  • Next 99 images: Process only (10-100x faster!)
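
The before/after behaviour can be illustrated with a small cache that creates each engine on first use and reuses it afterwards. This is a minimal sketch of the pattern, not the code in ocr_tool.py; `get_engine` and the fake engine factory are hypothetical stand-ins for a real, expensive model load.

```python
import threading
from typing import Any, Callable, Dict

_engines: Dict[str, Any] = {}
_lock = threading.Lock()


def get_engine(name: str, factory: Callable[[], Any]) -> Any:
    """Return the cached engine for name, creating it on first use."""
    with _lock:
        if name not in _engines:
            _engines[name] = factory()  # expensive model load happens once
        return _engines[name]


# Stand-in for an expensive constructor such as a real OCR engine.
init_calls = 0


def make_fake_engine():
    global init_calls
    init_calls += 1
    return lambda image: f"text from {image}"


for image in ["a.jpg", "b.jpg", "c.jpg"]:
    engine = get_engine("fake", make_fake_engine)  # lazy: built on first call only
    engine(image)

print(init_calls)  # 1: the engine was initialized once for three images
```

Because the factory is only called when an engine is first requested, the same cache also gives lazy loading: engines that are never requested are never initialized.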

GPU Auto-Detection

Automatically detects and uses CUDA if available:

# No configuration needed - just works!
python3 ocr_tool.py --engine paddleocr --input photo.jpg
# Output: ✓ GPU detected: NVIDIA GeForce RTX 3080
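
How the tool performs this check is not shown here. As a rough illustration, a CUDA probe through PyTorch (which EasyOCR and Surya are built on) might look like the sketch below. The `detect_gpu` helper is hypothetical, and it falls back to CPU when PyTorch is not installed.

```python
import importlib.util
from typing import Optional


def detect_gpu() -> Optional[str]:
    """Return the CUDA device name, or None when no GPU (or no PyTorch) is found."""
    if importlib.util.find_spec("torch") is None:
        return None  # PyTorch not installed: treat as CPU-only
    import torch

    if torch.cuda.is_available():
        return torch.cuda.get_device_name(0)
    return None


gpu = detect_gpu()
print(f"GPU detected: {gpu}" if gpu else "No GPU detected, using CPU")
```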

Quiet Mode for Automation

Perfect for scripts and automation:

# Only show final results, no progress output
python3 ocr_tool.py --quiet --engine paddleocr --input photo.jpg --output results.json
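
From a Python automation script, the same command can be wrapped with subprocess and the JSON results read back afterwards. This sketch assumes ocr_tool.py is in the current working directory and exits with a non-zero status on failure; `build_command` and `run_ocr` are hypothetical helper names.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_command(image: str, output: str, engine: str = "paddleocr") -> list:
    """Assemble the documented command line as an argument list."""
    return [sys.executable, "ocr_tool.py", "--quiet",
            "--engine", engine, "--input", image, "--output", output]


def run_ocr(image: str, output: str, engine: str = "paddleocr") -> dict:
    """Run the tool in quiet mode and return the parsed JSON results.

    check=True raises CalledProcessError on a non-zero exit status
    (assumed here to signal failure).
    """
    subprocess.run(build_command(image, output, engine), check=True)
    return json.loads(Path(output).read_text(encoding="utf-8"))


print(build_command("photo.jpg", "results.json")[1:])
```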

🤝 Contributing

Contributions welcome! Please feel free to submit issues or pull requests.

📄 License

MIT License

🙏 Acknowledgments
