FastAPI OCR service powered by PaddleOCR for text extraction and visualization from images.
Open Source OCR Service for Education & Research
Created by RogaTekno - Licensed under MIT License.
- Features
- Performance
- Quick Start
- API Endpoints
- Supported Languages
- Use Cases
- Project Structure
- Configuration
- Development
- Documentation
- Requirements
- License & Usage
- Contributing
- Support
- Text Extraction - Extract text from images with high accuracy using PaddleOCR
- Bounding Box Detection - Get precise coordinates of detected text regions
- Visualization - Generate annotated images with bounding boxes and confidence scores
- Multi-language Support - Support for 10+ languages including English and Indonesian
- Confidence Scoring - Get confidence scores for each detected text region
- RESTful API - Clean, well-documented API endpoints
- Interactive Documentation - Auto-generated Swagger UI and ReDoc
- FastAPI Framework - Modern, high-performance web framework with async support
- Repository Pattern - Clean architecture for maintainability and testability
- Type Safety - Full type hints with Python 3.12+
- Comprehensive Testing - Test suite with coverage reporting
- Structured Logging - Request tracking and debugging with Loguru
- Error Handling - Consistent error responses with detailed messages
- Hot Reload - Auto-reload during development
- OpenAPI Specification - Standard API documentation
- Modular Design - Easy to extend and customize
- Well Documented - Comprehensive guides and examples
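To make the bounding-box and confidence-scoring features concrete, here is a minimal sketch of the shape an extraction result might take. The names `TextRegion`, `bbox`, and `overall_confidence` are illustrative assumptions, not SnapText's actual schema (the real Pydantic models live in app/models/schemas.py):

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextRegion:
    """One detected text region (illustrative shape, not the service's schema)."""
    text: str
    confidence: float                 # recognizer score in [0.0, 1.0]
    bbox: List[Tuple[float, float]]   # four corner points of the region


def overall_confidence(regions: List[TextRegion]) -> float:
    """Average per-region scores into a single document-level score."""
    if not regions:
        return 0.0
    return sum(r.confidence for r in regions) / len(regions)


regions = [
    TextRegion("SNAPTEXT", 0.98, [(0, 0), (90, 0), (90, 20), (0, 20)]),
    TextRegion("OCR DEMO", 0.90, [(0, 30), (80, 30), (80, 50), (0, 50)]),
]
print(f"{overall_confidence(regions):.2f}")  # → 0.94
```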
The modular architecture makes it easy to:
- Add new document type extractors (KTP, KK, BPJS, etc.)
- Implement custom field mappers
- Integrate additional OCR engines
- Add caching layers
- Create batch processing endpoints
- Extend with new features
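As a sketch of what a custom document-type extractor could look like, the hypothetical helper below pulls a 16-digit NIK out of raw OCR text from an Indonesian KTP. The function name and regex are assumptions for illustration only; no KTP extractor ships with SnapText yet (see Contributing):

```python
import re
from typing import Optional


def extract_nik(ocr_text: str) -> Optional[str]:
    """Find a 16-digit NIK (Indonesian ID number) in raw OCR output.

    Hypothetical field extractor: a production KTP parser would handle
    more fields and tolerate common OCR misreads (e.g. O vs 0).
    """
    match = re.search(r"\b(\d{16})\b", ocr_text)
    return match.group(1) if match else None


sample = "NIK : 3171234567890001\nNama : BUDI SANTOSO"
print(extract_nik(sample))  # → 3171234567890001
```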
Typical performance metrics on standard hardware:
| Operation | Average Time | Min | Max | Notes |
|---|---|---|---|---|
| First Request (Model Load) | 8-15s | 5s | 30s | One-time, model download |
| Text Extraction (small) | 200-500ms | 100ms | 1s | < 1MB image, < 1000px |
| Text Extraction (large) | 1-3s | 500ms | 5s | > 5MB image, > 2000px |
| Visualization | 300-800ms | 200ms | 2s | Includes extraction |
Minimum (Development):
- CPU: Dual-core 2.0GHz
- RAM: 4GB
- Disk: 2GB (for models)
Recommended (Production):
- CPU: Quad-core 2.5GHz+
- RAM: 8GB+
- Disk: 5GB SSD
- GPU: NVIDIA GPU with CUDA (optional, 2-3x faster)
For Faster Processing:
- Image Size: Resize images to 1000-2000px width
- Format: Use JPEG for photos, PNG for documents
- Batch Processing: Process multiple images concurrently
- GPU: Enable GPU acceleration if available
- Caching: Cache results for repeated documents
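The resize tip above can be sketched as a small helper that computes downscaled dimensions while preserving aspect ratio. The 2000px cap reflects the rule of thumb above; the actual pixel resampling would be done with an imaging library such as Pillow:

```python
def resized_dimensions(width: int, height: int, max_width: int = 2000) -> tuple[int, int]:
    """Return (width, height) scaled so width <= max_width, keeping aspect ratio."""
    if width <= max_width:
        return (width, height)
    scale = max_width / width
    return (max_width, round(height * scale))


print(resized_dimensions(4000, 3000))  # → (2000, 1500)
print(resized_dimensions(800, 600))   # already small enough → (800, 600)
```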
For Better Accuracy:
- Image Quality: Use high-resolution scans (300 DPI+)
- Lighting: Ensure good lighting and contrast
- Language: Specify correct language code
- Preprocessing: Deskew and denoise images
- Format: PNG preserves quality better than JPEG
Single Instance:
- Handles 10-20 concurrent requests
- ~5-10 requests/second on CPU
- ~20-50 requests/second on GPU
Scaling Options:
- Horizontal: Deploy multiple instances behind load balancer
- Queue: Use Celery/Redis for async processing
- Caching: Redis cache for repeated requests
- CDN: Serve visualized images from CDN
Windows:
# Clone or navigate to project
cd snaptext
# Create virtual environment
python -m venv .venv
.venv\Scripts\activate
# Install PaddlePaddle 2.6.x (stable; 3.x is incompatible with PaddleOCR)
pip install "paddlepaddle>=2.6.0,<3.0.0"
# Install PaddleOCR 2.x (compatible with PaddlePaddle 2.6.x)
pip install "paddleocr>=2.7.0,<3.0.0"
# Install remaining dependencies
pip install -r requirements.txt
# Create configuration
copy .env.example .env
# Start server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

Linux/macOS:
# Clone repository
git clone <repository-url>
cd snaptext
# Create virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Create configuration
cp .env.example .env
# Run server
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

# Health check
curl http://localhost:8000/api/v1/health
# Extract text from image
curl -X POST http://localhost:8000/api/v1/ocr/extract \
-F "file=@document.jpg" \
-F "lang=en"
# Get visualization with bounding boxes
curl -X POST http://localhost:8000/api/v1/ocr/visualize \
-F "file=@document.jpg" \
--output visualized.png

Python client example:
import requests
API_URL = "http://localhost:8000"
# Extract text
with open("document.jpg", "rb") as f:
response = requests.post(
f"{API_URL}/api/v1/ocr/extract",
files={"file": f},
data={"lang": "en"}
)
result = response.json()
print(f"Extracted text: {result['data']['text']}")
print(f"Confidence: {result['data']['confidence']:.2%}")
print(f"Regions found: {result['data']['region_count']}")

| Method | Endpoint | Description |
|---|---|---|
| GET | `/api/v1/health` | Health check and service status |
| POST | `/api/v1/ocr/extract` | Extract text from uploaded image |
| POST | `/api/v1/ocr/visualize` | Get image with bounding boxes marked |
| GET | `/api/v1/ocr/info` | Service capabilities and configuration |
See API.md for complete API documentation including:
- Request/response formats
- Error codes and handling
- Usage examples in multiple languages
- Rate limiting and constraints
| Code | Language | Model Size | Accuracy |
|---|---|---|---|
| `en` | English | ~4MB | High |
| `id` | Indonesian | ~4MB | High |
| `ch` | Chinese | ~10MB | Very High |
| `japan` | Japanese | ~8MB | High |
| `korean` | Korean | ~8MB | High |
| `vi` | Vietnamese | ~4MB | Medium |
| `fr` | French | ~4MB | Medium |
| `german` | German | ~4MB | Medium |
| `it` | Italian | ~4MB | Medium |
| `portuguese` | Portuguese | ~4MB | Medium |
| `spanish` | Spanish | ~4MB | Medium |
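Since the per-request `lang` parameter must be one of the codes above, a client might normalize user input before calling the API. This helper is an illustration on the caller's side, not part of SnapText itself:

```python
# Codes from the supported-languages table above
SUPPORTED_LANGS = {
    "en", "id", "ch", "japan", "korean", "vi",
    "fr", "german", "it", "portuguese", "spanish",
}


def normalize_lang(code: str, default: str = "en") -> str:
    """Lowercase and trim the code; fall back to the default if unsupported."""
    code = code.strip().lower()
    return code if code in SUPPORTED_LANGS else default


print(normalize_lang("EN"))       # → en
print(normalize_lang("klingon"))  # unsupported → en
print(normalize_lang("id"))       # → id
```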
Generic Text Extraction:
- Extract raw text from any image containing text
- Get bounding boxes and confidence scores
- Visualize detected text regions
- Support for 10+ languages
- Ideal for documents with varying layouts
- Teaching OCR: Demonstrate text extraction concepts
- FastAPI Learning: Example of REST API best practices
- Repository Pattern: Study clean architecture
- Course Projects: Base for student assignments
- Document Analysis: Extract text from scanned documents
- Benchmarking: Compare OCR techniques
- Data Collection: Gather text from images
- Preprocessing: Prepare data for NLP pipelines
Note: SnapText currently provides generic OCR only. The following use cases require additional development (see Contributing):
- Document Digitization (KTP, KK, BPJS, etc.) - Requires field extraction implementation
- Invoice Processing - Requires structured data extraction
- Form Automation - Requires template-based parsing
- Archive Search - Requires integration with search systems
These are planned features that need community contributions to become available.
snaptext/
├── app/ # Application source
│ ├── __init__.py
│ ├── main.py # FastAPI application factory
│ ├── api/ # API Layer
│ │ └── v1/
│ │ └── endpoints/ # Route handlers
│ │ ├── ocr.py # OCR endpoints
│ │ └── health.py # Health check
│ ├── core/ # Core infrastructure
│ │ ├── config.py # Settings management
│ │ ├── exceptions.py # Custom exceptions
│ │ └── logging.py # Logging configuration
│ ├── models/ # Data models
│ │ └── schemas.py # Pydantic schemas
│ ├── repositories/ # Repository Layer
│ │ ├── base.py # Repository interface
│ │ └── ocr_repository.py # PaddleOCR implementation
│ ├── services/ # Service Layer
│ │ └── ocr_service.py # Business logic
│ └── utils/ # Utilities
├── docs/ # Documentation
│ ├── README.md
│ ├── ARCHITECTURE.md # System design
│ ├── SETUP.md # Installation guide
│ ├── API.md # API reference
│ ├── TROUBLESHOOTING.md # Common issues
│ └── DEVELOPMENT.md # Dev workflow
├── tests/ # Test suite
│ ├── conftest.py # Pytest fixtures
│ └── test_api/ # Endpoint tests
├── .github/ # GitHub configurations
│ ├── ISSUE_TEMPLATE/ # Issue templates
│ ├── PULL_REQUEST_TEMPLATE.md # PR template
│ └── CODE_OF_CONDUCT.md # Community guidelines
├── .env.example # Configuration template
├── LICENSE # MIT License
├── README.md # This file
├── CONTRIBUTING.md # Contribution guide
├── CONTRIBUTORS.md # Contributor list
├── requirements.txt # Python dependencies
└── pyproject.toml # Project metadata
Create a .env file from .env.example:
# Application
APP_NAME=SnapText
APP_VERSION=1.0.0
ENVIRONMENT=development
DEBUG=true
# Server
HOST=0.0.0.0
PORT=8000
WORKERS=1
# API
API_V1_PREFIX=/api/v1
# CORS
CORS_ORIGINS=["http://localhost:3000", "http://localhost:8000"]
CORS_ALLOW_CREDENTIALS=true
CORS_ALLOW_METHODS=["*"]
CORS_ALLOW_HEADERS=["*"]
# PaddleOCR
PADDLEOCR_LANG=en
PADDLEOCR_USE_ANGLE_CLS=true
# Upload
MAX_UPLOAD_SIZE_MB=10
ALLOWED_EXTENSIONS=["jpg","jpeg","png","bmp","webp"]
# Logging
LOG_LEVEL=INFO
LOG_FORMAT=text
LOG_FILE_PATH=logs/app.log
LOG_ROTATION="500 MB"
LOG_RETENTION="10 days"
# Performance
REQUEST_TIMEOUT_SECONDS=60
ASYNC_PROCESSING_ENABLED=true

MAX_UPLOAD_SIZE_MB: Maximum file size in MB (default: 10)
- Increase for high-resolution images
- Decrease for memory-constrained environments
PADDLEOCR_LANG: Default language (default: "en")
- Change based on your primary use case
- Can be overridden per request
LOG_LEVEL: Logging verbosity
- DEBUG: Detailed logging for development
- INFO: Normal operation logging
- WARNING: Only warnings and errors
- ERROR: Only errors
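As a sketch of how these variables might be read at startup, here is a stdlib-only illustration; the actual service uses a settings class in app/core/config.py, and the field names below are assumptions:

```python
import os
from dataclasses import dataclass
from typing import Optional


@dataclass
class Settings:
    """Illustrative subset of the service configuration."""
    max_upload_size_mb: int
    paddleocr_lang: str
    log_level: str


def load_settings(env: Optional[dict] = None) -> Settings:
    """Read settings from the given mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return Settings(
        max_upload_size_mb=int(env.get("MAX_UPLOAD_SIZE_MB", "10")),
        paddleocr_lang=env.get("PADDLEOCR_LANG", "en"),
        log_level=env.get("LOG_LEVEL", "INFO"),
    )


print(load_settings({}))  # defaults: 10 MB, "en", "INFO"
```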
# Run all tests
pytest
# With coverage report
pytest --cov=app --cov-report=html
# Run specific test file
pytest tests/test_api/test_ocr.py -v
# Run with debug output
pytest -v -s

# Format code
black app/ tests/
# Lint and auto-fix
ruff check app/ tests/ --fix
# Type checking
mypy app/
# Run all checks
black app/ && ruff check app/ --fix && mypy app/ && pytest --cov=app

See DEVELOPMENT.md for:
- Development workflow
- Coding standards
- Testing strategies
- Debugging tips
- ARCHITECTURE.md - System design, patterns, and data flow
- SETUP.md - Detailed installation and configuration
- API.md - Complete API reference with examples
- TROUBLESHOOTING.md - Common issues and solutions
- DEVELOPMENT.md - Development workflow and guidelines
- docs/README.md - Documentation index
When running, access:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI JSON: http://localhost:8000/api/v1/openapi.json
- Python: 3.12 or 3.13
- RAM: 4GB minimum (8GB recommended)
- Disk: ~2GB for PaddleOCR models
- OS: Windows 10+, Ubuntu 20.04+, macOS 12+
- PaddlePaddle: 2.6.2 (NOT 3.x - compatibility issues)
- PaddleOCR: 2.10.0 (compatible with PaddlePaddle 2.6.x)
- FastAPI: 0.115.0+
- Python: 3.12+
Compatible:
- Windows 10/11
- Ubuntu 20.04/22.04/24.04
- macOS 12+ (Monterey or later)
- Python 3.12, 3.13
Not Compatible:
- Python < 3.12
- PaddlePaddle 3.x
- Windows 8 or older
- macOS 11 or older
SnapText follows the Repository Pattern for clean separation of concerns:
┌─────────────────────────────┐
│ API Layer (FastAPI) │ HTTP handlers
│ - /extract (generic OCR) │
│ - /visualize (bounding boxes) │
│ - /health (service status) │
│ - /info (capabilities) │
└──────────┬──────────────────┘
│
▼
┌─────────────────────────────┐
│ Service Layer │ Business logic
│ - Validation │ - Orchestration
│ - Coordination │ - Error handling
│ - File processing │
└──────────┬──────────────────┘
│
▼
┌─────────────────────────────┐
│ Repository Layer │ Data access
│ - PaddleOCR wrapper │ - Image processing
│ - Format conversion │ - Result parsing
│ - Model lifecycle │
└──────────┬──────────────────┘
│
▼
┌─────────────────────────────┐
│ PaddleOCR Engine │ Text extraction
│ - Detection models │
│ - Recognition models │
│ - Multi-language support │
└─────────────────────────────┘
What This Architecture Provides:
- Text Extraction - High accuracy OCR from images
- Bounding Box Detection - Precise text region coordinates
- Confidence Scoring - Quality metrics for each detection
- Multi-language Support - 10+ languages with optimized models
- Visualization - Annotated images with detected regions
- RESTful API - Clean, documented endpoints
- Extensibility - Easy to add custom processors and mappers
Benefits:
- Easy testing (mock repositories)
- Swappable OCR engines
- Clear boundaries between layers
- Maintainable codebase
- Simple to extend with new functionality
See ARCHITECTURE.md for details.
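The "easy testing (mock repositories)" benefit can be shown in a few lines: the service depends only on an abstract repository interface, so a fake can stand in for PaddleOCR in tests. Class and method names here are illustrative sketches, not SnapText's actual API:

```python
from abc import ABC, abstractmethod
from typing import List


class OCRRepository(ABC):
    """Abstract data-access interface (illustrative names)."""

    @abstractmethod
    def extract_text(self, image_bytes: bytes) -> List[str]: ...


class FakeOCRRepository(OCRRepository):
    """Test double: returns canned results instead of running PaddleOCR."""

    def extract_text(self, image_bytes: bytes) -> List[str]:
        return ["hello", "world"]


class OCRService:
    """Business-logic layer; depends only on the repository interface."""

    def __init__(self, repo: OCRRepository) -> None:
        self.repo = repo

    def extract_joined(self, image_bytes: bytes) -> str:
        return " ".join(self.repo.extract_text(image_bytes))


service = OCRService(FakeOCRRepository())
print(service.extract_joined(b""))  # → hello world
```

Swapping `FakeOCRRepository` for the real PaddleOCR-backed implementation is the same pattern that would let you integrate additional OCR engines.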
Issue:
ConvertPirAttribute2RuntimeAttribute not support
[pir::ArrayAttribute<pir::DoubleAttribute>]
Cause: PaddlePaddle 3.x uses PIR (new IR) not supported by PaddleOCR.
Solution: Use PaddlePaddle 2.6.2 with PaddleOCR 2.10.0 (already configured in requirements.txt)
Verification:
python -c "import paddle; import paddleocr; print(f'PaddlePaddle: {paddle.__version__}'); print(f'PaddleOCR: {paddleocr.__version__}')"

Expected output:
PaddlePaddle: 2.6.2
PaddleOCR: 2.10.0
Issue: First request takes 10-30 seconds
Cause: PaddleOCR downloads models on first use
Solution: Pre-download models:
python -c "from paddleocr import PaddleOCR; ocr = PaddleOCR(lang='en'); print('Models downloaded')"

See TROUBLESHOOTING.md for more issues.
SnapText is open-source under the MIT License - see LICENSE file for details.
You are FREE to:
- Use for personal or commercial projects
- Modify and customize the code
- Distribute and share your modifications
- Use for educational purposes
- Use for research and academic work
SnapText is designed primarily for:
- Educational Purposes
  - Learning OCR technology
  - Understanding the FastAPI framework
  - Studying REST API design
  - Teaching materials
- Research Purposes
  - Academic research projects
  - Experimentation with OCR
  - Benchmarking approaches
  - Publishing papers
- Development
  - Building custom solutions
  - Integration into applications
  - Open-source contributions
While freely available, please:
- Respect Privacy: Comply with data protection laws (GDPR, PDPA, etc.)
- Give Attribution: Credit "RogaTekno" when appropriate
- Contribute Back: Share improvements with the community
- Use Responsibly: Follow ethical guidelines for AI/OCR usage
We welcome contributions! Please see CONTRIBUTING.md for guidelines.
SnapText has a solid foundation with generic OCR capabilities. We're looking for contributors to help expand the project:
Document Processors:
- Indonesian ID Cards (KTP) - Extract NIK, name, address, etc.
- Family Cards (KK) - Parse family member information
- BPJS Cards - Health and employment insurance data
- Bank Statements - Account numbers, transactions, balances
- Custom document types - Build specialized extractors
Infrastructure Improvements:
- Docker containerization for easy deployment
- Batch processing for multiple images
- Redis caching for repeated requests
- Async processing with Celery/Redis queue
- Mobile-optimized API endpoints
Quality & Performance:
- Additional language models
- Performance optimizations
- Enhanced test coverage
- Documentation improvements
- Bug fixes and refinements
See CONTRIBUTING.md for detailed guidelines on how to get started!
How to Contribute:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a Pull Request
See CONTRIBUTING.md for detailed guidelines.
- Documentation: See docs/ for comprehensive guides
- Issues: Report bugs via GitHub Issues
- Discussions: Ask questions in GitHub Discussions
- API Docs: http://localhost:8000/docs (when running)
When reporting issues, please include:
- Python version
- PaddlePaddle/PaddleOCR versions
- OS and environment
- Error messages or logs
- Steps to reproduce
- Contributors: See CONTRIBUTORS.md
- Code of Conduct: See CODE_OF_CONDUCT.md
Built with excellent open-source tools:
- FastAPI - Modern web framework
- PaddleOCR - OCR engine
- PaddlePaddle - Deep learning framework
- Pydantic - Data validation
- Uvicorn - ASGI server
Thanks to all the people who contribute to SnapText!
Made with contrib.rocks.
If you find SnapText useful:
- Star this repository on GitHub
- Share with fellow students/researchers
- Spread the word about this free resource
- Contribute improvements back to community
- Author: amubhya
- Organization: RogaTekno
- Project: SnapText
- Year: 2026
- License: MIT License
Developed by RogaTekno for the global education and research community.