AI-powered OCR web application for extracting, processing, and analyzing text from images and PDF documents.
VisionParse AI is a full-stack OCR (Optical Character Recognition) application built using React + FastAPI that extracts text from images and PDFs using Tesseract OCR.
The application includes smart text processing, background OCR tasks, PDF support, export functionality, and a modern responsive UI.
- 📤 Drag & Drop File Upload
- 🖼️ Image OCR Processing
- 📄 PDF Text Extraction
- ⚡ Background OCR Processing
- 📊 OCR Confidence Scores
- 🔍 Search Within Extracted Text
- 🧠 Smart Text Cleanup & Processing
- 📑 TXT / PDF / DOCX / JSON Export
- 🌙 Dark Mode UI
- 📱 Responsive Design
- 🔐 JWT Authentication
- 📋 History Dashboard
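
Two of the features above, OCR confidence scores and smart text cleanup, can be illustrated with a small sketch. This is not the project's actual code: the function names `mean_confidence` and `clean_ocr_text` are hypothetical, and the cleanup rules are assumptions about what "smart cleanup" might involve. The confidence helper assumes word-level values shaped like the `conf` column of Tesseract's TSV output, where `-1` marks boxes that hold no recognized word.

```python
import re


def mean_confidence(conf_values):
    """Average word-level confidence, skipping non-word boxes (conf == -1).

    Values are passed through float() because some wrappers return strings.
    """
    scores = []
    for value in conf_values:
        score = float(value)
        if score >= 0:
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0


def clean_ocr_text(raw):
    """Hypothetical cleanup pass for raw OCR output."""
    text = raw.replace("\r\n", "\n")               # normalize Windows newlines
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)   # rejoin words hyphenated across lines
    text = re.sub(r"[ \t]+", " ", text)            # collapse runs of spaces and tabs
    text = re.sub(r" ?\n ?", "\n", text)           # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)         # keep at most one blank line
    return text.strip()
```

A cleanup pass like this runs after OCR and before search or export, so that searching the extracted text is not thrown off by line-break hyphens or stray whitespace.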
- React 18
- Vite
- Tailwind CSS
- FastAPI
- Python
- SQLAlchemy
- Tesseract OCR
- OpenCV
- pdf2image
- Poppler
- JWT Authentication
- Passlib / bcrypt
OCR/
├── frontend/
├── backend/
├── README.md
└── .gitignore

git clone https://github.com/YOUR_USERNAME/VisionParseAI.git
cd VisionParseAI

cd backend
python -m venv venv
# Windows
venv\Scripts\activate
# macOS / Linux
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run backend server
uvicorn main:app --reload

Backend runs on:
http://localhost:8000

API Docs:
http://localhost:8000/docs

cd frontend
npm install
npm run dev

Frontend runs on:
http://localhost:5173

Example .env configuration:
TESSERACT_PATH=C:/Program Files/Tesseract-OCR/tesseract.exe
POPPLER_PATH=C:/poppler/Library/bin

(Add Screenshot Here)
(Add Screenshot Here)
(Add Screenshot Here)
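
The `TESSERACT_PATH` and `POPPLER_PATH` values from the example `.env` configuration could be read on the backend roughly as follows. This is a minimal sketch, not the project's actual loader: the `OcrSettings` class and `load_ocr_settings` function are hypothetical names, and it assumes the variables have already been exported into the process environment (for example by a `.env` loader).

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping, Optional


@dataclass(frozen=True)
class OcrSettings:
    """Hypothetical container for the two paths from the .env example."""
    tesseract_path: Path
    poppler_path: Path


def load_ocr_settings(env: Optional[Mapping[str, str]] = None) -> OcrSettings:
    """Read the OCR tool paths, failing early if either one is missing."""
    env = os.environ if env is None else env
    required = ("TESSERACT_PATH", "POPPLER_PATH")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return OcrSettings(
        tesseract_path=Path(env["TESSERACT_PATH"]),
        poppler_path=Path(env["POPPLER_PATH"]),
    )
```

If the backend calls Tesseract through the `pytesseract` wrapper (an assumption), the first path would typically be assigned to `pytesseract.pytesseract.tesseract_cmd`. The second can be passed to pdf2image's `convert_from_path` via its `poppler_path` argument.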
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/upload | Upload document |
| GET | /api/ocr/result/{id} | Get OCR result |
| GET | /api/history | Document history |
| GET | /api/export/{id} | Export extracted text |
| POST | /api/auth/login | User login |
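
As a rough illustration of how a client might address the endpoints in the table, the sketch below builds requests using only the Python standard library. Several details are assumptions not confirmed by this README: the login body field names (`username`, `password`), the use of a `Bearer` token header for the JWT, and all helper names. The requests are only constructed here; pass one to `urllib.request.urlopen` to actually send it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend address from the setup steps


def login_request(username, password):
    # Assumption: the login endpoint accepts a JSON body with these field names.
    body = json.dumps({"username": username, "password": password}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/auth/login",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def authed_get(path, token):
    # Assumption: the JWT is sent as a Bearer token in the Authorization header.
    return urllib.request.Request(
        f"{BASE_URL}{path}",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def result_request(doc_id, token):
    return authed_get(f"/api/ocr/result/{doc_id}", token)


def history_request(token):
    return authed_get("/api/history", token)


def export_request(doc_id, token):
    return authed_get(f"/api/export/{doc_id}", token)
```

The interactive docs at http://localhost:8000/docs remain the authoritative source for request and response shapes.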
- File validation
- JWT authentication
- UUID-based file naming
- Secure environment variables
- Restricted CORS configuration
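
File validation and UUID-based file naming from the list above could look something like the following. This is a sketch under stated assumptions: the allowed extensions, the 10 MB size limit, and the function names are all hypothetical and may differ from what the backend actually enforces.

```python
import uuid
from pathlib import Path

ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}  # assumed allow-list
MAX_BYTES = 10 * 1024 * 1024                            # assumed 10 MB limit


def validate_upload(filename, size_bytes):
    """Return the normalized extension, or raise ValueError if the file is rejected."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0 or size_bytes > MAX_BYTES:
        raise ValueError("File is empty or exceeds the size limit")
    return ext


def stored_name(filename, size_bytes):
    """Build a storage name from a random UUID, keeping only the validated extension.

    Discarding the client-supplied name avoids collisions and path tricks
    such as "../" segments in the uploaded filename.
    """
    ext = validate_upload(filename, size_bytes)
    return f"{uuid.uuid4().hex}{ext}"
```

Checking the extension alone is a weak guarantee. A stricter version would also inspect the file's leading bytes before handing it to the OCR pipeline.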
- ✅ Core OCR pipeline completed
- ✅ Frontend and backend integration completed
- ✅ Local OCR processing functional
Future improvements:
- Advanced AI text analysis
- Cloud deployment
- OCR optimization
- Enhanced UI/UX
Gokul Nath
MIT License