Professional Django REST API for extracting structured information from documents (PDF, Word, Images) using AI and natural language prompts - Simple Upload & Extract!
Production API: https://docparse.onrender.com
- API Endpoint: https://docparse.onrender.com/api/documents/
- Swagger UI: https://docparse.onrender.com/swagger/
- ReDoc: https://docparse.onrender.com/redoc/
- Health Check: https://docparse.onrender.com/health/
- Multi-format Support: PDF, Word (.docx), Text (.txt), and image files (JPG, PNG, etc.)
- AI-Powered Extraction: Uses OpenAI GPT with natural language prompts
- Instant Results: Upload document with optional prompt and get immediate extraction
- No Complex Workflows: Single endpoint - upload and extract in one step
- RESTful API: Built with Django REST Framework
- Interactive Documentation: Swagger UI and ReDoc
- Docker Ready: Complete containerization support
- OCR Support: Extract text from images using Tesseract
- Quick Start
- Simple Usage
- API Endpoint
- Example Extractions
- Documentation
- Docker Commands
- Environment Setup
- Clone and configure:
git clone <repository-url>
cd DocParse
cp .env.example .env
# Edit .env file with your OpenAI API key
- Build and run:
docker-compose up --build
- Access the application:
- API: http://localhost:8000/api/documents/
- Swagger UI: http://localhost:8000/swagger/
- ReDoc: http://localhost:8000/redoc/
- Install dependencies:
pip install -r requirements.txt
sudo apt-get install tesseract-ocr # For OCR support
- Configure environment:
cp .env.example .env
# Edit .env file with your OpenAI API key
- Setup database:
python manage.py makemigrations
python manage.py migrate
- Run server:
python manage.py runserver
With specific prompt:
curl -X POST https://docparse.onrender.com/api/documents/ \
-F "file=@your_document.pdf" \
-F "prompt=What is the total amount?"
Without prompt (extracts everything):
curl -X POST https://docparse.onrender.com/api/documents/ \
-F "file=@your_document.pdf"
Upload Document & Extract Information:
With specific prompt:
curl -X POST http://localhost:8000/api/documents/ \
-F "file=@your_document.pdf" \
-F "prompt=What is the total amount?"
Without prompt (extracts everything):
curl -X POST http://localhost:8000/api/documents/ \
-F "file=@your_document.pdf"
Example Response:
{
"seller": {
"company_name": "BrightLine Traders Ltd",
"address": "78 Innovation Road, Kigali, Rwanda"
},
"buyer": {
"company_name": "TechNova Solutions",
"address": "902 Enterprise Drive, Kigali, Rwanda"
},
"invoice_number": "PRO-2024-014",
"date": "2024-02-10",
"subtotal": "RWF 555,000",
"tax": "18%",
"total": "RWF 654,900"
}
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/documents/ | Upload document & extract with optional prompt |
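The example response shown above returns amounts as display strings (`"RWF 555,000"`, `"18%"`). Below is a minimal sketch of how a client might convert them to numbers and sanity-check the total. The helper names `parse_amount`, `parse_percent`, and `check_totals` are hypothetical, and the sketch assumes the field names and string formats match that example, which the API may not guarantee for every document.

```python
from decimal import Decimal


def parse_amount(text):
    """Split a string like 'RWF 555,000' into (currency, Decimal amount)."""
    currency, _, number = text.strip().partition(" ")
    return currency, Decimal(number.replace(",", ""))


def parse_percent(text):
    """Convert a string like '18%' into a Decimal fraction (0.18)."""
    return Decimal(text.strip().rstrip("%")) / Decimal(100)


def check_totals(invoice):
    """Return True if subtotal plus tax equals the reported total."""
    _, subtotal = parse_amount(invoice["subtotal"])
    _, total = parse_amount(invoice["total"])
    rate = parse_percent(invoice["tax"])
    return subtotal * (1 + rate) == total


# Values copied from the example response (assumed format).
example = {"subtotal": "RWF 555,000", "tax": "18%", "total": "RWF 654,900"}
print(check_totals(example))  # prints True
```

`Decimal` is used instead of `float` so the comparison on currency values is exact.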
| Parameter | Type | Required | Description |
|---|---|---|---|
| file | File | Yes | Document file (PDF, DOCX, TXT, JPG, PNG, etc.) |
| prompt | String | No | Natural language question about the document |
| document_type | String | No | invoice, proforma, receipt, other |
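The parameters listed above can be assembled on the client before sending. The following is a minimal sketch, assuming `prompt` and `document_type` are both optional and that `document_type` accepts only the four values shown; `build_form_fields` is a hypothetical helper, not part of the API.

```python
# Values taken from the parameter list; the server may accept others.
ALLOWED_DOCUMENT_TYPES = {"invoice", "proforma", "receipt", "other"}


def build_form_fields(prompt=None, document_type=None):
    """Build the non-file form fields, omitting anything left unset."""
    fields = {}
    if prompt:
        fields["prompt"] = prompt
    if document_type is not None:
        if document_type not in ALLOWED_DOCUMENT_TYPES:
            raise ValueError(f"Unsupported document_type: {document_type!r}")
        fields["document_type"] = document_type
    return fields


fields = build_form_fields(prompt="What is the total amount?", document_type="invoice")
print(fields)
```

The resulting dictionary can be passed as `data=fields` to `requests.post`, alongside `files={'file': f}` for the document itself.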
# Extract total amount only
curl -X POST http://localhost:8000/api/documents/ \
-F "file=@invoice.pdf" \
-F "prompt=What is the total amount?"
# Extract names only
curl -X POST http://localhost:8000/api/documents/ \
-F "file=@invoice.pdf" \
-F "prompt=Get me the names of people in this document"
# Extract invoice date
curl -X POST http://localhost:8000/api/documents/ \
-F "file=@invoice.pdf" \
-F "prompt=What is the invoice date?"
# Extract everything (no prompt)
curl -X POST http://localhost:8000/api/documents/ \
-F "file=@invoice.pdf"
# Extract contact info
curl -X POST http://localhost:8000/api/documents/ \
-F "file=@business_card.jpg" \
-F "prompt=Extract name, phone number, email, and company"
- Production Swagger: https://docparse.onrender.com/swagger/ - Test APIs directly
- Production ReDoc: https://docparse.onrender.com/redoc/ - Clean documentation
- Swagger UI: http://localhost:8000/swagger/ - Test APIs directly in browser
- ReDoc: http://localhost:8000/redoc/ - Clean, readable documentation
- OpenAPI JSON: http://localhost:8000/swagger.json - Raw API specification
- Postman Collection: Import DocParse_API.postman_collection.json
  - Production URL: https://docparse.onrender.com
  - Local URL: http://localhost:8000
# Start services
docker-compose up -d
# View logs
docker-compose logs -f web
# Stop services
docker-compose down
# Clean restart (if having issues)
docker-compose down -v
docker system prune -f
docker-compose up --build
Create a .env file in the project root (copy from .env.example):
# Django Configuration
SECRET_KEY=your-secret-key-here-change-in-production
DEBUG=False
ALLOWED_HOSTS=localhost,127.0.0.1
# OpenAI Configuration (required for AI extraction)
OPENAI_API_KEY=sk-proj-your-openai-api-key-here
OPENAI_MODEL=gpt-3.5-turbo
Input Files:
- PDF: .pdf
- Word: .docx, .doc
- Text: .txt
- Images: .jpg, .jpeg, .png, .gif, .bmp, .tiff
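A client can filter out unsupported files before uploading. This is a minimal sketch based only on the extensions listed above; `is_supported` is a hypothetical helper, and the server's own validation rules may differ.

```python
from pathlib import Path

# Extensions copied from the supported input files list.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".txt",
    ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff",
}


def is_supported(filename):
    """Return True if the file extension is in the supported list (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


for name in ["invoice.PDF", "business_card.jpg", "notes.md"]:
    print(name, is_supported(name))
```

The check looks only at the file name, not the file contents, so it is a convenience filter rather than real validation.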
import requests

# Production API
url = 'https://docparse.onrender.com/api/documents/'

# Upload and extract with specific prompt
with open('invoice.pdf', 'rb') as f:
    response = requests.post(
        url,
        files={'file': f},
        data={'prompt': 'What is the total amount?'}
    )
response.raise_for_status()  # Stop early on an HTTP error status
result = response.json()
print(f"Extracted data: {result}")

# Upload and extract everything (no prompt)
with open('invoice.pdf', 'rb') as f:
    response = requests.post(
        url,
        files={'file': f}
    )
response.raise_for_status()
result = response.json()
print(f"All data: {result}")

// Node.js example; assumes the form-data and node-fetch (v2) packages,
// since node-fetch v2 accepts a form-data stream as the request body
const FormData = require('form-data');
const fetch = require('node-fetch');
const fs = require('fs');

const form = new FormData();
form.append('file', fs.createReadStream('invoice.pdf'));
form.append('prompt', 'What is the total amount?');

fetch('https://docparse.onrender.com/api/documents/', {
  method: 'POST',
  body: form
})
  .then(response => response.json())
  .then(data => console.log(data))
  .catch(error => console.error(error));
Docker ContainerConfig error:
# Clean up Docker containers and volumes
docker-compose down -v
docker system prune -f
docker-compose up --build
OpenAI API errors:
- Check API key validity and quota
- Ensure OPENAI_API_KEY is set in .env file
- Verify you have credits in your OpenAI account
- Use specific prompts for faster, more accurate results
- Optimize image quality for better OCR results
- Use PDF format when possible for best accuracy
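For the OpenAI API errors mentioned above, a quick local check can confirm that `OPENAI_API_KEY` is present in `.env` before the server starts. This is a minimal sketch, assuming the file uses plain `KEY=VALUE` lines as in the environment example; `parse_env` and `missing_settings` are hypothetical helpers and do not reflect how the project itself loads its settings.

```python
def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_settings(values, required=("OPENAI_API_KEY",)):
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


# Sample content for illustration; in practice, read the real .env file.
sample = """
# Django Configuration
DEBUG=False
# OpenAI Configuration
OPENAI_API_KEY=
OPENAI_MODEL=gpt-3.5-turbo
"""
print(missing_settings(parse_env(sample)))  # prints ['OPENAI_API_KEY']
```

To check a real file, pass `open('.env').read()` to `parse_env`. The check confirms only that a value is set, not that the key is valid or has quota.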
With Specific Prompts:
- Any information you ask for in natural language
- Financial data (amounts, totals, line items)
- Contact information (names, emails, phones)
- Dates and reference numbers
- Custom business data
Without Prompt (Extracts All):
- All dates found in document
- All monetary amounts
- All person names
- All company names
- All email addresses
- Email: manziosee3@gmail.com
- Live API: https://docparse.onrender.com
- Documentation: https://docparse.onrender.com/swagger/
Get inspired with these example prompts:
- "What is the total amount?"
- "Get me the names of people in this document"
- "What is the invoice date?"
- "Extract all contact information"
- "What are the line items?"
- "Find all monetary amounts"
- "Who are the companies mentioned?"
- "What are the key dates?"
- "Extract email addresses"
Or leave prompt empty to extract everything automatically!
Simple and powerful - just upload your document and get instant results!