DocParse - AI-Powered Document Parser API

Professional Django REST API for extracting structured information from documents (PDF, Word, text, and images) using AI and natural-language prompts. Simple upload and extract!

🚀 Live Demo

Production API: https://docparse.onrender.com

🌐 API Endpoint: https://docparse.onrender.com/api/documents/
📖 Swagger UI: https://docparse.onrender.com/swagger/
📚 ReDoc: https://docparse.onrender.com/redoc/
❤️ Health Check: https://docparse.onrender.com/health/

🚀 Features

📄 Multi-format Support: PDF, Word (.docx), Text (.txt), and image files (JPG, PNG, etc.)
🤖 AI-Powered Extraction: Uses OpenAI GPT with natural language prompts
⚡ Instant Results: Upload a document with an optional prompt and get the extracted data back in the same response
📋 No Complex Workflows: Single endpoint - upload and extract in one step
🔄 RESTful API: Built with Django REST Framework
📚 Interactive Documentation: Swagger UI and ReDoc
🐳 Docker Ready: Complete containerization support
🔍 OCR Support: Extract text from images using Tesseract

🚀 Quick Start

Using Docker (Recommended)

Clone and configure:

git clone <repository-url>
cd DocParse
cp .env.example .env
# Edit .env and set OPENAI_API_KEY to your OpenAI API key

Build and run:

docker-compose up --build

Access the application:

🌐 API: http://localhost:8000/api/documents/
📖 Swagger UI: http://localhost:8000/swagger/
📚 ReDoc: http://localhost:8000/redoc/
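After `docker-compose up --build`, the container may need some time before it accepts requests. Below is a minimal standard-library sketch that polls the health-check endpoint until it responds. It assumes the local server exposes `/health/` the same way the production deployment does and answers HTTP 200 when ready; `wait_for_health` is a hypothetical helper and the timeout values are arbitrary.

```python
import http.client
import time
import urllib.request


def wait_for_health(url: str = "http://localhost:8000/health/",
                    timeout: float = 60.0,
                    interval: float = 2.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have passed.

    Returns True once the server answers with 200, False if it never does.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused/reset, timeouts and HTTP error statuses
            # (urllib's URLError and HTTPError subclass OSError) mean "not ready yet".
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A setup script could call `wait_for_health()` and stop with an error if it returns `False`, instead of sending uploads to a server that is still starting.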

Manual Installation

Install dependencies:

pip install -r requirements.txt
sudo apt-get install tesseract-ocr  # OCR support (Debian/Ubuntu; use your OS package manager elsewhere)

Configure environment:

cp .env.example .env
# Edit .env and set OPENAI_API_KEY to your OpenAI API key

Setup database:

python manage.py makemigrations
python manage.py migrate

Run server:

python manage.py runserver

⚡ Simple Usage

Production API (Live)

With specific prompt:

curl -X POST https://docparse.onrender.com/api/documents/ \
  -F "file=@your_document.pdf" \
  -F "prompt=What is the total amount?"

Without prompt (extracts everything):

curl -X POST https://docparse.onrender.com/api/documents/ \
  -F "file=@your_document.pdf"

Local Development

Upload Document & Extract Information:

With specific prompt:

curl -X POST http://localhost:8000/api/documents/ \
  -F "file=@your_document.pdf" \
  -F "prompt=What is the total amount?"

Without prompt (extracts everything):

curl -X POST http://localhost:8000/api/documents/ \
  -F "file=@your_document.pdf"

Example Response:

{
  "seller": {
    "company_name": "BrightLine Traders Ltd",
    "address": "78 Innovation Road, Kigali, Rwanda"
  },
  "buyer": {
    "company_name": "TechNova Solutions",
    "address": "902 Enterprise Drive, Kigali, Rwanda"
  },
  "invoice_number": "PRO-2024-014",
  "date": "2024-02-10",
  "subtotal": "RWF 555,000",
  "tax": "18%",
  "total": "RWF 654,900"
}
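The example figures are internally consistent: 18% tax on a RWF 555,000 subtotal gives 555,000 × 1.18 = RWF 654,900. Because the fields are produced by an AI model, a client may want to sanity-check them. The sketch below assumes amounts come back as strings like `"RWF 555,000"` and tax as a percentage string like `"18%"`, exactly as in this example; real responses may use other field names or formats, and `total_is_consistent` is a hypothetical helper.

```python
import re
from decimal import Decimal


def parse_amount(text: str) -> Decimal:
    """Pull the numeric part out of a string such as 'RWF 555,000'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    return Decimal(match.group().replace(",", ""))


def parse_rate(text: str) -> Decimal:
    """Convert a percentage string such as '18%' to a fraction (0.18)."""
    return parse_amount(text) / Decimal(100)


def total_is_consistent(result: dict, tolerance: Decimal = Decimal("1")) -> bool:
    """Check that total == subtotal * (1 + tax rate), within `tolerance`."""
    subtotal = parse_amount(result["subtotal"])
    expected = subtotal * (1 + parse_rate(result["tax"]))
    return abs(expected - parse_amount(result["total"])) <= tolerance


example = {"subtotal": "RWF 555,000", "tax": "18%", "total": "RWF 654,900"}
print(total_is_consistent(example))  # True
```

`Decimal` avoids floating-point rounding surprises with money, and the one-unit tolerance allows for invoices that round the tax line.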

🔗 API Endpoint

Method	Endpoint	Description
POST	`/api/documents/`	Upload document & extract with optional prompt

Request Parameters

Parameter	Type	Required	Description
`file`	File	✅	Document file (PDF, DOCX, TXT, JPG, PNG, etc.)
`prompt`	String	❌	Natural-language question or instruction about the document; omit it to extract everything
`document_type`	String	❌	Document category: `invoice`, `proforma`, `receipt`, or `other`
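When calling the endpoint from code, it can help to validate the optional fields before uploading. A minimal sketch, assuming the `document_type` values listed in the table are the only ones the server accepts; `build_form_fields` is a hypothetical helper, not part of DocParse.

```python
from typing import Optional

# Allowed values, taken from the Request Parameters table.
DOCUMENT_TYPES = {"invoice", "proforma", "receipt", "other"}


def build_form_fields(prompt: Optional[str] = None,
                      document_type: Optional[str] = None) -> dict:
    """Return the optional text fields for POST /api/documents/.

    Fields that are not given are left out entirely, so a call with no
    prompt matches the "extract everything" case.
    """
    fields = {}
    if prompt:
        fields["prompt"] = prompt
    if document_type is not None:
        if document_type not in DOCUMENT_TYPES:
            raise ValueError(f"document_type must be one of {sorted(DOCUMENT_TYPES)}")
        fields["document_type"] = document_type
    return fields
```

The returned dict is meant to be passed as `data=` to `requests.post`, alongside `files={'file': f}`.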

💡 Example Extractions

Specific Information

# Extract total amount only
curl -X POST http://localhost:8000/api/documents/ \
  -F "file=@invoice.pdf" \
  -F "prompt=What is the total amount?"

# Extract names only
curl -X POST http://localhost:8000/api/documents/ \
  -F "file=@invoice.pdf" \
  -F "prompt=Get me the names of people in this document"

# Extract invoice date
curl -X POST http://localhost:8000/api/documents/ \
  -F "file=@invoice.pdf" \
  -F "prompt=What is the invoice date?"
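Each request carries a single `prompt`, so asking several questions about the same file means one upload per prompt, and each upload triggers its own AI call. A minimal sketch of that loop; `ask_each_prompt` is a hypothetical helper, and `post` is assumed to behave like `requests.post` (it is passed in so it can be swapped out, for example in tests).

```python
from typing import Callable, Dict, Iterable

LOCAL_URL = "http://localhost:8000/api/documents/"


def ask_each_prompt(path: str,
                    prompts: Iterable[str],
                    post: Callable,
                    url: str = LOCAL_URL) -> Dict[str, object]:
    """Upload `path` once per prompt and collect the JSON answers by prompt."""
    results = {}
    for prompt in prompts:
        # Reopen the file each time so every upload starts at byte 0.
        with open(path, "rb") as f:
            response = post(url, files={"file": f}, data={"prompt": prompt}, timeout=120)
        response.raise_for_status()
        results[prompt] = response.json()
    return results
```

Usage with `requests`: `ask_each_prompt("invoice.pdf", ["What is the total amount?", "What is the invoice date?"], requests.post)`. If latency or API cost matters, combining the questions into one prompt avoids repeated uploads.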

All Information

# Extract everything (no prompt)
curl -X POST http://localhost:8000/api/documents/ \
  -F "file=@invoice.pdf"

Business Cards

# Extract contact info
curl -X POST http://localhost:8000/api/documents/ \
  -F "file=@business_card.jpg" \
  -F "prompt=Extract name, phone number, email, and company"

📚 Documentation

Live API Documentation

Production Swagger: https://docparse.onrender.com/swagger/ - Test APIs directly
Production ReDoc: https://docparse.onrender.com/redoc/ - Clean documentation

Local API Documentation

Swagger UI: http://localhost:8000/swagger/ - Test APIs directly in browser
ReDoc: http://localhost:8000/redoc/ - Clean, readable documentation
OpenAPI JSON: http://localhost:8000/swagger.json - Raw API specification

Testing Tools

Postman Collection: import DocParse_API.postman_collection.json (in the repository root) and target one of these base URLs:
- Production URL: https://docparse.onrender.com
- Local URL: http://localhost:8000

🐳 Docker Commands

# Start services
docker-compose up -d

# View logs
docker-compose logs -f web

# Stop services
docker-compose down

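# Clean restart (if you run into issues)
# Warning: -v deletes this project's volumes (and any data in them), and prune
# removes all stopped containers, unused networks, dangling images and build cache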
docker-compose down -v
docker system prune -f
docker-compose up --build

⚙️ Environment Setup

Required Environment Variables

Create a .env file in the project root (copy from .env.example):

# Django Configuration
SECRET_KEY=your-secret-key-here-change-in-production
DEBUG=False
ALLOWED_HOSTS=localhost,127.0.0.1

# OpenAI Configuration (required for AI extraction)
OPENAI_API_KEY=sk-proj-your-openai-api-key-here
OPENAI_MODEL=gpt-3.5-turbo
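A quick way to catch a missing or placeholder key before starting the server is to scan the .env file. This is a simplified sketch, not a full dotenv parser (it handles `KEY=VALUE`, comments, and surrounding quotes only). Treating `SECRET_KEY` and `OPENAI_API_KEY` as required, and flagging the placeholder values from the example above, are assumptions; `missing_settings` is a hypothetical helper.

```python
from pathlib import Path

REQUIRED_KEYS = ("SECRET_KEY", "OPENAI_API_KEY")
# Placeholder values copied from the example .env above.
PLACEHOLDERS = {
    "your-secret-key-here-change-in-production",
    "sk-proj-your-openai-api-key-here",
}


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def missing_settings(text: str) -> list:
    """Return required keys that are absent, empty, or still placeholders."""
    env = parse_env(text)
    return [k for k in REQUIRED_KEYS if not env.get(k) or env[k] in PLACEHOLDERS]


def check_env_file(path: str = ".env") -> list:
    return missing_settings(Path(path).read_text(encoding="utf-8"))
```

An empty list from `check_env_file()` means both keys have real-looking values; it does not verify that the OpenAI key is valid or has quota.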

Supported File Formats

Input Files:

PDF: .pdf
Word: .docx, .doc
Text: .txt
Images: .jpg, .jpeg, .png, .gif, .bmp, .tiff
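Checking the file extension on the client side avoids uploading files the API will not handle. A minimal sketch that mirrors the list above; the server's own validation remains authoritative, and an extension check says nothing about whether the file content is valid. `is_supported` and `unsupported_files` are hypothetical helpers.

```python
from pathlib import Path
from typing import Iterable, List

# Mirrors the "Input Files" list; keep it in sync with the server.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".doc", ".txt",
    ".jpg", ".jpeg", ".png", ".gif", ".bmp", ".tiff",
}


def is_supported(path: str) -> bool:
    """True if the file's extension (case-insensitive) is in the list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def unsupported_files(paths: Iterable[str]) -> List[str]:
    """Return the paths that should not be uploaded."""
    return [p for p in paths if not is_supported(p)]
```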

🔧 Advanced Usage

Python Integration

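import requests

# Production API (use http://localhost:8000/api/documents/ for local development)
url = 'https://docparse.onrender.com/api/documents/'

# Upload and extract with a specific prompt
with open('invoice.pdf', 'rb') as f:
    response = requests.post(
        url,
        files={'file': f},
        data={'prompt': 'What is the total amount?'},
        timeout=120,  # arbitrary limit; AI extraction can be slow, but don't hang forever
    )
response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
result = response.json()
print(f"Extracted data: {result}")

# Upload and extract everything (no prompt)
with open('invoice.pdf', 'rb') as f:
    response = requests.post(url, files={'file': f}, timeout=120)
response.raise_for_status()
result = response.json()
print(f"All data: {result}")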

JavaScript/Node.js Integration

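// Requires Node.js 18+ (built-in fetch, FormData and Blob).
// The 'form-data' npm package is not compatible with the built-in fetch.
const fs = require('fs');

const form = new FormData();
const pdf = new Blob([fs.readFileSync('invoice.pdf')], { type: 'application/pdf' });
form.append('file', pdf, 'invoice.pdf');
form.append('prompt', 'What is the total amount?');

fetch('https://docparse.onrender.com/api/documents/', {
    method: 'POST',
    body: form  // fetch sets the multipart Content-Type and boundary itself
})
.then(response => {
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return response.json();
})
.then(data => console.log(data))
.catch(err => console.error(err));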

🚨 Troubleshooting

Common Issues

Docker ContainerConfig error:

# Clean up Docker containers and volumes
docker-compose down -v
docker system prune -f
docker-compose up --build

OpenAI API errors:

Check that the API key is valid and has not been revoked
Ensure OPENAI_API_KEY is set in the .env file
Verify that your OpenAI account has remaining quota or credits

Performance Tips

Use specific prompts for faster, more accurate results
Optimize image quality for better OCR results
Use PDF format when possible for best accuracy

📊 What DocParse Can Extract

With Specific Prompts:

Any information you ask for in natural language
Financial data (amounts, totals, line items)
Contact information (names, emails, phones)
Dates and reference numbers
Custom business data

Without Prompt (Extracts All):

All dates found in document
All monetary amounts
All person names
All company names
All email addresses

📞 Support

📧 Email: manziosee3@gmail.com
🌐 Live API: https://docparse.onrender.com
📖 Documentation: https://docparse.onrender.com/swagger/

🎯 Example Prompts

Get inspired with these example prompts:

"What is the total amount?"
"Get me the names of people in this document"
"What is the invoice date?"
"Extract all contact information"
"What are the line items?"
"Find all monetary amounts"
"Who are the companies mentioned?"
"What are the key dates?"
"Extract email addresses"

Or leave prompt empty to extract everything automatically!

Simple and powerful - just upload your document and get instant results!
