A Python-powered AI automation engine that transforms business documents (invoices, contracts, bank statements, reports) into structured data, insights, and interactive dashboards, automatically.
| # | Feature | Description |
|---|---|---|
| 1 | Document Upload | Upload PDF invoices, contracts, bank statements, reports |
| 2 | Automated Extraction | Extract structured data using pdfplumber + regex patterns |
| 3 | LLM Summarization | OpenAI-powered executive summaries, risk flags, action items |
| 4 | Structured JSON Output | Clean, schema-consistent JSON for every document |
| 5 | Interactive Dashboard | Streamlit dashboard with charts, risk alerts, spending trends |
| 6 | Automation Layer | Auto-save to CSV/SQLite, PDF report generation, email notifications |
| 7 | Zapier Integration | Webhook API to connect with 6000+ apps (Slack, Sheets, QuickBooks) |
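
As a rough illustration of the "Automated Extraction" feature, the sketch below applies regex patterns to text that has already been pulled out of a PDF. The patterns, the `FIELD_PATTERNS` name, and the `extract_fields` helper are assumptions made for this example; the project's real patterns live in extractor.py and may differ.

```python
import re

# Hypothetical patterns for illustration only; not the ones used in extractor.py.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"\bINV-\d{4}-\d{4}\b"),
    "total_amount": re.compile(
        r"Total(?:\s+Due)?\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE
    ),
    "due_date": re.compile(r"Due\s+Date\s*:?\s*([A-Z][a-z]+ \d{1,2}, \d{4})"),
}


def extract_fields(text: str) -> dict:
    """Return the first match of every known pattern found in the text."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            continue  # Field not present; leave it out instead of guessing.
        # Use the capture group when the pattern defines one, else the whole match.
        fields[name] = match.group(1) if pattern.groups else match.group(0)
    if "total_amount" in fields:
        # Normalize "11,714.75" to a float so it can be summed later.
        fields["total_amount"] = float(fields["total_amount"].replace(",", ""))
    return fields


# In the real pipeline this text would come from pdfplumber.
sample_text = """Invoice Number: INV-2026-0042
Due Date: March 22, 2026
Total Due: $11,714.75"""
result = extract_fields(sample_text)
```

Fields that do not match are simply omitted, which leaves room for the LLM step to fill them in later.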
Automation_project/
├── config.py              # Central configuration
├── extractor.py           # PDF text & field extraction
├── summarizer.py          # OpenAI LLM summarization
├── pipeline.py            # End-to-end processing pipeline
├── database.py            # SQLite storage layer
├── automation.py          # CSV export, PDF reports, email
├── dashboard.py           # Streamlit interactive dashboard
├── main.py                # CLI entry point
├── generate_samples.py    # Sample PDF generator for testing
├── requirements.txt       # Python dependencies
├── .env.example           # Environment variables template
├── .gitignore
├── samples/               # Generated sample PDFs
├── uploads/               # Uploaded documents
└── outputs/
    ├── json/              # Structured JSON outputs
    ├── csv/               # CSV exports
    └── reports/           # Generated PDF reports
cd Automation_project
pip install -r requirements.txt

# Copy the template
cp .env.example .env
# Edit .env and add your OpenAI API key
# OPENAI_API_KEY=sk-your-key-here

Note: The system works without an API key; it skips LLM analysis and uses only regex extraction.

python generate_samples.py

This creates 4 sample PDFs in samples/:
- sample_invoice.pdf
- sample_contract.pdf
- sample_bank_statement.pdf
- sample_quarterly_report.pdf

streamlit run dashboard.py

Open http://localhost:8501 in your browser.
# Process a single PDF
python main.py process samples/sample_invoice.pdf
# Process without LLM (regex-only)
python main.py process samples/sample_invoice.pdf --no-llm
# Batch process all PDFs in a folder
python main.py batch samples/
# Export all records to CSV
python main.py export-csv
# Generate summary PDF report
python main.py export-pdf
# View dashboard stats in terminal
python main.py stats
# Launch Streamlit dashboard
python main.py dashboard

    PDF Upload
         │
         ▼
┌─────────────────┐
│  PDF Extraction │  pdfplumber → raw text + tables
│  + Regex Fields │  regex patterns → invoice #, date, amount, vendor...
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Document Type  │  Heuristic keyword scoring
│  Detection      │  invoice / contract / bank_statement / report / email
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  LLM Analysis   │  OpenAI API (gpt-4o-mini)
│  (if enabled)   │  → Summary, risk flags, action items, structured JSON
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Merge & Store  │  Regex + LLM fields → canonical output
│                 │  → SQLite DB + JSON file
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Automation     │  → CSV append
│  Layer          │  → PDF report generation
│                 │  → Email notification (optional)
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  Dashboard      │  Charts, risk alerts, spending trends
│  (Streamlit)    │  Search, export, download reports
└─────────────────┘
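
The "heuristic keyword scoring" step in the flow above could look roughly like the sketch below. The keyword lists, the `detect_document_type` name, and the `"unknown"` fallback are assumptions made for this example, not the project's actual rules.

```python
# Hypothetical keyword lists; the real detector may use different terms or weights.
KEYWORDS = {
    "invoice": ("invoice", "bill to", "amount due", "due date"),
    "contract": ("agreement", "party", "terms", "termination"),
    "bank_statement": ("statement", "balance", "deposit", "withdrawal"),
    "report": ("quarterly", "summary", "revenue", "growth"),
    "email": ("from:", "to:", "subject:"),
}


def detect_document_type(text: str) -> str:
    """Score each type by counting keyword occurrences and return the best one."""
    lowered = text.lower()
    scores = {
        doc_type: sum(lowered.count(word) for word in words)
        for doc_type, words in KEYWORDS.items()
    }
    best_type, best_score = max(scores.items(), key=lambda item: item[1])
    # Assumed fallback: report "unknown" when no keyword matched at all.
    return best_type if best_score > 0 else "unknown"


invoice_type = detect_document_type(
    "INVOICE\nBill To: Acme\nAmount Due: $100\nDue Date: March 22"
)
```

Plain substring counting is crude (here "Bill To:" also scores one point for `email`), so a real detector would likely need weights or word boundaries.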
- Drag & drop PDFs
- Toggle AI analysis on/off
- View extraction results, risk badges, summaries
- Download individual PDF reports
- Total documents & total amount metrics
- Document type distribution (pie chart)
- Risk level distribution (bar chart)
- Monthly spending trend (line chart)
- Risk alerts & pending payments tables
- Search across all processed documents
- Sortable data table
- Detail view with full JSON & action items
- One-click CSV export
- Summary PDF report generation
- Download previous exports
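
The search and total-amount features listed above depend on the SQLite storage layer. The sketch below shows one possible way to store and query records using only the standard library; the table schema and the helper names (`open_db`, `save_document`, `search_documents`, `total_amount`) are hypothetical and are not taken from database.py.

```python
import json
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the (assumed) documents table if missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS documents (
               id INTEGER PRIMARY KEY,
               file TEXT NOT NULL,
               document_type TEXT,
               total_amount REAL,
               risk_flag TEXT,
               payload TEXT NOT NULL
           )"""
    )
    return conn


def save_document(conn: sqlite3.Connection, record: dict) -> None:
    """Store key columns for aggregation plus the full record as JSON text."""
    conn.execute(
        "INSERT INTO documents (file, document_type, total_amount, risk_flag, payload) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            record["file"],
            record.get("document_type"),
            record.get("total_amount"),
            record.get("risk_flag"),
            json.dumps(record),
        ),
    )
    conn.commit()


def search_documents(conn: sqlite3.Connection, term: str) -> list[dict]:
    """Substring search over the stored JSON (LIKE ignores ASCII case)."""
    rows = conn.execute(
        "SELECT payload FROM documents WHERE payload LIKE ? ORDER BY id",
        (f"%{term}%",),
    ).fetchall()
    return [json.loads(row[0]) for row in rows]


def total_amount(conn: sqlite3.Connection) -> float:
    """Sum total_amount across all documents, treating an empty table as 0."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(total_amount), 0) FROM documents"
    ).fetchone()
    return float(total)
```

Note that `%` and `_` in the search term act as LIKE wildcards in this sketch; a production version would escape them.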
{
"file": "sample_invoice.pdf",
"document_type": "invoice",
"invoice_number": "INV-2026-0042",
"vendor": "Acme Solutions Ltd.",
"date": "February 20, 2026",
"due_date": "March 22, 2026",
"total_amount": 11714.75,
"currency": "USD",
"tax_amount": "$917.75",
"risk_flag": "low",
"risk_reason": "Late payment fee clause present",
"summary": "Invoice from Acme Solutions for cloud hosting, API development, and support services totaling $11,714.75 due March 22, 2026.",
"action_items": [
"Schedule payment before March 22 to avoid late fees",
"Verify API integration hours against project tracker",
"Confirm SSL certificate count matches deployed domains"
],
"line_items": [
{
"description": "Cloud Hosting Services",
"quantity": "1",
"unit_price": "$2,500.00",
"amount": "$2,500.00"
},
{
"description": "API Integration Development",
"quantity": "40 hrs",
"unit_price": "$150.00",
"amount": "$6,000.00"
}
]
}

Connect AutoOps AI with 6000+ apps to automate your entire workflow.
1. Start the webhook server:
   python webhook_api.py
2. Expose your local server for testing:
   # Install ngrok from https://ngrok.com/download
   ngrok http 8000
3. Configure your Zapier Zap:
   - Trigger: Gmail, Dropbox, Slack, etc.
   - Action: Webhooks by Zapier → POST
   - URL: https://your-ngrok-url.ngrok.io/process
   - Headers: X-API-Key: your-secret-api-key-here
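
Before wiring up a Zap, it can help to reproduce the same POST from Python. The sketch below only builds the request. The `/process` path and `X-API-Key` header come from the steps above, while the JSON body and its `file_url` field are assumptions; check webhook_api.py for the payload it actually expects.

```python
import json
import urllib.request


def build_process_request(
    base_url: str, api_key: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a POST to the webhook's /process endpoint."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/process",
        data=body,
        method="POST",
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
    )


request = build_process_request(
    "http://localhost:8000",
    "your-secret-api-key-here",
    # Hypothetical payload shape, used here for illustration only.
    {"file_url": "https://example.com/sample_invoice.pdf"},
)

# To actually send it once the webhook server is running:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(response.status, response.read().decode("utf-8"))
```

Swapping the base URL for the ngrok address exercises the same path that Zapier would hit.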
- Gmail → AutoOps → Slack: Automatically process invoice attachments and notify your team in Slack.
- Dropbox → AutoOps → Google Sheets: Watch a Dropbox folder and add processed data to a tracking spreadsheet.
- Email → AutoOps → QuickBooks: Extract invoice data and create entries in your accounting software.
- High-Risk Alert: Send SMS and create Trello cards when contracts have risk flags.

Read the full Zapier Integration Guide for details.
Edit config.py or .env to customize:
| Setting | Default | Description |
|---|---|---|
| OPENAI_MODEL | gpt-4o-mini | OpenAI model to use |
| TEMPERATURE | 0.2 | LLM creativity (lower = more deterministic) |
| MAX_TOKENS | 2048 | Max response tokens |
| SMTP_HOST | smtp.gmail.com | Email server for notifications |
- Python 3.11+
- pdfplumber: PDF text & table extraction
- OpenAI API: LLM summarization & structuring
- Streamlit: Interactive web dashboard
- Plotly: Charts & visualizations
- SQLite: Lightweight database
- FPDF2: PDF report generation
- pandas: Data manipulation
To enable email alerts after processing:
- Set SMTP credentials in .env
- Enable "Auto-email notification" in the dashboard sidebar
- For Gmail, use an App Password
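
For orientation, a notification like this can be sent with the standard library alone. The sketch below is not the code in automation.py: the function names and the subject format are assumptions, and port 587 with STARTTLS is the usual setup for smtp.gmail.com.

```python
import smtplib
from email.message import EmailMessage


def build_notification(sender: str, recipient: str, record: dict) -> EmailMessage:
    """Compose a plain-text alert from a processed document record."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = recipient
    # Hypothetical subject format, chosen for this example.
    message["Subject"] = (
        f"Processed {record['file']} (risk: {record.get('risk_flag', 'n/a')})"
    )
    message.set_content(record.get("summary", "No summary available."))
    return message


def send_notification(
    message: EmailMessage, host: str, user: str, password: str, port: int = 587
) -> None:
    """Send the message over SMTP, upgrading the connection with STARTTLS."""
    with smtplib.SMTP(host, port, timeout=30) as server:
        server.starttls()
        server.login(user, password)  # For Gmail, pass the App Password here.
        server.send_message(message)


message = build_notification(
    "alerts@example.com",
    "finance@example.com",
    {"file": "sample_invoice.pdf", "risk_flag": "low", "summary": "Invoice due March 22."},
)
```

Building the message separately from sending it makes the content easy to check without an SMTP server.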
MIT License: free for personal and commercial use.