Invoice Processor

Lightweight invoice processing pipeline with OCR, deterministic extraction, and validation. This repository is production-ready scaffolding: the core flows run locally without external services, while seams remain for future LLM-based extraction and accounting system integrations.

What’s included

OCR pipeline: PyPDF2 text extraction with pdf2image+pytesseract fallback for scanned PDFs and images.
Deterministic data extraction: regex/date parsing to build InvoiceData models without network calls.
Validation engine: amount/date checks, PO requirements, and business-rule hooks.
CLI: process single files or directories with Rich output.
MCP server: stdio server exposing processing tools for AI assistants (still minimal).
Security helpers: filename sanitization, path validation, and SHA-256 hashing.
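To illustrate what the security helpers cover, here is a minimal stdlib-only sketch. The function names (sanitize_filename, is_within_directory, sha256_of_file) are hypothetical and are not taken from this package; the real helpers may be named and behave differently.

```python
import hashlib
import re
from pathlib import Path


def sanitize_filename(name: str) -> str:
    """Keep only the final path component and replace unsafe characters."""
    # Hypothetical helper: normalise Windows separators, then drop directories.
    base = Path(name.replace("\\", "/")).name
    cleaned = re.sub(r"[^A-Za-z0-9._-]", "_", base).lstrip(".")
    return cleaned or "unnamed"


def is_within_directory(path: str, allowed_root: str) -> bool:
    """Return True if path resolves to a location inside allowed_root."""
    root = Path(allowed_root).resolve()
    target = Path(path).resolve()
    return target == root or root in target.parents


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large PDFs are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Resolving both paths before comparing them is what stops a name such as ../../etc/passwd from escaping the allowed directory.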

Current limitations

LLM extraction and accounting system sync are not yet implemented; the extractor currently uses offline heuristics.
API/web server layers are intentionally stubbed.
OCR quality depends on your local Tesseract setup; see INSTALL.md for tips.
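As a rough picture of what "offline heuristics" can mean, the sketch below pulls three fields out of OCR text with regular expressions and strict date parsing. The patterns, the extract_fields name, and the plain-dict return value are assumptions made for illustration; the package itself builds InvoiceData models and may use different rules.

```python
import re
from datetime import date, datetime
from decimal import Decimal
from typing import Optional

# Illustrative patterns only; real invoices vary far more than this.
INVOICE_NO = re.compile(r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE)
TOTAL = re.compile(r"\bTotal(?:\s+Due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE)
DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2}|\d{2}/\d{2}/\d{4})\b")


def parse_date(raw: str) -> Optional[date]:
    """Try ISO first, then US-style month/day/year (an assumed convention)."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(raw, fmt).date()
        except ValueError:
            continue
    return None


def extract_fields(text: str) -> dict:
    """Return the first invoice number, date, and total found in the text."""
    number = INVOICE_NO.search(text)
    total = TOTAL.search(text)
    found_date = DATE.search(text)
    return {
        "invoice_number": number.group(1) if number else None,
        "invoice_date": parse_date(found_date.group(1)) if found_date else None,
        "total_amount": Decimal(total.group(1).replace(",", "")) if total else None,
    }
```

The word boundary in the total pattern keeps "Subtotal" from being mistaken for the invoice total, and Decimal avoids floating-point rounding on currency amounts.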

Quick start

Prerequisites

Python 3.9+
Tesseract OCR and Poppler utilities for best results:
- Debian/Ubuntu: sudo apt-get install tesseract-ocr poppler-utils
- macOS (Homebrew): brew install tesseract poppler
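
Before running the pipeline, it can help to confirm that the OCR binaries are actually on PATH. The following is a small sketch, not part of the package: missing_ocr_tools is a hypothetical helper, and it assumes the Tesseract executable is named tesseract and that Poppler provides pdftoppm.

```python
import shutil


def missing_ocr_tools(tools=("tesseract", "pdftoppm")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_ocr_tools()
    if missing:
        print("Missing OCR tools:", ", ".join(missing))
    else:
        print("Tesseract and Poppler utilities found.")
```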

Installation

git clone https://github.com/qvidal01/invoice-processor.git
cd invoice-processor
python -m venv .venv
source .venv/bin/activate  # Windows: .venv\Scripts\activate
pip install -e .
# Optional: install dev/test tooling
pip install -r requirements-dev.txt
# Config parsing currently requires a non-empty OPENAI_API_KEY, even though extraction runs offline; a placeholder value is enough
export OPENAI_API_KEY=dummy-key

See INSTALL.md for detailed platform guidance.

CLI usage

# Process a single invoice
invoice-processor process ./sample_invoice.pdf

# Process a directory (PDFs by default)
invoice-processor batch ./invoices --pattern "*.pdf"

Python API

from invoice_processor import InvoiceProcessor

processor = InvoiceProcessor(openai_api_key="dummy-key")
result = processor.process_invoice("invoice.pdf", validate=True)

if result.success and result.invoice:
    print(f"Vendor: {result.invoice.vendor_name}")
    print(f"Total: {result.invoice.total_amount}")
    print(f"Confidence: {result.invoice.confidence_score:.1%}")
else:
    print(f"Failed: {result.error}")
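
For scripts that need the equivalent of the batch command, the same call can be looped over a directory. This is a sketch under assumptions: process_directory is a hypothetical helper, not part of the package, and it relies only on the process_invoice call and the result.success / result.invoice attributes shown above.

```python
from pathlib import Path
from typing import Any, Dict, List


def process_directory(processor: Any, directory: str, pattern: str = "*.pdf") -> Dict[str, List[str]]:
    """Run processor.process_invoice on each matching file and group names by outcome."""
    summary: Dict[str, List[str]] = {"succeeded": [], "failed": []}
    for path in sorted(Path(directory).glob(pattern)):
        try:
            result = processor.process_invoice(str(path), validate=True)
            ok = bool(result.success and result.invoice)
        except Exception:
            # Assumption: any raised error counts as a failed file, so one
            # bad invoice does not stop the rest of the batch.
            ok = False
        summary["succeeded" if ok else "failed"].append(path.name)
    return summary
```

With the package installed, process_directory(InvoiceProcessor(openai_api_key="dummy-key"), "./invoices") would return the file names grouped under "succeeded" and "failed".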

MCP server

invoice-mcp-server  # exposes process_invoice over stdio

MCP usage details live in src/invoice_processor/mcp_server/README.md.

Testing

pytest
pytest --cov=invoice_processor --cov-report=term-missing

Documentation

ARCHITECTURE.md – current module layout and flow.
INSTALL.md – prerequisites, virtualenv/Poetry, and OCR tips.
IMPLEMENTATION_NOTES.md – rationale and future hooks.
CHANGELOG.md – release history.
docs/api.md – API reference for core classes.
examples/ – runnable snippets.

Support

Issues and suggestions are welcome via GitHub issues or contact@aiqso.io. Please see CODE_OF_CONDUCT.md and CONTRIBUTING.md before opening a PR.
