A modular Python application that uses Large Language Models (LLMs) such as LLaMA 2 to extract structured data from PDF invoices. It provides a Streamlit web interface and supports custom prompt-based extraction, Pydantic validation, and multi-invoice parsing.

Project structure:

Invoice-Parser-LLM/
├── app/ # Streamlit UI components
│ ├── main.py # App launcher
│ └── layout.py # Streamlit UI layout
│
├── core/ # Core logic
│ ├── parser.py # PDF parsing & text extraction
│ ├── prompt_templates.py # LLM prompt templates
│ └── validator.py # Data validation with Pydantic
│
├── models/ # LLM integration
│ └── llama_model.py # Load & run LLaMA from Hugging Face
│
├── data/sample_invoices/ # Sample invoice PDFs
├── outputs/extracted_data/ # Parsed JSON outputs
├── tests/test_parser.py # Unit tests
│
├── requirements.txt # Dependencies
├── .env # API keys / tokens
└── README.md # This file
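
How the modules in the tree above might fit together can be sketched as follows. This is a minimal sketch, not the project's actual code: the names `PROMPT_TEMPLATE`, `build_prompt`, `extract_json`, and `parse_invoice`, the prompt wording, and the field names are all hypothetical. The model is passed in as a plain callable so the example runs without downloading LLaMA.

```python
import json
from typing import Callable

# Hypothetical template; a real one would live in core/prompt_templates.py.
PROMPT_TEMPLATE = (
    "Extract the following fields from the invoice text and answer with JSON only.\n"
    "Fields: {fields}\n"
    "Invoice text:\n{invoice_text}\n"
)


def build_prompt(invoice_text: str, fields: list[str]) -> str:
    """Fill the template with the requested fields and the raw invoice text."""
    return PROMPT_TEMPLATE.format(fields=", ".join(fields), invoice_text=invoice_text)


def extract_json(raw_output: str) -> dict:
    """Return the first JSON object found in a response that may contain extra text."""
    decoder = json.JSONDecoder()
    start = raw_output.find("{")
    while start != -1:
        try:
            obj, _end = decoder.raw_decode(raw_output, start)
            return obj
        except json.JSONDecodeError:
            # Not valid JSON at this brace; try the next one.
            start = raw_output.find("{", start + 1)
    raise ValueError("no JSON object found in model output")


def parse_invoice(invoice_text: str, fields: list[str], generate: Callable[[str], str]) -> dict:
    """Build the prompt, call the model, and extract the JSON answer for one invoice."""
    prompt = build_prompt(invoice_text, fields)
    return extract_json(generate(prompt))


# Stand-in for the LLaMA call (assumption: the real model returns text containing JSON).
def fake_generate(prompt: str) -> str:
    return 'Here is the result: {"invoice_number": "INV-001", "total": 120.5}'


result = parse_invoice("Invoice INV-001 Total: 120.50", ["invoice_number", "total"], fake_generate)
print(result)
```

Passing the model in as a callable keeps the parsing logic testable without a GPU or a model download. In the real app, `generate` would wrap whatever `models/llama_model.py` exposes.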

Features:

- 📄 PDF Invoice Upload – Upload invoices via a web interface.
- 💬 Custom Prompting – Use natural language to define extraction logic.
- 🧠 LLM Integration – Built with LLaMA 2 and Hugging Face transformers.
- 📦 Structured Output – Data returned in validated JSON format.
- ✅ Data Validation – Uses Pydantic to ensure schema correctness.
- 🧩 Modular Codebase – Scalable and easy to maintain.
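
The validation feature can be illustrated with a small Pydantic model. This is a sketch under assumptions: the `Invoice` schema, its field names, and the `validate_invoice` helper are hypothetical and may not match `core/validator.py`. Only calls that behave the same in Pydantic v1 and v2 are used.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Invoice(BaseModel):
    """Hypothetical invoice schema; the project's real fields may differ."""

    invoice_number: str
    vendor: str
    total: float
    currency: Optional[str] = None


def validate_invoice(data: dict) -> Optional[Invoice]:
    """Return a validated Invoice, or None if the data does not match the schema."""
    try:
        return Invoice(**data)
    except ValidationError as exc:
        print(f"Validation failed: {exc}")
        return None


# A numeric string is coerced to float; a non-numeric one is rejected.
good = validate_invoice({"invoice_number": "INV-001", "vendor": "Acme", "total": "120.50"})
bad = validate_invoice({"invoice_number": "INV-002", "vendor": "Acme", "total": "n/a"})
```

Returning `None` on failure is one possible design. Re-raising the error or collecting per-field messages for display in the UI would work equally well.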

Installation:

git clone https://github.com/your-username/Invoice-Parser-LLM.git
cd Invoice-Parser-LLM
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt

Request access to LLaMA 2 on Hugging Face (the model is gated), then log in via the CLI:
huggingface-cli login
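
As an alternative to the interactive login, a token kept in `.env` could be exported to the environment before the model is loaded. The sketch below is an assumption about how that might look: `HF_TOKEN` is the environment variable that `huggingface_hub` reads, but whether this project relies on it is not stated, and the `load_env_file` helper is hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse simple KEY=VALUE lines from a .env file and export them to os.environ.

    Blank lines, comment lines starting with '#', and lines without '=' are
    skipped. Variables already set in the environment are not overwritten.
    """
    loaded = {}
    env_path = Path(path)
    if not env_path.exists():
        return loaded
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip("'\"")  # drop optional surrounding quotes
        loaded[key] = value
        os.environ.setdefault(key, value)
    return loaded


values = load_env_file()  # returns {} when no .env file is present
token = os.environ.get("HF_TOKEN")  # None if the variable is not set
```

The `python-dotenv` package provides a more complete parser if the project already depends on it.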

Run the app:

python -m streamlit run app/main.py
