# Invoice Payments Workflow

<a href="https://colab.research.google.com/github/run-llama/llamacloud-demo/blob/main/examples/document_workflows/invoice_payments/invoice_payments.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This tutorial shows you how to create an agentic workflow that can parse an invoice from a vendor, verify that it aligns with the terms of the contract (pulled from a knowledge base of vendor contracts), and generate a payment recommendation report.

LLMs and agents can help automate this workflow, which traditionally takes a lot of manual work:
1. Invoice processing with LlamaParse lets you extract data from even very complicated invoices into a structured schema.
2. The extracted invoice is automatically grounded in a knowledge base (vendor contracts), without human annotation.
3. Optimization decisions, such as early payment discounts, are automatically surfaced to the user.

![](invoice_flow.png)

In this example, assume that the company is LlamaCo. You have vendor agreements with a few different companies, from office supplies to marketing to IT. You are generating a payment recommendation plan for an office supplies invoice.

In [None]:
!pip install llama-index llama-index-indices-llama-cloud llama-cloud llama-parse 

In [1]:
import nest_asyncio

nest_asyncio.apply()

## Setup

We set up both the vendor agreements index and the parser for the invoice.

### Index the Vendor Agreements

We first set up the index and retriever for the vendor agreements. We do this in LlamaCloud, which lets us retrieve the entire document and not just chunks of it.

Make sure to upload the documents in `vendor_agreements` to [LlamaCloud](https://cloud.llamaindex.ai/).

In [62]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

index = LlamaCloudIndex(
  name="vendor_agreements", 
  project_name="llamacloud_demo",
  organization_id="cdcb3478-1348-492e-8aa0-25f47d1a3902",
  # api_key="llx-..."
)

retriever = index.as_retriever(
    retrieval_mode="files_via_metadata",
    files_top_k=1
)

### Setup Parser

Define the LlamaParse parser, which will be used to process the invoices.

In [67]:
from llama_parse import LlamaParse

# parse invoices into Markdown text for downstream extraction
parser = LlamaParse(result_type="markdown")

### Define Invoice Schema

This schema represents the fields we want to extract.

In [68]:
from pydantic import BaseModel, Field
from typing import List, Optional
from datetime import date
from decimal import Decimal

class LineItem(BaseModel):
    description: str
    quantity: int
    unit_price: Decimal
    total: Decimal

class InvoiceOutput(BaseModel):
    # Key Identifiers
    invoice_number: str
    invoice_date: date
    po_number: str
    
    # Company Info
    vendor_name: str
    vendor_address: str
    
    # Financial Details
    line_items: List[LineItem]
    subtotal: Decimal
    tax: Optional[Decimal]
    total_amount: Decimal
    
    # Payment Info
    payment_terms: str
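
Because the schema stores amounts as `Decimal`, an extracted invoice can be checked for internal consistency with exact arithmetic before it is passed on. The following is a minimal sketch, not part of the original workflow: `check_invoice_totals` is a hypothetical helper that takes a plain dict (for example, the result of `InvoiceOutput.model_dump()`), and the sample values are copied from the invoice data printed in the run output later in this tutorial.

```python
from decimal import Decimal


def check_invoice_totals(invoice: dict) -> list[str]:
    """Return a list of inconsistencies found in the invoice (empty if none)."""
    problems = []

    # Each line total should equal quantity * unit_price.
    for item in invoice["line_items"]:
        expected = item["quantity"] * item["unit_price"]
        if expected != item["total"]:
            problems.append(
                f"{item['description']}: expected {expected}, got {item['total']}"
            )

    # The subtotal should equal the sum of the line totals.
    line_sum = sum((item["total"] for item in invoice["line_items"]), Decimal("0"))
    if line_sum != invoice["subtotal"]:
        problems.append(f"subtotal: expected {line_sum}, got {invoice['subtotal']}")

    # The total should equal subtotal plus tax (tax is optional in the schema).
    tax = invoice.get("tax") or Decimal("0")
    if invoice["subtotal"] + tax != invoice["total_amount"]:
        problems.append(
            f"total_amount: expected {invoice['subtotal'] + tax}, "
            f"got {invoice['total_amount']}"
        )
    return problems


sample = {
    "line_items": [
        {"description": "Printer Toner Cartridges (Model XY-500)",
         "quantity": 20, "unit_price": Decimal("45.0"), "total": Decimal("900.0")},
        {"description": "A4 Copy Paper (5000 sheets/box)",
         "quantity": 10, "unit_price": Decimal("30.0"), "total": Decimal("300.0")},
    ],
    "subtotal": Decimal("1200.0"),
    "tax": Decimal("96.0"),
    "total_amount": Decimal("1296.0"),
}

print(check_invoice_totals(sample))  # prints []
```

A non-empty result would be a reasonable signal to re-parse the invoice or flag it for manual review.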

### Define Vendor Contract Terms Schema

This is extracted from the vendor contract and contains information on relevant payment terms.

In [69]:
class EarlyPaymentDiscount(BaseModel):
    percentage: float = Field(..., description="Discount percentage if paid early.")
    days: int = Field(..., description="Number of days within which the payment must be made to apply the discount.")

class BulkOrderDiscount(BaseModel):
    percentage: float = Field(..., description="Discount percentage for bulk orders.")
    threshold: float = Field(..., description="Subtotal threshold above which the bulk discount applies.")

class VendorContractTerms(BaseModel):
    payment_terms: str = Field(..., description="The standard payment terms (e.g., Net 30).")
    early_payment_discount: Optional[EarlyPaymentDiscount] = Field(
        None, 
        description="Optional early payment discount terms."
    )
    bulk_order_discount: Optional[BulkOrderDiscount] = Field(
        None, 
        description="Optional bulk order discount terms."
    )
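
To make the meaning of these fields concrete, here is a small sketch of how the terms could be applied deterministically. It is an illustration under stated assumptions, not the workflow's method (the workflow asks the LLM to produce the report): `EarlyTerms`, `early_payment_option`, and `bulk_discount_applies` are hypothetical names, percentages are held as `Decimal` rather than `float` to keep the arithmetic exact, and the 2% / 10-day terms are illustrative values chosen to match the sample report at the end of this tutorial. Whether a discount applies before or after tax is a contract detail; the sketch simply discounts whatever amount is passed in.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from decimal import Decimal
from typing import Optional, Tuple


@dataclass
class EarlyTerms:
    """Hypothetical stdlib stand-in for EarlyPaymentDiscount."""
    percentage: Decimal  # e.g. Decimal("2") for 2%
    days: int            # days after the invoice date


def early_payment_option(
    amount_due: Decimal, invoice_date: date, terms: Optional[EarlyTerms]
) -> Optional[Tuple[Decimal, date]]:
    """Return (discounted_amount, deadline), or None if no early discount exists."""
    if terms is None:
        return None
    factor = (Decimal("100") - terms.percentage) / Decimal("100")
    discounted = (amount_due * factor).quantize(Decimal("0.01"))
    deadline = invoice_date + timedelta(days=terms.days)
    return discounted, deadline


def bulk_discount_applies(subtotal: Decimal, threshold: Decimal) -> bool:
    """The schema describes the threshold as a subtotal *above* which the discount applies."""
    return subtotal > threshold


option = early_payment_option(
    Decimal("1296.00"), date(2025, 3, 1), EarlyTerms(Decimal("2"), 10)
)
print(option)  # (Decimal('1270.08'), datetime.date(2025, 3, 11))
print(bulk_discount_applies(Decimal("1200"), Decimal("1000")))  # True
```

A calculation like this can serve as a cross-check on the amounts and dates in the LLM-generated report.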

### Define Payment Plan Schema 

This is the schema of the final payment plan.

In [88]:
class PaymentDueReport(BaseModel):
    """Final payments due report."""
    invoice_number: str = Field(..., description="The identifier of the invoice this report refers to.")
    original_amount_due: float = Field(..., description="The amount due without any discounts or early payment considerations.")
    early_payment_amount_due: Optional[float] = Field(None, description="The discounted amount if paid before the early payment deadline.")
    early_payment_deadline: Optional[date] = Field(None, description="The last date by which the early payment discount can be applied.")
    bulk_discount_applied: bool = Field(..., description="Indicates whether a bulk order discount has been applied.")
    recommended_action: str = Field(..., description="A recommendation for how and when to pay (e.g., 'Pay early to save 5%').")
    notes: Optional[str] = Field(None, description="Additional commentary, instructions, or considerations for the payer.")
    
    def render(self) -> str:
        """Render the payment due report as a Markdown string."""
        lines = []
        lines.append("## Invoice Payment Recommendation")
        lines.append("")
        lines.append(f"**Invoice Number:** {self.invoice_number}")
        lines.append(f"**Original Amount Due:** ${self.original_amount_due:,.2f}")
        
        if self.early_payment_amount_due is not None and self.early_payment_deadline is not None:
            lines.append("")
            lines.append("### Early Payment Option")
            lines.append(f"- **Discounted Amount:** ${self.early_payment_amount_due:,.2f}")
            lines.append(f"- **Pay By:** {self.early_payment_deadline.isoformat()}")
        
        lines.append("")
        lines.append("### Bulk Discount")
        lines.append(f"- **Applied:** {'Yes' if self.bulk_discount_applied else 'No'}")

        lines.append("")
        lines.append("### Recommendation")
        lines.append(self.recommended_action)
        
        if self.notes:
            lines.append("")
            lines.append("### Notes")
            lines.append(self.notes)

        return "\n".join(lines)

## Setup Invoice Processing Workflow

Let's define the following invoice processing workflow:
1. Extract structured data from the invoice.
2. Match the invoice against the relevant vendor contract.
3. Generate a payment plan.

In [89]:
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Context,
    Workflow,
    step,
)
from llama_index.core.llms import LLM
from typing import Optional
from pydantic import BaseModel
from llama_index.core.schema import Document
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.llms import ChatMessage, MessageRole
from llama_index.core.retrievers import BaseRetriever
from llama_index.llms.openai import OpenAI  # default LLM used in InvoicePaymentsWorkflow.__init__
from pathlib import Path
import logging
import json
import os

_logger = logging.getLogger(__name__)
_logger.setLevel(logging.INFO)


INVOICE_EXTRACT_PROMPT = """\
You are given invoice data below. \
Please extract out relevant information from the invoice into the defined schema - the schema is defined as a function call.\

{invoice_data}
"""


CONTRACT_EXTRACT_PROMPT = """\
You are given a vendor contract below. \
Please extract out the payment terms from the contract into the defined schema - the schema is defined as a function call.\

{contract_doc}
"""


PAYMENTS_DUE_SYSTEM_PROMPT = """\
You are a helpful financial assistant that takes invoice data and vendor contract terms, \
then produces a structured payments due report according to the provided schema. \
You must return all relevant fields clearly and accurately.
"""

PAYMENTS_DUE_USER_PROMPT = """\
You are given the following information:

1. Extracted Invoice Data
{invoice_output}

2. Vendor Contract Terms
{vendor_terms}

Based on the above invoice data and vendor contract terms, create a payments due report according \
to the schema defined in the function call.

"""



class InvoiceOutputEvent(Event):
    invoice_output: InvoiceOutput


class InvoiceContractEvent(Event):
    invoice_output: InvoiceOutput
    vendor_terms: VendorContractTerms


class HandleQuestionEvent(Event):
    question: str


class QuestionAnsweredEvent(Event):
    question: str
    answer: str


class CollectedAnswersEvent(Event):
    combined_answers: str


class LogEvent(Event):
    msg: str
    delta: bool = False
    # clear_previous: bool = False


class InvoicePaymentsWorkflow(Workflow):
    """Invoice Payments workflow."""

    def __init__(
        self,
        parser: LlamaParse,
        contract_retriever: BaseRetriever,
        llm: LLM | None = None,
        similarity_top_k: int = 20,
        output_dir: str = "data_out",
        **kwargs,
    ) -> None:
        """Init params."""
        super().__init__(**kwargs)

        self.parser = parser
        self.retriever = contract_retriever
        
        self.llm = llm or OpenAI(model="gpt-4o-mini")
        self.similarity_top_k = similarity_top_k

        # if not exists, create
        out_path = Path(output_dir) / "workflow_output"
        if not out_path.exists():
            out_path.mkdir(parents=True, exist_ok=True)
            os.chmod(str(out_path), 0o0777)
        self.output_dir = out_path

    @step
    async def parse_invoice(
        self, ctx: Context, ev: StartEvent
    ) -> InvoiceOutputEvent:
        # reuse a cached extraction if one exists
        # NOTE: the cache path does not depend on invoice_path; delete the file to re-parse a different invoice
        invoice_output_path = Path(
            f"{self.output_dir}/invoice_output.json"
        )
        if invoice_output_path.exists():
            if self._verbose:
                ctx.write_event_to_stream(LogEvent(msg=">> Loading invoice from cache"))
            with open(invoice_output_path, "r") as fp:
                invoice_output_dict = json.load(fp)
            invoice_output = InvoiceOutput.model_validate(invoice_output_dict)
        else:
            if self._verbose:
                ctx.write_event_to_stream(LogEvent(msg=">> Parsing invoice"))
            # parse invoice
            docs = await self.parser.aload_data(ev.invoice_path)
            # extract from invoice
            prompt = ChatPromptTemplate.from_messages([
                ("user", INVOICE_EXTRACT_PROMPT)
            ])
            invoice_output = await self.llm.astructured_predict(
                InvoiceOutput,
                prompt,
                invoice_data="\n".join([d.get_content(metadata_mode="all") for d in docs])
            )
            if not isinstance(invoice_output, InvoiceOutput):
                raise ValueError(f"Invalid extraction from invoice: {invoice_output}")
            # cache the extracted invoice as JSON
            with open(invoice_output_path, "w") as fp:
                fp.write(invoice_output.model_dump_json())
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=f">> Invoice data: {invoice_output.model_dump()}"))

        return InvoiceOutputEvent(invoice_output=invoice_output)

    @step
    async def find_relevant_contract(
        self, ctx: Context, ev: InvoiceOutputEvent
    ) -> InvoiceContractEvent:
        """Find relevant contracts, and match the one that seems relevant."""

        query = f"Please find the relevant vendor agreement from {ev.invoice_output.vendor_name}"
        # should retrieve one doc
        contract_docs = self.retriever.retrieve(query)
        if len(contract_docs) != 1:
            raise ValueError("There should be one and only one contract document returned.")
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=f">> Found document: {contract_docs[0].metadata}"))

        # NOTE: assuming vendor contract doesn't overflow LLM context window
        # if it does, you'll likely want to use our SummaryIndex
        prompt = ChatPromptTemplate.from_messages([
            ("user", CONTRACT_EXTRACT_PROMPT),
        ])
        # use the workflow's own LLM; the keyword must match {contract_doc} in CONTRACT_EXTRACT_PROMPT
        vendor_terms = await self.llm.astructured_predict(
            VendorContractTerms,
            prompt,
            contract_doc=contract_docs[0].get_content(metadata_mode="all")
        )

        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=f">> Extracted vendor terms: {vendor_terms.model_dump()}"))
        
        return InvoiceContractEvent(
            invoice_output=ev.invoice_output,
            vendor_terms=vendor_terms
        )
    

    @step
    async def generate_output(
        self, ctx: Context, ev: InvoiceContractEvent
    ) -> StopEvent:
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=">> Generating Payments Due Report"))
        prompt = ChatPromptTemplate.from_messages([
            ("system", PAYMENTS_DUE_SYSTEM_PROMPT),
            ("user", PAYMENTS_DUE_USER_PROMPT)
        ])
        payments_due = await self.llm.astructured_predict(
            PaymentDueReport,
            prompt,
            invoice_output=str(ev.invoice_output.model_dump()),
            vendor_terms=str(ev.vendor_terms.model_dump())
        )

        return StopEvent(result=payments_due)
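
The `parse_invoice` step caches its extraction in `invoice_output.json` under the output directory, which is why a run can log `>> Loading invoice from cache` instead of calling the parser. The cache file name does not depend on the invoice path, so processing a different invoice requires deleting the file first. A minimal sketch of the same load-or-compute pattern, using only the standard library, is shown here; `load_or_compute`, its `force` flag, and `fake_extract` are hypothetical and not part of the workflow.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def load_or_compute(
    cache_path: Path, compute: Callable[[], dict], force: bool = False
) -> dict:
    """Return the cached JSON if present (and not forced); otherwise compute and cache it."""
    if cache_path.exists() and not force:
        with open(cache_path, "r") as fp:
            return json.load(fp)
    result = compute()
    cache_path.parent.mkdir(parents=True, exist_ok=True)
    with open(cache_path, "w") as fp:
        json.dump(result, fp)
    return result


calls = []


def fake_extract() -> dict:
    """Stand-in for the expensive parse + LLM extraction."""
    calls.append(1)
    return {"invoice_number": "INV-1001"}


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "workflow_output" / "invoice_output.json"
    first = load_or_compute(path, fake_extract)               # computes and writes the cache
    second = load_or_compute(path, fake_extract)              # served from the cache
    third = load_or_compute(path, fake_extract, force=True)   # recomputes
    n_calls = len(calls)

print(n_calls)  # 2
```

Keying the cache file on the invoice file name or a content hash would be one way to avoid stale results when several invoices are processed.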

In [90]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o")
workflow = InvoicePaymentsWorkflow(
    parser=parser,
    contract_retriever=retriever,
    llm=llm,
    verbose=True,
    timeout=None,  # disable the timeout so long-running steps can finish
)

#### Visualize the workflow

In [91]:
from llama_index.utils.workflow import draw_all_possible_flows

draw_all_possible_flows(InvoicePaymentsWorkflow, filename="invoice_workflow.html")

<class 'NoneType'>
<class '__main__.InvoiceContractEvent'>
<class 'llama_index.core.workflow.events.StopEvent'>
<class '__main__.InvoiceOutputEvent'>
invoice_workflow.html


## Run the Workflow

Let's run the full workflow and generate the output! 

In [92]:
from IPython.display import clear_output

handler = workflow.run(invoice_path="sample_invoice.docx")
async for event in handler.stream_events():
    if isinstance(event, LogEvent):

        if event.delta:
            print(event.msg, end="")
        else:
            print(event.msg)

response = await handler
print(str(response))

Running step parse_invoice
Step parse_invoice produced event InvoiceOutputEvent
>> Loading invoice from cache
>> Invoice data: {'invoice_number': 'INV-1001', 'invoice_date': datetime.date(2025, 3, 1), 'po_number': 'PO-2024-089', 'vendor_name': 'ACME Office Supply, Inc.', 'vendor_address': '123 Business Park Drive, Springfield, USA', 'line_items': [{'description': 'Printer Toner Cartridges (Model XY-500)', 'quantity': 20, 'unit_price': Decimal('45.0'), 'total': Decimal('900.0')}, {'description': 'A4 Copy Paper (5000 sheets/box)', 'quantity': 10, 'unit_price': Decimal('30.0'), 'total': Decimal('300.0')}], 'subtotal': Decimal('1200.0'), 'tax': Decimal('96.0'), 'total_amount': Decimal('1296.0'), 'payment_terms': 'Due on receipt'}
Running step find_relevant_contract
>> Found document: {'file_size': 5539, 'last_modified_at': '2024-12-08T19:04:14', 'file_path': 'acme_vendor_agreement.md', 'file_name': 'acme_vendor_agreement.md', 'external_file_id': 'acme_vendor_agreement.md', 'pipeline_id': '

In [94]:
print(response.render())

## Invoice Payment Recommendation

**Invoice Number:** INV-1001
**Original Amount Due:** $1,296.00

### Early Payment Option
- **Discounted Amount:** $1,270.08
- **Pay By:** 2025-03-11

### Bulk Discount
- **Applied:** Yes

### Recommendation
Pay early to save 2% and benefit from the bulk order discount.

### Notes
The original amount includes a 5% bulk order discount as the subtotal exceeded $1000. An additional 2% discount is available if paid by 2025-03-11.
