# Auto Insurance Claim Processing Workflow

<a href="https://colab.research.google.com/github/run-llama/llamacloud-demo/blob/main/examples/document_workflows/auto_insurance_claims/auto_insurance_claims.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This tutorial demonstrates how to build an agentic workflow that can parse auto insurance claims, retrieve and apply relevant policy guidelines, and produce a structured recommendation on whether and how to settle the claim. The workflow follows a pattern similar to the patient case summary workflow, adapted for insurance data.

![](auto_insurance_claims.png)

The workflow will:

1. Parse the Claim Document: Extract key fields (claim number, date of loss, claimant name, policy number, loss description, estimated damage costs).
2. Index/Load Insurance Policy Documents: Either via LlamaCloud or another indexing solution.
3. Generate Relevant Queries: Given the claim details, construct vector-based queries to retrieve the appropriate coverage sections from the policy index.
4. Match Conditions Against Policy: Use LLM reasoning to determine if the claim is covered, what deductible applies, whether special endorsements are triggered, and what the recommended settlement amount should be.
5. Produce a Structured Output: Summarize the final recommended settlement and conditions for payment.

In [None]:
!pip install llama-index llama-index-indices-managed-llama-cloud llama-cloud llama-parse llama-index-utils-workflow

## Setup

In [147]:
import nest_asyncio
nest_asyncio.apply()

from typing import List, Optional
from pydantic import BaseModel, Field

### Define Schemas
We define schemas for the claim data, the retrieval queries, and the final recommendation. Similar to the patient workflow, we’ll have:

- `ClaimInfo`: Captures details from the claim document.
- `PolicyQueries`: Holds the query strings used to retrieve relevant policy sections.
- `PolicyRecommendation`: Represents the coverage determination produced from the retrieved policy text.
- `ClaimDecision`: Represents the final outcome after evaluating the claim against the policy.

#### ClaimInfo Schema

In [4]:
class ClaimInfo(BaseModel):
    """Extracted Insurance claim information."""
    claim_number: str
    policy_number: str
    claimant_name: str
    date_of_loss: str
    loss_description: str
    estimated_repair_cost: float
    vehicle_details: Optional[str] = None

#### PolicyQueries Schema

We also define a schema for the queries used to retrieve coverage guidelines from the policy index.

In [5]:
class PolicyQueries(BaseModel):
    queries: List[str] = Field(
        default_factory=list,
        description="A list of query strings to retrieve relevant policy sections."
    )

#### Guideline/Policy Recommendation Schema

We want to produce a structured recommendation about claim coverage.

In [6]:
class PolicyRecommendation(BaseModel):
    """Policy recommendation regarding a given claim."""
    policy_section: str = Field(..., description="The policy section or clause that applies.")
    recommendation_summary: str = Field(..., description="A concise summary of coverage determination.")
    deductible: Optional[float] = Field(None, description="The applicable deductible amount.")
    settlement_amount: Optional[float] = Field(None, description="Recommended settlement payout.")
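
The `deductible` and `settlement_amount` fields are related by simple arithmetic when a claim is covered: the recommendation prompt later in this tutorial describes the settlement as cost minus deductible. The sketch below only illustrates that arithmetic. The function name is hypothetical, the numbers are illustrative, and clamping the result at zero is an assumption rather than something the policy text states.

```python
def settlement_amount(repair_cost: float, deductible: float) -> float:
    """Return repair cost minus deductible, never below zero (assumed clamp)."""
    return max(repair_cost - deductible, 0.0)


# Illustrative numbers only: a 1500.00 repair estimate with a 500.00 deductible.
typical = settlement_amount(1500.0, 500.0)

# When the damage is smaller than the deductible, nothing is paid out.
below_deductible = settlement_amount(300.0, 500.0)

print(typical, below_deductible)
```

In the workflow itself the LLM proposes these values after reading the policy text, so a helper like this would at most serve as a sanity check on its output.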

#### Final Claim Decision Schema

In [7]:
class ClaimDecision(BaseModel):
    claim_number: str
    covered: bool
    deductible: float
    recommended_payout: float
    notes: Optional[str] = None

### Loading the Claim Document

In a real scenario, we’d have a PDF or text form with claim details. For this demonstration, we assume we have a JSON file containing claim data.

In [13]:
import json

def parse_claim(file_path: str) -> ClaimInfo:
    with open(file_path, "r") as f:
        data = json.load(f)
    # Validate and return
    return ClaimInfo.model_validate(data)

Example Claim Input (john.json):

In [14]:
claim_info = parse_claim("data/john.json")
claim_info.model_dump()

{'claim_number': 'CLAIM-0001',
 'policy_number': 'POLICY-ABC123',
 'claimant_name': 'John Smith',
 'date_of_loss': '2024-04-10',
 'loss_description': 'While delivering pizzas, collided with a parked car, causing damage to the parked car’s door.',
 'estimated_repair_cost': 1500.0,
 'vehicle_details': '2022 Honda Civic'}

The output above shows the parsed claim data.
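
If you do not have the `data` folder at hand, a claim file with the same shape can be reconstructed from the parsed output. The sketch below uses only the standard library: it writes the fields shown above to a temporary `john.json` and checks that every required `ClaimInfo` field is present. The helper names are hypothetical, and the flat JSON layout is an assumption inferred from the schema and the output.

```python
import json
import tempfile
from pathlib import Path

# Field values copied from the parsed output shown above.
sample_claim = {
    "claim_number": "CLAIM-0001",
    "policy_number": "POLICY-ABC123",
    "claimant_name": "John Smith",
    "date_of_loss": "2024-04-10",
    "loss_description": "While delivering pizzas, collided with a parked car, "
                        "causing damage to the parked car’s door.",
    "estimated_repair_cost": 1500.0,
    "vehicle_details": "2022 Honda Civic",
}

# ClaimInfo fields without a default value; vehicle_details is optional.
REQUIRED_FIELDS = (
    "claim_number",
    "policy_number",
    "claimant_name",
    "date_of_loss",
    "loss_description",
    "estimated_repair_cost",
)


def missing_fields(path: Path) -> list[str]:
    """Return the required fields that are absent from the JSON file."""
    data = json.loads(path.read_text(encoding="utf-8"))
    return [name for name in REQUIRED_FIELDS if name not in data]


with tempfile.TemporaryDirectory() as tmp:
    claim_path = Path(tmp) / "john.json"
    claim_path.write_text(json.dumps(sample_claim, indent=2), encoding="utf-8")
    loaded = json.loads(claim_path.read_text(encoding="utf-8"))
    missing = missing_fields(claim_path)

print(missing)
```

In the notebook, `ClaimInfo.model_validate` performs this check (and type validation) for you; the manual version is only meant to make the expected file shape explicit.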

### Indexing Policy Documents

We will be indexing a sample [California Personal Automobile Policy](https://nationalgeneral.com/forms_catalog/CAIP400_03012006_CA.pdf) which we will validate the claims against.

Make sure to download the document and upload it to [LlamaCloud](https://cloud.llamaindex.ai/). If you don't have access yet, you can use our open-source `VectorStoreIndex`.

In [70]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

index = LlamaCloudIndex(
    name="auto_insurance_policies_0",
    project_name="llamacloud_demo",
    # organization_id="...",
    # api_key="..."
)

retriever = index.as_retriever(rerank_top_n=3)

### Indexing Per-User Declarations Documents

Besides the general auto-insurance policy, we need a separate index to store the per-user declarations pages. These include specific details for each policyholder. At retrieval time, they must be filtered by policy number so that only the claimant's own declarations page is returned.

The declarations are stored in the `data` folder. In LlamaCloud, drag and drop the markdown files (not the JSON files) into a new LlamaCloud index. We will also attach the policy number as metadata.

In [85]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex
import os

declarations_index = LlamaCloudIndex(
    name="auto_insurance_declarations_0",
    project_name="llamacloud_demo",
    # organization_id="...",
    # api_key="..."
)

from llama_cloud.client import LlamaCloud

client = LlamaCloud(
    base_url="https://api.cloud.llamaindex.ai",
    token=os.environ["LLAMA_CLOUD_API_KEY"],
)

We use the API to attach the policy number to each document's metadata and upsert the documents back into the index.

In [86]:
# TODO: make this function not hidden
declarations_pipeline_id = declarations_index._get_pipeline_id()
declarations_project_id = declarations_index._get_project_id()

person_policy_map = {}
for p in ["alice", "john"]:
    claim_info = parse_claim(f"data/{p}.json")
    policy_num = claim_info.policy_number
    person_policy_map[f"{p}-declarations.md"] = policy_num

pipeline_docs = client.pipelines.list_pipeline_documents(declarations_pipeline_id)
for doc in pipeline_docs:
    doc.metadata["policy_number"] = person_policy_map[doc.metadata["file_name"]]
upserted_docs = client.pipelines.upsert_batch_pipeline_documents(declarations_pipeline_id, request=pipeline_docs)
upserted_docs[0].metadata

{'file_size': 1410,
 'last_modified_at': '2024-12-15T20:44:48',
 'file_path': 'john-declarations.md',
 'file_name': 'john-declarations.md',
 'external_file_id': 'john-declarations.md',
 'pipeline_id': 'daaa2430-6e34-46d9-9948-5ed80f677a9b',
 'policy_number': 'POLICY-ABC123'}
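
The loop above indexes `person_policy_map` directly by file name, which raises a `KeyError` if the index contains a file that is not in the map. A more defensive variant is sketched below. Plain dicts stand in for the pipeline documents (they are hypothetical stand-ins, not the LlamaCloud client objects), and `notes.md` is a made-up extra file.

```python
# File name -> policy number, as built from the claim JSON files.
person_policy_map = {
    "alice-declarations.md": "POLICY-XYZ789",
    "john-declarations.md": "POLICY-ABC123",
}

# Hypothetical stand-ins for the documents returned by the pipeline.
docs = [
    {"metadata": {"file_name": "john-declarations.md"}},
    {"metadata": {"file_name": "alice-declarations.md"}},
    {"metadata": {"file_name": "notes.md"}},  # not a declarations page
]


def tag_policy_numbers(docs, policy_map):
    """Attach policy_number where the file name is known; return skipped names."""
    skipped = []
    for doc in docs:
        file_name = doc["metadata"].get("file_name")
        policy_number = policy_map.get(file_name)
        if policy_number is None:
            # Unknown file: leave its metadata untouched and report it.
            skipped.append(file_name)
            continue
        doc["metadata"]["policy_number"] = policy_number
    return skipped


skipped = tag_policy_numbers(docs, person_policy_map)
print(skipped)
```

Reporting the skipped files instead of failing keeps the upsert usable when the index also holds unrelated documents.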

Check that the metadata has been set appropriately:

In [112]:
pipeline_docs = client.pipelines.list_pipeline_documents(declarations_pipeline_id)
len(pipeline_docs)
# inspect the first document
pipeline_docs[0].metadata

{'file_size': 1453,
 'last_modified_at': '2024-12-15T20:44:48',
 'file_path': 'alice-declarations.md',
 'file_name': 'alice-declarations.md',
 'external_file_id': 'alice-declarations.md',
 'pipeline_id': 'daaa2430-6e34-46d9-9948-5ed80f677a9b',
 'policy_number': 'POLICY-XYZ789'}

In [133]:
from llama_index.core.vector_stores.types import MetadataFilters

def get_declarations_docs(policy_number: str, top_k: int = 1):
    """Retrieve the declarations page nodes for a given policy number."""
    # build a retriever restricted to documents with a matching policy number
    filters = MetadataFilters.from_dicts([
        {"key": "policy_number", "value": policy_number}
    ])
    retriever = declarations_index.as_retriever(
        # TODO: do file-level retrieval
        # retrieval_mode="files_via_metadata", 
        rerank_top_n=top_k, 
        filters=filters
    )
    # semantic query matters less here
    return retriever.retrieve(f"declarations page for {policy_number}")
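
Conceptually, the metadata filter narrows the candidate set to documents whose `policy_number` matches before any semantic ranking happens, which is why the query text matters less here. The sketch below shows that idea with plain dicts. It assumes an equality comparison and is not a description of how LlamaCloud applies filters internally; the node contents are made up.

```python
def filter_by_metadata(nodes, required):
    """Keep nodes whose metadata equals every key/value pair in `required`."""
    return [
        node
        for node in nodes
        if all(node["metadata"].get(key) == value for key, value in required.items())
    ]


# Hypothetical nodes standing in for indexed declarations pages.
nodes = [
    {"text": "Declarations page for John Smith",
     "metadata": {"policy_number": "POLICY-ABC123"}},
    {"text": "Declarations page for Alice",
     "metadata": {"policy_number": "POLICY-XYZ789"}},
]

matches = filter_by_metadata(nodes, {"policy_number": "POLICY-ABC123"})
no_matches = filter_by_metadata(nodes, {"policy_number": "POLICY-UNKNOWN"})

print(len(matches), len(no_matches))
```

An unknown policy number yields an empty list, so code that takes the first result (as the workflow later does with `[0]`) may want to check for that case first.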

In [134]:
# try it out 
docs = get_declarations_docs("POLICY-ABC123")
print(len(docs))
print(docs[0].get_content(metadata_mode="all"))

1
file_size: 1410
last_modified_at: 2024-12-15T20:44:48
file_path: john-declarations.md
file_name: john-declarations.md
external_file_id: john-declarations.md
pipeline_id: daaa2430-6e34-46d9-9948-5ed80f677a9b
policy_number: POLICY-ABC123
document_1_page_label: 1

# CALIFORNIA PERSONAL AUTO POLICY DECLARATIONS PAGE
**Policy Number:** CAP-ABC123-01  
**Policy Period:** 01/01/2024 to 07/01/2024  
(12:01 A.M. standard time at the address below)

**Named Insured:**  
John Smith  
456 Delivery Lane  
San Francisco, CA 94112

**Vehicle Information:**  
Vehicle: 2022 Honda Civic LX Sedan  
VIN: 2HGFE2F54NH123456  
Principal Operator: John Smith  
Usage: Personal

**Coverages and Premiums:**

- Bodily Injury Liability: $100,000/$300,000 [$450]
- Property Damage Liability: $50,000 [$295]
- Medical Payments: $5,000 [$80]
- Uninsured/Underinsured Motorist: $100,000/$300,000 [$115]
- Collision Coverage: $500 deductible [$425]
- Other Than Collision: $250 deductible [$210]
- Rental Reimbursement: $3

### Prompts

#### Generating Policy Queries

We prompt the LLM to generate queries for retrieving relevant policy sections. For example:

- Coverage conditions for collision damage.
- Deductible conditions.
- Any special endorsements for rental coverage or waived deductible scenarios.

In [135]:
GENERATE_POLICY_QUERIES_PROMPT = """\
You are an assistant tasked with determining what insurance policy sections to consult for a given auto claim.

**Instructions:**
1. Review the claim data, including the type of loss (rear-end collision), estimated repair cost, and policy number.
2. Identify what aspects of the policy we need:
   - Collision coverage conditions
   - Deductible application
   - Any special endorsements related to rear-end collisions or no-fault scenarios
3. Produce 3-5 queries that can be used against a vector index of insurance policies to find relevant clauses.

Claim Data:
{claim_info}

Return a JSON object matching the PolicyQueries schema.
"""

#### Policy Recommendation Prompt
Once we have queries, we’ll run them against the policy index, retrieve the text, and feed it back to the LLM to produce a `PolicyRecommendation`.
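
Feeding the text back to the LLM amounts to filling named placeholders in a prompt template. The sketch below mimics that substitution with plain `str.format` on a shortened, hypothetical template. It assumes the prompt class behaves equivalently for a simple template like this one, and the retrieved section strings are placeholders rather than real policy text.

```python
import json

# Shortened, hypothetical stand-in for a prompt with two placeholders.
TEMPLATE = """\
Claim Info:
{claim_info}

Policy Text:
{policy_text}
"""

claim = {
    "claim_number": "CLAIM-0001",
    "policy_number": "POLICY-ABC123",
    "estimated_repair_cost": 1500.0,
}

# Placeholder strings standing in for retrieved policy sections.
retrieved_sections = [
    "PART D COVERAGE FOR DAMAGE TO YOUR AUTO (placeholder text)",
    "Collision Coverage: $500 deductible",
]

# Sections are joined with blank lines, then both placeholders are filled.
policy_text = "\n\n".join(retrieved_sections)
rendered = TEMPLATE.format(claim_info=json.dumps(claim), policy_text=policy_text)

print(rendered)
```

Braces inside the substituted values are safe, but literal braces in the template itself would need to be doubled (`{{` and `}}`) when using `str.format`.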

In [136]:
POLICY_RECOMMENDATION_PROMPT = """\
Given the retrieved policy sections for this claim, determine:
- If the collision is covered
- The applicable deductible
- Recommended settlement amount (e.g., cost minus deductible)
- Which policy section applies

Claim Info:
{claim_info}

Policy Text:
{policy_text}

Return a JSON object matching PolicyRecommendation schema.
"""

## Auto Insurance Claim Processing Workflow

This workflow takes an auto insurance claim, generates queries to retrieve relevant policy sections, evaluates coverage and deductibles, and produces a final claim decision with recommended payout.
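
Because several queries can return the same policy section, the retrieval step collects results in a dict keyed by node ID so that each section appears only once in the final prompt. A standalone sketch of that deduplication follows, with plain dicts as hypothetical stand-ins for retrieved nodes.

```python
def merge_unique(results_per_query):
    """Merge per-query result lists, keeping one node per ID in first-seen order."""
    combined = {}
    for results in results_per_query:
        for node in results:
            # A repeated ID overwrites the value but keeps its original position.
            combined[node["id"]] = node
    return list(combined.values())


# Hypothetical results for two queries that overlap on node "b".
query_1 = [{"id": "a", "text": "Collision coverage"}, {"id": "b", "text": "Deductibles"}]
query_2 = [{"id": "b", "text": "Deductibles"}, {"id": "c", "text": "Exclusions"}]

merged = merge_unique([query_1, query_2])
merged_ids = [node["id"] for node in merged]
policy_text = "\n\n".join(node["text"] for node in merged)

print(merged_ids)
```

Python dicts preserve insertion order, so the merged text keeps sections in the order they were first retrieved.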

In [148]:
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Context,
    Workflow,
    step
)
from llama_index.core.llms import LLM
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.llms.openai import OpenAI
from llama_index.core.retrievers import BaseRetriever

class ClaimInfoEvent(Event):
    claim_info: ClaimInfo

class PolicyQueryEvent(Event):
    queries: PolicyQueries

class PolicyMatchedEvent(Event):
    policy_text: str

class RecommendationEvent(Event):
    recommendation: PolicyRecommendation

class DecisionEvent(Event):
    decision: ClaimDecision

class LogEvent(Event):
    msg: str
    delta: bool = False


def parse_claim(file_path: str) -> ClaimInfo:
    """Load a claim JSON file and validate it against the ClaimInfo schema."""
    import json
    with open(file_path, "r") as f:
        data = json.load(f)
    return ClaimInfo.model_validate(data)

class AutoInsuranceWorkflow(Workflow):
    def __init__(
        self, 
        policy_retriever: BaseRetriever, 
        llm: LLM | None = None, 
        output_dir: str = "data_out", 
        **kwargs
    ) -> None:
        super().__init__(**kwargs)
        self.policy_retriever = policy_retriever
        self.llm = llm or OpenAI(model="gpt-4o")

    @step
    async def load_claim_info(self, ctx: Context, ev: StartEvent) -> ClaimInfoEvent:
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=">> Loading Claim Info"))
        claim_info = parse_claim(ev.claim_json_path)
        await ctx.set("claim_info", claim_info)
        return ClaimInfoEvent(claim_info=claim_info)

    @step
    async def generate_policy_queries(self, ctx: Context, ev: ClaimInfoEvent) -> PolicyQueryEvent:
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=">> Generating Policy Queries"))
        prompt = ChatPromptTemplate.from_messages([("user", GENERATE_POLICY_QUERIES_PROMPT)])
        queries = await self.llm.astructured_predict(
            PolicyQueries,
            prompt,
            claim_info=ev.claim_info.model_dump_json()
        )
        return PolicyQueryEvent(queries=queries)

    @step
    async def retrieve_policy_text(self, ctx: Context, ev: PolicyQueryEvent) -> PolicyMatchedEvent:
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=">> Retrieving policy sections"))

        claim_info = await ctx.get("claim_info")
        
        combined_docs = {}
        for query in ev.queries.queries:
            if self._verbose:
                ctx.write_event_to_stream(LogEvent(msg=f">> Query: {query}"))
            # fetch policy text
            docs = await self.policy_retriever.aretrieve(query)
            for d in docs:
                combined_docs[d.id_] = d

        # also fetch the declarations page for the policy holder
        d_doc = get_declarations_docs(claim_info.policy_number)[0]
        combined_docs[d_doc.id_] = d_doc
        
        policy_text = "\n\n".join([doc.get_content() for doc in combined_docs.values()])
        await ctx.set("policy_text", policy_text)
        return PolicyMatchedEvent(policy_text=policy_text)

    @step
    async def generate_recommendation(self, ctx: Context, ev: PolicyMatchedEvent) -> RecommendationEvent:
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=">> Generating Policy Recommendation"))
        claim_info = await ctx.get("claim_info")
        prompt = ChatPromptTemplate.from_messages([("user", POLICY_RECOMMENDATION_PROMPT)])
        recommendation = await self.llm.astructured_predict(
            PolicyRecommendation,
            prompt,
            claim_info=claim_info.model_dump_json(),
            policy_text=ev.policy_text
        )
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=f">> Recommendation: {recommendation.model_dump_json()}"))
        return RecommendationEvent(recommendation=recommendation)

    @step
    async def finalize_decision(self, ctx: Context, ev: RecommendationEvent) -> DecisionEvent:
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=">> Finalizing Decision"))
        claim_info = await ctx.get("claim_info")
        rec = ev.recommendation
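        summary = rec.recommendation_summary.lower()
        has_payout = rec.settlement_amount is not None and rec.settlement_amount > 0
        # "not covered" also contains "covered", so rule out the negated phrase first
        covered = "not covered" not in summary and ("covered" in summary or has_payout)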
        deductible = rec.deductible if rec.deductible is not None else 0.0
        recommended_payout = rec.settlement_amount if rec.settlement_amount else 0.0
        decision = ClaimDecision(
            claim_number=claim_info.claim_number,
            covered=covered,
            deductible=deductible,
            recommended_payout=recommended_payout,
            notes=rec.recommendation_summary
        )
        return DecisionEvent(decision=decision)

    @step
    async def output_result(self, ctx: Context, ev: DecisionEvent) -> StopEvent:
        if self._verbose:
            ctx.write_event_to_stream(LogEvent(msg=f">> Decision: {ev.decision.model_dump_json()}"))
        return StopEvent(result={"decision": ev.decision})
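
`finalize_decision` derives the `covered` flag from the recommendation summary with a keyword check. Phrases such as "not covered" also contain the word "covered", so a keyword heuristic has to rule out the negated phrase first. The sketch below isolates that logic in a hypothetical helper. It is still only a heuristic; adding an explicit boolean field to the recommendation schema would likely be more robust.

```python
from typing import Optional


def is_covered(summary: str, settlement_amount: Optional[float] = None) -> bool:
    """Heuristic coverage flag based on the summary text and proposed payout."""
    text = summary.lower()
    if "not covered" in text:
        # An explicit denial wins over any other signal.
        return False
    has_payout = settlement_amount is not None and settlement_amount > 0
    return "covered" in text or has_payout


denied = is_covered("The collision is not covered due to an exclusion.")
approved = is_covered("The collision is covered under Part D.")
payout_only = is_covered("Pay the claim after the deductible.", 1000.0)

# A naive substring test would misread the denial as coverage.
naive_denied = "covered" in "the collision is not covered due to an exclusion."

print(denied, approved, payout_only, naive_denied)
```

Other negations ("excluded", "no coverage") are not handled here, which is another reason to prefer a structured field over text matching.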

In [149]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o")
workflow = AutoInsuranceWorkflow(
    policy_retriever=retriever,
    llm=llm,
    verbose=True,
    timeout=None,  # disable the timeout so long-running steps can complete
)

#### Visualize the workflow

In [150]:
from llama_index.utils.workflow import draw_all_possible_flows

draw_all_possible_flows(AutoInsuranceWorkflow, filename="auto_insurance_workflow.html")

<class 'NoneType'>
<class '__main__.DecisionEvent'>
<class '__main__.PolicyQueryEvent'>
<class '__main__.RecommendationEvent'>
<class '__main__.ClaimInfoEvent'>
<class 'llama_index.core.workflow.events.StopEvent'>
<class '__main__.PolicyMatchedEvent'>
auto_insurance_workflow.html


## Run the Workflow

Let's run the full workflow and generate the output! 
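
The streaming helper in the next cell prints each `LogEvent` as it arrives: events with `delta=True` continue the current line, while all others end with a newline. The sketch below reproduces that behaviour with the standard library only. The `LogEvent` dataclass and `fake_event_stream` generator are hypothetical stand-ins for the workflow's event class and handler stream.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class LogEvent:
    """Hypothetical stand-in for the workflow's LogEvent."""
    msg: str
    delta: bool = False


async def fake_event_stream():
    """Yield a few events the way a handler's event stream might."""
    yield LogEvent(msg=">> Loading Claim Info")
    yield LogEvent(msg="partial ", delta=True)
    yield LogEvent(msg="output", delta=True)
    yield LogEvent(msg=">> Done")


async def render(stream) -> str:
    """Collect the text that printing each event would produce."""
    parts = []
    async for event in stream:
        # Delta events continue the current line; others end with a newline.
        parts.append(event.msg if event.delta else event.msg + "\n")
    return "".join(parts)


rendered = asyncio.run(render(fake_event_stream()))
print(rendered, end="")
```

In a notebook, top-level `await` works directly; in a plain script, the coroutine has to be driven with `asyncio.run` as shown here.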

In [151]:
from IPython.display import clear_output

async def stream_workflow(workflow, **workflow_kwargs):
    handler = workflow.run(**workflow_kwargs)
    async for event in handler.stream_events():
        if isinstance(event, LogEvent):
            if event.delta:
                print(event.msg, end="")
            else:
                print(event.msg)

    return await handler


response_dict = await stream_workflow(workflow, claim_json_path="data/john.json")
print(str(response_dict["decision"]))

Running step load_claim_info
Step load_claim_info produced event ClaimInfoEvent
>> Loading Claim Info
Running step generate_policy_queries
>> Generating Policy Queries
Step generate_policy_queries produced event PolicyQueryEvent
Running step retrieve_policy_text
>> Retrieving policy sections
>> Query: Collision coverage conditions for POLICY-ABC123
>> Query: Deductible application for rear-end collision under POLICY-ABC123
>> Query: Special endorsements for rear-end collisions in POLICY-ABC123
>> Query: No-fault coverage details in POLICY-ABC123
>> Query: Repair cost coverage limits for POLICY-ABC123
Step retrieve_policy_text produced event PolicyMatchedEvent
Running step generate_recommendation
>> Generating Policy Recommendation
Step generate_recommendation produced event RecommendationEvent
>> Recommendation: {"policy_section":"PART D COVERAGE FOR DAMAGE TO YOUR AUTO - INSURING AGREEMENT - COLLISION","recommendation_summary":"The collision is not covered due to the exclusion for car

In [152]:
response_dict = await stream_workflow(workflow, claim_json_path="data/alice.json")
print(str(response_dict["decision"]))

Running step load_claim_info
Step load_claim_info produced event ClaimInfoEvent
>> Loading Claim Info
Running step generate_policy_queries
>> Generating Policy Queries
Step generate_policy_queries produced event PolicyQueryEvent
Running step retrieve_policy_text
>> Retrieving policy sections
>> Query: Collision coverage conditions for rear-end collisions
>> Query: Deductible application for rear-end collision claims
>> Query: Special endorsements related to rear-end collisions
>> Query: No-fault scenario coverage for rear-end collisions
>> Query: Policy clauses for repair cost coverage in rear-end collisions
Step retrieve_policy_text produced event PolicyMatchedEvent
Running step generate_recommendation
>> Generating Policy Recommendation
Step generate_recommendation produced event RecommendationEvent
>> Recommendation: {"policy_section":"PART D COVERAGE FOR DAMAGE TO YOUR AUTO - INSURING AGREEMENT - COLLISION","recommendation_summary":"The collision is covered under Part D - Coverage 