# LlamaParse - Fast Checking of Insurance Contracts for Coverage

<a href="https://colab.research.google.com/github/run-llama/llama_parse/blob/main/examples/demo_insurance.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In this notebook we will look at how LlamaParse can be used to extract structured coverage information from an insurance policy.

## Installation of required packages

In [1]:
!pip install llama-index llama-parse



## Download an insurance policy from IRDAI

The Insurance Regulatory and Development Authority of India (IRDAI) maintains a useful resource, https://policyholder.gov.in/web/guest/non-life-insurance-products, where all insurance policies available in India can be downloaded publicly. Let's download a complex health insurance policy as an example.

In [2]:
!wget "https://policyholder.gov.in/documents/37343/931203/NBHTGBP22011V012223.pdf/c392bcc1-f6a8-cadd-ab84-495b3273d2c3?version=1.0&t=1669350459879&download=true" -O "./policy.pdf"

--2024-03-19 14:30:27--  https://policyholder.gov.in/documents/37343/931203/NBHTGBP22011V012223.pdf/c392bcc1-f6a8-cadd-ab84-495b3273d2c3?version=1.0&t=1669350459879&download=true
Resolving policyholder.gov.in (policyholder.gov.in)... 13.107.246.61, 13.107.213.61
Connecting to policyholder.gov.in (policyholder.gov.in)|13.107.246.61|:443... connected.
HTTP request sent, awaiting response... 200 
Length: 1341586 (1.3M) [application/pdf]
Saving to: ‘./policy.pdf’


2024-03-19 14:30:29 (1.37 MB/s) - ‘./policy.pdf’ saved [1341586/1341586]
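
Before sending the file to a parser, it can be worth confirming that the download really is a PDF and not an HTML error page. The following is a minimal sketch, not part of the original notebook; the helper name `looks_like_pdf` is hypothetical and only checks the `%PDF-` signature at the start of the file.

```python
from pathlib import Path


def looks_like_pdf(path: str) -> bool:
    """Return True if the file exists and starts with the PDF signature bytes."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as f:
        # Every PDF file begins with the ASCII marker "%PDF-".
        return f.read(5) == b"%PDF-"


# Hypothetical usage against the file downloaded above.
print(looks_like_pdf("./policy.pdf"))
```

This only inspects the first five bytes, so it catches a wrong content type but not a truncated or corrupted download.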



## Initializing LlamaIndex and LlamaParse

In [3]:
# llama-parse is async-first; running its sync API in a notebook requires nest_asyncio
import nest_asyncio
nest_asyncio.apply()

In [5]:
import os
os.environ["LLAMA_CLOUD_API_KEY"] = "llx-..."
os.environ["OPENAI_API_KEY"] = "sk-..."
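
Since both services need a key, a quick check that the variables are actually set can save a failed parsing job later. This is a small sketch under the assumption that the keys are provided through the two environment variables named above; the helper `missing_env_vars` is hypothetical and not part of LlamaParse or LlamaIndex.

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


# The two variable names come from the cell above.
REQUIRED = ["LLAMA_CLOUD_API_KEY", "OPENAI_API_KEY"]

missing = missing_env_vars(REQUIRED)
if missing:
    print("Missing API keys:", ", ".join(missing))
```

Passing `environ` explicitly makes the helper easy to exercise without touching the real process environment.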

In [6]:
from llama_index.llms.openai import OpenAI
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.core import VectorStoreIndex
from llama_index.core import Settings

# For this example, use the small embedding model and GPT-3.5.
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
llm = OpenAI(model="gpt-3.5-turbo-0125")

Settings.llm = llm
# Register the embedding model too; otherwise embed_model above is never used.
Settings.embed_model = embed_model

## Vanilla Approach - Parse the Policy with LlamaParse into Markdown

In [7]:
from llama_parse import LlamaParse

documents = LlamaParse(result_type="markdown").load_data("./policy.pdf")

Started parsing the file under job_id f6ef66a3-a085-4fa5-8300-479adcdea779


In [8]:
print(documents[0].text[0:1000])

## Preamble

This ‘Travel Infinity’ Policy is a contract of insurance between You and Us which is subject to payment of full premium in advance and the terms, conditions and exclusions of this Policy. Expense incurred outside the policy period will NOT be covered. Unutilized Sum Insured will expire at the end of the policy year. All applicable benefits, details and limits are mentioned in your Certificate of insurance. We will cover only allopathic treatments in this policy.

## Defined Terms

The terms listed below in this Section and used elsewhere in the Policy in Initial Capitals shall have the meaning set out against them in this Section.

### Standard Definitions

|2.1|Accident or Accidental|means sudden, unforeseen and involuntary event caused by external, visible and violent means.|
|---|---|---|
|2.2|Co-payment|means a cost sharing requirement under a health insurance policy that provides that the policyholder/insured will bear a specified percentage of the admissible claims a

### Markdown Element Node Parser
The `MarkdownElementNodeParser` from LlamaIndex works well for parsing the Markdown output of LlamaParse into a set of table and text nodes.

In [9]:
from llama_index.core.node_parser import MarkdownElementNodeParser

node_parser = MarkdownElementNodeParser(llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8)

In [10]:
nodes = node_parser.get_nodes_from_documents(documents)



Embeddings have been explicitly disabled. Using MockEmbedding.


111it [00:00, 86184.33it/s]
100%|██████████| 111/111 [00:35<00:00,  3.15it/s]


In [11]:
base_nodes, objects = node_parser.get_nodes_and_objects(nodes)

recursive_index = VectorStoreIndex(nodes=base_nodes+objects)

In [12]:
query_engine = recursive_index.as_query_engine(similarity_top_k=25)

### Querying the model for coverage

In [9]:
query_1 = "My trip was delay and I paid 45, how much am I cover for?"

response_1 = query_engine.query(query_1)
print(str(response_1))

You are covered for the expenses incurred on any alternate travel booking under any mode of transport, up to the limit of the Sum Insured as mentioned in the Certificate of insurance, if the delay of the airlines was caused due to specific reasons outlined in the policy. The amount you are covered for will depend on the specific terms and conditions of your policy, including the maximum coverage limit specified in the Certificate of insurance.


The relevant information is split across the document, which leads to retrieval issues. Let's try some parsing instructions to improve the result.

In [22]:
documents_with_instruction = LlamaParse(result_type="markdown", parsing_instruction="""
This document is an insurance policy.
When a benefit/coverage/exclusion is described in the document, append to it a text in the following benefits string format (where coverage could be an exclusion).

Benefits for {nameofrisk} is {benefitsDescription}, with amount: {benefitsAmount}, and conditions: {benefitsCondition}.

If the document contains a benefits TABLE that describes coverage amounts, do not output it as a table, but instead as a list of benefits strings.

""").load_data("./policy.pdf")

Started parsing the file under job_id 5b110ac2-e403-4eb3-a92b-97438d97199a
.....
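
To make the benefits string format in the parsing instruction concrete, here is a minimal sketch that renders the template and parses a conforming line back into its four fields. The helper names and the sample values are hypothetical and are not taken from the policy; the parser output is free text, so real lines may not match this pattern exactly.

```python
import re

# The template text is the one given in the parsing instruction.
TEMPLATE = (
    "Benefits for {nameofrisk} is {benefitsDescription}, "
    "with amount: {benefitsAmount}, and conditions: {benefitsCondition}."
)

# A regular expression mirroring the template, one named group per placeholder.
PATTERN = re.compile(
    r"Benefits for (?P<nameofrisk>.+?) is (?P<benefitsDescription>.+?), "
    r"with amount: (?P<benefitsAmount>.+?), "
    r"and conditions: (?P<benefitsCondition>.+)\.$"
)


def format_benefit(nameofrisk, benefitsDescription, benefitsAmount, benefitsCondition):
    """Render one benefits string from its four fields."""
    return TEMPLATE.format(
        nameofrisk=nameofrisk,
        benefitsDescription=benefitsDescription,
        benefitsAmount=benefitsAmount,
        benefitsCondition=benefitsCondition,
    )


def parse_benefit(line):
    """Return the four fields as a dict, or None if the line does not match."""
    match = PATTERN.match(line.strip())
    return match.groupdict() if match else None


# Hypothetical values, for illustration only.
example = format_benefit(
    "Trip Delay",
    "a fixed payout per block of delay",
    "as stated in the Certificate of insurance",
    "the delay exceeds the deductible hours",
)
print(example)
print(parse_benefit(example))
```

Keeping every benefit in one self-contained sentence is what lets a single retrieved chunk carry the risk name, amount, and conditions together.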

Let's see how the two parsing results compare (change `target_page` to explore other pages).

In [23]:
target_page = 51
pages_vanilla = documents[0].text.split("\n---\n")
pages_with_instructions = documents_with_instruction[0].text.split("\n---\n")

print(pages_vanilla[target_page])
print("\n\n=========================================================\n\n")
print(pages_with_instructions[target_page])

## Repatriation of Mortal remains

|Indemnity|USD|
|---|---|
|5K, 10K, 15K, 20K, 25K, 50K, 75K, 100K, 150K, 200K 250K|Nil/50/100|

## Repatriation of Mortal remains

|Indemnity|USD|
|---|---|
|3Lac, 5Lac|Independent SI|

## Total Loss of Checked-in Baggage

|Benefit|USD|
|---|---|
|100 to 1000 in multiples of 100|NIL|

## Delay of Checked-in Baggage

|Benefit|USD|
|---|---|
|100 to 500 in multiples of 100|1/2/3/5/6/24 hours|
|25 to 100 in multiple of 25|3/6/8/12 hours|

## Trip Delay

|Benefit|USD|
|---|---|
|Post that in multiple of 50 up to 500|hours|
|500, 750, 1000. Post that in|NIL/100/25|

## Trip Cancellation

|Indemnity|USD|
|---|---|
|multiple of 1000 till 10000|0/500/100|

|Benefit|USD|
|---|---|
|multiple of 500 till 5000|0/500/100|

## Trip Interruption

|Indemnity|USD|
|---|---|
|multiple of 500 till 10000|0/500/100|

|Benefit|USD|
|---|---|
|multiple of 500 till 5000|0/500/100|

## Loss of Passport

|Indemnity|USD|
|---|---|
|200 to 500 in multiple of 100|Nil|

|Benefit|U
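
The comparison cell indexes both page lists directly, which raises an `IndexError` if `target_page` is out of range or if the two parses yield different page counts. The sketch below shows one way to guard against that; it assumes pages are separated by `"\n---\n"` as in the cell above, and the helper names are hypothetical.

```python
PAGE_SEPARATOR = "\n---\n"  # same separator as used in the comparison cell


def split_pages(markdown_text, separator=PAGE_SEPARATOR):
    """Split parser output into a list of per-page strings."""
    return markdown_text.split(separator)


def page_or_placeholder(pages, index):
    """Return the requested page, or a short note if the index is out of range."""
    if 0 <= index < len(pages):
        return pages[index]
    return f"<no page {index}: document has {len(pages)} pages>"


def print_side_by_side(text_a, text_b, index):
    """Print the same page from two parses, separated by a divider line."""
    print(page_or_placeholder(split_pages(text_a), index))
    print("\n" + "=" * 57 + "\n")
    print(page_or_placeholder(split_pages(text_b), index))


# Tiny stand-in strings; in the notebook these would be documents[0].text
# and documents_with_instruction[0].text.
print_side_by_side("page 0\n---\npage 1", "only page 0", 1)
```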

In [15]:
node_parser_instruction = MarkdownElementNodeParser(llm=OpenAI(model="gpt-3.5-turbo-0125"), num_workers=8)
# Use the new parser instance for the instruction-augmented documents.
nodes_instruction = node_parser_instruction.get_nodes_from_documents(documents_with_instruction)
base_nodes_instruction, objects_instruction = node_parser_instruction.get_nodes_and_objects(nodes_instruction)

recursive_index_instruction = VectorStoreIndex(nodes=base_nodes_instruction+objects_instruction)
query_engine_instruction = recursive_index_instruction.as_query_engine(similarity_top_k=25)

Embeddings have been explicitly disabled. Using MockEmbedding.


1it [00:00, 25575.02it/s]
100%|██████████| 1/1 [00:02<00:00,  2.13s/it]


## Comparing Instruction-Augmented Parsing vs. Vanilla Parsing

When we parse the document with natural language instructions to add context on insurance coverage, we are able to correctly answer a wide range of queries in our RAG pipeline. In contrast, a RAG pipeline built with the vanilla method is not able to answer these queries.

In [16]:
query_1 = "My trip was delayed and I paid 45, how much am I covered for?"

response_1 = query_engine.query(query_1)
print("Vanilla:")
print(response_1)

print("With instructions:")
response_1_i = query_engine_instruction.query(query_1)
print(response_1_i)


Vanilla:
You are covered for the expenses you incurred due to the delay of your trip. The amount you paid, which was 45, will be covered up to the limit specified in the certificate of insurance for every block of hours of delay, as mentioned in the policy.
With instructions:
For Trip Delay coverage, the payment amount for every block of hours of delay is as mentioned in the certificate of insurance.
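
The same query-then-print pattern repeats for every comparison, so it can be factored into a small helper. This is a sketch, not part of the original notebook: `compare_answers` is a hypothetical name, and it only assumes each engine exposes a `.query(text)` method whose result can be converted with `str()`, as the query engines above do. `EchoEngine` is a stand-in so the example runs without any API calls.

```python
def compare_answers(query, engines):
    """Run one query against several named engines and collect the answers.

    `engines` maps a display name to any object with a .query(str) method.
    """
    return {name: str(engine.query(query)) for name, engine in engines.items()}


class EchoEngine:
    """Hypothetical stand-in for a query engine, used only for illustration."""

    def __init__(self, prefix):
        self.prefix = prefix

    def query(self, text):
        return f"{self.prefix}: {text}"


answers = compare_answers(
    "Is baby food covered?",
    {"Vanilla": EchoEngine("vanilla"), "With instructions": EchoEngine("instructed")},
)
for name, answer in answers.items():
    print(f"{name}:\n{answer}\n")
```

In the notebook, the dictionary would hold `query_engine` and `query_engine_instruction` in place of the stand-ins.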


According to List I of the policy, baby food is one of the expenses that is not covered.

In [16]:
query_2 = "I just had a baby, is baby food covered?"

response_2 = query_engine.query(query_2)
print("Vanilla:")
print(response_2)

print("With instructions:")
response_2_i = query_engine_instruction.query(query_2)
print(response_2_i)

Vanilla:
Baby food is not explicitly mentioned in the provided context information regarding insurance coverages and benefits.
With instructions:
Baby food is excluded from coverage according to the policy terms.


In [18]:
query_3 = "How is gauze used in my operation covered?"

response_3 = query_engine.query(query_3)
print("Vanilla:")
print(response_3)

print("With instructions:")
response_3_i = query_engine_instruction.query(query_3)
print(response_3_i)

Vanilla:
Gauze used in your operation would typically be covered under the "Emergency In-patient Medical Treatment" or "Emergency In-patient Medical Treatment with OPD" benefits of the policy.
With instructions:
Gauze is not covered for use in your operation as it falls under the category of items that are excluded from coverage in the insurance policy.
