### 1. Load Libraries

In [1]:
import os 
from pprint import pprint

import pymupdf4llm
from langchain_community.document_loaders import DirectoryLoader
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.text_splitter import TokenTextSplitter
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.llms import LlamaCpp
from langchain.vectorstores import Chroma
from IPython.display import Markdown, display


### 2. Get all the files in the directory

In [2]:
data_folder = r'/Users/user/Documents/personal_projects/rag_insurance/data'

pdf_files = []
for file in os.listdir(data_folder):
    if file.endswith(".pdf"):
        pdf_files.append(file)
pprint(pdf_files)
print('There are {} pdf files in the folder'.format(len(pdf_files)))

['gels-pdt-gpa-brochure.pdf']
There are 1 pdf files in the folder


### 3. Convert all pdf into markdown

In [3]:
processed_data_dir = '/Users/user/Documents/personal_projects/rag_insurance/data/processed_data'
for files in pdf_files:
    pdf_file = os.path.join(data_folder, files)
    split_file_name = files.split('.')
    split_file_name[-1] = 'md'
    markdown_file_name = '.'.join(split_file_name)
    pdf_document = pymupdf4llm.to_markdown(pdf_file)
    save_path = os.path.join(processed_data_dir, markdown_file_name)
    with open(save_path, 'w') as file:
        file.write(pdf_document)

    print("Markdown file has been saved as 'example.md'")
    print('Closed the pdf file')

Processing /Users/user/Documents/personal_projects/rag_insurance/data/gels-pdt-gpa-brochure.pdf...
Markdown file has been saved as 'example.md'
Closed the pdf file


### 4. Load the files as langchain document 

In [4]:
loader = DirectoryLoader(processed_data_dir, glob="**/*.md")
docs = loader.load()
len(docs)

libmagic is unavailable but assists in filetype detection. Please consider installing libmagic for better results.


1

In [5]:
pprint(f"{docs[0].metadata}\n")
print('-------------------------')
pprint(docs[0].page_content)

("{'source': "
 "'/Users/user/Documents/personal_projects/rag_insurance/data/processed_data/gels-pdt-gpa-brochure.md'}\n")
-------------------------
('GREAT Protector Active\n'
 '\n'
 'All the protection you need to enjoy a more active lifestyle\n'
 '\n'
 'Empowering you to lead the active life you love with high protection against '
 'the unexpected\n'
 '\n'
 'While the YOLO lifestyle is all about embracing spontaneity to experience '
 'all things exciting, it could also bring about a few misadventures. As such, '
 'it’s wise to give yourself a financial safety net with high personal '
 'accident protection.\n'
 '\n'
 'GREAT Protector Active offers up to S$3 million in personal accident '
 'coverage and provides the assurance you need on your adventures including '
 'increased-risk activities[1] like scuba diving and rock-climbing, and even '
 'during your daily commute. Furthermore, your coverage gets boosted up to 1.5 '
 'times when you are injured from most traffic-related accident

### 5. Split the text 
- Here i am splitting the text with chunk_size of 200 and chunk_overlap of 10

In [6]:
# Create embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
# Split documents into chunks
text_splitter = TokenTextSplitter(chunk_size=1000, chunk_overlap=300)

chunks_gels = text_splitter.split_documents(docs)

  from .autonotebook import tqdm as notebook_tqdm


In [7]:
for chunk in chunks_gels:
    print(chunk)
    print('-------------------------')

page_content='GREAT Protector Active

All the protection you need to enjoy a more active lifestyle

Empowering you to lead the active life you love with high protection against the unexpected

While the YOLO lifestyle is all about embracing spontaneity to experience all things exciting, it could also bring about a few misadventures. As such, it’s wise to give yourself a financial safety net with high personal accident protection.

GREAT Protector Active offers up to S$3 million in personal accident coverage and provides the assurance you need on your adventures including increased-risk activities[1] like scuba diving and rock-climbing, and even during your daily commute. Furthermore, your coverage gets boosted up to 1.5 times when you are injured from most traffic-related accidents[2] in Singapore and for any accidents while overseas.

Sign up online with affordable premium from S$0.89* a day and enjoy the active life you deserve.

Why GREAT Protector Active

Get up to S$3 million in c

load embedding

### Load embedding model from huggingface 
- To embed all the chunks 
- Here i am using 'all-MiniLM-L6-v2'

In [8]:
# Create vector store
vectorstore_2 = Chroma.from_documents(documents=chunks_gels, embedding=embeddings)

In [9]:
retriever_2 = vectorstore_2.as_retriever(
    search_type="mmr", search_kwargs={"k": 10, "fetch_k": 15}
)
result = retriever_2.invoke("Terminal Illness")

for content in result :
    print(content)
    print('-------------------------')

Number of requested results 15 is greater than number of elements in index 3, updating n_results = 3


page_content=' Benefit Booster4 4,500 6,000 7,500 For claim under Complementary Medicine Practitioner and/or Allied Health Professional Sub-limit6 per accident 1,000 For claim under Sickness5 500

3 Excludes Accidental Major Permanent Disablement & Accidental Other Permanent Disablement from food poisoning.

5 Refers to any confirmed diagnosis of (i) Injury and/ or condition due to a bite, sting, attack or such similar incident by an insect or animal (ii) food poisoning or (iii) Hand Foot and Mouth Disease

All ages specified refer to age next birthday.

This advertisement has not been reviewed by the Monetary Authority of Singapore.

Singapore Citizens, Permanent Residents (PRs) and holders of Employment Pass who are aged between 17 and 65, may purchase any plan (Basic, Classic, Elite) under this policy. Juveniles (aged between 1 to 16) and holders of S Pass, Dependant’s Pass or Student’s Pass may purchase only the Basic Plan under this policy.

The above is for general information on

In [10]:
# Helper function for printing docs
def pretty_print_docs(docs):
    print(
        f"\n{'-' * 100}\n".join(
            [f"Document {i+1}:\n\n" + d.page_content for i, d in enumerate(docs)]
        )
    )

In [11]:
question = "Accidental Death amount payable"

In [13]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import CrossEncoderReranker
from langchain_community.cross_encoders import HuggingFaceCrossEncoder

model = HuggingFaceCrossEncoder(model_name="BAAI/bge-reranker-base")
compressor = CrossEncoderReranker(model=model, top_n=5)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever_2
)

compressed_docs = compression_retriever.invoke(question)
pretty_print_docs(compressed_docs)

Number of requested results 15 is greater than number of elements in index 3, updating n_results = 3


Document 1:

$74.32/month[#]

S$865.89/year[#]

Here’s how GREAT Protector Active supports your active lifestyle

Danielle, a graduate school student who loves outdoor activities and travelling, signs up for GREAT Protector Active (Elite) with a Sum Assured of S$1,000,000 to safeguard against the unexpected. Her annual premium is S$865.89[#].

^ Daily rates are based on the annual premium of GREAT Protector Active Basic, Classic or Elite, whichever applicable, divided by 365 days and rounded off to the nearest cent.

P i t i l d th ili t f GST Th ili t f GST i bj t t h Th i t f thi

Col1 Claim event(s) Disability payout Amount payable (S$) Col5 Col6 Basic Classic Elite A Accidental Death (includes food poisoning) Sum Assured 200,000 500,000 1,000,000 With Benefit Booster4 300,000 750,000 1,500,000 B Accidental Major Permanent Disablement3 (I) Total & Permanent Disability, Loss of Both Arms/Legs, One Arm & One Leg, Sight in Both Eyes, One Arm/Leg & Sight in One Eye, Complete spinal trac

LLM

In [19]:
question = "What is this plan about?"

In [20]:
# prompt = f"""
# <|im_start|>system
# You are an agent for a life insurance company.<|im_end|>
# <|im_start|>user
# A customer has asked you this question{question}.
# Answer using this document{compressed_docs}

# Show the reference document as well<|im_end|>
# <|im_start|>assistant
# """

prompt = f"""
You are an agent for a life insurance company.<|im_end|>
{question}.
Answer using this document{compressed_docs}

"""

In [15]:
n_gpu_layers = -1  # The number of layers to put on the GPU. The rest will be on the CPU. If you don't know how many layers there are, you can use -1 to move all to GPU.
n_batch = 512 # Should be between 1 and n_ctx, consider the amount of RAM of your Apple Silicon Chip.
# Make sure the model path is correct for your system!
llm = LlamaCpp(
    model_path="/Users/user/Documents/personal_projects/rag_insurance/model/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf",
    n_gpu_layers=n_gpu_layers,
    n_batch=n_batch,
    f16_kv=True,  # MUST set to True, otherwise you will run into problem after a couple of calls
    verbose=True,  # Verbose is required to pass to the callback manager
    n_ctx=15000,
    temperature=0.0,
    seed=42,
    max_tokens=8000, # This max token works
    max_token_length=8000, # This doesnt
    
)

                max_token_length was transferred to model_kwargs.
                Please confirm that max_token_length is what you intended.
  if await self.run_code(code, result, async_=asy):
llama_model_load_from_file_impl: using device Metal (Apple M1) - 8643 MiB free
llama_model_loader: loaded meta data with 32 key-value pairs and 292 tensors from /Users/user/Documents/personal_projects/rag_insurance/model/DeepSeek-R1-Distill-Llama-8B-Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = DeepSeek R1 Distill Llama 8B
llama_model_loader: - kv   3:                           general.basename str              = DeepSeek-R1

In [21]:
answer = llm.invoke(prompt)
display(Markdown(answer))


Llama.generate: 17 prefix-match hit, remaining 2583 prompt tokens to eval
llama_perf_context_print:        load time =   36263.49 ms
llama_perf_context_print: prompt eval time =   29032.95 ms /  2583 tokens (   11.24 ms per token,    88.97 tokens per second)
llama_perf_context_print:        eval time =  109387.46 ms /   923 runs   (  118.51 ms per token,     8.44 tokens per second)
llama_perf_context_print:       total time =  141695.95 ms /  3506 tokens


Okay, so I need to figure out what this insurance plan is about. Let me start by reading through the provided document content.

The first thing I notice is that the document mentions "GREAT Protector Active" as the name of the plan. That gives me a clear starting point.

Next, the document talks about the protection offered by this plan. It mentions high personal accident coverage up to S$3 million. This includes protection against Accidental Major Permanent Disablement and Other Permanent Disablements, which are covered under different sections with varying payout amounts.

The document also specifies that the coverage gets boosted up to 1.5 times with a Benefit Booster. This applies in cases of injuries from most traffic-related accidents in Singapore or while overseas.

Additionally, the plan offers high medical expenses reimbursement, covering outpatient and hospitalisation expenses up to S$7,500 per accident. The insured can choose their preferred medical treatment and care by a Complementary Medicine Practitioner or Allied Health Professional after an accident.

The document also mentions that this insurance plan is suitable for Singapore Citizens, Permanent Residents (PRs), and holders of Employment Pass who are aged between 17 and 65. Juveniles (aged between 1 to 16) and holders of certain passes may purchase only the Basic Plan under this policy.

It's also noted that replacing an existing accident and health plan with a new one, such as this GREAT Protector Active plan, is usually not advisable. A penalty may be imposed for early plan termination, and the new plan might cost more or have fewer benefits at the same cost.

Finally, the document mentions that the information provided is correct as of 1 January 2024 and that in case of discrepancy between the English and Chinese versions, the English version shall prevail.

So, putting all this together, the insurance plan is called GREAT Protector Active. It offers high personal accident coverage up to S$3 million, including protection against various types of disabilities and medical expenses. The plan also includes a benefit booster that increases payout amounts in certain cases. Additionally, it provides reimbursement for medical expenses after accidents, allowing the insured to choose their preferred treatment and care providers.

The plan is targeted towards individuals who are Singapore Citizens, Permanent Residents, or holders of an Employment Pass, aged between 17 and 65. It's important to note that this plan may not be suitable for everyone, and replacing an existing policy without consulting a qualified financial advisor could lead to penalties or less favorable terms.

Overall, the GREAT Protector Active plan seems to provide comprehensive personal accident coverage with various benefits aimed at safeguarding against unexpected events while allowing the insured to maintain an active lifestyle.
</think>

**GREAT Protector Active Insurance Plan Overview**

**Plan Name:**  
GREAT Protector Active

**Target Audience:**  
- Singapore Citizens
- Permanent Residents (PRs)
- Holders of Employment Pass, aged between 17 and 65.

**Key Features:**

1. **High Personal Accident Coverage:**
   - Up to S$3 million in coverage.
   - Includes protection against:
     - Accidental Major Permanent Disablement (e.g., loss of limbs, sight).
     - Other Permanent Disablements (e.g., hearing loss, thumb and four fingers in one hand).

2. **Benefit Booster:**
   - Enhances payout amounts by up to 1.5 times.
   - Applies to injuries from most traffic-related accidents in Singapore or while overseas.

3. **Medical Expenses Reimbursement:**
   - Covers outpatient and hospitalisation expenses up to S$7,500 per accident.
   - Allows the insured to choose their preferred medical treatment and care providers, such as Complementary Medicine Practitioners or Allied Health Professionals.

4. **Policyholder Considerations:**
   - **Replacement of Existing Policies:** It is generally not advisable to replace an existing accident and health plan without consulting a qualified financial advisor. This could result in penalties for early termination and may lead to less favorable terms at the same cost.
   - **Target Audience Suitability:** The plan is specifically designed for individuals within certain age brackets and employment statuses. It's crucial to evaluate personal circumstances and consult with a financial advisor before deciding on this or any other insurance plan.

**Conclusion:**

The GREAT Protector Active plan offers comprehensive personal accident coverage with enhanced benefits, including medical expenses reimbursement and a benefit booster feature. This plan is tailored for individuals who lead active lifestyles and require robust protection against potential accidents. However, it's essential to carefully consider personal circumstances and consult with a qualified financial advisor before purchasing this or any other insurance plan.