# Build Mr.HelpMate AI


### Objective

Develop a reliable generative search system that can efficiently and precisely respond to queries based on a provided policy document (a single long life insurance policy document).

### Design

The solution broadly contains three layers:

**The Embedding Layer**: The PDF document needs to be effectively processed, cleaned, and chunked for the embeddings. Because a PDF consists of large blocks of text, you need to split it into chunks before generating the embeddings. Here, the choice of chunking strategy will have a large impact on the final quality of the retrieved results, so make sure that you try out various strategies and compare their performance.

Another important aspect in the embedding layer is the choice of the embedding model. You can choose to embed your chunks using the OpenAI embedding model or any model from the SentenceTransformers library on HuggingFace.
 
**The Semantic Search Layer**: Here, you first need to design at least 3 queries against which you will test your system. You need to understand and skim through the document, and accordingly come up with some queries, the answers to which can be found in the policy document.

Next, you need to embed the queries and search your ChromaDB vector database against each of these queries. Implementing a cache mechanism is also mandatory.

Finally, you need to implement the re-ranking block, and for this you can choose from a range of cross-encoding models on HuggingFace.
 
**The Generation Layer**: In the generation layer, the final prompt that you design is the major component. Make sure that the prompt is exhaustive in its instructions, and the relevant information is correctly passed to the prompt. You may also choose to provide some few-shot examples in an attempt to improve the LLM output.




## 1. Importing the Required Libraries

In [452]:
# Import all the required Libraries

import pdfplumber
from pathlib import Path
import pandas as pd
from operator import itemgetter
import json
import tiktoken
import openai
import chromadb

## 2. Read, Process, and Chunk the PDF File


In [453]:
pdf_path=r'D:\GenAI\HelpMateAI_MM\Principal-Sample-Life-Insurance-Policy.pdf'

#### 2.1 Reading a single PDF file and exploring it through pdfplumber

In [454]:
# Open the PDF file
with pdfplumber.open(pdf_path) as pdf:

    # Get one of the pages from the PDF and examine it (index 6 is the seventh page)
    single_page = pdf.pages[6]

    # Extract text from the selected page
    text = single_page.extract_text()

    # Extract tables from the selected page
    tables = single_page.extract_tables()

    # Print the extracted text
    print(text)

Section A – Eligibility
Member Life Insurance Article 1
Member Accidental Death and Dismemberment Insurance Article 2
Dependent Life Insurance Article 3
Section B - Effective Dates
Member Life Insurance Article 1
Member Accidental Death and Dismemberment Insurance Article 2
Dependent Life Insurance Article 3
Section C - Individual Terminations
Member Life Insurance Article 1
Member Accidental Death and Dismemberment Insurance Article 2
Dependent Life Insurance Article 3
Termination for Fraud Article 4
Coverage While Outside of the United States Article 5
Section D - Continuation
Member Life Insurance Article 1
Dependent Insurance - Developmentally Disabled or
Physically Handicapped Children Article 2
Section E - Reinstatement
Reinstatement Article 1
Federal Required Family and Medical Leave Act (FMLA) Article 2
Reinstatement of Coverage for a Member or Dependent When
Coverage Ends due to Living Outside of the United States Article 3
Section F - Individual Purchase Rights
Member Life In

In [455]:
# Function to check whether a word is present in a table or not for segregation of regular text and tables

def check_bboxes(word, table_bbox):
    # Check whether word is inside a table bbox.
    l = word['x0'], word['top'], word['x1'], word['bottom']
    r = table_bbox
    return l[0] > r[0] and l[1] > r[1] and l[2] < r[2] and l[3] < r[3]
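
To make the containment test concrete, here is a small standalone sketch that repeats the same comparison on made-up coordinates. The word dictionaries and the table bounding box are hypothetical values chosen for illustration, not taken from the policy PDF.

```python
# Hypothetical coordinates for illustration only (PDF points, origin at top-left).
def check_bboxes(word, table_bbox):
    """Return True when the word's box lies strictly inside the table's box."""
    x0, top, x1, bottom = word['x0'], word['top'], word['x1'], word['bottom']
    t_x0, t_top, t_x1, t_bottom = table_bbox
    return x0 > t_x0 and top > t_top and x1 < t_x1 and bottom < t_bottom

table_bbox = (50, 100, 400, 300)  # (x0, top, x1, bottom)
inside_word = {'text': 'Benefit', 'x0': 60, 'top': 120, 'x1': 110, 'bottom': 132}
outside_word = {'text': 'Section', 'x0': 60, 'top': 40, 'x1': 110, 'bottom': 52}
edge_word = {'text': 'Age', 'x0': 50, 'top': 120, 'x1': 80, 'bottom': 132}

inside_result = check_bboxes(inside_word, table_bbox)
outside_result = check_bboxes(outside_word, table_bbox)
edge_result = check_bboxes(edge_word, table_bbox)  # touches the left border
print(inside_result, outside_result, edge_result)
```

Because the comparisons are strict, a word whose box touches the table border exactly is treated as regular text rather than table content.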

In [456]:
# Function to extract text from a PDF file.
# 1. Declare a variable p to store the iteration of the loop that will help us store page numbers alongside the text
# 2. Declare an empty list 'full_text' to store the text of every page
# 3. Use pdfplumber to open the PDF and iterate over its pages one by one, taking the first line of each page's text as its heading
# 4. Find the tables and their locations on the page
# 5. Extract the contents of the tables into the variable 'tables'
# 6. Extract the regular words by calling the function check_bboxes() and checking whether each word lies inside a table or not
# 7. Use the cluster_objects utility to cluster non-table and table words together so that they retain the same order as in the original PDF
# 8. Declare an empty list 'lines' to store the page text
# 9. If a text element is present in the cluster, append it to 'lines'; else if a table element is present, append the table
# 10. Append the page number, heading, and all lines to full_text, and increment 'p'
# 11. When the function has iterated over all pages, return the 'full_text' list

def extract_text_from_pdf(pdf_path):
    p = 0
    full_text = []


    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            page_no = f"Page {p+1}"
            text = page.extract_text()
            # Extract heading
            heading = None
            if text:
                heading = text.split('\n')[0].strip()  # First line of text as heading


            tables = page.find_tables()
            table_bboxes = [i.bbox for i in tables]
            tables = [{'table': i.extract(), 'top': i.bbox[1]} for i in tables]
            non_table_words = [word for word in page.extract_words() if not any(
                [check_bboxes(word, table_bbox) for table_bbox in table_bboxes])]
            lines = []

            for cluster in pdfplumber.utils.cluster_objects(non_table_words + tables, itemgetter('top'), tolerance=5):

                if 'text' in cluster[0]:
                    try:
                        lines.append(' '.join([i['text'] for i in cluster]))
                    except KeyError:
                        pass

                elif 'table' in cluster[0]:
                    lines.append(json.dumps(cluster[0]['table']))


            full_text.append([page_no,heading, " ".join(lines)])
            p +=1

    return full_text
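
The clustering step groups words that share roughly the same vertical position into one line. The sketch below is a simplified pure-Python stand-in for that idea, not pdfplumber's own implementation: `cluster_by_top` is a hypothetical helper, and the word positions are invented.

```python
from operator import itemgetter

def cluster_by_top(objects, tolerance=5):
    """Group objects whose 'top' lies within `tolerance` of the previous object's 'top'."""
    ordered = sorted(objects, key=itemgetter('top'))  # stable sort by vertical position
    clusters = []
    for obj in ordered:
        if clusters and obj['top'] - clusters[-1][-1]['top'] <= tolerance:
            clusters[-1].append(obj)   # same visual line
        else:
            clusters.append([obj])     # start a new line
    return clusters

# Invented word positions: three words on one line, two on the next.
words = [
    {'text': 'Member', 'top': 100.0},
    {'text': 'Life', 'top': 100.2},
    {'text': 'Insurance', 'top': 100.4},
    {'text': 'Article', 'top': 120.0},
    {'text': '1', 'top': 121.0},
]
lines = [' '.join(w['text'] for w in cluster) for cluster in cluster_by_top(words)]
print(lines)
```

A tolerance that is too small splits one visual line into several; one that is too large merges neighbouring lines, so the value of 5 used in the function is worth checking against the document's line spacing.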

*Now that we have defined the function for extracting the text and tables from a PDF, let's call it on the policy document and store the result in a DataFrame.*

In [457]:
# Call the function to extract the text from the PDF
extracted_text = extract_text_from_pdf(pdf_path)
# Convert the extracted list to a DataFrame with columns for the page number, heading, and page text
extracted_text_df = pd.DataFrame(extracted_text, columns=['Page No.','heading','Page_Text'])
print("All PDFs have been processed.")

All PDFs have been processed.


In [458]:
extracted_text_df.head()

Unnamed: 0,Page No.,heading,Page_Text
0,Page 1,DOROTHEA GLAUSE S655,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/01/2014 711 HIGH STREET GEORGE RI 02903 GROUP POLICY FOR: RHODE ISLAND JOHN DOE ALL MEMBERS Group Member Life Insurance Print Date: 07/16/2014
1,Page 2,This page left blank intentionally,This page left blank intentionally
2,Page 3,POLICY RIDER,"POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial Services Hotline and Grief Support Services or any other value added service for the employees of that employer group. In addition, The Principal may arrange for third party service providers (i.e., optometrists, health clubs), to provide discounted goods and services to those employer groups who apply for coverage with The Principal or who become insureds/enrollees of The Principal. While The Principal has arranged these goods, services and/or ..."
3,Page 4,This page left blank intentionally,This page left blank intentionally
4,Page 5,PRINCIPAL LIFE INSURANCE COMPANY,"PRINCIPAL LIFE INSURANCE COMPANY (called The Principal in this Group Policy) Des Moines, Iowa 50392-0002 This group insurance policy is issued to: RHODE ISLAND JOHN DOE (called the Policyholder in this Group Policy) The Date of Issue is November 1, 2007. In return for the Policyholder's application and payment of all premiums when due, The Principal agrees to provide: MEMBER LIFE INSURANCE MEMBER ACCIDENTAL DEATH AND DISMEMBERMENT INSURANCE DEPENDENT LIFE INSURANCE subject to the terms and conditions described in this Group Policy. GROUP POLICY NO. GL S655 RENEWABLE TERM - NON-PARTICIPATING CONTRACT STATE OF ISSUE: RHODE ISLAND This policy has been updated effective January 1, 2014 GC 6000 TITLE PAGE"


In [459]:
extracted_text_df.Page_Text[7]

'Section A - Member Life Insurance Schedule of Insurance Article 1 Death Benefits Payable Article 2 Beneficiary Article 3 Facility of Payment Article 4 Settlement of Proceeds Article 5 Member Life Insurance - Coverage During Disability Article 6 Accelerated Benefits Article 7 Section B - Member Accidental Death and Dismemberment Insurance Schedule of Insurance Article 1 Benefit Qualification Article 2 Benefits Payable Article 3 Seat Belt Benefit Article 4 Loss of Use or Paralysis Benefit Article 5 Loss of Speech and/or Hearing Benefit Article 6 Repatriation Benefit Article 7 Educational Benefit Article 8 Limitations Article 9 Section C - Dependent Life Insurance Schedule of Insurance Article 1 Death Benefits Payable Article 2 Beneficiary Article 3 Section D - Claim Procedures Notice of Claim Article 1 Claim Forms Article 2 Proof of Loss Article 3 Payment, Denial and Review Article 4 Medical Examinations Article 5 Autopsy Article 6 Legal Action Article 7 Time Limits Article 8 This polic

In [460]:
extracted_text_df['Text_Length'] = extracted_text_df['Page_Text'].apply(lambda x: len(x.split(' ')))
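
One detail worth knowing about this word count: splitting on a single space counts empty strings wherever two spaces are adjacent, and it reports a length of 1 for an empty page. The sketch below compares both approaches on an invented string; `count_words` is a hypothetical helper, not part of the notebook.

```python
def count_words(text):
    """Count words by splitting on any run of whitespace."""
    return len(text.split())

sample = "Section A  -  Eligibility "         # invented text with doubled and trailing spaces
by_single_space = len(sample.split(' '))      # empty strings are counted as words
by_whitespace = count_words(sample)           # only real tokens are counted
empty_by_single_space = len(''.split(' '))    # an empty page still counts as one "word"
empty_by_whitespace = count_words('')
print(by_single_space, by_whitespace, empty_by_single_space, empty_by_whitespace)
```

For pages whose text is joined with single spaces the two counts should agree, so the difference matters mainly for empty or irregularly spaced pages.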

In [461]:
extracted_text_df['Text_Length']

0      30
1       5
2     230
3       5
4     110
5     153
6     176
7     171
8     387
9     251
10    299
11    352
12    260
13    316
14     36
15    325
16    280
17    291
18    150
19    321
20    370
21    222
22    345
23    113
24     79
25    250
26     87
27    367
28    408
29    462
30    449
31    429
32    460
33     94
34    244
35    333
36    124
37    317
38    206
39    322
40    253
41    376
42    392
43    359
44    179
45    289
46    391
47    420
48    380
49    414
50    273
51    215
52    287
53    368
54    327
55    307
56    321
57    214
58    240
59    285
60    418
61    322
62      5
63      8
Name: Text_Length, dtype: int64

In [462]:
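# Drop near-empty pages; .copy() avoids pandas SettingWithCopyWarning when columns are added later
extracted_text_df = extracted_text_df.loc[extracted_text_df['Text_Length'] >= 10].copy()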
extracted_text_df.head()

Unnamed: 0,Page No.,heading,Page_Text,Text_Length
0,Page 1,DOROTHEA GLAUSE S655,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/01/2014 711 HIGH STREET GEORGE RI 02903 GROUP POLICY FOR: RHODE ISLAND JOHN DOE ALL MEMBERS Group Member Life Insurance Print Date: 07/16/2014,30
2,Page 3,POLICY RIDER,"POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial Services Hotline and Grief Support Services or any other value added service for the employees of that employer group. In addition, The Principal may arrange for third party service providers (i.e., optometrists, health clubs), to provide discounted goods and services to those employer groups who apply for coverage with The Principal or who become insureds/enrollees of The Principal. While The Principal has arranged these goods, services and/or ...",230
4,Page 5,PRINCIPAL LIFE INSURANCE COMPANY,"PRINCIPAL LIFE INSURANCE COMPANY (called The Principal in this Group Policy) Des Moines, Iowa 50392-0002 This group insurance policy is issued to: RHODE ISLAND JOHN DOE (called the Policyholder in this Group Policy) The Date of Issue is November 1, 2007. In return for the Policyholder's application and payment of all premiums when due, The Principal agrees to provide: MEMBER LIFE INSURANCE MEMBER ACCIDENTAL DEATH AND DISMEMBERMENT INSURANCE DEPENDENT LIFE INSURANCE subject to the terms and conditions described in this Group Policy. GROUP POLICY NO. GL S655 RENEWABLE TERM - NON-PARTICIPATING CONTRACT STATE OF ISSUE: RHODE ISLAND This policy has been updated effective January 1, 2014 GC 6000 TITLE PAGE",110
5,Page 6,TABLE OF CONTENTS,TABLE OF CONTENTS PART I - DEFINITIONS PART II - POLICY ADMINISTRATION Section A – Contract Entire Contract Article 1 Policy Changes Article 2 Policyholder Eligibility Requirements Article 3 Policy Incontestability Article 4 Individual Incontestability Article 5 Information to be Furnished Article 6 Certificates Article 7 Assignments Article 8 Dependent Rights Article 9 Policy Interpretation Article 10 Electronic Transactions Article 11 Section B – Premium Payment Responsibility; Due Dates; Grace Period Article 1 Premium Rates Article 2 Premium Rate Changes Article 3 Premium Amount Article 4 Contributions from Members Article 5 Section C - Policy Termination Failure to Pay Premium Article 1 Termination Rights of the Policyholder Article 2 Termination Rights of The Principal Article 3 P...,153
6,Page 7,Section A – Eligibility,Section A – Eligibility Member Life Insurance Article 1 Member Accidental Death and Dismemberment Insurance Article 2 Dependent Life Insurance Article 3 Section B - Effective Dates Member Life Insurance Article 1 Member Accidental Death and Dismemberment Insurance Article 2 Dependent Life Insurance Article 3 Section C - Individual Terminations Member Life Insurance Article 1 Member Accidental Death and Dismemberment Insurance Article 2 Dependent Life Insurance Article 3 Termination for Fraud Article 4 Coverage While Outside of the United States Article 5 Section D - Continuation Member Life Insurance Article 1 Dependent Insurance - Developmentally Disabled or Physically Handicapped Children Article 2 Section E - Reinstatement Reinstatement Article 1 Federal Required Family and Medical ...,176


###### Store the metadata for each page in a separate column



In [463]:
extracted_text_df['Metadata'] = extracted_text_df.apply(lambda x: {'Section': x['heading'][:20], 'Page_No.': x['Page No.']}, axis=1)

#### Chunking
Each page is split into overlapping chunks of up to 300 words, with 100 words shared between consecutive chunks.


In [467]:
def chunk_text(text, chunk_size=300, overlap_size=100):
    """
    Splits the text into chunks with overlap.

    Parameters:
    - text: The text to be split.
    - chunk_size: The size of each chunk (number of words).
    - overlap_size: The number of words that should overlap between consecutive chunks.

    Returns:
    - chunks: A list of text chunks.
    """
    words = text.split()
    chunks = []
    
    for i in range(0, len(words), chunk_size - overlap_size):
        chunk = " ".join(words[i:i + chunk_size])
        chunks.append(chunk)
    
    return chunks
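
With a stride of `chunk_size - overlap_size` words, a page slightly shorter than `chunk_size` still produces a second window that starts inside the first one, so its tail chunk is fully contained in the previous chunk (page 3, with about 230 words, is such a case). The variant below is an optional sketch, not the notebook's method: `chunk_text_no_tail` is a hypothetical name, and it stops as soon as a window reaches the last word.

```python
def chunk_text_no_tail(text, chunk_size=300, overlap_size=100):
    """Overlapping word chunks that stop once a window covers the end of the text."""
    if overlap_size >= chunk_size:
        raise ValueError("overlap_size must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap_size
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the last word
    return chunks

# Synthetic pages of 230 and 500 placeholder words.
short_page = " ".join(f"w{i}" for i in range(230))
long_page = " ".join(f"w{i}" for i in range(500))
short_lengths = [len(c.split()) for c in chunk_text_no_tail(short_page)]
long_lengths = [len(c.split()) for c in chunk_text_no_tail(long_page)]
print(short_lengths, long_lengths)
```

Whether to keep or drop the redundant tail is a design choice: keeping it adds a short, focused chunk to the index, while dropping it reduces duplicate retrieval results.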


In [468]:
# Assuming extracted_text_df is your DataFrame containing the extracted text
chunk_size = 300  # Number of words in each chunk
overlap_size = 100  # Number of overlapping words between chunks

# Create a new column in the DataFrame to store the chunks
extracted_text_df['Chunks'] = extracted_text_df['Page_Text'].apply(lambda x: chunk_text(x, chunk_size, overlap_size))

# Flatten the DataFrame to have one row per chunk
chunked_df = extracted_text_df.explode('Chunks').reset_index(drop=True)

# Add a running identifier to each chunk (1-based, counted across all pages)
chunked_df['Chunk_ID'] = chunked_df.index + 1


In [469]:
chunked_df.head()


Unnamed: 0,Page No.,heading,Page_Text,Text_Length,Metadata,Chunks,Chunk_ID
0,Page 1,DOROTHEA GLAUSE S655,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/01/2014 711 HIGH STREET GEORGE RI 02903 GROUP POLICY FOR: RHODE ISLAND JOHN DOE ALL MEMBERS Group Member Life Insurance Print Date: 07/16/2014,30,"{'Section': 'DOROTHEA GLAUSE S655', 'Page_No.': 'Page 1'}",DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/01/2014 711 HIGH STREET GEORGE RI 02903 GROUP POLICY FOR: RHODE ISLAND JOHN DOE ALL MEMBERS Group Member Life Insurance Print Date: 07/16/2014,1
1,Page 3,POLICY RIDER,"POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial Services Hotline and Grief Support Services or any other value added service for the employees of that employer group. In addition, The Principal may arrange for third party service providers (i.e., optometrists, health clubs), to provide discounted goods and services to those employer groups who apply for coverage with The Principal or who become insureds/enrollees of The Principal. While The Principal has arranged these goods, services and/or ...",230,"{'Section': 'POLICY RIDER', 'Page_No.': 'Page 3'}","POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial Services Hotline and Grief Support Services or any other value added service for the employees of that employer group. In addition, The Principal may arrange for third party service providers (i.e., optometrists, health clubs), to provide discounted goods and services to those employer groups who apply for coverage with The Principal or who become insureds/enrollees of The Principal. While The Principal has arranged these goods, services and/or ...",2
2,Page 3,POLICY RIDER,"POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial Services Hotline and Grief Support Services or any other value added service for the employees of that employer group. In addition, The Principal may arrange for third party service providers (i.e., optometrists, health clubs), to provide discounted goods and services to those employer groups who apply for coverage with The Principal or who become insureds/enrollees of The Principal. While The Principal has arranged these goods, services and/or ...",230,"{'Section': 'POLICY RIDER', 'Page_No.': 'Page 3'}","AS SPECIFICALLY DESCRIBED IN THIS RIDER, ALL OTHER BENEFITS AND PROVISIONS WILL BE AS DESCRIBED IN THE GROUP POLICY. PRINCIPAL LIFE INSURANCE COMPANY DES MOINES, IOWA 50392-0001 GC 806 VAL",3
3,Page 5,PRINCIPAL LIFE INSURANCE COMPANY,"PRINCIPAL LIFE INSURANCE COMPANY (called The Principal in this Group Policy) Des Moines, Iowa 50392-0002 This group insurance policy is issued to: RHODE ISLAND JOHN DOE (called the Policyholder in this Group Policy) The Date of Issue is November 1, 2007. In return for the Policyholder's application and payment of all premiums when due, The Principal agrees to provide: MEMBER LIFE INSURANCE MEMBER ACCIDENTAL DEATH AND DISMEMBERMENT INSURANCE DEPENDENT LIFE INSURANCE subject to the terms and conditions described in this Group Policy. GROUP POLICY NO. GL S655 RENEWABLE TERM - NON-PARTICIPATING CONTRACT STATE OF ISSUE: RHODE ISLAND This policy has been updated effective January 1, 2014 GC 6000 TITLE PAGE",110,"{'Section': 'PRINCIPAL LIFE INSUR', 'Page_No.': 'Page 5'}","PRINCIPAL LIFE INSURANCE COMPANY (called The Principal in this Group Policy) Des Moines, Iowa 50392-0002 This group insurance policy is issued to: RHODE ISLAND JOHN DOE (called the Policyholder in this Group Policy) The Date of Issue is November 1, 2007. In return for the Policyholder's application and payment of all premiums when due, The Principal agrees to provide: MEMBER LIFE INSURANCE MEMBER ACCIDENTAL DEATH AND DISMEMBERMENT INSURANCE DEPENDENT LIFE INSURANCE subject to the terms and conditions described in this Group Policy. GROUP POLICY NO. GL S655 RENEWABLE TERM - NON-PARTICIPATING CONTRACT STATE OF ISSUE: RHODE ISLAND This policy has been updated effective January 1, 2014 GC 6000 TITLE PAGE",4
4,Page 6,TABLE OF CONTENTS,TABLE OF CONTENTS PART I - DEFINITIONS PART II - POLICY ADMINISTRATION Section A – Contract Entire Contract Article 1 Policy Changes Article 2 Policyholder Eligibility Requirements Article 3 Policy Incontestability Article 4 Individual Incontestability Article 5 Information to be Furnished Article 6 Certificates Article 7 Assignments Article 8 Dependent Rights Article 9 Policy Interpretation Article 10 Electronic Transactions Article 11 Section B – Premium Payment Responsibility; Due Dates; Grace Period Article 1 Premium Rates Article 2 Premium Rate Changes Article 3 Premium Amount Article 4 Contributions from Members Article 5 Section C - Policy Termination Failure to Pay Premium Article 1 Termination Rights of the Policyholder Article 2 Termination Rights of The Principal Article 3 P...,153,"{'Section': 'TABLE OF CONTENTS', 'Page_No.': 'Page 6'}",TABLE OF CONTENTS PART I - DEFINITIONS PART II - POLICY ADMINISTRATION Section A – Contract Entire Contract Article 1 Policy Changes Article 2 Policyholder Eligibility Requirements Article 3 Policy Incontestability Article 4 Individual Incontestability Article 5 Information to be Furnished Article 6 Certificates Article 7 Assignments Article 8 Dependent Rights Article 9 Policy Interpretation Article 10 Electronic Transactions Article 11 Section B – Premium Payment Responsibility; Due Dates; Grace Period Article 1 Premium Rates Article 2 Premium Rate Changes Article 3 Premium Amount Article 4 Contributions from Members Article 5 Section C - Policy Termination Failure to Pay Premium Article 1 Termination Rights of the Policyholder Article 2 Termination Rights of The Principal Article 3 P...,5


Update the metadata to include the chunk ID

In [473]:
chunked_df['Metadata'] = chunked_df.apply(lambda x: {'Section': x['heading'][:20], 'Page_No.': x['Page No.'],'Chunk_id':x['Chunk_ID']}, axis=1)

In [474]:
chunked_df.head()

Unnamed: 0,Page No.,heading,Page_Text,Text_Length,Metadata,Chunks,Chunk_ID
0,Page 1,DOROTHEA GLAUSE S655,DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/01/2014 711 HIGH STREET GEORGE RI 02903 GROUP POLICY FOR: RHODE ISLAND JOHN DOE ALL MEMBERS Group Member Life Insurance Print Date: 07/16/2014,30,"{'Section': 'DOROTHEA GLAUSE S655', 'Page_No.': 'Page 1', 'Chunk_id': 1}",DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/01/2014 711 HIGH STREET GEORGE RI 02903 GROUP POLICY FOR: RHODE ISLAND JOHN DOE ALL MEMBERS Group Member Life Insurance Print Date: 07/16/2014,1
1,Page 3,POLICY RIDER,"POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial Services Hotline and Grief Support Services or any other value added service for the employees of that employer group. In addition, The Principal may arrange for third party service providers (i.e., optometrists, health clubs), to provide discounted goods and services to those employer groups who apply for coverage with The Principal or who become insureds/enrollees of The Principal. While The Principal has arranged these goods, services and/or ...",230,"{'Section': 'POLICY RIDER', 'Page_No.': 'Page 3', 'Chunk_id': 2}","POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial Services Hotline and Grief Support Services or any other value added service for the employees of that employer group. In addition, The Principal may arrange for third party service providers (i.e., optometrists, health clubs), to provide discounted goods and services to those employer groups who apply for coverage with The Principal or who become insureds/enrollees of The Principal. While The Principal has arranged these goods, services and/or ...",2
2,Page 3,POLICY RIDER,"POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effective on the later of the Date of Issue of this Group Policy or March 1, 2005, the following will apply to your Policy: From time to time The Principal may offer or provide certain employer groups who apply for coverage with The Principal a Financial Services Hotline and Grief Support Services or any other value added service for the employees of that employer group. In addition, The Principal may arrange for third party service providers (i.e., optometrists, health clubs), to provide discounted goods and services to those employer groups who apply for coverage with The Principal or who become insureds/enrollees of The Principal. While The Principal has arranged these goods, services and/or ...",230,"{'Section': 'POLICY RIDER', 'Page_No.': 'Page 3', 'Chunk_id': 3}","AS SPECIFICALLY DESCRIBED IN THIS RIDER, ALL OTHER BENEFITS AND PROVISIONS WILL BE AS DESCRIBED IN THE GROUP POLICY. PRINCIPAL LIFE INSURANCE COMPANY DES MOINES, IOWA 50392-0001 GC 806 VAL",3
3,Page 5,PRINCIPAL LIFE INSURANCE COMPANY,"PRINCIPAL LIFE INSURANCE COMPANY (called The Principal in this Group Policy) Des Moines, Iowa 50392-0002 This group insurance policy is issued to: RHODE ISLAND JOHN DOE (called the Policyholder in this Group Policy) The Date of Issue is November 1, 2007. In return for the Policyholder's application and payment of all premiums when due, The Principal agrees to provide: MEMBER LIFE INSURANCE MEMBER ACCIDENTAL DEATH AND DISMEMBERMENT INSURANCE DEPENDENT LIFE INSURANCE subject to the terms and conditions described in this Group Policy. GROUP POLICY NO. GL S655 RENEWABLE TERM - NON-PARTICIPATING CONTRACT STATE OF ISSUE: RHODE ISLAND This policy has been updated effective January 1, 2014 GC 6000 TITLE PAGE",110,"{'Section': 'PRINCIPAL LIFE INSUR', 'Page_No.': 'Page 5', 'Chunk_id': 4}","PRINCIPAL LIFE INSURANCE COMPANY (called The Principal in this Group Policy) Des Moines, Iowa 50392-0002 This group insurance policy is issued to: RHODE ISLAND JOHN DOE (called the Policyholder in this Group Policy) The Date of Issue is November 1, 2007. In return for the Policyholder's application and payment of all premiums when due, The Principal agrees to provide: MEMBER LIFE INSURANCE MEMBER ACCIDENTAL DEATH AND DISMEMBERMENT INSURANCE DEPENDENT LIFE INSURANCE subject to the terms and conditions described in this Group Policy. GROUP POLICY NO. GL S655 RENEWABLE TERM - NON-PARTICIPATING CONTRACT STATE OF ISSUE: RHODE ISLAND This policy has been updated effective January 1, 2014 GC 6000 TITLE PAGE",4
4,Page 6,TABLE OF CONTENTS,TABLE OF CONTENTS PART I - DEFINITIONS PART II - POLICY ADMINISTRATION Section A – Contract Entire Contract Article 1 Policy Changes Article 2 Policyholder Eligibility Requirements Article 3 Policy Incontestability Article 4 Individual Incontestability Article 5 Information to be Furnished Article 6 Certificates Article 7 Assignments Article 8 Dependent Rights Article 9 Policy Interpretation Article 10 Electronic Transactions Article 11 Section B – Premium Payment Responsibility; Due Dates; Grace Period Article 1 Premium Rates Article 2 Premium Rate Changes Article 3 Premium Amount Article 4 Contributions from Members Article 5 Section C - Policy Termination Failure to Pay Premium Article 1 Termination Rights of the Policyholder Article 2 Termination Rights of The Principal Article 3 P...,153,"{'Section': 'TABLE OF CONTENTS', 'Page_No.': 'Page 6', 'Chunk_id': 5}",TABLE OF CONTENTS PART I - DEFINITIONS PART II - POLICY ADMINISTRATION Section A – Contract Entire Contract Article 1 Policy Changes Article 2 Policyholder Eligibility Requirements Article 3 Policy Incontestability Article 4 Individual Incontestability Article 5 Information to be Furnished Article 6 Certificates Article 7 Assignments Article 8 Dependent Rights Article 9 Policy Interpretation Article 10 Electronic Transactions Article 11 Section B – Premium Payment Responsibility; Due Dates; Grace Period Article 1 Premium Rates Article 2 Premium Rate Changes Article 3 Premium Amount Article 4 Contributions from Members Article 5 Section C - Policy Termination Failure to Pay Premium Article 1 Termination Rights of the Policyholder Article 2 Termination Rights of The Principal Article 3 P...,5


## 3. Generate and Store Embeddings using OpenAI and ChromaDB

In this section, we will embed the chunks in the dataframe through OpenAI's `text-embedding-ada-002` model, and store them in a ChromaDB collection.

In [476]:
# Set the API key
filepath = r"D:\GenAI\Keys\\"

with open(filepath + "OpenAI_API_Key.txt", "r") as f:
    # Strip surrounding whitespace so a trailing newline in the file does not end up in the key
    openai.api_key = f.read().strip()
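
As an alternative to a hard-coded key path, the key can be looked up in an environment variable first and read from a file only as a fallback. This is a minimal sketch under assumptions: `load_api_key` is a hypothetical helper, the demo uses an invented variable name `HELPMATE_DEMO_KEY` and a placeholder key, and `OPENAI_API_KEY` is used as the default only because it is the conventional variable name.

```python
import os
import tempfile
from pathlib import Path

def load_api_key(key_file, env_var="OPENAI_API_KEY"):
    """Return the key from the environment if set, otherwise from the file, without whitespace."""
    key = os.environ.get(env_var)
    if key:
        return key.strip()
    return Path(key_file).read_text().strip()

# Demo with a throwaway file and a placeholder key (not a real credential).
with tempfile.TemporaryDirectory() as tmp:
    key_path = Path(tmp) / "OpenAI_API_Key.txt"
    key_path.write_text("sk-placeholder\n")
    loaded = load_api_key(key_path, env_var="HELPMATE_DEMO_KEY")

print(repr(loaded))
```

Stripping whitespace matters because a key file usually ends with a newline, which would otherwise become part of the key string.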

In [477]:
# Import the OpenAI Embedding Function into chroma

from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

In [478]:
# Define the path where chroma collections will be stored

chroma_data_path = r'D:\GenAI\HelpMateAI_MM\ChromaDB_Database'

In [479]:
import chromadb

In [480]:
# Call PersistentClient()

client = chromadb.PersistentClient(path=chroma_data_path)

In [481]:
# Set up the embedding function using the OpenAI embedding model

model = "text-embedding-ada-002"
embedding_function = OpenAIEmbeddingFunction(api_key=openai.api_key, model_name=model)

In [482]:
# Initialise a collection in chroma and pass the embedding_function to it so that it uses OpenAI embeddings to embed the documents

insurance_collection = client.get_or_create_collection(name='RAG_on_InsurancePolicy', embedding_function=embedding_function)

In [483]:
# Convert the page text and metadata from your dataframe to lists to be able to pass it to chroma

documents_list = chunked_df["Chunks"].tolist()
metadata_list = chunked_df['Metadata'].tolist()

In [484]:
# Add the documents and metadata to the collection along with generic integer IDs. You can also build the IDs from the metadata by combining the policy name and page number.

insurance_collection.add(
    documents= documents_list,
    ids = [str(i) for i in range(0, len(documents_list))],
    metadatas = metadata_list
)
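
The comment above mentions building IDs from the metadata instead of plain integers. A possible sketch follows; `build_chunk_ids` and the `principal_life` policy name are hypothetical, and the sample metadata mirrors the dictionary layout created earlier in the notebook.

```python
def build_chunk_ids(metadatas, policy_name="policy"):
    """Build one readable string ID per chunk from its page number and chunk ID."""
    ids = []
    for meta in metadatas:
        page = meta['Page_No.'].replace(' ', '')  # 'Page 3' -> 'Page3'
        ids.append(f"{policy_name}_{page}_chunk{meta['Chunk_id']}")
    return ids

# Sample metadata in the same shape as chunked_df['Metadata'].
sample_metadata = [
    {'Section': 'POLICY RIDER', 'Page_No.': 'Page 3', 'Chunk_id': 2},
    {'Section': 'POLICY RIDER', 'Page_No.': 'Page 3', 'Chunk_id': 3},
]
chunk_ids = build_chunk_ids(sample_metadata, policy_name="principal_life")
print(chunk_ids)
```

Including the chunk ID keeps the strings unique even when several chunks come from the same page, which matters because a collection expects each ID to be distinct.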

In [485]:
# Let's take a look at the first few entries in the collection

insurance_collection.get(
    ids = ['0','1','2'],
    include = ['embeddings', 'documents', 'metadatas']
)

{'ids': ['0', '1', '2'],
 'embeddings': [[-0.022469419986009598,
   0.01871146820485592,
   -0.02729734405875206,
   -0.031290166079998016,
   -0.0036731369327753782,
   0.027923669666051865,
   0.0008538575493730605,
   0.023656828328967094,
   -0.016571523621678352,
   0.004253792576491833,
   -0.005251998547464609,
   0.004058065824210644,
   -0.006273039150983095,
   0.023252326995134354,
   -0.011299951933324337,
   0.030977005138993263,
   0.002877181861549616,
   -0.024726800620555878,
   0.03055945597589016,
   0.007920404896140099,
   0.006413309834897518,
   0.01886804960668087,
   0.00953188817948103,
   0.012689611874520779,
   -0.012356876395642757,
   -0.009440548717975616,
   0.017889415845274925,
   -0.021686512976884842,
   0.023526344448328018,
   0.0030223457142710686,
   0.013283316045999527,
   -0.0145490150898695,
   -0.03361279144883156,
   -0.041180890053510666,
   -0.00845539104193449,
   0.013531235978007317,
   -0.0026716687716543674,
   -0.003868863452225923

In [486]:
cache_collection = client.get_or_create_collection(name='Insurance_Cache', embedding_function=embedding_function)

In [487]:
cache_collection.peek()

{'ids': [],
 'embeddings': [],
 'metadatas': [],
 'documents': [],
 'uris': None,
 'data': None,
 'included': ['embeddings', 'metadatas', 'documents']}

## 4. <font color = purple> Semantic Search with Cache

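In this section, we run a semantic search for each query against the collection's embeddings and retrieve the most semantically similar chunks. The cache collection is checked first, so that a query close enough to an earlier one can reuse the stored results.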
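To make "semantically similar" concrete, the sketch below ranks a few stored vectors by their distance to a query vector, which is the core of what a vector database query does. It uses cosine distance and made-up 3-dimensional vectors purely for illustration; the metric actually applied by the Chroma collection depends on its configuration, and the names `cosine_distance`, `top_k`, `toy_docs` and `toy_query` are hypothetical.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the k (doc_id, distance) pairs closest to the query."""
    scored = [(doc_id, cosine_distance(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    # Smaller distance means more similar, so sort ascending
    return sorted(scored, key=lambda pair: pair[1])[:k]


# Toy embeddings (illustrative values, not real model output)
toy_docs = {
    "claims": [0.9, 0.1, 0.0],
    "termination": [0.1, 0.9, 0.1],
    "beneficiary": [0.6, 0.2, 0.6],
}
toy_query = [1.0, 0.0, 0.1]
nearest = top_k(toy_query, toy_docs, k=2)
print(nearest)
```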

In [489]:
query_1=input()

What is the procedure for filing a claim in the event of the policyholder's death?


In [490]:
# Search the cache collection first
# Query the cache against the user query and return the single closest cached query
cache_results= cache_collection.query(
         query_texts=query_1,
          n_results=1
     )


In [491]:
print(cache_results)

{'ids': [[]], 'distances': [[]], 'metadatas': [[]], 'embeddings': None, 'documents': [[]], 'uris': None, 'data': None, 'included': ['metadatas', 'documents', 'distances']}


In [492]:
results = insurance_collection.query(
query_texts=query_1,
n_results=10
)
# results.items()

In [493]:
# Implementing Cache in Semantic Search

# Set a threshold for cache search
threshold = 0.2

ids_1 = []
documents_1 = []
distances_1 = []
metadatas_1 = []
results_df_1 = pd.DataFrame()


# If the distance is greater than the threshold, then return the results from the main collection.

if cache_results['distances'][0] == [] or cache_results['distances'][0][0] > threshold:
      # Query the collection against the user query and return the top 10 results
      results = insurance_collection.query(
      query_texts=query_1,
      n_results=10
      )

      # Store the query in cache_collection as document w.r.t to ChromaDB so that it can be embedded and searched against later
      # Store retrieved text, ids, distances and metadatas in cache_collection as metadatas, so that they can be fetched easily if a query indeed matches to a query in cache
      Keys = []
      Values = []

      for key, val in results.items():
        if val is None:
          continue
        for i in range(9):
          Keys.append(str(key)+str(i))
          Values.append(str(val[0][i]))


      cache_collection.add(
          documents= [query_1],
          ids = [query_1],  # Or if you want to assign integers as IDs 0,1,2,.., then you can use "len(cache_results['documents'])" as will return the no. of queries currently in the cache and assign the next digit to the new query."
          metadatas = dict(zip(Keys, Values))
      )

      print("Not found in cache. Found in main collection.")

      result_dict = {'Metadatas_1': results['metadatas'][0], 'Documents_1': results['documents'][0], 'Distances_1': results['distances'][0], "IDs":results["ids"][0]}
      results_df_1 = pd.DataFrame.from_dict(result_dict)
      results_df_1


# If the distance is, however, less than the threshold, you can return the results from cache

elif cache_results['distances'][0][0] <= threshold:
      cache_result_dict = cache_results['metadatas'][0][0]

      # Sort the flattened metadata entries back into the per-query lists defined above
      for key, value in cache_result_dict.items():
          if 'ids' in key:
              ids_1.append(value)
          elif 'documents' in key:
              documents_1.append(value)
          elif 'distances' in key:
              distances_1.append(float(value))  # stored as a string in the cache
          elif 'metadatas' in key:
              metadatas_1.append(value)

      print("Found in cache!")

      # Create a DataFrame with the same column names as the cache-miss branch
      results_df_1 = pd.DataFrame({
        'IDs': ids_1,
        'Documents_1': documents_1,
        'Distances_1': distances_1,
        'Metadatas_1': metadatas_1
      })

Not found in cache. Found in main collection.
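The cell above boils down to one decision: reuse a stored answer when the closest cached query lies within the threshold, otherwise run the main search and store its result. The sketch below isolates that decision with stand-ins for the Chroma calls. The cache is a plain list, and `distance_fn`, `search_fn`, `word_overlap_distance` and `fake_search` are hypothetical helpers, so this illustrates the control flow rather than the notebook's actual implementation.

```python
def cached_search(query, cache, distance_fn, search_fn, threshold=0.2):
    """Return (results, source) for a query using a distance-based cache.

    cache       -- list of (cached_query, cached_results) pairs
    distance_fn -- callable giving a distance between two queries
    search_fn   -- callable that runs the expensive main search
    """
    best = None
    for cached_query, cached_results in cache:
        dist = distance_fn(query, cached_query)
        if best is None or dist < best[0]:
            best = (dist, cached_results)

    # Cache hit: the closest stored query is within the threshold
    if best is not None and best[0] <= threshold:
        return best[1], "cache"

    # Cache miss: run the main search and remember the answer
    results = search_fn(query)
    cache.append((query, results))
    return results, "main"


def word_overlap_distance(q1, q2):
    """Toy distance: 1 - Jaccard overlap of lower-cased word sets."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    return 1.0 - len(a & b) / len(a | b)


def fake_search(query):
    """Stand-in for the main collection query."""
    return [f"chunk for: {query}"]


demo_cache = []
first = cached_search("how do I file a claim", demo_cache, word_overlap_distance, fake_search)
second = cached_search("How do I file a claim", demo_cache, word_overlap_distance, fake_search)
third = cached_search("when does the policy terminate", demo_cache, word_overlap_distance, fake_search)
print(first[1], second[1], third[1])
```

Wrapping the logic in a function like this would also avoid repeating the same cell for each query.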


In [495]:
results_df_1.head()

Unnamed: 0,Metadatas_1,Documents_1,Distances_1,IDs
0,"{'Chunk_id': 115, 'Page_No.': 'Page 62', 'Section': 'A claimant may reque'}","6 - Autopsy If payment for loss of life is claimed, The Principal may require an autopsy. The Principal will pay for any such autopsy. Article 7 - Legal Action Legal action to recover benefits under this Group Policy may not be started earlier than 90 days after required proof of loss has been filed and before the appeal procedures have been exhausted. Further, no legal action may be started later than three years after that proof is required to be filed. Article 8 - Time Limits Any time limits in this section will be adjusted as required by law. This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6018 Section D - Claim Procedures, Page 2",0.288088,114
1,"{'Chunk_id': 109, 'Page_No.': 'Page 60', 'Section': 'I f a Dependent who '}","I f a Dependent who was insured dies during the 31-day individual purchase period described in PART III, Section F, Article 2, The Principal will pay the individual policy amount, if any, the Dependent had the right to purchase. Unless a Beneficiary has been designated, payment will be to the Member if he or she survives the Dependent. If the Member does not survive the Dependent and a beneficiary for Dependent Life has not been named, payment will be to the person named as beneficiary for Member Life Insurance. However, if the Member is suspected or charged with the Dependent's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If the Member is found guilty of the Dependent's death, he or she may be disqualified...",0.31418,108
2,"{'Chunk_id': 81, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}","M ember's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If a Member who was insured dies within the 31-day individual purchase period described in PART III, Section F, The Principal will pay his or her beneficiary the individual policy amount, if any, the Member had the right to purchase. No payment will be made before The Principal receives Written proof of the Member's death. Article 3 - Beneficiary A beneficiary should be named at the time a Member applies or enrolls under this Group Policy. A Member may name or later change a named beneficiary by sending a Written request to The Principal. A change will not be effective until recorded by The Principal. Once recorded, the change will apply as of the date ...",0.31451,80
3,"{'Chunk_id': 80, 'Page_No.': 'Page 46', 'Section': 'PART IV - BENEFITS'}","2 - Death Benefits Payable If a Member dies while insured for Member Life Insurance under this Group Policy, The Principal will pay his or her beneficiary the Scheduled Benefit (or approved amount, if applicable) in force on the date of death, less any Accelerated Benefit payment as described in PART IV, Section A, Article 7. However, if a beneficiary is suspected or charged with the This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6013 Section A - Member Life Insurance, Page 1",0.324766,79
4,"{'Chunk_id': 111, 'Page_No.': 'Page 61', 'Section': 'Section D - Claim Pr'}","Section D - Claim Procedures Article 1 - Notice of Claim Written notice must be sent to The Principal by or for a Member or Dependent who wishes to file claim for benefits under this Group Policy. This notice must be sent within 20 days after the date of the loss for which claim is being made. Failure to give notice within the time specified will not invalidate or reduce any claim if notice is given as soon as reasonably possible. Article 2 - Claim Forms The Principal, when it receives notice of claim, will provide appropriate claim forms for filing proof of loss. If the forms are not provided within 15 days after The Principal receives notice, the person will be considered to have complied with the requirements of this Group Policy upon submitting, within the time specified below for ...",0.325197,110


#### For Query 2

In [496]:
query_2=input()

What are the terms and conditions for terminating the policy?


In [497]:
# Search the cache collection first
# Query the cache against the user query and return the single closest cached query
cache_results= cache_collection.query(
         query_texts=query_2,
          n_results=1
     )


In [498]:
cache_results

{'ids': [["What is the procedure for filing a claim in the event of the policyholder's death?"]],
 'distances': [[0.3154336615033297]],
 'metadatas': [[{'distances0': '0.288087572114366',
    'distances1': '0.3141802810712838',
    'distances2': '0.3145095109939575',
    'distances3': '0.3247655928134918',
    'distances4': '0.32519695667142534',
    'distances5': '0.32608700828783643',
    'distances6': '0.33153223991394043',
    'distances7': '0.33235064148902893',
    'distances8': '0.3377012014389038',
    'documents0': '6 - Autopsy If payment for loss of life is claimed, The Principal may require an autopsy. The Principal will pay for any such autopsy. Article 7 - Legal Action Legal action to recover benefits under this Group Policy may not be started earlier than 90 days after required proof of loss has been filed and before the appeal procedures have been exhausted. Further, no legal action may be started later than three years after that proof is required to be filed. Article 8
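The cached metadata shown above keeps each retrieved field under numbered keys (`distances0` to `distances8`, `documents0`, and so on), with every value stored as a string. A minimal sketch of turning such a flat dict back into ordered lists follows. The helper name `unflatten_cached_metadata` is hypothetical, and the sample dict is an abbreviated stand-in rather than the full cache entry.

```python
import re


def unflatten_cached_metadata(flat, fields=("ids", "documents", "distances", "metadatas")):
    """Group keys like 'distances0', 'distances1' into ordered lists per field."""
    pattern = re.compile(r"^(%s)(\d+)$" % "|".join(fields))
    grouped = {field: [] for field in fields}
    for key, value in flat.items():
        match = pattern.match(key)
        if match is None:
            continue  # ignore keys that are not '<field><index>'
        grouped[match.group(1)].append((int(match.group(2)), value))
    # Sort by the numeric suffix so the rank order does not depend on dict order
    return {
        field: [value for _, value in sorted(pairs, key=lambda p: p[0])]
        for field, pairs in grouped.items()
    }


# Abbreviated example entry (document texts shortened for illustration)
flat_example = {
    "distances1": "0.3141802810712838",
    "distances0": "0.288087572114366",
    "ids0": "114",
    "ids1": "108",
    "documents0": "6 - Autopsy ...",
    "documents1": "I f a Dependent ...",
    "other_key": "ignored",
}
restored = unflatten_cached_metadata(flat_example)
# Distances were stored as strings, so convert them back to floats
restored["distances"] = [float(d) for d in restored["distances"]]
print(restored["ids"], restored["distances"])
```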

In [499]:
results = insurance_collection.query(
query_texts=query_2,
n_results=10
)
# results.items()

In [500]:
# Implementing Cache in Semantic Search

# Set a threshold for cache search
threshold = 0.2

ids_2 = []
documents_2 = []
distances_2 = []
metadatas_2 = []
results_df_2 = pd.DataFrame()


# If the distance is greater than the threshold, then return the results from the main collection.

if cache_results['distances'][0] == [] or cache_results['distances'][0][0] > threshold:
      # Query the collection against the user query and return the top 10 results
      results = insurance_collection.query(
      query_texts=query_2,
      n_results=10
      )

      # Store the query in cache_collection as document w.r.t to ChromaDB so that it can be embedded and searched against later
      # Store retrieved text, ids, distances and metadatas in cache_collection as metadatas, so that they can be fetched easily if a query indeed matches to a query in cache
      Keys = []
      Values = []

      for key, val in results.items():
        if val is None:
          continue
        for i in range(9):
          Keys.append(str(key)+str(i))
          Values.append(str(val[0][i]))


      cache_collection.add(
          documents= [query_2],
          ids = [query_2],  # Or if you want to assign integers as IDs 0,1,2,.., then you can use "len(cache_results['documents'])" as will return the no. of queries currently in the cache and assign the next digit to the new query."
          metadatas = dict(zip(Keys, Values))
      )

      print("Not found in cache. Found in main collection.")

      result_dict_2 = {'Metadatas_2': results['metadatas'][0], 'Documents_2': results['documents'][0], 'Distances_2': results['distances'][0], "IDs":results["ids"][0]}
      results_df_2 = pd.DataFrame.from_dict(result_dict_2)
      results_df_2


# If the distance is, however, less than the threshold, you can return the results from cache

elif cache_results['distances'][0][0] <= threshold:
      cache_result_dict = cache_results['metadatas'][0][0]

      # Sort the flattened metadata entries back into the per-query lists defined above
      for key, value in cache_result_dict.items():
          if 'ids' in key:
              ids_2.append(value)
          elif 'documents' in key:
              documents_2.append(value)
          elif 'distances' in key:
              distances_2.append(float(value))  # stored as a string in the cache
          elif 'metadatas' in key:
              metadatas_2.append(value)

      print("Found in cache!")

      # Create a DataFrame with the same column names as the cache-miss branch
      results_df_2 = pd.DataFrame({
        'IDs': ids_2,
        'Documents_2': documents_2,
        'Distances_2': distances_2,
        'Metadatas_2': metadatas_2
      })

Not found in cache. Found in main collection.


In [501]:
results_df_2.head()

Unnamed: 0,Metadatas_2,Documents_2,Distances_2,IDs
0,"{'Chunk_id': 36, 'Page_No.': 'Page 24', 'Section': 'T he Principal may t'}","T he Principal may terminate the Policyholder's coverage on any premium due date if the Policyholder relocates to a state where this Group Policy is not marketed, by giving the Policyholder 31 days advanced notice in Writing. Article 4 - Policyholder Responsibility to Members If this Group Policy terminates for any reason, the Policyholder must: a. notify each Member of the effective date of the termination; and b. refund or otherwise account to each Member all contributions received or withheld from Members for premiums not actually paid to The Principal. This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6005 Section C - Policy Termination, Page 2",0.256959,35
1,"{'Chunk_id': 34, 'Page_No.': 'Page 23', 'Section': 'Section C - Policy T'}",Section C - Policy Termination Article 1 - Failure to Pay Premium This Group Policy will terminate at the end of the Grace Period if total premium due has not been received by The Principal before the end of the Grace Period. Failure by the Policyholder to pay the premium within the Grace Period will be deemed notice by the Policyholder to The Principal to discontinue this Group Policy at the end of the Grace Period. Article 2 - Termination Rights of the Policyholder The Policyholder may terminate this Group Policy effective on the day before any premium due date by giving Written notice to The Principal prior to that premium due date. The Policyholder's issuance of a stop-payment order for any amounts used to pay premiums for the Policyholder's coverage will be considered Written noti...,0.282504,33
2,"{'Chunk_id': 59, 'Page_No.': 'Page 35', 'Section': 'Section C - Individu'}","Section C - Individual Terminations Article 1 - Member Life Insurance A Member's insurance under this Group Policy will terminate on the earliest of: a. the date this Group Policy is terminated; or b. the date the last premium is paid for the Member's insurance; or c. any date desired, if requested by the Member before that date; or d. the date the Member ceases to be a Member as defined in PART I; or e. the date the Member ceases to be in a class for which Member Life Insurance is provided; or f. the date the Member retires; or g. the date the Member ceases Active Work. Article 2 - Member Accidental Death and Dismemberment Insurance A Member's Accidental Death and Dismemberment Insurance under this Group Policy will terminate on the earliest of: a. the date his or her Member Life Insu...",0.301435,58
3,"{'Chunk_id': 73, 'Page_No.': 'Page 42', 'Section': 'Section F - Individu'}","Member Life Insurance terminates because this Group Policy terminates or is amended to exclude the Member's insurance class; or (3) the Member's Coverage During Disability as described in PART IV, Section A, ceases because Total Disability ends and he or she does not return to Active Work within 31 days; or (4) the Member's Accelerated Benefits Premium Waiver Period as described in PART IV, Section A, ceases and he or she does not qualify for Coverage During Disability . c. Application/Effective Date Notice of the individual purchase right must be given to the Member by the Policyholder before insurance under this Group Policy terminates, or as soon as reasonably possible thereafter. A Member must apply for individual purchase and the first premium for the individual policy must be pai...",0.301809,72
4,"{'Chunk_id': 74, 'Page_No.': 'Page 43', 'Section': 'Any individual polic'}","Any individual policy issued will then be in force on the 32nd day following such termination date. d. Individual Policy Amount The amount of insurance that may be purchased may vary: (1) If termination is as described in b. (1) above, the maximum amount will be the Member Life Insurance benefit in force on the date of termination or the portion of Member Life Insurance that has terminated, less any individual policy amount purchased earlier under this Article 1, and less any Accelerated Benefit payment as described in PART IV, Section A, Article 7. (2) If termination is as described in b. (2) above, the maximum amount will be the lesser of: - $10,000; or - the Member Life Insurance benefit in force on the date of termination, less any Accelerated Benefit payment as described in PART I...",0.317474,73


#### For Query 3

In [538]:
query_3=input()

How is the beneficiary of the policy defined and what are their rights?


In [539]:
# Search the cache collection first
# Query the cache against the user query and return the single closest cached query
cache_results= cache_collection.query(
         query_texts=query_3,
          n_results=1
     )

In [540]:
cache_results

{'ids': [['What are the terms and conditions for terminating the policy?']],
 'distances': [[0.29038874906412165]],
 'metadatas': [[{'distances0': '0.2569594085216522',
    'distances1': '0.2825036942958832',
    'distances2': '0.3014345169067383',
    'distances3': '0.3018088638782501',
    'distances4': '0.3174736201763153',
    'distances5': '0.3193272352218628',
    'distances6': '0.32072514295578003',
    'distances7': '0.3215514123439789',
    'distances8': '0.3291952311992645',
    'documents0': "T he Principal may terminate the Policyholder's coverage on any premium due date if the Policyholder relocates to a state where this Group Policy is not marketed, by giving the Policyholder 31 days advanced notice in Writing. Article 4 - Policyholder Responsibility to Members If this Group Policy terminates for any reason, the Policyholder must: a. notify each Member of the effective date of the termination; and b. refund or otherwise account to each Member all contributions received or

In [541]:
# Implementing Cache in Semantic Search

# Set a threshold for cache search
threshold = 0.2

ids_3 = []
documents_3 = []
distances_3 = []
metadatas_3 = []
results_df_3 = pd.DataFrame()


# If the distance is greater than the threshold, then return the results from the main collection.

if cache_results['distances'][0] == [] or cache_results['distances'][0][0] > threshold:
      # Query the collection against the user query and return the top 10 results
      results = insurance_collection.query(
      query_texts=query_3,
      n_results=10
      )

      # Store the query in cache_collection as document w.r.t to ChromaDB so that it can be embedded and searched against later
      # Store retrieved text, ids, distances and metadatas in cache_collection as metadatas, so that they can be fetched easily if a query indeed matches to a query in cache
      Keys = []
      Values = []

      for key, val in results.items():
        if val is None:
          continue
        for i in range(9):
          Keys.append(str(key)+str(i))
          Values.append(str(val[0][i]))


      cache_collection.add(
          documents= [query_3],
          ids = [query_3],  # Or if you want to assign integers as IDs 0,1,2,.., then you can use "len(cache_results['documents'])" as will return the no. of queries currently in the cache and assign the next digit to the new query."
          metadatas = dict(zip(Keys, Values))
      )

      print("Not found in cache. Found in main collection.")

      result_dict_3 = {'Metadatas_3': results['metadatas'][0], 'Documents_3': results['documents'][0], 'Distances_3': results['distances'][0], "IDs":results["ids"][0]}
      results_df_3 = pd.DataFrame.from_dict(result_dict_3)
      results_df_3


# If the distance is, however, less than the threshold, you can return the results from cache

elif cache_results['distances'][0][0] <= threshold:
      cache_result_dict = cache_results['metadatas'][0][0]

      # Sort the flattened metadata entries back into the per-query lists defined above
      for key, value in cache_result_dict.items():
          if 'ids' in key:
              ids_3.append(value)
          elif 'documents' in key:
              documents_3.append(value)
          elif 'distances' in key:
              distances_3.append(float(value))  # stored as a string in the cache
          elif 'metadatas' in key:
              metadatas_3.append(value)

      print("Found in cache!")

      # Create a DataFrame with the same column names as the cache-miss branch
      results_df_3 = pd.DataFrame({
        'IDs': ids_3,
        'Documents_3': documents_3,
        'Distances_3': distances_3,
        'Metadatas_3': metadatas_3
      })

Not found in cache. Found in main collection.


In [542]:
results_df_3.head()

Unnamed: 0,Metadatas_3,Documents_3,Distances_3,IDs
0,"{'Chunk_id': 82, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}","reinstated, his or her beneficiary will be as recorded on the date of termination. If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue, the beneficiary named in such replaced insurance and recorded by the Policyholder or The Principal will be the beneficiary under this Group Policy until a new beneficiary is named. Article 4 - Facility of Payment If any of the below occur, benefits will be paid as stated. All such payments will discharge The Principal to the full extent of those payments. a. If a beneficiary is found guilty of the Member's death, such beneficiary may be disqualified from receiving any benefit due. Payment may then be made to any contingent beneficiary or to th...",0.298785,81
1,"{'Chunk_id': 81, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}","M ember's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If a Member who was insured dies within the 31-day individual purchase period described in PART III, Section F, The Principal will pay his or her beneficiary the individual policy amount, if any, the Member had the right to purchase. No payment will be made before The Principal receives Written proof of the Member's death. Article 3 - Beneficiary A beneficiary should be named at the time a Member applies or enrolls under this Group Policy. A Member may name or later change a named beneficiary by sending a Written request to The Principal. A change will not be effective until recorded by The Principal. Once recorded, the change will apply as of the date ...",0.319675,80
2,"{'Chunk_id': 80, 'Page_No.': 'Page 46', 'Section': 'PART IV - BENEFITS'}","2 - Death Benefits Payable If a Member dies while insured for Member Life Insurance under this Group Policy, The Principal will pay his or her beneficiary the Scheduled Benefit (or approved amount, if applicable) in force on the date of death, less any Accelerated Benefit payment as described in PART IV, Section A, Article 7. However, if a beneficiary is suspected or charged with the This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6013 Section A - Member Life Insurance, Page 1",0.337434,79
3,"{'Chunk_id': 26, 'Page_No.': 'Page 18', 'Section': 'c . a copy of the fo'}","insurance and will describe the basic features of the coverage. They will not be considered a part of this Group Policy. Article 8 - Assignments No assignments of Member Life Insurance will be allowed under this Group Policy. Article 9 - Dependent Rights A Dependent will have no rights under this Group Policy except as set forth in PART III, Section F, Article 2. Article 10 - Policy Interpretation This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6003 Section A - Contract, Page 3",0.339352,25
4,"{'Chunk_id': 110, 'Page_No.': 'Page 60', 'Section': 'I f a Dependent who '}","A Member may name or later change a named beneficiary by sending a Written request to The Principal. A change will not be effective until recorded by The Principal. Once recorded, the change will apply as of the date the request was Signed. If The Principal properly pays any benefit before a change request is received, that payment may not be contested. This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6016 Section C - Dependent Life Insurance, Page 2",0.341135,109


## 5. <font color = Purple> Re-Ranking with a Cross Encoder


In [543]:
# Import the CrossEncoder class from sentence_transformers

from sentence_transformers import CrossEncoder, util

In [544]:
# Initialise the cross encoder model

cross_encoder = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2')

In [545]:
# Test the cross encoder model

scores = cross_encoder.predict([['what is the Termination Rights of the Policyholder?', 'The Policyholder may terminate this Group Policy effective on the day before any premium due'],
                                ['what is the Termination Rights of the Policyholder?', 'The premium rates for various age groups are given as follows. Age group (<18 years): Premium rate']])

In [546]:
scores

array([  4.4987617, -11.4560795], dtype=float32)
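The two scores above are unbounded, and for re-ranking only their relative order matters: the first pair scores far higher than the second. If a value between 0 and 1 is easier to read, a logistic (sigmoid) transform can be applied. The sketch below assumes the printed values are raw logits, which the notebook does not state explicitly; the helper `sigmoid` is written out here rather than taken from any library.

```python
import math


def sigmoid(score):
    """Map a real-valued score to (0, 1) without overflow for large |score|."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    exp_s = math.exp(score)
    return exp_s / (1.0 + exp_s)


raw_scores = [4.4987617, -11.4560795]  # values printed by the cell above
probabilities = [sigmoid(s) for s in raw_scores]
print(probabilities)
```

Because the sigmoid is monotonic, sorting by the transformed values gives the same ranking as sorting by the raw scores.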

##### For query 1

In [547]:
# Build (query, response) pairs for each of the top responses returned by the semantic search and pass them to the cross encoder
# Generate the cross_encoder scores for these pairs

cross_inputs_1 = [[query_1, response] for response in results_df_1['Documents_1']]
cross_rerank_scores_1 = cross_encoder.predict(cross_inputs_1)

In [548]:
cross_rerank_scores_1

array([-3.043668 , -4.8800874, -2.185757 , -6.555569 , -0.500461 ,
       -4.260059 , -8.632756 , -8.198573 , -6.71474  , -3.4110293],
      dtype=float32)
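The two orderings can be compared without any library. The sketch below takes the first five query 1 results (IDs and distances from the results table in section 4, scores from the array above) and sorts them both ways: ascending by distance for semantic search and descending by score for the cross encoder. Only five of the ten rows are used because those are the rows whose distances are shown in full.

```python
# (chunk ID, semantic distance, cross-encoder score) for the first five rows of query 1
rows = [
    ("114", 0.288088, -3.043668),
    ("108", 0.314180, -4.880087),
    ("80", 0.314510, -2.185757),
    ("79", 0.324766, -6.555569),
    ("110", 0.325197, -0.500461),
]

# Semantic search: smaller distance is better, so sort ascending
by_distance = [r[0] for r in sorted(rows, key=lambda r: r[1])]
# Re-ranking: larger score is better, so sort descending
by_rerank = [r[0] for r in sorted(rows, key=lambda r: r[2], reverse=True)]

print("semantic top 3:", by_distance[:3])
print("reranked top 3:", by_rerank[:3])
```

In this subset the chunk with ID 110, which opens Section D - Claim Procedures, moves from fifth place to first after re-ranking.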

In [549]:
# Store the rerank_scores in results_df

results_df_1['Reranked_scores'] = cross_rerank_scores_1

In [550]:
results_df_1.head()

Unnamed: 0,Metadatas_1,Documents_1,Distances_1,IDs,Reranked_scores
0,"{'Chunk_id': 115, 'Page_No.': 'Page 62', 'Section': 'A claimant may reque'}","6 - Autopsy If payment for loss of life is claimed, The Principal may require an autopsy. The Principal will pay for any such autopsy. Article 7 - Legal Action Legal action to recover benefits under this Group Policy may not be started earlier than 90 days after required proof of loss has been filed and before the appeal procedures have been exhausted. Further, no legal action may be started later than three years after that proof is required to be filed. Article 8 - Time Limits Any time limits in this section will be adjusted as required by law. This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6018 Section D - Claim Procedures, Page 2",0.288088,114,-3.043668
1,"{'Chunk_id': 109, 'Page_No.': 'Page 60', 'Section': 'I f a Dependent who '}","I f a Dependent who was insured dies during the 31-day individual purchase period described in PART III, Section F, Article 2, The Principal will pay the individual policy amount, if any, the Dependent had the right to purchase. Unless a Beneficiary has been designated, payment will be to the Member if he or she survives the Dependent. If the Member does not survive the Dependent and a beneficiary for Dependent Life has not been named, payment will be to the person named as beneficiary for Member Life Insurance. However, if the Member is suspected or charged with the Dependent's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If the Member is found guilty of the Dependent's death, he or she may be disqualified...",0.31418,108,-4.880087
2,"{'Chunk_id': 81, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}","M ember's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If a Member who was insured dies within the 31-day individual purchase period described in PART III, Section F, The Principal will pay his or her beneficiary the individual policy amount, if any, the Member had the right to purchase. No payment will be made before The Principal receives Written proof of the Member's death. Article 3 - Beneficiary A beneficiary should be named at the time a Member applies or enrolls under this Group Policy. A Member may name or later change a named beneficiary by sending a Written request to The Principal. A change will not be effective until recorded by The Principal. Once recorded, the change will apply as of the date ...",0.31451,80,-2.185757
3,"{'Chunk_id': 80, 'Page_No.': 'Page 46', 'Section': 'PART IV - BENEFITS'}","2 - Death Benefits Payable If a Member dies while insured for Member Life Insurance under this Group Policy, The Principal will pay his or her beneficiary the Scheduled Benefit (or approved amount, if applicable) in force on the date of death, less any Accelerated Benefit payment as described in PART IV, Section A, Article 7. However, if a beneficiary is suspected or charged with the This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6013 Section A - Member Life Insurance, Page 1",0.324766,79,-6.555569
4,"{'Chunk_id': 111, 'Page_No.': 'Page 61', 'Section': 'Section D - Claim Pr'}","Section D - Claim Procedures Article 1 - Notice of Claim Written notice must be sent to The Principal by or for a Member or Dependent who wishes to file claim for benefits under this Group Policy. This notice must be sent within 20 days after the date of the loss for which claim is being made. Failure to give notice within the time specified will not invalidate or reduce any claim if notice is given as soon as reasonably possible. Article 2 - Claim Forms The Principal, when it receives notice of claim, will provide appropriate claim forms for filing proof of loss. If the forms are not provided within 15 days after The Principal receives notice, the person will be considered to have complied with the requirements of this Group Policy upon submitting, within the time specified below for ...",0.325197,110,-0.500461


In [551]:
import pandas as pd
# Widen the pandas display limits so that long chunk text is not truncated in the output
pd.set_option('display.max_colwidth', 800)
pd.set_option('display.max_columns', 800)
pd.set_option('display.max_rows', 5000)

In [552]:
# Return the top 3 results from semantic search

top_3_semantic_1 = results_df_1.sort_values(by='Distances_1')
top_3_semantic_1[:3]

Unnamed: 0,Metadatas_1,Documents_1,Distances_1,IDs,Reranked_scores
0,"{'Chunk_id': 115, 'Page_No.': 'Page 62', 'Section': 'A claimant may reque'}","6 - Autopsy If payment for loss of life is claimed, The Principal may require an autopsy. The Principal will pay for any such autopsy. Article 7 - Legal Action Legal action to recover benefits under this Group Policy may not be started earlier than 90 days after required proof of loss has been filed and before the appeal procedures have been exhausted. Further, no legal action may be started later than three years after that proof is required to be filed. Article 8 - Time Limits Any time limits in this section will be adjusted as required by law. This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6018 Section D - Claim Procedures, Page 2",0.288088,114,-3.043668
1,"{'Chunk_id': 109, 'Page_No.': 'Page 60', 'Section': 'I f a Dependent who '}","I f a Dependent who was insured dies during the 31-day individual purchase period described in PART III, Section F, Article 2, The Principal will pay the individual policy amount, if any, the Dependent had the right to purchase. Unless a Beneficiary has been designated, payment will be to the Member if he or she survives the Dependent. If the Member does not survive the Dependent and a beneficiary for Dependent Life has not been named, payment will be to the person named as beneficiary for Member Life Insurance. However, if the Member is suspected or charged with the Dependent's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If the Member is found guilty of the Dependent's death, he or she may be disqualified...",0.31418,108,-4.880087
2,"{'Chunk_id': 81, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}","M ember's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If a Member who was insured dies within the 31-day individual purchase period described in PART III, Section F, The Principal will pay his or her beneficiary the individual policy amount, if any, the Member had the right to purchase. No payment will be made before The Principal receives Written proof of the Member's death. Article 3 - Beneficiary A beneficiary should be named at the time a Member applies or enrolls under this Group Policy. A Member may name or later change a named beneficiary by sending a Written request to The Principal. A change will not be effective until recorded by The Principal. Once recorded, the change will apply as of the date ...",0.31451,80,-2.185757


In [553]:
# Return the top 3 results after reranking

top_3_rerank_1 = results_df_1.sort_values(by='Reranked_scores', ascending=False)
top_3_rerank_1[:3]

Unnamed: 0,Metadatas_1,Documents_1,Distances_1,IDs,Reranked_scores
4,"{'Chunk_id': 111, 'Page_No.': 'Page 61', 'Section': 'Section D - Claim Pr'}","Section D - Claim Procedures Article 1 - Notice of Claim Written notice must be sent to The Principal by or for a Member or Dependent who wishes to file claim for benefits under this Group Policy. This notice must be sent within 20 days after the date of the loss for which claim is being made. Failure to give notice within the time specified will not invalidate or reduce any claim if notice is given as soon as reasonably possible. Article 2 - Claim Forms The Principal, when it receives notice of claim, will provide appropriate claim forms for filing proof of loss. If the forms are not provided within 15 days after The Principal receives notice, the person will be considered to have complied with the requirements of this Group Policy upon submitting, within the time specified below for ...",0.325197,110,-0.500461
2,"{'Chunk_id': 81, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}","M ember's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If a Member who was insured dies within the 31-day individual purchase period described in PART III, Section F, The Principal will pay his or her beneficiary the individual policy amount, if any, the Member had the right to purchase. No payment will be made before The Principal receives Written proof of the Member's death. Article 3 - Beneficiary A beneficiary should be named at the time a Member applies or enrolls under this Group Policy. A Member may name or later change a named beneficiary by sending a Written request to The Principal. A change will not be effective until recorded by The Principal. Once recorded, the change will apply as of the date ...",0.31451,80,-2.185757
0,"{'Chunk_id': 115, 'Page_No.': 'Page 62', 'Section': 'A claimant may reque'}","6 - Autopsy If payment for loss of life is claimed, The Principal may require an autopsy. The Principal will pay for any such autopsy. Article 7 - Legal Action Legal action to recover benefits under this Group Policy may not be started earlier than 90 days after required proof of loss has been filed and before the appeal procedures have been exhausted. Further, no legal action may be started later than three years after that proof is required to be filed. Article 8 - Time Limits Any time limits in this section will be adjusted as required by law. This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6018 Section D - Claim Procedures, Page 2",0.288088,114,-3.043668
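
To see how much reranking changed the shortlist for the first query, the chunk IDs in the two top-3 tables can be compared directly. This is a small sketch with the IDs copied from the outputs above; `compare_top_k` is a hypothetical helper name, not something defined elsewhere in the notebook.

```python
from typing import Dict, List


def compare_top_k(before: List[int], after: List[int]) -> Dict[str, List[int]]:
    """Summarise which IDs were kept, dropped, or added by reranking."""
    return {
        "kept": [i for i in after if i in before],
        "dropped": [i for i in before if i not in after],
        "added": [i for i in after if i not in before],
    }


# 'IDs' column of the top 3 rows, in rank order
semantic_ids_1 = [114, 108, 80]   # sorted by Distances_1
reranked_ids_1 = [110, 80, 114]   # sorted by Reranked_scores

print(compare_top_k(semantic_ids_1, reranked_ids_1))
```

For this query, two of the three chunks survive reranking, and the chunk that opens Section D - Claim Procedures moves into first place.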


In [554]:
top_3_RAG_1 = top_3_rerank_1[["Documents_1", "Metadatas_1"]][:3]

In [555]:
query_1

"What is the procedure for filing a claim in the event of the policyholder's death?"

In [556]:
top_3_RAG_1

Unnamed: 0,Documents_1,Metadatas_1
4,"Section D - Claim Procedures Article 1 - Notice of Claim Written notice must be sent to The Principal by or for a Member or Dependent who wishes to file claim for benefits under this Group Policy. This notice must be sent within 20 days after the date of the loss for which claim is being made. Failure to give notice within the time specified will not invalidate or reduce any claim if notice is given as soon as reasonably possible. Article 2 - Claim Forms The Principal, when it receives notice of claim, will provide appropriate claim forms for filing proof of loss. If the forms are not provided within 15 days after The Principal receives notice, the person will be considered to have complied with the requirements of this Group Policy upon submitting, within the time specified below for ...","{'Chunk_id': 111, 'Page_No.': 'Page 61', 'Section': 'Section D - Claim Pr'}"
2,"M ember's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If a Member who was insured dies within the 31-day individual purchase period described in PART III, Section F, The Principal will pay his or her beneficiary the individual policy amount, if any, the Member had the right to purchase. No payment will be made before The Principal receives Written proof of the Member's death. Article 3 - Beneficiary A beneficiary should be named at the time a Member applies or enrolls under this Group Policy. A Member may name or later change a named beneficiary by sending a Written request to The Principal. A change will not be effective until recorded by The Principal. Once recorded, the change will apply as of the date ...","{'Chunk_id': 81, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}"
0,"6 - Autopsy If payment for loss of life is claimed, The Principal may require an autopsy. The Principal will pay for any such autopsy. Article 7 - Legal Action Legal action to recover benefits under this Group Policy may not be started earlier than 90 days after required proof of loss has been filed and before the appeal procedures have been exhausted. Further, no legal action may be started later than three years after that proof is required to be filed. Article 8 - Time Limits Any time limits in this section will be adjusted as required by law. This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6018 Section D - Claim Procedures, Page 2","{'Chunk_id': 115, 'Page_No.': 'Page 62', 'Section': 'A claimant may reque'}"


### For the second Query

In [557]:
# Input (query, response) pairs for each of the top responses received from the semantic search to the cross encoder
# Generate the cross_encoder scores for these pairs

cross_inputs_2 = [[query_2, response] for response in results_df_2['Documents_2']]
cross_rerank_scores_2 = cross_encoder.predict(cross_inputs_2)

In [558]:
cross_rerank_scores_2

array([ 1.6221391 ,  2.0967107 , -0.1461549 ,  1.0548372 , -1.2023337 ,
        0.46937525, -6.253815  ,  0.4261986 , -4.7700276 , -3.7148657 ],
      dtype=float32)
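
The same three steps (build the pairs, score them, sort by score) are repeated for every query. One way to fold them into a single helper is sketched below. `rerank` and `score_fn` are hypothetical names, and `score_fn` stands in for a callable such as `cross_encoder.predict`. The toy scorer is there only so that the sketch runs without loading a model.

```python
from typing import Callable, List, Sequence, Tuple


def rerank(
    query: str,
    documents: Sequence[str],
    score_fn: Callable[[List[List[str]]], Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (original_position, score) for the top_k highest-scoring documents."""
    # Same pair layout as cross_inputs_2: one [query, document] pair per document
    pairs = [[query, doc] for doc in documents]
    scores = [float(s) for s in score_fn(pairs)]
    # A higher cross-encoder score means more relevant, so sort in descending order
    ranked = sorted(enumerate(scores), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


def toy_score_fn(pairs: List[List[str]]) -> List[float]:
    """Hypothetical stand-in scorer: counts query words that appear in the document."""
    return [
        float(sum(word in doc.lower() for word in query.lower().split()))
        for query, doc in pairs
    ]


docs = [
    "Premiums are due monthly.",
    "The policy will terminate at the end of the grace period.",
    "Terminating the policy requires written notice.",
]
top = rerank("terminating the policy", docs, toy_score_fn, top_k=2)
print(top)
```

With the notebook's objects, a call along the lines of `rerank(query_2, list(results_df_2['Documents_2']), cross_encoder.predict)` should return the same ordering as sorting on `Reranked_scores`.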

In [559]:
# Store the rerank_scores in results_df

results_df_2['Reranked_scores'] = cross_rerank_scores_2

In [560]:
# Return the top 3 results from semantic search

top_3_semantic_2 = results_df_2.sort_values(by='Distances_2')
top_3_semantic_2[:3]

Unnamed: 0,Metadatas_2,Documents_2,Distances_2,IDs,Reranked_scores
0,"{'Chunk_id': 36, 'Page_No.': 'Page 24', 'Section': 'T he Principal may t'}","T he Principal may terminate the Policyholder's coverage on any premium due date if the Policyholder relocates to a state where this Group Policy is not marketed, by giving the Policyholder 31 days advanced notice in Writing. Article 4 - Policyholder Responsibility to Members If this Group Policy terminates for any reason, the Policyholder must: a. notify each Member of the effective date of the termination; and b. refund or otherwise account to each Member all contributions received or withheld from Members for premiums not actually paid to The Principal. This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6005 Section C - Policy Termination, Page 2",0.256959,35,1.622139
1,"{'Chunk_id': 34, 'Page_No.': 'Page 23', 'Section': 'Section C - Policy T'}",Section C - Policy Termination Article 1 - Failure to Pay Premium This Group Policy will terminate at the end of the Grace Period if total premium due has not been received by The Principal before the end of the Grace Period. Failure by the Policyholder to pay the premium within the Grace Period will be deemed notice by the Policyholder to The Principal to discontinue this Group Policy at the end of the Grace Period. Article 2 - Termination Rights of the Policyholder The Policyholder may terminate this Group Policy effective on the day before any premium due date by giving Written notice to The Principal prior to that premium due date. The Policyholder's issuance of a stop-payment order for any amounts used to pay premiums for the Policyholder's coverage will be considered Written noti...,0.282504,33,2.096711
2,"{'Chunk_id': 59, 'Page_No.': 'Page 35', 'Section': 'Section C - Individu'}","Section C - Individual Terminations Article 1 - Member Life Insurance A Member's insurance under this Group Policy will terminate on the earliest of: a. the date this Group Policy is terminated; or b. the date the last premium is paid for the Member's insurance; or c. any date desired, if requested by the Member before that date; or d. the date the Member ceases to be a Member as defined in PART I; or e. the date the Member ceases to be in a class for which Member Life Insurance is provided; or f. the date the Member retires; or g. the date the Member ceases Active Work. Article 2 - Member Accidental Death and Dismemberment Insurance A Member's Accidental Death and Dismemberment Insurance under this Group Policy will terminate on the earliest of: a. the date his or her Member Life Insu...",0.301435,58,-0.146155


In [561]:
# Return the top 3 results after reranking

top_3_rerank_2 = results_df_2.sort_values(by='Reranked_scores', ascending=False)
top_3_rerank_2[:3]

Unnamed: 0,Metadatas_2,Documents_2,Distances_2,IDs,Reranked_scores
1,"{'Chunk_id': 34, 'Page_No.': 'Page 23', 'Section': 'Section C - Policy T'}",Section C - Policy Termination Article 1 - Failure to Pay Premium This Group Policy will terminate at the end of the Grace Period if total premium due has not been received by The Principal before the end of the Grace Period. Failure by the Policyholder to pay the premium within the Grace Period will be deemed notice by the Policyholder to The Principal to discontinue this Group Policy at the end of the Grace Period. Article 2 - Termination Rights of the Policyholder The Policyholder may terminate this Group Policy effective on the day before any premium due date by giving Written notice to The Principal prior to that premium due date. The Policyholder's issuance of a stop-payment order for any amounts used to pay premiums for the Policyholder's coverage will be considered Written noti...,0.282504,33,2.096711
0,"{'Chunk_id': 36, 'Page_No.': 'Page 24', 'Section': 'T he Principal may t'}","T he Principal may terminate the Policyholder's coverage on any premium due date if the Policyholder relocates to a state where this Group Policy is not marketed, by giving the Policyholder 31 days advanced notice in Writing. Article 4 - Policyholder Responsibility to Members If this Group Policy terminates for any reason, the Policyholder must: a. notify each Member of the effective date of the termination; and b. refund or otherwise account to each Member all contributions received or withheld from Members for premiums not actually paid to The Principal. This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6005 Section C - Policy Termination, Page 2",0.256959,35,1.622139
3,"{'Chunk_id': 73, 'Page_No.': 'Page 42', 'Section': 'Section F - Individu'}","Member Life Insurance terminates because this Group Policy terminates or is amended to exclude the Member's insurance class; or (3) the Member's Coverage During Disability as described in PART IV, Section A, ceases because Total Disability ends and he or she does not return to Active Work within 31 days; or (4) the Member's Accelerated Benefits Premium Waiver Period as described in PART IV, Section A, ceases and he or she does not qualify for Coverage During Disability . c. Application/Effective Date Notice of the individual purchase right must be given to the Member by the Policyholder before insurance under this Group Policy terminates, or as soon as reasonably possible thereafter. A Member must apply for individual purchase and the first premium for the individual policy must be pai...",0.301809,72,1.054837


In [562]:
top_3_RAG_2 = top_3_rerank_2[["Documents_2", "Metadatas_2"]][:3]

In [563]:
query_2

'What are the terms and conditions for terminating the policy?'

In [564]:
top_3_RAG_2

Unnamed: 0,Documents_2,Metadatas_2
1,Section C - Policy Termination Article 1 - Failure to Pay Premium This Group Policy will terminate at the end of the Grace Period if total premium due has not been received by The Principal before the end of the Grace Period. Failure by the Policyholder to pay the premium within the Grace Period will be deemed notice by the Policyholder to The Principal to discontinue this Group Policy at the end of the Grace Period. Article 2 - Termination Rights of the Policyholder The Policyholder may terminate this Group Policy effective on the day before any premium due date by giving Written notice to The Principal prior to that premium due date. The Policyholder's issuance of a stop-payment order for any amounts used to pay premiums for the Policyholder's coverage will be considered Written noti...,"{'Chunk_id': 34, 'Page_No.': 'Page 23', 'Section': 'Section C - Policy T'}"
0,"T he Principal may terminate the Policyholder's coverage on any premium due date if the Policyholder relocates to a state where this Group Policy is not marketed, by giving the Policyholder 31 days advanced notice in Writing. Article 4 - Policyholder Responsibility to Members If this Group Policy terminates for any reason, the Policyholder must: a. notify each Member of the effective date of the termination; and b. refund or otherwise account to each Member all contributions received or withheld from Members for premiums not actually paid to The Principal. This policy has been updated effective January 1, 2014 PART II - POLICY ADMINISTRATION GC 6005 Section C - Policy Termination, Page 2","{'Chunk_id': 36, 'Page_No.': 'Page 24', 'Section': 'T he Principal may t'}"
3,"Member Life Insurance terminates because this Group Policy terminates or is amended to exclude the Member's insurance class; or (3) the Member's Coverage During Disability as described in PART IV, Section A, ceases because Total Disability ends and he or she does not return to Active Work within 31 days; or (4) the Member's Accelerated Benefits Premium Waiver Period as described in PART IV, Section A, ceases and he or she does not qualify for Coverage During Disability . c. Application/Effective Date Notice of the individual purchase right must be given to the Member by the Policyholder before insurance under this Group Policy terminates, or as soon as reasonably possible thereafter. A Member must apply for individual purchase and the first premium for the individual policy must be pai...","{'Chunk_id': 73, 'Page_No.': 'Page 42', 'Section': 'Section F - Individu'}"


### For the third Query

In [565]:
# Input (query, response) pairs for each of the top responses received from the semantic search to the cross encoder
# Generate the cross_encoder scores for these pairs

cross_inputs_3 = [[query_3, response] for response in results_df_3['Documents_3']]
cross_rerank_scores_3 = cross_encoder.predict(cross_inputs_3)

In [566]:
cross_rerank_scores_3

array([ -2.2308857,  -2.7305987,  -4.4455004,  -6.4399176,  -3.5762393,
        -2.1695697, -10.131089 ,  -8.721973 ,  -2.7073295,  -8.6188135],
      dtype=float32)
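
All ten scores for this query are negative. Assuming the cross-encoder returns unbounded logits (an assumption that depends on which model was loaded), a logistic sigmoid maps them into the range 0 to 1 without changing their order, which can make weak matches easier to spot. The sketch below uses the scores printed above; `sigmoid` is defined here and is not a notebook function.

```python
import math

# Scores copied from cross_rerank_scores_3 above
scores_3 = [-2.2308857, -2.7305987, -4.4455004, -6.4399176, -3.5762393,
            -2.1695697, -10.131089, -8.721973, -2.7073295, -8.6188135]


def sigmoid(x: float) -> float:
    """Logistic function: maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


squashed = [sigmoid(s) for s in scores_3]

# The mapping is monotonic, so the ranking is unchanged
order_raw = sorted(range(len(scores_3)), key=lambda i: scores_3[i], reverse=True)
order_squashed = sorted(range(len(squashed)), key=lambda i: squashed[i], reverse=True)

print(order_raw[:3])
print([round(p, 3) for p in squashed[:3]])
```

Every squashed value falls below 0.5, which under this assumption hints that none of the retrieved chunks is a strong match for the query.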

In [567]:
# Store the rerank_scores in results_df

results_df_3['Reranked_scores'] = cross_rerank_scores_3

In [568]:
# Return the top 3 results from semantic search

top_3_semantic_3 = results_df_3.sort_values(by='Distances_3')
top_3_semantic_3[:3]

Unnamed: 0,Metadatas_3,Documents_3,Distances_3,IDs,Reranked_scores
0,"{'Chunk_id': 82, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}","reinstated, his or her beneficiary will be as recorded on the date of termination. If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue, the beneficiary named in such replaced insurance and recorded by the Policyholder or The Principal will be the beneficiary under this Group Policy until a new beneficiary is named. Article 4 - Facility of Payment If any of the below occur, benefits will be paid as stated. All such payments will discharge The Principal to the full extent of those payments. a. If a beneficiary is found guilty of the Member's death, such beneficiary may be disqualified from receiving any benefit due. Payment may then be made to any contingent beneficiary or to th...",0.298785,81,-2.230886
1,"{'Chunk_id': 81, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}","M ember's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If a Member who was insured dies within the 31-day individual purchase period described in PART III, Section F, The Principal will pay his or her beneficiary the individual policy amount, if any, the Member had the right to purchase. No payment will be made before The Principal receives Written proof of the Member's death. Article 3 - Beneficiary A beneficiary should be named at the time a Member applies or enrolls under this Group Policy. A Member may name or later change a named beneficiary by sending a Written request to The Principal. A change will not be effective until recorded by The Principal. Once recorded, the change will apply as of the date ...",0.319675,80,-2.730599
2,"{'Chunk_id': 80, 'Page_No.': 'Page 46', 'Section': 'PART IV - BENEFITS'}","2 - Death Benefits Payable If a Member dies while insured for Member Life Insurance under this Group Policy, The Principal will pay his or her beneficiary the Scheduled Benefit (or approved amount, if applicable) in force on the date of death, less any Accelerated Benefit payment as described in PART IV, Section A, Article 7. However, if a beneficiary is suspected or charged with the This policy has been updated effective January 1, 2014 PART IV - BENEFITS GC 6013 Section A - Member Life Insurance, Page 1",0.337434,79,-4.4455


In [569]:
# Return the top 3 results after reranking

top_3_rerank_3 = results_df_3.sort_values(by='Reranked_scores', ascending=False)
top_3_rerank_3[:3]

Unnamed: 0,Metadatas_3,Documents_3,Distances_3,IDs,Reranked_scores
5,"{'Chunk_id': 109, 'Page_No.': 'Page 60', 'Section': 'I f a Dependent who '}","I f a Dependent who was insured dies during the 31-day individual purchase period described in PART III, Section F, Article 2, The Principal will pay the individual policy amount, if any, the Dependent had the right to purchase. Unless a Beneficiary has been designated, payment will be to the Member if he or she survives the Dependent. If the Member does not survive the Dependent and a beneficiary for Dependent Life has not been named, payment will be to the person named as beneficiary for Member Life Insurance. However, if the Member is suspected or charged with the Dependent's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If the Member is found guilty of the Dependent's death, he or she may be disqualified...",0.34319,108,-2.16957
0,"{'Chunk_id': 82, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}","reinstated, his or her beneficiary will be as recorded on the date of termination. If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue, the beneficiary named in such replaced insurance and recorded by the Policyholder or The Principal will be the beneficiary under this Group Policy until a new beneficiary is named. Article 4 - Facility of Payment If any of the below occur, benefits will be paid as stated. All such payments will discharge The Principal to the full extent of those payments. a. If a beneficiary is found guilty of the Member's death, such beneficiary may be disqualified from receiving any benefit due. Payment may then be made to any contingent beneficiary or to th...",0.298785,81,-2.230886
8,"{'Chunk_id': 25, 'Page_No.': 'Page 18', 'Section': 'c . a copy of the fo'}","c . a copy of the form which contains the statement is given to the insured or the insured's beneficiary at the time insurance is contested. However, these provisions will not preclude the assertion at any time of defenses based upon the person's ineligibility for insurance under this Group Policy or upon the provisions of this Group Policy. In addition, if an individual's age is misstated, The Principal may at any time adjust premium and benefits to reflect the correct age. Article 6 - Information to be Furnished The Policyholder must, upon request, give The Principal all information needed to administer this Group Policy. If a clerical error is found in this information, The Principal may at any time adjust premium to reflect the facts. An error will not invalidate insurance that wou...",0.347124,24,-2.70733


In [570]:
top_3_RAG_3 = top_3_rerank_3[["Documents_3", "Metadatas_3"]][:3]

In [571]:
query_3

'How is the beneficiary of the policy defined and what are their rights?'

In [572]:
top_3_RAG_3

Unnamed: 0,Documents_3,Metadatas_3
5,"I f a Dependent who was insured dies during the 31-day individual purchase period described in PART III, Section F, Article 2, The Principal will pay the individual policy amount, if any, the Dependent had the right to purchase. Unless a Beneficiary has been designated, payment will be to the Member if he or she survives the Dependent. If the Member does not survive the Dependent and a beneficiary for Dependent Life has not been named, payment will be to the person named as beneficiary for Member Life Insurance. However, if the Member is suspected or charged with the Dependent's death, the Death Benefits Payable may be withheld until additional information has been received or the trial has been held. If the Member is found guilty of the Dependent's death, he or she may be disqualified...","{'Chunk_id': 109, 'Page_No.': 'Page 60', 'Section': 'I f a Dependent who '}"
0,"reinstated, his or her beneficiary will be as recorded on the date of termination. If a Member is insured under this Group Policy on its Date of Issue and this insurance replaces insurance in force on the day immediately before the Date of Issue, the beneficiary named in such replaced insurance and recorded by the Policyholder or The Principal will be the beneficiary under this Group Policy until a new beneficiary is named. Article 4 - Facility of Payment If any of the below occur, benefits will be paid as stated. All such payments will discharge The Principal to the full extent of those payments. a. If a beneficiary is found guilty of the Member's death, such beneficiary may be disqualified from receiving any benefit due. Payment may then be made to any contingent beneficiary or to th...","{'Chunk_id': 82, 'Page_No.': 'Page 47', 'Section': 'M ember's death, the'}"
8,"c . a copy of the form which contains the statement is given to the insured or the insured's beneficiary at the time insurance is contested. However, these provisions will not preclude the assertion at any time of defenses based upon the person's ineligibility for insurance under this Group Policy or upon the provisions of this Group Policy. In addition, if an individual's age is misstated, The Principal may at any time adjust premium and benefits to reflect the correct age. Article 6 - Information to be Furnished The Policyholder must, upon request, give The Principal all information needed to administer this Group Policy. If a clerical error is found in this information, The Principal may at any time adjust premium to reflect the facts. An error will not invalidate insurance that wou...","{'Chunk_id': 25, 'Page_No.': 'Page 18', 'Section': 'c . a copy of the fo'}"


## 6. Retrieval Augmented Generation


In [580]:
# Define the function to generate the response. Provide a comprehensive prompt that passes the user query and the top 3 results to the model

def generate_response(query, top_3_RAG):
    """
    Generate an answer with OpenAI's gpt-3.5-turbo chat model from the user query
    and the top 3 reranked search results (a dataframe of chunk text and metadata).
    Returns the answer as a list of lines.
    """
    messages = [
        {"role": "system", "content": "You are a helpful assistant in the insurance domain who can effectively answer user queries about insurance policies and documents."},
        {"role": "user", "content": f"""
            You are a helpful assistant in the insurance domain who can effectively answer user queries about insurance policies and documents.
            The user's question is '{query}', and the search results retrieved from the policy document are in the dataframe '{top_3_RAG}'. Each search result is a chunk of text from the policy document that may be relevant to the user query.

            The column whose name starts with 'Documents' contains the actual text from the policy document, and the column whose name starts with 'Metadatas' contains the chunk ID, source page number and section label. The text inside the document may also contain tables in the format of a list of lists where each of the nested lists indicates a row.

            Use the documents in '{top_3_RAG}' to answer the query '{query}'. Frame an informative answer and use the dataframe to return the relevant page numbers (and policy names, where available) as citations.

            Follow the guidelines below when performing the task:
            1. Try to provide relevant/accurate numbers if available.
            2. You don’t have to necessarily use all the information in the dataframe. Only choose information that is relevant.
            3. If the document text has tables with relevant information, please reformat the table and return the final information in a tabular format.
            4. Use the 'Metadatas' column in the dataframe to retrieve and cite the page number(s), and the policy name(s) where available, as citations.
            5. If you can't provide the complete answer, please also provide any information that will help the user to search specific sections in the relevant cited documents.
            6. You are a customer-facing assistant, so do not provide any information on internal workings, just answer the query directly.

            The generated response should answer the query directly addressing the user and avoiding additional information. If you think that the query is not relevant to the document, reply that the query is irrelevant. Provide the final response as a well-formatted and easily readable text along with the citation. Provide your complete response first with all information, and then provide the citations.

            ### Few-Shot Examples

            ### Example 1: Basic Query about Coverage
            **Query:**  
            What does the policy say about coverage for accidental death?

            **Top 3 RAG Results:**  
            - **Document 1:** "This policy provides coverage for accidental death. The insured amount for accidental death is 200% of the base coverage amount if the death occurs within 90 days of the accident..."
            - **Document 2:** "Accidental death benefits are payable under this policy if the insured dies as a result of an accident. The benefit amount equals double the coverage amount, provided the death is a direct result of the accident and occurs within a specified time frame..."
            - **Document 3:** "In the event of accidental death, the policy pays an additional benefit, which is equal to twice the original coverage amount. This benefit is contingent on the death occurring within 180 days from the date of the accident..."

            **Response:**  
            The policy provides coverage for accidental death, where the benefit amount is typically 200% of the base coverage. The death must occur as a direct result of an accident and within a specified period, which ranges from 90 to 180 days depending on the policy.  
            **Citations:**  
            Document 1: Policy X, Page 5  
            Document 2: Policy Y, Page 12  
            Document 3: Policy Z, Page 7

            ### Example 2: Query about Exclusions
            **Query:**  
            Are there any exclusions for pre-existing conditions in this policy?

            **Top 3 RAG Results:**  
            - **Document 1:** "This policy excludes coverage for any conditions that were diagnosed or treated within 12 months prior to the policy's start date. However, if the condition remains stable for 24 months after the policy's start date, it may be eligible for coverage..."
            - **Document 2:** "Pre-existing conditions are generally not covered under this policy unless explicitly stated otherwise. Any condition that has shown symptoms or required medical attention in the 12 months before the policy start date is excluded..."
            - **Document 3:** "Exclusions apply to pre-existing conditions if they were present within a 12-month window before the policy commencement. After a waiting period of 24 months, these conditions may be reconsidered for coverage..."

            **Response:**  
            The policy excludes coverage for pre-existing conditions that were diagnosed or treated within 12 months before the policy start date. However, if the condition remains stable and does not require treatment for 24 months after the policy start date, it may be eligible for coverage.  
            **Citations:**  
            Document 1: Policy X, Page 8  
            Document 2: Policy Y, Page 15  
            Document 3: Policy Z, Page 10

            ### Example 3: Query about Beneficiaries
            **Query:**  
            How can I update the beneficiary for my life insurance policy?

            **Top 3 RAG Results:**  
            - **Document 1:** "To update the beneficiary, the policyholder must submit a written request to the insurance company. The request should include the policy number, current beneficiary, and the new beneficiary's details..."
            - **Document 2:** "Beneficiary changes can be made by filling out the 'Beneficiary Change Form,' which must be signed by the policyholder and submitted to the insurance provider. The change takes effect upon the company's receipt and acceptance of the form..."
            - **Document 3:** "The policyholder has the right to change the beneficiary at any time by providing written notice to the insurer. The change will be recorded and acknowledged by the insurer upon receipt..."

            **Response:**  
            To update the beneficiary of your life insurance policy, you need to submit a written request or complete a 'Beneficiary Change Form' provided by your insurer. This request must be signed by you as the policyholder, and the change will take effect once the insurance company has received and acknowledged it.  
            **Citations:**  
            Document 1: Policy X, Page 3  
            Document 2: Policy Y, Page 6  
            Document 3: Policy Z, Page 4

            ### Example 4: Complex Query about Premium Payments
            **Query:**  
            What are the options if I miss a premium payment?

            **Top 3 RAG Results:**  
            - **Document 1:** "If a premium payment is missed, the policyholder typically has a 30-day grace period to make the payment without penalty. If payment is not received within this period, the policy may lapse, and coverage could be lost..."
            - **Document 2:** "The policy includes a 30-day grace period for missed payments. During this time, coverage remains in force. If payment is not made by the end of the grace period, the policyholder may request reinstatement of the policy, subject to underwriting..."
            - **Document 3:** "In the event of a missed premium, a 30-day grace period is granted. If the premium is not paid within this period, the policyholder may choose to reinstate the policy, which may require proof of insurability and payment of overdue premiums..."

            **Response:**  
            If you miss a premium payment, your policy provides a 30-day grace period during which you can make the payment without losing coverage. If the payment is not made within this period, the policy may lapse. However, you may have the option to reinstate the policy by providing proof of insurability and paying the overdue premiums.  
            **Citations:**  
            Document 1: Policy X, Page 10  
            Document 2: Policy Y, Page 11  
            Document 3: Policy Z, Page 9
        """},
    ]

    # Send the system and user messages to the chat completions endpoint
    response = openai.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages
    )

    return response.choices[0].message.content.split('\n')
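
Because `generate_response` builds the prompt and calls the API in a single step, the prompt is hard to inspect without making a request. Below is a minimal sketch of one way to separate the two. It assumes the retrieved results can be passed as a list of `(text, page)` tuples. The helper name `build_messages`, that input shape, and the shortened system text are all hypothetical and stand in for the notebook's actual prompt and retrieval output.

```python
def build_messages(query, results):
    """Assemble chat messages from a query and retrieved chunks.

    `results` is assumed to be a list of (text, page) tuples; the notebook's
    real retrieval output may be shaped differently.
    """
    # Number the retrieved chunks the same way the few-shot examples do
    context = "\n".join(
        f'- **Document {i}:** "{text}" (Page {page})'
        for i, (text, page) in enumerate(results, start=1)
    )
    user_content = (
        f"**Query:**\n{query}\n\n"
        f"**Top {len(results)} RAG Results:**\n{context}"
    )
    return [
        # Placeholder system text, not the notebook's full prompt
        {"role": "system", "content": "You answer questions about an insurance policy."},
        {"role": "user", "content": user_content},
    ]


messages = build_messages(
    "What is the grace period?",
    [("A 30-day grace period is granted.", 9)],
)
print(messages[1]["content"])
```

With the messages built separately, the prompt can be printed and reviewed before it is passed to `openai.chat.completions.create`.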


In [581]:
query_1

"What is the procedure for filing a claim in the event of the policyholder's death?"

In [582]:
# Generate the response - For Query 1

response = generate_response(query_1, top_3_RAG_1)
print("Query 1: ","\n",query_1,"\n_________________________________________________________________________________________________________________\n_________________________________________________________________________________________________________________\n")
# Print the response
print("\n".join(response))

Query 1:  
 What is the procedure for filing a claim in the event of the policyholder's death? 
_________________________________________________________________________________________________________________
_________________________________________________________________________________________________________________

**Response:**  
In the event of the policyholder's death, the procedure for filing a claim involves the following steps as per the insurance document excerpts provided:
1. **Notice of Claim:** Written notice must be sent to The Principal by or for a Member or Dependent within 20 days of the loss. However, failure to give notice within the specified time will not invalidate or reduce any claim if notice is given as soon as reasonably possible.
2. **Claim Forms:** The Principal will provide appropriate claim forms for filing proof of loss upon receiving the notice of claim. If the forms are not provided within 15 days, the person will be considered to have complied wit
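
The same print pattern is repeated for each of the three queries. A small helper could hold it in one place. The sketch below is only an illustration: `format_query_response` is a hypothetical name, the separator width is arbitrary, and the layout is close to, but not identical to, the output shown above.

```python
def format_query_response(label, query, response_lines, width=80):
    """Return a query, a two-line separator, and the response as one string."""
    separator = "_" * width
    header = f"{label}:\n{query}\n{separator}\n{separator}\n"
    return header + "\n" + "\n".join(response_lines)


# Example with stand-in values; in the notebook these would be
# query_1 and the list returned by generate_response
print(format_query_response("Query 1", "What is covered?", ["**Response:**", "Example answer."]))
```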

#### Query 2

In [583]:
query_2

'What are the terms and conditions for terminating the policy?'

In [584]:
# Generate the response - For Query 2

response = generate_response(query_2, top_3_RAG_2)
print("Query 2: ","\n",query_2,"\n_________________________________________________________________________________________________________________\n_________________________________________________________________________________________________________________\n")
# Print the response
print("\n".join(response))

Query 2:  
 What are the terms and conditions for terminating the policy? 
_________________________________________________________________________________________________________________
_________________________________________________________________________________________________________________

**Response:**  
The policy outlines specific terms and conditions for terminating it. Here are the termination terms based on the information found in the provided documents:

1. **Policy Termination Due to Failure to Pay Premiums:**
   - The Group Policy will terminate at the end of the Grace Period if the total premium due is not received by The Principal before the end of the Grace Period. Failure to pay the premium within this Grace Period will be considered notice by the Policyholder to discontinue the Group Policy.
   - **Citation:** Section C - Policy Termination, Page 23

2. **Policyholder Termination Rights:**
   - The Policyholder may terminate the Group Policy effective on the
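
The model does not always place citations the same way: the few-shot examples use a trailing `**Citations:**` block, while the response above attaches a `**Citation:**` bullet to each point. Since `generate_response` returns a list of lines, the citations could be collected in a post-processing step. The sketch below assumes only those two layouts; `extract_citations` is a hypothetical helper and is not part of the notebook.

```python
import re

# Matches "**Citation:**" or "**Citations:**", optionally preceded by a bullet
CITATION_RE = re.compile(r"^(?:-\s*)?\*\*Citations?:\*\*\s*(.*)$")


def extract_citations(response_lines):
    """Collect citation text from inline bullets and trailing citation blocks."""
    citations = []
    in_block = False
    for line in response_lines:
        stripped = line.strip()
        match = CITATION_RE.match(stripped)
        if match:
            if match.group(1):
                # Inline form: the citation follows the marker on the same line
                citations.append(match.group(1))
                in_block = False
            else:
                # Block form: citations follow on the next non-empty lines
                in_block = True
        elif in_block:
            if stripped:
                citations.append(stripped)
            else:
                in_block = False
    return citations


sample = [
    "1. **Policy Termination Due to Failure to Pay Premiums:**",
    "   - **Citation:** Section C - Policy Termination, Page 23",
]
print(extract_citations(sample))
```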

#### Query 3

In [586]:
query_3

'How is the beneficiary of the policy defined and what are their rights?'

In [587]:
# Generate the response - For Query 3

response = generate_response(query_3, top_3_RAG_3)
print("Query 3: ","\n",query_3,"\n_________________________________________________________________________________________________________________\n_________________________________________________________________________________________________________________\n")
# Print the response
print("\n".join(response))

Query 3:  
 How is the beneficiary of the policy defined and what are their rights? 
_________________________________________________________________________________________________________________
_________________________________________________________________________________________________________________

The beneficiary of the policy is defined as the individual who will receive the benefits or proceeds from the insurance policy upon the insured individual's death. In the absence of a designated beneficiary, the payment will typically be made to the surviving Member. If the Member does not survive, and there is no specified beneficiary for the Dependent Life, the payment will then be made to the individual named as the beneficiary for Member Life Insurance.

The rights of the beneficiary include the entitlement to claim the benefits stipulated in the policy upon the insured's death. However, it is essential to note that if the beneficiary is found guilty of the Member's death, 