# Generative Search System (RAG) Case Study Notebook

## Problem Statement

Insurance policy documents contain vast amounts of information, and manually searching for relevant details is time-consuming and inefficient. Traditional keyword-based search systems often fail to provide contextually relevant answers.

**Mr.HelpMate AI** aims to address these challenges by leveraging **AI-powered Retrieval-Augmented Generation (RAG)** to create an intelligent, conversational search assistant capable of:

- Accurately extracting relevant policy details  
- Understanding natural language queries  
- Generating concise and contextually relevant answers  

This project will implement and experiment with various strategies to optimize retrieval and generation quality, ultimately improving user experience in navigating policy documents.

---

## 1. Overall System Design & Innovation

This project implements a **Retrieval-Augmented Generation (RAG)** system to create a robust generative search engine capable of effectively answering questions from a single, long **Group Member Life Insurance Policy** document.

The system is designed around a **three-layer architecture**:  
**Embedding**, **Search**, and **Generation**. Innovation focuses on optimizing performance and output quality within each layer through deliberate model and algorithm choices, including the mandatory **Cache** and **Re-ranking** blocks.

## Search System's Architecture, Workflow, and Implementation

### 1. Embedding Layer (Indexing)
- Extracts the policy text with `pdfplumber` (keeping tables intact), splits it with a custom fixed-size character splitter (512-character chunks with a 128-character overlap), and converts the chunks into vectors with the lightweight `all-MiniLM-L6-v2` model from `sentence-transformers`.

### 2. Search Layer (Retrieval)
- Uses a persistent ChromaDB vector store for semantic search. A query cache is checked before each search, and a mandatory Cross-Encoder re-ranker (from `sentence-transformers`) re-scores the retrieved candidates so that only the most relevant chunks are passed to the LLM.

### 3. Generation Layer (Synthesis)
- An Exhaustive Prompt is engineered to guide the OpenAI LLM (e.g., `gpt-3.5-turbo`) to provide precise, contextual, and, most importantly, cited answers directly from the policy excerpts.
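
The three layers compose into one query path: broad vector recall, precise re-ranking, then grounded generation. The sketch below shows only that data flow under simplifying assumptions; `answer_query` and the `toy_*` stand-ins are hypothetical names rather than functions from this notebook, and the word-overlap scorer is a crude placeholder for the cross-encoder.

```python
from typing import Callable, List

# Hypothetical signatures for the three layers (not the notebook's own functions)
Retriever = Callable[[str, int], List[str]]
Reranker = Callable[[str, List[str], int], List[str]]
Generator = Callable[[str, List[str]], str]


def answer_query(query: str, retrieve: Retriever, rerank: Reranker,
                 generate: Generator, top_k: int = 10, top_n: int = 3) -> str:
    """Pass one query through the search and generation layers."""
    candidates = retrieve(query, top_k)       # Search layer: broad vector recall
    best = rerank(query, candidates, top_n)   # Search layer: precise re-scoring
    return generate(query, best)              # Generation layer: grounded answer


# Toy stand-ins so the sketch runs without ChromaDB, a cross-encoder or an API key
corpus = ["termination chunk", "premium payment chunk", "seat belt chunk"]

def toy_retrieve(query: str, k: int) -> List[str]:
    return corpus[:k]

def toy_rerank(query: str, chunks: List[str], n: int) -> List[str]:
    # Word overlap with the query: a crude placeholder for a cross-encoder score
    words = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(words & set(c.split())), reverse=True)[:n]

def toy_generate(query: str, chunks: List[str]) -> str:
    return f"Based on: {chunks[0]}"

print(answer_query("what happens if premium payment is late",
                   toy_retrieve, toy_rerank, toy_generate))  # Based on: premium payment chunk
```

Keeping each layer behind a plain callable makes it easy to swap the embedding model, re-ranker or LLM independently while experimenting.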


## 2. Setup and Imports

This cell installs and imports all libraries needed for the **RAG pipeline**.  
The embedding and re-ranking models are **open-source** `sentence-transformers` models that run locally; only the generation layer calls the proprietary OpenAI API.


In [1]:
import os
os.chdir("/content/drive/MyDrive/Colab Notebooks/GenAI/02_Helpmate_project")
print("Current directory:", os.getcwd())


Current directory: /content/drive/MyDrive/Colab Notebooks/GenAI/02_Helpmate_project


In [2]:
# SETUP AND IMPORTS

!pip install -qqq pdfplumber tiktoken openai chromadb sentence-transformers

import os
import re
import json
import pandas as pd
from operator import itemgetter
from typing import List, Dict, Any

import pdfplumber
import tiktoken
from openai import OpenAI

from sentence_transformers import SentenceTransformer, CrossEncoder
import chromadb
from chromadb.utils import embedding_functions

# Prefer an environment variable (or a key management service) for the API key.
# For this demonstration, fall back to a key stored in a local file.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    with open("OpenAI_API_Key.txt", "r") as f:
        OPENAI_API_KEY = f.read().strip()

# --- System Configuration Constants ---

PDF_PATH = "Principal-Sample-Life-Insurance-Policy.pdf"
CHROMA_DB_PATH = "./rag_case_study_db"
COLLECTION_NAME = "insurance_policy_chunks"
CACHE_COLLECTION_NAME = "query_cache"
CACHE_THRESHOLD = 0.05       # Distance threshold for cache hit (lower = stricter match)
EMBEDDING_MODEL_NAME = "all-MiniLM-L6-v2"
RERANKER_MODEL_NAME = "cross-encoder/ms-marco-MiniLM-L-6-v2"
GENERATION_MODEL_NAME = "gpt-3.5-turbo" # Using OpenAI for generation layer
TOP_K_CHUNKS = 10            # Number of chunks to retrieve initially from ChromaDB
TOP_N_RERANKED = 3           # Number of top chunks passed to the LLM (for Generation)

# Initialize Clients
openai_client = OpenAI(api_key=OPENAI_API_KEY)

print("Setup complete. Libraries imported and configurations set.")

Setup complete. Libraries imported and configurations set.
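
`CACHE_THRESHOLD` is a distance, not a similarity. Assuming the collections use ChromaDB's default `l2` space (squared Euclidean distance) and that `all-MiniLM-L6-v2` returns unit-length vectors, the two are related by $\|a-b\|^2 = 2 - 2\cos(a,b)$, so a threshold of 0.05 corresponds to a cosine similarity of at least 0.975. The stdlib sketch below (helper names are illustrative) checks that identity numerically; if the collections were created with a different distance function (e.g. via the `hnsw:space` collection metadata), this mapping would not apply.

```python
import math

def squared_l2_to_cosine(distance: float) -> float:
    """Cosine similarity implied by a squared L2 distance between two unit vectors."""
    # For unit vectors a and b: ||a - b||^2 = 2 - 2 * cos(a, b)
    return 1.0 - distance / 2.0

def cosine_to_squared_l2(similarity: float) -> float:
    """Squared L2 distance between two unit vectors with the given cosine similarity."""
    return 2.0 - 2.0 * similarity

# Numerical check with two unit vectors 10 degrees apart
theta = math.radians(10)
a = (1.0, 0.0)
b = (math.cos(theta), math.sin(theta))
d = sum((x - y) ** 2 for x, y in zip(a, b))
print(squared_l2_to_cosine(d), math.cos(theta))  # equal up to floating-point rounding

# Cosine similarity implied by CACHE_THRESHOLD = 0.05 (about 0.975)
print(squared_l2_to_cosine(0.05))
```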


## 3. Embedding Layer: Processing and Optimal Chunking

This section extracts the policy text from the PDF and splits it into overlapping fixed-size chunks suitable for embedding.

### 3.1 PDF Text Extraction (Robust, Handles Tables)


In [3]:
# PDF TEXT EXTRACTION

def is_in_table(word: Dict, bboxes: List) -> bool:
    # True if the word's bounding box lies strictly inside any table bounding box
    x0, top, x1, bottom = word['x0'], word['top'], word['x1'], word['bottom']
    for t_x0, t_top, t_x1, t_bottom in bboxes:
        if x0 > t_x0 and top > t_top and x1 < t_x1 and bottom < t_bottom:
            return True
    return False


def extract_text_from_pdf_robust(pdf_path: str) -> List[Dict]:
    """Extract text page by page, keeping tables intact (as JSON) using pdfplumber."""
    full_text = []

    if not os.path.exists(pdf_path):
        print(f"ERROR: File not found at path: {pdf_path}. Please upload your policy document.")
        return []

    print(f"Extracting content from '{pdf_path}'...")
    with pdfplumber.open(pdf_path) as pdf:
        for p, page in enumerate(pdf.pages):
            page_no = f"Page {p+1}"

            # Find tables and serialize each one to a JSON string for better LLM context
            tables = page.find_tables()
            tables_content = [{'text': json.dumps(t.extract()), 'top': t.bbox[1]} for t in tables]
            table_bboxes = [t.bbox for t in tables]

            # Keep only words that are NOT inside a table (tables are added separately above)
            non_table_words = [
                {'text': word['text'], 'top': word['top']}
                for word in page.extract_words()
                if not is_in_table(word, table_bboxes)
            ]

            # Merge text and tables, then sort by vertical position to preserve reading order
            page_objects = non_table_words + tables_content
            page_objects.sort(key=itemgetter('top'))

            # Reconstruct the page text
            page_content = " ".join(obj['text'] for obj in page_objects)

            full_text.append({
                'Page_No.': page_no,
                'Page_Text': page_content.strip()
            })

    # Count the collected pages instead of touching `pdf` after the file is closed
    print(f"Extracted content from {len(full_text)} pages.")
    return full_text

# Execute extraction
extracted_data = extract_text_from_pdf_robust(PDF_PATH)
if extracted_data:
    text_df = pd.DataFrame(extracted_data)
else:
    text_df = None
    print("Cannot proceed: No data extracted.")

Extracting content from 'Principal-Sample-Life-Insurance-Policy.pdf'...
Extracted content from 64 pages.
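
The table filter above drops a word only when its box lies strictly inside a table's bounding box, so a word whose box touches the table border exactly is kept as body text and may appear twice (once in the table JSON, once as plain text). The standalone variant below uses inclusive comparisons instead. `word_in_any_table` is a hypothetical name, and the dictionaries and tuples only mimic the `x0`/`top`/`x1`/`bottom` convention that `pdfplumber` uses for words and table `bbox` values.

```python
from typing import Dict, Iterable, Tuple

BBox = Tuple[float, float, float, float]  # (x0, top, x1, bottom), as in pdfplumber

def word_in_any_table(word: Dict[str, float], bboxes: Iterable[BBox]) -> bool:
    """True if the word's box lies entirely inside one of the table boxes (edges included)."""
    for x0, top, x1, bottom in bboxes:
        if (word["x0"] >= x0 and word["top"] >= top
                and word["x1"] <= x1 and word["bottom"] <= bottom):
            return True
    return False

table_boxes = [(50.0, 100.0, 300.0, 200.0)]
inside = {"x0": 60.0, "top": 110.0, "x1": 120.0, "bottom": 122.0}
on_edge = {"x0": 50.0, "top": 100.0, "x1": 90.0, "bottom": 112.0}   # touches the table border
outside = {"x0": 60.0, "top": 210.0, "x1": 120.0, "bottom": 222.0}

print(word_in_any_table(inside, table_boxes),
      word_in_any_table(on_edge, table_boxes),
      word_in_any_table(outside, table_boxes))  # True True False
```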


In [4]:
# Let's view the data
text_df.head()

|   | Page_No. | Page_Text |
|--:|:---|:---|
| 0 | Page 1 | DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/... |
| 1 | Page 2 | This page left blank intentionally |
| 2 | Page 3 | POLICY RIDER GROUP INSURANCE POLICY NO: S655 C... |
| 3 | Page 4 | This page left blank intentionally |
| 4 | Page 5 | PRINCIPAL LIFE INSURANCE COMPANY (called The P... |


### 3.2 Applying Optimal Chunking Strategy


In [5]:
# OPTIMAL CHUNKING (Custom Fixed-Size Character Splitting)

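def fixed_size_chunker(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Split text into fixed-size character windows; consecutive windows share `overlap` characters."""
    # A non-positive stride (overlap >= chunk_size) would loop forever, so reject it up front
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= overlap < chunk_size.")

    chunks = []
    start = 0

    # Character length is the size metric
    while start < len(text):
        end = start + chunk_size
        chunk = text[start:end]
        chunks.append(chunk)

        # A short chunk means the end of the text has been reached
        if len(chunk) < chunk_size:
            break

        # Advance by the stride so the next chunk overlaps the current one
        start += (chunk_size - overlap)

    return chunks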

if text_df is not None and not text_df.empty:
    CHUNK_SIZE = 512
    CHUNK_OVERLAP = 128

    print(f"Applying Custom Fixed-Size Character Splitter (Size: {CHUNK_SIZE}, Overlap: {CHUNK_OVERLAP})...")

    chunks_data = []

    for index, row in text_df.iterrows():
        page_content = row['Page_Text']
        metadata = {'Page_No.': row['Page_No.'].replace('.', '')}

        # Split the document text using the custom function
        chunks = fixed_size_chunker(page_content, chunk_size=CHUNK_SIZE, overlap=CHUNK_OVERLAP)

        for i, chunk in enumerate(chunks):
            chunk_metadata = metadata.copy()
            chunk_metadata['Chunk_No.'] = i + 1

            chunks_data.append({
                'chunk_text': chunk,
                'metadata': chunk_metadata
            })

    chunks_df = pd.DataFrame(chunks_data)
    print(f"Original pages: {len(text_df)}, Total chunks created: {len(chunks_df)}")
else:
    chunks_df = None
    print("Skipping chunking as no data was extracted.")

Applying Custom Fixed-Size Character Splitter (Size: 512, Overlap: 128)...
Original pages: 64, Total chunks created: 278
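
With `CHUNK_SIZE = 512` and `CHUNK_OVERLAP = 128`, each window starts 384 characters after the previous one. The self-contained copy of the splitter below runs on a synthetic 1,000-character string (made up for illustration) to show the resulting chunk lengths and the shared 128-character overlap.

```python
from typing import List

def fixed_size_chunker(text: str, chunk_size: int, overlap: int) -> List[str]:
    # Same sliding-window logic as the cell above, including the parameter check
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("Require chunk_size > 0 and 0 <= overlap < chunk_size.")
    chunks, start = [], 0
    while start < len(text):
        chunk = text[start:start + chunk_size]
        chunks.append(chunk)
        if len(chunk) < chunk_size:      # last, partial window
            break
        start += chunk_size - overlap    # stride = 512 - 128 = 384 here
    return chunks

demo_page = "".join(chr(ord("a") + i % 26) for i in range(1000))  # synthetic 1,000-char page
demo_chunks = fixed_size_chunker(demo_page, chunk_size=512, overlap=128)
print([len(c) for c in demo_chunks])                # [512, 512, 232]
print(demo_chunks[0][-128:] == demo_chunks[1][:128])  # True: neighbours share 128 characters
```

One edge case follows from the stopping rule: a page of exactly 512 characters yields a second, 128-character chunk that is entirely overlap. Stopping as soon as `start + chunk_size >= len(text)` would avoid that redundant chunk.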


In [6]:
# Let's view the data after CHUNKING (first 5 chunks)
chunks_sample = chunks_df.head(5).copy()
chunks_sample['Chunk_Text_Snippet'] = chunks_sample['chunk_text'].str[:100].str.replace('\n', ' ') + "..."
# 'Page_No.' already holds text such as "Page 3", so no extra "Page" prefix is added
chunks_sample['Metadata'] = chunks_sample['metadata'].apply(lambda x: f"{x['Page_No.']}, Chunk {x['Chunk_No.']}")
print(chunks_sample[['Metadata', 'Chunk_Text_Snippet']].to_markdown(index=False))

| Metadata        | Chunk_Text_Snippet                                                                                      |
|:----------------|:--------------------------------------------------------------------------------------------------------|
| Page 1, Chunk 1 | DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/01/2014 711 HIGH STREET GEORGE RI 02903 GROUP POLICY F... |
| Page 2, Chunk 1 | This page left blank intentionally...                                                                   |
| Page 3, Chunk 1 | POLICY RIDER GROUP INSURANCE POLICY NO: S655 COVERAGE: Life EMPLOYER: RHODE ISLAND JOHN DOE Effectiv... |
| Page 3, Chunk 2 | rvices or any other value added service for the employees of that employer group. In addition, The P... |
| Page 3, Chunk 3 | hese goods, services and/or third party provider discounts, the third party service providers are li... |


### 3.3 Embedding and ChromaDB Storage

In [7]:
# EMBEDDING AND CHROMADB STORAGE

if chunks_df is not None and not chunks_df.empty:
    # 1. Appropriate Choice of Embedding Model
    embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(
        model_name=EMBEDDING_MODEL_NAME
    )
    print(f"Using Embedding Model: {EMBEDDING_MODEL_NAME}")

    # 2. Initialize Persistent ChromaDB Client and Collections
    client = chromadb.PersistentClient(path=CHROMA_DB_PATH)

    # Main Collection for Policy Chunks
    collection = client.get_or_create_collection(
        name=COLLECTION_NAME,
        embedding_function=embedding_function
    )

    # Cache Collection (for mandatory cache implementation)
    cache_collection = client.get_or_create_collection(
        name=CACHE_COLLECTION_NAME,
        embedding_function=embedding_function
    )

    # 3. Add chunks to the collection (only if not already loaded)
    ids = [f"chunk_{i}" for i in range(len(chunks_df))]
    documents = chunks_df['chunk_text'].tolist()
    metadatas = chunks_df['metadata'].tolist()

    if collection.count() != len(documents):
        # upsert inserts new IDs and overwrites existing ones, so a separate add() is redundant
        collection.upsert(documents=documents, metadatas=metadatas, ids=ids)
        print(f"ChromaDB loaded with {collection.count()} chunks into '{COLLECTION_NAME}'.")
    else:
        print(f"ChromaDB already contains {collection.count()} chunks. Skipping load.")
else:
    print("Skipping ChromaDB setup as chunking failed.")



Using Embedding Model: all-MiniLM-L6-v2
ChromaDB already contains 278 chunks. Skipping load.
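
The load step keys chunks by position (`chunk_{i}`) and reloads only when the stored count differs, so changing the chunking parameters can leave stale text in the persistent collection whenever the count happens to match. One alternative, sketched below with hypothetical helper names, derives each ID from the page, the chunk number and a hash of the chunk text, then diffs the desired IDs against the stored ones. With ChromaDB, the stored IDs could come from `collection.get()["ids"]`, stale ones could be removed with `collection.delete(ids=...)`, and new ones written with `collection.upsert(...)`.

```python
import hashlib
from typing import Dict, List

def make_chunk_id(page_no: str, chunk_no: int, text: str) -> str:
    """Hypothetical helper: a stable ID that changes whenever the chunk text changes."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    page = page_no.replace(" ", "_").lower()   # "Page 3" -> "page_3"
    return f"{page}_c{chunk_no}_{digest}"

def ids_to_add_and_delete(new_ids: List[str], stored_ids: List[str]) -> Dict[str, List[str]]:
    """Diff the desired IDs against the IDs the collection already holds."""
    new_set, old_set = set(new_ids), set(stored_ids)
    return {"add": sorted(new_set - old_set), "delete": sorted(old_set - new_set)}

a = make_chunk_id("Page 3", 1, "POLICY RIDER GROUP INSURANCE")
b = make_chunk_id("Page 3", 1, "POLICY RIDER GROUP INSURANCE")
c = make_chunk_id("Page 3", 1, "POLICY RIDER GROUP INSURANCE (revised)")
print(a == b, a == c)                    # True False: same text, same ID
print(ids_to_add_and_delete([a], [c]))   # add the new ID, delete the stale one
```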


In [8]:
# Let's view data after EMBEDDING/STORAGE ---
print(f"Total documents in '{COLLECTION_NAME}': {collection.count()}")
print("This confirms the chunk text has been converted to vectors and stored in ChromaDB.")
collection.peek()

Total documents in 'insurance_policy_chunks': 278
This confirms the chunk text has been converted to vectors and stored in ChromaDB.


{'ids': ['chunk_0',
  'chunk_1',
  'chunk_2',
  'chunk_3',
  'chunk_4',
  'chunk_5',
  'chunk_6',
  'chunk_7',
  'chunk_8',
  'chunk_9'],
 'embeddings': array([[-0.02592193,  0.04777753,  0.05585773, ..., -0.04932659,
         -0.05851149,  0.02355198],
        [ 0.02911896,  0.06057408,  0.04641531, ...,  0.05954009,
         -0.02838372,  0.00531935],
        [-0.06910008,  0.04697426,  0.00010474, ..., -0.04099483,
          0.02056715, -0.00662788],
        ...,
        [-0.05875206,  0.06800337,  0.06347963, ..., -0.03584532,
         -0.00298454, -0.02332482],
        [-0.0812709 ,  0.0445028 ,  0.05686409, ..., -0.10154654,
         -0.00629883,  0.04746101],
        [-0.02865543,  0.03401795,  0.01736274, ..., -0.0199494 ,
          0.05256363, -0.06476747]]),
 'documents': ['DOROTHEA GLAUSE S655 RHODE ISLAND JOHN DOE 01/01/2014 711 HIGH STREET GEORGE RI 02903 GROUP POLICY FOR: RHODE ISLAND JOHN DOE ALL MEMBERS Group Member Life Insurance Print Date: 07/16/2014',
  'This page l

In [9]:
cache_collection.peek()

{'ids': ['what_is_the_official_definition_of_a__dependent__under_this_group_member_life_insurance_policy_',
  'according_to_the_policy__what_is_the_effective_date_for_a_change_in_scheduled_benefit_amount_that_requires_proof_of_good_health_',
  'what_is_the_procedure_and_time_limit_for_a_claimant_to_appeal_a_claim_denial_decision_',
  'what_is_the_proof_of_adl_disability_or_total_disability_',
  'what_is_condition_of_death_while_not_wearing_seat_belt_',
  'what_event_marks_the_effective_date_for_the_policy_rider_described_in_the_document_',
  'according_to_the_policy__what_is_the_earliest_and_latest_time_limit_for_a_claimant_to_start_legal_action_to_recover_benefits_',
  'what_if_i_fail_to_pay_premium_'],
 'embeddings': array([[-0.07271644,  0.03535411, -0.03075045, ...,  0.005109  ,
          0.08578993, -0.01217845],
        [-0.03723065,  0.10137826,  0.03557216, ..., -0.01043161,
          0.01724429, -0.00549376],
        [-0.03440192,  0.10575985, -0.00418625, ...,  0.00963795,
  

## 4. Search Layer: Cache and Re-ranking

This section improves the quality of the search results by adding the mandatory query cache and a re-ranker built with the allowed `sentence-transformers` library.


In [10]:
# CACHE, SEARCH, AND RE-RANKING FUNCTIONS

def query_with_cache(query_text: str, k_chunks: int = TOP_K_CHUNKS):
    print(f"-> Querying DB with k={k_chunks}...")

    # 1. Mandatory Cache Check
    try:
        cache_results = cache_collection.query(
            query_texts=[query_text],
            n_results=1,
            include=['distances']
        )

        # Check for cache hit based on distance threshold (Selection and implementation of cache)
        if (cache_results and cache_results['distances'] and cache_results['distances'][0]
            and cache_results['distances'][0][0] <= CACHE_THRESHOLD):

            print(f"CACHE HIT. Query is highly similar to a past query. Distance: {cache_results['distances'][0][0]:.4f}")
            # Note: a hit is only logged here and the search below still runs; a full implementation would return the *cached answer* instead.
        else:
            print("Cache Miss/Irrelevant. Proceeding with search.")

    except Exception as e:
        print(f"   Cache check error (non-fatal): {e}")

    # 2. Query Main Collection
    results = collection.query(
        query_texts=[query_text],
        n_results=k_chunks,
        include=['documents', 'metadatas', 'distances']
    )

    # 3. Cache Update (store the current query's embedding for future checks).
    # upsert avoids duplicate-ID warnings when the same query is asked again.
    cache_collection.upsert(
        documents=[query_text],
        metadatas=[{'type': 'query'}],
        # Derive a readable ID from the query text, truncated to keep IDs short
        ids=[re.sub(r'[^a-z0-9]', '_', query_text.lower())[:128]]
    )

    return results

# Mandatory Re-ranking Model Setup (Selection and implementation of a re-ranker)
print(f"\nSetting up Mandatory Re-ranker: {RERANKER_MODEL_NAME}...")
# Use CrossEncoder directly from the allowed sentence_transformers library
reranker_model = CrossEncoder(RERANKER_MODEL_NAME)
print("Re-ranker model loaded successfully.")

def rerank_results(query: str, results: Dict[str, Any], top_n: int = TOP_N_RERANKED) -> List[Dict]:
    """Re-score the initial vector-search hits with the cross-encoder and keep the top_n."""
    initial_chunks = results['documents'][0]
    initial_metadatas = results['metadatas'][0]

    # 1. Create pairs of (query, chunk_text) for the cross-encoder
    cross_inputs = [[query, chunk] for chunk in initial_chunks]

    # 2. Get the relevance scores
    scores = reranker_model.predict(cross_inputs)

    # 3. Combine and sort by score
    scored_chunks = []
    for chunk, metadata, score in zip(initial_chunks, initial_metadatas, scores.tolist()):
        scored_chunks.append({
            'chunk_text': chunk,
            'metadata': metadata,
            'score': score
        })

    # Sort by relevance score (highest score is most relevant)
    reranked_chunks = sorted(scored_chunks, key=itemgetter('score'), reverse=True)

    # 4. Return the top N results for the LLM
    return reranked_chunks[:top_n]


Setting up Mandatory Re-ranker: cross-encoder/ms-marco-MiniLM-L-6-v2...
Re-ranker model loaded successfully.
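
As written, `query_with_cache` only logs a hit and still runs retrieval, so the cache cannot save any work. A minimal in-memory sketch of answer reuse follows: it stores each final answer next to the query embedding and returns it when a new query falls within the squared-L2 threshold. `SemanticAnswerCache`, `answer_with_cache` and `toy_embed` are hypothetical names; in the notebook the embedding would come from the same sentence-transformer, and the answer could be kept, for example, in the `metadatas` of `cache_collection`.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]

class SemanticAnswerCache:
    """In-memory sketch: reuse a stored answer when a new query embeds close to an old one."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.05):
        self.embed = embed            # e.g. the sentence-transformer used for the chunks
        self.threshold = threshold    # squared L2 distance, like CACHE_THRESHOLD
        self.entries: List[Tuple[Vector, str]] = []

    @staticmethod
    def _sq_l2(u: Vector, v: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(u, v))

    def lookup(self, query: str) -> Optional[str]:
        q = self.embed(query)
        best = min(self.entries, key=lambda e: self._sq_l2(q, e[0]), default=None)
        if best is not None and self._sq_l2(q, best[0]) <= self.threshold:
            return best[1]            # hit: retrieval, re-ranking and the LLM call are skipped
        return None

    def store(self, query: str, answer: str) -> None:
        self.entries.append((self.embed(query), answer))


def answer_with_cache(query: str, cache: SemanticAnswerCache,
                      run_pipeline: Callable[[str], str]) -> str:
    cached = cache.lookup(query)
    if cached is not None:
        return cached
    answer = run_pipeline(query)      # full search + generation on a miss
    cache.store(query, answer)
    return answer


# Toy unit-vector "embedding" keyed on one keyword, so the example runs offline
def toy_embed(text: str) -> Vector:
    angle = 0.0 if "premium" in text.lower() else math.pi / 2
    return [math.cos(angle), math.sin(angle)]

calls = []
def pipeline(q: str) -> str:
    calls.append(q)
    return f"answer for: {q}"

cache = SemanticAnswerCache(toy_embed)
first = answer_with_cache("What if I fail to pay premium?", cache, pipeline)
second = answer_with_cache("what if i fail to pay the premium", cache, pipeline)
print(first == second, len(calls))   # True 1: the second query reused the cached answer
```

A linear scan is enough for a handful of cached queries; a vector store, as in the notebook, scales better. The threshold is the key design choice: if it is too loose, a question about one benefit could receive the cached answer for a similar-sounding but different question.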


## 5. Generative Layer: Prompt Quality and Final Answers

This section implements the generation layer: an exhaustive prompt, sent through the `openai` client, that restricts the answer to the retrieved context and requires citations, to mitigate hallucination and keep answers traceable.


In [12]:
# GENERATION LAYER (Prompt Design and LIVE OpenAI LLM Call)


def generate_response(query: str, retrieved_chunks: List[Dict]) -> str:
    """Build the RAG prompt (context plus instructions), then generate the final answer with OpenAI."""

    # 1. Format the context with clear sources
    context_list = []
    for chunk in retrieved_chunks:
        # Citation tag such as "[Page 3, Chunk 2]"; 'Page_No.' already holds the text "Page 3"
        citation_source = f"{chunk['metadata']['Page_No.']}, Chunk {chunk['metadata']['Chunk_No.']}"
        context_list.append(f"[{citation_source}]: {chunk['chunk_text']}")

    context_text = "\n---\n".join(context_list)

    # 2. Design the Exhaustive Prompt
    system_prompt = f"""
    ROLE: You are an expert generative search system specializing in Group Member Life Insurance policy documents.
    TASK: Answer the user's question accurately and concisely, based *only* on the provided context.

    INSTRUCTIONS:
    1. **Strictly adhere to the context.** Do not use external knowledge. If the answer is not in the context, state: "The required information is not available in the provided policy excerpts."
    2. Answer the question directly and professionally.
    3. **MANDATORY CITATION:** After the final answer, include a "Citations:" section. For every fact you use, cite the corresponding source in the context.
    4. **Citation Format:** Cite each source using its tag exactly as it appears in the context, e.g. `[Page X, Chunk Y]`.

    POLICY EXCERPTS (CONTEXT):
    {context_text}
    """

    user_prompt = f"USER QUESTION:\n{query}"

    # 3. Generate the final response (LIVE OpenAI API CALL)
    print(f"   Calling LIVE OpenAI LLM ({GENERATION_MODEL_NAME})...")

    try:
        response = openai_client.chat.completions.create(
            model=GENERATION_MODEL_NAME,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            temperature=0.0 # Set low for factual retrieval
        )
        final_response = response.choices[0].message.content

    except Exception as e:
        final_response = f"**LLM GENERATION ERROR:** The API call failed. Please verify the OpenAI API key loaded in the Setup cell and ensure you have enough credits. Error details: {e}"

    return final_response
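
The prompt asks for citations, but nothing checks them. A small post-check can flag citations in the answer that do not match any source tag passed in the context. The sketch below assumes tags of the form `[Page N, Chunk M]` (adjust `CITATION_RE` to whatever format `generate_response` actually emits); the context and answer strings are placeholders.

```python
import re
from typing import List, Set, Tuple

# Assumed tag format: [Page N, Chunk M]
CITATION_RE = re.compile(r"\[Page (\d+), Chunk (\d+)\]")

def extract_citations(text: str) -> Set[Tuple[int, int]]:
    """Collect every (page, chunk) pair cited as [Page N, Chunk M]."""
    return {(int(p), int(c)) for p, c in CITATION_RE.findall(text)}

def unsupported_citations(answer: str, context: str) -> List[Tuple[int, int]]:
    """Citations in the answer that never appear as source tags in the prompt context."""
    return sorted(extract_citations(answer) - extract_citations(context))

context = "[Page 3, Chunk 2]: <rider text>\n---\n[Page 12, Chunk 1]: <termination text>"
answer = "<answer text>\nCitations: [Page 3, Chunk 2], [Page 40, Chunk 5]"
print(unsupported_citations(answer, context))  # [(40, 5)]: cited but never retrieved
```

A non-empty result is a cheap signal that the answer may rely on something other than the supplied excerpts and deserves a closer look.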

## 6. Query Search and Performance Evaluation

This cell executes the full RAG pipeline against the three self-designed queries and prints the intermediate search results and the final answers.
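
Beyond reading the printed tables, retrieval quality for such queries can be quantified with hit@k (was any expected page in the top k?) and mean reciprocal rank (MRR). The sketch below assumes a hand-labelled set of gold pages per query; the rankings and labels shown are hypothetical, not results from this notebook. In practice, `ranked_pages` would be the `Page_No.` values of the chunks returned by `rerank_results`.

```python
from typing import Dict, List, Set

def hit_at_k(ranked_pages: List[str], gold: Set[str], k: int) -> float:
    """1.0 if any of the top-k retrieved pages is a gold page, else 0.0."""
    return 1.0 if any(p in gold for p in ranked_pages[:k]) else 0.0

def reciprocal_rank(ranked_pages: List[str], gold: Set[str]) -> float:
    """1 / rank of the first gold page, or 0.0 if none was retrieved."""
    for rank, page in enumerate(ranked_pages, start=1):
        if page in gold:
            return 1.0 / rank
    return 0.0

def evaluate(runs: Dict[str, List[str]], gold: Dict[str, Set[str]], k: int = 3) -> Dict[str, float]:
    """Average hit@k and MRR over all queries in `runs`."""
    n = len(runs)
    return {
        f"hit@{k}": sum(hit_at_k(runs[q], gold[q], k) for q in runs) / n,
        "MRR": sum(reciprocal_rank(runs[q], gold[q]) for q in runs) / n,
    }

# Hypothetical rankings and gold labels, for illustration only
runs = {"q_a": ["Page 7", "Page 30", "Page 31"],
        "q_b": ["Page 3", "Page 5", "Page 9"],
        "q_c": ["Page 1", "Page 2", "Page 4"]}
gold = {"q_a": {"Page 30"}, "q_b": {"Page 3"}, "q_c": {"Page 50"}}
print(evaluate(runs, gold))  # hit@3 = 2/3, MRR = (0.5 + 1.0 + 0.0) / 3 = 0.5
```

Comparing these numbers on the ChromaDB ranking versus the re-ranked list is one way to measure what the cross-encoder actually contributes.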

In [14]:
# EXECUTION

# 1. Design Test Queries
queries = {
    "Query 1": "List the three specific conditions that will cause a Member's Life Insurance to terminate.?",
    "Query 2": "What event marks the effective date for the POLICY RIDER described in the document?",
    "Query 3": "According to the policy, what is the earliest and latest time limit for a claimant to start legal action to recover benefits?"
}

print("\n" + "="*80)
print("RUNNING RAG SYSTEM AGAINST 3 SELF-DESIGNED QUERIES (LIVE API CALL)")
print("="*80)

# Re-create the ChromaDB handles if the earlier cells were not run in this session
try:
    if 'client' not in globals():
        print("Note: Initializing ChromaDB client for execution...")
        st_embedding_fn = embedding_functions.SentenceTransformerEmbeddingFunction(model_name=EMBEDDING_MODEL_NAME)
        client = chromadb.PersistentClient(path=CHROMA_DB_PATH)
        collection = client.get_or_create_collection(name=COLLECTION_NAME, embedding_function=st_embedding_fn)
        cache_collection = client.get_or_create_collection(name=CACHE_COLLECTION_NAME, embedding_function=st_embedding_fn)
except Exception as e:
    print(f"Warning: ChromaDB client initialization failed. {e}. Please ensure previous cells ran successfully.")


for query_name, query_text in queries.items():
    print(f"\n\n--- {query_name}: {query_text} ---")

    # 1. SEARCH LAYER: Query (with Cache check)
    initial_results = query_with_cache(query_text, k_chunks=TOP_K_CHUNKS)

    # --- SAMPLE DATA AFTER INITIAL SEARCH (Pre-Re-ranking) ---
    print("\n--- SAMPLE DATA: INITIAL CHROMA SEARCH (Top 3 of K=10, Ranked by Distance) ---")
    pre_rerank_data = pd.DataFrame({
        'Rank (by Distance)': range(1, 4),
        'Distance (Lower is Better)': [f"{d:.4f}" for d in initial_results['distances'][0][:3]],
        'Chunk_Text_Snippet': [t[:100].replace('\n', ' ') + "..." for t in initial_results['documents'][0][:3]]
    })
    print(pre_rerank_data.to_markdown(index=False))
    # ----------------------------------------------------------------------------------

    # 2. SEARCH LAYER: Mandatory Re-ranking
    top_3_reranked = rerank_results(query_text, initial_results, top_n=TOP_N_RERANKED)

    # --- 1: SEARCH LAYER OUTPUT (Top 3 Reranked Chunks) ---
    print("\n[SEARCH LAYER - TOP 3 RERANKED CHUNKS (Ranked by Relevance Score)]")
    top_3_chunks_for_ss = pd.DataFrame([
        {'Rank': i+1,
         'Page_Source': chunk['metadata']['Page_No.'],
         'Relevance_Score': f"{chunk['score']:.4f}",
         'Chunk_Text': chunk['chunk_text'][:180] + "..."}
        for i, chunk in enumerate(top_3_reranked)
    ])

    # Table output
    print(top_3_chunks_for_ss.to_markdown(index=False))

    # 3. GENERATION LAYER: Generate Final Answer (LIVE CALL)
    final_answer = generate_response(query_text, top_3_reranked)

    # --- 2: GENERATION LAYER OUTPUT (Final LLM Answer) ---
    print("\n[GENERATION LAYER - FINAL LLM ANSWER]")
    print(f"QUERY: {query_text}\n{'='*70}")
    print(final_answer)
    # ------------------------------------------------------------------------

print("\n" + "="*80)
print("EXECUTION COMPLETE.")
print("="*80)


RUNNING RAG SYSTEM AGAINST 3 SELF-DESIGNED QUERIES (LIVE API CALL)


--- Query 1: List the three specific conditions that will cause a Member's Life Insurance to terminate.? ---
-> Querying DB with k=10...
Cache Miss/Irrelevant. Proceeding with search.

--- SAMPLE DATA: INITIAL CHROMA SEARCH (Top 3 of K=10, Ranked by Distance) ---
|   Rank (by Distance) |   Distance (Lower is Better) | Chunk_Text_Snippet                                                                                      |
|---------------------:|-----------------------------:|:--------------------------------------------------------------------------------------------------------|
|                    1 |                       0.3183 | n A Member will qualify for individual purchase if insurance under this Group Policy terminates and:... |
|                    2 |                       0.3185 | Section C - Individual Terminations Article 1 - Member Life Insurance A Member's insurance under thi... |
|                 

## 7. Documentation: Design Choices & Challenges

## Project Goals & Data Source
* **Goal:** Build a robust Generative Search System (RAG) for the **Group Member Life Insurance Policy** document, optimized across all three layers for accuracy and efficiency.
* **Data Source:** `Principal-Sample-Life-Insurance-Policy.pdf`, a 64-page sample policy document.

---

## Design Choices & Experimentation Summary

| Layer | Requirement | Design Choice Implemented | Rationale and Impact |
| :--- | :--- | :--- | :--- |
| **Embedding** | **Optimal Chunking** | **Custom Fixed-Size Character Splitter** (512 char, 128 overlap) | **Constraint Adherence & Simplicity:** Replaced dependency on `langchain-text-splitters` with a custom function using simple Python logic to meet the fixed-size chunking strategy mentioned in the requirements. |
| **Embedding** | **Embedding Model** | `all-MiniLM-L6-v2` (`sentence-transformers`) | **Efficiency & Quality:** Selected for strong semantic-similarity performance while remaining small and fast (384-dimensional embeddings), which suits a practical RAG system. |
| **Search** | **Mandatory Cache** | **Query-Based ChromaDB Cache** with `CACHE_THRESHOLD=0.05` | **Efficiency:** Past queries are stored in a separate ChromaDB collection, and a new query within the distance threshold counts as a hit; the strict threshold ensures only near-identical queries match. In the current code a hit is only logged, so reusing stored answers is what would actually cut retrieval and LLM cost for repeated queries. |
| **Search** | **Mandatory Re-ranker** | **Cross-Encoder Model:** `cross-encoder/ms-marco-MiniLM-L-6-v2` (`sentence-transformers.CrossEncoder`) | **Constraint Adherence & Quality of Search:** Used the `CrossEncoder` class from the **allowed `sentence-transformers` library** to re-score the Top 10 vector-search results jointly with the query, so that only the **Top 3** most relevant chunks reach the LLM. |
| **Generation**| **Generation LLM** | **OpenAI's GPT-3.5-Turbo** | **Constraint Adherence:** Used the mandatory `openai` library for the generation layer, providing a high-quality model for synthesis. |
| **Generation**| **Quality of Prompt** | **Exhaustive, Citation-driven Prompt** | **Trust & Verifiability:** The prompt strictly restricts answers to the context and mandates **specific citations** (`[Page X, Chunk Y]`), reducing hallucination risk and making each claim traceable to the policy text. |

---

## Challenges Faced

* **PDF Parsing:** Policy documents use complex formatting (tables, headers). This was solved by using **`pdfplumber`** with **custom logic** to extract and correctly integrate table data, preventing loss of critical information.
* **LLM Hallucination:** Requiring **specific citations** in the final answer was the primary defense against the LLM generating plausible but incorrect information; it makes every claim checkable against the source excerpts, which matters for insurance policy documents.
