# 🧪 Exercise 3: PDF Question Answering using Chunking, Vector Search, and LLM

In this exercise, you'll complete a **retrieval-augmented generation (RAG)** pipeline that:
- Chunks and embeds the content of a PDF
- Stores the chunks in an in-memory vector database (Qdrant)
- Uses a local LLM to answer yes/no/unknown questions based strictly on the PDF content

You will implement the missing components of this pipeline, focusing on document chunking, retrieval, and prompt construction.

---

### 🎯 Goal

Your objective is to build a system that answers questions with **yes**, **no**, or **unknown**, based solely on the information in a given **PDF file**.

- The answer must be **one word only**: `"Yes"`, `"No"`, or `"Unknown"`
- The LLM must not use external knowledge — it must rely **only** on content retrieved from the PDF
- The total prompt sent to the LLM is limited to **2000 characters**, including:
  - The instruction
  - Retrieved context chunks  
  *(🚫 The question itself is not included in this limit and will be added separately)*

---

### 🧠 What You Need to Do

You must implement the following three core functions:

---

#### ✅ `chunk_and_store(text: str) -> tuple[QdrantClient, SentenceTransformer]`

This function prepares the document for retrieval.

**Responsibilities:**
- Chunk the input `text` into smaller segments
- Encode each chunk into a vector using a pre-trained embedding model such as `all-MiniLM-L6-v2`
- Store the vectors in an in-memory **Qdrant** database using `QdrantClient(":memory:")`
- Store relevant metadata for each chunk (e.g., start offset, method)

**Returns:**
- `client`: a Qdrant client object that contains the embedded chunks
- `model`: the SentenceTransformer model used for encoding

---

#### ✅ `create_prompt(question: str, client: QdrantClient, model: SentenceTransformer) -> str`

This function builds the prompt to be sent to the LLM.

**Responsibilities:**
- Retrieve the top-k most relevant chunks from Qdrant using the question as a query
- Construct a prompt that includes:
  - A fixed instruction (you may define this in the function)
  - The most relevant retrieved chunks
- The full prompt must not exceed **400 characters**, excluding the question
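Qdrant performs this similarity search for you; the sketch below only illustrates what "top-k by cosine similarity" means, using plain Python lists in place of real embeddings. The toy vectors and the helper names `cosine` and `top_k_by_cosine` are assumptions for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0  # treat a zero vector as "not similar"


def top_k_by_cosine(query: list[float], chunks: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(cid, cosine(query, vec)) for cid, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


toy_chunks = {
    "fees": [1.0, 0.0, 0.0],
    "rewards": [0.0, 1.0, 0.0],
    "travel": [0.0, 0.7, 0.7],
}
print(top_k_by_cosine([0.0, 1.0, 0.1], toy_chunks, k=2))
```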

**Returns:**
- A string containing the prompt (instruction + context), **excluding** the question
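One way to honor the character limit is a greedy rule: walk the retrieved chunks from most to least relevant and keep each one only if the running total stays within the budget. This dependency-free sketch assumes a 400-character limit to match the requirement above; `pack_context`, the instruction text, and the `\n---` end delimiter are illustrative choices.

```python
def pack_context(instruction: str, ranked_chunks: list[str], limit: int = 400,
                 end_delimiter: str = "\n---") -> str:
    """Append ranked chunks to the instruction while the total stays within `limit` characters."""
    parts: list[str] = []
    used = len(instruction) + len(end_delimiter)  # fixed cost paid by every prompt
    if used > limit:
        raise ValueError("instruction alone exceeds the character limit")
    for chunk in ranked_chunks:
        chunk = chunk.strip()
        cost = len(chunk) + (1 if parts else 0)  # +1 for the "\n" separator
        if chunk and used + cost <= limit:
            parts.append(chunk)
            used += cost
        # otherwise skip this chunk and try the next (possibly shorter) one
    return instruction + "\n".join(parts) + end_delimiter


instr = "Answer Yes, No, or Unknown.\nContext:\n---\n"
prompt = pack_context(instr, ["A" * 200, "B" * 300, "C" * 100], limit=400)
print(len(prompt))
```

Design note: this version skips a chunk that does not fit and keeps trying later ones, whereas stopping at the first oversized chunk is simpler but can leave part of the budget unused.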

---


#### ✅ `my_call_llm(prompt: str, question: str) -> str`

This function provides an interface to the LLM, but must not invoke the LLM directly.

**Responsibilities:**
- Optionally apply logic to enhance or adapt the query (e.g., pre-processing the prompt, logging, enforcing formatting rules)
- Call the provided `call_llm(prompt, question)` function to actually interact with the model
- Return the result unchanged, or with controlled, explainable adjustments **that do not modify the content of the LLM’s response**

**Rules:**
- ❌ Must **not** embed, re-embed, or analyze any part of the original document or its chunks
- ❌ Must **not** call `subprocess`, `ollama`, or any direct LLM API
- ✅ Must **only** call `call_llm(prompt, question)` to obtain the response

**Returns:**
- A string (typically `"Yes"`, `"No"`, or `"Unknown"`) returned by the LLM, possibly post-processed for stability, format, or logging

**Purpose:**
- This function acts as a controlled gateway to LLM usage, allowing improvements in how prompts are used or tracked, without modifying or reprocessing the document or query logic
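A controlled, explainable post-processing step is to map replies such as `Yes.` or `No, it does not.` onto the three allowed labels by inspecting only the first word. This is a minimal sketch of one possible policy, not a required design; the helper `normalize_answer` is hypothetical. Matching the whole first word, rather than testing `startswith("No")`, keeps a reply like `Not mentioned.` from being read as `No`.

```python
ALLOWED = ("Yes", "No", "Unknown")
PUNCTUATION = ".,;:!?*\"'`"  # characters stripped from around the first word


def normalize_answer(raw: str) -> str:
    """Map a raw LLM reply to 'Yes', 'No', or 'Unknown' using only its first word."""
    words = raw.strip().split()
    if not words:
        return "Unknown"  # empty reply, e.g. after a timeout
    first = words[0].strip(PUNCTUATION).capitalize()
    return first if first in ALLOWED else "Unknown"


for reply in ["Yes", "yes.", "No, the brochure says otherwise.", "Not mentioned.", "**Yes**", ""]:
    print(repr(reply), "->", normalize_answer(reply))
```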

---


> 💡 **Tip:** You are encouraged to define helper functions to simplify your code and improve readability.

---

### ✨ Provided Function (DO NOT CHANGE)

#### ✅ `call_llm(prompt: str, question: str) -> str`

This function is already implemented for you.

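- It calls the `llama3.2:3b` model via the `ollama` CLI
- Receives your constructed prompt and the question, truncates the prompt to its first 2000 characters, and appends the question as `\nQuestion: <question>`
- Returns the LLM's answer (expected: `"Yes"`, `"No"`, or `"Unknown"`), or `"Unknown"` if the model call times out (30 seconds)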

You do **not** need to re-implement or modify this function.
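Because `call_llm` truncates your prompt before appending the question, anything beyond the first 2000 characters of the prompt is silently dropped, while the question is always kept. The snippet below reproduces only that string handling from the provided cell, so you can check how much of your context survives; `assemble_final_input` is a hypothetical name used for illustration.

```python
def assemble_final_input(prompt: str, question: str, max_prompt_chars: int = 2000) -> str:
    """Mirror the string handling in call_llm: truncate the prompt, then append the question."""
    return prompt[:max_prompt_chars] + "\nQuestion: " + question


final = assemble_final_input("x" * 2500, "Is there an annual fee?")
print(len(final))
```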

---

### 🧪 Evaluation Criteria

- Your system will be evaluated using a **corpus of 100 questions** on a **known PDF document**
- You will be given in advance a **sample of 20 questions** from the evaluation corpus for development and testing
- Your code must generate the correct **yes/no/unknown** answers for the full 100-question set
- **Total execution time** will be measured for the entire run (reading, chunking, querying, and answering)

---

### 🚫 Restrictions

- **Do not** modify any code cell marked with `# DO NOT CHANGE`.
- **Do not** override any variable or function defined in those protected cells.
- Your code must run successfully in the **Lab 10002** environment (`GenAI025_CUDA`), using only the libraries provided.

---

> 💡 **Tip:** Write clean, modular code. Aim for accuracy, clarity, and runtime efficiency.

---

## ✅ Good luck!

---

## 📉 Points Deduction Rules

1. **Modifying restricted code**  
   - Changing any `# DO NOT CHANGE` cell or variable: **–50 points**

2. **Importing any additional library**  
   - Importing any library that is **not already used** in the template: **–5 points per library**  
   - ✅ *No penalty* for importing additional modules or functions from libraries that are already used (e.g., importing more from `langchain` or `sentence_transformers`)

3. **Code compatibility**  
   - Code fails to run in Lab 10002: **–100 points**

4. **Execution time (total run of 100 questions)**  
   - Runs for **5–10 minutes**: **–30 points**  
   - Runs for **>10 minutes**: **–100 points**

5. **Violating restrictions inside `my_call_llm()`**  
   - ❌ Must **not** embed, re-embed, or analyze any part of the original document or its chunks  
   - ❌ Must **not** call `subprocess`, `ollama`, or any direct LLM API  
   - ✅ Must **only** call `call_llm(prompt, question)` to obtain the response  
   - Penalty: **–100 points**
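The runtime deductions in item 4 can be expressed as a small function. The rules above do not say which band exactly 5 or 10 minutes falls into; this sketch assumes the same strict thresholds (`minutes > 5`, `minutes > 10`) that `run_rag_pipeline` uses for its runtime warnings. The function name `runtime_deduction` is hypothetical.

```python
def runtime_deduction(total_seconds: float) -> int:
    """Points deducted for the total runtime of the 100-question run."""
    minutes = total_seconds / 60
    if minutes > 10:
        return 100
    if minutes > 5:
        return 30
    return 0


print(runtime_deduction(4 * 60), runtime_deduction(7 * 60), runtime_deduction(11 * 60))  # 0 30 100
```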

---

## 🧮 Final Score Calculation

$$
\text{Final Score} = \min \left(100,\ \frac{\text{Your correct answers}}{\text{Gadi’s correct answers}} \times 100 \right) - \text{Total Deductions}
$$
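As a worked example of the formula, the sketch below computes the score for hypothetical counts; `reference_correct` stands for Gadi's correct answers, and the numbers are made up for illustration. Note that the formula as written caps the ratio term at 100 but does not floor the result at zero.

```python
def final_score(correct: int, reference_correct: int, deductions: float = 0.0) -> float:
    """min(100, correct / reference_correct * 100) minus the total deductions."""
    if reference_correct <= 0:
        raise ValueError("reference_correct must be positive")
    return min(100.0, correct / reference_correct * 100) - deductions


# Hypothetical: 80 correct vs. a reference of 90 correct, with one extra library imported (-5)
print(round(final_score(80, 90, deductions=5), 2))  # 83.89
```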

---

📌 *Submit clean, working code. Only modify what you're allowed to. You got this!*


In [1]:
# SET PATHS according to your configuration.
PDF_PATH = "MyBank Credit Card Brochure.pdf"
QUESTIONS_PATH = "questions.txt"


In [2]:
# DO NOT CHANGE
import fitz
import os
import uuid
import spacy
import subprocess
from langchain.text_splitter import (
    CharacterTextSplitter,
    NLTKTextSplitter,
    SpacyTextSplitter,
    RecursiveCharacterTextSplitter
)
from sentence_transformers import SentenceTransformer
from qdrant_client import QdrantClient, models
from qdrant_client.http.models import VectorParams, PointStruct
import time

In [3]:
# DO NOT CHANGE
def load_pdf(pdf_path):
    with fitz.open(pdf_path) as doc:
        text = "\n".join(page.get_text() for page in doc)
    return text


In [4]:
# DO NOT CHANGE
def load_questions(questions_path):
    with open(questions_path, "r") as f:
        questions = [q.strip() for q in f.readlines() if q.strip().endswith('?')]
    return questions


In [5]:
# HELPER FUNCTIONS AND GLOBAL VARIABLES
# ------------------------------------

# --- Model and Collection ---
MODEL_NAME = 'all-MiniLM-L6-v2'
# Define a FIXED collection name that both chunk_and_store and create_prompt will use
COLLECTION_NAME = "rag_exercise_pdf_collection_fixed_v1" # Using a specific, fixed name

# --- Chunking Parameters ---
CHUNK_METHOD = "RecursiveCharacterTextSplitter"
CHUNK_SIZE = 256
CHUNK_OVERLAP = 40

# --- Retrieval and Prompting Parameters ---
TOP_K = 3                            # Number of top relevant chunks to retrieve - TUNE THIS
PROMPT_CONTEXT_LIMIT = 400           # Max characters for the final prompt (instruction + context)


# --- Helper Function: find_chunks ---
def find_chunks(question: str, client: QdrantClient, model: SentenceTransformer, collection_name: str, top_k: int = TOP_K):
    """
    Encodes the question and searches Qdrant for the top_k most relevant chunks.

    Args:
        question: The user's question.
        client: Initialized QdrantClient.
        model: Initialized SentenceTransformer model.
        collection_name: The name of the Qdrant collection to search.
        top_k: The maximum number of chunks to retrieve (defaults to global TOP_K).

    Returns:
        A list of Qdrant ScoredPoint objects representing the relevant chunks.
        Returns an empty list if the search fails or yields no results.
    """
    if not collection_name:
        print("Error: Collection name is not set for find_chunks.")
        # Consider raising an error instead of just printing and returning empty list
        # raise ValueError("Collection name must be provided to find_chunks")
        return []

    # Use the provided top_k argument, which defaults to the global TOP_K
    print(f"  Searching for top {top_k} chunks in collection '{collection_name}'...")
    try:
        # 1. Embed the Question
        question_vector = model.encode(question).tolist()

        # 2. Search Qdrant
        search_results = client.search(
            collection_name=collection_name,
            query_vector=question_vector,
            limit=top_k
            # Optional: Add score_threshold=0.X if needed based on experimentation
            # score_threshold=0.4 # Example threshold
        )
        print(f"  Found {len(search_results)} candidate chunks.")
        return search_results

    except Exception as e:
        print(f"  Error during Qdrant search in find_chunks: {e}")
        # Consider re-raising the exception or logging it more formally
        return [] # Return empty list on error


### ✅ Task 1: Implement `chunk_and_store(text)`

In [6]:
def chunk_and_store(text: str):
    """
    Splits a given text into smaller chunks and stores them in a vector database or an internal memory structure.

    Parameters:
    ----------
    text : str
        The input text to be processed. This should be a large block of text (e.g., a document, an article, or a report).

    Behavior:
    --------
    1. The function splits the input `text` into manageable chunks based on predefined chunking rules 
       (e.g., maximum character count, sentence boundaries, semantic meaning).
    2. Each chunk is optionally enriched with metadata (e.g., chunk number, character offsets, original document ID).
    3. Each chunk is stored in a storage system such as:
       - An in-memory list or dictionary (for simple setups)
       - A vector database (e.g., Qdrant, FAISS, ChromaDB) after embedding the chunk using an encoder model
    
    Returns:
    -------
    client : qdrant_client.QdrantClient
        A Qdrant client object that contains the embedded and stored chunks.

    model : sentence_transformers.SentenceTransformer
        The SentenceTransformer model used for embedding the text chunks.
   
    Notes:
    -----
    - If using a vector database, the chunk is first passed through an embedding model to create a vector representation.
    - Chunking methods might vary (e.g., fixed-size, sentence-based, semantic-split) depending on implementation details.
    - The function assumes that the storage backend is already initialized and ready for storing chunks.

    Raises:
    ------
    ValueError
        If the input `text` is empty or not a valid string.

    Example:
    --------
    >>> chunk_and_store("This is a long article about machine learning...")
    # Splits the article into chunks and stores them internally or externally.

    """
    # Implementation goes here
    # TODO: implement chunking using multiple strategies
    # TODO: create in-memory Qdrant collection
    # TODO: embed each chunk and store in the DB with metadata (chunking method, start_offset)


    """
    Splits text into chunks using RecursiveCharacterTextSplitter, embeds them,
    and stores them in an in-memory Qdrant vector database using the globally
    defined COLLECTION_NAME.

    Parameters:
    ----------
    text : str
        The input text to be processed (e.g., content of a PDF).

    Returns:
    -------
    client : qdrant_client.QdrantClient
        A Qdrant client object connected to the in-memory database containing the embedded chunks.

    model : sentence_transformers.SentenceTransformer
        The SentenceTransformer model used for embedding the text chunks.

    Raises:
    ------
    ValueError
        If the input `text` is empty or not a valid string.
    """
    if not text or not isinstance(text, str):
        raise ValueError("Input text cannot be empty and must be a string.")

    print("-" * 50)
    print("Starting chunking and storing process...")
    print(f"Input text length: {len(text)} characters")
    start_time = time.time()

    # 1. Initialize Embedding Model
    print(f"Loading sentence transformer model: {MODEL_NAME}")
    try:
        model = SentenceTransformer(MODEL_NAME)
        embedding_size = model.get_sentence_embedding_dimension()
        if embedding_size is None:
            raise ValueError("Could not determine embedding dimension from the model.")
        print(f"Model loaded successfully. Embedding dimension: {embedding_size}")
    except Exception as e:
        print(f"Error loading SentenceTransformer model '{MODEL_NAME}': {e}")
        raise

    # 2. Initialize Qdrant Client (In-Memory)
    print("Initializing in-memory Qdrant client...")
    try:
        client = QdrantClient(":memory:")
        print("Qdrant client initialized.")
    except Exception as e:
        print(f"Error initializing Qdrant client: {e}")
        raise

    # 3. Create Qdrant Collection using the global COLLECTION_NAME
    print(f"Creating or recreating Qdrant collection: '{COLLECTION_NAME}'")
    try:
        client.recreate_collection(
            collection_name=COLLECTION_NAME, # Use the global constant
            vectors_config=models.VectorParams(size=embedding_size, distance=models.Distance.COSINE)
        )
        print(f"Collection '{COLLECTION_NAME}' created/recreated successfully.")
    except Exception as e:
        print(f"Error creating/recreating Qdrant collection '{COLLECTION_NAME}': {e}")
        raise

    # 4. Chunk the Text using the chosen strategy (defined globally)
    print(f"Chunking text using {CHUNK_METHOD} (Size: {CHUNK_SIZE}, Overlap: {CHUNK_OVERLAP})")
    try:
        if CHUNK_METHOD == "RecursiveCharacterTextSplitter":
            text_splitter = RecursiveCharacterTextSplitter(
                chunk_size=CHUNK_SIZE,
                chunk_overlap=CHUNK_OVERLAP,
                length_function=len,
            )
        else:
            # Add other methods here if needed, referencing CHUNK_METHOD
            raise ValueError(f"Unsupported chunking method specified globally: {CHUNK_METHOD}")

        chunks = text_splitter.split_text(text)
        print(f"Text split into {len(chunks)} chunks.")
    except Exception as e:
        print(f"Error during text chunking: {e}")
        raise

    if not chunks:
        print("Warning: No chunks were generated from the text. Returning empty client.")
        return client, model

    # 5. Embed Chunks and Prepare Points for Qdrant (metadata: index, start offset, method)
    print("Embedding chunks and preparing points for Qdrant...")
    points_to_upsert = []
    chunk_processing_start_time = time.time()
    search_from = 0  # Chunks appear in document order, so each search starts after the previous match
    for i, chunk_text in enumerate(chunks):
        if not chunk_text.strip():
            print(f"Skipping empty chunk at index {i}")
            continue
        try:
            # Record where the chunk starts in the original text (-1 if it cannot be located)
            start_offset = text.find(chunk_text, search_from)
            if start_offset != -1:
                search_from = start_offset + 1
            vector = model.encode(chunk_text).tolist()
            payload = {
                "text": chunk_text,
                "chunk_index": i,
                "start_offset": start_offset,
                "method": CHUNK_METHOD,
            }
            point_id = str(uuid.uuid4())  # Random UUID string; Qdrant accepts UUIDs as point IDs
            point = PointStruct(id=point_id, vector=vector, payload=payload)
            points_to_upsert.append(point)
        except Exception as e:
            print(f"Error processing or embedding chunk {i}: {e}")
            raise

    chunk_processing_end_time = time.time()
    print(f"Embedding and point preparation took {chunk_processing_end_time - chunk_processing_start_time:.2f} seconds.")

    # 6. Upsert Points to Qdrant Collection using the global COLLECTION_NAME
    if points_to_upsert:
        print(f"Upserting {len(points_to_upsert)} points into collection '{COLLECTION_NAME}'...")
        try:
            client.upsert(
                collection_name=COLLECTION_NAME, # Use the global constant
                points=points_to_upsert,
                wait=True
            )
            print("Upsert operation successful.")
        except Exception as e:
            print(f"Error upserting points into Qdrant: {e}")
            raise
    else:
        print("No valid points were generated to upsert.")

    end_time = time.time()
    print(f"Chunking and storing completed in {end_time - start_time:.2f} seconds.")
    print("-" * 50)

    # 7. Return only the client and model, as per original signature
    return client, model

### ✅ Task 2: Implement `create_prompt(question, client, model)`

In [7]:
def create_prompt(question: str, client: QdrantClient, model: SentenceTransformer) -> str:
    """
    Creates a context-only prompt for an LLM by retrieving relevant chunks from a vector database 
    based on a user question, using a vector similarity search.

    Parameters:
    ----------
    question : str
        The input question provided by the user. It should be a natural language query.

    client : qdrant_client.QdrantClient
        The Qdrant client connected to the database that contains stored and embedded text chunks.

    model : sentence_transformers.SentenceTransformer
        The SentenceTransformer model used to encode the input question into a vector embedding 
        for similarity search.

    Behavior:
    --------
    1. The function encodes the input `question` into a vector using the provided `model`.
    2. It queries the `client` (Qdrant database) using vector similarity search to find the most relevant chunks.
    3. It assembles a prompt by combining the retrieved chunks and other info (but without adding the question itself).
    4. The resulting prompt consists **only of context**, intended to be passed separately along with the question 
       in a later step when calling the LLM.

    Returns:
    -------
    prompt : str
        A fully formatted prompt string. 
        **The user's question is NOT included in the returned prompt.**

    Notes:
    -----
    - The search typically retrieves the top-k most similar chunks (e.g., top 5).
    - Retrieved chunks are usually concatenated together, separated by delimiters (e.g., "\n\n").
    - The question should be provided separately to the LLM after sending the prompt, or combined externally later.
    - This function assumes that both the client and model are already initialized and ready to use.

    Raises:
    ------
    ValueError
        If the input `question` is empty or not a valid string.

    Example:
    --------
    >>> context_prompt = create_prompt("What benefits does the Platinum Voyager Card offer?", client, model)
    >>> print(context_prompt)
    "Context:\n<retrieved chunks>"

    # Later, when calling the LLM:
    # final_prompt = context_prompt + "\n\nQuestion:\nWhat benefits does the Platinum Voyager Card offer?"
    """
    # TODO: use find_chunks()
    # TODO: build the prompt with CONTEXT_HEADER and top chunks
    # TODO: truncate to PROMPT_CHAR_LIMIT if needed

    """
    Creates a context-only prompt for an LLM by retrieving relevant chunks from
    the globally defined COLLECTION_NAME in Qdrant, based on a user question,
    adhering to character limits.

    Parameters:
    ----------
    question : str
        The input question provided by the user.
    client : qdrant_client.QdrantClient
        The Qdrant client connected to the database with embedded chunks.
    model : sentence_transformers.SentenceTransformer
        The SentenceTransformer model for encoding the question.

    Returns:
    -------
    prompt : str
        A formatted prompt string containing instructions and retrieved context,
        NOT including the original question. The total length is capped by
        PROMPT_CONTEXT_LIMIT.

    Raises:
    ------
    ValueError
        If the input `question` is empty or not a valid string.
    """
    print("-" * 50)
    print(f"Creating prompt for question: '{question[:100]}...'")

    if not question or not isinstance(question, str):
        raise ValueError("Input question cannot be empty and must be a string.")

    # 1. Define the Strict Instruction for the LLM
    instruction = (
        "Answer using ONLY the Context below. Respond with 'Yes', 'No', or 'Unknown'. "
        "If context is insufficient, answer 'Unknown'.\n\n"
        "Context:\n---\n"
    )
    context_end_delimiter = "\n---"

    # 2. Find Relevant Chunks with the find_chunks helper, using the global COLLECTION_NAME and TOP_K
    print(f"Finding relevant chunks in fixed collection: {COLLECTION_NAME} using TOP_K={TOP_K}...")
    # find_chunks returns an empty list on search errors, so the prompt falls back to an empty context
    relevant_chunks = find_chunks(question, client, model, COLLECTION_NAME, top_k=TOP_K)

    # 3. Build Context String, Respecting Character Limit (PROMPT_CONTEXT_LIMIT)
    print(f"Building context string, respecting limit of {PROMPT_CONTEXT_LIMIT} chars for instruction + context...")
    context_parts = []
    # Calculate length of fixed parts (instruction + final delimiter)
    current_length = len(instruction) + len(context_end_delimiter)
    added_chunks_count = 0

    if not relevant_chunks:
        print("  No relevant chunks found or retrieved.")
    else:
        for i, hit in enumerate(relevant_chunks):
            chunk_text = hit.payload.get("text", "").strip() # Get text and strip whitespace
            if not chunk_text:
                print(f"  Skipping empty chunk from search result {i}.")
                continue

            # Calculate length needed for this chunk (+1 for newline separator if not the first chunk)
            length_needed = len(chunk_text) + (1 if context_parts else 0)

            print(f"  Considering chunk {i} (Score: {hit.score:.4f}, Length: {len(chunk_text)}). Needs {length_needed} chars.")

            if current_length + length_needed <= PROMPT_CONTEXT_LIMIT:
                context_parts.append(chunk_text)
                current_length += length_needed
                added_chunks_count += 1
                print(f"    Added chunk {i}. Current prompt length: {current_length}/{PROMPT_CONTEXT_LIMIT}")
            else:
                print(f"    Skipped chunk {i}. Adding it would exceed limit ({current_length + length_needed}/{PROMPT_CONTEXT_LIMIT}). Stopping context assembly.")
                break # Stop adding chunks once limit is reached

    # 4. Assemble the Final Prompt
    context_string = "\n".join(context_parts) # Join collected chunks with newlines

    # Combine instruction, context (if any), and end delimiter
    final_prompt = instruction + context_string + context_end_delimiter
    final_length = len(final_prompt)

    print(f"Final prompt created. Added {added_chunks_count} chunks.")
    print(f"Final prompt length (instruction + context): {final_length} characters.")

    # Safeguard: the loop above should already keep the prompt within the limit
    if final_length > PROMPT_CONTEXT_LIMIT:
        print(f"ERROR: Final prompt length ({final_length}) exceeded limit ({PROMPT_CONTEXT_LIMIT}). Truncating context.")
        # Fallback: drop the excess characters from the end of the context
        excess = final_length - PROMPT_CONTEXT_LIMIT
        truncated_context_string = context_string[:-excess] if len(context_string) >= excess else ""
        final_prompt = instruction + truncated_context_string + context_end_delimiter
        print(f"Truncated prompt length: {len(final_prompt)}")

    print("-" * 50)
    return final_prompt


In [8]:
# DO NOT CHANGE
# LLM via Ollama
def call_llm(prompt: str, question: str) -> str:
    """
    Calls a local LLM using the Ollama CLI and returns the model's response.

    This function sends a prompt to the locally hosted `llama3.2:3b` model via the `ollama` command-line interface.
    It ensures the prompt does not exceed 500 characters and captures the model's output.

    Parameters:
        prompt (str): The full input prompt to be sent to the LLM. It should include context and instructions,
                      but not the question itself if using external control.

    Returns:
        str: The raw response generated by the model. If the model call times out, returns "Unknown".

    Notes:
        - The prompt is truncated to a maximum of 2000 characters before being sent.
        - The model is expected to return a one-word answer such as "Yes", "No", or "Unknown".
    """
    prompt = prompt[:2000] + "\nQuestion: " + question
    try:
        result = subprocess.run(
            ["ollama", "run", "llama3.2:3b"],
            input=prompt.encode("utf-8"),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=30
        )
        return result.stdout.decode("utf-8").strip()
    except subprocess.TimeoutExpired:
        return "Unknown"


### ✅ Task 3: Implement `my_call_llm(prompt: str, question: str)`

In [9]:
def my_call_llm(prompt: str, question: str) -> str:
    """
    A wrapper function for controlled interaction with the local LLM.

    This function allows for preprocessing, logging, or evaluation logic
    around a call to the provided `call_llm()` function, but it must not
    directly interact with the LLM (e.g., via subprocess or embedding logic).

    🚫 Restrictions:
        - Must NOT embed, re-embed, or analyze any part of the original document or its chunks
        - Must NOT call `subprocess`, `ollama`, or any direct LLM APIs
        - Must ONLY interact with the LLM via the provided `call_llm(prompt, question)` function

    ✅ Allowed:
        - Logging or printing
        - Handling empty prompts or question formats
        - Calling `call_llm()` multiple times for retry logic or consistency checking
        - Standard string manipulations (if needed)

    Parameters:
        prompt (str): The constructed prompt (instructions + context, excluding the question).
        question (str): The original user question (to be passed to the LLM interface).

    Returns:
        str: - return a one-word answer typically one of "Yes", "No", or "Unknown".

    Example:
        >>> my_call_llm("Context: Data is collected by Google...", "Does Google share my location?")
        "Yes"
    """

    """
    A wrapper function for controlled interaction with the local LLM via the
    provided call_llm function, ensuring the final output is one of "Yes", "No", or "Unknown".

    🚫 Restrictions:
        - Must NOT embed, re-embed, or analyze document content or chunks.
        - Must NOT call subprocess, ollama, or any direct LLM API.
        - Must ONLY call the provided `call_llm(prompt, question)`.

    ✅ Allowed:
        - Logging, printing.
        - Calling `call_llm()` multiple times (e.g., for retries - not implemented here).
        - Standard string manipulations on the *response* from call_llm for validation/formatting.

    Parameters:
        prompt (str): The constructed prompt (instructions + context, excluding the question).
        question (str): The original user question (to be passed to the LLM interface).

    Returns:
        str: A one-word answer: "Yes", "No", or "Unknown".
    """
    print("-" * 50)
    print("Executing my_call_llm...")
    # print(f"  Received Prompt (Instruction + Context) length: {len(prompt)}")
    # print(f"  Received Question: {question}")

    # --- Pre-computation / Checks (Optional and Allowed) ---
    if not prompt or not question:
        print("  Warning: Received empty prompt or question in my_call_llm. Returning 'Unknown'.")
        return "Unknown"

    # --- The Required Call to the Provided Function ---
    print("  Calling the provided 'call_llm' function...")
    start_time = time.time()
    try:
        raw_llm_response = call_llm(prompt, question)
    except NameError:
        # call_llm must be defined by the protected cell above
        print("FATAL ERROR: The required 'call_llm' function is not defined in the execution environment!")
        return "Unknown"
    except Exception as e:
        print(f"ERROR during the call to 'call_llm': {e}")
        return "Unknown"  # Fallback on other errors during the external call
    print(f"  call_llm returned in {time.time() - start_time:.2f} seconds.")

    # --- Map the response onto Yes / No / Unknown ---
    cleaned_response = raw_llm_response.strip()
    # Look only at the first word, without surrounding punctuation such as "Yes." or "**No**".
    # Matching whole words (not startswith("No")) keeps replies like "Not mentioned" from becoming "No".
    first_word = cleaned_response.split()[0].strip(".,;:!?*\"'`").capitalize() if cleaned_response else ""
    lowered = cleaned_response.lower()

    if first_word in ("Yes", "No", "Unknown"):
        final_answer = first_word
        if cleaned_response != first_word:
            print(f"  INFO: Interpreted '{cleaned_response}' as '{final_answer}'.")
    elif "not mentioned" in lowered or "does not say" in lowered:
        final_answer = "Unknown"
        print(f"  INFO: Interpreted '{cleaned_response}' as 'Unknown' based on keywords.")
    else:
        # Default to Unknown when the response cannot be mapped
        final_answer = "Unknown"
        print(f"  WARNING: Raw response '{raw_llm_response}' could not be mapped. Defaulting to 'Unknown'.")

    print(f"  Returning final validated answer: '{final_answer}'")
    print("-" * 50)
    return final_answer
   

In [10]:
# DO NOT CHANGE
def run_rag_pipeline(pdf_path,questions_path):
    """
    Runs the RAG pipeline for all questions in the input list, printing full results and tracking execution time.

    The process includes:
    1. Loading and chunking the PDF.
    2. Embedding and storing chunks in Qdrant.
    3. Answering each question using a locally hosted LLM (via Ollama).
    4. Printing the full Q&A pairs.
    5. Reporting total runtime with a warning if the run exceeds 5 or 10 minutes.
    6. Printing a summary of answers only (one per line).
    """
    start_time = time.time()

    text = load_pdf(pdf_path)
    questions = load_questions(questions_path)

    # Chunk and store once (not inside the loop)
    client, model = chunk_and_store(text) # your function

    all_answers = []

    print("🧠 Answering questions...")
    for question in questions:
        prompt = create_prompt(question, client, model) # your function
        answer = my_call_llm(prompt,question)
        all_answers.append((question, answer)) 
        # print(f"\nQ: {prompt} \n Q: {question} \n A: {answer} \n {'-'*60} \n")
        # print(f"Q: {question} \n A: {answer} \n {'-'*60} \n")

    total_time = time.time() - start_time
    minutes = total_time / 60

    print("\n⏱️ Total Runtime: {:.2f} seconds ({:.2f} minutes)".format(total_time, minutes))
    if minutes > 10:
        print("⚠️ Warning: Runtime exceeds 10 minutes!")
    elif minutes > 5:
        print("⚠️ Notice: Runtime exceeds 5 minutes.")

    print("\n📝 Summary of Answers:")
    i=0
    for _, answer in all_answers:
        i+=1
        print(i,". ",answer)

In [11]:
# DO NOT CHANGE
run_rag_pipeline(PDF_PATH,QUESTIONS_PATH)

--------------------------------------------------
Starting chunking and storing process...
Input text length: 21216 characters
Loading sentence transformer model: all-MiniLM-L6-v2
Model loaded successfully. Embedding dimension: 384
Initializing in-memory Qdrant client...
Qdrant client initialized.
Creating or recreating Qdrant collection: 'rag_exercise_pdf_collection_fixed_v1'
Collection 'rag_exercise_pdf_collection_fixed_v1' created/recreated successfully.
Chunking text using RecursiveCharacterTextSplitter (Size: 256, Overlap: 40)
Text split into 106 chunks.
Embedding chunks and preparing points for Qdrant...
Embedding and point preparation took 0.62 seconds.
Upserting 106 points into collection 'rag_exercise_pdf_collection_fixed_v1'...
Upsert operation successful.
Chunking and storing completed in 2.64 seconds.
--------------------------------------------------
🧠 Answering questions...
--------------------------------------------------
Creating prompt for question: 'Will I receive 2,400 points if I spend $800 on a hotel with a Travel Rewards+ Card?...'
Finding relevant chunks in fixed collection: rag_exercise_pdf_collection_fixed_v1 using TOP_K=3...
  Found 3 candidate chunks.
Building context string, respecting limit of 400 chars for instruction + context...
  Considering chunk 0 (Score: 0.6592, Length: 248). Needs 248 chars.
    Added chunk 0. Current prompt length: 389/400
  Considering chunk 1 (Score: 0.5651, Length: 241). Needs 242 chars.
    Skipped chunk 1. Adding it would exceed limit (631/400). Stopping context assembly.
Final prompt created. Added 1 chunks.
Final prompt length (i
  Returning final validated answer: 'No'
--------------------------------------------------
--------------------------------------------------
Creating prompt for question: 'Will I receive 1,500 points if I book a $500 flight with a Travel Rewards+ Card?...'
Finding relevant chunks in fixed collection: rag_exercise_pdf_collection_fixed_v1 using TOP_K=3...
  Found 3 candidate chunks.
Building context string, respecting limit of 400 chars for instruction + context...
  Considering chunk 0 (Score: 0.6209, Length: 223). Needs 223 chars.
    Added chunk 0. Current prompt length: 364/400
  Considering chunk 1 (Score: 0.5651, Length: 248). Needs 249 chars.
    Skipped chunk 1. Adding it would exceed limit (613/400). Stopping context assembly.
Final prompt created. Added 1 chunks.
Final prompt length (instruction + context): 364 characters.
--------------------------------------------------
--------------------------------------------------
Executing my_call_llm...
  Calling the provided 'call