# **Objective**

The goal of this project is to develop a **Google Colab notebook** that enables users to upload documents, store them in **ChromaDB**, and perform **question answering (QA)** with **LLMs**, improving the answers by **optimizing the stored chunks**. Additionally, the project will integrate **Agentic Retrieval-Augmented Generation (Agentic RAG)** to enhance retrieval and reasoning capabilities.

## **Key Features & Learning Goals**

### 1. **Chunk Storage & Retrieval**
- Allow users to upload documents.
- Process files into **text chunks** and store them in **ChromaDB** for efficient retrieval.

### 2. **Question Answering System**
- Users can ask questions about the uploaded content.
- Retrieve relevant chunks and generate an **initial answer** using an **LLM**.

### 3. **Chunk Quality Evaluation**
- Identify **poor-quality chunks** that negatively impact the answer.
- Provide **reasons** for their poor quality (e.g., missing context, irrelevant data, incomplete information).

### 4. **Chunk Optimization & Re-Answering**
- Utilize an **LLM** to improve chunking for **better answer quality**.
- Store the optimized chunks and **generate an enhanced answer**.

### 5. **Comparison & Insights**
- Compare **poor chunks vs. optimized chunks** and highlight improvements.
- Show how optimized chunks improve **retrieval accuracy and answer relevance**.

### 6. **Learning Agentic RAG**
- Implement **Agentic Retrieval-Augmented Generation (Agentic RAG)** to enable the system to dynamically refine retrieval strategies.
- Use an **agent-based approach** where the system iteratively improves the document chunking and retrieval for **continuous learning and better responses**.

This project will enhance document-based **QA systems** by combining **LLM-powered chunk optimization** and **Agentic RAG strategies** to improve retrieval, reasoning, and answer quality. 🚀



---



### **What is Agentic RAG?**
Agentic Retrieval-Augmented Generation (**Agentic RAG**) is an advanced approach to **RAG-based AI systems**, where the retrieval process is dynamically improved using **agent-based reasoning**. Unlike traditional RAG, where retrieval is static, **Agentic RAG autonomously evaluates and refines retrieval strategies**, ensuring more accurate and context-aware answers.

### **Real-Life Example Related to the Project**
Imagine you are working in a **customer support team** that handles large volumes of **technical documentation**. A customer asks a complex question about a **software issue**, but the knowledge base retrieves **irrelevant or incomplete chunks** of information, leading to **poor response quality**.

- In a **traditional RAG system**, the system would fetch the best-matching chunks and generate an answer, but if the chunks are low quality, the response remains **subpar**.
- With **Agentic RAG**, the system **identifies poor chunks, understands why they are inadequate (e.g., missing context or outdated info), and autonomously refines the retrieval** to fetch better data before generating an improved answer.
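The retrieve, evaluate, refine loop described above can be sketched in a few lines. This is a toy illustration under loud assumptions, not this notebook's implementation: `retrieve` ranks chunks by word overlap instead of embeddings, `is_adequate` is a hypothetical evaluator (in a real agent an LLM would judge the chunks), and the query rewrites are passed in as a list where a real agent would generate them with an LLM.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric words; a crude stand-in for embeddings (assumption)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Toy retriever: rank chunks by how many words they share with the query."""
    q = tokens(query)
    # sorted() is stable, so ties keep their original corpus order
    return sorted(corpus, key=lambda chunk: len(q & tokens(chunk)), reverse=True)[:k]


def is_adequate(query: str, chunks: list[str], min_overlap: int = 2) -> bool:
    """Hypothetical evaluator: accept if some chunk shares >= min_overlap words with the query."""
    q = tokens(query)
    return any(len(q & tokens(chunk)) >= min_overlap for chunk in chunks)


def agentic_retrieve(query: str, corpus: list[str], rewrites: list[str], max_steps: int = 3):
    """Retrieve, evaluate, and retry with a reformulated query until the evaluator is satisfied."""
    if max_steps < 1:
        raise ValueError("max_steps must be at least 1")
    for attempt in ([query] + rewrites)[:max_steps]:
        chunks = retrieve(attempt, corpus)
        if is_adequate(attempt, chunks):
            return attempt, chunks, True
    return attempt, chunks, False  # budget exhausted: return the last attempt
```

A traditional RAG pipeline would stop after the first `retrieve` call; the loop is what makes the retrieval "agentic".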




---

## **Problem Statement & Solution**

### **Problem Statement**
In traditional **document-based Q&A systems**, retrieving relevant chunks from large documents is often unreliable. The system may:
- Retrieve **irrelevant** or **incomplete** chunks, leading to poor answers.
- Fail to recognize when a chunk lacks context or contains outdated information.
- Be unable to refine its retrieval process dynamically, leading to **suboptimal responses**.

### **Solution We Are Offering**
To address these challenges, we propose an **Agentic RAG-powered Google Colab Notebook** that improves document-based Q&A by:

1. **Chunking and Storing Documents Efficiently**:
   - Users upload documents.
   - The system processes them into **semantically meaningful chunks** and stores them in **ChromaDB**.

2. **Retrieving and Answering User Queries**:
   - The user asks a question.
   - The system retrieves the most relevant chunks and generates an **initial answer**.

3. **Evaluating Chunk Quality**:
   - The system **analyzes** retrieved chunks to detect poor-quality data.
   - It provides **reasons** why certain chunks are ineffective.

4. **Optimizing the Chunks**:
   - Using an **LLM**, the system restructures, refines, and **optimizes the chunking process**.
   - The improved chunks are stored back in **ChromaDB**.

5. **Generating an Improved Answer**:
   - The system **retrieves the optimized chunks** and generates a **better-quality answer**.
   - A **comparison** is made between the original and optimized chunks to show improvements.

This ensures **continuous learning and refinement** of the document-based QA system, ultimately enhancing **retrieval accuracy, answer relevance, and user experience**.

In [2]:
# Install required libraries
!pip install -q langchain chromadb openai pypdf transformers sentence-transformers
!pip install -q torch torchvision torchaudio


In [3]:
! pip install langchain-community

The imports below serve the following purposes:  

- **`import os`** – For interacting with the operating system (e.g., file paths, environment variables).  
- **`import torch`** – PyTorch, the deep learning backend used by the Hugging Face and sentence-transformers models (including GPU execution).  
- **`import chromadb`** – For storing and retrieving embeddings using ChromaDB, a vector database.  
- **`from langchain.text_splitter import RecursiveCharacterTextSplitter`** – Splits large documents into smaller chunks for efficient embedding and retrieval.  
- **`from langchain_community.document_loaders import PyPDFLoader`** – Loads and extracts text from PDF documents (one LangChain `Document` per page).  
- **`from langchain_community.embeddings import HuggingFaceEmbeddings`** – Generates vector embeddings from text using pre-trained Hugging Face models.  
- **`from transformers import AutoTokenizer, AutoModelForQuestionAnswering`** – Loads a transformer-based model for question-answering tasks.  
- **`import google.colab`** – Used when running code in Google Colab to interact with its environment (e.g., file handling, hardware setup).  



In [4]:
import os
import torch
import chromadb
from langchain.text_splitter import RecursiveCharacterTextSplitter
# The old `langchain.document_loaders` / `langchain.embeddings` paths are deprecated aliases
from langchain_community.document_loaders import PyPDFLoader
from langchain_community.embeddings import HuggingFaceEmbeddings
from transformers import AutoTokenizer, AutoModelForQuestionAnswering
import google.colab

## Set Up OpenAI API Key

Please add your OpenAI API key in the empty string (`""`) below.

[Generate Your OpenAI API Key](https://github.com/initmahesh/MLAI-community-labs/tree/main/Class-Labs/Lab-0(Pre-requisites))

```python
os.environ["OPENAI_API_KEY"] = "your_api_key_here"
```

In [2]:
# Set the OpenAI API key used by ChatOpenAI (paste your key between the quotes).
# Do not share or commit the notebook while a real key is stored in it.
os.environ["OPENAI_API_KEY"] = ""
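Hard-coding the key means it is saved with the notebook. As an alternative (a sketch, not part of the original notebook), the key can be read from the environment and, only if it is missing, prompted for with Python's standard `getpass`, which does not echo what you type. Colab's Secrets panel (`google.colab.userdata.get`) is another option. The helper name `ensure_openai_key` is hypothetical.

```python
import os
from getpass import getpass


def ensure_openai_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key from the environment, prompting without echo only if it is missing or empty."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        key = getpass(f"Enter {env_var}: ").strip()
        os.environ[env_var] = key  # LangChain's ChatOpenAI reads the key from this variable
    return key
```

Calling `ensure_openai_key()` once near the top of the notebook leaves an already-set key untouched.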

## Upload Your Document

Run the cell below and upload your document with the file picker that appears. The notebook expects a PDF (it is loaded with `PyPDFLoader`), and only the first uploaded file is used.

**You can try it out with the following reference document:**
[Reference document (Google Drive)](https://drive.google.com/file/d/1msnU5C_uAu3Z161q_MmcNTePhk9Jw1RW/view?usp=sharing)



In [6]:
from google.colab import files

# Upload document
uploaded = files.upload()

# Get the filename of the uploaded file
filename = list(uploaded.keys())[0]

Saving CONTRACTOR INSURANCE REQUIREMENTS.pdf to CONTRACTOR INSURANCE REQUIREMENTS.pdf


## Initial Chunking of the Document  

This code performs **initial chunking** of the uploaded document using the `RecursiveCharacterTextSplitter` from LangChain. Chunking is essential for handling large documents efficiently when processing with embeddings and retrieval models.

### **Why Do We Need This?**
- Large documents need to be broken into smaller chunks for effective text embedding and retrieval.
- Improves the performance of language models by ensuring contextually relevant inputs.
- Prevents exceeding token limits when processing with LLMs.

### **What Does This Code Do?**
1. **Defines the `initial_chunking` function**  
   - Uses `RecursiveCharacterTextSplitter` with:  
     - `chunk_size=500`: Each chunk holds at most about 500 characters (length is measured with `len`).  
     - `chunk_overlap=50`: Consecutive chunks can share up to 50 characters, so text cut at a boundary still appears with some surrounding context.  
   - Loads the document using `PyPDFLoader`.  
   - Splits the document into smaller chunks.  

2. **Performs Initial Chunking**  
   - Calls `initial_chunking(filename)` to process the uploaded PDF.  
   - Stores the resulting chunks in `poor_chunks`.  

🚀 **This step prepares the document for embedding and retrieval in a structured way!**  
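To see how `chunk_size` and `chunk_overlap` interact, here is a deliberately simplified fixed-window splitter. It is not LangChain's algorithm: `RecursiveCharacterTextSplitter` first tries to split on paragraph breaks, then newlines, then spaces (its default separators are `"\n\n"`, `"\n"`, `" "`, `""`), whereas this sketch ignores boundaries and only shows the window arithmetic. The function name is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Fixed-size character windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For a 1,200-character text the windows start at 0, 450, and 900, giving chunks of 500, 500, and 300 characters; each window advances by `chunk_size - chunk_overlap = 450` characters.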


In [7]:
# Initial chunking with default parameters
def initial_chunking(file_path):
    # Default chunking with basic parameters
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=500,  # Large, potentially low-quality chunks
        chunk_overlap=50,
        length_function=len
    )

    # Load and split the document
    loader = PyPDFLoader(file_path)
    documents = loader.load_and_split(text_splitter)

    return documents

# Perform initial chunking
poor_chunks = initial_chunking(filename)

## Assessing Chunk Quality  

This function evaluates the **quality of document chunks** with three simple heuristics:  
1. **Chunk Length Consistency** – Scores how close the average chunk length is to a target length.  
2. **Contextual Coherence** – Uses each chunk's word count as a rough proxy for whether it carries a self-contained idea.  
3. **Semantic Diversity** – Measures the share of distinct words across all chunks.  

### **Why Do We Need This?**
- Ensures that document chunks are well-structured for effective embedding and retrieval.  
- Prevents chunks that are too short (loss of context) or too long (inefficient for LLMs).  
- Helps optimize chunking strategies for better performance in downstream tasks like QA systems.  

### **What Does This Code Do?**
1. **Chunk Length Consistency**  
   - Computes min, max, and average chunk lengths (only the average feeds the score).  
   - Converts the average into a **length score** between 0 and 1 based on its relative deviation from the ideal chunk length (350 characters).  

2. **Contextual Coherence Assessment**  
   - Splits chunks into words and assigns coherence scores:  
     - **0** for very short chunks (<10 words).  
     - **0.5** for overly long chunks (>50 words).  
     - **1** for optimal-length chunks.  
   - Computes the **average coherence score** across all chunks.  

3. **Semantic Diversity Calculation**  
   - Divides the number of distinct (lowercased) words across all chunks by the total word count.  
   - The ratio already lies in the **0-1 range**; the code clamps it with `min(..., 1)` only as a safeguard.  

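4. **Final Quality Metrics**  
   - Returns the three metrics (`length_score`, `avg_coherence`, `semantic_diversity`) in a dictionary; they are not combined into a single score. Note that the dictionary key `avg_chunk_length` holds the normalized length score, not the raw average length.  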

🚀 **These heuristics give a quick, rough signal of whether chunks are well-balanced, contextually meaningful, and diverse; they do not measure answer quality directly.**  
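Written as formulas, the three scores computed by `assess_chunk_quality` are as follows, where $\bar{L}$ is the mean chunk length in characters, $N$ the number of chunks, and $w_i$ the whitespace-separated word count of chunk $i$:

$$
\text{length\_score} = 1 - \min\left(\frac{\lvert \bar{L} - 350 \rvert}{350},\ 1\right)
$$

$$
\text{contextual\_coherence} = \frac{1}{N}\sum_{i=1}^{N} c_i,
\qquad
c_i =
\begin{cases}
0 & \text{if } w_i < 10,\\
0.5 & \text{if } w_i > 50,\\
1 & \text{otherwise}
\end{cases}
$$

$$
\text{semantic\_diversity} = \frac{\lvert \text{distinct lowercased words across all chunks} \rvert}{\text{total words across all chunks}}
$$

For example, an average length of 420 characters gives $\lvert 420 - 350 \rvert / 350 = 0.2$ and a length score of $0.8$; three chunks with 8, 30, and 60 words give a coherence of $(0 + 1 + 0.5)/3 = 0.5$.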


In [8]:
def assess_chunk_quality(chunks):
    """
    Assess the quality of chunks based on multiple normalized criteria
    """
    if not chunks:
        raise ValueError("assess_chunk_quality() needs at least one chunk")

    # Calculate chunk lengths (in characters)
    chunk_lengths = [len(chunk.page_content) for chunk in chunks]

    # Basic length statistics (min/max are informational; only the average is scored)
    min_length = min(chunk_lengths)
    max_length = max(chunk_lengths)
    avg_length = sum(chunk_lengths) / len(chunks)

    # Length similarity score (closer to optimal length is better)
    ideal_chunk_length = 350  # Adjust based on your preference
    length_deviation = abs(avg_length - ideal_chunk_length) / ideal_chunk_length
    length_score = 1 - min(length_deviation, 1)  # Normalize to 0-1

    # Contextual coherence assessment
    coherence_scores = []
    for chunk in chunks:
        # Check meaningful content
        words = chunk.page_content.split()

        # Assess word count and meaningful content
        if len(words) < 10:
            coherence_scores.append(0)  # Very short chunks
        elif len(words) > 50:
            coherence_scores.append(0.5)  # Potentially too long
        else:
            coherence_scores.append(1)  # Optimal length

    avg_coherence = sum(coherence_scores) / len(coherence_scores)

    # Semantic diversity (basic implementation)
    # This could be enhanced with more sophisticated embedding-based diversity calculation
    unique_words = set()
    total_words = 0
    for chunk in chunks:
        chunk_words = set(chunk.page_content.lower().split())
        unique_words.update(chunk_words)
        total_words += len(chunk.page_content.split())

    semantic_diversity = len(unique_words) / total_words if total_words > 0 else 0

    # Collect the (unweighted) metrics; 'avg_chunk_length' stores the normalized length score
    quality_metrics = {
        'avg_chunk_length': length_score,
        'contextual_coherence': avg_coherence,
        'semantic_diversity': min(semantic_diversity, 1)  # distinct <= total words, so this is a safeguard
    }

    return quality_metrics
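`assess_chunk_quality` reports collection-level averages, while the project goals also call for per-chunk **reasons** (missing context, incomplete information, irrelevant data). The sketch below is one heuristic way to produce such reasons; it is not part of the original notebook, and the function name and rules are hypothetical. The word-count thresholds mirror the 10/50 limits used above, and the relevance check is a simple word-overlap test (common words such as "the" can make it pass trivially).

```python
import re


def diagnose_chunk(text: str, question: str = "", min_words: int = 10, max_words: int = 50) -> list[str]:
    """Return human-readable reasons a chunk may be weak (heuristic, hypothetical rules)."""
    reasons = []
    words = text.split()
    if len(words) < min_words:
        reasons.append("too short: likely missing context")
    elif len(words) > max_words:
        reasons.append("long: may mix several topics")
    stripped = text.rstrip()
    if stripped and stripped[-1] not in ".!?:;)\"'":
        reasons.append("ends mid-sentence: possibly incomplete")
    if question:
        question_words = set(re.findall(r"[a-z0-9]+", question.lower()))
        chunk_words = set(re.findall(r"[a-z0-9]+", text.lower()))
        if not question_words & chunk_words:
            reasons.append("no word overlap with the question: possibly irrelevant")
    return reasons
```

In the notebook, this could be applied to each item of `poor_chunks` via `chunk.page_content`; an LLM-based judge would give richer explanations at higher cost.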

## Optimizing Chunks with LLM and Storing in ChromaDB  

This code optimizes text chunks using **LLM-guided refinement** and then stores the improved chunks in **ChromaDB**, a vector database for efficient retrieval.  

### **Why Do We Need This?**
- Ensures document chunks are **clear, coherent, and contextually complete** before embedding.  
- Removes irrelevant content while **preserving semantic integrity**.  
- Improves information retrieval quality for downstream tasks like question-answering.  
- Stores optimized chunks in **ChromaDB** for fast and structured retrieval.  

### **What Does This Code Do?**  

1. **Initialize ChromaDB Collection**  
   - Creates or retrieves a collection named `"optimized_chunks"` in ChromaDB.  

2. **Initialize Embeddings**  
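   - Uses `HuggingFaceEmbeddings()` to generate vector embeddings for chunks (called with no arguments, it loads LangChain's default sentence-transformers model, `sentence-transformers/all-mpnet-base-v2`).  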

3. **Define `optimize_chunks()` Function**  
   - Uses **GPT-3.5 Turbo (via ChatOpenAI)** to refine chunks for **clarity, coherence, and context**.  
   - Applies the following refinements:  
     - **Enhancing contextual completeness**  
     - **Removing irrelevant details**  
     - **Maintaining semantic accuracy**  
   - Sends one LLM request per chunk, so runtime and API cost grow linearly with the number of chunks.  

4. **Optimize and Evaluate Chunks**  
   - Calls `optimize_chunks(poor_chunks)` to generate refined chunks.  
   - Assesses the quality of optimized chunks using `assess_chunk_quality()`.  

5. **Store Optimized Chunks in ChromaDB**  
   - Each optimized chunk is:  
     - Assigned a unique ID (e.g., `"optimized_chunk_0"`).  
     - Stored in ChromaDB with its refined content.  
     - Embedded using `embeddings.embed_query()`.  

### **Outcome**  
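🚀 The aim is for the optimized chunks to be **more relevant, structured, and high-quality**, improving retrieval and response accuracy. Because the LLM rewrites the text, refined chunks can also drop or alter details, so they are worth spot-checking against the originals.  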
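The refinement loop can be tested offline by passing the LLM in as a plain function. In this sketch (an assumption, not the notebook's code) `Chunk` is a hypothetical stand-in for LangChain's `Document`, and `llm` is any callable from prompt to text; in the notebook that could be `lambda p: ChatOpenAI(...).invoke(p).content`. Each refined chunk is paired with the same `optimized_chunk_{i}` ID used when storing into ChromaDB.

```python
from dataclasses import dataclass, field, replace
from typing import Callable


@dataclass(frozen=True)
class Chunk:
    """Minimal stand-in for LangChain's Document (hypothetical, for offline testing)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


REFINEMENT_TEMPLATE = (
    "Improve the following text chunk for clarity, coherence, and context.\n\n"
    "Original Chunk: {text}\n\n"
    "Refine the chunk by:\n"
    "1. Ensuring contextual completeness\n"
    "2. Removing irrelevant information\n"
    "3. Maintaining semantic integrity\n\n"
    "Return only the refined text."
)


def refine_chunks(chunks: list[Chunk], llm: Callable[[str], str]) -> list[tuple[str, Chunk]]:
    """One LLM call per chunk; returns (id, refined_chunk) pairs ready for a vector store."""
    results = []
    for i, chunk in enumerate(chunks):
        refined_text = llm(REFINEMENT_TEMPLATE.format(text=chunk.page_content)).strip()
        # replace() returns a copy with new text; metadata such as the page number is kept
        results.append((f"optimized_chunk_{i}", replace(chunk, page_content=refined_text)))
    return results
```

Injecting the LLM keeps the loop independent of any API client, so a fake function can verify prompts, IDs, and metadata handling without network calls.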


In [10]:
from langchain_community.chat_models import ChatOpenAI
import chromadb

# Create (or reuse) the ChromaDB collection for the optimized chunks
optimized_collection = chromadb.Client().get_or_create_collection(name="optimized_chunks")

# Embedding model shared by indexing and querying (HuggingFaceEmbeddings was imported earlier)
embeddings = HuggingFaceEmbeddings()

def optimize_chunks(chunks):
    """
    Optimize chunks using LLM-guided refinement (one LLM call per chunk)
    """
    llm = ChatOpenAI(temperature=0.2, model="gpt-3.5-turbo")
    optimized_chunks = []

    for chunk in chunks:
        # Ask for the bare refined text so no preamble ends up in the vector store
        refinement_prompt = f"""Improve the following text chunk for clarity, coherence, and context:

Original Chunk: {chunk.page_content}

Refine the chunk by:
1. Ensuring contextual completeness
2. Removing irrelevant information
3. Maintaining semantic integrity

Return only the refined text."""

        # invoke() returns a chat message; .content holds the generated text
        optimized_content = llm.invoke(refinement_prompt).content

        # model_copy() replaces the deprecated Pydantic v1 copy(); metadata is preserved
        optimized_chunks.append(chunk.model_copy(update={"page_content": optimized_content}))

    return optimized_chunks

# Optimize chunks
optimized_chunks = optimize_chunks(poor_chunks)

# Assess optimized chunks
optimized_chunk_metrics = assess_chunk_quality(optimized_chunks)
print("Optimized Chunk Quality Metrics:", optimized_chunk_metrics)

# Store optimized chunks; upsert() lets the cell be re-run without duplicate-ID warnings
for i, chunk in enumerate(optimized_chunks):
    optimized_collection.upsert(
        ids=[f"optimized_chunk_{i}"],
        documents=[chunk.page_content],
        embeddings=[embeddings.embed_query(chunk.page_content)]
    )


Optimized Chunk Quality Metrics: {'avg_chunk_length': 0.8691428571428571, 'contextual_coherence': 0.7, 'semantic_diversity': 0.4369942196531792}


## Question Answering Using Vector Search  

This function performs **question answering (QA) using vector search** by retrieving the most relevant document chunks from **ChromaDB** and generating answers using **GPT-3.5 Turbo**.  

### **Why Do We Need This?**
- Enables **efficient and context-aware** QA by searching for relevant chunks instead of processing entire documents.  
- Uses **vector embeddings** to retrieve the most relevant chunks based on semantic similarity.  
- **Grounds** the LLM-generated response in the retrieved context, which helps improve accuracy and reduce hallucinations.  

### **What Does This Code Do?**  

1. **Initialize the LLM (GPT-3.5 Turbo)**  
   - Sets `temperature=0.2` for controlled, **less random** responses.  

2. **Embed the Question**  
   - Converts the question into a **vector representation** using `embeddings.embed_query(question)`.  

3. **Retrieve Top-k Relevant Chunks**  
   - Performs a **vector search** in the ChromaDB collection.  
   - Retrieves the `top_k` most relevant document chunks.  

4. **Prepare Context for Answer Generation**  
   - Extracts retrieved chunk contents and joins them into a **context string**.  

5. **Generate Answer Using LLM**  
   - Constructs a **structured prompt** with:  
     - **Context:** The retrieved chunks.  
     - **Question:** The user’s input.  
   - Sends the prompt to the LLM to generate a precise answer **based on the retrieved context**.  

6. **Return Answer & Retrieved Chunks**  
   - The function returns:  
     - **`answer`** – LLM-generated response.  
     - **`retrieved_chunks`** – The actual document chunks used for context.  

### **Outcome**  
🚀 This approach helps keep responses **relevant, context-aware, and accurate**, improving the reliability of the QA system!  
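Steps 2 to 5 can be illustrated without ChromaDB or an API key. The sketch below ranks chunks by cosine similarity to the query vector and assembles the same style of context/question prompt. It is an illustration, not what Chroma does internally: Chroma uses an approximate nearest-neighbour index, and its default distance metric is L2 (configurable per collection). The vectors in the usage test are made up for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query_vec: list[float], chunk_vecs: list[list[float]], chunks: list[str], k: int = 3) -> list[str]:
    """Exact (brute-force) search: return the k chunks most similar to the query vector."""
    ranked = sorted(zip(chunks, chunk_vecs), key=lambda pair: cosine_similarity(query_vec, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_answer_prompt(question: str, retrieved: list[str]) -> str:
    """Join retrieved chunks into one context string and wrap it in a QA prompt."""
    context = "\n\n".join(retrieved)
    return (
        f"Context: {context}\n\n"
        f"Question: {question}\n\n"
        "Provide a detailed and precise answer based on the given context."
    )
```

Brute-force search is fine for a handful of chunks; a vector database becomes worthwhile as the collection grows.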


In [11]:
def qa_with_chunks(question, collection, top_k=3):
    """
    Perform question answering using vector search
    """
    llm = ChatOpenAI(temperature=0.2, model="gpt-3.5-turbo")

    # Embed the question with the same model used to index the chunks
    question_embedding = embeddings.embed_query(question)

    # Retrieve top k relevant chunks (Chroma returns one result list per query embedding)
    search_results = collection.query(
        query_embeddings=[question_embedding],
        n_results=top_k
    )
    retrieved_chunks = search_results['documents'][0]

    # Prepare context for answer generation
    context = "\n\n".join(retrieved_chunks)

    # Generate answer using LLM; ask it to say so when the context is insufficient
    answer_prompt = f"""Context: {context}

Question: {question}

Provide a detailed and precise answer based on the given context.
If the context does not contain the answer, say so."""

    answer = llm.invoke(answer_prompt).content

    return {
        'answer': answer,
        'retrieved_chunks': retrieved_chunks
    }

In [158]:
! pip install rich tabulate



## Comparing QA Results: Poor Chunks vs Optimized Chunks  

This function **compares question-answering (QA) performance** between **poorly chunked** and **optimized** document segments, highlighting improvements using structured metrics and visual formatting with **Rich**.  

### **Why Do We Need This?**
- Evaluates whether **optimized document chunks** lead to **better question-answering performance**.  
- Provides a structured **quality comparison** between **poorly chunked** and **optimized** text.  
- Uses **ChromaDB** for retrieval and **GPT-3.5 Turbo** for answer generation.  
- Displays results using **Rich tables and panels** for better readability.  

### **What Does This Code Do?**  

#### **1. Store Poor Chunks in ChromaDB**
- Creates a new ChromaDB collection (`poor_chunks`).  
- Stores **poor-quality document chunks** along with their embeddings.  

#### **2. Define `compare_qa_results()`**
- **Prompts the user for a question** with `input()`.  
- Retrieves answers using `qa_with_chunks()` for:  
  - **Poor Chunks** (retrieved from `poor_collection`).  
  - **Optimized Chunks** (retrieved from `optimized_collection`).  

#### **3. Compute Chunk Quality Metrics**
- Uses `assess_chunk_quality()` to evaluate **poor** and **optimized chunks**.  
- Compares metrics like:  
  - **Chunk Length Score**  
  - **Contextual Coherence**  
  - **Semantic Diversity**  

#### **4. Display Results Using Rich**
- **Creates a table** comparing **poor vs optimized chunk metrics**.  
- **Displays QA results** in structured panels:  
  - **Poor Chunks Answer + Retrieved Chunks** (Yellow Panel)  
  - **Optimized Chunks Answer + Retrieved Chunks** (Green Panel)  

#### **5. Run Interactive QA**
- Calls `compare_qa_results()` so users can enter a question and compare the two results, printed one after the other.  

### **Outcome**  
🚀 This approach helps visualize the impact of **optimized chunking** on QA accuracy, making it easy to assess improvements in retrieval quality!  
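The Rich table shows raw values for each metric. A small helper (hypothetical, not part of the notebook) can add a difference column so changes stand out. A positive delta only means the optimized score is higher; whether that is an improvement depends on the metric, since these are rough heuristics.

```python
def metric_deltas(poor: dict[str, float], optimized: dict[str, float]) -> list[tuple[str, float, float, float]]:
    """Rows of (metric, poor, optimized, optimized - poor) for metrics present in both dicts."""
    return [(name, poor[name], optimized[name], optimized[name] - poor[name])
            for name in poor if name in optimized]


def format_rows(rows: list[tuple[str, float, float, float]]) -> str:
    """Plain-text table with a signed delta column."""
    lines = [f"{'Metric':<22}{'Poor':>8}{'Optimized':>11}{'Delta':>8}"]
    lines += [f"{name:<22}{p:>8.2f}{o:>11.2f}{d:>+8.2f}" for name, p, o, d in rows]
    return "\n".join(lines)
```

With the notebook's variables, this would be `print(format_rows(metric_deltas(poor_chunk_metrics, optimized_chunk_metrics)))`.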


In [14]:
from rich.console import Console
from rich.markup import escape
from rich.panel import Panel
from rich.table import Table
import chromadb

# Create (or reuse) a ChromaDB collection for the baseline ("poor") chunks
poor_collection = chromadb.Client().get_or_create_collection(name="poor_chunks")

# Store the chunks produced by initial_chunking(); upsert() keeps re-runs idempotent
for i, chunk in enumerate(poor_chunks):
    poor_collection.upsert(
        ids=[f"poor_chunk_{i}"],
        documents=[chunk.page_content],
        embeddings=[embeddings.embed_query(chunk.page_content)]
    )

def compare_qa_results():
    # Get user question
    user_question = input("Enter your question: ")

    # Answer the same question from both collections
    poor_chunks_result = qa_with_chunks(user_question, poor_collection)
    optimized_chunks_result = qa_with_chunks(user_question, optimized_collection)

    console = Console()

    # Metrics for the baseline chunks (optimized_chunk_metrics was computed earlier)
    poor_chunk_metrics = assess_chunk_quality(poor_chunks)

    # Table comparing chunk quality metrics
    metrics_table = Table(title="Chunk Quality Metrics Comparison")
    metrics_table.add_column("Metric", style="cyan")
    metrics_table.add_column("Poor Chunks", style="magenta")
    metrics_table.add_column("Optimized Chunks", style="green")

    for metric, poor_value in poor_chunk_metrics.items():
        metrics_table.add_row(
            metric,
            f"{poor_value:.2f}",
            f"{optimized_chunk_metrics[metric]:.2f}"
        )

    def result_panel(result, title, color):
        # escape() stops brackets in document text from being parsed as Rich markup
        chunks_text = "\n---\n".join(escape(c) for c in result['retrieved_chunks'])
        return Panel(
            f"[bold]Answer:[/bold]\n{escape(result['answer'])}\n\n"
            f"[bold]Retrieved Chunks:[/bold]\n{chunks_text}",
            title=title,
            border_style=color
        )

    # Display results with Rich
    console.print("\n[bold yellow]Question Analysis[/bold yellow]")
    console.print(f"[blue]Question:[/blue] {escape(user_question)}")

    console.print("\n[bold underline]Chunk Quality Comparison[/bold underline]")
    console.print(metrics_table)

    console.print("\n[bold underline]Detailed Results[/bold underline]")
    console.print(result_panel(poor_chunks_result, "Poor Chunks Result", "yellow"))
    console.print(result_panel(optimized_chunks_result, "Optimized Chunks Result", "green"))

# Run interactive QA
compare_qa_results()



Enter your question: Give me the imp key terms from the document
