## **Advanced Document Intelligence: Multi Agent RAG in Action with Docling LangGraph**

### DocChat

Have you ever struggled to extract precise information from long, complex documents? Whether it’s a research paper, legal contract, technical report, or environmental study, finding the exact details you need can feel overwhelming. That’s where `DocChat` comes in—a `multi-agent RAG` tool designed to help us ask questions about our documents and receive fact-checked, hallucination-free answers.

Sure, we could use ChatGPT or DeepSeek to accomplish this task, but when dealing with long documents containing multiple tables, images, and dense text, these models struggle with retrieval and are prone to hallucinations. They often misinterpret tables, miss key data hidden in footnotes, or even fabricate citations—which I will demonstrate below. The problem? These models lack document-aware reasoning and don’t verify their responses against structured sources.

That’s why DocChat takes a different approach. Instead of relying on a single LLM, it combines multiple AI agents, each with a specific role:

- A `Hybrid Retriever` that intelligently combines BM25 keyword search and **vector embeddings** to retrieve the most relevant passages.
- A `Research Agent` that analyzes the retrieved content and generates an initial response.
- A `Verification Agent` that cross-checks the response against the original document to detect hallucinations and flag unsupported claims.
- A `Self-Correction Mechanism` that re-runs the research step if any contradictions or unsupported statements are found.

This multi-step, verification-driven approach ensures that DocChat provides precise, document-grounded answers, even for complex and long-form documents that general-purpose chatbots struggle with. Whether we need to extract specific data points, summarize sections, compare multiple reports, or analyze tables, DocChat is built to help us navigate our documents with confidence.


## __Table of Contents__

<ol>
    <li><a href="#Objectives">Objectives</a></li>
    <li>
        <a href="#Setup">Setup</a>
        <ol>
            <li><a href="#Installing-Required-Libraries">Installing Required Libraries</a></li>
            <li><a href="#Importing-Required-Libraries">Importing Required Libraries</a></li>
        </ol>
    </li>
    <li><a href="#Why-to-use-multi-agent-RAG">Why to use multi-agent RAG</a></li>
    <li><a href="#DocChat-Workflow">DocChat-Workflow</a></li>
    <li><a href="#Document-Parsing-with-Docling">Document Parsing with Docling</a></li>
    <li><a href="#Build-a-Vector-Database-with-ChromaDB">Build a Vector Database with ChromaDB</a></li>
    <li><a href="#Logging-and-Configuration">Logging and Configuration</a></li>
    <li><a href="#Document-Preprocessor-Module">Document Preprocessor Module</a></li>
    <li><a href="#LangGraph-multi-agent-system-structure">LangGraph multi-agent system structure</a></li>
    <li><a href="#Relevance-Checker:-ensuring-query-document-alignment">Relevance Checker: ensuring query-document alignment</a></li>
    <li><a href="#Research-Agent:-Generating-Document-Based-Answers">Research Agent: Generating Document-Based Answers</a></li>
    <li><a href="#Verification-Agent:-Validating-Answer-Accuracy-and-Relevance">Verification Agent: Validating Answer Accuracy and Relevance</a></li>
    <li><a href="#Hybrid-retriever:-combining-BM25-and-vector-search-for-optimal-document-retrieval">Hybrid retriever: combining BM25 and vector search for optimal document retrieval</a></li>
    <li><a href="#Agent-Workflow:-Orchestrating-the-Multi-Agent-RAG-System">Agent Workflow: Orchestrating the Multi-Agent RAG System</a></li>
    <li><a href="#Main-Process-Function:-Tying-It-All-Together">Main Process Function: Tying It All Together</a></li>
    <li><a href="#Example-Usage:-Running-the-RAG-System">Example Usage: Running the RAG System</a></li>
</ol>

## Objectives

By the end of this project, we will be able to:

1. **Implement a multi-agent RAG system**: Develop a sophisticated question-answering system that leverages specialized agents for document processing, relevance checking, research, and verification.

2. **Create efficient document processing pipelines**: Build a system that processes various document formats (PDF, DOCX, TXT, MD) into searchable chunks with caching for improved performance.

3. **Design hybrid retrieval mechanisms**: Implement a combined BM25 and vector search system that balances keyword precision with semantic understanding.

4. **Orchestrate complex agent workflows**: Use LangGraph to create state-based workflows that enable sophisticated branching logic and feedback loops.

5. **Integrate with IBM WatsonX AI**: Connect to IBM's AI services for embeddings and large language models, enabling advanced text processing capabilities.

6. **Implement robust verification systems**: Create agents that fact-check AI-generated responses against source documents to ensure accuracy and reliability.

7. **Handle errors gracefully**: Implement comprehensive error handling throughout the system to ensure resilience and helpful messaging.

8. **Test your system with real documents**: Evaluate the system with practical document-based queries that demonstrate its ability to extract accurate information from complex content.

This project equips us with the skills to create sophisticated document intelligence systems that can accurately answer questions based on document content, making information retrieval more accessible and reliable.


----


## Setup

For this project, we will be using the following libraries:
* `docling` for document extraction and processing from various file formats.
* `langchain` for building modular AI applications with retrievers and document loaders.
* `langgraph` for creating directed graph workflows with conditional routing for agent coordination.
* `langchain_text_splitters` for dividing documents into processable chunks using markdown headers.
* `langchain_community` for integrating retrieval systems like BM25 and vectorstores.
* `chromadb` for efficient vector storage and retrieval of document embeddings.
* `ibm_watsonx_ai` for accessing IBM's foundation models and embedding services.
* `rank_bm25` for implementing the BM25 retrieval algorithm.
* `pickle` for serializing and deserializing Python objects in the caching system.
* `hashlib` for generating hash values for efficient document deduplication and caching.


### Installing Required Libraries

In [1]:
%pip install docling==2.30.0 | tail -n 1
%pip install config==0.5.1 | tail -n 1
%pip install langgraph | tail -n 1
%pip install langchain | tail -n 1
%pip install langchain_community | tail -n 1
%pip install langchain_ibm | tail -n 1
%pip install ibm_watsonx_ai | tail -n 1
%pip install chromadb | tail -n 1
%pip install rank_bm25 | tail -n 1

Successfully installed Shapely-2.1.0 XlsxWriter-3.2.3 click-8.2.0 docling-2.30.0 docling-core-2.30.0 docling-ibm-models-3.4.3 docling-parse-4.0.1 easyocr-1.7.2 et-xmlfile-2.0.0 filelock-3.18.0 filetype-1.2.0 fsspec-2025.3.2 hf-xet-1.1.0 huggingface_hub-0.31.1 imageio-2.37.0 jsonlines-3.1.0 jsonref-1.1.0 latex2mathml-3.78.0 lazy-loader-0.4 lxml-5.4.0 markdown-it-py-3.0.0 marko-2.1.3 mdurl-0.1.2 mpire-2.10.2 mpmath-1.3.0 multiprocess-0.70.18 networkx-3.4.2 ninja-1.11.1.4 numpy-2.2.5 nvidia-cublas-cu12-12.6.4.1 nvidia-cuda-cupti-cu12-12.6.80 nvidia-cuda-nvrtc-cu12-12.6.77 nvidia-cuda-runtime-cu12-12.6.77 nvidia-cudnn-cu12-9.5.1.17 nvidia-cufft-cu12-11.3.0.4 nvidia-cufile-cu12-1.11.1.6 nvidia-curand-cu12-10.3.7.77 nvidia-cusolver-cu12-11.7.1.2 nvidia-cusparse-cu12-12.5.4.2 nvidia-cusparselt-cu12-0.6.3 nvidia-nccl-cu12-2.26.2 nvidia-nvjitlink-cu12-12.6.85 nvidia-nvtx-cu12-12.6.77 opencv-python-headless-4.11.0.86 openpyxl-3.1.5 pandas-2.2.3 pillow-11.2.1 pyclipper-1.3.0.post6 pydantic-settin

### Importing Required Libraries

Import all required libraries here:


In [1]:
from typing import Dict, List, TypedDict, Optional, Union
import time
import os
# Core components
from docling.document_converter import DocumentConverter  # For PDF/document parsing
from langgraph.graph import StateGraph, END  # For workflow management
from ibm_watsonx_ai.foundation_models import ModelInference
from ibm_watsonx_ai import Credentials, APIClient
from langchain.schema import Document
from langchain.retrievers import EnsembleRetriever
from langchain_community.vectorstores import Chroma
from langchain_community.retrievers import BM25Retriever
from ibm_watsonx_ai.metanames import EmbedTextParamsMetaNames
from langchain_ibm import WatsonxEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
import pickle

## Why to use multi-agent RAG

A [Naïve RAG (Retrieval-Augmented Generation)](https://research.ibm.com/blog/retrieval-augmented-generation-RAG?utm_source=skills_network&utm_content=in_lab_content_link&utm_id=Lab-DocChat-v1_1738281360) pipeline is often insufficient for handling long, structured documents due to several limitations:

🔴 Limited query understanding – Naïve RAG processes queries at a single level, failing to break down complex questions into multiple reasoning steps. This results in shallow or incomplete answers when dealing with multi-faceted queries.

🔴 No hallucination detection or error handling – Traditional RAG pipelines lack a verification step. This means that if a response contains hallucinated or incorrect information, there’s no mechanism to detect, correct, or refine the output.

🔴 Inability to handle out-of-scope queries – Without a proper scope-checking mechanism, Naïve RAG may attempt to generate answers even when no relevant information exists, leading to misleading or fabricated responses.

🔴 Inefficient multi-document retrieval – When multiple documents are uploaded, a Naïve RAG system might retrieve irrelevant or suboptimal passages, failing to select the most relevant content dynamically.

To overcome these challenges, DocChat implements a Multi-Agent RAG research system, which introduces intelligent agents to enhance retrieval, reasoning, and verification.


### How multi-agent RAG solves these issues?

✅ Scope checking & routing
- A Scope-Checking Agent first determines whether the user’s question is relevant to the uploaded documents. If the query is out of scope, DocChat explicitly informs the user instead of generating hallucinated responses.

✅ Dynamic multi-step query processing
- For complex queries, an Agent Workflow ensures the question is broken into smaller sub-steps, retrieving the necessary information before synthesizing a complete response.
- For example, if a question requires comparing two sections of a document, an agent-based approach recognizes this need, retrieves both parts separately, and constructs a comparative analysis in the final answer.

✅ Hybrid retrieval for multi-document contexts
- When multiple documents are uploaded, the Hybrid Retriever (BM25 + Vector Search) ensures that the most relevant document(s) are selected dynamically, improving accuracy over traditional retrieval pipelines.

✅ Fact verification & self-correction
- After an initial response is generated, a Verification Agent cross-checks the output against the retrieved documents.
- If any contradictions or unsupported claims are found, the Self-Correction Mechanism refines the answer before presenting it to the user.

✅ Shared global state for context awareness
- The Agent Workflow maintains a shared state, allowing each step (retrieval, reasoning, verification) to reference previous interactions and refine responses dynamically.
- This enables context-aware follow-up questions, ensuring users can refine their queries without losing track of previous answers.


## DocChat Workflow

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/mbb8Z17CXEGPYd3V5FhZYg/project-overview.jpg" alt="project overview">

1️⃣ User query processing & relevance analysis
- The system starts when a user submits a question about their uploaded document(s).
- Before retrieving any data, DocChat first analyzes query relevance to determine if the question is within the scope of the uploaded content.

2️⃣ Routing & query categorization
- The query is routed through an intelligent agent that decides whether the system can answer it using the document(s):
    - ✅ In Scope: Proceed with document retrieval and response generation.
    - ❌ Not in Scope: Inform the user that the question cannot be answered based on the provided documents, preventing hallucinations.

3️⃣ Multi-agent research & document retrieval
- If the query is relevant, DocChat retrieves relevant document sections from a hybrid search system:
    - Docling converts the document into a structured Markdown format for better chunking.
    - LangChain splits the document into logical chunks based on headers and stores them in ChromaDB (a vector store).
    - The retrieval module searches for the most contextually relevant document chunks using BM25 and vector search.

4️⃣ Answer generation & verification loop
- Conduct research:
    - The Research Agent generates an initial answer based on retrieved content.
    - A sub-process starts where queries are dynamically generated for more precise retrieval.

- Verification process:
    - The Verification Agent cross-checks the generated response against the retrieved content.
    - If the response is fully supported, the system finalizes and returns the answer.
    - If verification fails (e.g., hallucinations, unsupported claims), the system re-runs the research step until a verifiable response is found.

5️⃣ Response finalization
- After verification is complete, DocChat returns the final response to the user.
- The workflow ensures that each answer is sourced directly from the provided document(s), preventing fabrication or unreliable outputs.


## Document Parsing with Docling

Processing PDFs with complex structures, tables, and intricate layouts requires careful selection of a reliable document parsing tool. Many libraries struggle with accuracy when dealing with nested tables, multi-column formats, or scanned PDFs, often resulting in misaligned text, missing data, or broken layouts.

To overcome these challenges, DocChat leverages [Docling](https://github.com/docling-project/docling)—an open-source document processing library designed for high-precision parsing and structured data extraction.

### Why Docling?

✅ Accurate Table & Layout Parsing – Recognizes complex table structures, reading sequences, and multi-column layouts.

✅ Multi-Format Support – Reads and exports documents in Markdown, JSON, PDF, DOCX, PPTX, XLSX, HTML, AsciiDoc, and images.

✅ OCR for Scanned PDFs – Extracts text from scanned documents using optical character recognition (OCR).

✅ Seamless Integration with LangChain – Enables structured chunking for better retrieval and vector search in ChromaDB.

### How Docling parses text?

- Uses `DocumentConverter` to extract structured content.
- Converts the PDF into Markdown format.
- Splits the extracted content based on headers using `MarkdownHeaderTextSplitter`.
- Prints the full extracted sections for review.

Docling essentially has a built-in OCR (Optical Character Recognition) capabilities, so it can process PDFs that contain scanned images of text, making it effective for handling historical documents, academic papers, or other non-digitally generated content. Docling is equipped to handle both structured and unstructured PDFs, including scanned documents, making it a far more versatile and reliable tool for text extraction in complex scenarios. This demonstrates Docling’s advantage in working with real-world PDFs, where many documents are scanned rather than digitally created.


## Build a Vector Database with ChromaDB

Once documents have been parsed and structured using Docling, the next step is to efficiently store and retrieve relevant document chunks. This is where ChromaDB comes into play—a high-performance vector database optimized for fast and accurate similarity search.

### What is ChromaDB?
[Chroma DB](https://github.com/chroma-core/chroma) is an open-source vector database optimized for fast and scalable similarity search. It enables efficient storage, retrieval, and ranking of document embeddings, making it a key component of RAG workflows.

### Why ChromaDB?
✅ Blazing-Fast Vector Search – Finds the most relevant document chunks in milliseconds.

✅ Persistent Storage – Keeps embeddings saved for reuse across sessions.

✅ Seamless LangChain Integration – Works natively with LangChain for retrieval-augmented generation (RAG).

✅ Scalable and Lightweight – Handles millions of embeddings efficiently without complex infrastructure.


## Logging and Configuration

We configure the logging system to provide informative output during execution. This helps in debugging and tracking the flow of our program, showing timestamps, module names, and message levels (INFO, ERROR, etc.).

### Configuration Settings
The Settings class defines configuration parameters for our system:

- `CACHE_DIR`: Where processed document chunks are cached to avoid reprocessing
- `CACHE_EXPIRE_DAYS`: How long to keep cached documents before refreshing
- `CHROMA_DB_PATH`: Where the vector database stores embeddings
- `VECTOR_SEARCH_K`: How many similar vectors to retrieve when searching
- `HYBRID_RETRIEVER_WEIGHTS`: Balance between keyword-based (BM25) and semantic (vector) search

### Constants
These constants define limits and constraints:

- `MAX_TOTAL_SIZE`: Maximum total size of documents (100MB)
- `ALLOWED_TYPES`: File extensions that our system can process

### IBM Watson AI Credentials
This section initializes the connection to IBM Watson AI services:

- Creates credentials for accessing IBM WatsonX AI
- Initializes the API client with those credentials
- This enables our system to use IBM's language models for document processing and question answering


In [3]:
import logging
import hashlib
from datetime import datetime, timedelta 

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# Configuration settings
class Settings:
    CACHE_DIR = "./cache"
    CACHE_EXPIRE_DAYS = 7
    CHROMA_DB_PATH = "./chroma_db"
    VECTOR_SEARCH_K = 5
    HYBRID_RETRIEVER_WEIGHTS = [0.5, 0.5]

settings = Settings()

MAX_TOTAL_SIZE = 100 * 1024 * 1024  # 100MB
ALLOWED_TYPES = ['.pdf', '.docx', '.txt', '.md']

# IBM Watson credentials
from ibm_watsonx_ai import Credentials, APIClient
credentials = Credentials(url="https://us-south.ml.cloud.ibm.com")
client = APIClient(credentials)

2025-05-12 15:59:44,345 - ibm_watsonx_ai.client - INFO - Client successfully initialized


## Document Preprocessor Module

The `DocumentProcessor` class is responsible for handling document parsing, caching, and chunking. It ensures efficient processing by:

- **Validating file sizes** before processing.
- **Using caching** to avoid redundant processing of previously uploaded files.
- **Extracting structured content** from documents using Docling.
- **Splitting text into chunks** using **MarkdownHeaderTextSplitter** for better retrieval in vector databases.

### Function Breakdown

| Function | Purpose |
|----------|---------|
| `__init__()` | Initializes cache directory and header settings. |
| `validate_files(files: List)` | Ensures uploaded files do not exceed the size limit. |
| `process(files: List) -> List` | Handles document processing, caching, and deduplication. |
| `_process_file(file) -> List` | Converts a document into Markdown and splits it into chunks. |
| `_generate_hash(content: bytes) -> str` | Creates a unique hash of file content. |
| `_save_to_cache(chunks: List, cache_path: Path)` | Saves processed document chunks to cache. |
| `_load_from_cache(cache_path: Path) -> List` | Loads cached document chunks if available. |
| `_is_cache_valid(cache_path: Path) -> bool` | Checks if a cached file is still valid and not corrupted. |


In [4]:
from pathlib import Path

# Document Processing
class DocumentProcessor:
    def __init__(self):
        self.headers = [("#", "Header 1"), ("##", "Header 2")]
        self.cache_dir = Path(settings.CACHE_DIR)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        
    def validate_files(self, files: List) -> None:
        """Validate the total size of the uploaded files."""
        total_size = sum(os.path.getsize(f.name) for f in files)
        if total_size > MAX_TOTAL_SIZE:
            raise ValueError(f"Total size exceeds {MAX_TOTAL_SIZE//1024//1024}MB limit")

    def process(self, files: List) -> List:
        """Process files with caching for subsequent queries"""
        self.validate_files(files)
        all_chunks = []
        seen_hashes = set()
        
        for file in files:
            try:
                # Generate content-based hash for caching
                with open(file.name, "rb") as f:
                    file_hash = self._generate_hash(f.read())
                
                cache_path = self.cache_dir / f"{file_hash}.pkl"
                
                if self._is_cache_valid(cache_path):
                    logger.info(f"Loading from cache: {file.name}")
                    chunks = self._load_from_cache(cache_path)
                else:
                    logger.info(f"Processing and caching: {file.name}")
                    chunks = self._process_file(file)
                    self._save_to_cache(chunks, cache_path)
                
                # Deduplicate chunks across files
                for chunk in chunks:
                    chunk_hash = self._generate_hash(chunk.page_content.encode())
                    if chunk_hash not in seen_hashes:
                        all_chunks.append(chunk)
                        seen_hashes.add(chunk_hash)
                        
            except Exception as e:
                logger.error(f"Failed to process {file.name}: {str(e)}")
                continue
                
        logger.info(f"Total unique chunks: {len(all_chunks)}")
        return all_chunks

    def _process_file(self, file) -> List:
        """Original processing logic with Docling"""
        if not file.name.endswith(('.pdf', '.docx', '.txt', '.md')):
            logger.warning(f"Skipping unsupported file type: {file.name}")
            return []

        converter = DocumentConverter()
        markdown = converter.convert(file.name).document.export_to_markdown()
        splitter = MarkdownHeaderTextSplitter(self.headers)
        return splitter.split_text(markdown)

    def _generate_hash(self, content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def _save_to_cache(self, chunks: List, cache_path: Path):
        with open(cache_path, "wb") as f:
            pickle.dump({
                "timestamp": datetime.now().timestamp(),
                "chunks": chunks
            }, f)

    def _load_from_cache(self, cache_path: Path) -> List:
        with open(cache_path, "rb") as f:
            data = pickle.load(f)
        return data["chunks"]

    def _is_cache_valid(self, cache_path: Path) -> bool:
        if not cache_path.exists():
            return False
        
        # Check if file size is too small (possibly corrupt)
        if cache_path.stat().st_size < 100:  # Arbitrary small size
            return False
            
        cache_age = datetime.now() - datetime.fromtimestamp(cache_path.stat().st_mtime)
        return cache_age < timedelta(days=settings.CACHE_EXPIRE_DAYS)

## LangGraph multi-agent system structure

A [multi-agent system (MAS)](https://www.ibm.com/think/topics/multiagent-system?utm_source=skills_network&utm_content=in_lab_content_link&utm_id=Lab-nutricoach-v1_1737477753) is composed of multiple artificial intelligence (AI) agents collaborating to carry out tasks for a user or another system. Here are some of the difference between Agentic AI and MAS.

| Characteristic | Agentic AI | Multi-Agent Systems (MAS) |
|----------------|------------|---------------------------|
| **Autonomy** | Central focus—autonomous task execution | May include agents with varying levels of autonomy |
| **Interaction** | Limited to tools, systems, or environments | Key focus—agents interact, communicate, and coordinate |
| **Scope** | Individual agent | Multiple agents in a shared system |
| **Dependency** | Agentic AI can exist independently | MAS may involve agentic AI but doesn't require it |

Here, we are going to use [LangGraph](https://langchain-ai.github.io/langgraph/) is an open-source Python framework designed for multi-agent workflows in AI applications. It extends LangChain by enabling graph-based state management, making it easier to coordinate multiple AI agents in structured workflows. LangGraph is particularly useful in RAG and multi-step reasoning, where multiple agents collaborate to refine, verify, and improve responses dynamically.

### How Does LangGraph Work?

LangGraph operates on the principle of stateful workflows, where each step in the process is defined as a node in a directed graph. The edges define transitions between nodes based on logic.

A LangGraph workflow consists of:

- **Nodes →** Represent individual processing steps (e.g., research, verification).
- **Edges →** Define the flow of execution (e.g., go to verification after research).
- **State Objects →** Store data passed between agents.
- **Conditional Transitions →** Allow decision-making between nodes.

### Graph Structure for this Project
The AgentWorkflow class constructs the multi-agent system using LangGraph’s `StateGraph`, ensuring a structured approach to information retrieval and verification.

<img src="https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/3u1WREmhVmC-X-BzeFWIZw/graph-diagram.jpg" width=80% height=80% alt="graph-diagram"><br>

#### Workflow Breakdown
1️⃣ **Check Relevance** – The `RelevanceChecker` determines if the query can be answered based on the retrieved documents.
- If relevant → Proceed to research.
- If irrelevant → Terminate workflow.

2️⃣ **Research Step** – The `ResearchAgent` generates a draft answer using relevant documents.

3️⃣ **Verification Step** – The `VerificationAgent` assesses the draft answer for accuracy and relevance.

4️⃣ **Decision Making** – Based on verification:
- If the answer lacks support → Re-research and refine.
- If verified → End workflow.


## Relevance Checker: ensuring query-document alignment

The `RelevanceChecker` is responsible for determining whether retrieved documents contain relevant information to answer a given question. It uses an **ensemble retriever** to fetch document chunks and then leverages IBM WatsonX AI for classification. The goal is to categorize relevance into three possible labels:

* **"CAN_ANSWER"** – The documents provide sufficient information for a full answer.
* **"PARTIAL"** – The documents mention the topic but lack complete details.
* **"NO_MATCH"** – The documents do not discuss the question at all.


The `RelevanceChecker` follows a specific workflow to determine document relevance:

1. **Document Retrieval**: When a question is received, the checker uses an ensemble retriever (combining BM25 and vector search) to fetch the top-k most relevant document chunks.

2. **Content Aggregation**: The retrieved chunks are combined into a single text passage, preserving their individual content but merging them for analysis.

3. **Prompt Engineering**: A carefully crafted prompt is sent to the IBM WatsonX Granite model, asking it to classify the relevance of the document content to the question. The prompt includes:
  - Clear instructions about the classification task
  - Detailed definitions of the three possible labels
  - Guidelines for choosing between ambiguous cases
  - Both the original question and the document content

4. **LLM Classification**: The Granite-3-8b-instruct model processes the prompt and returns a single label classification.

5. **Response Validation**: The checker validates that the response is one of the expected labels, defaulting to "NO_MATCH" if an invalid response is received.

6. **Error Handling**: Comprehensive error handling captures API failures, unexpected response structures, or empty document returns, defaulting to "NO_MATCH" in problematic cases.

7. **Workflow Integration**: The returned classification determines if the agent workflow should proceed to the research phase or respond with a "cannot answer" message to the user.


In [5]:
# Relevance Checker
from ibm_watsonx_ai.foundation_models import ModelInference
class RelevanceChecker:
    def __init__(self):
        # Initialize the WatsonX ModelInference
        self.model = ModelInference(
            model_id="ibm/granite-3-8b-instruct",
            credentials=credentials,
            project_id="skills-network",
            params={"temperature": 0, "max_tokens": 10},
        )

    def check(self, question: str, retriever, k=3) -> str:
        """
        1. Retrieve the top-k document chunks from the global retriever.
        2. Combine them into a single text string.
        3. Pass that text + question to the LLM for classification.

        Returns: "CAN_ANSWER", "PARTIAL", or "NO_MATCH".
        """
        logger.debug(f"RelevanceChecker.check called with question='{question}' and k={k}")

        # Retrieve doc chunks from the ensemble retriever
        top_docs = retriever.invoke(question)
        if not top_docs:
            logger.debug("No documents returned from retriever.invoke(). Classifying as NO_MATCH.")
            return "NO_MATCH"

        # Combine the top k chunk texts into one string
        document_content = "\n\n".join(doc.page_content for doc in top_docs[:k])

        # Create a prompt for the LLM to classify relevance
        prompt = f"""
        You are an AI relevance checker between a user's question and provided document content.

        **Instructions:**
        - Classify how well the document content addresses the user's question.
        - Respond with only one of the following labels: CAN_ANSWER, PARTIAL, NO_MATCH.
        - Do not include any additional text or explanation.

        **Labels:**
        1) "CAN_ANSWER": The passages contain enough explicit information to fully answer the question.
        2) "PARTIAL": The passages mention or discuss the question's topic but do not provide all the details needed for a complete answer.
        3) "NO_MATCH": The passages do not discuss or mention the question's topic at all.

        **Important:** If the passages mention or reference the topic or timeframe of the question in any way, even if incomplete, respond with "PARTIAL" instead of "NO_MATCH".

        **Question:** {question}
        **Passages:** {document_content}

        **Respond ONLY with one of the following labels: CAN_ANSWER, PARTIAL, NO_MATCH**
        """

        # Call the LLM
        try:
            response = self.model.chat(
                messages=[
                    {
                        "role": "user",
                        "content": prompt  # Changed from list to string
                    }
                ]
            )
        except Exception as e:
            logger.error(f"Error during model inference: {e}")
            return "NO_MATCH"

        # Extract the content from the response
        try:
            llm_response = response['choices'][0]['message']['content'].strip().upper()
            logger.debug(f"LLM response: {llm_response}")
        except (IndexError, KeyError) as e:
            logger.error(f"Unexpected response structure: {e}")
            return "NO_MATCH"

        print(f"Checker response: {llm_response}")

        # Validate the response
        valid_labels = {"CAN_ANSWER", "PARTIAL", "NO_MATCH"}
        if llm_response not in valid_labels:
            logger.debug("LLM did not respond with a valid label. Forcing 'NO_MATCH'.")
            classification = "NO_MATCH"
        else:
            logger.debug(f"Classification recognized as '{llm_response}'.")
            classification = llm_response

        return classification

## Research Agent: Generating Document-Based Answers

The `ResearchAgent` is responsible for **generating an initial draft answer** using retrieved documents. It interacts with **IBM WatsonX AI** to synthesize responses based on relevant content. This step is crucial in the RAG pipeline, ensuring that AI-generated answers are grounded in the provided data.

### Key Functions of the Research Agent
✅ **Context-Aware Answer Generation** – Produces fact-based responses using retrieved documents.

✅ **Structured Prompting** – Ensures the AI model adheres to precise instructions for accurate outputs.

✅ **Response Sanitization** – Cleans and formats LLM responses for better readability.

### How the Research Agent Works

1. **Document Aggregation**: When provided with relevant document chunks, the agent combines them into a single context string.

2. **Prompt Construction**: A well-structured prompt is created that includes:
  - Clear instructions about answering based only on provided context
  - Guidelines for clarity, conciseness, and factuality
  - The user's original question
  - The aggregated document context

3. **Model Invocation**: The prompt is sent to the Meta Llama 3 90B model (via IBM WatsonX), configured with:
  - A moderate temperature (0.3) for balance between creativity and determinism
  - Appropriate token limits (300) for concise but informative answers

4. **Response Processing**: The model's response is extracted from the API return value and sanitized to remove unnecessary whitespace.

5. **Error Handling**: Comprehensive error handling addresses API failures and unexpected response structures, providing fallback responses when necessary.

6. **Result Packaging**: The final answer is returned along with the context used, enabling verification in subsequent steps.


In [6]:
# Research Agent
class ResearchAgent:
    def __init__(self):
        """
        Initialize the research agent with the IBM WatsonX ModelInference.
        """
        # Initialize the WatsonX ModelInference
        print("Initializing ResearchAgent with IBM WatsonX ModelInference...")
        self.model = ModelInference(
            model_id="meta-llama/llama-3-2-90b-vision-instruct", 
            credentials=credentials,
            project_id="skills-network",
            params={
                "max_tokens": 300,            # Adjust based on desired response length
                "temperature": 0.3,           # Controls randomness; lower values make output more deterministic
            }
        )
        print("ModelInference initialized successfully.")

    def sanitize_response(self, response_text: str) -> str:
        """
        Sanitize the LLM's response by stripping unnecessary whitespace.
        """
        return response_text.strip()

    def generate_prompt(self, question: str, context: str) -> str:
        """
        Generate a structured prompt for the LLM to generate a precise and factual answer.
        """
        prompt = f"""
        You are an AI assistant designed to provide precise and factual answers based on the given context.

        **Instructions:**
        - Answer the following question using only the provided context.
        - Be clear, concise, and factual.
        - Return as much information as you can get from the context.
        
        **Question:** {question}
        **Context:**
        {context}

        **Provide your answer below:**
        """
        return prompt

    def generate(self, question: str, documents: List[Document]) -> Dict:
        """
        Generate an initial answer using the provided documents.
        """
        print(f"ResearchAgent.generate called with question='{question}' and {len(documents)} documents.")

        # Combine the top document contents into one string
        context = "\n\n".join([doc.page_content for doc in documents])
        print(f"Combined context length: {len(context)} characters.")

        # Create a prompt for the LLM
        prompt = self.generate_prompt(question, context)
        print("Prompt created for the LLM.")

        # Call the LLM to generate the answer
        try:
            print("Sending prompt to the model...")
            response = self.model.chat(
                messages=[
                    {
                        "role": "user",
                        "content": prompt  # Ensure content is a string
                    }
                ]
            )
            print("LLM response received.")
        except Exception as e:
            print(f"Error during model inference: {e}")
            raise RuntimeError("Failed to generate answer due to a model error.") from e

        # Extract and process the LLM's response
        try:
            llm_response = response['choices'][0]['message']['content'].strip()
            print(f"Raw LLM response:\n{llm_response}")
        except (IndexError, KeyError) as e:
            print(f"Unexpected response structure: {e}")
            llm_response = "I cannot answer this question based on the provided documents."

        # Sanitize the response
        draft_answer = self.sanitize_response(llm_response) if llm_response else "I cannot answer this question based on the provided documents."

        print(f"Generated answer: {draft_answer}")

        return {
            "draft_answer": draft_answer,
            "context_used": context
        }


## Verification Agent: Validating Answer Accuracy and Relevance

The `VerificationAgent` is responsible for **fact-checking and validating generated answers** using the retrieved documents. This agent ensures that the AI-generated response is:
1. **Supported by factual evidence** from the documents.
2. **Free from contradictions** or misinformation.
3. **Relevant to the original question.**
It interacts with **IBM WatsonX AI** to analyze the relationship between the answer and its source documents, producing a structured verification report.

### How the Verification Agent Works

1. **Evidence Aggregation**: When provided with an answer and relevant document chunks, the agent combines all document contents into a comprehensive context string.

2. **Structured Verification Prompt**: A detailed prompt is created that instructs the model to:
  - Compare the answer against the provided context
  - Check for factual support (direct or indirect)
  - Identify any unsupported claims
  - Detect contradictions between the answer and context
  - Assess relevance to the question
  - Provide any additional explanatory details

3. **Zero-Temperature Inference**: The prompt is sent to IBM's Granite 3-8b-instruct model with temperature set to 0.0 for maximum consistency and determinism in verification.

4. **Response Parsing**: The model's response is parsed into a structured format with specific fields:
  - Supported: YES/NO
  - Unsupported Claims: List of claims
  - Contradictions: List of contradictions
  - Relevant: YES/NO
  - Additional Details: Explanatory text

5. **Formatting for Readability**: The structured verification data is formatted into a human-readable report with clear section headers and concise content.

6. **Error Handling**: Comprehensive error handling addresses issues like:
  - API failures
  - Unexpected response structures
  - Empty responses
  - Failures in response parsing
  - Each providing appropriate fallback verification reports

7. **Workflow Integration**: The verification report determines whether the answer is trustworthy or if the workflow should cycle back for additional research.


In [7]:
# Verification Agent
class VerificationAgent:
    def __init__(self):
        """
        Initialize the verification agent with the IBM WatsonX ModelInference.
        """
        # Initialize the WatsonX ModelInference
        print("Initializing VerificationAgent with IBM WatsonX ModelInference...")
        self.model = ModelInference(
            model_id="ibm/granite-3-8b-instruct", 
            credentials=credentials,
            project_id="skills-network",
            params={
                "max_tokens": 200,            # Adjust based on desired response length
                "temperature": 0.0,           # Remove randomness for consistency
            }
        )
        print("ModelInference initialized successfully.")

    def sanitize_response(self, response_text: str) -> str:
        """
        Sanitize the LLM's response by stripping unnecessary whitespace.
        """
        return response_text.strip()

    def generate_prompt(self, answer: str, context: str) -> str:
        """
        Generate a structured prompt for the LLM to verify the answer against the context.
        """
        prompt = f"""
        You are an AI assistant designed to verify the accuracy and relevance of answers based on provided context.

        **Instructions:**
        - Verify the following answer against the provided context.
        - Check for:
        1. Direct/indirect factual support (YES/NO)
        2. Unsupported claims (list any if present)
        3. Contradictions (list any if present)
        4. Relevance to the question (YES/NO)
        - Provide additional details or explanations where relevant.
        - Respond in the exact format specified below without adding any unrelated information.

        **Format:**
        Supported: YES/NO
        Unsupported Claims: [item1, item2, ...]
        Contradictions: [item1, item2, ...]
        Relevant: YES/NO
        Additional Details: [Any extra information or explanations]

        **Answer:** {answer}
        **Context:**
        {context}

        **Respond ONLY with the above format.**
        """
        return prompt

    def parse_verification_response(self, response_text: str) -> Dict:
        """
        Parse the LLM's verification response into a structured dictionary.
        """
        try:
            lines = response_text.split('\n')
            verification = {}
            for line in lines:
                if ':' in line:
                    key, value = line.split(':', 1)
                    key = key.strip().capitalize()
                    value = value.strip()
                    if key in {"Supported", "Unsupported claims", "Contradictions", "Relevant", "Additional details"}:
                        if key in {"Unsupported claims", "Contradictions"}:
                            # Convert string list to actual list
                            if value.startswith('[') and value.endswith(']'):
                                items = value[1:-1].split(',')
                                # Remove any surrounding quotes and whitespace
                                items = [item.strip().strip('"').strip("'") for item in items if item.strip()]
                                verification[key] = items
                            else:
                                verification[key] = []
                        elif key == "Additional details":
                            verification[key] = value
                        else:
                            verification[key] = value.upper()
            # Ensure all keys are present
            for key in ["Supported", "Unsupported Claims", "Contradictions", "Relevant", "Additional Details"]:
                if key not in verification:
                    if key in {"Unsupported Claims", "Contradictions"}:
                        verification[key] = []
                    elif key == "Additional Details":
                        verification[key] = ""
                    else:
                        verification[key] = "NO"

            return verification
        except Exception as e:
            print(f"Error parsing verification response: {e}")
            return None

    def format_verification_report(self, verification: Dict) -> str:
        """
        Format the verification report dictionary into a readable paragraph.
        """
        supported = verification.get("Supported", "NO")
        unsupported_claims = verification.get("Unsupported Claims", [])
        contradictions = verification.get("Contradictions", [])
        relevant = verification.get("Relevant", "NO")
        additional_details = verification.get("Additional Details", "")

        report = f"**Supported:** {supported}\n"
        if unsupported_claims:
            report += f"**Unsupported Claims:** {', '.join(unsupported_claims)}\n"
        else:
            report += f"**Unsupported Claims:** None\n"

        if contradictions:
            report += f"**Contradictions:** {', '.join(contradictions)}\n"
        else:
            report += f"**Contradictions:** None\n"

        report += f"**Relevant:** {relevant}\n"

        if additional_details:
            report += f"**Additional Details:** {additional_details}\n"
        else:
            report += f"**Additional Details:** None\n"

        return report

    def check(self, answer: str, documents: List[Document]) -> Dict:
        """
        Verify the answer against the provided documents.
        """
        print(f"VerificationAgent.check called with answer='{answer}' and {len(documents)} documents.")

        # Combine all document contents into one string without truncation
        context = "\n\n".join([doc.page_content for doc in documents])
        print(f"Combined context length: {len(context)} characters.")

        # Create a prompt for the LLM to verify the answer
        prompt = self.generate_prompt(answer, context)
        print("Prompt created for the LLM.")

        # Call the LLM to generate the verification report
        try:
            print("Sending prompt to the model...")
            response = self.model.chat(
                messages=[
                    {
                        "role": "user",
                        "content": prompt  # Ensure content is a string
                    }
                ]
            )
            print("LLM response received.")
        except Exception as e:
            print(f"Error during model inference: {e}")
            raise RuntimeError("Failed to verify answer due to a model error.") from e

        # Extract and process the LLM's response
        try:
            llm_response = response['choices'][0]['message']['content'].strip()
            print(f"Raw LLM response:\n{llm_response}")
        except (IndexError, KeyError) as e:
            print(f"Unexpected response structure: {e}")
            verification_report = {
                "Supported": "NO",
                "Unsupported Claims": [],
                "Contradictions": [],
                "Relevant": "NO",
                "Additional Details": "Invalid response structure from the model."
            }
            verification_report_formatted = self.format_verification_report(verification_report)
            print(f"Verification report:\n{verification_report_formatted}")
            print(f"Context used: {context}")
            return {
                "verification_report": verification_report_formatted,
                "context_used": context
            }

        # Sanitize the response
        sanitized_response = self.sanitize_response(llm_response) if llm_response else ""
        if not sanitized_response:
            print("LLM returned an empty response.")
            verification_report = {
                "Supported": "NO",
                "Unsupported Claims": [],
                "Contradictions": [],
                "Relevant": "NO",
                "Additional Details": "Empty response from the model."
            }
        else:
            # Parse the response into the expected format
            verification_report = self.parse_verification_response(sanitized_response)
            if verification_report is None:
                print("LLM did not respond with the expected format. Using default verification report.")
                verification_report = {
                    "Supported": "NO",
                    "Unsupported Claims": [],
                    "Contradictions": [],
                    "Relevant": "NO",
                    "Additional Details": "Failed to parse the model's response."
                }

        # Format the verification report into a paragraph
        verification_report_formatted = self.format_verification_report(verification_report)
        print(f"Verification report:\n{verification_report_formatted}")
        print(f"Context used: {context}")

        return {
            "verification_report": verification_report_formatted,
            "context_used": context
        }


## Hybrid retriever: combining BM25 and vector search for optimal document retrieval

The `RetrieverBuilder` class implements a **hybrid retrieval system** by combining:
1. **BM25 (Lexical Search)** – Traditional keyword-based retrieval.
2. **Vector Search (Embedding-based)** – Semantic retrieval using embeddings.

This combination enhances the accuracy of RAG by leveraging the strengths of both approaches.

### Why use a hybrid retriever?
✅ **Improves Recall** – Captures both exact keyword matches and semantically similar content.

✅ **Balances Precision & Relevance** – BM25 retrieves highly precise keyword matches, while vector retrieval finds related concepts.

✅ **Handles Misspellings & Variations** – Vector embeddings allow for **fuzzy matching** beyond exact keyword searches.

✅ **Optimized for Multi-Agent Systems** – Ensures robust document retrieval before passing data to AI agents.

### Why this is essential for RAG?
✅ Ensures high-quality document retrieval for multi-agent research workflows.

✅ Improves AI response accuracy by providing both keyword-based and semantic matches.

✅ Enhances retrieval diversity, ensuring no relevant document is overlooked.

With hybrid retrieval, the system achieves a balance between precision and recall, ensuring AI-generated responses are grounded in the most relevant information.

### How the Hybrid Retriever Works

1. **Embedding Initialization**: The builder initializes IBM WatsonX embeddings using the Slate-125m English retriever model, configured with specific parameters for token truncation and input text return options.

2. **BM25 Creation**: When provided with document chunks, the system first creates a BM25 retriever, which uses a probabilistic retrieval model based on term frequency and inverse document frequency.

3. **Vector Store Creation**: In parallel, it creates a Chroma vector store with a unique collection name based on the current timestamp, using the initialized WatsonX embeddings to convert document content into vector representations.

4. **Vector Retriever Setup**: A retriever is created from the vector store, configured to return the top-k most similar documents (controlled by the VECTOR_SEARCH_K setting).

5. **Ensemble Integration**: The BM25 and vector retrievers are combined into an EnsembleRetriever, with customizable weights (controlled by HYBRID_RETRIEVER_WEIGHTS) to balance their contributions.

6. **Weighted Retrieval**: When a query is submitted, both retrievers independently find matching documents, and their results are combined based on the configured weights to produce a final, optimized result set.

7. **Error Management**: Comprehensive error handling ensures that any failures in the retrieval pipeline are properly logged and reported.


In [8]:
# Retriever Builder
class RetrieverBuilder:
    def __init__(self):
        """Initialize the retriever builder with embeddings."""
        embed_params = {
            EmbedTextParamsMetaNames.TRUNCATE_INPUT_TOKENS: 3,
            EmbedTextParamsMetaNames.RETURN_OPTIONS: {"input_text": True},
        }

        watsonx_embedding = WatsonxEmbeddings(
            model_id="ibm/slate-125m-english-rtrvr",
            url="https://us-south.ml.cloud.ibm.com",
            project_id="skills-network",
            params=embed_params
        )
        self.embeddings = watsonx_embedding
        
    def build_hybrid_retriever(self, docs):
        """Build a hybrid retriever using BM25 and vector-based retrieval."""
        try:
            # Create BM25 retriever
            bm25 = BM25Retriever.from_documents(docs)
            logger.info("BM25 retriever created successfully.")
            
            # Create Chroma vector store with minimal parameters
            import time
            collection_name = f"collection_{int(time.time())}"
            
            vector_store = Chroma.from_documents(
                documents=docs,
                embedding=self.embeddings,
                collection_name=collection_name  # Just use a unique collection name
            )
            logger.info("Vector store created successfully.")
            
            # Create vector-based retriever
            vector_retriever = vector_store.as_retriever(search_kwargs={"k": settings.VECTOR_SEARCH_K})
            logger.info("Vector retriever created successfully.")
            
            # Combine retrievers into a hybrid retriever
            hybrid_retriever = EnsembleRetriever(
                retrievers=[bm25, vector_retriever],
                weights=settings.HYBRID_RETRIEVER_WEIGHTS
            )
            logger.info("Hybrid retriever created successfully.")
            return hybrid_retriever
        except Exception as e:
            logger.error(f"Failed to build hybrid retriever: {e}")
            raise


## Agent Workflow: Orchestrating the Multi-Agent RAG System

The `AgentWorkflow` class serves as the **central orchestrator** of the multi-agent RAG system, leveraging LangGraph for state management and flow control. It coordinates the interactions between specialized agents to process questions, retrieve relevant documents, generate answers, and verify their accuracy.

### Key Components of the Agent Workflow

✅ **State-Based Graph Architecture** – Uses StateGraph to maintain and transition between different states of processing.

✅ **Conditional Branching Logic** – Dynamically determines the next processing step based on agent outputs.

✅ **Feedback Loop Mechanism** – Enables answer refinement through verification-driven research iterations.

✅ **Error Handling and Recovery** – Provides robust error handling throughout the workflow.

### How the Agent Workflow Works

1. **Graph Construction**: During initialization, the workflow builds a directed graph with nodes representing different processing steps and edges defining the transitions between them.

2. **Relevance Check**: The workflow begins with the relevance checker determining if the documents can answer the question:
  - If relevant (CAN_ANSWER or PARTIAL), proceeds to research
  - If irrelevant (NO_MATCH), provides a "cannot answer" response

3. **Research Phase**: If documents are relevant, the research agent generates a draft answer using the retrieved document chunks.

4. **Verification Phase**: The verification agent fact-checks the draft answer against the source documents.

5. **Decision Point**: Based on the verification report:
  - If issues are found (unsupported claims or irrelevance), the workflow cycles back to research
  - If no issues, the workflow proceeds to completion

6. **Final State**: The end step prepares the final output containing the verified answer and verification report.

7. **Complete Pipeline**: The `full_pipeline` method ties everything together, initializing the state with the question and retriever, invoking the workflow, and returning the results.

### Function Breakdown

| Function | Purpose |
|----------|---------|
| `__init__()` | Initializes the agents and compiles the workflow graph. |
| `build_workflow()` | Constructs the StateGraph with nodes and conditional edges. |
| `_check_relevance_step()` | Determines if documents are relevant to the question. |
| `_decide_after_relevance_check()` | Routes to research or end based on relevance. |
| `_research_step()` | Generates a draft answer using the research agent. |
| `_verification_step()` | Verifies the accuracy of the draft answer. |
| `_end_step()` | Prepares the final output. |
| `_decide_next_step()` | Determines whether to refine the answer or complete. |
| `full_pipeline()` | Executes the complete workflow from question to answer. |


In [9]:
# Workflow
class AgentState(TypedDict):
    question: str
    documents: List[Document]
    draft_answer: str
    verification_report: str
    is_relevant: bool
    retriever: EnsembleRetriever

class AgentWorkflow:
    def __init__(self):
        self.researcher = ResearchAgent()
        self.verifier = VerificationAgent()
        self.relevance_checker = RelevanceChecker()
        self.compiled_workflow = self.build_workflow()  # Compile once during initialization
        
    def build_workflow(self):
        """Create and compile the multi-agent workflow."""
        workflow = StateGraph(AgentState)
        
        # Add nodes
        workflow.add_node("check_relevance", self._check_relevance_step)
        workflow.add_node("research", self._research_step)
        workflow.add_node("verify", self._verification_step)
        workflow.add_node("end", self._end_step)
        
        # Define edges
        workflow.set_entry_point("check_relevance")
        workflow.add_conditional_edges(
            "check_relevance",
            self._decide_after_relevance_check,
            {
                "relevant": "research",
                "irrelevant": "end"
            }
        )
        workflow.add_edge("research", "verify")
        workflow.add_conditional_edges(
            "verify",
            self._decide_next_step,
            {
                "re_research": "research",
                "end": "end"
            }
        )
        
        # End node leads to completion
        workflow.add_edge("end", END)
        
        return workflow.compile()
    
    def _check_relevance_step(self, state: AgentState) -> Dict:
        retriever = state["retriever"]
        classification = self.relevance_checker.check(
            question=state["question"], 
            retriever=retriever, 
            k=20
        )

        if classification == "CAN_ANSWER":
            # We have enough info to proceed
            return {"is_relevant": True}
        elif classification == "PARTIAL":
            # There's partial coverage, but we can still proceed
            return {"is_relevant": True}
        else:  # classification == "NO_MATCH"
            return {
                "is_relevant": False,
                "draft_answer": "This question isn't related (or there's no data) for your query. Please ask another question relevant to the uploaded document(s)."
            }

    def _decide_after_relevance_check(self, state: AgentState) -> str:
        decision = "relevant" if state["is_relevant"] else "irrelevant"
        print(f"[DEBUG] _decide_after_relevance_check -> {decision}")
        return decision
    
    def _research_step(self, state: AgentState) -> Dict:
        print(f"[DEBUG] Entered _research_step with question='{state['question']}'")
        result = self.researcher.generate(state["question"], state["documents"])
        print("[DEBUG] Researcher returned draft answer.")
        return {"draft_answer": result["draft_answer"]}
    
    def _verification_step(self, state: AgentState) -> Dict:
        print("[DEBUG] Entered _verification_step. Verifying the draft answer...")
        result = self.verifier.check(state["draft_answer"], state["documents"])
        print("[DEBUG] VerificationAgent returned a verification report.")
        return {"verification_report": result["verification_report"]}
        
    def _end_step(self, state: AgentState) -> Dict:
        """Final step that prepares the output"""
        logger.info("Workflow reached end step")
        # This is just a passthrough step to match the flowchart
        return {}
    
    def _decide_next_step(self, state: AgentState) -> str:
        verification_report = state["verification_report"]
        print(f"[DEBUG] _decide_next_step with verification_report='{verification_report}'")
        if "Supported: NO" in verification_report or "Relevant: NO" in verification_report:
            logger.info("[DEBUG] Verification indicates re-research needed.")
            return "re_research"
        else:
            logger.info("[DEBUG] Verification successful, ending workflow.")
            return "end"
    
    def full_pipeline(self, question: str, retriever: EnsembleRetriever):
        try:
            print(f"[DEBUG] Starting full_pipeline with question='{question}'")
            documents = retriever.invoke(question)
            logger.info(f"Retrieved {len(documents)} relevant documents (from .invoke)")

            initial_state = AgentState(
                question=question,
                documents=documents,
                draft_answer="",
                verification_report="",
                is_relevant=False,
                retriever=retriever
            )
            
            final_state = self.compiled_workflow.invoke(initial_state)
            
            return {
                "draft_answer": final_state["draft_answer"],
                "verification_report": final_state["verification_report"]
            }
        except Exception as e:
            logger.error(f"Workflow execution failed: {e}")
            raise


## Main Process Function: Tying It All Together

The `process_documents_and_answer_questions` function serves as the **main entry point** for the entire RAG system. It orchestrates the end-to-end process from document processing to question answering.

#### Function Overview

This function takes a list of document file paths and questions as input, then returns answers with verification reports. It seamlessly integrates all components of the system:
- Document processing and chunking
- Hybrid retrieval system setup
- Multi-agent workflow execution
- Error handling and reporting


In [10]:
# Main function
def process_documents_and_answer_questions(file_paths: List[str], questions: List[str]) -> List[Dict]:
    """Process documents and answer a list of questions"""
    # Initialize components
    processor = DocumentProcessor()
    retriever_builder = RetrieverBuilder()
    workflow = AgentWorkflow()
    
    try:
        # Process documents
        logger.info(f"Processing documents: {file_paths}")
        
        # Convert file paths to File-like objects expected by the processor
        class SimpleFile:
            def __init__(self, path):
                self.name = path
        
        files = [SimpleFile(path) for path in file_paths]
        
        # Process files into chunks
        chunks = processor.process(files)
        logger.info(f"Generated {len(chunks)} document chunks")
        
        # Build retriever
        retriever = retriever_builder.build_hybrid_retriever(chunks)
        
        # Process each question
        results = []
        for i, question in enumerate(questions):
            logger.info(f"Processing question {i+1}/{len(questions)}: {question}")
            
            # Run the workflow
            result = workflow.full_pipeline(question, retriever)
            
            # Store results
            results.append({
                "question": question,
                "answer": result["draft_answer"],
                "verification": result["verification_report"]
            })
            
        return results
    except Exception as e:
        logger.error(f"Error processing documents: {e}")
        return [{
            "question": q,
            "answer": f"Error: {str(e)}",
            "verification": ""
        } for q in questions]

## Example Usage: Running the RAG System

The example code demonstrates how to use the document question-answering system in a script or notebook context. This serves as both a practical example for users and a testing mechanism for the system.

### How the Example Works

1. **Document and Question Definition**:
  - Specifies file paths to documents that will be processed
  - Defines a list of questions to be answered about those documents
  - In this example, it uses Google's 2024 Environmental Report as the document source

2. **System Execution**:
  - Calls the main `process_documents_and_answer_questions` function
  - Passes the documents and questions as parameters
  - The function handles all processing and returns structured results

3. **Result Presentation**:
  - Formats and displays the results in a clear, readable format
  - For each question, it shows:
    - The original question
    - The generated answer
    - The verification report with fact-checking information
  - Uses visual separators to distinguish between different questions

### Usage Flexibility

This pattern can be easily modified for various use cases:
- Processing multiple documents simultaneously
- Asking domain-specific questions about technical content
- Integrating into larger applications or workflows
- Running in batch mode for document analysis

The simplicity of the interface belies the sophisticated multi-agent system working underneath, delivering a user-friendly experience for document-based question answering.


In [11]:
from langchain_text_splitters.markdown import MarkdownHeaderTextSplitter

In [12]:
# Example usage
if __name__ == "__main__":
    # Example documents and questions
    documents = ["Pandas Cheat Sheet.pdf"]
    questions = [
        "What is the main topic discussed in the document?",
        "Summarize the key findings in the document."
    ]
    
    # Process documents and get answers
    results = process_documents_and_answer_questions(documents, questions)
    
    # Print results
    for i, result in enumerate(results):
        print(f"\n===== Question {i+1} =====")
        print(f"Q: {result['question']}")
        print(f"\nAnswer: {result['answer']}")
        print(f"\nVerification: {result['verification']}")
        print("="*50)

2025-05-12 16:02:50,873 - ibm_watsonx_ai.client - INFO - Client successfully initialized


Initializing ResearchAgent with IBM WatsonX ModelInference...


2025-05-12 16:02:51,301 - ibm_watsonx_ai.client - INFO - Client successfully initialized
2025-05-12 16:02:52,338 - httpx - INFO - HTTP Request: GET https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-04-23&project_id=skills-network&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200 "HTTP/1.1 200 OK"
2025-05-12 16:02:52,344 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished Get available foundation models for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-04-23&project_id=skills-network&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200'


ModelInference initialized successfully.
Initializing VerificationAgent with IBM WatsonX ModelInference...


2025-05-12 16:02:52,576 - ibm_watsonx_ai.client - INFO - Client successfully initialized
2025-05-12 16:02:53,247 - httpx - INFO - HTTP Request: GET https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-04-23&project_id=skills-network&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200 "HTTP/1.1 200 OK"
2025-05-12 16:02:53,260 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished Get available foundation models for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-04-23&project_id=skills-network&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200'
2025-05-12 16:02:53,399 - ibm_watsonx_ai.client - INFO - Client successfully initialized


ModelInference initialized successfully.


2025-05-12 16:02:53,859 - httpx - INFO - HTTP Request: GET https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-04-23&project_id=skills-network&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200 "HTTP/1.1 200 OK"
2025-05-12 16:02:53,870 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished Get available foundation models for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/foundation_model_specs?version=2025-04-23&project_id=skills-network&filters=function_text_generation%2C%21lifecycle_withdrawn%3Aand&limit=200'
2025-05-12 16:02:53,890 - __main__ - INFO - Processing documents: ['Pandas Cheat Sheet.pdf']
2025-05-12 16:02:53,946 - __main__ - INFO - Processing and caching: Pandas Cheat Sheet.pdf
2025-05-12 16:02:53,984 - docling.document_converter - INFO - Going to convert document batch...
2025-05-12 16:02:53,985 - docling.document_converter - INFO - Initializing pipeline for StandardPdfPipeline with options hash 70041f74270850b7bedf7c8f

[DEBUG] Starting full_pipeline with question='What is the main topic discussed in the document?'


2025-05-12 16:09:56,828 - httpx - INFO - HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2025-04-23 "HTTP/1.1 200 OK"
2025-05-12 16:09:56,830 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2025-04-23'
2025-05-12 16:09:57,120 - httpx - INFO - HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23 "HTTP/1.1 200 OK"
2025-05-12 16:09:57,121 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished chat for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23'


Checker response: CAN_ANSWER
[DEBUG] _decide_after_relevance_check -> relevant
[DEBUG] Entered _research_step with question='What is the main topic discussed in the document?'
ResearchAgent.generate called with question='What is the main topic discussed in the document?' and 9 documents.
Combined context length: 6255 characters.
Prompt created for the LLM.
Sending prompt to the model...


2025-05-12 16:10:03,976 - httpx - INFO - HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23 "HTTP/1.1 200 OK"
2025-05-12 16:10:03,979 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished chat for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23'


LLM response received.
Raw LLM response:
The main topic discussed in the document is the Pandas library in Python, specifically its data structures and tools for data manipulation and analysis, with a focus on DataFrames and various operations that can be performed on them.
Generated answer: The main topic discussed in the document is the Pandas library in Python, specifically its data structures and tools for data manipulation and analysis, with a focus on DataFrames and various operations that can be performed on them.
[DEBUG] Researcher returned draft answer.
[DEBUG] Entered _verification_step. Verifying the draft answer...
VerificationAgent.check called with answer='The main topic discussed in the document is the Pandas library in Python, specifically its data structures and tools for data manipulation and analysis, with a focus on DataFrames and various operations that can be performed on them.' and 9 documents.
Combined context length: 6255 characters.
Prompt created for the LLM.

2025-05-12 16:10:05,980 - httpx - INFO - HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23 "HTTP/1.1 200 OK"
2025-05-12 16:10:05,981 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished chat for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23'
2025-05-12 16:10:05,982 - __main__ - INFO - [DEBUG] Verification successful, ending workflow.
2025-05-12 16:10:05,983 - __main__ - INFO - Workflow reached end step
2025-05-12 16:10:05,984 - __main__ - INFO - Processing question 2/2: Summarize the key findings in the document.
2025-05-12 16:10:06,148 - httpx - INFO - HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2025-04-23 "HTTP/1.1 200 OK"
2025-05-12 16:10:06,150 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2025-04-23'
2025-05-12 16:10:06,156 - __main__ - INFO - Retrieved 9 relevant document

LLM response received.
Raw LLM response:
Supported: YES
Unsupported Claims: None
Contradictions: None
Relevant: YES
Additional Details: The answer accurately reflects the main topic of the provided context, which is the Pandas library in Python, focusing on its data structures, specifically DataFrames, and various operations that can be performed on them. The answer covers selecting rows and columns, calculating descriptive statistics, reshaping DataFrames using stack and unstack, applying functions to columns and entire DataFrames, and converting between wide and long formats using melt and pivot functions. The answer also includes code examples demonstrating these operations.
Verification report:
**Supported:** YES
**Unsupported Claims:** None
**Contradictions:** None
**Relevant:** YES
**Additional Details:** None

Context used: Pandas is a powerful, open-source Python library for data manipulation and analysis. It provides data structures for efficiently storing large datasets and t

2025-05-12 16:10:06,295 - httpx - INFO - HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2025-04-23 "HTTP/1.1 200 OK"
2025-05-12 16:10:06,296 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished generate for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/embeddings?version=2025-04-23'
2025-05-12 16:10:06,789 - httpx - INFO - HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23 "HTTP/1.1 200 OK"
2025-05-12 16:10:06,791 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished chat for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23'


Checker response: PARTIAL
[DEBUG] _decide_after_relevance_check -> relevant
[DEBUG] Entered _research_step with question='Summarize the key findings in the document.'
ResearchAgent.generate called with question='Summarize the key findings in the document.' and 9 documents.
Combined context length: 5014 characters.
Prompt created for the LLM.
Sending prompt to the model...


2025-05-12 16:10:38,557 - httpx - INFO - HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23 "HTTP/1.1 200 OK"
2025-05-12 16:10:38,558 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished chat for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23'


LLM response received.
Raw LLM response:
There are no specific key findings in the provided document as it appears to be a reference guide or documentation for Pandas, a Python library for data manipulation and analysis. The document provides an overview of various Pandas functions and operations, including data frame creation, merging, concatenation, and data manipulation.

However, some key points that can be summarized from the document are:

1. Pandas provides various functions for data frame creation, including `df.mean()`, `df.median()`, `df.min()`, `df.max()`, `df.std()`, `df.sum()`, `df.count()`, and `df.quantile(q)`.

2. Data frames can be merged based on common columns using `pd.merge()` with different merge types, including inner, left, right, and outer merges.

3. Data frames can be concatenated along rows (vertically) or columns (horizontally) using `pd.concat()`.

4. Pandas provides various functions for data frame manipulation, including `df.head(n)`, `df.tail(n)`, `df.i

2025-05-12 16:10:39,829 - httpx - INFO - HTTP Request: POST https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23 "HTTP/1.1 200 OK"
2025-05-12 16:10:39,830 - ibm_watsonx_ai.wml_resource - INFO - Successfully finished chat for url: 'https://us-south.ml.cloud.ibm.com/ml/v1/text/chat?version=2025-04-23'
2025-05-12 16:10:39,831 - __main__ - INFO - [DEBUG] Verification successful, ending workflow.
2025-05-12 16:10:39,832 - __main__ - INFO - Workflow reached end step


LLM response received.
Raw LLM response:
Supported: YES
Unsupported Claims: None
Contradictions: None
Relevant: YES
Additional Details: The provided answer accurately summarizes the key functions and operations of the Pandas library for data manipulation and analysis, as described in the context. It covers data frame creation, merging, concatenation, and data manipulation functions, providing direct factual support for each point. The answer is highly relevant to the question, as it directly addresses the functions and operations of the Pandas library.
Verification report:
**Supported:** YES
**Unsupported Claims:** None
**Contradictions:** None
**Relevant:** YES
**Additional Details:** None

Context used: | Function       | Description                                                                           |
|----------------|---------------------------------------------------------------------------------------|
| df.mean()      | Calculates the mean of each column.                 