<h1 style="
    color: #f09; 
    background-color: #ffe5ec; 
    font-family: serif; 
    text-align: center; 
    padding: 10px; 
    border-radius: 5px;
    box-shadow: 0px 3px 8px rgba(0, 0, 0, 0.3);
    margin-top: 20px;
">
    Developing an Adaptive RAG Pipeline - Enhancing Retrieval-Augmented LLMs with Query Analysis, Active Self-RAG, and Web Search
</h1>


<h2 style="
    color: #fb6f92; 
    background-color: #0a0a0a; 
    font-family: serif;
    text-align: center; 
    padding: 15px; 
    border-radius: 10px;
    box-shadow: 0px 4px
           10px rgba(0, 0, 0, 0.7); 
    width: 100%; 
    margin: 20px auto;
">
    1. Project Introduction
</h2>


![Adaptive RAG pipeline architecture](Adaptive_RAG_Pipeline_Architechure.png)


<h1>Adaptive Retrieval-Augmented Generation (RAG) Pipeline</h1>
<p>This project focuses on building an <strong>Adaptive Retrieval-Augmented Generation (RAG)</strong> pipeline. The strategy combines <strong>(1) Query Analysis</strong> with <strong>(2) Self-RAG</strong>, dynamically tailoring retrieval and generation processes to optimize efficiency and accuracy.</p>

<h2 style="color:#f09">Problem Description</h2>
<p>While Retrieval-Augmented Generation (RAG) systems enhance response accuracy by combining generative capabilities of large language models (LLMs) with retrieval mechanisms, existing systems face key challenges:</p>
<ul>
    <li><strong>Handling Simple Queries:</strong> Current systems often apply unnecessary computational overhead to simple queries, making them inefficient.</li>
    <li><strong>Addressing Complex Queries:</strong> These systems struggle with multi-step queries, leading to incomplete or suboptimal responses.</li>
</ul>
<p>Real-world queries exhibit varying complexity levels, necessitating a dynamic and adaptive approach to optimize performance for diverse tasks.</p>

<h2 style="color:#f09">Solution</h2>
<p>The <strong>Adaptive RAG Pipeline</strong> addresses these challenges by dynamically adjusting retrieval and generation strategies based on query complexity. This ensures:</p>
<ul>
    <li><strong>Efficiency:</strong> Simple queries are handled with lightweight or no retrieval, minimizing resource usage.</li>
    <li><strong>Accuracy:</strong> Complex queries are tackled with iterative retrieval and multi-step reasoning to provide accurate and context-aware responses.</li>
    <li><strong>Adaptability:</strong> A query classifier selects the optimal strategy (e.g., Self-RAG or web search) based on query complexity.</li>
</ul>

<h2 style="color:#f09">Project Flow</h2>
<p style="color:red">To implement this pipeline, the project uses <strong>state machines</strong> for constructing and managing diverse RAG flows, leveraging LangGraph for flow engineering. Nodes and edges will represent components of the pipeline, compiled into a cohesive architecture.</p>


<h3>Phase 1: Pre-Retrieval</h3>
<ul>
    <li><strong>Indexing:</strong> Prepare data for efficient retrieval by creating indexed representations of external knowledge bases.</li>
    <li><strong>Query Analysis and Routing:</strong> Analyze query complexity and determine routing:
        <ul>
            <li>If the query is related to the index, route it to <strong>Self-RAG.</strong></li>
            <li>If the query is unrelated to the index, route it to <strong>web search.</strong></li>
        </ul>
    </li>
</ul>
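
The routing rule above can be sketched without an LLM. In this minimal sketch the keyword tuple `INDEX_TOPICS` and the `route` helper are hypothetical stand-ins for the LLM-based router built later in the notebook; only the two route labels come from this project.

```python
from typing import Literal

Route = Literal["vectorstore", "web_search"]

# Hypothetical topic keywords standing in for "related to the index".
INDEX_TOPICS = ("agent", "prompt engineering", "adversarial attack")


def route(question: str) -> Route:
    """Pick a datasource with a toy keyword rule (not the LLM router)."""
    text = question.lower()
    if any(topic in text for topic in INDEX_TOPICS):
        return "vectorstore"  # handled by the Self-RAG branch
    return "web_search"  # handled by the web search branch


print(route("What are the types of agent memory?"))
print(route("Who will the Bears draft first in the NFL draft?"))
```

A keyword rule like this is brittle, which is one reason to delegate the decision to an LLM with a structured output instead.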

<h3>Phase 2A: Active Self-RAG</h3>
<p>Build nodes and edges for an <strong>Active Self-RAG</strong> process to handle queries related to the indexed data. This involves:</p>
<ul>
    <li><strong>Active RAG:</strong> Dynamically decides <em>when</em> and <em>what</em> to retrieve, rewrite, or re-retrieve based on query analysis and LLM reasoning.</li>
    <li><strong>Self-RAG:</strong> Incorporates self-reflection and self-grading on retrieved documents and generated responses, improving quality through iterative refinement.</li>
</ul>

<h3>Phase 2B: Active Web Search</h3>
<p>Build nodes and edges for integrating web search into the pipeline for queries unrelated to the indexed data. Use retrieval and generation strategies tailored to external web-based information.</p>

<h3>Phase 3: Graph Construction</h3>
<p>The Adaptive RAG pipeline uses a <strong>state graph</strong> to represent the flow of operations. This phase involves:</p>
<ul>
    <li><strong>Defining Nodes:</strong> Each node represents a specific operation in the pipeline, such as retrieving documents, grading their relevance, or generating an answer.</li>
    <li><strong>Adding Edges:</strong> Edges connect nodes and define the sequence of operations. Conditional edges determine the next step based on the output of the current node, allowing for dynamic decision-making.</li>
    <li><strong>Workflow Compilation:</strong> All nodes and edges are compiled into a cohesive workflow using LangGraph, which supports flexible flow engineering and dynamic execution.</li>
</ul>
<p>For example, the pipeline dynamically routes a query through nodes like <strong>retrieve</strong>, <strong>grade_documents</strong>, or <strong>web_search</strong>, depending on the query's complexity and relevance to indexed data.</p>
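
To make the node and edge vocabulary concrete, here is a minimal pure-Python state machine. It is only an illustration of the idea, not LangGraph's API: the `TinyGraph` class, its method names, and the `END` sentinel are assumptions made for this sketch, and the nodes are stubs.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
END = "__end__"  # hypothetical sentinel meaning "stop"


class TinyGraph:
    """Toy state machine: nodes return state updates, edges pick the next node."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Callable[[State], State]] = {}
        self.edges: Dict[str, Callable[[State], str]] = {}

    def add_node(self, name: str, fn: Callable[[State], State]) -> None:
        self.nodes[name] = fn

    def add_edge(self, src: str, dst: str) -> None:
        # A fixed edge is a conditional edge that always returns dst.
        self.edges[src] = lambda state: dst

    def add_conditional_edge(self, src: str, chooser: Callable[[State], str]) -> None:
        self.edges[src] = chooser

    def run(self, entry: str, state: State, max_steps: int = 20) -> Tuple[State, List[str]]:
        current, visited = entry, []
        for _ in range(max_steps):
            if current == END:
                break
            state = {**state, **self.nodes[current](state)}  # merge the update
            visited.append(current)
            current = self.edges[current](state)
        return state, visited


graph = TinyGraph()
graph.add_node("retrieve", lambda s: {"documents": ["doc about agents"]})
graph.add_node(
    "grade_documents",
    lambda s: {"documents": [d for d in s["documents"] if "agent" in d]},
)
graph.add_node("generate", lambda s: {"generation": f"answer from {len(s['documents'])} doc(s)"})
graph.add_edge("retrieve", "grade_documents")
graph.add_conditional_edge(
    "grade_documents", lambda s: "generate" if s["documents"] else END
)
graph.add_edge("generate", END)

final_state, path = graph.run("retrieve", {"question": "What is agent memory?"})
print(path)
```

The `max_steps` guard is a design choice for the sketch: any graph with cycles needs some bound so that a bad grading loop cannot run forever.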

<h3>Phase 4: Graph Usage</h3>
<p>In this phase, the constructed graph is executed to process user queries:</p>
<ul>
    <li><strong>Streaming Execution:</strong> The graph is streamed iteratively, with each node updating the graph's state.</li>
    <li><strong>Dynamic Routing:</strong> Conditional edges ensure that queries are routed to the appropriate nodes (e.g., <strong>web_search</strong> for unrelated queries or <strong>generate</strong> for producing answers).</li>
    <li><strong>State Updates:</strong> Each node processes the input graph state and produces an updated state, which is passed to the next node.</li>
    <li><strong>Final Generation:</strong> The pipeline outputs a comprehensive response to the user’s query, whether retrieved from indexed data or web-based information.</li>
</ul>
<p>This dynamic execution ensures an efficient and accurate response generation process, adapting seamlessly to the complexity of each query.</p>
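
Streaming execution can be pictured as a generator that yields each node's update as soon as it is produced. The sketch below assumes a fixed, linear list of stub nodes; the `stream` helper and the stub outputs are hypothetical and only mirror the pattern of iterating over per-node outputs.

```python
from typing import Callable, Dict, Iterator, List, Tuple

State = Dict[str, object]
Node = Callable[[State], State]


def stream(nodes: List[Tuple[str, Node]], state: State) -> Iterator[Dict[str, State]]:
    """Run nodes in order, yielding each node's update as it is produced."""
    for name, node in nodes:
        update = node(state)
        state.update(update)  # the next node sees the merged state
        yield {name: update}


# Stub nodes standing in for the real web_search and generate nodes.
pipeline: List[Tuple[str, Node]] = [
    ("web_search", lambda s: {"documents": [f"result for: {s['question']}"]}),
    ("generate", lambda s: {"generation": f"answer using {len(s['documents'])} document(s)"}),
]

state: State = {"question": "Who won the match?"}
for output in stream(pipeline, state):
    for node_name, update in output.items():
        print(f"Node '{node_name}' updated keys: {sorted(update)}")

print(state["generation"])
```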

<h2 style="color:#f09">Usage of RRR Framework</h2>
<p>The project incorporates the <strong>Rewriting, Re-ranking, and Re-retrieving (RRR)</strong> framework within the Adaptive RAG pipeline to enhance retrieval and generation processes:</p>
<ul>
    <li><strong>Rewriting:</strong> The <em>Query Rewriter</em> optimizes queries for better alignment with indexed data or retrieval mechanisms, especially when initial retrieval results are ambiguous or insufficient.</li>
    <li><strong>Re-ranking:</strong> The <em>Retrieval Grader</em> evaluates and prioritizes the relevance of retrieved documents, filtering out irrelevant ones to ensure high-quality inputs for answer generation.</li>
    <li><strong>Re-retrieving:</strong> The pipeline iteratively refines retrieval using the <em>Transform Query</em> node, which rewrites the question to address gaps in the initial retrieval phase.</li>
</ul>
<p>By leveraging the RRR framework, the Adaptive RAG pipeline ensures a robust and adaptive retrieval process, balancing efficiency and accuracy for diverse query complexities.</p>
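
The three RRR steps can be sketched as one loop over a toy corpus. Everything here is an assumption made for illustration: the corpus, the word-overlap scoring that stands in for the Retrieval Grader, and the `REWRITES` table that stands in for the LLM-based Query Rewriter.

```python
from typing import Dict, List, Set

CORPUS = [
    "Agents keep short-term memory in the context window.",
    "Long-term memory is stored in an external vector store.",
    "Adversarial prompts can jailbreak a model.",
]

# Hypothetical rewrite table standing in for the LLM-based Query Rewriter.
REWRITES: Dict[str, str] = {"remember": "store in memory"}


def tokens(text: str) -> Set[str]:
    return {word.strip(".,?").lower() for word in text.split()}


def retrieve(query: str) -> List[str]:
    """Return every document, mimicking a noisy retriever."""
    return list(CORPUS)


def grade(query: str, docs: List[str]) -> List[str]:
    """Re-ranking: drop documents with no word overlap, best match first."""
    query_tokens = tokens(query)
    scored = [(len(query_tokens & tokens(doc)), doc) for doc in docs]
    return [doc for score, doc in sorted(scored, key=lambda pair: -pair[0]) if score > 0]


def rewrite(query: str) -> str:
    """Rewriting: apply the toy substitutions to the query."""
    for old, new in REWRITES.items():
        query = query.replace(old, new)
    return query


def answer_context(query: str, max_rounds: int = 2) -> List[str]:
    for _ in range(max_rounds):
        docs = grade(query, retrieve(query))
        if docs:
            return docs
        query = rewrite(query)  # Re-retrieving: try again with the new query
    return []


print(answer_context("What do they remember?"))
```

The first round finds no overlapping document, so the query is rewritten and the second round returns the two memory-related documents.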




<h2 style="color:#f09">Referenced Paper</h2>
<p><em>Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity</em></p>
<p><a href="https://arxiv.org/abs/2403.14403" target="_blank">https://arxiv.org/abs/2403.14403</a></p>


<h2 style="
    color: #fb6f92; 
    background-color: #0a0a0a; 
    font-family: serif;
    text-align: center; 
    padding: 15px; 
    border-radius: 10px;
    box-shadow: 0px 4px
           10px rgba(0, 0, 0, 0.7); 
    width: 100%; 
    margin: 20px auto;
">
    2. Project Implementation 
</h2>


## 2.1 Setup

First, install the required packages and set the API keys.

In [1]:
%%capture --no-stderr
! pip install -U langchain_community tiktoken langchain-openai langchain-cohere langchainhub chromadb langchain langgraph  tavily-python

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
plantcv 3.14.1 requires numpy<1.23,>=1.11, but you have numpy 1.26.4 which is incompatible.
langchain-chroma 0.1.4 requires chromadb!=0.5.4,!=0.5.5,<0.6.0,>=0.4.0, but you have chromadb 0.6.2 which is incompatible.


<div class="admonition tip">
    <p class="admonition-title">Here we will set up <a href="https://smith.langchain.com">LangSmith</a> for LangGraph development. We will use LangSmith to quickly spot issues and improve the performance of your LangGraph projects. LangSmith lets you use trace data to debug, test, and monitor your LLM apps built with LangGraph — read more about how to get started <a href="https://docs.smith.langchain.com">here</a>. 
    </p>
</div>    

In [19]:
import os
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = ""
os.environ['OPENAI_API_KEY'] = ""
os.environ['TAVILY_API_KEY'] = ""

## 2.2 Phase 1: Pre-Retrieval

<h3>How does it work?</h3>

<h3 style="color:red">Step 1: Creating Index/Vectorstore</h3>
<p>For indexing, we have utilized the <strong>chunk optimization indexing technique</strong> for efficient retrieval.</p>
<h3 style="color:red">Step 2: Defining LLMs for Query Analysis and Routing</h3>
<p>We use an LLM-based routing mechanism to analyze the query and decide whether to route it to the vectorstore or web search based on the query's content.</p>
<h3 style="color:red">Step 3: Constructing the Graph</h3>
<p>We capture the flow as a graph, where the graph state includes information about the query, retrieved documents, and any generated responses.</p>

<h3 style="color:red">Step 4:  Defining Graph Edges as Functions</h3>
<p>The graph edges are defined as functions that determine the flow based on the current graph state.</p>

<p>In this phase, the query is analyzed and routed either to the <strong>vectorstore</strong> for retrieval or to <strong>web search</strong> for external information. This pre-retrieval phase ensures optimal resource utilization and accuracy in subsequent steps.</p>


### 2.2.1 Step 1: Creating Index/Vectorstore

For indexing, we use a chunk optimization technique: the source documents are split into 500-token chunks before they are embedded and stored.

In [2]:
### Build Index

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

### from langchain_cohere import CohereEmbeddings

# Set embeddings
embd = OpenAIEmbeddings()

# Docs to index
urls = [
    "https://lilianweng.github.io/posts/2023-06-23-agent/",
    "https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/",
    "https://lilianweng.github.io/posts/2023-10-25-adv-attack-llm/",
]

# Load
docs = [WebBaseLoader(url).load() for url in urls]
docs_list = [item for sublist in docs for item in sublist]

# Split
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500, chunk_overlap=0
)
doc_splits = text_splitter.split_documents(docs_list)

# Add to vectorstore
vectorstore = Chroma.from_documents(
    documents=doc_splits,
    collection_name="rag-chroma",
    embedding=embd,
)
retriever = vectorstore.as_retriever()

USER_AGENT environment variable not set, consider setting it to identify your requests.
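
To show what `chunk_size` and `chunk_overlap` control, here is a simplified sliding-window chunker over a list of tokens. It is a sketch under stated assumptions: the `chunk_tokens` helper is hypothetical, tokens are plain whitespace-separated words rather than tiktoken tokens, and the real splitter additionally tries to break on natural separators.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int, chunk_overlap: int = 0) -> List[List[str]]:
    """Split a token list into windows of at most chunk_size tokens."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # the last window already reached the end
    return chunks


words = "a b c d e f g h i j".split()
print(chunk_tokens(words, chunk_size=4))                   # no overlap
print(chunk_tokens(words, chunk_size=4, chunk_overlap=2))  # windows share 2 tokens
```

With `chunk_overlap=0`, as in the cell above, every token lands in exactly one chunk; a non-zero overlap trades extra storage for a lower chance of cutting a relevant passage in half.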


### 2.2.2 Step 2: Defining LLMs for Query Analysis and Routing

In [3]:
### Router

from typing import Literal

from langchain_core.prompts import ChatPromptTemplate
from pydantic import BaseModel, Field  # langchain_core.pydantic_v1 is deprecated
from langchain_openai import ChatOpenAI


# Data model
class RouteQuery(BaseModel):
    """Route a user query to the most relevant datasource."""

    datasource: Literal["vectorstore", "web_search"] = Field(
        ...,
        description="Given a user question choose to route it to web search or a vectorstore.",
    )


# LLM with function call
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_router = llm.with_structured_output(RouteQuery)

# Prompt
system = """You are an expert at routing a user question to a vectorstore or web search.
The vectorstore contains documents related to agents, prompt engineering, and adversarial attacks.
Use the vectorstore for questions on these topics. Otherwise, use web-search."""
route_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)

question_router = route_prompt | structured_llm_router
print(question_router.invoke({"question": "Who will the Bears draft first in the NFL draft?"}))
print(question_router.invoke({"question": "What are the types of agent memory?"}))




datasource='web_search'
datasource='vectorstore'


### 2.2.3 Step 3: Constructing the Graph

In [4]:
# We capture the flow as a graph, so we first define the graph state.
from typing import List

from typing_extensions import TypedDict


class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        documents: list of documents
    """

    question: str
    generation: str
    documents: List[str]
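
Each node returns only the keys it changes, and those keys overwrite the matching keys in the graph state. The sketch below imitates that merge with a plain dictionary update; the `apply_update` helper is hypothetical, and `TypedDict` is imported from `typing` here only to keep the example dependency-free.

```python
from typing import List, TypedDict


class GraphState(TypedDict, total=False):
    question: str
    generation: str
    documents: List[str]


def apply_update(state: GraphState, update: GraphState) -> GraphState:
    """Return a new state in which keys from update overwrite keys in state."""
    merged: GraphState = {**state, **update}
    return merged


state: GraphState = {"question": "What is agent memory?"}
state = apply_update(state, {"documents": ["chunk 1", "chunk 2"]})  # e.g. after retrieve
state = apply_update(state, {"generation": "Agents use short- and long-term memory."})
print(sorted(state))
```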

### 2.2.4 Step 4: Defining Graph Edges as Functions

In [5]:
### Edges ###
def route_question(state):
    """
    Route question to web search or RAG.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("---ROUTE QUESTION---")
    question = state["question"]
    source = question_router.invoke({"question": question})
    if source.datasource == "web_search":
        print("---ROUTE QUESTION TO WEB SEARCH---")
        return "web_search"
    elif source.datasource == "vectorstore":
        print("---ROUTE QUESTION TO RAG---")
        return "vectorstore"

## 2.3 Phase 2A: Active Self-RAG (if query is related to index)


### 2.3.1 Description of Pipeline


<h2>Active Self-RAG Pipeline</h2>

<h3 style="color:red">Step 1: Building Required LLMs</h3>
<p>The following LLMs are essential for the <strong>Active Self-RAG</strong> pipeline, each playing a specific role in retrieval, evaluation, generation, and query refinement:</p>
<ol>
    <li><strong>Answer Generator</strong>
        <ul>
            <li><strong>Purpose:</strong> Generate a comprehensive response based on the input question and retrieved documents.</li>
            <li><strong>Output:</strong> Generated answer as text.</li>
        </ul>
    </li>
    <li><strong>Retrieval Grader</strong>
        <ul>
            <li><strong>Purpose:</strong> Evaluate whether a retrieved document is relevant to the user’s query.</li>
            <li><strong>Output:</strong> Binary decision (<code>Yes</code> or <code>No</code>).</li>
        </ul>
    </li>
    <li><strong>Hallucination Grader</strong>
        <ul>
            <li><strong>Purpose:</strong> Assess if the LLM-generated response is grounded in the retrieved facts.</li>
            <li><strong>Output:</strong> Binary decision (<code>Yes</code> or <code>No</code>).</li>
        </ul>
    </li>
    <li><strong>Answer Grader</strong>
        <ul>
            <li><strong>Purpose:</strong> Determine if the LLM-generated response directly addresses the user’s query.</li>
            <li><strong>Output:</strong> Binary decision (<code>Yes</code> or <code>No</code>).</li>
        </ul>
    </li>
    <li><strong>Question Rewriter</strong>
        <ul>
            <li><strong>Purpose:</strong> Rephrase the input question so that it is better suited to retrieval and generation.</li>
            <li><strong>Output:</strong> Rewritten question as text.</li>
        </ul>
    </li>
</ol>

<h3 style="color:red">Step 2: Chains Construction</h3>
<p>The LLMs are encapsulated in chains by combining:</p>
<ul>
    <li><strong>Prompt templates:</strong> Define input-output formatting.</li>
    <li><strong>LLM models:</strong> To process inputs and produce outputs.</li>
    <li><strong>StrOutputParser():</strong> For handling string-based outputs.</li>
</ul>
<p><strong>Constructed Chains:</strong></p>
<ol>
    <li><strong>answer_generator:</strong> For generating responses.</li>
    <li><strong>retrieval_grader:</strong> For filtering irrelevant documents.</li>
    <li><strong>hallucination_grader:</strong> For validating grounding of generated answers.</li>
    <li><strong>answer_grader:</strong> For checking if the query is addressed.</li>
    <li><strong>question_rewriter:</strong> For refining questions.</li>
</ol>
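
The pipe syntax used to build these chains can be understood as function composition. The following sketch is not LangChain's implementation: the `Step` class, the fake model, and the dictionary-shaped message are assumptions chosen to show how `prompt | llm | parser` passes each output to the next stage.

```python
from typing import Any, Callable


class Step:
    """Minimal runnable: wraps a function and supports `a | b` composition."""

    def __init__(self, fn: Callable[[Any], Any]) -> None:
        self.fn = fn

    def invoke(self, value: Any) -> Any:
        return self.fn(value)

    def __or__(self, other: "Step") -> "Step":
        # The combined step feeds this step's output into the next one.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Stub stages: a prompt template, a fake model, and a string output parser.
prompt = Step(lambda inputs: f"Context: {inputs['context']}\nQuestion: {inputs['question']}")
fake_llm = Step(lambda text: {"content": f"(model reply to {len(text)} chars)"})
parser = Step(lambda message: message["content"])

chain = prompt | fake_llm | parser
print(chain.invoke({"context": "Agents have memory.", "question": "agent memory"}))
```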

<h3 style="color:red">Step 3: Nodes Construction</h3>
<p>Nodes define functions that take the current graph state and return an updated graph state.</p>
<p><strong>Example state structure:</strong></p>
<pre>
state = {"documents": documents, "question": question, "generation": generation}
</pre>

<p><strong>Nodes:</strong></p>
<ol>
    <li><strong>retrieve(state):</strong>
        <ul>
            <li><strong>Functionality:</strong> Uses the retriever (from the index/vector store) to fetch relevant documents for the query.</li>
            <li><strong>Updates:</strong> Adds <code>documents</code> to the state.</li>
        </ul>
    </li>
    <li><strong>generate(state):</strong>
        <ul>
            <li><strong>Functionality:</strong> Passes the <code>documents</code> and <code>question</code> as input to the <strong>Answer Generator</strong> chain to create a response.</li>
            <li><strong>Updates:</strong> Adds <code>generation</code> to the state.</li>
        </ul>
    </li>
    <li><strong>grade_documents(state):</strong>
        <ul>
            <li><strong>Functionality:</strong> Filters out irrelevant documents by evaluating each document using the <strong>Retrieval Grader</strong>.</li>
            <li><strong>Updates:</strong> Retains only relevant <code>documents</code> in the state.</li>
        </ul>
    </li>
    <li><strong>transform_query(state):</strong>
        <ul>
            <li><strong>Functionality:</strong> Uses the <strong>Question Rewriter</strong> to refine the input question.</li>
            <li><strong>Updates:</strong> Replaces the <code>question</code> with the rewritten version in the state.</li>
        </ul>
    </li>
</ol>

<h3 style="color:red">Step 4: Router Construction</h3>
<p><strong>Routers:</strong></p>
<ol>
    <li><strong>decide_to_generate:</strong>
        <ul>
            <li><strong>Input:</strong> Filtered documents from <code>grade_documents</code>.</li>
            <li><strong>Conditions:</strong>
                <ul>
                    <li>If no relevant documents: Return <code>"transform_query"</code>.</li>
                    <li>If relevant documents exist: Return <code>"generate"</code>.</li>
                </ul>
            </li>
        </ul>
    </li>
    <li><strong>grade_generation_v_documents_and_question:</strong>
        <ul>
            <li><strong>Input:</strong> Generated response and retrieved documents.</li>
            <li><strong>Conditions:</strong>
                <ul>
                    <li>If <strong>hallucination_grader</strong> returns <code>Yes</code>:
                        <ul>
                            <li>Pass to <strong>answer_grader</strong>:
                                <ul>
                                    <li>If <strong>answer_grader</strong> returns <code>Yes</code>: Return <code>"useful"</code> (end pipeline).</li>
                                    <li>If <strong>answer_grader</strong> returns <code>No</code>: Return <code>"not useful"</code> (route to <code>transform_query</code>).</li>
                                </ul>
                            </li>
                        </ul>
                    </li>
                    <li>If <strong>hallucination_grader</strong> returns <code>No</code>: Return <code>"not supported"</code> (route back to <code>generate</code> for regeneration).</li>
                </ul>
            </li>
        </ul>
    </li>
</ol>


### 2.3.2 Steps 1 & 2: Define the LLMs and construct the required chains

In [6]:
### 1.Retrieval Grader
# Data model
class GradeDocuments(BaseModel):
    """Binary score for relevance check on retrieved documents."""

    binary_score: str = Field(
        description="Documents are relevant to the question, 'yes' or 'no'"
    )


# LLM with function call
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeDocuments)

# Prompt
system = """You are a grader assessing relevance of a retrieved document to a user question. \n 
    If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n
    It does not need to be a stringent test. The goal is to filter out erroneous retrievals. \n
    Give a binary score, 'yes' or 'no', to indicate whether the document is relevant to the question."""
grade_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Retrieved document: \n\n {document} \n\n User question: {question}"),
    ]
)

retrieval_grader = grade_prompt | structured_llm_grader
question = "agent memory"
docs = retriever.invoke(question)
doc_txt = docs[1].page_content
print(retrieval_grader.invoke({"question": question, "document": doc_txt}))



binary_score='no'


In [7]:
### 2. ANSWER GENERATOR

from langchain import hub
from langchain_core.output_parsers import StrOutputParser

# Prompt
prompt = hub.pull("rlm/rag-prompt")
# LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
# Post-processing
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

question = "agent memory"

# Chain
answer_generator = prompt | llm | StrOutputParser()
# Run
generation = answer_generator.invoke({"context": docs, "question": question})
print(generation)

Agent memory includes short-term memory for in-context learning and long-term memory for retaining and recalling information over extended periods. The memory module records agents' experiences in natural language and surfaces context to inform behavior based on relevance, recency, and importance. Reflection mechanisms synthesize memories into higher-level inferences over time to guide future behavior.


In [8]:
### 3. Hallucination Grader


# Data model
class GradeHallucinations(BaseModel):
    """Binary score for hallucination present in generation answer."""

    binary_score: str = Field(
        description="Answer is grounded in the facts, 'yes' or 'no'"
    )


# LLM with function call
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeHallucinations)

# Prompt
system = """You are a grader assessing whether an LLM generation is grounded in / supported by a set of retrieved facts. \n 
     Give a binary score 'yes' or 'no'. 'Yes' means that the answer is grounded in / supported by the set of facts."""
hallucination_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "Set of facts: \n\n {documents} \n\n LLM generation: {generation}"),
    ]
)

hallucination_grader = hallucination_prompt | structured_llm_grader
hallucination_grader.invoke({"documents": docs, "generation": generation})

GradeHallucinations(binary_score='yes')

In [9]:
### 4. Answer Grader

# Data model
class GradeAnswer(BaseModel):
    """Binary score to assess answer addresses question."""

    binary_score: str = Field(
        description="Answer addresses the question, 'yes' or 'no'"
    )


# LLM with function call
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm_grader = llm.with_structured_output(GradeAnswer)

# Prompt
system = """You are a grader assessing whether an answer addresses / resolves a question \n 
     Give a binary score 'yes' or 'no'. 'Yes' means that the answer resolves the question."""
answer_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "User question: \n\n {question} \n\n LLM generation: {generation}"),
    ]
)

answer_grader = answer_prompt | structured_llm_grader
answer_grader.invoke({"question": question, "generation": generation})

GradeAnswer(binary_score='yes')

In [10]:
### 5. Question Re-writer

# LLM
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

# Prompt
system = """You are a question re-writer that converts an input question to a better version that is optimized \n 
     for vectorstore retrieval. Look at the input and try to reason about the underlying semantic intent / meaning."""
re_write_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        (
            "human",
            "Here is the initial question: \n\n {question} \n Formulate an improved question.",
        ),
    ]
)

question_rewriter = re_write_prompt | llm | StrOutputParser()
question_rewriter.invoke({"question": question})

"What is the role of memory in an agent's functioning?"

### 2.3.3 Step 3: Define graph nodes using functions

<p><strong>Graph Nodes Defined:</strong></p>
<ol>
    <li><strong>retrieve(state):</strong>
        <ul>
            <li><strong>Functionality:</strong> Uses the retriever (from the index/vector store) to fetch relevant documents for the query.</li>
            <li><strong>Updates:</strong> Adds <code>documents</code> to the state.</li>
        </ul>
    </li>
    <li><strong>generate(state):</strong>
        <ul>
            <li><strong>Functionality:</strong> Passes the <code>documents</code> and <code>question</code> as input to the <strong>Answer Generator</strong> chain to create a response.</li>
            <li><strong>Updates:</strong> Adds <code>generation</code> to the state.</li>
        </ul>
    </li>
    <li><strong>grade_documents(state):</strong>
        <ul>
            <li><strong>Functionality:</strong> Filters out irrelevant documents by evaluating each document using the <strong>Retrieval Grader</strong>.</li>
            <li><strong>Updates:</strong> Retains only relevant <code>documents</code> in the state.</li>
        </ul>
    </li>
    <li><strong>transform_query(state):</strong>
        <ul>
            <li><strong>Functionality:</strong> Uses the <strong>Question Rewriter</strong> to refine the input question.</li>
            <li><strong>Updates:</strong> Replaces the <code>question</code> with the rewritten version in the state.</li>
        </ul>
    </li>
</ol>


In [11]:
def retrieve(state):
    """
    Retrieve documents

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, documents, that contains retrieved documents
    """
    print("---RETRIEVE---")
    question = state["question"]

    # Retrieval
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question}


def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """
    print("---GENERATE---")
    question = state["question"]
    documents = state["documents"]

    # RAG generation
    generation = answer_generator.invoke({"context": documents, "question": question})
    return {"documents": documents, "question": question, "generation": generation}


def grade_documents(state):
    """
    Determines whether the retrieved documents are relevant to the question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates documents key with only filtered relevant documents
    """

    print("---CHECK DOCUMENT RELEVANCE TO QUESTION---")
    question = state["question"]
    documents = state["documents"]

    # Score each doc
    filtered_docs = []
    for d in documents:
        score = retrieval_grader.invoke(
            {"question": question, "document": d.page_content}
        )
        grade = score.binary_score
        if grade == "yes":
            print("---GRADE: DOCUMENT RELEVANT---")
            filtered_docs.append(d)
        else:
            print("---GRADE: DOCUMENT NOT RELEVANT---")
            continue
    return {"documents": filtered_docs, "question": question}


def transform_query(state):
    """
    Transform the query to produce a better question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Updates question key with a re-phrased question
    """

    print("---TRANSFORM QUERY---")
    question = state["question"]
    documents = state["documents"]

    # Re-write question
    better_question = question_rewriter.invoke({"question": question})
    return {"documents": documents, "question": better_question}



### 2.3.4 Step 4: Define routers (using functions) for graph conditional edges 

<h3>Routers Construction</h3>
<p><strong>Router:</strong></p>
<ol>
    <li><strong>decide_to_generate:</strong>
        <ul>
            <li><strong>Input:</strong> Filtered documents from <code>grade_documents</code>.</li>
            <li><strong>Conditions:</strong>
                <ul>
                    <li>If no relevant documents: Return <code>"transform_query"</code>.</li>
                    <li>If relevant documents exist: Return <code>"generate"</code>.</li>
                </ul>
            </li>
        </ul>
    </li>
    <li><strong>grade_generation_v_documents_and_question:</strong>
        <ul>
            <li><strong>Input:</strong> Generated response and retrieved documents.</li>
            <li><strong>Conditions:</strong>
                <ul>
                    <li>If <strong>hallucination_grader</strong> returns <code>Yes</code>:
                        <ul>
                            <li>Pass to <strong>answer_grader</strong>:
                                <ul>
                                    <li>If <strong>answer_grader</strong> returns <code>Yes</code>: Return <code>"useful"</code> (end pipeline).</li>
                                    <li>If <strong>answer_grader</strong> returns <code>No</code>: Return <code>"not useful"</code> (route to <code>transform_query</code>).</li>
                                </ul>
                            </li>
                        </ul>
                    </li>
                    <li>If <strong>hallucination_grader</strong> returns <code>No</code>: Return <code>"not supported"</code> (route back to <code>generate</code> for regeneration).</li>
                </ul>
            </li>
        </ul>
    </li>
</ol>

In [12]:
def decide_to_generate(state):
    """
    Determines whether to generate an answer, or re-generate a question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Binary decision for next node to call
    """

    print("---ASSESS GRADED DOCUMENTS---")
    state["question"]
    filtered_documents = state["documents"]

    if not filtered_documents:
        # All documents have been filtered check_relevance
        # We will re-generate a new query
        print(
            "---DECISION: ALL DOCUMENTS ARE NOT RELEVANT TO QUESTION, TRANSFORM QUERY---"
        )
        return "transform_query"
    else:
        # We have relevant documents, so generate answer
        print("---DECISION: GENERATE---")
        return "generate"


def grade_generation_v_documents_and_question(state):
    """
    Determines whether the generation is grounded in the document and answers question.

    Args:
        state (dict): The current graph state

    Returns:
        str: Decision for next node to call
    """

    print("---CHECK HALLUCINATIONS---")
    question = state["question"]
    documents = state["documents"]
    generation = state["generation"]

    score = hallucination_grader.invoke(
        {"documents": documents, "generation": generation}
    )
    grade = score.binary_score

    # Check hallucination
    if grade == "yes":
        print("---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---")
        # Check question-answering
        print("---GRADE GENERATION vs QUESTION---")
        score = answer_grader.invoke({"question": question, "generation": generation})
        grade = score.binary_score
        if grade == "yes":
            print("---DECISION: GENERATION ADDRESSES QUESTION---")
            return "useful"
        else:
            print("---DECISION: GENERATION DOES NOT ADDRESS QUESTION---")
            return "not useful"
    else:
        print("---DECISION: GENERATION IS NOT GROUNDED IN DOCUMENTS, RE-TRY---")
        return "not supported"
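
The decision table above can be exercised without calling an LLM. The following is a minimal sketch, not the notebook's own code: `StubGrader` and `route_generation` are hypothetical names introduced for illustration, and the graders are passed in as arguments (an assumption made so the logic can run offline) rather than read from module-level variables.

```python
from types import SimpleNamespace


class StubGrader:
    """Hypothetical stand-in for a grader chain: always returns a fixed binary_score."""

    def __init__(self, verdict):
        self.verdict = verdict

    def invoke(self, inputs):
        # Mimic the structured output object that exposes .binary_score
        return SimpleNamespace(binary_score=self.verdict)


def route_generation(state, hallucination_grader, answer_grader):
    """Same decision table as grade_generation_v_documents_and_question, with graders injected."""
    grounded = hallucination_grader.invoke(
        {"documents": state["documents"], "generation": state["generation"]}
    ).binary_score
    if grounded != "yes":
        return "not supported"  # routed back to generate
    answers = answer_grader.invoke(
        {"question": state["question"], "generation": state["generation"]}
    ).binary_score
    return "useful" if answers == "yes" else "not useful"


state = {"question": "q", "documents": ["d"], "generation": "g"}
# Enumerate every combination of grader verdicts
outcomes = {
    (h, a): route_generation(state, StubGrader(h), StubGrader(a))
    for h in ("yes", "no")
    for a in ("yes", "no")
}
for verdicts, decision in outcomes.items():
    print(verdicts, "->", decision)
```

Note that the answer grader's verdict only matters when the hallucination grader passes, so both `("no", ...)` combinations map to `"not supported"`.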

## 2.4 Phase 2B: Web Search (If Query is unrelated to index)

### 2.4.1 Description of Pipeline


<h2>Web Search Pipeline</h2>
<p>The web search pipeline is utilized when the query is unrelated to the indexed data. It incorporates a web search tool to fetch relevant real-time results.</p>

<h3>How it works:</h3>
<ol>
    <li>
        <strong style="color:red">Step 1: Building the Web Search Tool</strong>
        <ul>
            <li>The pipeline uses the <code>TavilySearchResults</code> tool, built on the Tavily API, which is specifically optimized for AI agents (LLMs).</li>
            <li>It is configured to return the top 3 search results, giving the pipeline access to current information that is not in the index.</li>
        </ul>
    </li>
    <li>
        <strong style="color:red">Step 2: Defining Graph Nodes as Functions</strong>
        <ul>
            <li>
                <strong><code>web_search</code> Node:</strong>
                <ul>
                    <li><strong>Input:</strong> Current graph state containing the query (<code>state["question"]</code>).</li>
                    <li><strong>Operation:</strong>
                        <ol>
                            <li>Extract the query from the graph state.</li>
                            <li>Pass the query to the <code>web_search_tool</code> to retrieve the top search results.</li>
                            <li>Join the content of the results into one string and wrap it in a single <code>Document</code> object (the <code>langchain.schema.Document</code> class).</li>
                        </ol>
                    </li>
                    <li><strong>Output:</strong> Updated graph state with the retrieved documents stored in the <code>state["documents"]</code> key.</li>
                </ul>
            </li>
        </ul>
    </li>
</ol>

### 2.4.2 Step 1: Building the Web Search tool

In [13]:
from langchain_community.tools.tavily_search import TavilySearchResults
# max_results caps the number of results returned per query
web_search_tool = TavilySearchResults(max_results=3)

### 2.4.3 Step 2: Defining Graph Nodes as Functions

In [14]:
from langchain.schema import Document


def web_search(state):
    """
    Run a web search for the current question.

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Replaces the documents key with a single Document holding the combined web results
    """

    print("---WEB SEARCH---")
    question = state["question"]

    # Web search
    docs = web_search_tool.invoke({"query": question})
    web_results = "\n".join([d["content"] for d in docs])
    web_results = Document(page_content=web_results)

    return {"documents": web_results, "question": question}
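
To see the shape of the state update this node produces without an API key, the sketch below swaps in a canned search tool. Everything here is illustrative: `StubSearchTool` and `web_search_node` are hypothetical names, the local `Document` dataclass is only a stand-in for the LangChain class, and the result format (a list of dicts with a `content` key) is assumed from the `d["content"]` access in the cell above.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Minimal stand-in for langchain's Document (assumption, for offline runs)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class StubSearchTool:
    """Hypothetical replacement for web_search_tool that returns canned results."""

    def invoke(self, payload):
        query = payload["query"]
        return [
            {"url": "https://example.com/a", "content": f"First hit for: {query}"},
            {"url": "https://example.com/b", "content": f"Second hit for: {query}"},
        ]


def web_search_node(state, tool):
    """Mirror of web_search with the tool passed in explicitly."""
    question = state["question"]
    results = tool.invoke({"query": question})
    # Concatenate the snippets into one block of context text
    combined = "\n".join(r["content"] for r in results)
    return {"documents": Document(page_content=combined), "question": question}


update = web_search_node({"question": "adaptive RAG"}, StubSearchTool())
print(update["documents"].page_content)
```

The returned `documents` value is one `Document`, not a list, so any downstream node that iterates over documents would need to account for that.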

## 2.5 Graph Construction

<h1>Adaptive RAG Pipeline: Flow Construction Using <code>langgraph.graph</code></h1>

<p>To implement the Adaptive Retrieval-Augmented Generation (RAG) process, we use the <code>langgraph.graph</code> module from LangGraph to define the control flow. Here's how the pipeline is constructed:</p>

<h2>1. Defining the Workflow</h2>
<p>The pipeline begins with a <strong>StateGraph</strong>, which represents the entire workflow. This graph operates on a central state object (e.g., query, retrieved documents, generated responses). Nodes in the graph interact with this state, updating key-value pairs as operations progress.</p>

<h2>2. Adding Nodes</h2>
<p>Nodes represent distinct stages of the RAG pipeline. Each node corresponds to a predefined function or process:</p>
<ul>
    <li><strong><code>web_search</code>:</strong> Executes web searches for queries unrelated to the index.</li>
    <li><strong><code>retrieve</code>:</strong> Fetches relevant documents from the vectorstore.</li>
    <li><strong><code>grade_documents</code>:</strong> Evaluates the relevance of retrieved documents.</li>
    <li><strong><code>generate</code>:</strong> Uses the LLM to generate answers based on the documents and query.</li>
    <li><strong><code>transform_query</code>:</strong> Rewrites the query for better retrieval or regeneration.</li>
</ul>

<h2>3. Adding Edges and Conditional Edges</h2>
<p>Edges define the workflow sequence. There are two types of edges:</p>
<ul>
    <li><strong>Normal Edges:</strong> Always execute the next node in sequence. For example, after retrieving documents, grading their relevance always follows.</li>
    <li><strong>Conditional Edges:</strong> Decide the next node based on certain conditions, typically evaluated by a routing function. For example:
        <ul>
            <li>After query analysis, route the query to either <code>web_search</code> or <code>retrieve</code>.</li>
            <li>After document grading, decide whether to generate an answer or transform the query.</li>
        </ul>
    </li>
</ul>
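
The difference between the two edge types can be illustrated with a library-free toy runner. This is only a sketch of the idea, not how LangGraph is implemented: `run_graph`, the lambda nodes, and the `"memory"` keyword filter are all hypothetical and exist purely to show a normal edge (fixed successor) next to a conditional edge (router output mapped to a successor).

```python
def run_graph(state, entry, nodes, edges, routers, end="END", max_steps=20):
    """Walk a toy graph: nodes update state, edges/routers choose the next node."""
    current, visited = entry, []
    for _ in range(max_steps):
        if current == end:
            return state, visited
        visited.append(current)
        state = {**state, **nodes[current](state)}  # merge the node's partial update
        if current in routers:  # conditional edge: router label -> next node
            router, mapping = routers[current]
            current = mapping[router(state)]
        else:  # normal edge: fixed successor
            current = edges[current]
    raise RuntimeError("step limit reached")


nodes = {
    "retrieve": lambda s: {"documents": ["doc about " + s["question"]]},
    "grade_documents": lambda s: {"documents": [d for d in s["documents"] if "memory" in d]},
    "transform_query": lambda s: {"question": s["question"] + " memory"},
    "generate": lambda s: {"generation": f"answer from {len(s['documents'])} doc(s)"},
}
edges = {"retrieve": "grade_documents", "transform_query": "retrieve", "generate": "END"}
routers = {
    "grade_documents": (
        lambda s: "generate" if s["documents"] else "transform_query",
        {"generate": "generate", "transform_query": "transform_query"},
    )
}

final_state, path = run_graph({"question": "agent"}, "retrieve", nodes, edges, routers)
print(path)
print(final_state["generation"])
```

In this run the first retrieval yields no relevant documents, so the router sends the flow through `transform_query` and back to `retrieve` before reaching `generate`. The `max_steps` guard is a design choice for the toy: a graph with cycles needs some bound to avoid looping forever.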

<h2>Pipeline Flow Description</h2>
<ol>
    <li><strong>Start of Workflow:</strong> 
        <ul>
            <li>The graph begins with a query. A routing function (<code>route_question</code>) determines the first step:
                <ul>
                    <li>If the query is unrelated to the index, route to <code>web_search</code>.</li>
                    <li>If related, route to <code>retrieve</code>.</li>
                </ul>
            </li>
        </ul>
    </li>
    <li><strong>Web Search Node:</strong> 
        <ul>
            <li>If routed to <code>web_search</code>, the node fetches the top web results and sends them to the <code>generate</code> node.</li>
        </ul>
    </li>
    <li><strong>Retrieve Node:</strong> 
        <ul>
            <li>Fetch relevant documents from the vectorstore, then pass them to the <code>grade_documents</code> node.</li>
        </ul>
    </li>
    <li><strong>Document Grading Node:</strong> 
        <ul>
            <li>Filter irrelevant documents. Based on the filtered results:
                <ul>
                    <li>If no relevant documents remain, route to <code>transform_query</code>.</li>
                    <li>If relevant documents are present, route to <code>generate</code>.</li>
                </ul>
            </li>
        </ul>
    </li>
    <li><strong>Query Transformation Node:</strong> 
        <ul>
            <li>Rewrite the query to improve relevance, then send it back to the <code>retrieve</code> node.</li>
        </ul>
    </li>
    <li><strong>Generation Node:</strong> 
        <ul>
            <li>Generate an answer using the documents and query. Evaluate the generation with:
                <ul>
                    <li><strong>Hallucination Grader:</strong> Checks if the answer is grounded in retrieved documents.</li>
                    <li><strong>Answer Grader:</strong> Ensures the answer addresses the query.</li>
                    <li>If both pass, the workflow ends with the generated response.</li>
                    <li>If the hallucination check fails, regenerate the answer; if the answer does not address the question, transform the query.</li>
                </ul>
            </li>
        </ul>
    </li>
</ol>

<p>This pipeline dynamically adjusts to query complexity, ensuring efficient and accurate responses by combining retrieval-augmented techniques and adaptive query processing.</p>


In [15]:
from langgraph.graph import END, StateGraph, START

workflow = StateGraph(GraphState)

# Define the nodes
workflow.add_node("web_search", web_search)  # web search
workflow.add_node("retrieve", retrieve)  # retrieve
workflow.add_node("grade_documents", grade_documents)  # grade documents
workflow.add_node("generate", generate)  # generate
workflow.add_node("transform_query", transform_query)  # transform_query

# Build graph
workflow.add_conditional_edges(
    START,
    route_question,
    {
        "web_search": "web_search",
        "vectorstore": "retrieve",
    },
)
workflow.add_edge("web_search", "generate")
workflow.add_edge("retrieve", "grade_documents")
workflow.add_conditional_edges(
    "grade_documents",
    decide_to_generate,
    {
        "transform_query": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "retrieve")
workflow.add_conditional_edges(
    "generate",
    grade_generation_v_documents_and_question,
    {
        "not supported": "generate",
        "useful": END,
        "not useful": "transform_query",
    },
)

# Compile
app = workflow.compile()

## 2.6 Use Graph

<h3>1. Inputs and Initialization</h3>
<p>The pipeline starts with an input dictionary containing the user’s question.</p>
<ul>
    <li><strong>Example Question 1:</strong> "What player at the Bears expected to draft first in the 2024 NFL draft?"</li>
    <li><strong>Example Question 2:</strong> "What are the types of agent memory?"</li>
</ul>

<h3>2. Streaming Execution of the Graph</h3>
<p><code>app.stream(inputs)</code> iteratively executes the nodes in the state graph. At each iteration:</p>
<ul>
    <li>The current node being executed is identified.</li>
    <li>The node’s name is printed; the state after execution can also be printed by enabling the optional line in the loop.</li>
</ul>
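
The loop pattern used in the cells below can be tried offline with a stand-in generator. This is a sketch under stated assumptions: `fake_stream` is a hypothetical function that imitates the shape the loop relies on (one dict per step, keyed by node name, holding that node's state), and the node names and values are made up for illustration.

```python
def fake_stream(inputs):
    """Hypothetical stand-in for app.stream: yields one {node_name: state} dict per step."""
    question = inputs["question"]
    yield {"retrieve": {"question": question, "documents": ["doc-1", "doc-2"]}}
    yield {"grade_documents": {"question": question, "documents": ["doc-1"]}}
    yield {"generate": {"question": question, "documents": ["doc-1"], "generation": "stub answer"}}


executed = []
final_generation = None
for output in fake_stream({"question": "What are the types of agent memory?"}):
    for node_name, state_update in output.items():
        executed.append(node_name)
        # Keep the most recent generation seen so far, if any
        final_generation = state_update.get("generation", final_generation)

print(executed)
print(final_generation)
```

Tracking `final_generation` explicitly avoids depending on the loop variable still being bound after the loop ends, which would fail if the stream yielded nothing.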

<h3>3. Routing Questions</h3>
<p><strong>Routing Step:</strong> The first step in the pipeline, a conditional edge from <code>START</code>, determines whether the query should be handled by:</p>
<ul>
    <li><strong>Web Search:</strong> For questions unrelated to indexed documents.</li>
    <li><strong>RAG:</strong> For questions that align with the indexed data.</li>
</ul>
<p>Based on the routing decision, the next node is invoked.</p>

In [16]:
from pprint import pprint

# Run
inputs = {
    "question": "What player at the Bears expected to draft first in the 2024 NFL draft?"
}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint(f"Node '{key}':")
        # Optional: print full state at each node
        # pprint(value, indent=2, width=80, depth=None)
    pprint("\n---\n")

# Final generation
pprint(value["generation"])

---ROUTE QUESTION---
---ROUTE QUESTION TO WEB SEARCH---
---WEB SEARCH---
"Node 'web_search':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('Caleb Williams from USC is expected to be drafted first by the Bears in the '
 '2024 NFL draft.')


Trace on LangSmith: https://smith.langchain.com/public/96c1df71-b9ee-4ffd-88de-b01e2974d6b2/r

In [17]:
# Run
inputs = {"question": "What are the types of agent memory?"}
for output in app.stream(inputs):
    for key, value in output.items():
        # Node
        pprint(f"Node '{key}':")
        # Optional: print full state at each node
        # pprint(value, indent=2, width=80, depth=None)
    pprint("\n---\n")

# Final generation
pprint(value["generation"])

---ROUTE QUESTION---
---ROUTE QUESTION TO RAG---
---RETRIEVE---
"Node 'retrieve':"
'\n---\n'
---CHECK DOCUMENT RELEVANCE TO QUESTION---
---GRADE: DOCUMENT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT NOT RELEVANT---
---GRADE: DOCUMENT RELEVANT---
---ASSESS GRADED DOCUMENTS---
---DECISION: GENERATE---
"Node 'grade_documents':"
'\n---\n'
---GENERATE---
---CHECK HALLUCINATIONS---
---DECISION: GENERATION IS GROUNDED IN DOCUMENTS---
---GRADE GENERATION vs QUESTION---
---DECISION: GENERATION ADDRESSES QUESTION---
"Node 'generate':"
'\n---\n'
('The types of agent memory are short-term memory, which is utilized for '
 'in-context learning and is restricted by the context window length, and '
 'long-term memory, which involves an external vector store for storing and '
 'recalling information over extended periods. Sensory memory is also '
 'mentioned as the earliest stage of memory, retaining sensory impressions '
 'briefly after stimuli end.')


Trace on LangSmith: https://smith.langchain.com/public/b833d773-47d3-4e5d-b385-6ef1ba3bce49/r