# Agentic RAG

In this tutorial we will build a retrieval agent. Retrieval agents are useful when you want an LLM to make a decision about whether to retrieve context from a vectorstore or respond to the user directly.

By the end of the tutorial we will have done the following:

1. Fetch and preprocess documents that will be used for retrieval.
2. Index those documents for semantic search and create a retriever tool for the agent.
3. Build an agentic RAG system that can decide when to use the retriever tool.

## Setup
Let's download the required packages and set our API keys:

In [None]:
%%capture --no-stderr
%pip install -U --quiet langgraph "langchain[openai]" langchain-community langchain-text-splitters

## 1. Preprocess documents

### 1.1 Fetch documents to use in our RAG system

We will use three of the most recent posts from Lilian Weng's excellent blog. We'll start by fetching the content of the pages using the `WebBaseLoader` utility:

In [None]:
from langchain_community.document_loaders import WebBaseLoader
urls = [
    "https://lilianweng.github.io/posts/2024-11-28-reward-hacking/",
    "https://lilianweng.github.io/posts/2024-07-07-hallucination/",
    "https://lilianweng.github.io/posts/2024-04-12-diffusion-video/",
]

docs = [WebBaseLoader(url).load() for url in urls]

In [None]:
docs[0][0].page_content

In [None]:
len(docs)

### 1.2 Split the fetched documents into smaller chunks for indexing into our vectorstore

In [None]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs_list = [item for sublist in docs for item in sublist]
docs_list
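
Each `WebBaseLoader(url).load()` call returns a list of documents, so `docs` is a list of lists and the comprehension above flattens it one level. A plain-Python illustration, with strings as made-up stand-ins for `Document` objects, showing that the nested comprehension matches `itertools.chain.from_iterable`:

```python
from itertools import chain

# Stand-ins for the per-URL results: one inner list per URL
nested = [["post-1"], ["post-2"], ["post-3a", "post-3b"]]

# Same pattern as the cell above: the outer loop comes first, then the inner loop
flat_comprehension = [item for sublist in nested for item in sublist]

# Equivalent standard-library spelling
flat_chain = list(chain.from_iterable(nested))

print(flat_comprehension)  # ['post-1', 'post-2', 'post-3a', 'post-3b']
```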

In [None]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=500, chunk_overlap=50
)
doc_splits = text_splitter.split_documents(docs_list)

In [None]:
doc_splits[0].page_content.strip()
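
With `from_tiktoken_encoder`, `chunk_size` and `chunk_overlap` are measured in tokens, and consecutive chunks share up to `chunk_overlap` tokens so that text cut at a boundary still appears intact in a neighboring chunk. The sketch below is a simplified fixed-window splitter over a list of tokens; it is *not* how `RecursiveCharacterTextSplitter` works internally (that class prefers to split on separators such as paragraph breaks and newlines first). It only illustrates the size/overlap arithmetic, and `sliding_window_chunks` is a hypothetical helper:

```python
def sliding_window_chunks(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split `tokens` into windows of at most `chunk_size`, overlapping by `chunk_overlap`."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


tokens = [f"t{i}" for i in range(12)]
for chunk in sliding_window_chunks(tokens, chunk_size=5, chunk_overlap=2):
    print(chunk)
```

With the settings used above (`chunk_size=500, chunk_overlap=50`), each window in this simplified model would advance by 450 tokens.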

## 2. Create a retriever tool

### 2.1 Use an in-memory vectorstore to index our documents

In [None]:
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings

vectorstore = InMemoryVectorStore.from_documents(
    documents=doc_splits, embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()
retriever
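
Conceptually, an in-memory vector store keeps every chunk's embedding in memory and answers a query by comparing the query embedding against all of them. Below is a minimal brute-force sketch of that idea using cosine similarity, a common choice for text embeddings. The 3-dimensional vectors are made up (real `OpenAIEmbeddings` vectors have far more dimensions), `top_k` is a hypothetical helper, and this is not `InMemoryVectorStore`'s actual code:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], index: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Score every stored chunk against the query and return the k best (id, score) pairs."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in index.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "index": chunk id -> made-up embedding
index = {
    "reward-hacking-chunk": [0.9, 0.1, 0.0],
    "hallucination-chunk": [0.1, 0.9, 0.1],
    "diffusion-video-chunk": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # pretend embedding of "types of reward hacking"
print(top_k(query, index))
```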

### 2.2 Create a retriever tool using LangChain's prebuilt `create_retriever_tool`

In [None]:
from langchain.tools.retriever import create_retriever_tool

retriever_tool = create_retriever_tool(
    retriever,
    "retrieve_blog_posts",
    "Search and return information about Lilian Weng blog posts.",
)

In [None]:
retriever_tool.invoke({"query": "types of reward hacking"})
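
The tool returns a single string rather than a list of `Document` objects; in the graph, that string becomes the content of the `ToolMessage` that later nodes read via `state["messages"][-1].content`. As I understand `create_retriever_tool`, it joins the retrieved chunks' `page_content` with a blank line by default (its `document_separator` parameter defaults to `"\n\n"`). A plain-Python sketch of that formatting, where `format_retrieved` is a hypothetical helper and the texts are stand-ins:

```python
def format_retrieved(page_contents: list[str], separator: str = "\n\n") -> str:
    """Join retrieved chunk texts into the single string carried by the tool message."""
    return separator.join(page_contents)


# Stand-in chunk texts (the second mirrors the example tool output used later in this tutorial)
retrieved_texts = [
    "Reward hacking occurs when an agent exploits flaws in its reward function.",
    "reward hacking can be categorized into two types: environment or goal misspecification, and reward tampering",
]
tool_output = format_retrieved(retrieved_texts)
print(tool_output)
```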

## 3. Generate query

Now we will start building components ([nodes](https://langchain-ai.github.io/langgraph/concepts/low_level/#nodes) and [edges](https://langchain-ai.github.io/langgraph/concepts/low_level/#edges)) for our agentic RAG graph.

Note that the components will operate on the [MessagesState](https://langchain-ai.github.io/langgraph/concepts/low_level/#messagesstate), a graph state that contains a `messages` key with a list of [chat messages](https://python.langchain.com/docs/concepts/messages/).
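
This is why each node below returns only `{"messages": [...]}`: `MessagesState` attaches a reducer to `messages` (LangGraph's `add_messages`), so messages returned by a node are appended to the existing list instead of replacing it, while a state key without a reducer is simply overwritten. A plain-Python sketch of those two merge behaviors; the helper names are hypothetical, and the real `add_messages` also converts dicts to message objects and replaces messages with matching IDs, which this sketch omits:

```python
def append_reducer(current: list, update: list) -> list:
    """Roughly what a list-appending reducer does: keep old items, add the new ones."""
    return current + update


def apply_update(state: dict, update: dict, reducers: dict) -> dict:
    """Merge a node's partial update into the state, key by key."""
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](state.get(key, []), value)
        else:
            merged[key] = value  # no reducer: last write wins
    return merged


state = {"messages": [{"role": "user", "content": "hello!"}], "note": "old"}
update = {"messages": [{"role": "assistant", "content": "Hi!"}], "note": "new"}
new_state = apply_update(state, update, reducers={"messages": append_reducer})
print(len(new_state["messages"]), new_state["note"])
```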

Build a `generate_query_or_respond` node. It calls an LLM to generate a response based on the current graph state (the list of messages). Given the input messages, the model decides whether to retrieve using the retriever tool or to respond directly to the user. Note that we give the chat model access to the `retriever_tool` we created earlier via `.bind_tools`:

In [None]:
from langgraph.graph import MessagesState
from langchain.chat_models import init_chat_model

response_model = init_chat_model("openai:gpt-4.1", temperature=0)

def generate_query_or_respond(state: MessagesState):
    """Call the model to generate a response based on the current state. Given
    the question, it will decide to retrieve using the retriever tool, or simply respond to the user.
    """
    resp = response_model.bind_tools([retriever_tool]).invoke(state["messages"])

    return {"messages": [resp]}

Try it on a random input:

In [None]:
input = {"messages": [{"role": "user", "content": "hello!"}]}
generate_query_or_respond(input)["messages"][-1].pretty_print()

Ask a question that requires semantic search:

In [None]:
input = {
    "messages": [
        {
            "role": "user",
            "content": "What does Lilian Weng say about types of reward hacking?",
        }
    ]
}
response = generate_query_or_respond(input)["messages"][-1]
response.pretty_print()

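### 3.1 Aside: ChatGPT 5 generated code for a linear RAG graph

For contrast, the next two cells (generated by ChatGPT 5) implement a *non-agentic* RAG pipeline in LangGraph: every question goes through retrieve → compose context → answer, with no decision about whether retrieval is needed. It uses its own toy documents and a FAISS index, which requires the `faiss-cpu` package (not included in the setup cell above).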

In [None]:
# Requires: pip install faiss-cpu
from typing import List, TypedDict

from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.graph import END, START, StateGraph

# ---------- Setup (one-time) ----------
# 1) Build the vector store.
# Names are prefixed with `linear_` so the agent's `docs` and `retriever` are not overwritten.
linear_docs = [  # however you load them
    Document(page_content="Llamas are cool. They live in the Andes.", metadata={"id": "llama_1"}),
    # ...
]
splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=120)
chunks = splitter.split_documents(linear_docs)

emb = OpenAIEmbeddings(model="text-embedding-3-small")
vs = FAISS.from_documents(chunks, emb)
linear_retriever = vs.as_retriever(search_kwargs={"k": 6})

# 2) LLM and prompt
llm = ChatOpenAI(model="gpt-4o-mini")  # pick what you like
PROMPT = ChatPromptTemplate.from_messages([
    ("system", "You are a precise, source-citing assistant. Use CONTEXT if helpful."),
    ("system", "CONTEXT:\n{context}"),
    ("human", "QUESTION:\n{question}"),
])
PROMPT

In [None]:
# ---------- Graph State ----------
class RAGState(TypedDict):
    question: str
    retrieved: List[Document]
    context: str
    answer: str

# ---------- Nodes ----------
# Each node returns only the keys it updates; LangGraph merges them into the state.
def retrieve_node(state: RAGState) -> dict:
    return {"retrieved": linear_retriever.invoke(state["question"])}

def compose_context_node(state: RAGState) -> dict:
    # Render each chunk as "[source id] text"; add titles, line numbers or citations here if needed.
    lines = []
    for d in state.get("retrieved", []):
        src = d.metadata.get("id", "unknown")
        lines.append(f"[{src}] {d.page_content}")
    return {"context": "\n\n".join(lines)}

def answer_node(state: RAGState) -> dict:
    msg = PROMPT.format_messages(question=state["question"], context=state.get("context", ""))
    resp = llm.invoke(msg)
    return {"answer": resp.content}

# ---------- Wiring the graph ----------
g = StateGraph(RAGState)
g.add_node("retrieve", retrieve_node)
g.add_node("compose", compose_context_node)
# A node name may not reuse a state key, so this node is "generate" rather than "answer".
g.add_node("generate", answer_node)

g.add_edge(START, "retrieve")
g.add_edge("retrieve", "compose")
g.add_edge("compose", "generate")
g.add_edge("generate", END)

app = g.compile()

# Run
result = app.invoke({"question": "Where do llamas live?"})
print(result["answer"])

## 4. Grade documents

Add a [conditional edge](https://langchain-ai.github.io/langgraph/concepts/low_level/#conditional-edges), `grade_documents`, to determine whether the retrieved documents are relevant to the question. We will use a model with the structured output schema `GradeDocuments` for document grading. The `grade_documents` function returns the name of the node to go to next based on the grading decision (`generate_answer` or `rewrite_question`):

In [None]:
from typing import Literal

from pydantic import BaseModel, Field

GRADE_PROMPT = (
    "You are a grader assessing relevance of a retrieved document to a user question. \n "
    "Here is the retrieved document: \n\n {context} \n\n"
    "Here is the user question: {question} \n"
    "If the document contains keyword(s) or semantic meaning related to the user question, grade it as relevant. \n"
    "Give a binary score, 'yes' or 'no', to indicate whether the document is relevant to the question."
)


class GradeDocuments(BaseModel):
    """Grade documents using a binary score for relevance check."""

    # Literal constrains the structured output to exactly 'yes' or 'no'
    binary_score: Literal["yes", "no"] = Field(
        description="Relevance score: 'yes' if relevant, or 'no' if not relevant"
    )


grader_model = init_chat_model("openai:gpt-4.1", temperature=0)


def grade_documents(
    state: MessagesState,
) -> Literal["generate_answer", "rewrite_question"]:
    """Determine whether the retrieved documents are relevant to the question."""
    question = state["messages"][0].content
    context = state["messages"][-1].content

    prompt = GRADE_PROMPT.format(question=question, context=context)
    response = (
        grader_model
        .with_structured_output(GradeDocuments).invoke(
            [{"role": "user", "content": prompt}]
        )
    )
    score = response.binary_score

    if score == "yes":
        return "generate_answer"
    else:
        return "rewrite_question"
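
The key contract of a conditional-edge function is that it returns the *name* of the next node. To exercise that routing offline, here is a sketch that swaps the LLM grader for a crude keyword-overlap heuristic. `keyword_grade` and `route` are hypothetical names, and the heuristic is only a stand-in for `grader_model`, not a replacement for it:

```python
from typing import Literal

STOPWORDS = {"what", "are", "the", "of", "does", "say", "about", "a", "an", "is"}


def keyword_grade(question: str, context: str) -> Literal["yes", "no"]:
    """Stand-in grader: 'yes' if any non-trivial question word appears in the context."""
    q_words = {w.strip("?.,!").lower() for w in question.split()} - STOPWORDS
    c_words = {w.strip("?.,!:").lower() for w in context.split()}
    return "yes" if q_words & c_words else "no"


def route(question: str, context: str) -> Literal["generate_answer", "rewrite_question"]:
    # Same mapping as grade_documents: 'yes' -> answer, anything else -> rewrite
    if keyword_grade(question, context) == "yes":
        return "generate_answer"
    return "rewrite_question"


question = "What are the types of reward hacking?"
print(route(question, "meow"))
print(route(question, "reward hacking can be categorized into two types"))
```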

Run this with irrelevant documents in the tool response:


In [None]:
from langchain_core.messages import convert_to_messages

input = {
    "messages": convert_to_messages(
        [
            {
                "role": "user",
                "content": "What are the types of reward hacking?",
            },
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "1",
                        "name": "retrieve_blog_posts",
                        "args": {"query": "types of reward hacking"},
                    }
                ],
            },
            {"role": "tool", "content": "meow", "tool_call_id": "1"},
        ]
    )
}
grade_documents(input)

Confirm that the relevant documents are classified as such:


In [None]:
input = {
    "messages": convert_to_messages(
        [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            },
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "1",
                        "name": "retrieve_blog_posts",
                        "args": {"query": "types of reward hacking"},
                    }
                ],
            },
            {
                "role": "tool",
                "content": "reward hacking can be categorized into two types: environment or goal misspecification, and reward tampering",
                "tool_call_id": "1",
            },
        ]
    )
}
grade_documents(input)

## 5. Rewrite question

Build the `rewrite_question` node. The retriever tool can return irrelevant documents, which suggests the original user question needs to be improved. When `grade_documents` judges the retrieved documents irrelevant, the graph routes to `rewrite_question`, which asks the model to reformulate the question:

In [None]:
REWRITE_PROMPT = (
    "Look at the input and try to reason about the underlying semantic intent / meaning.\n"
    "Here is the initial question:"
    "\n ------- \n"
    "{question}"
    "\n ------- \n"
    "Formulate an improved question:"
)


def rewrite_question(state: MessagesState):
    """Rewrite the original user question."""
    messages = state["messages"]
    question = messages[0].content
    prompt = REWRITE_PROMPT.format(question=question)
    response = response_model.invoke([{"role": "user", "content": prompt}])
    return {"messages": [{"role": "user", "content": response.content}]}
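
Note that `rewrite_question` appends the improved question as a new user message rather than editing the first one. On the next pass, `generate_query_or_respond` sees the whole history (including the rewrite), while `grade_documents` and `generate_answer` still read `state["messages"][0]`, i.e. the original question. A plain-dict sketch of the history after one rewrite cycle; the rewritten wording is made up for illustration:

```python
history = [
    {"role": "user", "content": "What does Lilian Weng say about types of reward hacking?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "1", "name": "retrieve_blog_posts", "args": {"query": "types of reward hacking"}}
    ]},
    {"role": "tool", "content": "meow", "tool_call_id": "1"},
]

# rewrite_question returns {"messages": [new_user_message]}; the messages reducer appends it
rewritten = {"role": "user", "content": "How does Lilian Weng categorize reward hacking in RL?"}
history = history + [rewritten]

original_question = history[0]["content"]  # what grade_documents and generate_answer read
latest_user_turn = [m for m in history if m["role"] == "user"][-1]["content"]
print(original_question)
print(latest_user_turn)
```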

In [None]:
input = {
    "messages": convert_to_messages(
        [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            },
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "1",
                        "name": "retrieve_blog_posts",
                        "args": {"query": "types of reward hacking"},
                    }
                ],
            },
            {"role": "tool", "content": "meow", "tool_call_id": "1"},
        ]
    )
}

response = rewrite_question(input)
print(response["messages"][-1]["content"])

## 6. Generate an answer

Build the `generate_answer` node: if the retrieved documents pass the grader check, we generate the final answer based on the original question and the retrieved context:

In [None]:
GENERATE_PROMPT = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer the question. "
    "If you don't know the answer, just say that you don't know. "
    "Use three sentences maximum and keep the answer concise.\n"
    "Question: {question} \n"
    "Context: {context}"
)


def generate_answer(state: MessagesState):
    """Generate an answer."""
    question = state["messages"][0].content
    context = state["messages"][-1].content
    prompt = GENERATE_PROMPT.format(question=question, context=context)
    response = response_model.invoke([{"role": "user", "content": prompt}])
    return {"messages": [response]}

In [None]:
input = {
    "messages": convert_to_messages(
        [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            },
            {
                "role": "assistant",
                "content": "",
                "tool_calls": [
                    {
                        "id": "1",
                        "name": "retrieve_blog_posts",
                        "args": {"query": "types of reward hacking"},
                    }
                ],
            },
            {
                "role": "tool",
                "content": "reward hacking can be categorized into two types: environment or goal misspecification, and reward tampering",
                "tool_call_id": "1",
            },
        ]
    )
}

response = generate_answer(input)
response["messages"][-1].pretty_print()

## 7. Assemble the graph

- Start with `generate_query_or_respond` and determine whether we need to call `retriever_tool`
- Route to the next step using `tools_condition`:
    - If `generate_query_or_respond` returned `tool_calls`, call `retriever_tool` to retrieve context
    - Otherwise, respond directly to the user
- Grade the retrieved document content for relevance to the question (`grade_documents`) and route to the next step:
    - If not relevant, rewrite the question using `rewrite_question` and then call `generate_query_or_respond` again
    - If relevant, proceed to `generate_answer` and generate the final response using the `ToolMessage` with the retrieved document context

API reference: `StateGraph` | `START` | `END` | `ToolNode` | `tools_condition`
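
`tools_condition` routes on the last message in the state: if it is an AI message carrying `tool_calls`, it returns `"tools"` (mapped to the `retrieve` node in the cell below); otherwise it returns `END`, whose string value in LangGraph is `"__end__"`. A dependency-free sketch of that check over plain dicts; `route_after_model` is a hypothetical name, and the real function inspects message objects rather than dicts:

```python
END = "__end__"  # stand-in for langgraph.graph.END


def route_after_model(messages: list[dict]) -> str:
    """Mimic tools_condition: go to 'tools' if the latest AI message requested a tool."""
    last = messages[-1]
    if last.get("role") == "assistant" and last.get("tool_calls"):
        return "tools"
    return END


direct = [{"role": "user", "content": "hello!"}, {"role": "assistant", "content": "Hi!"}]
needs_tool = [
    {"role": "user", "content": "What does Lilian Weng say about types of reward hacking?"},
    {"role": "assistant", "content": "", "tool_calls": [
        {"id": "1", "name": "retrieve_blog_posts", "args": {"query": "types of reward hacking"}}
    ]},
]
print(route_after_model(direct), route_after_model(needs_tool))
```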

In [None]:
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode
from langgraph.prebuilt import tools_condition

workflow = StateGraph(MessagesState)

# Define the nodes we will cycle between
workflow.add_node(generate_query_or_respond)
workflow.add_node("retrieve", ToolNode([retriever_tool]))
workflow.add_node(rewrite_question)
workflow.add_node(generate_answer)

workflow.add_edge(START, "generate_query_or_respond")

# Decide whether to retrieve
workflow.add_conditional_edges(
    "generate_query_or_respond",
    # Assess LLM decision (call `retriever_tool` tool or respond to the user)
    tools_condition,
    {
        # Translate the condition outputs to nodes in our graph
        "tools": "retrieve",
        END: END,
    },
)

# After retrieval, grade the documents and route to `generate_answer` or `rewrite_question`
workflow.add_conditional_edges(
    "retrieve",
    # grade_documents returns the name of the next node
    grade_documents,
)
workflow.add_edge("generate_answer", END)
workflow.add_edge("rewrite_question", "generate_query_or_respond")

# Compile
graph = workflow.compile()
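
The compiled graph contains a cycle (`generate_query_or_respond` → `retrieve` → `rewrite_question` → `generate_query_or_respond`), and nothing in the nodes themselves caps how often it runs. What ultimately stops a run that never finds relevant documents is LangGraph's per-run recursion limit, set through the `recursion_limit` key of the config passed to `invoke` or `stream` (for example `graph.invoke(inputs, {"recursion_limit": 10})`); hitting it raises `GraphRecursionError`. The pure-Python walk below follows the same edges with a step cap, using scripted decisions in place of the model and grader. `walk` and `StepLimitExceeded` are hypothetical names for illustration only:

```python
from typing import Callable


class StepLimitExceeded(RuntimeError):
    """Raised when the walk exceeds its step budget (analogous to a recursion limit)."""


def walk(decisions: dict[str, Callable[[], bool]], max_steps: int = 10) -> list[str]:
    """Follow the agentic-RAG edges, asking `decisions` how to branch at each router."""
    path = ["generate_query_or_respond"]
    node = path[0]
    while node != "END":
        if len(path) > max_steps:
            raise StepLimitExceeded(f"stopped after {max_steps} steps: {path}")
        if node == "generate_query_or_respond":
            node = "retrieve" if decisions["call_tool"]() else "END"
        elif node == "retrieve":
            node = "generate_answer" if decisions["relevant"]() else "rewrite_question"
        elif node == "rewrite_question":
            node = "generate_query_or_respond"
        elif node == "generate_answer":
            node = "END"
        path.append(node)
    return path


# Irrelevant documents on the first retrieval, relevant ones on the second
grades = iter([False, True])
path = walk({"call_tool": lambda: True, "relevant": lambda: next(grades)})
print(path)
```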

In [None]:
from IPython.display import Image, display

display(Image(graph.get_graph().draw_mermaid_png()))

## 8. Run the agentic RAG

In [None]:
for chunk in graph.stream(
    {
        "messages": [
            {
                "role": "user",
                "content": "What does Lilian Weng say about types of reward hacking?",
            }
        ]
    }
):
    for node, update in chunk.items():
        print("Update from node", node)
        update["messages"][-1].pretty_print()
        print("\n\n")