# Agentic RAG Powered by LangChain

In the following notebook, we'll be taking a look at how to build Agentic RAG Applications with LangChain.

We'll be relying on a few great tools to help us do this:

1. LangChain - more specifically LCEL
2. LangGraph
3. LangSmith

Let's get started with a quick overview of LCEL and build a simple RAG chain!

## Dependencies

We'll first install all our required libraries.

In [1]:
%pip install -qU langchain langchain_openai langgraph arxiv duckduckgo-search

Note: you may need to restart the kernel to use updated packages.


In [2]:
%pip install -qU faiss-cpu pymupdf langchain-community wikipedia

Note: you may need to restart the kernel to use updated packages.


## Environment Variables

We'll want to set both our OpenAI API key and our LangSmith environment variables.

In [3]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")

In [4]:
from uuid import uuid4

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"imd8465 - LangGraph - {uuid4().hex[0:8]}"
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("LangSmith API Key: ")
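Re-running the cells above prompts for each key every time. A minimal sketch of a helper that only prompts when a variable is missing; `ensure_env` and its `reader` parameter are hypothetical names introduced for illustration, not part of LangChain or LangSmith:

```python
import os
from getpass import getpass
from typing import Callable


def ensure_env(name: str, prompt: str, reader: Callable[[str], str] = getpass) -> str:
    """Return os.environ[name], asking the user only if it is missing or empty."""
    value = os.environ.get(name)
    if not value:
        # `reader` defaults to getpass so the key is not echoed to the screen.
        value = reader(prompt)
        os.environ[name] = value
    return value


# Example usage in a notebook (commented out so the sketch never blocks on input):
# ensure_env("OPENAI_API_KEY", "OpenAI API Key: ")
```

Passing a custom `reader` is only there to make the helper easy to exercise without an interactive prompt.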

## Initialize a Simple Chain using LCEL

The first thing we'll do is familiarize ourselves with LCEL and the specific ins and outs of how we can use it!

### Retrieval

First, we'll set up a simple local retriever system that looks at Arxiv papers on the topic of Function Calling.

In [5]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import ArxivLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

docs = ArxivLoader(query="Function Calling", load_max_docs=5).load()

text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=350, chunk_overlap=50
)

chunked_documents = text_splitter.split_documents(docs)

faiss_vectorstore = FAISS.from_documents(
    documents=chunked_documents,
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
)

retriever = faiss_vectorstore.as_retriever()
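To build some intuition for what the retriever does with a query, here is a toy, dependency-free sketch of "embed, score, keep the top k". The three-dimensional vectors, the cosine scoring, and `k=2` are all assumptions made for illustration; the real FAISS index uses OpenAI embeddings and may rank by a different distance measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vector, indexed_chunks, k=2):
    """Return the text of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Made-up (text, embedding) pairs standing in for chunked arXiv papers.
toy_index = [
    ("function calling lets LLMs use tools", [0.9, 0.1, 0.0]),
    ("bread recipes", [0.0, 0.2, 0.9]),
    ("tool use benchmarks", [0.7, 0.3, 0.1]),
]

print(top_k([1.0, 0.0, 0.0], toy_index))
# ['function calling lets LLMs use tools', 'tool use benchmarks']
```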

### Augmented

Now that we have our retrieval system ready to rock, we can create our RAG prompt!

In [6]:
from langchain_core.prompts import ChatPromptTemplate

RAG_PROMPT = """\
Use the following context to answer the user's query. If you cannot answer the question, please respond with 'I don't know'.

Question:
{question}

Context:
{context}
"""

rag_prompt = ChatPromptTemplate.from_template(RAG_PROMPT)
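It can help to see what the model will actually receive once the placeholders are filled. The sketch below uses plain `str.format` on the same template text; `render_prompt` is a hypothetical helper, and joining chunk texts with blank lines is a simplification (in the chain, the retrieved documents are passed to the prompt object directly).

```python
RAG_PROMPT = """\
Use the following context to answer the user's query. If you cannot answer the question, please respond with 'I don't know'.

Question:
{question}

Context:
{context}
"""


def render_prompt(question, chunks):
    """Fill the template with a question and a list of retrieved chunk texts."""
    context = "\n\n".join(chunks)
    return RAG_PROMPT.format(question=question, context=context)


rendered = render_prompt(
    "What is function calling?",
    ["Chunk one about tool use.", "Chunk two about APIs."],
)
print(rendered)
```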

### Generation

Let's start by initializing our model. In this case, we'll be using OpenAI's `gpt-3.5-turbo` model.

In [7]:
from langchain_openai import ChatOpenAI

openai_chat_model = ChatOpenAI(model="gpt-3.5-turbo")

### LCEL RAG Chain

Now that we have our R, A, and G components - let's build our simple RAG chain!

In [8]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

retrieval_augmented_generation_chain = (
    # INVOKE CHAIN WITH: {"question" : "<<SOME USER QUESTION>>"}
    # "question" : populated by getting the value of the "question" key
    # "context"  : populated by getting the value of the "question" key and piping it into the retriever
    {"context": itemgetter("question") | retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign passes every existing key through unchanged and (re)assigns "context"
    # from the previous step, so both "question" and "context" reach the next step
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : the "context" and "question" values are used to format our prompt object and then piped
    #              into the LLM and stored in a key called "response"
    # "context"  : populated by getting the value of the "context" key from the previous step
    | {"response": rag_prompt | openai_chat_model, "context": itemgetter("context")}
)
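If the piping syntax feels opaque, the same data flow can be traced with ordinary functions. This is a sketch of the shape of the data at each step, not of how LCEL is implemented; `fake_retriever` and `fake_llm` are stand-ins invented here so the example runs without any API key.

```python
def fake_retriever(question):
    # Stand-in for the FAISS retriever: returns a list of "documents".
    return [f"chunk about: {question}"]


def fake_llm(prompt_text):
    # Stand-in for the chat model: returns a string instead of an AIMessage.
    return f"answer based on {len(prompt_text)} prompt characters"


def run_chain(inputs):
    # Step 1: a dict of runnables; every value is computed from the same input.
    step1 = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: passthrough-and-assign; existing keys survive, "context" is (re)assigned.
    step2 = {**step1, "context": step1["context"]}
    # Step 3: format the prompt, call the model, and keep the context alongside the response.
    prompt_text = f"Question:\n{step2['question']}\n\nContext:\n{step2['context']}"
    return {"response": fake_llm(prompt_text), "context": step2["context"]}


result = run_chain({"question": "What is function calling?"})
print(sorted(result))  # ['context', 'response']
```

Returning the context next to the response is what lets later steps inspect which chunks the answer was based on.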

And let's test it out!

In [35]:
await retrieval_augmented_generation_chain.ainvoke({"question" : "What is Function Calling in the context of AI?"})

{'response': AIMessage(content='Function Calling in the context of AI refers to the process where large language models (LLMs) generate and execute calls to interface with external tools and data sources. This interaction is typically synchronous, meaning each call blocks LLM inference until completion, limiting operational efficiency and concurrent function execution. To address this limitation, the concept of asynchronous LLM function calling has been introduced, allowing LLMs to generate and execute function calls concurrently. AsyncLM, a system for asynchronous LLM function calling, incorporates interrupt mechanisms to notify the LLM when function calls return, enabling more efficient task completion.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 117, 'prompt_tokens': 2660, 'total_tokens': 2777, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, '

## LangGraph - Building Cyclic Applications with LangChain

LangGraph is a tool that leverages LangChain Expression Language to build coordinated, multi-actor, stateful applications that include cyclic behaviour.

### Why Cycles?

In essence, we can think of a cycle in our graph as a more robust and customizable loop. It allows us to keep our application agent-forward while still giving the powerful functionality of traditional loops.

Because we work with cycles rather than loops, we can also compose rather complex flows through our graph in a much more readable and natural fashion, effectively allowing us to recreate application flowcharts in code in an almost 1-to-1 fashion.

### Why LangGraph?

Beyond the agent-forward approach - we can easily compose and combine traditional "DAG" (directed acyclic graph) chains with powerful cyclic behaviour due to the tight integration with LCEL. This means it's a natural extension to LangChain's core offerings!

## Creating our Tool Belt

As is usually the case, we'll want to equip our agent with a toolbelt to help answer questions and add external knowledge.

There's a tonne of tools in the [LangChain Community Repo](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools) but we'll stick to a couple just so we can observe the cyclic nature of LangGraph in action!

We'll leverage:

- [Duck Duck Go Web Search](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/ddg_search)
- [Arxiv](https://github.com/langchain-ai/langchain/tree/master/libs/community/langchain_community/tools/arxiv)
- [Wikipedia](https://python.langchain.com/docs/integrations/tools/wikipedia/)

In [116]:
from langchain_community.tools.ddg_search import DuckDuckGoSearchRun
from langchain_community.tools.arxiv.tool import ArxivQueryRun
from langchain_community.tools.wikipedia.tool import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper

tool_belt = [
    DuckDuckGoSearchRun(),
    ArxivQueryRun(),
    WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper())
]

### Actioning with Tools

Now that we've created our tool belt - we need to create a process that will let us leverage them when we need them.

We'll use LangGraph's prebuilt `ToolNode` to do so. It takes our list of tools and runs whichever ones the model requests.

In [117]:
from langgraph.prebuilt import ToolNode

tool_executor = ToolNode(tool_belt)

### Model

Now we can set up our model! We'll leverage the familiar OpenAI model suite for this example - but it's not *necessary* to use OpenAI with LangGraph. LangGraph supports all models - though you might not find success with smaller models - so the recommendation is to stick with:

- OpenAI's GPT-3.5 and GPT-4
- Anthropic's Claude
- Google's Gemini

> NOTE: Because we're leveraging the OpenAI function calling API - we'll need to use OpenAI *for this specific example* (or any other service that exposes an OpenAI-style function calling API).

In [118]:
from langchain_openai import ChatOpenAI

model = ChatOpenAI(temperature=0)

Now that we have our model set up, let's "put on the tool belt", which is to say: we'll bind our LangChain-formatted tools to the model in the OpenAI function calling format.

In [119]:
from langchain_core.utils.function_calling import convert_to_openai_function

functions = [convert_to_openai_function(t) for t in tool_belt]
model = model.bind_tools(functions)

## Putting the State in Stateful

Earlier we used this phrasing:

`coordinated multi-actor and stateful applications`

So what does that "stateful" mean?

To put it simply - we want to have some kind of object which we can pass around our application that holds information about what the current situation (state) is. Since our system will be constructed of many parts moving in a coordinated fashion - we want to be able to ensure we have some commonly understood idea of that state.

LangGraph leverages a `StateGraph` which uses an `AgentState` object to pass information between the various nodes of the graph.

There are more options than what we'll see below - but this `AgentState` object is a `TypedDict` with the key `messages`, whose value is a `Sequence` of `BaseMessage` objects that is appended to whenever the state changes.

Let's think about a simple example to help understand exactly what this means (we'll simplify a great deal to try and clearly communicate what state is doing):

1. We initialize our state object:
  - `{"messages" : []}`
2. Our user submits a query to our application.
  - New State: `HumanMessage(#1)`
  - `{"messages" : [HumanMessage(#1)]}`
3. We pass our state object to an Agent node which is able to read the current state. It will use the last `HumanMessage` as input. It gets some kind of output which it will add to the state.
  - New State: `AgentMessage(#1, additional_kwargs {"function_call" : "WebSearchTool"})`
  - `{"messages" : [HumanMessage(#1), AgentMessage(#1, ...)]}`
4. We pass our state object to a "conditional node" (more on this later) which reads the last state to determine if we need to use a tool - which it can determine properly because of our provided object!

In [143]:
from typing import TypedDict, Annotated, Sequence
import operator
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
  messages: Annotated[Sequence[BaseMessage], operator.add]
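The `operator.add` annotation is what makes the message list grow instead of being overwritten: each node returns only its new messages, and they are concatenated onto what is already there. A minimal sketch of that merge rule in plain Python, where `apply_update` is a hypothetical helper and the `(role, text)` tuples are stand-ins for real message objects:

```python
import operator


def apply_update(state, update):
    """Merge a node's partial update into the state by concatenating message lists."""
    return {"messages": operator.add(state["messages"], update["messages"])}


state = {"messages": []}
state = apply_update(state, {"messages": [("human", "What is function calling?")]})
state = apply_update(state, {"messages": [("ai", "<request: web search>")]})

print(len(state["messages"]))  # 2
```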

## It's Graphing Time!

Now that we have state, and we have tools, and we have an LLM - we can finally start making our graph!

Let's take a second to refresh ourselves about what a graph is in this context.

Graphs, also called networks in some circles, are a collection of connected objects.

The objects in question are typically called nodes, or vertices, and the connections are called edges.

Let's look at a simple graph.

![image](https://i.imgur.com/2NFLnIc.png)

Here, we're using the coloured circles to represent the nodes and the yellow lines to represent the edges. In this case, we're looking at a fully connected graph - where each node is connected by an edge to each other node.

If we were to think about nodes in the context of LangGraph - we would think of a function, or an LCEL runnable.

If we were to think about edges in the context of LangGraph - we might think of them as "paths to take" or "where to pass our state object next".

Let's create some nodes and expand on our diagram.

> NOTE: Due to the tight integration with LCEL - we can comfortably create our nodes in an async fashion!

In [145]:
from langchain_core.runnables import RunnableConfig

def call_model(state):
  messages = state["messages"]
  response = model.invoke(messages)
  return {"messages" : [response]}

def call_tool(state, config: RunnableConfig):
  # The model was bound with `bind_tools`, so tool requests arrive in the
  # `tool_calls` field of the last AIMessage rather than under "function_call".
  # ToolNode reads those calls from the state, runs each requested tool, and
  # returns {"messages": [ToolMessage, ...]}, which is appended to our state.
  return tool_executor.invoke(state, config)

Now we have two total nodes. We have:

- `call_model` is a node that will...well...call the model
- `call_tool` is a node which will call a tool

Let's start adding nodes! We'll update our diagram along the way to keep track of what this looks like!


In [146]:
from langgraph.graph import StateGraph, END

workflow = StateGraph(AgentState)

workflow.add_node("agent", call_model)
workflow.add_node("action", call_tool)

<langgraph.graph.state.StateGraph at 0x30d9e9e50>

Let's look at what we have so far:

![image](https://i.imgur.com/md7inqG.png)

Next, we'll add our entrypoint. All our entrypoint does is indicate which node is called first.

In [147]:
workflow.set_entry_point("agent")

<langgraph.graph.state.StateGraph at 0x30d9e9e50>

![image](https://i.imgur.com/wNixpJe.png)

Now we want to build a "conditional edge" which will use the output state of a node to determine which path to follow.

We can help conceptualize this by thinking of our conditional edge as a conditional in a flowchart!

Notice how our function simply checks whether the last message contains any tool calls.

Then we create an edge where the origin node is our agent node and our destination node is *either* the action node or the END (finish the graph).

It's important to highlight that the dictionary passed in as the third parameter (the mapping) should be created with the possible outputs of our conditional function in mind. In this case `should_continue` outputs either `"continue"` or `"end"`, which are mapped to the action node and to END respectively.

In [148]:
def should_continue(state):
  last_message = state["messages"][-1]

  # With `bind_tools`, a tool request shows up in the message's `tool_calls` (and is mirrored in additional_kwargs), e.g.
  # additional_kwargs={'tool_calls': [{'id': 'call_84K2HOMGkf6Cot7MF4OUeyKm', 'function': {'arguments': '{"query":"Function Calling in Large Language Models"}', 'name': 'wikipedia'}, 'type': 'function'}], 'refusal': None},
  if not getattr(last_message, "tool_calls", None):
    return "end"

  return "continue"

workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue" : "action",
        "end" : END
    }
)
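A conditional edge boils down to two steps: call the routing function on the current state, then look up the label it returns in the mapping. The sketch below mimics that lookup with plain dictionaries; `resolve_next_node`, `toy_router`, and the `"wants_tool"` flag are hypothetical, and `END` here is just a stand-in string rather than LangGraph's own constant.

```python
END = "__end__"  # stand-in sentinel for this sketch


def toy_router(state):
    """Return a label, not a node name; the mapping translates labels to nodes."""
    last_message = state["messages"][-1]
    return "continue" if last_message.get("wants_tool") else "end"


def resolve_next_node(router, mapping, state):
    label = router(state)
    if label not in mapping:
        # A label missing from the mapping is a wiring bug, so fail loudly.
        raise KeyError(f"router returned unmapped label: {label!r}")
    return mapping[label]


mapping = {"continue": "action", "end": END}

needs_tool = {"messages": [{"content": "", "wants_tool": True}]}
finished = {"messages": [{"content": "Here is the answer."}]}

print(resolve_next_node(toy_router, mapping, needs_tool))  # action
print(resolve_next_node(toy_router, mapping, finished))    # __end__
```

Keeping labels separate from node names means the routing function never needs to know how the graph is wired.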

<langgraph.graph.state.StateGraph at 0x30d9e9e50>

Let's visualize what this looks like.

![image](https://i.imgur.com/8ZNwKI5.png)

Finally, we can add our last edge which will connect our action node to our agent node. This is because we *always* want our action node (which is used to call our tools) to return its output to our agent!

In [149]:
workflow.add_edge("action", "agent")

<langgraph.graph.state.StateGraph at 0x30d9e9e50>

Let's look at the final visualization.

![image](https://i.imgur.com/NWO7usO.png)

All that's left to do now is to compile our workflow - and we're off!

In [150]:
app = workflow.compile()

## Using Our Graph

Now that we've created and compiled our graph - we can call it *just as we'd call any other* `Runnable`!

Let's try out a few examples to see how it fares:

In [151]:
from langchain_core.messages import HumanMessage

inputs = {"messages" : [HumanMessage(content="What is Function Calling in the context of Large Language Models? When did it break onto the scene?")]}

app.invoke(inputs)

{'messages': [HumanMessage(content='What is Function Calling in the context of Large Language Models? When did it break onto the scene?', additional_kwargs={}, response_metadata={}),
  AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_XmjKHv3Le6MyyUm5df9QwX5S', 'function': {'arguments': '{"query":"Function Calling in Large Language Models"}', 'name': 'wikipedia'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 231, 'total_tokens': 250, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d943b0b3-3107-4eeb-a135-0c7a0f2a29b9-0', tool_calls=[{'name': 'wikipedia', 'args': {'query': 'Function Calling in Large Language Models'}, 'id'

Let's look at what happened:

1. Our state object was populated with our request
2. The state object was passed into our entry point (agent node) and the agent node added an `AIMessage` to the state object and passed it along the conditional edge
3. The conditional edge received the state object, found a tool call in the last message, and sent the state object to the action node
4. The action node ran the requested tool, added its output to the state object, and passed it along the edge to the agent node
5. The agent node added a response to the state object and passed it along the conditional edge
6. The conditional edge received the state object, found no tool call in the last message, and passed the state object to END where we see it output in the cell above!
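The six steps above can be traced with a toy executor written in plain Python. This is a sketch of the agent/action cycle, not of LangGraph's internals: the node functions, the `(role, text)` message tuples, and the `max_steps` guard are all assumptions made for illustration.

```python
END = "__end__"  # stand-in sentinel for this sketch


def agent(state):
    # Request a tool until a tool result is present, then give a final answer.
    has_tool_result = any(role == "tool" for role, _ in state["messages"])
    if has_tool_result:
        return {"messages": [("ai", "final answer")]}
    return {"messages": [("ai_tool_request", "search: function calling")]}


def action(state):
    # Pretend to run the tool named in the last message.
    _, request = state["messages"][-1]
    return {"messages": [("tool", f"result for '{request}'")]}


def route_after_agent(state):
    role, _ = state["messages"][-1]
    return "action" if role == "ai_tool_request" else END


def run_graph(state, entry_point="agent", max_steps=10):
    nodes = {"agent": agent, "action": action}
    routers = {"agent": route_after_agent, "action": lambda _state: "agent"}
    current, trace, steps = entry_point, [], 0
    while current != END:
        if steps >= max_steps:
            raise RuntimeError("step limit reached before END")
        trace.append(current)
        update = nodes[current](state)
        # Same merge rule as our AgentState: append the new messages.
        state = {"messages": state["messages"] + update["messages"]}
        current = routers[current](state)
        steps += 1
    return state, trace


final_state, trace = run_graph({"messages": [("human", "What is function calling?")]})
print(trace)  # ['agent', 'action', 'agent']
```

The step guard matters for any cyclic flow: without it, a model that keeps requesting tools would loop forever.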

## Agentic RAG with LangGraph and LCEL

Let's see what our final graph will look like:

![image](https://i.imgur.com/aL4Qs0E.png)

Now that we have our two major components, let's create our final agent!

Since we can add any LCEL `Runnable` to our graph as a node directly - we can add our RAG chain as a node with no additional steps!

However, since our `Runnable` was set up without knowledge of our state object - we need to add some pre/post processing steps to ensure it fits into the flow!

> NOTE: There is only one cycle in this graph, as we cannot reach the RAG chain after the initial attempt to use it.









In [152]:
def convert_state_to_query(state_object):
  return {"question" : state_object["messages"][-1].content}

def convert_response_to_state(response):
  return {"messages" : [response["response"]]}

langgraph_node_rag_chain = convert_state_to_query | retrieval_augmented_generation_chain | convert_response_to_state
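The two adapters are easy to check in isolation by swapping the real chain for a stub. In this sketch the messages are plain dicts rather than LangChain message objects, and `stub_rag_chain` and `rag_node` are hypothetical names, so it runs without a model or a vector store.

```python
def convert_state_to_query(state_object):
    # Pre-processing: the chain expects {"question": ...}, not the whole state.
    return {"question": state_object["messages"][-1]["content"]}


def stub_rag_chain(inputs):
    # Stand-in for the RAG chain: same output keys ("response", "context").
    return {
        "response": {"role": "ai", "content": f"stub answer to: {inputs['question']}"},
        "context": ["stub chunk"],
    }


def convert_response_to_state(response):
    # Post-processing: return only the new message so it can be appended to the state.
    return {"messages": [response["response"]]}


def rag_node(state_object):
    return convert_response_to_state(stub_rag_chain(convert_state_to_query(state_object)))


update = rag_node({"messages": [{"role": "human", "content": "What is function calling?"}]})
print(update["messages"][0]["content"])  # stub answer to: What is function calling?
```

Note that the retrieved context is dropped in the post-processing step, because our state only tracks messages.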

Let's test out our new chain and verify it works as expected!

> NOTE: We are still able to take advantage of the benefits of built in `async` provided by LCEL with this chain!

In [153]:
await langgraph_node_rag_chain.ainvoke(inputs)

{'messages': [AIMessage(content='Function calling in the context of Large Language Models (LLMs) refers to how LLMs use function calls to interact with external tools and data sources. The current approach to LLM function calling is synchronous, where each call blocks LLM inference, limiting LLM operation and concurrent function execution. The introduction of AsyncLM, proposed in a recent work, enables asynchronous LLM function calling, allowing LLMs to generate and execute function calls concurrently. AsyncLM introduces an interrupt mechanism to notify the LLM in-flight asynchronously when function calls return, improving operational efficiency and reducing end-to-end task completion latency. AsyncLM broke onto the scene in 2024, as described in the document.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 139, 'prompt_tokens': 2732, 'total_tokens': 2871, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'r

Now we'll add our nodes - notice that we're including our newly built LCEL component as a node called `first_action`.

The basic idea is that we will use our private RAG setup first - if that is deemed sufficient, we will return that response to our user; if not, we will augment our response with the other tools!

In [154]:
rag_agent = StateGraph(AgentState)

rag_agent.add_node("agent", call_model)
rag_agent.add_node("action", call_tool)
rag_agent.add_node("first_action", langgraph_node_rag_chain)

<langgraph.graph.state.StateGraph at 0x30d76d750>

Let's set our new entry point to be our RAG pipeline!

In [155]:
rag_agent.set_entry_point("first_action")

<langgraph.graph.state.StateGraph at 0x30d76d750>

Because we wish to have this conditional behaviour ("is the question fully answered by the RAG pipeline?") we'll need to add a new conditional edge!

We'll start by describing a process by which we can ask the question: "Is this question fully answered by the response?"

This will let us boil down our paths to "Yes, it is fully answered", and "No, it is not fully answered".

The function below will do exactly that by leaning on Pydantic and GPT-4!

> NOTE: We now have an LCEL component as a node, and we have a chain *inside a function* that drives a conditional edge. LangGraph is an extremely flexible framework!

In [156]:
from langchain_core.prompts import PromptTemplate
from pydantic import BaseModel, Field
from langchain_core.output_parsers.openai_tools import PydanticToolsParser
from langchain_core.utils.function_calling import convert_to_openai_tool

def is_fully_answered(state):

  ### Extract the question and response from our RAG pipeline
  question = state["messages"][0].content
  answer = state["messages"][-1].content

  ### Create a Pydantic model to capture our LLMs response
  class answered(BaseModel):
    binary_score: str = Field(description="Fully answered: 'yes' or 'no'")

  ### A powerful reasoning model will ensure we can answer our question properly
  model = ChatOpenAI(model="gpt-4-turbo-preview", temperature=0)

  ### Create and bind our tool to our model
  answered_tool = convert_to_openai_tool(answered)

  model = model.bind(
      tools=[answered_tool],
      tool_choice={"type" : "function", "function" : {"name" : "answered"}}
  )

  ### We'll want to parse the output into a usable format
  parser_tool = PydanticToolsParser(tools=[answered])

  prompt = PromptTemplate(
      template="""You will determine if the question is fully answered by the response.\n
      Question:
      {question}

      Response:
      {answer}

      You will respond with either 'yes' or 'no'.""",
      input_variables=["question", "answer"])

  ### Classic LCEL chain!
  fully_answered_chain = prompt | model | parser_tool

  response = fully_answered_chain.invoke({"question" : question, "answer" : answer})

  if response[0].binary_score == "no":
    return "continue"

  return "end"
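One detail worth thinking about is what happens when the grader returns something other than a clean `'yes'` or `'no'`. The cell above treats anything that is not exactly `"no"` as fully answered. A sketch of a stricter alternative, assuming we would rather fall back to the tools when in doubt; `route_from_verdict` is a hypothetical helper, not part of the notebook's graph:

```python
def route_from_verdict(binary_score: str) -> str:
    """Map the grader's verdict to an edge label, defaulting to 'continue' when unsure."""
    verdict = binary_score.strip().lower()
    if verdict == "yes":
        return "end"
    # Anything else ("no", "maybe", an empty string, ...) sends us on to the agent.
    return "continue"


print(route_from_verdict(" Yes "))  # end
print(route_from_verdict("no"))     # continue
print(route_from_verdict("maybe"))  # continue
```

Which default is right depends on the application: ending early saves tokens, while continuing costs more but gives the agent a chance to improve a weak answer.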

Let's map and add that conditional edge now!

In [157]:
rag_agent.add_conditional_edges(
    "first_action",
    is_fully_answered,
    {
        "continue" : "agent",
        "end" : END
    }
)

<langgraph.graph.state.StateGraph at 0x30d76d750>

We'll still use our original `should_continue` logic to determine if we need to use more tools or not.

In [158]:
def should_continue(state):
  last_message = state["messages"][-1]

  # The model was bound with `bind_tools`, so tool requests appear in `tool_calls`
  if not getattr(last_message, "tool_calls", None):
    return "end"

  return "continue"

rag_agent.add_conditional_edges(
    "agent",
    should_continue,
    {
        "continue" : "action",
        "end" : END
    }
)

<langgraph.graph.state.StateGraph at 0x30d76d750>

Let's define the final edge.

In [159]:
rag_agent.add_edge("action", "agent")

<langgraph.graph.state.StateGraph at 0x30d76d750>

Time to compile!

In [160]:
rag_agent_app = rag_agent.compile()

Let's try it out!

In [161]:
question = "What is Function Calling in the Context of LLM?"

inputs = {"messages" : [HumanMessage(content=question)]}

rag_agent_app.invoke(inputs)

{'messages': [HumanMessage(content='What is Function Calling in the Context of LLM?', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Function Calling in the context of LLM refers to how large language models utilize function calls to interact with external tools and data sources. The current approach to LLM function calling is synchronous, where each call blocks LLM inference. To address this limitation, an asynchronous system called AsyncLM has been proposed to enable LLMs to generate and execute function calls concurrently. Instead of waiting for each call's completion, AsyncLM introduces an interrupt mechanism to asynchronously notify the LLM in-flight when function calls return. This improves LLM's operational efficiency and reduces end-to-end task completion latency compared to synchronous function calling.", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 122, 'prompt_tokens': 2704, 'total_tokens': 2826, 'completion_

Notice how we didn't enter into our tool cycle as the query was fully answered by our baseline RAG system!

Let's try another example!

In [162]:
question = "What is Function Calling in the Context of LLM and when it was introduced?"

inputs = {"messages" : [HumanMessage(content=question)]}

rag_agent_app.invoke(inputs)

{'messages': [HumanMessage(content='What is Function Calling in the Context of LLM and when it was introduced?', additional_kwargs={}, response_metadata={}),
  AIMessage(content="Function Calling in the context of LLM refers to the capability that allows large language models to access external tools and data sources by generating and executing function calls. The traditional approach to LLM function calling is synchronous, where each call blocks LLM inference until the function returns, limiting operational efficiency and concurrent function execution. Asynchronous LLM function calling was introduced in a work called AsyncLM, which enables LLMs to generate and execute function calls concurrently and introduces an interrupt mechanism to notify the LLM asynchronously when function calls return. AsyncLM was proposed to improve LLM's operational efficiency by reducing end-to-end task completion latency compared to synchronous function calling. It was introduced in 2024.", additional_kwarg