In [None]:
!pip install -qU langchain langchain-community langgraph beautifulsoup4 langchain-openai

# Conversational RAG

In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of "memory" of past questions and answers, and some logic for incorporating those into its current thinking.

We will cover two approaches in this section:
1. **Chains**, in which we always execute a retrieval step;
2. **Agents**, in which we give an LLM discretion over whether and how to execute a retrieval step (or multiple steps).

## Setup

In [None]:
import os

langchain_api_key = 'your_langchain_api_key_here'  # Replace with your actual LangChain API key
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_API_KEY'] = langchain_api_key

openai_api_key = 'your_openai_api_key_here'  # Replace with your actual OpenAI API key
os.environ['OPENAI_API_KEY'] = openai_api_key

## Chains

First, let's revisit the Q&A app we built over the LLM Powered Autonomous Agents blog post in the RAG tutorial.

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model='gpt-3.5-turbo')

In [None]:
import bs4
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load, chunk and index the contents of the blog to create a retriever
loader = WebBaseLoader(
    web_paths=('https://lilianweng.github.io/posts/2023-06-23-agent/',),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=('post-content', 'post-title', 'post-header')
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = InMemoryVectorStore.from_documents(
    documents=splits, embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()


# 2. Incorporate the retriever into a question-answering chain
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ('system', system_prompt),
        ('human', '{input}')
    ]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)



In [None]:
response = rag_chain.invoke({'input': 'what is Task Decomposition?'})
print(response['answer'])

Task decomposition involves breaking down a complex task into smaller and simpler steps to make it more manageable. Techniques like Chain of Thought and Tree of Thoughts help in decomposing hard tasks into multiple smaller tasks by instructing the model to think step by step or exploring multiple reasoning possibilities at each step. Task decomposition can be done using simple prompting with LLM, task-specific instructions, or with human inputs.


Here we have used the built-in chain constructors `create_stuff_documents_chain` and `create_retrieval_chain`, so that the basic ingredients of our solution are:
1. retriever;
2. prompt;
3. LLM.

### Adding chat history

In a conversational setting, the user query might require conversational context to be understood. We need to update two things about our existing app:
1. **Prompt**: update our prompt to support historical messages as an input.
2. **Contextualizing questions**: add a sub-chain that takes the latest user question and reformulates it in the context of the chat history. This can be thought of simply as building a new "history aware" retriever. Whereas before we had:
  * `query` -> `retriever`
  
  Now we will have:
  * `(query, conversation history)` -> `LLM` -> `rephrased query` -> `retriever`

#### Contextualizing the question

We need to define a sub-chain that takes historical messages and the latest user question, and reformulates the question if it makes reference to any information in the historical information.

We will use a prompt that includes a `MessagesPlaceholder` variable under the name `chat_history`, which allows us to pass in a list of messages to the prompt using the `"chat_history"` input key.

We use the helper function `create_history_aware_retriever` for this step. It handles the case where `chat_history` is empty by passing the input directly to the retriever, and otherwise applies `prompt | llm | StrOutputParser() | retriever` in sequence. `create_history_aware_retriever` constructs a chain that accepts the keys `input` and `chat_history` as input, and has the same output schema as a retriever.
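The branching described above can be sketched in plain Python. This is only an illustration of the control flow, not the library's implementation: `history_aware_retrieve`, `fake_rephrase`, and `fake_search` are hypothetical names, with `fake_rephrase` standing in for `prompt | llm | StrOutputParser()` and `fake_search` standing in for the retriever.

```python
from typing import Callable, List


def history_aware_retrieve(
    inputs: dict,
    rephrase: Callable[[str, list], str],
    search: Callable[[str], List[str]],
) -> List[str]:
    """Retrieve documents, rephrasing the query only when history exists."""
    chat_history = inputs.get("chat_history") or []
    if not chat_history:
        # No history: pass the raw question straight to the retriever.
        return search(inputs["input"])
    # History present: rewrite the question into a standalone query first.
    standalone = rephrase(inputs["input"], chat_history)
    return search(standalone)


# Hypothetical stubs standing in for the LLM sub-chain and the retriever.
def fake_rephrase(question: str, history: list) -> str:
    return question.replace(" it?", " task decomposition?")


def fake_search(query: str) -> List[str]:
    return [f"doc matching: {query}"]


first = history_aware_retrieve(
    {"input": "What is task decomposition?", "chat_history": []},
    fake_rephrase,
    fake_search,
)
follow_up = history_aware_retrieve(
    {
        "input": "What are common ways of doing it?",
        "chat_history": [("human", "What is task decomposition?")],
    },
    fake_rephrase,
    fake_search,
)
print(first)
print(follow_up)
```

With an empty history the question reaches the stub retriever unchanged; with history, the follow-up is rewritten into a standalone query before the search runs.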

In [None]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder


contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ('system', contextualize_q_system_prompt),
        MessagesPlaceholder(variable_name='chat_history'),
        ('human', '{input}'),
    ]
)

history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt,
)

This chain adds a query-rephrasing step in front of our retriever, so that retrieval incorporates the context of the conversation.

For our full QA chain, we will use `create_stuff_documents_chain` to generate a `question_answer_chain`, with input keys `context`, `chat_history`, and `input` -- it accepts the retrieved context alongside the conversation history and query to generate an answer.

We build our final `rag_chain` with `create_retrieval_chain`. This chain applies the `history_aware_retriever` and `question_answer_chain` in sequence, retaining intermediate outputs such as the retrieved context for convenience. It has input keys `input` and `chat_history`, and includes `input`, `chat_history`, `context`, and `answer` in its output.
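As a rough mental model of the input and output keys described above, here is a plain-Python sketch. It is not how `create_retrieval_chain` is implemented; `retrieval_chain`, `fake_retrieve`, and `fake_answer` are hypothetical stand-ins for the chain and its two sub-chains.

```python
from typing import Callable, List


def retrieval_chain(
    inputs: dict,
    retrieve: Callable[[dict], List[str]],
    answer: Callable[[dict], str],
) -> dict:
    """Run retrieval, then answering, keeping every intermediate value."""
    state = dict(inputs)  # keeps `input` and `chat_history`
    state["context"] = retrieve(state)  # step 1: fetch documents
    state["answer"] = answer(state)  # step 2: answer using the context
    return state


# Hypothetical stubs for the two sub-chains.
def fake_retrieve(state: dict) -> List[str]:
    return ["Task decomposition splits a task into steps."]


def fake_answer(state: dict) -> str:
    return f"Based on {len(state['context'])} document(s): it splits tasks."


result = retrieval_chain(
    {"input": "What is task decomposition?", "chat_history": []},
    fake_retrieve,
    fake_answer,
)
print(sorted(result))
```

The returned dictionary carries the original inputs plus `context` and `answer`, which is why the retrieved documents remain available for inspection after the call.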

In [None]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ('system', system_prompt),
        MessagesPlaceholder(variable_name='chat_history'),
        ('human', '{input}')
    ]
)


question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(
    history_aware_retriever, question_answer_chain,
)

We can try asking a question and a follow-up question that requires contextualization to return a sensible response. Because our chain takes a `"chat_history"` input, the caller needs to manage the chat history.

In [None]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is Task Decomposition?"
ai_msg1 = rag_chain.invoke({'input': question, 'chat_history': chat_history})
chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_msg1['answer']),
    ]
)

question2 = "What are common ways of doing it?"
ai_msg2 = rag_chain.invoke({'input': question2, 'chat_history': chat_history})

print(ai_msg2['answer'])

Task decomposition can be achieved through various methods such as using prompting techniques like Chain of Thought, providing task-specific instructions, or incorporating human inputs. For example, prompting a Language Model with simple instructions like "Steps for XYZ" or providing specific guidelines like "Write a story outline" can help in breaking down tasks into manageable steps. Additionally, human inputs can also play a role in guiding the decomposition process for complex tasks.


#### Stateful management of chat history

In production, the Q&A application will usually persist the chat history in a database, and be able to read and update it appropriately.

LangGraph implements a built-in persistence layer, making it ideal for chat applications that support multiple conversational turns. Wrapping our chat model in a minimal LangGraph application allows us to automatically persist the message history, simplifying the development of multi-turn applications.

In [None]:
from typing import Sequence

from langchain_core.messages import BaseMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict


# We define a dict representing the state of the application.
# This state has the same input and output keys as `rag_chain`.
class State(TypedDict):
    input: str
    chat_history: Annotated[Sequence[BaseMessage], add_messages]
    context: str
    answer: str


# We then define a simple node that runs the `rag_chain`.
# The `return` values of the node update the graph state, so here we just
# update the chat history with the input message and response.
def call_model(state: State):
    response = rag_chain.invoke(state)
    return {
        'chat_history': [
            HumanMessage(state['input']),
            AIMessage(response['answer']),
        ],
        'context': response['context'],
        'answer': response['answer'],
    }


# Our graph consists only of one node:
workflow = StateGraph(state_schema=State)
workflow.add_edge(START, 'model')
workflow.add_node('model', call_model)


# Finally, we compile the graph with a checkpointer object.
# This persists the state, in this case in memory.
memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

This application supports multiple conversation threads out of the box.
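To picture what per-thread persistence means, here is a toy sketch in plain Python. It is an assumption-laden illustration rather than LangGraph's checkpointer: the `InMemoryThreads` class and its methods are hypothetical, and the "model" is a stub that only counts turns.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)


class InMemoryThreads:
    """Toy stand-in for a checkpointer: one message list per thread_id."""

    def __init__(self) -> None:
        self._histories: Dict[str, List[Message]] = defaultdict(list)

    def invoke(self, question: str, thread_id: str) -> str:
        history = self._histories[thread_id]
        # Hypothetical "model": just reports which turn of the thread this is.
        answer = f"answer #{len(history) // 2 + 1} in thread {thread_id}"
        # Append the new turn to whatever this thread already holds.
        history.extend([("human", question), ("ai", answer)])
        return answer

    def get_history(self, thread_id: str) -> List[Message]:
        return list(self._histories[thread_id])


threads = InMemoryThreads()
threads.invoke("What is Task Decomposition?", thread_id="abc123")
second = threads.invoke("What are common ways of doing it?", thread_id="abc123")
other = threads.invoke("Hello!", thread_id="xyz789")

print(second)
print(len(threads.get_history("abc123")))
print(len(threads.get_history("xyz789")))
```

Each thread accumulates its own history, so a follow-up question in one thread sees earlier turns from that thread only.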

In [None]:
config = {'configurable': {'thread_id': 'abc123'}}

result = app.invoke(
    {'input': 'What is Task Decomposition?'},
    config=config,
)

result['answer']

'Task decomposition is a technique used to break down complex tasks into smaller and more manageable steps. This approach helps agents or models tackle difficult problems by dividing them into simpler subtasks. Task decomposition can be achieved through prompting techniques like Chain of Thought or Tree of Thoughts, which guide the model in thinking step by step or exploring multiple reasoning possibilities at each step.'

In [None]:
result = app.invoke(
    {'input': 'What are common ways of doing it?'},
    config=config,
)

result['answer']

'Task decomposition can be done in several common ways:\n1. Using prompting techniques like Chain of Thought (CoT) with simple instructions to break down tasks into smaller steps.\n2. Providing task-specific instructions tailored to the nature of the task, such as asking for a story outline when writing a novel.\n3. Involving human inputs to guide the decomposition process, leveraging external knowledge and expertise to break down complex tasks effectively.'

The conversation history can be inspected via the state of the application:

In [None]:
chat_history = app.get_state(config).values['chat_history']

for message in chat_history:
    message.pretty_print()


What is Task Decomposition?

Task decomposition is a technique used to break down complex tasks into smaller and more manageable steps. This approach helps agents or models tackle difficult problems by dividing them into simpler subtasks. Task decomposition can be achieved through prompting techniques like Chain of Thought or Tree of Thoughts, which guide the model in thinking step by step or exploring multiple reasoning possibilities at each step.

What are common ways of doing it?

Task decomposition can be done in several common ways:
1. Using prompting techniques like Chain of Thought (CoT) with simple instructions to break down tasks into smaller steps.
2. Providing task-specific instructions tailored to the nature of the task, such as asking for a story outline when writing a novel.
3. Involving human inputs to guide the decomposition process, leveraging external knowledge and expertise to break down complex tasks effectively.


### Tying it together

In [16]:
from typing import Sequence

import bs4
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict

In [None]:
llm = ChatOpenAI(model='gpt-3.5-turbo', temperature=0)


### Construct retriever ###
loader = WebBaseLoader(
    web_paths=('https://lilianweng.github.io/posts/2023-06-23-agent/',),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=('post-content', 'post-title', 'post-header')
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = InMemoryVectorStore.from_documents(
    documents=splits, embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()


### Contextualize questions ###
contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ('system', contextualize_q_system_prompt),
        MessagesPlaceholder(variable_name='chat_history'),
        ('human', '{input}'),
    ]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt,
)



### Answer questions ###
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ('system', system_prompt),
        MessagesPlaceholder(variable_name='chat_history'),
        ('human', '{input}')
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(
    history_aware_retriever, question_answer_chain,
)



### Statefully manage chat history ###
class State(TypedDict):
    input: str
    chat_history: Annotated[Sequence[BaseMessage], add_messages]
    context: str
    answer: str


def call_model(state: State):
    response = rag_chain.invoke(state)
    return {
        'chat_history': [
            HumanMessage(state['input']),
            AIMessage(response['answer']),
        ],
        'context': response['context'],
        'answer': response['answer'],
    }


workflow = StateGraph(state_schema=State)
workflow.add_edge(START, 'model')
workflow.add_node('model', call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

In [None]:
config = {"configurable": {"thread_id": "abc123"}}

result = app.invoke(
    {"input": "What is Task Decomposition?"},
    config=config,
)
print(result["answer"])

In [None]:
result = app.invoke(
    {"input": "What is one way of doing it?"},
    config=config,
)
print(result["answer"])

## Agents

Agents leverage the reasoning capabilities of LLMs to make decisions during execution. Using agents allows us to offload some discretion over the retrieval process. Although their behavior is less predictable than chains, they
* generate the input to the retriever directly, without necessarily needing us to explicitly build in contextualization, as we did before;
* can execute multiple retrieval steps in service of a query, or refrain from executing a retrieval step altogether (e.g., in response to a generic greeting from a user).
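The two properties above can be illustrated with a minimal loop in plain Python. This is a sketch under stated assumptions, not the agent implementation used later in this section: `run_agent` is a hypothetical helper, `fake_decide` is a hand-written rule standing in for the LLM's choice of whether to call a tool, and `fake_search` stands in for the retriever.

```python
from typing import Callable, Dict, List, Optional


def run_agent(
    question: str,
    decide: Callable[[str, List[str]], Optional[str]],
    search: Callable[[str], str],
    max_steps: int = 3,
) -> Dict[str, object]:
    """Loop until the decision function stops asking for retrieval."""
    observations: List[str] = []
    for _ in range(max_steps):
        query = decide(question, observations)
        if query is None:
            break  # the "model" chose not to retrieve (again)
        observations.append(search(query))
    return {"question": question, "retrievals": observations}


# Hypothetical decision rule standing in for the LLM's tool-calling choice.
def fake_decide(question: str, observations: List[str]) -> Optional[str]:
    if question.lower().startswith("hi") or observations:
        return None  # a greeting, or context has already been gathered
    return question.rstrip("?")  # the "model" writes its own search query


def fake_search(query: str) -> str:
    return f"excerpt about {query}"


greeting = run_agent("Hi! I'm Bin.", fake_decide, fake_search)
lookup = run_agent("What is Task Decomposition?", fake_decide, fake_search)
print(greeting["retrievals"])
print(lookup["retrievals"])
```

The greeting triggers no retrieval at all, while the question triggers one search whose query was produced by the decision function rather than passed through verbatim.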

### Retrieval tool

Agents can access tools and manage their execution. In this case, we will convert our retriever into a LangChain tool to be wielded by the agent:

In [17]:
from langchain.tools.retriever import create_retriever_tool

tool = create_retriever_tool(
    retriever,
    'blog_post_retriever',
    'Searches and returns excerpts from the Autonomous Agents blog post.',
)

tools = [tool]

Tools are LangChain `Runnables`, and implement the usual interface:

In [18]:
tool.invoke('task decomposition')

'Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\n\nFig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The mode

### Agent constructor

Once we have defined the tools and the LLM, we can create the agent.

In [19]:
from langgraph.prebuilt import create_react_agent

agent_executor = create_react_agent(llm, tools)

We can try it out. Note that so far it is not stateful (we still need to add in memory):

In [20]:
query = "What is Task Decomposition?"

for event in agent_executor.stream(
    {'messages': [HumanMessage(content=query)]},
    stream_mode='values',
):
    event['messages'][-1].pretty_print()


What is Task Decomposition?
Tool Calls:
  blog_post_retriever (call_S5ObRo2QbIZ9Dm0cLlTgL4b2)
 Call ID: call_S5ObRo2QbIZ9Dm0cLlTgL4b2
  Args:
    query: Task Decomposition
Name: blog_post_retriever

Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple th

We can again use the built-in persistence to save stateful updates to memory:

In [21]:
from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()

agent_executor = create_react_agent(llm, tools, checkpointer=memory)

If we input a query that does not require a retrieval step:

In [22]:
config = {'configurable': {'thread_id': 'abc123'}}


for event in agent_executor.stream(
    {'messages': [HumanMessage(content="Hi! I'm Bin.")]},
    config=config,
    stream_mode='values',
):
    event['messages'][-1].pretty_print()


Hi! I'm Bin.

Hello Bin! How can I assist you today?


The agent does not execute one.

Further, if we input a query that does require a retrieval step,

In [23]:
query = "What is Task Decomposition?"

for event in agent_executor.stream(
    {'messages': [HumanMessage(content=query)]},
    config=config,
    stream_mode='values',
):
    event['messages'][-1].pretty_print()


What is Task Decomposition?
Tool Calls:
  blog_post_retriever (call_rjLw9bhJFeMRI3AijoXbOn0L)
 Call ID: call_rjLw9bhJFeMRI3AijoXbOn0L
  Args:
    query: Task Decomposition
Name: blog_post_retriever

Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A complicated task usually involves many steps. An agent needs to know what they are and plan ahead.
Task Decomposition#
Chain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple th

The agent generates the input to the tool.

When we continue the conversation,

In [24]:
query = "What according to the blog post are common ways of doing it? redo the search."

for event in agent_executor.stream(
    {'messages': [HumanMessage(content=query)]},
    config=config,
    stream_mode='values',
):
    event['messages'][-1].pretty_print()


What according to the blog post are common ways of doing it? redo the search.
Tool Calls:
  blog_post_retriever (call_Rg3Mw6vqV2OqqOJUkKZcsxwe)
 Call ID: call_Rg3Mw6vqV2OqqOJUkKZcsxwe
  Args:
    query: common ways of task decomposition
Name: blog_post_retriever

Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.
Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.

Fig. 1. Overview of a LLM-powered autonomous agent system.
Component One: Planning#
A com

The agent was able to infer that "it" in our query refers to "task decomposition", and generated a reasonable search query as a result.

### Tying it together

In [None]:
import bs4
from langchain.tools.retriever import create_retriever_tool
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent

In [None]:
### Set up memory ###
memory = MemorySaver()
### Set up LLM ###
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

### Construct retriever ###
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = InMemoryVectorStore.from_documents(
    documents=splits, embedding=OpenAIEmbeddings()
)
retriever = vectorstore.as_retriever()


### Build retriever tool ###
tool = create_retriever_tool(
    retriever,
    "blog_post_retriever",
    "Searches and returns excerpts from the Autonomous Agents blog post.",
)
tools = [tool]


agent_executor = create_react_agent(llm, tools, checkpointer=memory)