# Building a RAG chain and agent application with conversational memory | 🦜️🔗 LangChain

[https://python.langchain.com/v0.2/docs/tutorials/qa_chat_history/](https://python.langchain.com/v0.2/docs/tutorials/qa_chat_history/)

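Many Q&A applications let users hold multi-turn conversations, which means the application needs to remember past questions and answers and have some logic for incorporating them into the current exchange.

In this guide we focus on **adding logic for incorporating historical messages**. For more details on chat history management, see [here](https://python.langchain.com/v0.2/docs/how_to/message_history/).

We will cover two approaches:

1. Chains, in which we always execute a retrieval step;
2. Agents, in which we let the LLM decide whether and how to execute a retrieval step (or multiple steps).

For the external knowledge source, we will use the same [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post by Lilian Weng that was used in the [RAG tutorial](https://python.langchain.com/v0.2/docs/tutorials/rag/).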

## Setup

### Dependencies

This tutorial uses OpenAI embeddings and a Chroma vector store, but everything shown here works with any [embedding model](https://python.langchain.com/v0.2/docs/concepts/#embedding-models), [VectorStore](https://python.langchain.com/v0.2/docs/concepts/#vectorstores), or [Retriever](https://python.langchain.com/v0.2/docs/concepts/#retrievers) that LangChain provides.

In [1]:
pip install --upgrade --quiet langchain langchain-community langchainhub langchain-chroma bs4 langchain-openai langgraph


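We need to set the environment variable `OPENAI_API_KEY`. It can be set directly or loaded from a `.env` file; the cell below instead reads it from Google Colab's user secrets: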

In [2]:
import os
from google.colab import userdata
os.environ["OPENAI_API_KEY"] = userdata.get('OPENAI_API_KEY')
os.environ["OPENAI_API_BASE"] = userdata.get('OPENAI_API_BASE')
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_API_KEY"] = userdata.get('LANGCHAIN_API_KEY')

### LangSmith

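Many applications built with LangChain contain multiple steps and multiple LLM calls. As these applications grow more complex, it becomes crucial to be able to inspect what exactly is happening inside a chain or agent. [LangSmith](https://smith.langchain.com/) lets us view those internals.

Note that LangSmith is not required, but it is very useful while developing and debugging. To use it, register on the [official site](https://smith.langchain.com/) and request an API key; the monthly free quota is enough for learning and testing. Setting the key in the environment variables, as in the cell above, is all it takes to use LangSmith.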

## Chains

First, let's look at the question-answering application from the [previous lesson](https://www.notion.so/RAG-LangChain-08a53958496f4d20913f855b92559c3c?pvs=21). The local knowledge base is again the [LLM Powered Autonomous Agents](https://lilianweng.github.io/posts/2023-06-23-agent/) blog post.

In [3]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo-0125")


In [4]:
import bs4
from langchain import hub
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# 1. Load, chunk and index the contents of the blog to create a retriever.
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
)
),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

# 2. Incorporate the retriever into a question-answering chain.
system_prompt = (
"You are an assistant for question-answering tasks. "
"Use the following pieces of retrieved context to answer "
"the question. If you don't know the answer, say that you "
"don't know. Use three sentences maximum and keep the "
"answer concise."
"\n\n"
"{context}"
)

prompt = ChatPromptTemplate.from_messages(
[
("system", system_prompt),
("human", "{input}"),
]
)

question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)



**API reference:** [create_retrieval_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html) | [create_stuff_documents_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html) | [WebBaseLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.web_base.WebBaseLoader.html) | [ChatPromptTemplate](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html) | [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html) | [RecursiveCharacterTextSplitter](https://api.python.langchain.com/en/latest/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html)
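The `chunk_size=1000, chunk_overlap=200` settings in the cell above mean that neighbouring chunks share some text, so a sentence cut at a chunk boundary still appears whole in one of them. The following is a minimal sketch of that idea using fixed-size character windows. It is an illustration only: `chunk_with_overlap` is a hypothetical helper, and `RecursiveCharacterTextSplitter` itself splits on separators rather than at fixed offsets.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_with_overlap(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

Each chunk begins with the last four characters of the previous one, which is the same relationship the tutorial's settings create at a larger scale.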

In [None]:
response = rag_chain.invoke({"input": "What is Task Decomposition?"})
response["answer"]

'Task decomposition is the process of breaking down a complex task into smaller and more manageable subtasks. This can be done using techniques such as Chain of Thought (CoT) or Tree of Thoughts, which help the agent to think step by step and explore multiple reasoning possibilities at each step. Task decomposition can be carried out by language model models with simple prompting, task-specific instructions, or human inputs.'

```
"Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model. This process helps in guiding the agent through the various subgoals required to achieve the overall task efficiently. Different techniques like Chain of Thought and Tree of Thoughts can be used to decompose tasks into step-by-step processes, enhancing performance and understanding of the model's thinking process."
```

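We used the built-in chain constructors `create_stuff_documents_chain` and `create_retrieval_chain`, so `rag_chain` is made up of three components:

1. the retriever;
2. the prompt template;
3. the LLM.

This structure will simplify the process of incorporating chat history.

### Adding chat history

The chain built above uses the input question to retrieve relevant context from the knowledge base, but in a conversation the user's question may only make sense in light of the earlier turns. For example: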

> Human: "What is Task Decomposition?"
>
>
> AI: "Task decomposition involves breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model."
>
> Human: "What are common ways of doing it?"
>

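To understand the second question, our application needs to understand that "it" refers to "Task Decomposition".

We need to update two things in our existing application:

1. **Prompt**: update the prompt template to accept historical messages as input.
2. **Contextualizing questions**: add a sub-chain that takes the latest user question and reformulates it in the context of the chat history. This can be thought of simply as building a new "history-aware" retriever.

The flow before:

- `query` -> `retriever`

The flow after: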

- `(query, conversation history)` -> `LLM` -> `rephrased query` -> `retriever`
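The "after" flow can be traced with a small plain-Python sketch that runs without an API key. Everything in it is a hypothetical stand-in: `stub_rephrase` plays the role of the LLM with a hard-coded rule, `keyword_retrieve` is a toy keyword matcher rather than a vector store, and skipping the rewrite when the history is empty is a choice made for this sketch.

```python
Message = tuple[str, str]  # (role, text), a simplified stand-in for chat messages

DOCS = [
    "Task Decomposition can be done by simple prompting or with human inputs.",
    "Maximum inner product search speeds up memory lookup.",
]


def stub_rephrase(question: str, history: list[Message]) -> str:
    """Stand-in for the LLM: replace 'it' with the topic of the first human turn."""
    first_human = next(text for role, text in history if role == "human")
    topic = first_human.removeprefix("What is ").rstrip("?")
    return question.replace(" it?", f" {topic}?")


def keyword_retrieve(query: str) -> list[str]:
    """Toy retriever: keep documents containing any query word longer than 4 letters."""
    words = {w.strip("?.,").lower() for w in query.split()}
    keywords = {w for w in words if len(w) > 4}
    return [doc for doc in DOCS if any(k in doc.lower() for k in keywords)]


def history_aware_retrieve(question, history, rephrase, retrieve):
    # (query, conversation history) -> rephrased query -> retriever
    standalone = rephrase(question, history) if history else question
    return standalone, retrieve(standalone)


history = [
    ("human", "What is Task Decomposition?"),
    ("ai", "Breaking a complex task into smaller steps."),
]
standalone, hits = history_aware_retrieve(
    "What are common ways of doing it?", history, stub_rephrase, keyword_retrieve
)
print(standalone)
print(hits)
```

In this toy setup, passing the raw follow-up question straight to `keyword_retrieve` finds nothing, while the rephrased question finds the relevant document. That gap is the reason for the extra step.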

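### **Contextualizing the question**

First, we define a sub-chain that takes the historical messages and the latest user question, and reformulates the question if it refers to any information in the history.

The prompt needs to include a `MessagesPlaceholder` variable named "chat_history". This lets us pass a list of messages to the prompt through the "chat_history" input key, and those messages are inserted after the system message and before the latest question.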
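The ordering that the placeholder produces can be shown with a plain-Python sketch. It only mimics the resulting message order; `build_messages` and the `(role, text)` tuples are hypothetical simplifications, not LangChain's prompt classes.

```python
def build_messages(system_text, chat_history, user_input):
    """Assemble the final list: system message, then prior turns, then the new question."""
    return [("system", system_text), *chat_history, ("human", user_input)]


history = [
    ("human", "What is Task Decomposition?"),
    ("ai", "Breaking a complex task into smaller steps."),
]
messages = build_messages(
    "Rewrite the latest question as a standalone question.",
    history,
    "What are common ways of doing it?",
)
for role, text in messages:
    print(f"{role}: {text}")
```

With an empty history the list collapses to just the system message and the question, which is why the same prompt template can serve both the first turn and later turns.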

In the code we use the [`create_history_aware_retriever`](https://api.python.langchain.com/en/latest/chains/langchain.chains.history_aware_retriever.create_history_aware_retriever.html) function. The resulting retriever chain calls `prompt | llm | StrOutputParser() | retriever` in sequence. The chain takes `input` and `chat_history` as inputs, and its output has the same form as the output of `retriever`.

In [5]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
"Given a chat history and the latest user question "
"which might reference context in the chat history, "
"formulate a standalone question which can be understood "
"without the chat history. Do NOT answer the question, "
"just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
[
("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
("human", "{input}"),
]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

**API reference:** [create_history_aware_retriever](https://api.python.langchain.com/en/latest/chains/langchain.chains.history_aware_retriever.create_history_aware_retriever.html) | [MessagesPlaceholder](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.chat.MessagesPlaceholder.html)

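In this chain, a step that reformulates the question from the chat history runs before the local knowledge-base retriever, so that retrieval takes the conversation context into account.

Now we can build the full question-answering chain. We only need to swap the retriever for the new `history_aware_retriever`.

We again use [`create_stuff_documents_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html) to generate a `question_answer_chain`. This chain only deals with the model's input and output and is not concerned with how the knowledge base is queried. Its inputs are the retrieved context `context`, the chat history `chat_history`, and the question `input`.

We use [`create_retrieval_chain`](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html) to build the final `rag_chain` from `history_aware_retriever` and `question_answer_chain`. Invoking `rag_chain` requires the question `input` and the chat history `chat_history`; the output contains `input`, `chat_history`, the retrieved `context`, and the final `answer`.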
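The input and output keys described above can be pictured with a small plain-Python sketch. It is an assumption-laden illustration of the data flow only: `run_rag_chain`, `fake_retrieve` and `fake_answer` are hypothetical stand-ins for the composed chain, the history-aware retriever and the question-answer chain.

```python
def run_rag_chain(inputs, retrieve, answer):
    """Pass the inputs through and add the retrieved `context` and the final `answer`."""
    context = retrieve(inputs)
    with_context = {**inputs, "context": context}
    return {**with_context, "answer": answer(with_context)}


def fake_retrieve(inputs):
    # Stand-in for the history-aware retriever.
    return ["Task decomposition breaks a task into subtasks."]


def fake_answer(inputs):
    # Stand-in for the question-answer chain; it sees input, chat_history and context.
    return f"Answered from {len(inputs['context'])} document(s)."


result = run_rag_chain(
    {"input": "What is Task Decomposition?", "chat_history": []},
    fake_retrieve,
    fake_answer,
)
print(sorted(result))
print(result["answer"])
```

Because the original inputs are carried along, a caller can read the answer and the supporting documents from the same result dictionary.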

In [6]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

qa_prompt = ChatPromptTemplate.from_messages(
[
("system", system_prompt),
        MessagesPlaceholder("chat_history"),
("human", "{input}"),
]
)

question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

**API reference:** [create_retrieval_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html) | [create_stuff_documents_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html)

Let's try it out. Below we ask a question and then a follow-up question that needs context, to check whether the chain returns a correct answer.

In [7]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

question = "What is Task Decomposition?"
ai_msg_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})
chat_history.extend(
[
        HumanMessage(content=question),
        AIMessage(content=ai_msg_1["answer"]),
]
)

second_question = "What are common ways of doing it?"
ai_msg_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

print(ai_msg_2["answer"])


Task decomposition can be achieved through several common methods, including:
1. Using Language Model (LLM) with simple prompting like "Steps for XYZ" or asking for subgoals.
2. Providing task-specific instructions tailored to the desired outcome, such as "Write a story outline" for novel writing.
3. Incorporating human inputs to guide the decomposition process effectively.




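### Stateful management of chat history

Above we covered how to feed historical output into `rag_chain`, but we are still updating the chat history by hand. In a real Q&A application we want a way to persist the chat history and a way to insert and update it automatically.

For this we can use:

- [BaseChatMessageHistory](https://api.python.langchain.com/en/latest/langchain_api_reference.html#module-langchain.memory): stores the chat history.
- [RunnableWithMessageHistory](https://python.langchain.com/v0.2/docs/how_to/message_history/): a wrapper around a chain (here `rag_chain`) and a `BaseChatMessageHistory` that injects the chat history into the prompt automatically and updates it after each invocation.

For a detailed walkthrough of how to add message history to a chain, see the official guide [How to add message history (memory)](https://python.langchain.com/v0.2/docs/how_to/message_history/).

Below, we store the chat histories in a simple dictionary, `store`. LangChain also offers memory integrations with Redis and other technologies.

Next, we define an instance of `RunnableWithMessageHistory` to manage the chat history. It takes a configuration parameter that uses a key ("session_id" by default) to specify which conversation history to fetch and prepend to the input, and it appends the model's output to the same conversation history.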

In [8]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
  if session_id not in store:
    store[session_id] = ChatMessageHistory()
  return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)


**API reference:** [ChatMessageHistory](https://api.python.langchain.com/en/latest/chat_history/langchain_core.chat_history.ChatMessageHistory.html) | [BaseChatMessageHistory](https://api.python.langchain.com/en/latest/chat_history/langchain_core.chat_history.BaseChatMessageHistory.html) | [RunnableWithMessageHistory](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html)
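The bookkeeping that the wrapper takes over (look up the history for a session, pass it to the chain, then append the new turn) can be sketched in plain Python. This is a simplified illustration, not LangChain's implementation: `WithSessionHistory` and `echo_chain` are hypothetical names, and messages are plain `(role, text)` tuples.

```python
from collections import defaultdict


class WithSessionHistory:
    """Wrap a chain so each session_id gets its own automatically updated history."""

    def __init__(self, chain):
        self.chain = chain
        self.store = defaultdict(list)  # session_id -> list of (role, text)

    def invoke(self, inputs, session_id):
        history = self.store[session_id]
        # Give the chain a copy so the stored history only changes here.
        result = self.chain({**inputs, "chat_history": list(history)})
        history.append(("human", inputs["input"]))
        history.append(("ai", result["answer"]))
        return result


def echo_chain(inputs):
    # Stand-in for rag_chain: reports which turn of the conversation this is.
    turn = len(inputs["chat_history"]) // 2 + 1
    return {"answer": f"turn {turn}: {inputs['input']}"}


chat = WithSessionHistory(echo_chain)
first = chat.invoke({"input": "hello"}, session_id="abc123")
second = chat.invoke({"input": "again"}, session_id="abc123")
other = chat.invoke({"input": "hello"}, session_id="xyz789")
print(first["answer"], "|", second["answer"], "|", other["answer"])
```

The second call in session `abc123` sees the first turn, while the call in session `xyz789` starts from an empty history, mirroring how the session key separates conversations.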

In [None]:
conversational_rag_chain.invoke(
{"input": "What is Task Decomposition?"},
config={
  "configurable": {"session_id": "abc123"}
},  # constructs a key "abc123" in `store`.
)["answer"]




'Task decomposition is the process of breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model. It involves transforming big tasks into multiple smaller tasks, allowing for a more systematic approach to problem-solving. Task decomposition can be achieved through techniques like Chain of Thought (CoT) or Tree of Thoughts, which help in organizing the steps needed to accomplish a larger goal.'

In [None]:
conversational_rag_chain.invoke(
{"input": "What are common ways of doing it?"},
config={"configurable": {"session_id": "abc123"}},
)["answer"]




'Task decomposition can be done in several common ways:\n\n1. Using techniques like Chain of Thought (CoT) or Tree of Thoughts to break down complex tasks into smaller, more manageable steps.\n2. Providing simple prompts to guide the decomposition process, such as asking for subgoals or steps needed to achieve a specific task.\n3. Utilizing task-specific instructions tailored to the nature of the task, such as requesting a story outline for writing a novel.'

In [None]:
for message in store["abc123"].messages:
  if isinstance(message, AIMessage):
    prefix = "AI"
  else:
    prefix = "User"

  print(f"{prefix}: {message.content}\n")


User: What is Task Decomposition?

AI: Task decomposition is the process of breaking down complex tasks into smaller and simpler steps to make them more manageable for an agent or model. It involves transforming big tasks into multiple smaller tasks, allowing for a more systematic approach to problem-solving. Task decomposition can be achieved through techniques like Chain of Thought (CoT) or Tree of Thoughts, which help in organizing the steps needed to accomplish a larger goal.

User: What are common ways of doing it?

AI: Task decomposition can be done in several common ways:

1. Using techniques like Chain of Thought (CoT) or Tree of Thoughts to break down complex tasks into smaller, more manageable steps.
2. Providing simple prompts to guide the decomposition process, such as asking for subgoals or steps needed to achieve a specific task.
3. Utilizing task-specific instructions tailored to the nature of the task, such as requesting a story outline for writing a novel.




### Tying it together

![https://python.langchain.com/v0.2/assets/images/conversational_retrieval_chain-5c7a96abe29e582bc575a0a0d63f86b0.png](https://python.langchain.com/v0.2/assets/images/conversational_retrieval_chain-5c7a96abe29e582bc575a0a0d63f86b0.png)

For convenience, here is all of the code above in one place:

In [None]:
import bs4
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

### Construct retriever ###
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
)
),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

### Contextualize question ###
contextualize_q_system_prompt = (
"Given a chat history and the latest user question "
"which might reference context in the chat history, "
"formulate a standalone question which can be understood "
"without the chat history. Do NOT answer the question, "
"just reformulate it if needed and otherwise return it as is."
)
contextualize_q_prompt = ChatPromptTemplate.from_messages(
[
("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
("human", "{input}"),
]
)
history_aware_retriever = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)

### Answer question ###
system_prompt = (
"You are an assistant for question-answering tasks. "
"Use the following pieces of retrieved context to answer "
"the question. If you don't know the answer, say that you "
"don't know. Use three sentences maximum and keep the "
"answer concise."
"\n\n"
"{context}"
)
qa_prompt = ChatPromptTemplate.from_messages(
[
("system", system_prompt),
        MessagesPlaceholder("chat_history"),
("human", "{input}"),
]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)

rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

### Statefully manage chat history ###
store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
  if session_id not in store:
    store[session_id] = ChatMessageHistory()
  return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)


**API reference:** [create_history_aware_retriever](https://api.python.langchain.com/en/latest/chains/langchain.chains.history_aware_retriever.create_history_aware_retriever.html) | [create_retrieval_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.retrieval.create_retrieval_chain.html) | [create_stuff_documents_chain](https://api.python.langchain.com/en/latest/chains/langchain.chains.combine_documents.stuff.create_stuff_documents_chain.html) | [ChatMessageHistory](https://api.python.langchain.com/en/latest/chat_history/langchain_core.chat_history.ChatMessageHistory.html) | [WebBaseLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.web_base.WebBaseLoader.html) | [BaseChatMessageHistory](https://api.python.langchain.com/en/latest/chat_history/langchain_core.chat_history.BaseChatMessageHistory.html) | [ChatPromptTemplate](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html) | [MessagesPlaceholder](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.chat.MessagesPlaceholder.html) | [RunnableWithMessageHistory](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html) | [ChatOpenAI](https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html) | [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html) | [RecursiveCharacterTextSplitter](https://api.python.langchain.com/en/latest/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html)

In [None]:
conversational_rag_chain.invoke(
{"input": "What is Task Decomposition?"},
    config={
"configurable": {"session_id": "abc123"}
},  # constructs a key "abc123" in `store`.
)["answer"]



'Task decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves transforming big tasks into multiple manageable tasks to facilitate problem-solving. This process can be done through prompting techniques like Chain of Thought or Tree of Thoughts, which help agents plan and execute tasks effectively.'

In [None]:
conversational_rag_chain.invoke(
{"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]



'Task decomposition can be achieved through various methods such as using prompting techniques like Chain of Thought or Tree of Thoughts, which guide models to break down tasks into smaller steps. Additionally, task-specific instructions can be provided to help agents understand how to decompose a task effectively, such as asking them to "Write a story outline" for writing a novel. Human inputs can also be used to decompose tasks into manageable components.'

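We now have an application that remembers the conversation and retrieves from a local knowledge base. However, it queries the knowledge base no matter what we ask, while in practice only some questions actually need retrieval. Can we let the model decide for itself whether answering a question requires a retrieval step?

## Agents

Agents use the reasoning capabilities of LLMs to make decisions during execution. With an agent, the model decides on its own whether retrieval is needed. Although this behavior is less predictable than that of a predefined chain, it offers some advantages:

- Agents generate the input to the retriever directly, without necessarily needing us to explicitly build in contextualization;
- Agents can execute multiple retrieval steps for a query, or skip retrieval altogether (for example, when responding to a user's greeting).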
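The decision loop behind this behavior can be sketched in plain Python. The sketch is a toy under loud assumptions: `stub_model` is a hard-coded policy standing in for the LLM's tool-calling decision, `search_blog` is a hypothetical tool returning a fixed string, and `run_agent` is not LangGraph's implementation.

```python
def stub_model(messages):
    """Hypothetical policy: answer directly, or ask for a tool call first."""
    last_role, last_text = messages[-1]
    if last_role == "tool":
        return {"answer": f"According to the post: {last_text}"}
    if "decomposition" in last_text.lower():
        return {"tool_query": "task decomposition"}
    return {"answer": "Hello! How can I help?"}


def search_blog(query):
    # Stand-in for the retriever tool.
    return "Task decomposition can be done by simple prompting."


def run_agent(user_text, model, tool, max_steps=3):
    """Loop until the model produces an answer, running the tool whenever it asks."""
    messages = [("human", user_text)]
    tool_calls = 0
    for _ in range(max_steps):
        decision = model(messages)
        if "answer" in decision:
            return decision["answer"], tool_calls
        messages.append(("tool", tool(decision["tool_query"])))
        tool_calls += 1
    raise RuntimeError("agent did not finish within max_steps")


print(run_agent("Hi! I'm bob", stub_model, search_blog))
print(run_agent("What is Task Decomposition?", stub_model, search_blog))
```

The greeting is answered with zero tool calls, while the question about task decomposition triggers one retrieval before the answer. The `max_steps` cap is a design choice of this sketch to keep a misbehaving policy from looping forever.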

### Retrieval tool

Agents can access and use tools. Here we wrap the local knowledge-base `retriever` from above as a tool the agent can use.

In [9]:
from langchain.tools.retriever import create_retriever_tool

tool = create_retriever_tool(
    retriever,
"blog_post_retriever",
"Searches and returns excerpts from the Autonomous Agents blog post.",
)
tools = [tool]

**API reference:** [create_retriever_tool](https://api.python.langchain.com/en/latest/tools/langchain_core.tools.create_retriever_tool.html)

The resulting tool implements the [Runnable](https://python.langchain.com/v0.2/docs/concepts/#langchain-expression-language) interface, so it can be called directly with methods such as `invoke`:

In [None]:
tool.invoke("task decomposition")

'Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.\n\nFig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The mode


### Agent constructor

Now that we have defined the tool and the LLM, we can create the agent. We use LangGraph to build it. For now we use a high-level interface, but the nice thing about LangGraph is that this high-level interface is backed by a low-level, highly controllable API in case you want to modify the agent logic.

In [12]:
from langgraph.prebuilt import create_react_agent

agent_executor = create_react_agent(llm, tools)

LangGraph comes with built-in persistence, so we don't need `ChatMessageHistory`. Instead, we can pass a `checkpointer` directly to the LangGraph agent.

In [11]:
from langgraph.checkpoint.sqlite import SqliteSaver

memory = SqliteSaver.from_conn_string(":memory:")

agent_executor = create_react_agent(llm, tools, checkpointer=memory)

Let's try running the agent. If the query does not need a retrieval step, the agent does not execute one.

In [13]:
config = {"configurable": {"thread_id": "abc123"}}

for s in agent_executor.stream(
{"messages": [HumanMessage(content="Hi! I'm bob")]}, config=config
):
  print(s)
  print("----")

{'agent': {'messages': [AIMessage(content='Hello Bob! How can I assist you today?', response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 67, 'total_tokens': 78}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-52df3916-e45d-483e-9b6c-dd2b7d864940-0', usage_metadata={'input_tokens': 67, 'output_tokens': 11, 'total_tokens': 78})]}}
----



Conversely, if the query does need a retrieval step, the agent generates the input to the tool.

In [None]:
query = "What is Task Decomposition?"

for s in agent_executor.stream(
{"messages": [HumanMessage(content=query)]}, config=config
):
  print(s)
  print("----")

{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_77rTwT2Qlk7ZxgOITlJh6Pzl', 'function': {'arguments': '{"query":"Task Decomposition"}', 'name': 'blog_post_retriever'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 92, 'total_tokens': 111}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_811936bd4f', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-a1ce3179-3378-446c-b39c-aab63eeda001-0', tool_calls=[{'name': 'blog_post_retriever', 'args': {'query': 'Task Decomposition'}, 'id': 'call_77rTwT2Qlk7ZxgOITlJh6Pzl'}], usage_metadata={'input_tokens': 92, 'output_tokens': 19, 'total_tokens': 111})]}}
----
{'tools': {'messages': [ToolMessage(content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT

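In the example above, the agent did not pass our query to the tool verbatim; it stripped unnecessary words such as "what" and "is".

The same principle lets the agent use the context of the conversation when needed: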

In [None]:
query = "What according to the blog post are common ways of doing it? redo the search"

for s in agent_executor.stream(
{"messages": [HumanMessage(content=query)]}, config=config
):
  print(s)
  print("----")

{'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_db6SM3QoPbNabE9gF3UUq8RW', 'function': {'arguments': '{"query":"common ways of task decomposition"}', 'name': 'blog_post_retriever'}, 'type': 'function'}]}, response_metadata={'token_usage': {'completion_tokens': 21, 'prompt_tokens': 1488, 'total_tokens': 1509}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': 'fp_811936bd4f', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-de500626-1fe2-47cb-93b2-cea08ea21d4f-0', tool_calls=[{'name': 'blog_post_retriever', 'args': {'query': 'common ways of task decomposition'}, 'id': 'call_db6SM3QoPbNabE9gF3UUq8RW'}], usage_metadata={'input_tokens': 1488, 'output_tokens': 21, 'total_tokens': 1509})]}}
----
{'tools': {'messages': [ToolMessage(content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thou


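Note that the agent was able to infer that "it" in our query refers to "task decomposition", and generated a reasonable search query as a result, in this case "common ways of task decomposition".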


### Tying it together

In [None]:
import bs4
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain.tools.retriever import create_retriever_tool
from langchain_chroma import Chroma
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.prebuilt import create_react_agent

memory = SqliteSaver.from_conn_string(":memory:")
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

### Construct retriever ###
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
)
),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()

### Build retriever tool ###
tool = create_retriever_tool(
    retriever,
"blog_post_retriever",
"Searches and returns excerpts from the Autonomous Agents blog post.",
)
tools = [tool]

agent_executor = create_react_agent(llm, tools, checkpointer=memory)

**API reference:** [AgentExecutor](https://api.python.langchain.com/en/latest/agents/langchain.agents.agent.AgentExecutor.html) | [create_tool_calling_agent](https://api.python.langchain.com/en/latest/agents/langchain.agents.tool_calling_agent.base.create_tool_calling_agent.html) | [create_retriever_tool](https://api.python.langchain.com/en/latest/tools/langchain_core.tools.create_retriever_tool.html) | [ChatMessageHistory](https://api.python.langchain.com/en/latest/chat_history/langchain_core.chat_history.ChatMessageHistory.html) | [WebBaseLoader](https://api.python.langchain.com/en/latest/document_loaders/langchain_community.document_loaders.web_base.WebBaseLoader.html) | [BaseChatMessageHistory](https://api.python.langchain.com/en/latest/chat_history/langchain_core.chat_history.BaseChatMessageHistory.html) | [ChatPromptTemplate](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.chat.ChatPromptTemplate.html) | [MessagesPlaceholder](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.chat.MessagesPlaceholder.html) | [RunnableWithMessageHistory](https://api.python.langchain.com/en/latest/runnables/langchain_core.runnables.history.RunnableWithMessageHistory.html) | [ChatOpenAI](https://api.python.langchain.com/en/latest/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html) | [OpenAIEmbeddings](https://api.python.langchain.com/en/latest/embeddings/langchain_openai.embeddings.base.OpenAIEmbeddings.html) | [RecursiveCharacterTextSplitter](https://api.python.langchain.com/en/latest/character/langchain_text_splitters.character.RecursiveCharacterTextSplitter.html)

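## Summary

In this article, we **added logic for incorporating historical messages** on top of a RAG chain, and then built an agent that decides on its own whether to call the knowledge-base retrieval tool.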