# Chapter 4: Building an Agent Reasoning Loop

<div class="toc">
    <ul class="toc-item">
        <li><span><a href="#一引言" data-toc-modified-id="一、引言">1. Introduction</a></span></li>
        <li><span><a href="#二示例" data-toc-modified-id="二、示例">2. Examples</a></span></li>
            <ul class="toc-item">
                <li><span><a href="#21-英文示例" data-toc-modified-id="2.1 英文示例">2.1 English Example</a></span></li>
                <ul class="toc-item">
                    <li><span><a href="#211-导入相应库及获得大模型-api-key" data-toc-modified-id="2.1.1 导入相应库及获得大模型 API Key">2.1.1 Importing Libraries and Getting the LLM API Key</a></span></li>
                    <li><span><a href="#212-设置工具" data-toc-modified-id="2.1.2 设置工具">2.1.2 Setting Up the Tools</a></span></li>
                    <li><span><a href="#213-设置使用函数的代理" data-toc-modified-id="2.1.3 设置使用函数的代理">2.1.3 Setting Up the Function Calling Agent</a></span></li>
                    <li><span><a href="#214-维护一段历史对话" data-toc-modified-id="2.1.4 维护一段历史对话">2.1.4 Maintaining Conversation History</a></span></li>
                    <li><span><a href="#215-高级研究助理调试和控制" data-toc-modified-id="2.1.5 高级研究助理：调试和控制">2.1.5 Advanced Research Assistant: Debugging and Control</a></span></li>
                </ul>
            </ul>
            <ul class="toc-item">
                <li><span><a href="#22-中文示例" data-toc-modified-id="2.2 中文示例">2.2 Chinese Example</a></span></li>
                <ul class="toc-item">
                    <li><span><a href="#221-初步多步骤询问" data-toc-modified-id="2.2.1 初步多步骤询问">2.2.1 Initial Multi-step Query</a></span></li>
                    <li><span><a href="#222-维护历史记录" data-toc-modified-id="2.2.2 维护历史记录">2.2.2 Maintaining History</a></span></li>
                    <li><span><a href="#223-高级研究助理" data-toc-modified-id="2.2.3 高级研究助理">2.2.3 Advanced Research Assistant</a></span></li>
                </ul>
            </ul>
        <li><span><a href="#三总结" data-toc-modified-id="三、总结">3. Summary</a></span></li>
    </ul>
</div>

## 1. Introduction
***

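This lesson covers how to define a complete agent reasoning loop and how to use a function calling agent that integrates natively with an LLM's function calling capability. The main topics are: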

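- **Defining the agent reasoning loop**: Unlike a single-shot tool call, an agent can reason over tools across multiple steps, which lets it handle more complex queries and vague questions that need clarification.
- **Implementing a function calling agent**: We use an LLM to define a function calling agent made up of an agent worker and an agent runner. The agent worker executes the next step of a given agent, while the agent runner handles overall task scheduling.
- **Running multi-step queries**: The function calling agent breaks the overall question into several steps and calls the appropriate tool to answer each sub-question.
- **Maintaining conversation history**: The agent keeps the chat history in a conversation memory buffer, so continuity and context are preserved across steps.
- **Using the low-level agent interface**: The low-level agent interface lets us control the agent's execution step by step, which makes the agent easier to debug and steer.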

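After this lesson, we will be able to handle complex queries more flexibly and control the agent's execution step by step, which helps improve the agent's accuracy and effectiveness.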
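To make the difference between a single-shot tool call and a reasoning loop concrete, here is a minimal sketch in plain Python. It does not use LlamaIndex: the tool names in `TOOLS`, the `toy_policy` function (a stand-in for the LLM's decision), and the step budget are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical tools: plain functions keyed by name (not the real LlamaIndex tools).
TOOLS: Dict[str, Callable[[str], str]] = {
    "summary_tool": lambda q: f"[summary for: {q}]",
    "vector_tool": lambda q: f"[passages for: {q}]",
}


def toy_policy(pending: List[str]) -> Optional[Tuple[str, str]]:
    """Stand-in for the LLM: pick the next (tool, argument), or None when done."""
    if not pending:
        return None
    sub_question = pending[0]
    tool = "summary_tool" if "overview" in sub_question else "vector_tool"
    return tool, sub_question


def reasoning_loop(sub_questions: List[str], max_steps: int = 5) -> List[str]:
    """Call tools repeatedly until the policy stops or the step budget runs out."""
    pending = list(sub_questions)
    observations: List[str] = []
    for _ in range(max_steps):
        decision = toy_policy(pending)
        if decision is None:
            break  # nothing left to answer
        tool_name, argument = decision
        observations.append(TOOLS[tool_name](argument))
        pending.pop(0)
    return observations


results = reasoning_loop(["overview of agent roles", "how agents communicate"])
print(results)
```

A single-shot call would stop after the first tool result; the loop keeps going until every sub-question has an observation or `max_steps` is reached.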


## 2. Examples

### 2.1 English Example

#### 2.1.1 Importing Libraries and Getting the LLM API Key

In [1]:
# Load the environment variables for the model API
from helper import get_openai_api_key
OPENAI_API_KEY = get_openai_api_key()

In [2]:
# nest_asyncio patches asyncio so that a new event loop can run nested inside an already running one
import nest_asyncio
# Patch the current asyncio event loop so that nested event loops are allowed
nest_asyncio.apply()

#### 2.1.2 Setting Up the Tools

In [4]:
from utils_Ch4 import get_doc_tools

vector_tool, summary_tool = get_doc_tools("files/metagpt.pdf", "metagpt")

#### 2.1.3 Setting Up the Function Calling Agent

In [5]:
from llama_index.llms.openai import OpenAI

# Create the LLM with the OpenAI API key
# When using a third-party provider, api_base must also be set
# The third-party service used here is https://aiproxy.io/
llm = OpenAI(model="gpt-3.5-turbo", temperature=0, api_key=OPENAI_API_KEY, api_base='https://api.aiproxy.io/v1')

![image1](images/4-1.png)

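In LlamaIndex, an agent consists of two main components: the **agent worker (AgentWorker)** and the **agent runner (AgentRunner)**. The agent worker executes the next step of a given agent. The agent runner is the overall task dispatcher: it creates tasks, orchestrates the agent worker's runs on a given task, and returns the final response to the user.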
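The division of labor between the two components can be sketched with toy classes. This is not LlamaIndex's implementation: `ToyTask`, `ToyStepOutput`, `ToyWorker`, and `ToyRunner` are hypothetical names, and the "step" here is just string formatting standing in for an LLM call plus a tool call.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyTask:
    task_id: str
    pending: List[str]                      # sub-questions still to handle
    outputs: List[str] = field(default_factory=list)


@dataclass
class ToyStepOutput:
    output: str
    is_last: bool


class ToyWorker:
    """Executes exactly one step of a task (hypothetical stand-in for an agent worker)."""

    def run_step(self, task: ToyTask) -> ToyStepOutput:
        item = task.pending.pop(0)
        result = f"handled: {item}"
        task.outputs.append(result)
        return ToyStepOutput(output=result, is_last=not task.pending)


class ToyRunner:
    """Creates tasks and drives the worker until the task is finished."""

    def __init__(self, worker: ToyWorker) -> None:
        self.worker = worker
        self.tasks: Dict[str, ToyTask] = {}

    def create_task(self, sub_questions: List[str]) -> ToyTask:
        task = ToyTask(task_id=str(uuid.uuid4()), pending=list(sub_questions))
        self.tasks[task.task_id] = task
        return task

    def run(self, sub_questions: List[str]) -> str:
        task = self.create_task(sub_questions)
        while task.pending:                 # the runner schedules, the worker executes
            step = self.worker.run_step(task)
            if step.is_last:
                break
        return " | ".join(task.outputs)


runner = ToyRunner(ToyWorker())
answer = runner.run(["agent roles", "agent communication"])
print(answer)
```

The worker holds no scheduling logic and the runner holds no step logic, which is the separation the paragraph above describes.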

In [8]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

# First create the agent_worker from the LLM above and the two tools
# Set verbose=True to see the intermediate output
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
# Create the full agent
agent = AgentRunner(agent_worker)

In [9]:
# Start querying
response = agent.query(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT include the Product Manager, Architect, Project Manager, Engineer, and QA Engineer. The Product Manager is responsible for generating the Product Requirement Document (PRD) and competitive analysis. The Architect focuses on technical specifications, system architecture diagrams, and interface definitions. The Project Manager breaks down the project into tasks for execution. The Engineer carries out development tasks based on defined specifications. Lastly, the QA Engineer generates unit test code and reviews it to ensure high-quality software.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents communicate with each other in MetaGPT"}
=== Function Output ===
Agents 

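As shown, the LLM chose the summary tool both times. This choice is not necessarily the most precise one; a stronger model would likely select tools more accurately and sensibly.

The question above was answered in multiple steps, so we need to be able to trace the sources. The following code shows the metadata of the relevant sources: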

In [14]:
print(response.source_nodes[0].get_content(metadata_mode="all"))

page_label: 1
file_name: metagpt.pdf
file_path: files\metagpt.pdf
file_type: application/pdf
file_size: 16911937
creation_date: 2024-06-04
last_modified_date: 2024-06-04

Preprint
METAGPT: M ETA PROGRAMMING FOR A
MULTI -AGENT COLLABORATIVE FRAMEWORK
Sirui Hong1∗, Mingchen Zhuge2∗, Jonathan Chen1, Xiawu Zheng3, Yuheng Cheng4,
Ceyao Zhang4,Jinlin Wang1,Zili Wang ,Steven Ka Shing Yau5,Zijuan Lin4,
Liyang Zhou6,Chenyu Ran1,Lingfeng Xiao1,7,Chenglin Wu1†,J¨urgen Schmidhuber2,8
1DeepWisdom,2AI Initiative, King Abdullah University of Science and Technology,
3Xiamen University,4The Chinese University of Hong Kong, Shenzhen,
5Nanjing University,6University of Pennsylvania,
7University of California, Berkeley,8The Swiss AI Lab IDSIA/USI/SUPSI
ABSTRACT
Remarkable progress has been made on automated problem solving through so-
cieties of agents based on large language models (LLMs). Existing LLM-based
multi-agent systems can already solve simple dialogue tasks. Solutions to more
complex tasks, how

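The code above shows the content of the first source node, which is just the first page of the paper. **Note: querying this way (with `agent.query()`) lets you query the agent in a one-off manner, but it does not preserve state.**

#### 2.1.4 Maintaining Conversation History

The memory module can be customized, but by default it is a flat list of items: a rolling buffer whose size depends on the LLM's context window. So when the agent decides to use a tool, it draws not only on the current message but also on the previous conversation history to take the next step or perform the next action. Here we use `agent.chat()` instead of `agent.query()`.


The two conversation turns below show that the agent maintains the chat history in its memory buffer: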
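The idea of a rolling buffer bounded by the context window can be sketched as follows. This is not LlamaIndex's memory class: `RollingChatMemory` is a hypothetical name, and counting whitespace-separated words as tokens is a deliberately crude assumption (a real implementation would use the model's tokenizer).

```python
from collections import deque
from typing import Deque, List, Tuple

Message = Tuple[str, str]  # (role, text)


class RollingChatMemory:
    """Flat list of messages that drops the oldest ones once a token budget is exceeded."""

    def __init__(self, token_limit: int) -> None:
        self.token_limit = token_limit
        self.messages: Deque[Message] = deque()

    @staticmethod
    def count_tokens(text: str) -> int:
        # Assumption: one whitespace-separated word counts as one token.
        return len(text.split())

    def _total_tokens(self) -> int:
        return sum(self.count_tokens(text) for _, text in self.messages)

    def put(self, role: str, text: str) -> None:
        self.messages.append((role, text))
        # Evict from the oldest end, but always keep the newest message.
        while len(self.messages) > 1 and self._total_tokens() > self.token_limit:
            self.messages.popleft()

    def get(self) -> List[Message]:
        return list(self.messages)


memory = RollingChatMemory(token_limit=16)
memory.put("user", "Tell me about the evaluation datasets used.")
memory.put("assistant", "HumanEval, MBPP, and SoftwareDev.")
memory.put("user", "Tell me the results over one of the above datasets.")
print(memory.get())
```

With this small budget the first user message is evicted, while the assistant's answer naming the datasets is still available when the follow-up question arrives.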

In [9]:
response = agent.chat(
    "Tell me about the evaluation datasets used."
)

Added user message to memory: Tell me about the evaluation datasets used.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "evaluation datasets used in MetaGPT"}
=== Function Output ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and a self-generated SoftwareDev dataset. HumanEval consists of 164 handwritten programming tasks, while MBPP comprises 427 Python tasks. The SoftwareDev dataset contains 70 representative software development tasks covering various scopes like mini-games, image processing algorithms, and data visualization. These datasets were utilized to evaluate the performance of MetaGPT in code generation tasks.
=== LLM Response ===
The evaluation datasets used in MetaGPT include HumanEval, MBPP, and a self-generated SoftwareDev dataset. HumanEval consists of 164 handwritten programming tasks, MBPP comprises 427 Python tasks, and the SoftwareDev dataset contains 70 representative software development tasks. These da

In [10]:
response = agent.chat("Tell me the results over one of the above datasets.")

Added user message to memory: Tell me the results over one of the above datasets.
=== Calling Function ===
Calling function: vector_tool_metagpt with args: {"query": "results over HumanEval dataset", "page_numbers": ["6"]}
=== Function Output ===
The results over the HumanEval dataset were part of the experimental setting in the study.
=== LLM Response ===
The results over the HumanEval dataset were part of the experimental setting in the study. If you would like more detailed information, please let me know.


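#### 2.1.5 Advanced Research Assistant: Debugging and Control

In this part, we show how to control the agent step by step. This makes it possible not only to build a more advanced research assistant but also to debug and steer it. The benefits include **better debuggability** and **better steerability, achieved by allowing user feedback to be injected**.

There are two main reasons to have this low-level agent interface. **The first is debuggability.** If you are a developer building an agent, you probably want more transparency and visibility into what the agent is doing internally, especially when it does not work the first time. You can then step in, trace the agent's execution, see where it fails, and try different inputs to check whether they change the execution so that it produces the correct response.

**The second reason is that it enables richer user experiences**, where a product experience is built around this core agent capability. For example, suppose you want to take human feedback while the agent is executing, not only after a given task has finished. You could create some kind of asynchronous queue that listens for human input throughout the agent's execution. If human input arrives, you can interrupt and modify the execution midway instead of waiting for the agent's task to complete.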
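The asynchronous-queue idea can be illustrated with the standard library alone. The sketch below is an assumption about how such a loop might look, not part of LlamaIndex: `run_with_feedback`, `demo`, and the planned step strings are hypothetical, and the "agent step" is only a log entry.

```python
import asyncio
from typing import List


async def run_with_feedback(planned_steps: List[str], feedback: asyncio.Queue) -> List[str]:
    """Run steps one at a time, checking for human input before each step."""
    log: List[str] = []
    for step in planned_steps:
        try:
            note = feedback.get_nowait()      # non-blocking check for human input
        except asyncio.QueueEmpty:
            note = None
        if note is not None:
            log.append(f"human: {note}")      # a real agent would feed this into the next step
        log.append(f"agent: {step}")
        await asyncio.sleep(0)                # yield so the feedback producer can run
    return log


async def demo() -> List[str]:
    feedback: asyncio.Queue = asyncio.Queue()

    async def human() -> None:
        # Simulated user who speaks up while the agent is already running.
        await feedback.put("What about how agents share information?")

    producer = asyncio.create_task(human())
    log = await run_with_feedback(
        ["look up agent roles", "look up communication", "compose answer"], feedback
    )
    await producer
    return log


log = asyncio.run(demo())
print(log)
```

Inside a notebook with a running event loop, `await demo()` would be used instead of `asyncio.run(demo())`.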

In [25]:
# Same setup as before
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [26]:
# Same question as before
# task is a task object that holds the input along with other task state
task = agent.create_task(
    "Tell me about the agent roles in MetaGPT, "
    "and then how they communicate with each other."
)

In [27]:
# Now run a single step of the task; this returns a step output
step_output = agent.run_step(task.task_id)

Added user message to memory: Tell me about the agent roles in MetaGPT, and then how they communicate with each other.
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "agent roles in MetaGPT"}
=== Function Output ===
The agent roles in MetaGPT include the Product Manager, Architect, Project Manager, Engineer, and QA Engineer. The Product Manager is responsible for creating the Product Requirement Document (PRD), the Architect designs the system architecture and technical specifications, the Project Manager breaks down tasks for execution, the Engineer implements the code, and the QA Engineer generates unit tests and reviews the code for bugs. Each role has specific responsibilities in the software development process within the MetaGPT framework.


In [28]:
# Inspect the number of completed steps and the corresponding result
completed_steps = agent.get_completed_steps(task.task_id)
print(f"Num completed for task {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)

Num completed for task 370e24c7-95e0-47be-96f6-bf4b6222b706: 1
The agent roles in MetaGPT include the Product Manager, Architect, Project Manager, Engineer, and QA Engineer. The Product Manager is responsible for creating the Product Requirement Document (PRD), the Architect designs the system architecture and technical specifications, the Project Manager breaks down tasks for execution, the Engineer implements the code, and the QA Engineer generates unit tests and reviews the code for bugs. Each role has specific responsibilities in the software development process within the MetaGPT framework.


In [29]:
# Next, we can also look at the agent's upcoming steps:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"Num upcoming steps for task {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]

Num upcoming steps for task 370e24c7-95e0-47be-96f6-bf4b6222b706: 1


TaskStep(task_id='370e24c7-95e0-47be-96f6-bf4b6222b706', step_id='54991760-83f4-48fa-a8f5-02da23dd8e39', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)

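The advantage of this debugging interface is that you can pause execution at this point if you want. **You can get intermediate results without completing the agent flow.** Let's continue, though: we run the next two steps and try injecting user input.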

In [30]:
step_output = agent.run_step(
    task.task_id, input="What about how agents share information?"
)

Added user message to memory: What about how agents share information?
=== Calling Function ===
Calling function: summary_tool_metagpt with args: {"input": "how agents communicate with each other in MetaGPT"}
=== Function Output ===
Agents in MetaGPT communicate with each other through structured communication interfaces, a publish-subscribe mechanism, and natural language. The structured communication interfaces define a schema and format for each role, ensuring that individuals provide necessary outputs based on their specific role and context. Additionally, there is a shared message pool where agents publish their structured messages and access messages from other agents transparently. This mechanism enhances communication efficiency by allowing agents to exchange information directly and subscribe to relevant messages based on their role profiles. The interactive communication between agents is facilitated by MetaGPT, ensuring transparency in the system. The communication is real-t

In [31]:
# Run the final step
step_output = agent.run_step(task.task_id)

=== LLM Response ===
Agents in MetaGPT communicate with each other through structured communication interfaces, a publish-subscribe mechanism, and natural language. The structured communication interfaces define a schema and format for each role, ensuring that individuals provide necessary outputs based on their specific role and context. Additionally, there is a shared message pool where agents publish their structured messages and access messages from other agents transparently. This mechanism enhances communication efficiency by allowing agents to exchange information directly and subscribe to relevant messages based on their role profiles. The interactive communication between agents is facilitated by MetaGPT, ensuring transparency in the system. The communication is real-time, and the interpretation and operation of natural language are displayed on the screen and logs, allowing for seamless interaction and collaboration among the different agents involved in the software developm

In [32]:
# We can also double-check that this was the last step
print(step_output.is_last)

True


In [33]:
# To get the final answer, use the following:
response = agent.finalize_response(task.task_id)

In [34]:
print(str(response))

Agents in MetaGPT communicate with each other through structured communication interfaces, a publish-subscribe mechanism, and natural language. The structured communication interfaces define a schema and format for each role, ensuring that individuals provide necessary outputs based on their specific role and context. Additionally, there is a shared message pool where agents publish their structured messages and access messages from other agents transparently. This mechanism enhances communication efficiency by allowing agents to exchange information directly and subscribe to relevant messages based on their role profiles. The interactive communication between agents is facilitated by MetaGPT, ensuring transparency in the system. The communication is real-time, and the interpretation and operation of natural language are displayed on the screen and logs, allowing for seamless interaction and collaboration among the different agents involved in the software development process.
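The step-wise calls shown above (`create_task`, `run_step`, `is_last`, `finalize_response`) can be wrapped in a small helper that runs a task to completion and optionally injects user input at chosen steps. This is a sketch: `run_task_stepwise`, its `injected` and `max_steps` parameters, and the `StubAgent` class are hypothetical. `StubAgent` exists only so the example runs without an API key; with a real agent you would pass the `AgentRunner` instead.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def run_task_stepwise(
    agent: Any,
    question: str,
    injected: Optional[Dict[int, str]] = None,
    max_steps: int = 10,
) -> Any:
    """Run one task step by step; injected maps a step index to extra user input."""
    injected = injected or {}
    task = agent.create_task(question)
    for step_index in range(max_steps):
        user_input = injected.get(step_index)
        if user_input is None:
            step_output = agent.run_step(task.task_id)
        else:
            step_output = agent.run_step(task.task_id, input=user_input)
        if step_output.is_last:
            return agent.finalize_response(task.task_id)
    raise RuntimeError(f"task did not finish within {max_steps} steps")


class StubAgent:
    """Stand-in for a real agent, used only to make this sketch runnable."""

    def __init__(self, total_steps: int) -> None:
        self.total_steps = total_steps
        self.inputs = []
        self.done = 0

    def create_task(self, question: str) -> SimpleNamespace:
        self.inputs = [question]
        self.done = 0
        return SimpleNamespace(task_id="stub-task")

    def run_step(self, task_id: str, input: Optional[str] = None) -> SimpleNamespace:
        if input is not None:
            self.inputs.append(input)
        self.done += 1
        return SimpleNamespace(is_last=self.done >= self.total_steps)

    def finalize_response(self, task_id: str) -> str:
        return f"answered after {self.done} steps: " + "; ".join(self.inputs)


response = run_task_stepwise(
    StubAgent(total_steps=3),
    "Tell me about the agent roles in MetaGPT.",
    injected={1: "What about how agents share information?"},
)
print(response)
```

The `max_steps` guard is a design choice for the sketch: it keeps a misbehaving agent from looping forever while you debug.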


### 2.2 Chinese Example

#### 2.2.1 Initial Multi-step Query

In [9]:
# Load the environment variables for the model API
from helper import get_openai_api_key
# nest_asyncio patches asyncio so that a new event loop can run nested inside an already running one
import nest_asyncio
from llama_index.llms.openai import OpenAI
from utils_Ch4 import get_doc_tools
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner
# Patch the current asyncio event loop so that nested event loops are allowed
nest_asyncio.apply()

OPENAI_API_KEY = get_openai_api_key()
# Create the LLM with the OpenAI API key
# When using a third-party provider, api_base must also be set
# The third-party service used here is https://aiproxy.io/
llm = OpenAI(model="gpt-3.5-turbo", temperature=0, api_key=OPENAI_API_KEY, api_base='https://api.aiproxy.io/v1')


vector_tool, summary_tool = get_doc_tools("files/1.pdf", "1")

# First create the agent_worker from the LLM above and the two tools
# Set verbose=True to see the intermediate output
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
# Create the full agent
agent = AgentRunner(agent_worker)

Multiple definitions in dictionary at byte 0x1f14c for key /GSP1
Multiple definitions in dictionary at byte 0x1f2ff for key /GSP1
Multiple definitions in dictionary at byte 0x1f4e1 for key /GSP1
Multiple definitions in dictionary at byte 0x1f6c3 for key /GSP1
Multiple definitions in dictionary at byte 0x1f8b1 for key /GSP1
Multiple definitions in dictionary at byte 0x1fa3f for key /GSP1
Multiple definitions in dictionary at byte 0x1fbb9 for key /GSP1
Multiple definitions in dictionary at byte 0x1fd10 for key /GSP1


In [11]:
# Start querying
response = agent.query(
    "1中智能体是什么意思？"
    "强化学习运动规划和智能体有什么关系？"
)

Added user message to memory: 1中智能体是什么意思？强化学习运动规划和智能体有什么关系？
=== Calling Function ===
Calling function: vector_tool_1 with args: {"query": "1\u4e2d\u667a\u80fd\u4f53\u662f\u4ec0\u4e48\u610f\u601d\uff1f\u5f3a\u5316\u5b66\u4e60\u8fd0\u52a8\u89c4\u5212\u548c\u667a\u80fd\u4f53\u6709\u4ec0\u4e48\u5173\u7cfb\uff1f"}
=== Function Output ===
智能体在这里指的是参与运动规划和强化学习的个体或实体，通常是指机器人或其他智能系统。在强化学习运动规划中，智能体通过学习和决策来规划其在环境中的运动路径或行为，以达到特定的目标。强化学习是一种机器学习方法，通过智能体与环境的交互学习，以最大化累积奖励来改善其决策过程。因此，智能体在强化学习运动规划中扮演着学习者和决策者的角色，通过不断的试错和学习来提高其在复杂环境中的运动规划能力。
=== LLM Response ===
智能体在这里指的是参与运动规划和强化学习的个体或实体，通常是指机器人或其他智能系统。在强化学习运动规划中，智能体通过学习和决策来规划其在环境中的运动路径或行为，以达到特定的目标。强化学习是一种机器学习方法，通过智能体与环境的交互学习，以最大化累积奖励来改善其决策过程。因此，智能体在强化学习运动规划中扮演着学习者和决策者的角色，通过不断的试错和学习来提高其在复杂环境中的运动规划能力。


In [12]:
print(response.source_nodes[0].get_content(metadata_mode="all"))

page_label: 8
file_name: 1.pdf
file_path: files\1.pdf
file_type: application/pdf
file_size: 1458912
creation_date: 2024-06-06
last_modified_date: 2024-06-06

第x期 孙辉辉等 :事件触发式多智能体分层安全强化学习运动规划 7
SAC CPO MEHSRL808595
/UNI5f3a/UNI5316/UNI5b66/UNI4e60/UNI65b9/UNI6cd590100
/UNI4efb/UNI52a1/UNI6210/UNI529f/UNI7387/%
86.5092.6096.80
图10测试环境中不同方法的任务成功率
通过分析对比不同运动规划方法的任务成功率、 奖
励值和代价值的差异可以明显得出 :所提出多智能
体事件触发式分层安全强化学习方法 MEHSRL 在复
杂的非安全环境中展现出较好的安全性和稳定性 ,机
器人受到非安全因素影响较小 ,代价值和奖励值可保
持在一个合理的区域内 ,机器人的导航、 编队和搜索
的任务成功率得到了全面提高 ,增强了在动态复杂的
非安全环境中良好的运动规划能力和环境的适应性 .
4结 论
本文提出了一种基于事件触发的多智能体分层
安全强化学习运动规划方法 (MEHSRL) 来改善多机
器人运动规划中安全约束不足的问题 .所提出方法
基于受限马尔可夫决策过程 ,建立了分层多智能体
安全强化学习框架 ,实现了策略网络的快速学习与安
全约束的平衡 ;通过引入李雅普诺夫代价函数网络 ,
构建了带有条件约束的动作策略优化目标 ,同时利用
拉格朗日乘子法优化了策略的求解过程 ,保证了状态
轨迹可在有限时间内从危险状态中恢复至安全空间 .
通过与其他方法的实验对比结果可以发现 ,所提出运
动规划方法任务执行成功率高、 安全性强 ,为保证复
杂环境下多机器人安全运动规划提供了理论参考 .
参考文献 (References)
[1]温广辉 ,杨涛 ,周佳玲 ,等.强化学习与自适应动态
规划 :从基础理论到多智能体系统中的应用进展综
述[J].控制与决策 , 2023, 38(5): 1200-1230.
(Wen G H, Yang T, Zhou J L, et al. Reinfor

#### 2.2.2 Maintaining History

In [13]:
response = agent.chat(
    "这篇文章解决了什么问题？"
)

Added user message to memory: 这篇文章解决了什么问题？
=== Calling Function ===
Calling function: summary_tool_1 with args: {"input": "What problems does this paper solve?"}
=== Function Output ===
The paper addresses the issue of insufficient safety constraints in multi-robot motion planning by proposing a multi-agent event-triggered hierarchical security reinforcement learning method. This method aims to balance rapid learning of policy networks with safety constraints, ensuring that state trajectories can recover from dangerous states to safe spaces within a limited time. The approach enhances the safety and stability of robot navigation, coordination, and task success rates in complex and non-safe environments.
=== LLM Response ===
这篇论文解决了多机器人运动规划中安全约束不足的问题，提出了一种多智能体事件触发的分层安全强化学习方法。该方法旨在平衡策略网络的快速学习与安全约束，确保状态轨迹能够在有限时间内从危险状态恢复到安全空间。这种方法增强了机器人在复杂和非安全环境中的导航、协调和任务成功率的安全性和稳定性。


In [14]:
response = agent.chat(
    "这篇文章解决这个问题的前提或假设是什么？"
)

Added user message to memory: 这篇文章解决这个问题的前提或假设是什么？
=== Calling Function ===
Calling function: summary_tool_1 with args: {"input": "What are the premises or assumptions of this paper in addressing the problem?"}
=== Function Output ===
The paper assumes a scenario where multiple agents need to navigate in a complex environment while considering safety constraints. It assumes the use of a multi-agent reinforcement learning framework based on constrained Markov decision processes. The method incorporates a hierarchical safety reinforcement learning approach with event-triggered mechanisms to ensure safe decision-making. Additionally, the paper assumes the effectiveness of introducing a Lyapunov evaluation network to establish safety constraints and optimize multi-agent decision-making. The approach also relies on the use of a dual delayed deep deterministic policy gradient algorithm to balance safety and learning objectives.
=== LLM Response ===
这篇论文假设存在这样一种情景：多个智能体需要在复杂环境中导航，同时考虑安全约束。论文假

#### 2.2.3 Advanced Research Assistant

In [33]:
# Same setup as before
agent_worker = FunctionCallingAgentWorker.from_tools(
    [vector_tool, summary_tool], 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [34]:
# Same question as before
# task is a task object that holds the input along with other task state
task = agent.create_task(
    "1中智能体是什么意思？"
    "强化学习运动规划和智能体有什么关系？"
)

In [35]:
# Now run a single step of the task; this returns a step output
step_output = agent.run_step(task.task_id)

Added user message to memory: 1中智能体是什么意思？强化学习运动规划和智能体有什么关系？
=== Calling Function ===
Calling function: vector_tool_1 with args: {"query": "1\u4e2d\u667a\u80fd\u4f53\u662f\u4ec0\u4e48\u610f\u601d\uff1f\u5f3a\u5316\u5b66\u4e60\u8fd0\u52a8\u89c4\u5212\u548c\u667a\u80fd\u4f53\u6709\u4ec0\u4e48\u5173\u7cfb\uff1f"}
=== Function Output ===
智能体在这里指的是参与强化学习运动规划的个体，通常是指机器人或其他智能实体。在强化学习运动规划中，智能体是执行动作、接收奖励并学习如何在环境中达到特定目标的主体。强化学习运动规划侧重于智能体如何在环境中做出决策以达到最优目标，因此智能体在这个过程中扮演着关键的角色。


In [36]:
# Inspect the number of completed steps and the corresponding result
completed_steps = agent.get_completed_steps(task.task_id)
print(f"已完成任务数 {task.task_id}: {len(completed_steps)}")
print(completed_steps[0].output.sources[0].raw_output)

已完成任务数 d754247a-c3e0-4a20-9e2e-f0581c14d5dc: 1
智能体在这里指的是参与强化学习运动规划的个体，通常是指机器人或其他智能实体。在强化学习运动规划中，智能体是执行动作、接收奖励并学习如何在环境中达到特定目标的主体。强化学习运动规划侧重于智能体如何在环境中做出决策以达到最优目标，因此智能体在这个过程中扮演着关键的角色。


In [37]:
# Next, we can also look at the agent's upcoming steps:
upcoming_steps = agent.get_upcoming_steps(task.task_id)
print(f"接下来未完成任务数 {task.task_id}: {len(upcoming_steps)}")
upcoming_steps[0]

接下来未完成任务数 d754247a-c3e0-4a20-9e2e-f0581c14d5dc: 1


TaskStep(task_id='d754247a-c3e0-4a20-9e2e-f0581c14d5dc', step_id='c9798d7d-5f0d-4937-aceb-b4c6fc5e9a69', input=None, step_state={}, next_steps={}, prev_steps={}, is_ready=True)

In [38]:
step_output = agent.run_step(
    task.task_id, input="这篇文章有什么贡献？"
)

Added user message to memory: 这篇文章有什么贡献？
=== Calling Function ===
Calling function: summary_tool_1 with args: {"input": "What are the contributions of this paper?"}
=== Function Output ===
The contributions of this paper include proposing a multi-agent event-triggered hierarchical security reinforcement learning method for motion planning, utilizing a constrained Markov decision process framework, introducing a Lyapunov evaluation network for additional safety constraints, ensuring robot decision safety through multi-constraint objective optimization learning, and demonstrating improved security and coordination capabilities in multi-robot reinforcement learning scenarios.


In [39]:
# Run the final step
step_output = agent.run_step(task.task_id)

=== LLM Response ===
这篇论文的贡献包括提出了一种用于运动规划的多智能体事件触发的分层安全强化学习方法，利用了受限马尔可夫决策过程框架，引入了一种Lyapunov评估网络用于额外的安全约束，通过多约束目标优化学习确保机器人决策的安全性，并展示了在多机器人强化学习场景中改进的安全性和协调能力。


In [40]:
# We can also double-check that this was the last step
print(step_output.is_last)

True


In [41]:
# To get the final answer, use the following:
response = agent.finalize_response(task.task_id)

In [42]:
print(str(response))

这篇论文的贡献包括提出了一种用于运动规划的多智能体事件触发的分层安全强化学习方法，利用了受限马尔可夫决策过程框架，引入了一种Lyapunov评估网络用于额外的安全约束，通过多约束目标优化学习确保机器人决策的安全性，并展示了在多机器人强化学习场景中改进的安全性和协调能力。


## 3. Summary
***

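In this lesson, we looked at how defining a complete agent reasoning loop lets us handle more complex queries and vague questions that need clarification. Unlike a single-shot tool call, an agent can reason over tools across multiple steps, which helps improve the accuracy and quality of the answers. We learned how to use a function calling agent that integrates natively with an LLM's function calling capability, and how the two main components, the agent worker and the agent runner, are used to create and manage an agent. We also saw how to maintain conversation history and how to debug and steer the agent through the low-level agent interface. With these tools, we can handle complex queries more flexibly and control the agent's execution step by step.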