iohub · May 9, 2026 · May 8, 2026
diff --git a/.codeactor/skills/commit.md b/.codeactor/skills/commit.md
@@ -29,18 +29,14 @@
 1. Run `git status --short` to list all changed files.
 2. **Filter out the following files**:
 
-   | Category | Exclusion rule |
-   |------|----------|
-   | **Data files** | Extensions: `.csv`, `.tsv`, `.xlsx`, `.xls`, `.parquet`, `.arrow`, `.feather`, `.h5`, `.hdf5`, `.npz`, `.npy`, `.pkl`, `.joblib`, `.sqlite`, `.sqlite3`, `.db`, `.dta`, `.sav`, `.rds`, `.rda` |
-   | **Binaries / build artifacts** | Extensions: `.exe`, `.dll`, `.so`, `.a`, `.o`, `.obj`, `.bin`, `.pt`, `.pth`, `.onnx`, `.safetensors`, `.gguf`, `.wasm`, `.pyc`, `.pyo`, `.class`, `.jar`, `.war`, `.apk`, `.ipa`, `.whl`, `.egg` |
-   | **Media files** | Extensions: `.png`, `.jpg`, `.jpeg`, `.gif`, `.bmp`, `.tif`, `.tiff`, `.webp`, `.svg` (excluded unless used as an icon/UI asset), `.mp3`, `.wav`, `.flac`, `.ogg`, `.mp4`, `.avi`, `.mov`, `.mkv`, `.webm` |
-   | **Archives** | Extensions: `.zip`, `.tar`, `.gz`, `.bz2`, `.7z`, `.rar`, `.xz`, `.zst`, `.tgz`, `.tar.gz`, `.tar.bz2` |
-   | **Test data / fixtures** | Path contains: `test/data/`, `tests/data/`, `test/fixtures/`, `tests/fixtures/`, `testdata/`, `__test_data__/`, `sample_data/`, `*.testdata.*` |
-   | **Large-file reminder** | Skip any single file larger than **5MB** and remind the user to handle it manually |
+   | **Data files** |
+   | **Binaries / build artifacts** |
+   | **Media files** | 
+   | **Archives** | 
+   | **Test data / fixtures** |
 
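 3. Run `git add <file1> <file2> ...` on the code files that remain after filtering.
 4. Run `git commit -m "<message>"` to commit. **Commit directly; no user confirmation is needed.**
-5. If no files remain to commit after filtering, tell the user "No code files to commit (data files and binaries were excluded automatically)" and stop.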
 
 ## Step 4: Show the commit result
 After committing, run `git log --oneline -3` to show the three most recent commits so the user can confirm that the commit content is correct.
@@ -53,4 +49,3 @@
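 **Notes**:
 - If the repository has no changes, tell the user "No changes to commit" and stop
 - Run all git commands with the `run_bash` tool
-- Keep the interaction natural; do not over-automate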
diff --git a/docs/Browser_Agent_Design.md b/docs/Browser_Agent_Design.md
diff --git a/docs/Prompt_Cache_Optimization_Plan.md b/docs/Prompt_Cache_Optimization_Plan.md
@@ -0,0 +1,224 @@
+# Prompt Cache Optimization Plan
+
+> Audit date: 2025-07-16
+> Audit basis: the `docs/Prompt_cache.md` best-practices document
+> Decision method: settled after in-depth analysis with `deepthinking`
+
+---
+
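+## 1. Audit Background
+
+Against the five-item core checklist in the LLM Prompt Cache best-practices document, we audited how all 7 agents in the project (Conductor, Coding, Repo, Chat, DevOps, Meta, ImplPlan) build their prompts.
+
+LLM caching uses **strict prefix matching**: a request must match exactly from the first token onward, and once any character differs, the cache for everything after that character is invalidated.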
+
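+## 2. Audit Summary
+
+| ID | Severity | Issue | Decision |
+|------|---------|------|------|
+| **A** | 🔴 P0 | Conductor places dynamic project context before the static prompt | **Must fix** |
+| **B** | 🟡 P0 | RepoAgent inserts dynamic data in the wrong order | **Must fix** |
+| **C** | 🟢 P1 | `FunctionDef.Parameters` uses `map[string]any` | **Suggested optimization** (defensive comment only) |
+| **D** | 🟡 P0 | `FormatPrompt` concatenates Environment fields conditionally | **Must fix** |
+| **E** | ⚪ P2 | Routing affinity is not implemented | **Deferred** (revisit at cluster deployment) |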
+
+---
+
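+## 3. P0: Must Fix
+
+### Issue A: Conductor places dynamic project context before the static prompt
+
+**File**: `internal/agents/conductor.go` (around line 688)
+
+**Current code**:
+```go
+// ❌ Dynamic context is prepended to the static prompt
+systemPrompt = fmt.Sprintf("### Project Workspace Context\n%s\n\n", loadResult.Content) + systemPrompt
+```
+
+**Analysis**:
+- The `CODEACTOR.md`/`CLAUDE.md`/`AGENTS.md` content loaded by `loadProjectContext()` differs for every project
+- It is placed at the **very start**, ahead of `conductor.prompt.md` (a 139-line static prompt)
+- When the project changes, the very first token differs → the entire System Prompt cache is invalidated
+- Conductor's prompt is the largest and most complex in the system, so a cache miss here is very costly
+
+**Fix**: move Project Context to the **end** of the system prompt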
+
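+```go
+// ✅ Correct: dynamic context goes at the end
+systemPrompt := a.GlobalCtx.FormatPrompt(conductorPrompt)
+// ... append Custom Agents registration info ...
+if shouldLoadProjectContext {
+    systemPrompt += "\n\n### Project Workspace Context\n" + loadResult.Content + "\n"
+}
+```
+
+**Expected benefit**:
+- The 139-line static template + Environment + Language Instructions become a fixed prefix (the Environment part is shared only while its values, such as Project Path, stay the same)
+- The cache is shared across projects and sessions
+- Cache hit rate is expected to improve by 60%~80%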
+
+---
+
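+### Issue B: RepoAgent inserts dynamic data in the wrong order
+
+**File**: `internal/agents/repo.go` (around lines 195-217)
+
+**Current code**:
+```go
+// ❌ Dynamic investigation data sits between the static prompt and the environment info
+systemPrompt := repoPrompt       // 54 static lines
+systemPrompt += info             // ← dynamic data inserted in the middle
+systemPrompt = a.GlobalCtx.FormatPrompt(systemPrompt)  // appends Environment
+```
+
+**Analysis**:
+- The Directory Tree, Core Functions, and File Skeletons returned by `doPreInvestigate()` differ per project, and even within one project once the code changes
+- The cache for the Environment + Language Instructions that follow the dynamic data is invalidated along with it
+- RepoAgent is a frequently called agent
+
+**Fix**: call `FormatPrompt` first (static + environment), then append the dynamic investigation data last
+
+```go
+// ✅ Correct: static first, dynamic last
+systemPrompt := a.GlobalCtx.FormatPrompt(repoPrompt)
+systemPrompt += info  // investigation data goes last
+```
+
+**Expected benefit**:
+- The 54 lines of static instructions are fully fixed in the prefix
+- Placing dynamic data at the end is also consistent with the "recency effect" commonly attributed to LLM attention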
+
+---
+
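+### Issue D: `FormatPrompt` concatenates Environment fields conditionally
+
+**File**: `internal/globalctx/global_context.go` (the `FormatPrompt` method)
+
+**Current code**:
+```go
+// ❌ Conditional checks make the prefix inconsistent within the same environment
+if g.ProjectPath != "" {
+    sb.WriteString(fmt.Sprintf("- **Project Path**: %s\n", g.ProjectPath))
+}
+if g.OS != "" {
+    sb.WriteString(fmt.Sprintf("- **Operating System**: %s\n", g.OS))
+}
+if g.Arch != "" {
+    sb.WriteString(fmt.Sprintf("- **Architecture**: %s\n", g.Arch))
+}
+```
+
+**Analysis**:
+- The conditional branches give requests from the same environment different prefix lengths and contents
+- If a field is empty and skipped, the structure of the Environment block changes
+- Environment sits at the very end, so it does not break the static cache before it, but it does affect the consistency of the full prefix
+
+**Fix**: remove the conditional checks and always emit the complete field structure
+
+```go
+// ✅ Correct: always emit every field; use a placeholder for empty values
+projectPath := g.ProjectPath
+if projectPath == "" {
+    projectPath = "[NOT SET]"
+}
+os := g.OS
+if os == "" {
+    os = "[NOT SET]"
+}
+arch := g.Arch
+if arch == "" {
+    arch = "[NOT SET]"
+}
+
+sb.WriteString("\n\n### Environment\n")
+sb.WriteString(fmt.Sprintf("- **Project Path**: %s\n", projectPath))
+sb.WriteString(fmt.Sprintf("- **Operating System**: %s\n", os))
+sb.WriteString(fmt.Sprintf("- **Architecture**: %s\n", arch))
+```
+
+**Expected benefit**:
+- The Environment block always has the same structure, so an empty field no longer changes which lines appear in the prefix
+- Maximizes the cache hit rate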
+
+---
+
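+## 4. P1: Suggested Optimization
+
+### Issue C: `FunctionDef.Parameters` uses `map[string]any`
+
+**File**: `internal/llm/engine.go` (the `FunctionDef` struct)
+
+**Current state**:
+```go
+type FunctionDef struct {
+    Name        string         `json:"name"`
+    Description string         `json:"description,omitempty"`
+    Parameters  map[string]any `json:"parameters,omitempty"`
+}
+```
+
+**Analysis**:
+- Go's standard `encoding/json` **sorts keys** when serializing a `map`, so the output is deterministic ✅
+- If the JSON library is swapped in the future (e.g. `sonic`, `jsoniter`), make sure its `SortMapKeys` option is enabled
+- Converting to a struct now has a high engineering cost and low benefit, so a refactor is not recommended
+
+**Suggested approach**: add a defensive comment only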
+
+```go
+// ⚠️ IMPORTANT: The current implementation relies on encoding/json's deterministic
+// sorting of map keys (alphabetical order). If migrating to sonic, jsoniter, or
+// another JSON library in the future, ensure SortMapKeys is enabled to maintain
+// deterministic key ordering and prevent prompt cache fragmentation.
+type FunctionDef struct {
+    Name        string         `json:"name"`
+    Description string         `json:"description,omitempty"`
+    Parameters  map[string]any `json:"parameters,omitempty"`
+}
+```
+
+---
+
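+## 5. P2: Deferred
+
+### Issue E: Routing affinity is not implemented
+
+**Analysis**:
+- The current development environment runs on a single node, so routing affinity has no practical effect
+- Introducing `prompt_cache_key` or a Session Router would require reworking the LLM Client layer and would add state-management complexity
+
+**Plan**:
+- Introduce consistent-hash routing or a shared Redis cache layer when moving to cluster deployment
+- A `PromptCacheKey` field can be reserved in `llm.CallOptions` for future use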
+
+---
+
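+## 6. Other Findings: Acceptable Architectural Costs
+
+### Cache misses caused by Compact compression
+
+The compression engine's L3 (dropping early messages) and L2 (truncating tool output) change the message structure, which changes the prefix of subsequent LLM calls. This is an **acceptable architectural cost**: a cache miss is the necessary price of staying safely under the token limit, and compression should not be restricted to preserve the cache.
+
+### Tool definition changes caused by dynamic agent registration
+
+When Meta-Agent registers a custom agent at runtime, the `tool_defs` list changes. Because `tool_defs` is an API request parameter that takes part in prefix matching, dynamic registration inevitably causes a cache miss. This is inherent to the feature and cannot be avoided.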
+
+---
+
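+## 7. Implementation Priority
+
+| Priority | Issue | Estimated change size | Risk |
+|--------|------|-----------|------|
+| **1** | D: `FormatPrompt` conditional concatenation | ~10 lines | Very low |
+| **2** | B: RepoAgent ordering | ~3 lines | Low |
+| **3** | A: move Conductor context to the end | ~5 lines | Low (LLM instruction following must be verified) |
+| **4** | C: add defensive comment | ~5 lines | None |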
+
+---
+
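+## 8. Verification Strategy
+
+1. **Unit tests**: verify that `FormatPrompt` produces a consistent prefix under different parameters
+2. **Integration tests**: intercept requests with a mock LLM and measure the prefix hit rate
+3. **LLM behavior regression**: pick typical coding tasks and verify that instruction following and tool-call accuracy do not regress after the fixes
+4. **Rollback plan**: isolate each change in its own Git commit so it can be reverted in one step if something goes wrong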
diff --git a/docs/Prompt_cache.md b/docs/Prompt_cache.md
@@ -0,0 +1,58 @@
+
+
+
+---
+
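+### 1. Prompt Structure Design: Strictly Separate Static from Dynamic
+
+The underlying mechanism of LLM caching is **prefix matching**: the cached content must match exactly from the first token onward, and once any single character differs, all cache after that character is invalidated.
+
+*   **Golden rule: static first, dynamic last**
+    *   **Front (Static Prefix)**: the system role (System Prompt), behavior guidelines, the full set of tool definitions (Tool Descriptions), and knowledge-base documents. This content hardly changes during an agent's run and accounts for the vast majority of tokens, so place a cache breakpoint at the end of this part.
+    *   **Back (Dynamic Suffix)**: the user's current question, live environment variables, the latest observations, and step outputs.
+*   **Anti-pattern: never inject dynamic variables at the head of the system prompt**
+    *   Many developers habitually put the current timestamp (`Current Time: xxx`), a request ID, or the current task progress at the start of the System Prompt. This changes the system prompt prefix on every turn and fully invalidates a cache of tens of thousands of tokens. Timestamps and state should instead be **appended** to the end of the conversation as a separate System Message.
+*   **Deterministic Serialization**
+    *   Agents often serialize JSON, dictionary objects, or code trees into the context. Make sure the **field order is fixed and consistent** on every serialization (for example, always sorted alphabetically by key). Key reordering caused by a library's nondeterministic ordering can silently break prefix consistency.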
+
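+### 2. Conversation and State Management: Follow the "Append-Only" Principle
+
+In traditional software engineering we update state by modifying variables; when programming for LLM caches, **the context is not an editable variable but an append-only log**.
+
+*   **Never modify history**
+    *   **❌ Wrong: sliding-window truncation**. When the history gets too long, the earliest turns are simply deleted. This changes the start of the dynamic part and destroys the cache for the entire conversation history.
+    *   **❌ Wrong: editing past messages**, such as revising an earlier step's intermediate reasoning or correcting a past mistake.
+    *   **✅ Right: append only**. Tell the model about the earlier mistake in a new message, or attach the state change at the end as a new user/system message.
+*   **Use tool calls instead of mode switching**
+    *   Do not switch the agent's working mode (e.g. from "planning mode" to "execution mode") by modifying the System Prompt. Instead, hard-code the logic for every mode into the static tool list and let the agent switch state by invoking a specific tool. Consistent tool name prefixes (such as `browser_x`, `shell_y`) also help keep the hit rate stable.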
+
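+### 3. System Architecture and Routing Design
+
+Even with a perfectly designed prompt, poor scheduling can still prevent cache hits.
+
+*   **Routing Stickiness**
+    *   **Symptom**: a distributed cluster has multiple inference servers, so if the load balancer sends different steps of the same agent session to different machines, the result is a cache miss.
+    *   **Mitigation**: with a third-party API (such as OpenAI), include the `prompt_cache_key` parameter or a `Session ID` parameter in the request so that requests sharing a prefix tend to be routed to the server that already holds the cached KV state. Self-hosted models (e.g. a vLLM-based setup) likewise need session routing for the prefix cache to be hit.
+*   **Multi-Agent Swarm optimization**
+    *   Do not try to build a "do-everything agent" with 100 tools and an enormous System Prompt.
+    *   Use a multi-agent architecture in which each specialized sub-agent has a highly stable, fixed tool set and prompt (for example, an agent dedicated to code auditing loads only the code-audit prompt). When a single sub-agent is called repeatedly, the hit rate on its head cache is very high.
+*   **Mind the cache lifetime (the TTL cliff)**
+    *   Many API vendors (Anthropic, for example) default the prompt cache time-to-live (TTL) to only 5 minutes.
+    *   **Mitigation**: if the agent has asynchronous tasks or waits a long time for user feedback (more than 5 minutes), the cache expires. Design the agent to execute its work in dense bursts where possible; for very high-value shared context, a low-frequency periodic "ping" can warm or keep the cache alive.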
+
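+### 4. Application-Layer Caching (Exact / Semantic Caching)
+
+Beyond the underlying KV cache, the agent system should also have its own application-level cache to intercept unnecessary LLM calls.
+
+*   **Exact Match Caching**
+    *   For highly repetitive agent macro-actions or tool calls: if the prerequisites and state are identical (for example, the same query against the same web page), return the previous tool parse or summary directly from a store such as Redis.
+*   **Semantic Caching**
+    *   Agents often receive user instructions that are "phrased differently but mean the same thing". Use a lightweight embedding model (for example through a semantic-cache library such as GPTCache) to compute the vector similarity between the current request and past requests. When the similarity is very high, the agent can reuse the earlier plan or output and skip full LLM inference for that request.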
+
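+### Summary: Core Checklist for Agent Cache Optimization
+
+1.  **Static/dynamic split**: system settings and tool descriptions go first, conversation history next, and the current task and dynamic parameters last.
+2.  **Clean "dirty data" out of the system prompt**: remove all dynamic variables such as timestamps and UUIDs.
+3.  **Follow Append-Only**: never arbitrarily modify or delete intermediate items in the conversation history.
+4.  **Fix the output format**: require serialization in business systems (e.g. JSON) to use deterministic key ordering.
+5.  **Design routing affinity**: ensure follow-up requests for the same agent task go to the same cache node.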
diff --git a/go.sum b/go.sum
@@ -12,8 +12,6 @@ github.com/atotto/clipboard v0.1.4 h1:EH0zSVneZPSuFR11BlR9YppQTVDbh5+16AmcJi4g1z
 github.com/atotto/clipboard v0.1.4/go.mod h1:ZY9tmq7sm5xIbd9bOK4onWV4S6X0u6GY7Vn0Yu86PYI=
 github.com/aymanbagabas/go-osc52/v2 v2.0.1 h1:HwpRHbFMcZLEVr42D4p7XBqjyuxQH5SMiErDT4WkJ2k=
 github.com/aymanbagabas/go-osc52/v2 v2.0.1/go.mod h1:uYgXzlJ7ZpABp8OJ+exZzJJhRNQ2ASbcXHWsFqH8hp8=
-github.com/aymanbagabas/go-udiff v0.2.0 h1:TK0fH4MteXUDspT88n8CKzvK0X9O2xu9yQjWpi6yML8=
-github.com/aymanbagabas/go-udiff v0.2.0/go.mod h1:RE4Ex0qsGkTAJoQdQQCA0uG+nAzJO/pI/QwceO5fgrA=
 github.com/aymanbagabas/go-udiff v0.4.1 h1:OEIrQ8maEeDBXQDoGCbbTTXYJMYRCRO1fnodZ12Gv5o=
 github.com/aymanbagabas/go-udiff v0.4.1/go.mod h1:0L9PGwj20lrtmEMeyw4WKJ/TMyDtvAoK9bf2u/mNo3w=
 github.com/aymerick/douceur v0.2.0 h1:Mv+mAeH1Q+n9Fr+oyamOlAkUNPWPlA8PPGR0QAaYuPk=

diff --git a/internal/agents/coding.go b/internal/agents/coding.go
@@ -70,6 +70,8 @@ func NewCodingAgent(globalCtx *globalctx.GlobalCtx, llm llm.Engine, maxSteps int
 			}
 		case "micro_agent":
 			fn = globalCtx.MicroAgentTool.Execute
+		case "deepthinking":
+			fn = globalCtx.DeepThinkingTool.Execute
 		case "agent_exit":
 			fn = globalCtx.FlowOps.ExecuteAgentExit
 		case "ask_user_for_help":

diff --git a/internal/agents/coding.prompt.md b/internal/agents/coding.prompt.md
@@ -34,6 +34,8 @@ You have access to the following tools. You must use them to interact with the s
 *   **Thinking & Debugging**:
     *   Use the `thinking` tool to analyze complex problems, plan multi-step tasks, or debug errors.
     *   *Trigger*: If a tool execution fails (e.g., test failed, compilation error), you **MUST** use the `thinking` tool to analyze the error before retrying. **Analyze -> Plan -> Fix**.
+    *   The `micro_agent` tool can delegate focused subtasks to a specialized micro-agent.
+    *   The `deepthinking` tool is an extremely expensive, last-resort analysis tool — see constraints below.
 
 # Workflow
 1.  **Analyze**: Understand the user's intent. If ambiguous, use the `thinking` tool or ask clarifying questions (only if necessary).
@@ -58,3 +60,6 @@ You have access to the following tools. You must use them to interact with the s
 *   **Be Proactive**: Don't wait for the user to drive every step. Take initiative.
 *   **Be Thorough**: Verify your work. Don't leave broken code.
 *   **Be Safe**: Protect the user's environment.
+
+### DeepThinking Tool (Last Resort)
+- **`deepthinking`**: An extremely expensive, isolated deep analysis tool. ONLY use when conventional methods (thinking tool, micro_agent, code analysis) have been exhausted and the problem requires systematic multi-dimensional analysis. Input: `context` (full problem context including errors, background, what failed) and `goal` (specific objective). This tool is VERY expensive — do NOT use for simple issues.