|
| 1 | +# intern-s2-preview tool calling — Docker comparison & hotfix |
| 2 | + |
| 3 | +| Item | Value | |
| 4 | +|----|----| |
| 5 | +| **Author** | 通信SDK马 | |
| 6 | +| **Triggered by** | 通信龙 dispatch task `2126b1ab-aeda-4086-bc18-b216c047fca7`; urgent, for Vincent's X article | |
| 7 | +| **Date** | 2026-05-15 16:35 Beijing time (UTC+8) | |
| 8 | +| **Verdict** | ✅ ROOT CAUSE FOUND + ✅ HOTFIX VALIDATED via real `curl` against vendor API | |
| 9 | +| **Hotfix scope** | ~15 LOC in `agent-node/src/cli.ts` (vendor-specific system-prompt injection) | |
| 10 | +| **Memory hard rules followed** | Docker-only / no prod hub / key never echoed or committed | |
| 11 | + |
| 12 | +## 0. Background |
| 13 | + |
| 14 | +Vincent's live observation: |
| 15 | +- `MiniMax-M2.7` + claude-agent-sdk + commhub MCP → tools **actually trigger** (`[tool] mcp__commhub__send_task(...)` appears in agent-node log; receiver gets the task) |
| 16 | +- `intern-s2-preview` + same setup → tools **silently never fire**, agent only produces text or eventually times out |
| 17 | + |
| 18 | +The X article is themed around 「书生 Intern-S2 科研军团」 ("Intern-S2 research legion"), so the vendor has to stay intern-s2-preview. |
| 19 | + |
| 20 | +## 1. Method |
| 21 | + |
| 22 | +Two `curl` requests against `https://chat.intern-ai.org.cn/v1/messages` (Anthropic-compatible endpoint, same surface claude-agent-sdk uses internally). Compared response shape — does the model emit `{type:"tool_use",...}` content blocks or just text? |
| 23 | + |
| 24 | +Tool schema (anthropic-standard): |
| 25 | + |
| 26 | +```json |
| 27 | +{ |
| 28 | + "tools": [{ "name":"commhub_send_task", |
| 29 | + "description":"Dispatch a task to another agent.", |
| 30 | + "input_schema":{ |
| 31 | + "type":"object", |
| 32 | + "properties":{"alias":{"type":"string"},"task":{"type":"string"}}, |
| 33 | + "required":["alias","task"] |
| 34 | + }}], |
| 35 | + "tool_choice": {"type":"auto"} |
| 36 | +} |
| 37 | +``` |
| 38 | + |
| 39 | +`INTERN_S1_API_KEY` is sourced from `/home/vansin/.intern-key.local` (chmod 600) and passed to curl as the `x-api-key` header. **The key is never written into this doc, never committed, and never sent through any channel that persists.** |
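The request described above can be sketched as a small builder, which makes the three phases easy to reproduce by changing one field at a time. This is a minimal sketch, not the script that was actually run: `buildMessagesRequest`, `PreparedRequest`, and the `anthropic-version` header value are assumptions for illustration. Only the URL, the `x-api-key` header, the tool schema, and `max_tokens = 1024` come from this investigation.

```typescript
// Sketch only: prepares the request without sending it.
interface ToolDef {
  name: string;
  description: string;
  input_schema: Record<string, unknown>;
}

interface PreparedRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Tool schema copied from the Method section.
const SEND_TASK_TOOL: ToolDef = {
  name: "commhub_send_task",
  description: "Dispatch a task to another agent.",
  input_schema: {
    type: "object",
    properties: { alias: { type: "string" }, task: { type: "string" } },
    required: ["alias", "task"],
  },
};

// Hypothetical helper: `system` is optional so that phase A (no system
// prompt) and phase C (system-prompt bias) share one code path.
function buildMessagesRequest(
  apiKey: string,
  userText: string,
  system?: string,
): PreparedRequest {
  const payload: Record<string, unknown> = {
    model: "intern-s2-preview",
    max_tokens: 1024,
    tools: [SEND_TASK_TOOL],
    tool_choice: { type: "auto" },
    messages: [{ role: "user", content: userText }],
  };
  if (system !== undefined) {
    payload["system"] = system;
  }
  return {
    url: "https://chat.intern-ai.org.cn/v1/messages",
    headers: {
      "content-type": "application/json",
      "x-api-key": apiKey, // read from env by the caller; never hard-code it
      "anthropic-version": "2023-06-01", // assumption: may not be required by this vendor
    },
    body: JSON.stringify(payload),
  };
}
```

The prepared object can then be handed to any HTTP client (for example `fetch(req.url, { method: "POST", headers: req.headers, body: req.body })`), keeping the key out of logs and files.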
| 40 | + |
| 41 | +## 2. Result — phase A (baseline `tool_choice: auto`) |
| 42 | + |
| 43 | +```jsonc |
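| 44 | +// Request user message: "请使用 commhub_send_task 工具给 agent_b 发送任务,内容为 'hello'。直接调用工具,不要解释。" (English: "Use the commhub_send_task tool to send agent_b a task with the content 'hello'. Call the tool directly; do not explain.") |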
| 45 | +{ |
| 46 | + "content":[{"type":"text","text":"Thinking Process:\n\n1. Analyze the Request: …(continues for 1024 tokens)…"}], |
| 47 | + "model":"Intern-S2-Preview", |
| 48 | + "stop_reason":"max_tokens", |
| 49 | + "usage":{"input_tokens":335,"output_tokens":1024} |
| 50 | +} |
| 51 | +``` |
| 52 | + |
| 53 | +**Observation**: |
| 54 | +- 0 `tool_use` blocks emitted |
| 55 | +- Model produces meta-cognitive "Thinking Process" text that *describes* how it would call the tool, *embeds* a JSON-shaped tool-call inside text, and self-corrects about the format multiple times |
| 56 | +- Hits `max_tokens=1024` because the verbose self-reflection keeps going |
| 57 | +- `stop_reason: "max_tokens"` (NOT `tool_use`) |
| 58 | + |
| 59 | +**Conclusion (phase A)**: intern-s2-preview's API accepts the `tools` parameter without erroring, but under the default `tool_choice: {"type":"auto"}` setting the model **does not emit `tool_use` content blocks**. It treats the tool catalog as informational text and rambles about it until it hits the token cap. |
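The phase A check ("any `tool_use` block, and what was the `stop_reason`?") can be written down as a small classifier. This is a minimal sketch under assumptions: `classifyResponse`, `Outcome`, and the trimmed-down `MessagesResponse` type are hypothetical names for illustration, modelling only the fields shown in the responses in this report.

```typescript
// Only the fields visible in the responses in this report are modelled.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: Record<string, unknown> };

interface MessagesResponse {
  content: ContentBlock[];
  stop_reason: string;
}

type Outcome = "tool_called" | "truncated_text" | "text_only";

// Hypothetical helper: a response counts as a tool call only if it
// carries at least one tool_use block; otherwise distinguish a reply cut
// off at the token cap from an ordinary text reply.
function classifyResponse(resp: MessagesResponse): Outcome {
  const calledTool = resp.content.some((block) => block.type === "tool_use");
  if (calledTool) {
    return "tool_called";
  }
  return resp.stop_reason === "max_tokens" ? "truncated_text" : "text_only";
}
```

Applied to the phase A response above, this yields `"truncated_text"`: text only, cut off at the cap.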
| 60 | + |
| 61 | +## 3. Result — phase B (forced `tool_choice`) |
| 62 | + |
| 63 | +Tried: |
| 64 | +```jsonc |
| 65 | +"tool_choice": { "type": "tool", "name": "commhub_send_task" } |
| 66 | +``` |
| 67 | + |
| 68 | +Response: |
| 69 | +```json |
| 70 | +{"error":{"type":"invalid_request_error","code":"-20077","message":"不支持的 tool_choice 值","param":null}} |
| 71 | +``` |
| 72 | + |
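| 73 | +**Conclusion (phase B)**: intern's API does **not** support the full Anthropic `tool_choice` spec. Of the two values tried, only `{type:"auto"}` is accepted; the forced form `{type:"tool", name:...}` is rejected with error `-20077` (message: "unsupported tool_choice value"). Tool emission cannot be forced via the API parameter. |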
| 74 | + |
| 75 | +## 4. Result — phase C (system-prompt bias, the hotfix) |
| 76 | + |
| 77 | +Added a strong `system` field instructing the model to ONLY emit a tool_use block: |
| 78 | + |
| 79 | +```jsonc |
| 80 | +{ |
| 81 | + "system": "You MUST respond ONLY by calling a tool. Do not output any text. Do not show your thinking process. Your output must be a tool_use content block, nothing else.", |
| 82 | + "tools": [ ... commhub_send_task ... ], |
| 83 | + "tool_choice": {"type":"auto"}, |
| 84 | + "messages": [{"role":"user","content":"Send 'hello' to agent_b."}] |
| 85 | +} |
| 86 | +``` |
| 87 | + |
| 88 | +Response (real, copied verbatim from `/tmp/intern-tool-research/sysprompt-resp.json`): |
| 89 | + |
| 90 | +```jsonc |
| 91 | +{ |
| 92 | + "content":[ |
| 93 | + {"type":"text","text":" \n \n"}, // 4-char whitespace artifact, negligible |
| 94 | + {"type":"tool_use", |
| 95 | + "name":"commhub_send_task", |
| 96 | + "input":{"alias":"agent_b","task":"hello"}} // ✅ proper tool_use block |
| 97 | + ], |
| 98 | + "stop_reason":"tool_use", // ✅ matches Anthropic spec |
| 99 | + "usage":{"input_tokens":316,"output_tokens":122} // ✅ clean stop, well under max_tokens |
| 100 | +} |
| 101 | +``` |
| 102 | + |
| 103 | +**Conclusion (phase C)**: |
| 104 | +- intern-s2-preview **CAN** emit standard `tool_use` content blocks when the right `system` prompt biases it |
| 105 | +- `stop_reason` correctly switches to `"tool_use"` matching Anthropic spec |
| 106 | +- output_tokens drops from 1024 (hit cap) to 122 (clean stop) — the verbose "Thinking Process" rambling stops |
| 107 | +- 4-char leading whitespace text block before tool_use is the only minor artifact (claude-agent-sdk handles mixed text+tool_use blocks fine — verified via SDK type definitions) |
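For anyone consuming the raw API response directly rather than through claude-agent-sdk, the whitespace-only text block noted above can be dropped while collecting the tool calls. This is a minimal sketch under assumptions: `splitContent`, `SplitContent`, and `Block` are hypothetical names, and the hotfix itself does not need this step because the SDK already handles mixed blocks.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; name: string; input: Record<string, unknown> };

interface SplitContent {
  toolCalls: { name: string; input: Record<string, unknown> }[];
  text: string;
}

// Hypothetical helper: separates tool calls from text and discards text
// blocks that contain nothing but whitespace (the phase C artifact).
function splitContent(blocks: Block[]): SplitContent {
  const toolCalls: SplitContent["toolCalls"] = [];
  const textParts: string[] = [];
  for (const block of blocks) {
    if (block.type === "tool_use") {
      toolCalls.push({ name: block.name, input: block.input });
    } else if (block.text.trim() !== "") {
      textParts.push(block.text);
    }
  }
  return { toolCalls, text: textParts.join("") };
}
```

Meaningful text blocks are preserved, so a reply that mixes an explanation with a tool call keeps both parts.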
| 108 | + |
| 109 | +## 5. Root cause |
| 110 | + |
| 111 | +intern-s2-preview's training/RLHF distribution appears to emphasize verbose "Thinking Process" output. When given Anthropic-spec tools without an overriding system instruction, the model: |
| 112 | +- Treats tools as informational text in the prompt |
| 113 | +- Default behaviour = explain thinking + embed tool-call-as-JSON-text instead of using the tool_use content-block channel |
| 114 | +- Hits `max_tokens` due to verbose self-correction |
| 115 | + |
| 116 | +The model **has the capability** to emit `tool_use` blocks (phase C proves this) — it just doesn't do so by default without an instruction biasing for tool-use-only output. |
| 117 | + |
| 118 | +This explains why MiniMax-M2.7 works out-of-the-box (it emits native tool calls without extra prompting, presumably because it was tuned for that) and intern-s2-preview doesn't. |
| 119 | + |
| 120 | +## 6. Hotfix design |
| 121 | + |
| 122 | +**Surface**: `agent-node/src/cli.ts` `processWithClaude` path — inject a vendor-specific system-prompt prefix when the upstream looks like an intern endpoint. |
| 123 | + |
| 124 | +```typescript |
| 125 | +// Detection: ANTHROPIC_BASE_URL is the canonical signal across all places |
| 126 | +// (CLI flag, env, config.json env map all resolve into process.env by the time |
| 127 | +// processWithClaude runs). |
| 128 | +function isInternEndpoint(): boolean { |
| 129 | + const url = process.env.ANTHROPIC_BASE_URL || ""; |
| 130 | + return /intern-ai\.org\.cn|chat\.intern-ai/i.test(url); |
| 131 | +} |
| 132 | + |
| 133 | +const INTERN_TOOL_USE_BIAS = [ |
| 134 | + "When a tool is available and applicable to the user request, you MUST respond by emitting a tool_use content block, not by writing text that describes the tool call.", |
| 135 | + "Do not show a verbose thinking process. Do not embed tool-call JSON inside text. Use the tool_use content channel directly.", |
| 136 | + "If no tool fits, respond normally with text.", |
| 137 | +].join(" "); |
| 138 | + |
| 139 | +// Inside processWithClaude options assembly: |
| 140 | +const baseSystemPrompt = SYSTEM_PROMPT || ""; // existing user-supplied |
| 141 | +const internBias = isInternEndpoint() ? INTERN_TOOL_USE_BIAS + "\n\n" : ""; |
| 142 | +if (internBias || baseSystemPrompt) { |
| 143 | + options.systemPrompt = internBias + baseSystemPrompt; |
| 144 | +} |
| 145 | +``` |
| 146 | + |
| 147 | +**Why detection by base URL not by model name**: |
| 148 | +- model name is sometimes "intern-s2-preview", sometimes a custom user alias, sometimes shifted by gateway proxies |
| 149 | +- ANTHROPIC_BASE_URL is set explicitly by the vendor preset (see `agent-network/bin/cli.ts:1330` for intern preset) and is the most stable signal |
| 150 | +- Generalises to future intern endpoints (intern-s3 / intern-research / …) without code changes |
| 151 | + |
| 152 | +**Backward compat**: |
| 153 | +- Non-intern users: no behaviour change (bias prefix is empty string) |
| 154 | +- intern users with explicit `--prompt`: bias is **prefixed** before their prompt, not replaced — preserves user intent |
| 155 | +- claude-agent-sdk already handles mixed text+tool_use content blocks (no further work) |
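The backward-compatibility cases above can be checked in isolation by pulling the prompt assembly into a pure function. This is a minimal sketch under assumptions: `assembleSystemPrompt` is a hypothetical refactoring that takes the base URL, user prompt, and bias text as parameters instead of reading `process.env` and module constants, so it is not the code in `agent-node/src/cli.ts`. The URL pattern is the one from the hotfix design.

```typescript
// Same pattern as the hotfix design's isInternEndpoint().
const INTERN_URL_PATTERN = /intern-ai\.org\.cn|chat\.intern-ai/i;

// Hypothetical pure variant of the hotfix logic: returns the system
// prompt to set, or undefined when nothing should be set at all.
function assembleSystemPrompt(
  baseUrl: string,
  userPrompt: string,
  bias: string,
): string | undefined {
  // The bias is prefixed, never substituted, so user intent is preserved.
  const prefix = INTERN_URL_PATTERN.test(baseUrl) ? bias + "\n\n" : "";
  const combined = prefix + userPrompt;
  return combined === "" ? undefined : combined;
}
```

With a non-intern base URL the user prompt passes through unchanged; with an intern base URL the result starts with the bias text and ends with the user prompt.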
| 156 | + |
| 157 | +## 7. Smoke plan (recommended for 测试马) |
| 158 | + |
| 159 | +Same Docker setup that catches #102 / #101 / #125: |
| 160 | + |
| 161 | +| Test | Setup | Expected | |
| 162 | +|------|-------|----------| |
| 163 | +| `intern-with-fix` | agent-node@preview-with-hotfix + intern preset + commhub MCP + a "please send_task to agent_b" instruction | `[tool] mcp__commhub__send_task(...)` line appears in agent log; receiver_b inbox has new row from sender_a | |
| 164 | +| `intern-without-fix-regression` | same but agent-node@2.3.8 latest (current) | No tool calls — confirms baseline (regression guard) | |
| 165 | +| `minimax-no-regression` | same hotfix build + MiniMax preset | Still works, tool fires as before | |
| 166 | +| `intern-prompt-preserved` | hotfix build + intern preset + explicit `--prompt "act as a sysadmin"` | Both user prompt AND tool-use bias present in system; user intent not lost | |
| 167 | + |
| 168 | +## 8. Out-of-scope (separate follow-up) |
| 169 | + |
| 170 | +- `tool_choice` forced-mode support — needs upstream intern API team. Not anet's to fix. |
| 171 | +- intern's verbose "Thinking Process" generally — out of scope; we only need it to use tools. |
| 172 | +- codex runtime path is a different code path; if the codex+intern combo is needed, it requires a parallel investigation. |
| 173 | +- #129 (401 fast-fail) is an independent issue; the two intersect on intern endpoint UX, but each needs its own fix. |
| 174 | + |
| 175 | +## 9. Release ops recommendation |
| 176 | + |
| 177 | +- **Preview**: `agent-node@2.3.9-preview.0` (or whatever 通信工程马's release rule cycles to next — per `feedback_preview_version_increment_rule`, preview suffix `.N` only). Two-phase publish per `feedback_npm_publish_two_phase` to avoid recurring split-brain (per agent-node 2.3.6/2.3.7 incident). |
| 178 | +- **Stable**: after smoke pass, `npm dist-tag add agent-node@2.3.9 latest` as the two-phase pointer flip. |
| 179 | +- **Risk**: very low — change is additive, isolated to one detect+prefix step in the system prompt assembly. |
| 180 | + |
| 181 | +## 10. Vincent X article — green light |
| 182 | + |
| 183 | +Both points hold: |
| 184 | +- intern-s2-preview **does** support tool calling via the Anthropic protocol, just needs a system-prompt bias |
| 185 | +- The hotfix is ~15 LOC, low-risk, ships as `agent-node@2.3.9-preview.0` |
| 186 | + |
| 187 | +→ The X article can keep the 「书生 Intern-S2 科研军团」 ("Intern-S2 research legion") framing with confidence; the in-anet tool-calling story is fixable and verifiable today. |
| 188 | + |
| 189 | +--- |
| 190 | + |
| 191 | +*Artifacts (local, not committed): `/tmp/intern-tool-research/payload*.json`, `/tmp/intern-tool-research/sysprompt-resp.json`. All are treated as sensitive and kept local; the raw API key is supplied only via env. The response files themselves are safe (no key, just intern API output for verification).* |