fix(agents): propagate provider errorMessage in transport stream throws by CoKeFish · Pull Request #69091 · openclaw/openclaw

CoKeFish · 2026-04-19T21:22:26Z

Summary

Problem: When finalizeTransportStream / the OpenAI Responses transport detects output.stopReason === "error" with a concrete output.errorMessage (e.g. Provider finish_reason: MALFORMED_FUNCTION_CALL), the subsequent throw new Error("An unknown error occurred") discards that message.
Why it matters: Callers like pi-embedded-runner log only the generic string, leaving post-mortems blind to the real provider signal (safety blocks, malformed tool calls, upstream 5xx surfaced via SSE). Related to Gemini finishReason (SAFETY, RECITATION, MALFORMED_FUNCTION_CALL) swallowed as generic error #1914 (fixed in pi-ai but still swallowed here), Model fallback treats finish_reason:error as success, stops fallback chain #59524, Sub-agent reports 'completed successfully' when provider returns stopReason: error #60473.
What changed: Thread output.errorMessage into the throw, falling back to the generic string when absent. Three call sites updated (1 shared helper + 2 OpenAI Responses variants). Added unit tests locking in both branches.
What did NOT change (scope boundary): No flow-control or failover-classification change — classifyFailoverReason reads msg.errorMessage, not the thrown Error message, so retry/fallback logic is untouched. No user-visible defaults.

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Closes #
Related Gemini finishReason (SAFETY, RECITATION, MALFORMED_FUNCTION_CALL) swallowed as generic error #1914, Model fallback treats finish_reason:error as success, stops fallback chain #59524, Sub-agent reports 'completed successfully' when provider returns stopReason: error #60473
This PR fixes a bug or regression

Root Cause (if applicable)

Root cause: the three throws in transport-stream-shared.ts and openai-transport-stream.ts hard-coded "An unknown error occurred" even when the stream had already populated output.errorMessage with the provider signal. The outer catch then overwrites output.errorMessage with the thrown string, so the real cause is lost end-to-end.
Missing detection / guardrail: no unit test asserted that finalizeTransportStream propagates output.errorMessage on the error branch.
Contributing context: Gemini finishReason (SAFETY, RECITATION, MALFORMED_FUNCTION_CALL) swallowed as generic error #1914 fixed the analogous case inside @mariozechner/pi-ai, but the openclaw transport layer kept its own generic throws.

Regression Test Plan (if applicable)

Coverage level that should have caught this:
- Unit test
- Seam / integration test
- End-to-end test
- Existing coverage already sufficient
Target test or file: src/agents/transport-stream-shared.test.ts
Scenario the test should lock in: finalizeTransportStream with stopReason: "error" throws the real output.errorMessage; with errorMessage absent, falls back to the generic string.
Why this is the smallest reliable guardrail: the OpenAI variants duplicate the shared logic; locking the shared helper pins the contract.
Existing test that already covers this (if any): None.
If no new test is added, why not: N/A

User-visible / Behavior Changes

Error logs and UI that surface stream errors now show the provider-reported reason (e.g. Provider finish_reason: MALFORMED_FUNCTION_CALL) instead of the generic "An unknown error occurred". No config/default changes.

Diagram (if applicable)

Before:
[provider stream sets output.errorMessage = "Provider finish_reason: MALFORMED_FUNCTION_CALL"]
  -> [finalize detects stopReason === "error"]
  -> [throw new Error("An unknown error occurred")]
  -> [outer catch overwrites output.errorMessage with the thrown string]
  -> [caller logs "An unknown error occurred"]  // real cause lost

After:
[provider stream sets output.errorMessage = "Provider finish_reason: MALFORMED_FUNCTION_CALL"]
  -> [finalize detects stopReason === "error"]
  -> [throw new Error(output.errorMessage || "An unknown error occurred")]
  -> [outer catch reassigns output.errorMessage = error.message, which already is the real cause]
  -> [caller logs "Provider finish_reason: MALFORMED_FUNCTION_CALL"]

Security Impact (required)

New permissions/capabilities? No
Secrets/tokens handling changed? No
New/changed network calls? No
Command/tool execution surface changed? No
Data access scope changed? No

Note: output.errorMessage carries provider error strings that may contain request identifiers or truncated payload snippets. This is the same surface pi-embedded-runner already logs via output.errorMessage on the non-throwing paths, so no new sensitive-data exposure is introduced.

Repro + Verification

Environment (original bug)

OS: Ubuntu 22.04 x64 (production VPS)
Runtime: Node 22 + openclaw 2026.4.14 (323493f), systemd-managed gateway
Provider: Gemini via google-genai transport
Symptom: mid-flow invoice ingestion (2026-04-19 ~19:58 UTC) failed with output.errorMessage = "Provider finish_reason: MALFORMED_FUNCTION_CALL", but the pi-embedded-runner log surfaced only "An unknown error occurred". Root-caused by tracing the stream to the three throw sites in transport-stream-shared.ts and openai-transport-stream.ts.

Environment (fix verification)

OS: Windows 11 + Node 22.22 + pnpm 10.33 (fresh checkout of main)
pnpm exec vitest run src/agents/transport-stream-shared.test.ts → 6/6 pass
pnpm exec vitest run src/agents/openai-transport-stream.test.ts → 65/65 pass
New unit tests fail against pre-patch main (assert the real errorMessage survives the throw), pass after the change.

Evidence

Failing test/log before + passing after
Trace/log snippets
Screenshot/recording
Perf numbers (if relevant)

Human Verification (required)

Verified scenarios:
- finalizeTransportStream error branch with concrete errorMessage → throws the real message (new unit test)
- finalizeTransportStream error branch without errorMessage → throws the generic fallback (new unit test)
- Existing transport-stream-shared.test.ts suite still green (4 pre-existing tests)
- Existing openai-transport-stream.test.ts suite still green (65 tests, covers the two updated call sites)
Edge cases checked:
- errorMessage set to empty string → falls back to generic (|| short-circuits)
- stopReason === "aborted" path unchanged (no errorMessage populated; still surfaces generic)
What I did not verify:
- Full pnpm test run on Windows (unrelated platform-specific flakiness in other suites)
- Live Gemini repro against the production VPS after patching (the observed-bug environment runs a node_modules hotpatch; the upstream fix is equivalent)

Compatibility / Migration

Backward compatible? Yes
Config/env changes? No
Migration needed? No
If yes, exact upgrade steps: N/A

Risks and Mitigations

Risk: a provider errorMessage may contain characters or patterns that downstream log parsers didn't expect (they used to always see the literal "An unknown error occurred").
- Mitigation: the same strings are already surfaced via output.errorMessage on the non-throwing paths (failTransportStream), so any parser that reads error events already tolerates them. No new surface.
Risk: classifyFailoverReason behavior could shift if it ever starts reading the thrown Error.message.
- Mitigation: current implementation in pi-embedded-helpers/errors.ts:1272 reads msg.errorMessage, not the thrown string — unchanged.

AI-assisted

AI-assisted (Claude Opus 4.7)
Degree of testing: lightly tested (targeted unit suites green; did not run full pnpm test on Windows due to unrelated platform flakiness).
Confirmed I understand what the code does: yes — three identical call sites swap a hardcoded string for output.errorMessage || "An unknown error occurred"; added unit tests lock in both branches in the shared helper that the OpenAI variants duplicate.
Codex review: not run (no local access).

When a provider stream sets `output.stopReason = "error"` with a concrete `output.errorMessage` (e.g. `Provider finish_reason: MALFORMED_FUNCTION_CALL`), the subsequent `throw new Error("An unknown error occurred")` discarded that message. Callers like `pi-embedded-runner` only logged the generic string, making post-mortems blind to the real provider signal. Thread the real `output.errorMessage` into the throw, falling back to the generic string when it is absent. No flow-control change — only the thrown message. Refs openclaw#1914 (root cause acknowledged in pi-ai, still swallowed here), openclaw#59524 (model fallback treats finish_reason:error as success), openclaw#60473 (sub-agent reports success on stopReason:error). AI-assisted: Claude Opus 4.7 · lightly tested (new unit tests + existing transport-stream-shared + openai-transport-stream suites green locally).

greptile-apps · 2026-04-19T21:24:17Z

Greptile Summary

This PR fixes a bug where finalizeTransportStream and the two OpenAI Responses transport variants discarded the provider-supplied output.errorMessage (e.g. "Provider finish_reason: MALFORMED_FUNCTION_CALL") by hardcoding "An unknown error occurred" in their throw statements. The fix is a minimal one-liner per call site (output.errorMessage || "An unknown error occurred"), and two new unit tests lock in both branches of the shared helper.

Confidence Score: 5/5

Safe to merge — minimal, well-tested change with no P0/P1 findings.

The fix is a targeted one-liner at three identical call sites, unit tests cover both the propagation and fallback branches, and existing test suites remain green. No logic regressions or security concerns identified.

No files require special attention.

_{Reviews (1): Last reviewed commit: "fix(agents): propagate provider errorMes..." | Re-trigger Greptile}

openclaw-barnacle bot added agents Agent runtime and tooling size: XS labels Apr 19, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(agents): propagate provider errorMessage in transport stream throws#69091

fix(agents): propagate provider errorMessage in transport stream throws#69091
CoKeFish wants to merge 1 commit intoopenclaw:mainfrom
CoKeFish:fix/propagate-transport-error-message

CoKeFish commented Apr 19, 2026

Uh oh!

greptile-apps bot commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

CoKeFish commented Apr 19, 2026

Summary

Change Type (select all)

Scope (select all touched areas)

Linked Issue/PR

Root Cause (if applicable)

Regression Test Plan (if applicable)

User-visible / Behavior Changes

Diagram (if applicable)

Security Impact (required)

Repro + Verification

Environment (original bug)

Environment (fix verification)

Evidence

Human Verification (required)

Compatibility / Migration

Risks and Mitigations

AI-assisted

Uh oh!

greptile-apps bot commented Apr 19, 2026

Greptile Summary

Confidence Score: 5/5

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant