Skip to content

Fix smart-quoted tool call argument repair#86611

Open
ferminquant wants to merge 5 commits into
openclaw:mainfrom
ferminquant:codex/67488-smart-quote-tool-args
Open

Fix smart-quoted tool call argument repair#86611
ferminquant wants to merge 5 commits into
openclaw:mainfrom
ferminquant:codex/67488-smart-quote-tool-args

Conversation

@ferminquant
Copy link
Copy Markdown
Contributor

@ferminquant ferminquant commented May 25, 2026

Summary

  • Fixes Cron job result serialization fails on special characters in edit tool arguments #67488.
  • Repairs malformed smart-quoted tool-call argument objects inside the existing embedded-runner malformed argument repair wrapper.
  • Keeps the change local to src/agents/embedded-agent-runner/run/attempt.tool-call-argument-repair.ts; no shared JSON helper, cron status, delivery, or dependency changes.
  • Adds focused stream-wrapper regressions in src/agents/embedded-agent-runner/run/attempt.tool-call-argument-repair.test.ts.

AI assistance disclosure

This PR was implemented with Codex assistance. I reviewed the change and understand the affected embedded-runner tool-call argument repair path.

Real behavior proof

Behavior addressed: malformed smart-quoted edit/write/exec tool-call arguments are repaired before the embedded runner observes empty {} arguments.

Real environment tested: local Node.js v22.22.0 loading the TypeScript source with tsx, exercising the actual wrapStreamFnRepairMalformedToolCallArguments stream wrapper with synthetic streamed tool calls. No live credentials or delivery channels were used.

Exact steps or command run after this patch: node --import tsx --input-type=module script importing src/agents/embedded-agent-runner/run/attempt.tool-call-argument-repair.ts, streaming .functions.<tool>:0 plus smart-quoted JSON-like args, draining the wrapped stream, and printing repaired tool-call args. Local proof log: .artifacts/67488-smart-quote-repair-proof.log.

Evidence after fix:

node_version: v22.22.0
json_parse_structural_smart_quotes: SyntaxError: Expected property name or '}' in JSON at position 1 (line 1 column 2)
repair_gate_openai_completions: true
edit_case_final_args: {"path":"notes/报告.md","oldText":"旧的 **草稿**","newText":"更新 **草稿** with “smart”, “sure” and code \"x\"\nJSON-ish “alpha”, “path”: “ignored” snippet\nSee [“quoted”](https://example.test)\nconst re = /\\d+/;\n内部内容"}
edit_case_all_surfaces_match: true
edit_case_content_preserved: true
jsonish_prose_guard_final_args: {"path":"safe.txt","content":"Use ”, “foo”: “bar” in prose"}
jsonish_prose_guard_no_synthetic_foo: true
exec_workdir_final_args: {"command":"pwd","workdir":"/tmp"}
exec_workdir_repaired: true
exact_no_preamble_final_args: {"path":"safe.txt"}
exact_no_preamble_repaired: true
escape_case_final_args: {"path":"safe.txt","content":"line\nnext \"quoted\" path C:\\tmp mark ✓ invalid \\d"}
escape_case_matches_json_semantics: true
escape_case_all_surfaces_match: true
escape_case_actual_newline: true
escape_case_invalid_escape_preserved: true
nested_edit_case_final_args: {"path":"notes/报告.md","edits":[{"oldText":"旧的 **草稿**","newText":"更新 \"草稿\"\nnext"},{"oldText":"tail","newText":"done"}]}
nested_edit_case_repaired: true
nested_edit_case_all_surfaces_match: true

Observed result after fix: Node still rejects structural smart quotes directly, but the existing repair wrapper recovers the streamed malformed tool arguments and writes identical repaired args into the partial, streamed, end-message, and final tool-call surfaces. Smart-quoted string values now decode JSON escapes for newline, quote, backslash, and Unicode sequences while preserving invalid prose-style escapes such as \d.

What was not tested: live cron scheduling, live Telegram delivery, and real provider credentials. The proof targets the smallest representative runtime boundary for the reported failure: the embedded-runner malformed tool-call argument repair wrapper.

Root Cause

The current repair path first tries JSON.parse(raw) and then uses extractBalancedJsonPrefix(raw) plus JSON.parse(extracted.json). Both are ASCII-quote JSON paths. Provider output that uses Unicode smart quotes as structural string delimiters is visually JSON-like but invalid JSON, so the wrapper could fail to recover edit/write args and downstream status reporting could preserve the parse error even after delivery.

Dependency contract

Node/ECMAScript JSON.parse requires JSON object keys and string values to use ASCII double quotes. The proof shows Node v22.22.0 rejecting {“path”:“notes/报告.md”} with SyntaxError.

The internal repair contract remains: valid JSON is preserved first, existing balanced-prefix/trailing-junk recovery is preserved next, and this patch only adds a local smart-quote fallback for the existing malformed tool-call repair gate.

Regression Test Plan

  • src/agents/embedded-agent-runner/run/attempt.tool-call-argument-repair.test.ts covers smart-quoted edit args with CJK/Markdown/inner smart quotes, ASCII-delimited content with smart quotes, same-direction smart quote delimiters, command/workdir args, and member-looking prose that must not synthesize extra args.
  • src/agents/embedded-agent-runner/run/attempt.test.ts keeps the broader embedded-runner coverage green.

Security Impact

No credential, auth, sandbox, workflow, dependency, lockfile, or network behavior changes. The parser is bounded to the existing malformed tool-call argument repair gate and synthetic recovery of streamed arguments.

Human Verification

I reviewed the narrowed diff after removing the shared-helper change. The local mini-guide for the affected architecture and proof is .artifacts/issue-67488-smart-quote-repair-guide.html.

CODEOWNERS / owner review

Checked .github/CODEOWNERS; the touched files do not match a special owner-restricted pattern. Expect normal maintainer review for src/agents/embedded-agent-runner/run/*.

Changelog

No contributor CHANGELOG.md edit. Maintainers can add a release note during landing if needed.

Review conversations

Existing ClawSweeper review on the first draft called out shared-helper exposure through src/shared/balanced-json.ts. This revision removes that shared-helper change and keeps the repair local to the embedded-runner wrapper. A later ClawSweeper review found that exact smart-quoted argument objects without preamble or trailing junk were still rejected; this revision now repairs that case and adds a focused regression. A May 28 ClawSweeper review then found that accepted smart-quoted values preserved JSON escapes literally; this revision decodes JSON string escapes and adds a regression for newline, quote, backslash, Unicode, and invalid-escape preservation. A later May 28 ClawSweeper review found the current nested edit-array schema was not repaired; this revision now repairs smart-quoted edits: [{ oldText, newText }] arrays and adds a focused regression.

Local codex review --base origin/main could not complete after the revision because the account hit the usage limit:

ERROR: You've hit your usage limit. Visit https://chatgpt.com/codex/settings/usage to purchase more credits or try again at 5:31 PM.
Review was interrupted. Please re-run /review and wait for it to complete.

Risks

  • Smart-quote repair is heuristic because malformed JSON has no complete formal grammar.
  • The fallback intentionally focuses on top-level tool-argument objects and common string-heavy tool args, so unusual malformed smart-quoted nested structures may still fail closed.

Behavior tradeoffs

  • Freeform text fields such as content, text, message, oldText, and newText keep JSON-looking prose inside the string instead of synthesizing extra top-level args.
  • For edit-style replacements, oldText may close before its expected newText successor.
  • Existing valid JSON and existing ASCII balanced-prefix/trailing-junk repair behavior continue to run before the smart-quote fallback.

Scope boundaries

  • Does not change cron state accounting, delivery status, or error persistence.
  • Does not change src/shared/balanced-json.ts or CLI-output JSON extraction.
  • Does not expand repair to direct openai-responses transports.
  • Does not add dependencies or change package metadata.

Validation

node scripts/run-vitest.mjs src/agents/embedded-agent-runner/run/attempt.tool-call-argument-repair.test.ts src/agents/embedded-agent-runner/run/attempt.test.ts
3 files passed, 261 tests passed, Vitest reported 8.81s after rebasing onto upstream main c2c29588f4

node scripts/run-oxlint.mjs src/agents/embedded-agent-runner/run/attempt.tool-call-argument-repair.ts src/agents/embedded-agent-runner/run/attempt.tool-call-argument-repair.test.ts
passed

git diff --check
passed

node --import tsx --input-type=module <synthetic stream repair proof>
passed; see Real behavior proof

pnpm check:changed was not rerun after the last narrow parser follow-up; the branch was rebased onto upstream main c2c29588f4 to avoid the unrelated stale-base check-prod-types failure in src/secrets/path-utils.ts.

@openclaw-barnacle openclaw-barnacle Bot added agents Agent runtime and tooling size: L proof: supplied External PR includes structured after-fix real behavior proof. labels May 25, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 25, 2026

Codex review: needs maintainer review before merge. Reviewed May 28, 2026, 1:22 PM ET / 17:22 UTC.

Summary
The PR adds a local smart-quote-aware fallback to the embedded-runner malformed tool-call argument repair wrapper and focused stream-wrapper regression tests.

PR surface: Source +406, Tests +190. Total +596 across 2 files.

Reproducibility: yes. at source level. Node rejects structural smart-quote JSON and current main's wrapper only has JSON.parse plus ASCII balanced-prefix parsing; I did not run a live cron or Telegram scenario.

Review metrics: 1 noteworthy metric.

  • Current-head checks: 1 in progress, 0 failed in fetched check-run summary. The PR should not merge until the exact head's remaining Critical Quality lane finishes.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • Let the remaining Critical Quality lane finish on the current head.
  • [P2] Have maintainers explicitly accept the bounded smart-quote repair scope or request the specific extra malformed-shape tests they want.

Risk before merge

  • [P1] The smart-quote fallback is heuristic for malformed provider output, so unusual invalid nested or freeform shapes may still fail closed or need future parser tuning.
  • [P1] One current-head Critical Quality lane was still in progress when checked, so merge should wait for the exact head checks to settle.

Maintainer options:

  1. Wait For Checks And Accept Scope (recommended)
    Require the remaining Critical Quality lane to finish successfully, then land if maintainers accept the documented bounded smart-quote parser behavior.
  2. Ask For Broader Malformed-Shape Tests
    If maintainers want less heuristic risk, request additional tests for the malformed nested/freeform shapes they expect the wrapper to recover.
  3. Accept Remaining Parser Limits
    Maintainers can intentionally land the focused repair while tracking future parser tuning if a real provider emits unsupported malformed shapes.

Next step before merge

  • [P2] The remaining action is maintainer acceptance of the bounded parser heuristic plus current-head CI gating; there is no narrow automated code repair indicated.

Security
Cleared: The diff is limited to local TypeScript stream argument repair and tests, with no workflow, dependency, lockfile, credential, network, or publishing changes.

Review details

Best possible solution:

Land the local wrapper fallback after current-head checks are green and maintainers accept the bounded malformed-output heuristic, keeping valid JSON and existing balanced-prefix recovery ahead of smart-quote repair.

Do we have a high-confidence way to reproduce the issue?

Yes, at source level. Node rejects structural smart-quote JSON and current main's wrapper only has JSON.parse plus ASCII balanced-prefix parsing; I did not run a live cron or Telegram scenario.

Is this the best way to solve the issue?

Yes, the proposed direction is the narrow repair path. Keeping it local to the embedded-runner malformed argument wrapper is preferable to changing cron status accounting, provided checks pass and maintainers accept the bounded heuristic.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against b2fc8af1b109.

Label changes

Label changes:

  • add proof: sufficient: Contributor real behavior proof is sufficient. The PR body provides after-fix terminal output from Node/tsx exercising the actual stream wrapper and showing repaired tool-call arguments on the relevant stream surfaces.

Label justifications:

  • P2: This is a normal-priority embedded agent bug fix with limited blast radius but real cron/session-state impact in the linked report.
  • merge-risk: 🚨 other: The PR changes recovery heuristics for malformed provider output, where focused tests cannot cover every invalid smart-quoted tool-argument shape.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR. Sufficient (terminal): The PR body provides after-fix terminal output from Node/tsx exercising the actual stream wrapper and showing repaired tool-call arguments on the relevant stream surfaces.
  • proof: sufficient: Contributor real behavior proof is sufficient. The PR body provides after-fix terminal output from Node/tsx exercising the actual stream wrapper and showing repaired tool-call arguments on the relevant stream surfaces.
Evidence reviewed

PR surface:

Source +406, Tests +190. Total +596 across 2 files.

View PR surface stats
Area Files Added Removed Net
Source 1 436 30 +406
Tests 1 190 0 +190
Docs 0 0 0 0
Config 0 0 0 0
Generated 0 0 0 0
Other 0 0 0 0
Total 2 626 30 +596

What I checked:

Likely related people:

  • steipete: GitHub commit history shows Peter Steinberger recently internalized the agent runtime path and authored the shared balanced JSON extraction helper used by this repair path. (role: recent area contributor and shared-helper refactor author; confidence: high; commits: bb46b79d3c14, f006678f3c9a; files: src/agents/embedded-agent-runner/run/attempt.tool-call-argument-repair.ts, src/agents/embedded-agent-runner/run/attempt.tool-call-argument-repair.test.ts, src/shared/balanced-json.ts)
  • MonkeyLeeT: Commit history for the old pi-embedded-runner path identifies Ted Li's recent malformed tool-call argument repair enablement and transport scoping work that this PR extends. (role: adjacent behavior author; confidence: medium; commits: 49c7319ea5a2, 2ba43f5d270c; files: src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts, src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.test.ts)
  • vignesh07: Commit history identifies Vignesh Natarajan as the author of the earlier prefixed malformed tool-call JSON recovery path that the smart-quote repair builds on. (role: adjacent behavior author; confidence: medium; commits: a2e4707cfe4c; files: src/agents/pi-embedded-runner/run/attempt.tool-call-argument-repair.ts)
  • vincentkoc: Local blame on current main attributes the current tryExtractUsableToolCallArguments block to Vincent Koc's recent main commit in this checkout, likely through broad recent file movement or repair. (role: recent current-file contributor; confidence: low; commits: 3e2994b975f8; files: src/agents/embedded-agent-runner/run/attempt.tool-call-argument-repair.ts)
What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. P2 Normal backlog priority with limited blast radius. merge-risk: 🚨 other 🚨 Merging this PR has meaningful risk outside the owned taxonomy. labels May 25, 2026
@clawsweeper
Copy link
Copy Markdown
Contributor

clawsweeper Bot commented May 25, 2026

ClawSweeper PR egg: 🔥 warming; proof passed, review follow-up or readiness checks remain. Hatch with @clawsweeper hatch when eligible.

Rules and details

Hatchability:

  • Merged PRs are hatchable.
  • Open PRs are hatchable when they are status: 👀 ready for maintainer look, status: 🚀 automerge armed, or labeled clawsweeper:automerge.
  • Closed unmerged PRs are hatchable only when one of those hatchable labels is still present in the durable record.

About:

  • Eggs appear after real-behavior proof passes. They are collectible flavor only.
  • Review momentum changes the shell state: follow-up work warms it, re-review makes it wobble, and a clean final review lets it hatch.
  • The hatch is seeded from this repository and PR number, so the same PR keeps the same creature; the reviewed head SHA can only change safe visual details.
  • Rarity is just collectible sparkle: 🥚 common, 🌱 uncommon, 💎 rare, ✨ glimmer, and 🌈 legendary.

@ferminquant ferminquant force-pushed the codex/67488-smart-quote-tool-args branch from 46fb2ee to 20091cd Compare May 25, 2026 20:50
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 25, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 25, 2026
@ferminquant ferminquant marked this pull request as ready for review May 26, 2026 01:24
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 26, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 26, 2026
@ferminquant ferminquant force-pushed the codex/67488-smart-quote-tool-args branch from 1bd79e6 to 0d9d4ee Compare May 28, 2026 11:18
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
@ferminquant ferminquant force-pushed the codex/67488-smart-quote-tool-args branch from 0d9d4ee to cde46bd Compare May 28, 2026 11:21
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. and removed rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. labels May 28, 2026
@clawsweeper clawsweeper Bot added status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. and removed status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. labels May 28, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
@ferminquant ferminquant force-pushed the codex/67488-smart-quote-tool-args branch from 5fc074a to d7882f5 Compare May 28, 2026 16:17
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
@ferminquant ferminquant force-pushed the codex/67488-smart-quote-tool-args branch from d7882f5 to e417ec8 Compare May 28, 2026 16:21
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
@clawsweeper clawsweeper Bot added proof: sufficient ClawSweeper judged the real behavior proof convincing. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR. and removed rating: 🦐 gold shrimp Decent PR readiness signal, but merge confidence is limited. status: ⏳ waiting on author ClawSweeper has contributor-facing work open and is waiting for author action. labels May 28, 2026
@ferminquant ferminquant force-pushed the codex/67488-smart-quote-tool-args branch from 62557fc to 4301233 Compare May 28, 2026 17:04
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
@ferminquant ferminquant force-pushed the codex/67488-smart-quote-tool-args branch from 4301233 to 19d855b Compare May 28, 2026 17:13
@openclaw-barnacle openclaw-barnacle Bot removed the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
@clawsweeper clawsweeper Bot added the proof: sufficient ClawSweeper judged the real behavior proof convincing. label May 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agents Agent runtime and tooling merge-risk: 🚨 other 🚨 Merging this PR has meaningful risk outside the owned taxonomy. P2 Normal backlog priority with limited blast radius. proof: sufficient ClawSweeper judged the real behavior proof convincing. proof: supplied External PR includes structured after-fix real behavior proof. rating: 🐚 platinum hermit Good normal PR readiness with ordinary maintainer review expected. size: L status: 👀 ready for maintainer look ClawSweeper has no concrete contributor-facing blocker left for this PR.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cron job result serialization fails on special characters in edit tool arguments

1 participant