Pass ensure_ascii=False when serializing MCP tool results#7730

Open
adityasingh2400 wants to merge 1 commit into
microsoft:mainfrom
adityasingh2400:fix-mcp-tool-json-ensure-ascii-false-6995

Why

McpToolAdapter.return_value_as_string builds the string that flows back to the model via json.dumps(...). json.dumps defaults to ensure_ascii=True, so every non-ASCII character (Japanese, Chinese, Arabic, Hebrew, emoji, ...) is escaped into \uXXXX sequences before the model ever sees it.

That has two effects on the LLM side:

  1. Token bloat. Each glyph in 日本語 is one visible character but six ASCII characters after escaping (\u65e5 etc.), and those six characters typically split into several tokens. For tools returning paragraphs of CJK text this is a real cost and latency tax.
  2. Quality drop. Frontier models usually decode the escape sequences correctly, but smaller or older models often quote them back literally or get confused, which is exactly what the issue reporter saw for Japanese.
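To make the size difference concrete, here is a minimal sketch using only the standard library. It measures characters, not tokens; the actual token count depends on the model's tokenizer, which is not assumed here.

```python
import json

text = "日本語"

# Default behavior: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(text)
# With ensure_ascii=False the characters are emitted as-is.
preserved = json.dumps(text, ensure_ascii=False)

print(escaped)    # "\u65e5\u672c\u8a9e"
print(preserved)  # "日本語"

# Both strings include the two surrounding quote characters.
print(len(escaped), len(preserved))  # 20 5
```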

Passing ensure_ascii=False keeps the original characters intact. The output is still valid JSON: RFC 8259 allows unescaped non-ASCII characters inside strings and requires UTF-8 for JSON exchanged between systems, so downstream consumers that already expect UTF-8 are unaffected.
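As a quick sanity check of the validity claim, the sketch below round-trips a sample payload both ways. The payload is made up for illustration and is not taken from the adapter or the tests.

```python
import json

# Illustrative payload only: Japanese, Chinese, and an emoji outside the BMP.
payload = {"text": "こんにちは 你好 🎉"}

raw = json.dumps(payload, ensure_ascii=False)
escaped = json.dumps(payload)

# Both forms parse back to the same object, so consumers see identical data.
assert json.loads(raw) == payload
assert json.loads(escaped) == payload

# The unescaped form encodes cleanly to UTF-8 and parses from bytes as well.
wire = raw.encode("utf-8")
assert json.loads(wire) == payload
```

The only difference between the two forms is how the string looks before parsing, which is exactly what the model sees.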

What changed

  • python/packages/autogen-ext/src/autogen_ext/tools/mcp/_base.py: one-line change, json.dumps(..., ensure_ascii=False) on the result serialization path.
  • python/packages/autogen-ext/tests/tools/test_mcp_tools.py: new regression test test_return_value_as_string_preserves_non_ascii_text that round-trips Japanese and Chinese TextContent through return_value_as_string and asserts the original characters survive (and that no \u escapes leak through).
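The shape of the regression check can be sketched roughly as follows. This is not the test from the PR: serialize_result is a hypothetical stand-in for the serialization step, and the dictionaries only approximate TextContent items, so the real adapter and MCP types are deliberately not used here.

```python
import json


def serialize_result(items):
    # Hypothetical stand-in for the adapter's serialization step;
    # the real code lives in McpToolAdapter.return_value_as_string.
    return json.dumps(items, ensure_ascii=False)


def test_preserves_non_ascii_text():
    # Plain dicts approximating text content items (assumed shape).
    items = [
        {"type": "text", "text": "こんにちは"},
        {"type": "text", "text": "你好"},
    ]
    out = serialize_result(items)

    # The original characters survive serialization...
    assert "こんにちは" in out
    assert "你好" in out
    # ...and no \u escapes leak through.
    assert "\\u" not in out
```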

Fixes #6995.

Verification

uv run pytest packages/autogen-ext/tests/tools/test_mcp_tools.py \
  packages/autogen-ext/tests/tools/test_mcp_workbench_features.py \
  packages/autogen-ext/tests/tools/test_mcp_host.py \
  packages/autogen-ext/tests/tools/test_mcp_workbench_warnings_and_errors.py \
  packages/autogen-ext/tests/tools/test_mcp_workbench_overrides.py \
  packages/autogen-ext/tests/tools/test_mcp_actor.py

146 passed, 1 skipped. The existing test_adapter_from_server_params_with_return_value_as_string (which exercises the same code path with ASCII-only payloads) still passes unchanged, confirming no behavior change for the ASCII case.

uv run ruff check and uv run ruff format are both clean on the touched files.

Notes on PR #7256

There is an older PR (#7256) that touches the same line plus an additional json.dumps in _host/_elicitation.py. That PR has had no reviewer activity since February. This PR keeps the change minimal and focused on the user-facing path called out in the issue (tool result serialization back to the model). The elicitation json.dumps formats a schema for a local human prompt rather than something sent to the LLM, so it is intentionally not changed here; happy to follow up if reviewers want it included.

json.dumps defaults to ensure_ascii=True, which mangles every non-ASCII
string into \u escapes before the result is sent back to the model.
Powerful models technically un-escape these, but the extra tokens hurt
cost and quality for tools that return Japanese/Chinese/Arabic text.
Pass ensure_ascii=False so the original characters survive.

Fixes microsoft#6995
@adityasingh2400

@microsoft-github-policy-service agree

Development

Successfully merging this pull request may close these issues.

MCP tool JSON serialization lacks ensure_ascii=False, degrades LLM performance for Japanese text