Pass ensure_ascii=False when serializing MCP tool results by adityasingh2400 · Pull Request #7730 · microsoft/autogen

adityasingh2400 · 2026-05-21T23:34:56Z

Why

McpToolAdapter.return_value_as_string builds the string that flows back to the model via json.dumps(...). json.dumps defaults to ensure_ascii=True, so every non-ASCII character (Japanese, Chinese, Arabic, Hebrew, emoji, ...) is escaped into \uXXXX sequences before the model ever sees it.
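A minimal illustration of the default behavior, using only the standard library json module. The payload is a plain dict shaped loosely like MCP TextContent. That shape is an assumption for illustration, since the adapter's actual input types are not shown here.

```python
import json

# Hypothetical tool result containing Japanese text (plain dict, not an MCP object).
payload = {"type": "text", "text": "日本語"}

escaped = json.dumps(payload)                        # ensure_ascii=True by default
preserved = json.dumps(payload, ensure_ascii=False)  # keep non-ASCII characters as-is

print(escaped)    # {"type": "text", "text": "\u65e5\u672c\u8a9e"}
print(preserved)  # {"type": "text", "text": "日本語"}
```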

That has two effects on the LLM side:

Token bloat. 日本語 is 3 visible characters, but after escaping each glyph becomes a 6-character \uXXXX sequence (\u65e5 for 日, etc.), which tokenizers typically split into several tokens. For tools returning paragraphs of CJK text this is a real cost and latency tax.
Quality drop. Frontier models technically unescape the sequences, but smaller / older models often quote them back literally or get confused, which is exactly what the issue reporter saw for Japanese.
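A rough way to see the size difference is to compare the character counts of the two encodings. This sketch measures characters, not tokens; actual token counts depend on each model's tokenizer. Characters outside the Basic Multilingual Plane, such as emoji, are worse still, because ensure_ascii=True writes them as a UTF-16 surrogate pair (two escapes).

```python
import json


def escape_overhead(text: str) -> tuple[int, int]:
    """Return (escaped, preserved) character counts of json.dumps(text)."""
    escaped = json.dumps(text)                        # ensure_ascii=True (default)
    preserved = json.dumps(text, ensure_ascii=False)  # original characters kept
    return len(escaped), len(preserved)


for sample in ["日本語", "😀"]:
    escaped_len, preserved_len = escape_overhead(sample)
    print(f"{sample!r}: {escaped_len} chars escaped vs {preserved_len} preserved")

# '日本語': 20 chars escaped vs 5 preserved  (3 six-char escapes + 2 quotes)
# '😀': 14 chars escaped vs 3 preserved     (surrogate pair \ud83d\ude00 + 2 quotes)
```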

Passing ensure_ascii=False keeps the original characters intact. The output is still valid JSON (RFC 8259 requires JSON text exchanged between systems to be encoded as UTF-8), and downstream consumers that already expect UTF-8 are unaffected.
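A quick standard-library check that output produced with ensure_ascii=False is still valid JSON: it parses back to the same object, including after being encoded to UTF-8 bytes for transport (json.loads accepts bytes and detects the UTF-8 encoding). The sample text is arbitrary.

```python
import json

original = {"text": "日本語 你好 😀"}

# Non-ASCII characters are written directly instead of as \uXXXX escapes.
compact = json.dumps(original, ensure_ascii=False)

# Both encodings decode to the same Python object.
assert json.loads(compact) == original
assert json.loads(json.dumps(original)) == original

# Encoded as UTF-8 bytes for the wire, the text still parses as JSON.
wire = compact.encode("utf-8")
assert json.loads(wire) == original
```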

What changed

python/packages/autogen-ext/src/autogen_ext/tools/mcp/_base.py: one-line change, json.dumps(..., ensure_ascii=False) on the result serialization path.
python/packages/autogen-ext/tests/tools/test_mcp_tools.py: new regression test test_return_value_as_string_preserves_non_ascii_text that round-trips Japanese and Chinese TextContent through return_value_as_string and asserts the original characters survive (and that no \u escapes leak through).
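The regression test itself is not reproduced here. As a hedged sketch of the same idea, the following uses a hypothetical helper, serialize_text_items, standing in for the serialization path in _base.py; the real McpToolAdapter.return_value_as_string takes MCP content objects, whose handling is not shown. The assertions mirror the ones described above: the original characters survive and no \u escapes leak through.

```python
import json


# Hypothetical stand-in for the result serialization path; not the autogen API.
def serialize_text_items(texts: list[str]) -> str:
    items = [{"type": "text", "text": t} for t in texts]
    return json.dumps(items, ensure_ascii=False)


def test_serialization_preserves_non_ascii_text() -> None:
    samples = ["こんにちは世界", "你好，世界"]  # Japanese and Chinese text
    result = serialize_text_items(samples)
    for sample in samples:
        assert sample in result   # original characters survive
    assert "\\u" not in result    # no \uXXXX escape sequences leak through


test_serialization_preserves_non_ascii_text()
```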

Fixes #6995.

Verification

uv run pytest packages/autogen-ext/tests/tools/test_mcp_tools.py \
  packages/autogen-ext/tests/tools/test_mcp_workbench_features.py \
  packages/autogen-ext/tests/tools/test_mcp_host.py \
  packages/autogen-ext/tests/tools/test_mcp_workbench_warnings_and_errors.py \
  packages/autogen-ext/tests/tools/test_mcp_workbench_overrides.py \
  packages/autogen-ext/tests/tools/test_mcp_actor.py

146 passed, 1 skipped. The existing test_adapter_from_server_params_with_return_value_as_string (which exercises the same code path with ASCII-only payloads) still passes unchanged, confirming no behavior change for the ASCII case.
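The "no behavior change for the ASCII case" claim also follows from how json.dumps works: ensure_ascii only changes how characters outside ASCII are written, and control characters such as \n and \t are escaped under both settings. A small standard-library check, with an arbitrary ASCII-only payload:

```python
import json

ascii_payload = {"type": "text", "text": "Hello, world!\nLine two\t(tab)"}

# For ASCII-only input both settings yield identical output.
default_out = json.dumps(ascii_payload)
unescaped_out = json.dumps(ascii_payload, ensure_ascii=False)
assert default_out == unescaped_out
```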

uv run ruff check and uv run ruff format are both clean on the touched files.

Notes on PR #7256

There is an older PR (#7256) that touches the same line plus an additional json.dumps in _host/_elicitation.py. That PR has had no reviewer activity since February. This PR keeps the change minimal and focused on the user-facing path called out in the issue (tool result serialization back to the model). The elicitation json.dumps formats a schema for a local human prompt rather than something sent to the LLM, so it is intentionally not changed here; happy to follow up if reviewers want it included.

json.dumps defaults to ensure_ascii=True, which mangles every non-ASCII string into \u escapes before the result is sent back to the model. Powerful models technically un-escape these, but the extra tokens hurt cost and quality for tools that return Japanese/Chinese/Arabic text. Pass ensure_ascii=False so the original characters survive. Fixes microsoft#6995

adityasingh2400 · 2026-05-21T23:51:18Z

@microsoft-github-policy-service agree

adityasingh2400 wants to merge 1 commit into microsoft:main from adityasingh2400:fix-mcp-tool-json-ensure-ascii-false-6995